From schristley at mac.com Wed Oct 3 19:54:23 2007 From: schristley at mac.com (Scott Christley) Date: Wed, 3 Oct 2007 19:54:23 -0400 Subject: [Biococoa-dev] biococoa svn and everything In-Reply-To: References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net> Message-ID: <360D8D9A-D971-47D7-BAB4-A572B011E165@mac.com> Hey Koen, On Sep 26, 2007, at 8:53 PM, Koen van der Drift wrote: > I noticed you did a lot of updates recently with these class - this > is great. Form your comments in the code I have a hard time > understanding what exactly a suffix array is (I'm an analytical > chemist that mainly works with proteins, so pardon my > ignorance ;-) Maybe you could add some more info in the source > code to help better understand these classes? Definitely! I noticed that there is a documentation setup similar to how javadoc works, so I will eventually add those comments. I tend to be an iterative programmer, trying out a design then tweaking it until it converges on a best setup, so I think I'm getting there and once I do I will document more thoroughly. The suffix array is a nifty data structure, it essentially holds all of the strings in a sequence in sorted order, making it quick to search for exact string matches. > Another question I have is why you are using calls such as fopen, > fread, etc instead of the methods that Obj-C and Cocoa provide for > I/O. Mind you, I am just trying to understand the code, it's no > criticism at all. I presume you mean NSFileHandle? I was actually thinking of using it, the current code which uses fopen, fread, etc is from the original code for my standalone programs. The main reason why I didn't switch over is that NSFileHandle can only return data with NSData, and the type of programs which use suffix arrays and etc do alot of file reading, which would mean lots and lots of NSData objects being created and released. 
If only NSFileHandle could put the data directly into a buffer
provided by the user, or an existing NSMutableData, that would be
perfect.

cheers
Scott

From schristley at mac.com Wed Oct 3 20:21:51 2007
From: schristley at mac.com (Scott Christley)
Date: Wed, 3 Oct 2007 20:21:51 -0400
Subject: [Biococoa-dev] BioCocoa Applications
In-Reply-To: <04EE5A75-4C88-47B3-9AA0-B25740D9478B@gmail.com>
References: <97F29A26-242D-4FDA-B025-C2D7B1F165B1@mac.com>
    <04A9DB3C-333A-420E-8CCD-8423371DF825@gmail.com>
    <2F609EBD-2A38-4FDC-8141-E433F71CDEB4@mac.com>
    <04EE5A75-4C88-47B3-9AA0-B25740D9478B@gmail.com>
Message-ID: <482ED76F-5E6B-4171-A00D-BEB9C23ED53E@mac.com>

Great! Yes, the SVN book recommends your Option 1 and that sounds
good to me too. Too bad that the Applications directory couldn't be
made at the root level with BioCocoa, so something like this:

BioCocoa/
    trunk/
    tags/
    branches/
Applications
    MyApp
        trunk/
        tags/
        branches/
    YourApp
        trunk/
        tags/
        branches/

The current command on the wiki:

svn checkout svn+ssh://bioinformatics.org/svnroot/BioCocoa

will currently get everything including all the applications. But
with the root split out people could do:

svn checkout svn+ssh://bioinformatics.org/svnroot/BioCocoa/BioCocoa

to get the core framework and

svn checkout svn+ssh://bioinformatics.org/svnroot/BioCocoa/Applications

to get the applications. But yes, I think you are saying that moving
around the root directories may cause confusion for older revisions.
So I agree: do not touch the root directory and just add the
Applications directory.

cheers
Scott

On Sep 26, 2007, at 12:23 PM, Charles Parnot wrote:

> SVN is much simpler than CVS, once you understand more of the
> basics. It is fairly easy to grasp those concepts. I really
> recommend reading the free SVN book online, particularly the parts
> explaining the 'philosophy' of the system.
>
> By convention, and because it works well this way, you want to have
> a 'trunk', a 'tags' and a 'branches' directory for each project.
> The current svn tree is:
>
> BioCocoa/
>     trunk/
>     tags/
>     branches/
>
> I would not recommend changing that too much (though svn makes that
> easy, it might still be confusing when going back to older
> revisions). Since the trunk directory contains all the BioCocoa
> framework code directly, with no other subdirectory, I would not
> recommend having the apps in there.
>
> Instead, I would suggest adding an additional directory under the
> root, called Applications:
>
> BioCocoa/
>     trunk/
>     tags/
>     branches/
>     Applications/
>
> Then, you have these 2 options:
>
>
> *Option 1:
>
> BioCocoa/
>     trunk/
>     tags/
>     branches/
>     Applications/
>         MyApp/
>             trunk/
>             tags/
>             branches/
>         YourApp/
>             trunk/
>             tags/
>             branches/
>
> *Option 2:
>
> BioCocoa/
>     trunk/
>     tags/
>     branches/
>     Applications/
>         trunk/
>             MyApp/
>             YourApp/
>         tags/
>             MyApp/
>             YourApp/
>         branches/
>             MyApp/
>             YourApp/
>
> I would have a slight preference for Option 1, but it really does
> not matter that much, and there is no technical reason that I
> foresee why one option is better than the other. You might give it
> more thought, and maybe there would be some technical reasons why
> one option is better than the other.
>
> Again, I recommend following the svn convention because: (1) it
> works, (2) anybody familiar with svn will be instantly comfortable.
>
> hope that helps!
>
> charles
>
>
> On Sep 26, 2007, at 8:14 AM, Scott Christley wrote:
>
>>
>> I agree as well. The question though is how to set up the SVN
>> repository to support this properly. I'm more familiar with CVS
>> than SVN, I understand the concepts of branches and tags but it is
>> not clear to me how this works with SVN. I need to read up on this.
>>
>> Preferably people should be able to SVN the BioCocoa core
>> framework without getting other stuff; likewise with the
>> applications, they should be able to SVN either all the
>> applications or just specific ones they are interested in. Any
>> ideas on how to set up the repository?
>> Would the current
>> repository need to be re-structured to support separate applications?
>>
>> Thinking with my CVS mind, I would consider making the repository
>> look something like this:
>>
>> trunk/
>>     BioCocoa/
>>     Applications/
>>         MyApp/
>>         YourApp/
>>
>>
>> thanks
>> Scott
>>
>>
>> On Sep 24, 2007, at 5:53 PM, Charles Parnot wrote:
>>
>>>>
>>>> What do people think about creating a source repository with
>>>> community donated applications that use BioCocoa?
>>>>
>>>> I'm thinking that having the BioCocoa library is great, but
>>>> still people are required to write their own applications on top
>>>> of it. Some could be sample applications, but I suspect that
>>>> others would be useful full-fledged apps that maybe focus on
>>>> specific area of analysis, etc. I certainly have some end-user
>>>> oriented tools that I would like to provide, but don't have
>>>> anyplace to put them except create a new project somewhere.
>>>>
>>>> cheers
>>>> Scott
>>>
>>> I agree that there is nothing better than some real-world app
>>> using the framework, to get the framework in the best shape. A
>>> lot of design and optimizations in the framework will then be
>>> triggered by real issues in real apps, not just what we think
>>> could be better.
>>>
>>> These apps can also serve as extra testing tools, in addition to
>>> the automated tests that are built in the framework itself.
>>> >>> charles >>> >>> -- >>> Xgrid-at-Stanford >>> Help science move fast forward: >>> http://cmgm.stanford.edu/~cparnot/xgrid-stanford >>> >>> Charles Parnot >>> charles.parnot at gmail.com >>> >>> >>> >>> >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev > > -- > Xgrid-at-Stanford > Help science move fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford > > Charles Parnot > charles.parnot at gmail.com > > > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From kvddrift at earthlink.net Wed Oct 3 20:32:33 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 3 Oct 2007 20:32:33 -0400 Subject: [Biococoa-dev] biococoa svn and everything In-Reply-To: <360D8D9A-D971-47D7-BAB4-A572B011E165@mac.com> References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net> <360D8D9A-D971-47D7-BAB4-A572B011E165@mac.com> Message-ID: <26EE975E-E40C-4C00-99E3-2EB4C49FD060@earthlink.net> Hi Scott, >> I noticed you did a lot of updates recently with these class - >> this is great. Form your comments in the code I have a hard time >> understanding what exactly a suffix array is (I'm an analytical >> chemist that mainly works with proteins, so pardon my >> ignorance ;-) Maybe you could add some more info in the source >> code to help better understand these classes? > > Definitely! I noticed that there is a documentation setup similar > to how javadoc works, so I will eventually add those comments. 

We actually use HeaderDoc:

http://bioinformatics.org/biococoa/wiki/pmwiki.php?n=Main.UsingHeaderDoc

> I tend to be an iterative programmer, trying out a design then
> tweaking it until it converges on a best setup, so I think I'm
> getting there and once I do I will document more thoroughly. The
> suffix array is a nifty data structure, it essentially holds all of
> the strings in a sequence in sorted order, making it quick to
> search for exact string matches.

Ok, ignorant question #2, isn't a sequence just one (1) string?
Again, this is from someone who works with proteins ;-)

>
>> Another question I have is why you are using calls such as fopen,
>> fread, etc instead of the methods that Obj-C and Cocoa provide for
>> I/O. Mind you, I am just trying to understand the code, it's no
>> criticism at all.
>
> I presume you mean NSFileHandle? I was actually thinking of using
> it, the current code which uses fopen, fread, etc is from the
> original code for my standalone programs. The main reason why I
> didn't switch over is that NSFileHandle can only return data with
> NSData, and the type of programs which use suffix arrays and etc do
> alot of file reading, which would mean lots and lots of NSData
> objects being created and released. If only NSFileHandle could put
> the data directly into a buffer provided by the user, or an
> existing NSMutableData, that would be perfect.

A quick search in cocoabuilder.com gave the following snippet:

    NSMutableData *data = [NSMutableData data];
    NSData *someData;
    NSFileHandle *readHandle = [[aTask standardOutput] fileHandleForReading];

    while ((someData = [readHandle availableData]) && [someData length]) {
        [data appendData:someData]; // Or, if possible, process the data here
    }

would that help?

cheers,

- Koen.

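To make the suffix array idea from this message concrete, here is a minimal sketch in plain C. It is only an illustration under stated assumptions, not the BioCocoa implementation: the names build_suffix_array and find_exact are hypothetical, the construction is a naive qsort over whole suffixes (fine for a short example, far too slow for genome-sized input), and the text is assumed to be an ordinary NUL-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text being indexed. File-scope because qsort's comparator takes no
   context argument; this makes the sketch non-reentrant. */
static const char *g_text;

/* Compare two suffix start offsets by the suffixes they denote. */
static int compare_suffixes(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    return strcmp(g_text + i, g_text + j);
}

/* Hypothetical helper: return the start offsets of all n suffixes of
   text, ordered by suffix. Caller frees; NULL means allocation failed. */
size_t *build_suffix_array(const char *text, size_t n)
{
    assert(text != NULL && strlen(text) == n);
    size_t *sa = malloc((n ? n : 1) * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    g_text = text;
    qsort(sa, n, sizeof *sa, compare_suffixes);
    return sa;
}

/* Hypothetical helper: binary search over the sorted suffixes for one
   exact occurrence of pattern. Returns its offset in text, or -1. */
long find_exact(const char *text, const size_t *sa, size_t n,
                const char *pattern)
{
    size_t m = strlen(pattern);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* Compare only the first m characters of this suffix. */
        int cmp = strncmp(text + sa[mid], pattern, m);
        if (cmp == 0)
            return (long)sa[mid];
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

For the sequence ATTGCAGTCCG, build_suffix_array yields the offsets 5, 0, 4, 8, 9, 10, 3, 6, 7, 2, 1, and find_exact with the pattern GTCC returns 6. Because the suffixes are sorted, each lookup needs only about log2(n) string comparisons, which is what makes exact matching fast.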
From schristley at mac.com Wed Oct 3 20:50:25 2007
From: schristley at mac.com (Scott Christley)
Date: Wed, 3 Oct 2007 20:50:25 -0400
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <26EE975E-E40C-4C00-99E3-2EB4C49FD060@earthlink.net>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
    <360D8D9A-D971-47D7-BAB4-A572B011E165@mac.com>
    <26EE975E-E40C-4C00-99E3-2EB4C49FD060@earthlink.net>
Message-ID: <5ECAF51B-287C-423C-9698-39574B834B85@mac.com>

On Oct 3, 2007, at 8:32 PM, Koen van der Drift wrote:

>> I tend to be an iterative programmer, trying out a design then
>> tweaking it until it converges on a best setup, so I think I'm
>> getting there and once I do I will document more thoroughly. The
>> suffix array is a nifty data structure, it essentially holds all
>> of the strings in a sequence in sorted order, making it quick to
>> search for exact string matches.
>
> Ok, ignorant question #2, isn't a sequence just one (1) string?
> Again, this is from someone who works with proteins ;-)

To be more specific, it maintains a sorted list of suffix strings.
So if this is your sequence:

ATTGCAGTCCG

Then the suffix array keeps a sorted list of suffix strings:

AGTCCG
ATTGCAGTCCG
CAGTCCG
CCG
CG
G
GCAGTCCG
GTCCG
TCCG
TGCAGTCCG
TTGCAGTCCG

So if you are searching for exact strings or almost exact strings in
a large sequence, using a suffix array is considerably faster than
trying to use BLAST for example.

>
>>
>>> Another question I have is why you are using calls such as fopen,
>>> fread, etc instead of the methods that Obj-C and Cocoa provide
>>> for I/O. Mind you, I am just trying to understand the code, it's
>>> no criticism at all.
>>
>> I presume you mean NSFileHandle? I was actually thinking of using
>> it, the current code which uses fopen, fread, etc is from the
>> original code for my standalone programs.
>> The main reason why I
>> didn't switch over is that NSFileHandle can only return data with
>> NSData, and the type of programs which use suffix arrays and etc
>> do alot of file reading, which would mean lots and lots of NSData
>> objects being created and released. If only NSFileHandle could
>> put the data directly into a buffer provided by the user, or an
>> existing NSMutableData, that would be perfect.
>
> A quick search in cocoabuilder.com gave the following snippet:
>
>     NSMutableData *data = [NSMutableData data];
>     NSData *someData;
>     NSFileHandle *readHandle = [[aTask standardOutput] fileHandleForReading];
>
>     while ((someData = [readHandle availableData]) && [someData length]) {
>         [data appendData:someData]; // Or, if possible, process the data here
>     }

Unfortunately not, if you look in the while loop, it is still
creating a temporary autoreleased NSData object with [readHandle
availableData], so imagine that while loop being called billions of
times to read short pieces of data from the file, that's a lot of
objects being created and released. What I really want is an
interface something like this:

    char buffer[1000];
    while ([readHandle: buffer length: 1000]) {
        // do something with data in buffer
    }

If you look at BCCachedSequenceFile, you will see that I implemented
such an interface.

cheers
Scott

From mekentosj at gmail.com Thu Oct 11 05:22:51 2007
From: mekentosj at gmail.com (Alexander Griekspoor)
Date: Thu, 11 Oct 2007 10:22:51 +0100
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <58718893-20DF-4673-9BEE-C842AE0BF68A@mac.com>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
    <5752D67B-7E27-43D5-BC57-D0EB37D8EF65@gmail.com>
    <3419726E-6BCC-4068-AE36-A6B03EB35F6A@mac.com>
    <58718893-20DF-4673-9BEE-C842AE0BF68A@mac.com>
Message-ID: <332ACAED-62EC-45A6-AEA5-9FCA15B900A9@gmail.com>

Hi guys,

I just had a look "corebio" without capitals does the trick.
Scott, it's great to see that someone is willing to pick things up again (and even has time for a change ;-). Like Peter I still hope to contribute again one day but it won't be anywhere soon. Still, if I can help out with small things let me know as well. Perhaps could we use the biococoa mailinglist again for questions and comments like these? That way we are all kept in the loop and we could jump in if time permits to help out with certain things. Cheers, Alex On 11-okt-2007, at 9:50, Peter Schols wrote: > Hi Scott, > > As I did not setup the Wiki myself (I think Koen did), I'm not sure > about the password (I'm sorry). It might be something like CoreBio, > but I'll put Koen and Alex in CC, maybe they still remember... ;-) > > best wishes, > > Peter > > > On 10 Oct 2007, at 23:21, Scott Christley wrote: > >> Hello Peter, >> >> Are you able to give me access to the wiki? It appears to need a >> password for editing. >> >> thanks >> Scott >> >> On Sep 21, 2007, at 3:57 AM, Peter Schols wrote: >> >>> Hi Scott, >>> >>> Great to see the BioCocoa project being alive again! >>> I would be happy to share admin access or give it to someone else >>> >>> While I'm quite busy with Undercover these days (and with a new >>> microscopy app we are developing), I'm still very interested in >>> BioCocoa and my (probably naive) dream is that I will become an >>> active member again in the future. So while I don't have time to >>> contribute code to the project at this time, I'd be very >>> interested in helping out with smaller things and with spreading >>> the word. I think the NAR article is a great idea, btw. >>> >>> best wishes, >>> >>> Peter >>> >>> >>> On 19 Sep 2007, at 21:32, Scott Christley wrote: >>> >>>> Hello Charles! >>>> >>>> I'm not sure if I'm nominating myself to be project leader, >>>> seems a bit ambitious for somebody new to the community, but I >>>> certainly have the time and (most importantly) the desire to >>>> move BioCocoa forward. 
What I worry about mostly is not losing >>>> the ability to add new developers if and when they come along, I >>>> remember that I tried to send an email to Peter Schols from >>>> bioinformatics.org and it went into a black hole, I had to find >>>> another email to reach him. He was responsive though (thanks >>>> Peter if you are out there) once I got the email right, but if >>>> he has moved on maybe it would be good to give some others admin >>>> access to the project? >>>> >>>> You are exactly right that there are a lot of avenues that can >>>> be taken. I keep thinking to myself that BioCocoa can >>>> differentiate itself by providing functionality not provided by >>>> the other packages like BioPerl and BioJava. Not sure what a >>>> "killer" app would be, one thing that I think would be very cool >>>> though is a desktop genome browser (versus the web-based ones) >>>> which integrates all the genome information with analysis tools. >>>> >>>> Has anybody thought about putting an article together and submit >>>> to Nucleic Acids Research journal? Might be a good way to get a >>>> little awareness as well as have a solid reference that research >>>> articles can point to. >>>> >>>> cheers >>>> Scott >>>> >>>> On Sep 13, 2007, at 12:48 PM, Charles Parnot wrote: >>>> >>>>> Hi there! >>>>> >>>>> I was one of the people that did the "moving on" thing >>>>> >>>>> I think there is a pretty strong basis in the framework, at >>>>> least for the export/import tools, and then for the basics >>>>> BCSequence stuff. I did set up the initial test suite, which I >>>>> think would need to be updated and extended. >>>>> >>>>> When the project went into hibernation mode, the status was (at >>>>> least from my point of view): >>>>> >>>>> * in search of a project leader, that would have some basic >>>>> amount of time to make decision as to where to go, and do some >>>>> coding too >>>>> * needing a "killer" app to wrap the framework and put it to >>>>> use. 
This is the only way things would be tested in the real >>>>> world by real users. The killer app can be a simple sequence >>>>> editor that expose as much as possible of the underlying framework >>>>> * a design decision has to be made to allow 2 aspects of the >>>>> framework to coexist: a core framework that provides the basic >>>>> functionality; an extension mechanisms that allows people to >>>>> easily contribute additional more specialized functionality (we >>>>> had some talks for instance with Phil Seibel about how the >>>>> NSImage and NSImageRepresentation design could inspire >>>>> something. But really more thoughts need to be put into that, >>>>> and nothing has been decided). The idea is that not everybody >>>>> will be interested in the specialized stuff, so having optional >>>>> modules would be a good thing. >>>>> * one of the feature that was in the works was to add >>>>> annotation/feature to the basic BCSequence class >>>>> >>>>> so, a big roadmap, with lots of avenues ;-) >>>>> >>>>> And yes, if things start moving again, or a project leader self >>>>> nominates, that would certainly warrant a post on macresearch. >>>>> Hosting the project is also still a possibility, but that would >>>>> mean some additional work for the project leader too in setting >>>>> things up and maintaining it, as well as some kind of >>>>> commitment for a reasonable amount of time. >>>>> >>>>> charles >>>>> >>>>> >>>>> >>>>> On Sep 12, 2007, at 5:47 AM, Koen van der Drift wrote: >>>>> >>>>>>> Is the BioCocoa project still "alive"? Don't get me wrong, I've >>>>>>> received the last mail from the mailing list in... may 2007! >>>>>>> How many >>>>>>> people are involved? >>>>>> >>>>>> I don't think the project was ever officially shut down, >>>>>> however, the people working on it about two years ago have >>>>>> moved on in their lives, and have no more time to actively >>>>>> work on the project. There were about 5 or 6 people actively >>>>>> involved. 
The current released version 2.0 is more or less a >>>>>> good starting point to use in apps, but there are also still >>>>>> many things unfinished or missing. So any input is more than >>>>>> welcome! >>>>>> >>>>>> There was also some talk that macresearch.org would host the >>>>>> project (giving us a lot of visibility), but I have not heard >>>>>> about that in a long time. >>>>>> >>>>>> - Koen. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Biococoa-dev mailing list >>>>>> Biococoa-dev at bioinformatics.org >>>>>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >>>>> >>>>> -- >>>>> Xgrid-at-Stanford >>>>> Help science move fast forward: >>>>> http://cmgm.stanford.edu/~cparnot/xgrid-stanford >>>>> >>>>> Charles Parnot >>>>> charles.parnot at gmail.com >>>>> >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Biococoa-dev mailing list >>>> Biococoa-dev at bioinformatics.org >>>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >>> >> > ********************************************************* ** Alexander Griekspoor PhD ** ********************************************************* EMBL Outstation - Hinxton, European Bioinformatics Institute, Rebholz Textmining group Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK Tel: + 44 1223 492 605 Fax: + 44 1223 492 468 Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* -------------- next part -------------- An HTML attachment was scrubbed... 

From kvddrift at earthlink.net Sun Oct 21 09:13:31 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 21 Oct 2007 09:13:31 -0400
Subject: [Biococoa-dev] reading large fasta files
Message-ID: <1C42F0EC-682E-456F-A722-C22C1D3A2970@earthlink.net>

Hi,

I was trying to load a large fasta file (2 MB) in the Translation
demo, and right now it takes a long time. Scott recently added
BCCachedFastaFile for exactly this purpose, but the code in
BCSequenceReader doesn't use it yet. How could we implement this in
BCSequenceReader? One possibility is to always use BCCachedFastaFile
for any fasta file. Or should we set a datasize limit in
readFastaFile that determines when to use BCCachedFastaFile or the
regular code? In the latter case, what would be a reasonable value
for this? For now I will implement it to always use
BCCachedFastaFile, but this can be changed later; I will not delete
any code ;-)

cheers,

- Koen.

From charles.parnot at gmail.com Sun Oct 21 11:22:49 2007
From: charles.parnot at gmail.com (Charles Parnot)
Date: Sun, 21 Oct 2007 08:22:49 -0700
Subject: [Biococoa-dev] reading large fasta files
In-Reply-To: <1C42F0EC-682E-456F-A722-C22C1D3A2970@earthlink.net>
References: <1C42F0EC-682E-456F-A722-C22C1D3A2970@earthlink.net>
Message-ID: <0B41C92E-FF3F-4469-80E6-BEF96B6ECD23@gmail.com>

In general, it would be best to have the implementation hidden, so
that indeed, the framework decides when to use one subclass or
another. Just like NSString, NSData, or NSArray use different
underlying data structures depending on the size of the data (I
think). This is of course all hidden behind the class cluster
design...

I also don't know how things are already implemented, maybe things
are already addressed this way?

charles

On Oct 21, 2007, at 6:13 AM, Koen van der Drift wrote:

> Hi,
>
> I was trying to load a large fasta file (2 MB) in the Translation
> demo, and right now it takes a long time.
Scott recently added > BCCachedFastaFile for exactly this purpose, but the code in > BCSequenceReader doesn't use it yet. How could we implement this in > BCSequenceReader? One possibility is to always use BCCachedFastaFile > for any fasta file. Or should we set a datasize limit in > readFastaFile that determines when to use BCCachedFastaFile or the > regular code? In the latter case, what would be a reasonable value > for this? For now I will implement to always use BCCachedFastaFile, > but this can be change later, I will not delete any code ;-) > > cheers, > > - Koen. > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From kvddrift at earthlink.net Sun Oct 21 13:30:38 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 21 Oct 2007 13:30:38 -0400 Subject: [Biococoa-dev] reading large fasta files In-Reply-To: <0B41C92E-FF3F-4469-80E6-BEF96B6ECD23@gmail.com> References: <1C42F0EC-682E-456F-A722-C22C1D3A2970@earthlink.net> <0B41C92E-FF3F-4469-80E6-BEF96B6ECD23@gmail.com> Message-ID: <30C0D157-7CB6-4610-934F-62D0659492DE@earthlink.net> On Oct 21, 2007, at 11:22 AM, Charles Parnot wrote: > In general, it would be best to have the implementation hidden, so > that indeed, the framework decides when to use one subclass or > another. Just like NSString, NSData, or NSArray use different > underlying data structures depending on the size of the data (I > think). This is of course all hidden behind the class cluster > design... > > I also don't know how things are already implemented, maybe things > are already addressed this way? 

Yes, I agree with having all that code hidden, so that there's only
one class for users to implement when reading data, whether it's
from a path or a string or data. Right now the class to read large
(fasta) files is a separate class that works with a filePath, but is
not a subclass of BCSequenceReader. So we need to think about how to
implement it. The way we use BCSequenceReader right now is as
follows:

    BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] init];
    BCSequenceArray *sequenceArray = [sequenceReader readFileUsingPath: aPath];
    BCSequence *mySequence = [sequenceArray objectAtIndex: i];

We could change this (or add the possibility) to use it as follows:

    BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] initWithPath: aPath];
    BCSequenceArray *sequenceArray = [sequenceReader readSequenceArray];
    BCSequence *mySequence = [sequenceArray objectAtIndex: i];

However, to make it more complicated, BCCachedFastaFile doesn't
return an array of sequences, IIRC, it is actually a standalone
object that can be used to access regions of very large files,
without reading the whole sequence. I can't think of a way right now
to combine this with BCSequenceReader. Does anyone have a
suggestion?

cheers,

- Koen.

From schristley at mac.com Mon Oct 22 11:30:49 2007
From: schristley at mac.com (Scott Christley)
Date: Mon, 22 Oct 2007 11:30:49 -0400
Subject: [Biococoa-dev] reading large fasta files
In-Reply-To: <30C0D157-7CB6-4610-934F-62D0659492DE@earthlink.net>
References: <1C42F0EC-682E-456F-A722-C22C1D3A2970@earthlink.net>
    <0B41C92E-FF3F-4469-80E6-BEF96B6ECD23@gmail.com>
    <30C0D157-7CB6-4610-934F-62D0659492DE@earthlink.net>
Message-ID: <3EBB0775-973E-4089-BB98-7D00A7BC42FB@mac.com>

On Oct 21, 2007, at 1:30 PM, Koen van der Drift wrote:

>
> On Oct 21, 2007, at 11:22 AM, Charles Parnot wrote:
>
>> In general, it would be best to have the implementation hidden, so
>> that indeed, the framework decides when to use one subclass or
>> another.
Just like NSString, NSData, or NSArray use different >> underlying data structures depending on the size of the data (I >> think). This is of course all hidden behind the class cluster >> design... >> >> I also don't know how things are already implemented, maybe things >> are already addressed this way? > > > Yes, I agree with having all that code hidden, so that there's only > one class for users to implement when reading data, whether it's > from a path or a string or data. Right now the class to read large > (fasta) files is a separate class that works with a filePath, but > is not a subclass of BCSequenceReader. So we need to think about > how to implement it. The way we use BCSequenceReader right now is > as follows: > > BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] init]; > BCSequenceArray *sequenceArray = [sequenceReader > readFileUsingPath: aPath]; > BCSequence *mySequence = [sequenceArray objectAtIndex: i]; > > > We could change this (or add the possibility) to use it as follows: > > BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] > initWithPath: aPath]; > BCSequenceArray *sequenceArray = [sequenceReader > readSequenceArray]; > BCSequence *mySequence = [sequenceArray objectAtIndex: i]; > > > However, to make it more complicated, BCCachedFastaFile doesn't > return an array of sequences, IIRC, it is actually a standalone > object that can be used to access regions of very large files, > without reading the whole sequence. I can't think of a way right > now to combine this with BCSequenceReader. Anyone has a suggestion? The change to BCSequenceReader sounds reasonable; it follows the design of the Cocoa collection classes for initializing from a file. I suppose the conceptual difference though is that a BCSequenceReader can be used to read from multiple files, it doesn't represent the collection itself, it is BCSequenceArray which represents the collection. So one could consider changing BCSequenceArray instead ... 

    BCSequenceArray *sequenceArray = [[BCSequenceArray alloc] initWithPath: aPath];

Then BCSequenceArray would use BCSequenceReader or
BCCachedSequenceFile to perform the operation. Now while I like this
design, it is complicated because we have many possible sequence
file formats, but we can certainly extend the initWithPath: method
to support an additional format: parameter like I recently did with
BCSequenceReader.

The point about the BCCachedFastaFile interface is well taken; for
programs to use it with the implementation hidden, we would need a
"cached" version of BCSequence. Conceptually think of it where
BCSequenceReader has data in memory and BCSequence holds pointers to
memory data, while BCCachedFastaFile has data on disk and
"BCCachedSequence" holds pointers to disk data. BCSequence and
BCCachedSequence would implement the same interface, so the user
wouldn't know the difference.

cheers
Scott
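
Scott's closing idea in this thread, an in-memory sequence and a disk-backed sequence that look the same to the caller, could be sketched roughly as follows in plain C. Everything here is hypothetical and is not BioCocoa code: the Sequence struct, read_range, sequence_in_memory and sequence_on_disk are names invented for illustration. The sketch also assumes the symbols sit contiguously in the file with no line breaks, which real FASTA files do not guarantee. The caller supplies the buffer, in the spirit of the buffer-based reading asked for earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical common interface: callers use only length and read_range. */
typedef struct Sequence Sequence;
struct Sequence {
    size_t length;       /* number of symbols in the sequence */
    const char *memory;  /* in-memory variant: the symbols; else NULL */
    FILE *file;          /* disk variant: open file; else NULL */
    long file_offset;    /* disk variant: where the symbols start */
    size_t (*read_range)(const Sequence *self, size_t start,
                         size_t count, char *out);
};

/* Limit count so that start + count never runs past the sequence end. */
static size_t clamp_count(const Sequence *s, size_t start, size_t count)
{
    if (start >= s->length)
        return 0;
    size_t avail = s->length - start;
    return count < avail ? count : avail;
}

/* Copy symbols straight out of memory into the caller's buffer. */
static size_t read_from_memory(const Sequence *self, size_t start,
                               size_t count, char *out)
{
    assert(out != NULL);
    count = clamp_count(self, start, count);
    if (count == 0)
        return 0;
    memcpy(out, self->memory + start, count);
    return count;
}

/* Seek to the requested region and read only that region from disk. */
static size_t read_from_file(const Sequence *self, size_t start,
                             size_t count, char *out)
{
    assert(out != NULL);
    count = clamp_count(self, start, count);
    if (count == 0)
        return 0;
    if (fseek(self->file, self->file_offset + (long)start, SEEK_SET) != 0)
        return 0;
    return fread(out, 1, count, self->file);
}

/* Wrap a NUL-terminated string; the string must outlive the Sequence. */
Sequence sequence_in_memory(const char *symbols)
{
    Sequence s = { strlen(symbols), symbols, NULL, 0, read_from_memory };
    return s;
}

/* Wrap length symbols stored in an open file, starting at offset. */
Sequence sequence_on_disk(FILE *file, long offset, size_t length)
{
    Sequence s = { length, NULL, file, offset, read_from_file };
    return s;
}
```

Code that calls seq.read_range(&seq, start, count, buffer) does not need to know which variant it holds. That is the property the message describes for BCSequence and a possible BCCachedSequence.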