From tony at bcihealthsearch.com Thu Feb 5 16:09:16 2009
From: tony at bcihealthsearch.com (Tony Pietrzak)
Date: Thu, 5 Feb 2009 15:09:16 -0600
Subject: [Biococoa-dev] Healthcare Staffing Solutions
Message-ID: <81ea94a1071580cc70f2d349001b60c4@bcihealthsearch.com>

Hello,

I hope 2009 is off to a good start for you and your firm!

I'm with BCI -- a recruiting firm that specializes in providing search and staffing solutions to healthcare, pharmaceutical, biotech, life science and venture capital/financial clients. BCI also has a division that focuses solely on staffing for temporary/consulting roles.

Given the current economic climate, temporary staffing can provide extremely cost-effective solutions for our clients. We have a deep candidate pool. Whether you're seeking candidates for full time or temporary roles, we have the caliber of candidate you need to give your business a competitive edge.

We'd love the opportunity to earn your business, and would be willing to meet or beat the rates you're currently receiving from other contingency-based recruiting firms.

Thanks for your consideration,

Tony Pietrzak | Partner | BCI Healthcare
10 S. Wacker Drive, Suite 1250 | Chicago, IL 60606
312.460.8222 x104 | www.bcihealthcare.com | tony at bcihealthcare.com

From jeedward at yahoo.com Fri Feb 13 18:32:55 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 13 Feb 2009 15:32:55 -0800 (PST)
Subject: [Biococoa-dev] Draft paper submission deadline extended: BCBGC-09
Message-ID: <239231.15189.qm@web45915.mail.sp1.yahoo.com>

Draft paper submission deadline extended: BCBGC-09

The deadline for draft paper submission at the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) is extended due to numerous requests from the authors.
The conference will be held during July 13-16 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place. The other conferences include:

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
* International Conference on Automation, Robotics and Control Systems (ARCS-09)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
* International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
* International Conference on Information Security and Privacy (ISP-09)
* International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
* International Conference on Software Engineering Theory and Practice (SETP-09)
* International Conference on Theory and Applications of Computational Science (TACS-09)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From jeedward at yahoo.com Fri Feb 20 10:23:14 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 20 Feb 2009 07:23:14 -0800 (PST)
Subject: [Biococoa-dev] Draft paper submission deadline extended: BCBGC-09
Message-ID: <742626.63600.qm@web45910.mail.sp1.yahoo.com>

Draft paper submission deadline extended: BCBGC-09

The deadline for draft paper submission at the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) is extended due to numerous requests from the authors. The conference will be held during July 13-16 2009 in Orlando, FL, USA.
We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place. The other conferences include:

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
* International Conference on Automation, Robotics and Control Systems (ARCS-09)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
* International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
* International Conference on Information Security and Privacy (ISP-09)
* International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
* International Conference on Software Engineering Theory and Practice (SETP-09)
* International Conference on Theory and Applications of Computational Science (TACS-09)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From craig at batemanspace.com Sat Feb 21 15:43:44 2009
From: craig at batemanspace.com (Craig Bateman)
Date: Sat, 21 Feb 2009 12:43:44 -0800
Subject: [Biococoa-dev] Introducing myself
Message-ID:

I'm an experienced software engineer looking for an open source Mac project to contribute to, and I've recently become very interested in genetics. So BioCocoa seemed an obvious choice.

I looked at the To Do list, and fear that 2+ years later it must be out of date unless there's just nobody left working on this project. Is it officially dead? There hasn't been a lot of movement on this list in the past few months since the 2.1.0 "non"-release.
I've checked out the source and will start digging now to get a feel for what's here and how it works. What/where are the primary missing pieces? Has all the 1.x functionality been incorporated into 2.1? Is anything on the todo list still up for doing? Should I be looking at the framework itself or the applications?

Anyway, to whoever is still alive on this project, let me know how and where I can help and I'll be glad to.

Thanks,
Craig Bateman

From koenvanderdrift at gmail.com Sat Feb 21 16:44:46 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Sat, 21 Feb 2009 16:44:46 -0500
Subject: [Biococoa-dev] Introducing myself
In-Reply-To:
References:
Message-ID: <792E4901-8BCE-4382-92CD-0E71C0FD148D@gmail.com>

Hi Craig,

First of all, welcome to BioCocoa. As you already noticed, the project is indeed very quiet. I think recently only Scott Christley contributed new code to the project. All the others are too busy with their real-life jobs and/or other activities.

The ToDo list still seems to be valid. Things that are high on the list are further development of BCSequenceReader and BCSequenceWriter, as well as the implementation of BCAnnotation and BCFeature. Once those are in place, additional functionality can be added. Another item high on the list is the so-called 'killer app' :-)

Feel free to dig around in the code, and if you are ready to add code, please contact Peter Schols for repository access.

Cheers,

- Koen.

On Feb 21, 2009, at 3:43 PM, Craig Bateman wrote:

> I'm an experienced software engineer looking for an open source mac
> project to contribute to, and I'm recently very interested in
> genetics. So BioCocoa seemed an obvious choice.
>
> I looked at the To Do list, and fear that 2+ years later it must be
> out of date unless there's just nobody left working on this
> project. Is it officially dead? There hasn't been a lot of
> movement on this list in the past few months since the 2.1.0 "non"-
> release.
> I've checked out the source and will start digging now to
> get a feel for what's here and how it works. What/where are the
> primary missing pieces? Has all the 1.x functionality been
> incorporated to 2.1? Is anything on the todo list still up for
> doing? Should I be looking at the framework itself or the
> applications?
>
> Anyway, to whoever is still alive on this project, let me know how
> and where I can help and I'll be glad to.
>
> Thanks,
> Craig Bateman
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev

From schristley at mac.com Mon Feb 23 02:16:25 2009
From: schristley at mac.com (Scott Christley)
Date: Sun, 22 Feb 2009 23:16:25 -0800
Subject: [Biococoa-dev] Introducing myself
In-Reply-To:
References:
Message-ID: <8CE96EC9-3B7F-4968-91D9-62781EE3E9FE@mac.com>

Hello Craig,

The coding I've been doing lately is primarily related to the research I'm doing, so in that sense it doesn't necessarily go fast. My long-term goal is to add some advanced analysis techniques to BioCocoa.

One of the key things I would like to do is make the sequence and cached sequence classes correspond in their interface. The cached sequence class is important for large-scale analysis of large genomes, because they are too big to load completely into memory. This is something that BioCocoa can offer above other toolkits like BioPerl and BioPython: high performance and large-scale analysis.

What interests you about genetics? Many of the algorithms in genetics, bioinformatics and so on are still being developed; even something like the assembly of genomes is not a "done" technology. If you have a specific interest area, then I can help lay out a series of tasks that would be both highly useful and interesting algorithmic work.

Koen is right, the todo list is still accurate, and those are certainly useful enhancements to make.
And the creation of a "killer app" is definitely desired, especially to bring these advanced analysis techniques together into easy-to-use GUI and/or command-line applications that biologists can use.

cheers
Scott

On Feb 21, 2009, at 12:43 PM, Craig Bateman wrote:

> I'm an experienced software engineer looking for an open source mac
> project to contribute to, and I'm recently very interested in
> genetics. So BioCocoa seemed an obvious choice.
>
> I looked at the To Do list, and fear that 2+ years later it must be
> out of date unless there's just nobody left working on this
> project. Is it officially dead? There hasn't been a lot of
> movement on this list in the past few months since the 2.1.0 "non"-
> release. I've checked out the source and will start digging now to
> get a feel for what's here and how it works. What/where are the
> primary missing pieces? Has all the 1.x functionality been
> incorporated to 2.1? Is anything on the todo list still up for
> doing? Should I be looking at the framework itself or the
> applications?
>
> Anyway, to whoever is still alive on this project, let me know how
> and where I can help and I'll be glad to.
>
> Thanks,
> Craig Bateman
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev

From craig at batemanspace.com Tue Feb 24 17:31:23 2009
From: craig at batemanspace.com (Craig Bateman)
Date: Tue, 24 Feb 2009 14:31:23 -0800
Subject: [Biococoa-dev] Fwd: Introducing myself
In-Reply-To: <023A6563-26C1-43C8-AFC6-CC54BB1CB285@batemanspace.com>
References: <8CE96EC9-3B7F-4968-91D9-62781EE3E9FE@mac.com> <023A6563-26C1-43C8-AFC6-CC54BB1CB285@batemanspace.com>
Message-ID:

I accidentally dropped the list in my reply, so Scott was the only one that got it.
---------- Forwarded message ----------
From: Craig Bateman
Date: Mon, Feb 23, 2009 at 2:01 AM
Subject: Re: [Biococoa-dev] Introducing myself
To: Scott Christley

Well, unfortunately I can't state what in particular interests me about genetics, mostly because I know so little. I read The Blind Watchmaker and was intrigued by the author's explanation of how genes work, and since then have read other books about the human genome and the effects of certain genes on human development, etc. I guess I'm just vaguely interested in genetics research because I want to know. I certainly can't state that I'm interested in any one sub-topic over any other. In short, I've barely scratched the surface, and want to learn so much more...

I am, however, an avid programmer, and was hoping that my vague interest in the domain of genetics coupled with my years of writing software (banking analysis software, but software all the same) would combine to provide a great developer resource for the project.

As far as a "killer app" goes, I couldn't even guess what something like that would look like for BioCocoa... If you have some ideas I can certainly bring something to light, but honestly I haven't a clue about how any of this sequence information is actually used and/or what features in such an app would be useful.

Unifying the BC*Sequence classes is a good idea; maybe I'll look at that first as a tooth-cutting exercise. Aside from that, I read a bit about "shotgun" sequencing, which may not be what it's actually called, but where overlapping bits of a sequence are used to assemble an entire sequence.

So I've got a lot to learn, but anything I can contribute to this project or genetics/proteins/cancer/whatever research in general is a win in my book.

On Feb 22, 2009, at 11:16 PM, Scott Christley wrote:

Hello Craig,
>
> The coding I've been doing lately is primarily related to the research I'm
> doing, so from this sense it doesn't necessarily go fast.
My long-term goal > is to add some advanced analysis techniques into BioCocoa. > > One of the key things I would like to do is make the sequence and cached > sequence class correspond in their interface. The cached sequence class is > important to do large scale analysis on large genomes, because they are too > big to load completely into memory. This is something that BioCocoa can > offer above other toolkits like BioPerl and BioPython, high performance and > large scale analysis. > > What interests you about genetics? Much of the algorithms in genetics, > bioinformatics and so on are still being developed, even things like > assembly of genomes is not a "done" technology. If you have a specific > interest area, then I can help lay out a series of tasks that would be both > highly useful and be interesting algorithmic work. > > Koen is right, the todo list is still accurate, and those are certainly > useful enhancements to make. And the creation of a "killer app" is > definitely desired, especially to bring these advanced analysis techniques > together into an easy-to-use GUI and/or command line applications that > biologists can use. > > cheers > Scott > > On Feb 21, 2009, at 12:43 PM, Craig Bateman wrote: > > I'm an experienced software engineer looking for an open source mac >> project to contribute to, and I'm recently very interested in genetics. So >> BioCocoa seemed an obvious choice. >> >> I looked at the To Do list, and fear that 2+ years later it must be out of >> date unless there's just nobody left working on this project. Is it >> officially dead? There hasn't been a lot of movement on this list in the >> past few months since the 2.1.0 "non"-release. I've checked out the source >> and will start digging now to get a feel for what's here and how it works. >> What/where are the primary missing pieces? Has all the 1.x functionality >> been incorporated to 2.1? Is anything on the todo list still up for doing? 
>> Should I be looking at the framework itself or the applications?
>>
>> Anyway, to whoever is still alive on this project, let me know how and
>> where I can help and I'll be glad to.
>>
>> Thanks,
>> Craig Bateman
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>

From schristley at mac.com Wed Feb 25 19:01:18 2009
From: schristley at mac.com (Scott Christley)
Date: Wed, 25 Feb 2009 16:01:18 -0800
Subject: [Biococoa-dev] Introducing myself
In-Reply-To:
References: <8CE96EC9-3B7F-4968-91D9-62781EE3E9FE@mac.com> <023A6563-26C1-43C8-AFC6-CC54BB1CB285@batemanspace.com>
Message-ID:

Hey Craig,

Well that is great, really. Sounds similar to my experience: I entered my PhD to do core computer science and software engineering, then got involved with a biology project, was hooked, and have been following it ever since.

One thing you might want to look at are the two main genome browsers that exist today, one by UC Santa Cruz and the other by Ensembl.

http://genome.cse.ucsc.edu/
http://www.ensembl.org/index.html

There is also a project that I'm involved with, VectorBase, which also uses the Ensembl browser.

http://www.vectorbase.org/index.php

The reason I point these out is that all of them are web-based, which is great, but a potential killer app "might be" a local application that would allow researchers to analyze their local data. Reproducing the functionality of these genome browsers isn't the way to go, but there are many potential niches to be filled.

Yes, shotgun sequencing is exactly what it is called. Humorous name for sure, but you are exactly right, the "shotgun" blasts the genome into many smaller bits, which are then assembled together afterward.
It was quite controversial when Venter's company took the approach for the human genome project, in defiance of the public consortium which was doing it the expensive, slow, but more accurate way. But now it is the standard way, though it's not perfect, and assembly in general is a difficult problem.

So for the BC*Sequence classes, if you look in the BCSequenceIO group you will find the BCCachedSequenceFile and BCCachedFastaFile classes, which handle the file I/O. What is missing is a BCCachedSequence class, to correspond to BCSequence. From a design perspective, the two classes should stay separate (memory-based versus file-based), but I think a protocol that defines a common interface is what is needed.

cheers
Scott

On Feb 24, 2009, at 2:31 PM, Craig Bateman wrote:

> I accidently dropped the list in my reply, so Scott was the only one
> that got it.
>
> ---------- Forwarded message ----------
> From: Craig Bateman
> Date: Mon, Feb 23, 2009 at 2:01 AM
> Subject: Re: [Biococoa-dev] Introducing myself
> To: Scott Christley
>
>
> Well, unfortunately I can't state what, in particular interests me
> about genetics, mostly because I know so little. I read the blind
> watchmaker and was intrigued by the author's explanation of how
> genes work, and since then have read other books about the human
> genome and the effects of certain genes on human development, etc.
> I guess I'm just vaguely interested in genetics research because I
> want to know. I certainly can't state that I'm interested in any
> one sub-topic over any other. In short, I've barely scratched the
> surface, and want to learn so much more...
>
> I am, however, an avid programmer, and was hoping that my vague
> interest in the domain of genetics coupled with my years of writing
> software (banking analysis software, but software all the same)
> would combine to provide a great developer resource for the project.
> > As far as a "killer app" goes, I couldn't even guess what something > like that would look like for BioCocoa... If you have some ideas I > can certainly bring something to light, but honestly I haven't a > clue about how any of this sequence information is actually used and/ > or what features in such an app would be useful. > > Unifying the BC*Sequence classes is a good idea, maybe I'll look at > that first as a tooth-cutting exercise. Aside from that, I read a > bit about "shotgun" sequencing, which may not be what it's actually > called, but where overlapping bits of a sequence are used to > assemble an entire sequence. > > So I've got a lot to learn, but anything I can contribute to this > project or genetics/proteins/cancer/whatever research in general is > a win in my book. > > > > On Feb 22, 2009, at 11:16 PM, Scott Christley wrote: > > Hello Craig, > > The coding I've been doing lately is primarily related to the > research I'm doing, so from this sense it doesn't necessarily go > fast. My long-term goal is to add some advanced analysis techniques > into BioCocoa. > > One of the key things I would like to do is make the sequence and > cached sequence class correspond in their interface. The cached > sequence class is important to do large scale analysis on large > genomes, because they are too big to load completely into memory. > This is something that BioCocoa can offer above other toolkits like > BioPerl and BioPython, high performance and large scale analysis. > > What interests you about genetics? Much of the algorithms in > genetics, bioinformatics and so on are still being developed, even > things like assembly of genomes is not a "done" technology. If you > have a specific interest area, then I can help lay out a series of > tasks that would be both highly useful and be interesting > algorithmic work. > > Koen is right, the todo list is still accurate, and those are > certainly useful enhancements to make. 
> And the creation of a
> "killer app" is definitely desired, especially to bring these
> advanced analysis techniques together into an easy-to-use GUI and/or
> command line applications that biologists can use.
>
> cheers
> Scott
>
> On Feb 21, 2009, at 12:43 PM, Craig Bateman wrote:
>
> I'm an experienced software engineer looking for an open source mac
> project to contribute to, and I'm recently very interested in
> genetics. So BioCocoa seemed an obvious choice.
>
> I looked at the To Do list, and fear that 2+ years later it must be
> out of date unless there's just nobody left working on this
> project. Is it officially dead? There hasn't been a lot of
> movement on this list in the past few months since the 2.1.0 "non"-
> release. I've checked out the source and will start digging now to
> get a feel for what's here and how it works. What/where are the
> primary missing pieces? Has all the 1.x functionality been
> incorporated to 2.1? Is anything on the todo list still up for
> doing? Should I be looking at the framework itself or the
> applications?
>
> Anyway, to whoever is still alive on this project, let me know how
> and where I can help and I'll be glad to.
>
> Thanks,
> Craig Bateman
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev
>

From craig at batemanspace.com Fri Feb 27 15:39:10 2009
From: craig at batemanspace.com (Craig Bateman)
Date: Fri, 27 Feb 2009 12:39:10 -0800
Subject: [Biococoa-dev] BCSequence class cluster?
[Was Re: Introducing myself]
Message-ID:

After looking at this for a while, I agree that a protocol would do it, and would be consistent with using an interface in many other languages, but a BCSequence class cluster (and probably a BCSequenceArray cluster that includes a sequenceWithId: method, since many file formats support multiple sequences) might be a bit more elegant, especially if we're serious about wanting a BCMutableSequence. This pattern is common in Objective-C when you have multiple classes that all implement the same interface and the actual class to use is discernible at the time of construction. It's a little harder to implement, but then consumers of the library don't need to worry about which class(es) they need for a given purpose.

The pseudo-code to use them would then be something like the following. (Sorry about the naming here, I don't have the source in front of me as I write this.)

    BCSequenceFile *myFile = [BCCachedSequenceFile fileWithContentsofFile:@"Whatever.fs"];
    BCSequenceArray *myArray = [BCSequenceArray arrayWithSequenceFile:myFile];
    BCSequence *first = [myArray sequenceAtIndex:0];

or

    BCSequence *mySeq = [myArray sequenceWithId:@"GYS2"];

The end user would be given an instance of BCCachedFastaFile in myFile, BCCachedSequenceArray in myArray and BCCachedSequence for the two sequence calls. This would all happen transparently behind the scenes and they wouldn't necessarily need to know what class they were using.

Externally the memory-based and file-based sequences look the same. Internally the memory BCSequence utilizes an NSData while the file-based one utilizes an NSFileHandle with an NSRange over the sequence (at least that's how FASTA would work; other format implementations would vary significantly).

I'm pretty sure this can be done without introducing any breaking changes. Does anyone object to me attempting to implement these this way?

On Wed, Feb 25, 2009 at 4:01 PM, Scott Christley wrote:

> Hey Craig,
>
> Well that is great, really.
Sounds similar to my experience, I entered my > PhD to do core computer science, software engineering, then got involved > with a biology project, was hooked and been following every since. > > One thing you might want to look at are the two main genome browsers that > exist today, one by UC Santa Cruz and the other by Ensembl. > > http://genome.cse.ucsc.edu/ > http://www.ensembl.org/index.html > > There is also a project that I'm involved with, VectorBase, which also uses > the Ensembl browser. > > http://www.vectorbase.org/index.php > > The reason I point these out is because all of them are web-based, which is > great, but a potential killer app "might be" to have a local application > which would allow researchers to analyze their local data. Reproducing the > functionality of these genome browsers isn't the way to go, but there are > many potential niches to be filled. > > Yes, shotgun sequencing is exactly what it is called. Humorous name for > sure, but you are exactly right, the "shotgun" blasts the genome into many > smaller bits, which are then assembled together afterward. It was quite > controversial when Venter's company took the approach for the human genome > project, in defiance of the public consortium which was doing it the > expensive, slow, but more accurate way. But now it is the standard way, > though its not perfect, and assembly in general is a difficult problem. > > So for the BC*Sequence classes, if you look in the BCSequenceIO group then > you will find a BCCachedSequenceFile and BCCachedFastaFile classes, which > handle the file I/O. What is missing is a BCCachedSequence class, to > correspond to BCSequence. From a design perspective, the two classes should > stay separate (memory-based versus file-based) but I think a protocol which > defines a common interface is what is needed. 
> > cheers > Scott > > > On Feb 24, 2009, at 2:31 PM, Craig Bateman wrote: > > I accidently dropped the list in my reply, so Scott was the only one that > got it. > > ---------- Forwarded message ---------- > From: Craig Bateman > Date: Mon, Feb 23, 2009 at 2:01 AM > Subject: Re: [Biococoa-dev] Introducing myself > To: Scott Christley > > > Well, unfortunately I can't state what, in particular interests me about > genetics, mostly because I know so little. I read the blind watchmaker and > was intrigued by the author's explanation of how genes work, and since then > have read other books about the human genome and the effects of certain > genes on human development, etc. I guess I'm just vaguely interested in > genetics research because I want to know. I certainly can't state that I'm > interested in any one sub-topic over any other. In short, I've barely > scratched the surface, and want to learn so much more... > > I am, however, an avid programmer, and was hoping that my vague interest in > the domain of genetics coupled with my years of writing software (banking > analysis software, but software all the same) would combine to provide a > great developer resource for the project. > > As far as a "killer app" goes, I couldn't even guess what something like > that would look like for BioCocoa... If you have some ideas I can certainly > bring something to light, but honestly I haven't a clue about how any of > this sequence information is actually used and/or what features in such an > app would be useful. > > Unifying the BC*Sequence classes is a good idea, maybe I'll look at that > first as a tooth-cutting exercise. Aside from that, I read a bit about > "shotgun" sequencing, which may not be what it's actually called, but where > overlapping bits of a sequence are used to assemble an entire sequence. > > So I've got a lot to learn, but anything I can contribute to this project > or genetics/proteins/cancer/whatever research in general is a win in my > book. 
> > > > On Feb 22, 2009, at 11:16 PM, Scott Christley wrote: > > Hello Craig, >> >> The coding I've been doing lately is primarily related to the research I'm >> doing, so from this sense it doesn't necessarily go fast. My long-term goal >> is to add some advanced analysis techniques into BioCocoa. >> >> One of the key things I would like to do is make the sequence and cached >> sequence class correspond in their interface. The cached sequence class is >> important to do large scale analysis on large genomes, because they are too >> big to load completely into memory. This is something that BioCocoa can >> offer above other toolkits like BioPerl and BioPython, high performance and >> large scale analysis. >> >> What interests you about genetics? Much of the algorithms in genetics, >> bioinformatics and so on are still being developed, even things like >> assembly of genomes is not a "done" technology. If you have a specific >> interest area, then I can help lay out a series of tasks that would be both >> highly useful and be interesting algorithmic work. >> >> Koen is right, the todo list is still accurate, and those are certainly >> useful enhancements to make. And the creation of a "killer app" is >> definitely desired, especially to bring these advanced analysis techniques >> together into an easy-to-use GUI and/or command line applications that >> biologists can use. >> >> cheers >> Scott >> >> On Feb 21, 2009, at 12:43 PM, Craig Bateman wrote: >> >> I'm an experienced software engineer looking for an open source mac >>> project to contribute to, and I'm recently very interested in genetics. So >>> BioCocoa seemed an obvious choice. >>> >>> I looked at the To Do list, and fear that 2+ years later it must be out >>> of date unless there's just nobody left working on this project. Is it >>> officially dead? There hasn't been a lot of movement on this list in the >>> past few months since the 2.1.0 "non"-release. 
>>> I've checked out the source
>>> and will start digging now to get a feel for what's here and how it works.
>>> What/where are the primary missing pieces? Has all the 1.x functionality
>>> been incorporated to 2.1? Is anything on the todo list still up for doing?
>>> Should I be looking at the framework itself or the applications?
>>>
>>> Anyway, to whoever is still alive on this project, let me know how and
>>> where I can help and I'll be glad to.
>>>
>>> Thanks,
>>> Craig Bateman
>>>
>>> _______________________________________________
>>> Biococoa-dev mailing list
>>> Biococoa-dev at bioinformatics.org
>>> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev
>>>
>>
>

From akhudek at cs.uwaterloo.ca Fri Feb 27 16:02:27 2009
From: akhudek at cs.uwaterloo.ca (Alexander K. Hudek)
Date: Fri, 27 Feb 2009 16:02:27 -0500
Subject: [Biococoa-dev] Introducing myself
In-Reply-To:
References: <8CE96EC9-3B7F-4968-91D9-62781EE3E9FE@mac.com> <023A6563-26C1-43C8-AFC6-CC54BB1CB285@batemanspace.com>
Message-ID:

I would personally love to see a local sequence analysis platform. Currently I use hand-rolled applications with R for visualization, or Vista. But Vista is very specific in what it does and R is less than optimal.

I did make an attempt to start such an application several months ago, but sadly found that I needed results rather quicker than it was taking to both learn Cocoa and build a GUI analysis application at the same time. I did, however, manage to build a very basic BLAST-like alignment view. I've attached the view source in case someone finds it useful.

Alex

Alexander K. Hudek
School of Computer Science, University of Waterloo
200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
Phone: +1 519 888 4567 x37886
Fax: +1 519 885 1208

On 2009-02-25, at 7:01 PM, Scott Christley wrote:

> Hey Craig,
>
> Well that is great, really.
Sounds similar to my experience: I
> entered my PhD to do core computer science, software engineering,
> then got involved with a biology project, was hooked, and have been
> following it ever since.
>
> One thing you might want to look at is the two main genome browsers
> that exist today, one by UC Santa Cruz and the other by Ensembl.
>
> http://genome.cse.ucsc.edu/
> http://www.ensembl.org/index.html
>
> There is also a project that I'm involved with, VectorBase, which
> also uses the Ensembl browser.
>
> http://www.vectorbase.org/index.php
>
> The reason I point these out is that all of them are web-based,
> which is great, but a potential killer app "might be" to have a
> local application which would allow researchers to analyze their
> local data. Reproducing the functionality of these genome browsers
> isn't the way to go, but there are many potential niches to be filled.
>
> Yes, shotgun sequencing is exactly what it is called. Humorous name
> for sure, but you are exactly right, the "shotgun" blasts the genome
> into many smaller bits, which are then assembled together
> afterward. It was quite controversial when Venter's company took
> the approach for the Human Genome Project, in defiance of the public
> consortium which was doing it the expensive, slow, but more accurate
> way. But now it is the standard way, though it's not perfect, and
> assembly in general is a difficult problem.
>
> So for the BC*Sequence classes, if you look in the BCSequenceIO
> group then you will find the BCCachedSequenceFile and
> BCCachedFastaFile classes, which handle the file I/O. What is
> missing is a BCCachedSequence class, to correspond to BCSequence.
> From a design perspective, the two classes should stay separate
> (memory-based versus file-based) but I think a protocol which
> defines a common interface is what is needed.
>
> cheers
> Scott
>
>
> On Feb 24, 2009, at 2:31 PM, Craig Bateman wrote:
>
>> I accidentally dropped the list in my reply, so Scott was the only
>> one that got it.
>>
>> ---------- Forwarded message ----------
>> From: Craig Bateman
>> Date: Mon, Feb 23, 2009 at 2:01 AM
>> Subject: Re: [Biococoa-dev] Introducing myself
>> To: Scott Christley
>>
>>
>> Well, unfortunately I can't state what, in particular, interests me
>> about genetics, mostly because I know so little. I read The Blind
>> Watchmaker and was intrigued by the author's explanation of how
>> genes work, and since then have read other books about the human
>> genome and the effects of certain genes on human development, etc.
>> I guess I'm just vaguely interested in genetics research because I
>> want to know. I certainly can't state that I'm interested in any
>> one sub-topic over any other. In short, I've barely scratched the
>> surface, and want to learn so much more...
>>
>> I am, however, an avid programmer, and was hoping that my vague
>> interest in the domain of genetics coupled with my years of writing
>> software (banking analysis software, but software all the same)
>> would combine to provide a great developer resource for the project.
>>
>> As far as a "killer app" goes, I couldn't even guess what something
>> like that would look like for BioCocoa... If you have some ideas I
>> can certainly bring something to light, but honestly I haven't a
>> clue about how any of this sequence information is actually used
>> and/or what features in such an app would be useful.
>>
>> Unifying the BC*Sequence classes is a good idea; maybe I'll look at
>> that first as a tooth-cutting exercise. Aside from that, I read a
>> bit about "shotgun" sequencing, which may not be what it's actually
>> called, but where overlapping bits of a sequence are used to
>> assemble an entire sequence.
>>
>> So I've got a lot to learn, but anything I can contribute to this
>> project or genetics/proteins/cancer/whatever research in general is
>> a win in my book.
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastView.m
Type: application/octet-stream
Size: 6531 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastView.h
Type: application/octet-stream
Size: 558 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 194 bytes
Desc: This is a digitally signed message part
URL:

From charles.parnot at gmail.com Fri Feb 27 16:18:50 2009
From: charles.parnot at gmail.com (Charles Parnot)
Date: Fri, 27 Feb 2009 13:18:50 -0800
Subject: [Biococoa-dev] BCSequence class cluster?
[Was Re: Introducing myself]
In-Reply-To:
References:
Message-ID: <215B031C-298A-4C3D-848A-7BA71FC4318A@gmail.com>

Maybe it's not fair for me to vote, since I don't contribute (anymore)
to BioCocoa, but as a hypothetical user of the framework developing a
hypothetical application, I would prefer a class cluster that lets me
not care about the size of the data and lets the framework make the
right decision for me (and for the hypothetical users of my
hypothetical application) :)

charles

On Feb 27, 2009, at 12:39 PM, Craig Bateman wrote:

> After looking at this for a while, I agree that a protocol would do
> it, and would be consistent with using an Interface in many other
> languages, but a BCSequence class cluster (and probably a
> BCSequenceArray cluster that included a sequenceWithId: method since
> many file formats support multiple sequences) might be a bit more
> elegant. Especially if we're serious about wanting a
> BCMutableSequence.
>
> This pattern is common in Objective-C when you have multiple classes
> that all implement the same interface and the actual class to use is
> discernible at the time of construction. It's a little harder to
> implement, but then consumers of the library don't need to worry
> about which class(es) they need for a given purpose.
>
> The pseudo-code to use them would then be something like: (Sorry
> about the naming here, I don't have the source in front of me as I
> write this)
>
> BCSequenceFile *myFile = [BCCachedSequenceFile fileWithContentsOfFile:@"Whatever.fs"];
> BCSequenceArray *myArray = [BCSequenceArray arrayWithSequenceFile:myFile];
> BCSequence *first = [myArray sequenceAtIndex:0];
> or
> BCSequence *mySeq = [myArray sequenceWithId:@"GYS2"];
>
> The end user would be given an instance of BCCachedFastaFile in
> myFile, BCCachedSequenceArray in myArray and BCCachedSequence for
> the two sequence calls.
This would all happen transparently behind
> the scenes and they wouldn't necessarily need to know what class
> they were using. Externally the memory vs file sequences look the
> same. Internally the memory BCSequence utilizes an NSData while the
> file-based one utilizes an NSFileHandle with an NSRange over the
> sequence (at least that's how FASTA would work, other format
> implementations would vary significantly).
>
> I'm pretty sure this can be done without introducing any breaking
> changes. Does anyone object to me attempting to implement these
> this way?
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev

--
OpenMacGrid
Help science move fast forward:
http://www.macresearch.org/openmacgrid

Charles Parnot
charles.parnot at gmail.com

From koenvanderdrift at gmail.com Fri Feb 27 18:11:11 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Fri, 27 Feb 2009 18:11:11 -0500
Subject: [Biococoa-dev] BCSequence class cluster?
[Was Re: Introducing myself]
In-Reply-To: <215B031C-298A-4C3D-848A-7BA71FC4318A@gmail.com>
References: <215B031C-298A-4C3D-848A-7BA71FC4318A@gmail.com>
Message-ID: <72827045-8996-41EA-8887-6173AC39D0AB@gmail.com>

I non-hypothetically agree with Charles :)

Also, I highly recommend that new developers look at the mailing list
archives for some extended discussions about the design and structure
of the framework. I think in the end we all agreed that this structure
works very well. The two most important elements are the use of
sequence alphabets and class clusters.

The first means that only the BCSequence class is needed, instead of
separate classes for proteins, nucleotides, etc. By specifying the
alphabet, the right type of sequence will be created. It also takes
care of allowing only those actions on a sequence that are sensible,
e.g. you cannot translate a protein. The alphabet design was
'borrowed' from the BioJava project.

The class cluster (IIRC) allows the hiding of the implementation for a
group of similar classes, such as BCSequence and BCCachedSequence, as
pointed out by Charles.

Cheers,

- Koen.
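
[Editor's illustration] The two design elements summarized above, an
alphabet that decides which operations are legal and a class cluster
that hides whether a sequence lives in memory or on disk, can be
sketched in a few lines. This is a minimal sketch in Python, chosen for
brevity, and not BioCocoa's Objective-C API: every name in it
(Sequence, from_fasta, max_in_memory, _MemorySequence, _CachedSequence,
complement) is hypothetical. The size threshold is an assumed policy,
complement stands in for the kind of alphabet-restricted operation
described above (the thread's own example is translation), and only
single-record FASTA files are handled.

```python
import os

DNA = frozenset("ACGTN")
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def _fasta_chunks(path):
    """Yield the sequence lines of a single-record FASTA file, one at a time."""
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                yield line.strip()


class Sequence:
    """Public face of the cluster; callers never name the subclasses."""

    def __init__(self, alphabet):
        self.alphabet = alphabet

    @staticmethod
    def from_fasta(path, alphabet, max_in_memory=1_000_000):
        # Hypothetical policy: load small files, leave large ones on disk.
        if os.path.getsize(path) <= max_in_memory:
            return _MemorySequence(path, alphabet)
        return _CachedSequence(path, alphabet)

    def symbols(self, start, stop):
        raise NotImplementedError

    def complement(self, start, stop):
        # The alphabet, not the subclass, decides which operations make sense.
        if self.alphabet is not DNA:
            raise TypeError("complement is only defined for nucleotide sequences")
        return self.symbols(start, stop).translate(_COMPLEMENT)


class _MemorySequence(Sequence):
    """Holds the whole sequence as one string."""

    def __init__(self, path, alphabet):
        super().__init__(alphabet)
        self._data = "".join(_fasta_chunks(path))

    def symbols(self, start, stop):
        return self._data[start:stop]


class _CachedSequence(Sequence):
    """Keeps only the path; reads from disk each time symbols are requested."""

    def __init__(self, path, alphabet):
        super().__init__(alphabet)
        self._path = path

    def symbols(self, start, stop):
        out, pos = [], 0  # pos is the sequence offset of the current line
        for chunk in _fasta_chunks(self._path):
            lo, hi = max(start - pos, 0), stop - pos
            if hi <= 0:
                break
            out.append(chunk[lo:hi])
            pos += len(chunk)
        return "".join(out)
```

A caller would write Sequence.from_fasta(path, DNA) and use symbols()
or complement() without knowing which subclass came back. Cocoa's own
Foundation classes such as NSString and NSArray follow the same
pattern, which is presumably why it was suggested for BCSequence.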
From schristley at mac.com Sat Feb 28 12:51:26 2009
From: schristley at mac.com (Scott Christley)
Date: Sat, 28 Feb 2009 09:51:26 -0800
Subject: [Biococoa-dev] BCSequence class cluster? [Was Re: Introducing myself]
In-Reply-To:
References:
Message-ID: <8F232061-3981-4E5F-95D1-E0C713AA117E@mac.com>

I too agree that the class cluster architecture is a good idea. Let me
throw out a few random thoughts, mainly me just thinking about
different ways that sequences can be used.

* I think the mutable sequence is useful. It isn't very common though
as typically sequence data is considered to be ground truth but I
think it will become more so as people start doing in silico
experimentation, asking "what if" questions when the sequence is
mutated. This can be tricky to handle efficiently, but maybe we would
want to design the interface around the biological, i.e. SNPs,
insertions, deletions, inversions, duplications, etc.

* I'm not very enamored with the BCSequenceArray class. I'm not sure
how it is any better than just using a standard NSArray, I tried
looking in the archives for discussion but didn't really find anything.
However, my guess is that BCSequenceArray would somehow provide
additional sequence-specific functionality? Personally, I don't really
want to treat BCSequence objects in any special way. I think it's best
if users can put them into standard collections (NSArray,
NSDictionary, NSSet) instead of having to use specialized collections.
Thoughts?

* Craig makes a good point about being able to do -sequenceWithId: to
look up a sequence. One issue to be aware of is that the ids in FASTA
files are not necessarily unique. In fact, the definition of the
sequences often lies outside of the FASTA file. Now if you download
from NCBI then you have a good chance of getting unique ids, but take
UCSC's goldenPath for example. If you download the human genome from
there, the id just says chr1, chr2, etc. Mix and match with another
organism and you can quickly forget which chr goes with which
organism. So from this perspective, we need to be careful not to rely
upon the ids being unique. Typically ids are unique within a file, but
this would really have to be a contract that the user enforces; it is
not part of the FASTA format.

* While I can understand why one would want to maintain the order of
sequences in a file when reading them into an array, I think we should
avoid relying upon that order being maintained. For example, a
BCGenome class is something I've been thinking about adding. It would
combine all of the chromosome sequences (or for some organisms it
would just be scaffolds from the assembly) into an organization that
could then be input to various comparative genomics algorithms. It
could be that all of the chromosome sequences are in one file, or each
chromosome could be in its own file; it is really up to the user. Now
if each chr was in its own file, we don't want to manage a set of
arrays. We would likely want to pull out all of the sequences from
multiple arrays and put them together into a single array.

* Down the road (which is only about a block away :-), we won't be
talking about just one genome for an organism but many individual
genomes (aka 1000genomes.org). Users probably won't have all of these
genomes residing on their disk; they are gonna be compressed against a
reference genome, maybe like we recently published:

http://www.ncbi.nlm.nih.gov/pubmed/18996942

So I see in the future having a special BCSequence class which doesn't
hold the actual sequence, but only the variations (possibly in a
compressed form) from a reference sequence. This should allow hundreds
to thousands of genomes to be "loaded" into memory and help keep those
programs from being I/O bound.

cheers
Scott

On Feb 27, 2009, at 12:39 PM, Craig Bateman wrote:

> After looking at this for a while, I agree that a protocol would do
> it, and would be consistent with using an Interface in many other
> languages, but a BCSequence class cluster (and probably a
> BCSequenceArray cluster that included a sequenceWithId: method since
> many file formats support multiple sequences) might be a bit more
> elegant. Especially if we're serious about wanting a
> BCMutableSequence.
>
> This pattern is common in Objective-C when you have multiple classes
> that all implement the same interface and the actual class to use is
> discernible at the time of construction. It's a little harder to
> implement, but then consumers of the library don't need to worry
> about which class(es) they need for a given purpose.
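
[Editor's illustration] The idea in Scott's last bullet above, a
sequence object that holds only its differences from a shared
reference, can be sketched as follows. This is a minimal sketch in
Python under stated assumptions, not BioCocoa code and not the
compression scheme of the paper linked above: the class name
VariantSequence and its symbols() method are hypothetical, only
single-base substitutions are modelled (insertions and deletions shift
coordinates and would need more bookkeeping), and positions are
assumed to be non-negative.

```python
class VariantSequence:
    """Stores substitutions against a shared reference instead of a full copy."""

    def __init__(self, reference, substitutions):
        for pos, base in substitutions.items():
            if not 0 <= pos < len(reference) or len(base) != 1:
                raise ValueError(f"bad substitution at {pos!r}")
        self._reference = reference               # shared by all individuals
        self._substitutions = dict(substitutions)  # position -> replacement base

    def __len__(self):
        return len(self._reference)

    def symbols(self, start, stop):
        # Rebuild only the requested window, then overlay this individual's changes.
        window = list(self._reference[start:stop])
        for pos, base in self._substitutions.items():
            if start <= pos < stop:
                window[pos - start] = base
        return "".join(window)


# Two individuals share one reference string and differ only in their variations.
reference = "ACGTACGT"
person_a = VariantSequence(reference, {1: "T", 6: "A"})
person_b = VariantSequence(reference, {3: "C"})
```

Each additional genome costs only the size of its variation table,
which is the property that would let many genomes be "loaded" at once.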
>
> The pseudo-code to use them would then be something like: (Sorry
> about the naming here, I don't have the source in front of me as I
> write this)
>
> BCSequenceFile *myFile = [BCCachedSequenceFile fileWithContentsOfFile:@"Whatever.fs"];
> BCSequenceArray *myArray = [BCSequenceArray arrayWithSequenceFile:myFile];
> BCSequence *first = [myArray sequenceAtIndex:0];
> or
> BCSequence *mySeq = [myArray sequenceWithId:@"GYS2"];
>
> The end user would be given an instance of BCCachedFastaFile in
> myFile, BCCachedSequenceArray in myArray and BCCachedSequence for
> the two sequence calls. This would all happen transparently behind
> the scenes and they wouldn't necessarily need to know what class
> they were using. Externally the memory vs file sequences look the
> same. Internally the memory BCSequence utilizes an NSData while the
> file-based one utilizes an NSFileHandle with an NSRange over the
> sequence (at least that's how FASTA would work, other format
> implementations would vary significantly).
>
> I'm pretty sure this can be done without introducing any breaking
> changes. Does anyone object to me attempting to implement these
> this way?
>
> On Wed, Feb 25, 2009 at 4:01 PM, Scott Christley wrote:
> Hey Craig,
>
> Well that is great, really. Sounds similar to my experience, I
> entered my PhD to do core computer science, software engineering,
> then got involved with a biology project, was hooked and have been
> following it ever since.
>
> One thing you might want to look at are the two main genome browsers
> that exist today, one by UC Santa Cruz and the other by Ensembl.
>
> http://genome.cse.ucsc.edu/
> http://www.ensembl.org/index.html
>
> There is also a project that I'm involved with, VectorBase, which
> also uses the Ensembl browser.
>
> http://www.vectorbase.org/index.php
>
> The reason I point these out is because all of them are web-based,
> which is great, but a potential killer app "might be" to have a
> local application which would allow researchers to analyze their
> local data. Reproducing the functionality of these genome browsers
> isn't the way to go, but there are many potential niches to be filled.
>
> Yes, shotgun sequencing is exactly what it is called. Humorous name
> for sure, but you are exactly right, the "shotgun" blasts the genome
> into many smaller bits, which are then assembled together
> afterward. It was quite controversial when Venter's company took
> the approach for the human genome project, in defiance of the public
> consortium which was doing it the expensive, slow, but more accurate
> way. But now it is the standard way, though it's not perfect, and
> assembly in general is a difficult problem.
>
> So for the BC*Sequence classes, if you look in the BCSequenceIO
> group then you will find the BCCachedSequenceFile and
> BCCachedFastaFile classes, which handle the file I/O. What is
> missing is a BCCachedSequence class, to correspond to BCSequence.
> From a design perspective, the two classes should stay separate
> (memory-based versus file-based) but I think a protocol which
> defines a common interface is what is needed.
>
> cheers
> Scott
>
> On Feb 24, 2009, at 2:31 PM, Craig Bateman wrote:
>
>> I accidentally dropped the list in my reply, so Scott was the only
>> one that got it.
>>
>> ---------- Forwarded message ----------
>> From: Craig Bateman
>> Date: Mon, Feb 23, 2009 at 2:01 AM
>> Subject: Re: [Biococoa-dev] Introducing myself
>> To: Scott Christley
>>
>> Well, unfortunately I can't state what, in particular, interests me
>> about genetics, mostly because I know so little. I read The Blind
>> Watchmaker and was intrigued by the author's explanation of how
>> genes work, and since then have read other books about the human
>> genome and the effects of certain genes on human development, etc.
>> I guess I'm just vaguely interested in genetics research because I
>> want to know. I certainly can't state that I'm interested in any
>> one sub-topic over any other. In short, I've barely scratched the
>> surface, and want to learn so much more...
>>
>> I am, however, an avid programmer, and was hoping that my vague
>> interest in the domain of genetics coupled with my years of writing
>> software (banking analysis software, but software all the same)
>> would combine to provide a great developer resource for the project.
>>
>> As far as a "killer app" goes, I couldn't even guess what something
>> like that would look like for BioCocoa... If you have some ideas I
>> can certainly bring something to light, but honestly I haven't a
>> clue about how any of this sequence information is actually used
>> and/or what features in such an app would be useful.
>>
>> Unifying the BC*Sequence classes is a good idea, maybe I'll look at
>> that first as a tooth-cutting exercise. Aside from that, I read a
>> bit about "shotgun" sequencing, which may not be what it's actually
>> called, but where overlapping bits of a sequence are used to
>> assemble an entire sequence.
>>
>> So I've got a lot to learn, but anything I can contribute to this
>> project or genetics/proteins/cancer/whatever research in general is
>> a win in my book.
>>
>> On Feb 22, 2009, at 11:16 PM, Scott Christley wrote:
>>
>> Hello Craig,
>>
>> The coding I've been doing lately is primarily related to the
>> research I'm doing, so in this sense it doesn't necessarily go
>> fast. My long-term goal is to add some advanced analysis
>> techniques into BioCocoa.
>>
>> One of the key things I would like to do is make the sequence and
>> cached sequence classes correspond in their interface. The cached
>> sequence class is important for doing large scale analysis on large
>> genomes, because they are too big to load completely into memory.
>> This is something that BioCocoa can offer above other toolkits like
>> BioPerl and BioPython: high performance and large scale analysis.
>>
>> What interests you about genetics? Many of the algorithms in
>> genetics, bioinformatics and so on are still being developed; even
>> something like assembly of genomes is not a "done" technology. If you
>> have a specific interest area, then I can help lay out a series of
>> tasks that would be both highly useful and interesting
>> algorithmic work.
>>
>> Koen is right, the todo list is still accurate, and those are
>> certainly useful enhancements to make. And the creation of a
>> "killer app" is definitely desired, especially to bring these
>> advanced analysis techniques together into an easy-to-use GUI and/
>> or command line applications that biologists can use.
>>
>> cheers
>> Scott
>>
>> On Feb 21, 2009, at 12:43 PM, Craig Bateman wrote:
>>
>> I'm an experienced software engineer looking for an open source mac
>> project to contribute to, and I'm recently very interested in
>> genetics. So BioCocoa seemed an obvious choice.
>>
>> I looked at the To Do list, and fear that 2+ years later it must be
>> out of date unless there's just nobody left working on this
>> project. Is it officially dead? There hasn't been a lot of
>> movement on this list in the past few months since the 2.1.0 "non"-
>> release. I've checked out the source and will start digging now to
>> get a feel for what's here and how it works. What/where are the
>> primary missing pieces? Has all the 1.x functionality been
>> incorporated into 2.1? Is anything on the todo list still up for
>> doing? Should I be looking at the framework itself or the
>> applications?
>>
>> Anyway, to whoever is still alive on this project, let me know how
>> and where I can help and I'll be glad to.
>>
>> Thanks,
>> Craig Bateman
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev

From koenvanderdrift at gmail.com Sat Feb 28 13:38:53 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Sat, 28 Feb 2009 13:38:53 -0500
Subject: [Biococoa-dev] BCSequence class cluster? [Was Re: Introducing myself]
In-Reply-To: <8F232061-3981-4E5F-95D1-E0C713AA117E@mac.com>
References: <8F232061-3981-4E5F-95D1-E0C713AA117E@mac.com>
Message-ID: <58C5486C-A170-4F59-BC36-4D2EE8F4F35A@gmail.com>

On Feb 28, 2009, at 12:51 PM, Scott Christley wrote:

>
> * I'm not very enamored with the BCSequenceArray class. I'm not
> sure how it is any better than just using a standard NSArray, I
> tried looking in the archives for discussion but didn't really find
> anything. However my guess is that BCSequenceArray would somehow
> provide additional sequence specific functionality? Personally, I
> don't really want to treat BCSequence objects in any special way. I
> think it's best if users can put them into standard collections
> (NSArray, NSDictionary, NSSet) instead of having to use specialized
> collections. Thoughts?

I think I added it to be a replacement for NSArray. I can't really think what the additional functionality could have been; if I remember, I'll post it here.

>
> * Craig makes a good point about being able to do -sequenceWithId:
> to look up a sequence. One issue to be aware of is that the IDs in
> the FASTA files are not necessarily unique. In fact, the definition
> of the sequences often lies outside of the FASTA file. Now if you
> download from NCBI then you have a good chance of getting unique
> IDs, but take UCSC's goldenPath for example. If you download the
> human genome from there, the ID just says chr1, chr2, etc. Mix and
> match with another organism and you can quickly forget which chr
> goes with which organism.
> So from this perspective, we need to be careful not to rely upon the
> IDs being unique. Typically IDs are unique within a file, but
> this would really have to be a contract that the user enforces, it
> is not part of the FASTA format.

This is why we added the BCAnnotation and BCFeature classes. Most data formats have different labels for name, sequence, authors, etc. I think the idea was to make our own definitions (BCSequenceName, BCSequenceAuthor, etc.), and let BCSequenceReader take care of putting the right annotation and/or feature in combination with the actual sequence.

- Koen.
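
To make the point about non-unique IDs concrete, here is a minimal sketch of looking sequences up by ID using only standard collections. It is not BioCocoa code: each record is assumed to be a plain NSDictionary with hypothetical @"id" and @"sequence" keys standing in for BCSequence objects, and BCExampleIndexRecordsById is a made-up helper name. Each ID maps to an array of matches, so a repeated ID such as chr1 from two organisms is kept rather than silently overwritten.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration only; not part of BioCocoa.
// Groups records by their @"id" value. Because FASTA does not guarantee
// unique ids, every id maps to an NSArray of all records carrying it.
// The @"id" key is an assumption of this sketch.
static NSDictionary *BCExampleIndexRecordsById(NSArray *records)
{
    NSMutableDictionary *index = [NSMutableDictionary dictionary];

    for (NSDictionary *record in records) {
        NSString *identifier = [record objectForKey:@"id"];
        if (identifier == nil) {
            continue; // skip records that carry no id at all
        }

        // Create the bucket for this id the first time it is seen.
        NSMutableArray *matches = [index objectForKey:identifier];
        if (matches == nil) {
            matches = [NSMutableArray array];
            [index setObject:matches forKey:identifier];
        }

        // Append in input order; duplicates accumulate in the same bucket.
        [matches addObject:record];
    }

    return index;
}
```

A caller that expects unique IDs can check that the array for a given ID holds exactly one record and treat anything else as an error, which puts the uniqueness contract explicitly on the user's side, as suggested above.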