From davide.cittaro at ifom-ieo-campus.it Tue Sep 11 10:17:52 2007
From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro)
Date: Tue, 11 Sep 2007 16:17:52 +0200
Subject: [Biococoa-dev] biococoa svn and everything
Message-ID: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it>

Hi there,

After some time I've decided to open the BioCocoa framework and try my very first Cocoa coding experience.
I've added a method to BCSequenceReader so that one can read sequences from MacVector. I would like to commit my changes, but the BioCocoa website (and bioinformatics.org, for that matter) is not available at this time.
Also, I would like to test the annotation parsing capabilities, but the demos provided (translation and peptides) only manage sequences... Is there software built on BioCocoa that I can use to test the framework extensively?

Thanks

d

/*
Davide Cittaro
HPC and Bioinformatics Systems @ Informatics Core

IFOM - Istituto FIRC di Oncologia Molecolare
via adamello, 16
20139 Milano
Italy

tel.: +39(02)574303007
e-mail: davide.cittaro at ifom-ieo-campus.it
*/

From schristley at mac.com Tue Sep 11 15:12:32 2007
From: schristley at mac.com (Scott Christley)
Date: Tue, 11 Sep 2007 15:12:32 -0400
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it>
References: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it>
Message-ID: <49F8441F-C43A-4474-899F-73C754274C1A@mac.com>

Hello Davide,

If you send me a patch with your changes, I can apply it to the repository. You would need to be given developer access in order to commit the changes yourself.

As for testing, there is a target in the Xcode project called "Test - BCFoundation" which tests some of the classes; it is not a comprehensive test suite but it does check a number of things.
If you build that target, it automatically runs the tests; you can see the output of the tests in the Build Results window, specifically the Build Transcripts view in the middle section. I don't believe there are any BCSequenceReader tests yet; maybe you would like to donate the first one with a simple test for the MacVector format?

cheers
Scott

From kvddrift at earthlink.net Tue Sep 11 20:39:22 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 11 Sep 2007 20:39:22 -0400
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <49F8441F-C43A-4474-899F-73C754274C1A@mac.com>
References: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it> <49F8441F-C43A-4474-899F-73C754274C1A@mac.com>
Message-ID: <58C003C2-A3D9-450C-B11D-966DD9D1D95A@earthlink.net>

Hi Davide and Scott,

Welcome to the BioCocoa team!
It's good to see some new blood in the project, and to know that there is still interest in the framework.

Not sure if you already found this info on our wiki page, but here you can read about how to commit files to the project:

http://bioinformatics.org/biococoa/wiki/pmwiki.php?n=Main.AddingFiles

cheers,

- Koen.

From davide.cittaro at ifom-ieo-campus.it Wed Sep 12 03:52:20 2007
From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro)
Date: Wed, 12 Sep 2007 09:52:20 +0200
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <58C003C2-A3D9-450C-B11D-966DD9D1D95A@earthlink.net>
References: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it> <49F8441F-C43A-4474-899F-73C754274C1A@mac.com> <58C003C2-A3D9-450C-B11D-966DD9D1D95A@earthlink.net>
Message-ID:

Hi both!

On Sep 12, 2007, at 2:39 AM, Koen van der Drift wrote:

> Welcome to the BioCocoa team! It's good to see some new blood in
> the project, and to know that there is still interest in the
> framework.

Is the BioCocoa project still "alive"? Don't get me wrong, but the last mail I received from the mailing list was in... May 2007! How many people are involved?

> Not sure if you already found this info on our wiki page, but here
> you can read about how to commit files to the project:
>
> http://bioinformatics.org/biococoa/wiki/pmwiki.php?n=Main.AddingFiles

I'm going to read it! Thanks

>> As for testing, there is a target in the Xcode project called
>> "Test - BCFoundation" which tests some of the classes, it is not a
>> comprehensive test suite but it does check a number of things.
>> If you build that target, it automatically runs the tests; you can
>> see the output of the tests in the Build Results windows,
>> specifically the Build Transcripts view in the middle section. I
>> don't believe there are any BCSequenceReader tests yet, maybe you
>> would like to donate the first one with a simple test for the
>> macvector format?

I have only two demos in the Xcode targets. They both work; I can import MacVector protein and nucleotide sequences. Give me time to implement features and annotations parsing, then I can think about a real-world test. Oh, consider that I have less than 30 minutes a day to dedicate to BioCocoa :-)

d

From davide.cittaro at ifom-ieo-campus.it Wed Sep 12 04:13:09 2007
From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro)
Date: Wed, 12 Sep 2007 10:13:09 +0200
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To:
References: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it> <49F8441F-C43A-4474-899F-73C754274C1A@mac.com> <58C003C2-A3D9-450C-B11D-966DD9D1D95A@earthlink.net>
Message-ID: <6CD597DA-8FF9-42B1-B23F-80A8D3BB32A2@ifom-ieo-campus.it>

On Sep 12, 2007, at 9:52 AM, Davide Cittaro wrote:

> I have only two demos in Xcode targets. They both work, I can import
> MacVector protein and nucleotide sequences.

OK, I was using the .dmg version... The svn trunk and branch 2.0 both have the test...

d
From davide.cittaro at ifom-ieo-campus.it Wed Sep 12 04:41:50 2007
From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro)
Date: Wed, 12 Sep 2007 10:41:50 +0200
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <49F8441F-C43A-4474-899F-73C754274C1A@mac.com>
References: <0AE4057B-94AC-4952-940F-5A68583681A7@ifom-ieo-campus.it> <49F8441F-C43A-4474-899F-73C754274C1A@mac.com>
Message-ID:

Hi Scott

On Sep 11, 2007, at 9:12 PM, Scott Christley wrote:

> Hello Davide,
>
> If you send me a patch with your changes, I can apply it to the
> repository. You would need to be given developer access in order
> to commit the changes yourself.

After a long read, I've decided to send the patch. It was not clear which version I should patch... I've checked the BCSequenceIO files and they don't look different across the various 2.x versions (.dmg, trunk, branch 2.0 and release-2.0). BTW, I made the patch against the downloadable .dmg version.

Essentially the changes are:

in BCSequenceReader.h:
    MACVECTOR_HEADER struct definition (to get MV file info)
    (BCSequenceArray *)readMacVectorFile:(NSString *)textFile method declaration

in BCSequenceReader.m:
    NSHFSTypeOfFile "scanning" to allow reading of 'NUCL' or 'PROT' files (in the same manner you do for GCK files)
    (BCSequenceArray *)readMacVectorFile:(NSString *)textFile method implementation

d

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BioCocoa-MacVector.patch
Type: application/octet-stream
Size: 4700 bytes
Desc: not available
URL:
From kvddrift at earthlink.net Wed Sep 12 08:47:09 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 12 Sep 2007 08:47:09 -0400 (GMT-04:00)
Subject: [Biococoa-dev] biococoa svn and everything
Message-ID: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>

>Is the BioCocoa project still "alive"? Don't get me wrong, I've
>received the last mail from the mailing list in... may 2007! How many
>people are involved?

I don't think the project was ever officially shut down; however, the people working on it about two years ago have moved on in their lives and no longer have time to actively work on the project. There were about 5 or 6 people actively involved. The current released version 2.0 is more or less a good starting point to use in apps, but there are also still many things unfinished or missing. So any input is more than welcome!

There was also some talk that macresearch.org would host the project (giving us a lot of visibility), but I have not heard about that in a long time.

- Koen.

From schristley at mac.com Wed Sep 12 16:35:50 2007
From: schristley at mac.com (Scott Christley)
Date: Wed, 12 Sep 2007 16:35:50 -0400
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
Message-ID:

That's the great thing about open source projects: they can sleep for a while and then wake up! :-)

I'm currently in the process of integrating code for a paper I submitted to the journal BMC Bioinformatics on finding ultraconserved elements; I've been making changes so that the code is more flexible and can be used for other analyses. The pieces I'm putting in are:

BCCachedSequenceFile, some classes for handling very large sequence files, i.e.
whole genomes, by not reading the whole sequence into memory, but caching meta-data and then reading from the sequence file when sequence data is needed.

BCSuffixArray, implements a suffix array data structure, a memory-efficient structure that allows for string matching operations.

BCMCP, this uses BCSuffixArray and BCCachedSequenceFile to perform the ultraconserved analysis, which is essentially looking for longest common substrings across the whole genome for multiple organisms.

I have some gene expression related classes that I worked on a few months ago; some utility stuff like downloading data from NCBI GEO and parsing the SOFT file format, but I need to clean it up before committing it. I'm not exactly sure where I'm going with this, as it seems that R/Bioconductor is the main toolkit to use for statistical analysis of gene expression.

One of my interests is whole genome comparative analysis, so I intend to keep using BioCocoa and adding code to it; hopefully others find it useful!

cheers
Scott

From charles.parnot at gmail.com Thu Sep 13 12:48:46 2007
From: charles.parnot at gmail.com (Charles Parnot)
Date: Thu, 13 Sep 2007 09:48:46 -0700
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net>
Message-ID: <5752D67B-7E27-43D5-BC57-D0EB37D8EF65@gmail.com>

Hi there!

I was one of the people that did the "moving on" thing

I think there is a pretty strong basis in the framework, at least for the export/import tools, and then for the basic BCSequence stuff. I did set up the initial test suite, which I think would need to be updated and extended.

When the project went into hibernation mode, the status was (at least from my point of view):

* in search of a project leader, who would have some basic amount of time to make decisions as to where to go, and do some coding too
* needing a "killer" app to wrap the framework and put it to use. This is the only way things would be tested in the real world by real users. The killer app can be a simple sequence editor that exposes as much as possible of the underlying framework
* a design decision has to be made to allow 2 aspects of the framework to coexist: a core framework that provides the basic functionality; an extension mechanism that allows people to easily contribute additional, more specialized functionality (we had some talks for instance with Phil Seibel about how the NSImage and NSImageRep design could inspire something. But really more thought needs to be put into that, and nothing has been decided). The idea is that not everybody will be interested in the specialized stuff, so having optional modules would be a good thing.
* one of the features that was in the works was to add annotations/features to the basic BCSequence class

so, a big roadmap, with lots of avenues ;-)

And yes, if things start moving again, or a project leader self-nominates, that would certainly warrant a post on macresearch. Hosting the project is also still a possibility, but that would mean some additional work for the project leader too in setting things up and maintaining it, as well as some kind of commitment for a reasonable amount of time.

charles

--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
charles.parnot at gmail.com

From schristley at mac.com Wed Sep 19 15:32:02 2007
From: schristley at mac.com (Scott Christley)
Date: Wed, 19 Sep 2007 15:32:02 -0400
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <5752D67B-7E27-43D5-BC57-D0EB37D8EF65@gmail.com>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net> <5752D67B-7E27-43D5-BC57-D0EB37D8EF65@gmail.com>
Message-ID: <3419726E-6BCC-4068-AE36-A6B03EB35F6A@mac.com>

Hello Charles!

I'm not sure if I'm nominating myself to be project leader, which seems a bit ambitious for somebody new to the community, but I certainly have the time and (most importantly) the desire to move BioCocoa forward. What I worry about mostly is losing the ability to add new developers if and when they come along. I remember that I tried to send an email to Peter Schols from bioinformatics.org and it went into a black hole; I had to find another email address to reach him. He was responsive though (thanks Peter if you are out there) once I got the email right, but if he has moved on maybe it would be good to give some others admin access to the project?

You are exactly right that there are a lot of avenues that can be taken. I keep thinking to myself that BioCocoa can differentiate itself by providing functionality not provided by other packages like BioPerl and BioJava. Not sure what a "killer" app would be; one thing that I think would be very cool though is a desktop genome browser (versus the web-based ones) which integrates all the genome information with analysis tools.

Has anybody thought about putting an article together and submitting it to the journal Nucleic Acids Research?
Might be a good way to get a little awareness as well as have a solid reference that research articles can point to.

cheers
Scott

From schristley at mac.com Thu Sep 20 17:27:49 2007
From: schristley at mac.com (Scott Christley)
Date: Thu, 20 Sep 2007 17:27:49 -0400
Subject: [Biococoa-dev] MacVector support
Message-ID: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com>

Thanks to Davide Cittaro, BioCocoa now has support for reading MacVector sequence files. I've added two test cases to the test suite which check reading of a DNA and a protein MacVector file, and all works well! The code is available on the SVN trunk.
cheers
Scott

From charles.parnot at gmail.com Thu Sep 20 18:26:57 2007
From: charles.parnot at gmail.com (Charles Parnot)
Date: Thu, 20 Sep 2007 15:26:57 -0700
Subject: [Biococoa-dev] MacVector support
In-Reply-To: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com>
References: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com>
Message-ID: <2BFB5710-BED0-44D3-8671-20DCB8A56344@gmail.com>

Thanks, well done!

From kvddrift at earthlink.net Thu Sep 20 21:07:51 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 20 Sep 2007 21:07:51 -0400
Subject: [Biococoa-dev] MacVector support
In-Reply-To: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com>
References: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com>
Message-ID: <6984140D-7E16-4F62-AF13-BD8BE406589A@earthlink.net>

On Sep 20, 2007, at 5:27 PM, Scott Christley wrote:

> Thanks to Davide Cittaro, BioCocoa now has support for reading
> MacVector sequence files. I've added two test cases to the test
> suite which check reading of a DNA and a protein MacVector file,
> and all works well! The code is available on the SVN trunk.

Thanks, guys. Keep 'em coming! ;-)

I did notice that there are now two nested Data Files directories (for MacVector support) under Test-BCFoundation. But only in my Xcode project, not on my HD. Is this on purpose?

cheers,

- Koen.
From schristley at mac.com Thu Sep 20 21:16:42 2007
From: schristley at mac.com (Scott Christley)
Date: Thu, 20 Sep 2007 21:16:42 -0400
Subject: [Biococoa-dev] MacVector support
In-Reply-To: <6984140D-7E16-4F62-AF13-BD8BE406589A@earthlink.net>
References: <2D20567F-100C-4F0B-BB6E-2FBFF682258A@mac.com> <6984140D-7E16-4F62-AF13-BD8BE406589A@earthlink.net>
Message-ID: <9EFFF1C1-0C3B-4C40-9B3B-11DCFEED58D9@mac.com>

On Sep 20, 2007, at 9:07 PM, Koen van der Drift wrote:

> I did notice that there are now two nested Data Files directories
> (for MacVector support) under Test-BCFoundation. But only in my
> Xcode project, not on my HD. Is this on purpose?

Oh, no, that is a mistake in the Xcode project! Thanks for pointing that out; I've committed a corrected project.

Scott

From sweetcocoa at mac.com Fri Sep 21 03:57:36 2007
From: sweetcocoa at mac.com (Peter Schols)
Date: Fri, 21 Sep 2007 09:57:36 +0200
Subject: [Biococoa-dev] biococoa svn and everything
In-Reply-To: <3419726E-6BCC-4068-AE36-A6B03EB35F6A@mac.com>
References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net> <5752D67B-7E27-43D5-BC57-D0EB37D8EF65@gmail.com> <3419726E-6BCC-4068-AE36-A6B03EB35F6A@mac.com>
Message-ID:

Hi Scott,

Great to see the BioCocoa project alive again! I would be happy to share admin access or give it to someone else.
While I'm quite busy with Undercover these days (and with a new microscopy app we are developing), I'm still very interested in BioCocoa and my (probably naive) dream is that I will become an active member again in the future. So while I don't have time to contribute code to the project at this time, I'd be very interested in helping out with smaller things and with spreading the word. I think the NAR article is a great idea, btw.

best wishes,

Peter

From schristley at mac.com Fri Sep 21 14:42:45 2007
From: schristley at mac.com (Scott Christley)
Date: Fri, 21 Sep 2007 14:42:45 -0400
Subject: [Biococoa-dev] file formats
Message-ID:

Can people send me small example sequence files in the various file formats that BioCocoa supports, preferably a protein example and a DNA example? Or does anyone know where I can get some example files? I would like to fill out the tests for BCSequenceReader. I have some of the formats, but others I am not familiar with.

readNCBIFile:
readStriderFile:
readGCKFile:
readGDEFile:
readPirFile:
readMSFFile:
readPhylipFile:

thanks
Scott
From schristley at mac.com Fri Sep 21 15:01:38 2007
From: schristley at mac.com (Scott Christley)
Date: Fri, 21 Sep 2007 15:01:38 -0400
Subject: [Biococoa-dev] BCCachedSequenceFile
Message-ID: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com>

I checked in this code a week or so ago, but never got around to posting a message. I've added a new class, BCCachedSequenceFile, and a concrete implementation class, BCCachedFastaFile. The idea behind a cached sequence file is that the sequence file is too large to load into memory, yet you want to be able to access the sequence data while it remains on disk. The design is a factory class, BCCachedSequenceFile, that defines the interface and returns a concrete implementation class, BCCachedFastaFile, that knows how to handle a specific file format. Currently I only have a FASTA class, as it seems most genome data is provided that way.

The implementation reads the sequence file and collects meta-data about each sequence in the file: where it starts, where it ends, its length, etc. The data can then be accessed by providing a sequence id and a position within the sequence. The class works out the file offset where that data resides, reads it from disk, and returns it. I would still like to do some optimization to speed up file access and to return chunks of data instead of just one symbol, but for now it works pretty well.

It is not perfect, of course: for FASTA it assumes that the line width within a sequence is constant, though it can vary from sequence to sequence in the file. I think this is pretty typical for FASTA files.
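
The offset calculation described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the BCCachedFastaFile code: the struct fasta_index_entry, its fields, and both function names are hypothetical, and every sequence line is assumed to hold exactly line_width symbols followed by a single '\n' byte.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-sequence meta-data; not BioCocoa's actual layout. */
struct fasta_index_entry {
    long data_start;  /* file offset of the first symbol after the header line */
    long length;      /* number of symbols in the sequence */
    long line_width;  /* symbols per line, assumed constant within this sequence */
};

/* Map a 0-based position in the sequence to a file offset.
   Returns -1 when the position is out of range. */
long fasta_offset(const struct fasta_index_entry *e, long pos)
{
    assert(e != NULL);
    if (pos < 0 || pos >= e->length || e->line_width <= 0)
        return -1;
    /* Each complete line before pos adds one newline byte to skip. */
    return e->data_start + pos + pos / e->line_width;
}

/* Read a single symbol straight from disk; returns EOF on failure. */
int fasta_symbol_at(FILE *fp, const struct fasta_index_entry *e, long pos)
{
    long off = fasta_offset(e, pos);
    if (off < 0 || fseek(fp, off, SEEK_SET) != 0)
        return EOF;
    return fgetc(fp);
}
```

Because the line width is stored per sequence, it may differ between sequences in the same file, which matches the assumption stated in the message. Files with Windows line endings would need two bytes per line break instead of one.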
cheers
Scott

From kvddrift at earthlink.net Fri Sep 21 20:30:22 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 21 Sep 2007 20:30:22 -0400
Subject: [Biococoa-dev] BCCachedSequenceFile
In-Reply-To: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com>
References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com>
Message-ID: <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net>

Hi Scott,

Thanks for adding these files, they seem very useful. I was thinking, based on how you factored out the BCCachedFastaFile class, maybe we should do the same for BCSequenceReader as well? This would maybe make it a little easier to maintain and add other formats. Just a thought.

Also, the way your new class is now set up is quite different from BCSequenceReader, which returns a BCSequenceArray (even if there's only one sequence in the file). Is it possible to use a similar approach for BCCachedSequenceFile as well? I think we need to make sure that we use a consistent approach throughout the framework, not only for the developers, but also for the (possible) users. Again, just a thought.

cheers,

- Koen.

On Sep 21, 2007, at 3:01 PM, Scott Christley wrote:

>
> I checked in this code a week or so again, but never got around to
> posting a message. I've added a new class, BCCachedSequenceFile,
> and a concrete implementation class, BCCachedFastaFile. The idea
> behind a cached sequence file is that the sequence file is too
> large to load up into memory, yet you want to be able to access the
> sequence data while it remains on disk. The design is a factor
> class, BCCachedSequenceFile, that defines the interface and returns
> a concrete implementation class, BCCachedFastaFile, that knows how
> to handle a specific file format. Currently I only have a FASTA
> class as it seems most genome data is provided that way. The
> implementation reads the sequence file and collects meta-data about
> each sequence in the file, where it starts, ends, length, etc.
> Then the data can be access by providing a sequence id and a > position within the sequence. The class figures out a file offset > of where that data resides, reads from disk and returns. I still > would like to do some optimization to speed up file access and > return chunks of data instead of just one symbol, but for now it > works pretty good. It is not perfect of course, for FASTA it > assumes that the line width within a sequence is constant, though > it can vary from sequence to sequence in the file, but I think this > is pretty typical for FASTA files. > > cheers > Scott > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev From schristley at mac.com Sat Sep 22 11:36:36 2007 From: schristley at mac.com (Scott Christley) Date: Sat, 22 Sep 2007 11:36:36 -0400 Subject: [Biococoa-dev] BCCachedSequenceFile In-Reply-To: <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> Message-ID: <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote: > Thanks for adding these files, they seems very useful. I was > thinking based on how you factored out the BCCachedFastaFile class, > maybe we should do the same for BCSequenceReader as well? This > makes it maybe a little easier to maintain and add other formats. > Just a thought. Yes, that is a good idea. Makes the interface simple and clean. One disadvantage is that it creates a lot of classes, but I guess that doesn't really matter. The same idea could also be applied to BCSequenceWriter, though it looks like only fasta output is supported now, no reason more formats aren't added in the future. 
> Also, the way your new class is now set up is quite different from
> BCSequenceReader, the latter which returns an BCSequenceArray (even
> if there's only one sequence in the file). Is it possible to use a
> similar approach for BCCachedSequenceFile as well? I think we need
> to make sure that we use a consistent approach throughout the
> framework, not only for the developers, but also for the (possible)
> users. Again, just a thought.

I was thinking about this when first designing the class, and I agree I would like to go in this direction. The idea would be a subclass of BCSequence, like BCCachedSequence, that overrides methods to encapsulate interaction with the sequence file. What I haven't quite figured out yet is how to support all of the functionality in BCSequence.

There are some design issues I'm still mulling over. For example, should each BCCachedSequence hold meta-data about that particular sequence (or all the sequences) in the file, or should all of its interaction go strictly through BCCachedSequenceFile? Currently BCCachedSequenceFile isn't thread safe, and in the future I will want it to be, as I expect genome-wide algorithms to take advantage of the multi-core Macs.

Also, BCSequence is currently expensive for accessing single sequence data; constructing a BCSymbol just to get a character is a bit too much. So part of this would be to think about how to extend BCSequence with more cache-friendly functionality.
cheers Scott From charles.parnot at gmail.com Sat Sep 22 16:50:10 2007 From: charles.parnot at gmail.com (Charles Parnot) Date: Sat, 22 Sep 2007 13:50:10 -0700 Subject: [Biococoa-dev] BCCachedSequenceFile In-Reply-To: <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> Message-ID: <0B14434C-B212-4017-A9B5-A2F7F8A39F6D@gmail.com> > Also, BCSequence is currently expensive for accessing single > sequence data, constructing a BCSymbol just to get a character is a > bit too much. So part of this would be to think how to extend > BCSequence with more cache-friendly functionality. Note that BCSymbol objects are uniqued and reused. So in fact a BCSequence is an array of pointers. I don't know what the status was on the BCSequence implementation, but one of the "todo" things was to switch from NSArray to C arrays behind the scenes, while still exposing NSArray too in the interface. But I think Koen did some of that transition?? charles -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From kvddrift at earthlink.net Sat Sep 22 17:05:31 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 22 Sep 2007 17:05:31 -0400 Subject: [Biococoa-dev] BCCachedSequenceFile In-Reply-To: <0B14434C-B212-4017-A9B5-A2F7F8A39F6D@gmail.com> References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> <0B14434C-B212-4017-A9B5-A2F7F8A39F6D@gmail.com> Message-ID: On Sep 22, 2007, at 4:50 PM, Charles Parnot wrote: > I don't know what the status was on the BCSequence implementation, > but one of the "todo" things was to switch from NSArray to C arrays > behind the scenes, while still exposing NSArray too in the > interface. 
But I think Koen did some of that transition?? That should indeed be all in place. BCSequence now uses NSData to store characters. Only if needed for specific calculations, BCSymbols will be generated, through methods such as symbolArray and symbolAtIndex. - Koen. From kvddrift at earthlink.net Sat Sep 22 17:13:06 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 22 Sep 2007 17:13:06 -0400 Subject: [Biococoa-dev] BCCachedSequenceFile In-Reply-To: <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> Message-ID: On Sep 22, 2007, at 11:36 AM, Scott Christley wrote: > For example, should each BCCachedSequence hold meta-data about that > particular sequence (or all the sequences) in the file, should all > of its interaction go strictly through BCCachedSequenceFile? I am not sure if I understand what you mean by meta-data. But we have been talking on the list about adding a BCAnnotation and/or BCFeature class to hold additional information about a particular sequence. Actually, BCAnnotation is already a part of the framework, albeit non functional. - Koen. From kvddrift at earthlink.net Sun Sep 23 09:39:29 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 23 Sep 2007 09:39:29 -0400 Subject: [Biococoa-dev] file formats In-Reply-To: References: Message-ID: On Sep 21, 2007, at 2:42 PM, Scott Christley wrote: > > Can people send me small example sequence files in the various file > formats that BioCocoa supports, preferable a protein example and a > DNA example? Or know where I can get some example files? I would > like to fill out the tests for BCSequenceReader. I have some of > the formats but others I am not familiar with. 
>

Scott,

Check out this website for many formats:

http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

It doesn't cover the 'ncbi' format that we use in BioCocoa; a sample of that is here:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

- Koen.

From schristley at mac.com Mon Sep 24 12:20:36 2007
From: schristley at mac.com (Scott Christley)
Date: Mon, 24 Sep 2007 12:20:36 -0400
Subject: [Biococoa-dev] BCCachedSequenceFile
In-Reply-To:
References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com>
Message-ID:

The meta-data is essentially just information about the sequences in the file, so it is standard BCAnnotation stuff like the sequence identifier, but it is also info that BCCachedSequenceFile needs to work, like where each sequence starts in the file, how long the sequence is, etc. Then it can calculate a position directly into the file, and read the data from disk. This meta-data is internal to the concrete implementation class, BCCachedFastaFile, as the type of information that needs to be stored may differ from file format to file format.

So the design I was thinking of is that BCCachedSequence would have a reference to its BCCachedSequenceFile; then, when it needs data, it asks BCCachedSequenceFile for the data from disk. The difficulty is that many of the BCSequence methods perform operations or return data on the complete sequence, which is difficult when you cannot read the whole sequence into memory because it is cached on disk ...

cheers
Scott

On Sep 22, 2007, at 5:13 PM, Koen van der Drift wrote:

>
> On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
>
>> For example, should each BCCachedSequence hold meta-data about
>> that particular sequence (or all the sequences) in the file,
>> should all of its interaction go strictly through
>> BCCachedSequenceFile?
>
> I am not sure if I understand what you mean by meta-data.
But we
> have been talking on the list about adding a BCAnnotation and/or
> BCFeature class to hold additional information about a particular
> sequence. Actually, BCAnnotation is already a part of the
> framework, albeit non functional.
>
> - Koen.

From schristley at mac.com Mon Sep 24 12:39:30 2007
From: schristley at mac.com (Scott Christley)
Date: Mon, 24 Sep 2007 12:39:30 -0400
Subject: [Biococoa-dev] BCCachedSequenceFile
In-Reply-To: <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com>
References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com>
Message-ID: <4C9B5BA1-952F-4D65-B048-1362DABD2551@mac.com>

Along these lines, the current BCSequenceReader is somewhat memory inefficient for medium to large sequences. For example, attempting to load a 120 Mbp fasta file containing a few thousand sequences, I ran out of memory (and my machine has 6GB). The main issue was the way fasta files were parsed, which creates lots of temporary strings; I have some code currently enabled which optimizes this, but there could be more improvement.

One definite improvement is not to automatically read in the whole file as a string. This tends to be automatically Unicode, so it doubles the size of the file in memory. It would be better, I think, to rework some of the readers to read directly from the file and construct the NSData on the fly.

cheers
Scott

On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:

>
> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>
>> Thanks for adding these files, they seems very useful. I was
>> thinking based on how you factored out the BCCachedFastaFile
>> class, maybe we should do the same for BCSequenceReader as well?
>> This makes it maybe a little easier to maintain and add other
>> formats. Just a thought.
>
> Yes, that is a good idea. Makes the interface simple and clean.
> One disadvantage is that it creates a lot of classes, but I guess > that doesn't really matter. The same idea could also be applied to > BCSequenceWriter, though it looks like only fasta output is > supported now, no reason more formats aren't added in the future. > From kvddrift at earthlink.net Mon Sep 24 12:52:15 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 24 Sep 2007 12:52:15 -0400 (EDT) Subject: [Biococoa-dev] BCCachedSequenceFile Message-ID: <2601884.1190652736317.JavaMail.root@elwamui-sweet.atl.sa.earthlink.net> The current BCSequenceReader (and BCSequenceWriter) code is largely based on the way it was designed in the original BioCocoa framework that was written by Peter a couple of years ago. We updated it to work with BCSequence, but I don't think it has ever been tested for large files. So any improvement to read files more efficently is more than welcome. - Koen. -----Original Message----- >From: Scott Christley >Sent: Sep 24, 2007 12:39 PM >To: biococoa-dev at bioinformatics.org >Subject: Re: [Biococoa-dev] BCCachedSequenceFile > > >Along this lines, the current BCSequenceReader is somewhat memory >inefficient for medium to large sequences. For example, attempting >to load in a 120 Mbp fasta file containing a few thousands sequences, >I ran out of memory (and my machine has 6GB). The main issue was the >way fasta files where parsed which creates lots of temporary strings; >I have some code currently enabled which optimizes this but there >could be more improvement. > >One definite improvement is not to automatically read in the whole >file as a string. This tends to be automatically Unicode so doubles >the size of the file in memory. It would be better I think to rework >some of the readers to read directly from the file, and construct the >NSData on the fly. 
> >cheers >Scott > >On Sep 22, 2007, at 11:36 AM, Scott Christley wrote: > >> >> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote: >> >>> Thanks for adding these files, they seems very useful. I was >>> thinking based on how you factored out the BCCachedFastaFile >>> class, maybe we should do the same for BCSequenceReader as well? >>> This makes it maybe a little easier to maintain and add other >>> formats. Just a thought. >> >> Yes, that is a good idea. Makes the interface simple and clean. >> One disadvantage is that it creates a lot of classes, but I guess >> that doesn't really matter. The same idea could also be applied to >> BCSequenceWriter, though it looks like only fasta output is >> supported now, no reason more formats aren't added in the future. >> > >_______________________________________________ >Biococoa-dev mailing list >Biococoa-dev at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/biococoa-dev From kellert at ohsu.edu Mon Sep 24 12:49:30 2007 From: kellert at ohsu.edu (Thomas Keller) Date: Mon, 24 Sep 2007 09:49:30 -0700 Subject: [Biococoa-dev] BCCachedSequenceFile In-Reply-To: <4C9B5BA1-952F-4D65-B048-1362DABD2551@mac.com> References: <32F46635-6781-4713-BB02-1C86B19BF6EA@mac.com> <1D17DFB8-F7B7-4190-9845-6D384A43CD47@earthlink.net> <7CC614B4-1ECC-49A1-9314-3CD75A9EBE7D@mac.com> <4C9B5BA1-952F-4D65-B048-1362DABD2551@mac.com> Message-ID: <889E972A-1425-4A83-814E-3B6FA4400C45@ohsu.edu> Do you need any ab1 files (direct output from the ABI 3130xl)? regards, Tom K Thomas J Keller PhD kellert at ohsu.edu 4-2442 On Sep 24, 2007, at 9:39 AM, Scott Christley wrote: > > Along this lines, the current BCSequenceReader is somewhat memory > inefficient for medium to large sequences. For example, attempting > to load in a 120 Mbp fasta file containing a few thousands > sequences, I ran out of memory (and my machine has 6GB). 
The main > issue was the way fasta files where parsed which creates lots of > temporary strings; I have some code currently enabled which > optimizes this but there could be more improvement. > > One definite improvement is not to automatically read in the whole > file as a string. This tends to be automatically Unicode so > doubles the size of the file in memory. It would be better I think > to rework some of the readers to read directly from the file, and > construct the NSData on the fly. > > cheers > Scott > > On Sep 22, 2007, at 11:36 AM, Scott Christley wrote: > >> >> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote: >> >>> Thanks for adding these files, they seems very useful. I was >>> thinking based on how you factored out the BCCachedFastaFile >>> class, maybe we should do the same for BCSequenceReader as well? >>> This makes it maybe a little easier to maintain and add other >>> formats. Just a thought. >> >> Yes, that is a good idea. Makes the interface simple and clean. >> One disadvantage is that it creates a lot of classes, but I guess >> that doesn't really matter. The same idea could also be applied >> to BCSequenceWriter, though it looks like only fasta output is >> supported now, no reason more formats aren't added in the future. >> > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev From kvddrift at earthlink.net Mon Sep 24 12:57:59 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 24 Sep 2007 12:57:59 -0400 (EDT) Subject: [Biococoa-dev] BCCachedSequenceFile Message-ID: <21957723.1190653079629.JavaMail.root@elwamui-sweet.atl.sa.earthlink.net> The nice thing about BCSequence is that it can hold any (part of a) sequence, not necessarily a complete one. There are a bunch of methods available that will allow to create a subsequence, maybe you can use that? 
I also noticed that your BCCachedSequenceFile class creates a reverse sequence. FYI, BCSequence can already do this. - Koen. -----Original Message----- >From: Scott Christley >Sent: Sep 24, 2007 12:20 PM >To: Koen van der Drift >Cc: biococoa-dev at bioinformatics.org >Subject: Re: [Biococoa-dev] BCCachedSequenceFile > > >The meta-data is essentially just information about the sequences in >the file, so it is standard BCAnnotation stuff like the sequence >identifier, but it is also info that BCCachedSequenceFile needs to >work, like where does each sequence start in the file, how long is >the sequence, etc. Then it can calculate a position directly into >the file, and read the data from disk. This meta-data is internal to >the concrete implementation class, BCCachedFastaFile, as the type of >information needed to be stored may be different from file format to >file format. > >So the design I was thinking of is BCCachedSequence would have a >reference to its BCCachedSequenceFile, then when it needs data, it >asks BCCachedSequenceFile for the data from disk. The difficulty is >that many of the BCSequence methods perform operations or return data >on the complete sequence, which is difficult when you cannot read the >whole sequence in memory because it is cached on disk ... > >cheers >Scott > >On Sep 22, 2007, at 5:13 PM, Koen van der Drift wrote: > >> >> On Sep 22, 2007, at 11:36 AM, Scott Christley wrote: >> >>> For example, should each BCCachedSequence hold meta-data about >>> that particular sequence (or all the sequences) in the file, >>> should all of its interaction go strictly through >>> BCCachedSequenceFile? >> >> I am not sure if I understand what you mean by meta-data. But we >> have been talking on the list about adding a BCAnnotation and/or >> BCFeature class to hold additional information about a particular >> sequence. Actually, BCAnnotation is already a part of the >> framework, albeit non functional. >> >> - Koen. 
> From schristley at mac.com Mon Sep 24 17:39:36 2007 From: schristley at mac.com (Scott Christley) Date: Mon, 24 Sep 2007 17:39:36 -0400 Subject: [Biococoa-dev] BioCocoa Applications Message-ID: <97F29A26-242D-4FDA-B025-C2D7B1F165B1@mac.com> What do people think about creating a source repository with community donated applications that use BioCocoa? I'm thinking that having the BioCocoa library is great, but still people are required to write their own applications on top of it. Some could be sample applications, but I suspect that others would be useful full-fledged apps that maybe focus on specific area of analysis, etc. I certainly have some end-user oriented tools that I would like to provide, but don't have anyplace to put them except create a new project somewhere. cheers Scott From charles.parnot at gmail.com Mon Sep 24 17:53:58 2007 From: charles.parnot at gmail.com (Charles Parnot) Date: Mon, 24 Sep 2007 14:53:58 -0700 Subject: [Biococoa-dev] BioCocoa Applications Message-ID: > > What do people think about creating a source repository with > community donated applications that use BioCocoa? > > I'm thinking that having the BioCocoa library is great, but still > people are required to write their own applications on top of it. > Some could be sample applications, but I suspect that others would > be useful full-fledged apps that maybe focus on specific area of > analysis, etc. I certainly have some end-user oriented tools that > I would like to provide, but don't have anyplace to put them except > create a new project somewhere. > > cheers > Scott I agree that there is nothing better than some real-world app using the framework, to get the framework in the best shape. A lot of design and optimizations in the framework will then be triggered by real issues in real apps, not just what we think could be better. These apps can also serve as extra testing tools, in addition to the automated tests that are built in the framework itself. 
charles -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From schristley at mac.com Wed Sep 26 11:14:05 2007 From: schristley at mac.com (Scott Christley) Date: Wed, 26 Sep 2007 11:14:05 -0400 Subject: [Biococoa-dev] BioCocoa Applications In-Reply-To: <04A9DB3C-333A-420E-8CCD-8423371DF825@gmail.com> References: <97F29A26-242D-4FDA-B025-C2D7B1F165B1@mac.com> <04A9DB3C-333A-420E-8CCD-8423371DF825@gmail.com> Message-ID: <2F609EBD-2A38-4FDC-8141-E433F71CDEB4@mac.com> I agree as well. The question though is how to set up the SVN repository to support this properly. I'm more familiar with CVS than SVN, I understand the concepts of branches and tags but it is not clear to me how this works with SVN. I need to read up on this. Preferably people should be able to SVN the BioCocoa core framework without getting other stuff; likewise with the applications, they should be able to SVN either all the applications or just specific ones they are interested in. Any ideas on how to set up the repository? Would the current repository need to be re-structured to support separate applications? Thinking with my CVS mind, I would consider making the repository look something like this: trunk/ BioCocoa/ Applications/ MyApp/ YourApp/ thanks Scott On Sep 24, 2007, at 5:53 PM, Charles Parnot wrote: >> >> What do people think about creating a source repository with >> community donated applications that use BioCocoa? >> >> I'm thinking that having the BioCocoa library is great, but still >> people are required to write their own applications on top of it. >> Some could be sample applications, but I suspect that others would >> be useful full-fledged apps that maybe focus on specific area of >> analysis, etc. I certainly have some end-user oriented tools that >> I would like to provide, but don't have anyplace to put them >> except create a new project somewhere. 
>>
>> cheers
>> Scott
>
> I agree that there is nothing better than some real-world app using
> the framework, to get the framework in the best shape. A lot of
> design and optimizations in the framework will then be triggered by
> real issues in real apps, not just what we think could be better.
>
> These apps can also serve as extra testing tools, in addition to
> the automated tests that are built in the framework itself.
>
> charles
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>
>
>
>

From charles.parnot at gmail.com Wed Sep 26 12:23:26 2007
From: charles.parnot at gmail.com (Charles Parnot)
Date: Wed, 26 Sep 2007 09:23:26 -0700
Subject: [Biococoa-dev] BioCocoa Applications
In-Reply-To: <2F609EBD-2A38-4FDC-8141-E433F71CDEB4@mac.com>
References: <97F29A26-242D-4FDA-B025-C2D7B1F165B1@mac.com> <04A9DB3C-333A-420E-8CCD-8423371DF825@gmail.com> <2F609EBD-2A38-4FDC-8141-E433F71CDEB4@mac.com>
Message-ID: <04EE5A75-4C88-47B3-9AA0-B25740D9478B@gmail.com>

SVN is much simpler than CVS once you understand the basics, and the concepts are fairly easy to grasp. I really recommend reading the free SVN book online, particularly the parts explaining the 'philosophy' of the system.

By convention, and because it works well this way, you want to have a 'trunk', a 'tags' and a 'branches' directory for each project. The current svn tree is:

BioCocoa/
    trunk/
    tags/
    branches/

I would not recommend changing that too much (though svn makes that easy, it might still be confusing when going back to older revisions). Since the trunk directory contains all the BioCocoa framework code directly, with no other subdirectory, I would not recommend having the apps in there.
Instead, I would suggest adding an additional directory under the root, called Applications:

BioCocoa/
    trunk/
    tags/
    branches/
    Applications/

Then, you have these 2 options:

*Option 1:

BioCocoa/
    trunk/
    tags/
    branches/
    Applications/
        MyApp/
            trunk/
            tags/
            branches/
        YourApp/
            trunk/
            tags/
            branches/

*Option 2:

BioCocoa/
    trunk/
    tags/
    branches/
    Applications/
        trunk/
            MyApp/
            YourApp/
        tags/
            MyApp/
            YourApp/
        branches/
            MyApp/
            YourApp/

I would have a slight preference for Option 1, but it really does not matter that much, and there is no technical reason that I foresee why one option is better than the other. You might give it more thought; maybe there are technical reasons why one option is better than the other.

Again, I recommend following the svn convention because: (1) it works, (2) anybody familiar with svn will be instantly comfortable.

hope that helps!

charles

On Sep 26, 2007, at 8:14 AM, Scott Christley wrote:

>
> I agree as well. The question though is how to set up the SVN
> repository to support this properly. I'm more familiar with CVS
> than SVN, I understand the concepts of branches and tags but it is
> not clear to me how this works with SVN. I need to read up on this.
>
> Preferably people should be able to SVN the BioCocoa core framework
> without getting other stuff; likewise with the applications, they
> should be able to SVN either all the applications or just specific
> ones they are interested in. Any ideas on how to set up the
> repository? Would the current repository need to be re-structured
> to support separate applications?
>
> Thinking with my CVS mind, I would consider making the repository
> look something like this:
>
> trunk/
>     BioCocoa/
>     Applications/
>         MyApp/
>         YourApp/
>
>
> thanks
> Scott
>
>
> On Sep 24, 2007, at 5:53 PM, Charles Parnot wrote:
>
>>>
>>> What do people think about creating a source repository with
>>> community donated applications that use BioCocoa?
>>> >>> I'm thinking that having the BioCocoa library is great, but still >>> people are required to write their own applications on top of >>> it. Some could be sample applications, but I suspect that others >>> would be useful full-fledged apps that maybe focus on specific >>> area of analysis, etc. I certainly have some end-user oriented >>> tools that I would like to provide, but don't have anyplace to >>> put them except create a new project somewhere. >>> >>> cheers >>> Scott >> >> I agree that there is nothing better than some real-world app >> using the framework, to get the framework in the best shape. A lot >> of design and optimizations in the framework will then be >> triggered by real issues in real apps, not just what we think >> could be better. >> >> These apps can also serve as extra testing tools, in addition to >> the automated tests that are built in the framework itself. >> >> charles >> >> -- >> Xgrid-at-Stanford >> Help science move fast forward: >> http://cmgm.stanford.edu/~cparnot/xgrid-stanford >> >> Charles Parnot >> charles.parnot at gmail.com >> >> >> >> > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From kvddrift at earthlink.net Wed Sep 26 20:53:44 2007 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 26 Sep 2007 20:53:44 -0400 Subject: [Biococoa-dev] biococoa svn and everything In-Reply-To: References: <9828231.1189601230166.JavaMail.root@elwamui-muscovy.atl.sa.earthlink.net> Message-ID: On Sep 12, 2007, at 4:35 PM, Scott Christley wrote: > BCSuffixArray, implements a suffix array data structure, a memory > efficient structure that allows for string matching operations. 
>
> BCMCP, this uses BCSuffixArray and BCCachedSequenceFile to perform
> the ultraconserved analysis which is essentially looking for
> longest common substrings across the whole genome for multiple
> organisms.

Hi Scott,

I noticed you did a lot of updates recently with these classes - this is great. From your comments in the code I have a hard time understanding what exactly a suffix array is (I'm an analytical chemist who mainly works with proteins, so pardon my ignorance ;-). Maybe you could add some more info in the source code to help better understand these classes?

Another question I have is why you are using calls such as fopen, fread, etc. instead of the methods that Obj-C and Cocoa provide for I/O. Mind you, I am just trying to understand the code; it's no criticism at all.

cheers,

- Koen.
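
As background to the question above: a suffix array is the list of start positions of all suffixes of a string, sorted alphabetically by suffix, so that a substring search becomes a binary search over that list. A minimal sketch in C follows. It is an illustration only, not the BCSuffixArray implementation; the names build_suffix_array and sa_find are hypothetical, and the construction is the naive qsort-based one rather than anything memory- or time-optimized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text being indexed; file-scope so the qsort comparator can reach it. */
static const char *sa_text;

static int sa_compare(const void *a, const void *b)
{
    return strcmp(sa_text + *(const int *)a, sa_text + *(const int *)b);
}

/* Returns a malloc'd array holding the n suffix start positions of the
   NUL-terminated text, ordered by the suffixes they point at.
   Naive construction: fine for a demo, too slow for whole genomes. */
int *build_suffix_array(const char *text, int n)
{
    int *sa = malloc((size_t)n * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
        sa[i] = i;
    sa_text = text;
    qsort(sa, (size_t)n, sizeof *sa, sa_compare);
    return sa;
}

/* Binary search over the sorted suffixes. Returns some text position
   where pattern occurs, or -1 if it does not occur. */
int sa_find(const char *text, const int *sa, int n, const char *pattern)
{
    size_t m = strlen(pattern);
    int lo = 0, hi = n - 1;
    assert(text != NULL && sa != NULL);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strncmp(text + sa[mid], pattern, m);
        if (c == 0)
            return sa[mid];
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

For the text "GATTACA" the sorted suffixes are A, ACA, ATTACA, CA, GATTACA, TACA, TTACA, so the array holds the positions 6, 4, 1, 5, 0, 3, 2. Suffixes that share a long prefix end up adjacent in the array, which is the property that longest-common-substring searches build on.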