From a.griekspoor at nki.nl Wed Jun 1 04:34:24 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Wed, 1 Jun 2005 10:34:24 +0200 Subject: [Biococoa-dev] WWDC 2005 BioCocoa meeting In-Reply-To: <18e91c9ac9913d957d0eff831c45a2d8@earthlink.net> References: <667464FDA2C81D4CA79D7F3B728D10E73C15F3@adsrv100.nki.nl> <18e91c9ac9913d957d0eff831c45a2d8@earthlink.net> Message-ID: <7D98C7DB-05D4-4A71-B6F1-8B9255C32A9D@nki.nl> Hi Guys, Sorry for replying so late, packed my bags and from thursday BC will be a main focus finally again ;-) Great input on the presentation, I will try to summarize all emails below. Let's continue the discussions and I propose to those in SF next week to meet and prepare all stuff on monday and tuesday. So here we go: > I want to give some suggestions for the presentation. >> Topics being covered during presentation: >> - introduction to BioCocoa > Phil: Motivation, why we are doing this. Especially why cocoa and > not using biojava or other frameworks. Good point! > Charles: first and all: this is meant to be a COCOA framework, that > will strictly follow the conventions of the rest of Cocoa; after > all, this is the whole point if we don't want to be just "yet > another BioXXX project" Yep. > Koen: We need to think of some good arguments why users want to use > BioCocoa instead of eg BioPerl or BioJava. Both are already very > advanced, and work on OS X, so why would a user want to jump into a > very early BioCocoa? Of course, it is a Cocoa/ObjC framework, so > it's very easily to implement into native programs. However, as we > already have discussed, because we stuff everything in NSObject > subclasses, we add quite some overhead. Maybe this is also an issue > to discuss at WWDC. 
I think it's clear what we have to discuss in the intro ;-) I've said it before often during our discussions, I wish I could talk to people from Apple/Next about the way they dealt with object overhead, and would like to know what goes on under the hood when you use NSString or NSAttributedString for example. It would be nice to find that out at the WWDC... >> - framework structure and layout > Koen: I think one thing that could be discussed is our current > framework design. We've already gone through a couple of iterations > and I think we rather not want to go through that again ;). > However, it will be at the base of the rest of the code, and I > think it would be good to get some input on it. Now is the time to > make adjustments before we start adding a lot more code to it. Absolutely, this is a great opportunity to discuss the design choices we made. According to Robert Kehrer a few Apple staff people would be present, and it would be a great to ask their opinion on these matters, especially people from the cocoa frameworks. Perhaps we can all try to speak about these issues when we meet the cocoa engineers in for instance the hands-on-labs. > Phil: This is what i especially want to talk about. I have many > many things to contribute to the framework and we definitly need to > discuss where to place it. Yep, see above. > Charles: An other important idea related to that: a framework like > this had two sides: the inside and the outside(!). The inside = the > implementation details, the behind-the-scenes under-the-hood stuff. > The outside = the public interface. The WWDC might be a great > occasion to define the public interface more precisely and decide > what the potential users need. Maybe the details of the > implementation are not very important to discuss too much. WE > developers could discuss that between us, but this is not the thing > we will get useful feedback in just a couple of hours from an > external audience. 
I am not saying we should not talk at all about > it, but that we should just discuss it in general. Very true, perhaps it's nice to see it it in the following way, the BioCocoa meeting might be ideal to focus indeed on the outside. At the same time, it would again be great if everyone of us could discuss the inside with experienced people like Apple engineers. This info can then be fed back in our discussions. >> - future plans > Phil: This is the point we have to discuss in the list. what > exactly everybody wants the framework to be. > So i will start: > Basic datastructures shoud be provided: > Sequences, Alignments, Annotations, Protein Structures and more > Classes to access common databases like genbank, pdb etc. > I want to provide HMMs for sequence analysis > Koen: For me the things that need to be added are I/O of various > sequences, with the addition of reading them from a database, maybe > by using webkit? Also features and annotations are important, at > least for me. > Charles: > - I/O for different sequence format : this is the original primary > goal of BioCocoa and indeed the most important, because nobody > wants to have to deal with that. Imagine if you had to read jpg > files the old way and implement your own decompression algorithm > when using Cocoa! > - the BCSequence et al classes : right now, they are not very > powerful (on the outside!), and the interface is very simple; but > the core of it is there and quite nice now (and with tests!!); > there are still two questions about the design: do we need mutable/ > immutable classes? do we need to go to CoreData? > - the annotations: big thing... choosing a standard like the one > proposed by Alex? maybe we could get some feedback and ideas from > the audience. > - some prebuilt NSViews for GUI app... 
at this point, they should
> not be too elaborate, but could be useful to have something to work
> with and test things and get a better sense of what BioCocoa can
> become
> - the tools: BioPerl is a lot about wrapping the plethora of
> existing CLI tools for biology; is this what we want to do too? it
> does not look like it, but this is worth discussing maybe; it seems
> from the mailing list and what people are doing that we want to
> integrate some of the most common algorithms in the code
> (alignments, digests, sequence searches,...). However, do we still
> want to provide an interface to other more specialized tools, or is
> it not the point of BioCocoa? or something in the far future?
> Koen: Also I would like to suggest that we don't yet implement
> CoreData, since many people (including me) are still running 10.3.
> Or at least make it an extension.

I agree, and how to do this is another thing to discuss during the WWDC. True, great suggestions!

>> - Q&A, discussion
> Peter: Please let us know how we will/can divide the work. I'm
> willing to help with the organisation of the presentation and
> discussion.

Let's prepare the definitive presentation at the WWDC together, including the division of who will tell what. I can try to make a template beforehand, but usually that's changed very easily later if necessary.

> Peter: Are we also supposed to demo BC? ;-)

That would be great, so yes, the BLAST demo would be nice! Furthermore, we have the peptides demo and the seqIO demo...

>
> Charles: One thing that came to my mind first was that maybe we
> need to define what the BioCocoa framework could be used for. Give
> some very real examples, in particular the applications that some
> of you guys are developing. My own personal virtual pet project
> would be a general sequence editing program à la DNA Strider...
> Maybe other people at the WWDC have projects and ideas in mind.
> Introduction to the talk could be about that: examples of potential
> and existing applications, and how BioCocoa can speed up the
> development and provide an open and consistent interface.

Good plan!

John: Great news on the presentation - I'm just disappointed I won't be able to go to California to meet with the rest of you. If any of you have stopovers in NYC on the way to California, though, get in touch.

I'm sorry that I have not been able to pursue a possible sponsorship further, it's a pity you can't be there John. Well, next time we make it work.

> I spent the weekend too loaded up on flu medication to actually do
> any coding, but I threw together a quick start of a description of
> the Foundation classes. If any of you find it useful for getting
> the presentation started, please use anything you like. If you
> find it incoherent, I blame the medicine.
> Enjoy the trip, and send me a full report of WWDC. Hope they give
> you a neat gift at the keynote -

Great work on the descriptions!!! That will certainly be helpful!! We'll keep you all informed on the things happening at WWDC!

> Koen: Shall I announce it also on Apple's scitech mailing list?

Yes, please do!

> Could you or Alex confirm what will be the right time and place?

It's at 7pm, in the room of the WWDC Science Connection. It should appear on the website, but I haven't seen it yet.

Ok, let me know what you think and what else should be added to the things brought up. I propose to send all emails about where, when and what with regard to the meeting to the list so everyone present will know the plans. Perhaps it's nice to email all of you our cell phone numbers so we can get in contact if necessary (off list of course ;-). I can gather them first and send them to all WWDC participants...

CU soon guys!
Cheers,
Alex

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From kvddrift at earthlink.net Wed Jun 1 22:01:09 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 1 Jun 2005 22:01:09 -0400 Subject: [Biococoa-dev] gone for a while Message-ID: <6452007981d463a21d245146e26e33da@earthlink.net> Hi, I will be offline for a while. I am moving today, and need to get a new internet connection. Good luck with the WWDC preparations, and have fun! I hope to be back in a week or so. cheers, - Koen. From a.griekspoor at nki.nl Tue Jun 7 03:00:20 2005 From: a.griekspoor at nki.nl (a.griekspoor at nki.nl) Date: Tue, 7 Jun 2005 09:00:20 +0200 Subject: [Biococoa-dev] WWDC BioCocoa meeting Message-ID: <667464FDA2C81D4CA79D7F3B728D10E73C1610@adsrv100.nki.nl> Hi guys, After discussing things with Robert Kehrer we decided that due to the shifted Design Awards, we will do our presentation at 6.30 instead of 7. We only have one hour, so we should start on time and keep it short. After the design awards and stump the experts we can have a discussion about the issues raised during the meeting and the things we're all concerned with. Later the week, it might be a great idea to talk with those at WWDC somewhat more and also hang out in the performance labs. So, still wednesday but half an hour earlier. Tomorrow we'll try to compile the presentation. Robert will try to get it on the email that is send to everyone. CU tomorrow! 
Cheers, Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From kvddrift at earthlink.net Tue Jun 7 10:46:39 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 7 Jun 2005 09:46:39 -0500 (GMT-05:00) Subject: [Biococoa-dev] WWDC BioCocoa meeting Message-ID: <16322731.1118155600023.JavaMail.root@wamui-darkeyed.atl.sa.earthlink.net> Again, too bad I cannot be with you guys to help out. I am looking forward to a full report of the meeting, including the presentation :) cheers, - Koen. -----Original Message----- From: a.griekspoor at nki.nl Sent: Jun 7, 2005 2:00 AM To: biococoa-dev at bioinformatics.org Subject: [Biococoa-dev] WWDC BioCocoa meeting Hi guys, After discussing things with Robert Kehrer we decided that due to the shifted Design Awards, we will do our presentation at 6.30 instead of 7. We only have one hour, so we should start on time and keep it short. After the design awards and stump the experts we can have a discussion about the issues raised during the meeting and the things we're all concerned with. Later the week, it might be a great idea to talk with those at WWDC somewhat more and also hang out in the performance labs. So, still wednesday but half an hour earlier. Tomorrow we'll try to compile the presentation. Robert will try to get it on the email that is send to everyone. CU tomorrow! 
Cheers, Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** _______________________________________________ Biococoa-dev mailing list Biococoa-dev at bioinformatics.org https://bioinformatics.org/mailman/listinfo/biococoa-dev From kvddrift at earthlink.net Tue Jun 7 10:59:44 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 7 Jun 2005 09:59:44 -0500 (GMT-05:00) Subject: [Biococoa-dev] random thoughts Message-ID: <32106759.1118156384814.JavaMail.root@wamui-darkeyed.atl.sa.earthlink.net> Hi all, Right now I am at a mass spec scientific meeting in San Antonio (www.asms.org), and just saw an impressive presentation about modified nucleotides in tRNA (characterized by mass spec). One thing I want to start working on when I am back is the introduction of modifications of BCSymbols. Not sure yet how to implement this, but a possibility could be to further subclass BCSymbol to a BCModification class. Each BCSymbol can then have a NSArray of BCModifications (which in turn can have modifications too). The modifications are then read from the data that is stored in BCFeatures. In the airplane on my way to Texas I was sitting next to one of the pioneers of writing protein identification software through peptide mapping. He explained that one of the ways to speed up their algorithm was to use many threads. Not sure yet how the use of multiple threads will work in BioCocoa, but it might well be an approach to speed up algorithms that require a lot of calculations. 
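A minimal sketch of the modification idea described above: a symbol holds a list of modifications, and each modification can in turn hold further modifications. BioCocoa itself is Objective-C; Python is used here only to keep the sketch short. The class names `Symbol` and `Modification`, the `mass_delta` field, and the numbers are all hypothetical and are not part of the framework. The email suggests subclassing BCSymbol; the sketch uses two plain classes instead, which is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modification:
    """Hypothetical stand-in for the proposed BCModification."""
    name: str
    mass_delta: float  # mass change contributed by this modification alone
    modifications: List["Modification"] = field(default_factory=list)

    def total_mass_delta(self) -> float:
        # Add this modification's own delta to those of its nested modifications.
        return self.mass_delta + sum(m.total_mass_delta() for m in self.modifications)


@dataclass
class Symbol:
    """Hypothetical stand-in for BCSymbol."""
    name: str
    mass: float
    modifications: List[Modification] = field(default_factory=list)

    def modified_mass(self) -> float:
        # Base mass plus the (recursive) contribution of every modification.
        return self.mass + sum(m.total_mass_delta() for m in self.modifications)


# Made-up numbers, chosen only to make the arithmetic easy to follow.
inner = Modification("inner", 1.0)
outer = Modification("outer", 2.0, [inner])
symbol = Symbol("X", 100.0, [outer])
print(symbol.modified_mass())
```

The recursion mirrors the "modifications can have modifications too" remark; where the modification data would come from (for example BCFeatures) is left out of the sketch.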
Another thing I recently thought about is that the new class BCSecondaryStructure could be extended to DNA/RNA as well.

Let me know what you think on these issues.

cheers,

- Koen.

From charles.parnot at stanford.edu Tue Jun 7 12:07:18 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 7 Jun 2005 09:07:18 -0700
Subject: [Biococoa-dev] WWDC BioCocoa meeting
In-Reply-To: <667464FDA2C81D4CA79D7F3B728D10E73C1610@adsrv100.nki.nl>
References: <667464FDA2C81D4CA79D7F3B728D10E73C1610@adsrv100.nki.nl>
Message-ID:

At 9:00 AM +0200 6/7/05, wrote:
>Hi guys,
>
>After discussing things with Robert Kehrer we decided that due to the shifted Design Awards, we will do our presentation at 6.30 instead of 7. We only have one hour, so we should start on time and keep it short. After the design awards and stump the experts we can have a discussion about the issues raised during the meeting and the things we're all concerned with. Later the week, it might be a great idea to talk with those at WWDC somewhat more and also hang out in the performance labs. So, still wednesday but half an hour earlier. Tomorrow we'll try to compile the presentation. Robert will try to get it on the email that is send to everyone. CU tomorrow!
>Cheers,
>Alex

I will probably arrive late, so don't wait for me!
see you tomorrow :-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at gmail.com Wed Jun 8 20:37:46 2005 From: charles.parnot at gmail.com (Charles Parnot) Date: Wed, 8 Jun 2005 17:37:46 -0700 Subject: [Biococoa-dev] Need to get in WWDC for the BioCocoa meeting ;-) Message-ID: <76B6EFD0-9F9C-4F95-976A-C89461E39141@gmail.com> Hi, I left a message on Robert Kehrer phone, but I have not gotten a call back yet, so I am sending an email before leaving to make sure I will be able to get in when I arrive at the Moscone Center. Alex, can you contact him, too? thanks! Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From jtimmer at bellatlantic.net Fri Jun 10 12:08:15 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 10 Jun 2005 12:08:15 -0400 Subject: [Biococoa-dev] WWDC In-Reply-To: <32106759.1118156384814.JavaMail.root@wamui-darkeyed.atl.sa.earthlink.net> Message-ID: Just wanted to say congratulations to Peter for his near miss in the student category this year. How'd the presentation go? John _______________________________________________ This mind intentionally left blank From peter.schols at bio.kuleuven.be Sun Jun 12 10:07:21 2005 From: peter.schols at bio.kuleuven.be (Peter Schols) Date: Sun, 12 Jun 2005 16:07:21 +0200 Subject: [Biococoa-dev] WWDC In-Reply-To: References: Message-ID: <55943B05-DC29-4E87-A8A1-5F1356056013@bio.kuleuven.be> Hi John, Thanks! The presentation was quite short and without demos as we did not have a projector. We displayed the presentation on a 30" display in the science connection. The presentation was more intended as a general introduction to start the discussion. 
Some good point were raised during discussion, e.g. about performance (where do we go for easy Obj-C and when do we have to choose fast C?) and about choices we have to make (do we target BC towards sequence analysis or towards genome annotation). These two points were raised by non-BC people. I think the remainder of the points raised were from BC people and were things that we already touched on on the list. I had to leave the discussion a bit early, however, (due to the design awards) so I can't comment on all points. Best, peter On 10 Jun 2005, at 18:08, John Timmer wrote: > Just wanted to say congratulations to Peter for his near miss in > the student > category this year. > > How'd the presentation go? > > John > > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > From kvddrift at earthlink.net Sat Jun 18 20:02:24 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 18 Jun 2005 20:02:24 -0400 Subject: [Biococoa-dev] WWDC In-Reply-To: <55943B05-DC29-4E87-A8A1-5F1356056013@bio.kuleuven.be> References: <55943B05-DC29-4E87-A8A1-5F1356056013@bio.kuleuven.be> Message-ID: On Jun 12, 2005, at 10:07 AM, Peter Schols wrote: > The presentation was quite short and without demos as we did not have > a projector. We displayed the presentation on a 30" display in the > science connection. The presentation was more intended as a general > introduction to start the discussion. Some good point were raised > during discussion, e.g. about performance (where do we go for easy > Obj-C and when do we have to choose fast C?) and about choices we have > to make (do we target BC towards sequence analysis or towards genome > annotation). These two points were raised by non-BC people. 
I think
> the remainder of the points raised were from BC people and were things
> that we already touched on on the list. I had to leave the discussion
> a bit early, however, (due to the design awards) so I can't comment on
> all points.
>

Ahh, finally back online :)

First, Peter, congrats on your award! There seems to be a lot of talent in the Benelux :)

Anyway, back to WWDC and the BioCocoa meeting. Were any 'decisions' made during the WWDC? I'm sure you guys discussed a lot, so there must be some thoughts, new insights, input from others, etc? Was the meeting useful? Please give some more insight to those few poor suckers who couldn't make it :)

In the meantime I have been thinking a little about the current structure of classes. With the new classes that have been added by Phillip, and the upcoming modification classes, I think we can add a new superclass, namely BCObject. We cannot let BCAtom or BCModification subclass from BCSymbol. However, these classes also have a name, character, mass, etc. So it seems logical to create a superclass, from which all the building blocks can derive. I have created a small UML file in OmniGraffle to illustrate this. However, I have no clue where to put BCResidue. Phillip, could you elaborate on what you intended with this class?

Similarly, Phillip's sequence classes need to be in sync with the BCSequence class cluster. I also created a UML file for this. There is BCSequenceStructure, BCChain, etc.

Any ideas anyone? The two UML files are attached in a zip file.

cheers,

- Koen.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BC UML.zip Type: application/zip Size: 10544 bytes Desc: not available URL: -------------- next part -------------- From kvddrift at earthlink.net Fri Jun 24 19:09:27 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 24 Jun 2005 19:09:27 -0400 Subject: [Biococoa-dev] BCObject In-Reply-To: References: <55943B05-DC29-4E87-A8A1-5F1356056013@bio.kuleuven.be> Message-ID: <4e3279f73783abb64b581f8d5d5defd7@earthlink.net> > In the meantime I have been thinking a little about the current > structure of classes. With the new classes that have been added by > Phillip, and the upcoming modification classes, I think we can add a > new superclass, namely BCObject. We cannot let BCAtom, or > BCModification subclass from BCSymbol. However, these classes also > have a name, character, mass, etc. So it seems logical to create a > superclass, from which all the building blocks can derive. I have > created a small UML file in OmniGraffle to illustarte this. However, I > have no clue where to put BCResidue. Phillip, could you elaborate on > what you intended with this class? > > Similarly, Phillip's sequence classes need to be in sync with the > BCSequence class cluster. I also created a UML file for this. There is > BCSequenceStructure, BCChain, etc. > Anyone object if I add the BCObject class that acts as a root for BCSymbol, BCAtom, BCAbstractSequence, BCSequenceStructure, ...? - Koen. From jtimmer at bellatlantic.net Fri Jun 24 19:26:06 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 24 Jun 2005 19:26:06 -0400 Subject: [Biococoa-dev] BCObject In-Reply-To: <4e3279f73783abb64b581f8d5d5defd7@earthlink.net> Message-ID: >> In the meantime I have been thinking a little about the current >> structure of classes. With the new classes that have been added by >> Phillip, and the upcoming modification classes, I think we can add a >> new superclass, namely BCObject. We cannot let BCAtom, or >> BCModification subclass from BCSymbol. 
However, these classes also >> have a name, character, mass, etc. So it seems logical to create a >> superclass, from which all the building blocks can derive. I have >> created a small UML file in OmniGraffle to illustarte this. However, I >> have no clue where to put BCResidue. Phillip, could you elaborate on >> what you intended with this class? >> >> Similarly, Phillip's sequence classes need to be in sync with the >> BCSequence class cluster. I also created a UML file for this. There is >> BCSequenceStructure, BCChain, etc. >> > > > Anyone object if I add the BCObject class that acts as a root for > BCSymbol, BCAtom, BCAbstractSequence, BCSequenceStructure, ...? > The name implies that it's going to be the root of everything we do, which may not be the case. But I certainly couldn't think of a better one at the moment.... _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Jun 25 08:20:30 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 25 Jun 2005 08:20:30 -0400 Subject: [Biococoa-dev] BCObject In-Reply-To: References: Message-ID: <28369de06b2714ced518d9b5a0fedb28@earthlink.net> On Jun 24, 2005, at 7:26 PM, John Timmer wrote: > The name implies that it's going to be the root of everything we do, > which > may not be the case. But I certainly couldn't think of a better one > at the > moment.... > Good point, I didn't think of that. We could call it BCStructuralObject, so it implies it will only be the superclass of building blocks and sequences. And not of tools, etc. - Koen. From kvddrift at earthlink.net Tue Jun 28 20:15:30 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 28 Jun 2005 20:15:30 -0400 Subject: [Biococoa-dev] SequenceIO Message-ID: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> Hi, Where is everyone? Enjoying a vacation, or hard at work, or passed out from the heatwave :) I started thinking again about the IO classes. 
Right now, BCSequenceReader returns a dictionary containing one or more sequences as the values, and either a description or a title as the keys. This allows files containing multiple sequences to be read into the dictionary. Accessing the sequences is not so straightforward. Basically, the user first needs to get the key for the sequence value from an array of keys, and then use that key to obtain the sequence from the dictionary. This seems rather cumbersome, I think.

Therefore I propose that BCSequenceReader simply returns an array of objects. We can either store BCSequence objects in the array or create some kind of wrapper for each sequence, eg a new SequenceIO class. Annotations and features are now handled in the BCSequence class, so they can be added in the IO code.

So for a simple fasta class we would have an array of sequences with one annotation, with the key @">" and the value whatever string follows the ">" on the first line. For a more complicated sequence format, eg SwissProt, basically all annotations are read in line by line, using the file-specific keys (@"ID", @"AC", @"DT" etc). Then when it hits the sequence, we can create a BCSequence object, and at the end store the annotations in the BCSequence. I suggest the keys should be whatever the file format uses, but for some common annotations, like author, organism, we could supply some more human readable accessor methods.

cheers,

- Koen.

From charles.parnot at gmail.com Wed Jun 29 00:33:49 2005
From: charles.parnot at gmail.com (Charles Parnot)
Date: Tue, 28 Jun 2005 21:33:49 -0700
Subject: [Biococoa-dev] SequenceIO
In-Reply-To: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net>
References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net>
Message-ID: <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com>

On Jun 28, 2005, at 5:15 PM, Koen van der Drift wrote:

> Hi,
>
> Where is everyone? Enjoying a vacation, or hard at work, or passed
> out from the heatwave :)

working on the Xgrid stuff for Tiger...
Version 0.1 will be done this week, so at least I will be able to use all these computers again for my calculations!!

> I started thinking again about the IO classes. Right now,
> BCSequenceReader returns a dictionary containing one or more
> sequences as the values, and either a description or a title as the
> keys. This allows files containing multiple sequences to
> be read into the dictionary. Accessing the sequences is not so
> straightforward. Basically, the user first needs to get the key for
> the sequence value from an array of keys, and then use that key to
> obtain the sequence from the dictionary. This seems rather
> cumbersome, I think.
>
> Therefore I propose that BCSequenceReader simply returns an array
> of objects. We can either store BCSequence objects in the array or
> create some kind of wrapper for each sequence, eg a new SequenceIO
> class. Annotations and features are now handled in the BCSequence
> class, so they can be added in the IO code.

It seems natural that the BCSequenceReader should now return BCSequence objects. Yes, totally for it. The annotations will come with it. One of the problems at this point is that we have not fully decided on a strong, clear BCAnnotation object. Alex has started something, but I don't think he was done yet?

> So for a simple fasta class we would have an array of sequences
> with one annotation, with the key @">" and the value whatever
> string follows the ">" on the first line. For a more complicated
> sequence format, eg SwissProt, basically all annotations are read in line by
> line, using the file-specific keys (@"ID", @"AC", @"DT" etc). Then
> when it hits the sequence, we can create a BCSequence object, and
> at the end store the annotations in the BCSequence. I suggest the
> keys should be whatever the file format uses, but for some common
> annotations, like author, organism, we could supply some more human
> readable accessor methods.
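A minimal sketch of the array-returning reader proposed in the quoted text, for the FASTA case only. BioCocoa is an Objective-C framework; Python is used here purely as a compact illustration. `SequenceRecord` and `read_fasta` are hypothetical names, not BioCocoa API, and the sketch assumes that each record carries its annotations as a plain dictionary keyed by the format's own key (">" for FASTA, as suggested in the email).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceRecord:
    """Hypothetical stand-in for a BCSequence that carries its annotations."""
    residues: str
    annotations: Dict[str, str] = field(default_factory=dict)


def read_fasta(text: str) -> List[SequenceRecord]:
    """Parse FASTA text into a list of records, kept in file order."""
    records: List[SequenceRecord] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: start a new record and store the rest of the
            # line under the format-specific key ">".
            records.append(SequenceRecord("", {">": line[1:].strip()}))
        elif records:
            # Sequence line: append to the most recent record.
            # Lines that appear before any header are ignored.
            records[-1].residues += line
    return records


sample = ">seq1 first example\nACGT\nTTAA\n>seq2\nMKV\n"
records = read_fasta(sample)
for record in records:
    print(record.annotations[">"], record.residues)
```

With a list, callers iterate or index directly instead of first fetching an array of keys and then looking each sequence up in a dictionary, which is the inconvenience the proposal is meant to remove.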
Getting equivalent for the keys can easily be added later, and will be needed anyway. And, yes, keeping the annotations specific for the format is fine for now. I guess, we should add a 'originalFileFormat' key or something to help with key identifications. > cheers, > > - Koen. I will be back soon with more stuff... At least, one of us is working on BioCocoa! charles -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From charles.parnot at gmail.com Wed Jun 29 01:10:33 2005 From: charles.parnot at gmail.com (Charles Parnot) Date: Tue, 28 Jun 2005 22:10:33 -0700 Subject: [Biococoa-dev] getting my own posts Message-ID: I recently switched to Mail.app, and I don't get my own posts anymore. Is there some way to get that back? thanks! charles -- Charles Parnot charles at confometrx.com From kvddrift at earthlink.net Wed Jun 29 06:21:47 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Jun 2005 06:21:47 -0400 Subject: [Biococoa-dev] getting my own posts In-Reply-To: References: Message-ID: On Jun 29, 2005, at 1:10 AM, Charles Parnot wrote: > I recently switched to Mail.app, and I don't get my own posts anymore. > Is there some way to get that back? > > You can set that on https://bioinformatics.org/mailman/options/biococoa-dev (need to login to go to your own settings). - Koen. From kvddrift at earthlink.net Wed Jun 29 06:32:03 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Jun 2005 06:32:03 -0400 Subject: [Biococoa-dev] SequenceIO In-Reply-To: <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> Message-ID: <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> On Jun 29, 2005, at 12:33 AM, Charles Parnot wrote: > One of the problem at this point is we have not fully decided on a > strong clear BCAnnotation object. 
Alex has started something, but I > don't think he was done yet? > Right now the BCAnnotation object mimics a dictionary. And one could argue whether we should not just use a wrapper for a dictionary. But that's not the most important issue. Basically, they are stored in BCSequence as another dictionary, using the key of the annotation as the key and a BCAnnotation object as value. Maybe it is easier this way when looking up annotations, but it seems overcomplicated to me. If we keep a annotations wrapper object, why not store them in an NSArray? - Koen. From charles.parnot at gmail.com Wed Jun 29 12:32:33 2005 From: charles.parnot at gmail.com (Charles Parnot) Date: Wed, 29 Jun 2005 09:32:33 -0700 Subject: [Biococoa-dev] getting my own posts In-Reply-To: References: Message-ID: <15DBBD00-4BF3-4C3B-82CA-9EFD12BE80E2@gmail.com> On Jun 29, 2005, at 3:21 AM, Koen van der Drift wrote: > > On Jun 29, 2005, at 1:10 AM, Charles Parnot wrote: > > >> I recently switched to Mail.app, and I don't get my own posts >> anymore. Is there some way to get that back? >> >> >> > > You can set that on https://bioinformatics.org/mailman/options/ > biococoa-dev (need to login to go to your own settings). > > - Koen. Thanks, Koen, but it is in fact already set up this way... It used to work with my stanford.edu address. In fact, I found the answer; this is gmail behavior and it is very annoying: http://gmail.google.com/support/bin/answer.py?answer=10314&topic=133 "Mail you send or forward to a mailing list you subscribe to, or to an account that forwards messages to your Gmail account, will only appear in 'Sent Mail.' This is intended to help prevent clutter in your inbox. If a message isn't successfully delivered, you'll receive an error message in your inbox." sorry for the pollution! 
charles -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From charles.parnot at gmail.com Wed Jun 29 13:02:14 2005 From: charles.parnot at gmail.com (Madeleine Parnot) Date: Wed, 29 Jun 2005 10:02:14 -0700 Subject: [Biococoa-dev] test - sorry for polluting again the list Message-ID: <0B367B9A-7C7C-4ABA-BF04-6EF506081CFE@gmail.com> I am trying again to send a message to myself From charles.parnot at gmail.com Wed Jun 29 13:11:15 2005 From: charles.parnot at gmail.com (Charles Parnot) Date: Wed, 29 Jun 2005 10:11:15 -0700 Subject: [Biococoa-dev] SequenceIO In-Reply-To: <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> Message-ID: On Jun 29, 2005, at 3:32 AM, Koen van der Drift wrote: > > On Jun 29, 2005, at 12:33 AM, Charles Parnot wrote: > > >> One of the problems at this point is we have not fully decided on a >> strong clear BCAnnotation object. Alex has started something, but >> I don't think he was done yet? >> >> > > Right now the BCAnnotation object mimics a dictionary. And one > could argue whether we should not just use a wrapper for a > dictionary. But that's not the most important issue. I think that encapsulating the BCAnnotation is a good thing. At this point, it is a dictionary, but it could change in the future and keep the same interface/header, so that the framework users don't have to change their code. > Basically, they are stored in BCSequence as another dictionary, > using the key of the annotation as the key and a BCAnnotation > object as value. Maybe it is easier this way when looking up > annotations, but it seems overcomplicated to me. If we keep an > annotations wrapper object, why not store them in an NSArray? It is not really complicated. It is redundant, yes.
But maybe redundancy in this case is also convenient, as we can quickly access the list of annotation names without looping through the NSArray. Though there would be convenience methods to do that even with NSArray, e.g. KVC and 'valueForKeyPath'. In any case, one of the things we agreed on at the WWDC (and you would not know, sorry :-), is that there probably won't be a performance issue with annotations, so the way we do it does not really matter so much. NSArray, NSDictionary: tomato, tomato. So the bottom line is: I will do whatever the majority decides on this one. charles From kvddrift at earthlink.net Wed Jun 29 18:15:47 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Jun 2005 18:15:47 -0400 Subject: [Biococoa-dev] SequenceIO In-Reply-To: References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> Message-ID: <3a136128101c9189b6a11dbae4332fcc@earthlink.net> On Jun 29, 2005, at 1:11 PM, Charles Parnot wrote: >> Right now the BCAnnotation object mimics a dictionary. And one could >> argue whether we should not just use a wrapper for a dictionary. But >> that's not the most important issue. > > I think that encapsulating the BCAnnotation is a good thing. At this > point, it is a dictionary, but it could change in the future and keep > the same interface/header, so that the framework users don't have to > change their code. Good point, I will leave it as it is. > > >> Basically, they are stored in BCSequence as another dictionary, using >> the key of the annotation as the key and a BCAnnotation object as >> value. Maybe it is easier this way when looking up annotations, but >> it seems overcomplicated to me. If we keep an annotations wrapper >> object, why not store them in an NSArray? > > It is not really complicated. It is redundant, yes.
But maybe > redundancy in this case is also convenient, as we can quickly access > the list of annotation names without looping through the NSArray. > Though there would be convenience methods to do that even with NSArray, > e.g. KVC and 'valueForKeyPath'. Complicated was indeed the wrong word choice. Again, I will leave this as it is. Changing the code was not that difficult and I will commit the files soon, so everyone can see what is going on. That being said, I am running into the following problem. Some file formats have many lines with annotations, e.g. the test2.txt file in the Translation example. As you can see, some lines have the same identifier (DT, OC, etc). If I use that as the key, the final dictionary will only contain the last line, because it will overwrite existing keys. I can think of a few solutions. The first, which is what I do now, is to append the values to the existing one, leaving only one entry for each identifier. This works fine, but could give problems if we want to write the files out, because we don't know where the different lines begin and end. We could of course put some kind of marker in between the strings, so we know where each next one begins. Another solution could be to assign numbers to identifiers with multiple lines, ID1, ID2, ID3, etc. The problem here is that this will give problems when searching for a specific key. My preference for now would be the first solution, but if anyone has a better suggestion, please shout. Another issue is nested annotations. Again see the test2.txt file and look for RN (for reference). It is followed by a set of identifiers for the references, and then is followed by another reference. I guess I could put the subannotations in a new dictionary, and put those in the content of the RN annotation.
A similar issue can be found in NCBI files (see test4.txt) > > In any case, one of the things we agreed on at the WWDC (and you would > not know, sorry :-), Still waiting for minutes and the presentation ;-) > is that there probably won't be a performance issue with annotations, > so the way we do it does not really matter so much. NSArray, > NSDictionary: tomato, tomato. So the bottom line is: I will do > whatever the majority decides on this one. It should not be so difficult to change, so for now let's stick with what we have. cheers, - Koen. From charles.parnot at gmail.com Wed Jun 29 18:35:18 2005 From: charles.parnot at gmail.com (Charles Parnot) Date: Wed, 29 Jun 2005 15:35:18 -0700 Subject: [Biococoa-dev] SequenceIO In-Reply-To: <3a136128101c9189b6a11dbae4332fcc@earthlink.net> References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> <3a136128101c9189b6a11dbae4332fcc@earthlink.net> Message-ID: <820BD5CD-3349-460F-8A1D-000119F48628@gmail.com> > Changing the code was not that difficult and I will commit the > files soon, so everyone can see what is going on. That being said, > I am running into the following problem. Some file formats have many > lines with annotations, e.g. the test2.txt file in the Translation > example. As you can see, some lines have the same identifier (DT, > OC, etc). If I use that as the key, the final dictionary will only > contain the last line, because it will overwrite existing keys. I > can think of a few solutions. The first, which is what I do now, is to append > the values to the existing one, leaving only one entry for each > identifier. This works fine, but could give problems if we want to > write the files out, because we don't know where the different > lines begin and end. We could of course put some kind of marker > in between the strings, so we know where each next one begins.
> Another solution could be to assign numbers to identifiers with > multiple lines, ID1, ID2, ID3, etc. The problem here is that this will > give problems when searching for a specific key. My preference > for now would be the first solution, but if anyone has a better > suggestion, please shout. Yes, concatenating all the lines, separated by a new-line, seems very reasonable, and easy to revert. You can use 'componentsJoinedByString' and 'componentsSeparatedByString', using @"\n" as the separator (...or @"\r"???). > Another issue is nested annotations. Again see the test2.txt file > and look for RN (for reference). It is followed by a set of > identifiers for the references, and then is followed by another > reference. I guess I could put the subannotations in a new > dictionary, and put those in the content of the RN annotation. A > similar issue can be found in NCBI files (see test4.txt) Nested annotations are a big issue, particularly regarding sequence position. We have to come up with something good... thanks, Koen, for all the work! charles -- Xgrid-at-Stanford Help science move fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford Charles Parnot charles.parnot at gmail.com From kvddrift at earthlink.net Wed Jun 29 20:02:40 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 29 Jun 2005 20:02:40 -0400 Subject: [Biococoa-dev] SequenceIO In-Reply-To: <820BD5CD-3349-460F-8A1D-000119F48628@gmail.com> References: <0f0c72a4058c0e08d806a15929ad91be@earthlink.net> <4A25C5E4-CAB8-4A45-A5C2-EE88B9BD5E04@gmail.com> <1ca9c6121c8b52fb395682f176b1ddd5@earthlink.net> <3a136128101c9189b6a11dbae4332fcc@earthlink.net> <820BD5CD-3349-460F-8A1D-000119F48628@gmail.com> Message-ID: On Jun 29, 2005, at 6:35 PM, Charles Parnot wrote: > Yes, concatenating all the lines, separated by a new-line, seems very > reasonable, and easy to revert. You can use 'componentsJoinedByString' > and 'componentsSeparatedByString', using @"\n" as the separator (...or > @"\r"???).
Of course, we are assuming that the content is an NSString, which might not always be the case. In general it will be, but it might as well be another BCAnnotation object, or a number. For now, I will code it as being a string, but this is something that we need to be aware of. Unless I miss the point completely :) - Koen.
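A minimal sketch of the join-and-split idea from this thread, under two assumptions: every annotation value is an NSString (the caveat raised above), and no single annotation line contains a newline of its own. The helper names JoinAnnotationLines, SplitAnnotationLines and AppendAnnotationLine are hypothetical and not part of BioCocoa, the sample line contents are made up, and the syntax is modern Objective-C rather than what was available in 2005.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collapse several lines into one newline-separated string.
static NSString *JoinAnnotationLines(NSArray<NSString *> *lines) {
    return [lines componentsJoinedByString:@"\n"];
}

// Hypothetical helper: recover the individual lines, e.g. before writing a file out.
static NSArray<NSString *> *SplitAnnotationLines(NSString *joined) {
    return [joined componentsSeparatedByString:@"\n"];
}

// Hypothetical helper: append a line under its identifier instead of
// overwriting whatever was stored under the same key before.
static void AppendAnnotationLine(NSMutableDictionary<NSString *, NSString *> *annotations,
                                 NSString *identifier,
                                 NSString *line) {
    NSString *existing = annotations[identifier];
    annotations[identifier] = existing ? JoinAnnotationLines(@[existing, line]) : line;
}

int main(void) {
    @autoreleasepool {
        NSMutableDictionary<NSString *, NSString *> *annotations = [NSMutableDictionary dictionary];

        // Two lines share the identifier DT; both survive in one dictionary entry.
        AppendAnnotationLine(annotations, @"DT", @"first DT line (made-up content)");
        AppendAnnotationLine(annotations, @"DT", @"second DT line (made-up content)");
        AppendAnnotationLine(annotations, @"OC", @"single OC line (made-up content)");

        // Splitting on the same separator restores the original line boundaries.
        NSArray<NSString *> *dtLines = SplitAnnotationLines(annotations[@"DT"]);
        NSLog(@"DT was read from %lu lines", (unsigned long)dtLines.count);
    }
    return 0;
}
```

If values can also be BCAnnotation objects or numbers, the append step would need an isKindOfClass: check before joining, or the entry could hold an NSArray of values instead of a joined string.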