From jtimmer at bellatlantic.net Thu Aug 5 10:45:46 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Thu, 05 Aug 2004 10:45:46 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To: <1B789D0A-E2CD-11D8-A3E4-00039345483C@bio.kuleuven.ac.be>
Message-ID:

I'm going to be writing two methods: one gives a list of all ORFs over a certain size, given the size and a DNA sequence; the second will list all sites in a sequence, given a site and the sequence.

The question is: how to return the list?

I could see three options:
The sites could be handled as an NSIndexSet, but that won't work for the ORFs and is 10.3 only.
Another option would be to store ranges as NSValues and return an array of them. This would be very convenient internally, but wouldn't allow convenient saving of the information, since NSValues would need to be encoded before saving.
The final option would be to add a category to NSDictionary that would add the methods "storeRangeInDictionary" and "retrieveRangeFromDictionary", which would just make length and location keys in a dictionary.

Does anyone have a preference about how to handle it?

Cheers,

John

_______________________________________________
This mind intentionally left blank

From mek at mekentosj.com Thu Aug 5 14:10:29 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 5 Aug 2004 20:10:29 +0200
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID:

Hi John and others,

My thoughts on this issue, starting with replies to the points you brought up.

> I could see three options:
> The sites could be handled as an NSIndexSet, but that won't work for the
> ORFs and is 10.3 only.

As we discussed, 10.2 will be the target OS, so I guess that rules this option out.

> Another option would be to store Ranges as NSValues and return an array of
> them. This would be very convenient internally, but wouldn't allow
> convenient saving of the information, since NSValues would need to be
> encoded before saving.
True, but see my answer on that further down.

> The final thing would be to add a category to NSDictionary that would add
> the methods "storeRangeInDictionary" and "retrieveRangeFromDictionary" that
> would just make length and location keys in a dictionary.

I think it is handy to add a number of these categories anyway, but then name them according to the general scheme, which would be -rangeForKey: and -setRange:forKey:. Similarly, I often have to add -rectForKey: and -colorForKey: myself; it would be nice to add all of these to the framework.

> Does anyone have a preference about how to handle it?

My option is not there ;-)
Option 4 is what I tend to end up with every time I tackle this kind of problem, and it is a solution that is both very Cocoa-like and very BioPerl/BioJava-like: model everything as classes. My opinion is that we should write a class for each "module" there is in the real world. In this case, a "restriction enzyme" class, a "digestion" class, a "cut" class, a "dna fragment" class (which can just be the general "dna sequence" class), etc. I have used this approach in the upcoming version of EnzymeX and have attached the header files of two classes to give you an idea.

The nice thing is that it's now easy to extend and model things. For instance, when drawing all cuts in a plasmid I ran into the problem that I had too many enzymes drawn on top of each other in multiple cloning sites. My multiple cloning site class was the solution, as I could separate the "cuts" into those inside and those outside multiple cloning sites and draw both categories differently. The attached digest class (which represents a cut fragment) is also very elegant, as it allows sorting on length, cut position, etc. The nice thing is that you can add these objects to arrays and dictionaries as well.

To comment further on the attached classes: of course NSCoding support should be added, which is simple and adds direct reading and writing of arrays and dictionaries that contain these objects.
Also, I should stick more closely to key-value coding (it's 90% there now), as it is the basis for bindings. Don't look too closely at the class nomenclature here; we should clearly pick better names.

Finally, see how easy it is to add things like sorting, for instance:

    [myArrayWithFragments sortedArrayUsingSelector: @selector(sortResultsOnLengthDescending:)];

returns an array with all fragments sorted on size.

Also, I have added the - (NSString *) description; method, which makes debugging so much easier. The implementation:

    - (NSString *) description {
        if ([self nrOfCuts] > 0)
            return [NSString stringWithFormat: @"EXMapMCS: %d %@ ---- %d %@, %d cuts",
                [[self firstCut] position], [[self firstCut] enzyme],
                [[self lastCut] position], [[self lastCut] enzyme],
                [self nrOfCuts]];
        else
            return @"EXMapMCS: empty";
    }

Now you can just call on an array:

    NSLog(@"%@", [myArrayWithFragments objectAtIndex: 0]);

and it shows you all the details of the first object in the array.

One could think of implementing many methods in these objects, like -stringRepresentation for a DNA sequence class, or -complement.

Finally, the really nice thing, which saved a lot of time for EnzymeX, was adding the - (BOOL)hitTest: (int)pos; method. I wanted users to be able to select a fragment or cut by clicking. Now it's simply a matter of asking each fragment in the array, in a loop, whether it is "hit", and there you go. Of course, we should give the method a different, more general name, like -containsPosition: or -startsAt:. These are the so-called convenience methods, which make life, well, convenient ;-)

All in all, I think this option will give us a versatile and super flexible/extensible framework. Everyone can add small but handy methods without worrying about breaking stuff. Moreover, it allows easy passing of objects throughout the framework. For instance, say we wanted to add alignments: as input we would use our own objects. The seqio controller knows how to write and convert these, etc.
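The sorting and hit-testing ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: BCFragment, -containsPosition:, -compareLengthDescending: and BCFragmentAtPosition() are hypothetical names chosen for the example, not the EnzymeX classes, and positions are taken to be inclusive.

```objc
#import <Foundation/Foundation.h>

// Hypothetical fragment class; the name and fields are illustrative only.
@interface BCFragment : NSObject {
    int start;   // first position covered by the fragment
    int end;     // last position covered by the fragment (inclusive)
}
- (id)initWithStart:(int)aStart end:(int)anEnd;
- (int)length;
- (BOOL)containsPosition:(int)pos;
- (NSComparisonResult)compareLengthDescending:(BCFragment *)other;
@end

@implementation BCFragment
- (id)initWithStart:(int)aStart end:(int)anEnd {
    self = [super init];
    if (self) {
        start = aStart;
        end = anEnd;
    }
    return self;
}

- (int)length { return end - start + 1; }

// Generalised form of a hit test: YES if pos lies inside the fragment.
- (BOOL)containsPosition:(int)pos {
    return pos >= start && pos <= end;
}

// Comparison method for -sortedArrayUsingSelector:, longest fragment first.
- (NSComparisonResult)compareLengthDescending:(BCFragment *)other {
    if ([self length] > [other length]) return NSOrderedAscending;
    if ([self length] < [other length]) return NSOrderedDescending;
    return NSOrderedSame;
}
@end

// Ask each fragment in turn whether it is "hit"; returns nil if none is.
static BCFragment *BCFragmentAtPosition(NSArray *fragments, int pos) {
    NSUInteger i;
    for (i = 0; i < [fragments count]; i++) {
        BCFragment *fragment = [fragments objectAtIndex:i];
        if ([fragment containsPosition:pos]) return fragment;
    }
    return nil;
}
```

Because the comparison lives on the model object, any array of fragments can be sorted with -sortedArrayUsingSelector: without the caller knowing how length is computed.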
This also leaves plenty of room for attaching bindings, exporting to index sets, etc. This is what in my opinion the BCFoundation should look like, and I think it is something the other bio... frameworks have chosen as well: many classes representing all the kinds of things we use. This is also the way Cocoa's Foundation works (NSString, NSRect, NSDictionary, NSArray, NSRange, etc.).

The only question left is: do we need a controller object for certain tasks? For instance, an "Alignment controller" or a "Digestion controller"? I think that wouldn't be such a bad idea. Normally this goes into your application's code, the way EnzymeX does all the drawing and supervises the hit test, digestion, etc. As we are writing a framework, we can't rely on other people writing this; it would simply be too much work for them to figure out how we, the BioCocoa developers, would like them to do it. Therefore we have to do that ourselves as well, so that to use the framework one would only need to instantiate a controller which does all the logic behind the scenes. The user would just have to tell the "Digestion controller" the sequence, enzymes, etc., and would get back a fragment array. That is what my ideal world would look like.

That leaves the question of how to implement all this and where to start. I think we should try to get a basic foundation first: the simple objects. We should also decide what to call them; perhaps we could fill in the scheme John made in further detail. For compatibility reasons the names could be largely based on what BioJava/BioPerl have done, of course only where that makes sense, and we should keep the Cocoa names as templates. I am willing to prototype a few of these classes based on the attached headers below, if you guys think that's a proper basis of course. Once that's done, we can start adding methods like the ones John was describing, which should be rewritten to use our BCFoundation objects.
The nice thing is that others can safely add methods to the objects without breaking the code John adds. Also very nice: John can add convenience methods to those classes himself as he finds out that they help simplify his ORF-finding method.

Well, enough writing, I'm sure this is enough for everyone to comment on ;-) I'm curious what you guys think of this; let me know if you think differently or have something to add. After that we can talk about the practical implementation...

Cheers,
Alex

//
//  EXMapDigest.h
//  EnzymeX
//
//  Created by Alexander Griekspoor on Fri Nov 07 2003.
//  Copyright (c) 2003 __MyCompanyName__. All rights reserved.
//

#import <Foundation/Foundation.h>  // header name lost in the archived copy; Foundation assumed

@interface EXMapDigest : NSObject {
    // ===========================================================================
    #pragma mark --- VARIABLES & PROPERTIES
    // ===========================================================================
    int start;
    int end;
    int startcut5;
    int endcut5;
    int constructlength;
    NSString *startEnzyme;
    NSString *endEnzyme;
    int startCutPosition;
    int endCutPosition;
    BOOL isSelected;
}

// ===========================================================================
#pragma mark --- INIT & DEALLOC
// ===========================================================================
- (id)init;
- (void)dealloc;

// ===========================================================================
#pragma mark --- ACCESSOR METHODS
// ===========================================================================
- (int)start;
- (void)setStart:(int)newStart;
- (int)end;
- (void)setEnd:(int)newEnd;
- (int)startcut5;
- (void)setStartcut5:(int)newStartcut5;
- (int)endcut5;
- (void)setEndcut5:(int)newEndcut5;
- (int)constructlength;
- (void)setConstructlength:(int)newConstructlength;
- (NSString *)startEnzyme;
- (void)setStartEnzyme:(NSString *)newStartEnzyme;
- (NSString *)endEnzyme;
- (void)setEndEnzyme:(NSString *)newEndEnzyme;
- (BOOL)isSelected;
- (void)setIsSelected:(BOOL)newIsSelected;

// ===========================================================================
#pragma mark --- GENERAL METHODS
// ===========================================================================
- (NSString *)description;
- (int)startCutPosition;
- (int)endCutPosition;
- (int)length;
- (float)percentage;
- (BOOL)hitTest:(int)pos;

// ===========================================================================
#pragma mark --- UTILITY & CONVERTER METHODS
// ===========================================================================
- (NSComparisonResult)sortResultsOnPositionDescending:(EXMapDigest *)dig;
- (NSComparisonResult)sortResultsOnPositionAscending:(EXMapDigest *)dig;
- (NSComparisonResult)sortResultsOnLengthDescending:(EXMapDigest *)dig;
- (NSComparisonResult)sortResultsOnLengthAscending:(EXMapDigest *)dig;

@end

//
//  EXMapMCS.h
//  EnzymeX
//
//  Created by Alexander Griekspoor on Fri Nov 07 2003.
//  Copyright (c) 2003 __MyCompanyName__. All rights reserved.
//

#import <Foundation/Foundation.h>  // header name lost in the archived copy; Foundation assumed

@class EXMapCut;

@interface EXMapMCS : NSObject {
    // ===========================================================================
    #pragma mark --- VARIABLES & PROPERTIES
    // ===========================================================================
    NSMutableArray *cuts;
    NSRect rect;
}

// ===========================================================================
#pragma mark --- INIT & DEALLOC
// ===========================================================================
- (id)init;
- (void)dealloc;

// ===========================================================================
#pragma mark --- ACCESSOR METHODS
// ===========================================================================
- (NSMutableArray *)cuts;
- (void)setCuts:(NSMutableArray *)newCuts;
- (NSRect)rect;
- (void)setRect:(NSRect)newRect;
- (int)nrOfCuts;
- (EXMapCut *)firstCut;
- (EXMapCut *)lastCut;
- (void)addCut:(EXMapCut *)cut;
- (void)removeAllCuts;

// ===========================================================================
#pragma mark --- GENERAL METHODS
// ===========================================================================
- (NSString *)description;

// ===========================================================================
#pragma mark --- UTILITY & CONVERTER METHODS
// ===========================================================================
- (NSComparisonResult)sortMCSOnPositionDescending:(EXMapMCS *)mcs;
- (NSComparisonResult)sortMCSOnPositionAscending:(EXMapMCS *)mcs;
- (NSComparisonResult)sortMCSOnCountDescending:(EXMapMCS *)mcs;
- (NSComparisonResult)sortMCSOnCountAscending:(EXMapMCS *)mcs;

@end

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 18225 bytes
Desc: not available
URL:

From mek at mekentosj.com Thu Aug 5 14:21:31 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 5 Aug 2004 20:21:31 +0200
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID: <464EC21A-E70C-11D8-8873-000393CFDE0C@mekentosj.com>

Just to illustrate the principle: look at http://www.biojava.org/docs/api/index.html and look up their RestrictionEnzyme class in the "All Classes" list on the left. It's exactly the approach I propose as well. Also click on the "Use" button at the top of the page once you have the RestrictionEnzyme class open. You'll see that they have "controller objects" as well. We could make a SharedRestrictionEnzymeManager too, one that could return all listed enzymes, for instance, or which you could ask for details about a specific enzyme.

Another example: go back to the main page that the link above points to. Check out the core biological package "org.biojava.bio.seq" and you'll see that they have classes called DNATools and RNATools.
This is where a lot of general methods could go. Finally, it would also be an idea to add a number of protocols that group certain related classes (as NSCoding and NSCopying do for the Foundation framework).

Again, plenty to think about and a lot of design work, but the guys at BioJava have done many things along the lines we were thinking of as well, and it seems to work very nicely...

Alex

From jtimmer at bellatlantic.net Thu Aug 5 18:54:18 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Thu, 05 Aug 2004 18:54:18 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To: <464EC21A-E70C-11D8-8873-000393CFDE0C@mekentosj.com>
Message-ID:

I'm not going to quote everything Alex said, but the general message I got is:

Since I'm writing something that's very basic (find a site in DNA), I shouldn't worry too much about savability.

Any object that understands DNA can use this method to get the information it needs. It will then be responsible for saving any such information it receives.

Given that, I'm going to make the methods for site finding and ORF finding return arrays of NSNumbers and of NSValues holding NSRanges, respectively. How that information gets saved is not the concern of these methods.

That sound okay?
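A minimal sketch of what the site-finding half of that plan could look like, assuming a plain NSString sequence and 0-based positions. BCSitePositions() is a hypothetical name used only for illustration, and the case-insensitive, overlap-reporting behaviour is a choice made for the example rather than anything decided in this thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: 0-based start positions of every occurrence of `site`
// in `sequence`, boxed as NSNumbers. Overlapping matches are included.
static NSArray *BCSitePositions(NSString *sequence, NSString *site) {
    NSMutableArray *positions = [NSMutableArray array];
    NSUInteger length = [sequence length];
    if ([site length] == 0) return positions;

    NSRange searchRange = NSMakeRange(0, length);
    while (searchRange.length > 0) {
        NSRange hit = [sequence rangeOfString:site
                                      options:NSCaseInsensitiveSearch
                                        range:searchRange];
        if (hit.location == NSNotFound) break;
        [positions addObject:[NSNumber numberWithUnsignedInteger:hit.location]];
        // Advance by a single base so overlapping sites are also reported.
        searchRange.location = hit.location + 1;
        searchRange.length = length - searchRange.location;
    }
    return positions;
}
```

The caller gets back an ordinary NSArray and decides for itself whether and how to store the positions, which matches the division of responsibility described above.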
John

From mek at mekentosj.com Thu Aug 5 19:00:38 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 6 Aug 2004 01:00:38 +0200
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID: <44956E61-E733-11D8-8873-000393CFDE0C@mekentosj.com>

Indeed John,
Those methods should do perfectly that way. Savability should be a later concern as we further set up the foundation; we can always make it work with those objects by then...
Alex

By the way, I'm pretty sure that you can already put NSRanges in arrays and dictionaries natively (without converting them to NSNumbers/NSValues first), at least I believe I've done that in EnzymeX....

From jtimmer at bellatlantic.net Thu Aug 5 19:22:05 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Thu, 05 Aug 2004 19:22:05 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To: <44956E61-E733-11D8-8873-000393CFDE0C@mekentosj.com>
Message-ID:

A more fundamental question - do we want the values to assume the first base is position 0 (as in an NSString) or 1 (as in most views of DNA sequences)?
From james.balhoff at duke.edu Thu Aug 5 23:42:27 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Thu, 5 Aug 2004 22:42:27 -0500
Subject: [Biococoa-dev] Design question
In-Reply-To: <44956E61-E733-11D8-8873-000393CFDE0C@mekentosj.com>
References: <44956E61-E733-11D8-8873-000393CFDE0C@mekentosj.com>
Message-ID:

Hey Alex,

On Aug 5, 2004, at 6:00 PM, Alexander Griekspoor wrote:

> By the way, I'm pretty sure that you can put NSRanges in arrays and
> dictionaries natively already (without converting them to
> NSNumbers/Values first), at least I believe I've done that in
> EnzymeX....

Unfortunately, NSRange is a C struct and not an Objective-C object, so it can't go into collections on its own. I find it kind of confusing, but I guess it improves performance.

- Jim
____________________________________________
James P. Balhoff
Dept. of Biology
Duke University
Durham, NC 27708-0338
USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2373 bytes
Desc: not available
URL:

From mek at mekentosj.com Fri Aug 6 02:28:28 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 6 Aug 2004 08:28:28 +0200
Subject: [Biococoa-dev] Design question
In-Reply-To:
References: <44956E61-E733-11D8-8873-000393CFDE0C@mekentosj.com>
Message-ID:

You're right Jim,
You do indeed need [NSValue valueWithRange:aRange] to do this. I must have been mistaken, but I'm absolutely sure that not too long ago I had an occasion where I thought "hey, can you do this directly?!"; clearly that must have been something else then...
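For reference, the round trip through NSValue looks roughly like this. +valueWithRange: and -rangeValue are the Foundation methods for boxing and unboxing an NSRange; the two helper functions around them are hypothetical names used only for this sketch.

```objc
#import <Foundation/Foundation.h>

// Box each NSRange in an NSValue so that it can be stored in an NSArray.
static NSArray *BCBoxRanges(const NSRange *ranges, NSUInteger count) {
    NSMutableArray *boxed = [NSMutableArray arrayWithCapacity:count];
    NSUInteger i;
    for (i = 0; i < count; i++) {
        [boxed addObject:[NSValue valueWithRange:ranges[i]]];
    }
    return boxed;
}

// Unbox the NSRange stored at `index`.
static NSRange BCRangeAtIndex(NSArray *boxed, NSUInteger index) {
    NSValue *value = [boxed objectAtIndex:index];
    return [value rangeValue];
}
```

An ORF finder could return the result of something like BCBoxRanges() directly, and callers would use -rangeValue to get the struct back.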
Alex
From kvddrift at earthlink.net Fri Aug 6 18:06:33 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 6 Aug 2004 18:06:33 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID:

Hi all,

Here are some of my thoughts about the framework design.

In the last few messages I missed any mention of a BCSequence class that more or less functions as the center of the framework. Its main member is then probably an NSString representing the sequence (DNA, RNA, protein). This is easy because they are all single-character-based sequences (unlike, e.g., glycans).

Additionally, this class could have an NSMutableArray member consisting of objects that represent each single base or amino acid. Whenever the NSString is edited, the NSMutableArray is updated, and vice versa. We could have a BCRootObject from which BCAminoAcid, BCNucleotide, etc. derive. These nucleotide and amino acid classes can then store more info about themselves, e.g. long name, pI, MW, modifications, annotations, etc. BCFunctionalGroup (methyl, phosphate) could also be based on BCRootObject.

Regarding the question of whether the sequences should be 0-based or 1-based, I suggest we use both :) The BCSequence can have an NSRange member that is 1-based (or two ints indicating the start and end position), while the NSString and NSMutableArray are both 0-based.
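One way to keep the two conventions apart is to store 0-based NSRanges internally and convert only at the display boundary. The helpers below are a sketch with hypothetical names, assuming "1-based" means inclusive start and end positions as usually shown in sequence views.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers: 0-based NSRange in storage, 1-based inclusive
// coordinates for display.
static NSUInteger BCDisplayStart(NSRange range) {
    return range.location + 1;
}

static NSUInteger BCDisplayEnd(NSRange range) {
    return range.location + range.length;
}

// Build a 0-based NSRange from 1-based inclusive coordinates.
// Assumes 1 <= start <= end; no validation is done in this sketch.
static NSRange BCRangeFromDisplay(NSUInteger start, NSUInteger end) {
    return NSMakeRange(start - 1, end - start + 1);
}
```

With this split, the first three bases are NSMakeRange(0, 3) in storage and positions 1 to 3 on screen.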
Another thing is that we should try to make the enzyme class (or any class that acts on a sequence) universal, so it works for both DNA and proteins. Or at least have a base class and put specific functionality in a DNAEnzyme and a ProteinEnzyme class.

Here are some links to naming conventions:

BTW, I am writing an app that among other things digests proteins (could you guess ;-), and can provide code for that.

- Koen.

From jtimmer at bellatlantic.net Fri Aug 6 18:37:01 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 06 Aug 2004 18:37:01 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To:
Message-ID:

> In the last few messages I missed any mention of a BCSequence class
> that more or less functions as the center of the framework. It's main
> member is then probably an NSString representing the sequence (DNA,
> RNA, protein). This is easy because they all are single character based
> sequences (unlike eg glycans).

I had sent out an RTF file with some ideas on how to construct the class - if you haven't gotten it, let me know and I'll resend it. It hasn't triggered any discussion yet, but I'd assumed that was because it was all ideas and no code so far.

> Additionally this class could have an NSMutableArray member consisting
> of objects that represent each single base, amino acid. Whenever the
> NSString is edited, the NSMutableArray is updated, and vice versa.

I'm curious as to what this adds that a string doesn't. The string itself acts like an array for most purposes other than sorting, which is a bad idea for sequences ;). Creating enough individual objects to store a long sequence is going to be very processor and memory intensive - I'd imagine making an array out of a BAC sequence would bring my laptop to its knees. If there are a few specific array methods that you'd like to have for the sequence, I'll happily code them using the NSString contents.
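As an illustration of treating the string as an array of bases, here is a sketch that walks an NSString by index with -characterAtIndex: to build a reverse complement, without creating an object per base. BCReverseComplement() is a hypothetical name, and passing unknown characters through unchanged is an assumption made for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reverse complement of a DNA string. Characters other
// than A, C, G and T (either case) are copied through unchanged.
static NSString *BCReverseComplement(NSString *dna) {
    NSUInteger length = [dna length];
    NSMutableString *result = [NSMutableString stringWithCapacity:length];
    NSUInteger i;
    // Walk from the last base to the first, as with an array index.
    for (i = length; i > 0; i--) {
        unichar base = [dna characterAtIndex:i - 1];
        unichar complement;
        switch (base) {
            case 'A': complement = 'T'; break;
            case 'T': complement = 'A'; break;
            case 'G': complement = 'C'; break;
            case 'C': complement = 'G'; break;
            case 'a': complement = 't'; break;
            case 't': complement = 'a'; break;
            case 'g': complement = 'c'; break;
            case 'c': complement = 'g'; break;
            default:  complement = base; break;
        }
        [result appendFormat:@"%C", complement];
    }
    return result;
}
```

The only per-base storage here is the unichar itself, which is the memory argument made above for long sequences.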
I'm hoping to put the last of the methods I have lying around into BCUtils this weekend, and to get started on implementing the sequence class ideas I've got next week. Relevant to this, I'm planning on using a few enumerations (i.e. BCDNASequence, BCProteinSequence, BCRNASequence), but haven't done so in ObjC before. Where do they get defined, and does anybody have some code they're willing to share with me to make sure I don't screw that up?

Meanwhile, I still can't log into my ssh account and my request for help from bioinformatics.org has gone unanswered. Sigh....

Cheers,

Jay

From kvddrift at earthlink.net Fri Aug 6 19:16:33 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 6 Aug 2004 19:16:33 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID:

On Aug 6, 2004, at 6:37 PM, John Timmer wrote:

> I had sent out an RTF file with some ideas on how to construct the class -
> if you hadn't gotten it, let me know and I'll resend it. It hadn't
> triggered any discussion yet, but I'd assumed that was because it was all
> ideas and no code yet.

Yes, I have seen that. I meant the recent messages from yesterday.

>> Additionally this class could have an NSMutableArray member consisting
>> of objects that represent each single base, amino acid. Whenever the
>> NSString is edited, the NSMutableArray is updated, and vice versa.
> I'm curious as to what this adds that a string doesn't.

You can access and manipulate more information about each base or amino acid. If you just have a letter, you don't have that info in the object, only an identifier. But I agree that it could be very processor intensive for very long sequences.

> Relevant to this, I'm planning on using a few enumerations (ie -
> BCDNASequence, BCProteinSequence, BCRNASequence), but haven't done so in
> ObjC before. Where do they get defined, and does anybody have some code
> they're willing to share with me to make sure I don't screw that up?

Why not use an NSArray, etc.? They have built-in enumerators. Also look at the NSScanner class to quickly find a character or string in another string.

- Koen.

From kvddrift at earthlink.net Sat Aug 7 08:27:44 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 7 Aug 2004 08:27:44 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To:
References:
Message-ID: <2EB62EE8-E86D-11D8-922C-003065A5FDCC@earthlink.net>

On Aug 6, 2004, at 6:37 PM, John Timmer wrote:

> Meanwhile, I still can't log into my ssh account and my request for help
> from bioinformatics.org has gone unanswered. Sigh....

John,

What error message do you get?

- Koen.

From jtimmer at bellatlantic.net Sat Aug 7 22:50:39 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sat, 07 Aug 2004 22:50:39 -0400
Subject: [Biococoa-dev] Design question
In-Reply-To: <2EB62EE8-E86D-11D8-922C-003065A5FDCC@earthlink.net>
Message-ID:

> On Aug 6, 2004, at 6:37 PM, John Timmer wrote:
>
>> Meanwhile, I still can't log into my ssh account and my request for help
>> from bioinformatics.org has gone unanswered. Sigh....
>
> John,
>
> What error message do you get?

None, it just rejects my login. The mail I got indicated that it would be the same as the website login. I can log in at the website without a problem, but trying to ssh in with the same username/password combination gets rejected.

Maybe I should just give up and create a new account?
John _______________________________________________ This mind intentionally left blank From peter.schols at bio.kuleuven.ac.be Sun Aug 8 06:12:46 2004 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Sun, 8 Aug 2004 12:12:46 +0200 Subject: [Biococoa-dev] Design question In-Reply-To: References: Message-ID: <7EDF74DE-E923-11D8-B78A-003065D0AD9E@bio.kuleuven.ac.be> Hi John, The people at bioinformatics.org are quite helpful but it can take a few days before they reply to e-mails. It might be better to create a new account indeed. As soon as you get your new login, please let me know and I'll add you to the list of developers. Cheers, Peter On 08 Aug 2004, at 04:50, John Timmer wrote: >> >> On Aug 6, 2004, at 6:37 PM, John Timmer wrote: >> >>> Meanwhile, I still can't log into my ssh account and my request for >>> help >>> from bioinformatics.org has gone unanswered. Sigh.... >>> >>> >> >> John, >> >> What error message do you get? > > None, it just rejects my login. The mail I got indicated that it > would be > the same as the website login. I can login at the website without a > problem, but trying to ssh in with the same username/password > combination > gets rejected. > > Maybe I should just give up and create a new account? > > John > > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev From H.Bhaskar at exeter.ac.uk Mon Aug 9 09:36:48 2004 From: H.Bhaskar at exeter.ac.uk (Harish Bhaskar) Date: Mon, 9 Aug 2004 14:36:48 +0100 Subject: [Biococoa-dev] Summer School on Bioinformatics ! Message-ID: Dear Colleugue, This is to bring to your notice, the British Computer Society Summer School on Bioinformatics that is being organised at the University of Exeter (29,August - 3,September). 
This event is directed to PhD students, academics and industrial participants in the area of bioinformatics. The school involves lectures delivered by leading academics and practitioners in the UK. This is a fully residential summer school for one week and the registration cost covers accommodation, breakfast, tea/coffee and lunch. We would like to specially invite researchers and scientists from your group to attend this event. The registration forms are available at http://www.dcs.ex.ac.uk/bcs-par/bioinformatics04.html. To take advantage of the early-bird registration discount, fax your registration forms to 01392-264067 before 13th August. Also, please circulate this email to interested colleagues in your organisation. There are several other benefits to registering before the 13th of August, 2004 as highlighted below: -- FREE IEEE Computer Society Membership for students for 2005 -- FREE IEEE Transactions on Computational Biology and Bioinformatics Subscription for 2005 -- FREE 20% discount on all Springer Books till October 2004 -- FREE British Computer Society PAR Specialist Group Membership for 2005 -- Best poster presentation prize of £100 if your poster is judged best at the school -- Guaranteed accommodation on campus close to the venue -- Discounted fee, which increases after the 13th of August We look forward to seeing you at the summer school. Harish Bhaskar (H.Bhaskar at ex.ac.uk) Secretary Summer School on Bioinformatics. From mek at mekentosj.com Mon Aug 9 18:12:50 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Tue, 10 Aug 2004 00:12:50 +0200 Subject: [Biococoa-dev] Design question In-Reply-To: References: Message-ID: <40D27C82-EA51-11D8-8162-000393CFDE0C@mekentosj.com> Hi guys, Back from a sunny and warm long weekend, let me continue our discussion on framework design and implementation based on Koen's input. Let me start by proposing an idea which again is derived from the BioJava framework.
I think they must have had a discussion similar to the one we are having here, and I find their solution to the problem a very nice one. First an explanation of the idea, then my thoughts in this light as a reply to the points brought up. Basically we have two options: either go for a string-based solution (as sequences are, in the end, long strings), or go for a dedicated sequence class approach. As outlined in the tutorial linked below, the string-based approach has some clear disadvantages: 1. Strings need constant validation, as they allow non-existent characters. I use strings in my programs to store sequences (I bet everyone does) and I constantly strip "foreign" characters upon editing, copying, dragging etc.; in fact that's what I call my method ;-) 2. Ambiguity is hard to support, quoted: "The meaning of each symbol is not necessarily clear. The `T' which means thymidine in DNA is the same `T' which is a threonine residue in a protein sequence" 3. Limited alphabet: Koen already mentioned that glycans are hard to express in single-letter codes. This is all solved by a class-based approach where all nucleotides, amino acids, glycans etc. are represented by their own class. However, John already pointed to the weak spot here: instantiating so many objects quickly results in big memory problems. The guys at BioJava came up with a nice solution, the best of both worlds so to speak: http://www.biojava.org/tutorials/chap1.html What we do is create singleton objects (think "sharedDefaultManager") for each class of "symbol", then refer to these using pointers. A sequence like "ATGC" would be an array of the form: "pointer to shared "A" object, pointer to shared "T" object, pointer to shared "G" object, pointer to shared "C" object, etc." All objects in use are present in memory only once, and the sequence is an array of pointers, which is very cheap memory-wise.
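To make the shared-symbol idea above concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not BioCocoa or BioJava code: the type `BCSymbolSketch` and the functions `bc_dna_symbol` and `bc_dna_from_string` are hypothetical names invented for illustration, and only the four unambiguous DNA bases are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol record; the fields are illustrative only. */
typedef struct {
    char        code;  /* one-letter code */
    const char *name;  /* long name */
} BCSymbolSketch;

/* One shared instance per base: each exists in memory exactly once. */
static const BCSymbolSketch kAdenine  = { 'A', "adenine"  };
static const BCSymbolSketch kThymine  = { 'T', "thymine"  };
static const BCSymbolSketch kGuanine  = { 'G', "guanine"  };
static const BCSymbolSketch kCytosine = { 'C', "cytosine" };

/* Return the shared symbol for a DNA letter, or NULL for a foreign character. */
const BCSymbolSketch *bc_dna_symbol(char c)
{
    switch (c) {
    case 'A': return &kAdenine;
    case 'T': return &kThymine;
    case 'G': return &kGuanine;
    case 'C': return &kCytosine;
    default:  return NULL;
    }
}

/* Convert a string into an array of pointers to the shared symbols.
   Stops at the first foreign character or when `cap` slots are used,
   and returns the number of symbols stored. */
size_t bc_dna_from_string(const char *s, const BCSymbolSketch **out, size_t cap)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (n < cap && s[n] != '\0') {
        const BCSymbolSketch *sym = bc_dna_symbol(s[n]);
        if (sym == NULL)
            break;          /* validation happens once, at conversion time */
        out[n++] = sym;
    }
    return n;
}
```

In this sketch every "A" in a sequence is the same pointer, so the per-position cost is one pointer regardless of how much data a symbol carries. This arrangement is commonly known as the Flyweight pattern.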
To highlight some of the things in this approach which I like very much: - Great performance memory-wise - The "symbol" classes can store all additional data like name, pI, etc. - Solution to the ambiguity problem (see the getMatches() method) I have quite some experience using singleton classes in the form of shared controllers. They are easy to implement and work very, very well. Replying in the light of this idea: > In the last few messages I missed any mention of a BCSequence class > that more or less functions as the center of the framework. It's main > member is then probably an NSString representing the sequence (DNA, > RNA, protein). This is easy because they all are single character > based sequences (unlike eg glycans). Again, the glycans can now easily be implemented as a set of glycan symbols (shared objects), so glycan sequences can be expressed as "sequences" too. > Additionally this class could have an NSMutableArray member consisting > of objects that represent each single base, amino acid. Whenever the > NSString is edited, the NSMutableArray is updated, and vice versa. We > could have a BCRootObject from which BCAminoAcid, BCNucleotide, etc > derive. These nucleotide and amino acid classes can then store more > info about themselves, eg long name, pI, MW, modifications, > annotations, etc. Also BCFunctionalGroup (methyl, phosphate) could be > based on BCRootObject. As said, John already mentioned the big pitfall of this approach; it's never wise to have these things present in parallel. First of all, it requires careful synchronization, and what happens if they do get out of sync? Second, it requires at least twice the amount of memory, as you have both a string and a symbol list around. This is all prevented with the "shared symbol list approach", as we now have one "datasource".
To convert to the string-based world, I could very well imagine that the BCRootObject will have a - (NSString *)stringRepresentation; method that converts the symbol list and spits out an NSString for you (based on how the string part is defined in the symbol classes, which also define long name, pI, MW, modifications, annotations, etc.). We can also implement a number of "stringRepresentationForRange" methods. I think we should discuss how exactly the functional groups should be worked out: either as separate symbols, or as possible "properties" of the base class. Example: should phosphorylated serine be a separate "BCSymbol", or should phosphorylation be a "BCFunctionalGroup" that can be added to a symbol? Probably the first option if we go for shared symbols, as in this approach you can only add a property to all serines or to none. The alternative option is to keep a modification dictionary (modification and position) associated with the sequence level instead of the symbol level. > Regarding the question whether the sequences should be 0-based or > 1-based, I suggest we use both :) The BCSequence can have an NSRange > member that is 1-based (or two ints indicating the start and end > position), and the NSString and NSMutableArray are both 0-based. The tutorial above mentions an interesting choice: "Note that numbering of Symbols within the SymbolList runs from 1 to length, not from 0 to length-1 as is the case with Java strings. This is consistent with the coordinate system found in files of annotated biological sequences." Maybe we should do the same here. > Another thing is that we should try to make the enzyme class (or any > class that acts on a sequence) universal so it works both for DNA and > proteins. Or at least have a base clase and put specific functionality > in a DNAEnzyme and ProteinEnzyme class. I agree, but as they might be very different, we should indeed go for a general enzyme superclass and define the specifics in subclasses.
I guess that's something we will find out rapidly during coding. Another note here is that the shared symbols approach nicely allows defining the recognition site for both types: for instance, the fact that a T stands for both thymidine and threonine poses no problem in this system. > Here are some liks to naming conventions: > > CodingGuidelines/Articles/NamingBasics.html> > CodingGuidelines/Articles/NamingIvarsAndTypes.html> Great articles, I guess we should stick to those conventions as closely as possible. By the way, John, it also shows you how to do enumerations:

typedef enum {
    NSRadioModeMatrix = 0,
    NSHighlightModeMatrix = 1,
    NSListModeMatrix = 2,
    NSTrackModeMatrix = 3
} NSMatrixMode;

You place them in the header file of the specific class you plan to use them in. Alternatively, we can add a file called BCConstants.h where more general enumerations and constants can be placed. One other remark: I guess what John meant were these enumerations, not to be confused with the enumerators for arrays. That's something completely different. Their big advantage is that they are very lightweight and give names to commonly used values. In addition, the integer value allows you to do math and nice comparisons. Example:

typedef enum {
    BCLowPriority = 0,
    BCNormalPriority = 1,
    BCHighPriority = 2,
    BCVeryHighPriority = 3
} BCPriority;

Given an integer called priority assigned one of these values, you can now do things like priority++ to increase the priority, or things like if (priority > 1) ... to check priorities or compare them. Very handy, and the Cocoa frameworks are filled with these (NSNotFound anyone? Or NSPortraitOrientation?) Tip: make sure you leave plenty of room for extension. > BTW, I am writing an app that among other things digests proteins > (could you guess ;-), and can provide > code for that. Yes! Very nice!
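As a hedged illustration of the enumeration advice above, the following plain C sketch (also valid Objective-C) shows what a shared header such as the proposed BCConstants.h might contain for the three sequence kinds John mentioned. The constant names come from his message; the type name `BCSequenceType` and the two helper functions are assumptions made up for this example.

```c
#include <assert.h>

/* Possible contents of a shared header such as BCConstants.h.
   The type name BCSequenceType is hypothetical. */
typedef enum {
    BCDNASequence     = 0,
    BCRNASequence     = 1,
    BCProteinSequence = 2
} BCSequenceType;

/* Map an enumeration value to a short human-readable label. */
const char *bc_sequence_type_name(BCSequenceType type)
{
    switch (type) {
    case BCDNASequence:     return "DNA";
    case BCRNASequence:     return "RNA";
    case BCProteinSequence: return "protein";
    }
    return "unknown";
}

/* Because the values are plain integers, ordering can carry meaning:
   here everything up to and including RNA is a nucleic acid. */
int bc_is_nucleic_acid(BCSequenceType type)
{
    assert(type <= BCProteinSequence);
    return type <= BCRNASequence;
}
```

The ordering trick in `bc_is_nucleic_acid` only works as long as new values are added with care, which is one reason to leave room for extension when choosing the integer values.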
I'm curious about your thoughts on this, guys. I know it's easier to talk about implementation than to do the coding (for which I do not have much time right now; luckily that should change soon), but deciding on the right foundation may save a lot of effort later! Looking forward to your replies! Cheers, Alex PS. Again, I encourage everyone to read the tutorial I linked to, and if you have time to dive further into the BioJava docs (further than I did), I'm sure there are plenty more design decisions they made that we can learn from and take advantage of... ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From kvddrift at earthlink.net Mon Aug 9 21:49:32 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 9 Aug 2004 21:49:32 -0400 Subject: [Biococoa-dev] Design question In-Reply-To: <40D27C82-EA51-11D8-8162-000393CFDE0C@mekentosj.com> References: <40D27C82-EA51-11D8-8162-000393CFDE0C@mekentosj.com> Message-ID: <864AA97D-EA6F-11D8-942E-003065A5FDCC@earthlink.net> > The guys at BioJava came up
with a nice solution, the best of both > world so to speak: http://www.biojava.org/tutorials/chap1.html > What we do is create singleton objects (think "sharedDefaultManager") > for each class of "symbol", then refer to these using pointers. A > sequence like "ATGC" would be an array in the form of: "pointer to > shared "A" object, pointer to shared "T" object,pointer to shared "G" > object, pointer to shared "C" object, etc" All used objects are > present in memory only once, and the sequence is an array of pointers > which is very cheap memory wise. To highlight some of the things in > this approach which I like very much: > - Great performance memory wise > - The "symbol" classes can store all additional data like name, pi, etc > - Solution to the ambiguity problem (see the getMatches() method) Absolutely the right approach, IMO. > I think we should discuss how exactly the functional groups should be > worked out. Either as separate symbols, or as possible "properties" of > the base class. Example: should phosporylated-Serine be a separate > "BCSymbol", or should phosphorylation be a "BCFunctionalGroup" that > can be added to a symbol? Properly the first option if we go for > shared symbols, as you can either add a property to all serines or > none in this approach. The alternative option is to keep a > modification dictionary (modification and position) associated at the > sequence level instead of the symbol one. The second option is the way to go, I think. If I remember correctly, both the 'Singleton' and this approach are among the 'Design Patterns' first described by the Gang of Four. I forgot the name of the second one, but you should check out that book in your local bookstore/library. One of the bibles of OOP.
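The second option, modifications recorded per sequence rather than on the shared symbols, could be sketched roughly as follows in plain C (also valid Objective-C). Everything here is hypothetical: the names `BCModificationSketch`, `BCModifiedSequenceSketch` and the `bc_modseq_*` functions are invented for illustration, a character string stands in for the array of shared symbols, a fixed-size table stands in for a dictionary, and positions are 1-based as in the BioJava convention quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BC_MAX_MODS 16

typedef enum { BCModPhosphate = 0, BCModMethyl = 1 } BCModKindSketch;

/* One modification at one position of one particular sequence. */
typedef struct {
    size_t          position;  /* 1-based, as in annotated sequence files */
    BCModKindSketch kind;
} BCModificationSketch;

typedef struct {
    const char          *residues;  /* stand-in for the array of shared symbols */
    size_t               length;
    BCModificationSketch mods[BC_MAX_MODS];
    size_t               mod_count;
} BCModifiedSequenceSketch;

void bc_modseq_init(BCModifiedSequenceSketch *seq, const char *residues)
{
    assert(seq != NULL && residues != NULL);
    seq->residues  = residues;
    seq->length    = strlen(residues);
    seq->mod_count = 0;
}

/* Record a modification; returns 1 on success, 0 if the position is
   outside 1..length or the table is full. */
int bc_modseq_add(BCModifiedSequenceSketch *seq, size_t position, BCModKindSketch kind)
{
    assert(seq != NULL);
    if (position < 1 || position > seq->length || seq->mod_count == BC_MAX_MODS)
        return 0;
    seq->mods[seq->mod_count].position = position;
    seq->mods[seq->mod_count].kind     = kind;
    seq->mod_count++;
    return 1;
}

/* Return the first modification recorded at `position`, or NULL if none. */
const BCModificationSketch *bc_modseq_find(const BCModifiedSequenceSketch *seq, size_t position)
{
    size_t i;
    assert(seq != NULL);
    for (i = 0; i < seq->mod_count; i++)
        if (seq->mods[i].position == position)
            return &seq->mods[i];
    return NULL;
}
```

With this layout, phosphorylating the serine at position 2 of "MSSK" leaves the serine at position 3, and the shared serine symbol itself, untouched.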
> >> Regarding the question whether the sequences should be 0-based or >> 1-based, I suggest we use both :) The BCSequence can have an NSRange >> member that is 1-based (or two ints indicating the start and end >> position), and the NSString and NSMutableArray are both 0-based. > The tutorial above mentions an interesting choice: "Note that > numbering of Symbols within the SymbolList runs from 1 to length, not > from 0 to length-1 as is the case with Java strings. This is > consistent with the coordinate system found in files of annotated > biological sequences." Maybe we should do the same here. Yes, I agree again. The stringRepresentation mentioned earlier will then do the conversion to 0-based. > > Ps. Again, I encourage everyone to read the tutorial I linked to, and > if you have time to further dive into the BioJava docs (further then I > did), I'm sure there are plenty of more design decisions they made > from which we can learn and take advantage of... Great tutorials, also the other chapters. Alex, thanks for all the input. It's really good to talk about this before diving blindfolded into codeland. - Koen. From mek at mekentosj.com Tue Aug 10 02:01:42 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Tue, 10 Aug 2004 08:01:42 +0200 Subject: [Biococoa-dev] Design question In-Reply-To: <864AA97D-EA6F-11D8-942E-003065A5FDCC@earthlink.net> References: <40D27C82-EA51-11D8-8162-000393CFDE0C@mekentosj.com> <864AA97D-EA6F-11D8-942E-003065A5FDCC@earthlink.net> Message-ID: >> Properly the first option if we go for shared symbols, as you can >> either add a property to all serines or none in this approach. The >> alternative option is to keep a modification dictionary (modification >> and position) associated at the sequence level instead of the symbol >> one. > The second option is they way to go, I think. If I remember correctly, > both the 'Singleton' and this approach are one of the 'Design > Patterns', first described by the Gang of Four. 
I forgot the name of > the second one, but you should check out that book in your local > bookstore/library. One of the bibles of OOP. Thanks Koen, I will check it out. I find myself so new in "programming land" that I seriously miss a lot of historic knowledge... The fact that I was trained as a biologist instead of as an IT guy doesn't help much either ;-) Yesterday I was still thinking a bit more about the two options I presented, and indeed the modification dictionary seems the best way to go. I think it's a very nice approach to keep this similar to the way, for instance, GenBank records show features associated with the sequence. I believe John also mentioned something about this. The hierarchy would be something along the lines of a dictionary containing BCAnnotation objects (BioJava does this as well) that would describe the positions as simple NSRanges and the type perhaps as BCFunctionalGroup objects. One of the problems will be to keep the system such that new (to us unknown) modifications/features are easily added... Another thought I would like you to comment on is the addition of a "history/editing dictionary" which keeps track of who added/edited a sequence and when/what things were edited. In general, I think it would be nice if we went for the "non-destructive editing approach" wherever possible. My would-be BioCocoa-based DNAStrider-like app would for instance allow the user to cut and paste fragments and vectors, and it would be very nice if much of the editing could always be undone, and the original sequence could always be viewed. Think along the lines of a modern video editing approach: the files are unchanged, only the displayed parts are changed. This could save a lot of memory/disk reuse/writing as well. Of course there must be methods to "crop" your file, as there is no use keeping a complete genome around if you're only interested in one gene, right... These are things we have to build into the core BCSequence class....
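One way to picture the non-destructive editing idea above is a "view" that records only which part of an untouched original is currently displayed. The plain C sketch below (also valid Objective-C) is an assumption-laden illustration, not a proposal for the actual BCSequence design: `BCSequenceViewSketch` and the `bc_view_*` functions are hypothetical names, offsets are 0-based for simplicity, and a character string stands in for the real sequence storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view onto an unchanged original: only the visible window is stored. */
typedef struct {
    const char *original;  /* never modified by any view operation */
    size_t      start;     /* 0-based offset of the visible part */
    size_t      length;    /* number of visible residues */
} BCSequenceViewSketch;

/* A view that shows the whole original. */
BCSequenceViewSketch bc_view_make(const char *original)
{
    BCSequenceViewSketch v;
    assert(original != NULL);
    v.original = original;
    v.start    = 0;
    v.length   = strlen(original);
    return v;
}

/* Narrow a view to `length` residues starting at `offset` within it.
   Requests that run past the end are clamped. */
BCSequenceViewSketch bc_view_crop(BCSequenceViewSketch v, size_t offset, size_t length)
{
    if (offset > v.length)
        offset = v.length;
    if (length > v.length - offset)
        length = v.length - offset;
    v.start += offset;
    v.length = length;
    return v;
}

/* Copy the visible part into `buf` (NUL-terminated, truncated to fit);
   returns the number of characters copied. */
size_t bc_view_copy(BCSequenceViewSketch v, char *buf, size_t cap)
{
    size_t n = v.length;
    assert(buf != NULL && cap > 0);
    if (n >= cap)
        n = cap - 1;
    memcpy(buf, v.original + v.start, n);
    buf[n] = '\0';
    return n;
}
```

Undoing a crop in this sketch is just a matter of going back to an earlier view value, since the original data is never rewritten.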
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* From jtimmer at bellatlantic.net Tue Aug 10 11:06:36 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 10 Aug 2004 11:06:36 -0400 Subject: [Biococoa-dev] Design question In-Reply-To: Message-ID: > Thanks Koen, I will check it out, I find myself in "programming land" > so new that I seriously miss a lot of historic knowledge... > The fact that I was trained as a biologist instead of IT guy doesn't > help much either ;-) > I'm in the same situation, so I sympathize. Anyway, I like the idea of the singleton base arrangement. I spent my time wandering around the javadoc references instead of the tutorial, and clearly I missed a lot of things. Given easy methods to convert between the array and strings, it would allow us to code all the methods using whichever format is easier. And I'm all for easy to make methods....
> Yesterday I was still thinking a bit more about the two options I > presented, and indeed the modification dictionary seems the best way to > go. I think it's a very nice approach to keep this in a similar way as > for instance the genbank records show features associated with the > sequence. I believe John also mentioned something about this. The > hierarchy would be something along the lines of a dictionary containing > BCAnnotation objects (biojava does this as well), that would describe > the positions in simple NSRanges and the type perhaps as > BCFunctionalGroup objects. One of the problems will be to keep the > system such that new (for us unknown) modifications/features are easily > added... I had thought an array of NSDictionary like objects, each a BCFeature (or BCAnnotation) would be easier. The key thing would be to have a unique ID set when a feature is added, so the user is shielded from naming conflicts (they could add as many things named "ORF" as they want). This would also allow a feature to point to a separate sequence within a bundle of sequences - ie, the amino acid sequence of that ORF. Either way works, but I'd thought that an array as the root feature object had more parallels with other sequence file formats (ie - NCBI's) and having a regular, repeating structure would make the native file format a bit more readable. The flipside is that looking up a specific object in a dictionary would be much simpler to code. Maybe a vote on this is in order? One thing I'd argue for is an enumeration of defined feature types. The user should be free to create their own, but there are huge advantages of a set of non-custom ones. Imagine being able to search an institute wide plasmid collection for everything with a Vertebrate promoter, protein tag, and unique BamHI site.... 
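The array-of-features arrangement described above, with a unique ID assigned on insertion and an enumeration of predefined feature types, might look roughly like this in plain C (also valid Objective-C). This is only a sketch under stated assumptions: `BCFeatureSketch`, `BCFeatureListSketch`, the `bc_features_*` functions and the particular type constants are all hypothetical, and a fixed-size array stands in for an NSArray.

```c
#include <assert.h>
#include <stddef.h>

#define BC_MAX_FEATURES 32

/* A few predefined feature types; user-defined ones start at BCFeatureCustom. */
typedef enum {
    BCFeatureORF             = 0,
    BCFeaturePromoter        = 1,
    BCFeatureRestrictionSite = 2,
    BCFeatureCustom          = 1000
} BCFeatureTypeSketch;

typedef struct {
    unsigned long       uid;       /* unique within one list, never reused */
    BCFeatureTypeSketch type;
    const char         *name;      /* need not be unique */
    size_t              location;  /* start of the feature */
    size_t              length;
} BCFeatureSketch;

typedef struct {
    BCFeatureSketch items[BC_MAX_FEATURES];
    size_t          count;
    unsigned long   next_uid;
} BCFeatureListSketch;

void bc_features_init(BCFeatureListSketch *list)
{
    assert(list != NULL);
    list->count    = 0;
    list->next_uid = 1;  /* 0 is reserved to signal failure */
}

/* Append a feature and return its unique ID, or 0 if the list is full. */
unsigned long bc_features_add(BCFeatureListSketch *list, BCFeatureTypeSketch type,
                              const char *name, size_t location, size_t length)
{
    BCFeatureSketch *f;
    assert(list != NULL);
    if (list->count == BC_MAX_FEATURES)
        return 0;
    f = &list->items[list->count++];
    f->uid      = list->next_uid++;
    f->type     = type;
    f->name     = name;
    f->location = location;
    f->length   = length;
    return f->uid;
}

/* Count the features of a given predefined or custom type. */
size_t bc_features_count_of_type(const BCFeatureListSketch *list, BCFeatureTypeSketch type)
{
    size_t i, n = 0;
    assert(list != NULL);
    for (i = 0; i < list->count; i++)
        if (list->items[i].type == type)
            n++;
    return n;
}
```

Because lookups go through the ID or the type rather than the name, two features both called "ORF" can coexist, and a search by predefined type stays meaningful across files.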
> Another thought I would like you to comment on is the addition of a > "history/editing dictionary" which keeps track of who added/edited a > sequence and when/what things were edited. In general, I think it would > be nice if we would go for the "non-destructive editing approach" > wherever possible. My would-be Biococoa based DNAStrider-like app would > for instance allow the user to cut and paste fragments and vectors, and > it would be very nice if many of the editing could always be undone, > and the original sequence could always be viewed. Think along the lines > of a modern video editing approach, the files are unchanged, only the > displayed parts are changed. This could save a lot of memory/disk > reusal/writing as well. Of course there must be methods to "crop" your > file as it has no use to keep a complete genome around if your only > interested in one gene right... As you point out, the danger here would be that we'd have to guess in advance the information content that would best suit the user. Permanent undo's are also out of keeping with most AppKit design practices, where the UndoManager doesn't survive application quits. I'm all for keeping an internal Undo list in each sequence object and allowing that to transfer with drag/drop actions and such, but I'm hesitant about writing it to disk. Something like that might be better implemented on a per-program basis, rather than at the root of BioCocoa. Off to visit the mice now... John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Tue Aug 10 14:57:13 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Tue, 10 Aug 2004 20:57:13 +0200 Subject: [Biococoa-dev] Design question In-Reply-To: References: Message-ID: <175123AB-EAFF-11D8-A322-000393CFDE0C@mekentosj.com> > Given easy methods to convert between the array and strings, it would > allow > us to code all the methods using whichever format is easier. 
And I'm > all > for easy to make methods.... So am I ;-) But there's one caveat, I personally think we should see the singleton base sequence as the "native" format for our sequence class and throughout the framework. That means that the stringRepresentation is merely a way to give users the possibility to get back a string in the end, but internally all methods should work with and be optimized for the singleton base classes. I outlined the disadvantages of the stringbased approach that you will encounter (like the validation problem), it would be a pity if one still would continuously watch these caveats while we have such a nice system around. I hope that elegant and strong foundation classes based on the singletons will almost complete remove the need for the strings world ;-) >> Yesterday I was still thinking a bit more about the two options I >> presented, and indeed the modification dictionary seems the best way >> to >> go. I think it's a very nice approach to keep this in a similar way as >> for instance the genbank records show features associated with the >> sequence. I believe John also mentioned something about this. The >> hierarchy would be something along the lines of a dictionary >> containing >> BCAnnotation objects (biojava does this as well), that would describe >> the positions in simple NSRanges and the type perhaps as >> BCFunctionalGroup objects. One of the problems will be to keep the >> system such that new (for us unknown) modifications/features are >> easily >> added... > I had thought an array of NSDictionary like objects, each a BCFeature > (or > BCAnnotation) would be easier. The key thing would be to have a > unique ID > set when a feature is added, so the user is shielded from naming > conflicts > (they could add as many things named "ORF" as they want). This would > also > allow a feature to point to a separate sequence within a bundle of > sequences > - ie, the amino acid sequence of that ORF. 
You're right, an array filled with BCFeatures or BCAnnotations would be very nice; this way we avoid namespace collisions. I'm not sure whether we actually need an additional layer of dictionaries in between. But perhaps I missed the point here. A BCFeature would then contain further info on the type/contents of the modification, plus the range over which the feature extends. For the reference to other sequences we should come up with something like bundle identifiers that uniquely identify sequences inside a bundle, but this is a problem for later, when devising the file format we plan to use (I think BioJava uses URIs here, see link below). > Either way works, but I'd thought that an array as the root feature > object > had more parallels with other sequence file formats (ie - NCBI's) and > having > a regular, repeating structure would make the native file format a bit > more > readable. I fully agree. > The flipside is that looking up a specific object in a dictionary > would be much simpler to code. Maybe a vote on this is in order? Well, I guess the enumerators present for arrays are sufficient here. It is worth mentioning again that we can build sort methods into the BCFeature class (as I showed in the attached headers last time), which allows sorting on type, name, length, position etc. > One thing I'd argue for is an enumeration of defined feature types. > The > user should be free to create their own, but there are huge advantages > of a > set of non-custom ones. Imagine being able to search an institute wide > plasmid collection for everything with a Vertebrate promoter, protein > tag, > and unique BamHI site.... True, we should try to keep a defined set. Perhaps we need an intermediate category level here, like restriction enzyme or structure type, with defined subtypes like BamHI for the first, and helix or beta strand for the second. Let's see how far we can get; a plist with proposed categories and subtypes might be an option here.
Keep in mind that the list might be pretty long, already containing at least 700 restriction enzymes. > >> Another thought I would like you to comment on is the addition of a >> "history/editing dictionary" which keeps track of who added/edited a >> sequence and when/what things were edited. In general, I think it >> would >> be nice if we would go for the "non-destructive editing approach" >> wherever possible. My would-be Biococoa based DNAStrider-like app >> would >> for instance allow the user to cut and paste fragments and vectors, >> and >> it would be very nice if many of the editing could always be undone, >> and the original sequence could always be viewed. Think along the >> lines >> of a modern video editing approach, the files are unchanged, only the >> displayed parts are changed. This could save a lot of memory/disk >> reusal/writing as well. Of course there must be methods to "crop" your >> file as it has no use to keep a complete genome around if your only >> interested in one gene right... > As you point out, the danger here would be that we'd have to guess in > advance the information content that would best suit the user. > Permanent > undo's are also out of keeping with most AppKit design practices, > where the > UndoManager doesn't survive application quits. I'm all for keeping an > internal Undo list in each sequence object and allowing that to > transfer > with drag/drop actions and such, but I'm hesitant about writing it to > disk. > Something like that might be better implemented on a per-program basis, > rather than at the root of BioCocoa. You're right John, that would be a real option. I guess what is a better solution, and something we should suggest is that for instance a developer produces (program) specific BCFeatures, like a fragment feature or cut vector feature, this way he can easily add features that can be saved and also preserve the original content if he likes (but display the feature part only). 
We should keep in mind the whole upcoming meta data story for Tiger, and things like created by, created at, etc etc should be added to our files. Do we create these as BCFeatures? I guess not... So we also need "file-wide" features that do not necessarily point to a specific part of the sequence, but to the whole. I propose this to be a separate array containing BCFileAnnotations or something, this way again each program can add specific features into the file and leaves plenty of room for both sequence specific and file specific info to add. We should stress in the docs that developers should expect to encounter non-biococoa-defined features/annotations in the files. Just a few extra remarks / things to remember: How BioJava tackles this problem is similar to what we propose and can be found here: http://www.biojava.org/tutorials/chap2.html I also read something about a "biofoundation-wide" standard of storing and accessing sequence data, here: http://obda.open-bio.org/ It also allows access to data stored using BioSQL. At the moment I have no clue what it means, but perhaps one of you knows more. Anyway, I don't think that has much priority right now. Do you guys think we should, like the Cocoa frameworks, provide both immutable and mutable variants of our classes, or are all our objects mutable by definition? The first ones might allow further optimization if they do not need to be mutable. Looking forward to your reaction! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? 
http://www.mekentosj.com/irnai ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From a.griekspoor at nki.nl Tue Aug 10 15:01:50 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Tue, 10 Aug 2004 21:01:50 +0200 Subject: [Biococoa-dev] Another team member, well sort of... Message-ID: Guys, Let me introduce you to Tom "Tosj" Groothuis, the other half of Mek & Tosj. I don't think he needs much introduction for those who have visited our website, and now Tom would like to join the brain storming sessions on the framework.... Cheers, Alex Peter could you add Tom to the list of developers, and add him to the mailinglist? 
His bioinformatics login name is "ttwimlex" ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From jtimmer at bellatlantic.net Tue Aug 10 15:50:31 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 10 Aug 2004 15:50:31 -0400 Subject: [Biococoa-dev] Design question In-Reply-To: <175123AB-EAFF-11D8-A322-000393CFDE0C@mekentosj.com> Message-ID: > So am I ;-) But there's one caveat, I personally think we should see > the singleton base sequence as the "native" format for our sequence > class and throughout the framework. That means that the > stringRepresentation is merely a way to give users the possibility to > get back a string in the end, but internally all methods should work > with and be optimized for the singleton base classes. I outlined the > disadvantages of the stringbased approach that you will encounter (like > the validation problem), it would be a pity if one still would > continuously watch these caveats while we have such a nice system > around. Okay, maybe I'll write up the DNA singletons and then re-work the methods over the weekend ahead. My job talk is over with, so time may be more free. > I hope that elegant and strong foundation classes based on the > singletons will almost complete remove the need for the strings world > ;-) Yes, other than saving the file, drag and drop to other apps, exporting to Unix tools and databases, etc. etc. I see the elegance and utility of the idea internally, but as soon as we exit BioCocoa, sequences are a UTF8 world ;). And welcome, Tom! 
JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Tue Aug 10 15:56:46 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 10 Aug 2004 15:56:46 -0400 Subject: [Biococoa-dev] Design question In-Reply-To: <175123AB-EAFF-11D8-A322-000393CFDE0C@mekentosj.com> Message-ID: > For the reference to other sequences we should come up with something > like bundle identifiers that uniquely identify sequences inside a > bundle, but this is of later problem while devising the file format we > plan to use (I think BioJava uses URIs here, see link below). I'd already thought about this. I figure each sequence could have a list of used indexes, and a reference to its parent bundle, which also has a list. If the parent bundle exists, it hands out a new index to any sequence object requesting it. If it doesn't, the sequence object can choose its own. If you add an existing sequence to a bundle, it should have a rationalization mechanism - just reassign all indexes according to a range provided by the bundle. Sorry, sent the last message too soon, so you're getting two. Cheers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Tue Aug 10 20:39:34 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 10 Aug 2004 20:39:34 -0400 Subject: [Biococoa-dev] Design question In-Reply-To: <175123AB-EAFF-11D8-A322-000393CFDE0C@mekentosj.com> References: <175123AB-EAFF-11D8-A322-000393CFDE0C@mekentosj.com> Message-ID: >> One thing I'd argue for is an enumeration of defined feature types. >> The >> user should be free to create their own, but there are huge >> advantages of a >> set of non-custom ones. Imagine being able to search an institute >> wide >> plasmid collection for everything with a Vertebrate promoter, protein >> tag, >> and unique BamHI site.... 
> True, we should try to keep a defined set, perhaps we need a > intermediate categories level here, like restriction enzyme or > structure type with defined subtypes like BamHI for the first, and > helix, beta strand for the second. Let's see how far we can come, a > plist with proposed categories and subtypes might be an option here. > Keep in mind that the list might be pretty long, already containing at > least 700 restriction enzymes. Mmmmm - I love BamHI (sorry, I couldn't resist the joke ;) >>> Another thought I would like you to comment on is the addition of a >>> "history/editing dictionary" which keeps track of who added/edited a >>> sequence and when/what things were edited. In general, I think it >>> would >>> be nice if we would go for the "non-destructive editing approach" >>> wherever possible. My would-be Biococoa based DNAStrider-like app >>> would >>> for instance allow the user to cut and paste fragments and vectors, >>> and >>> it would be very nice if many of the editing could always be undone, >>> and the original sequence could always be viewed. Think along the >>> lines >>> of a modern video editing approach, the files are unchanged, only the >>> displayed parts are changed. This could save a lot of memory/disk >>> reusal/writing as well. Of course there must be methods to "crop" >>> your >>> file as it has no use to keep a complete genome around if your only >>> interested in one gene right... >> As you point out, the danger here would be that we'd have to guess in >> advance the information content that would best suit the user. >> Permanent >> undo's are also out of keeping with most AppKit design practices, >> where the >> UndoManager doesn't survive application quits. I'm all for keeping an >> internal Undo list in each sequence object and allowing that to >> transfer >> with drag/drop actions and such, but I'm hesitant about writing it to >> disk. 
>> Something like that might be better implemented on a per-program >> basis, >> rather than at the root of BioCocoa. I'm not sure if we should focus on saving, undo, etc right now. This should be left as an exercise for the user, because there are probably many different situations for as many different apps. - Koen. From tosj at mekentosj.com Wed Aug 11 03:07:54 2004 From: tosj at mekentosj.com (Tom Groothuis (Tosj)) Date: Wed, 11 Aug 2004 09:07:54 +0200 Subject: [Biococoa-dev] Introducing: Me = Tom "Tosj" Groothuis Message-ID: <2A7EB494-EB65-11D8-96F2-00306579658C@mekentosj.com> Hello my new mailing list friends! As the lagging part of Mek&Tosj I have also signed up for this mailing list. Alexander encouraged me to do so, so I can keep up with the things he is involved in. He also advised me to tell something about myself, so I introduce: myself. As I mentioned I am the lagging part of Mek&Tosj, Alex is doing all of the programming, I am trying to teach myself programming too, but as you might tell it is not that simple and I DO have a job too... My part of MekenTosj is to help Alex wherever I can: being his partner in crime, I try to help him have a look at the programs as a kind of super-beta-tester and come up with features he might put in, because I DO know what the programming opportunities are, but don't know how to write it. Further I try to be the CFO of Mekentosj, but as we have only made freeware so far, that is not a very hard job... And as 2 men know more than 1 we both screen our normal and digital world for interesting new features, links, Apple events etc. I hope to be of at least some help as a bit an outside member, but that can be very refreshing as Alex can confirm. 
Greetings, Tosj *********************************** Tosj (Tom Groothuis) Mek&Tosj Web: http://www.mekentosj.com Mail: tosj at mekentosj.com AIM: mekentosj at mac.com *********************************** From jtimmer at bellatlantic.net Wed Aug 11 12:07:47 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 11 Aug 2004 12:07:47 -0400 Subject: [Biococoa-dev] Base design In-Reply-To: <2A7EB494-EB65-11D8-96F2-00306579658C@mekentosj.com> Message-ID: Okay, about to start coding. Before I take the plunge, I'm thinking of the following setup: BCSequenceDNABase - An abstract superclass, allows for easy testing of a sequence for DNA content. Would also include all the .h's for the individual bases, so including this header would get you all bases (though I may have to think about how to avoid circular importing - is this a problem?). If you tried to initialize this class, it would generate an "N" base. It would have class methods for changing strings to sequences and vice versa. Each base would implement the following methods:
    - (NSString *)name; // Adenine, Purine, etc.
    - (NSString *)symbol; // A, T, etc.
    - (BOOL)representsSingleNucleotide; // YES if G, NO if W, etc.
    - (BCSequenceDNABase *)complement; // obvious
    - (NSArray *)matches; // an array of all base objects represented by this
What have I forgotten? JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Wed Aug 11 15:44:58 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Wed, 11 Aug 2004 21:44:58 +0200 Subject: [Biococoa-dev] Base design In-Reply-To: References: Message-ID: John, Before you start off, perhaps it is nice to read the attached pieces of documentation that describe the logic and architecture of the BioJava setup we discussed earlier. I like the idea they have of Symbols that make up Alphabets from which you can generate symbol lists.
Have a look at their docs starting at the org.biojava.bio.symbol package to see which methods each class implements. I like the idea of sticking as close as possible to their setup in terms of classes and class methods; the implementation is up to you of course. I also suggest downloading the biojava source to see how they implemented things. Some remarks about the methods you mention: > (NSString *) symbol; // A, T, etc. This would be biojava's "getToken()" method. They chose a char instead of an object like NSString; perhaps this would indeed be wiser if this method is used in memory-sensitive methods. > (BOOL) representsSingleNucleotide; // YES if G, NO if W, etc. Biojava neatly uses the distinction between an atomic symbol (that represents only one), and basis symbols (like N, purine, w, g), etc. I guess that removes a lot of checking as it's defined in the class already (see documentation snippets below). I still wonder how we should implement the specific classes (you mention the shared headers, which I don't really get exactly). Haven't found out how biojava does that either, but it seems you have a plan already. Anyway, maybe it's time to plunge into the coding waters ;-) To switch to a somewhat different subject, it might be a good idea to ask everyone to document their code using the headerdoc system. It's quite easy, and later on allows us to quickly generate documentation in a format familiar to most developers. Here's a link to the exact details: http://developer.apple.com/documentation/DeveloperTools/Conceptual/HeaderDoc/index.html?http://developer.apple.com/documentation/DeveloperTools/Conceptual/HeaderDoc/tags/chapter_2_section_2.html#//apple_ref/doc/uid/TP40001215-CH346-DontLinkElementID_6129 Click on the show TOC to get to the complete documentation for headerdoc. As an example I have attached a file from the AGRegex framework, it nicely shows what it will look like.
To avoid the removal of the attachment, here it is inline: The really cool thing is that you can use XCode's Applescript menu to quickly insert templates, as easy as it can get! Let me know what you think of all this... Cheers, Alex Package org.biojava.bio.symbol Description Representation of the Symbols that make up a sequence, and locations within them. This package is not intended to have strong biological ties. It is here to make programming things like dynamic-programming much easier. It also handles serialization of well-known alphabets so that applicable singleton properties of alphabets and Symbols are maintained. All coordinates are in 'bio-coordinates' - that is - legal indexes start from 1 and a range is inclusive (4 to 7 includes 4, 5, 6 and 7). A Symbol is a single token. The Symbol maintains a name, a token (char), and an Annotation bundle. A set of Symbols is represented by an Alphabet instance. If the Alphabet can guarantee that there are only ever a finite number of Symbols contained with in it, then it must implement FiniteAlphabet. The Symbol objects within a FiniteAlphabet can be tested for equality by comparing their references directly. A SymbolList is a string over the Symbols from a single Alphabet instance. This allows you to represent a sequence of tokens, such as DNA nucleotides, or stock-market prices. Locations within a SymbolList can be represented by a Location object. This interface defines a sub-set of points that are within the Location. This uses bio-coordinates, and defines all the operations that you are likely to need to build your own Locations (union, intersection and the like). public interface Symbol extends Annotatable A single symbol. This is the atomic unit of a SymbolList, or a sequence. It allows for fine-grain fly-weighting, so that there can be one instance of each symbol that is referenced multiple times. Symbols from finite alphabets are identifiable using the == operator. 
Symbols from infinite alphabets may have some specific API to test for equality, but should really override the equals() method. Some symbols represent a single token in the sequence. For example, there is a Symbol instance for adenine in DNA, and another one for cytosine. Symbols can potentially represent sets of Symbols. For example, n represents any DNA Symbol, and X any protein Symbol. Gap represents the knowledge that there is no Symbol. In addition, some symbols represent ordered lists of other Symbols. For example, the codon agt can be represented by a single Symbol from the Alphabet DNAxDNAxDNA. Symbols can represent ambiguity over these complex symbols. For example, you could construct a Symbol instance that represents the codons atn. This matches the codons {ata, att, atg, atc}. It is also possible to build a Symbol instance that represents all stop codons {taa, tag, tga}, which cannot be represented in terms of a single ambiguous n'tuple. There are three Symbol interfaces. Symbol is the most generic. It has the methods getToken and getName so that the Symbol can be textually represented. In addition, it defines getMatches that returns an Alphabet over all the AtomicSymbol instances that match the Symbol (N would return an Alphabet containing {A, G, C, T}, and Gap would return {}). BasisSymbol instances can always be represented by an n'tuple of BasisSymbol instances. It adds the method getSymbols so that you can retrieve this list. For example, the tuple [ant] is a BasisSymbol, as it is uniquely specified with those three BasisSymbol instances a, n and t. n is a BasisSymbol instance as it is uniquely represented by itself. AtomicSymbol instances specialize BasisSymbol by guaranteeing that getMatches returns a set containing only that instance. That is, they are indivisible. The DNA nucleotides are instances of AtomicSymbol, as are individual codons.
The stop codon {tag} will have a getMatches method that returns {tag}, a getBases method that also returns {tag} and a getSymbols method that returns the List [t, a, g]. {tna} is a BasisSymbol but not an AtomicSymbol as it matches four AtomicSymbol instances {taa, tga, tca, tta}. It follows that each symbol in getSymbols for an AtomicSymbol instance will also be AtomicSymbol instances. public interface AtomicSymbol extends BasisSymbol A symbol that is not ambiguous. Atomic symbols are the real underlying elements that a SymbolList is meant to be composed of. DNA nucleotides are atomic, as are amino-acids. The getMatches() method should return an alphabet containing just the one Symbol. The Symbol instances for single codons would be instances of AtomicSymbol as they can only be represented as a Set of symbols from their alphabet that contains just that one symbol. AtomicSymbol instances guarantee that getMatches returns an Alphabet containing just that Symbol and each element of the List returned by getSymbols is also atomic. public interface BasisSymbol extends Symbol A symbol that can be represented as a string of Symbols. BasisSymbol instances can always be represented uniquely as a single List of BasisSymbol instances. The symbol N is a BasisSymbol - it can be uniquely represented by N. It matches {a, g, c, t}. Similarly, the symbol atn is a BasisSymbol, as it can be uniquely represented with a single list of symbols [a, t, n]. Its getMatches will return the set {ata, att, atg, atc}. The getSymbols method returns the unique list of BasisSymbol instances that this is composed from. For example, the codon ambiguity symbol atn will have a getSymbols method that returns the list [a, t, n]. The getMatches method will return an Alphabet containing each AtomicSymbol that can be made by expanding the list of BasisSymbol instances. public interface Alphabet extends Annotatable The set of AtomicSymbols which can be concatenated together to make a SymbolList. 
A non-atomic symbol is considered to be contained within this alphabet if all of the atomic symbols that it could match are members of this alphabet. public interface FiniteAlphabet extends Alphabet An alphabet over a finite set of Symbols. This interface makes the distinction between an alphabet over a finite (and possibly small) number of symbols and an Alphabet over an infinite (or extremely large) set of symbols. Within a FiniteAlphabet, the == operator should be sufficient to decide upon equality for all AtomicSymbol instances. The alphabet functions as the repository of objects in the fly-weight design pattern. Only symbols within an alphabet should appear in objects that claim to use the alphabet - otherwise something is in error. public interface SymbolList extends Changeable A sequence of symbols that belong to an alphabet. This uses biological coordinates (1 to length). public interface GappedSymbolList extends SymbolList This extends SymbolList with API for manipulating, inserting and deleting gaps. You could make a SymbolList that contains gaps directly. However, this leaves you with a nasty problem if you wish to support gap-edit operations. Also, the original SymbolList must either be copied or lost. GappedSymbolList solves these problems. It will maintain a data-structure that places gaps. You can add and remove the gaps by using the public API. For gap-insert operations, the insert index is the position that will become a gap. The symbol currently there will move to a higher index. To insert leading gaps, add gaps at index 1. To insert trailing gaps, add gaps at index length+1. > ********************** // AGRegex.h // // Copyright (c) 2002 Aram Greenman. All rights reserved. // // Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: // // 1.
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. // 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. // 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. // // THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. #import #import @class AGRegex, NSArray, NSString; /*! @enum Options Options defined for -initWithPattern:options:. Two or more options can be combined with the bitwise OR operator. @constant AGRegexCaseInsensitive Matching is case insensitive. Equivalent to /i in Perl. @constant AGRegexDotAll Dot metacharacter matches any character including newline. Equivalent to /s in Perl. @constant AGRegexExtended Allow whitespace and comments in the pattern. Equivalent to /x in Perl. @constant AGRegexLazy Makes greedy quantifiers lazy and lazy quantifiers greedy. No equivalent in Perl. @constant AGRegexMultiline Caret and dollar anchors match at newline. Equivalent to /m in Perl. */ enum { AGRegexCaseInsensitive = 1, AGRegexDotAll = 2, AGRegexExtended = 4, AGRegexLazy = 8, AGRegexMultiline = 16 }; /*! 
@class AGRegexMatch @abstract A single occurrence of a regular expression. @discussion An AGRegexMatch represents a single occurrence of a regular expression within the target string. The range of each subpattern within the target string is returned by -range, -rangeAtIndex:, or -rangeNamed:. The part of the target string that matched each subpattern is returned by -group, -groupAtIndex:, or -groupNamed:. */ @interface AGRegexMatch : NSObject { AGRegex *regex; NSString *string; int *matchv; int count; } /*! @method count The number of capturing subpatterns, including the pattern itself. */ - (int)count; /*! @method group Returns the part of the target string that matched the pattern. */ - (NSString *)group; /*! @method groupAtIndex: Returns the part of the target string that matched the subpattern at the given index or nil if it wasn't matched. The subpatterns are indexed in order of their opening parentheses, 0 is the entire pattern, 1 is the first capturing subpattern, and so on. */ - (NSString *)groupAtIndex:(int)idx; /*! @method groupNamed: Returns the part of the target string that matched the subpattern of the given name or nil if it wasn't matched. */ - (NSString *)groupNamed:(NSString *)name; /*! @method range Returns the range of the target string that matched the pattern. */ - (NSRange)range; /*! @method rangeAtIndex: Returns the range of the target string that matched the subpattern at the given index or {NSNotFound, 0} if it wasn't matched. The subpatterns are indexed in order of their opening parentheses, 0 is the entire pattern, 1 is the first capturing subpattern, and so on. */ - (NSRange)rangeAtIndex:(int)idx; /*! @method rangeNamed: Returns the range of the target string that matched the subpattern of the given name or {NSNotFound, 0} if it wasn't matched. */ - (NSRange)rangeNamed:(NSString *)name; /*! @method string Returns the target string. */ - (NSString *)string; @end /*! @class AGRegex @abstract A Perl-compatible regular expression class.
@discussion An AGRegex is created with -initWithPattern: or -initWithPattern:options: or the corresponding class methods +regexWithPattern: or +regexWithPattern:options:. These take a regular expression pattern string and the bitwise OR of zero or more option flags. For example:     AGRegex *regex = [[AGRegex alloc] initWithPattern:@"(paran|andr)oid" options:AGRegexCaseInsensitive]; Matching is done with -findInString: or -findInString:range: which look for the first occurrence of the pattern in the target string and return an AGRegexMatch or nil if the pattern was not found.     AGRegexMatch *match = [regex findInString:@"paranoid android"]; A match object returns a captured subpattern by -group, -groupAtIndex:, or -groupNamed:, or the range of a captured subpattern by -range, -rangeAtIndex:, or -rangeNamed:. The subpatterns are indexed in order of their opening parentheses, 0 is the entire pattern, 1 is the first capturing subpattern, and so on. -count returns the total number of subpatterns, including the pattern itself. The following prints the result of our last match case:     for (i = 0; i < [match count]; i++)
        NSLog(@"%d %@ %@", i, NSStringFromRange([match rangeAtIndex:i]), [match groupAtIndex:i]);
    0 {0, 8} paranoid
    1 {0, 5} paran
If any of the subpatterns didn't match, -groupAtIndex: will return nil, and -rangeAtIndex: will return {NSNotFound, 0}. For example, if we change our original pattern to "(?:(paran)|(andr))oid" we will get the following output:     0 {0, 8} paranoid
    1 {0, 5} paran
    2 {2147483647, 0} (null)
-findAllInString: and -findAllInString:range: return an NSArray of all non-overlapping occurrences of the pattern in the target string. -findEnumeratorInString: and -findEnumeratorInString:range: return an NSEnumerator for all non-overlapping occurrences of the pattern in the target string. For example,     NSArray *all = [regex findAllInString:@"paranoid android"]; The first object in the returned array is the match case for "paranoid" and the second object is the match case for "android". AGRegex provides the methods -replaceWithString:inString: and -replaceWithString:inString:limit: to perform substitution on strings.     AGRegex *regex = [AGRegex regexWithPattern:@"remote"];
    NSString *result = [regex replaceWithString:@"complete" inString:@"remote control"]; // result is "complete control"
Captured subpatterns can be interpolated into the replacement string using the syntax $x or ${x} where x is the index or name of the subpattern. $0 and $& both refer to the entire pattern. Additionally, the case modifier sequences \U...\E, \L...\E, \u, and \l are allowed in the replacement string. All other escape sequences are handled literally.     AGRegex *regex = [AGRegex regexWithPattern:@"[usr]"];
    NSString *result = [regex replaceWithString:@"\\u$&." inString:@"Back in the ussr"]; // result is "Back in the U.S.S.R."
Note that you have to escape a backslash to get it into an NSString literal. Named subpatterns may also be used in the pattern and replacement strings, like in Python.     AGRegex *regex = [AGRegex regexWithPattern:@"(?P<who>\\w+) is a (?P<what>\\w+)"];
    NSString *result = [regex replaceWithString:@"Jackie is a $what, $who is a runt" inString:@"Judy is a punk"]; // result is "Jackie is a punk, Judy is a runt"
Finally, AGRegex provides -splitString: and -splitString:limit: which return an NSArray created by splitting the target string at each occurrence of the pattern. For example:     AGRegex *regex = [AGRegex regexWithPattern:@"ea?"];
    NSArray *result = [regex splitString:@"Repeater"]; // result is "R", "p", "t", "r"
If there are captured subpatterns, they are returned in the array.     AGRegex *regex = [AGRegex regexWithPattern:@"e(a)?"];
    NSArray *result = [regex splitString:@"Repeater"]; // result is "R", "p", "a", "t", "r"
In Perl, this would return "R", undef, "p", "a", "t", undef, "r". Unfortunately, there is no convenient way to represent this in an NSArray. (NSNull could be used in place of undef, but then all members of the array couldn't be expected to be NSStrings.) */ @interface AGRegex : NSObject { void *regex; void *extra; int groupCount; } /*! @method regexWithPattern: Creates a new regex using the given pattern string. Returns nil if the pattern string is invalid. */ + (id)regexWithPattern:(NSString *)pat; /*! @method regexWithPattern:options: Creates a new regex using the given pattern string and option flags. Returns nil if the pattern string is invalid. */ + (id)regexWithPattern:(NSString *)pat options:(int)opts; /*! @method initWithPattern: Initializes the regex using the given pattern string. Returns nil if the pattern string is invalid. */ - (id)initWithPattern:(NSString *)pat; /*! @method initWithPattern:options: Initializes the regex using the given pattern string and option flags. Returns nil if the pattern string is invalid. */ - (id)initWithPattern:(NSString *)pat options:(int)opts; /*! @method findInString: Calls findInString:range: using the full range of the target string. */ - (AGRegexMatch *)findInString:(NSString *)str; /*! @method findInString:range: Returns an AGRegexMatch for the first occurrence of the regex in the given range of the target string or nil if none is found. */ - (AGRegexMatch *)findInString:(NSString *)str range:(NSRange)r; /*! @method findAllInString: Calls findAllInString:range: using the full range of the target string. */ - (NSArray *)findAllInString:(NSString *)str; /*! @method findAllInString:range: Returns an array of all non-overlapping occurrences of the regex in the given range of the target string. The members of the array are AGRegexMatches. */ - (NSArray *)findAllInString:(NSString *)str range:(NSRange)r; /*! @method findEnumeratorInString: Calls findEnumeratorInString:range: using the full range of the target string. 
*/ - (NSEnumerator *)findEnumeratorInString:(NSString *)str; /*! @method findEnumeratorInString:range: Returns an enumerator for all non-overlapping occurrences of the regex in the given range of the target string. The objects returned by the enumerator are AGRegexMatches. */ - (NSEnumerator *)findEnumeratorInString:(NSString *)str range:(NSRange)r; /*! @method replaceWithString:inString: Calls replaceWithString:inString:limit: with no limit. */ - (NSString *)replaceWithString:(NSString *)rep inString:(NSString *)str; /*! @method replaceWithString:inString:limit: Returns the string created by replacing occurrences of the regex in the target string with the replacement string. If the limit is positive, no more than that many replacements will be made. Captured subpatterns can be interpolated into the replacement string using the syntax $x or ${x} where x is the index or name of the subpattern. $0 and $& both refer to the entire pattern. Additionally, the case modifier sequences \U...\E, \L...\E, \u, and \l are allowed in the replacement string. All other escape sequences are handled literally. */ - (NSString *)replaceWithString:(NSString *)rep inString:(NSString *)str limit:(int)limit; /*! @method splitString: Call splitString:limit: with no limit. */ - (NSArray *)splitString:(NSString *)str; /*! @method splitString:limit: Returns an array of strings created by splitting the target string at each occurrence of the pattern. If the limit is positive, no more than that many splits will be made. If there are captured subpatterns, they are returned in the array. 
*/ - (NSArray *)splitString:(NSString *)str limit:(int)lim; @end ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From kvddrift at earthlink.net Wed Aug 11 21:00:07 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 11 Aug 2004 21:00:07 -0400 Subject: [Biococoa-dev] Base design In-Reply-To: References: Message-ID: On Aug 11, 2004, at 12:07 PM, John Timmer wrote: > BCSequenceDNABase - > An abstract superclass, allows for easy testing of a sequence for DNA > content. > I am not sure what this class would do. Is it to represent a sequence of bases, or is it to represent one base? Maybe we should start with the abstract BCSequence class, and the implementation of the singleton pattern for symbols. After that we can write the various subclasses, etc. - Koen. From a.griekspoor at nki.nl Thu Aug 12 06:05:41 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 12 Aug 2004 12:05:41 +0200 Subject: [Biococoa-dev] Base design In-Reply-To: References: Message-ID: <2AD8479E-EC47-11D8-8352-000393CFDE0C@nki.nl> Did you guys get my (very long) email in the end? I got a message that the list moderator should approve it first because it was too long ;-) If not, I'll send it again, although Koen gave a nice summary already.
I would say that our setup could match that of BioJava as follows: Symbol -> BCSymbol (which could be devided in BCAtomicSymbol, BCAmbigousSymbol/BCBasisSymbol) Alphabet -> BCAlphabet Symbollist -> BCSequence Alex Op 12-aug-04 om 3:00 heeft Koen van der Drift het volgende geschreven: > > On Aug 11, 2004, at 12:07 PM, John Timmer wrote: > >> BCSequenceDNABase - >> An abstract superclass, allows for easy testing of a sequence for DNA >> content. >> > > > I am not sure what this class would do. Is it to represent a sequence > of bases, or is it to represent one base? Maybe we should start with > the abstract BCSequence class, and the implementation of the singleton > pattern for symbols. After that we can write the various subclasses, > etc. > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From jtimmer at bellatlantic.net Thu Aug 12 08:31:20 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 12 Aug 2004 08:31:20 -0400 Subject: [Biococoa-dev] Base design In-Reply-To: Message-ID: > > I am not sure what this class would do. Is it to represent a sequence > of bases, or is it to represent one base? Maybe we should start with > the abstract BCSequence class, and the implementation of the singleton > pattern for symbols. After that we can write the various subclasses, > etc. 
Okay, I learned a couple of things by trying them out last night, and I've got a clearer idea of what I'm going to try to do. The basic ideas are two-fold: convenience and readable code. For convenience, I've created a sort of "super-header", BCSequenceDNABases.h. Importing this will in turn import all the individual base headers. BCSequenceDNABase is going to be the abstract class. The readability factor comes from being able to take any object and do: if ( [anObject isKindOfClass: [BCSequenceDNABase class]] ) etc. Instead of having to test for each individual class, or make assumptions about responding to selectors and such (incidentally, I'm assuming the runtime system handles generating unique class IDs - if I'm wrong, please let me know). I was going to leave the class more or less empty, but then I realize I actually have to fill it with empty methods in order to suppress compiler warnings. Given that I have to put methods there, I think I'm going to move all the singleton generation methods into the base class. I think (though I'm not positive), that it'll be simpler to call [BCSequenceDNABase adenine]; [BCSequenceDNABase thymidine]; Than having to call a different class for each base. I should have an example coded with ATCGN before the end of the day, at which point it'll be easier for you to tell me what I've done wrong ;). John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Aug 12 09:43:16 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 12 Aug 2004 15:43:16 +0200 Subject: [Biococoa-dev] Base design In-Reply-To: References: Message-ID: <902FD869-EC65-11D8-A053-000393CFDE0C@mekentosj.com> > The basic ideas are two-fold: convenience and readable code. For > convenience, I've created a sort of "super-header", > BCSequenceDNABases.h. > Importing this will in turn import all the individual base headers. Ok, got it now, that seems a good idea. 
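
The pattern John describes earlier in this exchange (class methods on the abstract base that hand back one shared instance per base, plus an isKindOfClass: test) could be sketched roughly as follows. This is a minimal illustration, not the code that was attached to the thread: it collapses the per-base subclasses into a single class for brevity, and -initWithSymbol:, -symbol and BCIsDNABase() are hypothetical names introduced only for this example. On the question about class IDs: the Objective-C runtime keeps exactly one class object per class, so comparing against [BCSequenceDNABase class] is reliable.

```objc
#import <Foundation/Foundation.h>

// Sketch only: BCSequenceDNABase and the accessor names come from the thread;
// -initWithSymbol:, -symbol and BCIsDNABase() are hypothetical helpers.
@interface BCSequenceDNABase : NSObject
+ (BCSequenceDNABase *)adenine;
+ (BCSequenceDNABase *)thymidine;
- (unichar)symbol;
@end

@implementation BCSequenceDNABase {
    unichar _symbol;   // one-letter code, e.g. 'A'
}

- (id)initWithSymbol:(unichar)aSymbol {
    self = [super init];
    if (self != nil) {
        _symbol = aSymbol;
    }
    return self;
}

- (unichar)symbol {
    return _symbol;
}

// Each accessor creates its instance on first use and returns the same
// pointer on every later call.
+ (BCSequenceDNABase *)adenine {
    static BCSequenceDNABase *shared = nil;
    @synchronized (self) {
        if (shared == nil) {
            shared = [[BCSequenceDNABase alloc] initWithSymbol:'A'];
        }
    }
    return shared;
}

+ (BCSequenceDNABase *)thymidine {
    static BCSequenceDNABase *shared = nil;
    @synchronized (self) {
        if (shared == nil) {
            shared = [[BCSequenceDNABase alloc] initWithSymbol:'T'];
        }
    }
    return shared;
}

@end

// One test covers every base, because all of them descend from the same root.
static BOOL BCIsDNABase(id anObject) {
    return [anObject isKindOfClass:[BCSequenceDNABase class]];
}
```

Because every call returns the same pointer, a sequence only ever stores references to a handful of shared objects, and pointer comparison is enough to tell two bases apart.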
> Given that I have to put methods there, I think I'm going to move all the
> singleton generation methods into the base class. I think (though I'm not
> positive), that it'll be simpler to call
> [BCSequenceDNABase adenine];
> [BCSequenceDNABase thymidine];
>
> Than having to call a different class for each base.

I agree, that sounds really simple and elegant.

> I should have an example coded with ATCGN before the end of the day, at
> which point it'll be easier for you to tell me what I've done wrong ;).

Great! Looking forward to that, John!

> Yes, I did receive it, I'm just not convinced ;). I just think that atomic
> vs. ambiguous is a bit of a false distinction. 95% of the time, for coding
> convenience, you're going to want the two types of base to behave
> identically - by this I mean that it's easier to have both types respond to
> the same selectors so that loops can just plow through an entire sequence
> without checking what base it's dealing with.
> A lot of the remaining 5% mostly encompasses testing whether it belongs to
> atomic or ambiguous, which a single method will handle.

The nice thing about two classes is that you can add specific behaviour to the ambiguous one, and have the superclass contain all shared methods (which still allows most methods to plow through the complete sequence, as the methods they rely on are inherited from the BCSymbol class). Still, the question really is "how many extra methods are there to put in?". Looking at BioJava, I couldn't find any that quickly, so I guess you are right, and that would not justify more complexity compared to one method like (BOOL)isAmbiguous; In fact, if you call (NSArray *)getMatches; and it returns more than one entry you know already, but a boolean convenience method is, well, really convenient ;-) I guess it's a matter of taste in the end, and maybe there isn't a "one is better than the other" here.
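
The single-class variant discussed above, where ambiguity is derived from what a symbol matches rather than from its class, might look like the sketch below. It is an illustration under assumptions: BCSketchBase, -initWithName:matches: and -name are hypothetical, and only -matches, -isAmbiguous and the "represents" test mirror ideas mentioned in the thread. An atomic base is modelled as one whose match list contains only itself.

```objc
#import <Foundation/Foundation.h>

// Sketch only: the class name and initializer are hypothetical.
@interface BCSketchBase : NSObject
- (id)initWithName:(NSString *)name matches:(NSArray *)atomicBases;
- (NSString *)name;
- (NSArray *)matches;       // atomic bases this symbol can stand for
- (BOOL)isAmbiguous;        // YES when more than one base matches
- (BOOL)representsBase:(BCSketchBase *)entry;
@end

@implementation BCSketchBase {
    NSString *_name;
    NSArray *_matches;      // nil for atomic bases
}

- (id)initWithName:(NSString *)name matches:(NSArray *)atomicBases {
    self = [super init];
    if (self != nil) {
        _name = [name copy];
        _matches = [atomicBases copy];
    }
    return self;
}

- (NSString *)name {
    return _name;
}

// Atomic bases answer with a one-element array holding themselves; it is
// built on demand so the object never has to store a reference to itself.
- (NSArray *)matches {
    if (_matches != nil) {
        return _matches;
    }
    return [NSArray arrayWithObject:self];
}

- (BOOL)isAmbiguous {
    return [[self matches] count] > 1;
}

// -containsObject: falls back to NSObject's pointer-based -isEqual:, which
// is sufficient as long as every base is a shared instance.
- (BOOL)representsBase:(BCSketchBase *)entry {
    return [[self matches] containsObject:entry];
}

@end
```

With this shape, loops over a sequence can send the same messages to every element, and the atomic/ambiguous distinction costs one boolean method instead of a second class.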
> If you could convince me otherwise (you have in the past - witness the > fact > that I'm writing base classes at all), i'll happily reconsider. Unless somebody else has a reason why two classes would be better then one, I'm not gonna. > PS - Overall, I think a lot of things in BioJava are great, since a > lot of > bright people have put serious thought into these things, but Java is a > statically typed language, and there's going to be some differences > based on > ObjC's flexibility. There's also going to be differences because the > foundation toolset makes different things easier in Cocoa than in Java. You're absolutely right, it's essential to leverage the advantage of Cocoa over those of Java, but it's nice to have a source of inspiration at hand. > The key thing will be recognizing when I'm doing things differently > just for the > sake of being different, so let me know if you think I'm doing that. That's the nice thing of being with more than one ;-) Cheers, Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From jtimmer at bellatlantic.net Thu Aug 12 13:42:44 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 12 Aug 2004 13:42:44 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <902FD869-EC65-11D8-A053-000393CFDE0C@mekentosj.com> Message-ID: I've got a very busy afternoon/evening, so I won't be able to do anything much with these, but I thought I'd send them around. Currently, things compile without an error, but I haven't tried to start using the classes, so anything's possible. 
I think moving the arrays used in the anyBase class into permanent values might be a good idea, so the arrays don't have to be re-created with every method call.

Going any further is going to require a lot of unrewarding copying and pasting, plus updating the root and "any nucleotide" classes, so it'll be slow (I'll force myself to do at least one a day).

Some ideas for other methods:
Have all BC classes that represent something implement a "savableRepresentation" method that returns its information in a format that can be stored in a dictionary/array file.

Also
- (BOOL) complementsBase: (BCSequenceDNABase *)entry;
- (BOOL) representsBase: (BCSequenceDNABase *) entry;
Would help with some search routines.

The more we decide we want now, the less copying/pasting I'll have to do.

Cheers,

John

_______________________________________________
This mind intentionally left blank
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCSequenceDNA.zip
Type: application/octet-stream
Size: 12839 bytes
Desc: not available
URL:
From kvddrift at earthlink.net Thu Aug 12 18:50:56 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 12 Aug 2004 18:50:56 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net>

John,

Why do you have a separate class for each base? I don't think that's how a singleton is supposed to work (although I don't have an example ready now that shows you how it *should* work :-). You could just have a BCSequenceUnit (or whatever we decide to call it) class and create a base by e.g. passing only a name. All the info about each base can be stored in an external plist file. The same BCSequenceUnit class can be used to create other structural objects. This way most code for getting the name and properties is written only once, instead of being repeated for each type of sequence unit.
So for instance:

BCSequenceUnit
    |
    |
    ----------------BCBase
    |
    |
    ----------------BCAminoAcid
    |
    |
    ----------------BCFunctionalGroup
    |
    |
    ----------etc

I also suggest that BCSequenceDNA should be a subclass of BCSequence, just as BCBase/BCSequenceDNABase should be a subclass of BCSequenceUnit:

BCSequence
    |
    |
    ----------------BCSequenceDNA
    |
    |
    ----------------BCSequenceProtein

- Koen.

From mek at mekentosj.com Thu Aug 12 19:28:14 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 01:28:14 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net>
References: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net>
Message-ID: <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com>

I agree, this setup would indeed save a lot of work and keep things simpler. The plist with data is a great idea as well; in fact not much changes, as you can still call [BCSequenceUnit adenosine] as a class method which would invoke BCSequence init and get all the rest filled in from the plist (which is easily extendible as well). If the requested symbol name is not found in the plist, an exception is raised or an error generated and no instance is returned. If we don't use this setup, adding or updating a single method in the 60 classes we would get would quickly become a nightmare.

Could we agree on the following names? If anyone prefers another one, let us know, but it would be nice if we can decide what to take (saves a lot of typing ;-)

BCSymbol (instead of BCSequenceUnit)
    |
    |
    ------------ BCNucleotide (instead of BCBase)
    |
    |
    ------------ BCAminoAcid
    |
    |
    ------------ etc (think BCSaccharide and the like)

I think BCFunctionalGroup should be a feature (like phosphates) and not a symbol, but perhaps I am wrong here.

For sequences I would propose to take the names Koen already mentioned:

BCSequence
    |
    |
    ----------------BCSequenceDNA
    |
    |
    ----------------BCSequenceProtein

I think this all makes life a bit simpler to implement, John.
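
As a rough illustration of the data-driven setup described above, the sketch below builds shared symbol objects from a table instead of from one class per base. Everything here is an assumption made for the example: the class name BCSketchSymbol, the +symbolNamed: method, the "symbol" and "complement" keys, and the in-memory dictionary that stands in for the plist. Unknown names simply return nil in this version; raising an exception would be the other option mentioned in the thread.

```objc
#import <Foundation/Foundation.h>

// Sketch only. BCSketchSymbol, +symbolNamed:, the dictionary keys, and the
// in-memory table (a stand-in for the plist) are all hypothetical.
@interface BCSketchSymbol : NSObject
+ (BCSketchSymbol *)symbolNamed:(NSString *)name;   // nil if the name is unknown
- (NSString *)name;
- (NSString *)symbolString;
- (BCSketchSymbol *)complement;                     // nil if none is listed
@end

// Stand-in for the parsed plist: one entry per symbol. Other symbols are
// referenced by name only, so no entry needs a live object at load time.
static NSDictionary *BCSketchSymbolTable(void) {
    return [NSDictionary dictionaryWithObjectsAndKeys:
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"A", @"symbol", @"thymidine", @"complement", nil], @"adenosine",
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"T", @"symbol", @"adenosine", @"complement", nil], @"thymidine",
        nil];
}

@implementation BCSketchSymbol {
    NSString *_name;
    NSDictionary *_info;
}

- (id)initWithName:(NSString *)name info:(NSDictionary *)info {
    self = [super init];
    if (self != nil) {
        _name = [name copy];
        _info = [info copy];
    }
    return self;
}

// Returns the one shared instance for a name, creating it on first request.
+ (BCSketchSymbol *)symbolNamed:(NSString *)name {
    static NSMutableDictionary *cache = nil;
    BCSketchSymbol *symbol = nil;
    @synchronized (self) {
        if (cache == nil) {
            cache = [[NSMutableDictionary alloc] init];
        }
        symbol = [cache objectForKey:name];
        if (symbol == nil) {
            NSDictionary *info = [BCSketchSymbolTable() objectForKey:name];
            if (info != nil) {
                symbol = [[BCSketchSymbol alloc] initWithName:name info:info];
                [cache setObject:symbol forKey:name];
            }
        }
    }
    return symbol;
}

- (NSString *)name {
    return _name;
}

- (NSString *)symbolString {
    return [_info objectForKey:@"symbol"];
}

// The complement is looked up by name when asked for, not during init.
- (BCSketchSymbol *)complement {
    NSString *complementName = [_info objectForKey:@"complement"];
    if (complementName == nil) {
        return nil;
    }
    return [BCSketchSymbol symbolNamed:complementName];
}

@end
```

Storing references to other symbols as names and resolving them on demand keeps the table plain data, at the cost of one dictionary lookup per call; convenience class methods such as +adenosine could be thin wrappers around the name-based lookup.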
I like the addition of the proposed equality testing methods you mention: - (BOOL) complementsNucleotide: (BCNucleotide *)entry; (would be BCNucleotide specific) - (BOOL) representsSymbol: (BCSymbol *) entry; (do you mean here to test whether this base belongs to an ambiguous group?) And would like to add: - (BOOL) isEqualToSymbol: (BCSymbol *) entry; To implement "savability" we would just have to implement the NSCoding protocol, that allows you to save arrays/dictionaries to contain these objects. Although it would perhaps be nice if there was a method to output these to some kind of plist like format in addition. Finally, you don't need to implement the "superheader file" as a class, you can simply write an empty header file containing the links to the others leaving the rest empty (so you don't need the warning not to use the class). This would be enough: ******************************* // // BCSequenceDNABases.h // BioCocoa // // Created by John Timmer on 8/12/04. // Copyright 2004 John Timmer. All rights reserved. // #import "BCSequenceDNABase.h" #import "BCSequenceDNAAdenine.h" #import "BCSequenceDNAThymidine.h" #import "BCSequenceDNACytidine.h" #import "BCSequenceDNAGuanidine.h" #import "BCSequenceDNAAnyBase.h" ******************************* What do you think John, would the setup proposed by Koen be easy to implement? I'm curious what you think. Cheers, Alex Op 13-aug-04 om 0:50 heeft Koen van der Drift het volgende geschreven: > John, > > Why do you have a separate class for each base? Idon;t think that's > how a singleton is supposed to work (although I don't have an example > ready now that shows you how it *should* work :-). You could just > have a BCSequenceUnit (or whatever we decide to call it) class and > create a base by eg passing only a name. All the info about each base > can be stored in an external plist file. The same BCSequenceUnit class > can be used to create other structural objects. 
This way most code > for getting name, properties is written only once, instead repeated > for each type of sequence unit. > > > So for instance: > > BCSequenceUnit > | > | > ----------------BCBase > | > | > ----------------BCAminoAcid > | > | > ----------------BCFunctionalGroup > | > | > ----------etc > > > I also suggest that BCSequenceDNA should be a subclass of BCSequence, > just as BCBase /BCSequenceDNABase should be a subclass of > BCSequenceUnit: > > BCSequence > | > | > ----------------BCSequenceDNA > | > | > ----------------BCSequenceProtein > > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 5235 bytes Desc: not available URL: From peter.schols at bio.kuleuven.ac.be Fri Aug 13 03:20:20 2004 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Fri, 13 Aug 2004 09:20:20 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net> References: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net> Message-ID: <3C2DB4A0-ECF9-11D8-9A4C-003065D0AD9E@bio.kuleuven.ac.be> Hi Koen and others, This seems like a very good architecture to me. The naming scheme makes sense as well, so maybe - as Alexander is suggesting - we should try to agree on this one. 
My only concern at this point is that a new object will be created for every base. This might lead to performance problems with huge datasets (e.g. entire genomes). On the other hand, the other Bio frameworks seem to use a similar implementation so if it works in Java, it should definitely work in Cocoa ;-) peter On 13 Aug 2004, at 00:50, Koen van der Drift wrote: > John, > > Why do you have a separate class for each base? Idon;t think that's > how a singleton is supposed to work (although I don't have an example > ready now that shows you how it *should* work :-). You could just > have a BCSequenceUnit (or whatever we decide to call it) class and > create a base by eg passing only a name. All the info about each base > can be stored in an external plist file. The same BCSequenceUnit class > can be used to create other structural objects. This way most code > for getting name, properties is written only once, instead repeated > for each type of sequence unit. > > > So for instance: > > BCSequenceUnit > | > | > ----------------BCBase > | > | > ----------------BCAminoAcid > | > | > ----------------BCFunctionalGroup > | > | > ----------etc > > > I also suggest that BCSequenceDNA should be a subclass of BCSequence, > just as BCBase /BCSequenceDNABase should be a subclass of > BCSequenceUnit: > > BCSequence > | > | > ----------------BCSequenceDNA > | > | > ----------------BCSequenceProtein > > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev From peter.schols at bio.kuleuven.ac.be Fri Aug 13 03:57:49 2004 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Fri, 13 Aug 2004 09:57:49 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: References: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net> <3C2DB4A0-ECF9-11D8-9A4C-003065D0AD9E@bio.kuleuven.ac.be> Message-ID: <78CCC86C-ECFE-11D8-9A4C-003065D0AD9E@bio.kuleuven.ac.be> Hi Alex, Oops, it seems like I've misinterpreted the meaning of a singleton (although I've used it a few times myself in my own apps). I'm just now reading through the backlog of list messages as we are currently rebuilding our house and moving everything to it. Anyway, thanks for the link, this removes my only concern ;-) peter On 13 Aug 2004, at 09:45, Alexander Griekspoor wrote: > Peter, > > Just to recapitulate, the BCSymbol classes are singleton objects, > meaning that they are instantiated only once no matter how many time > you ask for one. Everytime you call for a BCSymbol you get the same > instance back. At the end the only thing you are storing is a single > pointer instead of an object. This is the same way Cocoa's > sharedDefaultManager works and pretty much every object that has > "shared" as a prefix. The other bio frameworks have tackled the > problem using the same approach. Check out this link: > http://www.biojava.org/tutorials/chap1.html (scroll to "Doesn't this > waste all memory?" for an explanation. > > Alex > > Op 13-aug-04 om 9:20 heeft Peter Schols het volgende geschreven: > >> Hi Koen and others, >> >> This seems like a very good architecture to me. >> The naming scheme makes sense as well, so maybe - as Alexander is >> suggesting - we should try to agree on this one. >> My only concern at this point is that a new object will be created >> for every base. 
This might lead to performance problems with huge >> datasets (e.g. entire genomes). On the other hand, the other Bio >> frameworks seem to use a similar implementation so if it works in >> Java, it should definitely work in Cocoa ;-) >> >> peter >> >> On 13 Aug 2004, at 00:50, Koen van der Drift wrote: >> >>> John, >>> >>> Why do you have a separate class for each base? Idon;t think that's >>> how a singleton is supposed to work (although I don't have an >>> example ready now that shows you how it *should* work :-). You >>> could just have a BCSequenceUnit (or whatever we decide to call it) >>> class and create a base by eg passing only a name. All the info >>> about each base can be stored in an external plist file. The same >>> BCSequenceUnit class can be used to create other structural objects. >>> This way most code for getting name, properties is written only >>> once, instead repeated for each type of sequence unit. >>> >>> >>> So for instance: >>> >>> BCSequenceUnit >>> | >>> | >>> ----------------BCBase >>> | >>> | >>> ----------------BCAminoAcid >>> | >>> | >>> ----------------BCFunctionalGroup >>> | >>> | >>> ----------etc >>> >>> >>> I also suggest that BCSequenceDNA should be a subclass of >>> BCSequence, just as BCBase /BCSequenceDNABase should be a subclass >>> of BCSequenceUnit: >>> >>> BCSequence >>> | >>> | >>> ----------------BCSequenceDNA >>> | >>> | >>> ----------------BCSequenceProtein >>> >>> >>> - Koen. 
>>> >>> _______________________________________________ >>> Biococoa-dev mailing list >>> Biococoa-dev at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > Mac vs Windows > 65 million years ago, there were more > dinosaurs than humans. > Where are the dinosaurs now? > > ********************************************************* > From jtimmer at bellatlantic.net Fri Aug 13 10:57:27 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 13 Aug 2004 10:57:27 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net> Message-ID: This is the first of two messages, since I think we need to split things into separate topics. > Why do you have a separate class for each base? Idon;t think that's how > a singleton is supposed to work (although I don't have an example ready > now that shows you how it *should* work :-). You could just have a > BCSequenceUnit (or whatever we decide to call it) class and create a > base by eg passing only a name. All the info about each base can be > stored in an external plist file. The same BCSequenceUnit class can be > used to create other structural objects. This way most code for > getting name, properties is written only once, instead repeated for > each type of sequence unit. 
Okay, the singleton aspect comes from the fact that every reference to Adenine anywhere in the program points to the same object. I'm sure there's other ways to define singletons (some of which may be more technically correct), but I think this is a reasonably valid one. Just a general technical note on the sequenceunit idea if we decide to go that way - we'd have to split nucleotides and everything else somewhere in the inheritance chain, since they have the concept of a complement that pretty much everything else lacks. I'm intrigued by the idea of a template .plist file, since it would definitely save a ton of work. The problem I see with it is that its really difficult to store a reference to another base. I think it would require all lookups of other bases to be done by symbol, since you can't store a selector name as a string and then call a class method with it. That means that every time we need to complement a base, find the atomic base representations, etc., we have to run through a switch/case with about 20 possibilities, in many cases doing so several times: + (id) baseForSymbol: (unichar)symbol { switch ( symbol ) { case 'A' : { return [BCSequenceDNABase adenine]; break; } case 'T' : { return [BCSequenceDNABase thymidine]; break; } case 'C' : { return [BCSequenceDNABase cytidine]; break; } case 'G' : { return [BCSequenceDNABase guanidine]; break; } case 'N' : { return [BCSequenceDNABase anyBase]; break; } // still have to add ambiguous bases, gaps, and a non-base holder default : return nil; } } I think it's a question of work up front vs. efficiency in use, but I could have missed some way of implementing this which gets around a lookup table. On the plus side, it's pointed out a serious deficiency in something I was thinking of doing. During base initialization, I was going to create the arrays of complements and representations. 
This means every time the first base was created, it would necessarily create its complement, all relevant ambiguous bases, etc. What I neglected to consider is that those would, in turn, try to get a reference to the original base, which would still be in the process of initialization. Ugly. I think I'll be moving array creation into the method that needs the array the first time it's called... > Finally, you don't need to implement the "superheader file" as a class, you > can simply write an empty header file containing the links to the others > leaving the rest empty (so you don't need the warning not to use the class). > This would be enough: Yeah, I was just having some trouble with Xcode not seeing the superheader file at all, so I figured I was doing something wrong, rather than Xcode misbehaving. Re-creating the header/.m file combination fixed it, and I didn't want to risk changing things again while it was working. JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Fri Aug 13 11:32:08 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 13 Aug 2004 11:32:08 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com> Message-ID: > > Could we agree on the following names? If anyone prefers another one, let it > know, but it would be nice if we can decide what to take (saves a lot of > typing ;-) > > BCSymbol (instead of BCSequenceUnit) > | > | > ------------ BCNucleotide (instead of BCBase) > | > | > ------------ BCAminoAcid > | > | > ------------ etc (think BCSaccharide and alikes) > > I think BCFunctionalGroup should be a feature (like phosphates) and not a > symbol, but perhaps I am wrong here. 
>
> For sequences I would propose to take the names Koen already mentioned:
>
> BCSequence
>     |
>     |
>     ----------------BCSequenceDNA
>     |
>     |
>     ----------------BCSequenceProtein
>

Okay, the BCSequenceDNA was just a first stab at thinking of the methods we'd want. Once I filled them in, I'd think about which ones would work generally across all sequences, and move them to the superclass, which I haven't defined yet. What you saw was the twistings of my mind in action ;)

That said, mentally I was reserving BCSequence as the wrapper for a type of sequence and all the additions - features, format conversion, etc. I was thinking more along the lines of:

                    BCSequenceGeneric
                    |
BCSequence          |
Contains an         |
Instance of ->      ----------------BCSequenceDNA
                    |
                    |
                    ----------------BCSequenceProtein

That said, it would violate the naming conventions I'd argued for, so I hereby reject my own thinking. Given that, what do we call the wrapper object on the left then?

>
> I think this all makes life a bit simpler to implement John. I like the
> addition of the proposed equality testing methods you mention:
> - (BOOL) complementsNucleotide: (BCNucleotide *)entry; (would be BCNucleotide
> specific)
> - (BOOL) representsSymbol: (BCSymbol *) entry; (do you mean here to test
> whether this base belongs to an ambiguous group?)
> And would like to add:
> - (BOOL) isEqualToSymbol: (BCSymbol *) entry;
>

Wouldn't NSObject's "isEqual" method work just as well here?

And the second method (which I wrote this morning) looks like:

- (BOOL) representsBase: (BCSequenceDNABase *) entry {
    if ( [[self matches] containsObject: entry] )
        return YES;
    return NO;
}

It works if you have an ambiguous base as self, and get handed another base.

JT

_______________________________________________
This mind intentionally left blank
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jtimmer at bellatlantic.net Fri Aug 13 11:41:16 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 13 Aug 2004 11:41:16 -0400
Subject: [Biococoa-dev] Oh, one other thing
In-Reply-To: <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com>
Message-ID:

> To implement "savability" we would just have to implement the NSCoding
> protocol, that allows you to save arrays/dictionaries to contain these
> objects. Although it would perhaps be nice if there was a method to output
> these to some kind of plist like format in addition.

Actually, I'd argue against having this be the primary way we store things. Coding is a way of storing objects, and is only useful to other Cocoa programs. What I'd like to create is a format that's also useful outside of BioCocoa, so that it's useful on other platforms, using other frameworks, in other programming languages, etc.

In the end, pretty much everything else out there views sequence information as a string of single-character symbols - there's no reason our primary storage mechanism shouldn't be the same way, to ensure maximal interoperability.

John

By the way, I notice I'm presenting a lot of arguments for or against things. I want to make sure that those who do not have English as their primary language recognize that when I use the term "argue" I mean it in the sense of presenting a compelling reason, rather than the sense of getting in an upset yelling match ;)

_______________________________________________
This mind intentionally left blank
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jtimmer at bellatlantic.net Fri Aug 13 13:39:44 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 13 Aug 2004 13:39:44 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <569E8BBA-ED4D-11D8-BB73-000393CFDE0C@mekentosj.com>
Message-ID:

I was aware of the foundation functions that take strings for arguments and return selectors/classes.
Unfortunately, if we're going to use singletons, we'd need to call Class methods to create them, not instance methods, and I'm not aware of any mechanism for calling a selector on a Class.

One option that has occurred to me is to create a generic "baseGenerator", which you could instantiate, and then call the selectors on in order to generate all the classes. Not ideal, but functional, and the implementation would be hidden from the end users, so they'd have to download the code in order to discover how inelegant we are ;)

JT

_______________________________________________
This mind intentionally left blank
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mek at mekentosj.com Fri Aug 13 13:41:29 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 19:41:29 +0200
Subject: [Biococoa-dev] RE: Nomenclature
Message-ID: <01FE463B-ED50-11D8-BB73-000393CFDE0C@mekentosj.com>

Part 2:

> Okay, the BCSequenceDNA was just a first stab at thinking of the
> methods we'd want. Once I filled them in, I'd think about which ones
> would work generally across all sequences, and move them to the
> superclass, which I haven't defined yet. What you saw was the
> twistings of my mind in action ;)

I know how that goes when you're coding.... ;-) Don't take a look at some of my methods, often I have a lot of renaming to do for the sake of being a bit more logical...

> That said, mentally I was reserving BCSequence as the wrapper for a
> type of sequence and all the additions - features, format conversion,
> etc.
> I was thinking more along the lines of:
>
>                     BCSequenceGeneric
>                     |
> BCSequence          |
> Contains an         |
> Instance of ->      ----------------BCSequenceDNA
>                     |
>                     |
>                     ----------------BCSequenceProtein
>
> That said, it would violate the naming conventions I'd argued for, so
> I hereby reject my own thinking. Given that, what do we call the
> wrapper object on the left then?

I was thinking of the following two options, but perhaps someone will come up with something much better.

BCRecord - Based on a record from a nucleotide database like entrez. (BCEntry would be another variant.) Advantage: familiar setup (sequence, name, features, etc). Disadvantage: sometimes it's a bit strange to call things a record; if you're cloning, for instance, you're not messing around with BCRecords. But perhaps in this case you are busy with BCFragments, derived from a BCRecord, so it's not that bad after all.

BCEntity - My favorite. BCUnit is too boring, but a more general, "building block of our framework" name would perhaps be appropriate here. Anyway, you might be laughing at it right now...

> I think this all makes life a bit simpler to implement John. I like
> the addition of the proposed equality testing methods you mention:
> - (BOOL) complementsNucleotide: (BCNucleotide *)entry; (would be
> BCNucleotide specific)
> - (BOOL) representsSymbol: (BCSymbol *) entry; (do you mean here to
> test whether this base belongs to an ambiguous group?)
> And would like to add:
> - (BOOL) isEqualToSymbol: (BCSymbol *) entry;
>
> Wouldn't NSObject's "isEqual" method work just as well here?

I guess you are right because we use singleton objects.

> And the second method (which I wrote this morning) looks like:
> - (BOOL) representsBase: (BCSequenceDNABase *) entry {
>     if ( [[self matches] containsObject: entry] )
>         return YES;
>     return NO;
> }

Really elegant!

> It works if you have an ambiguous base as self, and get handed another
> base.

In fact that works with every base, also the atomic ones, as their -(NSArray *)matches; method should return only one entry: self.

Almost there.....
A.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better.
So I got a Macintosh.
*********************************************************

From mek at mekentosj.com Fri Aug 13 13:42:12 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 19:42:12 +0200
Subject: [Biococoa-dev] Oh, one other thing
In-Reply-To:
References:
Message-ID: <1BF879FA-ED50-11D8-BB73-000393CFDE0C@mekentosj.com>

I fully agree on this John, still I would plead for implementing the NSCoding protocol as well, as it makes sure that our objects can also be stored as objects. Often this makes it easier to save if you're not interested in exchange. In addition, it brings compatibility with a number of other Cocoa methods as a bonus.

> Actually, I'd argue against having this be the primary way we store
> things. Coding is a way of storing objects, and is only useful to
> other Cocoa programs. What I'd like to create is a format that's also
> useful outside of BioCocoa, so that it's useful on other platforms,
> using other frameworks, in other programming languages, etc.

For the primary way of storing things we indeed need some kind of text-based storage.
I would strongly argue in favour of using XML for this, as it is (considered to be) the future. That would also play nicely with Core Data and storage in databases.

> By the way, I notice I'm presenting a lot of arguments for or against
> things. I want to make sure that those who do not have English as
> their primary language recognize that when I use the term "argue" I
> mean it in the sense of presenting a compelling reason, rather than
> the sense of "getting in an upset yelling match" ;)

I would like to echo this, especially as I am not a native speaker, which could even more easily lead to misunderstanding of my sentences (that in addition to the fact that I'm known for writing remarks that often sound cynical ;-). I would just like to add that I very much enjoy the discussions we have, in particular that everyone can say whatever he likes or doesn't like about the points brought up. It's nice to be part of the enthusiastic team we have seen so far!

Cheers,
Alex

From james.balhoff at duke.edu Fri Aug 13 14:34:35 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 14:34:35 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID: <6D619D19-ED57-11D8-90CF-000A95EECBDE@duke.edu>

Hi John,

On Aug 13, 2004, at 1:39 PM, John Timmer wrote:

> I was aware of the foundation functions that take strings for
> arguments and return selectors/classes. Unfortunately, if we're going
> to use singletons, we'd need to call Class methods to create them, not
> instance methods, and I'm not aware of any mechanism for calling a
> selector on a Class.

I'm not exactly sure what you mean here - are you asking how to call a class method? These are just declared with a "+" instead of a "-" in the header. So you would just send a message to the class itself (no need to allocate it or anything). Like:

    BCNucleotide *myG = [BCNucleotide gNucleotide];

The BCNucleotide class method would either allocate a new G instance, or return one it already made.

Sorry I haven't been participating much in the design discussions - I'm trying to write a dissertation here! I am hoping to have lots of time for coding in a couple of months.

- Jim

____________________________________________
James P. Balhoff
Dept. of Biology
Duke University
Durham, NC 27708-0338
USA

From mek at mekentosj.com Fri Aug 13 14:38:23 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 20:38:23 +0200
Subject: [Biococoa-dev] Base test
Message-ID:

> I was aware of the foundation functions that take strings for
> arguments and return selectors/classes.
> Unfortunately, if we're going to use singletons, we'd need to call
> Class methods to create them, not instance methods, and I'm not aware
> of any mechanism for calling a selector on a Class.

Oops, I see the problem now. I can get quite far by getting the class method's location:

    NSString *str = @"callMe"; // example string, normally from plist
    class_getClassMethod([BCSymbol class], selector); // this should get you a pointer to the method

The big problem, and where I get stuck, is how to "invoke" this method on the class. Here is where my knowledge and background let me down ;-(

> One option that has occurred to me is to create a generic
> "baseGenerator", which you could instantiate, and then call the
> selectors on in order to generate all the classes. Not ideal, but
> functional, and the implementation would be hidden from the end users,
> so they'd have to download the code in order to discover how inelegant
> we are ;)

I guess that would be the best option then, a factory object. Does anyone have a brilliant idea?

Alex

From mek at mekentosj.com Fri Aug 13 14:46:35 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 20:46:35 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <6D619D19-ED57-11D8-90CF-000A95EECBDE@duke.edu>
References: <6D619D19-ED57-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID: <1A7F6CB2-ED59-11D8-BB73-000393CFDE0C@mekentosj.com>

Well, the problem really is that we get the name of the method from a plist (the plist would contain all the complementary bases), so as an NSString. The trick is how to convert the name into a selector that we can call. Another complicating factor is that the performSelector: method is an NSObject instance method, so it won't work on a class. I have found an ObjC function that gets you to the method location (it works, because in my simple test it did not return NULL). But how does the ObjC runtime invoke a class method? My knowledge of the runtime architecture is clearly insufficient here...

Alex

PS. I could post this on the cocoa-dev list of course, good idea?
From james.balhoff at duke.edu Fri Aug 13 14:51:21 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 14:51:21 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <6D619D19-ED57-11D8-90CF-000A95EECBDE@duke.edu>
References: <6D619D19-ED57-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID:

Sorry to reply to my own message - I just read the earlier stuff more closely! My apologies for completely misunderstanding. I'll think about this some more, and probably go back to my writing...

- Jim

From james.balhoff at duke.edu Fri Aug 13 15:03:48 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 15:03:48 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>

On Aug 13, 2004, at 2:38 PM, Alexander Griekspoor wrote:

>> I was aware of the foundation functions that take strings for
>> arguments and return selectors/classes. Unfortunately, if we're
>> going to use singletons, we'd need to call Class methods to create
>> them, not instance methods, and I'm not aware of any mechanism for
>> calling a selector on a Class.
>
> Oops, see the problem now. I can come quite far by getting the class
> method's location:
>     NSString *str = @"callMe"; // example string, normally from plist
>     class_getClassMethod([BCSymbol class], selector); // this should get you a pointer to the method
>
> The big problem here and where I get stuck is how to "invoke" this
> method on the class?
> Here is where my knowledge and background let me down ;-(

Could you use this function?

    id objc_msgSend(id theReceiver, SEL theSelector, ...)

Classes are objects, so it seems like it would work.

From mek at mekentosj.com Fri Aug 13 15:15:03 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 21:15:03 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID: <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>

> Classes are objects, so it seems like it would work.

I'm not so sure about that, but you are probably right. Anyway, the description says:

    objc_msgSend
    Sends a message with a simple return value to an instance of a class.

So I'm afraid it doesn't work on the class itself. I couldn't get it to work either, but perhaps I'm doing something else wrong here...

Cheers,
Alex

From james.balhoff at duke.edu Fri Aug 13 15:22:16 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 15:22:16 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
Message-ID: <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>

On Aug 13, 2004, at 3:15 PM, Alexander Griekspoor wrote:

>> Classes are objects, so it seems like it would work.
>
> I'm not so sure about that, but probably you are right. Anyway, the
> description says:
>
> objc_msgSend
>
> Sends a message with a simple return value to an instance of a class.
>
> So I'm afraid it doesn't work on the class itself. I couldn't get it
> to work either, but perhaps I'm doing something else wrong here...

It works, try this:

**************************** main.m ****************************
#import <Foundation/Foundation.h>
#import <objc/objc-runtime.h>   // declares objc_msgSend
#import "MyClass.h"

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    // Build the selector from a string at run time
    SEL message = NSSelectorFromString(@"giveMeAnA");
    // The receiver is the class object itself, not an instance
    NSString *result = objc_msgSend([MyClass class], message);
    NSLog(@"%@", result);
    [pool release];
    return 0;
}
**************************** main.m ****************************

**************************** MyClass.h *************************
#import <Foundation/Foundation.h>

@interface MyClass : NSObject {

}

+ (NSString *)giveMeAnA;

@end
**************************** MyClass.h *************************

**************************** MyClass.m *************************
#import "MyClass.h"

@implementation MyClass

+ (NSString *)giveMeAnA
{
    return @"A";
}

@end
**************************** MyClass.m *************************

From mek at mekentosj.com Fri Aug 13 15:26:24 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 21:26:24 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
 <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID:

All right! I don't get why it didn't work in my example, as I did exactly the same!?! Anyway, it works in my example now as well. Guess that's the solution we need, John.....

Thanks Jim!
From james.balhoff at duke.edu Fri Aug 13 15:30:45 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 15:30:45 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
 <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID: <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu>

On Aug 13, 2004, at 3:22 PM, Jim Balhoff wrote:

> On Aug 13, 2004, at 3:15 PM, Alexander Griekspoor wrote:
>
>>> Classes are objects, so it seems like it would work.
>>
>> I'm not so sure about that, but probably you are right. Anyway, the
>> description says:
>>
>> objc_msgSend
>>
>> Sends a message with a simple return value to an instance of a class.
>>
>> So I'm afraid it doesn't work on the class itself. I couldn't get it
>> to work either, but perhaps I'm doing something else wrong here...
>
> It works, try this:

Actually you can just send the performSelector: message to the class. I wasn't sure if class objects conformed to the NSObject protocol, but if you try sending the message [MyClass performSelector:message], you get the same result as the objc_msgSend() stuff.

- Jim

From mek at mekentosj.com Fri Aug 13 15:34:48 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 13 Aug 2004 21:34:48 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
 <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
 <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID:

That's even nicer! So should we file a bug against the documentation? :-)

A.

From james.balhoff at duke.edu Fri Aug 13 15:48:07 2004
From: james.balhoff at duke.edu (Jim Balhoff)
Date: Fri, 13 Aug 2004 15:48:07 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
 <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
 <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu>
Message-ID:

On Aug 13, 2004, at 3:34 PM, Alexander Griekspoor wrote:

> That's even nicer! So should we file a bug against the documentation?
> :-)
> A.

I don't know - maybe the Cocoa-dev list could help with this.
There is a type, Class, and classes are objects, according to the Objective-C manual. But I can't find anywhere that says it inherits from NSObject (I guess that would be impossible) or conforms to the NSObject protocol. So I'm not sure how you know what messages you can send to it as an object. It would be nice to have a better understanding before relying too heavily on the result of performSelector. I have been using Python lately, and this sort of stuff is really easy there! Classes are objects, methods are objects, and all can be gotten from strings... ____________________________________________ James P. Balhoff Dept. of Biology Duke University Durham, NC 27708-0338 USA -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2373 bytes Desc: not available URL: From mek at mekentosj.com Fri Aug 13 15:50:35 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 13 Aug 2004 21:50:35 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu> <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com> <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu> <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu> Message-ID: <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> That was exactly my thought! It's a pity that cocoamamasam has gone bananas, but I will send a message tomorrow to the list... I'll keep you informed... Alex Op 13-aug-04 om 21:48 heeft Jim Balhoff het volgende geschreven: > On Aug 13, 2004, at 3:34 PM, Alexander Griekspoor wrote: > >> That's even nicer! So should we file a bug against the documentation? >> :-) >> A. >> > > I don't know - maybe the Cocoa-dev list could help with this. There > is a type, Class, and classes are objects, according to the > Objective-C manual. But I can't find anywhere that says it inherits > from NSObject (I guess that would be impossible) or conforms to the > NSObject protocol. 
So I'm not sure how you know what messages you can > send to it as an object. It would be nice to have a better > understanding before relying too heavily on the result of > performSelector. > > I have been using Python lately, and this sort of stuff is really easy > there! Classes are objects, methods are objects, and all can be > gotten from strings... > > ____________________________________________ > James P. Balhoff > Dept. of Biology > Duke University > Durham, NC 27708-0338 > USA > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From james.balhoff at duke.edu Fri Aug 13 16:07:55 2004 From: james.balhoff at duke.edu (Jim Balhoff) Date: Fri, 13 Aug 2004 16:07:55 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu> <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com> <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu> <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu> <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> Message-ID: <77271DCD-ED64-11D8-90CF-000A95EECBDE@duke.edu> On Aug 13, 2004, at 3:50 PM, Alexander Griekspoor wrote: > That was exactly my thought! It's a pity that cocoamamasam has gone > bananas, but I will send a message tomorrow to the list... > I'll keep you informed... > Alex -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2373 bytes Desc: not available URL: From mek at mekentosj.com Fri Aug 13 16:10:19 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 13 Aug 2004 22:10:19 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: <77271DCD-ED64-11D8-90CF-000A95EECBDE@duke.edu> References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu> <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com> <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu> <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu> <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> <77271DCD-ED64-11D8-90CF-000A95EECBDE@duke.edu> Message-ID: Wow, that has been the most useful link someone send me in a long time! Thanks a lot!!! Alex Op 13-aug-04 om 22:07 heeft Jim Balhoff het volgende geschreven: > > > On Aug 13, 2004, at 3:50 PM, Alexander Griekspoor wrote: > >> That was exactly my thought! It's a pity that cocoamamasam has gone >> bananas, but I will send a message tomorrow to the list... >> I'll keep you informed... >> Alex >> ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 
2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From james.balhoff at duke.edu Fri Aug 13 16:13:15 2004 From: james.balhoff at duke.edu (Jim Balhoff) Date: Fri, 13 Aug 2004 16:13:15 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu> <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com> <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu> <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu> <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> Message-ID: <35AF4684-ED65-11D8-90CF-000A95EECBDE@duke.edu> On Aug 13, 2004, at 3:50 PM, Alexander Griekspoor wrote: > That was exactly my thought! It's a pity that cocoamamasam has gone > bananas, but I will send a message tomorrow to the list... > I'll keep you informed... > Alex > Here is a good thread: > DATE : Sat Sep 02 12:50:33 2000 > > In general, a (Class) is an (id) (but not, obviously, vice-versa).? > In other words, class objects are real objects in ObjC. > > You can put a Class in an NSArray or other collection. > > One additional tidbit of ObjC arcana is that instance methods declared > by the base class (NSObject for most purposes) are also allowed to be > sent to Class objects.? So, because NSObject declares > -performSelector:withObject: you can send Classes > +performSelector:withObject: as well.? (In many cases NSObject > explicitly implements Class methods that are the same names as its > instance methods so that it can do things a bit differently.? So, > there is an explicit +release since the -release implementation would > not be appropriate for Class objects...)? Note that this is NOT true > for ALL instance methods, only for those defined in the base class.? > The NSString Class object can NOT be sent +length messages. > > Mike Ferris ____________________________________________ James P. 
Balhoff Dept. of Biology Duke University Durham, NC 27708-0338 USA -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2373 bytes Desc: not available URL: From mek at mekentosj.com Fri Aug 13 16:16:20 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 13 Aug 2004 22:16:20 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: <35AF4684-ED65-11D8-90CF-000A95EECBDE@duke.edu> References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu> <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com> <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu> <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu> <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com> <35AF4684-ED65-11D8-90CF-000A95EECBDE@duke.edu> Message-ID: That answers that question then, this is the elegant and legitimate way to use a plist as the source for methods to complement bases. Alex Op 13-aug-04 om 22:13 heeft Jim Balhoff het volgende geschreven: > On Aug 13, 2004, at 3:50 PM, Alexander Griekspoor wrote: > >> That was exactly my thought! It's a pity that cocoamamasam has gone >> bananas, but I will send a message tomorrow to the list... >> I'll keep you informed... >> Alex >> > > Here is a good thread: > > > > >> DATE : Sat Sep 02 12:50:33 2000 >> >> In general, a (Class) is an (id) (but not, obviously, vice-versa).? >> In other words, class objects are real objects in ObjC. >> >> You can put a Class in an NSArray or other collection. >> >> One additional tidbit of ObjC arcana is that instance methods >> declared by the base class (NSObject for most purposes) are also >> allowed to be sent to Class objects.? So, because NSObject declares >> -performSelector:withObject: you can send Classes >> +performSelector:withObject: as well.? (In many cases NSObject >> explicitly implements Class methods that are the same names as its >> instance methods so that it can do things a bit differently.? 
So,
>> there is an explicit +release since the -release implementation would
>> not be appropriate for Class objects...) Note that this is NOT true
>> for ALL instance methods, only for those defined in the base class.
>> The NSString Class object can NOT be sent +length messages.
>>
>> Mike Ferris
>
> ____________________________________________
> James P. Balhoff
> Dept. of Biology
> Duke University
> Durham, NC 27708-0338
> USA

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From jtimmer at bellatlantic.net Fri Aug 13 17:24:23 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 13 Aug 2004 17:24:23 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
Message-ID:

> Actually you can just send the performSelector: message to the class.
> I wasn't sure if class objects conformed to the NSObject protocol, but
> if you try sending the message [MyClass performSelector:message], you
> get the same result as the objc_msgSend() stuff.
>
> That answers that question then, this is the elegant and legitimate way
> to use a plist as the source for methods to complement bases.

Right then, each base is an entry in a .plist dictionary, with the base
name being the key. The first time anyone requests a base, the file
will be read and all of the bases will be created.

The base will be a dictionary and will have the following fields:

    Name
    Symbol
    Matches (array)
    Single (BOOL)
    Complement
    Complements (array)

I'll create the name and symbol on loading, and keep a reference to the
dictionary.
The first time anyone starts to access the matches, complement, or
complements, I'll take those strings out of the dictionary and use them
to load the arrays up with actual bases. This should prevent circular
initialization issues.

If the gods smile upon me, I can have this done before the weekend's
over. I really hope it works, it's too cool an idea not to......

JT

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Fri Aug 13 18:32:20 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 13 Aug 2004 18:32:20 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com>
References: <81EFF6AD-ED5B-11D8-90CF-000A95EECBDE@duke.edu>
 <143C95D0-ED5D-11D8-8E60-000393CFDE0C@mekentosj.com>
 <16592168-ED5E-11D8-90CF-000A95EECBDE@duke.edu>
 <45BEFA87-ED5F-11D8-90CF-000A95EECBDE@duke.edu>
 <0AF7E385-ED62-11D8-8E60-000393CFDE0C@mekentosj.com>
Message-ID:

On Aug 13, 2004, at 3:50 PM, Alexander Griekspoor wrote:

> It's a pity that cocoamamasam has gone bananas

The archive is now at www.cocoabuilder.com

- Koen.

From kvddrift at earthlink.net Fri Aug 13 19:25:56 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 13 Aug 2004 19:25:56 -0400
Subject: [Biococoa-dev] more ideas
Message-ID: <20AA0258-ED80-11D8-919D-003065A5FDCC@earthlink.net>

Hi,

Just reading through a few posts from today, and here are some misc
comments that I will try to post in between the tornado warnings. I'll
try to post a longer reply later today.

I think the structure of classes could have an additional base class
from which both BCSymbol and BCSequence can derive.
Again, this simplifies things a little bit:

BCRoot
  |
  |------- BCSymbol
  |           |
  |           |------------ BCNucleotide
  |           |
  |           |------------ BCAminoAcid
  |           |
  |           |------------ etc (think BCSaccharide and the like)
  |
  |------- BCSequence
              |
              |---------------- BCSequenceDNA
              |
              |---------------- BCSequenceProtein

BCRoot:
    NSString *name

BCSymbol:
    NSString *symbol;
    NSDictionary *properties

BCSequence:
    NSMutableArray *symbolList
    NSMutableDictionary *featuresList (key: position, value: feature)

BCSequenceDNA:
    complement, etc

BCSequenceProtein:
    MW, pI, etc

Regarding the complement discussion, wouldn't a member "complement" in
BCNucleotide (either a char or string or pointer to object) be much
easier? You only need to set it up once.

Also, when maintaining a featuresList we have to update the keys
indicating the position constantly when the sequence is edited. So that
needs a little more thought.

- Koen.

From kvddrift at earthlink.net Fri Aug 13 19:33:08 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 13 Aug 2004 19:33:08 -0400
Subject: [Biococoa-dev] two requests
Message-ID: <22458C5E-ED81-11D8-919D-003065A5FDCC@earthlink.net>

Hi,

I am unable to read my mail when I am at work (company policy :), but I
can follow the discussions on this mailing list through this website:

http://bioinformatics.org/pipermail/biococoa-dev/2004-August/thread.html

Unfortunately this webpage only shows plain text, so whenever you guys
add rtf text (code examples, etc), it becomes very illegible in the
archives. So could you try to only use plain text?

Also, when you start a new thread, could you make sure that you don't
reply to an existing thread? See the archives for how mixed it looks
right now.

thanks,

- koen.
From kvddrift at earthlink.net Sat Aug 14 08:08:33 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 14 Aug 2004 08:08:33 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: References: Message-ID: On Aug 13, 2004, at 10:57 AM, John Timmer wrote: > During base initialization, I was going to create the > arrays of complements and representations. This means every time the > first > base was created, it would necessarily create its complement, all > relevant > ambiguous bases, etc. I'm just a stupid chemist, could you explain to me why you need to create arrays of complements, representations and ambiguous bases for each base and what they are? (I do know that each base has a complement and what DNA looks like, but don't understand why you'd need an array for each base, or did you mean to create a complement sequence dynamically?) thanks, - Koen. From kvddrift at earthlink.net Sat Aug 14 08:27:03 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 14 Aug 2004 08:27:03 -0400 Subject: [Biococoa-dev] RE: Nomenclature In-Reply-To: <01FE463B-ED50-11D8-BB73-000393CFDE0C@mekentosj.com> References: <01FE463B-ED50-11D8-BB73-000393CFDE0C@mekentosj.com> Message-ID: <3F4F7750-EDED-11D8-B2C1-003065A5FDCC@earthlink.net> On Aug 13, 2004, at 1:41 PM, Alexander Griekspoor wrote: > I was thinking of the following two options, but perhaps someone comes > up with something must better. > > BCRecord - Based on a record from a nucleotide database like entrez. > (BCEntry would be another variant). Advantage: familiar setup > (sequence, name, features, etc). Disadvantage: sometimes it's a bit > strange to call things a record, like if you're cloning for instance, > your not messing around with BCRecords. But perhaps in this case you > are busy with BCFragments, derived from a BCRecords so it's not that > bad after all. > > BCEntity - My favorite. 
BCUnit is too boring, but a more general,
> "building block of our framework" name would perhaps be appropriate
> here. Anyway, you might be laughing at it right now...

What about BCObject or BCRoot? Boring but effective. Silly solutions
are BCLego, BCTwoByFour, BCBrick, etc. Peter is remodelling his house,
he might have even more suggestions :)

>> And the second method (which I wrote this morning) looks like:
>> - (BOOL) representsBase: (BCSequenceDNABase *) entry {
>>     if ( [[self matches] containsObject: entry] )
>>         return YES;
>>     return NO;
>> }
> Really elegant!

Even shorter:

- (BOOL) representsBase: (BCSequenceDNABase *) entry {
    return [[self matches] containsObject: entry];
}

- Koen.

From kvddrift at earthlink.net Sat Aug 14 08:39:24 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 14 Aug 2004 08:39:24 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com>
References: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net>
 <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com>
Message-ID:

On Aug 12, 2004, at 7:28 PM, Alexander Griekspoor wrote:

> I think BCFunctionalGroup should be a feature (like phosphates) and
> not a symbol, but perhaps I am wrong here.

I think you're right. BioJava has both features and annotations, I
guess functional groups should indeed be one of those. I need to read
the biojava docs a little bit more carefully to understand the
differences between both.

An equivalent in Cocoa would be string attributes (as in
NSAttributedString), so we can try to follow that approach.

- Koen.
From mek at mekentosj.com Sat Aug 14 09:11:11 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 14 Aug 2004 15:11:11 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To:
References: <1283A07E-ECB2-11D8-919D-003065A5FDCC@earthlink.net>
 <489AFFE5-ECB7-11D8-BB73-000393CFDE0C@mekentosj.com>
Message-ID: <69CFB2F8-EDF3-11D8-907F-000393CFDE0C@mekentosj.com>

I indeed wonder how string attributes are done in Cocoa and kept in
sync during editing.... It's a nice comparison, string attributes vs
sequence features. Perhaps we could ask some folks at Apple on the
cocoa dev list how they implemented attributed strings...
Alex

On 14 Aug 2004, at 14:39, Koen van der Drift wrote:

> On Aug 12, 2004, at 7:28 PM, Alexander Griekspoor wrote:
>
>> I think BCFunctionalGroup should be a feature (like phosphates) and
>> not a symbol, but perhaps I am wrong here.
>
> I think you're right. BioJava has both features and annotations, I
> guess functional groups should indeed be one of those. I need to read
> the biojava docs a little bit more carefully to understand the
> differences between both.
>
> An equivalent in Cocoa would be the stringattributes, so we can try to
> follow that apporach.
>
> - Koen.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating
system, written for a 4-bit processor by a 2-bit company without 1 bit
of sense.
*********************************************************

From mek at mekentosj.com Sat Aug 14 09:13:45 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 14 Aug 2004 15:13:45 +0200
Subject: [Biococoa-dev] two requests
In-Reply-To: <22458C5E-ED81-11D8-919D-003065A5FDCC@earthlink.net>
References: <22458C5E-ED81-11D8-919D-003065A5FDCC@earthlink.net>
Message-ID:

Koen,

In future posts I'll convert them to plain text before sending, and try
to start new threads when appropriate...
Alex

On 14 Aug 2004, at 1:33, Koen van der Drift wrote:

> Hi,
>
> I am unable to read my mail when I am at work (company policy :), but
> I can follow the discussions on this mailinglist through this website:
>
> http://bioinformatics.org/pipermail/biococoa-dev/2004-August/thread.html
>
> Unfortunately this webpage only shows plain text so whenever you guys
> add rtf text (code examples, etc), it becomes very illegible in the
> archives. So could you try to only use plain text?
>
> Also when you start a new thread, could you make sure that you don't
> reply to an existing thread? Also see the archives for how mixed it
> looks right now.
>
> thanks,
>
> - koen.
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From mek at mekentosj.com Sat Aug 14 09:22:12 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 14 Aug 2004 15:22:12 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID:

Koen,

The problem is not so much with complement bases but with the fact that
the nucleotide alphabet contains ambiguous bases, bases that represent
a group of bases. Examples are N for aNy base, but also R for puRine
bases (A and G), W for Weak bases (A and T), etc. Therefore, each
symbol has a -(NSArray *)matches; method that returns all bases that
belong to this base (it returns only self if the base is atomic, i.e.
A, C, G or T). A similar setup can be chosen for the complements of
course, but perhaps it's indeed better to just return the complement
ambiguous base. So N would return N as its complement, W would return
W, and R would return Y. I don't know if John plans to follow this
latter setup actually, but he will probably tell us. In the protein
world one could have a "stop" amino acid which would be ambiguous for
the three stop codons we have, although another complicating factor
would be species specificity.

I hope this helps, but perhaps I completely misunderstood the question.
Alex

On 14 Aug 2004, at 14:08, Koen van der Drift wrote:

> On Aug 13, 2004, at 10:57 AM, John Timmer wrote:
>
>> During base initialization, I was going to create the
>> arrays of complements and representations. This means every time the
>> first base was created, it would necessarily create its complement,
>> all relevant ambiguous bases, etc.
>
> I'm just a stupid chemist, could you explain to me why you need to
> create arrays of complements, representations and ambiguous bases for
> each base and what they are?
>
> (I do know that each base has a complement and what DNA looks like,
> but don't understand why you'd need an array for each base, or did you
> mean to create a complement sequence dynamically?)
>
> thanks,
>
> - Koen.
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************

From kvddrift at earthlink.net Sat Aug 14 10:00:41 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 14 Aug 2004 10:00:41 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID: <54152DF4-EDFA-11D8-B76F-003065A5FDCC@earthlink.net>

On Aug 14, 2004, at 9:22 AM, Alexander Griekspoor wrote:

> I hope this helps, but perhaps I completely misunderstood the question.
>

Thanks Alex, that explains my questions, and now I understand more of
the previous discussion :)

- Koen.

From jtimmer at bellatlantic.net Sat Aug 14 11:10:18 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sat, 14 Aug 2004 11:10:18 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To:
Message-ID:

> Koen,
>
> The problem is not so much with complement bases but with the fact that
> the nucleotide alphabet contains ambiguous bases, bases that represent
> a group of bases. Examples are N for aNy base, but also R for puRine
> bases (A and G), W for Weak bases (A and T), etc. Therefore, each
> symbol has a -(NSArray *)matches; method that returns all bases that
> belong to this base (it returns only self if the base is atomic, i.e.
> A, C, G or T). A similar setup can be chosen for the complements of
> course, but perhaps it's indeed better to just return the complement
> ambiguous base. So N would return N as its complement, W would return
> W, and R would return Y. I don't know if John plans to follow this
> latter setup actually, but he will probably tell us. In the protein
> world one could have a "stop" amino acid which would be ambiguous for
> the three stop codons we have, although another complicating factor
> would be species specificity.
> I hope this helps, but perhaps I completely misunderstood the question.
> Alex

That was more or less my plan. You can get complement, which is the
direct complement (strict subset - N only really complements N), and
the full set of complements (where N complements anything). I'll also
have a "represents" and "representedBy" method - N represents anything,
but is really only represented by N, while C represents only C, but can
be represented by C, S, Y, N, etc.

There are situations where I can imagine needing any one of these, so
there's no reason not to implement all of them - do all the work in the
base class, so that nobody ever has to repeat it.
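
[Editor's sketch] The four relations discussed in this exchange
(matches, strict complement, represents, representedBy) can all be
derived from a single table of IUPAC ambiguity codes. The following is
only a minimal illustration under that assumption, not John's
BCSequenceDNABase: the BCSketch* function names are hypothetical, and
one-letter strings stand in for real base objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not BioCocoa API: IUPAC symbol -> unambiguous members.
static NSDictionary *BCSketchMatchTable(void) {
    static NSDictionary *table = nil;
    if (table == nil) {
        // initWithObjectsAndKeys: takes (object, key) pairs.
        table = [[NSDictionary alloc] initWithObjectsAndKeys:
            @"A", @"A", @"C", @"C", @"G", @"G", @"T", @"T",
            @"AG", @"R", @"CT", @"Y", @"CG", @"S", @"AT", @"W",
            @"GT", @"K", @"AC", @"M",
            @"CGT", @"B", @"AGT", @"D", @"ACT", @"H", @"ACG", @"V",
            @"ACGT", @"N", nil];
    }
    return table;
}

// The set of atomic bases a symbol stands for, e.g. R -> {A, G}.
// Unknown symbols give an empty set.
static NSSet *BCSketchMatches(NSString *symbol) {
    NSString *members = [BCSketchMatchTable() objectForKey:symbol];
    NSMutableSet *result = [NSMutableSet set];
    for (NSUInteger i = 0; i < [members length]; i++) {
        [result addObject:[members substringWithRange:NSMakeRange(i, 1)]];
    }
    return result;
}

// YES if every base `other` can stand for is also covered by `symbol`.
static BOOL BCSketchRepresents(NSString *symbol, NSString *other) {
    NSSet *theirs = BCSketchMatches(other);
    return [theirs count] > 0 && [theirs isSubsetOfSet:BCSketchMatches(symbol)];
}

// All symbols that can stand in for `symbol`, sorted alphabetically.
static NSArray *BCSketchRepresentedBy(NSString *symbol) {
    NSMutableArray *result = [NSMutableArray array];
    for (NSString *candidate in BCSketchMatchTable()) {
        if (BCSketchRepresents(candidate, symbol)) {
            [result addObject:candidate];
        }
    }
    return [result sortedArrayUsingSelector:@selector(compare:)];
}

// Strict complement: complement each member, then find the symbol whose
// member set is exactly that. Returns nil for unknown symbols.
static NSString *BCSketchComplement(NSString *symbol) {
    NSDictionary *pairs = [NSDictionary dictionaryWithObjectsAndKeys:
        @"T", @"A", @"G", @"C", @"C", @"G", @"A", @"T", nil];
    NSMutableSet *target = [NSMutableSet set];
    for (NSString *member in BCSketchMatches(symbol)) {
        [target addObject:[pairs objectForKey:member]];
    }
    for (NSString *candidate in BCSketchMatchTable()) {
        if ([BCSketchMatches(candidate) isEqualToSet:target]) {
            return candidate;
        }
    }
    return nil;
}
```

With this table, BCSketchRepresents(@"N", @"C") holds while the reverse
does not, which is the asymmetry John describes. Under the standard
IUPAC codes W and S are each their own complement, and R pairs with Y.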
Cheers, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sat Aug 14 17:40:36 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 14 Aug 2004 17:40:36 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: Message-ID: Quick naming ideal - how about BCSequenceUnit as the root class of bases and amino acids? Then BCSequenceUnitDNABase, BCSequenceUnitAminoAcid, etc. The problem I have with the name "symbol" is that the items the classes will be representing have symbols, and so will have a symbol variable and a symbol method, so it's a bit of circular nomenclature. Incidentally, I'm not going to implement the base class just yet (even if we did know what we're calling it) - I'll get the DNA bases working first and then figure out which methods would also apply to an amino acid and move them up the inheritance chain. I've also attached the preliminary .plist file I'll be using to generate base definitions. Let me know if I made any mistakes, and feel free to add any other bases if you have time to kill. My current thought on extensibility: The class will have singleton variables and methods for accessing all defined bases, and I'll go looking through the .plist file for them. Any additional items will get held in a "custom base" dictionary, and will be accessible through a "getCustomBase: (NSString *)baseName". This will allow for quick access for typical uses, but still provide flexibility. I'm diving into this now, so stop me if I'm making a stupid mistake... JT _______________________________________________ This mind intentionally left blank -------------- next part -------------- A non-text attachment was scrubbed... 
Name: base template.plist Type: application/octet-stream Size: 4503 bytes Desc: not available URL: From kvddrift at earthlink.net Sat Aug 14 19:35:15 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 14 Aug 2004 19:35:15 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: References: Message-ID: <9865D294-EE4A-11D8-B76F-003065A5FDCC@earthlink.net> On Aug 14, 2004, at 5:40 PM, John Timmer wrote: > Quick naming ideal - how about BCSequenceUnit as the root class of > bases and > amino acids? Then BCSequenceUnitDNABase, BCSequenceUnitAminoAcid, > etc. > Why not BCUnit, BCUnitDNABase, BCUnitAminoAcid, etc? Or even BCDNABase, BCAminoAcid? They can be used everywhere, not limited to a sequence (although they will be most of the time). > > Incidentally, I'm not going to implement the base class just yet (even > if we > did know what we're calling it) - I'll get the DNA bases working first > and > then figure out which methods would also apply to an amino acid and > move > them up the inheritance chain. Good luck :) > > I've also attached the preliminary .plist file I'll be using to > generate > base definitions. Let me know if I made any mistakes, and feel free > to add > any other bases if you have time to kill. I don't know much about bases, but this looks fine. You can also add stuff like elemental composition and other physical properties. The nice thing of a plist is that stuff can always be added later. > > My current thought on extensibility: > The class will have singleton variables and methods for accessing all > defined bases, and I'll go looking through the .plist file for them. > Any > additional items will get held in a "custom base" dictionary, and will > be > accessible through a "getCustomBase: (NSString *)baseName". This will > allow > for quick access for typical uses, but still provide flexibility. > > I'm diving into this now, so stop me if I'm making a stupid mistake... Difficult to say without seeing the code. 
I'd say go ahead and share it, either on the list or the cvs website.

Make sure you make the plist (as an NSDictionary) a shared member of
all bases, so you only have to read it once from disk. This is what I
use for amino acids in my app:

    #import "IDAminoAcid.h"

    // Shared by all instances; read from disk on first use only.
    static NSMutableDictionary *aminoAcidPropertiesDict = nil;

    @implementation IDAminoAcid

    + (NSDictionary *) aaPropertiesDict
    {
        if ( aminoAcidPropertiesDict == nil )
        {
            NSString *filePath = [[NSBundle mainBundle]
                pathForResource: @"aminoacids" ofType: @"plist"];
            aminoAcidPropertiesDict = [[NSMutableDictionary alloc]
                initWithContentsOfFile: filePath];
        }

        return aminoAcidPropertiesDict;
    }

    ....

Then just call [self aaPropertiesDict] to access the dictionary. And
yes, the naming difference between aaPropertiesDict and
aminoAcidPropertiesDict is intentional :)

- Koen.

From kvddrift at earthlink.net Sat Aug 14 20:00:06 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 14 Aug 2004 20:00:06 -0400
Subject: [Biococoa-dev] Base design
In-Reply-To:
References:
Message-ID: <1122C4A9-EE4E-11D8-B76F-003065A5FDCC@earthlink.net>

On Aug 11, 2004, at 3:44 PM, Alexander Griekspoor wrote:

> Click on the show TOC to get to the complete documentation for
> headerdoc. As an example I have attached a file from the AGRegex
> framework, it nicely shows what it will look like. To avoid the
> removal of the attachment, here it is inline:
>
> The really cool thing is that you can use XCode's Applescript menu to
> quickly insert templates, as easy as it can get!

That's awesome. Another very useful tool is AutoGraf, an automatic
class diagram generator. See: http://autograf.sourceforge.net/.

- Koen.
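
[Editor's sketch] The plist design from earlier in this thread (create
every base once from a shared dictionary, then resolve the complement
from its stored name only on first access) might look roughly like the
following. This is an illustration under stated assumptions, not the
attached BCSequenceDNABase: SketchBase and its methods are
hypothetical, an in-memory dictionary stands in for the .plist file,
only two bases are defined, and the code assumes ARC with modern
Objective-C literals.

```objc
#import <Foundation/Foundation.h>

// Hypothetical illustration class; not the actual BCSequenceDNABase.
@interface SketchBase : NSObject
+ (SketchBase *)baseNamed:(NSString *)name;
- (NSString *)name;
- (NSString *)symbol;
- (SketchBase *)complement;   // resolved lazily from the stored name
@end

@implementation SketchBase {
    NSString *_name;
    NSDictionary *_definition;  // raw entry, kept so references resolve later
    SketchBase *_complement;    // nil until first requested
}

// Stand-in for reading the .plist from disk once (contents are assumed).
+ (NSDictionary *)definitions {
    static NSDictionary *definitions = nil;
    if (definitions == nil) {
        definitions = @{
            @"adenine": @{ @"Symbol": @"A", @"Complement": @"thymine" },
            @"thymine": @{ @"Symbol": @"T", @"Complement": @"adenine" },
        };
    }
    return definitions;
}

// The first request creates every base, but stores only the name and the
// raw entry, so no base needs another base during its own initialization.
+ (NSDictionary *)allBases {
    static NSMutableDictionary *bases = nil;
    if (bases == nil) {
        bases = [[NSMutableDictionary alloc] init];
        NSDictionary *definitions = [self definitions];
        for (NSString *name in definitions) {
            SketchBase *base = [[SketchBase alloc] init];
            base->_name = [name copy];
            base->_definition = [definitions objectForKey:name];
            [bases setObject:base forKey:name];
        }
    }
    return bases;
}

+ (SketchBase *)baseNamed:(NSString *)name {
    return [[self allBases] objectForKey:name];
}

- (NSString *)name { return _name; }

- (NSString *)symbol { return [_definition objectForKey:@"Symbol"]; }

// Looks the complement up by name on first use, then caches the object.
- (SketchBase *)complement {
    if (_complement == nil) {
        NSString *other = [_definition objectForKey:@"Complement"];
        _complement = [SketchBase baseNamed:other];
    }
    return _complement;
}
@end
```

Because the complement is stored as a name until it is needed, adenine
and thymine can refer to each other without either having to exist
while the other is being created. The two objects end up retaining
each other, which is harmless here since they live for the whole run
of the program.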
From kvddrift at earthlink.net Sun Aug 15 11:04:16 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 15 Aug 2004 11:04:16 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: References: Message-ID: <604C285C-EECC-11D8-B76F-003065A5FDCC@earthlink.net> On Aug 14, 2004, at 5:40 PM, John Timmer wrote: > Incidentally, I'm not going to implement the base class just yet (even > if we > did know what we're calling it) - I'll get the DNA bases working first > and > then figure out which methods would also apply to an amino acid and > move > them up the inheritance chain. > I have made a start for the BCSequence class, which will take care of editing any sequence of BCUnit/BCWhatever objects. We just need to agree what to use for the base/unit name. Right now I am using 'aminoAcid' because I just copy/pasted it from my protein app. Until we find a name I have uncommented the methods that use aminoacid. I have commited them to cvs, so everyone can have a look and give feedback. - Koen. From jtimmer at bellatlantic.net Sun Aug 15 12:16:38 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 15 Aug 2004 12:16:38 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <604C285C-EECC-11D8-B76F-003065A5FDCC@earthlink.net> Message-ID: Just for reference, I'm attaching a .h file where I was writing out some of the things I'd thought would need to be implemented in this sort of class. Some of them would probably work well in the base class, and I think the method names are fairly informative. Please use anything you think is appropriate. JT > > I have made a start for the BCSequence class, which will take care of > editing any sequence of BCUnit/BCWhatever objects. We just need to > agree what to use for the base/unit name. Right now I am using > 'aminoAcid' because I just copy/pasted it from my protein app. Until we > find a name I have uncommented the methods that use aminoacid. 
> > I have commited them to cvs, so everyone can have a look and give > feedback. > > - Koen. > _______________________________________________ This mind intentionally left blank -------------- next part -------------- A non-text attachment was scrubbed... Name: BCSequenceDNA.h Type: application/octet-stream Size: 1801 bytes Desc: not available URL: From kvddrift at earthlink.net Sun Aug 15 14:15:10 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 15 Aug 2004 14:15:10 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: References: Message-ID: <0B5A7464-EEE7-11D8-B76F-003065A5FDCC@earthlink.net> On Aug 15, 2004, at 12:16 PM, John Timmer wrote: > Just for reference, I'm attaching a .h file where I was writing out > some of > the things I'd thought would need to be implemented in this sort of > class. > Some of them would probably work well in the base class, and I think > the > method names are fairly informative. Please use anything you think is > appropriate. > Looks good John. I think the following can go in the parent class. Most of these are already (with a different name) in what I commited this morning. 
////////////////////////////////////////////////////////////////////////////
// OBTAINING INFORMATION ABOUT THE SEQUENCE
////////////////////////////////////////////////////////////////////////////
- (NSArray *) sequenceBaseArray;
- (int) length;
- (BCSequenceBase *) baseAtIndex: (int)index;

////////////////////////////////////////////////////////////////////////////
// ALTERING THE CONTENTS
////////////////////////////////////////////////////////////////////////////
- (void) setSequenceBaseArray: (NSArray *)entry;
- (void) removeBaseAtIndex: (int)index;
- (void) removeBasesInRange: (NSRange)entry;

////////////////////////////////////////////////////////////////////////////
// DERIVING OTHER SEQUENCES
////////////////////////////////////////////////////////////////////////////
- (BCSequenceDNA *) sequenceInRange: (NSRange)entry;

Can we change the name BCSequenceBaseFoo to something else? Just as you
commented that Symbol is confusing, I think SequenceBase is also
confusing because (to me) it refers to DNA, not to a general building
block.

- Koen.

From jtimmer at bellatlantic.net Sun Aug 15 14:14:59 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sun, 15 Aug 2004 14:14:59 -0400
Subject: [Biococoa-dev] New BCSequenceDNABase
In-Reply-To:
Message-ID:

Attached, you should find the DNA base implementation. I need to both
go back and header-doc this, and create the full set of bases in the
.plist file.

Right now, it compiles without warnings (so somebody with privileges
can commit it to CVS), but it will fail miserably in use until the rest
of the bases are defined.

Cheers,

John

_______________________________________________
This mind intentionally left blank

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCSequenceDNABase.m
Type: application/octet-stream
Size: 12296 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCSequenceDNABase.h Type: application/octet-stream Size: 3105 bytes Desc: not available URL: From kvddrift at earthlink.net Sun Aug 15 15:40:37 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 15 Aug 2004 15:40:37 -0400 Subject: [Biococoa-dev] New BCSequenceDNABase In-Reply-To: References: Message-ID: On Aug 15, 2004, at 2:14 PM, John Timmer wrote: > Right now, it compiles without warnings (so somebody with privileges > can > commit it to CVS), but it will fail miserably in use until the rest of > the > bases are defined. Just added it, but didn't test it :) - Koen. From mek at mekentosj.com Sun Aug 15 18:18:34 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 16 Aug 2004 00:18:34 +0200 Subject: [Biococoa-dev] Base test In-Reply-To: References: Message-ID: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com> Hi Guys, Let me start by a very general comment, perhaps we should either put our work in a separate directory or provide some read me files for "innocent" downloaders of BioCocoa that find a lot of alpha code now of a sudden instead of the relatively stable version Peter left before we started. At least some comments in the read me file would be elegant. Perhaps Peter do you want to do that? Also, would it be wise to make some further organisation using folders. We have a Utils folder already. Perhaps both the sequence stuff and IO part could be placed in separate folders. Then one disclaimer here, I feel more and more guilty not having much time right now do really help programming. Our website makeover comes along nicely, but it will take some more time unfortunately. As a result I hope you don't get the feeling that this person is only complaining while you guys do all the work... Again, don't get offended it's all really meant with good intention. I'll try to jump in a.s.a.p.... OK, that having said, I'm impressed by the tempo guys, well done! > Can we change the name BCSequenceBaseFoo to something else? 
Just as
> you commented that Symbol is confusing, I think SequenceBase is also
> confusing because (to me) it refers to DNA, not to a general building
> block.

The naming scheme first. I understand the possible clash with the use
of Symbol, although I think it still represents the subject best, and
we are talking about BCSymbol instead of symbol (can't we call this
latter property "character"? biojava calls them "token", which is a
nice name as well). Anyway, my problem is not so much BCSymbol per se,
I'm just not too fond of these really long names (long live
autocomplete, but BCSequenceUnitDNABase is really long, let alone
BCSequenceUnitAminoAcid!!). I go along with Koen then, why don't we
just call the thing BCAminoAcid or BCNucleotideDNA and BCNucleotideRNA
(or BCDNABase/BCRNABase). I know you would rather have a shared prefix
as they descend from a common ancestor, but maybe that doesn't weigh
enough here. Question remains of course what we call the ancestor ;-)

A few remarks that I had after my first quick look at the added code:

- John, shouldn't there be singleton objects for the "W, S, V, B, R"
bases as well? I now only found ACGTN.

- I think it's impossible because of the needed statics, but would
there be a way to COMPLETELY initialize all bases on the basis of the
plist, without having to hardcode them? In the ideal world I would
imagine that you ask for a base, the factory object checks whether a
base with that name is listed in the plist, and then initializes it
with the data from the plist. Alternatively, it would upon
initialization enumerate over the plist to init all the bases listed. I
was just wondering if someone could come up with an idea like that?

- Koen, in your sequence class I saw you can init them with a string,
great! But next you keep the string around and many methods depend on /
work with the string. This leads to exactly the problems we discussed.
Init with a string is logical of course, but then we should just let that go and completely depend on the sequence list containing John's bases. We shouldn't have to worry about keeping the string in sync here; the only string you can get back out is through the -stringRepresentation method, which is generated at that particular moment by "translating" back from the sequence list. Of course I realize it is work in progress and perhaps too early.

- I found the -position method a bit confusing as to its description vs what it does.

- What does the countedset do, and is that supported from Jaguar?

- Then we encounter another problem. BCSequence should be an ancestor class that divides into amino acid, DNA or RNA sequence subclasses. Now you have something mixed. Do we incorporate translations into the sequence? I guess not, these sequences should not be mixed: either pure DNA or pure protein. If we do, RNA translation must be there as well. So the amino acid methods are strange here. The idea I would propose is that there is a shared translation util object that you could feed a DNA sequence and get (in the requested frames) the translated sequences back as protein sequence objects. It's the app's task to control/organize these. Likewise, one could argue this for complements as well -> a shared DNA utils object returns you the complement sequence if you hand it a sequence. Alternatively these translations could be added as features, but in all cases there's again the "how to keep things in sync upon editing" problem. I think we should keep things as separate and clean entities as much as possible.

- Another discussion we had before was about the start/end position. I argued a bit before to handle things like movie editing. You have raw source clips and give a start and end position to mark the wanted region. The big advantage here is that you get so-called "non-destructive editing". Say you had selected bases 100 to 900 in a 1000bp sequence.
In iMovie 2 you were in big trouble if in hindsight you would rather have had bases 50 to 950, as you had cropped the sequence and thrown away the ends. In iMovie 4 this is no problem: the raw source is still there and the only thing you have to move is the begin/end marker. But during our discussions we more or less came to the conclusion that this would be more appropriately coded in features, as it's hard to predict when you want to crop and when you want to keep the complete sequence. In addition, the current implementation is rather limited as only one region can be marked, instead of 100-200, 400-500 etc. A developer could easily add program-specific features that allow him to simulate the desired behaviour when he wants to (like marking bases 50 to 200 as a cut fragment).

- I love the snippet where you read the dictionary only once using the class method, certainly gonna use that one myself as well ;-)

Again, many of these remarks might come too early. Also, a lot of the work comes from the interplay between John and Koen to get the two basic parts, symbols and sequences, working. It's definitely going quite well from what I can see. I like the sequence header file items you sent, John, and indeed see many of them already in the work of Koen. Indeed many items can go in the ancestor sequence class, and it's key to identify as many as possible to keep the descendants looking and working as similarly as possible.
If I see things like this (although it should indeed be in the general sequence class, thus losing the "DNA" part):

    ///////////////////////////////////////////////////////////////////////
    // INITIALIZATION METHODS
    ///////////////////////////////////////////////////////////////////////
    - (BCSequenceDNA *) initWithSequenceString: (NSString *)entry skippingNonBases: (BOOL)skip;
    + (BCSequenceDNA *) DNASequenceWithSequenceString: (NSString *)entry skippingNonBases: (BOOL)skip;
    + (BCSequenceDNA *) DNASequenceWithBaseArray: (NSArray *)entry;
    + (BCSequenceDNA *) DNASequenceWithSequence: (BCSequenceDNA *)entry;

I can hardly wait to start using it in a real program! I guess before that, however, many discussions will follow ;-)

Keep up the good work guys!
Cheers,
Alex

**************************************************************
** Alexander Griekspoor **
**************************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

MacOS X: The power of UNIX with the simplicity of the Mac
***************************************************************

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 8062 bytes
Desc: not available
URL:

From kvddrift at earthlink.net Sun Aug 15 18:22:08 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 15 Aug 2004 18:22:08 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID: <8B7EA5C8-EF09-11D8-B76F-003065A5FDCC@earthlink.net>

On Aug 15, 2004, at 6:18 PM, Alexander Griekspoor wrote:

> Let me start by a very general comment, perhaps we should either put
> our work in a separate directory or provide some read me files for
> "innocent" downloaders of BioCocoa that find a lot of alpha code now
> of a sudden instead of the relatively stable version Peter left before
> we started. At least some comments in the read me file would be
> elegant. Perhaps Peter do you want to do that?

We can make a CVS branch for that.

- Koen.

(more replies later :)

From kvddrift at earthlink.net Sun Aug 15 18:40:08 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 15 Aug 2004 18:40:08 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID: <0F7FD206-EF0C-11D8-B76F-003065A5FDCC@earthlink.net>

On Aug 15, 2004, at 6:18 PM, Alexander Griekspoor wrote:

> - Koen, in your sequence class I saw you can init them with a string,
> great! But next you keep the string around and many methods depend on
> / work with the string. This leads to exactly the problems we
> discussed. Init with a string is logical of course, but then we should
> just let that go and completely depend on the sequence list containing
> John's bases.
> We shouldn't have to worry about keeping the string in sync here, the
> only string you can get back out is through the stringRepresentation
> method which is generated at that particular moment back "translated"
> from the sequencelist. Of course I realize it is work in progress and
> perhaps too early.

I think the only other thing I do with the string is the two accessor functions. Editing of the sequence is done on the NSMutableArray, although this is still commented out until we agree on the naming of classes.

> - I found the -position method a bit confusing as to its description
> vs what it does

It's used by my app to make a string that displays the start and end position (1-based) of a subsequence. We can rename it.

> - What does the countedset do,

The counted set keeps track of the number of different amino acids; see the method countAminoAcids for how to populate the set.

> and is that supported from Jaguar?

Not sure. The docs usually say if something is 10.3 and later, but there is no such mention in the class description.

> The idea I would propose is that there is a shared translation util
> object that you could feed a dna sequence and get (in the requested
> frames) the translated sequences back as protein sequence objects.
> It's the app task to control/organize these.

Agree, good plan.

> Likewise, one could argue this for complements as well -> a shared dna
> utils object returns you the complement sequence if you hand it a
> sequence. Alternatively these translations could be added as features,
> but in all cases there's again the "how to keep things in sync upon
> editing" problem. I think we should keep things as separate and clean
> entities as much as possible.

Yes :)

> I guess before that however, many discussion will follow ;-)

Amen.

- Koen.
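For illustration, the counted-set idea mentioned above can be sketched roughly as follows. This is not the BioCocoa code: `BCCountResidues` is a hypothetical helper name, and plain one-letter NSStrings stand in for the framework's amino acid objects, whose class name was still under discussion.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of BioCocoa: tallies how often each
// one-letter residue code occurs in a sequence string.
static NSCountedSet *BCCountResidues(NSString *sequenceString)
{
    NSCountedSet *counts = [NSCountedSet set];
    unsigned int length = (unsigned int)[sequenceString length];
    unsigned int i;

    for (i = 0; i < length; i++) {
        // One-character substring for the residue at position i
        NSString *residue = [sequenceString substringWithRange:NSMakeRange(i, 1)];
        // Adding an equal object again increments its count
        [counts addObject:residue];
    }
    return counts;
}
```

With this sketch, `[BCCountResidues(@"MKKLA") countForObject:@"K"]` would be 2, and asking for a residue that never occurs gives 0.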
From mek at mekentosj.com Sun Aug 15 19:10:47 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 01:10:47 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <0F7FD206-EF0C-11D8-B76F-003065A5FDCC@earthlink.net>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com> <0F7FD206-EF0C-11D8-B76F-003065A5FDCC@earthlink.net>
Message-ID: <5774A4B4-EF10-11D8-927C-000393CFDE0C@mekentosj.com>

>> Koen, in your sequence class I saw you can init them with a string,
>> great! But next you keep the string around and many methods depend on
>> / work with the string. This leads to exactly the problems we
>> discussed. Init with a string is logical of course, but then we
>> should just let that go and completely depend on the sequence list
>> containing John's bases. We shouldn't have to worry about keeping the
>> string in sync here, the only string you can get back out is through
>> the stringRepresentation method which is generated at that
>> particular moment back "translated" from the sequencelist. Of course
>> I realize it is work in progress and perhaps too early.
>
> I think the only other thing I do with the string are the 2 accessor
> functions. Editing of the sequence is done for the NSMutableArray,
> although this is still commented out until we agree on the naming of
> classes.

In that case the sequence and setSequence methods definitely should be made @private so they are only used internally. In general, while developing a framework we should take special care to nicely "private-out" all internal methods, to avoid people getting a bunch of compiler warnings (the famous: "Also found bla bla method") when they use our framework. Also, why keep the string around?
That only takes up memory and most probably leaves you with a string that is pretty quickly out of sync with the array:

    - (id)initWithString:(NSString*)aString withRange:(NSRange)aRange
    {
        if (self = [super init])
        {
            [self setSequenceString:aString];   <<---
            [self setRange:aRange];
            sequence = [[NSMutableArray alloc] init];
            sequenceCountedSet = [[NSCountedSet alloc] init];
        }
        return self;
    }

>> - I found the -position method a bit confusing as to its
>> description vs what it does
>
> It's used by my app to make a string that displays the start and end
> position (1-based) of a subsequence. We can rename it.

I thought so already. Either we should rename it (if we decide to keep the start/end position thing in) or remove it. If the positioning is kept, why don't we include it in the general description? I think the way it is now is way too program specific; any developer might want it in a different way, so leave it up to them, as they can access the integers anyway.

>> - What does the countedset do,
>
> The counted set keeps track of the number of different aminoacids, see
> the method countAminoAcids how to populate the set.

Nice solution! The question is whether this is also something that we should move to the BCProteinUtil shared object. If we go for a strict separation between model (data), controller, and view (which I am strongly in favour of), we should move all these kinds of methods outside of our data classes (which the sequence clearly is). That way we don't have to update these methods every time we edit the sequence. The only advantage I see is for caching purposes, but perhaps that is again better left to the app developer to implement, in return for a cleaner framework. I like the countedset method though, so just copy it to the shared object.

>> and is that supported from Jaguar?
>
> Not sure, the docs usually say if it is 10.3 and later, but there is
> no such mention in the class description.
You're right, I found the technote for 10.2 that mentioned that they had fixed a memory leak in NSCountedSet; guess that means it was there already ;-)

>> I guess before that however, many discussion will follow ;-)
>
> Amen.

LOL, well, it was still Sunday when I wrote the email ;-) Oh, no it wasn't, now I have been lying as well, better think over my sins during a good night of sleep ;-)

A.

From a.griekspoor at nki.nl Sun Aug 15 19:39:53 2004
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 01:39:53 +0200
Subject: [Biococoa-dev] CVS from XCode
Message-ID: <680C8985-EF14-11D8-927C-000393CFDE0C@nki.nl>

John,

Did you get CVS working for you already? Perhaps you or one of the others can help me out here. Using Apple's document here:

http://developer.apple.com/documentation/DeveloperTools/Conceptual/Xcode_SCM/ProjectSetup/ProjectSetup.html#//apple_ref/doc/uid/TP40001208

I managed to check out the framework and import it in XCode. Everything works fine (checked out ok using my username and password), but now XCode keeps asking me for authentication, which fails every time. It seems this is a "known" issue if I believe this snippet from the following link:

http://maczealots.com/tutorials/xcode-cvs/

[BEGIN]
You are also going to need to set SSH up so that it won't need to enter a password when accessing your repository's machine. XCode has some sort of issues when it comes to entering an authorization password that I could not get past.
On your laptop enter the following commands

    ssh-keygen -t dsa

Hit enter to accept the default values for each prompt. Next, you will need to copy the contents of the id_dsa.pub file so you can paste it into the authorized_keys file on the repository machine. On the client machine:

    cat ~/.ssh/id_dsa.pub
    (Copy the output)
    ssh justin@Gavin.local
    vi ~/.ssh/authorized_keys
    (Paste previous output and save the file)

If you can now ssh to the repository machine without entering a password, you should have no trouble with XCode asking for a password.
[END]

I don't believe I have access to that file, but perhaps someone else has the answer to this question.... (I did activate ssh connections in Xcode)

Thanks!
Alex

From kvddrift at earthlink.net Sun Aug 15 19:45:08 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 15 Aug 2004 19:45:08 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <5774A4B4-EF10-11D8-927C-000393CFDE0C@mekentosj.com>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com> <0F7FD206-EF0C-11D8-B76F-003065A5FDCC@earthlink.net> <5774A4B4-EF10-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID: <24276124-EF15-11D8-B76F-003065A5FDCC@earthlink.net>

On Aug 15, 2004, at 7:10 PM, Alexander Griekspoor wrote:

> In that case the sequence and setSequence method definitely should be
> made @private so they are only used internally.
> In general while developing a framework we should take special care
> to nicely "private-out" all internal methods to avoid people to get a
> bunch of compiler warnings (the famous: "Also found bla bla method")
> when they use our framework. Also, why the keep the string around?
> That only takes up memory and leaves you most probably with one that
> is pretty quickly out of sync with the array:

You're right, I'll remove that line and the two accessors.

>> It's used by my app to make a string that displays the start and end
>> position (1-based) of a subsequence. We can rename it.
>
> I thought so already, either we should rename it (if we decide to keep
> the start/end position thing in) or remove it. If the positioning is
> kept, why don't we include it in the general description? I think the
> way it is now it's way too program specific, any developer might want
> it in a different way, so leave it up to them as they can access the
> integers anyway.

Agree. When I made the file today it was more a quick copy-paste from my Protein class. So stuff can be removed and added, no problem.

> Nice solution! The question if this is also something that we should
> move to the BCProteinUtil shared object.

The same code can be used by all kinds of sequences, so it should be in the BCSequence class. Right now it says amino acid, but as soon as we decide on a name for the root class, I'll change that to make it more universal.

> If we go for a strict separation between model (data), controller,
> and view (which I strongly am in favour of),

Me too! BioCocoa should focus on the model and maybe the controller part, I think.

> we should move all these kind of methods outside of our data classes
> (which the sequence clearly is). That way we don't have to update
> these methods everytime we edit the sequence.

I disagree, the sequence takes care of its own internal bookkeeping. But it doesn't have to be recalculated with each edit, only when another class wants that info.
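A minimal sketch of that kind of bookkeeping, where an edit only marks the tally as stale and the counting happens on demand, might look like this. The class and method names are made up for the example and are not BioCocoa API; one-letter NSStrings stand in for the residue objects, and the code assumes manual retain/release as used at the time.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class, not BioCocoa API. Assumes manual reference
// counting (retain/release); it is not written for ARC.
@interface LazyCountSequence : NSObject
{
    NSMutableArray *residues;     // one-letter NSStrings standing in for residue objects
    NSCountedSet   *cachedCounts; // nil whenever the tally is stale
}
- (void)appendResidue:(NSString *)residue;
- (unsigned int)countOfResidue:(NSString *)residue;
@end

@implementation LazyCountSequence

- (id)init
{
    if ((self = [super init])) {
        residues = [[NSMutableArray alloc] init];
        cachedCounts = nil;
    }
    return self;
}

- (void)dealloc
{
    [residues release];
    [cachedCounts release];
    [super dealloc];
}

// Editing only invalidates the cache; nothing is recounted here.
- (void)appendResidue:(NSString *)residue
{
    [residues addObject:residue];
    [cachedCounts release];
    cachedCounts = nil;
}

// The tally is rebuilt the first time it is asked for after an edit.
- (unsigned int)countOfResidue:(NSString *)residue
{
    if (cachedCounts == nil) {
        cachedCounts = [[NSCountedSet alloc] initWithArray:residues];
    }
    return (unsigned int)[cachedCounts countForObject:residue];
}

@end
```

Any number of edits in a row then cost nothing extra; the price is one full recount on the next query, which is the trade-off being argued here.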
- Koen.

From kvddrift at earthlink.net Sun Aug 15 20:46:16 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 15 Aug 2004 20:46:16 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID:

On Aug 15, 2004, at 6:18 PM, Alexander Griekspoor wrote:

> The naming scheme first, I understand the possible clash with the use
> of Symbol, although I think it still represents the subject best, and
> we are talking about BCSymbol instead of symbol (can't we call this
> latter property "character" (biojava calls them "token" which is a
> nice name as well).

BCSymbol implies only a symbol, not a whole subunit of a sequence, including all associated data. I still like BCRoot or BCUnit (both only 6 characters, so not much to type :)

- Koen.

From jtimmer at bellatlantic.net Sun Aug 15 21:24:33 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sun, 15 Aug 2004 21:24:33 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID:

> Also, would it be wise to make some further organisation using folders. We
> have a Utils folder already. Perhaps both the sequence stuff and IO part could
> be placed in separate folders.

Speaking of Utils, I'm going to rename that DNA item to "DNAStringUtils" or something like that. I'll keep it there, because it's easy to use if all you're doing is converting a sequence in string form and using it as a string immediately, but I'll focus on making things work with our base/amino acid tokens.

> A few remarks that I had after my first quick look at the added code:
> - John shouldn't there be singleton objects for the "W, S, V, B, R" bases as
> well? I now only found ACGTN
> - I think it's impossible because of the needed statics, but would there be a
> way to COMPLETELY initialize all bases on the basis of the plist?
The static values and methods for the rest of the bases are on the way. I'll also make a gap and a non-base item, for alignments and such. I can't do anything with them until I fill out the .plist file. When that's done, I'll make the methods.

A short summary of how things work:

The first time a base is called for, the class reads the .plist file. All the predefined base objects are created, but their complements/representations are kept as strings in the dictionary, and held on to within the base. Any additional bases are kept in a static dictionary, which is retained.

The first time one of a base's complement/representation methods is called, the string-based references there are converted into references to the appropriate singletons, thanks to the work of Jim and Alex.

Any custom bases added to the .plist file have to be accessed with

    + (BCSequenceDNABase *) customBase: (NSString *)baseName

When that's called, the dictionary within is also converted to a base object. I still need to make a method for adding a base programmatically, but that's just going to be 90% format validation code, so it shouldn't be hard.

> - Then we encounter another problem. BCSequence should be an ancestor class
> that divides in aminoacid, dna or rna sequence subclasses. Now you have
> something mixed, do we incorporate translations into the sequence? I guess
> not, these sequences should not be mixed, either pure dna or pure protein. If we
> do, RNA translation must be there as well. So the aminoacid methods are
> strange here. The idea I would propose is that there is a shared translation
> util object that you could feed a dna sequence and get (in the requested
> frames) the translated sequences back as protein sequence objects. It's the
> app task to control/organize these. Likewise, one could argue this for
> complements as well -> a shared dna utils object returns you the complement
> sequence if you hand it a sequence.
> Alternatively these translations could be
> added as features, but in all cases there's again the "how to keep things in
> sync upon editing" problem. I think we should keep things as separate and
> clean entities as much as possible.

I was actually thinking we could have a BCGeneticCode, consisting of BCCodons, again, extensible through .plists, to let us define the code for different species. Create a genetic code, hand it a dictionary, then submit your sequence to it, and get the amino acid sequence back.

And finally, I think the CVS problem is with my login in general, rather than CVS in particular. I was hoping to get it sorted out, but I'm probably going to create a new account on Monday if I don't hear something by then.

JT

_______________________________________________
This mind intentionally left blank

From peter.schols at bio.kuleuven.ac.be Mon Aug 16 02:04:48 2004
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Mon, 16 Aug 2004 08:04:48 +0200
Subject: [Biococoa-dev] BioCocoa CVS (was: RE: Base test)
In-Reply-To: <8B7EA5C8-EF09-11D8-B76F-003065A5FDCC@earthlink.net>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com> <8B7EA5C8-EF09-11D8-B76F-003065A5FDCC@earthlink.net>
Message-ID: <2E35A56B-EF4A-11D8-B3E0-00039345483C@bio.kuleuven.ac.be>

This seems to be the way to go (and the reason why CVS branches have been invented). Since the new BioCocoa structure will give rise to all future BioCocoa versions, I guess it makes sense to create a branch for the 1.5 version (the existing version) and keep the new alpha code in the main trunk. We could be even more cautious by creating a separate alpha branch and only move the alpha branch code to the main trunk if it has been tested.
I wouldn't worry too much about "innocent downloaders", I guess most of them will just download the Biococoa15.zip at http://bioinformatics.org/biococoa/ which contains the current (stable) code and will only be replaced if we have another stable build based on the new structure. On the other hand, developers wanting to join the project will download the latest CVS version and they will get the latest (alpha) code, as intended.

In later stages of development, it would be wise to create separate branches for major changes or for people who want to experiment with some new features.

>> Let me start by a very general comment, perhaps we should either put
>> our work in a separate directory or provide some read me files for
>> "innocent" downloaders of BioCocoa that find a lot of alpha code now
>> of a sudden instead of the relatively stable version Peter left
>> before we started. At least some comments in the read me file would
>> be elegant. Perhaps Peter do you want to do that?
>
> We can make a CVS branch for that.

From peter.schols at bio.kuleuven.ac.be Mon Aug 16 02:07:01 2004
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Mon, 16 Aug 2004 08:07:01 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com>
Message-ID: <7D2C9FDB-EF4A-11D8-B3E0-00039345483C@bio.kuleuven.ac.be>

I would like to echo Alexander's comments here. I hope to return to real programming as soon as we have moved and unpacked.

> Then one disclaimer here, I feel more and more guilty not having much
> time right now do really help programming. Our website makeover comes
> along nicely, but it will take some more time unfortunately. As a
> result I hope you don't get the feeling that this person is only
> complaining while you guys do all the work... Again, don't get
> offended it's all really meant with good intention. I'll try to jump
> in a.s.a.p....
From mek at mekentosj.com Mon Aug 16 06:02:29 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 12:02:29 +0200
Subject: [Biococoa-dev] Base test
Message-ID: <61EFB852-EF6B-11D8-8136-000393CFDE0C@mekentosj.com>

>> The naming scheme first, I understand the possible clash with the use
>> of Symbol, although I think it still represents the subject best, and
>> we are talking about BCSymbol instead of symbol (can't we call this
>> latter property "character" (biojava calls them "token" which is a
>> nice name as well).
>
> BCSymbol implies only a symbol, not a whole subunit of a sequence,
> including all associated data.

I don't really see why it wouldn't, but maybe that's indeed what it has become in our case.

> I still like BCRoot or BCUnit (both only 6 characters, so not much
> to type :)

Personally: BCRoot is too heavy and doesn't sound like it has anything to do with sequence units; it sounds almost like a super super class everything derives from (like NSObject). BCUnit, hmm, if you persist, ok, but still something in me doesn't like that either. Why don't we pick BCToken? Or call the unit BCSymbol, and what we call symbol now a token, like BioJava does.... Again, if you are all in favour of BCUnit, that's ok.
Alex

From mek at mekentosj.com Mon Aug 16 06:31:11 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 12:31:11 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To: <24276124-EF15-11D8-B76F-003065A5FDCC@earthlink.net>
References: <0BF64132-EF09-11D8-927C-000393CFDE0C@mekentosj.com> <0F7FD206-EF0C-11D8-B76F-003065A5FDCC@earthlink.net> <5774A4B4-EF10-11D8-927C-000393CFDE0C@mekentosj.com> <24276124-EF15-11D8-B76F-003065A5FDCC@earthlink.net>
Message-ID: <6471B6AF-EF6F-11D8-8136-000393CFDE0C@mekentosj.com>

>> If we go for a strict separation between model (data), controller,
>> and view (which I strongly am in favour of),
>
> Me too! BioCocoa should focus on the model and maybe the controller
> part, I think.

Yep, perhaps in the future we can add some AppKit stuff like views / sequence pickers etc., but that is a long way down the road.

>> we should move all these kind of methods outside of our data classes
>> (which the sequence clearly is). That way we don't have to update
>> these methods everytime we edit the sequence.
>
> I disagree, the sequence takes care of its own internal bookkeeping.
> But it doesn't have to be recalculated with each edit, only when
> another class want that info.

Well, I guess we are hitting the grey borderline here. I strongly suggest not adding things like translation and stuff to the sequence classes; this simply requires too much logic (like managing/choosing translation dictionaries etc).
On the other hand, you might be right that a simple method like the occurrence of certain amino acids might go in the sequence classes themselves, although even that I would rather see in the utils class. Just keep the sequences as data containers. If we prevent the addition of these 'type specific' methods to the sequence classes themselves, we can keep them as similar as possible and put as much as possible in the superclass. Of course it would be nice if the util objects would show a consistent setup as well. I think we should compare the sequence class to the NSString class. Things like representations, editing, subranges, comparison methods and stuff go in the string class; things like translations to different languages, spell checking, etc. go in separate controllers. In the end this will probably have to be decided on a method per method basis....

Alex

From mek at mekentosj.com Mon Aug 16 06:43:56 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 12:43:56 +0200
Subject: [Biococoa-dev] Base test
In-Reply-To:
References:
Message-ID: <2CA0BBF3-EF71-11D8-8136-000393CFDE0C@mekentosj.com>

>> Also, would it be wise to make some further organisation using
>> folders. We have a Utils folder already. Perhaps both the sequence
>> stuff and IO part could be placed in separate folders.
>
> Speaking of Utils, I'm going to rename that DNA item to
> "DNAStringUtils" or something such like.
> I'll keep it there, because it's easy to use if all you're doing is
> converting a sequence in string form and using it as a string
> immediately, but I'll focus on making things work with our
> base/amino acid tokens.

Great idea! See, BCToken isn't such a bad idea ;-)

>> A few remarks that I had after my first quick look at the added code:
>> - John shouldn't there be singleton objects for the "W, S, V, B, R"
>> bases as well? I now only found ACGTN
>> - I think it's impossible because of the needed statics, but would
>> there be a way to COMPLETELY initialize all bases on the basis of
>> the plist?
>
> The static values and methods for the rest of the bases are on the way.
> I'll also make a gap and non-base item, for alignments and such. I
> can't do anything with them until I fill out the .plist file. When
> that's done, I'll make the methods.

I thought so already, sorry for that..

> A short summary of how things work:
> The first time a base is called for, the class reads the .plist file.
> All the predefined base objects are created, but their
> complements/representations are kept as strings in the dictionary, and
> held on to within the base. Any additional bases are kept in a static
> dictionary, which is retained.
>
> The first time a base has one of the complement/representation methods
> are called, the string-based references there are converted into
> references to the appropriate singleton reference, thanks to the work
> of Jim and Alex.
>
> Any custom bases added to the .plist file have to be accessed with
> + (BCSequenceDNABase *) customBase: (NSString *)baseName
> When that's called, the dictionary within is also converted to a base
> object. I still need to make a method for adding a base
> programmatically, but that's just going to be 90% format validation
> code, so it shouldn't be hard.

But how do you create the statics for these dynamically at run time?
Because if you can, one could do the ideal "instantiate the whole thing from a plist setup" I dreamed out loud about in one of my last emails... >> - Then we encounter another problem. BCSequence should be a ancestor >> class >> that devides in aminoacid, dna or rna sequence subclasses. Now you >> have >> something mixed, do we incorporate translations into the sequence? I >> guess >> not, these sequences should be mixed, either pure dna or pure >> protein. If we >> do, RNA translation must be there as well. So the aminoacid methods >> are >> strange here. The idea I would propose is that there is a shared >> translation >> util object that you could feed a dna sequence and get (in the >> requested >> frames) the translated sequences back as protein sequence objects. >> It's the >> app task to control/organize these. Likewise, one could argue this for >> complements as well -> a shared dna utils object returns you the >> complement >> sequence if you hand it a sequence. Alternatively these translations >> could be >> added as features, but in all cases there's again the "how to keep >> things in >> sync upon editing" problem. I think we should keep things as separate >> and >> clean entities as much as possible. > I was actually thinking we could have a BCGeneticCode, consisting of > BCCodons, again, extensible through .plists, to let us define the code > for > different species. Create a genetic code, hand it a dictionary, then > submit > your sequence to it, and get the amino acid sequence back. Great! The BCGeneticCode would be an argument you pass to the translation object along with the sequence: (BCSequenceProtein *)translateDNASequence: (BCSequenceDNA *)sequence usingCode: (BCGeneticCode *)code inFrames:(NSArray *)frames; with a number of convenience methods like inFrame: (int)frame that all call this method or something like that... 
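The translation call proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: BCTranslatorSketch is a hypothetical name, plain NSString and NSDictionary stand in for BCSequenceDNA, BCSequenceProtein and BCGeneticCode (none of which exist yet), frames are taken to be 1, 2 or 3 on the forward strand, and a codon missing from the dictionary is reported as "X".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the shared translation utility discussed above.
// The genetic code is a dictionary mapping codon strings to one-letter
// amino acid strings, e.g. @"ATG" -> @"M".
@interface BCTranslatorSketch : NSObject
+ (NSArray *)translateDNAString:(NSString *)dna
                      usingCode:(NSDictionary *)code
                       inFrames:(NSArray *)frames;
+ (NSString *)translateDNAString:(NSString *)dna
                       usingCode:(NSDictionary *)code
                         inFrame:(int)frame;
@end

@implementation BCTranslatorSketch

// General method: returns one protein string per requested frame.
// Frames are NSNumbers holding 1, 2 or 3 (assumption: forward strand only).
+ (NSArray *)translateDNAString:(NSString *)dna
                      usingCode:(NSDictionary *)code
                       inFrames:(NSArray *)frames
{
    NSMutableArray *proteins = [NSMutableArray array];
    NSString *upper = [dna uppercaseString];
    NSUInteger length = [upper length];
    NSEnumerator *frameEnumerator = [frames objectEnumerator];
    NSNumber *frame;

    while ((frame = [frameEnumerator nextObject])) {
        NSMutableString *protein = [NSMutableString string];
        int start = [frame intValue] - 1;   // frame 1 starts at index 0
        if (start >= 0) {
            NSUInteger i;
            for (i = (NSUInteger)start; i + 3 <= length; i += 3) {
                NSString *codon = [upper substringWithRange:NSMakeRange(i, 3)];
                NSString *aminoAcid = [code objectForKey:codon];
                // Unknown codons become "X" (an assumption of this sketch).
                [protein appendString:(aminoAcid ? aminoAcid : @"X")];
            }
        }
        [proteins addObject:protein];
    }
    return proteins;
}

// Convenience method that funnels into the general method above.
+ (NSString *)translateDNAString:(NSString *)dna
                       usingCode:(NSDictionary *)code
                         inFrame:(int)frame
{
    NSArray *frames = [NSArray arrayWithObject:[NSNumber numberWithInt:frame]];
    return [[self translateDNAString:dna usingCode:code inFrames:frames]
               objectAtIndex:0];
}

@end
```

Keeping a single general method and letting the convenience variants call it means a change in codon handling only has to be made in one place; reverse-strand frames are left out of this sketch.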
I already did the very nice and exciting work (ahum) of creating such a plist for EnzymeX, so we have this one already ;-) BCCodons express their sequence in the BCTokens, right?

> And finally, I think the CVS problem is with my login in general, rather than CVS in particular. I was hoping to get it sorted out, but I'm probably going to create a new account on Monday if I don't hear something by then.

OK, well, it indeed seems to be a problem with XCode not being able to handle the keys/agent stuff SSH requires, which prevents it from working directly. I don't know if you, Peter, can add my public key to the cvs directory access list as suggested by the snippet I sent you, but I guess not. Maybe I'd better contact the people at BioInformatics about this....

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-bit
company without 1 bit of sense.
*********************************************************

From jtimmer at bellatlantic.net Mon Aug 16 08:37:40 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Mon, 16 Aug 2004 08:37:40 -0400
Subject: [Biococoa-dev] Base test
In-Reply-To: <7D2C9FDB-EF4A-11D8-B3E0-00039345483C@bio.kuleuven.ac.be>
Message-ID:

Well, I'd just like to say that I'm sure all of us will go through periods where we're not contributing as much because the rest of our life is too busy, and I don't think there's ever a need for any sort of apology. I'm certainly planning on going silent in mid-September, when I'll be spending a week and a half visiting my wife's family in England and Cyprus.
Cheers, John > I would like to echo Alexander's comments here. I hope to return to > real programming as soon as we have moved and unpacked. > >> Then one disclaimer here, I feel more and more guilty not having much >> time right now do really help programming. Our website makeover comes >> along nicely, but it will take some more time unfortunately. As a >> result I hope you don't get the feeling that this person is only >> complaining while you guys do all the work... Again, don't get >> offended it's all really meant with good intention. I'll try to jump >> in a.s.a.p.... > _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Mon Aug 16 08:34:42 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 16 Aug 2004 08:34:42 -0400 Subject: [Biococoa-dev] Base test In-Reply-To: <2CA0BBF3-EF71-11D8-8136-000393CFDE0C@mekentosj.com> Message-ID: >> Any custom bases added to the .plist file have to be accessed with >> + (BCSequenceDNABase *) customBase: (NSString *)baseName >> When that's called, the dictionary within is also converted to a base >> object. I still need to make a method for adding a base >> programmatically, >> but that's just going to be 90% format validation code, so it >> shouldn't be >> hard. > > But how do you create the statics for these dynamically at run time? > Because if you can, one could do the ideal "instantiate the whole thing > from a plist setup" I dreamed out loud about in one of my last > emails... Well, that's it - you don't. They're stored in a mutable dictionary (which happens to be a static). We could have done every base that way, but then we're using NSDictionary's lookup table as opposed to coding our own, so there would be speed issues for the commonly accessed bases. I thought this was a decent compromise - fast, direct access for all pre-defined bases, slower but manageable access for everything customized. 
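The compromise described above (direct static access for the predefined bases, a static mutable dictionary for custom ones) might look roughly like this. It is a minimal sketch, not the actual BCSequenceDNABase code: BCBaseSketch and registerCustomBaseNamed: are hypothetical names, only one predefined base is shown, the .plist loading is left out, and the code is written for ARC (under manual retain/release the long-lived objects would simply stay retained for the life of the process).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical miniature of the lookup scheme; not the real base class.
@interface BCBaseSketch : NSObject {
    NSString *name;
}
- (id)initWithName:(NSString *)aName;
- (NSString *)name;
+ (BCBaseSketch *)adenosine;                            // predefined base
+ (BCBaseSketch *)customBase:(NSString *)baseName;      // custom base, by name
+ (void)registerCustomBaseNamed:(NSString *)baseName;   // hypothetical helper
@end

// Predefined bases get their own static pointer: no dictionary lookup needed.
static BCBaseSketch *adenosineBase = nil;
// Custom bases live in a static mutable dictionary keyed by name.
static NSMutableDictionary *customBases = nil;

@implementation BCBaseSketch

- (id)initWithName:(NSString *)aName
{
    if ((self = [super init])) {
        name = [aName copy];
    }
    return self;
}

- (NSString *)name
{
    return name;
}

// Fast path: created once on first use, then returned directly.
+ (BCBaseSketch *)adenosine
{
    if (adenosineBase == nil) {
        adenosineBase = [[BCBaseSketch alloc] initWithName:@"adenosine"];
    }
    return adenosineBase;
}

// Slower path: one dictionary lookup per call; nil if the name is unknown.
+ (BCBaseSketch *)customBase:(NSString *)baseName
{
    return [customBases objectForKey:baseName];
}

// Adds a custom base at run time (stands in for reading it from the .plist).
+ (void)registerCustomBaseNamed:(NSString *)baseName
{
    if (customBases == nil) {
        customBases = [[NSMutableDictionary alloc] init];
    }
    BCBaseSketch *base = [[BCBaseSketch alloc] initWithName:baseName];
    [customBases setObject:base forKey:baseName];
}

@end
```

Because the statics are fixed at compile time, only the dictionary can grow at run time, which is why custom bases go through customBase: rather than a dedicated class method. Thread safety is not addressed here.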
John

_______________________________________________
This mind intentionally left blank

From a.griekspoor at nki.nl Mon Aug 16 11:41:45 2004
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Mon, 16 Aug 2004 17:41:45 +0200
Subject: [Biococoa-dev] CVS from XCode [solved]
In-Reply-To: <680C8985-EF14-11D8-927C-000393CFDE0C@nki.nl>
References: <680C8985-EF14-11D8-927C-000393CFDE0C@nki.nl>
Message-ID:

I'm so ashamed! I was looking around for help on the web to get my XCode working. So the obvious thing to do was Googling.
- Hey, someone knows how....
- An email from a certain Jim Balhoff, hmmm, rings a bell, doesn't it?
- Hey, it's from the BioCocoa archives!
- Hey, it's directed to me ;-)
- OOPS!

[BEGIN]
> - There is one more thing you should do in order to make Xcode work with the CVS repository over ssh:
> If you'd like to use ssh without having to type your password every time (useful if you are accessing CVS via Xcode, for example), you'll want to create a public/private key pair with the ssh-keygen command. The advantage of doing this is that your scripts will be able to run without human intervention. The disadvantage is that anyone who can access your account on your local Mac OS X box will also be able to access those remote servers which have stored your public key.

There is another way to do this, which works very well. Use SSHPassKey to store your password in the Mac OS X keychain. It can be configured so that Xcode will use this to get the cvs password (you don't need to install the Project Builder plug-in mentioned in the ReadMe). Look here:

- Jim
[END]

That did the trick indeed.
The things I did to make XCode work with BioInformatics CVS:

1 Download the program Jim pointed me to: http://www.codefab.com/unsupported/SSHPassKey_v1.1-1.dmg
2 Copy the app to your Applications folder and run it
3 Click "Configure Login Environment to use SSHPassKey"
4 Log off and back in
5 Open Terminal and set the environment variables and your login name:
   For bash:
      export CVS_RSH='ssh'
      export CVSROOT=':ext:loginname@bioinformatics.org:/cvsroot'
   For tcsh:
      setenv CVS_RSH 'ssh'
      setenv CVSROOT ':ext:loginname@bioinformatics.org:/cvsroot'
6 Do a checkout:
      cvs checkout BioCocoa
   You will be asked for your password (which SSHPassKey then stores in the keychain)
7 Open the project in Xcode and then follow the steps in this document: http://developer.apple.com/documentation/DeveloperTools/Conceptual/Xcode_SCM/
8 SSHPassKey will now ask you for your password instead of XCode.

That should do it!! Thanks a lot Jim, sorry for not recognizing your hints earlier...

Cheers,
Alex

On 16 Aug 2004, at 1:39, Alexander Griekspoor wrote:

> John,
>
> Did you get the CVS working for you already? Perhaps you or one of the others can help me out here.
> Using Apple's document here:
> http://developer.apple.com/documentation/DeveloperTools/Conceptual/Xcode_SCM/ProjectSetup/ProjectSetup.html#//apple_ref/doc/uid/TP40001208
> I managed to check out the framework and import it in XCode. Everything works fine (checked out ok using my username and password) but now XCode keeps asking me for authentication, which fails every time. It seems this is a "known" issue if I believe this snippet from the following link: http://maczealots.com/tutorials/xcode-cvs/
>
> [BEGIN]
> You are also going to need to set SSH up so that it won't need to enter a password when accessing your repository's machine. XCode has some sort of issues when it comes to entering an authorization password that I could not get past.
On your laptop enter the following > commands > ssh-keygen -t dsa > > Hit enter to accept the default values for each prompt. Next, you will > need to copy the contents of the id_dsa.pub file so you can paste it > into the authorized_keys file on the repository machine. On the client > machine: > cat ~/.ssh/id_dsa.pub > (Copy the output) > ssh justin at Gavin.local > vi ~/.ssh/authorized_keys > (Paste previous output and save the file) > > If you can now ssh to the repository machine without entering a > password, you should have no trouble with XCode asking for a password. > > [END] > > I don't believe I have access to that file, but perhaps someone else > has the answer to this question.... (I did activate ssh connections in > Xcode) > Thanks! > Alex > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > Windows is a 32-bit patch to a 16-bit shell for an 8-bit > operating system, written for a 4-bit processor by a 2- > bit company without 1 bit of sense. 
> *********************************************************

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-bit
company without 1 bit of sense.
*********************************************************

From kvddrift at earthlink.net Mon Aug 16 17:40:41 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 16 Aug 2004 17:40:41 -0400
Subject: [Biococoa-dev] Nomenclature
Message-ID:

Hi all,

I am probably the oldest, so I will make the decision on the naming :)

After reading Alex's convincing plea for BCSymbol, I suggest we use that as the superclass for nucleotides, amino acids, etc:

BCSymbol
    |
    |
    -------------BCNucleotideDNA
    |
    |
    -------------BCNucleotideRNA
    |
    |
    -------------BCAminoAcid
    |
    |
    -------------???

For the sequence we'll use:

BCSequence
    |
    |
    -------------BCSequenceDNA
    |
    |
    -------------BCSequenceRNA
    |
    |
    -------------BCSequenceProtein
    |
    |
    ---------???

Tonight I will commit a (for now empty) BCSymbol class and update BCSequence accordingly. John, can you update your BCSequenceDNABase class to reflect these changes too?

Cheers,

- Koen.

From jtimmer at bellatlantic.net Wed Aug 18 08:40:53 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 18 Aug 2004 08:40:53 -0400
Subject: [Biococoa-dev] Nomenclature
In-Reply-To:
Message-ID:

I've created a new account at bioinformatics, this time jrtimmer.
Hopefully, this time it'll work better.

> I am probably the oldest, so I will make the decision on the naming :)

Okay, I've got to ask - how old are you?

The following methods and their associated variables can probably be shifted up to the BCSymbol class (I'll do this later today):
- (NSString *)symbolString;
- (NSString *) savableRepresentation;
- (NSString *) description;

If we're only going to be working with items that have single letter codes, we could also move
- (unichar) symbol;
as well.

The question I have is whether we'll ever allow anyone to rename the symbols. For the DNA nucleotides, I've overridden these methods so that they have no effect, since allowing a rename could lead to all sorts of bizarre behavior. Is there any case where it wouldn't?

Does anyone know a good class diagramming program? It would be great to have the following drawn out for the users. It'll help them know where to look for method documentation -

> BCSymbol
>     |
>     |
>     -------------BCNucleotideDNA
>     |
>     |
>     -------------BCNucleotideRNA
>     |
>     |
>     -------------BCAminoAcid
>     |
>     |
>     -------------???
>
> For the sequence we'll use:
>
> BCSequence
>     |
>     |
>     -------------BCSequenceDNA
>     |
>     |
>     -------------BCSequenceRNA
>     |
>     |
>     -------------BCSequenceProtein
>     |
>     |
>     ---------???

From mek at mekentosj.com Wed Aug 18 09:07:30 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 18 Aug 2004 15:07:30 +0200
Subject: Fwd: [Biococoa-dev] Nomenclature
Message-ID: <8FBC570C-F117-11D8-9DA7-000393CFDE0C@mekentosj.com>

> I've created a new account at bioinformatics, this time jrtimmer.
> Hopefully, this time it'll work better.

Welcome John! Kiddin' ;-)

>> I am probably the oldest, so I will make the decision on the naming :)
> Okay, I've got to ask - how old are you?

That's unfair, you should tell yours as well then ;-)
And to prevent the obvious, I'm from 1977, that would make me 27....
> > > The following methods and their associated variables can probably be > shifted > up to the BCSymbol class (I'll do this later today): > - (NSString *)symbolString; > - (NSString *) savableRepresentation; > - (NSString *) description; Ok! > > If we're only going to be working with items that have single letter > codes, > we could also move > - (unichar) symbol; > As well. Shall we make that (unichar) token; to prevent the name circle you warned us for? > The question I have is whether we'll ever allow anyone to rename the > symbols? For the DNA nucleotides, I've overridden these methods to > cause > them to have no effect them, since allowing a rename could lead to all > sorts > of bizarre behavior. Is there any case where it wouldn't? I guess not. Which methods are you talking about? We don't implement the set methods do we, only the get ones. Perhaps I don't see the point your making here. > Does anyone know a good class diagramming program? It would be great > to > have the following drawn out for the users. It'll help them know > where to > look for method documentation - I believe Koen or Jim pointed me to Autograf: http://autograf.sourceforge.net/ which produces very nice results. Documentation will be something we have to look into quite some more... Autograf should work together with XCode I believe, haven't tested it though. > >> BCSymbol >> | >> | >> -------------BCNucleotideDNA >> | >> | >> -------------BCNucleotideRNA >> | >> | >> -------------BCAminoAcid >> | >> | >> -------------??? >> >> >> For the sequence we'll use: >> >> BCSequence >> | >> | >> -------------BCSequenceDNA >> | >> | >> -------------BCSequenceRNA >> | >> | >> -------------BCSequenceProtein >> | >> | >> ---------??? 
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From a.griekspoor at nki.nl Wed Aug 18 09:13:04 2004
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Wed, 18 Aug 2004 15:13:04 +0200
Subject: [Biococoa-dev] Nomenclature
In-Reply-To: <8FBC570C-F117-11D8-9DA7-000393CFDE0C@mekentosj.com>
References: <8FBC570C-F117-11D8-9DA7-000393CFDE0C@mekentosj.com>
Message-ID: <569441AE-F118-11D8-9DA7-000393CFDE0C@nki.nl>

Reminded by the screenshot on the autograf homepage, XCode 2 from Tiger has some nice features as well.... (NDA)

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-bit
company without 1 bit of sense.
*********************************************************

From jtimmer at bellatlantic.net Wed Aug 18 09:41:58 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 18 Aug 2004 09:41:58 -0400
Subject: [Biococoa-dev] Nomenclature
In-Reply-To: <8FBC570C-F117-11D8-9DA7-000393CFDE0C@mekentosj.com>
Message-ID:

>>> I am probably the oldest, so I will make the decision on the naming :)
>> Okay, I've got to ask - how old are you?
>
> That's unfair you should tell yours as well then ;-)
> And to prevent the obvious, I'm from 1977, that would make me 27....

I was born 12/12/66, which makes me 37 for a few months. That's why I thought there was half a chance I might be the oldest, and can claim the decision making ability.

> Shall we make that (unichar) token; to prevent the name circle you warned us for?

Well, if we're not doing the full inheritance in the class name, every instantiated version will be a nucleotide or amino acid, so this won't be as much of a problem.

>> The question I have is whether we'll ever allow anyone to rename the symbols? For the DNA nucleotides, I've overridden these methods to cause them to have no effect them, since allowing a rename could lead to all sorts of bizarre behavior. Is there any case where it wouldn't?
>
> I guess not. Which methods are you talking about? We don't implement the set methods do we, only the get ones. Perhaps I don't see the point your making here.

Koen put them in the BCSymbol class, so the nucleotide classes would inherit them, which is why I overrode them.
JT

_______________________________________________
This mind intentionally left blank

From jtimmer at bellatlantic.net Wed Aug 18 14:57:22 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 18 Aug 2004 14:57:22 -0400
Subject: [Biococoa-dev] Nomenclature
In-Reply-To:
Message-ID:

Okay, this is much closer to being done. Attached, you should find an updated version of BCSymbol, and BCNucleotideDNA in all its glory, with full headerdoc. Also, the .plist file has all the bases defined, though not the non-base and gap.

It all compiles without error, but will almost certainly blow up in the face of anyone who tries to use it. I will try to fill out BCSequenceDNA over the next couple of days, and then try to use this and see what happens.

John

_______________________________________________
This mind intentionally left blank

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCNucleotideDNA.m
Type: application/octet-stream
Size: 18360 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCSymbol.h
Type: application/octet-stream
Size: 515 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCSymbol.m
Type: application/octet-stream
Size: 836 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCNucleotideDNA.h
Type: application/octet-stream
Size: 8963 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: base template.plist Type: application/octet-stream Size: 13098 bytes Desc: not available URL: From kvddrift at earthlink.net Wed Aug 18 17:25:28 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 18 Aug 2004 17:25:28 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: References: Message-ID: <20892EB0-F15D-11D8-9DA8-003065A5FDCC@earthlink.net> On Aug 18, 2004, at 8:40 AM, John Timmer wrote: > Okay, I've got to ask - how old are you? > > 39 :) - Koen. From kvddrift at earthlink.net Wed Aug 18 20:19:52 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 18 Aug 2004 20:19:52 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: References: Message-ID: <7D7ECD6D-F175-11D8-9DA8-003065A5FDCC@earthlink.net> On Aug 18, 2004, at 9:41 AM, John Timmer wrote: >> I guess not. Which methods are you talking about? We don't implement >> the set methods do we, only the get ones. Perhaps I don't see the >> point >> your making here. > Koen put them in the BCSymbol class, so the nucleotide classes would > inherit > them, which is why I overrode them. > The initWithName that I put in BCSymbol is for the singleletter code, so I guess that can be changed to initWithSymbol. If we don't want to use an accessor to set the symbol (I agree), you think this will work: - (id)initWithSymbol:(unichar)aChar { if ( self = [super init] ) { symbol = aChar; } return self; } I am not sure if the line symbol = aChar; is legal. - Koen. From kvddrift at earthlink.net Wed Aug 18 20:40:04 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 18 Aug 2004 20:40:04 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: References: Message-ID: <4FEA9116-F178-11D8-9DA8-003065A5FDCC@earthlink.net> On Aug 18, 2004, at 2:57 PM, John Timmer wrote: > Okay, this is much closer to being done. Attached, you should find an > updated version of BCSymbol, and BCNucleotideDNA in all its glory, > with full > headerdoc. 
Also, the .plist file has all the bases defined, though > not the > non-base and gap. > John, I am confused by your use of symbolString. I think it is the same what I used for 'name', but I am not sure. Or did you mean this: unichar symbol // 'C' NSString *symbolString // @"C" NSString *name // @"Cytidine" Let's make sure we are using the same definitions. Also, make sure that your BCNucleotideDNA class is a subclass of BCSymbol, not NSObject. - Koen. From mek at mekentosj.com Thu Aug 19 03:12:28 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 19 Aug 2004 09:12:28 +0200 Subject: [Biococoa-dev] Nomenclature In-Reply-To: <7D7ECD6D-F175-11D8-9DA8-003065A5FDCC@earthlink.net> References: <7D7ECD6D-F175-11D8-9DA8-003065A5FDCC@earthlink.net> Message-ID: <215FDFCF-F1AF-11D8-99FA-000393CFDE0C@mekentosj.com> Nice work John! I agree with Koen on the following snippet: > > - (id)initWithSymbol:(unichar)aChar > { > if ( self = [super init] ) > { > symbol = aChar; > } > > return self; > } > > I am not sure if the line > > symbol = aChar; > > is legal. If we really want to follow the guidelines, we would still have the accessor methods, but @private. Then you can do: [self setSymbol: aChar]; in the init method (I don't see why it shouldn't work by the way). Finally, two nitpicking remarks about the plist: - you could take aMino for M and Keto for K (the official abbreviation), although it's a bit odd to see that some have names and others only a letter, but that's the way it is. - single base should be Single base (capitalized) (esthetic reason only) - Compelement should be spelled Complement ;-) Nice work on the headerdoc as well! 
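On Koen's question of whether the line `symbol = aChar;` is legal: assigning directly to an instance variable inside an instance method is ordinary Objective-C, so a sketch like the one below should compile as written. The class name BCSymbolSketch is hypothetical and stands in for BCSymbol; it assumes symbol is a plain unichar instance variable and derives symbolString from it, following the symbol / symbolString distinction discussed in this thread.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for BCSymbol: the one-letter code is set once in the
// initializer and there is no public setter, so it cannot be renamed later.
@interface BCSymbolSketch : NSObject {
    unichar symbol;   // e.g. 'C'
}
- (id)initWithSymbol:(unichar)aChar;
- (unichar)symbol;
- (NSString *)symbolString;   // e.g. @"C"
@end

@implementation BCSymbolSketch

- (id)initWithSymbol:(unichar)aChar
{
    if ((self = [super init])) {
        // Direct ivar assignment; no accessor is required for this to work.
        symbol = aChar;
    }
    return self;
}

- (unichar)symbol
{
    return symbol;
}

// Builds the one-character string form on demand from the stored unichar.
- (NSString *)symbolString
{
    return [NSString stringWithCharacters:&symbol length:1];
}

@end
```

Going through a private setter instead, as Alex suggests, would behave the same here; the direct assignment just avoids declaring a setter at all.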
Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From jtimmer at bellatlantic.net Thu Aug 19 12:39:48 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 12:39:48 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: <4FEA9116-F178-11D8-9DA8-003065A5FDCC@earthlink.net> Message-ID: An idea on init options - My sense is that when you can't create anything useful without a full set of information about what you're creating, the best thing to do is make an "initWithDictionary" method, and document the format of the dictionary required. I think this will be true for pretty much all our individual BCSymbols - certainly, it's true for bases and amino acids. For a basic "init" call, maybe we could just return a non-base or non-amino acid item? > I am confused by your use of symbolString. I think it is the same what > I used for 'name', but I am not sure. Or did you mean this: > > unichar symbol // 'C' > NSString *symbolString // @"C" > NSString *name // @"Cytidine" > > Let's make sure we are using the same definitions. Ah, I see the problem. For name, I used the full chemical name (ie - adenosine, cysteine). For the symbol, I used the one letter code. Which way do people think makes more sense? > > Also, make sure that your BCNucleotideDNA class is a subclass of > BCSymbol, not NSObject. It should have been done - if it wasn't please let me know. 
The new account seems to work fine (thanks, Peter!), so I'm going to try to set up CVS with the MAIN branch now. Or I may just spend time reading the man page for CVS.... Cheers, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Thu Aug 19 13:26:04 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 13:26:04 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: Message-ID: Okay, I followed the email Alex sent a while back, and I did manage to get CVS to work. The problem I'm having now is that the project won't build, and can't seem to find some of its files (several were red, and when re-found with the "get info" panel, can't be imported by other files). I'd also like to know which version I should be editing. I seem to be on version 1.3, but the release version is 1.5. I checked, but no other versions seem to show up in the SCM dialogs. Please forgive my newbie-ness. It will all pay off in the end, I'm sure... Cheers, JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Thu Aug 19 13:50:56 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 13:50:56 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: Message-ID: Lest there be some confusion, I did a clean checkout, and then opened the project, and the attached screenshot is the result _______________________________________________ This mind intentionally left blank -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Picture 2.pdf Type: application/pdf Size: 49080 bytes Desc: not available URL: From mek at mekentosj.com Thu Aug 19 14:47:12 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 19 Aug 2004 20:47:12 +0200 Subject: [Biococoa-dev] CVS In-Reply-To: References: Message-ID: <2EB404FF-F210-11D8-A0B0-000393CFDE0C@mekentosj.com> John, I have exactly the same thing in the version I checked out John.... No clue why things are missing, but perhaps Koen, Jim, and/or Peter know they are the ones who checked things in or set things up. > Please forgive my newbie-ness. It will all pay off in the end, I'm > sure... It already does! Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From jtimmer at bellatlantic.net Thu Aug 19 15:08:27 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 15:08:27 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: <2EB404FF-F210-11D8-A0B0-000393CFDE0C@mekentosj.com> Message-ID: > I have exactly the same thing in the version I checked out John.... 
> No clue why things are missing, but perhaps Koen, Jim, and/or Peter
> know they are the ones who checked things in or set things up.

Well, I've fixed them in my local copy, and could commit things back, but I'm afraid of screwing up the current version when it was decided we'd be moving a lot of things off to a new branch. When you do a basic checkout, do you get the main branch? If not, what version do you get?

Incidentally, for those of us using SSHPassKey, I've found my life improves dramatically after having edited its Info.plist to set LSUIElement to 1 (so it doesn't keep popping into and out of the dock every 10 seconds).

Cheers,

John

_______________________________________________
This mind intentionally left blank

From mek at mekentosj.com Thu Aug 19 17:50:23 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 19 Aug 2004 23:50:23 +0200
Subject: [Biococoa-dev] CVS
In-Reply-To:
References:
Message-ID:

Arghh, this whole CVS stuff in Xcode is driving me mad....
I had it working, but now if I try to repeat the steps I get the following error:

    cvs server: WARNING: global `-l' option ignored.
    ? build
    ? BioCocoa.pbproj/griek.mode1
    cvs [update aborted]: -t/-f wrappers not supported by this version of CVS

What now?!!?

Anyway, it seems I get the main branch when I do a checkout, and I still have the red files.

> Incidentally, for those of us using SSHPassKey, I've found my life improves
> dramatically after having edited its info.plist to set LSUIElement to 1 (so
> it doesn't keep popping into and out of the dock every 10 seconds).

I see what you mean, nice tip!
Does anyone have a solution for the error above?

Alex

On 19 Aug 2004, at 21:08, John Timmer wrote:

>
>> I have exactly the same thing in the version I checked out John....
>> No clue why things are missing, but perhaps Koen, Jim, and/or Peter
>> know they are the ones who checked things in or set things up.
> > Well, I've fixed them in my local copy, and could commit things back, > but > I'm afraid of screwing up the current version when it was decided we'd > be > moving a lot of things off to a new branch. When you do a basic > checkout, > do you get the main branch? If not, what version do you get? > > Incidentally, for those of us using SSHPassKey, I've found my life > improves > dramatically after having edited its info.plist to set LSUIElement to > 1 (so > it doesn't keep popping into and out of the dock every 10 seconds). > > Cheers, > > John > > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Mac vs Windows 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From jtimmer at bellatlantic.net Thu Aug 19 18:37:49 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 18:37:49 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: Message-ID: I don't know what to suggest on the error below, but most things seem to be working okay for me at the moment. I haven't updated the CVS project to fix the missing files problem, since I'm not sure I should be mucking around too much with the main branch. 
The other reason I'm not doing that is that the files I've exported are meant to go in subfolders, and I'm storing them there on my local copy, but using Xcode to export them sends them to the root directory. Koen, you seem to have gotten subfolders to work - any advice on this? The other question I have is in terms of deleting files - I've renamed my utils to emphasize that it works with strings and exported the new version - now both copies are there. I know I could read up on CVS and probably sort things out, but it seems that Xcode does its own thing in order to provide compatibility with three different code management systems, so I'm not sure anything I discover would apply. Once I figure out how to push files to the right locations and delete older copies, I'll push the changes I've made back to the repository, and we can (hopefully) get everyone on the same page. > Arghh, this whole CVS stuff in Xcode is driving me mad.... > I had it working, but now if I try to repeat the stuff I get the > following error: > > cvs server: WARNING: global `-l' option ignored. > ? build > ? BioCocoa.pbproj/griek.mode1 > cvs [update aborted]: -t/-f wrappers not supported by this version of > CVS > > What now?!!? > > Anyway, it seems the main branch when I do a checkout, and still have > the red files. > _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Thu Aug 19 19:02:08 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 19 Aug 2004 19:02:08 -0400 Subject: [Biococoa-dev] Nomenclature In-Reply-To: References: Message-ID: On Aug 19, 2004, at 12:39 PM, John Timmer wrote: > An idea on init options - > > My sense is that when you can't create anything useful without a full > set of > information about what you're creating, the best thing to do is make an > "initWithDictionary" method, and document the format of the dictionary > required. 
I think this will be true for pretty much all our individual
> BCSymbols - certainly, it's true for bases and amino acids.
>
> For a basic "init" call, maybe we could just return a non-base or non-amino
> acid item?
>

My idea was to use initWithSymbol, and then use that symbol to look up the data in the plist, and store that info in the object. We probably can have both.

- Koen.

From kvddrift at earthlink.net Thu Aug 19 19:08:24 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 19 Aug 2004 19:08:24 -0400
Subject: [Biococoa-dev] CVS
In-Reply-To:
References:
Message-ID:

When I was committing stuff, I did everything from the command line; I only got CVS and Xcode working together a few days ago. So when I made that subfolder, I did that from the command line. Just as for everyone else, this whole CVS thing is pretty new to me :)

I posted a good link on CVS a few weeks ago on this list. I'll look later tonight, now my kids need to take a bath.

- Koen.

On Aug 19, 2004, at 6:37 PM, John Timmer wrote:

> I don't know what to suggest on the error below, but most things seem to be
> working okay for me at the moment.
>
> I haven't updated the CVS project to fix the missing files problem, since
> I'm not sure I should be mucking around too much with the main branch. The
> other reason I'm not doing that is that the files I've exported are meant to
> go in subfolders, and I'm storing them there on my local copy, but using
> Xcode to export them sends them to the root directory. Koen, you seem to
> have gotten subfolders to work - any advice on this? The other question I
> have is in terms of deleting files - I've renamed my utils to emphasize that
> it works with strings and exported the new version - now both copies are
> there.
> > I know I could read up on CVS and probably sort things out, but it > seems > that Xcode does its own thing in order to provide compatibility with > three > different code management systems, so I'm not sure anything I discover > would > apply. > > Once I figure out how to push files to the right locations and delete > older > copies, I'll push the changes I've made back to the repository, and we > can > (hopefully) get everyone on the same page. > > >> Arghh, this whole CVS stuff in Xcode is driving me mad.... >> I had it working, but now if I try to repeat the stuff I get the >> following error: >> >> cvs server: WARNING: global `-l' option ignored. >> ? build >> ? BioCocoa.pbproj/griek.mode1 >> cvs [update aborted]: -t/-f wrappers not supported by this version of >> CVS >> >> What now?!!? >> >> Anyway, it seems the main branch when I do a checkout, and still have >> the red files. >> > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From jtimmer at bellatlantic.net Thu Aug 19 19:58:54 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 19:58:54 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: Message-ID: Okay, it sounds like I can do certain things with the command line, and others with Xcode, so I should be set. I'll play janitor later tonight (when all you Europeans should be sleeping!) so as not to interfere with everyone's work. Hopefully, when I'm done, I'll have re-linked all the missing files and added to the folder structure in place. If this is a problem for anyone, please let me know within the next two hours or so (while I'm commuting home and making dinner). 
Cheers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Thu Aug 19 20:26:02 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 19 Aug 2004 20:26:02 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: References: Message-ID: <849E992F-F23F-11D8-9DA8-003065A5FDCC@earthlink.net> Hi, I think I know what's going on. If we add files to the project, and commit them with CVS, we also need to update the .pbproj file, otherwise the pbproj in CVS doesn't know it needs to include those files. > Okay, it sounds like I can do certain things with the command line, and > others with Xcode, so I should be set. I'll play janitor later tonight > (when all you Europeans should be sleeping!) Well, I'm European, but live in NC. But go ahead with the cleanup. I won't have much time to do anything tonight. - Koen. From kvddrift at earthlink.net Thu Aug 19 20:28:02 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 19 Aug 2004 20:28:02 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: References: Message-ID: On Aug 19, 2004, at 7:08 PM, Koen van der Drift wrote: > I posted a good link on CVS a few weeks ago on this list. Here it is: - Koen. From jtimmer at bellatlantic.net Thu Aug 19 23:48:38 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 19 Aug 2004 23:48:38 -0400 Subject: [Biococoa-dev] CVS In-Reply-To: <849E992F-F23F-11D8-9DA8-003065A5FDCC@earthlink.net> Message-ID: Okay, currently if you do a clean checkout, you get a project with no missing files and no errors during build. I've rearranged the folder structure a bit and moved a lot of things into sub-folders (I'm thinking of creating a "Resources" folder, too). Given the number of changes and the new folder structure, I'd recommend doing a clean checkout (of course, keeping a separate copy around in case you've got internal changes you haven't committed). 
My general take on things is that Xcode's CVS interface handles everything other than folder structures - by default, it puts any new files at the root level, regardless of where they actually are. Given that, it's best to deal with placing the new folders and the files in them into the CVS repository from the command line (thanks for the link, Koen!). The process looks like: create folders and nested files in the Finder/Xcode close the project add them to the repository in the command line reopen the project - everything should just work I've made the groups in the project reflect the directory structure (it makes the most sense), but the groups don't necessarily reflect the folder structure - they're stored internally in the project file, and that needs to be updated for changes in grouping to stick. That's what works tonight, at least. The downside of all of this is that I did a LOT of individual commits to making sure things were working at intermediate steps, so many files went up a few versions. > Well, I'm European, but live in NC. But go ahead with the cleanup. I > won't have much time to do anything tonight. Ah, I was wondering whether Earthlink had a European business. Are you at Duke with Jim? Your source repository janitor - John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Fri Aug 20 13:33:52 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 20 Aug 2004 13:33:52 -0400 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: Message-ID: A modest proposal, since I'm thinking of header doc-ing everything this evening (and I'm the second oldest, so it's my turn ;) : symbol provides a 1 letter code as a unichar symbolString provides that as a string name provides the chemical name for the item We move all initialization methods into the subclasses, since Koen and I are writing them and handling them somewhat differently. 
We'll drop the "setName:" method, since Koen doesn't want to do that either. I'd like to create a standard "gap" and "non-chemical" symbol for the bases, and thought they should parallel what's done with amino acids (the non-chemical is for situations where a long string is provided and needs to be translated into objects - the user may need to know where problems are). I also don't want them to conflict with a "stop" amino acid, which would be needed for translating an entire frame. For stop, I propose: Symbol: X Name: stop For gap: symbol: - name: gap For non-chemical: name: undefined symbol: ? I'm not sold on any of these (especially the last), so I invite comments. Cheers, John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Aug 20 14:17:21 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 20 Aug 2004 20:17:21 +0200 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: References: Message-ID: <2DE2C4EE-F2D5-11D8-B4F5-000393CFDE0C@mekentosj.com> John, My comments as the youngest: > symbol provides a 1 letter code as a unichar > symbolString provides that as a string > name provides the chemical name for the item Ok by me! > We move all initialization methods into the subclasses, since Koen and > I are > writing them and handling them somewhat differently. I'm not in favour of this one, can you identify what exactly you do differently? I would strongly ask you guys to agree on one of the two implementations as both classes are descendents of BCSymbol it would be not more than logical to keep implementations as similar as possible as well. The aminoacids are encoded through a plist as well aren't they? > We'll drop the "setName:" method, since Koen doesn't want to do that > either. Please leave the accessor methods, but make them private. This is the recommended way apple advises us to go. Things are more straight forward for people to read the code and prevents making mistakes. 
Thus in the .m file:

    @interface BCSymbol (Private)
    - (void)setName:(NSString *)name;
    @end

    init...
        [self setName: @"blabla"];

> I'd like to create a standard "gap" and "non-chemical" symbol for the bases,
> and thought they should parallel what's done with amino acids (the
> non-chemical is for situations where a long string is provided and needs to
> be translated into objects - the user may need to know where problems are).
> I also don't want them to conflict with a "stop" amino acid, which would be
> needed for translating an entire frame.
>
> For stop, I propose:
> Symbol: X
> Name: stop

Commonly the symbol for stop is an asterisk "*", please use that one. Also, a stop is kind of an ambiguous amino acid, like W is an ambiguous base. Do we provide this one in addition to the three stop codons we know? Also, how do we implement the fact that in one species something can be a stop codon, but in another not? This influences the symbol, right?

> For gap:
> symbol: -
> name: gap
> For non-chemical:
> name: undefined
> symbol: ?

Both ok!

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better.
So I got a Macintosh.
********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Mac vs Windows 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From jtimmer at bellatlantic.net Fri Aug 20 15:43:57 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 20 Aug 2004 15:43:57 -0400 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: <2DE2C4EE-F2D5-11D8-B4F5-000393CFDE0C@mekentosj.com> Message-ID: >> We move all initialization methods into the subclasses, since Koen and >> I are >> writing them and handling them somewhat differently. > > I'm not in favour of this one, can you identify what exactly you do > differently? I would strongly ask you guys to agree on one of the two > implementations as both classes are descendents of BCSymbol it would be > not more than logical to keep implementations as similar as possible as > well. The aminoacids are encoded through a plist as well aren't they? Koen's dictionary is symbol based, and I could switch to that easily, and he could add a chemical name to his. I could change my "initWithDictionary" to an "initWithName" and look things up in the dictionary as he does. All that would do would be to provide parallel method names, though - they'd still wind up doing very different things, because the information content of the individual BCSymbols is going to be quite different. Compare his: - (void)setHydropathyValues To - (BCNucleotideDNA *) initWithDictionary: (NSDictionary *)entry To see what I mean. 
Even if we force a bunch of parallels, there may still be significant differences between how things work. Koen can initialize the amino acids individually as needed, whereas I need to initialize the bases all at once because of the whole complementation issue that amino acids don't have. We could make Koen's initialization code work the same way mine does, but I think my code's a bit ugly and hard to follow, while Koen's current stuff is much cleaner. I see the point that having as many parallels as possible would make it easier for new people to come in and work on the code, but I'm just not sure how easily that would work in this case. One thought I did have about the BCSymbol class - We could move the .plist dictionary loading and validation code there. After all, if one is missing or damaged, I wouldn't trust the other, and we could have a common exception throw for indications of bundle damage. Does this sound good? > Please leave the accessor methods, but make them private. This is the > recommended way apple advises us to go. Things are more straight > forward for people to read the code and prevents making mistakes. Thus > in the .m file: > > @private > -(void)setName:(NSString *)name; > @end > > init... > [self setName: @"blabla"]; Is there any reason this is preferred to : name = [entry copy]; If you don't want the name changed, I'd prefer to not make any way of doing so short of creating a new object. But maybe that's just because I always assume maliciousness on the part of my users.... > Commonly the symbol for stop is an asterix "*", please use that one. > Also a stop is kind of an ambiguous aminoacid like W is an ambiguous > base. Do we provide this one in addition to the three stop codons we > know? Also, how do we implement the fact that in one species something > can be a stop codon, but in another not? This influences the symbol > right? Well, that's where we use a BCCodonSet - send the translation method both the sequence and the set. 
We could have an enumeration of sets (BCUniversalCodonSet, BCVertebrateCodonSet, etc.) corresponding to integers, as Apple does for many things, and have the sets defined in a .plist, which is working well for us. How a program chooses which codon set to use, however, isn't our problem - we just provide the facility to do so. And * is fine with me. _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Aug 20 17:55:15 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 20 Aug 2004 17:55:15 -0400 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: References: Message-ID: <9E10CC30-F2F3-11D8-BB8E-003065A5FDCC@earthlink.net> On Aug 20, 2004, at 1:33 PM, John Timmer wrote: > symbol provides a 1 letter code as a unichar > symbolString provides that as a string > name provides the chemical name for the item > I agree, I will update my classes accordingly, and use private set-methods. - Koen. From kvddrift at earthlink.net Fri Aug 20 17:55:10 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 20 Aug 2004 17:55:10 -0400 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: References: Message-ID: <9B32B5A0-F2F3-11D8-BB8E-003065A5FDCC@earthlink.net> On Aug 20, 2004, at 3:43 PM, John Timmer wrote: >> I'm not in favour of this one, can you identify what exactly you do >> differently? I would strongly ask you guys to agree on one of the two >> implementations as both classes are descendents of BCSymbol it would >> be >> not more than logical to keep implementations as similar as possible >> as >> well. The aminoacids are encoded through a plist as well aren't they? I agree with Alex, we should try to put as much as possible in BCSequence and BCSymbol, specialized stuff can go in the subclasses. > Koen's dictionary is symbol based, and I could switch to that easily, > and he > could add a chemical name to his. 
I could change my
> "initWithDictionary" to
> an "initWithName" and look things up in the dictionary as he does.

I suggest renaming initWithName to initWithSymbol. But we can have both an initWithDictionary and an initWithSymbol method. Put the shared stuff in BCSymbol, and then the subclass can call [super initWithSymbol], etc. and do any additional specialized stuff.

John, in what case would you call initWithDictionary? If I create a sequence from a string or another BCSequence, I have to iterate through each symbol and then call initWithSymbol, passing that symbol, so I am not sure in what situation one needs to call initWithDictionary.

>
> I see the point that having as many parallels as possible would make it
> easier for new people to come in and work on the code, but I'm just not sure
> how easily that would work in this case.

See above, just override the same method in your subclass and add any specialized stuff.

Additionally, I wrote this before I read this email, but it is on the same subject, so I just put it here:

We should be thinking about a better synchronization between the methods in BCSequence and BCSequenceDNA. The following methods in BCSequenceDNA have an equivalent in BCSequence:

    BCSequenceDNA                          <->  BCSequence
    -(NSArray*)sequenceBaseArray           <->  -(NSArray *)sequence
    -(void)setSequenceBaseArray            <->  -(void)setSequence
    -(void)removeBasesInRange              <->  -(void)removeSymbolsInRange
    -(BCSequenceDNA *)sequenceInRange      <->  -(NSArray *)partialSequence
    -(int)length                           <->  -(unsigned int)numberOfSymbols

Most of the above can be removed from BCSequenceDNA in favour of the equivalent method in BCSequence. The nice thing about inheritance is that you can just ask for a sequence, with no need to specify whether it is DNA, protein, etc. The calling method can then, if needed, do a validation to make sure that the right symbols are in the sequence.

Also, I suggest that the two init methods return (id) instead of BCSequenceDNA.

If you agree, I will update the BCSymbol, BCAminoAcid and BCSequence classes later tonight.
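The "put the shared stuff in BCSymbol and let the subclass call super" suggestion above could look roughly like this. It is only a sketch: SketchSymbol and SketchAminoAcid are hypothetical stand-ins for BCSymbol and BCAminoAcid, and the two-entry name table replaces the plist lookup the real classes would do. Both init methods return (id), as proposed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for BCSymbol: holds what every symbol shares.
@interface SketchSymbol : NSObject
{
    unichar   symbol;
    NSString *symbolString;
}
- (id)initWithSymbol:(unichar)aSymbol;
- (NSString *)symbolString;
@end

@implementation SketchSymbol

- (id)initWithSymbol:(unichar)aSymbol
{
    self = [super init];
    if (self != nil) {
        symbol = aSymbol;
        symbolString = [[NSString alloc] initWithCharacters:&symbol length:1];
    }
    return self;
}

- (void)dealloc
{
    [symbolString release];
    [super dealloc];
}

- (NSString *)symbolString { return symbolString; }

@end

// Hypothetical stand-in for BCAminoAcid: adds one specialized value.
@interface SketchAminoAcid : SketchSymbol
{
    NSString *name;
}
- (NSString *)name;
@end

@implementation SketchAminoAcid

- (id)initWithSymbol:(unichar)aSymbol
{
    // The shared setup happens once, in the superclass.
    self = [super initWithSymbol:aSymbol];
    if (self != nil) {
        // Assumption: a real class would read this from its plist.
        NSDictionary *names = [NSDictionary dictionaryWithObjectsAndKeys:
            @"Alanine", @"A", @"Glycine", @"G", nil];
        name = [[names objectForKey:symbolString] copy];  // nil if unknown
    }
    return self;
}

- (void)dealloc
{
    [name release];
    [super dealloc];
}

- (NSString *)name { return name; }

@end
```

A caller that only needs the one-letter code can treat the result as the superclass type and never ask whether it holds an amino acid or a base.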
- Koen. From kvddrift at earthlink.net Fri Aug 20 19:08:03 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 20 Aug 2004 19:08:03 -0400 Subject: [Biococoa-dev] 2 BCSequenceDNA questions Message-ID: John, Why are you using a separate theSequence in this class? Just use the 'sequence' from BCSequence, and you can remove - (BCSequenceDNA *) init { self = [super init]; if ( self != nil ) theSequence = [[NSArray array] retain]; return self; } and dealloc as well. Also any reason for commenting out almost everything? - Koen. (I just updated some stuff in BCSymbol and BCAminoAcid, but my local copy was not updated to the latest commits from John about an hour earlier. CVS gave all kinds of errors, until I figured out what went wrong. Not sure how we can prevent this, though) From jtimmer at bellatlantic.net Fri Aug 20 19:46:25 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 20 Aug 2004 19:46:25 -0400 Subject: [Biococoa-dev] 2 BCSequenceDNA questions In-Reply-To: Message-ID: I've actually deleted them and committed the changes, because I noticed that, too. You may want to check the status before doing more work. The point for those files was just to put my ideas in place so you could see which ones overlapped with yours for the protein sequence and move them to the base class. I didn't implement most things, so I commented out the code just to make sure it didn't mess with the compiling. One problem I'm foreseeing, though: When we init a sequence with a string, how do we tell it which objects to use, nucleotides or amino acids? Not sure about how to handle this gracefully... John > Why are you using a separate theSequence in this class? Just use the > 'sequence' from BCSequence, and you can remove > > - (BCSequenceDNA *) init { > self = [super init]; > if ( self != nil ) > theSequence = [[NSArray array] retain]; > return self; > } > > > and dealloc as well. > > Also any reason for commenting out almost everything? > > > - Koen. 
> > (I just updated some stuff in BCSymbol and BCAminoAcid, but my local > copy was not updated to the latest commits from John about an hour > earlier. CVS gave all kinds of errors, until I figured out what went > wrong. Not sure how we can prevent this, though) _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Fri Aug 20 19:59:59 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 20 Aug 2004 19:59:59 -0400 Subject: [Biococoa-dev] BCSymbol Nomenclature In-Reply-To: <9B32B5A0-F2F3-11D8-BB8E-003065A5FDCC@earthlink.net> Message-ID: >> Koen's dictionary is symbol based, and I could switch to that easily, >> and he >> could add a chemical name to his. I could change my >> "initWithDictionary" to >> an "initWithName" and look things up in the dictionary as he does. > > I suggest to rename initWithName to initWithSymbol. But we can have > both an initWithDictionary, and initwithSymbol class. Put the same > stuff in BCSymbol, and then the subclass can call [super > initWithSymbol], etc and do any additional specialized stuff. > > John, in what case would you call initWithDictionary? If I create a > sequence from a string or other BCSequence I have to iterate through > each symbol and then call initwithSymbol passing that symbol, so I am > not sure in what situation one needs to call initWithDictionary. I've updated my .plist file to be symbol based. I'll update the intialization method to be "initWithSymbol" tonight. It looks like it's all a matter of where you put the abstraction - either looking up the dictionary when you initialize the base, or looking up the dictionary and using it to initialize the base. When you create a sequence, you'd probably just go through a character at a time and use the "+ (id) baseForSymbol: (unichar)symbol" method, which in turn will get you the appropriate singleton reference, initializing it if necessary. You should never have to initialize a base manually. 
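For illustration, a "+ (id) baseForSymbol:" lookup of the kind described above might be sketched as follows. This is an assumption about how such a method could work, not the code in CVS: SketchBase is a hypothetical class, the cache is a plain dictionary keyed by the one-letter string, and nothing here is thread-safe.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for BCNucleotideDNA.
@interface SketchBase : NSObject
{
    unichar symbol;
}
+ (id)baseForSymbol:(unichar)aSymbol;
- (unichar)symbol;
@end

@implementation SketchBase

// Not declared in the interface: callers are expected to use +baseForSymbol:.
- (id)initWithSymbol:(unichar)aSymbol
{
    self = [super init];
    if (self != nil) {
        symbol = aSymbol;
    }
    return self;
}

+ (id)baseForSymbol:(unichar)aSymbol
{
    // One shared cache for the life of the program; deliberately never released.
    static NSMutableDictionary *cache = nil;
    NSString   *key;
    SketchBase *base;

    if (cache == nil) {
        cache = [[NSMutableDictionary alloc] init];
    }
    key  = [NSString stringWithCharacters:&aSymbol length:1];
    base = [cache objectForKey:key];
    if (base == nil) {
        // First request for this symbol: create it and keep it in the cache.
        base = [[SketchBase alloc] initWithSymbol:aSymbol];
        [cache setObject:base forKey:key];   // the dictionary retains it
        [base release];
    }
    return base;
}

- (unichar)symbol { return symbol; }

@end
```

Because every request for the same letter returns the same object, a sequence of a million bases holds a million pointers but only a handful of base instances.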
Feel free to delete anything from BCSequenceDNA that's held in the superclass, as I wrote in my last message.

Koen, I'm going to go home and eat dinner now, so I won't get in your way for a little while. I'll work exclusively on BCNucleotideDNA and its .plist, so I should stay out of your way (I hope).

Cheers,

John

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Fri Aug 20 20:10:55 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 20 Aug 2004 20:10:55 -0400
Subject: [Biococoa-dev] unichar question
Message-ID: <91E7FB2A-F306-11D8-BB8E-003065A5FDCC@earthlink.net>

Hi,

Does anyone know why this gives a warning and how to fix it:

    symbolString = [[NSString alloc] initWithCharacters: [self symbol] length: 1];

    passing arg 1 of `initWithCharacters:length:' makes pointer from integer without a cast

thanks,

- Koen.

From jtimmer at bellatlantic.net Fri Aug 20 21:17:57 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 20 Aug 2004 21:17:57 -0400
Subject: [Biococoa-dev] unichar question
In-Reply-To: <91E7FB2A-F306-11D8-BB8E-003065A5FDCC@earthlink.net>
Message-ID:

> Hi,
>
> Anyone knows why this gives a warning and how to fix it:
>
> symbolString = [[NSString alloc] initWithCharacters: [self symbol]
> length: 1];
>
> passing arg 1 of `initWithCharacters:length:' makes pointer from
> integer without a cast
>

A unichar is basically an integer, so passing one directly doesn't match the method's signature. initWithCharacters:length: expects a pointer to the characters, so you have to use the & operator to pass the variable's address instead.
Try the following:

- (id)initWithSymbol:(unichar)aSymbol
{
    if ( (self = [super init]) ) {
        symbol = aSymbol;
        // pass the address of the instance variable, not the unichar itself
        symbolString = [[NSString alloc] initWithCharacters: &symbol length: 1];
    }
    return self;
}

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Fri Aug 20 21:34:25 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 20 Aug 2004 21:34:25 -0400
Subject: [Biococoa-dev] nice article
Message-ID: <3C967E87-F312-11D8-BB8E-003065A5FDCC@earthlink.net>

See http://www.macdevcenter.com/pub/a/mac/2004/08/20/bioinformatics.html?page=3.

Is this the future of BioCocoa? ;-)

- koen.

From kvddrift at earthlink.net Fri Aug 20 22:13:36 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 20 Aug 2004 22:13:36 -0400
Subject: [Biococoa-dev] unichar question
In-Reply-To:
References:
Message-ID:

On Aug 20, 2004, at 9:17 PM, John Timmer wrote:

> Unichars are basically integers, so passing [self symbol] directly
> doesn't match the method's signature. The flipside is that
> initWithCharacters:length: expects a pointer to the characters, so you
> have to put the character in a variable and use the & operator to pass
> its address.
>

Yep, that worked. Thanks!

Another question (sorry to keep bugging you). You are using the following code to access a variable:

- (NSString *) name{
    return [[name copy] autorelease];
}

Why not just use:

- (NSString *) name{
    return name;
}

Is there a particular reason why you use this format?

Oh, and what's the difference between symbolString and savableRepresentation?

thanks,

- Koen.

From jtimmer at bellatlantic.net Fri Aug 20 22:45:56 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 20 Aug 2004 22:45:56 -0400
Subject: [Biococoa-dev] unichar question
In-Reply-To:
Message-ID:

> You are using the following code to access a variable:
>
> - (NSString *) name{
> return [[name copy] autorelease];
> }
>
>
> Why not just use:
>
> - (NSString *) name{
> return name;
> }
>
> Is there a particular reason why you use this format?
If you return a pointer to the actual name (as in the 2nd case), it's possible to actually manipulate the contents of memory there in C. By returning a copy, you protect the actual instance variable. > Oh, and what's the difference between symbolString and > savableRepresentation? In the nucleotide cases, none. In the sequence classes, there will be. I just thought it would be convenient to provide a single method signature throughout the project so that people don't have to think about what to call when they need something like that. Cheers, JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sat Aug 21 12:11:45 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 21 Aug 2004 12:11:45 -0400 Subject: [Biococoa-dev] Molecular weights In-Reply-To: Message-ID: I've added molecular weights to all the items in the nucleotide and amino acid dictionary files. When I get a chance, I'll add the variable and write accessor methods to the BCSymbol class. Should make these sorts of calculations very easy in the BCSequence class. Koen, you may want to check the amino acid weight values - they came off the web, and may not reflect what each weighs when incorporated into a protein. Thanks, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Aug 21 12:22:54 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 21 Aug 2004 12:22:54 -0400 Subject: [Biococoa-dev] Molecular weights In-Reply-To: References: Message-ID: <5AB743A0-F38E-11D8-BB8E-003065A5FDCC@earthlink.net> On Aug 21, 2004, at 12:11 PM, John Timmer wrote: > Koen, you may want to check the amino acid weight values - they came > off the > web, and may not reflect what each weighs when incorporated into a > protein. > These are already in the aminoacid plist. (Monoisotopic and Average). 
I suggest we use the same keys for the nucleotides because these are the values used in mass spectra (my background). I will find the correct values for nucleotides and add those to the base plist. Then we can move the code to set/get these values to BCSymbol. - Koen. From kvddrift at earthlink.net Sat Aug 21 15:39:33 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 21 Aug 2004 15:39:33 -0400 Subject: [Biococoa-dev] singletons Message-ID: Hi, I looked around the BioJava code to see how they implement the use of singletons for amino acids. Does anyone know where this is coded, I couldn't find it. thanks, - Koen. From mek at mekentosj.com Sat Aug 21 15:58:36 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 21 Aug 2004 21:58:36 +0200 Subject: [Biococoa-dev] singletons In-Reply-To: References: Message-ID: <7D280F59-F3AC-11D8-9520-000393CFDE0C@mekentosj.com> Koen, If I'm correct, an amino acids is of class AtomicSymbol, a subclass of Symbol defined in org.biojava.bio.symbol. If you have the source besure to checkout AtomicSymbol.java Also, take a look at Geneticcode.java The Aminoacids themselves seem to be defined in AlphabetManager.xml That's what I could find at first glance... Cheers, Alex Op 21-aug-04 om 21:39 heeft Koen van der Drift het volgende geschreven: > Hi, > > I looked around the BioJava code to see how they implement the use of > singletons for amino acids. Does anyone know where this is coded, I > couldn't find it. > > thanks, > > - Koen. 
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************

From kvddrift at earthlink.net Sat Aug 21 16:27:15 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 21 Aug 2004 16:27:15 -0400
Subject: [Biococoa-dev] singletons
In-Reply-To: <7D280F59-F3AC-11D8-9520-000393CFDE0C@mekentosj.com>
References: <7D280F59-F3AC-11D8-9520-000393CFDE0C@mekentosj.com>
Message-ID: <7DA2D714-F3B0-11D8-BB8E-003065A5FDCC@earthlink.net>

What I am actually looking for is the equivalent of the code that John put in BCNucleotideDNA where he has a class method for each possible base. Before I start doing that for all 20+ amino acids, I want to be sure that that is the most effective way to do it. Somehow I have the feeling this can be simplified a lot, maybe by maintaining an array of singletons (is that what BioJava calls an alphabet?).
But maybe not, therefore I was looking for the way BioJava does this. My apologies if I was not clear when I asked my question :)

- Koen.

On Aug 21, 2004, at 3:58 PM, Alexander Griekspoor wrote:

> Koen,
>
> If I'm correct, an amino acid is of class AtomicSymbol, a subclass of
> Symbol defined in org.biojava.bio.symbol.
> If you have the source be sure to check out AtomicSymbol.java
> Also, take a look at Geneticcode.java
> The Aminoacids themselves seem to be defined in AlphabetManager.xml
> That's what I could find at first glance...
> Cheers,
> Alex
>
>
> On 21 Aug 2004, at 21:39, Koen van der Drift wrote:
>
>> Hi,
>>
>> I looked around the BioJava code to see how they implement the use of
>> singletons for amino acids. Does anyone know where this is coded, I
>> couldn't find it.
>>
>> thanks,
>>
>> - Koen.

From jtimmer at bellatlantic.net Sat Aug 21 22:10:18 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sat, 21 Aug 2004 22:10:18 -0400
Subject: [Biococoa-dev] Molecular weights
In-Reply-To: <5AB743A0-F38E-11D8-BB8E-003065A5FDCC@earthlink.net>
Message-ID:

>
> These are already in the aminoacid plist. (Monoisotopic and Average). I
> suggest we use the same keys for the nucleotides because these are the
> values used in mass spectra (my background). I will find the correct
> values for nucleotides and add those to the base plist. Then we can
> move the code to set/get these values to BCSymbol.
>

Ooops, sorry about that. I had no idea what those terms meant - my background as a developmental biologist showing. I'll go through and delete everything tomorrow.

Sorry - John

_______________________________________________
This mind intentionally left blank

From mek at mekentosj.com Sun Aug 22 04:39:40 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 22 Aug 2004 10:39:40 +0200
Subject: Fwd: [Biococoa-dev] singletons
Message-ID:

I got an error from the mailserver at biococoa so I'll try to send it again:

Got it Koen.
If I look at it (again not understanding it completely as I haven't had the time to really dive in biojava yet), it seems much of that code is indeed in the (singleton object) AlphabetManager (AlphabetManager.java). Quoted from tutorial: The set of Symbol objects which may be found in a particular type of sequence data are defined in an Alphabet. It it always possible to define custom Symbols and Alphabets, but BioJava supplies a set of predefined alphabets for representing biological molecules. These are accessible through a central registry called the AlphabetManager, and through convenience methods. FiniteAlphabet dna = DNATools.getDNA(); Iterator dnaSymbols = dna.iterator(); while (dnaSymbols.hasNext()) { Symbol s = (Symbol) dnaSymbols.next(); System.out.println(s.getName()); } Quoted from source: /** * Utility methods for working with Alphabets. Also acts as a registry for * well-known alphabets. * *

* The alphabet interfaces themselves don't give you a lot of help in actually * getting an alphabet instance. This is where the AlphabetManager comes in * handy. It helps out in serialization, generating derived alphabets and * building CrossProductAlphabet instances. It also contains limited support for * parsing complex alphabet names back into the alphabets. *

* * @author Matthew Pocock * @author Thomas Down */ It seems to get the details from the xml file AlphabetManager.xml It also has the methods to create the symbols like this one: /** *

* Generate a new AtomicSymbol instance with a token, name and Annotation. *

* *

* Use this method if you wish to create an AtomicSymbol instance. Initially it * will not be a member of any alphabet. *

* * @param token the Char token returned by getToken() (ignpred as of BioJava 1.2) * @param name the String returned by getName() * @param annotatin the Annotation returned by getAnnotation() * @return a new AtomicSymbol instance * @deprecated Use the two-arg version of this method instead. */ static public AtomicSymbol createSymbol( char token, String name, Annotation annotation ) { AtomicSymbol as = new FundamentalAtomicSymbol(name, annotation); return as; } It also seems to contain the code to convert items in the xml file to symbols, though my java isn't that good here. Anyway, I already mentioned before that I very much like the idea of an intermediate Alphabet layer also in BioCocoa. In that, symbols make up alphabets, this way you for instance "solve" the problem John is having that he has to instantiate all nucleotides at once by creating the alphabets when needed (and thus automatically fill it with all singletons that belong in there). This might also answer the species specific protein problem, although I have to admit that I don;t know exactly yet. Alphabets can be used for both proteins and dna/rna as its simply a bag of symbols. Thus also solving the problem that we need acgtn for dna and acgun for RNA (uracil is still missing now), we could have a DNAAlphabet and RNAAlphabet. A lot of questions and answers can be found in the cookbook on the biojava website by the way. Quote from cookbook: In BioJava Alphabets are collections of Symbols. Common biological alphabets (DNA, RNA, protein etc) are registered with the BioJava AlphabetManager at startup and can be accessed by name. The DNA, RNA and protein alphabets can also be accessed using convenient static methods from DNATools, RNATools and ProteinTools respectively. 
Both of these approaches are shown in the example below

import org.biojava.bio.symbol.*;
import java.util.*;
import org.biojava.bio.seq.*;

public class AlphabetExample {
  public static void main(String[] args) {
    Alphabet dna, rna, prot;

    //get the DNA alphabet by name
    dna = AlphabetManager.alphabetForName("DNA");

    //get the RNA alphabet by name
    rna = AlphabetManager.alphabetForName("RNA");

    //get the Protein alphabet by name
    prot = AlphabetManager.alphabetForName("PROTEIN");

    //get the protein alphabet that includes the * termination Symbol
    prot = AlphabetManager.alphabetForName("PROTEIN-TERM");

    //get those same Alphabets from the Tools classes
    dna = DNATools.getDNA();
    rna = RNATools.getRNA();
    prot = ProteinTools.getAlphabet();

    //or the one with the * symbol
    prot = ProteinTools.getTAlphabet();
  }
}

Well, perhaps you'll get more (and better) ideas when checking some of the BioJava code. Tell us what you think of how they solved the problem. Again, their tutorial and cookbook also seem to give quite a bit of info. I love the way they handle things like crossAlphabets, where you can for instance get all the symbols from two alphabets, and also the way they create codons (which consist of three symbols, but are themselves again one symbol, so you can create a sequence of codon symbols).

In general I agree that it feels clunky to instantiate so many things hard coded and manually. I would love to see one of us come up with a method to completely instantiate a singleton symbol (or alphabet of symbols) from a plist, much like you would instantiate a dictionary from a plist. The problem I see preventing this is that you have to declare your statics beforehand, but maybe this is completely false. Can anyone come up with the -(id)initAlphabetFromFile: method we're looking for? ;-)

Cheers,
Alex

On 21 Aug 2004, at 21:39, Koen van der Drift wrote:

> Hi,
>
> I looked around the BioJava code to see how they implement the use of
> singletons for amino acids.
> Does anyone know where this is coded, I couldn't find it.
>
> thanks,
>
> - Koen.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai
*********************************************************

From kvddrift at earthlink.net Sun Aug 22 07:18:37 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 22 Aug 2004 07:18:37 -0400
Subject: [Biococoa-dev] singletons
In-Reply-To:
References:
Message-ID: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net>

On Aug 22, 2004, at 4:39 AM, Alexander Griekspoor wrote:

> In general I agree that it feels clunky to instantiate so many things
> hard coded and manually. I would love to see one of us come up with a
> method to completely instantiate a singleton symbol (or alphabet of
> symbols) from a plist, much like you would instantiate a dictionary
> from a plist.
The problem I see preventing this is that you have to > declare your statics beforehand, but maybe this is completely false. > Anyone can come up with the -(id)initAlphabetFromFile: method we're > looking for? ;-) > Alex, Thanks for all the info. I'll dive into the biojava source again :) - Koen. From kvddrift at earthlink.net Sun Aug 22 07:45:27 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 22 Aug 2004 07:45:27 -0400 Subject: [Biococoa-dev] singletons In-Reply-To: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> References: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> Message-ID: > Thanks for all the info. I'll dive into the biojava source again :) > > BTW, it would be nice if we could build the biojava code and step through it. Anyone has been able to build it with Xcode? I have used ant (from fink) but compilation stopped halfway. - Koen. From kvddrift at earthlink.net Sun Aug 22 07:55:43 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 22 Aug 2004 07:55:43 -0400 Subject: [Biococoa-dev] singletons In-Reply-To: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> References: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> Message-ID: <324B2DDD-F432-11D8-84C9-003065A5FDCC@earthlink.net> On Aug 22, 2004, at 7:18 AM, Koen van der Drift wrote: > Thanks for all the info. I'll dive into the biojava source again :) > This looks like a useful site: . Especially this page - Koen. From kvddrift at earthlink.net Sun Aug 22 09:04:20 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 22 Aug 2004 09:04:20 -0400 Subject: [Biococoa-dev] singletons In-Reply-To: References: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> Message-ID: On Aug 22, 2004, at 7:45 AM, Koen van der Drift wrote: > BTW, it would be nice if we could build the biojava code and step > through it. Anyone has been able to build it with Xcode? I have used > ant (from fink) but compilation stopped halfway. > OK, solved it. 
Here's what to do: 1. install ant, I am already using fink, so that was easy for me. 2. get the biojava 1.4pre1 source, and unpack it using tar -zxvf 3. cd into the biojava directory, and type 'ant' 4. open the ant-build directory, and copy biojava.jar and bytecodes.jar into ~/Library/Java/Extensions 5. Open Xcode and create a new Javatool project and add your code That's all folks. Stepping through the code only works for code I type myself, not for the biojava library. - Koen. From mek at mekentosj.com Sun Aug 22 09:59:44 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 22 Aug 2004 15:59:44 +0200 Subject: [Biococoa-dev] singletons In-Reply-To: References: <039835FE-F42D-11D8-84C9-003065A5FDCC@earthlink.net> Message-ID: <85770C44-F443-11D8-B84B-000393CFDE0C@mekentosj.com> I downloaded the source a long time ago, just threw everything in an empty xcode project just to be able to quickly browse the code and search it. Haven't build it / run it though. The link you send of BioJava is the same as the cookbook, indeed very handy.... Curious if you can find some of the tactics they use and how useful you think they could be for us... Cheers, Alex Op 22-aug-04 om 15:04 heeft Koen van der Drift het volgende geschreven: > > On Aug 22, 2004, at 7:45 AM, Koen van der Drift wrote: > >> BTW, it would be nice if we could build the biojava code and step >> through it. Anyone has been able to build it with Xcode? I have used >> ant (from fink) but compilation stopped halfway. >> > > OK, solved it. Here's what to do: > > 1. install ant, I am already using fink, so that was easy for me. > 2. get the biojava 1.4pre1 source, and unpack it using tar -zxvf > 3. cd into the biojava directory, and type 'ant' > 4. open the ant-build directory, and copy biojava.jar and > bytecodes.jar into ~/Library/Java/Extensions > 5. Open Xcode and create a new Javatool project and add your code > > That's all folks. 
> Stepping through the code only works for code I type myself, not for
> the biojava library.
>
> - Koen.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************

From jtimmer at bellatlantic.net Sun Aug 22 10:47:43 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Sun, 22 Aug 2004 10:47:43 -0400
Subject: [Biococoa-dev] singletons
In-Reply-To: <7DA2D714-F3B0-11D8-BB8E-003065A5FDCC@earthlink.net>
Message-ID:

> What I am actually looking for is the equivalent of the code that John
> put in BCNucleotideDNA where he has a class method for each possible
> base. Before I start doing that for all 20+ amino acids, I want to be
> sure that that is the most effective way to do it. Somehow I have the
> feeling this can be simplified a lot, maybe by maintaining an array
> of singletons (is that what BioJava calls an alphabet?). But maybe
> not, therefore I was looking for the way BioJava does this. My
> apologies if I was not clear when I asked my question :)
>

Well, it's very easy to avoid the proliferation of methods.
Simply have a static NSDictionary with all the bases/aa's, and have a single accessor method that pulls the base/aa out of the dictionary based on a name/symbol. What I was trying to avoid by not doing that was the cost in terms of processing time of doing the lookup in the dictionary, which will get substantial once you start iterating over thousands of Symbols. I'd like this to scale well to BAC sized items (800,000 Symbols or so). Proteins don't often go above about 2000 aa's, so this may not be much of a worry for you. One thing that does bug me is the need for an "if" statement in each method - longer term, I was thinking that it might be nice to have a single initialization method required before using Symbols. That way, any delay during initialization could be accompanied by a notification to the user - "Loading Bases and Amino Acids..." John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sun Aug 22 14:01:18 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 22 Aug 2004 14:01:18 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: Message-ID: This is primarily directed to Koen, but the answers will be relevant to everybody, so I'll post it to the list: I'm looking over BCSequence in order to start implementing BCSequenceDNA. There's a couple of places where I'm not entirely sure what your long term intentions are, so I thought I'd clarify things before digging in. I'm not entirely sure about the use of the following variables - SequenceCountedSet range startposition, endposition It looks like you use them to report back information generated by calling other methods, with a process that looks like: Call method, store information in these variables Call other methods to get information Reset information when 1st method is called again Is that correct? 
My big worry about this approach is what might happen with a threaded app, where the first method might be called from a different thread before the information could be retrieved. The "sequenceString" variable appears to be separate from the sequence itself - what is it used for? I know Alex asked, but I can't seem to find the email. If we're going to have a generic "initWithString" method, we're going to have to define some order of preference for what type of sequence to generate. Maybe try DNA, if that doesn't work for all letters, try RNA, if that doesn't work, try protein? If we keep a BCSequenceType variable, then something like: - (void)insertSymbolsFromString:(NSString *)s atIndex:(int)index; Can use that in order to decide what type of sequence to add. For a consistent naming convention: When setting/getting an array of symbols, use SymbolArray in the method name. When setting/getting a sequence object, use Sequence in the method name. It's probably worth creating both types of methods for all the methods like "partialSequence", since we don't know what's going to be most useufl for the users. Think that's it for now - John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sun Aug 22 14:23:05 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 22 Aug 2004 14:23:05 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <4FA5B93E-F468-11D8-84C9-003065A5FDCC@earthlink.net> On Aug 22, 2004, at 2:01 PM, John Timmer wrote: > I'm looking over BCSequence in order to start implementing > BCSequenceDNA. > There's a couple of places where I'm not entirely sure what your long > term > intentions are, so I thought I'd clarify things before digging in. 
> > I'm not entirely sure about the use of the following variables - > SequenceCountedSet > range > startposition, endposition > > It looks like you use them to report back information generated by > calling > other methods, with a process that looks like: > Call method, store information in these variables > Call other methods to get information > Reset information when 1st method is called again > I used them internally in my own app to indicate where a subsequence begins and ends in a complete sequence, eg if the user selects a part in a view. So whenever a sequence is created/edited these variables are set. A view can use it through a controller to eg show these numbers. > Is that correct? My big worry about this approach is what might > happen with > a threaded app, where the first method might be called from a different > thread before the information could be retrieved. But isn't that true for every variable, such as the NSArray of Symbols itself? > > > The "sequenceString" variable appears to be separate from the sequence > itself - what is it used for? I know Alex asked, but I can't seem to > find > the email. > It's a NSSting representation of the array of Symbols, using their 'symbolString' variable. > > If we're going to have a generic "initWithString" method, we're going > to > have to define some order of preference for what type of sequence to > generate. Maybe try DNA, if that doesn't work for all letters, try > RNA, if > that doesn't work, try protein? If we keep a BCSequenceType variable, > then > something like: > - (void)insertSymbolsFromString:(NSString *)s atIndex:(int)index; > Can use that in order to decide what type of sequence to add. Or, have a validation method in each initWithString method that checks if it is really a DNA, protein, etc. 
So the user does something like:

mysequence = [[BCSequenceProtein alloc] initWithString: @"ELVISLIVES"];

It will return nil, or an NSError if it is not a protein, eg when the string @"KOENVANDERDRIFT" is passed.

> For a consistent naming convention:
> When setting/getting an array of symbols, use SymbolArray in the method name.
> When setting/getting a sequence object, use Sequence in the method name.

sounds good!

> It's probably worth creating both types of methods for all the methods like
> "partialSequence", since we don't know what's going to be most useful for
> the users.

I'm not sure what you mean here.

- Koen.

From mek at mekentosj.com Sun Aug 22 16:40:31 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 22 Aug 2004 22:40:31 +0200
Subject: [Biococoa-dev] singletons
In-Reply-To:
References:
Message-ID: <824B6091-F47B-11D8-8CE1-000393CFDE0C@mekentosj.com>

> Well, it's very easy to avoid the proliferation of methods. Simply have a
> static NSDictionary with all the bases/aa's, and have a single accessor
> method that pulls the base/aa out of the dictionary based on a
> name/symbol.
>
> What I was trying to avoid by not doing that was the cost in terms of
> processing time of doing the lookup in the dictionary, which will get
> substantial once you start iterating over thousands of Symbols. I'd like
> this to scale well to BAC sized items (800,000 Symbols or so). Proteins
> don't often go above about 2000 aa's, so this may not be much of a worry
> for you.

I agree, 800,000 symbols might give rise to problems when accessing things using object messaging...

> One thing that does bug me is the need for an "if" statement in each
> method

Well I guess a switch statement is pretty lightweight and not really a problem, even with 800,000 symbols (not tested, no experience disclaimer here).

> - longer term, I was thinking that it might be nice to have a single
> initialization method required before using Symbols.
That way, any > delay > during initialization could be accompanied by a notification to the > user - > "Loading Bases and Amino Acids..." Again, here Alphabets might come in very handy. If you use a DNA sequence for the first time, you init the DNA alphabet containing the DNA symbols all at once for the first time. Although I don't see why initializing these few objects (at most 20 or so for amino acids) would take longer then the blink of an eye. Certainly not long enough for the message to be necessary. What I was thinking about was perhaps to have two methods for instantiating sequences from files/strings. One that does it directly ("while you wait") and one that does it non-blocking and works through delegates notifications ala didCreateSequence if time really gets long for huge sized sequences. Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From mek at mekentosj.com Sun Aug 22 16:59:41 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 22 Aug 2004 22:59:41 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: <4FA5B93E-F468-11D8-84C9-003065A5FDCC@earthlink.net> References: <4FA5B93E-F468-11D8-84C9-003065A5FDCC@earthlink.net> Message-ID: <3003A26E-F47E-11D8-8CE1-000393CFDE0C@mekentosj.com> >> Is that correct? My big worry about this approach is what might >> happen with >> a threaded app, where the first method might be called from a >> different >> thread before the information could be retrieved. 
> > But isn't that true for every variable, such as the NSArray of Symbols itself? [NSLock lock] Ooh, thread safety... Help! Shall we make that a secondary goal? [NSLock unlock] ;-) > It's an NSString representation of the array of Symbols, using their 'symbolString' variable. John, I sent you a copy of the emails where we discussed this, to make sure... >> If we're going to have a generic "initWithString" method, we're going to have to define some order of preference for what type of sequence to generate. Maybe try DNA, if that doesn't work for all letters, try RNA, if that doesn't work, try protein? First RNA (you can check for the presence of uracil), then DNA, else protein, else error. >> If we keep a BCSequenceType variable, then something like: >> - (void)insertSymbolsFromString:(NSString *)s atIndex:(int)index; >> Can use that in order to decide what type of sequence to add. Alphabet? > Or, have a validation method in each initWithString method that checks if it is really a DNA, protein, etc. So the user does something like: > > mysequence = [[BCSequenceProtein alloc] initWithString: @"ELVISLIVES"]; > > It will return nil, or an NSError if it is not a protein, eg when the string @"KOENVANDERDRIFT" is passed. I like both ideas. We could have: - a utility method that is passed a string and checks which type it most likely is - a general init method which uses the above method - specific init methods for each type that include validation of that type - specific init methods that are passed alphabets to use (if we go that way, the previous method could be replaced) - a utility method to validate strings for type and/or alphabets >> For a consistent naming convention: >> When setting/getting an array of symbols, use SymbolArray in the method name. >> When setting/getting a sequence object, use Sequence in the method name. > > sounds good! Copy that!
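The detection order suggested above (RNA first, by checking for uracil, then DNA, then protein, else error) could be sketched roughly as follows in plain C, which also compiles as Objective-C. This is only an illustration under stated assumptions: the names SeqType, all_in_alphabet and guess_sequence_type are hypothetical and not part of BioCocoa, and the alphabets are limited to the unambiguous letters (ACGU, ACGT and the 20 standard amino acids), so ambiguity codes are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; BioCocoa itself may model this differently. */
typedef enum { SEQ_UNKNOWN, SEQ_RNA, SEQ_DNA, SEQ_PROTEIN } SeqType;

/* Returns 1 if every character of s (case-insensitive) occurs in alphabet. */
static int all_in_alphabet(const char *s, const char *alphabet) {
    assert(s != NULL && alphabet != NULL);
    for (; *s != '\0'; ++s) {
        if (strchr(alphabet, toupper((unsigned char)*s)) == NULL) {
            return 0;
        }
    }
    return 1;
}

/* Tries RNA (requires at least one U), then DNA, then protein.
   Alphabets are an assumption: unambiguous symbols only. */
SeqType guess_sequence_type(const char *s) {
    if (s == NULL || *s == '\0') {
        return SEQ_UNKNOWN;
    }
    if (all_in_alphabet(s, "ACGU") &&
        (strchr(s, 'U') != NULL || strchr(s, 'u') != NULL)) {
        return SEQ_RNA;
    }
    if (all_in_alphabet(s, "ACGT")) {
        return SEQ_DNA;
    }
    if (all_in_alphabet(s, "ACDEFGHIKLMNPQRSTVWY")) {
        return SEQ_PROTEIN;
    }
    return SEQ_UNKNOWN;
}
```

With these alphabets "ELVISLIVES" is classified as protein and "KOENVANDERDRIFT" is rejected because of the O, as in the example from the thread. Note that a short peptide made up only of A, C, G and T would be reported as DNA, which is one reason the type-specific initializers remain useful.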
>> It's probably worth creating both types of methods for all the methods like "partialSequence", since we don't know what's going to be most useful for the users. Don't duplicate: make one detailed method which accepts a range (that method should therefore also do range validation, and raise a range exception if the range is invalid), and one convenience method (without the range parameter) that assumes the range to be the complete sequence: - (void)doOperationOnSequence { [self doOperationOnSequenceForRange: /* range of the complete sequence */ ]; } - (void)doOperationOnSequenceForRange:(NSRange)aRange { /* validate aRange, then do the work */ } > I'm not sure what you mean here. But perhaps I misunderstood the remark as well... Alex From kvddrift at earthlink.net Sun Aug 22 22:10:28 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 22 Aug 2004 22:10:28 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <9A8EB7C4-F4A9-11D8-A53B-003065A5FDCC@earthlink.net> On Aug 22, 2004, at 5:07 PM, John Timmer wrote: >> I used them internally in my own app to indicate where a subsequence begins and ends in a complete sequence, eg if the user selects a part in a view. So whenever a sequence is created/edited these variables are set. A view can use it through a controller to eg show these numbers. > Okay, so I can more or less ignore them in the DNA subclass, at least for now, correct?
> Sure, but they work for every sequence, DNA, protein, etc >>> >> It's a NSSting representation of the array of Symbols, using their >> 'symbolString' variable. > > Okay, so I'll just make sure to update it whenever a method changes the > sequence. Isn't that what we tried to avoid by using a string to contain a sequence? I think it's better just to recreate an NSString, when someone asks for it, instead of updating it with every edit. > I was thinking more along the lines of when a user calls: > aSequence = [[BCSequence alloc ] initWithString: @"ELVISLIVES"]; > We should probably return some sort of useful subclass, but the > question is > which one? The generic initWithString can be regarded as 'abstract'. Just use [BCSequenceDNA alloc ] initWithString] or [BCSequenceProtein alloc ] initWithString], etc. We can use a Factory class (BCSequenceFactory) that figures out what kind of sequence we are dealing with, the BCSequence should be unaware of its subclasses. >> I'm not sure what you mean here. >> > > One example - provide both: > - (NSArray *) symbolArrayFromRange: (NSRange) entry; > And > - (BCSequence *)sequenceFromRange: (NSRange) entry; > > Ah ok, sounds good to me. But for the second one you need to create a BCSequence, so you might want to call it using init or copy in its name. - Koen. From mek at mekentosj.com Mon Aug 23 01:45:03 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 23 Aug 2004 07:45:03 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: <9A8EB7C4-F4A9-11D8-A53B-003065A5FDCC@earthlink.net> Message-ID: <9492AF58-F4C7-11D8-8109-000393CFDE0C@mekentosj.com> Just to add a little thing I forgot: Of course we could create a caching variable here, but that would involve 1) a PRIVATE string variable 2) a boolean variable that marks the sequence dirty whenever the sequence is edited. This in case this string is often requested and we don't want to recalculate it every time. 
For now, though, that would be a "performance and optimization" measure; I would not implement it until we are in that phase and have an idea of where the bottlenecks in the framework are. Unless we have to create the "marked dirty" system anyway for another reason (I could imagine ending up with a solid editing workflow for the features and ranges, which could then be used for this as well). But at the moment I would suggest creating a stringRepresentation method that converts the symbol-based array to a string each time it's called. Alex On 23 Aug 2004, at 7:39, Alexander Griekspoor wrote: >>>> It's an NSString representation of the array of Symbols, using their 'symbolString' variable. >>> >>> Okay, so I'll just make sure to update it whenever a method changes the sequence. >> >> Isn't that what we tried to avoid by using a string to contain a sequence? I think it's better just to recreate an NSString, when someone asks for it, instead of updating it with every edit. > > Indeed, this is exactly what we don't want to do!! Therefore, this should NOT be a variable, rather a METHOD.
Please change it to > something like: -(NSString *)stringRepresentation; > > Cheers, > Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant ********************************************************* From jtimmer at bellatlantic.net Mon Aug 23 12:10:45 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 23 Aug 2004 12:10:45 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: <2BB92BA6-F4C8-11D8-8109-000393CFDE0C@mekentosj.com> Message-ID: >>>>> It's a NSSting representation of the array of Symbols, using their >>>>> 'symbolString' variable. >>>> >>>> Okay, so I'll just make sure to update it whenever a method changes the >>>> sequence. >>> >>> Isn't that what we tried to avoid by using a string to contain a sequence? I >>> think it's better just to recreate an NSString, when someone asks for it, >>> instead of updating it with every edit. >> >> Indeed, this is exactly what we don't want to do!! Therefore, this should NOT >> be a variable, rather a METHOD. Please change it to something like: >> -(NSString *)stringRepresentation; Okay, so given all this, how about the following: We delete the "sequenceString" variable from BCSequence We implement " -(NSString *)stringRepresentation" to generate the string on the fly using "symbolString" on each Symbol (already done and waiting on approval to commit). Since the "initWithString" is not meant to be used in the base class anyway, and there's no variable to stick a string into anymore, we can have it return nil. Subclasses should override it, as they should have. As far as thread safety, I agree it is probably too early to start locking various methods down, but I think we should try as best we can to design our classes so that if/when it's time to do so, things are as simple as adding a few locks in critical places, rather than discovering that we need to redesign the class then. Thanks to Google, I now know the difference between these float monoisotopicMass; float averageMass; And I'll see if I can't look up some values for the nucleotides.
Koen, since this is your field: for the average mass of the ambiguous nucleotides, I was just averaging all possible nucleotides (ie, for Y, I took the average of the values for C and T). Since monoisotopic mass is not supposed to be an average, should I return 0 for those cases? Cheers, John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Mon Aug 23 14:49:55 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 23 Aug 2004 20:49:55 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <39C925EE-F535-11D8-A8BD-000393CFDE0C@mekentosj.com> On 23 Aug 2004, at 18:10, John Timmer wrote: > >>>>>> It's an NSString representation of the array of Symbols, using their 'symbolString' variable. >>>>> >>>>> Okay, so I'll just make sure to update it whenever a method changes the sequence. >>>> >>>> Isn't that what we tried to avoid by using a string to contain a sequence? I think it's better just to recreate an NSString, when someone asks for it, instead of updating it with every edit. >>> >>> Indeed, this is exactly what we don't want to do!! Therefore, this should NOT be a variable, rather a METHOD. Please change it to something like: >>> -(NSString *)stringRepresentation; > > Okay, so given all this, how about the following: > We delete the "sequenceString" variable from BCSequence Yes please > We implement " -(NSString *)stringRepresentation" to generate the string on the fly using "symbolString" on each Symbol (already done and waiting on approval to commit). Yep, I agree > Since the "initWithString" is not meant to be used in the base class anyway, and there's no variable to stick a string into anymore, we can have it return nil. Subclasses should override it, as they should have. I think that's indeed a good plan.
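The on-the-fly stringRepresentation agreed on above could look roughly like this, again sketched in plain C rather than as the actual BioCocoa code. The Symbol and Sequence structs and the function name are hypothetical stand-ins for BCSymbol and BCSequence; the only thing taken from the thread is that each symbol carries a 'symbolString' and that the full string is rebuilt on every call instead of being stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BCSymbol: only its string form matters here. */
typedef struct {
    const char *symbolString;
} Symbol;

/* Hypothetical stand-in for BCSequence: an array of pointers to symbols. */
typedef struct {
    const Symbol *const *symbols;
    size_t count;
} Sequence;

/* Builds a fresh string by concatenating each symbol's symbolString.
   Nothing is cached: every call walks the symbol array again.
   The caller owns the result and must free() it; returns NULL if
   the allocation fails. */
char *sequence_string_representation(const Sequence *seq) {
    assert(seq != NULL);

    /* First pass: total length of all symbol strings. */
    size_t total = 0;
    for (size_t i = 0; i < seq->count; ++i) {
        total += strlen(seq->symbols[i]->symbolString);
    }

    char *out = malloc(total + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: copy each symbol string into place. */
    char *p = out;
    for (size_t i = 0; i < seq->count; ++i) {
        size_t len = strlen(seq->symbols[i]->symbolString);
        memcpy(p, seq->symbols[i]->symbolString, len);
        p += len;
    }
    *p = '\0';
    return out;
}
```

If profiling later shows this to be a bottleneck, the cached variant mentioned earlier in the thread (a private string plus a dirty flag set on every edit) could be layered on top without changing the method's signature.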
> As far as thread safety, I agree it is probably too early to start > locking > various methods down, but I think we should try as best we can to > design our > classes so that if/when it's time to do so, things are as simple as > adding a > few locks in critical places, rather than discovering that we need to > redesign the class then. True, although I do not have that much experience to keep that in mind ;-) Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From a.griekspoor at nki.nl Mon Aug 23 17:18:24 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 23 Aug 2004 23:18:24 +0200 Subject: [Biococoa-dev] Aminoacid plist Message-ID: Hi guys, Stupid question perhaps, but do I notice correctly that the aminoacid plist entries do not have the full name of the amino acid? Where do you get that from then? 
A: threeLetterCode = Ala, Monoisotopic = 71.03711, Average = 71.08, pKa = 0.0, KyteDoolittle = 1.8, HoppWoods = -0.5. It would be nice to have Name = Alanine there as well, or not? For the rest, the classes are really starting to look nice, guys, including the headerdoc entries, well done! Alex From kvddrift at earthlink.net Mon Aug 23 17:42:28 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 23 Aug 2004 17:42:28 -0400 Subject: [Biococoa-dev] Aminoacid plist In-Reply-To: References: Message-ID: <547BDCB2-F54D-11D8-A53B-003065A5FDCC@earthlink.net> On Aug 23, 2004, at 5:18 PM, Alexander Griekspoor wrote: > Would be nice to have Name = Alanine there as well, or not? > > Yep, already saw that as well. I will fix this tonight. - Koen. From kvddrift at earthlink.net Mon Aug 23 17:46:29 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 23 Aug 2004 17:46:29 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: On Aug 23, 2004, at 12:10 PM, John Timmer wrote: > And I'll see if I can't look up some values for the nucleotides. > Koen, since this is your field: for the average mass of the ambiguous nucleotides, I was just averaging all possible nucleotides (ie, for Y, I took the average of the values for C and T). Since monoisotopic mass is supposed to not be an average, should I return 0 for those cases? > I have already found the values, and will edit the plist tonight.
You cannot average mass values (at least not for mass spec data), I suggest we'll leave them empty for the time being, and think of a solution. One question, the DNA-nucleotides are actually deoxy-nucleotides (I hope I didn't say anything stupid here :), so they are an oxygen short. But there are no deoxy's listed in the plist, so when you use a adenine, which one are you referring to? This is important to input the correct massvalues. thanks, - Koen. From jtimmer at bellatlantic.net Mon Aug 23 17:51:46 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 23 Aug 2004 17:51:46 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: Message-ID: Okay, I'm about to commit a major update to BCSequence that eliminates the sequence string and reworks some of the related methods and initializers. I also took the time to reorganize the methods a bit to make navigating the file a bit easier. There is a very real chance I've done something stupid accidentally. Koen, you may want to make a copy of your version before picking the new one up out of CVS just in case you need to fix something I've done. John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Mon Aug 23 18:01:57 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 23 Aug 2004 18:01:57 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: Message-ID: >> And I'll see if I can't look up some values for the nucleotides. >> Koen, since this is your field: for the average mass of the ambiguous >> nucleotides, I was just averaging all possible nucleotides (ie, for Y, >> I >> took the average of the values for C and T). Since monoisotopic mass >> is >> supposed to not be an average, should I return 0 for those cases? >> > > I have already found the values, and will edit the plist tonight. You > cannot average mass values (at least not for mass spec data), I suggest > we'll leave them empty for the time being, and think of a solution. 
One > question, the DNA-nucleotides are actually deoxy-nucleotides (I hope I > didn't say anything stupid here :), so they are an oxygen short. But > there are no deoxy's listed in the plist, so when you use a adenine, > which one are you referring to? This is important to input the correct > massvalues. No, you have that exactly right - this is all DNA, so assume deoxyribose. We can deal with RNA some other time. Now, my chance to say something potentially stupid - So both values you are using are specific for Mass Spec? If so, I'd guess we want a third, which is just the basic molecular weight, suitable for calculating kD or size on a protein gel. That's what I already have in place for the bases, so I guess you can leave that there and add your keys under the names you've been using. I'll dig up the average molecular weights for amino acids and add them sometime after you commit your version with the names. Cheeers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Mon Aug 23 21:52:54 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 23 Aug 2004 21:52:54 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <5058D692-F570-11D8-A53B-003065A5FDCC@earthlink.net> On Aug 23, 2004, at 6:01 PM, John Timmer wrote: > > No, you have that exactly right - this is all DNA, so assume > deoxyribose. > We can deal with RNA some other time. Thanks, I will add the values. > > Now, my chance to say something potentially stupid - > So both values you are using are specific for Mass Spec? If so, I'd > guess > we want a third, which is just the basic molecular weight, suitable for > calculating kD or size on a protein gel. > That's what I already have in > place for the bases, so I guess you can leave that there and add your > keys > under the names you've been using. 
I'll dig up the average molecular > weights for amino acids and add them sometime after you commit your > version > with the names. There is no need for a third field, for calculating the kD of large biopolymers use the averageMass value. The Molecular Weight value has no additional information. For calculating peptide masses (eg after a protein digest) use the monoisotopic value. The same for oligonucleotides. - Koen. From kvddrift at earthlink.net Mon Aug 23 21:55:42 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 23 Aug 2004 21:55:42 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: On Aug 23, 2004, at 5:51 PM, John Timmer wrote: > There is a very real chance I've done something stupid accidentally. > Koen, > you may want to make a copy of your version before picking the new one > up > out of CVS just in case you need to fix something I've done. > Too late :) But if there is an error you can always revert it. That's the nice thing of CVS. I haven't had time to look at it, though. In the next few days I'll start working on a BCSequenceProtein class. - Koen. From kvddrift at earthlink.net Tue Aug 24 20:05:34 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 24 Aug 2004 20:05:34 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <7CC0D7C8-F62A-11D8-A53B-003065A5FDCC@earthlink.net> On Aug 23, 2004, at 5:51 PM, John Timmer wrote: > Okay, I'm about to commit a major update to BCSequence that eliminates > the > sequence string and reworks some of the related methods and > initializers. I > also took the time to reorganize the methods a bit to make navigating > the > file a bit easier. > John, A few comments. You put this in the header for the molecularWeight method: /*! @method - (float) molecularWeight @abstract returns the predicted molecular weight of the sequence @discussion calculates the predicted molecular mass of the sequence, based on * average isotope use. 
Subclasses calculate based on the loss of atoms (ie - * H20 in peptide bond formation) and use averages for symbols that represent more * than 1 individual symbol. */ It's better if we make the method as follows: - (float) molecularWeight (int) mode where in mode we pass a enumerated constant monoisotopic or average. You cannot predict in advance if a client needs the monoisotopic or average mass. Also we should always add water for any sequence (only one watermolecule is necessary for a whole sequence), so we might as well put it in the same method instead of delegating it to the subclasses. The masses in the plist are actually residue masses, so without H2O. I will make these changes tonight. You added the method length, but there is alreasy a method numberOfSymbols. However, because the naming of the first one is more logical, I will remove the numberOfSymbols method as well. I'll wait for some input before I commit my changes. Otherwise, a great improvement! I started with a BCSequenceProtein class, but see that most of the sequence manipulation is already in BCSequence. Not sure right now what else should go in that class. As Alex already suggested, additional functionality (pI calculations, digests, etc) should be in helper classes, such as BCProteinTools. - Koen. From jtimmer at bellatlantic.net Tue Aug 24 21:47:51 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 24 Aug 2004 21:47:51 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: <7CC0D7C8-F62A-11D8-A53B-003065A5FDCC@earthlink.net> Message-ID: > It's better if we make the method as follows: > > - (float) molecularWeight (int) mode > That's a great idea. The main reason I was putting in a new method was that I added a separate weight for the ambiguous bases that's the average of every base they represent, since that would be more accurate than returning 0 for estimating the weight. 
I also wanted to avoid stomping on the figures you put in, since I don't know enough about how they're typically used. I just wanted to use something different that didn't interfere with what you were likely to do. > > Also we should always add water for any sequence (only one > watermolecule is necessary for a whole sequence), so we might as well > put it in the same method instead of delegating it to the subclasses. > The masses in the plist are actually residue masses, so without H2O. I > will make these changes tonight. Yeah, I might want to handle providing options for whether there's a 5' phosphate and such. I'll have to think about that. Even so, each of the subclasses will probably need to override this method because of these sorts of issues. > You added the method length, but there is alreasy a method > numberOfSymbols. However, because the naming of the first one is more > logical, I will remove the numberOfSymbols method as well. Just my carelessness there - didn't see yours. > Otherwise, a great improvement! I started with a BCSequenceProtein > class, but see that most of the sequence manipulation is already in > BCSequence. Not sure right now what else should go in that class. As > Alex already suggested, additional functionality (pI calculations, > digests, etc) should be in helper classes, such as BCProteinTools. I'll see if anything occurs to me, but I think we're getting close to moving on to either the wrappers for the sequences that hold features and such or some of the tools for translation and calculations. My ISP and bioinformatics.org are not speaking to each other today, so I'm done committing for today. I'm going to work on a small program that links to the framework and lets you do sequence transformations. To test whether everything is working. 
Cheers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Tue Aug 24 22:38:48 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 24 Aug 2004 22:38:48 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: On Aug 24, 2004, at 9:47 PM, John Timmer wrote: > >> It's better if we make the method as follows: >> >> - (float) molecularWeight (int) mode >> > That's a great idea. The main reason I was putting in a new method > was that > I added a separate weight for the ambiguous bases that's the average of > every base they represent, since that would be more accurate than > returning > 0 for estimating the weight. I think for the time being that will do. But I'll check some of the literature to see how this is solved when people do MS on nucleotide sequences. > Yeah, I might want to handle providing options for whether there's a 5' > phosphate and such. I'll have to think about that. Even so, each of > the > subclasses will probably need to override this method because of these > sorts > of issues. We should think about a class that stores modifications on nucleotides, amino acids, etc. I think we discussed this a few weeks ago (modifications, I mean). This should be handled the same for all BCSymbol classes. A phosphate or methyl group is the same for every sequence. > I'll see if anything occurs to me, but I think we're getting close to > moving > on to either the wrappers for the sequences that hold features and > such or > some of the tools for translation and calculations. I think we should also discuss on how to make the original BioCocoa classes (I/O) compatible with the new classes. Do we maintain backward compatibility, or do we start from scratch? > My ISP and bioinformatics.org are not speaking to each other today, so > I'm > done committing for today. 
I'm going to work on a small program that > links > to the framework and lets you do sequence transformations. To test > whether > everything is working. Sounds good - we can add it to the project in a folder 'demos' - Koen. From mek at mekentosj.com Wed Aug 25 02:00:49 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Wed, 25 Aug 2004 08:00:49 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> Great work guys! >> Yeah, I might want to handle providing options for whether there's a >> 5' >> phosphate and such. I'll have to think about that. Even so, each of >> the >> subclasses will probably need to override this method because of >> these sorts >> of issues. > > We should think about a class that stores modifications on > nucleotides, amino acids, etc. I think we discussed this a few weeks > ago (modifications, I mean). This should be handled the same for all > BCSymbol classes. A phosphate or methyl group is the same for every > sequence. Let's start a little new discussion then ;-) In principle these modifications can be seen as features right? So now we have three names/kinds around in two pairs: - Modifications (example: methylgroup, phosphate etc) - Features (example: alpha-helix, nuclear localization signal etc) Or - Features (example: methylgroup, phosphate etc) - Annotations (example: alpha-helix, nuclear localization signal etc) These are the two options I see (I don't think we need all three around). Whichever we choose (I think I like the first most there's something to say for the other one as well, see below), for now I'll use the first in my email. Modifications and features are very alike, and a modification could be seen as a special feature and thus a subclass (inheriting the add/removing/editing/syncing etc methods, but add weight, pi etc ). Also the question raises wether we should keep them in two arrays (features and modifications) or in one (features). 
If you display all features of a sequence, it perhaps would be nice to see the modifications as well. The reason why a modification should be a separate (sub)class is that it has some special properties we have to account for while working with sequences. Examples: 1 The molecular weight method should take modifications into account. A methylgroup adds weight. Thus modifications should have an addedWeight: mode: method that can accept negative values as well (if the modification removes more weight than it adds). After calculating the weight, all modifications should be enumerated and checked for effect on MW (for this reason it perhaps would be nicer to keep modifications and features in separate arrays). Some thoughts on weight calculation (and other calculations in general): - let's add a "mother" method that also has a range: parameter (to calculate weight etc of subranges) - let's add a boolean accountForModifications: (or something similar) parameter as well The current implementation would become the convenience method, all calling the one with additional parameters range and accountForModifications. 2 Restriction enzymes have to account for the methylation modifications as well as some enzymes don't cut methylated DNA. Summing things up, the modifications kind of act like a (single) special symbol and has effects on calculations, transformations etc. Features do not have an effect on these and can span multiple symbols (therefore could also be called annotations as that is more what they are). In both cases it would be nice to add a (p)list of predefined modifications and features. >> I'll see if anything occurs to me, but I think we're getting close to >> moving >> on to either the wrappers for the sequences that hold features and >> such or >> some of the tools for translation and calculations. > > I think we should also discuss on how to make the original BioCocoa > classes (I/O) compatible with the new classes. 
Do we maintain backward > compatibility, or do we start from scratch? I think that's a pretty easy choice, as we don't know of anyone using BioCocoa at the moment, I think we should just start from scratch. Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From jtimmer at bellatlantic.net Wed Aug 25 08:16:16 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 25 Aug 2004 08:16:16 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> Message-ID: >> >> I think we should also discuss on how to make the original BioCocoa >> classes (I/O) compatible with the new classes. Do we maintain backward >> compatibility, or do we start from scratch? > > I think that's a pretty easy choice, as we don't know of anyone using > BioCocoa at the moment, I think we should just start from scratch.
> Cheers, > Alex Since the I/O classes also deal with metadata about the sequence (especially NCBI), I'd always viewed them as being one object layer up from the sequence containers that we've been working on. And I haven't had enough coffee yet this morning to consider methylation.... Cheers, John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Wed Aug 25 08:39:24 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Wed, 25 Aug 2004 14:39:24 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: >>> I think we should also discuss on how to make the original BioCocoa >>> classes (I/O) compatible with the new classes. Do we maintain >>> backward >>> compatibility, or do we start from scratch? >> >> I think that's a pretty easy choice, as we don't know of anyone using >> BioCocoa at the moment, I think we should just start from scratch. >> Cheers, >> Alex > > Since the I/O classes also deal with metadata about the sequence > (especially > NCBI), I'd always viewed them as being one object layer up from the > sequence > containers that we've been working on. They are one layer up, but they return strings at the moment. I guess that's very nice to keep around, but in addition I would like to see direct BCSequence class output in addition to prevent double parsing of the files. (first to create a string and a second time to convert the string to a BCSequence.). For that we need a sharedSeqIO controller or if we want to keep more or less the same setup as it is now, a BCReader and BCWriter controller/class. > And I haven't had enough coffee yet this morning to consider > methylation.... Good morning! 
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From jtimmer at bellatlantic.net Wed Aug 25 13:24:34 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 25 Aug 2004 13:24:34 -0400 Subject: [Biococoa-dev] Demo project In-Reply-To: Message-ID: Ah, bioinformatics.org has returned, which is a good thing, as I'm staying at home following minor foot surgery yesterday. I'm creating the demo project for testing all our sequence stuff. Because it seems to be a good idea, I've tried to create a new build phase with a shell script that executes:
  cd .. ; xcodebuild -target BioCocoa
in order to make sure BioCocoa is built. Unfortunately, it seems that no matter where I put this phase, it errors out. Has anyone done this successfully? Thanks, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Wed Aug 25 13:54:21 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 25 Aug 2004 13:54:21 -0400 Subject: [Biococoa-dev] Demo project In-Reply-To: Message-ID: Never mind, I figured it out. Turns out you have to add the BioCocoa project file to the other project, then set it as a dependency for the demo app in its "get info" panel. Not entirely intuitive.
JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Wed Aug 25 17:25:43 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 25 Aug 2004 17:25:43 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: Message-ID: <5239586A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 24, 2004, at 9:47 PM, John Timmer wrote: > That's a great idea. The main reason I was putting in a new method > was that > I added a separate weight for the ambiguous bases that's the average of > every base they represent, since that would be more accurate than > returning > 0 for estimating the weight. I also wanted to avoid stomping on the > figures > you put in, since I don't know enough about how they're typically > used. I > just wanted to use something different that didn't interfere with what > you > were likely to do. > There is an alternative solution, which is what BioPerl does:
  # Obtain the molecular weight of a sequence. Since the sequence may contain
  # ambiguous monomers, the molecular weight is returned as a (reference to) a
  # two element array containing greatest lower bound (GLB) and least upper bound
  # (LUB) of the molecular weight
  $weight = $seq_stats->get_mol_wt();
  print "\nMolecular weight (using statistics object) of sequence ", $seqobj->id(),
        " is between ", $$weight[0], " and " , $$weight[1], "\n";
  # or
  $weight = Bio::Tools::SeqStats->get_mol_wt($seqobj);
  print "\nMolecular weight (without statistics object) of sequence ", $seqobj->id(),
        " is between ", $$weight[0], " and " , $$weight[1], "\n";
I have looked at their code, and it doesn't seem to be that difficult to do. What I propose is the following. The bases have 4 (extra) variables:
  lowMonoisotopicMass
  highMonoisotopicMass
  lowAverageMass
  highAverageMass
which are calculated during the initialization of the singletons. Then when we calculate the mass of a sequence, we actually calculate two masses, one high and one low.
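The low/high bookkeeping proposed here can be sketched in plain C, which also compiles unchanged inside an Objective-C file. This is only an illustration under stated assumptions: the names `MassRange`, `bounds_for` and `sequence_mass_range` are hypothetical and not part of BioCocoa, the masses are placeholders rather than real values, end groups are ignored, and a per-character count table stands in for whatever counting structure the framework ends up using.

```c
#include <stddef.h>

/* Hypothetical pair of bounds: lowest and highest possible mass. */
typedef struct {
    double low;
    double high;
} MassRange;

/* Placeholder masses, NOT real values. Unambiguous bases have low == high;
 * an ambiguous base spans the lightest and heaviest base it can stand for. */
static MassRange bounds_for(char symbol) {
    switch (symbol) {
        case 'A': return (MassRange){10.0, 10.0};
        case 'C': return (MassRange){20.0, 20.0};
        case 'G': return (MassRange){30.0, 30.0};
        case 'T': return (MassRange){40.0, 40.0};
        case 'R': return (MassRange){10.0, 30.0}; /* A or G */
        case 'N': return (MassRange){10.0, 40.0}; /* any base */
        default:  return (MassRange){0.0, 0.0};   /* unknown symbols add nothing */
    }
}

/* Count each distinct symbol once, then multiply its bounds by its count,
 * so the summation loop runs over distinct symbols, not over every position. */
MassRange sequence_mass_range(const char *sequence) {
    size_t counts[256] = {0};
    for (const unsigned char *p = (const unsigned char *)sequence; *p != '\0'; ++p) {
        counts[*p]++;
    }

    MassRange total = {0.0, 0.0};
    for (int s = 0; s < 256; ++s) {
        if (counts[s] == 0) {
            continue;
        }
        MassRange b = bounds_for((char)s);
        total.low += b.low * (double)counts[s];
        total.high += b.high * (double)counts[s];
    }
    return total;
}
```

With these placeholder numbers, a sequence made only of A, C, G and T yields identical low and high totals, while any ambiguous symbol widens the interval.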
For ACGT these values are of course the same. Because we use the NSCountedSet we don't have to iterate through all symbols, just through the set and multiply by their occurrence count. Although it will take more time to calculate, especially for DNA segments with a large number of nucleotides, I don't think this is critical, because we use the counted set. I will write this code and add it all in a new class MassCalculator which will end up in BCUtils. I will also work on an IsoelectricPointCalculator and ProteinDigest class. - Koen. From kvddrift at earthlink.net Wed Aug 25 17:25:52 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 25 Aug 2004 17:25:52 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> References: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> Message-ID: <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 25, 2004, at 2:00 AM, Alexander Griekspoor wrote: > Let's start a little new discussion then ;-) > In principle these modifications can be seen as features right? So now > we have three names/kinds around in two pairs: > - Modifications (example: methylgroup, phosphate etc) > - Features (example: alpha-helix, nuclear localization signal etc) > Or > - Features (example: methylgroup, phosphate etc) > - Annotations (example: alpha-helix, nuclear localization signal etc) I think we should treat the modifications as an array of BCSymbols. We could even make a BCModificationsArray if we add an intermediate class called BCSymbolArray as follows:
  BCSymbolArray
     |
     |
     ------BCSequence
     |
     |
     ------BCModificationsArray
The BCSymbolArray can actually take care of some of the code that is currently in BCSequence. If we calculate the mass of a molecule, we can just iterate over the BCModificationsArray to add the masses of the modifications. For me features and annotations relate more to secondary structures and author's comments (as found in a swissprot or ncbi file).
But that's just a name game. > Modifications and features are very alike, and a modification could be > seen as a special feature and thus a subclass (inheriting the > add/removing/editing/syncing etc methods, but add weight, pi etc ). > Also the question raises wether we should keep them in two arrays > (features and modifications) or in one (features). If you display all > features of a sequence, it perhaps would be nice to see the > modifications as well. I think we should keep them separate. Modifications are per BCSymbol, features can span a whole range of BCSymbols. Also maybe we should move the mass calculations into a separate class that accepts a sequence to calculate the mass? The same for features, pI, etc. For example we have the class MassCalculator with the following methods:
  -(id) MassCalculator initWithSequence:(BCSequence *)seq
  -(id) MassCalculator initWithSubSequence:(BCSequence *)seq inRange:(NSRange)aRange
  -(id) MassCalculator initWithString:(NSString *)seq
  -(id) MassCalculator initWithSubString:(NSString *)seq inRange:(NSRange)aRange
  -(float)getMass useMassType:(BCMassType)type addModifications:(BOOL)mods
The getMass method iterates over all symbols and adds the mass, just as we do now in the molecularWeight method. Then we use it as follows:
  MassCalculator *calculator = [[MassCalculator alloc] initWithSequence:mySequence];
  float totalMass = [calculator getMass useMassType:BCAverage addModifications:YES];
  [calculator release];
I prefer to use the word 'mass' instead of 'weight'. See e.g. the description in . If we want to keep a method molecularWeight around that's fine with me, we could just have it return the result of getMass using the averageMass type, which is the same value. > 1 The molecular weight method should take modifications into account. > A methylgroup adds weight. Thus modifications should have an > addedWeight: mode: method that can accept negative values as well (if > the modification removes more weight than it adds).
Just put a negative value in the plist, and it will subtract it when summing all modifications. > In both cases it would be nice to add a (p)list of predefined > modifications and features. > Definitely yes. > I think that's a pretty easy choice, as we don't know of anyone using > BioCocoa at the moment, I think we should just start from scratch. > I agree, but I think it would be fair to Peter to let him have his say as well. He started BioCocoa and the IO classes are all his code and I don't want to throw that away :) - Koen. From mek at mekentosj.com Wed Aug 25 18:06:05 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 00:06:05 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> References: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> Message-ID: >> Let's start a little new discussion then ;-) >> In principle these modifications can be seen as features right? So >> now we have three names/kinds around in two pairs: >> - Modifications (example: methylgroup, phosphate etc) >> - Features (example: alpha-helix, nuclear localization signal etc) >> Or >> - Features (example: methylgroup, phosphate etc) >> - Annotations (example: alpha-helix, nuclear localization signal etc) > > > I think we should treat the modifications as an array of BCSymbols. Hmm, yes and no. Indeed modifications are kind of symbols, thus they could have BCSymbol as their superclass. But where symbols have no clue of their location (that's determined by the array in which they are), modifications should be kept in a kind of dictionary with the location as key for instance. In that case the scheme below would not make much sense. Alternatively we could adhere to your proposal, but that would mean that the modifications should become a real subclass of BCSymbol (BCModification I suggest) and have methods to set/get their location.
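To make the "modification knows its own location" idea concrete, here is a minimal sketch in plain C (valid inside an Objective-C file as well). Everything named here is an assumption for illustration: `Modification` and `modification_mass_delta` are hypothetical and not BioCocoa API, and the mass deltas used with it would come from something like the plist of predefined modifications. Negative deltas need no special handling, since they are simply summed.

```c
#include <stddef.h>

/* Hypothetical record: a modification that carries its own position. */
typedef struct {
    size_t location;   /* zero-based index of the modified symbol */
    double mass_delta; /* mass added by the modification; may be negative */
} Modification;

/* Sum the mass contributions of all modifications that fall inside the
 * half-open range [start, start + length). */
double modification_mass_delta(const Modification *mods, size_t count,
                               size_t start, size_t length) {
    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        if (mods[i].location >= start && mods[i].location - start < length) {
            total += mods[i].mass_delta;
        }
    }
    return total;
}
```

A mass routine could add this value to the unmodified mass of the same range, and skip the call entirely when modifications should not be taken into account.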
> We could even make a BCModificationsArray if we add an intermediate > class called BCSymbolArray as follows: > > BCSymbolArray > | > | > ------BCSequence > | > | > ------BCModificationsArray > > > The BCSymbolArray can actually take care of a some of the code that is > currently in BCSequence. I was just wondering how you envision features in this setup then? Your setup groups modifications and symbols together with features as something else. In principle a good idea identity wise. I first had the idea to group modifications and features together being more distant from symbols. This has perhaps more advantages technical/programming wise as for both of these we have to keep track of locations and synchronization, having them as subclasses from one superclass would prevent a lot of duplication perhaps. I guess there's plenty to say for both options here. > If we calculate the mass of a molecule, we can just iterate over the > BCModificationsArray to add the masses of the modifications. Whichever we choose, that's indeed the idea. > For me features and annotations relate more to secondary structures > and author's comments (as found in a swissprot or ncbi file). But > that's just a name game. I see your point, indeed when the NCBI file lists phosphorylation it means that a particular sequence is annotated as a (potential) phosphorylation SITE and not as being actually phosphorylated. This is where I made the mistake. In that respect your right, the site is an annotation, an actual phosho-group on an amino-acid is a modification. >> Modifications and features are very alike, and a modification could >> be seen as a special feature and thus a subclass (inheriting the >> add/removing/editing/syncing etc methods, but add weight, pi etc ). >> Also the question raises wether we should keep them in two arrays >> (features and modifications) or in one (features). If you display all >> features of a sequence, it perhaps would be nice to see the >> modifications as well. 
> > I think we should keep them separate. Modifications are per BCSymbol, > features can span a whole range of BCSymbols. Yep > Also maybe we should move the mass calculations into a separate class > that accepts a sequence to calculate the mass? The same for features, > pI, etc. > > > For example we have the class MassCalculator with the following methods > > -(id) MassCalculator initWithSequence:(BCSequence *)seq > -(id) MassCalculator initWithSubSequence:(BCSequence *)seq > inRange:(NSRange)aRange > -(id) MassCalculator initWithString:(NSString *)seq > -(id) MassCalculator initWithSubString:(NSString *)seq > inRange:(NSRange)aRange > > -(float)getMass useMassType:(BCMassType)type > addModifications:(BOOL)mods > > The getMass method iterates over all symbols and adds the mass, just > as we do now in the molecularWeight method. > > > Then we use it as follows: > > MassCalculator calculator = [[MassCalculator alloc] > initwithSequence:mySequence]; > > float totalMass = [calculator getMass useMassType:BCAverage > addModifications:YES]; > > [calculator release]; I like the idea, looks very nice! The only thing I doubt about is if we should implement a string version of all methods as well. First of all the implementation will be completely different (it won't support modifications for instance, at least I would certainly not advise to implement string compatible ways to keep track of modifications), second if we keep all methods string compatible why bother using the sequences. Again, we should simply force people to see strings as a first or last step conversion only, from there it's BCSequence only. Other than that, it looks very promising Koen! > > I prefer to use the word 'mass' instead of 'weight'. See eg the > description in . If we want to > keep a method molecularWeight around that's fine with me, we could > just have it return the result of getMass using the averageMass type, > which is the same value. 
> Hey, you're the mass spec guy, who are we to do it otherwise? ;-) Mass is perfectly fine by me. Keep the molecularWeight around and just make it a convenience method indeed. >> 1 The molecular weight method should take modifications into account. >> A methylgroup adds weight. Thus modifications should have an >> addedWeight: mode: method that can accept negative values as well >> (if the modification removes more weight than it adds). > > Just put a negative value in the plist, and it will substract it when > summing all modifications. That's what I meant, just wanted to remind that you can have subtraction as well... >> I think that's a pretty easy choice, as we don't know of anyone using >> BioCocoa at the moment, I think we should just start from scratch. > > I agree, but it think it would be fair to Peter to let him have his > say as well. He started BioCocoa and the IO classes are all his code > and I don't want to throw that away :) You're right, perhaps I was a bit fast here, sorry for that, but given Peter's reaction on the last time we had a similar question, I thought to be on the safe side... Guess Peter will let us know ;-) ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From jtimmer at bellatlantic.net Wed Aug 25 19:18:23 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 25 Aug 2004 19:18:23 -0400 Subject: [Biococoa-dev] Progress! 
In-Reply-To: Message-ID: Just thought you guys would want to know, since we've been doing everything in theory so far: I have the test app done and working (including a linked BioCocoa build and bundling the framework). I needed two bug fixes: 1 related to the change to an "initWithSymbol" method from my original "initWithName". 1 related to me using pyrimidine in the .plist and pyrimadine in the .m file. And that's it! Reverse complementing the sequence worked, and it was even pretty fast on an 11Kb transcript - pretty good, what with all the object creation required (in fact, I just checked and NSDate says it takes 0.069829 of a second). One question: How does one handle sending a nib file via CVS? I'm going to add my test project, although bioinformatics.org has dropped off my ISP's radar screen again (can't even ping the place). Two comments: I don't have a strong opinion on how modifications are handled - I'll see how it's done on amino acids, and handle it accordingly for DNA. Molecular mass really isn't my thing, so I'll defer to Koen on how best to name things. Think that's it - Cheers, John From kvddrift at earthlink.net Wed Aug 25 20:55:20 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 25 Aug 2004 20:55:20 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: References: Message-ID: <9A98C152-F6FA-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 25, 2004, at 7:18 PM, John Timmer wrote: > One question: How does one handle sending a nib file via CVS? I'm > going to > add my test project, although bioinformatics.org has dropped off my > ISP's > radar screen again (can't even ping the place). > Can you mail the files to me? I can try to commit them if you'd like. Otherwise, I think you can treat them as a file: cvs add myNib cvs commit -m "cool nib" myNib - Koen. 
From kvddrift at earthlink.net Wed Aug 25 20:55:28 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 25 Aug 2004 20:55:28 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: References: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> Message-ID: <9F2FD700-F6FA-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 25, 2004, at 6:06 PM, Alexander Griekspoor wrote: >> >> I think we should treat the modifications as an array of BCSymbols. > Hmm, yes and no. Indeed modifications are kind of symbols, thus they > could have BCSymbol as their superclass. But where symbols have no > clue of there location (that's determined by the array in which they > are), modifications should be kept in a kind of dictionary with the > location as key for instance. Yes, you are absolutely right. One issue with a dictionary is that the indices change when a client requests a subsequence, how do we handle that? So maybe the modifications should be a member variable of each BCSymbol. Then whenever info about a BCSymbol is requested, either to calculate mass, draw it, or show an inspector panel, the info about the modification is available. > I see your point, indeed when the NCBI file lists phosphorylation it > means that a particular sequence is annotated as a (potential) > phosphorylation SITE and not as being actually phosphorylated. This is > where I made the mistake. In that respect your right, the site is an > annotation, an actual phosho-group on an amino-acid is a modification. Well, I didn't even mean it that way :) But a -Me or -PO4 group should be treated as a modification, that's the generally accepted term anyway (post-translational modifications). So I still would favour keeping the modifications separate from features/annotations. Also because they are probably not always requested at the same time. And it's also more OOP-like to have smaller objects instead of putting everything together.
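On the question raised earlier in this message, that stored indices change when a client requests a subsequence, one possible answer is to re-base the locations while copying. The plain C sketch below (also valid in an Objective-C file) is an assumption-laden illustration only: `Modification` and `modifications_in_subrange` are hypothetical names, and the caller is assumed to supply an output buffer with room for `count` entries.

```c
#include <stddef.h>

/* Hypothetical record: a modification that carries its own position. */
typedef struct {
    size_t location;   /* zero-based index of the modified symbol */
    double mass_delta; /* mass added by the modification; may be negative */
} Modification;

/* Copy the modifications that lie inside [start, start + length) into `out`,
 * shifting each location so it is relative to the start of the subsequence.
 * Returns the number of modifications copied. */
size_t modifications_in_subrange(const Modification *mods, size_t count,
                                 size_t start, size_t length,
                                 Modification *out) {
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (mods[i].location >= start && mods[i].location - start < length) {
            out[kept] = mods[i];
            out[kept].location -= start; /* re-base onto the subsequence */
            ++kept;
        }
    }
    return kept;
}
```

The same range check and shift would apply to features, except that a feature spanning several symbols would also need its length clipped to the requested range.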
>> For example we have the class MassCalculator with the following >> methods >> >> -(id) MassCalculator initWithSequence:(BCSequence *)seq >> -(id) MassCalculator initWithSubSequence:(BCSequence *)seq >> inRange:(NSRange)aRange >> -(id) MassCalculator initWithString:(NSString *)seq >> -(id) MassCalculator initWithSubString:(NSString *)seq >> inRange:(NSRange)aRange >> >> -(float)getMass useMassType:(BCMassType)type >> addModifications:(BOOL)mods >> >> The getMass method iterates over all symbols and adds the mass, just >> as we do now in the molecularWeight method. >> >> >> Then we use it as follows: >> >> MassCalculator calculator = [[MassCalculator alloc] >> initwithSequence:mySequence]; >> >> float totalMass = [calculator getMass useMassType:BCAverage >> addModifications:YES]; >> >> [calculator release]; > > I like the idea, looks very nice! The only thing I doubt about is if > we should implement a string version of all methods as well. Yes, you are right - I won't add those methods :) - Koen. From mek at mekentosj.com Thu Aug 26 01:46:11 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 07:46:11 +0200 Subject: [Biococoa-dev] Progress! In-Reply-To: References: Message-ID: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> BCProgress ? ;-) This is great news John! Unfortunately I can't build the project, it claims a problem with class BCMassCalculator, but that's probably because Koen is in the progress of implementing his stuff. > And that's it! Reverse complementing the sequence worked, and it was > even > pretty fast on an 11Kb transcript - pretty good, what with all the > object > creation required (in fact, I just checked and NSDate says it takes > 0.069829 > of a second). Wow! 0.06 seconds, that's terrific! 
Keep up the good work, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From mek at mekentosj.com Thu Aug 26 01:54:30 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 07:54:30 +0200 Subject: [Biococoa-dev] BCSequence In-Reply-To: <9F2FD700-F6FA-11D8-98B9-003065A5FDCC@earthlink.net> References: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> <9F2FD700-F6FA-11D8-98B9-003065A5FDCC@earthlink.net> Message-ID: <6574BB0A-F724-11D8-B1AB-000393CFDE0C@mekentosj.com> >>> I think we should treat the modifications as an array of BCSymbols. >> Hmm, yes and no. Indeed modifications are kind of symbols, thus they >> could have BCSymbol as their superclass.
But where symbols have no >> clue of there location (that's determined by the array in which they >> are), modifications should be kept in a kind of dictionary with the >> location as key for instance. > > Yes, you are absolutely right. One issue with a dictionary is that the > indices change when a client requests a subsequence, how do we handle > that? So maybe the modifications should be a member variable of each > BCSymbol. The whenever inof about a BCSymbol is requested, either to > calculate mass, draw it, or show an inspector panel, the info about > the modification is available. That won't do, I'm afraid: the BCSymbols refer to singleton objects, so you can't set individual variables per symbol. What you could do is completely mirror the symbol array with a modification array, filling up the blank spots with an empty modification object. That way you could just get the same position in both arrays to match symbol and modification, but that's a serious hack of course, and it wastes too much memory as well. As we have to come up with a solution for features/annotations as well (you can't keep a mirror array here as features can span multiple symbols), we have to think about this a bit more. As said, many bookkeeping methods for features and modifications can be identical in the end. While editing and using subranges we just have to account for keeping these objects up-to-date as well. It will involve a lot of range checking, but should be doable. >> I see your point, indeed when the NCBI file lists phosphorylation it >> means that a particular sequence is annotated as a (potential) >> phosphorylation SITE and not as being actually phosphorylated. This >> is where I made the mistake. In that respect your right, the site is >> an annotation, an actual phosho-group on an amino-acid is a >> modification.
> > Well, I didn't even mean it that way :) But a -Me or -PO4 group should > be treated as a modification, that's the general accepted term anyway > (post-translational modifications). So I still would favour to keep > the modifications separate from feautures/annotations. You convinced me here. > Also because they are probably not always requested at the same time. > And it's also more OOP-like to have smaller objects instead of putting > everything together. Yep. >> The only thing I doubt about is if we should implement a string >> version of all methods as well. > > Yes, you are right - I won't add those methods :) Way to go! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From kvddrift at earthlink.net Thu Aug 26 02:24:14 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 26 Aug 2004 02:24:14 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> References: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: <8D482092-F728-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 26, 2004, at 1:46 AM, Alexander Griekspoor wrote: > Unfortunately I can't build the project, it claims a problem with > class BCMassCalculator, but that's probably because Koen is in the > progress of implementing his stuff. What error do you get Alex? - Koen. 
From kvddrift at earthlink.net Thu Aug 26 02:27:10 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 26 Aug 2004 02:27:10 -0400 Subject: [Biococoa-dev] BCSequence In-Reply-To: <6574BB0A-F724-11D8-B1AB-000393CFDE0C@mekentosj.com> References: <1D3DCFC5-F65C-11D8-A8BD-000393CFDE0C@mekentosj.com> <5768DE2A-F6DD-11D8-98B9-003065A5FDCC@earthlink.net> <9F2FD700-F6FA-11D8-98B9-003065A5FDCC@earthlink.net> <6574BB0A-F724-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: On Aug 26, 2004, at 1:54 AM, Alexander Griekspoor wrote: >> Yes, you are absolutely right. One issue with a dictionary is that >> the indices change when a client requests a subsequence, how do we >> handle that? So maybe the modifications should be a member variable >> of each BCSymbol. Then whenever info about a BCSymbol is requested, >> either to calculate mass, draw it, or show an inspector panel, the >> info about the modification is available. > > That won't do, I'm afraid: the BCSymbols refer to > singleton objects, so you can't set individual variables per symbol. Ahh, of course. That reminds me, I still need to add the singleton code to initialize each AA... - Koen. From mek at mekentosj.com Thu Aug 26 03:36:15 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 09:36:15 +0200 Subject: [Biococoa-dev] Progress! In-Reply-To: <8D482092-F728-11D8-98B9-003065A5FDCC@earthlink.net> References: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> <8D482092-F728-11D8-98B9-003065A5FDCC@earthlink.net> <9CAA00AA-F732-11D8-B1AB-000393CFDE0C@mekentosj.com> This is what I get, also after a clean install, both if I try to build John's demo app, and if I try to build the framework from its own project. So I guess the error is in the latter. Building target "BioCocoa" with build style "Development" (optimization:level "size", debug-symbols:on) -
(1 error, 1 warning) >> ld: warning prebinding disabled because of undefined symbols ld: Undefined symbols: .objc_class_name_BCMassCalculator Alex On 26 Aug 2004, at 8:24, Koen van der Drift wrote: > > On Aug 26, 2004, at 1:46 AM, Alexander Griekspoor wrote: > >> Unfortunately I can't build the project, it claims a problem with >> class BCMassCalculator, but that's probably because Koen is in the >> process of implementing his stuff. > > > What error do you get, Alex? > > > - Koen. > > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From kvddrift at earthlink.net Thu Aug 26 06:07:06 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 26 Aug 2004 06:07:06 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: <9CAA00AA-F732-11D8-B1AB-000393CFDE0C@mekentosj.com> References: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> <8D482092-F728-11D8-98B9-003065A5FDCC@earthlink.net> <9CAA00AA-F732-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: On Aug 26, 2004, at 3:36 AM, Alexander Griekspoor wrote: > Building target "BioCocoa" with build style "Development" > (optimization:level "size", debug-symbols:on) - (1 error, 1 warning) > >> > ld: warning prebinding disabled because of undefined symbols > ld: Undefined symbols: > .objc_class_name_BCMassCalculator > That's strange... Is the file BCMassCalculator present in BCUtils? Maybe you have to remove/add BCMassCalculator.m and BCMassCalculator.h to the project manually. Otherwise I have no clue. - Koen.
From mek at mekentosj.com Thu Aug 26 06:53:50 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 12:53:50 +0200 Subject: [Biococoa-dev] Progress! In-Reply-To: References: <3C1148D0-F723-11D8-B1AB-000393CFDE0C@mekentosj.com> <8D482092-F728-11D8-98B9-003065A5FDCC@earthlink.net> <9CAA00AA-F732-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: <36B38670-F74E-11D8-B1AB-000393CFDE0C@mekentosj.com> OK, fixed. Both files were checked out but were not in the project; importing them fixed it. No clue whether the repository is still in this state, but it will probably be repaired the next time you guys commit your new files. Alex On 26 Aug 2004, at 12:07, Koen van der Drift wrote: > > On Aug 26, 2004, at 3:36 AM, Alexander Griekspoor wrote: > >> Building target "BioCocoa" with build style "Development" >> (optimization:level "size", debug-symbols:on) - (1 error, 1 warning) >> >> >> ld: warning prebinding disabled because of undefined symbols >> ld: Undefined symbols: >> .objc_class_name_BCMassCalculator >> > > That's strange... > > Is the file BCMassCalculator present in BCUtils? Maybe you have to > remove/add BCMassCalculator.m and BCMassCalculator.h to the project > manually. > > Otherwise I have no clue. > > > - Koen. > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant ********************************************************* From a.griekspoor at nki.nl Thu Aug 26 08:18:33 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 26 Aug 2004 14:18:33 +0200 Subject: [Biococoa-dev] 2 initial observations Message-ID: <0C3FAB35-F75A-11D8-B1AB-000393CFDE0C@nki.nl> Hi guys, Two minor things I have noted thus far: In BCNucleotideDNA.h: /*! @method representsBase: (BCNucleotideDNA *)entry; @abstract Evaluates whether the receiver represents the entry @discussion When called on adenosine, this method will return YES if the entry is adenosine, * weak, any base, etc. */ - (BOOL) representsBase: (BCNucleotideDNA *) entry; /*! @method isRepresentedByBase: (BCNucleotideDNA *)entry; @abstract Evaluates whether the receiver is represented by the entry @discussion When called on adenosine, this method will return YES if the entry is adenosine, * weak, any base, etc. */ - (BOOL) isRepresentedByBase: (BCNucleotideDNA *) entry; The headerdoc @discussion text for both methods is exactly the same; I guess that can't be right, John.... Koen, In your mass calculator there are two methods: -(float) getMassUsingMassType:(BCMassType)massType; -(float) getMassUsingMassType:(BCMassType)type addModifications:(BOOL)mods; Why isn't the first parameter name identical in both methods? It would be the finishing touch to make it either: -(float) getMassUsingMassType:(BCMassType)type; -(float) getMassUsingMassType:(BCMassType)type addModifications:(BOOL)mods; or -(float) getMassUsingMassType:(BCMassType)massType; -(float) getMassUsingMassType:(BCMassType)massType addModifications:(BOOL)mods; I like the first one more. It's a detail, I know; sorry for the nitpicking...
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* From jtimmer at bellatlantic.net Thu Aug 26 08:20:29 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 26 Aug 2004 08:20:29 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: Message-ID: > > On Aug 25, 2004, at 11:02 PM, John Timmer wrote: > >> No, they were in a separate folder within BioCocoa. What is the error >> that >> you see? >> > > I now know why I get the error. If I look in BioCocoa.framework, I only > see the two original headers BCReader.h and BCCreator.h. Anyone knows > how to fix that, so that when I build the framework, *all* headers are > used? I thought I fixed that in the BioCocoa project. Once again, however, bioinformatics has vanished, so I can't do this myself. In the BioCocoa project, select the framework target. In the section of your Xcode window that shows all the associated files, the first three columns should be an icon, the file name, and then one called "role". Each item in role is a popup, set to public, private, or project. Anything set to public there has its header included in the framework. Cheers, John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Aug 26 08:33:15 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 14:33:15 +0200 Subject: [Biococoa-dev] Progress! 
In-Reply-To: References: Message-ID: <1A629F51-F75C-11D8-B1AB-000393CFDE0C@mekentosj.com> I still have trouble building the demo-app as well; the framework itself builds fine now. Though I indeed had to set many headers to public as well, I still get the following error: /Users/griek/BioCocoa/demo_app/theController.m:4:35: BioCocoa/BCSequenceDNA.h: No such file or directory I don't have the time to really dive into it right now, so don't bother too much yet, I'll have a look later.... Alex On 26 Aug 2004, at 14:20, John Timmer wrote: >> >> On Aug 25, 2004, at 11:02 PM, John Timmer wrote: >> >>> No, they were in a separate folder within BioCocoa. What is the >>> error >>> that >>> you see? >>> >> >> I now know why I get the error. If I look in BioCocoa.framework, I >> only >> see the two original headers BCReader.h and BCCreator.h. Anyone knows >> how to fix that, so that when I build the framework, *all* headers are >> used? > > I thought I fixed that in the BioCocoa project. Once again, however, > bioinformatics has vanished, so I can't do this myself. In the > BioCocoa > project, select the framework target. In the section of your Xcode > window > that shows all the associated files, the first three columns should be > an > icon, the file name, and then one called "role". Each item in role is > a > popup, set to public, private, or project. Anything set to public > there has > its header included in the framework.
> > Cheers, > > John > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From jtimmer at bellatlantic.net Thu Aug 26 11:04:30 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 26 Aug 2004 11:04:30 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: <36B38670-F74E-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: What's everybody's status with demo_app? I did a clean checkout and things worked, but there may be a path dependency I missed, so it only works on my machine. If there are problems, what's the current error? JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Aug 26 11:09:58 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 17:09:58 +0200 Subject: [Biococoa-dev] Progress! In-Reply-To: References: Message-ID: See my last email, that's the current status.... Alex Op 26-aug-04 om 17:04 heeft John Timmer het volgende geschreven: > What's everybody's status with demo_app? I did a clean checkout and > things > worked, but there may be a path dependency I missed, so it only works > on my > machine. If there are problems, what's the current error? 
> > JT > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Mac vs Windows 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From jtimmer at bellatlantic.net Thu Aug 26 11:20:33 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 26 Aug 2004 11:20:33 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: Message-ID: At your convenience, could you check two things: Do a "get info" on the framework in the demo project and ensure that it's got the correct path to the built framework in the BioCocoa project. Use the disclosure triangles on the framework to reveal what headers are included in the built framework. This could be the same public/private problem that Koen was seeing. I've committed changes to the project file that should ensure that the right headers are there, but I may have only done the commit this morning (I'm not sure how much I got done during the 1 hour I had a clean connection last night). Thanks, John > See my last email, that's the current status.... > Alex > _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Aug 26 11:49:37 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 26 Aug 2004 17:49:37 +0200 Subject: [Biococoa-dev] Progress! 
In-Reply-To: References: Message-ID: <88767444-F777-11D8-B1AB-000393CFDE0C@mekentosj.com> On 26 Aug 2004, at 17:20, John Timmer wrote: > At your convenience, could you check two things: > > Do a "get info" on the framework in the demo project and ensure that > it's > got the correct path to the built framework in the BioCocoa project. Is correct... > Use the disclosure triangles on the framework to reveal what headers > are > included in the built framework. This could be the same public/private > problem that Koen was seeing. Yep, that's it, I miss the headers as well, even though I rebuilt the BioCocoa framework after setting all headers to public. The newly built framework does indeed contain all headers now, but still I can't get it to update in the demo_app project, where it keeps giving the nasty missing headers error.... Alex PS. Why have the demo_app in a separate project, instead of doing what Peter did for his sequence converter and having a separate target within the BioCocoa framework? I think it would be nice to have all these things in one project, as we can make subfolders in the project as well. But perhaps this becomes too complex in the end; it depends on what and how many demo_apps we ship along with the project. ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From kvddrift at earthlink.net Thu Aug 26 19:39:04 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 26 Aug 2004 19:39:04 -0400 Subject: [Biococoa-dev] Progress!
In-Reply-To: <88767444-F777-11D8-B1AB-000393CFDE0C@mekentosj.com> References: <88767444-F777-11D8-B1AB-000393CFDE0C@mekentosj.com> Message-ID: <1D6BEA09-F7B9-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 26, 2004, at 11:49 AM, Alexander Griekspoor wrote: > Yep, that's it, I miss the headers as well, even though I rebuild the > BioCocoa framework after setting all headers to public. > The newly build framework does indeed contain all headers now, but > still I can't get it to update in the demo_app project, where it keeps > giving the nasty missing headers warning.... I also still get that error. I did a complete fresh checkout, but that didn't help. Not so much progress for us ;-) > Ps. Why having the demo_app in a separate project instead of what > Peter did for his sequence converter, have a separate target within > the BioCocoa framework. I think it would be nice to have all these > things in one project as we can make subfolder in the project as well. > But perhaps this becomes to complex in the end, it depends on what and > how many demo_apps we ship along with the project. I like that idea too. We can have a few Tabs with various options, translate, digest, etc. - Koen. From kvddrift at earthlink.net Thu Aug 26 19:40:31 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 26 Aug 2004 19:40:31 -0400 Subject: [Biococoa-dev] 2 initial observations In-Reply-To: <0C3FAB35-F75A-11D8-B1AB-000393CFDE0C@nki.nl> References: <0C3FAB35-F75A-11D8-B1AB-000393CFDE0C@nki.nl> Message-ID: <51AC3F7A-F7B9-11D8-98B9-003065A5FDCC@earthlink.net> On Aug 26, 2004, at 8:18 AM, Alexander Griekspoor wrote: > In your masscalculator there are two methods: > > -(float) getMassUsingMassType:(BCMassType)massType; > -(float) getMassUsingMassType:(BCMassType)type > addModifications:(BOOL)mods; > > Thanks for spotting that - I will fix it. - Koen. 
From jtimmer at bellatlantic.net Thu Aug 26 23:07:13 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 26 Aug 2004 23:07:13 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: <1D6BEA09-F7B9-11D8-98B9-003065A5FDCC@earthlink.net> Message-ID: >> Ps. Why having the demo_app in a separate project instead of what >> Peter did for his sequence converter, have a separate target within >> the BioCocoa framework. I think it would be nice to have all these >> things in one project as we can make subfolder in the project as well. >> But perhaps this becomes to complex in the end, it depends on what and >> how many demo_apps we ship along with the project. > > > I like that idea too. We can have a few Tabs with various options, > translate, digest, etc. > As requested, I've checked in a new project that has a separate target for the demo app. It builds nicely after a clean checkout. I think I got all of its files, but I'm not positive at this point John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Aug 27 02:56:28 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 27 Aug 2004 08:56:28 +0200 Subject: [Biococoa-dev] Progress! In-Reply-To: References: Message-ID: <3879FA7C-F7F6-11D8-AC61-000393CFDE0C@mekentosj.com> Yep, works perfectly now! Nice work! Alex Op 27-aug-04 om 5:07 heeft John Timmer het volgende geschreven: > >>> Ps. Why having the demo_app in a separate project instead of what >>> Peter did for his sequence converter, have a separate target within >>> the BioCocoa framework. I think it would be nice to have all these >>> things in one project as we can make subfolder in the project as >>> well. >>> But perhaps this becomes to complex in the end, it depends on what >>> and >>> how many demo_apps we ship along with the project. >> >> >> I like that idea too. We can have a few Tabs with various options, >> translate, digest, etc. 
>> > > As requested, I've checked in a new project that has a separate target > for > the demo app. It builds nicely after a clean checkout. I think I got > all > of its files, but I'm not positive at this point > > > John > > > _______________________________________________ > This mind intentionally left blank > > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Mac vs Windows 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From mek at mekentosj.com Fri Aug 27 03:17:44 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 27 Aug 2004 09:17:44 +0200 Subject: [Biococoa-dev] Demo App In-Reply-To: References: Message-ID: <31137A12-F7F9-11D8-AC61-000393CFDE0C@mekentosj.com> Hi John, Wow, the conversion is really fast, this is really promising! I tested it with 67kb, no problem, conversion within half a second on my 1Ghz G4.... I got one exception now. If you try to convert the string (easy to use in debugging): ACGTNWMYRKHVDB-?
You get two exceptions: 2004-08-27 09:07:20.075 demo_app[1121] *** +[BCNucleotideDNA Y]: selector not recognized 2004-08-27 09:07:43.000 demo_app[1121] *** +[BCNucleotideDNA R]: selector not recognized I guess you will know immediately where something is lacking (plist?). Finally, I guess it's easy to add the following methods in addition to complement and reverse complement: - (BCSequenceDNA *) reverseOfSequence; - (BCSequenceDNA *) uppercaseSequence; - (BCSequenceDNA *) lowercaseSequence; I'm not sure if the "Of" in the method name is necessary here, because it doesn't take a parameter, right? [theSequence reverseSequence] sounds more logical than [theSequence reverseOfSequence]; in the latter case I ask myself: which sequence? It would even be fine to call [theSequence reverse], but that would suggest that you change the current sequence, while reverseSequence describes the method better as we return a new sequence object. Cheers, Alex On 27 Aug 2004, at 5:07, John Timmer wrote: > >>> Ps. Why having the demo_app in a separate project instead of what >>> Peter did for his sequence converter, have a separate target within >>> the BioCocoa framework. I think it would be nice to have all these >>> things in one project as we can make subfolder in the project as >>> well. >>> But perhaps this becomes to complex in the end, it depends on what >>> and >>> how many demo_apps we ship along with the project. >> >> >> I like that idea too. We can have a few Tabs with various options, >> translate, digest, etc. >> > > As requested, I've checked in a new project that has a separate target > for > the demo app. It builds nicely after a clean checkout.
I think I got > all > of its files, but I'm not positive at this point > > > John > > > _______________________________________________ > This mind intentionally left blank > > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From a.griekspoor at nki.nl Fri Aug 27 03:58:46 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Fri, 27 Aug 2004 09:58:46 +0200 Subject: [Biococoa-dev] [BioCocoa] Demo_App Message-ID: Stupid remark perhaps, but I forgot to mention that I don't get the reverseComplement of my input sequence.... Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From jtimmer at bellatlantic.net Fri Aug 27 10:45:32 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 27 Aug 2004 10:45:32 -0400 Subject: [Biococoa-dev] Demo App In-Reply-To: <31137A12-F7F9-11D8-AC61-000393CFDE0C@mekentosj.com> Message-ID: Thanks - I found the mistakes in the .plist, and things should work fine now. Incidentally, a 2.4 Mb BAC took about 46 seconds to reverse complement.
I should probably do some typical use profiling to figure out at what point someone using this should notify the user that it's busy. You're right on the nomenclature - the "of" implies that you're handing the sequence to it. I'll change the names and add the "reverse" as well. Internally, there is no case to the sequence since it's all objects, so the remaining two methods should return NSStrings. The first one already exists (as "description"), but I could add the second if it's useful. Switching gears, has anyone given any thought to a translation machinery? BioJava does it by having a translator object that you send the sequence and a translation dictionary to, which seems to make some sense. It will be a bit tricky to code, though, as different things will take different numbers of tokens to translate (ie - DNA -> RNA vs. RNA -> protein). I've got an idea on this, but I thought I'd ask if anyone had one before poisoning your mind with my thoughts. Cheers, John > Hi John, > > Wow, the conversion is really fast, this is really promising! I tested > it with 67kb, no problem, conversion within half a second on my 1Ghz > G4.... I got one exception now. If you try to convert the string (easy > to use in debugging): ACGTNWMYRKHVDB-? > You get two exceptions: > 2004-08-27 09:07:20.075 demo_app[1121] *** +[BCNucleotideDNA Y]: > selector not recognized > 2004-08-27 09:07:43.000 demo_app[1121] *** +[BCNucleotideDNA R]: > selector not recognized > Guess, you will know immediately where something is lacking (plist?). > > Finally, I guess it's easy to add the following methods in addition to > complement and reverse complement: > - (BCSequenceDNA *) reverseOfSequence; > - (BCSequenceDNA *) uppercaseSequence; > - (BCSequenceDNA *) lowercaseSequence; > > I'm not sure if the "Of" in the method name is necessary here, because > it doesn't take a parameter right? 
> [theSequence reverseSequence] sounds more logical than [theSequence > reverseOfSequence]; in the latter case I ask myself: which > sequence? It would even be fine to call [theSequence reverse], but that > would suggest that you change the current sequence, while > reverseSequence describes the method better as we return a new sequence > object. > > Cheers, > Alex _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Aug 27 13:38:24 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 27 Aug 2004 19:38:24 +0200 Subject: [Biococoa-dev] Demo App In-Reply-To: References: Message-ID: > Thanks - I found the mistakes in the .plist, and things should work > fine > now. Incidentally, a 2.4 Mb BAC took about 46 seconds to reverse > complement. Great! That's pretty rapid! What system are you on, John? > I should probably do some typical use profiling to figure out > at what point someone using this should notify the user that it's busy. Yep, as said we could always decide to go for non-blocking methods (with delegate messages) in addition to the current ones. > You're right on the nomenclature - the "of" implies that you're > handing the > sequence to it. You don't, because you call the method ON the sequence; you're not handing it any sequence. > I'll change the names and add the "reverse" as well. Great! > Internally, there is no case to the sequence since it's all objects, > so the > remaining two methods should return NSStrings. > The first one already exists (as "description"), but I could add the > second if it's useful. Ah, you're right... Still, if we provide the uppercase method, let's implement the lowercase one as well, as a convenience method. > Switching gears, has anyone given any thought to a translation > machinery?
John, here's a snippet from previous discussions about this: >> The idea I would propose is that there is a shared translation >> util object that you could feed a dna sequence and get (in the >> requested >> frames) the translated sequences back as protein sequence objects. >> It's the >> app task to control/organize these.. [clipped] > I was actually thinking we could have a BCGeneticCode, consisting of > BCCodons, again, extensible through .plists, to let us define the code > for > different species. Create a genetic code, hand it a dictionary, then > submit > your sequence to it, and get the amino acid sequence back. Great! The BCGeneticCode would be an argument you pass to the translation object along with the sequence: (BCSequenceProtein *)translateDNASequence: (BCSequenceDNA *)sequence usingCode: (BCGeneticCode *)code inFrames:(NSArray *)frames; with a number of convenience methods like inFrame: (int)frame that all call this method or something like that... I already did the very nice and exciting work (ahum) of creating such a plist for EnzymeX, so we have this one already ;-) BCCodons express their sequence in the BCTokens, right? > BioJava does it by having a translator object that you send the > sequence and > a translation dictionary to, which seems to make some sense. It will > be a > bit tricky to code, though, as different things will take different > numbers > of tokens to translate (ie - DNA -> RNA vs. RNA -> protein). I've got > an > idea on this, but I thought I'd ask if anyone had one before poisoning > your > mind with my thoughts. We could have two different methods for translation to either RNA or protein. We should also take species-specific translation into account; that's the reason for genetic code objects. We can have a number of codes already predefined, like [BCGeneticCode standardCode] as a class method. I was thinking a bit about this as well yesterday, and came up with the following problem: how do we return multiple frames?
If you do a translateDNASequence: usingCode: (BCGeneticCode *)code inFrame: you just return a BCSequenceProtein. But what if you want all frames, or all forward frames? Do we return a dictionary of BCSequenceProteins with the frame as key? Finally, let's define how we name each frame: -3, -2, -1, +1, +2, +3? Anyway, don't hesitate to poison us with your thoughts, John; it's nice to brainstorm a bit and think about different options... Alex > > Cheers, > > John > > >> Hi John, >> >> Wow, the conversion is really fast, this is really promising! I tested >> it with 67kb, no problem, conversion within half a second on my 1Ghz >> G4.... I got one exception now. If you try to convert the string (easy >> to use in debugging): ACGTNWMYRKHVDB-? >> You get two exceptions: >> 2004-08-27 09:07:20.075 demo_app[1121] *** +[BCNucleotideDNA Y]: >> selector not recognized >> 2004-08-27 09:07:43.000 demo_app[1121] *** +[BCNucleotideDNA R]: >> selector not recognized >> Guess, you will know immediately where something is lacking (plist?). >> >> Finally, I guess it's easy to add the following methods in addition to >> complement and reverse complement: >> - (BCSequenceDNA *) reverseOfSequence; >> - (BCSequenceDNA *) uppercaseSequence; >> - (BCSequenceDNA *) lowercaseSequence; >> >> I'm not sure if the "Of" in the method name is necessary here, because >> it doesn't take a parameter right? >> [theSequence reverseSequence] sounds more logical than [theSequence >> reverseOfSequence], in the latter case I ask myself: which >> sequence? It would even be fine to call [theSequence reverse] but that >> would suggest that you change the current sequence, while >> reverseSequence describes the method better as we return a new >> sequence >> object.
>>
>> Cheers,
>> Alex
>
>
> _______________________________________________
> This mind intentionally left blank
>
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>
>
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Claiming that the Macintosh is inferior to Windows because
most people use Windows, is like saying that all other restaurants
serve food that is inferior to McDonalds
*********************************************************

From jtimmer at bellatlantic.net Fri Aug 27 16:39:32 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 27 Aug 2004 16:39:32 -0400
Subject: [Biococoa-dev] Demo App
In-Reply-To:
Message-ID:

>> Thanks - I found the mistakes in the .plist, and things should work fine
>> now. Incidentally, a 2.4 Mb BAC took about 46 seconds to reverse
>> complement.
> Great! That's pretty rapid! What system are you on John?

A 1.33GHz G4 laptop.
I'm not sure if it stressed the disk at all, so I'd imagine it was more a function of RAM access and processor, in which case this is an above average machine. > I already did the very nice and exiting work (ahum) of creating such a > plist for EnzymeX, so we have this one already ;-) > BCCodons express their sequence in the BCTokens right? Could you send me a copy of the .plist? I've been debating between a flatfile with all possible combinations and a tree structure with keys that are BCSymbols themselves, which should allow us to use ambiguous bases more easily. To explain the tree option in detail: you simply enumerate the keys and query each one as to whether it represents the first base. If it does, you grab the dictionary it keys for, and repeat the process with the second base. On the third base of the codon, the dictionary simply contains the answer - in the case of a translation, the amino acid. If it fails at any point, it returns undefined. This should cut down on the number of items we have to put in the dictionary considerably, and provide a translation even if the sequence isn't high quality. Plus, I already know how to populate an object from text references thanks to the nucleotide experience. > We could have two different methods for translation to either RNA or > protein. We should also take species specific translation into account, > that's the reason for geneticcode objects. We can have a number of > codes already predefined like [BCGeneticCode standardCode] as a > classmethod. I was thinking of making a single generic method that would handle all translations, but I guess there's only going to be a few, so specialized methods make more sense. > I was thinking a bit about this as well yesterday, and came of with the > following problem; how do we return multiple frames? 
> I you do a translateDNASequence: usingCode: (BCGeneticCode *)code > inFrame: you just return a BCSequenceProtein > But what if you want all frames, or all forward frames, do we return a > dictionary of BCSequenceProteins with the frame as key? > Finally, let's define how we call each frame: -3, -2, -1, +1, +2, +3? If a method can return more than one result, clearly it should return an array. As for frames, I think the non-zero integers are the way to go - we should try to make usage familiar to biologists (unless it's too difficult or annoying to do so ;). Cheers, John _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Aug 27 16:58:56 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 27 Aug 2004 22:58:56 +0200 Subject: [Biococoa-dev] Demo App In-Reply-To: References: Message-ID: >>> Thanks - I found the mistakes in the .plist, and things should work >>> fine >>> now. Incidentally, a 2.4 Mb BAC took about 46 seconds to reverse >>> complement. >> Great! That's pretty rapid! What system are you on John? > A 1.33GHz G4 laptop. I'm not sure if it stressed the disk at all, so > I'd > imagine it was more a function of RAM access and processor, in which > case > this is an above average machine. At least for me, 2.4Mb is an above average sequence length as well ;-) >> I already did the very nice and exiting work (ahum) of creating such a >> plist for EnzymeX, so we have this one already ;-) >> BCCodons express their sequence in the BCTokens right? > Could you send me a copy of the .plist? Sure, it's attached... In any case it might save some work... Oops, while attaching it, I notice I saved one plist for each species, but I guess copy pasting them in one file still saves some work... The structure of the plist should be self explanatory: AAA K Lys 24.1 The number is the relative codon usage (to see if it is rare or not). 
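[Editorial sketch] The codon entry above seems to have lost its plist markup in the archive, so the sketch below does not try to reproduce the EnzymeX file. It illustrates instead the nested-dictionary ("tree") lookup that John describes in this thread: enumerate the keys at each level, ask whether a key represents the base at that position, descend, and return the amino acid found at the third level, or nil for "undefined". Everything named BCSketch... is hypothetical, plain NSString keys stand in for BCSymbols, and only a handful of IUPAC codes (uppercase only) are covered.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the concrete bases an IUPAC code stands for.
// Only a few codes are listed; unknown codes yield nil.
static NSString *BCSketchExpand(NSString *code) {
    NSDictionary *iupac = [NSDictionary dictionaryWithObjectsAndKeys:
        @"A", @"A", @"C", @"C", @"G", @"G", @"T", @"T",
        @"AG", @"R", @"CT", @"Y", @"ACGT", @"N", nil];
    return [iupac objectForKey:code];
}

// A key represents a base when every concrete base the sequence symbol
// can stand for is also covered by the key.
static BOOL BCSketchKeyRepresentsBase(NSString *key, NSString *base) {
    NSString *keySet = BCSketchExpand(key);
    NSString *baseSet = BCSketchExpand(base);
    if (keySet == nil || baseSet == nil) return NO;
    NSUInteger i;
    for (i = 0; i < [baseSet length]; i++) {
        NSString *one = [baseSet substringWithRange:NSMakeRange(i, 1)];
        if ([keySet rangeOfString:one].location == NSNotFound) return NO;
    }
    return YES;
}

// Walk three levels of nested dictionaries; the third level holds the
// one-letter amino acid code. Returns nil when no key matches.
static NSString *BCSketchTranslateCodon(NSDictionary *tree, NSString *codon) {
    if ([codon length] != 3) return nil;
    id node = tree;
    NSUInteger position;
    for (position = 0; position < 3; position++) {
        if (![node isKindOfClass:[NSDictionary class]]) return nil;
        NSString *base = [codon substringWithRange:NSMakeRange(position, 1)];
        id next = nil;
        NSEnumerator *keys = [(NSDictionary *)node keyEnumerator];
        NSString *key;
        while ((key = [keys nextObject])) {
            if (BCSketchKeyRepresentsBase(key, base)) {
                next = [(NSDictionary *)node objectForKey:key];
                break;
            }
        }
        if (next == nil) return nil;   // "undefined"
        node = next;
    }
    return [node isKindOfClass:[NSString class]] ? node : nil;
}

// A tiny sample tree: G -> C -> N (alanine), A -> A -> R (lysine),
// A -> A -> Y (asparagine). Three leaves cover eight codons.
static NSDictionary *BCSketchSampleTree(void) {
    NSDictionary *gc = [NSDictionary dictionaryWithObject:@"A" forKey:@"N"];
    NSDictionary *aa = [NSDictionary dictionaryWithObjectsAndKeys:
        @"K", @"R", @"N", @"Y", nil];
    NSDictionary *g = [NSDictionary dictionaryWithObject:gc forKey:@"C"];
    NSDictionary *a = [NSDictionary dictionaryWithObject:aa forKey:@"A"];
    return [NSDictionary dictionaryWithObjectsAndKeys:g, @"G", a, @"A", nil];
}
```

With this sample tree, a low-quality read such as GCN still translates, because the single N key covers every third base, while AAN comes back undefined because it could be either lysine or asparagine.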
> I've been debating between a flatfile with all possible combinations
> and a tree structure with keys that
> are BCSymbols themselves, which should allow us to use ambiguous bases more
> easily. To explain the tree option in detail: you simply enumerate the keys and
> query each one as to whether it represents the first base. If it does, you
> grab the dictionary it keys for, and repeat the process with the second
> base. On the third base of the codon, the dictionary simply contains the
> answer - in the case of a translation, the amino acid. If it fails at any
> point, it returns undefined.
>
> This should cut down on the number of items we have to put in the dictionary
> considerably, and provide a translation even if the sequence isn't high
> quality. Plus, I already know how to populate an object from text
> references thanks to the nucleotide experience.

Sounds great! Guess we just have to see if it works in practice and if it's fast enough, but I can't see why not.

>> We could have two different methods for translation to either RNA or
>> protein. We should also take species specific translation into account,
>> that's the reason for geneticcode objects. We can have a number of
>> codes already predefined like [BCGeneticCode standardCode] as a
>> classmethod.
> I was thinking of making a single generic method that would handle all
> translations, but I guess there's only going to be a few, so specialized
> methods make more sense.

And as these two are so different, the first thing you would do in your method is branch between protein and RNA, so why not make things much more transparent and keep them separate. Of course you can always add a convenience method which sorts out what to do and calls the proper method.

>> I was thinking a bit about this as well yesterday, and came of with the
>> following problem; how do we return multiple frames?
>> I you do a translateDNASequence: usingCode: (BCGeneticCode *)code >> inFrame: you just return a BCSequenceProtein >> But what if you want all frames, or all forward frames, do we return a >> dictionary of BCSequenceProteins with the frame as key? >> Finally, let's define how we call each frame: -3, -2, -1, +1, +2, +3? > If a method can return more than one result, clearly it should return > an > array. Yes, and no, if we allow multipleframes as a parameters (say -3, +1 and +2) we should either return an array in the same order (or fixed order), or a dictionary with those framenumbers as keys. In the latter case no confusion can occur, which can easily occur in the first case. Say I ask a convenience method translateReverseFrames do I get an array back in the order -3, -2, -1 or -1, -2, -3? Headerdoc will help you out here, but with the dictionary no question would be there in the first case. But in this case I agree with an array, we just have to make sure it is clearly documented what and how things are returned. > As for frames, I think the non-zero integers are the way to go - we > should try to make usage familiar to biologists (unless it's too > difficult > or annoying to do so ;). Yup! Certainly! Cheers, Alex -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Codon Tables.zip Type: application/zip Size: 17996 bytes Desc: not available URL: -------------- next part -------------- ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From kvddrift at earthlink.net Fri Aug 27 19:35:11 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 27 Aug 2004 19:35:11 -0400 Subject: [Biococoa-dev] BCAminoAcid Message-ID: Hi, Still need to implement the code for instantiating each amino acid. Unfortunately I haven't had much time lately, so this part of BioCocoa isn't working yet. Hopefully I get some inspiration this weekend to figure out how I want to do it. Tonight I am going to drink beer and try to get my new DSL modem to work :-p - Koen. From kvddrift at earthlink.net Fri Aug 27 23:30:03 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 27 Aug 2004 23:30:03 -0400 Subject: [Biococoa-dev] BCAminoAcid In-Reply-To: References: Message-ID: <8CDFA75D-F8A2-11D8-AAB8-003065A5FDCC@earthlink.net> On Aug 27, 2004, at 7:35 PM, Koen van der Drift wrote: > Hi, > > Still need to implement the code for instantiating each amino acid. > Unfortunately I haven't had much time lately, so this part of BioCocoa > isn't working yet. Hopefully I get some inspiration this weekend to > figure out how I want to do it. Tonight I am going to drink beer and > try to get my new DSL modem to work :-p > Well, I ran out of beer, and our outside phoneline needs to be set up for DSL (although Earthlink said is was). So instead I added the singletons for all aminoacids. 
It still feels somehow cumbersome to me, but I don't have an alternative right now so copied what John did for the bases. BTW, John, I think this : adenosineRepresentation = [[BCNucleotideDNA alloc] initWithSymbol: [@"A" characterAtIndex: 0]]; can be replaced by: adenosineRepresentation = [[BCNucleotideDNA alloc] initWithSymbol: 'A']; save a couple of calculations :) - Koen. From kvddrift at earthlink.net Fri Aug 27 23:51:36 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 27 Aug 2004 23:51:36 -0400 Subject: [Biococoa-dev] Progress! In-Reply-To: References: Message-ID: <8F5B2877-F8A5-11D8-AAB8-003065A5FDCC@earthlink.net> On Aug 26, 2004, at 11:07 PM, John Timmer wrote: > As requested, I've checked in a new project that has a separate target > for > the demo app. It builds nicely after a clean checkout. I think I got > all > of its files, but I'm not positive at this point > Yep, now works here too :) - Koen. From kvddrift at earthlink.net Sat Aug 28 00:12:54 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 00:12:54 -0400 Subject: [Biococoa-dev] mass calculator bug Message-ID: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> Hi, To test my mass calculator code, I added the following line to the demo code: mw = [theSequence molecularWeight:BCAverage]; This causes an error in: @implementation BCMassCalculator -(id) initWithSequence:(BCSequence *)seq { if (self = [super init]) { sequence = [seq copy]; <--------- error } return self; } the error is: 2004-08-28 00:08:25.565 demo_app[2490] *** -[BCSequenceDNA copyWithZone:]: selector not recognized Anyone have an idea what this means and how to fix it? thanks, - Koen. 
From mek at mekentosj.com Sat Aug 28 06:24:40 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 28 Aug 2004 12:24:40 +0200
Subject: [Biococoa-dev] mass calculator bug
In-Reply-To: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net>
References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net>
Message-ID: <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com>

Ha Koen,

This is because our custom object doesn't support the copying protocol (see NSCopying in documentation). And now that we're busy anyway, let's add the very similar NSCoding as well... [DISCLAIMER: typed in mail and not tested]

So we start with the first interface line in the BCSequence header:

    @interface BCSequence : NSObject {

should become:

    @interface BCSequence : NSObject <NSCopying, NSCoding> {

Next we add the following methods to support the protocols:

    - (id)copyWithZone:(NSZone *)zone;
    - (id)initWithCoder:(NSCoder *)coder;
    - (void)encodeWithCoder:(NSCoder *)coder;

Now the implementation:

    - (id)copyWithZone:(NSZone *)zone {
        BCSequence *copy = [[[self class] allocWithZone: zone] init];
        [copy setSequenceType: [self sequenceType]];
        [copy setSequence: [self sequence]];
        [copy setSequenceCountedSet: [self sequenceCountedSet]];
        [copy setRange: [self range]];
        [copy setStartPosition: [self startPosition]];
        [copy setEndPosition: [self endPosition]];
        return copy;
    }

Of course you note one problem here, we haven't implemented a number of accessors. I already mentioned once that we should implement them for all variables, as Apple advises us to do, and always call the accessors if you want to change things (instead of directly accessing the variable). This is one example where this comes in handy. If we don't want to make some accessors public, we have to @private them so they're invisible outside the class. Anyway, I leave this up to you guys. An alternative would be a special init method that takes all variables as input and does everything in one call.
The implementation for the NSCoding methods:

    - (id)initWithCoder:(NSCoder *)coder {
        if ((self = [super init])) {
            if ( [coder allowsKeyedCoding] ) {
                sequenceType = [coder decodeIntForKey: @"BCSequenceType"];
                sequence = [[coder decodeObjectForKey: @"BCSequence"] retain];
                sequenceCountedSet = [[coder decodeObjectForKey: @"BCSequenceCountedSet"] retain];
                range = [[coder decodeObjectForKey: @"BCSequenceRange"] rangeValue];
                startPosition = [coder decodeIntForKey: @"BCSequenceStartPosition"];
                endPosition = [coder decodeIntForKey: @"BCSequenceEndPosition"];
            } else {
                // must mirror the order used in encodeWithCoder:
                [coder decodeValueOfObjCType: @encode(int) at: &sequenceType];
                sequence = [[coder decodeObject] retain];
                sequenceCountedSet = [[coder decodeObject] retain];
                [coder decodeValueOfObjCType: @encode(NSRange) at: &range];
                [coder decodeValueOfObjCType: @encode(int) at: &startPosition];
                [coder decodeValueOfObjCType: @encode(int) at: &endPosition];
            }
        }
        return self;
    }

In BCMassCalculator, initWithSequence:, change:

> sequence = [seq copy]; <--------- error

to

    sequence = [seq copyWithZone: [self zone]]; // I believe this is more efficient memory wise

    - (void)encodeWithCoder:(NSCoder *)coder {
        if ( [coder allowsKeyedCoding] ) {
            [coder encodeInt: sequenceType forKey: @"BCSequenceType"];
            [coder encodeObject: sequence forKey: @"BCSequence"];
            [coder encodeObject: sequenceCountedSet forKey: @"BCSequenceCountedSet"];
            [coder encodeObject: [NSValue valueWithRange: range] forKey: @"BCSequenceRange"];
            [coder encodeInt: startPosition forKey: @"BCSequenceStartPosition"];
            [coder encodeInt: endPosition forKey: @"BCSequenceEndPosition"];
        } else {
            [coder encodeValueOfObjCType: @encode(int) at: &sequenceType]; // Not sure if it should be @encode(BCSequenceType)
            [coder encodeObject: sequence];
            [coder encodeObject: sequenceCountedSet];
            [coder encodeValueOfObjCType: @encode(NSRange) at: &range];
            [coder encodeValueOfObjCType: @encode(int) at: &startPosition];
            [coder encodeValueOfObjCType: @encode(int) at: &endPosition];
        }
        return;
    }

Subclasses like BCSequenceDNA need to override copyWithZone:, first call [super copyWithZone: zone] and then add the subclass-specific variables. The same holds true for initWithCoder: and encodeWithCoder: (call [super encodeWithCoder: coder] for instance).

One note, perhaps we should actually adhere to the NSMutableCopying protocol as our objects are mutable, but I'm not sure (and don't see many differences).

Cheers,
Alex

On 28 Aug 2004, at 6:12, Koen van der Drift wrote:

> Hi,
>
> To test my mass calculator code, I added the following line to the
> demo code:
>
> mw = [theSequence molecularWeight:BCAverage];
>
>
> This causes an error in:
>
>
> @implementation BCMassCalculator
>
> -(id) initWithSequence:(BCSequence *)seq
> {
>     if (self = [super init])
>     {
>         sequence = [seq copy]; <--------- error
>     }
>
>     return self;
> }
>
>
> the error is:
>
> 2004-08-28 00:08:25.565 demo_app[2490] *** -[BCSequenceDNA
> copyWithZone:]: selector not recognized
>
>
> Anyone have an idea what this means and how to fix it?
>
>
> thanks,
>
> - Koen.
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>
>
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better. So I got a Macintosh.
*********************************************************

From kvddrift at earthlink.net Sat Aug 28 06:37:29 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 28 Aug 2004 06:37:29 -0400
Subject: [Biococoa-dev] mass calculator bug
In-Reply-To: <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com>
References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com>
Message-ID: <42DE9EE9-F8DE-11D8-AAB8-003065A5FDCC@earthlink.net>

On Aug 28, 2004, at 6:24 AM, Alexander Griekspoor wrote:

> Ha Koen,

Ha Alex :)

>
> This is because our custom object doesn't support the copying protocol
> (see NSCopying in documentation). And now that we're busy anyway,
> let's add the very similar NSCoding as well... [DISCLAIMER: typed in
> mail and not tested]
>
> So we start with the first interface line in the BCSequence header:
> @interface BCSequence : NSObject {
> should become:
> @interface BCSequence : NSObject <NSCopying, NSCoding> {
>

Hmm, I have never used such a construction before and have done similar things (passing an object to another class).

> In BCMassCalculator, InitWithSequence, change:
>
>> sequence = [seq copy]; <--------- error
>
> to sequence = [seq copyWithZone: [self zone]]; // I believe this is
> more efficient memory wise
>

Even if I use an accessor:

    [self setSequence: seq];

and then:

    - (void)setSequence:(BCSequence *)s
    {
        [s retain];
        [sequence release];
        sequence = s;
    }

I get the same copyWithZone: error.
But maybe this is different because BCSequence passes a copy of itself? Anyway, I agree that we should implemement a 'copy constructor' (C++ speak) if that solves the problem. - Koen. From kvddrift at earthlink.net Sat Aug 28 06:45:31 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 06:45:31 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> Message-ID: <625D9D66-F8DF-11D8-AAB8-003065A5FDCC@earthlink.net> On Aug 28, 2004, at 6:24 AM, Alexander Griekspoor wrote: > Of course you note one problem here, we haven't implemented a number > of accessors. I already mentioned once that we should for all > variables like Apple advices us to do, and always call the accessors > if you want to change things (instead of directly accessing the > variable). This is one example where this comes in handy. If we don't > want to make some accessors public, we have to @private them so > they're invisible outside the class. Yes, I fully agree. What's the syntax again to use @private? > Anyway, I leave this up to you guys. An alternative would be a special > init method that takes all variables as input and do everything in one > call. > I don't like that idea, I prefer the accessors. - Koen. 
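[Editorial sketch] On the @private question: in Objective-C the @private directive goes inside the instance-variable block of the @interface and governs instance variables only. Methods are not affected by it; an accessor is kept out of the public interface by leaving its declaration out of the public header. The sketch below shows that syntax together with a retaining setter and a copyWithZone: that goes through the accessors, as discussed in this thread. BCSketchSequence is a hypothetical stand-in, not the real BCSequence, and the code assumes manual retain/release as used in 2004 (under ARC the retain, release and dealloc lines would have to go).

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for BCSequence; the real class has more state.
@interface BCSketchSequence : NSObject <NSCopying> {
@private
    NSMutableArray *sequenceArray;  // visible only to this class's own methods
}
- (NSMutableArray *)sequenceArray;
- (void)setSequenceArray:(NSMutableArray *)anArray;
@end

@implementation BCSketchSequence

- (id)init {
    if ((self = [super init])) {
        sequenceArray = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)dealloc {
    [sequenceArray release];
    [super dealloc];
}

- (NSMutableArray *)sequenceArray {
    return sequenceArray;
}

// Retain the new value before releasing the old one, so that handing
// the current array back in does not deallocate it.
- (void)setSequenceArray:(NSMutableArray *)anArray {
    [anArray retain];
    [sequenceArray release];
    sequenceArray = anArray;
}

// NSObject's -copy calls -copyWithZone:, which is why a class that does
// not implement it raises "selector not recognized" when copied.
- (id)copyWithZone:(NSZone *)zone {
    BCSketchSequence *copy = [[[self class] allocWithZone:zone] init];
    // Give the copy its own array so later edits do not affect the original.
    NSMutableArray *arrayCopy = [[self sequenceArray] mutableCopy];
    [copy setSequenceArray:arrayCopy];
    [arrayCopy release];
    return copy;
}

@end
```

Whether a copy should share its symbols array or get its own is a design decision for the framework; the sketch duplicates the array (the elements themselves are still shared), which fits flyweight symbols that are never mutated.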
From kvddrift at earthlink.net Sat Aug 28 08:50:24 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 08:50:24 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> Message-ID: On Aug 28, 2004, at 6:24 AM, Alexander Griekspoor wrote: > [copy setSequence: [self sequence]]; > John, you implemented this method setSequence as follows: - (void)setSequence:(NSMutableArray *)anArray { [sequenceArray release]; sequenceArray = [[NSMutableArray alloc] init]; NSEnumerator *theEnumerator = [sequenceArray objectEnumerator]; id aSymbol; while ( aSymbol = [theEnumerator nextObject] ) { if ( [aSymbol isKindOfClass: [BCSymbol class]] ) [sequenceArray addObject: aSymbol]; } } I am now implementing all the accessors, and was wondering why you don't use this: - (void) setSequence: (NSMutableArray *) anArray { [anArray retain]; [sequence release]; sequence = anArray; } I think now the copyWithZone is available and we mark this @private, the latter should be enough. Also, I suggest we use 'sequenceArray' instead of 'sequence'. Any objections anyone? cheers, - Koen. From kvddrift at earthlink.net Sat Aug 28 09:51:22 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 09:51:22 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> Message-ID: <58CE73D6-F8F9-11D8-AAB8-003065A5FDCC@earthlink.net> Hi, Running into another problem. I implemented the NSCopy code for BCSequence, and try to get the mass calculation to work. I cannot get a reference to a BCSymbol to get the masses. 
This is what I do: NSEnumerator *enumerator; BCNucleotideDNA *aSymbol; enumerator = [[sequence sequenceArray] objectEnumerator]; while ( aSymbol = [enumerator nextObject] ) { total += [aSymbol monoisotopicMass]; <-- this crashes I guess don't understand how to use the singleton/flyweight pattern yet. Any ideas? thanks, - Koen. From kvddrift at earthlink.net Sat Aug 28 10:46:15 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 10:46:15 -0400 Subject: [Biococoa-dev] rangeOfSubsequence Message-ID: <034560AA-F901-11D8-AAB8-003065A5FDCC@earthlink.net> Hi, Is it possible to move all the rangeOfSubsequence variants that are now in BCSequeneDNA to BCSequence? I think the same code can be useful for proteins as well. thanks, - Koen. From jtimmer at bellatlantic.net Sat Aug 28 11:13:27 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 28 Aug 2004 11:13:27 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: Message-ID: The following was left over from when I started with a DNA sequence class before deciding what would be generally useful and moved to the superclass - I had used "sequenceArray" instead of "sequence", but forgot to change it in this location. The idea behind it (and a lot else that I code) is to prevent a user from doing something stupid, accidentally or maliciously. Since we can't control what's in the array the user is handing us when it's a raw array (as opposed to when we're handed a sequence) we need to make sure that its contents will work with all the methods we're using on the contents - hence the test for BCSymbols. 
> > John, you implemented this method setSequence as follows: > > - (void)setSequence:(NSMutableArray *)anArray > { > [sequenceArray release]; > > sequenceArray = [[NSMutableArray alloc] init]; > > NSEnumerator *theEnumerator = [sequenceArray objectEnumerator]; > id aSymbol; > while ( aSymbol = [theEnumerator nextObject] ) { > if ( [aSymbol isKindOfClass: [BCSymbol class]] ) > [sequenceArray addObject: aSymbol]; > } > } > Is it possible to move all the rangeOfSubsequence variants that are now > in BCSequeneDNA to BCSequence? I think the same code can be useful for > proteins as well. Please do! And please fix my experimentation with the adenosine @"A" -> 'A' conversion. And rename anything you feel is necessary, as long as you're careful with the find and replace! It looks like I won't have CVS access all weekend (Koen, if you're annoyed at Earthlink, I'm annoyed at Verizon), so I'll be working on something new instead of contributing to the work on sequences, copying, and coding. Incidentally, have you done a "step into" in the debugger to see where things actually error out? And what's the error message? Cheers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Aug 28 11:54:30 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 11:54:30 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: <8C43C9BF-F90A-11D8-AAB8-003065A5FDCC@earthlink.net> Hi John, > Since we can't control what's in the array the user is handing us when > it's > a raw array (as opposed to when we're handed a sequence) we need to > make > sure that its contents will work with all the methods we're using on > the > contents - hence the test for BCSymbols. If we keep the method private, we don't have to worry about that, I guess. We could implement a validateInput method that checks (using an NSScanner) if any character in the string are not allowed. 
Or we could use an BCAlphabet class that manages the symbols. But in the meantime I'll leave it alone. > > > >> Is it possible to move all the rangeOfSubsequence variants that are >> now >> in BCSequeneDNA to BCSequence? I think the same code can be useful for >> proteins as well. > Please do! And please fix my experimentation with the adenosine @"A" > -> 'A' > conversion. And rename anything you feel is necessary, as long as > you're > careful with the find and replace! Ok, I will start working on that. I'll also replace 'sequence' with 'sequenceArray'. > Incidentally, have you done a "step into" in the debugger to see where > things actually error out? And what's the error message? If I step into the offending line I go straight to the assembly code, which is not very helpful (to me atlast :) It has to do with how I obtain a reference to each BCSymbol, see my other post. - Koen. From kvddrift at earthlink.net Sat Aug 28 12:00:33 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 12:00:33 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: <647021AE-F90B-11D8-AAB8-003065A5FDCC@earthlink.net> On Aug 28, 2004, at 11:13 AM, John Timmer wrote: > And please fix my experimentation with the adenosine @"A" -> 'A' > conversion. Now I look closer at that code: NSDictionary *tempDict = [baseDefinitions objectForKey: @"A"]; if ( tempDict != nil && [tempDict isKindOfClass: [NSDictionary class]] ) { adenosineRepresentation = [[BCNucleotideDNA alloc] initWithSymbol: [@"A" characterAtIndex: 0]]; [baseDefinitions removeObjectForKey: @"A"]; } Why not write this as one line: adenosineRepresentation = [[BCNucleotideDNA alloc] initWithSymbol:'A']; I don't understand the function of tempDict, and the removeObjectForKey line. - Koen. 
From jtimmer at bellatlantic.net Sat Aug 28 13:25:45 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 28 Aug 2004 13:25:45 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <647021AE-F90B-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: > > On Aug 28, 2004, at 11:13 AM, John Timmer wrote: > >> And please fix my experimentation with the adenosine @"A" -> 'A' >> conversion. > > Now I look closer at that code: > > NSDictionary *tempDict = [baseDefinitions objectForKey: @"A"]; > if ( tempDict != nil && [tempDict isKindOfClass: [NSDictionary > class]] ) { > adenosineRepresentation = [[BCNucleotideDNA alloc] > initWithSymbol: [@"A" characterAtIndex: 0]]; > [baseDefinitions removeObjectForKey: @"A"]; > } > > Why not write this as one line: > > adenosineRepresentation = [[BCNucleotideDNA alloc] initWithSymbol:'A']; > > I don't understand the function of tempDict, and the removeObjectForKey > line. The use of tempDict is just basic error trapping. If the file's corrupted, they key could be absent or malformed, so I check whether it's present and a valid NSDictionary before trying to use it. If we were being really careful, I'd have an "else" statement that threw an exception, but I can't really decide how best to define exceptions at this point. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Aug 28 15:01:40 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 15:01:40 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: On Aug 28, 2004, at 1:25 PM, John Timmer wrote: > The use of tempDict is just basic error trapping. If the file's > corrupted, > they key could be absent or malformed, so I check whether it's present > and a > valid NSDictionary before trying to use it. > But you're not using the tempDict when initializing the Symbol, so I am still confused... - Koen. 
From mek at mekentosj.com Sat Aug 28 15:29:09 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 28 Aug 2004 12:29:09 -0700 Subject: [Biococoa-dev] mass calculator bug Message-ID: <200408281229.AA3094348020@mekentosj.com> Sorry guys, I'm somewhere else right now and don't have the time to really go into details, which I will a.s.a.p. (somewhere tomorrow).... Hope it doesn't slow things down to much. Alex ---------- Original Message ---------------------------------- From: John Timmer Date: Sat, 28 Aug 2004 11:13:27 -0400 > >The following was left over from when I started with a DNA sequence class >before deciding what would be generally useful and moved to the superclass - >I had used "sequenceArray" instead of "sequence", but forgot to change it in >this location. The idea behind it (and a lot else that I code) is to >prevent a user from doing something stupid, accidentally or maliciously. >Since we can't control what's in the array the user is handing us when it's >a raw array (as opposed to when we're handed a sequence) we need to make >sure that its contents will work with all the methods we're using on the >contents - hence the test for BCSymbols. > >> >> John, you implemented this method setSequence as follows: >> >> - (void)setSequence:(NSMutableArray *)anArray >> { >> [sequenceArray release]; >> >> sequenceArray = [[NSMutableArray alloc] init]; >> >> NSEnumerator *theEnumerator = [sequenceArray objectEnumerator]; >> id aSymbol; >> while ( aSymbol = [theEnumerator nextObject] ) { >> if ( [aSymbol isKindOfClass: [BCSymbol class]] ) >> [sequenceArray addObject: aSymbol]; >> } >> } > > > >> Is it possible to move all the rangeOfSubsequence variants that are now >> in BCSequeneDNA to BCSequence? I think the same code can be useful for >> proteins as well. >Please do! And please fix my experimentation with the adenosine @"A" -> 'A' >conversion. And rename anything you feel is necessary, as long as you're >careful with the find and replace! 
> > >It looks like I won't have CVS access all weekend (Koen, if you're annoyed >at Earthlink, I'm annoyed at Verizon), so I'll be working on something new >instead of contributing to the work on sequences, copying, and coding. > >Incidentally, have you done a "step into" in the debugger to see where >things actually error out? And what's the error message? > >Cheers, > >John > >_______________________________________________ >This mind intentionally left blank > > >_______________________________________________ >Biococoa-dev mailing list >Biococoa-dev at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/biococoa-dev >
From kvddrift at earthlink.net Sat Aug 28 16:50:26 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 16:50:26 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: On Aug 28, 2004, at 11:13 AM, John Timmer wrote: > >> Is it possible to move all the rangeOfSubsequence variants that are >> now >> in BCSequeneDNA to BCSequence? I think the same code can be useful for >> proteins as well. > Please do! Already running into one problem here. These methods call "isRepresentedByBase" a couple of times. But that is only defined in BCNucleotideSymbol, not in BCSymbol. I can think of a few solutions: 1. Change the name to isRepresentedBySymbol, and put an empty method in BCSymbol. I don;t think that the one in BCNucleotideSymbol gets called then, because I am treating all symbols as BCSymbol 2. Add an extra line that tests if the symbol is a BCNucleotideSymbol, and then excecute the line containing isRepresentedBySymbol. 3. ... I think the second one is the way to go, but it might bring additional problems. Any other ideas? thanks, - Koen.
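On the concern raised under option 1: Objective-C chooses the method implementation from the receiver's class at runtime, not from the type the variable is declared with, so a subclass override is still called when the symbol is held as the superclass type. A minimal sketch follows, using stand-in classes (`DemoSymbol`, `DemoNucleotideSymbol`) that are assumptions for illustration and not the BioCocoa classes. The matching rule in the subclass is only a placeholder for the real ambiguity handling.

```objc
#import <Foundation/Foundation.h>

// Stand-in classes for illustration only; these are not the BioCocoa sources.
@interface DemoSymbol : NSObject
- (BOOL)isRepresentedBySymbol:(DemoSymbol *)other;
@end

@implementation DemoSymbol
// General rule: a symbol is represented only by itself.
- (BOOL)isRepresentedBySymbol:(DemoSymbol *)other
{
    return other == self;
}
@end

@interface DemoNucleotideSymbol : DemoSymbol
@end

@implementation DemoNucleotideSymbol
// Placeholder for an ambiguity-aware rule: here any nucleotide matches.
- (BOOL)isRepresentedBySymbol:(DemoSymbol *)other
{
    return [other isKindOfClass: [DemoNucleotideSymbol class]];
}
@end

// Both parameters are typed as the superclass, yet the message is
// dispatched on the runtime class of `a`.
static BOOL DemoMatches(DemoSymbol *a, DemoSymbol *b)
{
    return [a isRepresentedBySymbol: b];
}
```

With this arrangement the calling code needs no class test; it sends the same message to every symbol and each class answers in its own way.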
From kvddrift at earthlink.net Sat Aug 28 19:31:11 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 28 Aug 2004 19:31:11 -0400 Subject: [Biococoa-dev] Factories Message-ID: <58A5ADA6-F94A-11D8-AAB8-003065A5FDCC@earthlink.net> Hi, Still thinking about the best way to implement a flyweight pattern to create symbols. I finally think I understand how this works and made a new class called BCAminoAcidFactory. Here's its implementation: #import "BCAminoAcidFactory.h" static NSMutableDictionary *aminoAcidDictionary; @implementation BCAminoAcidFactory - ( BCAminoAcid *)aminoAcidWithSymbol: (unichar) aSymbol { BCAminoAcid *aminoAcid = nil; NSString *symbolString = [NSString stringWithCharacters: &aSymbol length: 1]; if ( aminoAcidDictionary == nil ) { aminoAcidDictionary = [[NSMutableDictionary alloc] init]; } aminoAcid = [aminoAcidDictionary objectForKey: symbolString]; if ( aminoAcid == nil ) { aminoAcid = [[BCAminoAcid alloc] initWithSymbol: aSymbol]; [aminoAcidDictionary setObject: aminoAcid forKey: symbolString]; } return aminoAcid; } This is in turn called by any object that needs an amino acid. So initWithSymbol is never called directly, alwayd through the factory object. If the aa already exists, it will return the one that's in the static dictionary, otherwise it'll create a new one. Let me know what you think, and if I overlooked something critical. I'll wait for comments before committing it to CVS. - Koen. From a.griekspoor at nki.nl Sun Aug 29 15:33:45 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 29 Aug 2004 21:33:45 +0200 Subject: [Biococoa-dev] [BioCocoa] Speed vs Safety Message-ID: <577255A2-F9F2-11D8-80BF-000393CFDE0C@nki.nl> A general discussion topic based on the emails from the last days. 
Koen asked John a few times why he did something a certain way when it can be done in one line of code, and the answer John gave was that he does quite a lot of error checking; in general he expressed his concerns about users/developers trying to execute malicious code. Personally, I'm impressed by the safety John builds into his code, but am worried that sometimes this goes a bit too far and might have the opposite effect. Let me explain. Up 'til now all of us have been producing end-user apps, and I bet you if a user finds a way to do things wrong, they will (otherwise known as Murphy's Law #128193 or something). And the developer has to prevent this, so he builds a lot of safety measures and error checking into his methods. But this time it's a bit different: we have someone in between, the developer who uses our framework. And the way I see it, we have to make sure that our methods do what they should according to the docs, but the end responsibility for getting something from the end-user and putting it into our framework in the right way lies with the developer. One of the examples Koen mentioned was an accessor to set the sequenceArray, which would normally be fairly easy: -(void)setSequenceArray:(NSArray *)theArray { NSArray *oldArray = sequenceArray; sequenceArray = [theArray copy]; [oldArray release]; } or one of the well-known variants. In this particular example John thought "well, what if someone passes an array that doesn't contain BCSymbols, but something else?", and added a loop that enumerates over the array and checks whether each object is of class BCSymbol. It creates safety at the expense of speed/memory usage. Imagine setting a 2.4 Mb sequence. The real question here is of course whether the increased safety is worth the reduced speed and increased memory usage. I think not, for a number of reasons. First, we create a well-documented framework that does what we say it will do, and does it fast.
It is the developer's task to adhere to the documentation and to error check the things he puts in. I also think this is not too unfair to expect from a developer, at least more so than from an end-user. Part of this is that the source is available, thus if a developer gets an unexpected output/crash he should be able to find out where things go wrong, something an end-user can't do. In return he gets back a framework that is fast(er) and lightweight. Also, our code becomes not only faster and less memory intensive, it becomes more transparent as well. Second, most of these crashes will occur during development (if the developer does proper testing), a stage where crashes are less important and where things can still be fixed/checked/circumvented by the developer. John also brought up the point that he does checking in case a developer tries to "hack" stuff or change things he shouldn't change. Of course we should make things private and inaccessible when we don't want developers to do that, but what if they really want to? What if they find a way by messing with direct C calls? What if they do things that mess up the whole framework, stall things, crash their app, bog down their laptop? Honestly I don't care! As long as we do exactly what is documented, do that job very well, and watch out for easy pitfalls, then if someone wants to do it otherwise it is his problem. Remember this is a developer, and developers are used to experimenting, so let him. If he comes up with something really great because he changed some stuff we didn't think of, or thought wasn't a good plan, great! I'll be the first to applaud that. If he starts complaining that it doesn't work then, well, we can tell him to follow the documentation or see if we can change some stuff. Finally, I'd rather have the developer feel that he did something wrong than have us trying to mask/fix his mistakes. How many times did our apps crash before we shipped them because we misused the Cocoa frameworks? Well, mine did very often.
Then I fixed my mistakes and now my apps rarely crash. And they are perfectly Cocoa framework compatible in addition. And that's the way it should be. You quickly arrive at the discussion we have seen in the webbrowser world hundreds of times. Many webpages are not 100% correct (in fact most are not). The creators of webbrowsers had two options: either stop rendering a page with incorrect syntax, or try to make the best of it. Obviously the last option is the only viable one if you have end-users who certainly don't like a browser that only displays half the webpage you give it. But this has one big disadvantage: website developers who see that their (syntactically not 100% correct) webpages are rendered perfectly fine will publish these and won't put effort into fixing these 'bugs', as they get nothing in return for the effort spent. The result is only more incorrect pages. Most code in Safari is not in rendering pages, it is in the code that tries to display something useful from malformed webpages. In our case we first of all have no legacy base, and second deal with developers who have a problem if the framework doesn't accept their malformed input and, most important of all, will then fix that. The proposed "solution" is rather simple: - document precisely what the developer should do and what he can expect. - in general go for speedy, transparent code with very general error checking - generate exceptions if input, parameter, and/or output is unexpected Thus for the above setSequenceArray example: the compiler checks whether an array is put in, so we can expect an array with potentially unknown objects, but in principle that has no consequences for the rest of THIS method, so no error checking is needed here.
Other methods that depend on an array with proper types of objects should rely on general exceptions raised by the cocoa framework already, and furthermore only check if both input, parameters and output is of an expected number/type ( as far as not already checked by the compiler like in the above example). Again, this probably has a lot to do with the switch from app to framework development, and I'm sure we will have these kind of discussions a lot more in the future. Let me know what you guys think... Cheers, Alex Ps. At WWDC's Cocoa optimization session we were told that in general a for(i=0; i < x; i++), where x is [theArray count] is faster and less memory intensive than an enumerator (no object messaging involved). Although the latter is definitely more elegant, if we are looking at speed bottlenecks with large sequences, this might be something to keep in mind. ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer *********************************************************
From mek at mekentosj.com Sun Aug 29 16:02:43 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 22:02:43 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <625D9D66-F8DF-11D8-AAB8-003065A5FDCC@earthlink.net> References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> <625D9D66-F8DF-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: <6392BF46-F9F6-11D8-80BF-000393CFDE0C@mekentosj.com> >> Of course you note one problem here, we haven't implemented a number >> of accessors. I already mentioned once that we should for all >> variables like Apple advices us to do, and always call the accessors >> if you want to change things (instead of directly accessing the >> variable). This is one example where this comes in handy. If we don't >> want to make some accessors public, we have to @private them so >> they're invisible outside the class. > > Yes, I fully agree. What's the syntax again to use @private? From Objc.pdf @private The instance variable is accessible only within the class that declares it. @protected The instance variable is accessible within the class that declares it and within classes that inherit it. @public The instance variable is accessible everywhere. Example header file: @interface Worker : NSObject { char *name; @private int age; char *evaluation; @protected id job; float wage; @public id boss; } By default, all unmarked instance variables (like name above) are @protected. This is for instance variables.
For methods making them "private" is pretty easy, leave the declaration out of the header file and put it in the .m file before you begin the @implementation: Example Prefcontroller.m: #import "PrefController.h" @interface PrefController (Private) NSView * currentPane; id delegate; -(void)setupToolbar; @end @implementation PrefController // ================================================================ #pragma mark --- INIT & DEALLOC // ================================================================ - (PrefController*)init { self = [super initWithWindowNibName:@"Preferences"]; etc... This way the currenPane variable and the setupToolbar are available within the class, but nobody outside it can see it as they are not declared in the (public) header file. >> Anyway, I leave this up to you guys. An alternative would be a >> special init method that takes all variables as input and do >> everything in one call. >> > > I don't like that idea, I prefer the accessors. Me too to be honest ;-) ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! 
http://www.mekentosj.com/labassistant ********************************************************* From mek at mekentosj.com Sun Aug 29 16:21:14 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 22:21:14 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <42DE9EE9-F8DE-11D8-AAB8-003065A5FDCC@earthlink.net> References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> <42DE9EE9-F8DE-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: >> So we start with the first interface line in the BCSequence header: >> @interface BCSequence : NSObject { >> should become: >> @interface BCSequence : NSObject { >> > > Hmm, I have never used such a construction before and have done > similar things (passing an object to another class). It's not passing an object to another class. With the line: @interface BCSequence: NSObject <protocol1, protocol2> { you simply declare that the BCSequence object adheres to protocol1 and protocol2. Protocols are lists of methods that you promise this class has implemented. An example protocol from EnzymeX: @protocol EXMapSequenceViewDelegate - (NSString *) sequenceForMapView:(EXMapSequenceView *)mview; - (void) setVisibleSequence: (NSRange) aRange; - (NSRange) visibleSequence; - (NSRange) selectedSequence; - (void)generateMap; @end This has a number of big advantages. First, the compiler will help to check if classes conform to required protocols. Second, it allows you to check at runtime if an object supports certain methods using NSObject's conformsToProtocol: method. Finally, you can promise the compiler that an object you are calling supports the protocol: [(id <EXMapSequenceViewDelegate>)theObject doThis]; even if you don't know exactly what class theObject will be at runtime (as long as it adheres to the protocol it's fine). A number of protocols are predefined in the Cocoa frameworks.
In order to make your class NSCoding compatible, you have to let this know by adding that particular part to your @interface line, plus you have to implement the methods declared in the protocol (initWithCoder and encodeWithCoder in this case). These are described in the docs. >> In BCMassCalculator, InitWithSequence, change: >> >>> sequence = [seq copy]; <--------- error >> >> to sequence = [seq copyWithZone: [self zone]]; // I believe this is >> more efficient memory wise >> > > Even if I use an accessor: > > [self setSequence: seq]; > > and then: > > - (void)setSequence:(BCSequence *)s > { > [s retain]; > [sequence release]; > sequence = s; > } > > I get the same copywithZone error. But maybe this is different > because BCSequence passes a copy of itself? That's probably because you still did not implement the copyWithZone method and thus did not implement the NSCopying protocol. Did you implement the methods that I wrote in my email, and change the interface line? Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. 
********************************************************* From mek at mekentosj.com Sun Aug 29 16:30:53 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 22:30:53 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <8C43C9BF-F90A-11D8-AAB8-003065A5FDCC@earthlink.net> References: <8C43C9BF-F90A-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: <52B35CA9-F9FA-11D8-80BF-000393CFDE0C@mekentosj.com> >> Since we can't control what's in the array the user is handing us >> when it's >> a raw array (as opposed to when we're handed a sequence) we need to >> make >> sure that its contents will work with all the methods we're using on >> the >> contents - hence the test for BCSymbols. > > If we keep the method private, we don't have to worry about that, I > guess. We could implement a validateInput method that checks (using an > NSScanner) if any character in the string are not allowed. Or we could > use an BCAlphabet class that manages the symbols. But in the meantime > I'll leave it alone. I think the initWithSequence method should be a public one, like NSString's initWithString. See my mail "safety vs speed" why I think we don't have to worry for a different reason. But the validateSequence method you propose would be a very welcome addition which would help the developer to make sure he feeds us a proper sequence object. We leave him the choice then if he wants to trade some speed for safety. >>> Is it possible to move all the rangeOfSubsequence variants that are >>> now >>> in BCSequeneDNA to BCSequence? I think the same code can be useful >>> for >>> proteins as well. >> Please do! And please fix my experimentation with the adenosine @"A" >> -> 'A' >> conversion. And rename anything you feel is necessary, as long as >> you're >> careful with the find and replace! > > Ok, I will start working on that. I'll also replace 'sequence' with > 'sequenceArray'. Very nice! 
> Incidentally, have you done a "step into" in the debugger to see where >> things actually error out? And what's the error message? > > If I step into the offending line I go straight to the assembly code, > which is not very helpful (to me atlast :) As BCSymbol supports -description now, can you check that the enumerator at least returns symbols by putting in a NSLog(@"%@", aSymbol); ? Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From mek at mekentosj.com Sun Aug 29 16:38:19 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 22:38:19 +0200 Subject: [Biococoa-dev] mass calculator bug Message-ID: <5C977056-F9FB-11D8-80BF-000393CFDE0C@mekentosj.com> > The use of tempDict is just basic error trapping. If the file's > corrupted, > they key could be absent or malformed, so I check whether it's present > and a > valid NSDictionary before trying to use it. > > If we were being really careful, I'd have an "else" statement that > threw an > exception, but I can't really decide how best to define exceptions at > this > point. > > JT This is the other example which triggered my safety vs speed email. WE provide the basic plist, so it should work period. I can see two reason why it couldn't. First, the download/installation went terribly wrong, in case this probably isn't the only thing broken and everything goes bananas. Second, the developer (or user) has tinkered/changed our plist, which he obviously did wrong. 
Why then save his *ss? He should simply repair/adjust the plist before shipping his app. I don't see a reason for the error checking code. To black and white? Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From mek at mekentosj.com Sun Aug 29 16:50:52 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 22:50:52 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: <1D6E40AB-F9FD-11D8-80BF-000393CFDE0C@mekentosj.com> >>> Is it possible to move all the rangeOfSubsequence variants that are >>> now >>> in BCSequeneDNA to BCSequence? I think the same code can be useful >>> for >>> proteins as well. >> Please do! can I add that reverse is a method that could apply to general sequence as well? > Already running into one problem here. These methods call > "isRepresentedByBase" a couple of times. But that is only defined in > BCNucleotideSymbol, not in BCSymbol. > > I can think of a few solutions: > > 1. Change the name to isRepresentedBySymbol, and put an empty method > in BCSymbol. I don;t think that the one in BCNucleotideSymbol gets > called then, because I am treating all symbols as BCSymbol > > 2. Add an extra line that tests if the symbol is a BCNucleotideSymbol, > and then excecute the line containing isRepresentedBySymbol. > > 3. ... In general these methods do the same thing, except that DNA sequence can be ambiguous, thus need additional stuff. 
Wouldn't it an idea to implement the general way in the BCSequence class, but override the method in BCSequenceDNA to do it the "DNA" way. This way you don't have to check if a symbol is of type BCNucleotide on every aminoacid, and also not on every nucleotide as the method implementation is automatically different for both types. Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From mek at mekentosj.com Sun Aug 29 17:17:38 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 29 Aug 2004 23:17:38 +0200 Subject: [Biococoa-dev] Factories In-Reply-To: <58A5ADA6-F94A-11D8-AAB8-003065A5FDCC@earthlink.net> References: <58A5ADA6-F94A-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: Koen, John implemented 17 nucleotides in the BCSequenceDNA class, why doesn't the same approach work for aminoacids? All the class methods like + (BCAminoAcid *) alanine are in place isn't it? Why doesn't that work? You just implement the equivalent of: aBase = [BCNucleotideDNA baseForSymbol: [entry characterAtIndex: loopCounter]]; thus aminoAcidForSymbol: which would be similar as well: + (id) aminoAcidForSymbol: (unichar)entry { switch ( entry ) { case 'A' : case 'a' : { return [BCAminoAcid alanine]; break; } case 'C' : case 'c' : { return [BCAminoAcid cysteine]; break; } etc... The C switch statement is extremely lightweight in comparison to the object messaging involved in the use of a dictionary. 
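The switch-based lookup described above can be sketched as follows, with each shared instance created the first time it is asked for. `DemoAminoAcid` is a stand-in class invented for this illustration and only two residues are shown; the method names echo those in the thread but are not taken from the BioCocoa sources. The lazy creation is not thread-safe, which is assumed to be acceptable for the sketch.

```objc
#import <Foundation/Foundation.h>

// Stand-in class for illustration only; not the BioCocoa BCAminoAcid.
@interface DemoAminoAcid : NSObject
{
    unichar symbol;
}
- (id)initWithSymbol:(unichar)aSymbol;
- (unichar)symbol;
+ (DemoAminoAcid *)alanine;
+ (DemoAminoAcid *)cysteine;
+ (DemoAminoAcid *)aminoAcidForSymbol:(unichar)entry;
@end

// One shared instance per residue, created on first use and never released.
static DemoAminoAcid *alanineRepresentation = nil;
static DemoAminoAcid *cysteineRepresentation = nil;

@implementation DemoAminoAcid

- (id)initWithSymbol:(unichar)aSymbol
{
    self = [super init];
    if ( self != nil )
        symbol = aSymbol;
    return self;
}

- (unichar)symbol
{
    return symbol;
}

+ (DemoAminoAcid *)alanine
{
    if ( alanineRepresentation == nil )
        alanineRepresentation = [[DemoAminoAcid alloc] initWithSymbol: 'A'];
    return alanineRepresentation;
}

+ (DemoAminoAcid *)cysteine
{
    if ( cysteineRepresentation == nil )
        cysteineRepresentation = [[DemoAminoAcid alloc] initWithSymbol: 'C'];
    return cysteineRepresentation;
}

// Maps a one-letter code (either case) to the shared instance.
+ (DemoAminoAcid *)aminoAcidForSymbol:(unichar)entry
{
    switch ( entry ) {
        case 'A': case 'a': return [DemoAminoAcid alanine];
        case 'C': case 'c': return [DemoAminoAcid cysteine];
        default:            return nil;   // unknown symbols are left to the caller
    }
}

@end
```

Because every lookup returns the same object for a given residue, callers can compare symbols by pointer and a long sequence holds only references to a handful of instances.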
For me it's fine if we even stick to John's way to instantiate all aminoacids at once as well, pretty big chance you need more than one aminoacid any way once one is called. But even if you want them instantiated on a per symbol basis, you can easily do that in the class method: + (BCAminoAcid *) alanine { if ( adenosineRepresentation == nil ) INIT SHARED ALANINE HERE (initAminoAcidWithSymbol: 'A') return adenosineRepresentation; } Again, given that the nucleotides come in 17 different forms and the aminoacids in about 20, I don't see why diverge their implementations. It will certainly help to setup the system if it already works for one of the two types, we can focus on optimizing both systems at ones as experience with one can be applied on the other as well, and we will be able to create more shared methods in the superclasses. Alex Op 29-aug-04 om 1:31 heeft Koen van der Drift het volgende geschreven: > Hi, > > Still thinking about the best way to implement a flyweight pattern to > create symbols. I finally think I understand how this works and made a > new class called BCAminoAcidFactory. Here's its implementation: > > #import "BCAminoAcidFactory.h" > > static NSMutableDictionary *aminoAcidDictionary; > > @implementation BCAminoAcidFactory > > > - ( BCAminoAcid *)aminoAcidWithSymbol: (unichar) aSymbol > { > BCAminoAcid *aminoAcid = nil; > NSString *symbolString = [NSString stringWithCharacters: &aSymbol > length: 1]; > > if ( aminoAcidDictionary == nil ) > { > aminoAcidDictionary = [[NSMutableDictionary alloc] init]; > } > > aminoAcid = [aminoAcidDictionary objectForKey: symbolString]; > > if ( aminoAcid == nil ) > { > aminoAcid = [[BCAminoAcid alloc] initWithSymbol: aSymbol]; > [aminoAcidDictionary setObject: aminoAcid forKey: symbolString]; > } > > return aminoAcid; > } > > > This is in turn called by any object that needs an amino acid. So > initWithSymbol is never called directly, alwayd through the factory > object. 
If the aa already exists, it will return the one that's in the > static dictionary, otherwise it'll create a new one. > > Let me know what you think, and if I overlooked something critical. > I'll wait for comments before committing it to CVS. > > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From kvddrift at earthlink.net Sun Aug 29 17:31:45 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 17:31:45 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: <8929D548-F8A8-11D8-AAB8-003065A5FDCC@earthlink.net> <789C14A3-F8DC-11D8-957E-000393CFDE0C@mekentosj.com> <42DE9EE9-F8DE-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: On Aug 29, 2004, at 4:21 PM, Alexander Griekspoor wrote: > That's probably because you still did not implement the copyWithZone > method and thus did not implement the NSCopying protocol. > Did you implement the methods that I wrote in my email, and change the > interface line? > It's all working now and in CVS. thanks, - Koen. From jtimmer at bellatlantic.net Sun Aug 29 18:36:18 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 29 Aug 2004 18:36:18 -0400 Subject: [Biococoa-dev] [BioCocoa] Speed vs Safety In-Reply-To: <577255A2-F9F2-11D8-80BF-000393CFDE0C@nki.nl> Message-ID: > A general discussion topic based on the emails from the last days. 
> > Koen asked John a few times, why did you do this while we can do it in > one line of code, and the answer John gave was that he did quite some > error checking and in general expressed his concerns about > users/developers trying to execute malicious code. > Personally, I'm impressed by the safety John builds into his code, but > am worried that sometimes this goes a bit to far and might have the > opposite effect. Let me explain. Okay, just got back in from a weekend away. You're right that I'm probably projecting the amazing and unexpected things that users do to developers, hence the paranoia. A couple of things to be said in my favor - I've tried to limit most of the paranoia to situations involving either object creation or file input. There's a couple of reasons for this. One is that, in comparison to memory allocation or disk input and output, error checking's going to be time inexpensive (all the tests for the tempDictionary didn't add any time that was distinguishable from background noise to the DNA manipulations in the demo_app). The second is that, if errors are checked for here, it's going to be relatively difficult to introduce them later (though clearly, as you note, not impossible). So how about we make a deal - I won't complain if you remove my error checking code from anything that doesn't involve initializing or disk-I/O, and you don't complain about me putting them in those places? > Ps. At WWDC's Cocoa optimization session we were told that in general a > for(i=0; i < x; i++), where x is [theArray count] is faster and less > memory intensive than an enumerator (no object messaging involved). > Although the latter is definitely more elegant, if we are looking at > speed bottlenecks with large sequences, this might be something to keep > in mind. Didn't know this - I'll take it into account in the future, and may go back and modify some code accordingly. 
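The loop form from the Ps can be sketched as below. `DemoCountOfKind` is a hypothetical helper written for this illustration, and `NSUInteger` is used so that it compiles on current systems (on the 10.2-era target a plain `unsigned` would take its place). Note that `objectAtIndex:` is itself a message send, so what the loop avoids is the enumerator object and its `nextObject` calls rather than messaging as such; no timing claim is made here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts the entries of `theArray` that are instances
// of `aClass` or of one of its subclasses, using an index-based loop.
static NSUInteger DemoCountOfKind(NSArray *theArray, Class aClass)
{
    NSUInteger i;
    NSUInteger count = [theArray count];   // read the count once, outside the loop
    NSUInteger matches = 0;

    for ( i = 0; i < count; i++ ) {
        if ( [[theArray objectAtIndex: i] isKindOfClass: aClass] )
            matches++;
    }
    return matches;
}
```

The same shape would apply to a loop over the symbols of a sequence, should profiling show the enumerator to be a bottleneck.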
Incidentally, the tempDict variable was used back when I did my nucleotides with an "initWithDictionary:" method, and it didn't seem worth removing the error test after I changed it to an "initWithSymbol". Cheers, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sun Aug 29 18:41:32 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 29 Aug 2004 18:41:32 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <5C977056-F9FB-11D8-80BF-000393CFDE0C@mekentosj.com> Message-ID: >> The use of tempDict is just basic error trapping. If the file's >> corrupted, >> they key could be absent or malformed, so I check whether it's present >> and a >> valid NSDictionary before trying to use it. >> >> If we were being really careful, I'd have an "else" statement that >> threw an >> exception, but I can't really decide how best to define exceptions at >> this >> point. >> >> JT > > This is the other example which triggered my safety vs speed email. > WE provide the basic plist, so it should work period. I can see two > reason why it couldn't. First, the download/installation went terribly > wrong, in case this probably isn't the only thing broken and everything > goes bananas. Second, the developer (or user) has tinkered/changed our > plist, which he obviously did wrong. Why then save his *ss? He should > simply repair/adjust the plist before shipping his app. I don't see a > reason for the error checking code. To black and white? > Alex Here, I'd argue that we'd do a service to developers to throw an exception that was informative. If just one nucleotide got corrupted in the .plist, it'd be a nightmare to figure out why the app was crashing (especially if it was an ambiguous one that wasn't used often). Even a good NSLog would be better than nothing. 
JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sun Aug 29 18:44:18 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 29 Aug 2004 18:44:18 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: Message-ID: Koen - If you'd like, when I get into work and have access to the CVS tomorrow, I'll set up the factory methods for all the amino acids. It'll be pretty easy for me, since the work on the nucleotides is pretty fresh in my mind. JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sun Aug 29 18:46:36 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 29 Aug 2004 18:46:36 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: Message-ID: > Already running into one problem here. These methods call > "isRepresentedByBase" a couple of times. But that is only defined in > BCNucleotideSymbol, not in BCSymbol. > > I can think of a few solutions: > > 1. Change the name to isRepresentedBySymbol, and put an empty method in > BCSymbol. I don't think that the one in BCNucleotideSymbol gets called > then, because I am treating all symbols as BCSymbol > > 2. Add an extra line that tests if the symbol is a BCNucleotideSymbol, > and then execute the line containing isRepresentedBySymbol. > > 3. ... I think Alex's 3 was to have the general method in the BCSequence class, and leave them overridden in BCSequenceDNA, which makes the most sense to me. I think this is the last email that came over the weekend that I should respond to - let me know if I'm mistaken. JT _______________________________________________ This mind intentionally left blank
From kvddrift at earthlink.net Sun Aug 29 18:48:11 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 18:48:11 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <1D6E40AB-F9FD-11D8-80BF-000393CFDE0C@mekentosj.com> References: <1D6E40AB-F9FD-11D8-80BF-000393CFDE0C@mekentosj.com> Message-ID: <80ED010C-FA0D-11D8-9E22-003065A5FDCC@earthlink.net> On Aug 29, 2004, at 4:50 PM, Alexander Griekspoor wrote: >>>> Is it possible to move all the rangeOfSubsequence variants that are >>>> now >>>> in BCSequenceDNA to BCSequence? I think the same code can be useful >>>> for >>>> proteins as well. >>> Please do! > > can I add that reverse is a method that could apply to general > sequence as well? Yes you can, actually you just did :) > In general these methods do the same thing, except that DNA sequence > can be ambiguous, thus need additional stuff. Wouldn't it be an idea to > implement the general way in the BCSequence class, but override the > method in BCSequenceDNA to do it the "DNA" way. That sounds like a good plan. I'll fix that tonight and commit it to CVS.
Also the general reverse method. - Koen. From kvddrift at earthlink.net Sun Aug 29 19:01:38 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 19:01:38 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: References: <58A5ADA6-F94A-11D8-AAB8-003065A5FDCC@earthlink.net> Message-ID: <62343E20-FA0F-11D8-9E22-003065A5FDCC@earthlink.net> On Aug 29, 2004, at 5:17 PM, Alexander Griekspoor wrote: > John implemented 17 nucleotides in the BCSequenceDNA class, why > doesn't the same approach work for aminoacids? All the class methods > like + (BCAminoAcid *) alanine are in place isn't it? Why doesn't that > work? I'm sure it does work, actually it's already implemented in BCAminoAcid in CVS. I was just confused by all the different methods for each base/amino acid. For instance this code: + (BCAminoAcid *) alanine { if ( alanineRepresentation == nil ) [BCAminoAcid initAminoAcids]; return alanineRepresentation; } Every time an amino acid representation is not present, initAminoAcids gets called and makes a representation for each amino acid. This just appeared redundant to me. So I started to read some more, and found that the combination of Singleton/Flyweight/Factory patterns is a widely used approach in OOP, especially when you deal with a large amount of similar objects. So based on some sample code I read, I implemented smething similar in BCAminoAcid, and BCSequenceProtein. But I will also leave in the method that John uses for nucleotides. - Koen. From kvddrift at earthlink.net Sun Aug 29 19:03:11 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 19:03:11 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: References: Message-ID: <99C90B8C-FA0F-11D8-9E22-003065A5FDCC@earthlink.net> On Aug 29, 2004, at 6:44 PM, John Timmer wrote: > If you'd like, when I get into work and have access to the CVS > tomorrow, > I'll set up the factory methods for all the amino acids. 
It'll be > pretty > easy for me, since the work on the nucleotides is pretty fresh in my > mind. > John, It should already be in CVS. But if you can check that I didn't forget a method or made another mistake, I'd appreciate it. thanks, - Koen. From jtimmer at bellatlantic.net Sun Aug 29 19:04:30 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 29 Aug 2004 19:04:30 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: <99C90B8C-FA0F-11D8-9E22-003065A5FDCC@earthlink.net> Message-ID: > > It should already be in CVS. But if you can check that I didn't forget > a method or made another mistake, I'd appreciate it. I'll try it out tomorrow, but the last version I got doesn't have a stop codon, which would be useful (especially as I'm designing translations right now ;). Cheers, John _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sun Aug 29 19:42:31 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 19:42:31 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: References: Message-ID: <187E366E-FA15-11D8-9E22-003065A5FDCC@earthlink.net> On Aug 29, 2004, at 7:04 PM, John Timmer wrote: > I'll try it out tomorrow, but the last version I got doesn't have a > stop > codon, which would be useful (especially as I'm designing translations > right > now ;). > Feel free to add that, I'm not that familiair with stop codons. Could you explain to me why they should be in BCAminoAcid? I thought they are part of a nucleotide sequence, not a protein. - Koen. From kvddrift at earthlink.net Sun Aug 29 21:15:16 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 21:15:16 -0400 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: <0D3434E4-FA22-11D8-9E22-003065A5FDCC@earthlink.net> On Aug 29, 2004, at 6:46 PM, John Timmer wrote: >> Already running into one problem here. 
These methods call >> "isRepresentedByBase" a couple of times. But that is only defined in >> BCNucleotideSymbol, not in BCSymbol. >> >> I can think of a few solutions: >> >> 1. Change the name to isRepresentedBySymbol, and put an empty method >> in >> BCSymbol. I don't think that the one in BCNucleotideSymbol gets called >> then, because I am treating all symbols as BCSymbol >> >> 2. Add an extra line that tests if the symbol is a BCNucleotideSymbol, >> and then execute the line containing isRepresentedBySymbol. >> >> 3. ... > > I think Alex's 3 was to have the general method in the BCSequence > class, and > leave them overridden in BCSequenceDNA, which makes the most sense to > me. > I need some help from you DNA dudes here. Looking at the methods, they are all very similar, until the lines: if ( [selfSymbol isRepresentedByBase: entrySymbol] || [entrySymbol isRepresentedByBase: selfSymbol] ) { or variants thereof. After that I really don't follow the flow of the code, too many counters and symbols ;-) Here's my question: is isRepresentedByBase a specific test that is only needed for bases, or should amino acids be tested the same way using a similar method? What I am trying to get at is to see if it is possible to have a separate method to test the entrySymbol and selfSymbol that goes in BCNucleotideDNA and BCAminoAcid (or BCSequenceDNA and BCSequenceProtein). Then we can keep all the rangeOfSubsequence in BCSequence. BTW, just saw that there is also a method subSequenceInRange which doesn't do all the checking - I guess I wrote that one ;D - Koen. From kvddrift at earthlink.net Sun Aug 29 21:52:57 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 29 Aug 2004 21:52:57 -0400 Subject: [Biococoa-dev] oops :) Message-ID: <50C707B3-FA27-11D8-9E22-003065A5FDCC@earthlink.net> Hi, Just found that some earlier commits from me in CVS are giving many errors. I don't know how to revert it in CVS, so I've attached the correct ones in this email.
If someone knows how to fix it in CVS please do so. thanks, and sorry, - Koen. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: BCNucleotideDNA.h URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: BCSequence.h URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: BCSequence.m Type: application/octet-stream Size: 6816 bytes Desc: not available URL: From mek at mekentosj.com Mon Aug 30 01:27:03 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 30 Aug 2004 07:27:03 +0200 Subject: [Biococoa-dev] oops :) In-Reply-To: <50C707B3-FA27-11D8-9E22-003065A5FDCC@earthlink.net> References: <50C707B3-FA27-11D8-9E22-003065A5FDCC@earthlink.net> Message-ID: <397C2074-FA45-11D8-80BF-000393CFDE0C@mekentosj.com> Doesn't it work if you simply replace the contents of your checkout version with the contents of these files and then re-commit? My experience with CVS is really lacking here, sorry... Alex Op 30-aug-04 om 3:52 heeft Koen van der Drift het volgende geschreven: > Hi, > > Just found that some earlier commits from me in CVS are giving many > errors. I don' know ho to revert it in CVS, so I've attached the > correct ones in this email. If someone knows how to fix it in CVS > please do so. > > thanks, and sorry, > > > - Koen. 
> > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From a.griekspoor at nki.nl Mon Aug 30 01:44:52 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 30 Aug 2004 07:44:52 +0200 Subject: [Biococoa-dev] [BioCocoa] Speed vs Safety In-Reply-To: References: Message-ID: > You're right that I'm probably projecting the amazing and unexpected > things > that users do to developers, hence the paranoia. Believe me I've seen amazing things indeed ;-) > > A couple of things to be said in my favor - John, just to make clear, my email was absolutely not personal and I was certainly not accusing you of anything here. What I hope to do here is to let us shift our thoughts from one centered on users to one centered on developers as our target audience. > I've tried to limit most of the > paranoia to situations involving either object creation or file input. > There's a couple of reasons for this.
One is that, in comparison to > memory > allocation or disk input and output, error checking's going to be time > inexpensive (all the tests for the tempDictionary didn't add any time > that > was distinguishable from background noise to the DNA manipulations in > the > demo_app). to be relatively difficult to introduce them later (though > clearly, as you > note, not impossible). > That's true for the nucleotides which are instantiated once, but the initWithSequence method is used very often as integral part of manipulations as well, like the example Koen gave in, where it really matters speed-wise. In addition, I can imagine that certainly during manipulations you use these methods while being absolutely sure already that the sequence is 100% correct, and still it would be checked again. > So how about we make a deal - I won't complain if you remove my error > checking code from anything that doesn't involve initializing or > disk-I/O, > and you don't complain about me putting them in those places? Well, I would like to setup the deal differently. The checking is fine by me in our singleton classes which are instantiated only once. In the other init classes like initWithSequence and classes where we are handed stuff we don't know the contents of for sure, we don't do the error checking but we do offer the developer validation methods as Koen proposed (like BOOL validateSequence:) so that the developer can opt to for speed or safety. Deal? ;-) > Incidentally, the tempDict variable was used back when I did my > nucleotides > with an "initWithDictionary:" method, and it didn't seem worth > removing the > error test after I changed it to an "initWithSymbol". Well, if it adds code and doesn't make things clearer, let's remove it or not? 
I love making deals ;-) Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* From mek at mekentosj.com Mon Aug 30 01:47:36 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 30 Aug 2004 07:47:36 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: References: Message-ID: <18992BF4-FA48-11D8-80BF-000393CFDE0C@mekentosj.com> >> This is the other example which triggered my safety vs speed email. >> WE provide the basic plist, so it should work period. I can see two >> reason why it couldn't. First, the download/installation went terribly >> wrong, in case this probably isn't the only thing broken and >> everything >> goes bananas. Second, the developer (or user) has tinkered/changed our >> plist, which he obviously did wrong. Why then save his *ss? He should >> simply repair/adjust the plist before shipping his app. I don't see a >> reason for the error checking code. To black and white? >> Alex > > > Here, I'd argue that we'd do a service to developers to throw an > exception > that was informative. If just one nucleotide got corrupted in the > .plist, > it'd be a nightmare to figure out why the app was crashing (especially > if it > was an ambiguous one that wasn't used often). Even a good NSLog would > be > better than nothing. 
> To black and white indeed then, again let's do the checking as a service where speed/memory is no issue (like here), and spin off error checking where speed/memory is an issue in separate methods. That way we provide the service, but don't make it obligatory. Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From mek at mekentosj.com Mon Aug 30 01:51:56 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 30 Aug 2004 07:51:56 +0200 Subject: [Biococoa-dev] Factories In-Reply-To: <62343E20-FA0F-11D8-9E22-003065A5FDCC@earthlink.net> References: <58A5ADA6-F94A-11D8-AAB8-003065A5FDCC@earthlink.net> <62343E20-FA0F-11D8-9E22-003065A5FDCC@earthlink.net> Message-ID: > Every time an amino acid representation is not present, initAminoAcids > gets called and makes a representation for each amino acid. This just > appeared redundant to me. So I started to read some more, and found > that the combination of Singleton/Flyweight/Factory patterns is a > widely used approach in OOP, especially when you deal with a large > amount of similar objects. Yep, the trick here is that you assure that whichever of the aminoacids is asked for first, all of them are created at once, and this will not be called again. 
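A minimal sketch of that "first request creates everything" pattern, using a stand-in class so it compiles on its own. `DemoAminoAcid` and `setUpAminoAcids` are illustrative names (the latter plays the role of `initAminoAcids` in the code quoted earlier in the thread); the real BCAminoAcid carries more state, and this version assumes single-threaded use.

```objc
#import <Foundation/Foundation.h>

// Illustrative stand-in for BCAminoAcid; only a name is stored here.
@interface DemoAminoAcid : NSObject {
    NSString *name;
}
+ (DemoAminoAcid *)alanine;
+ (DemoAminoAcid *)glycine;
- (id)initWithName:(NSString *)aName;
- (NSString *)name;
@end

// One shared instance per amino acid, created lazily and never released.
static DemoAminoAcid *alanineRepresentation = nil;
static DemoAminoAcid *glycineRepresentation = nil;

@implementation DemoAminoAcid

// Builds every shared instance in one go. Each accessor calls this only
// while its own static is still nil, so in practice it runs once.
+ (void)setUpAminoAcids {
    alanineRepresentation = [[DemoAminoAcid alloc] initWithName:@"alanine"];
    glycineRepresentation = [[DemoAminoAcid alloc] initWithName:@"glycine"];
}

+ (DemoAminoAcid *)alanine {
    if (alanineRepresentation == nil) {
        [DemoAminoAcid setUpAminoAcids];
    }
    return alanineRepresentation;
}

+ (DemoAminoAcid *)glycine {
    if (glycineRepresentation == nil) {
        [DemoAminoAcid setUpAminoAcids];
    }
    return glycineRepresentation;
}

- (id)initWithName:(NSString *)aName {
    self = [super init];
    if (self != nil) {
        name = [aName copy];
    }
    return self;
}

- (NSString *)name {
    return name;
}

@end
```

Because every caller receives the same object for a given amino acid, a long protein sequence holds pointers to a handful of shared instances rather than one object per residue, which is the flyweight part of the pattern.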
********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From mek at mekentosj.com Mon Aug 30 02:13:38 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 30 Aug 2004 08:13:38 +0200 Subject: [Biococoa-dev] Factories In-Reply-To: <187E366E-FA15-11D8-9E22-003065A5FDCC@earthlink.net> References: <187E366E-FA15-11D8-9E22-003065A5FDCC@earthlink.net> Message-ID: Oops, classic mistake from us DNA people :-) Indeed, a stop amino acid doesn't exist, only a stop codon. No tRNA carrying an amino acid recognizes it, therefore the ribosome falls off the mRNA and translation is terminated. So conceptually adding a stop amino acid is not right here. I think we should implement the intermediate layer here, which was a good thing to do anyway: the BCCodon and BCAlphabet objects. BCCodons are objects containing a BCSequenceDNA three BCSymbols long (or perhaps RNA, to be more precise), which can also act as their identifier, and a BCSymbol of type amino acid. A group of BCCodons forms a species-specific BCAlphabet which contains the species name, and serves as the central point to pass around in translation methods, and can be generated by an AlphabetManager. The AlphabetManager allows manipulation of the BCAlphabet objects and also facilitates the creation of predefined commonly used alphabets (from a plist). I'm just thinking a bit out loud here about the following.
In 4Peaks I get a nucleotide sequence derived from the trace file, which I "translate" to a protein sequence. But commonly this indeed contains a lot of stops: ACTW*GGH*LAK etc. By definition this is can not be a protein as Koen nicely mentioned. Perhaps we can make BCCodon a subclass of BCSymbol as well (I think that makes sense) and add a BCSequence subclass called BCSequenceCodons. I think this can greatly help in implementing translations and also in things like ORF finding. The nice thing here is that we can model the Protein Sequence as a real protein in which we don't have to think about what to do with stops in calculations like pI. The flowchart would be as follows then: BCSequenceDNA -> BCSequenceRNA -> BCSequenceCodons -> BCSequenceProtein - every '->' is a translation step. - of course we can make a number of convenience methods that would allow to do BCSequenceDNA -> BCSequenceProtein or BCSequenceRNA -> BCSequenceProtein with parameters to define what to do when a stop is encountered. I would love to see what you guys think.... Alex > >> I'll try it out tomorrow, but the last version I got doesn't have a >> stop >> codon, which would be useful (especially as I'm designing >> translations right >> now ;). >> > > Feel free to add that, I'm not that familiair with stop codons. Could > you explain to me why they should be in BCAminoAcid? I thought they > are part of a nucleotide sequence, not a protein. > > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From mek at mekentosj.com Mon Aug 30 02:25:54 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 30 Aug 2004 08:25:54 +0200 Subject: [Biococoa-dev] mass calculator bug In-Reply-To: <0D3434E4-FA22-11D8-9E22-003065A5FDCC@earthlink.net> References: <0D3434E4-FA22-11D8-9E22-003065A5FDCC@earthlink.net> Message-ID: <728C5F70-FA4D-11D8-80BF-000393CFDE0C@mekentosj.com> > I need some help from you DNA dudes here. > > Looking at the methods, they are all very similar, until the lines: > > if ( [selfSymbol isRepresentedByBase: entrySymbol] || > [entrySymbol isRepresentedByBase: selfSymbol] ) { > > or variants thereof. After that I really don't follow the flow of the > code, too many counters and symbols ;-) > > Here's my question: isRepresentedByBase a specific test that is only > needed for bases, or should amino acids be tested the same way using a > similar method? Yes, it is because of the whole ambiguity problem of DNA, for example, if you have the sequence ATWG then TWG is a subsequence, but so is TAG and TTG. > What I am trying to get at is to see if it is possible to have a > separate method to test the entrySymbol and selfSymbol that goes in > BCNucleotideDNA and BCAminoAcid (or BCSequenceDNA and > BCSequenceProtein). 
Then we can keep all the rangeOfSubsequence in > BCSequence. Perhaps it's indeed a good plan to let BCSymbol have the isRepresentedBySymbol method that BCNucleotideDNA overrides to check for ambiguous bases as well, that way BCSequence could have the general methods. Furthermore, I as wondering if it's nice to have two complemented methods here: - isRepresentedBySymbol A isRepresentedBySymbol W but not vice versa - representsSymbol W representsSymbol A but not vice versa It's also nice to add this particular example to the headerdoc. the line would then become: if ( [selfSymbol isRepresentedBySymbol: entrySymbol] || [selfSymbol representsSymbol: entrySymbol] ) { But John knows more on this, so perhaps it's already there. > BTW, just saw that there is also a method subSequenceInRange which > doesn't do all the checking - I guess I wrote that one ;D Well, if the modus is 'strict' and want to check if a a sequence is exactly the same symbolwise (ATWG == ATWG but not ATAG), you don't need the checking, so it's nice to have anyway! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From jtimmer at bellatlantic.net Mon Aug 30 08:46:24 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 30 Aug 2004 08:46:24 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: Message-ID: > Oops, classical mistake from us DNA people :-) Indeed, a stop amino > acid doesn't exist, only a stop codon. The encoded tRNA doesn't have an > amino acid attached, therefore the ribozyme falls off the mRNA and > translation is terminated. 
So conceptually adding a stop amino acid is > not right here. > I think we should implement the intermediate layer here, which was a > good thing to do anyway: the BCCodon and BCAlphabet objects. > BCCodon's are objects containing three BCSymbols sized BCSequenceDNA > (or perhaps RNA to be more precise actually), which can also act as > their identifier, and a BCSymbol of type aminoacid. A group of BCCodons > forms a species specific BCAlphabet which contains the species name, > and serves as the central point to pass around in translation methods, > and can be generated by an AlphabetManager. The AphabetManager allows > manipulation of the BCAlphabet objects and also facilitates the > creation of predefined commonly used alphabets (from a plist). I'm going to check in code and a .plist later today that's my first stab at a translation object. There may be a way that I'm missing, but as I set out to design things, I couldn't come up with a way to translate that uses codons that's easier or more clear to code than going straight through the DNA itself. It would be if we refused to translate sequences with ambiguous bases, but I don't like that idea. Basically, with codons it turned into one giant lookup table where every potential codon had to be represented. Using the tree-like structure, all the triplets where the wobble base doesn't matter can be represented by a single entry, and most have only a purine/pyrimidine entry in the wobble position, and ambiguous bases are easy to handle. The downside is that, if you initially lump the DNA sequence into codons, you have to decompose them into individual bases again to use this layout. Again, I very well may be missing something, but it'll take me committing the code for you to get a better sense of that, I'd imagine ;). > I'm just thinking a bit out loud here about the following. In 4Peaks I > get a nucleotide sequence derived from the trace file, which I > "translate" to a protein sequence. 
But commonly this indeed contains a > lot of stops: ACTW*GGH*LAK etc. By definition this is can not be a > protein as Koen nicely mentioned. Perhaps we can make BCCodon a > subclass of BCSymbol as well (I think that makes sense) and add a > BCSequence subclass called BCSequenceCodons. I think this can greatly > help in implementing translations and also in things like ORF finding. > The nice thing here is that we can model the Protein Sequence as a real > protein in which we don't have to think about what to do with stops in > calculations like pI. Okay, we do seem to have a problem. Stop codons don't belong in a protein, and would screw up calculations on the protein (how do you do a molecular weight of something discontiguous?) but as you saw, there's many cases imaginable where you're going to need the full stretch of amino acid symbols that include stop codons (I'm going to want a bunch when I do the ORF methods in BCSequenceDNA). A potential solution: have a BCSequenceAminoAcid, that may contain stop codons. BCSequenceProtein can be a subclass of that, or a separate class entirely. It would (just maybe?) need to validate its sequences to ensure that there are no stop codons. Cheers, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Mon Aug 30 11:32:35 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 30 Aug 2004 11:32:35 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: <8185D48D-FA85-11D8-80BF-000393CFDE0C@mekentosj.com> Message-ID: > >> Okay, we do seem to have a problem. Stop codons don't belong in a >> protein, >> and would screw up calculations on the protein (how do you do a >> molecular >> weight of something discontiguous?) but as you saw, there's many cases >> imaginable where you're going to need the full stretch of amino acid >> symbols >> that include stop codons (I'm going to want a bunch when I do the ORF >> methods in BCSequenceDNA). 
> And that's exactly where the BCSequenceCodons comes in! This is the > intermediate your are looking for. If you enumerate over each codon and > ask for it's representing aminoacid you get the AVTV*KLATC list you > want including your stop codons. This sequence can also be passed to > your ORF finder object that can generate the openreading frame (the DNA > sequence can be easily extracted as well as each codon also has it's > characteristic three nucleotide sequence as a variable). > > Like in real life the BCSequenceCodons acts as the intermediate between > DNA/RNA and a real protein... > I'll have a detailed look at the translation problem later today.... > Alex I want to start out by saying that I like the idea of a codon, and I think they're a great idea in theory. The issue I have is that I can't figure out how to make them work in practice. The problem I have is that basically a codon is a cluster of 3 nucleotides. Its meaning depends on the genetic code, its derivation depends on the reading frame, etc. - the codons themselves are essentially devoid of information unless they're provided with a lot of context. I'm just not seeing an easy way to provide all of that context within a codon itself without having way too many codon items to manage, or generating every single codon uniquely, on the fly. They also seem a bit wasteful - making codons would involve composing them from combinations of bases, but they'd have to be decomposed into individual bases again to handle translation easily. What I've been thinking of during my commute in was a BCSequenceTranslation, which would contain that sort of context - A reference to the original sequence it was translated from. A reading frame indication and/or range of translation A genetic code reference. The ability to derive BCSequenceProtein objects from it. This isn't ideal either - the DNA sequence can be edited after it's created - so I'm not entirely happy with it. 
It's just that I'm not happy with any other options at this point, either. I had a nice weekend, too, so I don't think it's just that I'm generally unhappy ;). Cheers, John _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Mon Aug 30 14:31:27 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 30 Aug 2004 14:31:27 -0400 Subject: [Biococoa-dev] Factories In-Reply-To: Message-ID: Just checked in the translation code. Since we're not sure what to return and aa's representing stops may not exist, I'm returning an NSArray using gaps in place of stop codons. It did 1 frame of my 11KB test sequence in about 0.2 of a second, including intializing the translation dictionary. Don't know if that's good or not... JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Mon Aug 30 18:29:44 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Tue, 31 Aug 2004 00:29:44 +0200 Subject: [Biococoa-dev] Factories In-Reply-To: References: Message-ID: <17E19354-FAD4-11D8-80BF-000393CFDE0C@mekentosj.com> Last mail first... I think 0.2 seconds for 11Kb is more than fine by me. I believe 0.2s is about the criteria for being experienced as instant from a user perspective. For my use this would be very nice.... Well done! Op 30-aug-04 om 20:31 heeft John Timmer het volgende geschreven: > Just checked in the translation code. Since we're not sure what to > return > and aa's representing stops may not exist, I'm returning an NSArray > using > gaps in place of stop codons. > > It did 1 frame of my 11KB test sequence in about 0.2 of a second, > including > intializing the translation dictionary. Don't know if that's good or > not... 
>
> JT

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-bit
company without 1 bit of sense.
*********************************************************

From mek at mekentosj.com Mon Aug 30 18:55:47 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Tue, 31 Aug 2004 00:55:47 +0200
Subject: [Biococoa-dev] Factories
In-Reply-To:
References:
Message-ID:

> I want to start out by saying that I like the idea of a codon, and I
> think they're a great idea in theory. The issue I have is that I can't
> figure out how to make them work in practice.
>
> The problem I have is that basically a codon is a cluster of 3
> nucleotides.
> Its meaning depends on the genetic code, its derivation depends on the
> reading frame, etc. - the codons themselves are essentially devoid of
> information unless they're provided with a lot of context.

True.

> I'm just not seeing an easy way to provide all of that context within
> a codon itself without having way too many codon items to manage, or
> generating every single codon uniquely, on the fly.

Ok, here's a poor man's overview of what I had in mind.

BCCodon {
    BCSequenceDNA "ATG"
    BCAminoAcid "Methionine"
}

BCSequenceCodon {
    NSArray codons -> BCCodon 1, BCCodon 2, etc.
    (Species "Homo Sapiens" / BCAlphabet "Homo Sapiens")
    (Frame "+1")

    Methods to convert the output to a BCSequenceDNA object (iterate over the codons and read the sequence back from the BCCodons).

    Methods to convert the output to a BCSequenceProtein object (iterate over the codons and read the amino acids from the BCCodons). The latter might need the ORF finder, or parameters that define what to do at stops: return the longest protein, return the first protein, return all proteins, return all proteins longer than..., etc.
}

The BCCodons are indeed species specific and are instantiated by the AlphabetManager on a per-alphabet basis; they are BCSymbol (subclass) singletons in a static dictionary. The most commonly used alphabets are predefined and can be instantiated directly using class methods.

BCAlphabet {
    NSArray codons, or a dictionary with the DNA triplet as key.
    Species "Homo Sapiens"
}

I think again much of the code used for nucleotides and amino acids can be used for the BCCodons as well, as they are BCSymbol subclasses. Each alphabet has 64 possible triplets; if these are encoded in a plist, they should be easy to load into a static dictionary.

I most certainly agree that there are problems with this approach as well, some of which you mention below. But what I understood from your methods is that you, for instance, create translation dictionaries as well...
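
A minimal sketch of the 64-triplet lookup discussed above, written in plain C so that it can sit in an Objective-C file unchanged. It assumes the standard ("universal") genetic code and marks stops with '*'. The names `translate_codon` and `translate_frame` are hypothetical and are not BioCocoa methods; a per-species alphabet would simply swap in a different 64-character table.

```c
#include <stddef.h>
#include <string.h>

/* Standard genetic code, indexed as 16*b1 + 4*b2 + b3 with the
   bases ordered T, C, A, G. '*' marks a stop codon. */
static const char kStandardCode[] =
    "FFLLSSSSYY**CC*W"   /* first base T */
    "LLLLPPPPHHQQRRRR"   /* first base C */
    "IIIMTTTTNNKKSSRR"   /* first base A */
    "VVVVAAAADDEEGGGG";  /* first base G */

/* Map one base to 0..3, or -1 for anything else (N, gaps, ...). */
static int base_index(char base)
{
    switch (base) {
        case 'T': case 't': case 'U': case 'u': return 0;
        case 'C': case 'c': return 1;
        case 'A': case 'a': return 2;
        case 'G': case 'g': return 3;
        default: return -1;
    }
}

/* Translate one triplet; returns 'X' if a base is not A, C, G, T or U. */
char translate_codon(const char *triplet)
{
    int b1 = base_index(triplet[0]);
    int b2 = base_index(triplet[1]);
    int b3 = base_index(triplet[2]);
    if (b1 < 0 || b2 < 0 || b3 < 0)
        return 'X';
    return kStandardCode[16 * b1 + 4 * b2 + b3];
}

/* Translate dna starting at offset frame (0, 1 or 2) into out, which
   holds out_size bytes including the terminating NUL. Trailing bases
   that do not fill a codon are ignored. Returns the residue count. */
size_t translate_frame(const char *dna, size_t frame, char *out, size_t out_size)
{
    size_t length = strlen(dna);
    size_t written = 0;
    size_t pos;

    if (out_size == 0)
        return 0;
    for (pos = frame; pos + 3 <= length && written + 1 < out_size; pos += 3)
        out[written++] = translate_codon(dna + pos);
    out[written] = '\0';
    return written;
}
```

Calling `translate_frame("ATGGCCTAA", 0, buffer, sizeof buffer)` fills `buffer` with `MA*`. Keeping the reading frame and the table as arguments of the call, instead of storing them in every codon, is one way around the "too much context per codon" concern raised above.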
> They also seem a bit wasteful - making codons would involve composing
> them from combinations of bases, but they'd have to be decomposed into
> individual bases again to handle translation easily.

Well, could be. Composing them from combinations of bases wouldn't be necessary if you just add a BCSequenceDNA object for the triplet. You can then just use the sequence comparison methods from BCSequenceDNA to check for equality. But I agree that this could involve decomposing as well; perhaps there's a way to optimize this.

> What I've been thinking of during my commute in was a
> BCSequenceTranslation, which would contain that sort of context -
> A reference to the original sequence it was translated from.

I don't think that's wise, for the syncing reasons you already mentioned. In addition, it shouldn't be a problem to iterate back over the codons to get your DNA sequence back; it's just a matter of appending the triplet of every codon to a sequence (is there an appendSequence method in BCSequenceDNA already?)

> A reading frame indication and/or range of translation

That could be a variable in BCSequenceCodon.

> A genetic code reference.

Same here.

> The ability to derive BCSequenceProtein objects from it.

See above. As for DNA sequences, just iterate over the codons and append the amino acid each one represents. I mentioned a few example methods above already.

> This isn't ideal either - the DNA sequence can be edited after it's
> created - so I'm not entirely happy with it. It's just that I'm not
> happy with any other options at this point, either.

Right, perhaps we still have to think of a clever way to have some super object that can contain all kinds of info and keep things in sync when you edit one of the subcontents. Ideally in a way that only the edited part of the sequence is updated, instead of recalculating the whole thing. No clue how to do this, however.

> I had a nice weekend, too, so I don't think it's just that I'm
> generally unhappy ;).
Nope, I do believe that, John; this is quite a complex matter...

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-bit
company without 1 bit of sense.
*********************************************************

From kvddrift at earthlink.net Mon Aug 30 20:20:24 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 30 Aug 2004 20:20:24 -0400
Subject: [Biococoa-dev] mass calculator bug
In-Reply-To: <728C5F70-FA4D-11D8-80BF-000393CFDE0C@mekentosj.com>
References: <0D3434E4-FA22-11D8-9E22-003065A5FDCC@earthlink.net> <728C5F70-FA4D-11D8-80BF-000393CFDE0C@mekentosj.com>
Message-ID: <8D4E3FAA-FAE3-11D8-A120-003065A5FDCC@earthlink.net>

On Aug 30, 2004, at 2:25 AM, Alexander Griekspoor wrote:

>> What I am trying to get at is to see if it is possible to have a
>> separate method to test the entrySymbol and selfSymbol that goes in
>> BCNucleotideDNA and BCAminoAcid (or BCSequenceDNA and
>> BCSequenceProtein). Then we can keep all the rangeOfSubsequence in
>> BCSequence.
> Perhaps it's indeed a good plan to let BCSymbol have the
> isRepresentedBySymbol method that BCNucleotideDNA overrides to check
> for ambiguous bases as well, that way BCSequence could have the
> general methods.

Here is my suggestion, but I don't know if this will work. Make the following method in BCSymbol:

    - (BOOL) isEqualToSymbol: (BCSymbol *)aSymbol
    {
        return (self == aSymbol);
    }

Then override this for BCNucleotideDNA to do all the ambiguity testing.
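
One possible shape for that ambiguity test, sketched in plain C and not taken from the BioCocoa sources: each IUPAC nucleotide code is treated as a set of the four bases, and two symbols are considered compatible when their sets overlap. The names `iupac_mask` and `symbols_compatible` are hypothetical, and single characters stand in for the BCSymbol objects.

```c
#include <ctype.h>

/* Bit flags for the four unambiguous bases. */
enum { BASE_A = 1, BASE_C = 2, BASE_G = 4, BASE_T = 8 };

/* Return the set of bases an IUPAC nucleotide code can stand for,
   or 0 if the character is not a nucleotide code (gap, digit, ...). */
unsigned iupac_mask(char code)
{
    switch (toupper((unsigned char)code)) {
        case 'A': return BASE_A;
        case 'C': return BASE_C;
        case 'G': return BASE_G;
        case 'T': return BASE_T;
        case 'R': return BASE_A | BASE_G;           /* purine */
        case 'Y': return BASE_C | BASE_T;           /* pyrimidine */
        case 'S': return BASE_C | BASE_G;
        case 'W': return BASE_A | BASE_T;
        case 'K': return BASE_G | BASE_T;
        case 'M': return BASE_A | BASE_C;
        case 'B': return BASE_C | BASE_G | BASE_T;  /* not A */
        case 'D': return BASE_A | BASE_G | BASE_T;  /* not C */
        case 'H': return BASE_A | BASE_C | BASE_T;  /* not G */
        case 'V': return BASE_A | BASE_C | BASE_G;  /* not T */
        case 'N': return BASE_A | BASE_C | BASE_G | BASE_T;
        default:  return 0;
    }
}

/* Ambiguity-aware comparison: nonzero when the two codes share at
   least one possible base. Unknown characters never match. */
int symbols_compatible(char a, char b)
{
    return (iupac_mask(a) & iupac_mask(b)) != 0;
}
```

With this rule 'R' matches 'A' and 'G' but not 'C', and 'N' matches every base. Whether an ambiguous entry should instead be required to cover the sequence symbol completely (a subset test and not an overlap test) is a design choice that the thread leaves open.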
Now the method

    - (NSRange) rangeOfSubsequence: (BCSequence *)entry withinRange: (NSRange)theLimit

can be much simplified and only needs to be in BCSequence:

    - (NSRange) rangeOfSubsequence: (BCSequence *)entry withinRange: (NSRange)theLimit
    {
        unsigned entryLength = [entry length];

        // do bounds checking; an empty or oversized entry can never match
        if ( theLimit.location + theLimit.length > [sequenceArray count]
             || entryLength == 0 || entryLength > theLimit.length )
            return NSMakeRange( NSNotFound, 0 );

        // get the region to check
        NSArray *subSequence = [sequenceArray subarrayWithRange: theLimit];
        unsigned start, offset;

        for ( start = 0 ; start <= theLimit.length - entryLength ; start++ ) {
            // compare every symbol of the entry, not just the first one
            for ( offset = 0 ; offset < entryLength ; offset++ ) {
                BCSymbol *selfSymbol = [subSequence objectAtIndex: start + offset];
                if ( ![selfSymbol isEqualToSymbol: [entry symbolAtIndex: offset]] )
                    break;
            }
            // report the match relative to the whole sequence
            if ( offset == entryLength )
                return NSMakeRange( theLimit.location + start, entryLength );
        }

        // went through the whole region without finding anything
        return NSMakeRange( NSNotFound, 0 );
    }

The same can be done for

    - (NSArray *) rangesOfSubsequence: (BCSequence *)entry

There is no need for having the same code in a super and derived class.

John or Alex, if you think this will work, can you post the code that should be in isEqualToSymbol for BCNucleotideDNA?

thanks,

- Koen.

From jtimmer at bellatlantic.net Mon Aug 30 21:05:14 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Mon, 30 Aug 2004 21:05:14 -0400
Subject: [Biococoa-dev] mass calculator bug
In-Reply-To: <8D4E3FAA-FAE3-11D8-A120-003065A5FDCC@earthlink.net>
Message-ID:

> On Aug 30, 2004, at 2:25 AM, Alexander Griekspoor wrote:
>
>>> What I am trying to get at is to see if it is possible to have a
>>> separate method to test the entrySymbol and selfSymbol that goes in
>>> BCNucleotideDNA and BCAminoAcid (or BCSequenceDNA and
>>> BCSequenceProtein). Then we can keep all the rangeOfSubsequence in
>>> BCSequence.
>> Perhaps it's indeed a good plan to let BCSymbol have the
>> isRepresentedBySymbol method that BCNucleotideDNA overrides to check
>> for ambiguous bases as well, that way BCSequence could have the
>> general methods.

Actually, I checked something in that takes full advantage of code in both classes. The problem was that I had _strict versions that checked for equality with ==, and regular methods that worked with ambiguous bases. I eventually realized that the _strict version was a good general testing method.

I took the _strict code, modified it slightly to work with all BCSymbols, and used that for the range methods in the superclass. I overrode the range methods in the DNA subclass to test for ambiguous bases, and I re-implemented the _strict versions in order to have them call up to the super's regular method.

I haven't tested the code to make sure it works yet, but it compiles just fine.

John

_______________________________________________
This mind intentionally left blank

From jtimmer at bellatlantic.net Tue Aug 31 15:29:29 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 31 Aug 2004 15:29:29 -0400
Subject: [Biococoa-dev] Translation issues
In-Reply-To:
Message-ID:

Okay, I've stopped coding for a little while, and thought about this. I'm not entirely sure I can get things to work, but I think there may be a way to implement things fairly efficiently once I add a couple of methods to BCSequenceDNA. If things work like I think they will, the biggest advantage will probably be that it's easier to write an optimized form of the matching method.

I'll try to actually write the code tonight and tomorrow, and I'll run some speed tests when it's done and let you know the results.
JT

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Tue Aug 31 17:24:58 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 31 Aug 2004 17:24:58 -0400
Subject: [Biococoa-dev] Factories
In-Reply-To:
References:
Message-ID: <35D6D2E2-FB94-11D8-A120-003065A5FDCC@earthlink.net>

On Aug 30, 2004, at 2:31 PM, John Timmer wrote:

> It did 1 frame of my 11KB test sequence in about 0.2 of a second,

How did you measure that?

- Koen.

From kvddrift at earthlink.net Tue Aug 31 17:26:57 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 31 Aug 2004 17:26:57 -0400
Subject: [Biococoa-dev] Translation issues
In-Reply-To:
References:
Message-ID: <7D08B347-FB94-11D8-A120-003065A5FDCC@earthlink.net>

John,

Have you looked at the bioperl code? They are doing translations, so it might help in deciding how to approach this. See e.g. this page:

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/CodonTable.html

- Koen.

On Aug 31, 2004, at 3:29 PM, John Timmer wrote:

> Okay, I've stopped coding for a little while, and thought about this.
> I'm not entirely sure I can get things to work, but I think there may
> be a way to implement things fairly efficiently once I add a couple of
> methods to BCSequenceDNA. If things work like I think they will, the
> biggest advantage will probably be that it's easier to write an
> optimized form of the matching method.
>
> I'll try to actually write the code tonight and tomorrow, and I'll run
> some speed tests when it's done and let you know the results.
>
> JT

From jtimmer at bellatlantic.net Tue Aug 31 17:49:08 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 31 Aug 2004 17:49:08 -0400
Subject: [Biococoa-dev] Factories
In-Reply-To: <35D6D2E2-FB94-11D8-A120-003065A5FDCC@earthlink.net>
Message-ID:

> On Aug 30, 2004, at 2:31 PM, John Timmer wrote:
>
>> It did 1 frame of my 11KB test sequence in about 0.2 of a second,
>
> How did you measure that?

I should admit that I've tested it a bunch of times now, and the results are very variable depending on the load on my system. Best results tend to be less than 0.15 of a second, but if the machine's busy, it can easily take over 0.3 of a second.

Regardless, I use NSDates as in the following code snippet:

    NSDate *now = [NSDate date];
    if ( [[theInput string] length] == 0 )
        return;

    BCSequenceDNA *theSequence = [BCSequenceDNA DNASequenceWithString: [theInput string]
                                                     skippingNonBases: YES];

    NSArray *translation = [BCUtilDNATranslator translationOfSequence: theSequence
                                                              inRange: NSMakeRange( 0, [theSequence length] )
                                                      usingDictionary: @"universal genetic code"];

    // timeIntervalSinceNow is negative for a date in the past, so negate it
    NSLog ( @"%f", -[now timeIntervalSinceNow] );

_______________________________________________
This mind intentionally left blank

From mek at mekentosj.com Tue Aug 31 18:00:06 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 1 Sep 2004 00:00:06 +0200
Subject: [Biococoa-dev] Factories
In-Reply-To:
References:
Message-ID: <1E347544-FB99-11D8-80BF-000393CFDE0C@mekentosj.com>

I added that to the demo_app a few hours ago and sent the changed files to Koen to be added to cvs; unfortunately my email is being held for the moderator's approval, as I attached a zip file with the source.
Alex

On 31 Aug 2004, at 23:49, John Timmer wrote:

>> On Aug 30, 2004, at 2:31 PM, John Timmer wrote:
>>
>>> It did 1 frame of my 11KB test sequence in about 0.2 of a second,
>>
>> How did you measure that?
>
> I should admit that I've tested it a bunch of times now, and the
> results are very variable depending on the load on my system. Best
> results tend to be less than 0.15 of a second, but if the machine's
> busy, it can easily take over 0.3 of a second.
>
> Regardless, I use NSDates as in the following code snippet:
>
>     NSDate *now = [NSDate date];
>     if ( [[theInput string] length] == 0 )
>         return;
>
>     BCSequenceDNA *theSequence = [BCSequenceDNA DNASequenceWithString: [theInput string]
>                                                      skippingNonBases: YES];
>
>     NSArray *translation = [BCUtilDNATranslator translationOfSequence: theSequence
>                                                               inRange: NSMakeRange( 0, [theSequence length] )
>                                                       usingDictionary: @"universal genetic code"];
>
>     // timeIntervalSinceNow is negative for a date in the past, so negate it
>     NSLog ( @"%f", -[now timeIntervalSinceNow] );

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai
*********************************************************

From jtimmer at bellatlantic.net Tue Aug 31 18:53:17 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 31 Aug 2004 18:53:17 -0400
Subject: [Biococoa-dev] Factories
In-Reply-To: <1E347544-FB99-11D8-80BF-000393CFDE0C@mekentosj.com>
Message-ID:

> I added that to the demo_app a few hours ago and sent the changed
> files to Koen to be added to cvs; unfortunately my email is being held
> for the moderator's approval, as I attached a zip file with the source.
> Alex

Okay, I'll commit my current controller.m. Since it's there for testing purposes, I've not been committing changes, as there's no real reason to keep people updating it while I try different things. If I do anything interesting, though, I will commit from now on. You may take it that something new is working if you see a little "C" next to the file!

Cheers,

JT

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Tue Aug 31 20:13:40 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 31 Aug 2004 20:13:40 -0400
Subject: [Biococoa-dev] Factories
In-Reply-To:
References:
Message-ID:

On Aug 31, 2004, at 6:53 PM, John Timmer wrote:

> Okay, I'll commit my current controller.m.

I have added Alex's code to John's commit plus some additional goodies.

- Koen.

From a.griekspoor at nki.nl Tue Aug 31 14:47:33 2004
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Tue, 31 Aug 2004 20:47:33 +0200
Subject: [Biococoa-dev] [BioCocoa] My first (tiny) contribution
Message-ID: <3870ED56-FB7E-11D8-80BF-000393CFDE0C@nki.nl>

Hi guys,

To make things a bit more quantitative, I've added a few lines of code to John's demo_app that display the time it took to process your code. It also adds a custom field that you can use to log one kind of extra info. In the demo_app it shows the number of input nucleotides...
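
Since the timings reported earlier in the thread swing between 0.15 and 0.3 of a second depending on system load, one common way to get steadier numbers is to repeat the measurement and keep the fastest run. The following is only a sketch in plain C and is not the code in the demo_app: `best_cpu_seconds` and `busy_loop` are hypothetical names, and `clock()` reports CPU time where the NSDate approach reports wall-clock time, so the figures are not directly comparable.

```c
#include <time.h>

/* Run work(context) `runs` times and return the fastest run in seconds
   of CPU time. Returns -1.0 when runs is smaller than 1. */
double best_cpu_seconds(void (*work)(void *), void *context, int runs)
{
    double best = -1.0;
    int i;

    for (i = 0; i < runs; i++) {
        clock_t start = clock();
        work(context);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        /* keep the smallest value seen so far */
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Example workload: sums the integers below the unsigned long that
   context points to. It stands in for one translation pass. */
void busy_loop(void *context)
{
    volatile unsigned long sum = 0;
    unsigned long limit = *(unsigned long *)context;
    unsigned long i;

    for (i = 0; i < limit; i++)
        sum += i;
}
```

A call such as `best_cpu_seconds(busy_loop, &limit, 5)` returns the fastest of five runs. Taking the minimum filters out runs that were slowed by other processes, at the cost of hiding one-off work such as building the translation dictionary on the first pass.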
Perhaps you can commit it for me Koen, I know how to commit, but do not have the experience to make sure it ends up in the demo_app folder...

Cheers,

Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: demo_app.zip
Type: application/zip
Size: 36872 bytes
Desc: not available
URL:
-------------- next part --------------
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Microsoft is not the answer,
Microsoft is the question, NO is the answer
*********************************************************