From jtimmer at bellatlantic.net Wed Dec 1 10:53:02 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 01 Dec 2004 10:53:02 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
Message-ID:

>> The question then becomes how to determine which type of sequence to
>> return.
>> The way I would imagine is to have a flag to determine whether to ask
>> for user input - this could put up a standard dialog box. If the flag
>> is false, the factory method could create each type of possible
>> sequence, then use the sequence counted set to look for undefined
>> symbols. Compare the results, and take the one with the fewest
>> undefined symbols. In case of a tie, default to DNA>RNA>protein.
>
> Sounds good, this code can also go in the factory class. However, I
> don't think we should use a dialog box for the framework. This is the
> sole responsibility of the developer who uses BioCocoa.

Okay, skip the dialog. I'll try to spare some time this evening to
implement this, since it's my idea and seems on the surface to be fairly
simple (I'm sure that will be wrong). I guess it would go in the case
BCSequence untyped section of the code you showed.

> Could you show a more concrete interface? It's still kinda vague to me
> :(

Okay, I'll try to throw something together this weekend. Actually
implementing this is going to be problematic, since a lot of the
internals are going to depend on other parts of the code being
implemented. We're going to have to nearly complete the implementation
before it actually can be tested.

>> The last issue seems to be around the quote from Koen:
>>> I agree, but let's then focus on having these one-liners in BCSequence
>>> only, not in the subclasses.
>> I remember this quote as bothering me when I first read it, because
>> there are some one liners that clearly belong in a specific sequence
>> subclass (ie - finding the longest open reading frame should not be
>> available to a protein sequence, and finding the hydrophobicity should
>> not be available to nucleotides or codons). I seem to remember that
>> reading further alleviated my concerns on this, but I can't remember
>> how. Since Alex and I share this concern, could you clarify what you
>> meant here, Koen?
>
> If we add code to a wrapper that checks if the type of sequence then I
> don't see any problem. If the sequence type by accident is the wrong
> one (which I really don't think is going to happen), the wrapper should
> return nil, or an error, or an NSNotification. Hope that's more clear.

You haven't coded for actual users much have you? ;). You'd be amazed at
the seemingly impossible situations they generate on a regular basis -
anything can and will happen.

Anyway, personally, I feel that throwing errors at compile time rather
than while a program is running is the better option, but I have no
problem being outvoted on this. A compromise would be to have the wrapper
accept any sequence, but only add the one liner to the specific sequence
class that should call that method.

Cheers,

JT

_______________________________________________
This mind intentionally left blank

From mek at mekentosj.com Wed Dec 1 13:41:02 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 1 Dec 2004 19:41:02 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References:
Message-ID: <8D19D5E3-43C8-11D9-9436-000D93AE89A4@mekentosj.com>

>>> The question then becomes how to determine which type of sequence to
>>> return.
>>> The way I would imagine is to have a flag to determine whether to ask
>>> for user input - this could put up a standard dialog box.
>>> If the flag is false, the factory method could create each type of
>>> possible sequence, then use the sequence counted set to look for
>>> undefined symbols. Compare the results, and take the one with the
>>> fewest undefined symbols. In case of a tie, default to
>>> DNA>RNA>protein.
>>
>> Sounds good, this code can also go in the factory class. However, I
>> don't think we should use a dialog box for the framework. This is the
>> sole responsibility of the developer who uses BioCocoa.
>
> Okay, skip the dialog. I'll try to spare some time this evening to
> implement this, since it's my idea and seems on the surface to be
> fairly simple (I'm sure that will be wrong). I guess it would go in the
> case BCSequence untyped section of the code you showed.

Ok guys I'll just sit back and relax while you two work this out....

>> If we add code to a wrapper that checks if the type of sequence then I
>> don't see any problem. If the sequence type by accident is the wrong
>> one (which I really don't think is going to happen), the wrapper
>> should return nil, or an error, or an NSNotification. Hope that's more
>> clear.
>
> You haven't coded for actual users much have you? ;).
> You'd be amazed at the seemingly impossible situations they generate on
> a regular basis - anything can and will happen.

That's quite a bold statement IMHO, in addition if your users do
impossible things, rethink your interface I would suggest...
Plus, we're not talking about end users here anyway...

> Anyway, personally, I feel that throwing errors at compile time rather
> than while a program is running is the better option

My point exactly! Tell the developer he does a stupid thing (or help him
to prevent his users do stupid things).

> , but I have no problem being outvoted on this. A compromise would be
> to have the wrapper accept any sequence, but only add the one liner to
> the specific sequence class that should call that method.
Sorry, but I have no clue what you're discussing here. What is the
problem with convenience methods in all (sub)classes?

Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From mek at mekentosj.com Wed Dec 1 13:51:22 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 1 Dec 2004 19:51:22 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References:
Message-ID:

> Actually, what I suggested is to have a factory that handles the
> creation of *every* sequence. You feed the factory with a string,
> array, etc, and a BCSequenceType and/or BCSymbolSet, and the factory
> returns the right BCSequence.

I think we should choose here; at the moment we went for BCSequenceType.
Or we adopt the whole BioJava approach with general sequences, "typed"
based on the BCSymbolSet used. But I don't see a real mix. In addition,
we had many discussions about the BioJava approach and I still see no
clear advantage; what is more, there are some real problems associated
with it.

> If the type is not specified, then the guess code comes into action.
> The point I am trying to make is that we should always use BCSequence
> as a *return type* for the factory as well as within BCSequenceReader.
> Otherwise we need a factory class for each BCSequence subclass.
> Internally the factory creates the right subclass, and even though the
> return type is BCSequence, the actual type will be the created
> subclass. That's the nice thing of inheritance!

That's a good idea.

> So maybe:
>
> BCSequenceFactory *myFactory = [[BCSequenceFactory alloc] init];

Please make that a shared one...

> BCSequence *newSequence = [myFactory createSequenceUsingString:
>     @"AACCTTGG" usingType: BCDNASequence];
>
> - (BCSequence *) createSequenceUsingString: (NSString *) string
>                                  usingType: (BCSequenceType) type
> {
>     switch (type)
>     {
>         case BCDNASequence:
>         {
>             return [BCSequenceDNA DNASequenceWithString: string];
>             break;
>         }
>
>         .....
>
> and so on.

Ok, I get it, and if you feed it

    [myFactory createSequenceUsingString: @"AACCTTGG"
                               usingType: BCUnknownSequence];

or something alike, or nil, it would call a "determine sequence" method,
which guesses the type of sequence and then again calls the method above
with the proper type...

> Note that in the snippet I am actually using BCSequenceDNA ;-). If you
> guys really want it, it's fine with me if we keep those around for
> convenience. But I still think that we should put most code in
> BCSequence, except maybe for the init methods.

And the convenience methods, not to forget!

>> Ramble #2 is about the sequence wrapper/bundle, and how to implement
>> that to handle the multiple sequences in an alignment file. I had
>> envisioned the wrapper as holding features, and a bundle as linking
>> related sequences. If this is the way we go, we'd have to implement
>> both in order to handle this circumstance.
>>
>> A short summary of how I expected a bundle to work -
>> Each wrapper would have a unique bundle ID, and a reference to its
>> bundle.
>> Within the wrapper, features could include a bundle ID. Basically, if
>> code wanted to look at a feature, it would check to make sure that the
>> bundle reference was not nil - if it wasn't, it would take the
>> feature's bundle ID, and ask the bundle for the sequence corresponding
>> to that ID. Given that a feature should have an NSRange, this would
>> allow the two sequences to be aligned.
>
> Could you show a more concrete interface? It's still kinda vague to me
> :(

Yes, I have to second that: a bundle is more related to file structures,
so perhaps explain it in terms of arrays and dictionaries. Or did you
specifically address storage here? Koen referred to the BioPerl docs and
what I saw there was something very attractive with respect to features
and annotations....

Cheers,
Alex

From mek at mekentosj.com Wed Dec 1 13:56:10 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 1 Dec 2004 19:56:10 +0100
Subject: [Biococoa-dev] first non-whitespace character
In-Reply-To: <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net>
References: <31244BE5-419A-11D9-9F46-003065A5FDCC@earthlink.net>
 <5A8A46E8-4208-11D9-A769-000D93AE89A4@mekentosj.com>
 <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net>
Message-ID:

>> [NSCharacterSet alphanumericCharacterSet] as the set
>
> It took some trial and error, but eventually I came up with
> [[NSCharacterSet whitespaceCharacterSet] invertedSet].
I thought of that one as well, but I didn't know whether the other one
was cheaper and already sufficient...

> So in fact everything that's not whitespace. Another solution of
> course could be to use the union of all BCSymbolSets. However, not all
> have been filled in so far, so right now I cannot use that.

That depends of course: if you're working with strings you need the
NSCharacterSets; if you are working with BCSymbols, the idea would be to
use a BCSymbolSet instead of going the string way again. Anyway, during
file import it's the first, I believe, from your example...

Alex

From jtimmer at bellatlantic.net Wed Dec 1 13:58:41 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 01 Dec 2004 13:58:41 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <8D19D5E3-43C8-11D9-9436-000D93AE89A4@mekentosj.com>
Message-ID:

>>> If we add code to a wrapper that checks if the type of sequence then I
>>> don't see any problem. If the sequence type by accident is the wrong
>>> one (which I really don't think is going to happen), the wrapper
>>> should return nil, or an error, or an NSNotification. Hope that's
>>> more clear.
>>
>> You haven't coded for actual users much have you? ;).
>> You'd be amazed at the seemingly impossible situations they generate
>> on a regular basis - anything can and will happen.
> That's quite a bold statement IMHO, in addition if your users do
> impossible things, rethink your interface I would suggest...
> Plus, we're not talking about end users here anyway...

Just a joke! I thought the wink made that clear. I was just struck by
the statement "(which I really don't think is going to happen)" - I've
thought that many times and been amazed at how often some of those
things happened.

Anyway, no offense was meant, and I hope none was taken.

JT

From mek at mekentosj.com Wed Dec 1 14:00:01 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 1 Dec 2004 20:00:01 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References:
Message-ID: <33DAD784-43CB-11D9-9436-000D93AE89A4@mekentosj.com>

No problem John...
Alex

On 1 Dec 2004, at 19:58, John Timmer wrote:

>>>> If we add code to a wrapper that checks if the type of sequence
>>>> then I don't see any problem. If the sequence type by accident is
>>>> the wrong one (which I really don't think is going to happen), the
>>>> wrapper should return nil, or an error, or an NSNotification. Hope
>>>> that's more clear.
>>>
>>> You haven't coded for actual users much have you? ;).
>>> You'd be amazed at the seemingly impossible situations they generate
>>> on a regular basis - anything can and will happen.
>> That's quite a bold statement IMHO, in addition if your users do
>> impossible things, rethink your interface I would suggest...
>> Plus, we're not talking about end users here anyway...
>
> Just a joke! I thought the wink made that clear. I was just struck by
> the statement "(which I really don't think is going to happen)" - I've
> thought that many times and been amazed at how often some of those
> things happened.
>
> Anyway, no offense was meant, and I hope none was taken.
>
> JT

From jtimmer at bellatlantic.net Wed Dec 1 14:10:17 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 01 Dec 2004 14:10:17 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <8D19D5E3-43C8-11D9-9436-000D93AE89A4@mekentosj.com>
Message-ID:

>>> If we add code to a wrapper that checks if the type of sequence then I
>>> don't see any problem. If the sequence type by accident is the wrong
>>> one (which I really don't think is going to happen), the wrapper
>>> should return nil, or an error, or an NSNotification. Hope that's
>>> more clear.
>>
>> Anyway, personally, I feel that throwing errors at compile time rather
>> than while a program is running is the better option
> My point exactly! Tell the developer he does a stupid thing (or help
> him to prevent his users do stupid things).

Okay, since Alex seems to agree to some extent, let me just clarify what
I think is the issue:

Koen, who doesn't like the idea of sequence subclasses, wants to make all
methods accept all sequences, even if they can't make any sense out of
it - for example, a nucleotide sequence being sent to a hydrophobicity
calculator. Nonsense like this could be handled by throwing an exception
or something similar.
Alex and I feel that it's better to help the developer recognize
potential errors by having type requirements for these methods, where
possible. Conversely, I can't see any possible benefit in allowing a
situation that can throw an exception when it can be prevented easily.
Koen may be able to explain benefits I can't see, since he is advocating
this.

One compromise would be to have the actual method implementation (which
is in a non-sequence class) accept any type of sequence, but place the
convenience method that calls it in the specific sequence class(es) that
should be used with the method. Again, I'd prefer otherwise, but if I
get outvoted on that, please allow me this concession.

I hope that's a cogent explanation of things.

JT

From jtimmer at bellatlantic.net Wed Dec 1 19:41:36 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 01 Dec 2004 19:41:36 -0500
Subject: [Biococoa-dev] BCSequenceFactory
In-Reply-To:
Message-ID:

> Hi,
>
> I added a new class BCSequenceFactory (in BCTools/BCSequenceTools). For
> now it can create DNA, RNA and proteins from a string, but the other
> methods should be fairly easy to fill in. I have not yet added the
> 'guess-the-type' code. To get an idea how to use it, I have added the
> factory code in the readSwissProt file.

I added the guess-the-type method implementation. Since guessing the
type involved creating the sequence, I changed its return value to the
actual sequence, rather than the type.
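
A minimal sketch of the guessing strategy discussed in this thread: score
each candidate type by its number of undefined symbols, take the lowest
score, and break ties as DNA > RNA > protein. It is written in plain C
(which also compiles as Objective-C) so that it runs without Cocoa, and
it is not the code that was checked in. The names SeqType,
count_undefined and guess_sequence_type, and the three strict alphabets,
are assumptions made for illustration only, not BioCocoa API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags, listed in tie-break order: DNA > RNA > protein. */
typedef enum { SEQ_DNA, SEQ_RNA, SEQ_PROTEIN } SeqType;

/* Assumed strict alphabets (no ambiguity codes), indexed by SeqType. */
static const char *const kAlphabets[] = {
    "ACGT",                     /* SEQ_DNA */
    "ACGU",                     /* SEQ_RNA */
    "ACDEFGHIKLMNPQRSTVWY"      /* SEQ_PROTEIN */
};

/* Count the non-whitespace characters that are not in the alphabet. */
static size_t count_undefined(const char *text, const char *alphabet)
{
    size_t undefined = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c))
            continue;
        if (strchr(alphabet, toupper(c)) == NULL)
            ++undefined;
    }
    return undefined;
}

/* Return the type with the fewest undefined symbols. */
SeqType guess_sequence_type(const char *text)
{
    SeqType best = SEQ_DNA;
    size_t best_count = count_undefined(text, kAlphabets[SEQ_DNA]);

    for (int t = SEQ_RNA; t <= SEQ_PROTEIN; ++t) {
        size_t n = count_undefined(text, kAlphabets[t]);
        if (n < best_count) {   /* strict '<' keeps the earlier type on a tie */
            best = (SeqType)t;
            best_count = n;
        }
    }
    return best;
}
```

With these alphabets, "AACCTTGG" has no undefined symbols as either DNA
or protein, so the tie-break decides in favour of DNA, while "AACCUUGG"
is only clean as RNA.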
Cheers,

JT

From kvddrift at earthlink.net Wed Dec 1 20:12:25 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 1 Dec 2004 20:12:25 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References:
Message-ID: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net>

On Dec 1, 2004, at 10:53 AM, John Timmer wrote:

> Okay, I'll try to throw something together this weekend. Actually
> implementing this is going to be problematic, since a lot of the
> internals are going to depend on other parts of the code being
> implemented. We're going to have to nearly complete the implementation
> before it actually can be tested.

Why not describe a class structure on the list first, this way we can
all discuss about it. Once it's in CVS it will be more difficult (albeit
not impossible) to change stuff.

What I mean is a scheme such as:

BCSequenceBundle
    |
    -> has-a NSArray of BCSequenceWrapper

BCSequenceWrapper
    |
    -> has-a BCSequence, BCFeatures, etc

BCFeatures
    |
    -> has-a dictionary of features (d'oh)

etc

This is just an example, not the way I think it should be, because I
have no idea how it should be :)

> Anyway, personally, I feel that throwing errors at compile time rather
> than while a program is running is the better option

Actually, I think that is a good point. But we have to be aware of the
fact that a developer can have all compiler warnings turned off.

- Koen.

From kvddrift at earthlink.net Wed Dec 1 20:14:14 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 1 Dec 2004 20:14:14 -0500
Subject: [Biococoa-dev] BCSequenceFactory
In-Reply-To:
References:
Message-ID: <7B4D1459-43FF-11D9-9428-003065A5FDCC@earthlink.net>

On Dec 1, 2004, at 7:41 PM, John Timmer wrote:

> I added the guess the type method implementation.
> Since guessing the type involved creating the sequence, I changed its
> return value to the actual sequence, rather than the type.

Thanks John - I will try to test it later tonight.

- Koen.

From kvddrift at earthlink.net Wed Dec 1 20:15:36 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 1 Dec 2004 20:15:36 -0500
Subject: [Biococoa-dev] first non-whitespace character
In-Reply-To:
References: <31244BE5-419A-11D9-9F46-003065A5FDCC@earthlink.net>
 <5A8A46E8-4208-11D9-A769-000D93AE89A4@mekentosj.com>
 <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net>
Message-ID:

On Dec 1, 2004, at 1:56 PM, Alexander Griekspoor wrote:

> That depends of course if you're working with strings you need the
> NSCharacterSets, if you are working with BCSymbols, the idea would be
> to use a BCSymbolSet instead of going the string-way again

We could have BCSymbolSet return an NSCharacterSet in such a case.

- Koen.

From kvddrift at earthlink.net Wed Dec 1 20:20:28 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 1 Dec 2004 20:20:28 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References:
Message-ID: <59BCD61E-4400-11D9-9428-003065A5FDCC@earthlink.net>

On Dec 1, 2004, at 1:51 PM, Alexander Griekspoor wrote:

> I think we should choose here, at the moment we did for
> BCSequenceType. Or, we adopt the whole BioJava approach with general
> sequences, "typed" bases on the used BCSymbolSet. But I don't see a
> real mix. In addition, we had many discussion about the BioJava
> approach and I still see no clear advantage, even more there are some
> real problems associated with it.

When using the BCSequenceType, we just use plain DNA, RNA, and protein.
There is no info about the symbols that belong to it, eg is it 'strict',
'ambiguous', etc. Using symbolsets, we actually define the possible
symbols.
And we could also differentiate easily between:

    [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: NO];

and

    [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: YES];

and even extend the possibilities.

- Koen.

From kvddrift at earthlink.net Wed Dec 1 22:13:29 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 1 Dec 2004 22:13:29 -0500
Subject: [Biococoa-dev] BCSequenceFactory
In-Reply-To:
References:
Message-ID: <23B4E5BC-4410-11D9-9428-003065A5FDCC@earthlink.net>

On Dec 1, 2004, at 7:41 PM, John Timmer wrote:

> I added the guess the type method implementation. Since guessing the
> type involved creating the sequence, I changed its return value to the
> actual sequence, rather than the type.

John,

Works fine for a protein file, although I had to change the lines

    if ( aSymbol = nullSymbol )
        altCount++;

to:

    if ( aSymbol == nullSymbol )
        altCount++;

to get it to work ;-)

BTW, would it be an idea to use BCSymbolCounter for this?

cheers,

- Koen.

From mek at mekentosj.com Thu Dec 2 01:49:22 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 2 Dec 2004 07:49:22 +0100
Subject: [Biococoa-dev] BCSequenceFactory
In-Reply-To: <23B4E5BC-4410-11D9-9428-003065A5FDCC@earthlink.net>
References: <23B4E5BC-4410-11D9-9428-003065A5FDCC@earthlink.net>
Message-ID: <4C5621BF-442E-11D9-9436-000D93AE89A4@mekentosj.com>

Very nice John, also very compact! Although using the loopcounter object
is a possibility, I wouldn't do it here, as it spreads the code and makes
it more complicated to understand. But perhaps if the loopcounter is a
shared object and it would thus involve only one line of code (and no
alloc inits), it could still be a nice idea. I leave that up to you...
John, just out of curiosity, how much faster is doing this the Core
Foundation way instead of Cocoa?

One other remark: I understand the optimization to only create the
sequence once, but I think we should in this case not replace the method,
but add one.
Thus,

    - (BCSequenceType) guessSequenceTypeFromString: (NSString *) string;
    - (BCSequence *) guessSequenceFromString: (NSString *) string;

Sometimes you just want to know the type instead of getting the sequence
back. In the documentation we can point the reader to the fact that if
he wants to have the sequence, there's the other method (and thus
prevent him from doing double work).

Now that I'm in nitpicking mode anyway, the method name guidelines
suggest doing something like:

    sequenceTypeForUntypedString: or sequenceTypeForUnknownString:

and

    sequenceFromUntypedString: or sequenceFromUnknownString:

The originals don't suggest that something is returned...

cheers,
Alex

On 2 Dec 2004, at 4:13, Koen van der Drift wrote:

> On Dec 1, 2004, at 7:41 PM, John Timmer wrote:
>
>> I added the guess the type method implementation. Since guessing the
>> type involved creating the sequence, I changed its return value to the
>> actual sequence, rather than the type.
>
> John,
>
> Works fine for a protein file, although I had to change the lines
>
>     if ( aSymbol = nullSymbol )
>         altCount++;
>
> to:
>
>     if ( aSymbol == nullSymbol )
>         altCount++;
>
> to get it to work ;-)
>
> BTW, would it be an idea to use BCSymbolCounter for this?
>
> cheers,
>
> - Koen.
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev

From mek at mekentosj.com Thu Dec 2 02:02:01 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 2 Dec 2004 08:02:01 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net>
References: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net>
Message-ID: <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com>

> Why not describe a class structure on the list first, this way we can
> all discuss about it. Once it's in CVS it will be more difficult
> (albeit not impossible) to change stuff.
>
> What I mean is a scheme such as:
>
> BCSequenceBundle
>     |
>     -> has-a NSArray of BCSequenceWrapper
>
> BCSequenceWrapper
>     |
>     -> has-a BCSequence, BCFeatures, etc
>
> BCFeatures
>     |
>     -> has-a dictionary of features (d'oh)
>
> etc
>
> This is just an example, not the way I think it should be, because I
> have no idea how it should be :)

The basic setup is something we all agree on, I think. It would indeed be
nice if all classes had the same BCSequence prefix, although that is not
easy ;-) I like BCSequenceBundle, but I'm not sure if wrapper is
appropriate. I know we consider it a wrapper object, but I would like to
keep it closer to sequences. I thought of BCAnnotatedSequence, but it
would lose the common prefix. Still, better than BCSequenceWrapper.
Anyway, the (semi-)native englishmen on this list can jump in here ;-).
Wait, would it be an idea to make it:

BCAnnotatedSequenceBundle - a bundle of BCAnnotatedSequences (in an
    array)
BCAnnotatedSequence - contains a dictionary of BCFeatures (so no
    BCFeatures object, because that has no advantage over a dictionary),
    a BCSequence, a dictionary of BCAnnotation
BCFeature - a feature
BCAnnotation - a non-feature annotation (notes, authors, links, etc etc)

These two resemble BioPerl's setup.

>> Anyway, personally, I feel that throwing errors at compile time
>> rather than while a program is running is the better option
>
> Actually, I think that is a good point. But we have to be aware of the
> fact that a developer can have all compiler warnings turned off.

Then it's a stupid developer IMHO....

Cheers,
Alex

From mek at mekentosj.com Thu Dec 2 02:03:31 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 2 Dec 2004 08:03:31 +0100
Subject: [Biococoa-dev] first non-whitespace character
In-Reply-To:
References: <31244BE5-419A-11D9-9F46-003065A5FDCC@earthlink.net>
 <5A8A46E8-4208-11D9-A769-000D93AE89A4@mekentosj.com>
 <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net>
Message-ID: <467D2BFC-4430-11D9-9436-000D93AE89A4@mekentosj.com>

> On Dec 1, 2004, at 1:56 PM, Alexander Griekspoor wrote:
>
>> That depends of course if you're working with strings you need the
>> NSCharacterSets, if you are working with BCSymbols, the idea would be
>> to use a BCSymbolSet instead of going the string-way again
>
> We could have BCSymbolSet
> return an NSCharacterSet in such a case.

This would be in general a great addition Koen, to have a
-characterSetRepresentation method which would return the NSCharacterSet
equivalent of the symbol set...

Alex

From mek at mekentosj.com Thu Dec 2 02:16:57 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 2 Dec 2004 08:16:57 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <59BCD61E-4400-11D9-9428-003065A5FDCC@earthlink.net>
References: <59BCD61E-4400-11D9-9428-003065A5FDCC@earthlink.net>
Message-ID: <26C7992D-4432-11D9-9436-000D93AE89A4@mekentosj.com>

>> I think we should choose here, at the moment we did for
>> BCSequenceType. Or, we adopt the whole BioJava approach with general
>> sequences, "typed" bases on the used BCSymbolSet. But I don't see a
>> real mix. In addition, we had many discussion about the BioJava
>> approach and I still see no clear advantage, even more there are some
>> real problems associated with it.
>
> When using the BCSequenceType, we just use plain DNA, RNA, and
> protein. There is no info about the symbols that belong to it, eg is
> it 'strict', 'ambiguous', etc. Using symbolsets, we actually define
> the possible symbols.
And we could also differentiate easily between: > > [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: NO]; > > and > > [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: YES]; > > > and even extend the possibilities. That makes sense, indeed. Also in the current setup we can have three methods (or even extend it more): [BCSequenceDNA dnaSequenceWithString: string]; [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: YES]; [BCSequenceDNA dnaSequenceWithString: string skippingNonBases: YES usingSymbolSet: [BCSymbolSet dnaStrictSymbolSet]]; The first two are convenience methods for the last "mother method". One point Koen makes is that the symbolset preserves whether the symbolset used is strict or not, but we could do this as well by adding the BCDNAStrictSequence, BCRNAStrictSequence, BCProteinStrictSequence types, which I think is very nice anyway. We could implement this even in our sequence-guessing methods. The great advantage on having typed sequences as subclasses is that it isn't complex to implement simple DNA- or protein-only methods in one-liners/convenience methods (requiring shared objects, limiting the number of tool objects etc). Now I understand why you want to limit those to the superclass only, that would fit more in this concept. But again, let's choose for typed subclasses OR symbolset typed sequences. And if we do, make the greatest use of them in allowing them to differentiate in the subclass stage (hey, why would we do it anyway if we keep them similar). Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! 
http://www.mekentosj.com/labassistant ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2945 bytes Desc: not available URL: From mek at mekentosj.com Thu Dec 2 02:21:24 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 2 Dec 2004 08:21:24 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: Very nice summary John, this indeed the state of the union ;-) > Koen, who doesn't like the idea of sequence subclasses, wants to make > all > methods accept all sequences, even if they can't make any sense out of > it - > for example, a nucleotide sequence being sent to a hydrophobicity > calculator. Nonsense like this could be handled by throwing an > exception or > something similar. Again, this is the number one reason why I 'till now chose for typed subclasses, and still thinks it's a better option. > > Alex and I feel that it's better to help the developer recognize > potential > errors by having type requirements for these methods, where possible. > The > converse is that I can't see any possible benefit of allowing a > situation > that can throw an exception when it can be prevented easily. Koen may > be > able to explain benefits I can't see, since he is advocating this. He mentioned one, but it doesn't do it for me yet. > > One compromise would be to have the actual method implementation > (which is > in a non-sequence class) accept any type of sequence, but place the > convenience method that calls it in the specific sequence class(es) > that > should be used with the method. Again, I'd prefer otherwise, but if I > get > outvoted on that, please allow me this concession. It's better to choose, if we go for subclasses there's no advantage to do this, it even removes one as it lifts the type requirements! 
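To make the trade-off in this exchange concrete, here is a minimal sketch in plain C (which also compiles inside Objective-C sources). It contrasts a typed entry point, where handing over the wrong kind of sequence is diagnosed by the compiler (as a warning or an error, depending on compiler and flags), with an untyped one that checks a type tag at run time and returns a sentinel, analogous to returning nil. All names (`ProteinSeq`, `DNASeq`, `AnySeq`, `protein_hydrophobic_count`, `any_hydrophobic_count`) and the residue set are assumptions made up for illustration; none of them are BioCocoa API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for typed sequence subclasses; not BioCocoa types. */
typedef struct { const char *residues; } ProteinSeq;
typedef struct { const char *bases; } DNASeq;

/* Illustrative residue set only; not a real hydrophobicity scale. */
static const char kHydrophobic[] = "AILMFVW";

/* Shared worker: counts the characters of `symbols` found in kHydrophobic. */
static long count_hydrophobic(const char *symbols) {
    long n = 0;
    for (size_t i = 0; symbols[i] != '\0'; i++) {
        if (strchr(kHydrophobic, symbols[i]) != NULL)
            n++;
    }
    return n;
}

/* Typed entry point: passing a DNASeq * here is diagnosed at compile time. */
long protein_hydrophobic_count(const ProteinSeq *p) {
    assert(p != NULL && p->residues != NULL);
    return count_hydrophobic(p->residues);
}

/* Untyped alternative: one struct for every sequence, tagged with its type. */
typedef enum { SEQ_DNA, SEQ_RNA, SEQ_PROTEIN } SeqType;
typedef struct { SeqType type; const char *symbols; } AnySeq;

/* Run-time check: returns -1 for a non-protein (the analogue of nil). */
long any_hydrophobic_count(const AnySeq *s) {
    assert(s != NULL && s->symbols != NULL);
    if (s->type != SEQ_PROTEIN)
        return -1;
    return count_hydrophobic(s->symbols);
}
```

With the typed version the mistake never reaches a running program, provided the developer reads the compiler's diagnostics; with the untyped version every caller has to remember to test for the sentinel.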
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From jtimmer at bellatlantic.net Thu Dec 2 10:46:10 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 02 Dec 2004 10:46:10 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <4C5621BF-442E-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: >> > Works fine for a protein file, although I had to change the lines > > if ( aSymbol = nullSymbol ) > altCount++; > > to: > > if ( aSymbol == nullSymbol ) > altCount++; > > > to get it to work ;-) > Sorry about that - it's probably my most common stupid mistake. > > > BTW, would it be an idea to use BCSymbolCounter for this? > Yeah, but I had forgotten you moved it out of the BCSequence class. I couldn't find it right away (even though it's in the same directory - duh!), so I just copied and pasted some code from elsewhere. This code would be faster anyway, so I don't see the need to change it to the SymbolCounter - opening files should be as quick as possible. And from Alex: > > Very nice John, also very compact! Although a possibility to use the > loopcounter object, I wouldn't do it here as it spreads the code and makes it > more complicated to understand. But perhaps if the loopcounter is a shared > object and it would thus involve only one line of code (and no alloc inits), > it could still be a nice idea. I leave that up to you... John, just from > curiosity, how much faster is doing this the core foundation way instead of > cocoa. Okay, loopCounter isn't an object.
It's basically an integer type that will change according to however Apple defines an integer - your code will compile appropriately even if Apple decides to define integers as 128-bit vectors at some point in the future. The speedups I was seeing in this code compared to object enumerators were somewhere around 2-3X, depending on the context. Given that we have to loop through 3 sequences, this will probably be significant. > > > One other remark, I understand the optimization to only create the sequence > once, but I think we should in this case not replace the method, but add one. > Thus, > - (BCSequenceType *) guessSequenceTypeFromString: (NSString *) string; > - (BCSequence *) guessSequenceFromString: (NSString *) string; > Sometimes, you just want to know the type instead of getting the sequence > back. In the documentation we can point the reader to the fact that if he wants > to have the sequence, there's the other method (and thus prevent him from doing > double work). Fair enough - the type method can just call the sequence method, and return whatever type of sequence it is. It's a nice one-liner. > > > Now I'm in the nitpicking mode anyway, the method name guidelines suggest to > do something like: > > sequenceTypeForUntypedString: or sequenceTypeForUnknownString: > and > sequenceFromUntypedString: or sequenceFromUnknownString: > > The originals don't suggest that something is returned... > If neither of you does this before tonight, I'll check that in. > Cheers, JT > _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Dec 2 14:40:03 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 2 Dec 2004 20:40:03 +0100 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: References: Message-ID: > And from Alex: > > > Very nice John, also very compact!
Although a possibility to use the > loopcounter object, I wouldn't do it here as it spreads the code and > makes it more complicated to understand. But perhaps if the loopcounter > is a shared object and it would thus involve only one line of code > (and no alloc inits), it could still be a nice idea. I leave that up > to you... John, just from curiosity, how much faster is doing this the > core foundation way instead of cocoa. > > Okay, loopCounter isn't an object. It's basically an integer type > that will change according to however Apple defines an integer - your > code will compile appropriately even if Apple decides to define > integers as 128-bit vectors at some point in the future. The speedups > I was seeing in this code compared to object enumerators were somewhere > around 2-3X, depending on the context. Given that we have to loop > through 3 sequences, this will probably be significant. >
Oh, sorry, I meant Koen's suggestion to use the BCSymbolCounter, not the CFIndex loopcounter. The last line (just from curiosity...) is an unrelated question about the CF usage. I fully understand that an enumerator is slower than looping through an array by hand; my question, however, is whether:

    CFIndex loopCounter;
    for ( loopCounter = 0 ; loopCounter < aLimit ; loopCounter++ ) {
        aSymbol = (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, loopCounter);
        if ( aSymbol == nullSymbol )
            altCount++;
    }

would really be faster than the Cocoa equivalent:

    int loopCounter;
    for ( loopCounter = 0 ; loopCounter < aLimit ; loopCounter++ ) {
        aSymbol = [theContents objectAtIndex: loopCounter];
        if ( aSymbol == nullSymbol )
            altCount++;
    }

I have my doubts.... (and it definitely looks better!)
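For readers who want the surrounding logic in one place: the loop above counts symbols that fall outside a candidate symbol set, and the rule stated earlier in the thread is to pick the type with the fewest undefined symbols, breaking ties as DNA > RNA > protein. The following is a minimal sketch of that rule in plain C (also valid inside Objective-C sources), not the BCSequenceFactory code. The names `count_undefined` and `guess_sequence_type`, the strict upper-case alphabets, and the use of C strings instead of BCSymbol arrays are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Order matters: it encodes the tie-break DNA > RNA > protein. */
typedef enum { GUESS_DNA, GUESS_RNA, GUESS_PROTEIN } GuessedType;

/* Count the characters of `s` that are not part of `alphabet`. */
size_t count_undefined(const char *s, const char *alphabet) {
    assert(s != NULL && alphabet != NULL);
    size_t undefined = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (strchr(alphabet, s[i]) == NULL)  /* comparison (==), not assignment (=) */
            undefined++;
    }
    return undefined;
}

/* Try every candidate alphabet and keep the one with the fewest misses. */
GuessedType guess_sequence_type(const char *s) {
    /* Assumed strict, upper-case alphabets; ambiguity codes are left out. */
    static const char *alphabets[] = {
        "ACGT",                  /* GUESS_DNA */
        "ACGU",                  /* GUESS_RNA */
        "ACDEFGHIKLMNPQRSTVWY"   /* GUESS_PROTEIN */
    };
    GuessedType best = GUESS_DNA;
    size_t best_count = count_undefined(s, alphabets[GUESS_DNA]);
    for (int t = GUESS_RNA; t <= GUESS_PROTEIN; t++) {
        size_t c = count_undefined(s, alphabets[t]);
        if (c < best_count) {    /* strict '<' keeps the earlier type on a tie */
            best = (GuessedType)t;
            best_count = c;
        }
    }
    return best;
}
```

Note that a string such as "ACGTACGT" is also a legal protein under these alphabets (Ala, Cys, Gly, Thr), so it is the tie-break, not the count, that makes it come out as DNA.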
Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From jtimmer at bellatlantic.net Thu Dec 2 14:58:30 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 02 Dec 2004 14:58:30 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: Message-ID: > > The last line (just from curiosity...) is a non-related question about the CF > usage. I fully understand that an enumerator is slower than looping through an > array by hand, my question is however would: > CFIndex loopCounter; > for ( loopCounter = 0 ; loopCounter < aLimit ; loopCounter++ ) { > aSymbol = (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) > theContents, loopCounter); > if ( aSymbol == nullSymbol ) > altCount++; > } > > be really faster than the cocoa equivalent: > int loopCounter; > for ( loopCounter = 0 ; loopCounter < aLimit ; loopCounter++ ) { > aSymbol = [theContents objectAtIndex: loopCounter]; > if ( aSymbol == nullSymbol ) > altCount++; > } > > I have my doubts.... (and it definitely looks better!) > Cheers, > Alex When I did code profiling, the number of ObjC runtime calls went down, without a corresponding increase in CoreFoundation. My assumption is that this would lower the total time a bit, but I've not gone back and confirmed that. This looks like a great chance to test that, because I agree that the readability of the latter is much better.
Maybe I'll have info on that by tomorrow.... JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Thu Dec 2 17:35:14 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 17:35:14 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: References: Message-ID: <6EFAA366-44B2-11D9-9428-003065A5FDCC@earthlink.net> On Dec 2, 2004, at 2:58 PM, John Timmer wrote: > When I did code profiling, the number of ObjC runtime calls went > down, without a corresponding increase in CoreFoundation. My > assumption is that this would lower the total time a bit, but I've not > gone back and confirmed that. This looks like a great chance to test > that, because I agree that the readability of the latter is much > better. > We can always make a oneliner: aSymbol = [theSequence symbolAtIndex: loopCounter]; that calls a convenience method: (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, loopCounter) I guess this can go into BCSequence. - Koen. From mek at mekentosj.com Thu Dec 2 17:42:03 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 2 Dec 2004 23:42:03 +0100 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <6EFAA366-44B2-11D9-9428-003065A5FDCC@earthlink.net> References: <6EFAA366-44B2-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: <62D9C207-44B3-11D9-9436-000D93AE89A4@mekentosj.com> Yes, very nice! (that is, only if the corefoundation way of doing things offers a real speed benefit in this case)... Alex On 2-Dec-04 at 23:35, Koen van der Drift wrote: > > On Dec 2, 2004, at 2:58 PM, John Timmer wrote: > >> When I did code profiling, the number of ObjC runtime calls went >> down, without a corresponding increase in CoreFoundation. My >> assumption is that this would lower the total time a bit, but I've >> not gone back and confirmed that.
This looks like a great chance to >> test that, because I agree that the readability of the latter is much >> better. >> > > We can always make a oneliner: > > aSymbol = [theSequence symbolAtIndex: loopCounter]; > > that calls a convenience method: > > > (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, > loopCounter) > > > I guess this can go into BCSequence. > > > - Koen. > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From kvddrift at earthlink.net Thu Dec 2 17:56:26 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 17:56:26 -0500 Subject: [Biococoa-dev] bug alert Message-ID: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> Hi, After all the recent additions, the translation demo (just updated in cvs) crashes after clicking the process button. Haven't had time yet to debug it, so if you feel like figuring it out, please go ahead. I'll have more time later tonight to see if I can fix it. - Koen.
From kvddrift at earthlink.net Thu Dec 2 17:57:15 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 17:57:15 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <62D9C207-44B3-11D9-9436-000D93AE89A4@mekentosj.com> References: <6EFAA366-44B2-11D9-9428-003065A5FDCC@earthlink.net> <62D9C207-44B3-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: <82AC0848-44B5-11D9-9428-003065A5FDCC@earthlink.net> On Dec 2, 2004, at 5:42 PM, Alexander Griekspoor wrote: > Yes, very nice! (that is, only if the corefoundation way of doing > things offers a real speed benefit in this case)... > Well, there is no harm in adding it. Just don't use it if it's not faster ;-) - Koen. From jtimmer at bellatlantic.net Thu Dec 2 17:59:13 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 02 Dec 2004 17:59:13 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <62D9C207-44B3-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: >>> When I did code profiling, the number of ObjC runtime calls went >>> down, without a corresponding increase in CoreFoundation. My >>> assumption is that this would lower the total time a bit, but I've >>> not gone back and confirmed that. This looks like a great chance to >>> test that, because I agree that the readability of the latter is much >>> better. >>> >> >> We can always make a oneliner: >> >> aSymbol = [theSequence symbolAtIndex: loopCounter]; >> >> that calls a convenience method: >> >> >> (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, >> loopCounter) >> >> >> I guess this can go into BCSequence. Yes, but then you get back into calling through the ObjC runtime, which is where things bogged down. Since you'd go through BCSequence, then into NSArray, it would be even worse than just calling it on the array. Not that the method would be a bad one, just that it would have a negative speed impact.
JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Thu Dec 2 18:24:26 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 00:24:26 +0100 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <82AC0848-44B5-11D9-9428-003065A5FDCC@earthlink.net> References: <6EFAA366-44B2-11D9-9428-003065A5FDCC@earthlink.net> <62D9C207-44B3-11D9-9436-000D93AE89A4@mekentosj.com> <82AC0848-44B5-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: <4EF124AA-44B9-11D9-9436-000D93AE89A4@mekentosj.com> On 2-Dec-04 at 23:57, Koen van der Drift wrote: > > On Dec 2, 2004, at 5:42 PM, Alexander Griekspoor wrote: > >> Yes, very nice! (that is, only if the corefoundation way of doing >> things offers a real speed benefit in this case)... >> > > > > Well, there is no harm in adding it. Just don't use it if it's not faster > ;-) > Oh sorry Koen, I didn't mean the method itself, that is fine, but I meant the use of: (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, loopCounter) instead of: [theContents objectAtIndex: loopCounter]; But I guess John will profile the difference so that we know... Alex (ps. an additional convenience Objective-C method will always make it slower of course, but the presence of such methods is nice and convenient). ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh.
********************************************************* From kvddrift at earthlink.net Thu Dec 2 20:43:24 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 20:43:24 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: On Dec 2, 2004, at 5:56 PM, Koen van der Drift wrote: > After all the recent additions, the translation demo (just updated in > cvs) crashes after clicking the process button. Haven't had time yet > to debug it, so if you feel like figuring it out, please go ahead. > I'll have more time later tonight to see if I can fix it. >
I think I found it. The console was showing hundreds of these lines after the crash:

    [BCCodonRNA release]

so I commented out these lines in BCCodonRNA and BCCodonDNA:

    - (void) dealloc {
        [super release];
    }

Actually, they are not needed at all, because they don't add any code. No more crashes after that. I have a few questions, though:

1. Is there any reason why this code should be in there? If there is, the error might be somewhere else, and my fix was not in the right place.
2. I also find these lines strange:

    - (BCCodon *) init {
        return nil;
    }

   Any reason for that?
3. Why didn't this happen before? The translation demo has been used for a while now.

cheers, - Koen.
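One possible reading of the repeated `[BCCodonRNA release]` lines, offered as an assumption rather than a confirmed diagnosis: under manual reference counting, the last `release` triggers `dealloc`, so a `dealloc` that sends `release` again (where `[super dealloc]` is the usual call) can re-enter the same path over and over. The sketch below models that in plain C (also valid inside Objective-C sources). `Obj`, `obj_release`, `buggy_dealloc`, `correct_dealloc` and the `MAX_DEPTH` cap are hypothetical names for this toy model, not Cocoa API, and the model does not claim to reproduce NSObject's actual bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 100 };  /* safety cap so the buggy path terminates here */

typedef struct Obj Obj;
struct Obj {
    int extra_retains;            /* owners beyond the first one */
    int dealloc_calls;            /* how many times dealloc has run */
    void (*dealloc)(Obj *self);   /* the "overridden" dealloc */
};

void obj_release(Obj *self);

/* Base-class teardown: the analogue of [super dealloc]. */
void base_dealloc(Obj *self) {
    self->dealloc_calls++;
}

/* Conventional override: do local cleanup, then chain to the base class. */
void correct_dealloc(Obj *self) {
    base_dealloc(self);
}

/* Buggy override: the analogue of calling [super release] inside dealloc. */
void buggy_dealloc(Obj *self) {
    self->dealloc_calls++;
    if (self->dealloc_calls < MAX_DEPTH)  /* without the cap: unbounded recursion */
        obj_release(self);
}

/* Drop one owner; when none are left, run dealloc. */
void obj_release(Obj *self) {
    assert(self != NULL && self->dealloc != NULL);
    if (self->extra_retains > 0) {
        self->extra_retains--;
        return;
    }
    self->dealloc(self);
}

/* Release a fresh single-owner object and report how often dealloc ran. */
int count_deallocs(void (*dealloc)(Obj *)) {
    Obj o = { 0, 0, dealloc };
    obj_release(&o);
    return o.dealloc_calls;
}
```

In this model the conventional override runs teardown exactly once, while the buggy one keeps re-entering until the artificial cap stops it, which would be consistent with a console full of identical release lines followed by a crash.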
From kvddrift at earthlink.net Thu Dec 2 20:56:17 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 20:56:17 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> On Dec 2, 2004, at 2:21 AM, Alexander Griekspoor wrote: > Very nice summary John, this indeed the state of the union ;-) > >> Koen, who doesn't like the idea of sequence subclasses, wants to make >> all >> methods accept all sequences, even if they can't make any sense out >> of it - >> for example, a nucleotide sequence being sent to a hydrophobicity >> calculator. Nonsense like this could be handled by throwing an >> exception or >> something similar. > > Again, this is the number one reason why I 'till now chose for typed > subclasses, and still thinks it's a better option. >> >> Alex and I feel that it's better to help the developer recognize >> potential >> errors by having type requirements for these methods, where possible. >> The >> converse is that I can't see any possible benefit of allowing a >> situation >> that can throw an exception when it can be prevented easily. Koen >> may be >> able to explain benefits I can't see, since he is advocating this. > > He mentioned one, but it doesn't do it for me yet. >> >> One compromise would be to have the actual method implementation >> (which is >> in a non-sequence class) accept any type of sequence, but place the >> convenience method that calls it in the specific sequence class(es) >> that >> should be used with the method. Again, I'd prefer otherwise, but if >> I get >> outvoted on that, please allow me this concession. > > It's better to choose, if we go for subclasses there's no advantage to > do this, it even removes one as it lifts the type requirements! > Well let me add some points here. 
Although I never liked the idea of subclassing BCSequence, I think you guys are right that if we use one-liners it is better to call them from the appropriate subclass. But I still like the idea of have the wrapper test the sequence type first before continuing. It might be nonsense to do that, but the result won't be - because there are no results. Just returning nil will be sufficient, no need to start throwing exceptions around ;) My main reason for being against the subclassing was because at the time that I brought it up there was so much code-duplication, that I was wondering if there was a way to make it easier to maintain and understand. We didn't talk much about one-liners and convenience methods then, which is indeed a reasonable argument pro subclassing, However, I don't like the idea that was suggested in another recent mail, to also make subclasses for DNAStrict, proteinstrict, etc. These are only variations on the existing three subclasses, and should be differentiated from each other by their symbolset and/or sequencetype. cheers, - Koen. From kvddrift at earthlink.net Thu Dec 2 21:05:08 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 21:05:08 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com> References: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net> <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: On Dec 2, 2004, at 2:02 AM, Alexander Griekspoor wrote: > > The basic setup is something we all agree on I think, it would be nice > indeed if all classes had the same BCSequence prefix, although not > easy ;-) I like BCSequenceBundle, but I'm not sure if wrapper is > appropriate. I know we consider it a wrapper object, but I would like > to keep it closer to sequences, I thought of BCAnnotatedSequence, but > it would loose the common prefix. Still, better than > BCSequenceWrapper. 
Anyway, the (semi-)native englishmen on this list > can jump in here ;-). Hey, I'm semi-native american! > Wait, would it be an idea to make it: > BCAnnotatedSequenceBundle - a bundle of BCAnnotatedSequences (in an > array) > > BCAnnotatedSequence - contains a dictionary of BCFeatures (so no > BCFeatures object, because that has no advantage over a dictionary), a > BCSequence, a dictionary of BCAnnotation I like the BCAnnotatedSequence name. Although it kinda leaves out the features. BTW, I'm still confused about what is the difference between a feature and an annotation. I think we should define very well what we want to store in either one. And if the BCFeature and BCAnnotation aren't objects, what do you intend them to be? Should we also make a special class BCAnnotatedSequenceReader that takes care of extracting as much as info as possible from a file? Or should we do everything with BCSequenceReader.? >> Actually, I think that is a good point. But we have to be aware of >> the fact that a developer can have all compiler warnings turned off. > Then it's a stupid developer IMHO.... Yep - and we all know there's plenty of them ;-) cheers, - Koen. From jtimmer at bellatlantic.net Thu Dec 2 21:52:00 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 02 Dec 2004 21:52:00 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: Okay, I'm trying to compile things so that I can do profiling using the translation demo. The framework compiles without any errors, but when I try to actually use it in the demo app, it errors out, saying the BCSequenceFactory is having a problem with the line: #import "../../BCFoundationDefines.h" Clearly, this works for the others files that include the defines using a similar path, so I have no idea what to make of this. Incidentally, I made the SequenceFactory's header public within the framework, and added: #import I put it under BCTools to BCFoundation.h. 
Any and all suggestions welcome, though the profiling may be a bit delayed. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Thu Dec 2 22:01:10 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 2 Dec 2004 22:01:10 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: References: Message-ID: <95FE12FE-44D7-11D9-9428-003065A5FDCC@earthlink.net> On Dec 2, 2004, at 9:52 PM, John Timmer wrote: > Okay, I'm trying to compile things so that I can do profiling using the > translation demo. The framework compiles without any errors, but when > I try > to actually use it in the demo app, it errors out, saying the > BCSequenceFactory is having a problem with the line: > > #import "../../BCFoundationDefines.h" > > Clearly, this works for the others files that include the defines > using a > similar path, so I have no idea what to make of this. > > Incidentally, I made the SequenceFactory's header public within the > framework, and added: > > #import > > I put it under BCTools to BCFoundation.h. Any and all suggestions > welcome, > though the profiling may be a bit delayed. > John, I had the exact same problem yesterday. Did a clean target and rebuild, but still the same problem. Then I quit Xcode, and reopened the project, and it worked.... Very weird! I still don't know exactly what to do when I add a new file. I add and commit the files to cvs using Xcode, but I also have to make the project aware of this. What I do now is use the terminal and update the project.pbxproj file. Is that enough? BTW I also have these in my project: drwxr-xr-x 12 koen koen 408 2 Dec 21:51 . -rwxr-xr-x 1 koen koen 51861 26 Sep 17:23 .#project.pbxproj.1.26 drwxr-xr-x 16 koen koen 544 24 Nov 23:43 .. 
-rw-r--r-- 1 koen koen 6148 11 Nov 15:31 .DS_Store drwxr-xr-x 5 koen koen 170 1 Dec 21:13 CVS -rw-r--r-- 1 koen koen 153334 21 Sep 20:05 drjay.pbxuser -rw-r--r-- 1 koen koen 43071 12 Sep 10:20 griek.mode1 -rw-r--r-- 1 koen koen 95110 26 Sep 04:47 griek.pbxuser -rw-r--r-- 1 koen koen 42575 2 Dec 21:51 koen.mode1 -rw-r--r-- 1 koen koen 95142 2 Dec 21:51 koen.pbxuser -rwxr-xr-x 1 koen koen 26879 18 May 2004 mac.pbxuser -rwxr-xr-x 1 koen koen 59177 2 Dec 21:51 project.pbxproj Should I have all these? - Koen. From jtimmer at bellatlantic.net Thu Dec 2 23:06:22 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 02 Dec 2004 23:06:22 -0500 Subject: [Biococoa-dev] Profiling results In-Reply-To: <95FE12FE-44D7-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: > I had the exact same problem yesterday. Did a clean target and rebuild, > but still the same problem. Then I quit Xcode, and reopened the > project, and it worked.... Very weird! > That did the trick, thanks - should have thought of it myself.
Anyway, with the original code and an 11kb sequence, the following loop:

    NSAutoreleasePool *thePool;
    for ( loopCounter = 0 ; loopCounter < 100 ; loopCounter++ ) {
        thePool = [[NSAutoreleasePool alloc] init];
        theSequence = [theFactory createSequenceWithString: inputString usingType: BCOtherSequence];
        [thePool release];
    }

clocked in with the following over three repeats:

    2004-12-02 22:50:20.552 Translation[14076] time after translation: -8.624725
    2004-12-02 22:50:30.823 Translation[14076] time after translation: -8.609338
    2004-12-02 22:50:41.563 Translation[14076] time after translation: -8.681358

Changing the part of BCSequenceFactory in question as follows:

    //aSymbol = (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) theContents, loopCounter);
    aSymbol = [theContents objectAtIndex: loopCounter];

clocked in as:

    2004-12-02 22:55:25.098 Translation[14553] time after translation: -9.005473
    2004-12-02 22:55:34.217 Translation[14553] time after translation: -9.083258
    2004-12-02 22:55:46.097 Translation[14553] time after translation: -9.020712

So, it's slower, but not much slower (about 9.0 versus 8.6 seconds for the 100 iterations, roughly 5%). In contrast, it is MUCH MUCH easier to read. So, in the future, I think it's safe to skip the CoreFoundation call unless you're doing multiple object lookups within a single loop. I do, however, now have a DP machine at work, so I'll try it there and see if threading is an issue for either of them (my bet would be no).

> > drwxr-xr-x 12 koen koen 408 2 Dec 21:51 . > -rwxr-xr-x 1 koen koen 51861 26 Sep 17:23 .#project.pbxproj.1.26 > drwxr-xr-x 16 koen koen 544 24 Nov 23:43 ..
> -rw-r--r-- 1 koen koen 6148 11 Nov 15:31 .DS_Store > drwxr-xr-x 5 koen koen 170 1 Dec 21:13 CVS > -rw-r--r-- 1 koen koen 153334 21 Sep 20:05 drjay.pbxuser > -rw-r--r-- 1 koen koen 43071 12 Sep 10:20 griek.mode1 > -rw-r--r-- 1 koen koen 95110 26 Sep 04:47 griek.pbxuser > -rw-r--r-- 1 koen koen 42575 2 Dec 21:51 koen.mode1 > -rw-r--r-- 1 koen koen 95142 2 Dec 21:51 koen.pbxuser > -rwxr-xr-x 1 koen koen 26879 18 May 2004 mac.pbxuser > -rwxr-xr-x 1 koen koen 59177 2 Dec 21:51 project.pbxproj Yeah, that's a bunch of our settings - build styles, window locations, etc. If you're curious as to what it looks like, simply control-click on it and view it as a text file. Cheers, JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Dec 3 04:21:28 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 10:21:28 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net> <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: >> The basic setup is something we all agree on I think, it would be >> nice indeed if all classes had the same BCSequence prefix, although >> not easy ;-) I like BCSequenceBundle, but I'm not sure if wrapper is >> appropriate. I know we consider it a wrapper object, but I would like >> to keep it closer to sequences, I thought of BCAnnotatedSequence, but >> it would loose the common prefix. Still, better than >> BCSequenceWrapper. Anyway, the (semi-)native englishmen on this list >> can jump in here ;-). > > Hey, I'm semi-native american! Guess who I had in mind when adding the "semi" part ;-) A. 
********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From mek at mekentosj.com Fri Dec 3 11:47:33 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 17:47:33 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> > Well let me add some points here. Although I never liked the idea of > subclassing BCSequence, I think you guys are right that if we use > one-liners it is better to call them from the appropriate subclass. > But I still like the idea of have the wrapper test the sequence type > first before continuing. It might be nonsense to do that, but the > result won't be - because there are no results. Just returning nil > will be sufficient, no need to start throwing exceptions around ;) True, the question is how we organize the wrapper (you mean the BCAnnotatedSequence right, or whatever we decide to name it). There are basically two choices either let the developer separate the sequence from the annotations/features part and do all manipulations purely on the BCSequence, or make all methods accept besides BCSequences also the BCAnnotatedSequences. This latter his some clear advantages (such as the possibility to take features into account while calculating the MW for example), but it also has some clear problems. 
One thing it would mean is that we are almost forced to have also three types of BCAnnotatedSequence subclasses around (Koen might remark the benefit of a single BCSequence class here probably). In general the first option is what we want I think, BUT I haven't thought of a way to do the features that are connected to ranges within the sequence. Notes and annotations like creator, date etc are easy, they don't change (and are what I would call a BCAnnotation). Features (BCFeature objects) are much more of a problem, they are coupled to sequence ranges (i.e. a helix from aminoacid 10 to 15), and should be kept in sync while editing the sequence. The big problem here is, what architecture would be the smartest way of doing this. Any suggestions? > > My main reason for being against the subclassing was because at the > time that I brought it up there was so much code-duplication, that I > was wondering if there was a way to make it easier to maintain and > understand. We didn't talk much about one-liners and convenience > methods then, which is indeed a reasonable argument pro subclassing, > However, I don't like the idea that was suggested in another recent > mail, to also make subclasses for DNAStrict, proteinstrict, etc. I copy that, definitely not, but the general BCSequence class could have a simple strict boolean that can be set. Also, we can introduce the strict BCSequenceTypes for passing as arguments... > These are only variations on the existing three subclasses, and should > be differentiated from each other by their symbolset and/or > sequencetype. 
Yep Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From mek at mekentosj.com Fri Dec 3 12:03:08 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 18:03:08 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net> <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: <348AEFF8-454D-11D9-A550-000D93AE89A4@mekentosj.com> > >> Wait, would it be an idea to make it: > >> BCAnnotatedSequenceBundle - a bundle of BCAnnotatedSequences (in an >> array) >> >> BCAnnotatedSequence - contains a dictionary of BCFeatures (so no >> BCFeatures object, because that has no advantage over a dictionary), >> a BCSequence, a dictionary of BCAnnotation > > I like the BCAnnotatedSequence name. Although it kinda leaves out the > features. True, but in some way also in that case I like to think as the sequence being annotated... > > BTW, I'm still confused about what is the difference between a feature > and an annotation. I think we should define very well what we want to > store in either one. And if the BCFeature and BCAnnotation aren't > objects, what do you intend them to be? Oh sorry, in the previous email you wrote: > > BCSequenceWrapper > | > -> has-a BCSequence, BCFeatures, etc > > > BCFeatures > | > -> has-a dictionary of features (d'oh) ..so I thought you wanted to create a BCFeatureS class, which would have a dictionary of features (BCFeature?). 
But I guess you meant the same thing as I do: A BCSequenceWrapper (object) has a dictionary containing multiple BCFeature objects. As for the distinction between features and annotations: it's pretty simple, features are linked to certain symbol ranges (phosphorylation at 12, helix from 15-18), annotations are like metadata, author = koen, link to article = ... , creation date.. etc I think that both should be tagged by string identifiers (like BCFeatureTypeAlphaHelix and BCAnnotationTypeAuthor), and we can predefine a complete set of them to be sure that things like author, date etc are standard items. The main advantage is that a developer can add his own tags, for instance to save a marked region in a sequence: (AGFeatureTypeMyMarkedRegion). We will make sure that these get written in and out of files for him, so he doesn't have to invent his own filetype. Note that it should be no problem to add multiple instances of BCFeatures with the same tag (like authors). > > Should we also make a special class BCAnnotatedSequenceReader that > takes care of extracting as much as info as possible from a file? Or > should we do everything with BCSequenceReader.? Good question, I don't know, there should definitely be the option to either read in the features and annotations or not. It can go into one class, but it's perhaps nicer to have two classes. Then again, it would require a lot of code duplication which we don't want. We could make the one a subclass of the other, the annotated one takes off where the super class stops. No idea... >>> Actually, I think that is a good point. But we have to be aware of >>> the fact that a developer can have all compiler warnings turned off. >> Then it's a stupid developer IMHO....
> > Yep - and we all know there's plenty of them ;-) Very true (I tend to be one sometimes), but they tend to turn the warnings back on when things don't work ;-) Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From mek at mekentosj.com Fri Dec 3 12:06:57 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 18:06:57 +0100 Subject: [Biococoa-dev] Profiling results In-Reply-To: References: Message-ID: John, > Clocked in with the following with three repeats > 2004-12-02 22:50:20.552 Translation[14076] time after translation: > -8.624725 > 2004-12-02 22:50:30.823 Translation[14076] time after translation: > -8.609338 > 2004-12-02 22:50:41.563 Translation[14076] time after translation: > -8.681358 > > Changing the part of BCSequenceFactory in question as follows: > //aSymbol = (BCSymbol *)CFArrayGetValueAtIndex( (CFArrayRef) > theContents, > loopCounter); > aSymbol = [theContents objectAtIndex: loopCounter]; > > Clocked in as: > 2004-12-02 22:55:25.098 Translation[14553] time after translation: > -9.005473 > 2004-12-02 22:55:34.217 Translation[14553] time after translation: > -9.083258 > 2004-12-02 22:55:46.097 Translation[14553] time after translation: > -9.020712 > > So, it's slower, but not much slower. In contrast, it is MUCH MUCH > easier > to read. So, in the future, I think it's safe to skip this unless > you're > doing multiple object lookups within a single loop. Definitely! 
A lot of Cocoa stuff (like NSArrays and CFArrays) is toll-free bridged, and I guess that's exactly what happens under the hood, so I already thought that this wouldn't speed things up so much. At the time the reason was really to get rid of the enumerator. Let's indeed do it the readable way and resort to Core Foundation only when we can get rid of a lot of object messaging (in terms of: really a lot..). Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From kvddrift at earthlink.net Fri Dec 3 15:30:37 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 15:30:37 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <348AEFF8-454D-11D9-A550-000D93AE89A4@mekentosj.com> References: <3A06BF2C-43FF-11D9-9428-003065A5FDCC@earthlink.net> <10FA04AE-4430-11D9-9436-000D93AE89A4@mekentosj.com> <348AEFF8-454D-11D9-A550-000D93AE89A4@mekentosj.com> Message-ID: <30FD35AE-456A-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 12:03 PM, Alexander Griekspoor wrote: >> I like the BCAnnotatedSequence name. Although it kinda leaves out the >> features. > True, but in some way also in that case I like to think as the > sequence being annotated... >> Yes, that is true. >> BTW, I'm still confused about what is the difference between a >> feature and an annotation. I think we should define very well what we >> want to store in either one. And if the BCFeature and BCAnnotation >> aren't objects, what do you intend them to be?
> Oh sorry, in the previous email you wrote: > >> >> BCSequenceWrapper >> | >> -> has-a BCSequence, BCFeatures, etc >> >> >> BCFeatures >> | >> -> has-a dictionary of features (d'oh) > > ..so I thought you wanted to create a BCFeatureS class, which would > have a dictionary of features (BCFeature?). I don't know - I thought that was similar to what John proposed last week. I really have no idea what the structure should be. It was just an example to get the discussion going. > But I guess you meant the same thing as I do: A BCSequenceWrapper > (object) has a dictionary containing multiple BCFeature objects. > Yes. Although we probably will use BCAnnotatedSequence, which is-a BCSequence instead of has-a (see my other email). > As for the distinction between features and annotations: it's pretty > simple, features are linked to certain symbol ranges (phoshorylation > at 12, helix from 15-18), annotations are like metadata, author = > koen, link to article = ... , creation date.. etc Yes, that makes sense. > > I think that both should be tagged by string identifiers (like > BCFeatureTypeAlphaHelix and BCAnnotationTypeAuthor), and we can > predefine a complete set of them to be sure that things like author, > date etc are standard items. The main advantage is that a developer > can add it's own tags to for instance save a marked region in a > sequence: (AGFeatureTypeMyMarkedRegion). We will make sure that these > get written in and out of files for him, so he doesn't have to invent > his own filetype. > Note that it should be no problem to add multiple instances of > BCFeatures with same tag (like authors). Sounds like a good plan. Note that this will not work with a plain dictionary, because it can hold only one value per key. BTW, I'm pretty sure that there is an XML schema out there that tries to unify these things for various data formats. We could consider adapting such a thing (if it exists).
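On the one-value-per-key limitation: one way around it, sketched here under assumptions, is to let each tag map to an array of ranges, so several features with the same tag can coexist. The function name `SketchAddFeature` and the choice to store bare NSRange values (wrapped in NSValue) are hypothetical; nothing in this thread fixes how BCFeature objects are actually stored.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not BioCocoa API: records one feature range under a tag.
// Each tag maps to an NSMutableArray, so the same tag can be added repeatedly.
static void SketchAddFeature(NSMutableDictionary *featuresByTag,
                             NSString *tag,
                             NSRange range)
{
    NSMutableArray *ranges = [featuresByTag objectForKey:tag];
    if (ranges == nil) {
        // First feature with this tag: create the bucket.
        ranges = [NSMutableArray array];
        [featuresByTag setObject:ranges forKey:tag];
    }
    // NSRange is a C struct, so it is boxed in an NSValue for storage.
    [ranges addObject:[NSValue valueWithRange:range]];
}
```

Adding two helices under the same tag then leaves a single key whose array holds both ranges, which a flat tag-to-feature dictionary could not represent.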
> >> >> Should we also make a special class BCAnnotatedSequenceReader that >> takes care of extracting as much as info as possible from a file? Or >> should we do everything with BCSequenceReader.? > > Good question, I don't know, there should definitely be the option to > either read in the features and annotations or not. It can go into one > class, but it's perhaps nicer to have to classes. Then again, it would > require a lot of code duplication which we don't want. We could make > the one a subclass of the other, the annotated one takes off where the > super class stops. No idea... Food for thought. cheers, - Koen. From kvddrift at earthlink.net Fri Dec 3 15:23:33 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 15:23:33 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> Message-ID: <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 11:47 AM, Alexander Griekspoor wrote: >> Well let me add some points here. Although I never liked the idea of >> subclassing BCSequence, I think you guys are right that if we use >> one-liners it is better to call them from the appropriate subclass. >> But I still like the idea of have the wrapper test the sequence type >> first before continuing. It might be nonsense to do that, but the >> result won't be - because there are no results. Just returning nil >> will be sufficient, no need to start throwing exceptions around ;) > > True, the question is how we organize the wrapper (you mean the > BCAnnotatedSequence right, or whatever we decide to name it). No, I meant the general wrappers that do something with a sequence (translate, pI, search, etc). 
> There are basically two choices either let the developer separate the > sequence from the annotations/features part and do all manipulations > purely on the BCSequence, or make all methods accept besides > BCSequences also the BCAnnotatedSequences. This latter his some clear > advantages (such as the possibility to take features into account > while calculating the MW for example), but it also has some clear > problems. One thing it would mean is that we are almost forced to have > also three types of BCAnnotatedSequence subclasses around (Koen might > remark the benefit of a single BCSequence class here probably). LOL - actually, yes I would ;). But I would suggest the following. BCSequence *only* takes care of managing the symbol list, more or less like the SymbolList class they have in BioJava. Then we have BCAnnotatedSequence as a subclass of BCSequence. So now we have a symbollist + all the additional info that makes it a real molecule. Then, only for convenience, we subclass BCAnnotatedSequence to BCSequenceDNA, BCSequenceProtein, etc. > Notes and annotations like creator, date etc are easy, they don't > change (and are what I would call a BCAnnotation). Features (BCFeature > objects) are much more of a problem, they are coupled to sequence > ranges (i.e. a helix from aminoacid 10 to 15), and should be kept in > sync while editing the sequence. The big problem here is, what > architecture would be the smartest way of doing this. Any suggestions? The BioPerl docs I mentioned recently use a separate Location object. I need to look more closely at it, to see how useful it is. One thing we have to watch for is that features need to have a 1-based numbering, not 0-based as we have so far. One possibility could be to couple features with individual BCSymbols. So we tell a BCSymbol that a feature XX starts there. However, what happens if in the example you mentioned above (helix from aminoacid 10 to 15), the user edits the sequence and removes AA 8-12?
Then the startpoint of the feature is gone. So, I guess that might not be a good solution, although this problem (if any) will also manifest itself with other solutions. >> >> However, I don't like the idea that was suggested in another recent >> mail, to also make subclasses for DNAStrict, proteinstrict, etc. > I copy that, definitely not, but the general BCSequence class could > have a simple strict boolean that can be set. For what? > Also, we can introduce the strict BCSequenceTypes for passing as > arguments... Sounds good. BTW, what's the difference between 'strict', 'skippingnonbases' and 'unambiguous' ? - Koen. From kvddrift at earthlink.net Fri Dec 3 15:56:12 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 15:56:12 -0500 Subject: [Biococoa-dev] first non-whitespace character In-Reply-To: <467D2BFC-4430-11D9-9436-000D93AE89A4@mekentosj.com> References: <31244BE5-419A-11D9-9F46-003065A5FDCC@earthlink.net> <5A8A46E8-4208-11D9-A769-000D93AE89A4@mekentosj.com> <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net> <467D2BFC-4430-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: On Dec 2, 2004, at 2:03 AM, Alexander Griekspoor wrote: >> We could have BCSymbolSet return an NSCharacterSet in such a case. > This would be in general a great addition Koen, to have a > -characterSetRepresentation method which would return the > NSCharacterSet equivalent of the symbol set... > Added :) - Koen.
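For readers following the -characterSetRepresentation idea, a minimal sketch of what such a conversion could look like, assuming a symbol set can list its symbols as one-character strings. Both helper names and the array-of-strings input are assumptions made for illustration; the actual BCSymbolSet implementation is not shown in this thread.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds an NSCharacterSet from one-character symbol
// strings such as @"A", @"C", @"G", @"T".
static NSCharacterSet *SketchCharacterSetFromSymbolStrings(NSArray *symbolStrings)
{
    NSString *allSymbols = [symbolStrings componentsJoinedByString:@""];
    return [NSCharacterSet characterSetWithCharactersInString:allSymbols];
}

// Returns the index of the first character that belongs to the symbol set,
// or NSNotFound when the string contains no such character.
static NSUInteger SketchIndexOfFirstSymbol(NSString *string, NSCharacterSet *symbols)
{
    return [string rangeOfCharacterFromSet:symbols].location;
}
```

With a character set in hand, skipping leading whitespace or junk in a file reduces to a single `rangeOfCharacterFromSet:` call instead of a hand-written loop over symbols.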
From mek at mekentosj.com Fri Dec 3 16:27:23 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 22:27:23 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> Koen, On 3 Dec 2004, at 21:23, Koen van der Drift wrote: > > On Dec 3, 2004, at 11:47 AM, Alexander Griekspoor wrote: > >>> Well let me add some points here. Although I never liked the idea of >>> subclassing BCSequence, I think you guys are right that if we use >>> one-liners it is better to call them from the appropriate subclass. >>> But I still like the idea of have the wrapper test the sequence type >>> first before continuing. It might be nonsense to do that, but the >>> result won't be - because there are no results. Just returning nil >>> will be sufficient, no need to start throwing exceptions around ;) >> >> True, the question is how we organize the wrapper (you mean the >> BCAnnotatedSequence right, or whatever we decide to name it). > > No, I meant the general wrappers that do something with a sequence > (translate, pI, search, etc). Ok, I get it, in general you want those classes to be able to handle a general BCSequence object as well, and not only a specific subclass per se. > > >> There are basically two choices either let the developer separate the >> sequence from the annotations/features part and do all manipulations >> purely on the BCSequence, or make all methods accept besides >> BCSequences also the BCAnnotatedSequences. This latter his some clear >> advantages (such as the possibility to take features into account >> while calculating the MW for example), but it also has some clear >> problems.
One thing it would mean is that we are almost forced to >> have also three types of BCAnnotatedSequence subclasses around (Koen >> might remark the benefit of a single BCSequence class here probably). > > LOL - actually, yes I would ;). But I would suggest the following. > BCSequence *only* takes care of managing the symbol list, more or less > like the SymbolList class they have in BioJava. The we have > BCAnnotatedSequence as a subclass of BCSequence. So now we have a > symbollist + all the additional info that makes it a real molecule. > Then, only for convenience, we subclass BCAnnotatedSequence to > BCSequenceDNA, BCSequenceProtein, etc. hmm, not sure, it feels like the layer at which we then subclass is the wrong one. But it might also be the only problem. > >> Notes and annotations like creator, date etc are easy, they don't >> change (and are what I would call a BCAnnotation). Features >> (BCFeature objects) are much more of a problem, they are coupled to >> sequence ranges (i.e. a helix from aminoacid 10 to 15), and should be >> kept in sync while editing the sequence. The big problem here is, >> what architecture would be the smartest way of doing this. Any >> suggestions? > > The BioPerl docs I mentioned recently use a separate Location object. > I need to look more closely at it, to see how useful it is. One thing > we have to watch for is that features need to have a 1-based > numbering, not 0-based as we have so far. One possibility could be to > couple features with individual BCSymbols. So we tell a BCSymbol that > a feature XX starts there. However, what happens if in the example you > mentioned above (helix from aminoacid 10 to 15), the user edits the > sequence and removes AA 8-12? Then the startpoint of the feature is > gone. So, I guess that might not be a good solution, although this > problem (if any) will also manifest itself with ther solutions. 
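The deletion case in the quoted paragraph (helix at residues 10 to 15, user removes residues 8 to 12) can be worked through with plain NSRange arithmetic. This is a minimal sketch under stated assumptions: ranges are stored 0-based, so the helix is {9, 6} and the deletion is {7, 5}; the policy of shrinking a feature by the overlapped part rather than dropping it is one possible choice, not something the thread has decided; and `SketchAdjustRangeForDeletion` is a hypothetical name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns where a feature range ends up after `deleted`
// has been removed from the sequence. Ranges are 0-based NSRange values.
static NSRange SketchAdjustRangeForDeletion(NSRange feature, NSRange deleted)
{
    NSUInteger featureEnd = NSMaxRange(feature);
    NSUInteger deletedEnd = NSMaxRange(deleted);

    // Deletion lies entirely after the feature: nothing changes.
    if (deleted.location >= featureEnd) {
        return feature;
    }
    // Deletion lies entirely before the feature: shift the feature left.
    if (deletedEnd <= feature.location) {
        return NSMakeRange(feature.location - deleted.length, feature.length);
    }
    // Deletion overlaps the feature: drop the overlapped symbols and, if the
    // deletion started before the feature, move the start to the deletion point.
    NSRange overlap = NSIntersectionRange(feature, deleted);
    NSUInteger newLocation = (deleted.location < feature.location)
                                 ? deleted.location
                                 : feature.location;
    return NSMakeRange(newLocation, feature.length - overlap.length);
}
```

For the helix example the result is {7, 3}: the three surviving residues (originally 13 to 15) now sit at 1-based positions 8 to 10. A feature that is deleted completely comes back with length 0, which the caller could treat as a signal to remove it.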
Exactly, what we have to emulate is an attributed string, which handles exactly the same problem(s). I think in general we don't need a location object, we need a range object and I don't see why NSRange wouldn't be good enough (even if our system is 1-based). > >>> >>> However, I don't like the idea that was suggested in another recent >>> mail, to also make subclasses for DNAStrict, proteinstrict, etc. >> I copy that, definitely not, but the general BCSequence class could >> have a simple strict boolean that can be set. > > For what? For preserving the knowledge that a sequence uses a strict symbolset (the other option would be to have a symbolset property inside the BCSequence object). > >> Also, we can introduce the strict BCSequenceTypes for passing as >> arguments... > > Sounds good. > > BTW, what's the difference between 'strict', 'skippingnonbases' and > 'unambiguous' ? Basically they're the same thing, and yes, we should rename them to be similar I think... Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now?
********************************************************* From mek at mekentosj.com Fri Dec 3 16:33:02 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 22:33:02 +0100 Subject: [Biococoa-dev] first non-whitespace character In-Reply-To: References: <31244BE5-419A-11D9-9F46-003065A5FDCC@earthlink.net> <5A8A46E8-4208-11D9-A769-000D93AE89A4@mekentosj.com> <3A4EB628-433B-11D9-AE33-003065A5FDCC@earthlink.net> <467D2BFC-4430-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: Nice! On 3 Dec 2004, at 21:56, Koen van der Drift wrote: > > On Dec 2, 2004, at 2:03 AM, Alexander Griekspoor wrote: > >>> We could have BCSymbolSet return an NSCharacterSet in such a case. >> This would be in general a great addition Koen, to have a >> -characterSetRepresentation method which would return the >> NSCharacterSet equivalent of the symbol set... >> > > Added :) > > - Koen. > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now?
********************************************************* From jtimmer at bellatlantic.net Fri Dec 3 17:08:20 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 03 Dec 2004 17:08:20 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: >> >> BTW, what's the difference between 'strict', 'skippingnonbases' and >> 'unambiguous' ? > Basically they're the same thing, and yes, we should rename them to be > similar I think... I'm largely sitting this discussion out (my boss just sent in her latest revisions to my paper, so I'll be sitting it out for the near future as well), but just to prove I am paying some attention: Strict and unambiguous would be the same - all symbols represent a single entity. SkippingNonBases isn't - that's what the "undefined" character is for. It is used when a character doesn't represent a symbol, ambiguous or otherwise. Renaming the concepts is fine, but we need at least two of the concepts around (3 if you count the opposite of unambiguous as a separate concept). And that's not even contemplating gapped sequences.... JT _______________________________________________ This mind intentionally left blank From mek at mekentosj.com Fri Dec 3 17:11:45 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 23:11:45 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: <519D7B1C-4578-11D9-90C8-000D93AE89A4@mekentosj.com> On 3 Dec 2004, at 23:08, John Timmer wrote: > >>> >>> BTW, what's the difference between 'strict', 'skippingnonbases' and >>> 'unambiguous' ? >> Basically they're the same thing, and yes, we should rename them to be >> similar I think...
> > I'm largely sitting this discussion out (my boss just sent in her > latest > revisions to my paper, so I'll be sitting it out for the near future as > well), but just to prove I am paying some attention: > > Strict and unambiguous would be the same - all symbols represent a > single > entity. SkippingNonBases isn't - that's what the "undefined" > character is > for. It is used when a character doesn't represent a symbol, > ambiguous or > otherwise. Oh yes, sorry that's true... > > Renaming the concepts is fine, but we need at least two of the concepts > around (3 if you count the opposite of unambiguous as a separate > concept). Well, if you feed a boolean "strict" you have the two opposite concepts already... Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From kvddrift at earthlink.net Fri Dec 3 17:17:25 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 17:17:25 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: <1C32EF72-4579-11D9-9080-003065A5FDCC@earthlink.net> >> No, I meant the general wrappers that do something with a sequence >> (translate, pI, search, etc). 
> Ok, I get it, in general you want those classes be able to handle a > general BCSequence object as well, and not only a specific subclass > per se. No. What I proposed was that a wrapper class (eg BCTranslate) first checks the sequence type. If it can handle it, it will continue, if not, it will return nil (or an empty NSArray or whatever). Again, this was part of my idea not to subclass BCSequence, but to make the sequencetype a member of BCSequence. Which we actually already do. But we can also make a symbolset member for the same purpose. If we do it this way, then we only have to put the convenience methods in BCSequence. The alternative as proposed by John is to put convenience methods only in those subclasses on which the wrappers actually work. I guess that solution is also fine with me. BTW, does anyone object if I clean up BCSequence and its subclasses, and add convenience methods? This would mainly involve the rangeOfSubstring methods (which will be replaced by convenience methods to BCFindSequence), and a few other smaller parts. Would you prefer to keep the rangeOfSubstring naming, or should I use the find prefix? > Exactly, what we have to emulate is an attributed string, that handles > exactly the same problem(s). I think in general we don't need a > location object, we need a range object and I don't see why NSRange > wouldn't be good enough (even if our system is 1-bases). Yes, NSRange sounds good to me. And the emulation of an attributed string indeed seems a way to go. I will read some of the bioperl docs tonight. >>> I copy that, definitely not, but the general BCSequence class could >>> have a simple strict boolean that can be set. >> >> For what? > For preserving the knowledge that a sequence uses a strict symbolset I think it's better to then extend the sequenceType enum. Otherwise we need to check for the sequencetype, and then for strict. > (the other option would be to have a symbolset property inside the > BCSequence object.
I would really like that too. Can be helpful in all sorts of cases. >> BTW, what's the difference between 'strict', 'skippingnonbases' and >> 'unambiguous' ? > Basically they're the same thing, and yes, we should rename them to be > similar I think... > That's what I also thought. My vote would go to unambiguous (a PITA to spell, though), it sounds like that's what's used the most, eg in the symbol plists. I guess you and John have a better view on this. - Koen. From kvddrift at earthlink.net Fri Dec 3 17:22:52 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 17:22:52 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: On Dec 3, 2004, at 5:08 PM, John Timmer wrote: > Strict and unambiguous would be the same - all symbols represent a > single > entity. SkippingNonBases isn't - that's what the "undefined" > character is > for. It is used when a character doesn't represent a symbol, > ambiguous or > otherwise. > So is it safe to say that gap and undefined are actually just BCSymbols, not AA or base? In that case we can move their representations to BCSymbol and make a general undefined/gap symbolset. - Koen. From mek at mekentosj.com Fri Dec 3 17:25:29 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 23:25:29 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com> > So is it safe to say that gap and undefined are actually just > BCSymbols, not AA or base? In that case we can move their > representations to BCSymbol and make a general undefined/gap > symbolset. Guess so yes, as gaps are needed in both protein and nucleotide alignments for example...
Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From mek at mekentosj.com Fri Dec 3 17:33:50 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 3 Dec 2004 23:33:50 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: <1C32EF72-4579-11D9-9080-003065A5FDCC@earthlink.net> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> <1C32EF72-4579-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <67B4AD0A-457B-11D9-90C8-000D93AE89A4@mekentosj.com> On 3 Dec 2004, at 23:17, Koen van der Drift wrote: >>> No, I meant the general wrappers that do something with a sequence >>> (translate, pI, search, etc). >> Ok, I get it, in general you want those classes be able to handle a >> general BCSequence object as well, and not only a specific subclass >> per se. > > No. What I proposed was that a wrapper class (eg BCTranslate) first > checks the sequence type. If it can handle it, it will continue, if > not, it will return nil (or an empty NSArray or whatever). Ok, but then it can in principle handle general BCSequences as well ;-) > Again, this was part of my idea not to subclass BCSequence, but to > make the sequencetype a member of BCSequence. Which we actually > already do. But we can also make a symbolset member for the same > purpose.
If we do it this way, then we only have to put the
> convenience methods in BCSequence.

I think we should just keep the general setup the way it is now and see
how things end up after a while; it might prove wise to switch to the
single BCSequence class in the end.

> The alternative as proposed by John is to put convenience methods only
> in those subclasses on which the wrappers actually work. I guess that
> solution is also fine with me.

That's my favorite as well...

> BTW, does anyone object if I clean up BCSequence and its subclasses, and
> add convenience methods?

Nope.

> This would mainly involve the rangeOfSubstring methods (which will be
> replaced by convenience methods to BCFindSequence), and a few other
> smaller parts. Would you prefer to keep the rangeOfSubstring naming,
> or should I use the find prefix?

Please do use the common prefixes wherever possible, but this comes down
to the same point I raised earlier in this discussion:

> - (BCSequenceType *) guessSequenceTypeFromString: (NSString *) string;
> - (BCSequence *) guessSequenceFromString: (NSString *) string;
> Sometimes, you just want to know the type instead of getting the
> sequence back. In the documentation we can point the reader to the
> fact that if he wants the sequence, there's the other method
> (and thus prevent him from doing double work). Now that I'm in
> nitpicking mode anyway, the method name guidelines suggest something
> like:
>
> sequenceTypeForUntypedString: or sequenceTypeForUnknownString:
> and
> sequenceFromUntypedString: or sequenceFromUnknownString:
>
> The originals don't suggest that something is returned...

Method names should include the things they return (if they return
something, of course), so findSubstring wouldn't make sense, and neither
really does findRangeOfSubstring:, but I could live with the latter...
>>>> I copy that, definitely not, but the general BCSequence class could >>>> have a simple strict boolean that can be set. >>> For what? >> For preserving the knowledge that a sequence uses a strict symbolset > > I think it's better to then extend the sequenceType enum. Otherwise we > need to check for the sequencetype, and then for strict. True > >> (the other option would be to have a symbolset property inside the >> BCSequence object. > > I would really like that too. Can be helpfull in all sorts of cases. Ok, feel free to add it... > > >>> BTW, what's the difference between 'strict', 'skippingnonbases' and >>> 'unambiguous' ? >> Basically they're the same thing, and yes, we should rename them to >> be similar I think... >> > > That's what I also thought. My vote would go to unambiguous (a PITA to > spell, though), it sounds like that's what's used the most, eg in the > symbol plists. I guess you and John have a better view on this. Unambiguous is nicer, but perhaps in this case strict is more practical, and would be my choice.. Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 4785 bytes Desc: not available URL: From mek at mekentosj.com Fri Dec 3 18:58:11 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 00:58:11 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: <86AB0B2E-4581-11D9-9080-003065A5FDCC@earthlink.net> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> <1C32EF72-4579-11D9-9080-003065A5FDCC@earthlink.net> <67B4AD0A-457B-11D9-90C8-000D93AE89A4@mekentosj.com> <86AB0B2E-4581-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <2FF93847-4587-11D9-90C8-000D93AE89A4@mekentosj.com> Op 4-dec-04 om 0:17 heeft Koen van der Drift het volgende geschreven: > > On Dec 3, 2004, at 5:33 PM, Alexander Griekspoor wrote: > >>> No. What I proposed was that a wrapper class (eg BCTranslate) first >>> checks the sequence type. If it can handle it, it will continue, if >>> not, it will return nil (or an empty NSArray or whatever). >> Ok, but then it can in principle handle general BCSequences as well >> ;-) > > Exactly!!! See, no reason to subclass BCSequence :D LOL, as said let's see where the evolution of our framework is heading to ;-) > > >> methodnames should include the things they return (IF they return >> something of course), so if you would have findSubstring it wouldn't >> make sense, and actually neither really does findRangeOfSubstring: >> but I could live with the latter... > > Actually, BCFindSequence returns an NSArray of ranges, so that would > become: > > arrayOfRangesOfSubstring Ouch, that's awful, I believe nsstring uses something like occurencesOfString, but a find prefix would maybe do an even better job... 
Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From kvddrift at earthlink.net Fri Dec 3 19:13:24 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 19:13:24 -0500 (GMT-05:00) Subject: [Biococoa-dev] more ramblings Message-ID: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> >> Actually, BCFindSequence returns an NSArray of ranges, so that would >> become: >> >> arrayOfRangesOfSubstring >Ouch, that's awful, I believe nsstring uses something like >occurencesOfString, but a find prefix would maybe do an even better >job... Duh, we're not searching strings but sequences, so my suggestion was bs anyway. Why not just: findSequence - Koen. From mek at mekentosj.com Fri Dec 3 19:23:23 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 01:23:23 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> References: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> Message-ID: Sometimes things can be so simple ;-) Op 4-dec-04 om 1:13 heeft Koen van der Drift het volgende geschreven: > >>> Actually, BCFindSequence returns an NSArray of ranges, so that would >>> become: >>> >>> arrayOfRangesOfSubstring >> Ouch, that's awful, I believe nsstring uses something like >> occurencesOfString, but a find prefix would maybe do an even better >> job... > > Duh, we're not searching strings but sequences, so my suggestion was > bs anyway. 
Why not just:
>
> findSequence
>
> - Koen.

From kvddrift at earthlink.net Fri Dec 3 20:01:26 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 3 Dec 2004 20:01:26 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To:
References: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net>
Message-ID: <061DA4CE-4590-11D9-9080-003065A5FDCC@earthlink.net>

On Dec 3, 2004, at 7:23 PM, Alexander Griekspoor wrote:

> Sometimes things can be so simple ;-)

I've replaced them in BCSequence, but not yet in the subclasses; I'd like
to get John's go-ahead too. The complement methods have also been moved
to BCSequence, so they can be removed from the subclasses as well. Any
objections?

cheers,

- Koen.
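
A minimal sketch of what a findSequence-style search that returns an
NSArray of ranges could look like. This is not the BCFindSequence code:
the function name FindSequenceRanges is hypothetical, plain NSStrings
stand in for BCSequence objects, and a modern clang with ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not BioCocoa API: returns every range at which
// `query` occurs in `sequence`, including overlapping hits. Each NSRange
// is wrapped in an NSValue so it can live in an NSArray.
static NSArray *FindSequenceRanges(NSString *sequence, NSString *query) {
    NSMutableArray *ranges = [NSMutableArray array];
    if ([query length] == 0) {
        return ranges;  // nothing sensible to search for
    }

    NSRange searchRange = NSMakeRange(0, [sequence length]);
    while (searchRange.length >= [query length]) {
        NSRange hit = [sequence rangeOfString:query
                                      options:NSLiteralSearch
                                        range:searchRange];
        if (hit.location == NSNotFound) {
            break;
        }
        [ranges addObject:[NSValue valueWithRange:hit]];

        // Advance by one symbol only, so overlapping matches are found too.
        NSUInteger next = hit.location + 1;
        searchRange = NSMakeRange(next, [sequence length] - next);
    }
    return ranges;
}

int main(void) {
    @autoreleasepool {
        // "ATAT" occurs at locations 0 and 2 of "ATATAT".
        NSArray *hits = FindSequenceRanges(@"ATATAT", @"ATAT");
        for (NSValue *value in hits) {
            NSLog(@"%@", NSStringFromRange([value rangeValue]));
        }
    }
    return 0;
}
```

Returning an empty array when nothing matches keeps the caller's loop
free of nil checks, which fits a method whose name promises a collection.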
From kvddrift at earthlink.net Fri Dec 3 20:09:46 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 20:09:46 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <67B4AD0A-457B-11D9-90C8-000D93AE89A4@mekentosj.com> References: <8596C7F5-44CE-11D9-9428-003065A5FDCC@earthlink.net> <079FE99C-454B-11D9-A550-000D93AE89A4@mekentosj.com> <34416E28-4569-11D9-9080-003065A5FDCC@earthlink.net> <1F6C30EE-4572-11D9-90C8-000D93AE89A4@mekentosj.com> <1C32EF72-4579-11D9-9080-003065A5FDCC@earthlink.net> <67B4AD0A-457B-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: <301E29C3-4591-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 5:33 PM, Alexander Griekspoor wrote: >> >>> (the other option would be to have a symbolset property inside the >>> BCSequence object. >> >> I would really like that too. Can be helpfull in all sorts of cases. > Ok, feel free to add it... > Will do. Another way it can be used is to replace methods such as containsNonBaseSymbols or containsAmbiguousSymbols. Just check the sequencetype or symbolset, and you know. - Koen. From mek at mekentosj.com Fri Dec 3 20:44:19 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 02:44:19 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> <061DA4CE-4590-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <03DA043B-4596-11D9-90C8-000D93AE89A4@mekentosj.com> Op 4-dec-04 om 2:13 heeft Koen van der Drift het volgende geschreven: > > On Dec 3, 2004, at 8:06 PM, Alexander Griekspoor wrote: > >> Well, a complement doesn't make sense for a protein does it? > > No, it doesn't. That's why I made it such that if you pass it to a > protein, the result will just be an empty sequence. This is actually a > good example of what I mean by putting the convenience methods only > in BCSequence. 
I know, but at some point it doesn't make any sense anymore; then we can
just as well get rid of the subclasses, if that's where you are heading
(again ;-). Either we go in your direction and throw everything in one
class, or we do it nicely with subclasses. The fact that your complement
tool object returns nil if you hand it a protein sequence is very elegant
and nice (if that is what you want per se), but I don't see any reason why
we are obliged to add the method call in BCSequence instead of only in the
DNA/RNA subclasses. Again, it doesn't make any sense to allow calling
complement on a protein sequence, unless there is only one BCSequence
class. And if I'm outnumbered in this opinion, then I would even rather
switch to the one-BCSequence approach, as now we're ending up with some
strange hybrid. John, I think it's your call...

Cheers,
Alex

From a.griekspoor at nki.nl Fri Dec 3 20:52:15 2004
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Sat, 4 Dec 2004 02:52:15 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com>
References: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com>
Message-ID: <1F4BAD0A-4597-11D9-90C8-000D93AE89A4@nki.nl>

On 3 Dec 2004, at 23:25, Alexander Griekspoor wrote:

>> So is it safe to say that gap and undefined are actually just
>> BCSymbols, not AA or base?
In that case we can move their
>> representations to BCSymbols and make a general undefined/gap
>> symbolset.
>
> Guess so, yes, as gaps are needed in both protein and nucleotide
> alignments, for example...
> Alex

Oops, I just thought of one small problem: an undefined symbol has a
different letter assigned in the DNA world (N) than in the protein world
(*). Is this indeed a problem, or is the undefined symbol different from
the "every amino acid / every base" symbol?

Alex

From kvddrift at earthlink.net Fri Dec 3 20:57:21 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 3 Dec 2004 20:57:21 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <1F4BAD0A-4597-11D9-90C8-000D93AE89A4@nki.nl>
References: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com>
<1F4BAD0A-4597-11D9-90C8-000D93AE89A4@nki.nl> Message-ID: On Dec 3, 2004, at 8:52 PM, Alexander Griekspoor wrote: > Oops, I just thought of one small problem, an undefined symbol has a > different letter assigned in the DNA (N) vs the protein world (*). Is > this indeed a problem or is the undefined symbol different from the > "every aminoacid/ every base" symbol? > Actually, they both use the '?' symbol for undefined. The '*' is the stop amino acid and 'N' in the bases plist is named anyBase. - Koen. From kvddrift at earthlink.net Fri Dec 3 20:57:35 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 20:57:35 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <03DA043B-4596-11D9-90C8-000D93AE89A4@mekentosj.com> References: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> <061DA4CE-4590-11D9-9080-003065A5FDCC@earthlink.net> <03DA043B-4596-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: On Dec 3, 2004, at 8:44 PM, Alexander Griekspoor wrote: >> >> No, it doesn't. That's why I made it such that if you pass it to a >> protein, the result will just be an empty sequence. This is actually >> a good example of what I mean by putting the convenience methods >> only in BCSequence. > > I know, but again at some point it doesn't make any sense anymore, > than we can just as well get rid of the subclasses if that's where you > are pointing at (again ;-). No, it's actually fine with me to put them in the DNA/RNA subclasses only. I guess I got carried away by my cleanup mode. What I actually wanted to do is to replace the current complement methods with the convenience method. - Koen. 
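
The wrapper behaviour discussed in this exchange, a complement tool that
checks the sequence type first and hands back an empty result for a
protein, might look roughly like the sketch below. All names
(SketchSequenceType, SketchComplement) are hypothetical and plain
NSStrings stand in for BCSequence objects; the only details taken from
the thread are that '?' marks an undefined symbol and that 'N' is the
anyBase symbol. A modern clang with ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the sequence type enum discussed in the thread.
typedef NS_ENUM(NSInteger, SketchSequenceType) {
    SketchSequenceTypeDNA,
    SketchSequenceTypeRNA,
    SketchSequenceTypeProtein
};

// Hypothetical wrapper: checks the type before doing any work and returns
// an empty string when the operation has no meaning for that type.
static NSString *SketchComplement(NSString *symbols, SketchSequenceType type) {
    if (type == SketchSequenceTypeProtein) {
        return @"";  // a complement is undefined for proteins
    }

    BOOL isRNA = (type == SketchSequenceTypeRNA);
    NSMutableString *result =
        [NSMutableString stringWithCapacity:[symbols length]];

    for (NSUInteger i = 0; i < [symbols length]; i++) {
        unichar symbol = [symbols characterAtIndex:i];
        unichar paired;
        switch (symbol) {
            case 'A': paired = isRNA ? 'U' : 'T'; break;
            case 'T': paired = isRNA ? '?' : 'A'; break;  // T is not an RNA base
            case 'U': paired = isRNA ? 'A' : '?'; break;  // U is not a DNA base
            case 'C': paired = 'G'; break;
            case 'G': paired = 'C'; break;
            case 'N': paired = 'N'; break;  // anyBase pairs with anyBase
            default:  paired = '?'; break;  // anything else is undefined
        }
        [result appendFormat:@"%C", paired];
    }
    return result;
}

int main(void) {
    @autoreleasepool {
        // DNA "ATGCN" complements to "TACGN".
        NSLog(@"DNA: %@", SketchComplement(@"ATGCN", SketchSequenceTypeDNA));
        // A protein yields an empty string instead of an error.
        NSLog(@"protein: '%@'",
              SketchComplement(@"MKV", SketchSequenceTypeProtein));
    }
    return 0;
}
```

The check happens at run time, so a caller who passes a protein by
mistake only finds out by inspecting the result; that is the trade-off
the thread is weighing against subclass-only methods.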
From kvddrift at earthlink.net Fri Dec 3 21:01:26 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 21:01:26 -0500 Subject: [Biococoa-dev] compiler warning Message-ID: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> Hi, I get the following compiler warning in BCSequenceReader: BCFoundation/BCSequenceIO/BCSequenceReader.m:418: warning: `BCSequence' may not respond to `-autorelease' BCSequence is a subclass of NSObject, so it should inherit that method, right? - Koen. From a.griekspoor at nki.nl Fri Dec 3 21:02:24 2004 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sat, 4 Dec 2004 03:02:24 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com> <1F4BAD0A-4597-11D9-90C8-000D93AE89A4@nki.nl> Message-ID: <8A82B91E-4598-11D9-90C8-000D93AE89A4@nki.nl> Then it's fine, great! Op 4-dec-04 om 2:57 heeft Koen van der Drift het volgende geschreven: > > On Dec 3, 2004, at 8:52 PM, Alexander Griekspoor wrote: > >> Oops, I just thought of one small problem, an undefined symbol has a >> different letter assigned in the DNA (N) vs the protein world (*). Is >> this indeed a problem or is the undefined symbol different from the >> "every aminoacid/ every base" symbol? >> > > Actually, they both use the '?' symbol for undefined. The '*' is the > stop amino acid and 'N' in the bases plist is named anyBase. > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From mek at mekentosj.com Fri Dec 3 21:03:31 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 03:03:31 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: <21797401.1102119205063.JavaMail.root@kermit.psp.pas.earthlink.net> <061DA4CE-4590-11D9-9080-003065A5FDCC@earthlink.net> <03DA043B-4596-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: >>> No, it doesn't. That's why I made it such that if you pass it to a >>> protein, the result will just be an empty sequence. This is actually >>> a good example of what I mean by putting the convenience methods >>> only in BCSequence. >> >> I know, but again at some point it doesn't make any sense anymore, >> than we can just as well get rid of the subclasses if that's where >> you are pointing at (again ;-). > > No, it's actually fine with me to put them in the DNA/RNA subclasses > only. I guess I got carried away by my cleanup mode. What I actually > wanted to do is to replace the current complement methods with the > convenience method. That's settled then, the convenience method replacement is perfectly fine. 
Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From mek at mekentosj.com Fri Dec 3 21:16:09 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 03:16:09 +0100 Subject: [Biococoa-dev] compiler warning In-Reply-To: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <76146BA9-459A-11D9-90C8-000D93AE89A4@mekentosj.com> Op 4-dec-04 om 3:01 heeft Koen van der Drift het volgende geschreven: > Hi, > > I get the following compiler warning in BCSequenceReader: > > BCFoundation/BCSequenceIO/BCSequenceReader.m:418: warning: > `BCSequence' may not respond to `-autorelease' > > > BCSequence is a subclass of NSObject, so it should inherit that > method, right? Yes, but you didn't import the BCSequence.m file ;-) You declared BCSequence as a class only in the header, that's why it worked, but you need the .m file for method checking... A. > > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From kvddrift at earthlink.net Fri Dec 3 21:21:54 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 21:21:54 -0500 Subject: [Biococoa-dev] compiler warning In-Reply-To: <76146BA9-459A-11D9-90C8-000D93AE89A4@mekentosj.com> References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> <76146BA9-459A-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: <4410034F-459B-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 9:16 PM, Alexander Griekspoor wrote: > Yes, but you didn't import the BCSequence.m file ;-) Of course ;-) (I assume you meant BCSequence.h, it must be getting late in Amsterdam ;) - Koen. From mek at mekentosj.com Fri Dec 3 21:23:19 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 03:23:19 +0100 Subject: [Biococoa-dev] compiler warning In-Reply-To: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <76CB449F-459B-11D9-90C8-000D93AE89A4@mekentosj.com> Just a few remarks while going through the code: - I also got a warning for a mistyped NSMutableString tempString, which shouldn't be mutable. 
- Why do you use autorelease rather than releasing it directly, right
  after you add it to the dictionary? You know you want to lose it anyway,
  so why not do it directly? In fact, I have seen major speedups and
  memory footprint reductions in my apps from a simple direct release
  instead of waiting for the autorelease, if you use many objects. A
  common thing I do in a so-called double loop is to set up an autorelease
  pool for each inner loop cycle and release it every time the outer loop
  continues; it can mean the difference between a nicely running app and
  one that locks up the complete computer.
- Perhaps John is about to change this already, but why not remove the
  create prefix from the methods of the sequence factory, thus
  - (BCSequence *) sequenceWithSequence: (BCSequence *) sequence;
  instead of
  - (BCSequence *) createSequenceWithSequence: (BCSequence *) sequence;
  It's shorter and adheres better to the guidelines, just as it isn't
  + (NSArray *) createArrayWithCapacity; but arrayWithCapacity...

Cheers,
Alex

On 4 Dec 2004, at 3:01, Koen van der Drift wrote:

> Hi,
>
> I get the following compiler warning in BCSequenceReader:
>
> BCFoundation/BCSequenceIO/BCSequenceReader.m:418: warning:
> `BCSequence' may not respond to `-autorelease'
>
> BCSequence is a subclass of NSObject, so it should inherit that
> method, right?
>
> - Koen.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better.
So I got a Macintosh. ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2413 bytes Desc: not available URL: From mek at mekentosj.com Fri Dec 3 21:24:56 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 03:24:56 +0100 Subject: [Biococoa-dev] compiler warning In-Reply-To: <4410034F-459B-11D9-9080-003065A5FDCC@earthlink.net> References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> <76146BA9-459A-11D9-90C8-000D93AE89A4@mekentosj.com> <4410034F-459B-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: Op 4-dec-04 om 3:21 heeft Koen van der Drift het volgende geschreven: > > On Dec 3, 2004, at 9:16 PM, Alexander Griekspoor wrote: > >> Yes, but you didn't import the BCSequence.m file ;-) > > > Of course ;-) > > (I assume you meant BCSequence.h, it must be getting late in Amsterdam > ;) Indeed, way to late ;-) Couldn't sleep anyway because of the upcoming Sinterklaas evening, it always makes me nervous ;-) Good night! Alex > > > - Koen. > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? 
*********************************************************

From kvddrift at earthlink.net Fri Dec 3 21:29:12 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 3 Dec 2004 21:29:12 -0500
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <8A82B91E-4598-11D9-90C8-000D93AE89A4@nki.nl>
References: <3CB73B3B-457A-11D9-90C8-000D93AE89A4@mekentosj.com> <1F4BAD0A-4597-11D9-90C8-000D93AE89A4@nki.nl> <8A82B91E-4598-11D9-90C8-000D93AE89A4@nki.nl>
Message-ID: <4909C53B-459C-11D9-9080-003065A5FDCC@earthlink.net>

Not quite, because the plist entry is different for both. Also, the
undefined AA has its own 3-letter code (Xaa). I need to think about how I
can solve this, if it is needed at all.

BCSymbolSet now has 2 empty methods: unknownAndGapSymbolSet and
unknownSymbolSet. Do we need these? If yes, we need the neutral BCSymbol
version of these two. If no, I will just remove them.

- Koen.

On Dec 3, 2004, at 9:02 PM, Alexander Griekspoor wrote:

> Then it's fine, great!
>
> On 4 Dec 2004, at 2:57, Koen van der Drift wrote:
>
>>
>> On Dec 3, 2004, at 8:52 PM, Alexander Griekspoor wrote:
>>
>>> Oops, I just thought of one small problem, an undefined symbol has a
>>> different letter assigned in the DNA (N) vs the protein world (*).
>>> Is this indeed a problem or is the undefined symbol different from
>>> the "every aminoacid/ every base" symbol?
>>>
>>
>> Actually, they both use the '?' symbol for undefined. The '*' is the
>> stop amino acid and 'N' in the bases plist is named anyBase.
>>
>> - Koen.
>> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > The requirements said: Windows 2000 or better. > So I got a Macintosh. > > ********************************************************* > From jtimmer at bellatlantic.net Fri Dec 3 21:34:23 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 03 Dec 2004 21:34:23 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: <03DA043B-4596-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: >>> Well, a complement doesn't make sense for a protein does it? >> >> No, it doesn't. That's why I made it such that if you pass it to a >> protein, the result will just be an empty sequence. This is actually a >> good example of what I mean by putting the convenience methods only >> in BCSequence. > > I know, but again at some point it doesn't make any sense anymore, than > we can just as well get rid of the subclasses if that's where you are > pointing at (again ;-). Either we go in your direction and throw > everything in one class, or we do it nicely with subclasses. The fact > that your complement tool object returns nil if you hand it a protein > sequence is very elegant and nice (if that is what you want per se), > but I don't see any reason why we are obliged to add the method call in > BCSequence instead of only in the DNA/RNA subclasses. Again, it doesn't > make any sense to add the possibility to call complement on a protein > sequence, unless there is only one bcsequence class. 
And if I'm > outnumbered in this opinion, than I even rather switch to the > one-BCSequence approach, as now we're ending up in some strange hybrid. > John, I think it's your call... Well, if it's my call, I'm pretty sure you know where I stand. It just seems like silliness to me to allow the following: Have a method associated with the data it can't possibly act on Have a method to be called on sequences that it has no relevance to Make it easier to have developers accidentally make stupid mistakes. As Alex said, failing with an empty sequence is a relatively elegant way of handling the situation. But there's absolutely no need for the situation to be handled - placing the method into the appropriate subclass (in this case the currently non-existent BCSequenceNucleotide, parent of DNA and RNA) is just clearer, safer, and more sensible. I just don't see a big benefit - the one claimed was preventing duplication of code, but adding the appropriate subclass eliminates the duplication. So my vote is definitely against putting these in BCSequence. Until there's some decision that the subclasses have to be eliminated, we should use them appropriately. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Dec 3 21:38:31 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 21:38:31 -0500 Subject: [Biococoa-dev] compiler warning In-Reply-To: <76CB449F-459B-11D9-90C8-000D93AE89A4@mekentosj.com> References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net> <76CB449F-459B-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: <963FF0B4-459D-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 9:23 PM, Alexander Griekspoor wrote: > Just a few remarks while going through the code: > - I also got a warning for a mistyped NSMutableString tempString, > which shouldn't be mutable. fixed. 
> - Why do you use autorelease and not release it directly right after > you add it to the dictionary, you know you want to loose it anyway, so > why don't do it directly. which autorelease are you referring to? > - perhaps John is about to change this already, but why not remove the > create prefix in the methods of the sequencefactory, thus > - (BCSequence *) sequenceWithSequence: (BCSequence *) sequence; > instead of > - (BCSequence *) createSequenceWithSequence: (BCSequence *) sequence; > it's shorter, and adheres better to the guidelines. Similar to the > fact that it isn't + (NSArray *) createArrayWithCapacity; but > arrayWithCapacity... > I'll change that. - Koen. From kvddrift at earthlink.net Fri Dec 3 21:42:39 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 21:42:39 -0500 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: <2A09DD62-459E-11D9-9080-003065A5FDCC@earthlink.net> On Dec 3, 2004, at 9:34 PM, John Timmer wrote: > I just don't see a big benefit - > the one claimed was preventing duplication of code, but adding the > appropriate subclass eliminates the duplication. > Well, the same convenience method is now in 2 subclasses ;-) > So my vote is definitely against putting these in BCSequence. Until > there's > some decision that the subclasses have to be eliminated, we should use > them > appropriately. > See my other reply - I have no objections against putting them in the appropriate subclasses. - Koen. 
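
The shared nucleotide parent John mentions (the "currently non-existent
BCSequenceNucleotide") is one way to keep the convenience method in a
single place while still hiding it from proteins. A minimal sketch under
stated assumptions: every class and method name below is hypothetical and
prefixed with Sketch, none of it is actual BioCocoa code, and a modern
clang with ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical base class: holds the symbols, knows nothing about pairing.
@interface SketchSequence : NSObject
@property (nonatomic, copy, readonly) NSString *symbols;
- (instancetype)initWithSymbols:(NSString *)symbols;
@end

// Hypothetical shared parent of DNA and RNA: the only place where the
// complement convenience method is declared and implemented.
@interface SketchNucleotideSequence : SketchSequence
- (NSDictionary *)pairingTable;     // overridden by each subclass
- (NSString *)complementSymbols;    // written once, shared by DNA and RNA
@end

@interface SketchDNA : SketchNucleotideSequence
@end
@interface SketchRNA : SketchNucleotideSequence
@end
@interface SketchProtein : SketchSequence
@end

@implementation SketchSequence
- (instancetype)initWithSymbols:(NSString *)symbols {
    self = [super init];
    if (self) {
        _symbols = [symbols copy];
    }
    return self;
}
@end

@implementation SketchNucleotideSequence
- (NSDictionary *)pairingTable {
    return @{};  // subclasses supply the real table
}
- (NSString *)complementSymbols {
    NSDictionary *pairs = [self pairingTable];
    NSMutableString *result = [NSMutableString string];
    for (NSUInteger i = 0; i < self.symbols.length; i++) {
        NSString *symbol =
            [self.symbols substringWithRange:NSMakeRange(i, 1)];
        // '?' marks an undefined symbol, as described in the thread.
        [result appendString:(pairs[symbol] ?: @"?")];
    }
    return result;
}
@end

@implementation SketchDNA
- (NSDictionary *)pairingTable {
    return @{@"A": @"T", @"T": @"A", @"C": @"G", @"G": @"C"};
}
@end

@implementation SketchRNA
- (NSDictionary *)pairingTable {
    return @{@"A": @"U", @"U": @"A", @"C": @"G", @"G": @"C"};
}
@end

@implementation SketchProtein
@end

int main(void) {
    @autoreleasepool {
        SketchDNA *dna = [[SketchDNA alloc] initWithSymbols:@"ATGC"];
        SketchRNA *rna = [[SketchRNA alloc] initWithSymbols:@"AUGC"];
        // DNA "ATGC" gives "TACG"; RNA "AUGC" gives "UACG".
        NSLog(@"%@ %@", [dna complementSymbols], [rna complementSymbols]);

        // With a variable typed SketchProtein *, the compiler flags
        // [protein complementSymbols] (an error under ARC, a warning
        // otherwise), because no visible interface declares it.
    }
    return 0;
}
```

Only the pairing table differs between the two subclasses, so the loop is
not duplicated, and the mistake of complementing a protein is caught when
the code is built rather than when it runs.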
From kvddrift at earthlink.net Fri Dec 3 21:50:27 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 3 Dec 2004 21:50:27 -0500 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <4C5621BF-442E-11D9-9436-000D93AE89A4@mekentosj.com> References: <23B4E5BC-4410-11D9-9428-003065A5FDCC@earthlink.net> <4C5621BF-442E-11D9-9436-000D93AE89A4@mekentosj.com> Message-ID: <40A0DE02-459F-11D9-9080-003065A5FDCC@earthlink.net> On Dec 2, 2004, at 1:49 AM, Alexander Griekspoor wrote: > Sometimes, you just want to know the type instead of getting the > sequence back. In the documentation we can point the reader to the > fact that if he want to have the sequence, there's the other method > (and thus prevent him to do double work). Now I'm in the nitpicking > mode anyway, the method name guidelines suggest to do something like: > > sequenceTypeForUntypedString: or sequenceTypeForUnknownString: > and > sequenceFromUntypedString: or sequenceFromUnknownString: > I added this method: - (BCSequenceType) sequenceTypeForUntypedString: (NSString *) string { return [[self sequenceFromUntypedString: string] sequenceType]; } is that what you meant? - Koen. From mek at mekentosj.com Sat Dec 4 04:04:30 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 10:04:30 +0100 Subject: [Biococoa-dev] more ramblings In-Reply-To: References: Message-ID: <81DC0924-45D3-11D9-90C8-000D93AE89A4@mekentosj.com> John and Koen, Op 4-dec-04 om 3:34 heeft John Timmer het volgende geschreven: > >>>> Well, a complement doesn't make sense for a protein does it? >>> >>> No, it doesn't. That's why I made it such that if you pass it to a >>> protein, the result will just be an empty sequence. This is actually >>> a >>> good example of what I mean by putting the convenience methods only >>> in BCSequence. 
>> >> I know, but again at some point it doesn't make any sense anymore, >> than >> we can just as well get rid of the subclasses if that's where you are >> pointing at (again ;-). Either we go in your direction and throw >> everything in one class, or we do it nicely with subclasses. The fact >> that your complement tool object returns nil if you hand it a protein >> sequence is very elegant and nice (if that is what you want per se), >> but I don't see any reason why we are obliged to add the method call >> in >> BCSequence instead of only in the DNA/RNA subclasses. Again, it >> doesn't >> make any sense to add the possibility to call complement on a protein >> sequence, unless there is only one bcsequence class. And if I'm >> outnumbered in this opinion, than I even rather switch to the >> one-BCSequence approach, as now we're ending up in some strange >> hybrid. >> John, I think it's your call... > > Well, if it's my call, I'm pretty sure you know where I stand. Then it perhaps wasn't really fair from the start to ask for your opinion, but ok. > It just > seems like silliness to me to allow the following: > Have a method associated with the data it can't possibly act on > Have a method to be called on sequences that it has no relevance to > Make it easier to have developers accidentally make stupid mistakes. > Yep my point exactly. > As Alex said, failing with an empty sequence is a relatively elegant > way of > handling the situation. But there's absolutely no need for the > situation > to be handled - placing the method into the appropriate subclass (in > this > case the currently non-existent BCSequenceNucleotide, parent of DNA > and RNA) > is just clearer, safer, and more sensible. It might make a lot of sense to add this class. > I just don't see a big benefit - > the one claimed was preventing duplication of code, but adding the > appropriate subclass eliminates the duplication. > > So my vote is definitely against putting these in BCSequence. 
> Until there's some decision that the subclasses have to be eliminated,
> we should use them appropriately.

Amen

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant

*********************************************************

From mek at mekentosj.com Sat Dec 4 04:06:31 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 4 Dec 2004 10:06:31 +0100
Subject: [Biococoa-dev] compiler warning
In-Reply-To: <963FF0B4-459D-11D9-9080-003065A5FDCC@earthlink.net>
References: <67FC3F32-4598-11D9-9080-003065A5FDCC@earthlink.net>
 <76CB449F-459B-11D9-90C8-000D93AE89A4@mekentosj.com>
 <963FF0B4-459D-11D9-9080-003065A5FDCC@earthlink.net>
Message-ID:

>> - Why do you use autorelease and not release it directly right after
>> you add it to the dictionary? You know you want to lose it anyway,
>> so why not do it directly?
>
> which autorelease are you referring to?
The kind in this example from readClustal:

    [sequenceDictionary setObject: newSequence forKey: name];
    [newSequence autorelease];

The dictionary retains newSequence when it is added, so newSequence can be released immediately.

A.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai

*********************************************************

From mek at mekentosj.com Sat Dec 4 04:07:48 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 4 Dec 2004 10:07:48 +0100
Subject: [Biococoa-dev] more ramblings
In-Reply-To: <2A09DD62-459E-11D9-9080-003065A5FDCC@earthlink.net>
References: <2A09DD62-459E-11D9-9080-003065A5FDCC@earthlink.net>
Message-ID:

On 4 Dec 2004, at 3:42, Koen van der Drift wrote:

>
> On Dec 3, 2004, at 9:34 PM, John Timmer wrote:
>
>> I just don't see a big benefit -
>> the one claimed was preventing duplication of code, but adding the
>> appropriate subclass eliminates the duplication.
>>
>
> Well, the same convenience method is now in 2 subclasses ;-)

I think it makes sense to add the general nucleotide class as the superclass of the DNA and RNA sequences, unless at the end of the day it only contains one or two methods...

A.
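The release-versus-autorelease remark about readClustal can be illustrated with a minimal sketch, assuming manual reference counting as used at the time. `SketchReadSequences` and its fixed "ACGT" data are hypothetical stand-ins for the parsing loop, not the BioCocoa source; the `__has_feature` guard is only there so the same file also builds under ARC, where the explicit `release` is not allowed.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

#ifndef __has_feature
#define __has_feature(x) 0   // compilers without __has_feature (e.g. GCC)
#endif

// Hypothetical stand-in for the loop in readClustal: one sequence per name.
static NSDictionary *SketchReadSequences(NSArray *names)
{
    NSMutableDictionary *sequenceDictionary = [NSMutableDictionary dictionary];
    NSUInteger i;
    for (i = 0; i < [names count]; i++) {
        NSString *name = [names objectAtIndex: i];
        // alloc/init makes this function the owner of newSequence
        NSMutableString *newSequence =
            [[NSMutableString alloc] initWithString: @"ACGT"];
        // the dictionary retains every value it stores
        [sequenceDictionary setObject: newSequence forKey: name];
#if !__has_feature(objc_arc)
        // so our own reference can be given up right here
        [newSequence release];
#endif
    }
    return sequenceDictionary;
}
```

Using `autorelease` at the same spot is also correct; it only postpones giving up the reference until the enclosing autorelease pool is drained, which matters mostly when a large file creates many sequences in one pass.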
********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From mek at mekentosj.com Sat Dec 4 04:08:32 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 10:08:32 +0100 Subject: [Biococoa-dev] BCSequenceFactory In-Reply-To: <40A0DE02-459F-11D9-9080-003065A5FDCC@earthlink.net> References: <23B4E5BC-4410-11D9-9428-003065A5FDCC@earthlink.net> <4C5621BF-442E-11D9-9436-000D93AE89A4@mekentosj.com> <40A0DE02-459F-11D9-9080-003065A5FDCC@earthlink.net> Message-ID: <126DB8B8-45D4-11D9-90C8-000D93AE89A4@mekentosj.com> Op 4-dec-04 om 3:50 heeft Koen van der Drift het volgende geschreven: > > On Dec 2, 2004, at 1:49 AM, Alexander Griekspoor wrote: > >> Sometimes, you just want to know the type instead of getting the >> sequence back. In the documentation we can point the reader to the >> fact that if he want to have the sequence, there's the other method >> (and thus prevent him to do double work). Now I'm in the nitpicking >> mode anyway, the method name guidelines suggest to do something like: >> >> sequenceTypeForUntypedString: or sequenceTypeForUnknownString: >> and >> sequenceFromUntypedString: or sequenceFromUnknownString: >> > > > I added this method: > > - (BCSequenceType) sequenceTypeForUntypedString: (NSString *) string > { > return [[self sequenceFromUntypedString: string] sequenceType]; > } In principle yes, great! 
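The type guessing behind sequenceTypeForUntypedString: was described earlier in the thread as: try each possible type, count the undefined symbols, take the type with the fewest, and break ties as DNA > RNA > protein. A minimal sketch of that rule follows. It is not the BioCocoa implementation: it counts characters directly instead of building candidate sequences and using the counted set, and the `Sketch...` names, the enum, and the simplified alphabets (no ambiguity codes) are all assumptions.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical enum, ordered by tie-break priority: DNA, then RNA, then protein.
typedef enum {
    SketchSequenceTypeDNA = 0,
    SketchSequenceTypeRNA,
    SketchSequenceTypeProtein
} SketchSequenceType;

// Count the characters of `string` that do not occur in `alphabet`.
static NSUInteger SketchCountUndefinedSymbols(NSString *string, NSString *alphabet)
{
    NSCharacterSet *allowed =
        [NSCharacterSet characterSetWithCharactersInString: alphabet];
    NSUInteger undefined = 0;
    NSUInteger i;
    for (i = 0; i < [string length]; i++) {
        if (![allowed characterIsMember: [string characterAtIndex: i]]) {
            undefined++;
        }
    }
    return undefined;
}

// Pick the type with the fewest undefined symbols.
static SketchSequenceType SketchSequenceTypeForUntypedString(NSString *string)
{
    // Assumption: simplified alphabets, indexed by the enum values above
    NSString *alphabets[3] = { @"ACGT", @"ACGU", @"ACDEFGHIKLMNPQRSTVWY" };
    NSString *upper = [string uppercaseString];
    SketchSequenceType best = SketchSequenceTypeDNA;
    NSUInteger bestCount = SketchCountUndefinedSymbols(upper, alphabets[0]);
    int type;
    for (type = SketchSequenceTypeRNA; type <= SketchSequenceTypeProtein; type++) {
        NSUInteger count = SketchCountUndefinedSymbols(upper, alphabets[type]);
        if (count < bestCount) {   // strict '<' keeps the earlier type on a tie
            best = (SketchSequenceType) type;
            bestCount = count;
        }
    }
    return best;
}
```

Note that a string such as "ACGT" scores zero undefined symbols as both DNA and protein, which is exactly the case the DNA-first tie-break is meant to settle.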
********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From kvddrift at earthlink.net Sat Dec 4 10:45:59 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 4 Dec 2004 10:45:59 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> Message-ID: <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> Grrr - another crash creeped in :( I will look into it tonight, but feel free to do so as well. - Koen. From mek at mekentosj.com Sat Dec 4 17:34:42 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 4 Dec 2004 23:34:42 +0100 Subject: [Biococoa-dev] bug alert In-Reply-To: <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> Message-ID: Ha Koen, First of all the newly added BCSequenceFactory.h file was still set to project instead of private in a fresh checkout from the CVS. Next, I got a compile error that the BCFoundationDefines.h could not be found. The by now famous, quit XCode, delete the build directory, and reopen the project, helped however... After that everything seems to build and runs without problems Koen.. 
Something else, in general the idea would be to add the BCTools prefix to all tools, for instance: // BCTools #import #import I would like to propose to do the same for the following new ones: #import -> BCToolFindSequence (or even BCToolSequenceFinder) #import -> BCToolSymbolCounter #import -> BCToolComplement #import -> BCToolSequenceFactory Similarly, lets stick to the guidelines and add the "shared" prefix to all class methods that generate static objects, thus + sequenceFactory; -> + sharedSequenceFactory. Cheers, Alex Op 4-dec-04 om 16:45 heeft Koen van der Drift het volgende geschreven: > Grrr - another crash creeped in :( > > I will look into it tonight, but feel free to do so as well. > > > - Koen. > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2557 bytes Desc: not available URL: From kvddrift at earthlink.net Sat Dec 4 18:38:11 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 4 Dec 2004 18:38:11 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> Message-ID: <8F5C7A70-464D-11D9-82B2-003065A5FDCC@earthlink.net> Alex, Please go ahead and submit these changes. 
- Koen. On Dec 4, 2004, at 5:34 PM, Alexander Griekspoor wrote: > Ha Koen, > > First of all the newly added BCSequenceFactory.h file was still set to > project instead of private in a fresh checkout from the CVS. > Next, I got a compile error that the BCFoundationDefines.h could not > be found. The by now famous, quit XCode, delete the build directory, > and reopen the project, helped however... > After that everything seems to build and runs without problems Koen.. > > Something else, in general the idea would be to add the BCTools prefix > to all tools, for instance: > // BCTools > #import > #import > > I would like to propose to do the same for the following new ones: > #import -> BCToolFindSequence (or even > BCToolSequenceFinder) > #import -> BCToolSymbolCounter > #import -> BCToolComplement > #import -> BCToolSequenceFactory > > Similarly, lets stick to the guidelines and add the "shared" prefix to > all class methods that generate static objects, thus > + sequenceFactory; -> + sharedSequenceFactory. > > Cheers, > Alex > > > Op 4-dec-04 om 16:45 heeft Koen van der Drift het volgende geschreven: > >> Grrr - another crash creeped in :( >> >> I will look into it tonight, but feel free to do so as well. >> >> >> - Koen. 
>> >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ************************************************************** > ** Alexander Griekspoor ** > ************************************************************** > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > MacOS X: The power of UNIX with the simplicity of the Mac > > *************************************************************** From mek at mekentosj.com Sat Dec 4 18:57:39 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 5 Dec 2004 00:57:39 +0100 Subject: [Biococoa-dev] bug alert In-Reply-To: <8F5C7A70-464D-11D9-82B2-003065A5FDCC@earthlink.net> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> <8F5C7A70-464D-11D9-82B2-003065A5FDCC@earthlink.net> Message-ID: <4749C079-4650-11D9-90C8-000D93AE89A4@mekentosj.com> Koen, if you don't mind, would you like to do it? it's one o'clock and still have to do one Sinterklaas poem ;-) So, the only things needed to be changed are: 1 make the BCSequenceFactory public (and thus update the project) 2 change the classnames of the new tools 3 change the names of the shared class methods In principles I would like to do #2 and 3, but that will take some time then (2-3 days), also you're busy with some of these classes, so to prevent getting sync problems, I would rather leave it up to you. Let's put it this way, if it hasn't been changed in a few days, I'll do it ok? Cheers, Alex Op 5-dec-04 om 0:38 heeft Koen van der Drift het volgende geschreven: > Alex, > > Please go ahead and submit these changes. > > - Koen. 
> > > On Dec 4, 2004, at 5:34 PM, Alexander Griekspoor wrote: > >> Ha Koen, >> >> First of all the newly added BCSequenceFactory.h file was still set >> to project instead of private in a fresh checkout from the CVS. >> Next, I got a compile error that the BCFoundationDefines.h could not >> be found. The by now famous, quit XCode, delete the build directory, >> and reopen the project, helped however... >> After that everything seems to build and runs without problems Koen.. >> >> Something else, in general the idea would be to add the BCTools >> prefix to all tools, for instance: >> // BCTools >> #import >> #import >> >> I would like to propose to do the same for the following new ones: >> #import -> BCToolFindSequence (or >> even BCToolSequenceFinder) >> #import -> BCToolSymbolCounter >> #import -> BCToolComplement >> #import -> BCToolSequenceFactory >> >> Similarly, lets stick to the guidelines and add the "shared" prefix >> to all class methods that generate static objects, thus >> + sequenceFactory; -> + sharedSequenceFactory. >> >> Cheers, >> Alex >> >> >> Op 4-dec-04 om 16:45 heeft Koen van der Drift het volgende geschreven: >> >>> Grrr - another crash creeped in :( >>> >>> I will look into it tonight, but feel free to do so as well. >>> >>> >>> - Koen. 
>>> >>> >>> _______________________________________________ >>> Biococoa-dev mailing list >>> Biococoa-dev at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >>> >>> >> ************************************************************** >> ** Alexander Griekspoor ** >> ************************************************************** >> The Netherlands Cancer Institute >> Department of Tumorbiology (H4) >> Plesmanlaan 121, 1066 CX, Amsterdam >> Tel: + 31 20 - 512 2023 >> Fax: + 31 20 - 512 2029 >> AIM: mekentosj at mac.com >> E-mail: a.griekspoor at nki.nl >> Web: http://www.mekentosj.com >> >> MacOS X: The power of UNIX with the simplicity of the Mac >> >> *************************************************************** > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From kvddrift at earthlink.net Sat Dec 4 21:29:21 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 4 Dec 2004 21:29:21 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> Message-ID: <78E515EE-4665-11D9-82B2-003065A5FDCC@earthlink.net> On Dec 4, 2004, at 10:45 AM, Koen van der Drift wrote: > Grrr - another crash creeped in :( > > I will look into it tonight, but feel free to do so as well. > I did a fresh checkout from CVS, and the problem is gone. 
Must have been some code I changed locally... pffft. - Koen. From kvddrift at earthlink.net Mon Dec 6 18:36:13 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 6 Dec 2004 18:36:13 -0500 Subject: [Biococoa-dev] bug alert In-Reply-To: <4749C079-4650-11D9-90C8-000D93AE89A4@mekentosj.com> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> <8F5C7A70-464D-11D9-82B2-003065A5FDCC@earthlink.net> <4749C079-4650-11D9-90C8-000D93AE89A4@mekentosj.com> Message-ID: <9D8C040E-47DF-11D9-BDD3-003065A5FDCC@earthlink.net> On Dec 4, 2004, at 6:57 PM, Alexander Griekspoor wrote: >>> I would like to propose to do the same for the following new ones: >>> >>> #import -> BCToolSequenceFactory Actually, I don't think BCSequenceFactory should go in the BCTools hierarchy. A factory is not a tool that does something with a sequence, but it creates a sequence. Maybe it fits better in the BCSequence folder without a name change. - Koen. From mek at mekentosj.com Tue Dec 7 02:30:32 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Tue, 7 Dec 2004 08:30:32 +0100 Subject: [Biococoa-dev] bug alert In-Reply-To: <9D8C040E-47DF-11D9-BDD3-003065A5FDCC@earthlink.net> References: <654E7508-44B5-11D9-9428-003065A5FDCC@earthlink.net> <980043A6-460B-11D9-82B2-003065A5FDCC@earthlink.net> <8F5C7A70-464D-11D9-82B2-003065A5FDCC@earthlink.net> <4749C079-4650-11D9-90C8-000D93AE89A4@mekentosj.com> <9D8C040E-47DF-11D9-BDD3-003065A5FDCC@earthlink.net> Message-ID: That's perfectly fine and certainly makes sense, so go ahead. The point I wanted to make was only that all tools should have the BCTool prefix. 
Cheers, Alex Op 7-dec-04 om 0:36 heeft Koen van der Drift het volgende geschreven: > > On Dec 4, 2004, at 6:57 PM, Alexander Griekspoor wrote: > >>>> I would like to propose to do the same for the following new ones: >>>> >>>> #import -> BCToolSequenceFactory > > > Actually, I don't think BCSequenceFactory should go in the BCTools > hierarchy. A factory is not a tool that does something with a > sequence, but it creates a sequence. Maybe it fits better in the > BCSequence folder without a name change. > > - Koen. > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From mek at mekentosj.com Wed Dec 8 16:52:48 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Wed, 8 Dec 2004 22:52:48 +0100 Subject: [Biococoa-dev] bug alert In-Reply-To: <36BF6890-4664-11D9-82B2-003065A5FDCC@earthlink.net> References: <24058291.1102206224828.JavaMail.root@kermit.psp.pas.earthlink.net> 
 <629175DB-4655-11D9-90C8-000D93AE89A4@mekentosj.com>
 <36BF6890-4664-11D9-82B2-003065A5FDCC@earthlink.net>
Message-ID: <806E5EF6-4963-11D9-85C1-000D93AE89A4@mekentosj.com>

Hi guys,

I've made the changes I discussed before, which are described below. Please update your project from CVS, and be careful if you have made changes since your last checkout. If you want to do a fresh checkout, don't forget the -P option to discard empty folders. Have a look to see whether everything works as it should.

One nice thing to keep in mind for later: renaming files is really easy in Xcode. Just right-click the file and rename it, and Xcode will handle the CVS synchronization for you. The only thing that then remains is a simple find and replace within the project to replace all old names with the new ones...

Cheers,
Alex

> Hi Koen,
>
> First of all the newly added BCSequenceFactory.h file was still set to
> project instead of private in a fresh checkout from the CVS.
> Next, I got a compile error that the BCFoundationDefines.h could not
> be found. The by now famous, quit Xcode, delete the build directory,
> and reopen the project, helped however...
> After that everything seems to build and run without problems Koen..
>
> Something else, in general the idea would be to add the BCTools prefix
> to all tools, for instance:
> // BCTools
> #import
> #import
>
> I would like to propose to do the same for the following new ones:
> #import -> BCToolFindSequence (or even
> BCToolSequenceFinder)
> #import -> BCToolSymbolCounter
> #import -> BCToolComplement
> #import -> BCToolSequenceFactory
>
> Similarly, let's stick to the guidelines and add the "shared" prefix to
> all class methods that generate static objects, thus
> + sequenceFactory; -> + sharedSequenceFactory.
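The "shared" naming proposed above usually goes with a class method that hands out one static instance. A minimal sketch, assuming a hypothetical `SketchSequenceFactory` class rather than the real BCSequenceFactory, whose implementation is not shown in this thread:

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical factory class used only to illustrate the naming convention.
@interface SketchSequenceFactory : NSObject
+ (SketchSequenceFactory *) sharedSequenceFactory;
@end

@implementation SketchSequenceFactory
+ (SketchSequenceFactory *) sharedSequenceFactory
{
    // One instance for the whole process, created on first use
    static SketchSequenceFactory *sharedFactory = nil;
    @synchronized (self) {
        if (sharedFactory == nil) {
            sharedFactory = [[SketchSequenceFactory alloc] init];
        }
    }
    return sharedFactory;
}
@end
```

The "shared" prefix tells the caller that the returned object is not a fresh instance, so it must not be released and any state it holds is visible to every other caller.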
> > Cheers, > Alex >>> >> ********************************************************* >> ** Alexander Griekspoor ** >> ********************************************************* >> The Netherlands Cancer Institute >> Department of Tumorbiology (H4) >> Plesmanlaan 121, 1066 CX, Amsterdam >> Tel: + 31 20 - 512 2023 >> Fax: + 31 20 - 512 2029 >> AIM: mekentosj at mac.com >> E-mail: a.griekspoor at nki.nl >> Web: http://www.mekentosj.com >> >> 4Peaks - For Peaks, Four Peaks. >> 2004 Winner of the Apple Design Awards >> Best Mac OS X Student Product >> http://www.mekentosj.com/4peaks >> >> ********************************************************* > > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 3678 bytes Desc: not available URL: From kvddrift at earthlink.net Thu Dec 9 17:32:22 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 9 Dec 2004 17:32:22 -0500 Subject: [Biococoa-dev] new release Message-ID: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> Hi, It looks like Peter released a new version of the original BioCocoa on the website (v 1.6). I guess this is for his publication, because it includes a Linux and Windows version. There is now also a link to a web-based conversion tool. http://bioinformatics.org/biococoa/ - Koen. 
From mek at mekentosj.com Thu Dec 9 17:39:23 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 9 Dec 2004 23:39:23 +0100 Subject: [Biococoa-dev] new release In-Reply-To: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> References: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> Message-ID: <2CB2D915-4A33-11D9-85C1-000D93AE89A4@mekentosj.com> Very nice indeed, Peter told me already he was preparing the release which is indeed for his revised paper. He also asked me to test the webservice which functions very well (although I rather build my own app ;-) He still has to update the bioinformatics homepage to reflect the latest version (it still list 1.5), and after that he'll probably tell us what's new in this release. He also told me he will bring all the new file formats to the new framework of course.... Congrats Peter! Alex Op 9-dec-04 om 23:32 heeft Koen van der Drift het volgende geschreven: > Hi, > > It looks like Peter released a new version of the original BioCocoa on > the website (v 1.6). I guess this is for his publication, because it > includes a Linux and Windows version. There is now also a link to a > web-based conversion tool. > > http://bioinformatics.org/biococoa/ > > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? 
********************************************************* From peter.schols at bio.kuleuven.ac.be Fri Dec 10 05:09:58 2004 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Fri, 10 Dec 2004 11:09:58 +0100 Subject: [Biococoa-dev] new release In-Reply-To: <2CB2D915-4A33-11D9-85C1-000D93AE89A4@mekentosj.com> References: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> <2CB2D915-4A33-11D9-85C1-000D93AE89A4@mekentosj.com> Message-ID: Hi, The BioCocoa updates you are seeing now are indeed intended for the resubmission of a paper I submitted last year, when BioCocoa was at 1.1. I have already informed some people about this 'special version' but I haven't had a chance to send an announcement to the list yet, so I'll do it now ;-) - The 1.6 version you are seeing for download on the BioCocoa homepage is an updated version of BioCocoa 1.1. So it does not use the new architecture yet. In other words, it still uses NSStrings and NSDictionaries in stead of BCSequences and the like. So this new version only converts between multiple sequence/phylogeny file formats, which was the first goal of the original BioCocoa. - The major new additions to this version are support for the following file formats: Nona, TNT, Hennig86, Beast and Plain. - This 1.6 version is 100% GNUstep compliant, and as a consequence it runs well on Windows and Linux (with a working installation of GNUstep, of course). The lack of a Windows and Linux versions was one of the main issues the reviewers had with the first version. - None of the changes or additions in this version have been added to the BC repository yet. Of course I'll do this, but not before testing them a bit more and changing the I/O methods to use BCSequences (or BCSequenceWrappers or whatever we will be using by then). Best wishes, peter On 09 Dec 2004, at 23:39, Alexander Griekspoor wrote: > Very nice indeed, Peter told me already he was preparing the release > which is indeed for his revised paper. 
He also asked me to test the > webservice which functions very well (although I rather build my own > app ;-) He still has to update the bioinformatics homepage to reflect > the latest version (it still list 1.5), and after that he'll probably > tell us what's new in this release. He also told me he will bring all > the new file formats to the new framework of course.... > Congrats Peter! > Alex > > > > Op 9-dec-04 om 23:32 heeft Koen van der Drift het volgende geschreven: > >> Hi, >> >> It looks like Peter released a new version of the original BioCocoa >> on the website (v 1.6). I guess this is for his publication, because >> it includes a Linux and Windows version. There is now also a link to >> a web-based conversion tool. >> >> http://bioinformatics.org/biococoa/ >> >> >> - Koen. >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > Windows vs Mac > 65 million years ago, there were more > dinosaurs than humans. > Where are the dinosaurs now? > > ********************************************************* > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From charles.parnot at stanford.edu Fri Dec 10 13:06:37 2004 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Fri, 10 Dec 2004 10:06:37 -0800 Subject: [Biococoa-dev] I am watching you Message-ID: Hi List! 
This is just a quick email to let you know I added myself to the list. I thought it would be nice of me to let all of you know I will be reading your stuff for a while. I am interested in helping out at some point. However, I am not sure it is reasonable for me to do so at the present time, as I am quite busy with other things (well, who isn't?). I will thus just be watching and grabbing information, and maybe commenting a few times, and will let you know when I am ready to commit more, if that ever happens.

Here is some short background on me: I am a postdoc at Stanford University, working on the structural aspects of the adrenergic receptor activation, and more generally on GPCRs (G-protein coupled receptors). I am French, and got my Ph.D. in Paris. I started writing some code about 2 years ago, first some Perl (still using it a lot), and now C/Cocoa, for my research (and for my pleasure at the same time, of course; I am really a big fan of Mac OS X and of Cocoa). You can read more about some of the stuff I am doing at:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

I have checked out the most recent version of the project from CVS and browsed through it, and I think I have a good understanding of the overall structure of BioCocoa, and even of some of the details. You do a really good job at commenting and organizing the code, and the overall design looks really good. I have also read the list archives for November and December 2004.

As a starter, I am humbly asking one of you, whenever he/she has time, to summarize the different design options you had in the past or are still considering for the BCSequence object (from the archives, I could only grab part of the debate). I know this is quite a big question, but I don't ask for too many details, just a quick overview of the different options and I think I can fill in the blanks.
Then a related question is: why do you need a BCSequenceFactory, rather than just using factory methods defined in the BCSequence superclass (for an unknown sequence type) or in its subclasses (for known types)? I should add that I have no intention to question any of the design decisions ;-) , and don't want to revive any past debate, I just want to be brought up to speed...

Thanks to whoever answers those questions, and again, the BioCocoa project is a great initiative, and it looks really promising :-)

Charles

--
Charles Parnot
charles.parnot at stanford.edu
Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)
Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Sat Dec 11 13:58:45 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 11 Dec 2004 13:58:45 -0500
Subject: [Biococoa-dev] I am watching you
In-Reply-To:
References:
Message-ID:

Hi Charles,

Welcome to the world of BioCocoa. I almost marked your mail as junk, because of the subject ;-)

Right now BioCocoa only has a few developers, so we can use all the help we can get. I guess developing for an open source project is similar to setting up an Xgrid project. Not all developers are working full time at the project, only when they have some cpu cycles left.

Peter Schols started BioCocoa a while ago as a framework to read and write various sequence formats, with an emphasis on phylogenetic formats, which is his field. I joined his project early this year and added some methods to read various protein formats. This is still the version that you can download from the website. Then in the summer John Timmer and Alex Griekspoor (mek from mekentosj) joined and the project started from scratch in the current setup. Peter was really busy, so it was basically the three of us that coded what is now in CVS.

> As a starter, I am humbly asking one of you, whenever he/she has time,
> to summarize the different design options you had in the past or are
> still considering for the BCSequence object (from the archives, I
> could only grab part of the debate).

There are two different opinions about the use of BCSequence. My own idea is that we should have only one BCSequence class that takes care of managing the BCSymbols in it. To identify the sequence, I proposed we should have a symbolset member, e.g. dnaSymbolSet, proteinStrictSymbolSet. These are similar to the Alphabets you find in BioPerl and BioJava. This way you only have to keep the sequence related code in one class, instead of in every possible subclass with small variations. The other idea, which is favored by John and Alex, is to subclass BCSequence, and have only code that is sensible for the specific subclass in that class. E.g. a protein would never need to calculate the GC content, and a DNA sequence doesn't need an isoelectric point calculator. Both designs have their advantages and disadvantages; right now we came up with a compromise: we subclass BCSequence, but the subclasses only contain convenience methods that call wrapper objects (BCTools) to perform a specific action for that subclass.

> I know this is quite a big question, but I don't ask for too many
> details, just a quick overview of the different options and I think I
> can fill in the blanks. Then a related question is: why do you need a
> BCSequenceFactory, and not just use factory methods defined in the
> BCSequence superclass (when unknown sequence type) or subclasses (when
> known types). I should add that I have no intention to question any of
> the design decisions ;-) , and don't want to revive any past debate, I
> just want to be brought up to speed...

The idea of a factory class is to have all code that creates sequences in one central location, instead of spread out through various subclasses of BCSequence. It's just a way of factoring out code into smaller modules. The advantage is that when something is changed or added in the way a sequence is created, this only has to be done in one class (the factory class). This is also a well established design pattern, and used in many projects.

> Thanks to whoever answer those questions, and again, the BioCocoa
> project is a great initiative, and it looks really promising :-)

I hope I answered your questions, feel free to ask more and hopefully add code in a short while.

cheers,

- Koen.

From kvddrift at earthlink.net Sat Dec 11 14:03:24 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 11 Dec 2004 14:03:24 -0500
Subject: [Biococoa-dev] new release
In-Reply-To:
References: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> <2CB2D915-4A33-11D9-85C1-000D93AE89A4@mekentosj.com>
Message-ID: <55509A3E-4BA7-11D9-BDD3-003065A5FDCC@earthlink.net>

> - None of the changes or additions in this version have been added to
> the BC repository yet. Of course I'll do this, but not before testing
> them a bit more and changing the I/O methods to use BCSequences (or
> BCSequenceWrappers or whatever we will be using by then).

Great work, Peter. I hope your publication gets accepted.

Because you started BioCocoa, I think it would be great if you could jump into the discussion of the data format we should adopt for BioCocoa. I hope you don't feel that Alex, John, and I have stolen your baby ;)

cheers,

- Koen.

From peter.schols at bio.kuleuven.ac.be Sun Dec 12 15:14:42 2004
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Sun, 12 Dec 2004 21:14:42 +0100
Subject: [Biococoa-dev] new release
In-Reply-To: <55509A3E-4BA7-11D9-BDD3-003065A5FDCC@earthlink.net>
References: <31DC683E-4A32-11D9-BDD3-003065A5FDCC@earthlink.net> <2CB2D915-4A33-11D9-85C1-000D93AE89A4@mekentosj.com> <55509A3E-4BA7-11D9-BDD3-003065A5FDCC@earthlink.net>
Message-ID: <75C394EC-4C7A-11D9-AD7B-00039345483C@bio.kuleuven.ac.be>

> Great work, Peter.
> I hope your publication gets accepted.

Thanks Koen! Once the 2.0 version has been completed, I think we should try another publication with all BC coauthors.

> Because you started BioCocoa, I think it would be great if you could
> jump into the discussion of the data format we should adopt for
> BioCocoa. I hope you don't feel that Alex, John, and I have stolen
> your baby ;)

Absolutely not... ;-)) I'm very happy with the things you guys have been doing over the past few months! BC has grown from a simple sequence converter to a real Bio framework like BioJava or BioPerl. If things go as planned, I'll have a lot more time during the coming months and I'll jump back in, after almost a year of BCAbsence...

I'd also like to welcome Charles to the growing BC community!

Peter

From mek at mekentosj.com Sun Dec 12 18:55:33 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 13 Dec 2004 00:55:33 +0100
Subject: [Biococoa-dev] I am watching you
In-Reply-To:
References:
Message-ID: <4FAE3082-4C99-11D9-85C1-000D93AE89A4@mekentosj.com>

I promised Charles to give a state-of-the-union on the list, but Koen did a very nice job in summarizing the current status! I just wanted to add the current focus and problems.

As you read much of the archives of the last two months, you have seen the discussion about whether or not to subclass BCSequence, nicely summarized in Koen's reply of 11 Dec. This will probably stay an issue of debate for a while, and is something to keep in the back of our heads a bit; we simply have to see which option in the end fits our framework best.

At the moment the main focus is on the sequence IO, reading and writing the different file formats into BCSequence objects. Koen has done quite some work on this and the basics are working really well. More formats will come easy now, and also Peter has added a number of new formats to the 1.6 (original) version which he will port to the new framework as well.

What we have to do now is design the annotations/features part of the sequences, as well as the grouping of sequences into bundles. The basic idea would be to have a hierarchy like this:

BCSequence - the basic sequence object
BCAnnotatedSequence - a wrapper object containing a dictionary of BCFeatures, a BCSequence, and a dictionary of BCAnnotations; or, as the second possibility, a subclass of BCSequence
BCAnnotatedSequenceBundle - a bundle of BCAnnotatedSequences, including interdependencies

The question now is how to implement this system... All ideas, comments and suggestions are more than welcome! You see, lots of work to do ;-)

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2-bit company without 1 bit of sense.
*********************************************************

From kvddrift at earthlink.net Mon Dec 13 20:18:59 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 13 Dec 2004 20:18:59 -0500
Subject: [Biococoa-dev] sequence wrappers
Message-ID: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net>

Hi,

How about the following change in our class structure to accommodate the sequence wrapper nomenclature:

We create a new, bare sequence class, whose only function is to maintain an array of BCSymbols. A possible name is BCSimpleSequence or BCSymbolList, or whatever. Part of the current code in BCSequence could go in that class. Then we subclass that to BCSequence, which has annotations and features. This is then subclassed to the current subclasses. One advantage: no long ugly class names such as BCSequenceProteinAnnotated, etc.

cheers,

- Koen.
From mek at mekentosj.com Tue Dec 14 09:20:58 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Tue, 14 Dec 2004 15:20:58 +0100
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net>
References: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net>
Message-ID: <5FD2809C-4DDB-11D9-B50B-000D93AE89A4@mekentosj.com>

That's a good idea Koen, if the others agree feel free to implement that... I'm strongly in favour of the BCSymbolList (in analogy with the BioJava framework). Another step closer to a single sequence class, haha ;-)

Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better.
So I got a Macintosh.
*********************************************************

From peter.schols at bio.kuleuven.ac.be Tue Dec 14 09:29:48 2004
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Tue, 14 Dec 2004 15:29:48 +0100
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <5FD2809C-4DDB-11D9-B50B-000D93AE89A4@mekentosj.com>
References: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net> <5FD2809C-4DDB-11D9-B50B-000D93AE89A4@mekentosj.com>
Message-ID: <9BAE62A2-4DDC-11D9-A0BD-00039345483C@bio.kuleuven.ac.be>

Koen, this approach seems to make sense to me, although I'm not a specialist in this field.

cheers,

peter

From jtimmer at bellatlantic.net Tue Dec 14 10:33:52 2004
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 14 Dec 2004 10:33:52 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net>
Message-ID:

In terms of design of this class, how will it differ from a regular NSMutableArray? The existing class isn't too much more complex than an array other than in terms of the sequence-type specific code.

I'm not trying to be argumentative, but more performance oriented. If we're only adding a couple of methods, there should be much less overhead to adding a category to NSMutableArray (you wouldn't be calling methods through as many objects to get to the actual sequence array).

That said, the idea of having annotations and features be a part of the sequence specific subclasses allows some interesting possibilities, so I am intrigued....
JT

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Tue Dec 14 17:13:05 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 14 Dec 2004 17:13:05 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To:
References:
Message-ID: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net>

On Dec 14, 2004, at 10:33 AM, John Timmer wrote:

> In terms of design of this class, how will it differ from a regular
> NSMutableArray?

It's not intended to be a subclass of NSMutableArray, but it has an NSMutableArray as its only member. The idea is to have a bare-bones class that *only* maintains an array of BCSymbols but without all the overhead of features and annotations. These can be handy for searching and for specific calculations. So it could have the following methods from the current BCSequence:

initWithArray
symbolAtIndex
subSequenceStringInRange
etc.

but not:

sequenceType
symbolSet
name

The current BCSequence will be a subclass of this new class, and will have members for features and annotations.

> That said, the idea of having annotations and features be a part of the
> sequence specific subclasses allows some interesting possibilities, so
> I am intrigued....

Well, that can be done with or without the BCSymbolList anyway.

- Koen.
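The split described in this message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the BioCocoa repository: symbols are plain one-letter NSStrings standing in for BCSymbol objects, the out-of-range behaviour (returning nil) is invented for the sketch, and it assumes a modern toolchain with ARC (for example `clang -fobjc-arc -framework Foundation`). Only the class names and the three method names come from the thread.

```objc
#import <Foundation/Foundation.h>

// Bare-bones class: its only member is the array of symbols.
// (Assumption: symbols are one-letter NSStrings instead of BCSymbol objects.)
@interface BCSymbolList : NSObject {
    NSMutableArray *symbolArray;
}
- (id)initWithArray:(NSArray *)anArray;
- (id)symbolAtIndex:(NSUInteger)anIndex;
- (NSString *)subSequenceStringInRange:(NSRange)aRange;
@end

@implementation BCSymbolList
- (id)initWithArray:(NSArray *)anArray {
    self = [super init];
    if (self != nil) {
        symbolArray = [[NSMutableArray alloc] initWithArray:anArray];
    }
    return self;
}

// Hypothetical choice: return nil instead of raising on a bad index.
- (id)symbolAtIndex:(NSUInteger)anIndex {
    if (anIndex >= [symbolArray count]) return nil;
    return [symbolArray objectAtIndex:anIndex];
}

// Joins the symbols in the range into one string; nil if the range does not fit.
- (NSString *)subSequenceStringInRange:(NSRange)aRange {
    NSUInteger count = [symbolArray count];
    if (aRange.location > count || aRange.length > count - aRange.location) return nil;
    return [[symbolArray subarrayWithRange:aRange] componentsJoinedByString:@""];
}
@end

// The annotated layer: everything BCSymbolList has, plus features and annotations.
@interface BCSequence : BCSymbolList {
    NSMutableDictionary *features;
    NSMutableDictionary *annotations;
}
- (NSMutableDictionary *)features;
- (NSMutableDictionary *)annotations;
@end

@implementation BCSequence
- (id)initWithArray:(NSArray *)anArray {
    self = [super initWithArray:anArray];
    if (self != nil) {
        features = [[NSMutableDictionary alloc] init];
        annotations = [[NSMutableDictionary alloc] init];
    }
    return self;
}
- (NSMutableDictionary *)features { return features; }
- (NSMutableDictionary *)annotations { return annotations; }
@end

int main(void) {
    @autoreleasepool {
        NSArray *symbols = [NSArray arrayWithObjects:@"A", @"T", @"G", @"C", nil];
        BCSequence *seq = [[BCSequence alloc] initWithArray:symbols];
        [[seq annotations] setObject:@"test fragment" forKey:@"name"];
        NSLog(@"%@ %@", [seq symbolAtIndex:0],
              [seq subSequenceStringInRange:NSMakeRange(1, 2)]);
    }
    return 0;
}
```

Code that only searches or counts symbols can take a BCSymbolList and never touch the two dictionaries, which is the overhead argument made above; whether that saving matters in practice would have to be measured.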
From kvddrift at earthlink.net Tue Dec 14 17:14:03 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 14 Dec 2004 17:14:03 -0500 Subject: [Biococoa-dev] sequence wrappers In-Reply-To: <5FD2809C-4DDB-11D9-B50B-000D93AE89A4@mekentosj.com> References: <21E05431-4D6E-11D9-823A-003065A5FDCC@earthlink.net> <5FD2809C-4DDB-11D9-B50B-000D93AE89A4@mekentosj.com> Message-ID: <76A0CB60-4E1D-11D9-823A-003065A5FDCC@earthlink.net> On Dec 14, 2004, at 9:20 AM, Alexander Griekspoor wrote: > I'm strongly in favour of the BCSymbolList (in analogy with the > biojava framework). Another step closer to a single sequence class, > haha ;-) LOL, I really didn't propose it for that reason ;-) - Koen. From kvddrift at earthlink.net Wed Dec 15 20:22:53 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 15 Dec 2004 20:22:53 -0500 Subject: [Biococoa-dev] sequence wrappers In-Reply-To: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> Message-ID: <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> On Dec 14, 2004, at 5:13 PM, Koen van der Drift wrote: > So it could have the following methods from the current BCSequence: > > initWithArray > symbolAtIndex > subSequenceStringInRange > > etc. > > but not: > > sequenceType > symbolSet > name > Actually, it should have sequenceType and symbolSet. Just not the annotations and features. So basically it's the same as the current BCSequence. The current BCSequence will then be promoted to a BCAnnotatedSequence, but without the name change. I also propose to rename sequenceArray to symbolArray. - Koen. 
From kvddrift at earthlink.net Thu Dec 16 06:19:34 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 16 Dec 2004 06:19:34 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com>
References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com>
Message-ID: <5D3A5B56-4F54-11D9-A5E0-003065A5FDCC@earthlink.net>

On Dec 16, 2004, at 1:46 AM, Alexander Griekspoor wrote:

>> Actually, it should have sequenceType and symbolSet. Just not the
>> annotations and features. So basically it's the same as the current
>> BCSequence. The current BCSequence will then be promoted to a
>> BCAnnotatedSequence, but without the name change.
> Exactly, that would be the idea; it's more a renaming of the current
> BCSequence while we shifted the BCAnnotatedSequence in between the
> original BCSequence and the BCAnnotatedSequence. Again, the real
> question is whether we want the BCAnnotatedSequence be higher or lower
> in the hierarchy compared to BCSequence....

Not sure what you mean here Alex. I thought the idea was to rename the not-yet-existing BCAnnotatedSequence to BCSequence, so the new hierarchy would be:

BCSymbolList -> BCSequence -> BCSequenceDNA, etc

instead of:

BCSequence -> BCAnnotatedSequence -> BCAnnotatedSequenceDNA, etc

- Koen.
From kvddrift at earthlink.net Thu Dec 16 07:26:53 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 16 Dec 2004 07:26:53 -0500 Subject: [Biococoa-dev] sequence wrappers In-Reply-To: <825AC9EA-4F56-11D9-8D5C-000D93AE89A4@mekentosj.com> References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com> <5D3A5B56-4F54-11D9-A5E0-003065A5FDCC@earthlink.net> <825AC9EA-4F56-11D9-8D5C-000D93AE89A4@mekentosj.com> Message-ID: On Dec 16, 2004, at 6:34 AM, Alexander Griekspoor wrote: > Yes, that's indeed the idea, but I question whether we want: >> BCSymbolList -> BCSequence -> BCSequenceDNA, etc > > or: > > BCAnnotatedSequence as a wrapper object instead of subclassing > > I wouldn't make it a wrapper, but have separate BCAnnotations and BCFeatures objects. These could be wrappers to an NSMutableDictionary that holds all the info. BCSequence would then be a subclass of BCSymbolList with an additional BCAnnotations and BCFeatures member. - Koen. From mek at mekentosj.com Thu Dec 16 07:34:24 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 16 Dec 2004 13:34:24 +0100 Subject: [Biococoa-dev] sequence wrappers In-Reply-To: References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com> <5D3A5B56-4F54-11D9-A5E0-003065A5FDCC@earthlink.net> <825AC9EA-4F56-11D9-8D5C-000D93AE89A4@mekentosj.com> Message-ID: Hmm, that doesn't make sense to me, BCAnnotation and BCFeature objects yes, but not special BCAnnotationS and BCFeatureS objects. These can simply be contained in an array or dictionary, no need for a collection object IMHO... 
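The alternative argued for here, individual BCFeature objects kept in a plain Foundation container owned by the sequence rather than in a dedicated collection class, might look like the sketch below. It is an illustration only: the `label` and `range` fields, the nil checks, and the method names `addFeature:forKey:`, `featureForKey:` and `removeFeatureForKey:` are assumptions made for the example, and the BCSequence shown is a minimal stand-in with nothing but feature storage. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// One object per feature, so each feature can carry its own range.
// (label and range are hypothetical fields chosen for this sketch.)
@interface BCFeature : NSObject {
    NSString *label;
    NSRange range;
}
- (id)initWithLabel:(NSString *)aLabel range:(NSRange)aRange;
- (NSString *)label;
- (NSRange)range;
@end

@implementation BCFeature
- (id)initWithLabel:(NSString *)aLabel range:(NSRange)aRange {
    self = [super init];
    if (self != nil) {
        label = [aLabel copy];
        range = aRange;
    }
    return self;
}
- (NSString *)label { return label; }
- (NSRange)range { return range; }
@end

// Minimal stand-in for the sequence class; only the feature storage is shown.
@interface BCSequence : NSObject {
    NSMutableDictionary *features;   // plain container, no BCFeatures class
}
- (void)addFeature:(BCFeature *)aFeature forKey:(NSString *)aKey;
- (BCFeature *)featureForKey:(NSString *)aKey;
- (void)removeFeatureForKey:(NSString *)aKey;
@end

@implementation BCSequence
- (id)init {
    self = [super init];
    if (self != nil) {
        features = [[NSMutableDictionary alloc] init];
    }
    return self;
}

// Thin wrappers: the dictionary stays hidden behind descriptive method names.
- (void)addFeature:(BCFeature *)aFeature forKey:(NSString *)aKey {
    if (aFeature == nil || aKey == nil) return;   // NSMutableDictionary raises on nil
    [features setObject:aFeature forKey:aKey];
}
- (BCFeature *)featureForKey:(NSString *)aKey {
    if (aKey == nil) return nil;
    return [features objectForKey:aKey];
}
- (void)removeFeatureForKey:(NSString *)aKey {
    if (aKey != nil) [features removeObjectForKey:aKey];
}
@end

int main(void) {
    @autoreleasepool {
        BCSequence *seq = [[BCSequence alloc] init];
        BCFeature *exon = [[BCFeature alloc] initWithLabel:@"exon 1"
                                                     range:NSMakeRange(10, 50)];
        [seq addFeature:exon forKey:@"exon1"];
        BCFeature *found = [seq featureForKey:@"exon1"];
        NSLog(@"%@ %@", [found label], NSStringFromRange([found range]));
    }
    return 0;
}
```

The descriptive method names live on the sequence itself, so callers never see `setObject:forKey:` and no intermediate collection class is needed.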
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards
Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************

From kvddrift at earthlink.net Thu Dec 16 07:46:35 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 16 Dec 2004 07:46:35 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To:
References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com> <5D3A5B56-4F54-11D9-A5E0-003065A5FDCC@earthlink.net> <825AC9EA-4F56-11D9-8D5C-000D93AE89A4@mekentosj.com>
Message-ID: <854A88F8-4F60-11D9-A5E0-003065A5FDCC@earthlink.net>

On Dec 16, 2004, at 7:34 AM, Alexander Griekspoor wrote:

> Hmm, that doesn't make sense to me, BCAnnotation and BCFeature objects
> yes, but not special BCAnnotationS and BCFeatureS objects. These can
> simply be contained in an array or dictionary, no need for a
> collection object IMHO...

The advantage would be that we can have more descriptive convenience methods:

addFeature:(id)feature forKey:(id)key

instead of the less descriptive method from NSMutableDictionary:

- (void)setObject:(id)anObject forKey:(id)aKey

BTW, what's the difference between BCFeature and BCFeatureS?

- Koen.

From mek at mekentosj.com Thu Dec 16 08:00:26 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Thu, 16 Dec 2004 14:00:26 +0100
Subject: Fwd: [Biococoa-dev] sequence wrappers
Message-ID: <74D08E1A-4F62-11D9-8D5C-000D93AE89A4@mekentosj.com>

> The advantage would be that we can have more descriptive convenience
> methods:
>
> addFeature:(id)feature forKey:(id)key
>
> instead of the less descriptive method from NSMutableDictionary:
>
> - (void)setObject:(id)anObject forKey:(id)aKey

No, those should be part of the annotated sequence object! It then internally uses the normal NSMutableArray methods to add it to the list (thus hidden from the user):

[myBCAnnotatedSequence addFeature: myFeature forKey: myKey];

I don't see why you would want to do all this outside the context of the annotatedSequence. You can always make the array of the BCAnnotatedSequence accessible (either directly or not) through accessors, but also those can be made nicely by adding these methods to the BCAnnotatedSequence class, which again internally use a classical NSMutableArray:

addFeature:(id)feature forKey:(id)key
addFeatures:(NSArray *)features forKeys:(id)key, or addFeaturesWithKeys (null-terminated list)
removeFeature:(id)feature forKey:(id)key
(NSArray *)features
etc.

> BTW, what's the difference between BCFeature and BCFeatureS ?

Tell me if I'm wrong, but you want to have
- a BCFeature object, representing a single feature
AND
- a BCFeatures object, representing a collection of features

I strongly argue to only implement the first and dump the second...
Alex > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From mek at mekentosj.com Thu Dec 16 10:53:35 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 16 Dec 2004 16:53:35 +0100 Subject: Fwd: [Biococoa-dev] sequence wrappers Message-ID: >>> I strongly argue to only implement the first and dump the second... >> >> >> I disagree :). I favor to have just one BCFeatures object (which >> maintains a dictionary) instead of many BCFeature objects, each with >> one key-value pair. this makes it easy to acces the data: > > That exactly goes straight into your "I want lots of small objects > instead of one big" ;-) > Also, what in your vision is a feature then? How do you assign it a > range for instance without it being an object? 
>> >> [[mySequence features] featureForKey]; > > I would propose: [mySequence featureForKey: ] > Even simpler ;-) > Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* From kvddrift at earthlink.net Thu Dec 16 13:29:06 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 16 Dec 2004 13:29:06 -0500 Subject: [Biococoa-dev] sequence wrappers In-Reply-To: References: Message-ID: <5EB1D272-4F90-11D9-AB71-003065A5FDCC@earthlink.net> On Dec 16, 2004, at 10:53 AM, Alexander Griekspoor wrote: >>>> I strongly argue to only implement the first and dump the second... >>> >>> >>> I disagree :). I favor to have just one BCFeatures object (which >>> maintains a dictionary) instead of many BCFeature objects, each with >>> one key-value pair. this makes it easy to acces the data: >> >> That exactly goes straight into your "I want lots of small objects >> instead of one big" ;-) Yes, I just realized that. I think we both agree however, to use a separate object to store features and annotations. I looked at the bioperl doc again, and I like their approach actually. 
I guess it is also what you have in mind (Perl code):

foreach my $feat_object ($seq_object->get_SeqFeatures) {
    print "primary tag: ", $feat_object->primary_tag, "\n";
    foreach my $tag ($feat_object->get_all_tags) {
        print " tag: ", $tag, "\n";
        foreach my $value ($feat_object->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

The nice thing about this, which I didn't realize before, is that a feature itself can have a sub-feature. What they also do is store a sequence in the features object. I guess that's to store the subsequence of the feature. Not sure yet if I like that.

Anyway, contrary to what I argued before, I think indeed that single BCFeature and BCAnnotation objects stored in an array (or dictionary) within a BCSequence object are the way to go.

>>>
>>> [[mySequence features] featureForKey];
>>
>> I would propose: [mySequence featureForKey: ]
>> Even simpler ;-)

Yes, but in that case the BCFeature objects are stored in a dictionary, not an array. That's also a good idea. So, let's see:

BCSymbolList -> has-a symbolArray // this is the previous BCSequence object
    |
    |
    -----BCSequence -> has-a featuresDictionary and annotationsDictionary
            |
            |
            ------BCSequenceDNA, etc // unchanged

The featuresDictionary and annotationsDictionary have BCFeature and BCAnnotation objects as values, respectively, which can be accessed using the proper keys, using code such as: [mySequence featureForKey: ].

We're getting there!

- Koen.

From kvddrift at earthlink.net Fri Dec 17 14:34:45 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 17 Dec 2004 14:34:45 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com>
References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com>
Message-ID:

On Dec 16, 2004, at 1:46 AM, Alexander Griekspoor wrote:

>> Actually, it should have sequenceType and symbolSet.
Just not the
>> annotations and features. So basically it's the same as the current
>> BCSequence. The current BCSequence will then be promoted to a
>> BCAnnotatedSequence, but without the name change.
> Exactly, that would be the idea; it's more a renaming of the current
> BCSequence while we shifted the BCAnnotatedSequence in between the
> original BCSequence and the BCAnnotatedSequence. Again, the real
> question is whether we want the BCAnnotatedSequence be higher or lower
> in the hierarchy compared to BCSequence....
>
>> I also propose to rename sequenceArray to symbolArray.
> Yep

If nobody objects I will go ahead and make the following changes this weekend:

* add a new class BCSymbolList, which is the same as the current BCSequence class

* remove all code from the current BCSequence class and make it a subclass of BCSymbolList

* replace sequenceArray with symbolArray

* replace the occurrences of BCSequence throughout the code with BCSymbolList, if only a bare sequence is needed (e.g. for finding or MW calculating)

After this is done we can start adding the code for features and annotations to BCSequence.

It will involve many edits, so I want to be sure that everybody agrees with the proposed changes.

cheers,

- Koen.

From mek at mekentosj.com Fri Dec 17 15:23:40 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Fri, 17 Dec 2004 21:23:40 +0100
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To:
References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com>
Message-ID: <8A7BB64F-5069-11D9-BF6F-000D93AE89A4@mekentosj.com>

No objection your honor ;-) Looking forward to the changes...
Alex

On 17 Dec 2004, at 20:34, Koen van der Drift wrote:

>
> On Dec 16, 2004, at 1:46 AM, Alexander Griekspoor wrote:
>
>>> Actually, it should have sequenceType and symbolSet. Just not the
>>> annotations and features.
So basically it's the same as the current >>> BCSequence. The current BCSequence will then be promoted to a >>> BCAnnotatedSequence, but without the name change. >> Exactly, that would be the idea; it's more a renaming of the current >> BCSequence while we shifted the BCAnnotatedSequence in between the >> original BCSequence and the BCAnnotatedSequence. Again, the real >> question is whether we want the BCAnnotatedSequence be higher or >> lower in the hierarchy compared to BCSequence.... >> >>> I also propose to rename sequenceArray to symbolArray. >> Yep > > > If nobody objects I will go ahead and make the following changes this > weekend: > > > * add a new class BCSymbolList, which is the same as the current > BCSequence class > > * remove all code from the current BCSequence class and make it a > subclass of BCSymbolList > > * replace sequenceArray with symbolArray > > * replace the occurences of BCSequence throughout the code with > BCSymbolList, if only a bare sequence is needed (eg for finding or MW > calculating) > > > After this is done we can start adding the code for features and > annotations to BCSequence. > > > It will involve many edits, so I want to be sure that everybody agrees > with the proposed changes. > > > cheers, > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. 
*********************************************************

From kvddrift at earthlink.net Fri Dec 17 16:02:43 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 17 Dec 2004 16:02:43 -0500
Subject: [Biococoa-dev] sequence wrappers
In-Reply-To: <8A7BB64F-5069-11D9-BF6F-000D93AE89A4@mekentosj.com>
References: <5408341C-4E1D-11D9-823A-003065A5FDCC@earthlink.net> <0226311C-4F01-11D9-A5E0-003065A5FDCC@earthlink.net> <3BFF72E7-4F2E-11D9-8D5C-000D93AE89A4@mekentosj.com> <8A7BB64F-5069-11D9-BF6F-000D93AE89A4@mekentosj.com>
Message-ID:

On Dec 17, 2004, at 3:23 PM, Alexander Griekspoor wrote:

> No objection your honor ;-) Looking forward to the changes...
>

At your service. The first three steps are now in cvs, so there will be some compiler warnings and broken code. I will do the rest later today or tomorrow.

- Koen.

> On 17 Dec 2004, at 20:34, Koen van der Drift wrote:
>
>>
>> On Dec 16, 2004, at 1:46 AM, Alexander Griekspoor wrote:
>>
>>>> Actually, it should have sequenceType and symbolSet. Just not the
>>>> annotations and features. So basically it's the same as the current
>>>> BCSequence. The current BCSequence will then be promoted to a
>>>> BCAnnotatedSequence, but without the name change.
>>> Exactly, that would be the idea; it's more a renaming of the current
>>> BCSequence while we shifted the BCAnnotatedSequence in between the
>>> original BCSequence and the BCAnnotatedSequence.
Again, the real >>> question is whether we want the BCAnnotatedSequence be higher or >>> lower in the hierarchy compared to BCSequence.... >>> >>>> I also propose to rename sequenceArray to symbolArray. >>> Yep >> >> >> If nobody objects I will go ahead and make the following changes this >> weekend: >> >> >> * add a new class BCSymbolList, which is the same as the current >> BCSequence class >> >> * remove all code from the current BCSequence class and make it a >> subclass of BCSymbolList >> >> * replace sequenceArray with symbolArray >> >> * replace the occurences of BCSequence throughout the code with >> BCSymbolList, if only a bare sequence is needed (eg for finding or MW >> calculating) >> >> >> After this is done we can start adding the code for features and >> annotations to BCSequence. >> >> >> It will involve many edits, so I want to be sure that everybody >> agrees with the proposed changes. >> >> >> cheers, >> >> - Koen. >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > The requirements said: Windows 2000 or better. > So I got a Macintosh. 
>
> *********************************************************
>

From kvddrift at earthlink.net Fri Dec 17 20:36:02 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 17 Dec 2004 20:36:02 -0500
Subject: [Biococoa-dev] BCSequenceTool
Message-ID: <2D923B88-5095-11D9-A684-003065A5FDCC@earthlink.net>

Hi,

While converting a bunch of files in the BCTools hierarchy, I noticed that they all have the same code to set the symbolList/sequence they act on. So I made a new class, BCSequenceTool, and put all the shared code in it. It's only an abstract class, so it doesn't do much by itself, but all tools that act on a symbol list should subclass it. For the time being, I put it in BCTools, but it can be moved. Now that most tools actually act on a symbol list, we might need to rethink the naming and organization of the files under BCTools.

BTW, I also figured out how to change the project.pbxproj files from within Xcode. Go to the left column and open the SCM triangle. There you see all the files that are marked with a cvs tag, including the ones inside BioCocoa.pbproj.

cheers,

- Koen.
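The shared-base-class arrangement described in this message could look roughly like the sketch below, assuming ARC and Foundation. Everything in it is illustrative: the BCSymbolList here is a placeholder that only wraps a string, `initWithSymbolList:` is an assumed initializer name, and BCToolSymbolCounter is a made-up concrete tool, not one from the BCTools folder.

```objc
#import <Foundation/Foundation.h>

// Placeholder for the real BCSymbolList; here it only wraps a string.
@interface BCSymbolList : NSObject
@property (nonatomic, copy) NSString *symbolString;
@end

@implementation BCSymbolList
@end

// Shared base class: owns the symbol list that every tool acts on.
@interface BCSequenceTool : NSObject
@property (nonatomic, strong) BCSymbolList *symbolList;
- (instancetype)initWithSymbolList:(BCSymbolList *)list;
@end

@implementation BCSequenceTool
- (instancetype)initWithSymbolList:(BCSymbolList *)list {
    self = [super init];
    if (self) {
        _symbolList = list;
    }
    return self;
}
@end

// Hypothetical concrete tool: counts how often one symbol occurs.
@interface BCToolSymbolCounter : BCSequenceTool
- (NSUInteger)countOfSymbol:(unichar)symbol;
@end

@implementation BCToolSymbolCounter
- (NSUInteger)countOfSymbol:(unichar)symbol {
    NSString *s = self.symbolList.symbolString;
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < [s length]; i++) {
        if ([s characterAtIndex:i] == symbol) {
            count++;
        }
    }
    return count;
}
@end
```

Each subclass inherits the setter and initializer, so only the tool-specific method has to be written per tool.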
From kvddrift at earthlink.net Sun Dec 19 10:59:02 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 19 Dec 2004 10:59:02 -0500 Subject: [Biococoa-dev] BCSequenceTool In-Reply-To: <2D923B88-5095-11D9-A684-003065A5FDCC@earthlink.net> References: <2D923B88-5095-11D9-A684-003065A5FDCC@earthlink.net> Message-ID: On Dec 17, 2004, at 8:36 PM, Koen van der Drift wrote: > Hi, > > While converting a bunch of files in the BCTools hierarchy, I noticed > that they all have the same code to set the symbolList/sequence they > act on. So I made a new class, BCSequenceTool and put all the shared > code in it. It's only an abstract class, so it doesn't do much by > itself, but all tools that act on a symbollist should subclass it. And we should probably also add BCToolTranslatorDNA as a subclass, too. - Koen. From kvddrift at earthlink.net Thu Dec 23 16:45:13 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 23 Dec 2004 16:45:13 -0500 Subject: [Biococoa-dev] TODO Message-ID: Hi, I added a TODO file in the Resources folder. Please have a look and feel free to comment (on the list) Hope you all have a good holiday. cheers, - Koen. From charles.parnot at stanford.edu Thu Dec 23 18:48:03 2004 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 23 Dec 2004 15:48:03 -0800 Subject: [Biococoa-dev] Sequence factory Message-ID: Hi, it's me again :-) OK, I have been looking at the code for a while, and I am now a fervent reader of the mailing list. It looks like the project is very active and quite exciting. I like most of the design choices that were made (regarding the 'debate', I have to say I like having BCSequence subclasses), and it looks like the Tools section is going in the right direction. 
Now, there is still one thing that I dislike (arghh!), and this is really something that struck me at the beginning, when I was looking at the header files, and looking at BioCocoa from the perspective of a USER (so as a developer using BioCocoa, not a developer developing BioCocoa; I am sure you all see the big difference ;-).

In my short life as a programmer, I have mostly used Cocoa, and I am very used to the design patterns of Apple's frameworks and, of course, love them. If I was ever to develop an application based on BioCocoa, I would want to be able to type (or the same with BCSymbolList):

    mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"];

but I would hate to have to go through the doc and find out I have to do instead:

    factory=[BCSequenceFactory sharedSequenceFactory];
    mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"];

Why do I need 2 classes? And I have to do this every time I use a sequence, which is going to happen very often? Why could it not work like NSString, NSNumber, or any other class, and have factory methods built into the class itself??

The reason that Koen advanced is that this allows having all the code in one central location (to make the life of the BioCocoa developer easier... but not necessarily the life of the developer using BioCocoa!). I think it does not need a separate class to get that separation. The superclass can easily take care of all the factory methods, because it can return instances of the subclasses as necessary, so the code can all be in the superclass. In addition, the implementation can still be put in a separate file with the use of a category (the interfaces of the main implementation and of the category should however probably be combined in one header file so that it is easier to import all the methods at once). In fact, the code does not even need to be changed, except to replace the name BCSequenceFactory with the category BCSequence (BCSequenceFactory) and tweak the headers and imports.
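As a rough illustration of the category idea, here is a sketch that assumes ARC and Foundation. The class bodies are bare placeholders, not BioCocoa's real ones. The type guess reuses the rule suggested earlier in the thread (fewest undefined symbols, ties resolved DNA > RNA > protein), and the three alphabets are simplified assumptions that ignore ambiguity codes and gaps.

```objc
#import <Foundation/Foundation.h>

// Placeholder class tree; the real classes carry far more than a string.
@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *string;
- (instancetype)initWithString:(NSString *)string;
@end

@interface BCSequenceDNA : BCSequence
@end
@interface BCSequenceRNA : BCSequence
@end
@interface BCSequenceProtein : BCSequence
@end

// Factory methods live in a category, so they can sit in their own file.
@interface BCSequence (BCSequenceFactory)
+ (BCSequence *)sequenceWithString:(NSString *)string;
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)string {
    self = [super init];
    if (self) {
        _string = [string copy];
    }
    return self;
}
@end

@implementation BCSequenceDNA
@end
@implementation BCSequenceRNA
@end
@implementation BCSequenceProtein
@end

// Counts the characters of s that are not part of the given alphabet.
static NSUInteger BCUndefinedCount(NSString *s, NSString *alphabet) {
    NSCharacterSet *allowed =
        [NSCharacterSet characterSetWithCharactersInString:alphabet];
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < [s length]; i++) {
        if (![allowed characterIsMember:[s characterAtIndex:i]]) {
            count++;
        }
    }
    return count;
}

@implementation BCSequence (BCSequenceFactory)
+ (BCSequence *)sequenceWithString:(NSString *)string {
    NSString *upper = [string uppercaseString];
    // Order encodes the tie-break: DNA first, then RNA, then protein.
    NSArray *classes = @[[BCSequenceDNA class], [BCSequenceRNA class],
                         [BCSequenceProtein class]];
    NSArray *alphabets = @[@"ACGT", @"ACGU", @"ACDEFGHIKLMNPQRSTVWY"];
    Class best = Nil;
    NSUInteger bestCount = NSUIntegerMax;
    for (NSUInteger i = 0; i < [classes count]; i++) {
        NSUInteger n = BCUndefinedCount(upper, alphabets[i]);
        if (n < bestCount) {  // strict '<' keeps the earlier class on a tie
            bestCount = n;
            best = classes[i];
        }
    }
    return [[best alloc] initWithString:string];
}
@end
```

The caller writes `[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"]` and gets back a subclass instance, while anyone who wants static typing can still allocate BCSequenceDNA directly.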
In a way, BCSymbolList would look a little bit like a class cluster, except there is no need for a placeholder class (actually, this could also be implemented to automagically take care of -(id)initWithString when called on the superclass), and except some of the subclasses would be public (if somebody using BioCocoa wants to use more static typing and catch more problems at compile time). I can elaborate more and give a possible implementation (which would mostly be some copy and paste), if you don't already all hate me! In fact, some of your factory methods are inside the subclasses and could be all put in the proposed superclass category, like 'dnaSequenceWithString:', etc... In a way, the factory code is thus already quite spread out (this would have to be in the TODO list anyway and could anyway be taken care of with the current design). Remember this is just a comment of a BioCocoa newbie that may miss some of the subtleties... and well, I could still live with the current design! Charles -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From mek at mekentosj.com Thu Dec 23 19:49:15 2004 From: mek at mekentosj.com (Alexander Griekspoor) Date: Fri, 24 Dec 2004 01:49:15 +0100 Subject: [Biococoa-dev] Sequence factory In-Reply-To: References: Message-ID: Hi Charles, To begin with the last part: > I can elaborate more and give a possible implementation (which would > mostly be some copy and paste), if you don't already all hate me! Absolutely not, I think we all agree that the more people share their opinion the better BioCocoa will be in the end. It's actually very nice to have someone come in with the "User" perspective at this stage, especially now we're having a bit of trouble deciding the best strategy to go for sequence handling. 
> In fact, some of your factory methods are inside the subclasses and > could be all put in the proposed superclass category, like > 'dnaSequenceWithString:', etc... In a way, the factory code is thus > already quite spread out (this would have to be in the TODO list > anyway and could anyway be taken care of with the current design). You're right, and yes go ahead, all ideas are welcome! > About the following: > In my short life as a programmer, I have mostly used Cocoa, and I am > very used to the design patterns of the Apple's framework and of > course, love them. If I was ever to develop an application based on > BioCocoa, I would want to be able to type (or the same with > BCSymbolList): > mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"]; > > but I would hate to have to go through the doc and find out I have to > do instead: > factory=[BCSequenceFactory sharedSequenceFactory]; > mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"]; > > Why do I need 2 classes? And I have to do this every time I use a > sequence, which is going to happen very often? Why could not it work > like NSString, NSNumber, or any other class, and have factory methods > built into the class itself?? That was exactly my problem with this approach as well. Yet, it has some clear advantages, code centralization being the most important, but also think about caching (my favorite one is restriction enzyme analysis, a (shared/factory) object could initialize 600 enzymes, which can simply be kept around as long as you need. In contrast bringing this into a sequence object for instance would mean that you have to reinitialize the enzyme plist again and again. Now, coming back to your remark about the code. My proposal was (and still is) to add these kind of (standard apple-like) methods to the BCSequence class as convenience methods. 
You would simply call: mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"]; and behind the scene this (in this case class) method would invoke: factory=[BCSequenceFactory sharedSequenceFactory]; mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"]; The best of both worlds. The User won't notice the difference, except that he now has the option to choose for simplicity or to optimize things if needed (for instance retaining the tool object). > In a way, BCSymbolList would look a little bit like a class cluster, > except there is no need for a placeholder class (actually, this could > also be implemented to automagically take care of -(id)initWithString > when called on the superclass), and except some of the subclasses > would be public (if somebody using BioCocoa wants to use more static > typing and catch more problems at compile time). Could you comment a bit more on this Charles? It's not entirely clear to me what you mean. > Remember this is just a comment of a BioCocoa newbie that may miss > some of the subtleties... and well, I could still live with the > current design! We didn't start so long ago either ;-) And now that I'm at it anyway, have a great Christmas holiday everyone! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? 
http://www.mekentosj.com/irnai ********************************************************* From kvddrift at earthlink.net Thu Dec 23 20:10:42 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 23 Dec 2004 20:10:42 -0500 Subject: [Biococoa-dev] Sequence factory In-Reply-To: References: Message-ID: On Dec 23, 2004, at 7:49 PM, Alexander Griekspoor wrote: >> Why do I need 2 classes? And I have to do this every time I use a >> sequence, which is going to happen very often? Why could not it work >> like NSString, NSNumber, or any other class, and have factory methods >> built into the class itself?? > > That was exactly my problem with this approach as well. Yet, it has > some clear advantages, code centralization being the most important, > but also think about caching (my favorite one is restriction enzyme > analysis, a (shared/factory) object could initialize 600 enzymes, > which can simply be kept around as long as you need. In contrast > bringing this into a sequence object for instance would mean that you > have to reinitialize the enzyme plist again and again. > Now, coming back to your remark about the code. My proposal was (and > still is) to add these kind of (standard apple-like) methods to the > BCSequence class as convenience methods. You would simply call: > mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"]; > and behind the scene this (in this case class) method would invoke: > factory=[BCSequenceFactory sharedSequenceFactory]; > mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"]; > The best of both worlds. The User won't notice the difference, except > that he now has the option to choose for simplicity or to optimize > things if needed (for instance retaining the tool object). The original reason to put in the factory class was to have a central object that figures out what type of sequence we're dealing with when reading files. I agree here with Alex, his proposal is definitely a way to move forward. 
I guess the factory code is still pretty new in BioCocoa, so not all parts of it have yet been coded (wink, wink ;-).

- Koen.

From kvddrift at earthlink.net Fri Dec 24 11:10:38 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 24 Dec 2004 11:10:38 -0500
Subject: [Biococoa-dev] Sequence factory
In-Reply-To:
References:
Message-ID: <59DDFC6E-55C6-11D9-9FB9-003065A5FDCC@earthlink.net>

On Dec 23, 2004, at 7:49 PM, Alexander Griekspoor wrote:

> Now, coming back to your remark about the code. My proposal was (and
> still is) to add these kind of (standard apple-like) methods to the
> BCSequence class as convenience methods. You would simply call:
> mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"];
> and behind the scene this (in this case class) method would invoke:
> factory=[BCSequenceFactory sharedSequenceFactory];
> mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"];
>

I thought a little bit more about how to implement this. Which class should actually contain this code? Right now the code

+ (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip {
    BCSequenceDNA *theReturn = [[BCSequenceDNA alloc] initWithString: entry skippingNonBases: skip];
    return [theReturn autorelease];
}

is already in place in BCSequenceDNA (and similar ones in other subclasses of BCSequence). Alex, are you proposing to replace this with something like:

+ (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip {
    BCSequenceFactory *factory = [BCSequenceFactory sharedSequenceFactory];
    // pass the caller's string through instead of a hard-coded example
    BCSequenceDNA *theReturn = [factory sequenceWithString: entry];
    // no extra autorelease here: by Cocoa convention the factory's
    // convenience method already returns an autoreleased object
    return theReturn;
}

Sounds fine with me, just checking if that is what you had in mind :) Although I am not sure how to deal with the skippingNonBases part. Probably that should be replaced by passing the right symbol set instead?

- Koen.
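One way to picture "passing the right symbol set" in place of a BOOL flag is sketched below. It is only an illustration: an NSCharacterSet stands in for BCSymbolSet, whose real interface is not shown in this thread, and `BCStringByKeepingSymbols` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Returns a copy of entry containing only the characters in symbolSet.
// NSCharacterSet is a stand-in for BCSymbolSet (an assumption for this sketch).
static NSString *BCStringByKeepingSymbols(NSString *entry,
                                          NSCharacterSet *symbolSet) {
    NSMutableString *kept =
        [NSMutableString stringWithCapacity:[entry length]];
    for (NSUInteger i = 0; i < [entry length]; i++) {
        unichar c = [entry characterAtIndex:i];
        if ([symbolSet characterIsMember:c]) {
            [kept appendFormat:@"%C", c];
        }
    }
    return kept;
}
```

With a set built from @"ACGT", the input @"AG-TN?A" comes back as @"AGTA"; with a wider set built from @"ACGTN-?", the same input comes back unchanged. The choice of set then carries the information that the skippingNonBases flag used to carry.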
From jtimmer at bellatlantic.net Fri Dec 24 12:28:35 2004 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 24 Dec 2004 12:28:35 -0500 Subject: [Biococoa-dev] Sequence factory In-Reply-To: <59DDFC6E-55C6-11D9-9FB9-003065A5FDCC@earthlink.net> Message-ID: > Sound fine with me, just checking if that is what you had in mind :) > Although I am not sure how to deal with the skippingNonBases part. > Probably that should be replaced by passing the right symbol set > instead? Chiming in from just north of London - happy holidays all! Skipping non-bases just means whether or not to allow undefined symbols in with your valid bases or not. I don't know how symbol sets would create the equivalent of that, and I can't remember whether that code allows gaps or not. I know it allows ambiguous bases - there's no "strict" version. Basically just wanted to be clear on what I was doing when I wrote the method, just in case it gets replaced. One little question on factories for my own edification - in other code I wrote, I decided when to make a factory based on the number of objects I had to create in order to get something to work. If I needed a lot of objects, it was best to create them only once, so I made a factory. It seems that the main decision on when to use one here is based on modularity of code, rather than practical time/memory issues. Is this accurate? Now back to my regularly scheduled vacation.... JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Dec 24 13:31:19 2004 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 24 Dec 2004 13:31:19 -0500 Subject: [Biococoa-dev] Sequence factory In-Reply-To: References: Message-ID: <0139BD38-55DA-11D9-9FB9-003065A5FDCC@earthlink.net> On Dec 24, 2004, at 12:28 PM, John Timmer wrote: > Skipping non-bases just means whether or not to allow undefined > symbols in > with your valid bases or not. 
I don't know how symbol sets would > create the > equivalent of that, and I can't remember whether that code allows gaps > or > not. I know it allows ambiguous bases - there's no "strict" version. What I meant was, how do we pass the skippingNonBases value to the BCSequenceFactory and then back to initWithFoo. If we use symbolsets, then we could eg have a 'strict' symbol set, which only allows ambiguous and unambiguous symbols, and a non-strict symbol set which also allows gaps and undefined symbols. Right now BCSymbolSet only supplies strict symbolsets, but it shouldn't be a problem to add one that also contains gap and undefined. > > Basically just wanted to be clear on what I was doing when I wrote the > method, just in case it gets replaced. > > One little question on factories for my own edification - in other > code I > wrote, I decided when to make a factory based on the number of objects > I had > to create in order to get something to work. If I needed a lot of > objects, > it was best to create them only once, so I made a factory. John, I am confused which 'factory' you are referring to. Do you mean BCSequenceFactory? > It seems that > the main decision on when to use one here is based on modularity of > code, > rather than practical time/memory issues. Is this accurate? The idea is to *always* use BCSequenceFactory, although it is hidden from the users, by the convenience methods that Alex proposed. BTW, what is you guys opinion on adding a BCSymbolListFactory class as well? - Koen. From charles.parnot at stanford.edu Sat Dec 25 18:56:16 2004 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 25 Dec 2004 15:56:16 -0800 Subject: [Biococoa-dev] Sequence factory Message-ID: I hope I will address of all the points you raised. The bottom line is I am still not convinced that a separate BCSequenceFactory is needed, as you will see! 
But I am anyway glad that you want to have factory methods for the BCSequence object and make the life of the BioCocoa user easier.

1. More about BCSequenceFactory

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>That was exactly my problem with this approach as well. Yet, it has some clear advantages, code centralization being the most important, but also think about caching (my favorite one is restriction enzyme analysis, a (shared/factory) object could initialize 600 enzymes, which can simply be kept around as long as you need. In contrast bringing this into a sequence object for instance would mean that you have to reinitialize the enzyme plist again and again.

Whichever way you do it, you have to create a sequence object. Now if you need some instance-independent stuff, like a list of enzymes, it can be handled in many different ways and does not have to be loaded every time you create an object; it can be cached in another class or within the class.

Caching in another class could use the shared singleton pattern, so you could simply have a special class to hold the information that needs to be kept around, and that can still be created lazily (for instance, a separate class dealing with enzymes would certainly be useful).

Or the cached stuff could stay inside the class implementation, using the equivalent of 'class instance variables' that can be created with static variables private to the implementation file. Actually, a relevant case is that of enzymes. If the user tries to create an enzyme that has already been created, the factory method (a class method such as '+(BCEnzyme *)EcoRI') or even the 'init' method (an instance method like '-(id)initWithName:(NSString *)name') would return the cached BCEnzyme instance that has already been created. The BCEnzyme.m implementation file would then have a static NSDictionary holding the instances already created.

2.
Again about BCSequenceFactory

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>You would simply call:
> mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"];
>and behind the scene this (in this case class) method would invoke:
> factory=[BCSequenceFactory sharedSequenceFactory];
> mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"];
>The best of both worlds. The User won't notice the difference, except that he now has the option to choose for simplicity or to optimize things if needed (for instance retaining the tool object).

Of course, this is a way to go and keep the existing pattern. It is not exactly the best of the two worlds, though, because now you have some code dependency. Each of the BCSequence factory methods has to have a counterpart in BCSequenceFactory. If you change the name of one BCSequenceFactory method, you have to change the code in BCSequence. Ah, ah! ;-) OK, not such a big deal, but the more code, the more bugs...

3. About BCSymbolListFactory

At 1:31 PM -0500 12/24/04, Koen van der Drift wrote:
>BTW, what is you guys opinion on adding a BCSymbolListFactory class as well?

Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or do you mean having two separate classes? Having two separate classes seems a bit too much, no? Having all the members of a class tree created in the same entity seems more appropriate to me.

4. An additional note about the factory methods of BCSequence, BCSequenceDNA, ...

This idea is relevant whatever the chosen pattern is, BCSequenceFactory or not.

At 11:10 AM -0500 12/24/04, Koen van der Drift wrote:
>I thought a little bit more about how to implement this. Which class should actually contain this code? Right now the code
>Right now the code
>
>  + (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip {
>  	BCSequenceDNA *theReturn = [[BCSequenceDNA alloc] initWithString: entry skippingNonBases: skip];
>  	return [theReturn autorelease];
>  }
>
>is already in place in BCSequenceDNA (and similar ones in other subclasses of BCSequence).

To avoid having factory methods spread out in the superclass and the subclasses, you could keep them all in the superclass (they could still return instances of the subclasses). Actually, maybe this would be a bit extreme, and that could confuse the user of the framework, but it is something to think about. That would actually be one step closer to a 'class cluster' pattern... (read below)


5. Now about class cluster and dynamic vs static typing.

This discussion is relevant whatever the chosen pattern is, BCSequenceFactory or not.

At 20:10 -0500 12/23/04, Koen van der Drift wrote:
>The original reason to put in the factory class was to have a central object that figures out what type of sequence we're dealing with when reading files.

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>At 15:48 -0800 12/23/04, Charles PARNOT wrote:
>>In a way, BCSymbolList would look a little bit like a class cluster, except there is no need for a placeholder class (actually, this could also be implemented to automagically take care of -(id)initWithString when called on the superclass), and except some of the subclasses would be public (if somebody using BioCocoa wants to use more static typing and catch more problems at compile time).
>Could you comment a bit more on this Charles? It's not entirely clear to me what you mean.

I started talking about class cluster, because you have something looking like it going on. And I have now thought a little more about it. So here are my (very deep!) thoughts...

First, here is how I would define a class cluster.
A class cluster looks like a single class and has a unique public interface, so you think there is only one class of object, but in fact, under the hood, there is one public abstract superclass and there are several private subclasses that handle the different cases, which helps with optimization. So you create an object, you think it is an instance of the superclass, but in fact, if you look at it, it is an instance of one of the private subclasses. Of course, it is all transparent, and it works seamlessly as if it was just one single class.

For example, NSNumber has several private subclasses, each holding a different 'value' instance variable of a different type (int, or double, etc.). This way, when you ask -stringValue or -intValue, the subclass can do educated casts. If the superclass had to do it, it would have to loop through all the different cases, which would not be very optimal, and would make the addition of a new type of number difficult.

Sorry if you already know all of that and the concept of class cluster, I just want to make the rest clear.

Now, why does BCSequence look like a class cluster? This is mainly because of the method
	+(BCSequence *)sequenceWithString:(NSString *)sequence
(it does not matter whether there is a BCSequenceFactory class hidden there). This method guesses the type of sequence. It returns an object statically typed to the superclass BCSequence. But IN FACT, it is really a BCSequenceDNA, or a BCSequenceProtein, etc... So the user could think it is a BCSequence, but it is actually an instance of a subclass.

At this point, the BCSequence family is not exactly a class cluster like Apple's NSString, NSNumber,... First, the superclass BCSequence is not abstract. It can be instantiated (well, I am not sure about that, but at least, BCSymbolList can, right?). Second, the subclasses are not private.
For example, you can explicitly create an instance of BCSequenceDNA with the method
	+(BCSequenceDNA *)dnaSequenceWithString:

However, there is an important issue with dynamic and static typing, that also has to do with compile-time vs runtime errors. I have seen in the mailing list that you discussed it earlier. As soon as you allow the creation of a BCSequenceDNA object that is statically typed as a BCSequence, the compiler has no way to tell that you are now manipulating a BCSequenceDNA object. If you call '-complement' on it, and you have not defined it in BCSequence, you get a warning, even though at runtime, it will be fine. That object will really be a BCSequenceDNA and will indeed respond to '-complement'. To avoid the compiler warning, you have to declare it at the superclass level. Which also means that it could be called on BCSequenceProtein without a compiler warning. Then you have to handle at runtime a call to '-complement' on a BCSequenceProtein.

I know you have discussed that issue already to some extent, but I am not sure if you have decided on something. One possibility is to have the BCSequence superclass accept all the methods of the subclasses. Another is to let the user deal with it and have him do some type checking and some casting. Basically, if she wants to use -complement, she has to use +dnaSequenceWithString:, or cast the result of +sequenceWithString: to a BCSequenceDNA. This is what I referred to in my previous email: the user will use strong typing when she needs it, and will thus be able to rely on compiler warnings. It looks like this is the pattern you have chosen: if the user ever wants to use '-complement', she cannot use it on an object created with
	+(BCSequence *)sequenceWithString:(NSString *)sequence
(or else has to ignore the compiler warning or cast the object to BCSequenceDNA).

Why am I discussing the concept of class cluster? Well, I am not sure!
I just want to bring up the idea, because that could have been one way to go, and you are half-way to it...

OK, I will stop here. I hope this is not too confusing. And sorry if some of these questions have been already answered before, or if I am missing other aspects...

Merry Christmas!

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net  Sun Dec 26 10:33:25 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 26 Dec 2004 10:33:25 -0500
Subject: [Biococoa-dev] Sequence factory
In-Reply-To:
References:
Message-ID: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>

On Dec 25, 2004, at 6:56 PM, Charles PARNOT wrote:

> Or the cached stuff could stay inside the class implementation, using
> the equivalent of 'class instance variables' that can be created with
> static variables private to the implementation file. Actually, a
> relevant case is that of enzymes. If the user tries to create an
> enzyme that has already been created, the factory method (a class
> method such as '+(BCEnzyme *)EcoRI') or even the 'init' method (an
> instance method like '-(id)initWithName:(NSString *)name') would
> return the cached BCEnzyme instance that has already been created.
> The BCEnzyme.m implementation file would have a static NSDictionary
> with the current instances already created.

I guess this is what we already do for BCNucleotide and BCAminoAcid. Only in those cases all possible instances are created at once.
For enzymes though, I would keep all info in a plist, and not hard-code any name in BioCocoa. Just because there are many more enzymes than nucleotides and amino acids.

> Of course, this is a way to go and keep the existing pattern. It is
> not exactly the best of the two worlds, though, because now you have
> some code dependency. Each of the BCSequence factory methods has to
> have a counterpart in BCSequenceFactory. If you change the name of one
> BCSequenceFactory method, you have to change the code in BCSequence.
> Ah, ah! ;-)

Why? No need to change the name in BCSequence, just call the existing method in BCSequence.

> 3. About BCSymbolListFactory
>
> At 1:31 PM -0500 12/24/04, Koen van der Drift wrote:
>> BTW, what is you guys' opinion on adding a BCSymbolListFactory class
>> as well?
>
> Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or
> do you mean having two separate classes? Having two separate classes
> seems a bit too much, no?

I meant having two separate classes. We recently introduced the BCSymbolList as a class that only holds an NSArray of BCSymbols, no other info such as name and features. And it has an identifier for the sequence-type. The BCSequence class used to be like this, but it was changed to a subclass of BCSymbolList, allowing it to have features, etc. We could have kept the original BCSequence and created a new class BCAnnotatedSequence, but we found that would result in names that are too long, such as BCAnnotatedDNASequence. Therefore we introduced the intermediate class BCSymbolList. Actually, a symbol list class can be very handy when doing calculations and manipulations of the sequence itself, without all the other info.

> Having all the members of a class tree created in the same entity
> seems more appropriate to me.

What do you mean by this?
> To avoid having factory methods spread out in the superclass and the
> subclasses, you could keep them all in the superclass (they could
> still return instances of the subclasses). Actually, maybe this would
> be a bit extreme, and that could confuse the user of the framework,
> but it is something to think about. That would actually be one step
> closer to a 'class cluster' pattern... (read below)

The idea of using a class cluster pattern is intriguing, and definitely worth more thought. Thanks for bringing it up, Charles. To clarify it more, could you post some code snippets here using real BioCocoa examples?

cheers,

- Koen.

From charles.parnot at stanford.edu  Mon Dec 27 00:32:25 2004
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Sun, 26 Dec 2004 21:32:25 -0800
Subject: [Biococoa-dev] Sequence factory
In-Reply-To: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>
References: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>
Message-ID:

>On Dec 25, 2004, at 6:56 PM, Charles PARNOT wrote:
>
>>Or the cached stuff could stay inside the class implementation, using the equivalent of 'class instance variables' that can be created with static variables private to the implementation file. Actually, a relevant case is that of enzymes. If the user tries to create an enzyme that has already been created, the factory method (a class method such as '+(BCEnzyme *)EcoRI') or even the 'init' method (an instance method like '-(id)initWithName:(NSString *)name') would return the cached BCEnzyme instance that has already been created. The BCEnzyme.m implementation file would have a static NSDictionary with the current instances already created.
>
>I guess this is what we already do for BCNucleotide and BCAminoAcid. Only in those cases all possible instances are created at once. For enzymes though, I would keep all info in a plist, and not hard-code any name in BioCocoa. Just because there are many more enzymes than nucleotides and amino acids.

Yes, BCNucleotide is a much better example... I am not familiar enough with the framework, I should be ashamed of myself... and I am, a little bit! At least, it is true that you can cache stuff outside of instances, at the level of a class, which was my point... Pfiou!

>>Of course, this is a way to go and keep the existing pattern. It is not exactly the best of the two worlds, though, because now you have some code dependency. Each of the BCSequence factory methods has to have a counterpart in BCSequenceFactory. If you change the name of one BCSequenceFactory method, you have to change the code in BCSequence. Ah, ah! ;-)
>
>Why? No need to change the name in BCSequence, just call the existing method in BCSequence.

What I meant is that you have one method A in BCSequenceFactory that is being called inside a method B in BCSequence. So if you change the name of method A, you have to edit the code in method B. Or maybe more relevant: if you add a method in BCSequenceFactory, you need to add one in BCSequence too, e.g. if you add
	-(BCSequenceET *)extraterrestrialSequenceWithString
in BCSequenceFactory and you want the factory method in BCSequence, you have to also write it (and it will call the method in BCSequenceFactory). The bottom line: more code!

I just want to conclude about BCSequenceFactory. Sorry I have been pushing this so far. I am OK using it if you feel it is safer, I just want to make sure I am not missing something more subtle about it. Thanks Alex and Koen for taking the time to answer my questions :-)

>>Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or do you mean having two separate classes? Having two separate classes seems a bit too much, no?
>
>I meant having two separate classes. We recently introduced the BCSymbolList as a class that only holds an NSArray of BCSymbols, no other info such as name and features. And it has an identifier for the sequence-type.
>The BCSequence class used to be like this, but it was changed to a subclass of BCSymbolList, allowing it to have features, etc. We could have kept the original BCSequence and created a new class BCAnnotatedSequence, but we found that would result in names that are too long, such as BCAnnotatedDNASequence. Therefore we introduced the intermediate class BCSymbolList. Actually, a symbol list class can be very handy when doing calculations and manipulations of the sequence itself, without all the other info.
>
>>Having all the members of a class tree created in the same entity seems more appropriate to me.
>
>What do you mean by this?

I know and understand about BCSymbolList/BCSequence. What I call the class tree is the whole family BCSymbolList-->BCSequence-->BCSequenceDNA/Protein,... And I was just saying that maybe one factory for all of them is enough.

BTW, one thing is not clear to me: is BCSymbolList an abstract class (like BCSequence apparently was) and just used to separate code, or is it going to be instantiable? In my previous emails, I was a little confused about it (and probably confusing).

>
>>To avoid having factory methods spread out in the superclass and the subclasses, you could keep them all in the superclass (they could still return instances of the subclasses). Actually, maybe this would be a bit extreme, and that could confuse the user of the framework, but it is something to think about. That would actually be one step closer to a 'class cluster' pattern... (read below)
>
>The idea of using a class cluster pattern is intriguing, and definitely worth more thought. Thanks for bringing it up, Charles. To clarify it more, could you post some code snippets here using real BioCocoa examples?
>

Wow, now I need to write some real code, and not just be super-theoretical without having to think about the real world? I am in trouble ;-)
I will try to do something a little later. Kids waiting...
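
Pending those snippets, the static-typing point from earlier in the thread can be illustrated with a very rough sketch. Everything in it is an assumption made for illustration: SketchSequence and SketchSequenceDNA are hypothetical stand-ins for BCSequence and BCSequenceDNA, the "only A, C, G, T means DNA" guess is deliberately crude, and none of this is actual BioCocoa code. It uses manual retain/release, so current compilers need -fno-objc-arc.

```objective-c
// Hypothetical sketch, not BioCocoa code. Build with -fno-objc-arc.
#import <Foundation/Foundation.h>

@interface SketchSequence : NSObject {
    NSString *string;
}
+ (SketchSequence *)sequenceWithString:(NSString *)aString;
- (id)initWithString:(NSString *)aString;
- (NSString *)string;
@end

// Public subclass: -complement is declared here only, not in the superclass.
@interface SketchSequenceDNA : SketchSequence
- (SketchSequenceDNA *)complement;
@end

@implementation SketchSequence
+ (SketchSequence *)sequenceWithString:(NSString *)aString {
    // Crude type guess (assumption): a non-empty string made only of ACGT is DNA.
    NSCharacterSet *notDNA =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGT"] invertedSet];
    BOOL looksLikeDNA = ([aString length] > 0 &&
        [aString rangeOfCharacterFromSet:notDNA].location == NSNotFound);
    Class chosen = looksLikeDNA ? [SketchSequenceDNA class] : [SketchSequence class];
    // Statically typed as SketchSequence, but possibly a SketchSequenceDNA at runtime.
    return [[[chosen alloc] initWithString:aString] autorelease];
}
- (id)initWithString:(NSString *)aString {
    self = [super init];
    if (self != nil) {
        string = [aString copy];
    }
    return self;
}
- (NSString *)string { return string; }
- (void)dealloc {
    [string release];
    [super dealloc];
}
@end

@implementation SketchSequenceDNA
- (SketchSequenceDNA *)complement {
    NSMutableString *result = [NSMutableString stringWithCapacity:[string length]];
    NSUInteger i;
    for (i = 0; i < [string length]; i++) {
        switch ([string characterAtIndex:i]) {
            case 'A': [result appendString:@"T"]; break;
            case 'T': [result appendString:@"A"]; break;
            case 'C': [result appendString:@"G"]; break;
            case 'G': [result appendString:@"C"]; break;
            default:  [result appendString:@"N"]; break;
        }
    }
    return [[[SketchSequenceDNA alloc] initWithString:result] autorelease];
}
@end

// What the framework user would write: check the runtime class, then cast.
static NSString *SketchComplementIfDNA(SketchSequence *seq) {
    // Writing [seq complement] here draws a compiler diagnostic, because the
    // static type SketchSequence does not declare -complement.
    if ([seq isKindOfClass:[SketchSequenceDNA class]]) {
        SketchSequenceDNA *dna = (SketchSequenceDNA *)seq;
        return [[dna complement] string];
    }
    return nil;
}
```

The alternative discussed in the thread, declaring -complement on the superclass, would remove the cast but move the failure for a protein sequence from compile time to runtime.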

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From mek at mekentosj.com  Mon Dec 27 02:53:03 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 27 Dec 2004 08:53:03 +0100
Subject: [Biococoa-dev] Sequence factory
In-Reply-To: <59DDFC6E-55C6-11D9-9FB9-003065A5FDCC@earthlink.net>
References: <59DDFC6E-55C6-11D9-9FB9-003065A5FDCC@earthlink.net>
Message-ID: <55EDA902-57DC-11D9-831F-000D93AE89A4@mekentosj.com>

Just a quick note on this, and then my reply to the latest email, to not skip back in the discussion...

On 24 Dec 2004, at 17:10, Koen van der Drift wrote:

> I thought a little bit more about how to implement this. Which class
> should actually contain this code? Right now the code
>
> + (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip {
> 	BCSequenceDNA *theReturn = [[BCSequenceDNA alloc] initWithString: entry skippingNonBases: skip];
> 	return [theReturn autorelease];
> }
>
> is already in place in BCSequenceDNA (and similar ones in other
> subclasses of BCSequence). Alex, are you proposing to replace this
> with something like:
>
> + (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip
> {
> 	BCSequenceFactory *factory = [BCSequenceFactory sharedSequenceFactory];
>
> 	BCSequenceDNA *theReturn = [factory sequenceWithString:@"AGTAGATTTGAGGT"];
>
> 	return [theReturn autorelease];
> }
>
> Sounds fine with me, just checking if that is what you had in mind :)

Yep, exactly, though I'm just wondering if the [theReturn autorelease] is appropriate here, because the object is already handed back "autoreleased" by the factory object, isn't it? Or can you just call autorelease as many times as you want on an object? Never knew that...
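
On the autorelease question: under Cocoa's manual retain/release rules, each autorelease is a deferred release, so it has to be balanced by an ownership claim (alloc, copy or retain). If the factory already returns an autoreleased object and the wrapper never retains it, a second autorelease schedules one release too many. A minimal sketch of the wrapper without it follows; SketchDNA and SketchFactory are hypothetical stand-ins for BCSequenceDNA and BCSequenceFactory, the skippingNonBases: argument is left out for brevity, and the wrapper passes `entry` through rather than a fixed string. Build with -fno-objc-arc on current compilers.

```objective-c
// Hypothetical sketch, not BioCocoa code. Build with -fno-objc-arc.
#import <Foundation/Foundation.h>

@interface SketchDNA : NSObject {
    NSString *string;
}
+ (SketchDNA *)dnaSequenceWithString:(NSString *)entry;
- (id)initWithString:(NSString *)entry;
- (NSString *)string;
@end

@interface SketchFactory : NSObject
+ (SketchFactory *)sharedFactory;
- (SketchDNA *)dnaSequenceWithString:(NSString *)entry;
@end

@implementation SketchFactory
+ (SketchFactory *)sharedFactory {
    // Lazily created shared instance; this simple form is not thread-safe.
    static SketchFactory *shared = nil;
    if (shared == nil) {
        shared = [[SketchFactory alloc] init];
    }
    return shared;
}
- (SketchDNA *)dnaSequenceWithString:(NSString *)entry {
    // alloc is the only ownership claim; this single autorelease balances it.
    return [[[SketchDNA alloc] initWithString:entry] autorelease];
}
@end

@implementation SketchDNA
+ (SketchDNA *)dnaSequenceWithString:(NSString *)entry {
    // The factory result is already autoreleased and is not retained here,
    // so it is returned as is, with no additional autorelease.
    return [[SketchFactory sharedFactory] dnaSequenceWithString:entry];
}
- (id)initWithString:(NSString *)entry {
    self = [super init];
    if (self != nil) {
        string = [entry copy];
    }
    return self;
}
- (NSString *)string { return string; }
- (void)dealloc {
    [string release];
    [super dealloc];
}
@end
```

A caller that wants to keep the sequence beyond the current autorelease pool would retain it and release it later, exactly as with any other convenience constructor.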

Alex

*********************************************************
                ** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel:  + 31 20 - 512 2023
Fax:  + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards
Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************

From mek at mekentosj.com  Mon Dec 27 03:24:37 2004
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Mon, 27 Dec 2004 09:24:37 +0100
Subject: [Biococoa-dev] Sequence factory
In-Reply-To:
References: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>
Message-ID:

..continuing here:

On 27 Dec 2004, at 6:32, Charles PARNOT wrote:

>> On Dec 25, 2004, at 6:56 PM, Charles PARNOT wrote:
>>
>>> Or the cached stuff could stay inside the class implementation,
>>> using the equivalent of 'class instance variables' that can be
>>> created with static variables private to the implementation file.
>>> Actually, a relevant case is that of enzymes. If the user tries to
>>> create an enzyme that has already been created, the factory method
>>> (a class method such as '+(BCEnzyme *)EcoRI') or even the 'init'
>>> method (an instance method like '-(id)initWithName:(NSString
>>> *)name') would return the cached BCEnzyme instance that has already
>>> been created. The BCEnzyme.m implementation file would have a
>>> static NSDictionary with the current instances already created.
>>
>> I guess this is what we already do for BCNucleotide and BCAminoAcid.
>> Only in those cases all possible instances are created at once. For
>> enzymes though, I would keep all info in a plist, and not hard-code
>> any name in BioCocoa. Just because there are many more enzymes than
>> nucleotides and amino acids.
>
> Yes, BCNucleotide is a much better example... I am not familiar enough
> with the framework, I should be ashamed of myself... and I am, a little
> bit! At least, it is true that you can cache stuff outside of
> instances, at the level of a class, which was my point... Pfiou!

No worries! Indeed the enzyme stuff was just an example, but this goes back to a discussion we once had about doing digests etc. If you say "cut my plasmid with all enzymes available and give me the fragments", you certainly want to instantiate them all at once without having to call "give me EcoRI, give me HindIII", etc. But let's first focus on the sequences before destroying them with enzymes ;-)

>>> Of course, this is a way to go and keep the existing pattern. It is
>>> not exactly the best of the two worlds, though, because now you have
>>> some code dependency. Each of the BCSequence factory methods has to
>>> have a counterpart in BCSequenceFactory. If you change the name of
>>> one BCSequenceFactory method, you have to change the code in
>>> BCSequence. Ah, ah! ;-)
>>
>> Why? No need to change the name in BCSequence, just call the existing
>> method in BCSequence.
>
> What I meant is that you have one method A in BCSequenceFactory that is
> being called inside a method B in BCSequence. So if you change the
> name of method A, you have to edit the code in method B. Or maybe more
> relevant: if you add a method in BCSequenceFactory, you need to add
> one in BCSequence too, e.g. if you add -(BCSequenceET
> *)extraterrestrialSequenceWithString in BCSequenceFactory and you want
> the factory method in BCSequence, you have to also write it (and it
> will call the method in BCSequenceFactory). The bottom line: more
> code!

That's certainly true, I'll rephrase that then to: "It's the best way to go at the moment" ;-)

> I just want to conclude about BCSequenceFactory. Sorry I have been
> pushing this so far.
> I am OK using it if you feel it is safer, I just
> want to make sure I am not missing something more subtle about it.
> Thanks Alex and Koen for taking the time to answer my questions :-)

Absolutely no problem, in fact you have discovered by now that this still is a topic of discussion...

>>> Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or
>>> do you mean having two separate classes? Having two separate classes
>>> seems a bit too much, no?
>>
>> I meant having two separate classes. We recently introduced the
>> BCSymbolList as a class that only holds an NSArray of BCSymbols, no
>> other info such as name and features. And it has an identifier for
>> the sequence-type. The BCSequence class used to be like this, but it
>> was changed to a subclass of BCSymbolList, allowing it to have
>> features, etc. We could have kept the original BCSequence and created
>> a new class BCAnnotatedSequence, but we found that would result in
>> names that are too long, such as BCAnnotatedDNASequence. Therefore we
>> introduced the intermediate class BCSymbolList. Actually, a symbol list
>> class can be very handy when doing calculations and manipulations of
>> the sequence itself, without all the other info.
>>
>>> Having all the members of a class tree created in the same entity
>>> seems more appropriate to me.
>>
>> What do you mean by this?
>
> I know and understand about BCSymbolList/BCSequence. What I call the
> class tree is the whole family
> BCSymbolList-->BCSequence-->BCSequenceDNA/Protein,... And I was just
> saying that maybe one factory for all of them is enough.

I very much like this idea, in fact I was about to comment on Koen's original question that I felt that we might overdo the factory thing. I know it almost has become a necessity because of the rather complicated architecture we had to come up with regarding sequences, but we should not do it unless absolutely necessary.
The idea to have one cluster for the complete class tree might actually be a nice way to limit the rapid increase of factories. I think the BioJava framework is a nice (depends on how you see it) example where you need a factory for almost all things you do, something I don't really like and which is also quite non-Cocoa.

A nice story here perhaps: I was discussing how to implement alignments with Serge Cohen (yes, there's something in the works, but in quite a premature stage though), when at some point we came to the number of basic classes we needed. And he said two, one for alignments and one for contigs. So I asked, where's the object that manages all this, and which does the actual alignment, "the alignment factory"? He answered, you don't need that, simply have a class method that does the alignment and returns the alignment as a class instance. It struck me how simple it indeed should be and how much I had come to think in terms of factories and controllers.

Which leads me to ask Koen to convince me why we do need a BCSymbolList factory and can't do with simple class methods?

> BTW, one thing is not clear to me: is BCSymbolList an abstract class
> (like BCSequence apparently was) and just used to separate code, or
> is it going to be instantiable? In my previous emails, I was a little
> confused about it (and probably confusing).

The latter is the plan, I believe.

>>> To avoid having factory methods spread out in the superclass and the
>>> subclasses, you could keep them all in the superclass (they could
>>> still return instances of the subclasses). Actually, maybe this
>>> would be a bit extreme, and that could confuse the user of the
>>> framework, but it is something to think about. That would actually
>>> be one step closer to a 'class cluster' pattern... (read below)
>>
>> The idea of using a class cluster pattern is intriguing, and
>> definitely worth more thought. Thanks for bringing it up, Charles.
>> To clarify it more, could you post some code snippets here using real
>> BioCocoa examples?

Yep, copy that!

> Wow, now I need to write some real code, and not just be
> super-theoretical without having to think about the real world? I am
> in trouble ;-)

Sorry for introducing you to the project Charles, actually I'm not so sorry at all ;-)

Cheers,
Alex

*********************************************************
                ** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel:  + 31 20 - 512 2023
Fax:  + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************

From kvddrift at earthlink.net  Mon Dec 27 16:46:50 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 27 Dec 2004 16:46:50 -0500
Subject: [Biococoa-dev] Sequence factory
In-Reply-To:
References: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>
Message-ID:

On Dec 27, 2004, at 3:24 AM, Alexander Griekspoor wrote:

> Which leads me to ask Koen to convince me why we do need a
> BCSymbolList factory and can't do with simple class methods?
>

Class methods should be fine for BCSymbolList. The BCSequenceFactory class is useful when opening files, which will create BCSequences instead of BCSymbolLists.

- Koen.

From kvddrift at earthlink.net  Mon Dec 27 17:15:48 2004
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 27 Dec 2004 17:15:48 -0500
Subject: [Biococoa-dev] Sequence factory
In-Reply-To:
References: <7BABEDA6-5753-11D9-9FB9-003065A5FDCC@earthlink.net>
Message-ID:

On Dec 27, 2004, at 12:32 AM, Charles PARNOT wrote:

>
> BTW, one thing is not clear to me: is BCSymbolList an abstract class
> (like BCSequence apparently was) and just used to separate code, or
> is it going to be instantiable?

BCSymbolList is not supposed to be abstract. Its main use is for places where we don't need a complete sequence object, but just the symbol list, for instance when doing calculations, translations, etc.

- Koen.