[Biococoa-dev] Demo App

Fri Aug 27 16:58:56 EDT 2004

>>> Thanks - I found the mistakes in the .plist, and things should work fine
>>> now. Incidentally, a 2.4 Mb BAC took about 46 seconds to reverse
>>> complement.
>> Great! That's pretty rapid! What system are you on John?
> A 1.33GHz G4 laptop.  I'm not sure if it stressed the disk at all, so I'd
> imagine it was more a function of RAM access and processor, in which case
> this is an above average machine.

At least for me, 2.4Mb is an above average sequence length as well ;-)

>> I already did the very nice and exciting work (ahum) of creating such a
>> plist for EnzymeX, so we have this one already ;-)
>> BCCodons express their sequence in the BCTokens right?
> Could you send me a copy of the .plist?
Sure, it's attached... In any case it might save some work... Oops, 
while attaching it, I notice I saved one plist for each species, but I 
guess copying and pasting them into one file still saves some work... The 
structure of the plist should be self-explanatory:
		<key>AAA</key>
		<array>
			<string>K</string>
			<string>Lys</string>
			<string>24.1</string>
		</array>
The number is the relative codon usage (to see if it is rare or not).
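
As a rough illustration of how such an entry could be read (a sketch in Python for brevity, not BioCocoa code): the fragment above is wrapped in a minimal plist dictionary, and the three strings are unpacked as one-letter code, three-letter code and relative usage. The wrapper, the function name `load_codon_table` and the output field names are assumptions made for this example; the attached files may be laid out differently.

```python
import plistlib

# Hypothetical one-codon table: the entry from the mail wrapped in a plist dict.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
	<key>AAA</key>
	<array>
		<string>K</string>
		<string>Lys</string>
		<string>24.1</string>
	</array>
</dict>
</plist>
"""


def load_codon_table(data):
    """Parse plist bytes into {codon: {one_letter, three_letter, usage}}."""
    raw = plistlib.loads(data)
    table = {}
    for codon, (one, three, usage) in raw.items():
        table[codon] = {
            "one_letter": one,
            "three_letter": three,
            "usage": float(usage),  # relative codon usage, stored as a string
        }
    return table


codon_table = load_codon_table(PLIST)
```

Keeping the usage value as a number makes it easy to flag rare codons later with a simple threshold comparison.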

> I've been debating between a flatfile with all possible combinations and a
> tree structure with keys that are BCSymbols themselves, which should allow
> us to use ambiguous bases more easily. To explain the tree option in detail:
> you simply enumerate the keys and query each one as to whether it represents
> the first base.  If it does, you grab the dictionary it keys for, and repeat
> the process with the second base.  On the third base of the codon, the
> dictionary simply contains the answer - in the case of a translation, the
> amino acid.  If it fails at any point, it returns undefined.
>
> This should cut down on the number of items we have to put in the dictionary
> considerably, and provide a translation even if the sequence isn't high
> quality.  Plus, I already know how to populate an object from text
> references thanks to the nucleotide experience.

Sounds great! Guess we just have to see if it works in practice and if 
it's fast enough, but I can't see why not.
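
To make the tree idea concrete, here is a minimal sketch (again Python, not BioCocoa code). Plain strings stand in for BCSymbols, `SYMBOLS` is a small subset of the IUPAC ambiguity codes, and `TREE` is a hypothetical fragment of a codon tree; all names are assumptions for illustration. At each level the keys are enumerated and asked whether they represent the current base, and a failed lookup returns `None` as the "undefined" result.

```python
# Symbols mapped to the set of bases they stand for (subset, for illustration).
SYMBOLS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "N": {"A", "C", "G", "T"},  # any base
}


def represents(key, base):
    """True if every base the query symbol can mean is covered by the key."""
    return SYMBOLS[base] <= SYMBOLS[key]


# Hypothetical fragment: first base -> second base -> third base -> amino acid.
TREE = {
    "A": {"A": {"R": "K", "Y": "N"}},  # AAA/AAG -> Lys, AAC/AAT -> Asn
    "G": {"G": {"N": "G"}},            # GGN -> Gly
}


def translate_codon(codon, tree=TREE):
    """Walk the tree one base at a time; return None if any step fails."""
    if len(codon) != 3 or any(base not in SYMBOLS for base in codon):
        return None
    node = tree
    for base in codon:
        for key, child in node.items():
            if represents(key, base):
                node = child
                break
        else:
            return None  # no key at this level represents the base
    return node
```

With this layout `translate_codon("AAG")` and `translate_codon("AAR")` both give `"K"`, while `translate_codon("AAN")` gives `None` because the third base could lead to either Lys or Asn. Whether enumerating keys at each level is fast enough for long sequences would still need to be measured.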

>> We could have two different methods for translation to either RNA or
>> protein. We should also take species-specific translation into account,
>> that's the reason for genetic code objects. We can have a number of
>> codes already predefined like [BCGeneticCode standardCode] as a
>> class method.
> I was thinking of making a single generic method that would handle all
> translations, but I guess there's only going to be a few, so specialized
> methods make more sense.
And as these two are so different, the first thing you would do in your 
method is branch between protein and RNA, so why not make things much 
more transparent and keep them separate? Of course you can always add a 
convenience method which sorts out what to do and calls the proper 
method.

>> I was thinking a bit about this as well yesterday, and came up with the
>> following problem: how do we return multiple frames?
>> If you do a translateDNASequence: usingCode: (BCGeneticCode *)code
>> inFrame: you just return a BCSequenceProtein.
>> But what if you want all frames, or all forward frames? Do we return a
>> dictionary of BCSequenceProteins with the frame as key?
>> Finally, let's define how we call each frame: -3, -2, -1, +1, +2, +3?
> If a method can return more than one result, clearly it should return an
> array.
Yes and no. If we allow multiple frames as a parameter (say -3, +1 and 
+2), we should either return an array in the same order (or a fixed 
order), or a dictionary with those frame numbers as keys. In the latter 
case no confusion can occur, which can easily happen in the former. 
Say I ask a convenience method translateReverseFrames: do I get an array 
back in the order -3, -2, -1 or -1, -2, -3? Headerdoc will help you out 
here, but with the dictionary the question would not arise in the first 
place. But in this case I agree with an array, we just have to make sure 
it is clearly documented what is returned and how.
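
For comparison, the dictionary variant could look roughly like the sketch below (Python, not BioCocoa code). The function names are hypothetical, only the standard genetic code is built in, unknown codons become "X", and the convention that frame -1 starts at the first base of the reverse complement is an assumption that would need to be agreed on and documented.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, frame, code=STANDARD_CODE):
    """Translate one frame; frame is one of -3, -2, -1, +1, +2, +3."""
    if frame not in (-3, -2, -1, 1, 2, 3):
        raise ValueError("frame must be a non-zero integer from -3 to +3")
    # Assumed convention: negative frames read the reverse complement.
    strand = seq if frame > 0 else reverse_complement(seq)
    start = abs(frame) - 1
    codons = (strand[i:i + 3] for i in range(start, len(strand) - 2, 3))
    return "".join(code.get(codon, "X") for codon in codons)


def translate_frames(seq, frames=(1, 2, 3, -1, -2, -3)):
    """Return {frame: protein}, so the caller never depends on ordering."""
    return {frame: translate_frame(seq, frame) for frame in frames}
```

A call such as `translate_frames("ATGAAATAG", frames=(-3, 1, 2))` returns a dictionary with exactly the keys -3, 1 and 2. An array-returning convenience method can be layered on top by reading the values out in a documented order.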

> As for frames, I think the non-zero integers are the way to go - we
> should try to make usage familiar to biologists (unless it's too difficult
> or annoying to do so ;).
Yup! Certainly!
Cheers,
Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Codon Tables.zip
Type: application/zip
Size: 17996 bytes
Desc: not available
URL: <http://www.bioinformatics.org/pipermail/biococoa-dev/attachments/20040827/e4b9332f/attachment.zip>
-------------- next part --------------

**************************************************************
                         ** Alexander Griekspoor **
**************************************************************
                  The Netherlands Cancer Institute
                  Department of Tumorbiology (H4)
             Plesmanlaan 121, 1066 CX, Amsterdam
                        Tel:  + 31 20 - 512 2023
                        Fax:  + 31 20 - 512 2029
                       AIM: mekentosj at mac.com
                       E-mail: a.griekspoor at nki.nl
                    Web: http://www.mekentosj.com

MacOS X: The power of UNIX with the simplicity of the Mac

***************************************************************