[Biococoa-dev] more ramblings

Fri Dec 3 16:27:23 EST 2004

Koen,

On 3 Dec 2004, at 21:23, Koen van der Drift wrote:

>
> On Dec 3, 2004, at 11:47 AM, Alexander Griekspoor wrote:
>
>>> Well let me add some points here. Although I never liked the idea of 
>>> subclassing BCSequence, I think you guys are right that if we use 
>>> one-liners it is better to call them from the appropriate subclass. 
>>> But I still like the idea of having the wrapper test the sequence type 
>>> first before continuing. It might be nonsense to do that, but the 
>>> result won't be - because there are no results.  Just returning nil 
>>> will be sufficient, no need to start throwing exceptions around ;)
>>
>> True, the question is how we organize the wrapper (you mean the 
>> BCAnnotatedSequence right, or whatever we decide to name it).
>
> No, I meant the general wrappers that do something with a sequence 
> (translate, pI, search, etc).
OK, I get it: in general you want those classes to be able to handle a 
general BCSequence object as well, and not only a specific subclass per 
se.
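
To make the "test the type first, return nil otherwise" idea concrete, here is a minimal sketch in Python rather than Objective-C. All names (SequenceType, Sequence, complement) are hypothetical and are not BioCocoa API; the point is only that the wrapper accepts any sequence, checks its type, and quietly returns nothing when the operation makes no sense.

```python
from enum import Enum
from typing import Optional


class SequenceType(Enum):
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"


class Sequence:
    """Hypothetical stand-in for a general BCSequence: symbols plus a type tag."""

    def __init__(self, symbols: str, sequence_type: SequenceType):
        self.symbols = symbols
        self.sequence_type = sequence_type


# Translation table for the four unambiguous DNA bases only.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(sequence: Sequence) -> Optional[str]:
    """Return the complement of a DNA sequence, or None for any other type.

    No exception is raised for a non-DNA sequence; the caller simply gets
    None back, mirroring "just return nil".
    """
    if sequence.sequence_type is not SequenceType.DNA:
        return None
    return sequence.symbols.translate(_DNA_COMPLEMENT)
```

Called with a protein sequence, complement() just returns None, so the caller has to check for that, much like checking for nil.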
>
>
>> There are basically two choices: either let the developer separate the 
>> sequence from the annotations/features part and do all manipulations 
>> purely on the BCSequence, or make all methods accept 
>> BCAnnotatedSequences as well as BCSequences. The latter has some clear 
>> advantages (such as the possibility to take features into account 
>> while calculating the MW for example), but it also has some clear 
>> problems. One thing it would mean is that we are almost forced to 
>> have also three types of BCAnnotatedSequence subclasses around (Koen 
>> might remark the benefit of a single BCSequence class here probably).
>
> LOL - actually, yes I would ;). But I would suggest the following. 
> BCSequence *only* takes care of managing the symbol list, more or less 
> like the SymbolList class they have in BioJava. Then we have 
> BCAnnotatedSequence as a subclass of BCSequence. So now we have a 
> symbollist + all the additional info that makes it a real molecule. 
> Then, only for convenience, we subclass BCAnnotatedSequence to 
> BCSequenceDNA, BCSequenceProtein, etc.
Hmm, not sure; it feels like the layer at which we then subclass is the 
wrong one. But it might also be the only problem.
>
>> Notes and annotations like creator, date etc are easy, they don't 
>> change (and are what I would call a BCAnnotation). Features 
>> (BCFeature objects) are much more of a problem, they are coupled to 
>> sequence ranges (i.e. a helix from aminoacid 10 to 15), and should be 
>> kept in sync while editing the sequence. The big problem here is, 
>> what architecture would be the smartest way of doing this. Any 
>> suggestions?
>
> The BioPerl docs I mentioned recently use a separate Location object. 
> I need to look more closely at it, to see how useful it is. One thing 
> we have to watch for is that features need to have a 1-based 
> numbering, not 0-based as we have so far. One possibility could be to 
> couple features with individual BCSymbols. So we tell a BCSymbol that 
> a feature XX starts there. However, what happens if in the example you 
> mentioned above (helix from aminoacid 10 to 15), the user edits the 
> sequence and removes AA 8-12? Then the startpoint of the feature is 
> gone. So, I guess that might not be a good solution, although this 
> problem (if any) will also manifest itself with other solutions.
Exactly. What we have to emulate is an attributed string, which handles 
exactly the same problem(s). I think in general we don't need a 
location object; we need a range object, and I don't see why NSRange 
wouldn't be good enough (even if our system is 1-based).
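
As a rough illustration of the bookkeeping involved, here is a minimal sketch in Python of how a feature range could be adjusted when part of the sequence is deleted. It assumes a 0-based location/length pair in the spirit of NSRange, with conversion to and from 1-based positions at the edges. The names (Range, adjust_for_deletion, from_one_based, to_one_based) are hypothetical, and the policy of shrinking a partly deleted feature rather than dropping it is only one possible choice, not something we have decided.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    """0-based location plus length, analogous to an NSRange."""
    location: int
    length: int

    @property
    def end(self) -> int:
        # Index one past the last covered position.
        return self.location + self.length


def from_one_based(first: int, last: int) -> Range:
    """Convert an inclusive 1-based span (e.g. residues 10 to 15) to a Range."""
    return Range(first - 1, last - first + 1)


def to_one_based(r: Range) -> tuple:
    """Convert a Range back to an inclusive 1-based (first, last) pair."""
    return (r.location + 1, r.location + r.length)


def adjust_for_deletion(feature: Range, deleted: Range) -> Optional[Range]:
    """Return the feature's range after `deleted` is removed from the sequence.

    Returns None if the whole feature was deleted.
    """
    if deleted.end <= feature.location:
        # Deletion lies entirely before the feature: shift it left.
        return Range(feature.location - deleted.length, feature.length)
    if deleted.location >= feature.end:
        # Deletion lies entirely after the feature: nothing changes.
        return feature
    # The deletion overlaps the feature: shrink it by the overlap.
    overlap = min(feature.end, deleted.end) - max(feature.location, deleted.location)
    new_length = feature.length - overlap
    if new_length == 0:
        return None
    # If the start of the feature was cut away, the remainder now begins
    # where the deletion began.
    return Range(min(feature.location, deleted.location), new_length)
```

For Koen's example, the helix from_one_based(10, 15) with residues from_one_based(8, 12) removed comes out as Range(7, 3), i.e. residues 8 to 10 in 1-based numbering: the three surviving helix residues, shifted left.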
>
>>>
>>> However, I don't like the idea that was suggested in another recent 
>>> mail, to also make subclasses for DNAStrict, proteinstrict, etc.
>> I copy that, definitely not, but the general BCSequence class could 
>> have a simple strict boolean that can be set.
>
> For what?
For preserving the knowledge that a sequence uses a strict symbol set 
(the other option would be to have a symbolset property inside the 
BCSequence object).
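
A minimal sketch of that second option, again in Python and with hypothetical names (Sequence, STRICT_DNA, AMBIGUOUS_DNA, is_strict): the sequence keeps a reference to its symbol set, and "strict" is derived from it instead of being stored as a separate boolean. The only outside knowledge assumed here is the standard IUPAC nucleotide ambiguity codes.

```python
# The four unambiguous DNA bases.
STRICT_DNA = frozenset("ACGT")
# Strict bases plus the IUPAC ambiguity codes.
AMBIGUOUS_DNA = STRICT_DNA | frozenset("RYSWKMBDHVN")


class Sequence:
    """Hypothetical sequence that remembers which symbol set it was built with."""

    def __init__(self, symbols: str, symbol_set: frozenset):
        invalid = set(symbols) - symbol_set
        if invalid:
            raise ValueError(f"symbols not in symbol set: {sorted(invalid)}")
        self.symbols = symbols
        self.symbol_set = symbol_set

    @property
    def is_strict(self) -> bool:
        # Derived from the symbol set, so the two can never disagree.
        return self.symbol_set == STRICT_DNA
```

With a plain boolean the flag and the actual contents could drift apart; deriving it from the symbol set avoids that, at the cost of carrying one more property around.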
>
>> Also, we can introduce the strict BCSequenceTypes for passing as 
>> arguments...
>
> Sounds good.
>
> BTW, what's the difference between 'strict', 'skippingnonbases' and 
> 'unambiguous' ?
Basically they're the same thing, and yes, we should rename them to be 
similar I think...
Cheers,
Alex

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   AIM: mekentosj at mac.com
                   E-mail: a.griekspoor at nki.nl
               Web: http://www.mekentosj.com

                           Windows vs Mac
	65 million years ago, there were more
                      dinosaurs than humans.
	     Where are the dinosaurs now?

*********************************************************