[Biococoa-dev] Design question

Mon Aug 9 21:49:32 EDT 2004

> The guys at BioJava came up with a nice solution, the best of both 
> worlds, so to speak: http://www.biojava.org/tutorials/chap1.html
> What we do is create singleton objects (think "sharedDefaultManager") 
> for each class of "symbol", then refer to these using pointers. A 
> sequence like "ATGC" would be an array of the form: pointer to 
> shared "A" object, pointer to shared "T" object, pointer to shared "G" 
> object, pointer to shared "C" object, etc. All used objects are 
> present in memory only once, and the sequence is an array of pointers, 
> which is very cheap memory-wise. To highlight some of the things in 
> this approach which I like very much:
> - Great performance memory-wise
> - The "symbol" classes can store all additional data like name, pi, etc.
> - Solution to the ambiguity problem (see the getMatches() method)

Absolutely the right approach, IMO.
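
As a rough illustration of the shared-symbol idea quoted above, here is a 
minimal sketch in Python rather than Objective-C. The names Symbol, SYMBOLS 
and make_sequence are made up for this example and are not BioJava or 
BioCocoa API; the "matches" set only loosely mirrors what getMatches() is 
described as doing.

```python
class Symbol:
    """One shared object per symbol type (hypothetical, not BioJava's API)."""

    def __init__(self, code, name, matches=None):
        self.code = code
        self.name = name
        # An unambiguous symbol matches only itself; an ambiguity code
        # matches every symbol it can stand for.
        self.matches = frozenset(matches or code)


# Each symbol is created exactly once and shared by every sequence.
SYMBOLS = {
    "A": Symbol("A", "adenine"),
    "C": Symbol("C", "cytosine"),
    "G": Symbol("G", "guanine"),
    "T": Symbol("T", "thymine"),
    "R": Symbol("R", "purine", matches="AG"),  # IUPAC ambiguity code
}


def make_sequence(text):
    """Return a list of references to the shared symbols, not copies."""
    return [SYMBOLS[ch] for ch in text]


seq = make_sequence("ATGA")
# Both "A" positions point at the very same object.
print(seq[0] is seq[3])
# The ambiguity code reports which concrete symbols it covers.
print(sorted(SYMBOLS["R"].matches))
```

The sequence only stores references, so extra per-symbol data (name, 
mass, and so on) costs memory once per symbol type, not once per position.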

> I think we should discuss how exactly the functional groups should be 
> worked out. Either as separate symbols, or as possible "properties" of 
> the base class. Example: should phosphorylated-Serine be a separate 
> "BCSymbol", or should phosphorylation be a "BCFunctionalGroup" that 
> can be added to a symbol? Probably the first option if we go for 
> shared symbols, as you can either add a property to all serines or 
> none in this approach. The alternative option is to keep a 
> modification dictionary (modification and position) associated at the 
> sequence level instead of the symbol one.

The second option is the way to go, I think. If I remember correctly, 
both the 'Singleton' and this approach are among the 'Design 
Patterns' first described by the Gang of Four. I forgot the name of 
the second one, but you should check out that book in your local 
bookstore/library. One of the bibles of OOP.
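
To make the sequence-level option concrete, here is a small sketch, again 
in Python and with invented names (Sequence, add_modification, 
modification_at). It assumes 1-based positions and uses plain characters 
as a stand-in for references to shared symbols; the real BioCocoa design 
may well differ.

```python
class Sequence:
    """Sequence that keeps modifications per position, not per symbol."""

    def __init__(self, text):
        # Stand-in for an array of references to shared symbol objects.
        self.symbols = list(text)
        # Maps a 1-based position to the name of the modification there.
        self.modifications = {}

    def add_modification(self, position, name):
        if not 1 <= position <= len(self.symbols):
            raise IndexError("position out of range: %d" % position)
        self.modifications[position] = name

    def modification_at(self, position):
        """Return the modification name at a position, or None."""
        return self.modifications.get(position)


peptide = Sequence("MSTS")
# Only the serine at position 2 is modified; the one at 4 is untouched.
peptide.add_modification(2, "phosphorylation")
print(peptide.modification_at(2))
print(peptide.modification_at(4))
```

The shared serine symbol itself never changes, so modifying one position 
does not affect any other serine in this or any other sequence.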

>
>> Regarding the question whether the sequences should be 0-based or 
>> 1-based, I suggest we use both :) The BCSequence can have an NSRange 
>> member that is 1-based (or two ints indicating the start and end 
>> position), and the NSString and NSMutableArray are both 0-based.
> The tutorial above mentions an interesting choice: "Note that 
> numbering of Symbols within the SymbolList runs from 1 to length, not 
> from 0 to length-1 as is the case with Java strings. This is 
> consistent with the coordinate system found in files of annotated 
> biological sequences." Maybe we should do the same here.

Yes, I agree again. The stringRepresentation mentioned earlier will 
then do the conversion to 0-based.
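
A minimal sketch of that conversion, in Python with hypothetical names 
(SymbolList here is only borrowed as a label, symbol_at and subsequence 
are invented): the public accessors take 1-based, inclusive coordinates 
and translate them to the 0-based storage underneath.

```python
class SymbolList:
    """1-based view over 0-based storage (illustrative only)."""

    def __init__(self, text):
        self._text = text  # stored 0-based, like a plain string

    def symbol_at(self, position):
        """Return the symbol at a 1-based position."""
        if not 1 <= position <= len(self._text):
            raise IndexError("position out of range: %d" % position)
        return self._text[position - 1]

    def subsequence(self, start, end):
        """Return symbols start..end, both 1-based and inclusive."""
        if not 1 <= start <= end <= len(self._text):
            raise IndexError("range out of bounds: %d..%d" % (start, end))
        # 1-based inclusive [start, end] is 0-based half-open [start-1, end).
        return self._text[start - 1:end]


dna = SymbolList("ATGC")
print(dna.symbol_at(1))       # first symbol
print(dna.subsequence(2, 3))  # second and third symbols
```

Keeping the offset arithmetic inside the accessors means callers only 
ever see the biological coordinate system.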
>

> PS. Again, I encourage everyone to read the tutorial I linked to, and 
> if you have time, to dive further into the BioJava docs (further than I 
> did). I'm sure there are plenty more design decisions they made 
> that we can learn from and take advantage of...

Great tutorials, also the other chapters. Alex, thanks for all the 
input. It's really good to talk about this before diving blindfolded 
into codeland.

- Koen.