[Biococoa-dev] Base test

Sun Aug 15 18:18:34 EDT 2004

Hi Guys,

Let me start with a very general comment: perhaps we should either put
our work in a separate directory or provide some README files for
"innocent" downloaders of BioCocoa, who now suddenly find a lot of alpha
code instead of the relatively stable version Peter left before we
started. At least a few comments in the README file would be nice.
Peter, perhaps you want to do that?

Also, would it be wise to organise things further using folders? We
already have a Utils folder. Perhaps both the sequence classes and the
IO part could be placed in separate folders.

Then one disclaimer: I feel more and more guilty about not having much
time right now to really help with the programming. Our website
makeover is coming along nicely, but unfortunately it will take some
more time. I hope you don't get the feeling that this person is only
complaining while you guys do all the work... Again, don't be offended,
it's all really meant with good intentions. I'll try to jump in
a.s.a.p....

OK, having said that, I'm impressed by the tempo, guys, well done!

> Can we change the name BCSequenceBaseFoo to something else? Just as  
> you commented that Symbol is confusing, I think SequenceBase is also  
> confusing because (to me) it refers to DNA, not to a general building  
> block.
The naming scheme first. I understand the possible clash with the use
of "symbol", although I think it still represents the subject best, and
we are talking about BCSymbol rather than symbol (can't we call the
latter property "character"? BioJava calls them "tokens", which is a
nice name as well). Anyway, my problem is not so much BCSymbol per se;
I'm just not too fond of these really long names. Long live
autocomplete, but BCSequenceUnitDNABase is really long, let alone
BCSequenceUnitAminoAcid!!
So I go along with Koen: why don't we just call them BCAminoAcid,
BCNucleotideDNA and BCNucleotideRNA (or BCDNABase/BCRNABase)? I know
you would rather have a shared prefix because they descend from a
common ancestor, but maybe that doesn't weigh enough here. The question
remains, of course, what we call the ancestor ;-)

A few remarks that I had after my first quick look at the added code:
- John, shouldn't there be singleton objects for the "W, S, V, B, R"
bases as well? Right now I only found ACGTN.

- I think it's impossible because of the needed statics, but would
there be a way to COMPLETELY initialize all bases from the plist,
without having to hardcode them? In an ideal world I would imagine
that you ask for a base, the factory object looks in the plist to see
whether a base with that name is listed and, if so, initializes it
with the data from the plist. Alternatively, upon initialization it
would enumerate over the plist and init all the bases listed. I was
just wondering whether someone could come up with an idea like that.

- Koen, in your sequence class I saw you can init them with a string,
great! But then you keep the string around, and many methods depend on
or work with the string. This leads to exactly the problems we
discussed. Initializing with a string is logical, of course, but after
that we should just let the string go and depend completely on the
sequence list containing John's bases. That way we never have to worry
about keeping the string in sync: the only way to get a string back out
is through the -stringRepresentation method, which generates it at that
moment by "translating" back from the sequence list. Of course I
realize this is work in progress and perhaps it's too early for this.

- I found the -position method a bit confusing as to its description
versus what it actually does.

- What does the counted set do, and is it supported on Jaguar (Mac OS X
10.2)?

- Then we run into another problem. BCSequence should be an ancestor
class that divides into amino acid, DNA and RNA sequence subclasses.
Right now you have something mixed: do we incorporate translations
into the sequence? I guess not; these sequences should not be mixed,
they are either pure DNA or pure protein. If we do, RNA translation
must be there as well. So the amino acid methods are out of place here.
What I would propose is a shared translation utility object that you
can feed a DNA sequence and that gives you back the translated
sequences (in the requested frames) as protein sequence objects. It's
the app's task to control and organize these. Likewise, one could argue
the same for complements: a shared DNA utilities object returns the
complement sequence when you hand it a sequence. Alternatively, these
translations could be added as features, but in all cases there's
again the "how to keep things in sync upon editing" problem. I think we
should keep things as separate, clean entities as much as possible.

- Another discussion we had before was about the start/end position. I
argued earlier for handling this like movie editing: you have raw
source clips and give a start and end position to mark the wanted
region. The big advantage is that you get so-called "non-destructive
editing". Say you had selected bases 100 to 900 in a 1000 bp sequence.
In iMovie 2 you were in big trouble if, in hindsight, you would rather
have had bases 50 to 950, because you had cropped the sequence and
thrown away the ends. In iMovie 4 this is no problem: the raw source is
still there, and the only thing you have to move is the begin/end
marker.
But during our discussions we more or less came to the conclusion that
this would be more appropriately coded as features, as it's hard to
predict when you want to crop and when you want to keep the complete
sequence. In addition, the current implementation is rather limited
because only one region can be marked, instead of 100-200, 400-500,
etc. A developer could easily add program-specific features that let
him simulate the desired behaviour when he wants to (like marking bases
50 to 200 as a cut fragment).

- I love the snippet where you read the dictionary only once using the
class method; I'm certainly gonna use that one myself as well ;-)
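
To make the singleton and data-driven ideas above a bit more concrete,
here is a minimal sketch in plain C (also valid Objective-C), assuming
a hypothetical static table that stands in for the plist. The names
Base, kBaseTable, base_for_symbol and complement_sequence are made up
for illustration and are not BioCocoa API. Every IUPAC code, including
W, S, V, B and R, gets exactly one shared record, and complementing
lives in a separate utility function that returns a new sequence
instead of being a method on the sequence itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of the bases plist. */
typedef struct Base {
    char symbol;           /* IUPAC one-letter code */
    char complementSymbol; /* code of the complementary base */
    const char *name;
} Base;

/* One shared ("singleton") record per IUPAC DNA code, including the
   ambiguity codes W, S, V, B and R. In BioCocoa this data would be
   read from the plist instead of being hardcoded here. */
static const Base kBaseTable[] = {
    {'A', 'T', "adenine"},       {'C', 'G', "cytosine"},
    {'G', 'C', "guanine"},       {'T', 'A', "thymine"},
    {'R', 'Y', "purine (A/G)"},  {'Y', 'R', "pyrimidine (C/T)"},
    {'W', 'W', "weak (A/T)"},    {'S', 'S', "strong (C/G)"},
    {'K', 'M', "keto (G/T)"},    {'M', 'K', "amino (A/C)"},
    {'B', 'V', "not A"},         {'V', 'B', "not T"},
    {'D', 'H', "not C"},         {'H', 'D', "not G"},
    {'N', 'N', "any base"},
};

/* Returns the shared record for `symbol`, or NULL if it is not a base.
   Every call with the same symbol returns the same pointer. */
static const Base *base_for_symbol(char symbol) {
    for (size_t i = 0; i < sizeof kBaseTable / sizeof kBaseTable[0]; i++) {
        if (kBaseTable[i].symbol == symbol) return &kBaseTable[i];
    }
    return NULL;
}

/* Stand-alone "DNA utils" function: fills `out` with the complement of
   `in` (n bases, all assumed non-NULL). The input is never modified. */
static void complement_sequence(const Base *const *in, size_t n,
                                const Base **out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = base_for_symbol(in[i]->complementSymbol);
    }
}
```

Because identical symbols share one record, comparing two bases is a
pointer comparison, and the sequence being complemented is left
untouched; the app decides what to do with the new one.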
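
The non-destructive, feature-based approach to regions can be sketched
the same way. This is only an illustration in plain C, assuming a
hypothetical Feature record with 0-based, half-open [start, end)
positions and a raw sequence held as a plain character string for
brevity; Feature and feature_extract are made-up names. The raw
sequence is never cropped, several regions can coexist, and widening a
region only means moving its markers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature: marks a region of a raw sequence without
   altering it. Positions are 0-based and half-open: [start, end). */
typedef struct {
    size_t start;
    size_t end;
    const char *label;
} Feature;

/* Copies the region described by `f` out of `raw` into `out`
   (NUL-terminated). Returns the region length, or (size_t)-1 if the
   region is invalid or does not fit in `out`. `raw` is never modified,
   so widening or narrowing a region only means changing f.start/f.end. */
static size_t feature_extract(const char *raw, Feature f,
                              char *out, size_t outSize) {
    size_t rawLen = strlen(raw);
    if (f.start > f.end || f.end > rawLen) return (size_t)-1;
    size_t len = f.end - f.start;
    if (len + 1 > outSize) return (size_t)-1;
    memcpy(out, raw + f.start, len);
    out[len] = '\0';
    return len;
}
```

With a 1000 bp raw sequence, going from bases 100-900 to 50-950 is just
a matter of changing the two numbers in the feature.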

Again, many of these remarks might come too early. Also, a lot of the
work lies in the interplay between John and Koen to get the two basic
parts, symbols and sequences, working together.

It's definitely going quite well from what I can see. I like the
sequence header file items you sent, John, and indeed I already see
many of them in Koen's work. Many items can indeed go into the ancestor
sequence class, and it's key to identify as many as possible so that
the descendants look and work as similarly as possible.

If I see things like this (although it should indeed be in the general
sequence class, thus losing the "DNA" part):
////////////////////////////////////////////////////////////////////////////
//  INITIALIZATION METHODS
////////////////////////////////////////////////////////////////////////////
- (BCSequenceDNA *) initWithSequenceString: (NSString *)entry skippingNonBases: (BOOL)skip;
+ (BCSequenceDNA *) DNASequenceWithSequenceString: (NSString *)entry skippingNonBases: (BOOL)skip;
+ (BCSequenceDNA *) DNASequenceWithBaseArray: (NSArray *)entry;
+ (BCSequenceDNA *) DNASequenceWithSequence: (BCSequenceDNA *)entry;

, I can hardly wait to start using it in a real program!
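
As a rough illustration of how I picture initWithSequenceString:
skippingNonBases: and -stringRepresentation working together, here is a
minimal sketch in plain C (also valid Objective-C) so that it stands on
its own. Sequence, Base, sequence_init_with_string and
sequence_string_representation are hypothetical names, only A, C, G, T
and N are modelled, and failing on an unknown character when skip is 0
is my assumption about the intended semantics. The sequence keeps only
pointers to shared base records, never the input string, and the string
is rebuilt from those records on every request.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared base record; only the symbol is modelled here. */
typedef struct { char symbol; } Base;

static const Base kBases[] = { {'A'}, {'C'}, {'G'}, {'T'}, {'N'} };

/* Case-insensitive lookup of the shared record, or NULL for non-bases. */
static const Base *base_for_symbol(char c) {
    c = (char)toupper((unsigned char)c);
    for (size_t i = 0; i < sizeof kBases / sizeof kBases[0]; i++)
        if (kBases[i].symbol == c) return &kBases[i];
    return NULL;
}

/* The sequence owns only an array of pointers to shared bases. */
typedef struct {
    const Base **bases;
    size_t length;
} Sequence;

/* Mirrors the idea of -initWithSequenceString:skippingNonBases:.
   Returns 0 on success, -1 on allocation failure or (when skip is 0)
   on an unrecognized character. The input string is not retained. */
static int sequence_init_with_string(Sequence *seq, const char *entry,
                                     int skip) {
    size_t n = strlen(entry);
    seq->bases = malloc((n ? n : 1) * sizeof *seq->bases);
    seq->length = 0;
    if (!seq->bases) return -1;
    for (size_t i = 0; i < n; i++) {
        const Base *b = base_for_symbol(entry[i]);
        if (b) {
            seq->bases[seq->length++] = b;
        } else if (!skip) {
            free(seq->bases);
            seq->bases = NULL;
            return -1;
        }
    }
    return 0;
}

/* Builds a fresh string from the bases on every call, so it can never
   drift out of sync with the sequence. The caller frees the result. */
static char *sequence_string_representation(const Sequence *seq) {
    char *s = malloc(seq->length + 1);
    if (!s) return NULL;
    for (size_t i = 0; i < seq->length; i++) s[i] = seq->bases[i]->symbol;
    s[seq->length] = '\0';
    return s;
}

static void sequence_free(Sequence *seq) {
    free(seq->bases);
    seq->bases = NULL;
    seq->length = 0;
}
```

For example, initializing from "ac-gt 12n" with skipping enabled keeps
five bases, and the string representation comes back as "ACGTN".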
I guess that before then, however, many discussions will follow ;-)
Keep up the good work guys!
Cheers,
Alex

**************************************************************
                         ** Alexander Griekspoor **
**************************************************************
                  The Netherlands Cancer Institute
                  Department of Tumorbiology (H4)
             Plesmanlaan 121, 1066 CX, Amsterdam
                        Tel:  + 31 20 - 512 2023
                        Fax:  + 31 20 - 512 2029
                       AIM: mekentosj at mac.com
                       E-mail: a.griekspoor at nki.nl
                    Web: http://www.mekentosj.com

MacOS X: The power of UNIX with the simplicity of the Mac

***************************************************************