[Biococoa-dev] Base test
Alexander Griekspoor
mek at mekentosj.com
Sun Aug 15 18:18:34 EDT 2004
Hi Guys,
Let me start with a very general comment: perhaps we should either put
our work in a separate directory or provide some read me files for
"innocent" downloaders of BioCocoa who all of a sudden find a lot of
alpha code instead of the relatively stable version Peter left before
we started. At least some comments in the read me file would be
elegant. Peter, perhaps you want to do that?
Also, would it be wise to organise things a bit further using folders?
We have a Utils folder already. Perhaps both the sequence stuff and the
IO part could be placed in separate folders.
Then one disclaimer here: I feel more and more guilty for not having
much time right now to really help with the programming. Our website
makeover is coming along nicely, but unfortunately it will take some
more time. As a result I hope you don't get the feeling that this
person is only complaining while you guys do all the work... Again,
don't be offended, it's all really meant with good intentions. I'll try
to jump in a.s.a.p....
OK, that said, I'm impressed by the tempo guys, well done!
> Can we change the name BCSequenceBaseFoo to something else? Just as
> you commented that Symbol is confusing, I think SequenceBase is also
> confusing because (to me) it refers to DNA, not to a general building
> block.
The naming scheme first. I understand the possible clash with the use
of Symbol, although I think it still represents the subject best, and
we are talking about BCSymbol instead of symbol. Can't we call this
latter property "character"? (BioJava calls them "token", which is a
nice name as well.) Anyway, my problem is not so much BCSymbol per se,
I'm just not too fond of these really long names (long live
autocomplete, but BCSequenceUnitDNABase is really long, let alone
BCSequenceUnitAminoAcid!!).
I go along with Koen then, why don't we just call the thing BCAminoAcid
or BCNucleotideDNA and BCNucleotideRNA (or BCDNABase/BCRNABase). I know
you would rather have a shared prefix as they descend from a common
ancestor, but maybe that doesn't weigh heavily enough here. The
question remains of course what we call the ancestor ;-)
A few remarks that I had after my first quick look at the added code:
- John, shouldn't there be singleton objects for the "W, S, V, B, R"
bases as well? So far I only found A, C, G, T and N.
- I think it's impossible because of the needed statics, but would
there be a way to COMPLETELY initialize all bases on the basis of the
plist, without having to hardcode them? In the ideal world I would
imagine that you ask for a base, the factory object looks in the plist
to see whether a base with that name is listed and then initializes it
with the data from the plist. Alternatively, it would upon
initialization enumerate over the plist to init all the bases listed. I
was just wondering if someone could come up with an idea like that?
- Koen, in your sequence class I saw you can init them with a string,
great! But next you keep the string around and many methods depend on /
work with the string. This leads to exactly the problems we discussed.
Init with a string is logical of course, but then we should just let
that go and completely depend on the sequence list containing John's
bases. We shouldn't have to worry about keeping the string in sync
here; the only string you can get back out is through the
stringRepresentation method, which is generated at that particular
moment by "translating" back from the sequence list. Of course I
realize it is work in progress and perhaps too early.
- I found the -position method a bit confusing as to its description
vs what it does
- What does the countedset do, and is that supported from Jaguar?
- Then we encounter another problem. BCSequence should be an ancestor
class that divides into amino acid, DNA or RNA sequence subclasses. Now
you have something mixed. Do we incorporate translations into the
sequence? I guess not: these sequences should not be mixed, they are
either pure DNA or pure protein. If we do, RNA translation must be
there as well. So the amino acid methods are strange here. The idea I
would propose is that there is a shared translation util object that
you could feed a DNA sequence and get (in the requested frames) the
translated sequences back as protein sequence objects. It's the app's
task to control/organize these. Likewise, one could argue this for
complements as well -> a shared DNA utils object returns you the
complement sequence if you hand it a sequence. Alternatively these
translations could be added as features, but in all cases there's again
the "how to keep things in sync upon editing" problem. I think we
should keep things as separate and clean entities as much as possible.
- Another discussion we had before was about the start/end position. I
argued a bit before to handle things like movie editing. You have raw
source clips and give a start and end position to mark the wanted
region. The big advantage here is that you get so-called
"non-destructive editing". Say you had selected bases 100 to 900 in a
1000 bp sequence. In iMovie 2 you were in big trouble if in hindsight
you would rather have had bases 50 to 950, as you had cropped the
sequence and thrown away the ends. In iMovie 4 this is no problem: the
raw source is still there and the only thing you have to move is the
begin/end marker.
But during our discussions we more or less came to the conclusion that
this would be something more appropriate to be coded in features, as
it's hard to predict when you want to crop or want to keep the complete
sequence. In addition the current implementation is rather limited, as
only one region can be marked, instead of 100-200, 400-500 etc. A
developer could easily add program-specific features that allow him to
simulate the desired behaviour when he wants to (like marking bases 50
to 200 as a cut fragment).
- I love the snippet where you read the dictionary only once using the
class method, certainly gonna use that one myself as well ;-)
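A rough sketch of the plist-driven idea from the remarks above, written in Python for brevity rather than Objective-C. It is only an illustration under assumptions: the plist layout, the keys "name" and "represents", and the names Base, registry and base_for_symbol are all hypothetical and not taken from the BioCocoa code. The plist is parsed once, on first use, and every listed base gets exactly one shared object, so nothing is hardcoded in the class itself.

```python
import plistlib

# Hypothetical plist content; the keys in the real BioCocoa plist may differ.
BASES_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>A</key><dict><key>name</key><string>adenine</string><key>represents</key><string>A</string></dict>
  <key>C</key><dict><key>name</key><string>cytosine</string><key>represents</key><string>C</string></dict>
  <key>G</key><dict><key>name</key><string>guanine</string><key>represents</key><string>G</string></dict>
  <key>T</key><dict><key>name</key><string>thymine</string><key>represents</key><string>T</string></dict>
  <key>R</key><dict><key>name</key><string>purine</string><key>represents</key><string>AG</string></dict>
  <key>N</key><dict><key>name</key><string>any base</string><key>represents</key><string>ACGT</string></dict>
</dict>
</plist>"""


class Base:
    """One shared object per base symbol, built entirely from the plist."""

    _registry = None  # symbol -> Base, filled on first use and then reused

    def __init__(self, symbol, name, represents):
        self.symbol = symbol
        self.name = name
        self.represents = represents  # the unambiguous bases this symbol covers

    @classmethod
    def registry(cls):
        # Parse the plist only once; later calls return the cached dictionary.
        if cls._registry is None:
            entries = plistlib.loads(BASES_PLIST)
            cls._registry = {
                symbol: cls(symbol, info["name"], info["represents"])
                for symbol, info in entries.items()
            }
        return cls._registry

    @classmethod
    def base_for_symbol(cls, symbol):
        # Raises KeyError for a symbol that is not listed in the plist.
        return cls.registry()[symbol.upper()]
```

Adding W, S, V or B would then be a matter of adding entries to the plist; Base.base_for_symbol("r") and Base.base_for_symbol("R") return the same object.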
Again, many of these remarks might come too early. Also, a lot of work
comes from the interplay between John and Koen to get the two basic
parts, symbols and sequences, working.
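To make the non-destructive editing remark concrete, here is a small sketch (again Python, and again with hypothetical names: Region, AnnotatedSequence, mark and extract are illustrations only, not existing BioCocoa classes). The raw sequence is never cropped; regions are just labelled start/end markers stored next to it, so several can coexist and each can be moved later.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    label: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class AnnotatedSequence:
    symbols: str                                  # the untouched raw sequence
    regions: list = field(default_factory=list)   # any number of markers

    def mark(self, label, start, end):
        """Add a region marker without changing the sequence itself."""
        if not 1 <= start <= end <= len(self.symbols):
            raise ValueError("region lies outside the sequence")
        region = Region(label, start, end)
        self.regions.append(region)
        return region

    def extract(self, region):
        """Return the symbols currently covered by the region."""
        return self.symbols[region.start - 1:region.end]


seq = AnnotatedSequence("ACGT" * 250)             # a 1000 bp example
cut = seq.mark("cut fragment", 100, 900)
before = len(seq.extract(cut))                    # bases 100 to 900
cut.start, cut.end = 50, 950                      # move the markers afterwards
after = len(seq.extract(cut))                     # bases 50 to 950
```

Because only the markers move, widening the selection in hindsight costs nothing, and a second region such as 400-500 can be marked on the same sequence.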
It's definitely going quite well from what I can see. I like the
sequence header file items you sent, John, and indeed see many things
already in the work of Koen. Indeed many items can go in the ancestor
sequence class, and it's key to identify as many as possible so that
the descendants look and work as similarly as possible.
If I see things like (although this should indeed be in the general
sequence class, thus losing the "DNA" part):
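A sketch of how the pieces discussed above could fit together, in Python and with hypothetical names throughout (Sequence, DNASequence, ProteinSequence, translate and complement are not the BioCocoa API). The ancestor class keeps only the symbol list, the string is regenerated on request, and translation and complement live in separate utility functions that hand back new, pure sequences. The codon table is the standard genetic code; treating codons that contain N as "X" is an assumption of this sketch.

```python
from itertools import product


class Sequence:
    """Ancestor: stores a list of symbols, never the string it came from."""
    alphabet = ""

    def __init__(self, text):
        symbols = [ch for ch in text.upper() if not ch.isspace()]
        bad = [s for s in symbols if s not in self.alphabet]
        if bad:
            raise ValueError("symbols not allowed here: " + "".join(bad))
        self.symbols = symbols

    def string_representation(self):
        # Generated on demand from the symbol list, so nothing can get out of sync.
        return "".join(self.symbols)


class DNASequence(Sequence):
    alphabet = "ACGTN"


class ProteinSequence(Sequence):
    alphabet = "ACDEFGHIKLMNPQRSTVWY*X"  # * = stop, X = unknown


# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Return a new ProteinSequence for the given frame (0, 1 or 2)."""
    s = dna.symbols[frame:]
    codons = ("".join(s[i:i + 3]) for i in range(0, len(s) - 2, 3))
    return ProteinSequence("".join(CODON_TABLE.get(c, "X") for c in codons))


def complement(dna):
    """Return a new DNASequence with every base replaced by its partner."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return DNASequence("".join(pairs[s] for s in dna.symbols))
```

With this split, translate(DNASequence("atggcctaa")) gives a protein sequence whose string representation is "MA*", and the original DNA sequence is left untouched; organising the translated frames stays the app's job.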
////////////////////////////////////////////////////////////////////////////
// INITIALIZATION METHODS
////////////////////////////////////////////////////////////////////////////
- (BCSequenceDNA *) initWithSequenceString: (NSString *)entry
                          skippingNonBases: (BOOL)skip;
+ (BCSequenceDNA *) DNASequenceWithSequenceString: (NSString *)entry
                                 skippingNonBases: (BOOL)skip;
+ (BCSequenceDNA *) DNASequenceWithBaseArray: (NSArray *)entry;
+ (BCSequenceDNA *) DNASequenceWithSequence: (BCSequenceDNA *)entry;
, I can hardly wait to start using it in a real program!
I guess before that however, many discussions will follow ;-)
Keep up the good work guys!
Cheers,
Alex
**************************************************************
** Alexander Griekspoor **
**************************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
MacOS X: The power of UNIX with the simplicity of the Mac
***************************************************************