[Biococoa-dev] Design question

Tue Aug 10 11:06:36 EDT 2004

> Thanks Koen, I will check it out, I find myself in "programming land"
> so new that I seriously miss a lot of historic knowledge...
> The fact that I was trained as a biologist instead of IT guy doesn't
> help much either ;-)
>
I'm in the same situation, so I sympathize.  Anyway, I like the idea of the
singleton base arrangement.  I spent my time wandering around the javadoc
references instead of the tutorial, and clearly I missed a lot of things.
Given easy methods to convert between the array of bases and strings, we
could code each method using whichever format is easier.  And I'm all for
easy-to-write methods....
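
To make the array/string round trip concrete, here is a rough sketch.  It is
in Python rather than Objective-C purely to keep it short, and every name in
it (Base, bases_from_string, string_from_bases) is made up for illustration;
none of this is existing BioCocoa API.  It assumes "singleton base" means one
shared object per base symbol.

```python
class Base:
    """One shared instance per base symbol (hypothetical sketch)."""
    _instances = {}

    def __new__(cls, symbol):
        symbol = symbol.upper()
        # Reuse the existing object for this symbol if there is one.
        if symbol not in cls._instances:
            instance = super().__new__(cls)
            instance.symbol = symbol
            cls._instances[symbol] = instance
        return cls._instances[symbol]


def bases_from_string(text):
    """Convert a plain string into a list of shared Base objects."""
    return [Base(character) for character in text]


def string_from_bases(bases):
    """Convert a list of Base objects back into a plain string."""
    return "".join(base.symbol for base in bases)
```

With both directions available, a method can take either form and convert at
the boundary, so each algorithm gets written against whichever is handier.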

> Yesterday I was still thinking a bit more about the two options I
> presented, and indeed the modification dictionary seems the best way to
> go. I think it's a very nice approach to keep this in a similar way as
> for instance the genbank records show features associated with the
> sequence. I believe John also mentioned something about this. The
> hierarchy would be something along the lines of a dictionary containing
> BCAnnotation objects (biojava does this as well), that would describe
> the positions in simple NSRanges and the type perhaps as
> BCFunctionalGroup objects. One of the problems will be to keep the
> system such that new (for us unknown) modifications/features are easily
> added...
I had thought an array of NSDictionary-like objects, each a BCFeature (or
BCAnnotation), would be easier.  The key thing would be to have a unique ID
set when a feature is added, so the user is shielded from naming conflicts
(they could add as many things named "ORF" as they want).  This would also
allow a feature to point to a separate sequence within a bundle of sequences,
for example the amino acid sequence of that ORF.
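
Roughly what I have in mind, sketched in Python for brevity.  The class and
field names (Feature, AnnotatedSequence, linked_sequence, and so on) are
invented for this example, and the start/length pair just stands in for an
NSRange.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    name: str                  # user-visible label; duplicates are allowed
    start: int                 # stands in for NSRange.location
    length: int                # stands in for NSRange.length
    feature_id: int = -1       # set by the owning sequence, never by the user
    linked_sequence: Optional[str] = None  # key of another sequence in the bundle


class AnnotatedSequence:
    def __init__(self, residues):
        self.residues = residues
        self.features = []                   # array as the root feature object
        self._next_id = itertools.count(1)   # source of unique IDs

    def add_feature(self, feature):
        """Assign a fresh ID, store the feature, and return the ID."""
        feature.feature_id = next(self._next_id)
        self.features.append(feature)
        return feature.feature_id

    def feature_with_id(self, feature_id):
        """Linear search by ID; returns None if nothing matches."""
        for feature in self.features:
            if feature.feature_id == feature_id:
                return feature
        return None
```

Two features can both be called "ORF" and still be told apart by ID, and
either one can name the translated sequence it belongs to.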

Either way works, but I'd thought that an array as the root feature object
had more parallels with other sequence file formats (e.g. NCBI's), and having
a regular, repeating structure would make the native file format a bit more
readable. The flipside is that looking up a specific object in a dictionary
would be much simpler to code.  Maybe a vote on this is in order?

One thing I'd argue for is an enumeration of defined feature types.  The
user should be free to create their own, but there are huge advantages to
having a set of predefined ones.  Imagine being able to search an
institute-wide plasmid collection for everything with a vertebrate promoter,
a protein tag, and a unique BamHI site....
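
A small sketch of why a shared enumeration pays off, again in Python and
with invented names (FeatureType, plasmids_with_all).  It is coarser than
the search described above: it only matches on feature type and does not
model promoter class or restriction-site uniqueness.

```python
from enum import Enum


class FeatureType(Enum):
    PROMOTER = "promoter"
    PROTEIN_TAG = "protein tag"
    RESTRICTION_SITE = "restriction site"
    CUSTOM = "custom"          # escape hatch for user-defined types


def plasmids_with_all(collection, required_types):
    """Return the names of plasmids carrying every required feature type.

    collection maps a plasmid name to the list of FeatureType values
    found on that plasmid.
    """
    required = set(required_types)
    return [name for name, feature_types in collection.items()
            if required <= set(feature_types)]
```

Because everyone uses the same type values, the search is a plain set
comparison instead of guesswork over free-text labels.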

> Another thought I would like you to comment on is the addition of a
> "history/editing dictionary" which keeps track of who added/edited a
> sequence and when/what things were edited. In general, I think it would
> be nice if we would go for the "non-destructive editing approach"
> wherever possible. My would-be Biococoa based DNAStrider-like app would
> for instance allow the user to cut and paste fragments and vectors, and
> it would be very nice if much of the editing could always be undone,
> and the original sequence could always be viewed. Think along the lines
> of a modern video editing approach, the files are unchanged, only the
> displayed parts are changed. This could save a lot of memory/disk
> reusal/writing as well. Of course there must be methods to "crop" your
> file, as there is no use keeping a complete genome around if you're only
> interested in one gene, right...
As you point out, the danger here would be that we'd have to guess in
advance the information content that would best suit the user.  Permanent
undos are also out of keeping with most AppKit design practices, where the
NSUndoManager's undo stack doesn't survive quitting the application.  I'm all
for keeping an
internal Undo list in each sequence object and allowing that to transfer
with drag/drop actions and such, but I'm hesitant about writing it to disk.
Something like that might be better implemented on a per-program basis,
rather than at the root of BioCocoa.
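
For what it's worth, the in-memory version could be as simple as the sketch
below (Python for brevity; EditableSequence and its methods are hypothetical
names).  It stores whole snapshots, which is the simplest thing that works
rather than a memory-efficient design, and the saved form deliberately leaves
the undo list out.

```python
class EditableSequence:
    def __init__(self, residues):
        self.residues = residues
        self._undo_stack = []    # lives in memory only; never written to disk

    def delete(self, start, length):
        """Remove length residues beginning at start."""
        self._undo_stack.append(self.residues)
        self.residues = self.residues[:start] + self.residues[start + length:]

    def insert(self, position, fragment):
        """Insert fragment so that it begins at position."""
        self._undo_stack.append(self.residues)
        self.residues = (self.residues[:position] + fragment
                         + self.residues[position:])

    def undo(self):
        """Revert the most recent edit; return False if there is none."""
        if not self._undo_stack:
            return False
        self.residues = self._undo_stack.pop()
        return True

    def to_saved_form(self):
        """What would go to disk: the current sequence, no history."""
        return {"residues": self.residues}
```

A program that does want persistent history could add it on top of this
without the framework having to decide what to record.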

Off to visit the mice now...

John

_______________________________________________
This mind intentionally left blank