mek at mekentosj.com
Mon Feb 21 10:26:31 EST 2005
Nice work on the annotations guys, looks nice and indeed a dictionary
is the obvious way to go.
Just a few things that came to mind that I'd like to share with you: two
issues with the annotations to think about.
First, it would be nice to have a standard set defined for common
annotations, like author, organism etc. The question of course is what
list should we adhere to? The EMBL format, the NCBI format?
Second, while searching the web a bit, I came across the BSML XML format,
which seems to be becoming a kind of standard for new sequence formats. It
would perhaps be nice (and wise) to have a look at the documents they
made because they (obviously) studied the annotation/feature issue very
You can find more info at: http://www.bsml.org/
Now, just to make sure, bsml is a file format and one we could
implement of course, internally the dictionary approach is for us the
way to go, but it might be an idea to adhere to their nomenclature
and/or tree/hierarchy. I already came across some nice ideas to keep in
mind:
> As research proceeds on a given biological molecule, certain segments
> of the sequence become interesting for a variety of reasons. Sequence
> annotation is used to capture this extra information about the
> sequence data. Positional annotation refers to annotations that are
> specific to a portion of a sequence. In BSML, positional annotation is
> captured through Feature tags. Feature tags are child tags of a
> sequence tag, and therefore a Feature is related to a single sequence.
> For example, the following tag indicates that the region between 1513
> and 1962 encodes a particular gene:
> <Feature id="FTR4" title="Leucine TNRA" class="GENE">
> <Qualifier value-type="gene"/>
> <Interval-loc startpos="1513" endpos="1962"/>
> </Feature>
So a feature is defined as a "positional annotation" which is a nice
definition that I had in mind as well. Of course features give the
extra problem that they have to be kept in sync during editing.
Therefore it's perhaps better to internally have a dictionary of
annotations and a dictionary of features.
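To make the two-dictionary idea concrete, here is a minimal sketch, assuming modern Objective-C with ARC and Foundation. The class SketchSequence, its properties and the insertString:atIndex: method are hypothetical stand-ins, not the existing BioCocoa API. It keeps sequence-wide annotations and positional features in separate dictionaries and shifts the feature ranges when symbols are inserted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a sequence class; not the actual BioCocoa API.
@interface SketchSequence : NSObject
@property (nonatomic, readonly) NSMutableString *symbols;
// Sequence-wide annotations, e.g. @"organism" -> @"E. coli".
@property (nonatomic, readonly) NSMutableDictionary<NSString *, id> *annotations;
// Positional annotations: feature name -> NSRange wrapped in an NSValue.
@property (nonatomic, readonly) NSMutableDictionary<NSString *, NSValue *> *features;
- (instancetype)initWithString:(NSString *)string;
- (void)insertString:(NSString *)string atIndex:(NSUInteger)index;
@end

@implementation SketchSequence

- (instancetype)initWithString:(NSString *)string {
    if ((self = [super init])) {
        _symbols = [string mutableCopy];
        _annotations = [NSMutableDictionary dictionary];
        _features = [NSMutableDictionary dictionary];
    }
    return self;
}

// Inserts symbols and keeps the features in sync with the edit.
// The index is not validated here; NSMutableString raises if it is out of bounds.
- (void)insertString:(NSString *)string atIndex:(NSUInteger)index {
    [self.symbols insertString:string atIndex:index];
    // Iterate over a snapshot of the keys so the dictionary can be updated safely.
    for (NSString *key in [self.features allKeys]) {
        NSRange range = [self.features[key] rangeValue];
        if (index <= range.location) {
            range.location += string.length;   // insert before the feature: shift it
        } else if (index < NSMaxRange(range)) {
            range.length += string.length;     // insert inside the feature: stretch it
        }
        self.features[key] = [NSValue valueWithRange:range];
    }
}

@end
```

Only the features need this bookkeeping; the annotations dictionary is untouched by edits, which is the main argument for keeping the two apart.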
> A given DNA sequence could have many features associated with it.
> Rather than simply encoding all of these flatly, in BSML related
> feature tags can be aggregated into Feature-Tables. Feature-Tables are
> intended to provide a logical grouping to features, such as grouping
> all gene expression features together.
This is nice as well: it allows nested annotations and features, which
is perfectly possible with a dictionary of course. What do you guys
think of this, too complicated or a desired feature?
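As a rough illustration of the nesting, the sketch below assumes plain nested NSMutableDictionary instances, where the top-level keys play the role of BSML Feature-Tables. The function names, the key names and the @"genes" table are made up for the example; the feature values are taken from the BSML snippet quoted earlier.

```objc
#import <Foundation/Foundation.h>

// Builds a hypothetical nested store: top-level keys act like BSML
// Feature-Tables, each holding a dictionary of features keyed by id.
static NSMutableDictionary *SketchMakeFeatureTables(void) {
    NSMutableDictionary *feature = [NSMutableDictionary dictionary];
    feature[@"title"] = @"Leucine TNRA";
    feature[@"class"] = @"GENE";
    feature[@"startpos"] = @1513;
    feature[@"endpos"] = @1962;

    NSMutableDictionary *genes = [NSMutableDictionary dictionary];
    genes[@"FTR4"] = feature;

    NSMutableDictionary *tables = [NSMutableDictionary dictionary];
    tables[@"genes"] = genes;
    return tables;
}

// Looks up one feature; returns nil when the table or the feature id is unknown,
// because messaging a nil table simply yields nil.
static NSDictionary *SketchFeature(NSDictionary *tables, NSString *table, NSString *featureID) {
    return tables[table][featureID];
}
```

The cost of this freedom is that every reader of the dictionary has to know the nesting convention, which is where a predefined hierarchy would help.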
Basically, the dictionary approach we have right now allows us to store
anything (including resources, data etc.) as an annotation; we're not
limited to strings and such. The question arises whether we should go for
an annotation/feature object (which I kind of like because it allows much
more standardisation and makes it easier to add e.g. sorting/updating
logic), or not, and leave the user free to add anything he/she likes
(which is still possible with annotation objects as well of course).
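A minimal sketch of what such an annotation object could look like, assuming ARC and Foundation. The class name SketchAnnotation and its methods are hypothetical. The value stays an arbitrary object, so the user is still free to store anything, while the wrapper gives one place for shared logic such as sorting.

```objc
#import <Foundation/Foundation.h>

// Hypothetical annotation object: a key plus an arbitrary value.
@interface SketchAnnotation : NSObject
@property (nonatomic, copy, readonly) NSString *key;
@property (nonatomic, strong, readonly) id value;   // not limited to strings
- (instancetype)initWithKey:(NSString *)key value:(id)value;
// Orders annotations alphabetically by key, ignoring case.
- (NSComparisonResult)compare:(SketchAnnotation *)other;
@end

@implementation SketchAnnotation

- (instancetype)initWithKey:(NSString *)key value:(id)value {
    if ((self = [super init])) {
        _key = [key copy];
        _value = value;
    }
    return self;
}

- (NSComparisonResult)compare:(SketchAnnotation *)other {
    return [self.key caseInsensitiveCompare:other.key];
}

@end
```

With this in place, an array of annotations can be ordered with [annotations sortedArrayUsingSelector:@selector(compare:)], something a bare dictionary of loose values does not offer directly.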
> An annotation can also take the form of a comparison between two
> sequences. Perhaps two segments are equivalent to one another. In
> order to achieve this in BSML, a <segment-set> tag can be used to
> enclose a set of segments represented by <segment> tags. For example,
> the tag shown in Listing 2 expresses that a region from sequence
> AB1432 and sequence NZ5723 are equivalent.
This is also a very nice thing to keep in mind. For instance, it lets us
trace back how a construct was built...
> One of the core strengths of BSML, however, is the availability of
> public converters to translate from other formats into BSML. This
> allows consumers of bioinformatics data to pull together information
> from disparate sources into a single common language for their
> research. Surprisingly enough, many of these converters were not
> developed by LabBook, the company driving BSML as a standard, but
> rather from third-party adopters and supporters of BSML. For example,
> Bristol-Myers Squibb has released an open-source adapter into the
> BioPerl project that translates between the SeqIO format and BSML.
> Similarly, Cold Spring Harbor Laboratory has released a translator
> between the ASN.1 format used by GenBank and BSML. The European
> Bioinformatics Institute provides a translation between EMBL documents
> and BSML. Every day more and more translators become available, making
> it possible for researchers and application developers to build tools
> around BSML while accessing a variety of data sources.
This is nice as well of course: by mirroring the setup of BSML internally
to some extent, we can use these adapters to implement the reader/writer
classes more easily instead of reinventing the wheel...
Finally, just thinking out loud here: if we go for a number of often-used
predefined tags for annotations and features, how do we then "define"
them? Perhaps it would be nice to add a category to the
BCAbstractSequence class, e.g. annotation-extensions, that declares
methods to set these predefined annotations, like:
- (void)setAuthorname:(NSString *)author;
- (void)setCreationdate:(NSCalendarDate *)date;
(Note the possibility to take a calendar date instead of a string; this
way we ensure that all dates are created equally, instead of one person
entering 20-2-2003 and another 2/20/2003.)
etc., including things like predefined position-specific annotations.
Although I'm not a fan of categories in frameworks, here it might be a
nice way to separate the code instead of adding all these things in the
abstract sequence class. Just an idea though...
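A minimal sketch of the category idea, assuming ARC and Foundation. SketchAbstractSequence is a stand-in with nothing but an annotations dictionary, not the real BCAbstractSequence, and the dictionary keys are made up. The sketch also uses NSDate rather than NSCalendarDate, on the assumption that it is built against a current Foundation where NSCalendarDate is deprecated; the point about typed dates is the same either way.

```objc
#import <Foundation/Foundation.h>

// Minimal stand-in for the abstract sequence class (hypothetical).
@interface SketchAbstractSequence : NSObject
@property (nonatomic, readonly) NSMutableDictionary<NSString *, id> *annotations;
@end

@implementation SketchAbstractSequence
- (instancetype)init {
    if ((self = [super init])) {
        _annotations = [NSMutableDictionary dictionary];
    }
    return self;
}
@end

// Hypothetical keys for the predefined annotations.
static NSString * const SketchAuthorKey = @"author";
static NSString * const SketchCreationDateKey = @"creationDate";

// The category keeps the predefined accessors out of the main class.
@interface SketchAbstractSequence (AnnotationExtensions)
- (void)setAuthorname:(NSString *)author;
- (NSString *)authorname;
- (void)setCreationdate:(NSDate *)date;
- (NSDate *)creationdate;
@end

@implementation SketchAbstractSequence (AnnotationExtensions)

- (void)setAuthorname:(NSString *)author {
    self.annotations[SketchAuthorKey] = [author copy];
}

- (NSString *)authorname {
    return self.annotations[SketchAuthorKey];
}

// Taking a date object instead of a string avoids ambiguous date formats.
- (void)setCreationdate:(NSDate *)date {
    self.annotations[SketchCreationDateKey] = date;
}

- (NSDate *)creationdate {
    return self.annotations[SketchCreationDateKey];
}

@end
```

The typed accessors are only a thin layer over the same dictionary, so free-form annotations added directly to the dictionary keep working next to the predefined ones.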
On 21-feb-05, at 7:42, Charles PARNOT wrote:
> At 9:46 PM -0500 2/20/05, Koen van der Drift wrote:
>> On Feb 20, 2005, at 2:05 PM, Charles PARNOT wrote:
>>> It is because you have not #import-ed the BCSequenceProtein header,
>>> so the compiler does not know it is a subclass of
>> Thanks - I added some more code and fixes. If everyone agrees this is
>> the right approach for the annotations, I will start adding more
>> - Koen.
> I do think the NSMutableDictionary is very appropriate for
> annotations. Regarding the current implementation, it looks good to
> me. My only comment is I don't think we need a 'setAnnotations:'
> method. This is a bit dangerous, particularly with a mutable
> dictionary as argument. Instead, methods 'removeAnnotationWithKey:',
> 'removeAllAnnotations' and 'addAnnotationsFromDictionary:' will do the
> BTW, I corrected some of the code because I needed the compiled
> framework for the testing unit thing and there was a compiler error.
> Let me know, guys, if and when you want me to incorporate the tests in
> the cvs?
> Also, Koen, I have one question about the symbolSet: it seems that all
> instances of one sequence type use the same symbol set. Is that right?
> Do you think it is going to stay like this, or are there special cases
> where we will want to change that? If this is true, we could leverage
> that knowledge to simplify the init methods of the sequence classes.
> Let me know, I can explain better what I mean.
> Help science go fast forward:
> Charles Parnot
> charles.parnot at stanford.edu
> Room B157 in Beckman Center
> 279, Campus Drive
> Stanford University
> Stanford, CA 94305 (USA)
> Tel +1 650 725 7754
> Fax +1 650 725 8021
** Alexander Griekspoor **
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-
bit company without 1 bit of sense.