[Biococoa-dev] Annotation

Charles PARNOT charles.parnot at stanford.edu
Mon Feb 21 13:23:21 EST 2005


At 4:26 PM +0100 2/21/05, Alexander Griekspoor wrote:
>Nice work on the annotations guys, looks nice and indeed a dictionary is the obvious way to go.
>Just a few things that came to mind that I'd like to share with you. There are two issues with the annotations to think about.
>First, it would be nice to have a standard set defined for common annotations, like author, organism etc. The question of course is which list we should adhere to: the EMBL format, the NCBI format?
>
>Second, while searching the web a bit, I came across the BSML XML format, which seems to be becoming a kind of standard for new sequence formats. It would perhaps be nice (and wise) to have a look at the documents they made, because they (obviously) studied the annotation/feature issue very well.
>You can find more info at: http://www.bsml.org/
>Now, just to be clear, BSML is a file format, and one we could of course implement. Internally, the dictionary approach is the way to go for us, but it might be an idea to adhere to their nomenclature and/or tree/hierarchy. I already came across some nice ideas to keep in mind:

Yes, we need a standard set of keys internal to BioCocoa. Translations to and from other formats could easily be implemented using plists. Queries for a particular key could be smart:
* first look in the annotation dictionary
* if the result is nil, use a look-up table (itself a dictionary) to see whether the query key passed as argument can be translated into another key (e.g. the query key is 'species' but we could also try 'organism')
* return the value found under the translated key, or nil if there is none
For example, if a query is made with the NCBI species key and that key is not found in the annotation dictionary, try all the equivalent keys of the other formats, starting with the standard BioCocoa key.
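
To make the lookup concrete, here is a minimal sketch, written in Python rather than Objective-C just to keep it short. The equivalence table, the key names and the sample values are all made up for illustration; none of them are actual BioCocoa, NCBI or EMBL keys.

```python
# Hypothetical equivalence table: each group lists keys that mean the same
# thing in different formats, with the assumed BioCocoa standard key first.
EQUIVALENT_KEYS = [
    ["organism", "species", "source"],
    ["author", "authors"],
]


def equivalents(query_key):
    """Return the other keys in the same group as query_key, in table order."""
    for group in EQUIVALENT_KEYS:
        if query_key in group:
            return [key for key in group if key != query_key]
    return []


def lookup(annotations, query_key):
    """Look up query_key directly, then fall back to its equivalent keys."""
    # Step 1: try the annotation dictionary with the key as given.
    if query_key in annotations:
        return annotations[query_key]
    # Step 2: try each equivalent key, standard key first.
    for key in equivalents(query_key):
        if key in annotations:
            return annotations[key]
    # Step 3: nothing matched.
    return None


# Illustrative data only.
annotations = {"organism": "Homo sapiens", "author": "J. Smith"}
species = lookup(annotations, "species")  # found under 'organism'
```

Because the standard key comes first in each group, it is always the first fallback tried, which matches the "starting with the standard BioCocoa format" order above.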

In addition, the value found could then be stored in the annotation dictionary under the query key, so that future queries with the same key are faster.
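
A rough sketch of that caching idea, again in Python for brevity. The class name, method name and key names are hypothetical and only there to show the mechanism.

```python
class AnnotationStore:
    """Hypothetical wrapper around an annotation dictionary."""

    def __init__(self, annotations, equivalent_keys):
        self.annotations = dict(annotations)
        # List of groups of keys that mean the same thing (assumed table).
        self.equivalent_keys = equivalent_keys

    def value_for_key(self, query_key):
        # Fast path: the key (or a previously cached alias) is present.
        if query_key in self.annotations:
            return self.annotations[query_key]
        for group in self.equivalent_keys:
            if query_key not in group:
                continue
            for key in group:
                if key in self.annotations:
                    value = self.annotations[key]
                    # Store the value under the query key as well, so the
                    # next query with this key takes the fast path.
                    self.annotations[query_key] = value
                    return value
        return None


store = AnnotationStore({"organism": "Homo sapiens"},
                        [["organism", "species"]])
first = store.value_for_key("species")   # resolved through the table
second = store.value_for_key("species")  # answered from the dictionary
```

One thing to keep in mind with this approach: the dictionary now holds the same value under two keys, so a later change to 'organism' would not update the copy under 'species', and the extra key would need to be filtered out when writing the annotations to a file.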

Anyway, the BSML format looks like a good starting point. I am not really in the field, so it is difficult for me to tell how widely accepted and used the format is. Whatever we choose, we will have to translate keys anyway, so we might as well choose something as general as possible, and BSML looks appropriate.
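
For the key translation itself, a plist-based table could look something like the sketch below. It uses Python's standard plistlib module to read the table. The foreign key names are placeholders, not the real field names of BSML, NCBI or EMBL, and the two helper functions are hypothetical.

```python
import plistlib

# Hypothetical translation table as it might be stored in a plist file:
# foreign key -> assumed BioCocoa standard key.
TABLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>species</key>
    <string>organism</string>
    <key>authors</key>
    <string>author</string>
</dict>
</plist>
"""


def translate(annotations, table):
    """Rename the keys listed in table; leave all other keys unchanged."""
    return {table.get(key, key): value for key, value in annotations.items()}


def invert(table):
    """Build the reverse table (only valid if the mapping is one-to-one)."""
    return {standard: foreign for foreign, standard in table.items()}


to_standard = plistlib.loads(TABLE_PLIST)

# Import: foreign keys become standard keys.
imported = translate({"species": "Homo sapiens", "length": 1200}, to_standard)
# Export: standard keys go back to the foreign names.
exported = translate(imported, invert(to_standard))
```

One table per external format would be enough for both directions, as long as each foreign key maps to exactly one standard key.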

charles

-- 
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021


