[Pipet Devel] Re: SYNERGY

Humberto Ortiz Zuazaga hortiz at neurobio.upr.clu.edu
Fri Oct 15 15:55:31 EDT 1999


> > 0.  Loci is data independent, netgenetics is bioinformatics-centric
> 
> Right.  Speaking of data independence.  It looks as though SYNERGY uses an
> internal format for their biodata, which we have decided against.  An internal
> format requires 2 conversions between incompatible components:
> 
>   GENBANK   1   Internal  2   Analysis
>   document --->  format  ---> that doesn't
>                               read GENBANK
> 
> The 'converter locus' scheme that Loci uses would do only 1 conversion via the
> converter.

Jeff, you've got this exactly backwards. We need an internal format; we 
decided it would be XML-based, perhaps extended BSML. Converters should be 
written from any format to ours and from ours to any format; otherwise we get 
to write a converter for every pair of formats we support.
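
A minimal sketch of the idea, assuming a plain Python dict stands in for the 
internal format (the real one would be XML based). Every name here (READERS, 
WRITERS, convert, the "tab" format) is hypothetical and only for illustration, 
and the FASTA handling is deliberately simplified to one record per file.

```python
# Hypothetical converter registry: each format needs only a reader into the
# internal record and a writer out of it, never a converter to another format.
READERS = {}  # format name -> function(text) -> internal record (a dict)
WRITERS = {}  # format name -> function(internal record) -> text


def read_fasta(text):
    """Parse a single-record FASTA string (simplified: header is the id)."""
    lines = text.strip().splitlines()
    return {"id": lines[0][1:].strip(),
            "seq": "".join(line.strip() for line in lines[1:])}


def write_fasta(record):
    """Write the internal record as a single-line FASTA record."""
    return ">" + record["id"] + "\n" + record["seq"] + "\n"


def read_tab(text):
    """Parse a made-up 'id<TAB>sequence' format, used here as a second format."""
    ident, seq = text.strip().split("\t")
    return {"id": ident, "seq": seq}


def write_tab(record):
    """Write the internal record in the made-up tab-separated format."""
    return record["id"] + "\t" + record["seq"] + "\n"


READERS.update(fasta=read_fasta, tab=read_tab)
WRITERS.update(fasta=write_fasta, tab=write_tab)


def convert(text, source, target):
    """Convert between any two registered formats via the internal record."""
    record = READERS[source](text)
    return WRITERS[target](record)
```

Supporting a new format in this sketch means registering one reader and one 
writer; convert() itself and the existing formats are left untouched.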

For example, imagine we want to support 4 file formats:

genbank - internal
pdb - internal
fasta - internal
bsml - internal

versus direct converters between every pair of the same 4 formats:

genbank - pdb
genbank - fasta
genbank - bsml
pdb - fasta
pdb - bsml
fasta - bsml

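This comparison gets worse as you add more file formats: with n formats, the 
internal-format scheme needs n converters, while the pairwise scheme needs 
n(n-1)/2. This is why the netpbm tools all convert to pnm files.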
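
The two lists above can be reproduced with a short sketch. It assumes, as the 
lists do, that each line stands for one two-way converter; the function names 
are hypothetical and not part of Loci.

```python
from itertools import combinations

FORMATS = ["genbank", "pdb", "fasta", "bsml"]


def hub_converters(formats, hub="internal"):
    """One two-way converter per format, each paired with the internal format."""
    return [(fmt, hub) for fmt in formats]


def pairwise_converters(formats):
    """One two-way converter for every unordered pair of formats."""
    return list(combinations(formats, 2))


if __name__ == "__main__":
    # Print how both counts grow as formats are added.
    for n in range(2, 11):
        names = ["format%d" % i for i in range(n)]
        print(n, len(hub_converters(names)), len(pairwise_converters(names)))
```

With the 4 formats listed, this gives 4 converters against 6; at 10 formats it 
is 10 against 45.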

What we had decided is that we can defer defining our file formats until we 
actually have any loci that use them, and that we can have many small 
languages instead of a big language that tries to capture all possible data 
types.

So we'll have an internal format for nucleotide sequences, one for amino acid 
sequences, one for multi sequence objects, one for sequence annotations, one 
for bibliographic references, ...

-- 
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz at neurobio.upr.clu.edu





