[Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]

Wed Dec 1 20:44:26 EST 1999

> > Alternatively, for a loci
> > interface, parsing the *.acd files might generate
> > a series of linked loci.
> 
> ...which can be combined into one composite locus.
> 

Yes, that would be the idea. The composite locus would encapsulate the I/O and
parameters of the linked loci it is built from.
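
To make the composite idea concrete, here is a minimal sketch in Python. The
Locus class and compose() function are hypothetical names invented for this
illustration; they are not the Loci API and do not parse *.acd files. The
sketch only shows how a chain of linked steps could be folded into one unit
that exposes the chain's external inputs, final outputs, and all parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    """One processing step (hypothetical): named inputs, outputs, parameters."""
    name: str
    inputs: list
    outputs: list
    params: dict = field(default_factory=dict)


def compose(name, loci):
    """Fold a chain of linked loci into a single composite locus.

    An input is exposed by the composite only if no earlier locus in the
    chain produces it. The composite's outputs are those of the last locus.
    Parameters are stored under "locusname.param" keys to avoid collisions.
    """
    produced = set()
    external_inputs = []
    params = {}
    for locus in loci:
        for item in locus.inputs:
            if item not in produced and item not in external_inputs:
                external_inputs.append(item)
        produced.update(locus.outputs)
        for key, value in locus.params.items():
            params[f"{locus.name}.{key}"] = value
    outputs = list(loci[-1].outputs) if loci else []
    return Locus(name, external_inputs, outputs, params)


# Hypothetical two-step chain: align sequences, then build a tree.
align = Locus("align", ["sequences"], ["alignment"], {"gap_penalty": 10})
tree = Locus("tree", ["alignment"], ["tree_file"], {"method": "nj"})
composite = compose("align_and_tree", [align, tree])
print(composite.inputs, composite.outputs, composite.params)
```

The intermediate "alignment" is hidden inside the composite, so from outside
it looks like one locus that takes sequences and produces a tree file.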

> > One hassle with doing this is the
> > acd interface will change, incrementally (see below).
> 
> Will it change because the entire interface is still under development, or
> because individual programs will require changes to their *.acd files?

Well, it seems like anything that has a version number of 0.0.4 will change 8-).
I would imagine that before all is said and done it will look different. I am
probably not doing justice to the ACD scheme, perhaps because I am trying to
use it in a different way than it was intended.

> Again, we can promote something as our 'preferred format' and use it as an
> intermediate in format conversions.  Just because we don't hard-code a data
> format into Loci, it doesn't mean we can't push for some new standard.  I've
> heard some interesting ideas for a universal bioinformatics XML.  Peter
> Murray-Rust even started a mailing list to promote the development of an
> _open_ standard for such a beast.  But the list now seems dead.  If some Lab
> Rats want to start an effort here, I'm all for it.

I think there are two things to consider here. First, if you are going from
GenBank to FASTA, why have an intermediate format? Second, if you were going to
write, de novo, an analysis program to work with Loci, what format would you
use? If you could settle on that, it would be the internal format, which might
not be a file format at all but rather a sequence object.
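
As a rough illustration of "a sequence object rather than a format", here is a
minimal Python sketch. The Sequence class and the parse_fasta() and to_fasta()
helpers are assumptions made up for this example, not part of Loci or any
existing toolkit. The point is that each file format only needs a reader and a
writer against the one in-memory object.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    """Hypothetical in-memory record, independent of any file format."""
    identifier: str
    residues: str
    molecule: str = "unknown"   # "dna", "rna", "protein"; stated, not guessed
    description: str = ""


def _build(header, chunks):
    # The first word of the FASTA header is the identifier, the rest is free text.
    identifier, _, description = header.partition(" ")
    return Sequence(identifier, "".join(chunks), description=description)


def parse_fasta(text):
    """Read FASTA text into a list of Sequence objects."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_build(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_build(header, chunks))
    return records


def to_fasta(record, width=60):
    """Write one Sequence object back out as FASTA text."""
    header = f">{record.identifier} {record.description}".rstrip()
    lines = [record.residues[i:i + width]
             for i in range(0, len(record.residues), width)]
    return "\n".join([header] + lines) + "\n"


records = parse_fasta(">seq1 test sequence\nGCATAAGC\nATGCAGATC\n")
print(to_fasta(records[0]), end="")
```

A GenBank reader would produce the same Sequence objects, so GenBank to FASTA
becomes "read, then write" with no intermediate file format in between.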

> Yeah, I've been following the EMBOSS list.  It's funny that some programs
> 'assume' you are using a certain type of data.  And the same goes for data
> formats.  How hard is it to have one word to say what it is you're dealing
> with?

Some programs work with both nucleotide and protein sequences: FASTA, BLAST,
and CLUSTAL, to name a few. I think historically someone thought it was a good
idea to treat sequences with >80% A, T (U), C, G as nucleic acids. Of course
that has problems right away, just like writing the year as 99 or 88.
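
Here is a small Python sketch of that kind of composition test, to show where
it breaks. The function name and the exact rule (strictly more than 80% of
letters in A, C, G, T, U) are my assumptions for illustration, not the code of
any particular program.

```python
NUCLEOTIDE_LETTERS = set("ACGTU")


def looks_like_nucleic_acid(residues, threshold=0.8):
    """Guess the molecule type from letter composition alone (fragile)."""
    letters = [c for c in residues.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(1 for c in letters if c in NUCLEOTIDE_LETTERS)
    return hits / len(letters) > threshold


# Plain DNA: classified as nucleic acid, as expected.
plain_dna = looks_like_nucleic_acid("GCATAAGCATGCAGATC")

# A peptide made only of Ala, Cys, Gly, Thr uses the letters A, C, G, T,
# so the heuristic calls it nucleic acid as well.
acgt_peptide = looks_like_nucleic_acid("ACGATCATCAGCATCAG")

# Nucleotide sequence with IUPAC ambiguity codes (R, S, N, Y): only 9 of
# 14 letters are A/C/G/T, so the heuristic rejects it.
ambiguous_dna = looks_like_nucleic_acid("ATCGRTSNRYTACG")

print(plain_dna, acgt_peptide, ambiguous_dna)
```

Both failures go away if the data simply says what it is, which is the
argument for carrying the molecule type along with the sequence.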

>     <dna>
>       GCATAAGCATGCAGATC
>     </dna>
> 
>     <protein>
>       ACGATCATCAGCATCAG
>     </protein>
> 

Heh heh or ATCGRTSNRYTACG.

> I had a problem like this with GenBank once.  You might think GenBank has all
> the descriptors needed to annotate a nucleotide sequence.  But...hmmm...where
> did that DNA come from anyway?  The nucleus?  The mitochondria?  The
> chloroplasts?  There's no descriptor for that!!!

There is, but someone has to annotate that section. Check out Sequin on the
NCBI site; there is a section for the location of the sequence (genomic,
mitochondrial, ...). Or check out seq.asn in the NCBI toolkit.

> 
> Cheers.
> Jeff

-- 
 .david
 David Lapointe
"Hokey religions and ancient weapons are no 
match for a good blaster at your side, kid,"