> > Alternatively, for a loci
> > interface, parsing the *.acd files might generate
> > a series of linked loci.
>
> ...which can be combined into one composite locus.
>

Yes, that would be the idea. The composite locus would encapsulate the
I/O and parameters.

> > One hassle with doing this is the
> > acd interface will change, incrementally (see below).
>
> Will it change because the entire interface is still under development, or
> because individual programs will require changes to their *.acd files?

Well, it seems like anything that has a version 0.0.4 will change 8-).
But I would imagine that before all is said and done it will be
different. I am not doing justice to the acd scheme, perhaps because I am
trying to use it in a different way.

> Again, we can promote something as our 'preferred format' and use it as an
> intermediate in format conversions. Just because we don't hard-code a data
> format into Loci, it doesn't mean we can't push for some new standard. I've
> heard some interesting ideas for a universal bioinformatics XML. Peter
> Murray-Rust even started a mailing list to promote the development of an
> _open_ standard for such a beast. But the list now seems dead. If some Lab
> Rats want to start an effort here, I'm all for it.

I think there are two things here to consider. First, if you are going
from genbank to fasta, why have an intermediate format? Second, if you
were going to write de novo some analysis program to work with loci, what
format would you use? If you could settle on that, that would be the
internal format, which might not be a format at all but rather a sequence
object.

> Yeah, I've been following the EMBOSS list. It's funny that some programs
> 'assume' you are using a certain type of data. And the same goes for data
> formats. How hard is it to have one word to say what it is you're dealing
> with?

Some programs work with both Nucs and Prots, FASTA, BLAST, CLUSTAL to
name a few.
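
To make the "sequence object" idea a bit more concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration: the
names Sequence, SeqKind and to_fasta are made up and are not part of Loci,
EMBOSS or any existing toolkit. The point is only that the object carries
its "one word" (the kind) with it, and that a flat format such as FASTA is
written from the object rather than used as the internal representation.

```python
from dataclasses import dataclass
from enum import Enum


class SeqKind(Enum):
    """The 'one word' saying what the residues are (hypothetical names)."""
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"


@dataclass(frozen=True)
class Sequence:
    """Hypothetical in-memory sequence object, not an existing Loci type."""
    name: str
    residues: str
    kind: SeqKind          # stated once by the producer, never guessed later
    description: str = ""


def to_fasta(seq: Sequence, width: int = 60) -> str:
    """Write the object out as FASTA text, wrapping residues at `width`."""
    header = f">{seq.name} {seq.description}".rstrip()
    lines = [seq.residues[i:i + width]
             for i in range(0, len(seq.residues), width)]
    return "\n".join([header, *lines]) + "\n"
```

A reader for GenBank or any other format would build the same object, so
going from genbank to fasta would be "read into a Sequence, write it out"
with no intermediate file format involved.
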
I think historically someone thought it was a good idea to consider
sequences with >80% AT(U)CG as nucleic acids. Of course, that has
problems right away, just like 99 or 88.

> <dna>
> GCATAAGCATGCAGATC
> </dna>
>
> <protein>
> ACGATCATCAGCATCAG
> </protein>
>

Heh heh, or ATCGRTSNRYTACG.

> I had a problem like this with GenBank once. You might think GenBank has all
> the descriptors needed to annotate a nucleotide sequence. But...hmmm...where
> did that DNA come from anyway? The nucleus? The mitochondria? The
> chloroplasts? There's no descriptor for that!!!

There is, but someone has to annotate that section. Check out Sequin on
the NCBI site. There is a section for location of the sequence (genomic,
mitochondrial, ...). Or check out seq.asn in the NCBI toolkit.

>
> Cheers.
> Jeff

--
.david
David Lapointe
"Hokey religions and ancient weapons are no match for a
good blaster at your side, kid,"