[Pipet Devel] [Fwd: [Pipet Devel] constructing the command-line]

Wed Dec 1 13:09:40 EST 1999

David Lapointe wrote:
> 
> I am glad Jeff reposted this.

And I'm glad that you're glad.

> I have been creating Perl CGI interfaces to EMBOSS programs.
> I was writing to Jeff about this and how it would be great
> to parse the *.acd files for each program (these define the
> input and output data types, which are required, the data
> ranges, etc.) into a GUI interface. This might be similar to
> GDE, but Glade seems very promising. Alternatively, for a Loci
> interface, parsing the *.acd files might generate
> a series of linked loci.

...which can be combined into one composite locus.
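
To make the idea concrete, here is a minimal Python sketch of turning ACD-style definitions into a structure that a GUI builder (or a set of loci) could be generated from. It assumes a simplified, ACD-like syntax of `kind: name [ attribute: value ... ]` blocks. The sample text, the attribute names in it, and the functions `parse_acd` and `required_inputs` are made up for illustration and do not claim to follow the real ACD grammar, which is still changing.

```python
import re

# Assumed, simplified syntax:  kind: name [ attribute: value ... ]
BLOCK = re.compile(r"(\w+)\s*:\s*(\w+)\s*\[(.*?)\]", re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+)\s*:\s*("[^"]*"|\S+)')

# Hypothetical sample, not copied from any real EMBOSS program.
SAMPLE = '''
appl: demo [ doc: "Made-up example program" ]
sequence: input [ param: Y type: protein ]
int: window [ def: 10 min: 1 ]
'''


def parse_acd(text):
    """Return one dict per block: its kind, its name, and its attributes."""
    entries = []
    for kind, name, body in BLOCK.findall(text):
        attributes = {key: value.strip('"')
                      for key, value in ATTRIBUTE.findall(body)}
        entries.append({"kind": kind, "name": name, "attributes": attributes})
    return entries


def required_inputs(entries):
    """Names of the entries marked as required (here: 'param: Y')."""
    return [entry["name"] for entry in entries
            if entry["attributes"].get("param") == "Y"]


entries = parse_acd(SAMPLE)
```

Because unknown attributes are simply carried along in the dictionary, a front end built this way would at least not break every time a new attribute is added to the definitions.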

> One hassle with doing this is that the
> ACD interface will change incrementally (see below).

Will it change because the entire interface is still under development, or
because individual programs will require changes to their *.acd files?

> As an aside on the internal data representation, you could
> either have one or not, similar to what Brad just
> mentioned about using databases. Personally, I think format
> conversions are too lossy with respect to annotations. Also,
> short of rewriting (almost) every application outside of Loci,
> you would need to deal with format conversions at some point.

Again, we can promote something as our 'preferred format' and use it as an
intermediate in format conversions.  Just because we don't hard-code a data
format into Loci, it doesn't mean we can't push for some new standard.  I've
heard some interesting ideas for a universal bioinformatics XML.  Peter
Murray-Rust even started a mailing list to promote the development of an
_open_ standard for such a beast.  But the list now seems dead.  If some Lab
Rats want to start an effort here, I'm all for it.
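
As a rough sketch of what "intermediate in format conversions" could look like, the Python below routes every conversion through one in-memory record, so each format needs only a reader and a writer rather than a converter for every pair of formats. Everything here is an assumption for illustration: the `Record` class, the registry names, and the "tab" format (name, tab, residues) are hypothetical, and the FASTA handling is deliberately minimal.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical intermediate record; field names are illustrative only."""
    name: str
    residues: str
    annotations: dict = field(default_factory=dict)


def read_fasta(text):
    """Read a single FASTA entry, keeping the description as an annotation."""
    header, *lines = text.strip().splitlines()
    name, _, description = header[1:].partition(" ")
    annotations = {"description": description} if description else {}
    return Record(name, "".join(line.strip() for line in lines), annotations)


def write_fasta(record):
    description = record.annotations.get("description", "")
    header = f">{record.name} {description}".rstrip()
    return f"{header}\n{record.residues}\n"


def read_tab(text):
    """Made-up 'tab' format: name, a tab, then the residues."""
    name, residues = text.strip().split("\t")
    return Record(name, residues)


def write_tab(record):
    # This format has no place for annotations, so they are dropped here.
    return f"{record.name}\t{record.residues}\n"


READERS = {"fasta": read_fasta, "tab": read_tab}
WRITERS = {"fasta": write_fasta, "tab": write_tab}


def convert(text, source, target):
    """Convert by reading into the intermediate record, then writing it out."""
    return WRITERS[target](READERS[source](text))


fasta = ">seq1 test sequence\nGCATAAGC\nATGCAGATC\n"
as_tab = convert(fasta, "fasta", "tab")
```

Note that `write_tab` silently loses the description. That is David's "lossy" point in miniature: the intermediate can only protect annotations if the target format has somewhere to put them.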

> The EMBOSS list has an interesting thread going about protein
> sequences with very high ATCG content: they must
> be forced to protein type, otherwise the program thinks they
> are nucleic acids. The issue is adding a new flag for this
> forcing, and what the flag's name will be. The diversity of
> opinion on this issue is heartening. BLAST, for example,
> does this up front: you have to tell the program what type you
> have. Other programs tag sequences at the top with their type,
> but that would involve changing the databases to create a new
> data format, like FBF.

Yeah, I've been following the EMBOSS list.  It's funny that some programs
'assume' you are using a certain type of data.  And the same goes for data
formats.  How hard is it to have one word to say what it is you're dealing
with?

    <dna>
      GCATAAGCATGCAGATC
    </dna>

    <protein>
      ACGATCATCAGCATCAG
    </protein>
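
The difference between guessing and being told can be sketched in a few lines of Python. The composition heuristic below is an assumption made up for illustration (it is not how EMBOSS or any other program decides), and `guess_type` and `read_tagged` are hypothetical names. The tagged input is the protein example above.

```python
import xml.etree.ElementTree as ET

NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_type(residues, threshold=0.9):
    """Guess 'dna' or 'protein' from letter composition alone (fragile)."""
    residues = residues.upper()
    if not residues:
        raise ValueError("empty sequence")
    hits = sum(1 for letter in residues if letter in NUCLEOTIDE_LETTERS)
    return "dna" if hits / len(residues) >= threshold else "protein"


def read_tagged(xml_text):
    """Take the type from the enclosing tag instead of guessing it."""
    element = ET.fromstring(xml_text)
    return element.tag, "".join(element.text.split())


tagged = "<protein>\n  ACGATCATCAGCATCAG\n</protein>"
declared_type, residues = read_tagged(tagged)
guessed_type = guess_type(residues)
```

Here `declared_type` is `protein` because the data says so, while `guessed_type` comes out as `dna` because every letter in the sequence happens to be A, C, G, or T.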

I had a problem like this with GenBank once.  You might think GenBank has all
the descriptors needed to annotate a nucleotide sequence.  But...hmmm...where
did that DNA come from anyway?  The nucleus?  The mitochondria?  The
chloroplasts?  There's no descriptor for that!!!

Cheers.
Jeff
-- 
                      +----------------------------------+
                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
                      +----------------------------------+