Locians,

The following is a reply I got from Guy Hulbert on the xml-mol mailing list. I would strongly suggest that the Loci XML X-perts subscribe to this list, since it is a legitimate attempt to come up with some standards for XML and bioinformatics. It would be nice if we could participate and improve Loci's compatibility with future projects.

http://ala.vsms.nottingham.ac.uk/biodom/xml-mol/

Jeff

-----------------8<-------------------

On Fri, 28 May 1999, J.W. Bizzaro wrote:

<snip>

JWB> Anyway, I am coordinating an ambitious GNU project for UNIX-type systems

I'll have to check out your website then.

<snip>

JWB> some ideas:
JWB>
JWB> Sequence definition: BioML + BSML
JWB> Structure def: mmCIF/XML + some CML
JWB> Phylogeny def: ??? probably make our own
JWB> Database query def: maybe from BLAST/XML
JWB> Workflow def: make our own
JWB> GUI def: from GLADE/XML
JWB> Graphics def: maybe from some KDE programs

<snip>

Previously Visual Genomics had some restrictions on use of BSML. They still regard this as their intellectual property. See:

http://www.visualgenomics.com/bsml/index.html

for their current ideas. I don't think this is suitable for a "GNU project".

I don't like either BioML or BSML. It seems to me that they are much too large --- trying to provide a complete 'bio-html'. With XML namespaces

[ see: http://www.xml.com/xml/pub/1999/01/namespaces.html ]

one ought to be able to put together small DTDs for specialized data.

Consider DNA sequences. All one needs is <dna> which is a string of the characters CTAG. One might allow ignorable whitespace and base-numbers:

<dna>
1 tcgattcca gca...
51 gcctacaac acg...
...
</dna>

which is understood by many present applications (without the tags). There is also a standard alphabet which allows ambiguous bases to be included, e.g. 'N' stands for any of A,T,C,G etc. It may be desirable to represent these sequences as <dna-X> where X is the alphabet name. However, to manage DNA sequences, one doesn't need much more than this.
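
To make the <dna> idea concrete, here is a minimal Python sketch of a reader that treats whitespace and base-numbers as ignorable and checks what remains against an alphabet. It is only an illustration under stated assumptions: the function name read_dna, the two alphabet sets, and the choice to fold case are mine, not part of any agreed DTD, and the "..." elisions from the example have been dropped so that the fragment parses.

```python
import re
import xml.etree.ElementTree as ET

# Assumed alphabets: the four plain bases, and a variant that also allows
# 'n' (the one ambiguity code mentioned in the message). Other ambiguity
# codes are deliberately left out of this sketch.
PLAIN = set("acgt")
WITH_N = PLAIN | {"n"}


def read_dna(xml_text, alphabet=PLAIN):
    """Return the bare sequence held in a <dna> element.

    Whitespace and base-numbers (runs of digits) are treated as ignorable.
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "dna":
        raise ValueError(f"expected <dna>, got <{elem.tag}>")
    # Drop digits and whitespace, then fold to lower case.
    seq = re.sub(r"[\d\s]+", "", elem.text or "").lower()
    bad = set(seq) - alphabet
    if bad:
        raise ValueError(f"characters outside alphabet: {sorted(bad)}")
    return seq


example = """<dna>
 1 tcgattcca gca
51 gcctacaac acg
</dna>"""

print(read_dna(example))  # tcgattccagcagcctacaacacg
```

A <dna-X> variant would presumably just select a different alphabet set by name; that mapping is left out here because no alphabet names have been agreed.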

Now, this is a bit too small but it would be really nice to have a standard Nucleic acid DTD --- or perhaps "Sequence" DTD. It would have <dna>, <rna>, <protein>, and perhaps variations for generalized sequence alphabets. If everyone would use this then the problem of data-interchange between databases is much simplified.

Suppose bio??? is some mythical organization which coordinates the standard DTDs, and everyone agreed to use them; then XML namespaces would allow us to represent (for example) Genbank data like this:

[I stole a bit of this from Tim Bray's page on namespaces referenced above]

<?xml ... ?>
<h:html xmlns:s="http://www.bio???.org/DTD/sequence"
        xmlns:g="http://www.ncbi.nlm.nih.gov/DTD/genbank"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
<h:head><h:title>My Sequence</h:title></h:head>
<body>
<g:LOCUS>blah blah blah ... </g:LOCUS>
...
<s:dna>
1 tcgattcca gca...
51 gcctacaac acg...
...
</s:dna>
</body>
</h:html>

and with appropriate style sheets, Mozilla and Internet Expoit^H^Hrer would be able to display them.

I'm keen to work with anyone on getting these small things set up. As an experiment, I'm planning to play with the Genbank data to create the basic facility to create documents like the above.

----
Guy Hulbert, Systems Manager            Bioinformatics Supercomputing Centre
(416) 813-8876                          555 University Avenue
email: guy at bioinfo.sickkids.on.ca    The Hospital for Sick Children
http: www.bioinfo.sickkids.on.ca        Toronto, ON, M5G 1X8, CANADA.
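
P.S. As a rough illustration of how a program could pick the pieces out of a mixed-namespace document like the Genbank example in the message, here is a small Python sketch using the standard library's ElementTree. The namespace URIs are copied from the message ("bio???" is its own placeholder, not a real address); the function name extract is hypothetical, and the elided parts of the example (the <?xml ... ?> line and the "..." runs) are left out so that the fragment parses.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, copied from the message. The "bio???" URI is the
# message's placeholder for a mythical organization.
NS = {
    "s": "http://www.bio???.org/DTD/sequence",
    "g": "http://www.ncbi.nlm.nih.gov/DTD/genbank",
    "h": "http://www.w3.org/HTML/1998/html4",
}

DOC = """<h:html xmlns:s="http://www.bio???.org/DTD/sequence"
        xmlns:g="http://www.ncbi.nlm.nih.gov/DTD/genbank"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
<h:head><h:title>My Sequence</h:title></h:head>
<body>
<g:LOCUS>blah blah blah</g:LOCUS>
<s:dna>
 1 tcgattcca gca
51 gcctacaac acg
</s:dna>
</body>
</h:html>"""


def extract(xml_text):
    """Pull the title, the Genbank LOCUS text and the bare DNA sequence."""
    root = ET.fromstring(xml_text)
    title = root.find("h:head/h:title", NS).text
    locus = root.find(".//g:LOCUS", NS).text.strip()
    dna_text = root.find(".//s:dna", NS).text
    # Keep letters only: base-numbers and whitespace are ignorable.
    sequence = "".join(ch for ch in dna_text if ch.isalpha())
    return {"title": title, "locus": locus, "sequence": sequence}


print(extract(DOC))
```

The point of the sketch is that each element is found by namespace URI rather than by prefix, so a sequence tool can read <s:dna> without knowing anything about the Genbank or HTML parts.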