I'll reply to a few LocusML (or LociML?) points here and then
infrastructure points in another e-mail.

Justin Bradford wrote:
>
> I've been busy with school lately (in fact, I really should be studying
> right now for an exam Monday), so I haven't gotten much of anything done.

That's fine. We all appreciate the work you've done, especially this
e-mail/book ;-)

> First, reading BSML files makes a lot of things seem overly complex.
> Second, BioML looks cleaner, but I hate the organism tag enclosing
> everything. While that information could be useful for a structure or
> sequence, it would be better to reference it, rather than enclosing it.

I think the importance of an organism tag depends on the audience. Most
biochemists couldn't care less about the organism. But to
microbiologists, geneticists and the like, this information is very
important. What matters to you, it seems, is that the organism
information _has_to_be_ present. But I think as long as it _can_ be
inserted at some level, we'll do fine.

> Also, BSML doesn't seem to cover protein sequences, while BioML does.
> However, BSML does seem to allow for more thorough definition of features
> in the sequence.

Of course we'll take the best of both worlds :-)

> Also, BSML, and even BioML to a degree, try to define display information
> as well. Do we want that in our ML? I can't see why we would need it,
> since we have an intelligent client.

No, we don't need display information. You're absolutely correct that
each locus should be intelligent enough to know how to interpret the
data that are targeted for it (and what locus to pass other data types
to if they are encountered).

> I would like to effectively merge BioML and BSML, incorporating protein
> sequence information and feature specification, and use more descriptive
> tag names (like BioML) for defining the sequences and features. I wouldn't
> put any layout information in. Does anyone think we need it?

By layout, you mean display information?
I don't think we need it.

> Also, for structure, there don't appear to be any MLs even attempting to
> do this, with the exception of CML. So, my idea is to take the PDB file
> format and XMLize it. If any of you know any glaring holes in PDB let
> me know, and we can work around those.

Now Konrad's ears should have perked up here. He'll have the final word
on a format for structural information, but I recall he does not like
any of the well-accepted formats for structure, especially not PDB.
This is Konrad's chance to show the world what the perfect description
of structure looks like ;-)

What I do want with respect to PDB, however, is an easy way to
translate from PDB to LocusML, because PDB is the major format for 3D
structure right now.

So, Konrad, can you help us make LocusML the perfect structural (among
other things) ML? Is there a way we can change CML to describe
biomacromolecules the way you want it to?

> Also, these sections will need some tags to allow for defining
> relationships between multiple objects. It might describe homology,
> alignment, etc. between two or more sequences, or for structures, it
> might relate 3D similarities, regions of high interaction (binding
> probabilities through free energy calculations), and other similar
> concepts.

Yes. That's something I haven't thought much about.

> Generated data should also return information about the analysis process,
> like the algorithm used, statistical probabilities, etc.

Yes, great!

Jeff
--
J.W. Bizzaro                  Phone: 617-552-3905
Boston College                mailto:bizzaro at bc.edu
Department of Chemistry       http://www.uml.edu/Dept/Chem/Bizzaro/
--