That's interesting. One of the things I'm working on (just about the
only thing I'm working on, it seems) is a gene expression database that
will support multiple species as well as multiple technologies (glass
microarrays, Affy chips, AFLP, SAGE, etc.). As you might imagine, it's
one thing to store information on multiple species; it's quite another
to try to make it interpretable and queryable and expect to get
anything sensible back.

We're currently using the NCBI taxonomy tree and looking forward to
seeing more effort thru the Gene Ontology project, but it sounds like
this might be a more flexible solution if you can do dynamic
modifications to the tree by manipulating these WHAX trees.

I'm off to check out her site, but it's unresponsive right now:
http://cbil.humgen.upenn.edu/epodb/epodb.html

How are you currently representing this problem at the Field Museum?
Especially the dynamic nature of the problem?

Cheers
Harry

"J. Steinbachs" wrote:
>
>
> Hey all...
>
> I went to an interesting seminar yesterday at U of Chicago.  Susan
> Davidson (co-director, Center for Bioinformatics, UPenn) gave a talk on
> "Refreshing the Tower of Babel."
>
> Caveat: I know very little about databases.
>
> The application: EpoDB, a database created at UPenn Center for
> Bioinformatics, designed to study gene regulation during differentiation
> and development of vertebrate red blood cells.
>
> The problems: extracting data from all sorts of databases with different
> underlying structures; cleansing the data (error removal); integration;
> annotation; updating (particularly, updating without losing the
> information added/removed during data cleansing).
>
> I guess Susan is a strong proponent in the DB field for complex value
> databases (blah blah blah ginger... don't ask me what those are).
> However, for this problem, she and her colleagues have chosen to use
> XML, modifying it a bit into something they call WHAX.
>
> The data can be represented as a "WHAX tree", with the tag representing
> the branches and the tag value representing the node.  Additions to
> a subset of the data can be integrated into the larger database by
> simple manipulations of WHAX trees.
>
> I originally went because of the application to genetic data.  But then
> I got sidetracked...  Here at the Museum, we have specimen data (21+
> million specimens in total) in which species names change, higher
> taxonomic information changes, and so on, all of which should be tracked
> within the database.  In some cases, we are integrating the traditional
> genetic data into our specimen databases; i.e., in newer portions of our
> collection of specimens, we have a one-to-one correspondence between the
> dead dried pressed plant (or the stuffed animal and corresponding
> skeleton), the DNA extracted from said plant (or animal), and a record
> in our developing databases (birds are separate from plants are separate
> from fishes...).  The computer scientists were intrigued by this type of
> data :)  This WHAX "thing" would be perfect for tracking all that
> information.
>
> Perhaps "bioinformatics" is currently too narrowly defined (organisms
> have more characteristics about them than just their DNA).  If we, the
> community of manipulators of biological data, do come up with an open
> standard for representing said data, that standard should be flexible
> enough to encompass all the characteristics about the organisms.  And,
> in light of all the stupid patenting going on, perhaps an open standard
> is needed before a big bad multinational corporation patents it first.
>
> Just a few thoughts...
> -jennifer
>
> --------------------------
> J. Steinbachs, PhD
> Computational Biologist
> Dept of Botany
> The Field Museum
> Chicago, IL 60605-2496
>
> office: 312-665-7810
> fax: 312-665-7158
> --------------------------
>
> _______________________________________________
> pipet-devel maillist - pipet-devel at bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/pipet-devel

--
Cheers,
Harry
Harry J Mangalam -- (949) 856 2847 -- mangalam at home.com