[Biodevelopers] RDBMS and Bioinformatics

Marc Dumontier mrdumont at blueprint.org
Tue Mar 16 14:16:27 EST 2004


Hi,

This is in regard to the current development efforts for the BIND 
database.

The main reason the data is stored in an XML blob is that the BIND 
data model is very large, complex, and hierarchical (I believe the 
current spec has approximately 2200 individual fields).

We use JAXB (Java Architecture for XML Binding) to marshal and 
unmarshal the XML document to and from auto-generated data structures. 
This lets us work with the data in a very simple way without dealing 
directly with DOM or SAX. The BIND Submission System uses this 
extensively.

We have also developed a module that takes the XML and text-indexes 
the document with the help of Lucene (http://jakarta.apache.org/lucene). 
With this framework we only have to decide what to index; it provides 
the query engine, file handling, etc. We use this code for our BIND 
Browsing System. Most searches complete in under 1 millisecond, and we 
can stream the data to the browser without ever touching the database.
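
To make the idea of field-specific indexing over an XML blob concrete, 
here is a minimal sketch. It is not Lucene and not the BIND code: it 
uses only the JDK's DOM parser and a HashMap, and the class name 
FieldIndexSketch, the record ids, and the element names (organism, 
molecule) are all invented for illustration. Each leaf element's text 
is indexed under its tag name, so a query can ask for a term in one 
particular field without re-parsing any XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Toy field-specific index: maps "field:term" to the ids of the XML records
// that contain that term in that field. A stand-in for a real engine such as
// Lucene, written with JDK classes only.
class FieldIndexSketch {
    private final Map<String, List<String>> postings = new HashMap<>();

    // Parse one XML record and index the text of every leaf element
    // under that element's tag name.
    void add(String recordId, String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (hasChildElement(e)) {
                continue; // only leaf elements carry field values
            }
            String text = e.getTextContent().toLowerCase(Locale.ROOT);
            for (String term : text.split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<String> ids = postings.computeIfAbsent(
                        e.getTagName() + ":" + term, k -> new ArrayList<>());
                if (!ids.contains(recordId)) {
                    ids.add(recordId);
                }
            }
        }
    }

    // Return the ids of records whose given field contains the term.
    // Only the in-memory map is consulted; no XML is parsed at query time.
    List<String> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), List.of());
    }

    private static boolean hasChildElement(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        FieldIndexSketch index = new FieldIndexSketch();
        // Hypothetical records; these element names are not from the BIND spec.
        index.add("rec1", "<interaction><organism>Saccharomyces cerevisiae</organism>"
                + "<molecule>actin</molecule></interaction>");
        index.add("rec2", "<interaction><organism>Homo sapiens</organism>"
                + "<molecule>actin</molecule></interaction>");
        System.out.println(index.search("molecule", "actin"));
        System.out.println(index.search("organism", "sapiens"));
    }
}
```

A real text-indexing library adds what this sketch leaves out, such as 
on-disk index files, tokenizers, ranking, and a query language, which 
is why we only need to decide what to index.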

We believe this was the best way to deal with a highly complex data 
model: it provides not only very fast field-specific searching, but 
also an easily generated API for working with the data.

I can't imagine dealing with a 3000-table relational database... scary.

We should be releasing a beta of this software in a short while. Please 
visit http://www.bind.ca periodically for more information.

Marc Dumontier
BIND Software Developer
Blueprint Initiative
Mt. Sinai Hospital
Toronto, ON


Dan Bolser wrote:

>On Tue, 16 Mar 2004, Michel Dumontier wrote:
>
>>>Same goes for BIND, they plan to use RDB, but not in a conventional way
>>>(so far as I understand).
>>>
>>BIND (http://bind.ca) stores bind-objects based on ASN.1 specification
>>(ftp://ftp.blueprint.org/pub/BIND/spec/, also available as XML DTD and
>>Schema), as ASN.1/XML in BLOB fields in the database table.  BIND makes use
>>of field-specific indexing to be able to search for any particular object or
>>set of objects that match the search criteria.  The relational aspect is
>>really more for curatorial work and tracking, afaik...
>
>
>So it won't be like an XML query system? Sorry if I misunderstand, but it
>sounds like you just do a plain-text index on an XML blob, or is it more
>than that?
>
>Generally, can anyone tell me what is the point of XML schema when
>relational schema have existed for years with well understood maths, query
>language and theories of relational design? I understand XML as a
>transport medium, but why make it the basis for your object model over the
>RDB relational schema? Perhaps object-oriented data modeling can do things
>relational modeling can't, but at what cost? I hate sounding old, but what
>was wrong with the RDB that we have to invent XPath and the like?
>
>Anyone on the list remember when relational databases were 'the new
>thing'?
>
>Dan.
>
>>Michel Dumontier
>>PhD Candidate
>>Samuel Lunenfeld Research Institute, Mt. Sinai Hospital
>>Department of Biochemistry, University of Toronto
>>Toronto, ON M5G1X5
>>micheld at mshri.on.ca
>>http://blueprint.org
>>
>>
>>
>>_______________________________________________
>>Biodevelopers mailing list
>>Biodevelopers at bioinformatics.org
>>https://bioinformatics.org/mailman/listinfo/biodevelopers
>>
