[Biodevelopers] RDBMS and Bioinformatics

Tue Apr 13 11:03:10 EDT 2004

Dan Bolser wrote:

>Hi, I would like to come back to you about some points in this email.
>
>On Tue, 16 Mar 2004, Marc Dumontier wrote:
>
>>Hi,
>>
>>So this is in regards to the current development efforts for the BIND 
>>database.
>>
>>The main reason for storing the data in an XML blob is that the BIND 
>>data model is very large, complex, and hierarchical (I believe the 
>>current spec. has approximately 2200 individual fields).
>>
>>We use JAXB (Java API For XML Binding) to do the marshalling and 
>>unmarshalling of the XML document into auto-generated data structures. 
>>This allows us to work with the data in a very simple way without 
>>dealing directly with DOM or SAX. This is used extensively by the BIND 
>>Submission System.
>>
>>We have also developed a module which takes the XML, and text indexes 
>>the document with the help of Lucene (http://jakarta.apache.org/lucene). 
>>This framework lets us worry only about what to index; it provides 
>>the query engine, file handling, etc. We use this code for our 
>>BIND Browsing System. We can do most searches under 1 millisecond, 
>>stream the data to the browser, and never touch the database.
>>
>>We believe this was the best way to deal with a highly complex data 
>>model to provide not only field-specific searching which is very fast, 
>>but also an easy to generate API for working with the data.
>>
>>I can't imagine dealing with a 3000 table relational database..scary.
>
>How does the use of XML make the data model less scary? 
>
>I see how the XML is convenient for your read / write API, and how the
>hierarchical data model is more naturally encoded in XML. I see that it is
>because you use XML that you have access to the fast indexing technology.
>
>But how do you deal with issues of data integrity? I get the feeling I
>should learn XML schema... Does the BIND datamodel have an XML schema with
>constraints on the data?
>
>I can't help feeling that a big / complex data model is problematic for
>any system, no matter what the format.
>
>Thanks very much for the feedback,
>
>Cheers,
>Dan.

Using the XML document instead of a fully relational model is much less 
scary because you don't have to write complex SQL to select and update 
data properly. If you need 30 SQL statements to update many tables, 
you've got a lot of points of failure. It's much easier to deal with a 
single document that contains all the data, and to have specific indexes 
on that data to query against.
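
As an illustration of what "specific indexes on that data" can mean, here 
is a minimal sketch of a field-specific inverted index. It uses plain Java 
collections as a stand-in for Lucene and does not use the Lucene API; the 
class name FieldIndex, the field names, and the sample records are all 
hypothetical and are not taken from BIND.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a text index: maps "field:term" to document ids.
class FieldIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the text of one field of one document, one lower-cased term at a time.
    void add(int docId, String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return ids of documents whose given field contains the term (exact match).
    Set<Integer> search(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        FieldIndex idx = new FieldIndex();
        idx.add(1, "organism", "Homo sapiens");
        idx.add(2, "organism", "Mus musculus");
        idx.add(1, "description", "Binds a Mus musculus protein");
        System.out.println(idx.search("organism", "mus"));
        System.out.println(idx.search("description", "mus"));
    }
}
```

A query is then a single map lookup per field and term, with no need to 
touch the stored document. A real text index adds analysis, ranking, and 
on-disk storage on top of this idea.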

Our software is also easy to update when the underlying data 
specification changes. We just regenerate the XML Schema from the ASN.1 
spec, invoke JAXB, and start working with the new classes. We don't have 
to change any SQL.

BIND does have an XML schema which imposes constraints and defines the 
data types, and since JAXB works off this schema, our data structures 
are all properly typed. The generated XML document is always validated 
against the schema before being committed to the database.
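
A validate-before-commit check of this kind can be sketched with the 
standard javax.xml.validation API. This is only an illustration under 
stated assumptions: the toy schema, the class name SchemaGate, and the 
sample documents are hypothetical, and this is not the BIND schema or the 
BIND code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaGate {
    // Toy schema (NOT the BIND schema): an <interaction> holding an integer
    // <id> followed by a string <description>.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"interaction\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"id\" type=\"xs:int\"/>"
      + "<xs:element name=\"description\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true only if the XML is well-formed and valid against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for an unusable schema, malformed XML, or a schema violation.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<interaction><id>42</id>"
                    + "<description>A binds B</description></interaction>";
        String bad = "<interaction><id>not-a-number</id>"
                   + "<description>A binds B</description></interaction>";
        System.out.println(isValid(DEMO_XSD, good));
        System.out.println(isValid(DEMO_XSD, bad));
    }
}
```

The caller would write the document to storage only when isValid returns 
true, so a type error such as a non-numeric id is rejected before it 
reaches the database.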

Marc Dumontier

>>We should be releasing a beta of this software in a short while. Please 
>>visit http://www.bind.ca periodically for more information.
>>
>>Marc Dumontier
>>BIND Software Developer
>>Blueprint Initiative
>>Mt. Sinai Hospital
>>Toronto,ON
>>
>>Dan Bolser wrote:
>>
>>>On Tue, 16 Mar 2004, Michel Dumontier wrote:
>>>
>>>>>Same goes for BIND, they plan to use RDB, but not in a conventional way
>>>>>(so far as I understand).
>>>>
>>>>BIND (http://bind.ca) stores bind-objects based on ASN.1 specification
>>>>(ftp://ftp.blueprint.org/pub/BIND/spec/, also available as XML DTD and
>>>>Schema), as ASN.1/XML in BLOB fields in the database table.  BIND makes use
>>>>of field-specific indexing to be able to search for any particular object or
>>>>set of objects that match the search criteria.  The relational aspect is
>>>>really more for curatorial work and tracking, afaik...
>>>
>>>So it won't be like an XML query system? Sorry if I misunderstand, but it
>>>sounds like you just do a plain text index on an XML blob, or is it more
>>>than that?
>>>
>>>Generally, can anyone tell me  what is the point of XML schema when
>>>relational schema have existed for years with well understood maths, query
>>>language and theories of relational design? I understand XML as a
>>>transport medium, but why make it the basis for your object model over the
>>>RDB relational schema? Perhaps object-oriented data modeling can do things
>>>relational modeling can't, but at what cost? I hate sounding old, but what
>>>was wrong with the RDB that we have to invent X-path and the like?
>>>
>>>Anyone on the list remember when relational databases were 'the new
>>>thing'?
>>>
>>>Dan.
>>>
>>>>Michel Dumontier
>>>>PhD Candidate
>>>>Samuel Lunenfeld Research Institute, Mt. Sinai Hospital
>>>>Department of Biochemistry, University of Toronto
>>>>Toronto, ON M5G1X5
>>>>micheld at mshri.on.ca
>>>>http://blueprint.org
>>>>
>>>>_______________________________________________
>>>>Biodevelopers mailing list
>>>>Biodevelopers at bioinformatics.org
>>>>https://bioinformatics.org/mailman/listinfo/biodevelopers