[Biodevelopers] RDBMS and Bioinformatics

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue Apr 13 11:31:04 EDT 2004


Hi, I would like to come back to you about some points in this email.

On Tue, 16 Mar 2004, Marc Dumontier wrote:

> Hi,
> 
> So this is in regards to the current development efforts for the BIND 
> database.
> 
> The main reason for storing the data in an XML blob is that the BIND 
> data model is very large, complex, and hierarchical (I believe the 
> current spec has approximately 2200 individual fields).
> 
> We use JAXB (Java API For XML Binding) to do the marshalling and 
> unmarshalling of the XML document into auto-generated data structures. 
> This allows us to work with the data in a very simple way without 
> dealing directly with DOM or SAX. This is used extensively by the BIND 
> Submission System.
> 
> We have also developed a module which takes the XML and text-indexes 
> the document with the help of Lucene (http://jakarta.apache.org/lucene). 
> With this framework we only have to worry about what to index; it 
> provides the query engine, file handling, etc. We use this code for our 
> BIND Browsing System. We can do most searches in under 1 millisecond, 
> stream the data to the browser, and never touch the database.
> 
> We believe this was the best way to deal with a highly complex data 
> model to provide not only field-specific searching which is very fast, 
> but also an easy to generate API for working with the data.
> 
> I can't imagine dealing with a 3000 table relational database..scary.


How does the use of XML make the data model less scary? 

I see how XML is convenient for your read / write API, and how the
hierarchical data model is more naturally encoded in XML. I also see that
using XML is what gives you access to the fast indexing technology.
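
To check my understanding of the blob-plus-index approach, here is a
minimal sketch in Python. It is not how BIND or Lucene actually work: the
element names, the record contents and the helpers build_index and search
are all made up for illustration, and a plain dictionary stands in for the
real index.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical records: these element names are illustrative, not the BIND spec.
RECORDS = {
    1: "<interaction><molecule>RAD51</molecule><organism>human</organism></interaction>",
    2: "<interaction><molecule>BRCA2</molecule><organism>human</organism></interaction>",
    3: "<interaction><molecule>RAD51</molecule><organism>yeast</organism></interaction>",
}

# Only these fields are indexed; everything else stays opaque inside the blob.
INDEXED_FIELDS = ("molecule", "organism")


def build_index(records, fields):
    """Map (field, lowercased value) -> set of ids of records containing it."""
    index = defaultdict(set)
    for record_id, blob in records.items():
        root = ET.fromstring(blob)
        for field in fields:
            for element in root.iter(field):
                if element.text:
                    index[(field, element.text.strip().lower())].add(record_id)
    return index


def search(index, **criteria):
    """Return the sorted ids of records matching every field=value criterion."""
    hits = None
    for field, value in criteria.items():
        ids = index.get((field, value.lower()), set())
        hits = ids if hits is None else hits & ids
    return sorted(hits or [])


index = build_index(RECORDS, INDEXED_FIELDS)
print(search(index, molecule="RAD51"))
print(search(index, molecule="RAD51", organism="human"))
```

A search only consults the index and returns record ids, so the blobs
themselves are never parsed at query time. That much I follow.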

But how do you deal with issues of data integrity? I get the feeling I
should learn XML Schema... Does the BIND data model have an XML Schema with
constraints on the data?
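
To make the integrity worry concrete, here is a small sketch using
Python's sqlite3 module. The tables and columns are hypothetical and have
nothing to do with the real BIND schema. The relational table rejects a
dangling reference, while the blob table accepts the same bad data unless
something validates the XML before it is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE molecule (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Relational form: the database itself checks the references.
    CREATE TABLE interaction (
        id         INTEGER PRIMARY KEY,
        molecule_a INTEGER NOT NULL REFERENCES molecule(id),
        molecule_b INTEGER NOT NULL REFERENCES molecule(id)
    );
    -- Blob form: the database only knows that some bytes are present.
    CREATE TABLE interaction_blob (
        id  INTEGER PRIMARY KEY,
        xml BLOB NOT NULL
    );
""")
with conn:
    conn.execute("INSERT INTO molecule (id, name) VALUES (1, 'RAD51')")


def try_insert(sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Molecule 99 does not exist in either case.
relational_ok = try_insert(
    "INSERT INTO interaction (molecule_a, molecule_b) VALUES (?, ?)", (1, 99)
)
blob_ok = try_insert(
    "INSERT INTO interaction_blob (xml) VALUES (?)",
    (b"<interaction><a>1</a><b>99</b></interaction>",),
)
print("relational insert accepted:", relational_ok)
print("blob insert accepted:", blob_ok)
```

As far as I know, XML Schema can express this kind of rule too (key and
keyref), but only if every document is actually validated on the way in.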

I can't help feeling that a big / complex data model is problematic for
any system, no matter what the format.

Thanks very much for the feedback,

Cheers,
Dan.


> We should be releasing a beta of this software in a short while... please 
> visit http://www.bind.ca periodically for more information.
> 
> Marc Dumontier
> BIND Software Developer
> Blueprint Initiative
> Mt. Sinai Hospital
> Toronto,ON
> 
> Dan Bolser wrote:
> 
> >On Tue, 16 Mar 2004, Michel Dumontier wrote:
> >
> >  
> >
> >>>Same goes for BIND, they plan to use RDB, but not in a conventional way
> >>>(so far as I understand).
> >>>
> >>>      
> >>>
> >>BIND (http://bind.ca) stores bind-objects based on ASN.1 specification
> >>(ftp://ftp.blueprint.org/pub/BIND/spec/, also available as XML DTD and
> >>Schema), as ASN.1/XML in BLOB fields in the database table.  BIND makes use
> >>of field-specific indexing to be able to search for any particular object or
> >>set of objects that match the search criteria.  The relational aspect is
> >>really more for curatorial work and tracking, afaik...
> >>    
> >>
> >
> >
> >So it won't be like an XML query system? Sorry if I misunderstand, but it
> >sounds like you just do a plain text index on an XML blob, or is it more
> >than that?
> >
> >Generally, can anyone tell me what the point of XML Schema is when
> >relational schemas have existed for years with well understood maths, query
> >language and theories of relational design? I understand XML as a
> >transport medium, but why make it the basis for your object model over the
> >RDB relational schema? Perhaps object-oriented data modeling can do things
> >relational modeling can't, but at what cost? I hate sounding old, but what
> >was wrong with the RDB that we have to invent XPath and the like?
> >
> >Anyone on the list remember when relational databases were 'the new
> >thing'?
> >
> >Dan.
> >
> >  
> >
> >>Michel Dumontier
> >>PhD Candidate
> >>Samuel Lunenfeld Research Institute, Mt. Sinai Hospital
> >>Department of Biochemistry, University of Toronto
> >>Toronto, ON M5G1X5
> >>micheld at mshri.on.ca
> >>http://blueprint.org
> >>
> >>
> >>
> >>_______________________________________________
> >>Biodevelopers mailing list
> >>Biodevelopers at bioinformatics.org
> >>https://bioinformatics.org/mailman/listinfo/biodevelopers
> >>



