[Biodevelopers] RDBMS and Bioinformatics

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Mar 18 04:46:19 EST 2004


On Tue, 16 Mar 2004, Joe Landman wrote:

> On Tue, 2004-03-16 at 17:37, Dan Bolser wrote:
> 
> > > Basically the object<->relational mapping is on one-to-one and onto in
> > > most cases, so you have to resort to "hacks" like serialization to make
> > 
> > Sorry, do you mean 'is not 1 to 1' ?
> 
> s/on one-to-one/not one-to-one/


Just an aside, but from the perspective of databases (why they exist /
what they do / etc.), the extra complexity introduced by an object model can
create performance problems in an RDB setting - you lose your guarantee
that (properly formed) queries will finish in reasonable time
(proportional to the amount of data).


> I look at XML as more of a "portable" way to represent complex data. 
> RDBMS's are not portable in a binary sense (in most cases I am aware of)
> across ABI's.  Look at XML as akin to ASN.1.  They are not the same, but
> generally serve similar functions.  It is however, somewhat hard to read
> binary ASN.1 data, and infer the structure from the file.  What is
> really nice about XML is it is for the most part programming language
> and platform independent.  I am not sure if the tags can be Unicode, so
> it might not be human language independent.


My recurring worry about all this is that when it comes to using XML (or
ASN.1, for that matter) as a data repository rather than an intermediate
state, the query languages are not yet mature or stable. SQL is
(mostly) both - which is why I began my tirade.

I love the idea of opening up a program's internals in the form of XML. Now
if XPath is as good as SQL, this would be a huge leap forward. I just
don't want to learn XPath when something else may end up the standard.
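
To make the XPath / SQL comparison concrete, here is a minimal sketch
(Python, standard library only) that asks the same question of the same toy
data twice: once in SQL via sqlite3, and once with the limited XPath subset
that xml.etree.ElementTree supports. The protein records and the table,
element and attribute names are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy records, invented for illustration only (not real SwissProt entries).
PROTEINS = [
    ("P1", "kinase A", "human"),
    ("P2", "kinase B", "mouse"),
    ("P3", "protease C", "human"),
]

def names_via_sql(organism):
    """Relational version: load rows into a table and filter with WHERE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein (acc TEXT PRIMARY KEY, name TEXT, organism TEXT)"
    )
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", PROTEINS)
    rows = conn.execute(
        "SELECT name FROM protein WHERE organism = ? ORDER BY acc", (organism,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

def names_via_xpath(organism):
    """XML version: build a document and filter with an XPath predicate."""
    root = ET.Element("proteins")
    for acc, name, org in PROTEINS:
        entry = ET.SubElement(root, "protein", acc=acc, organism=org)
        ET.SubElement(entry, "name").text = name
    # ElementTree implements only a subset of XPath; attribute predicates work.
    # The value is pasted into the path naively: fine for a toy, not for
    # untrusted input.
    path = f"./protein[@organism='{organism}']/name"
    return [el.text for el in root.findall(path)]

print(names_via_sql("human"))
print(names_via_xpath("human"))
```

Both functions return the same list of names; the difference is in how much
of the query machinery (parameter binding, ordering, indexing) comes for free.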


> The nice thing about XML is that the structure of the document maps well
> into the structure of the data it represents.  


This is really neat. Is this what they call mixing semantics and
syntax? You encode your structure right along with the data!


> > sounds like a good use of XML - giving / transporting data about a
> > program's internal state.
> > 
> > > They generally solve different problems, though there is overlap.  
> > 
> > I am still a bit confused. I can't help thinking of dia, which makes
> > excellent use of XML to represent diagrams, and so has easy interchange with
> > lots of tools - i.e. good use of XML, it would be crazy to run dia off an
> > RDB. 
> 
> To a degree this is correct.  If the XML document represented a
> connected set of tables, you could map that to an RDBMS.  However, it
> would be hard to generate the diagram itself from the RDBMS (e.g. it is
> easy to encode data in an RDBMS, but hard to encode structure, though


You mean the structure is in the data model, which is more implicit than
explicit?


> searching is easy).  The XML could represent a richer non-tabular
> system, in which case the XML can take on the necessary structure to
> represent the system (e.g. it is easy to encode structure in XML, as
> well as the data which resides in the structure, though searching is
> hard).


OK, finally I am beginning to understand ;)
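
A minimal sketch of that trade-off, assuming a toy nested diagram (this is
not dia's real file format; element names and the "node" table are
invented): the nesting is explicit in the XML, while in a flat table it
survives only as parent_id links that have to be chased to recover it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A toy nested diagram, invented for illustration.
DOC = """
<diagram>
  <group name="pathway">
    <box name="enzyme"/>
    <group name="complex">
      <box name="subunit1"/>
      <box name="subunit2"/>
    </group>
  </group>
</diagram>
"""

def flatten(elem, parent_id, rows):
    """Walk the tree and emit (id, parent_id, kind, name) rows for a table."""
    node_id = len(rows) + 1
    rows.append((node_id, parent_id, elem.tag, elem.get("name")))
    for child in elem:
        flatten(child, node_id, rows)
    return rows

rows = flatten(ET.fromstring(DOC), None, [])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
    " kind TEXT, name TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)

# Searching the flat table is a one-liner...
boxes = [n for (n,) in conn.execute(
    "SELECT name FROM node WHERE kind = 'box' ORDER BY id")]

# ...but recovering the nesting means following parent_id links upwards,
# here with a recursive common table expression.
def path_to(name):
    sql = """
        WITH RECURSIVE up(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM node WHERE name = ?
            UNION ALL
            SELECT n.id, n.parent_id, n.name, up.depth + 1
            FROM node n JOIN up ON n.id = up.parent_id
        )
        SELECT name FROM up WHERE name IS NOT NULL ORDER BY depth DESC
    """
    return [n for (n,) in conn.execute(sql, (name,))]

print(boxes)
print(path_to("subunit2"))
```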


> > But what is the point in creating biological data in this form, when the
> > 'data model' is basically our own concept about the data?
> 
> One of these days someone is going to extend Gödel's incompleteness
> proof for biological systems. 


I wish someone would tell the physicists about it :) Every time I see a
show about GUT I think Gödel must be hopping mad!


> > Wouldn't a SwissProt RDB be much more sensible than an XML document?
> 
> Only if the Swissprot never changes format.  The whole point of XML is
> the "X".  Extensible.  If you want to integrate portions of Swissprot
> into your own research DB, you can do this, but you would either have to
> deal with the Swissprot normalization model, or datamart the Swissprot
> and create your own normalization.


Sure, but you have to understand the structure of the XML document just as
much as you would need to understand the data model of the RDB. Data
models do change, and you have to change code. Are you saying that
changing the structure of XML has less impact on the whole system? I guess
this is *the* reason people talk about XML.

 
> Some of this comes from the bias of the developers as well.  It is hard
> to transport RDBMS's portably.  There are whole companies devoted to EDI
> that do nothing but this (for other industries).  XML greatly simplifies
> the EDI.  It is not a silver bullet, but it is helpful for data
> exchange.  If you get your results back in tabular form (RDBMS) or
> structural form (XML) from a query, does it matter what the underlying
> data storage technology is?


I have heard about XML layers between the RDB and data access points, allowing
the RDB to change with those changes 'filtered' through the XML layers - I
guess with this kind of system you get the best of both worlds, as clever
software can rewrite your SQL or even convert queries in XML syntax into
SQL.
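
As a rough sketch of what such a layer might do (the XML query format, the
mapping tables and the schema names below are all made up for illustration,
not taken from any real product): callers write queries against stable
'public' names, and a small translator turns them into SQL for whatever the
current schema happens to be.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from stable public names to the current RDB schema.
# If the schema changes, only these two dicts need editing, not the callers.
TABLES = {"protein": "prot_v2"}
COLUMNS = {"accession": "acc", "name": "prot_name", "organism": "org"}

def xml_to_sql(query_xml):
    """Translate a made-up XML query format into (sql, params)."""
    q = ET.fromstring(query_xml)
    # Dictionary lookups raise KeyError on unknown names, so only
    # whitelisted identifiers ever reach the SQL text.
    table = TABLES[q.get("from")]
    fields = [COLUMNS[f.text] for f in q.findall("field")]
    conds, params = [], []
    for w in q.findall("where"):
        conds.append(f"{COLUMNS[w.get('field')]} = ?")
        params.append(w.get("equals"))  # values are bound, never pasted in
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql, params

QUERY = """
<select from="protein">
  <field>accession</field>
  <field>name</field>
  <where field="organism" equals="human"/>
</select>
"""

sql, params = xml_to_sql(QUERY)

# Run the generated SQL against a toy table using the 'current' schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prot_v2 (acc TEXT, prot_name TEXT, org TEXT)")
conn.executemany(
    "INSERT INTO prot_v2 VALUES (?, ?, ?)",
    [("P1", "kinase A", "human"), ("P2", "kinase B", "mouse")],
)
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```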


Thanks very much for the interesting conversation, and sorry for my
ignorance.

Cheers,
Dan.





