[Biodevelopers] RDBMS and Bioinformatics

Patrick McConnell MCCon012 at mc.duke.edu
Thu Apr 15 08:54:25 EDT 2004




We use Tamino to index gigabytes of data.  Document sizes range from
10 KB to several megabytes.  We are able to set up indices for particular
elements, attributes, and full text.  We have not run into any performance
issues, and we are running our database on a pathetic Windows 2000
workstation.

-Patrick McConnell
Duke Bioinformatics Shared Resource
Duke Comprehensive Cancer Center
patrick.mcconnell at duke.edu


                                                                                                                                                 
From:     Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
Sent by:  biodevelopers-admin at bioinformatics.org
To:       biodevelopers at bioinformatics.org
Subject:  RE: [Biodevelopers] RDBMS and Bioinformatics
Date:     04/15/2004 06:06 AM
Please respond to biodevelopers
                                                                                                                                                 
                                                                                                                                                 




On Tue, 13 Apr 2004, Singhal, Mudita wrote:

> Is there a limitation on the number of entries in one table? We have
> huge tables with millions of records. I fail to see how XML can be used
> in place of a relational database to query such huge datasets. Keeping
> even partial datasets in memory is a big problem.
>
> Any suggestions?

This is where XML indexing systems come in (my knowledge of such systems is
very shaky). But (I think) it is a bit like running an XML DBS with XML as
the storage format. The document is loaded into the DBS for indexing and
querying. I asked a question at comp.text.xml some time ago and was
inundated with references to such systems.

Using these systems you don't need to put the XML in memory or search
through the entire document to find one piece of information.
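
A minimal sketch of that idea, assuming a toy setup rather than any
particular product: whole XML documents are stored as blobs, and a small
side table indexes the values of selected elements, so a lookup goes through
the index instead of parsing every document. The table names, the element
names (protein, organism) and the sample records are all hypothetical, and
SQLite stands in for whatever store a real XML indexing system uses. Note
that this sketch still parses each document once, in full, at indexing time.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_index(conn, documents, indexed_tags):
    """Store each XML document as a blob and index selected element values."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (doc_id TEXT PRIMARY KEY, xml BLOB)")
    conn.execute("CREATE TABLE IF NOT EXISTS field_index (tag TEXT, value TEXT, doc_id TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS field_lookup ON field_index (tag, value)")
    for doc_id, xml_text in documents.items():
        conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, xml_text.encode("utf-8")))
        # Parse once at load time; queries never need to parse the blob.
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            if elem.tag in indexed_tags and elem.text:
                conn.execute(
                    "INSERT INTO field_index VALUES (?, ?, ?)",
                    (elem.tag, elem.text.strip(), doc_id),
                )
    conn.commit()


def find_documents(conn, tag, value):
    """Return (doc_id, xml) pairs whose indexed element `tag` equals `value`."""
    rows = conn.execute(
        "SELECT d.doc_id, d.xml FROM field_index AS f "
        "JOIN docs AS d ON d.doc_id = f.doc_id "
        "WHERE f.tag = ? AND f.value = ? ORDER BY d.doc_id",
        (tag, value),
    )
    return [(doc_id, bytes(xml).decode("utf-8")) for doc_id, xml in rows]


# Hypothetical sample records, for illustration only.
documents = {
    "rec1": "<interaction><protein>P53</protein><organism>human</organism></interaction>",
    "rec2": "<interaction><protein>MDM2</protein><organism>human</organism></interaction>",
    "rec3": "<interaction><protein>P53</protein><organism>mouse</organism></interaction>",
}

conn = sqlite3.connect(":memory:")
build_index(conn, documents, {"protein", "organism"})
hits = find_documents(conn, "protein", "P53")
print([doc_id for doc_id, _ in hits])  # ['rec1', 'rec3']
```

Only the matching blobs are fetched, and they come back as complete
documents, which is roughly the trade-off discussed in this thread: the
document stays whole, and the index does the searching.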

>
> Mudita
>
>
>
> -----Original Message-----
> From: Dan Bolser [mailto:dmb at mrc-dunn.cam.ac.uk]
> Sent: Tuesday, April 13, 2004 9:37 AM
> To: biodevelopers at bioinformatics.org
> Subject: Re: [Biodevelopers] RDBMS and Bioinformatics
>
>
>
> > >How does the use of XML make the data model less scary?
> > >
> > >I see how the XML is convenient for your read / write API, and how
> > >the hierarchical data model is more naturally encoded in XML. I see
> > >that it is because you use XML that you have access to the fast
> > >indexing technology.
> > >
> > >But how do you deal with issues of data integrity? I get the feeling
> > >I should learn XML schema... Does the BIND datamodel have an XML
> > >schema with constraints on the data?
> > >
> > >I can't help feeling that a big / complex data model is problematic
> > >for any system, no matter what the format.
> > >
> > >Thanks very much for the feedback,
> > >
> > >Cheers,
> > >Dan.
> > >
> > >
> >
> > Using the XML document instead of a fully relational model is much
> > less scary because you don't have to deal with creating complex SQL
> > to select data and to update data properly. If you have to use 30 SQL
> > statements to update many tables, you've got a lot of points of
> > failure there. It's just much easier to deal with a single document
> > which contains all the data, and to have specific indexes on that
> > data to query against.
>
> I see your point. Deleting one 'object', for example, could require a set
> of deletes from many tables. What you describe sounds like you have the
> model encoded somewhere in the software (middleware?) and so don't have
> to worry about it too much.
>
>
> > Our software is easy to update when the underlying data specification
> > is updated as well. We just regenerate the XML Schema from the ASN.1
> > spec, invoke JAXB, and start working with the new classes. We don't
> > have to change any SQL, or anything.
>
> Great. That is a big problem for 'old' DB backend apps with multiple
> data access points.
>
>
> > BIND does have an XML schema which imposes constraints and defines the
> > data types, and since JAXB works off this schema, our data structures
> > are all properly typed. The XML document generated is always validated
> > against the schema before being committed to the database.
>
> Cool.
>
> OK, final question, how will you do complex queries across the data?
>
> Thanks again,
> Dan.
>
>
> >
> > Marc Dumontier
> >
> > >
> > >
> > >
> > >>We should be releasing a beta of this software in a short while.
> > >>Please visit http://www.bind.ca periodically for more information.
> > >>
> > >>Marc Dumontier
> > >>BIND Software Developer
> > >>Blueprint Initiative
> > >>Mt. Sinai Hospital
> > >>Toronto,ON
> > >>
> > >>Dan Bolser wrote:
> > >>
> > >>
> > >>
> > >>>On Tue, 16 Mar 2004, Michel Dumontier wrote:
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>>>Same goes for BIND, they plan to use RDB, but not in a
> > >>>>>conventional way (so far as I understand).
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>BIND (http://bind.ca) stores bind-objects based on ASN.1
> > >>>>specification (ftp://ftp.blueprint.org/pub/BIND/spec/, also
> > >>>>available as XML DTD and Schema), as ASN.1/XML in BLOB fields in
> > >>>>the database table.  BIND makes use of field-specific indexing to
> > >>>>be able to search for any particular object or set of objects that
> > >>>>match the search criteria.  The relational aspect is really more
> > >>>>for curatorial work and tracking, afaik...
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>So it won't be like an XML query system? Sorry if I misunderstand,
> > >>>but it sounds like you just do a plain text index on an XML blob, but
> > >>>is it more than that?
> > >>>
> > >>>Generally, can anyone tell me what is the point of XML schema when
> > >>>relational schema have existed for years with well understood
> > >>>maths, query language and theories of relational design? I
> > >>>understand XML as a transport medium, but why make it the basis for
> > >>>your object model over the RDB relational schema? Perhaps object
> > >>>oriented data modeling can do things relational modeling can't, but
> > >>>at what cost? I hate sounding old, but what was wrong with the RDB
> > >>>that we have to invent XPath and the like?
> > >>>
> > >>>Anyone on the list remember when relational databases were 'the new
> > >>>thing'?
> > >>>
> > >>>Dan.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>>Michel Dumontier
> > >>>>PhD Candidate
> > >>>>Samuel Lunenfeld Research Institute, Mt. Sinai Hospital
> > >>>>Department of Biochemistry, University of Toronto
> > >>>>Toronto, ON M5G1X5
> > >>>>micheld at mshri.on.ca
> > >>>>http://blueprint.org
> > >>>>
> > >>>>
> > >>>>
> > >>>>_______________________________________________
> > >>>>Biodevelopers mailing list Biodevelopers at bioinformatics.org
> > >>>>https://bioinformatics.org/mailman/listinfo/biodevelopers


_______________________________________________
Biodevelopers mailing list
Biodevelopers at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/biodevelopers





