[Biodevelopers] XML for huge DB?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Jul 31 18:43:23 EDT 2003


On 31 Jul 2003, Michael Gruenberger wrote:

> I agree with the other posters, but if you want to continue using your
> XML::Simple package, a quick 'hack' might be to check whether one of your
> other processes is already parsing a large file, and to parse files
> larger than a certain size only when there is enough memory and no other
> process is parsing a large file.
> 
> As you have a .cam.ac.uk address ... is there anything you could use on
> mole.bio.cam.ac.uk ? Maybe they would be willing to share some code?!

What is mole.bio.cam.ac.uk?

Ta, 
Dan.

> 
> Michael.
> 
> On Thu, 2003-07-31 at 16:39, Alex Milowski wrote:
> > On Thursday, July 31, 2003, at 09:02 AM, Dan Bolser wrote:
> > 
> > > Hello,
> > >
> > > How can I parse multiple BLAST XML results files efficiently?
> > >
> > > I want to parse them in a multiprocessor environment without
> > > hitting the system memory limit.
> > >
> > > This is likely to happen because big files take the most time, so
> > > the processes tend to end up working on big files at the same time,
> > > which exhausts system memory.
> > 
> > You need to parse your XML in a "streaming" fashion.  If you are using
> > Java, for most people, that means using SAX.  You should write a
> > ContentHandler (org.xml.sax package) that gathers your results.  The
> > SAX ContentHandler is a call-back style API, so programming against it
> > can get complicated, but it doesn't have to.
> > 
> > Many C/C++ APIs have a similar call-back style.  Basically, you want to
> > interface with the parser directly and get the essential information as
> > efficiently as possible.
> > 
> > If you plan to use Java 2, check out version 1.4.x and the
> > javax.xml.parsers and org.xml.sax packages.
> > 
> > Alex Milowski                FAX: (707) 598-7649
> > alex at milowski.com
> > 
> > "The excellence of grammar as a guide is proportional to the paucity of
> > the inflexions, i.e. to the degree of analysis effected by the language
> > considered."
> > 
> > Bertrand Russell in a footnote of Principles of Mathematics
> > 
> > 
> > _______________________________________________
> > Biodevelopers mailing list
> > Biodevelopers at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/biodevelopers
> 

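For reference, a minimal sketch of the streaming approach Alex describes
in the quoted text, written for a current JDK rather than 1.4. It is only
an illustration under stated assumptions: the class name BlastHitIds and
the method parseHitIds are made up for this example, and the element name
Hit_id assumes NCBI-style BLAST XML output, which the thread does not
actually specify. The handler keeps only the text of each Hit_id element,
so memory use depends on what you choose to collect and not on the size
of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any BLAST or Bio* library.
class BlastHitIds {

    // SAX call-back handler that collects the text of every <Hit_id>
    // element (assumed element name) and ignores everything else.
    static final class HitIdHandler extends DefaultHandler {
        final List<String> hitIds = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inHitId = false;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("Hit_id".equals(qName)) {
                inHitId = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one element's text in several chunks.
            if (inHitId) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Hit_id".equals(qName)) {
                hitIds.add(text.toString().trim());
                inHitId = false;
            }
        }
    }

    // Streams the document through the handler and returns the collected IDs.
    static List<String> parseHitIds(InputStream in) {
        try {
            HitIdHandler handler = new HitIdHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return handler.hitIds;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><Hit><Hit_id>gi|1</Hit_id></Hit>"
                   + "<Hit><Hit_id>gi|2</Hit_id></Hit></BlastOutput>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseHitIds(in)); // prints [gi|1, gi|2]
    }
}
```

For real result files you would pass a FileInputStream instead of the
in-memory string. If even the list of IDs is too large to hold, the
handler could write each result out as soon as endElement fires instead
of appending it to a list.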

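Michael's 'hack' in the quoted text (let only one process at a time parse
a large file) could be sketched roughly as follows. This transposes the
idea to Java to stay in one language with the SAX discussion; the
original poster is using Perl, where a lock file would play the same
role. The names LargeFileGate and parseGated, the lock file name and the
100 MB cut-off are all assumptions made for illustration. The sketch
does not check free memory, which the suggestion also mentions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class illustrating the "one large file at a time" idea.
class LargeFileGate {

    // Assumed cut-off; tune it to the memory actually available.
    static final long LARGE_BYTES = 100L * 1024 * 1024;

    static boolean isLarge(Path file, long thresholdBytes) {
        try {
            return Files.size(file) > thresholdBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small files are parsed immediately. For large files, block until an
    // exclusive lock on lockFile is obtained, so that cooperating processes
    // take turns; the lock is released when the try block exits.
    static void parseGated(Path file, Path lockFile,
                           long thresholdBytes, Runnable parse) {
        if (!isLarge(file, thresholdBytes)) {
            parse.run();
            return;
        }
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            parse.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: LargeFileGate <blast-results.xml>");
            return;
        }
        Path file = Paths.get(args[0]);
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"),
                                  "blast-large.lock");
        parseGated(file, lockFile, LARGE_BYTES,
                   () -> System.out.println("parsing " + file));
    }
}
```

File locks of this kind are held on behalf of the whole JVM and may be
advisory on some platforms, so this serialises separate processes that
all use the same lock file; it is not a substitute for synchronising
threads inside one process.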

