[Biodevelopers] XML for huge DB?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Jul 31 18:28:42 EDT 2003


On Thu, 31 Jul 2003, Patrick McConnell wrote:

> You are better off using SAX instead of DOM.  What we do is filter Hsps and
> Hits using a streaming technology (such as SAX), and then we parse the rest
> with DOM.  But, if you need all the Hsps and Hits, then you must use SAX or
> load balancing.

Yup, cheers, SAX is the way forward.
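For readers following along, here is a minimal sketch of the streaming filter Patrick describes. It assumes NCBI BLAST XML element names (`Hit_id`, `Hsp_evalue`). The E-value cutoff is an illustrative choice, and so are the names `HspFilter` and `filter_hsps`. Only the matching (hit, E-value) pairs are kept in memory, never the whole document tree.

```python
import xml.sax


class HspFilter(xml.sax.ContentHandler):
    """Stream through BLAST XML, keeping only HSPs at or below an E-value cutoff.

    Assumes NCBI BLAST XML element names (Hit_id, Hsp_evalue)."""

    def __init__(self, max_evalue):
        super().__init__()
        self.max_evalue = max_evalue
        self.hits = []        # (hit_id, evalue) pairs that pass the filter
        self._text = []       # character data of the current leaf element
        self._hit_id = None   # id of the Hit we are currently inside

    def startElement(self, name, attrs):
        self._text = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "Hit_id":
            self._hit_id = value
        elif name == "Hsp_evalue":
            evalue = float(value)
            if evalue <= self.max_evalue:
                self.hits.append((self._hit_id, evalue))
        self._text = []


def filter_hsps(source, max_evalue=1e-5):
    """Parse a filename or binary file object; memory use stays flat."""
    handler = HspFilter(max_evalue)
    xml.sax.parse(source, handler)
    return handler.hits
```

The surviving records could then be handed to a DOM parser, or written straight to the database, as the thread goes on to discuss.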

> 
> Load balance based on file size.  When your threads (or processes) ask for
> another document to parse, you must give them one based on the size of the
> documents the other threads are parsing.  But I feel like the large
> documents are still going to dominate the CPU time, and thus you will only
> be left with a bunch of large documents in the end.

I thought about this too, but I hate anything complex ;)
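For anyone who does want to try Patrick's size-based load balancing, here is one possible policy, sketched under stated assumptions. It treats the total bytes of XML being parsed at once as a rough proxy for memory use. `budget` is a hypothetical, user-chosen cap, and `pick_next` is an illustrative name, not any existing API. An idle worker gets the largest pending file that still fits alongside what the other workers are parsing. As Patrick warns, large files may still dominate the tail of the run.

```python
def pick_next(pending, in_flight, budget):
    """Choose the next file for an idle worker.

    pending:   dict of path -> size in bytes, files still waiting
    in_flight: sizes in bytes of files other workers are parsing now
    budget:    hypothetical cap on total bytes parsed at once
               (a stand-in for memory use)
    Returns a path, or None if the worker should wait.
    """
    if not pending:
        return None
    room = budget - sum(in_flight)
    fits = [path for path, size in pending.items() if size <= room]
    if fits:
        # Largest file that still fits: start big jobs early so they
        # are less likely to all pile up at the end.
        return max(fits, key=lambda path: pending[path])
    if not in_flight:
        # Every file exceeds the budget and nothing is running:
        # run the smallest one alone rather than stall forever.
        return min(pending, key=pending.get)
    return None
```

A dispatcher would call this whenever a worker finishes. It would remove the chosen path from `pending` and record its size as in flight until that worker reports back.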

I found a really neat way to do massive dumps to MySQL without
incurring either of the usual overheads: increasingly slow
index updates, or (very) large intermediate files prepared
for LOAD DATA INFILE.

Simply point LOAD DATA INFILE at a named pipe... All is perfect,
and multiple processors (sharing a common file system) can
cooperate like a charm.
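A minimal sketch of the named-pipe trick, assuming a Unix host. The producer creates a FIFO and streams tab-separated rows into it while MySQL reads it like an ordinary file, so no large intermediate file ever exists on disk. The table name `hsp` and the helper names `make_fifo` and `write_rows` are hypothetical. Plain LOAD DATA INFILE reads the path on the server host, so the FIFO must live there. Server settings such as `secure_file_priv` may also restrict which paths are allowed.

```python
import os

# Hypothetical statement; table "hsp" is illustrative. Tab-separated fields
# and newline-terminated lines are LOAD DATA's defaults.
LOAD_SQL = "LOAD DATA INFILE '{path}' INTO TABLE hsp"


def make_fifo(path):
    """Create the named pipe if it does not already exist (Unix only)."""
    if not os.path.exists(path):
        os.mkfifo(path)


def write_rows(path, rows):
    """Stream rows into the FIFO as tab-separated lines.

    open() blocks until a reader (e.g. the MySQL server running LOAD_SQL)
    opens the other end, so the load must run concurrently, in another
    process or thread. Closing the file signals end-of-data to the reader.
    """
    with open(path, "w") as fifo:
        for row in rows:
            fifo.write("\t".join(str(field) for field in row) + "\n")
```

Because each producer can write its own FIFO while a separate LOAD DATA statement drains it, several parsing processes can feed the database at once.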

I found this solution in a MySQL bug report.

Thanks again, 
Dan.
> -Patrick
> 
> Dan Bolser <dmb at mrc-dunn.cam.ac.uk>@bioinformatics.org on 07/31/2003
> 12:02:17 PM
> 
> Please respond to biodevelopers at bioinformatics.org
> 
> Sent by:    biodevelopers-admin at bioinformatics.org
> 
> 
> To:    biodevelopers at bioinformatics.org
> cc:
> 
> Subject:    [Biodevelopers] XML for huge DB?
> 
> Hello,
> 
> How can I use XML efficiently to parse multiple BLAST result
> files?
> 
> I want to parse them in a multiprocessor environment without
> hitting the system memory limit.
> 
> This is likely to happen: big files take the most time, so the
> processes tend to end up working on big files at the same time,
> and the system runs out of memory....
> 
> Cheers,
> Dan.
> 
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers
> 