[Biodevelopers] XML for huge DB?

Thu Jul 31 12:33:18 EDT 2003

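You are better off using SAX than DOM, because a SAX parser streams the
document instead of building the whole tree in memory.  What we do is filter
Hsps and Hits with a streaming parser (such as SAX) and then parse what is
left with DOM.  But if you need all the Hsps and Hits, filtering will not
shrink the document, so you must either use SAX for everything or fall back
on load balancing.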
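
As a rough illustration of the filtering step, here is a minimal sketch
using Python's standard xml.sax module.  It is not our actual code.  The
element names Hit_id and Hsp_evalue are assumed to match the NCBI BLAST XML
output, and HspFilter, filter_hsps, the e-value cutoff and the tiny SAMPLE
document are all hypothetical names and values made up for the example.

```python
import io
import xml.sax


class HspFilter(xml.sax.ContentHandler):
    """Keep (hit_id, evalue) for every Hsp at or below max_evalue."""

    def __init__(self, max_evalue):
        super().__init__()
        self.max_evalue = max_evalue
        self.kept = []        # only the Hsps that pass the filter
        self._hit_id = None   # id of the Hit currently being read
        self._text = []       # text pieces of the current element

    def startElement(self, name, attrs):
        self._text = []       # start collecting text for this element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "Hit_id":
            self._hit_id = value
        elif name == "Hsp_evalue" and float(value) <= self.max_evalue:
            self.kept.append((self._hit_id, float(value)))
        self._text = []


def filter_hsps(source, max_evalue=1e-5):
    """Stream through source (a filename or byte stream); no tree is built."""
    handler = HspFilter(max_evalue)
    xml.sax.parse(source, handler)
    return handler.kept


# Made-up fragment, much smaller than a real BLAST report.
SAMPLE = b"""<Hit>
  <Hit_id>gi|1</Hit_id>
  <Hit_hsps>
    <Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp>
    <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
  </Hit_hsps>
</Hit>"""

print(filter_hsps(io.BytesIO(SAMPLE)))
```

Only the Hsps that pass the cutoff are held in memory, so the footprint
depends on how much you keep rather than on the size of the file.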

To load balance on file size: when a thread (or process) asks for another
document to parse, give it one chosen according to the sizes of the
documents the other threads are currently parsing, so that the total stays
under your memory limit.  I suspect, though, that the large documents will
still dominate the CPU time, so in the end you will be left with nothing but
a bunch of large documents.
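
The hand-out rule could be sketched like this in Python.  This is only one
possible reading of the idea: pick_next and the budget parameter are
hypothetical, and file size is used as a crude stand-in for the memory a
parse will need.

```python
def pick_next(pending, in_flight, budget):
    """Choose the next document for a worker that has asked for one.

    pending   -- list of (name, size) tuples not yet started
    in_flight -- sizes of the documents other workers are parsing now
    budget    -- assumed cap on the total size being parsed at once
    Returns the chosen (name, size) and removes it from pending, or
    returns None when the worker should wait for others to finish.
    """
    if not pending:
        return None
    remaining = budget - sum(in_flight)
    fitting = [item for item in pending if item[1] <= remaining]
    if not fitting:
        if in_flight:
            return None       # nothing fits right now, so wait
        fitting = pending     # nothing is running: run an oversized one alone
    # Prefer the largest document that fits, so big files start early.
    choice = max(fitting, key=lambda item: item[1])
    pending.remove(choice)
    return choice


# Made-up sizes; in practice they could come from os.path.getsize().
pending = [("a.xml", 900), ("b.xml", 800), ("c.xml", 100), ("d.xml", 50)]
first = pick_next(pending, [], budget=1000)
second = pick_next(pending, [900], budget=1000)
third = pick_next(pending, [900, 100], budget=1000)
print(first, second, third)
```

With these numbers the second worker gets a small file while a.xml is being
parsed, and b.xml has to wait until a.xml is done, which is exactly the
tail of large documents I am worried about.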

-Patrick

Dan Bolser <dmb at mrc-dunn.cam.ac.uk> wrote on 07/31/2003 12:02:17 PM:

Hello,

How can I use XML efficiently to parse multiple BLAST result
files?

I want to parse them in a multiprocessor environment without
hitting the system memory limit.

Hitting it is likely, as big files take the most time, so the
processes tend to end up working on big files at the same time,
and the system runs out of memory.

Cheers,
Dan.

_______________________________________________
Biodevelopers mailing list
Biodevelopers at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/biodevelopers