[Biodevelopers] XML for huge DB?

Joseph Landman landman at scalableinformatics.com
Thu Jul 31 12:44:52 EDT 2003


On Thu, 2003-07-31 at 12:26, Dan Bolser wrote:
> No, the problem is that a big results file can grab 50% of the 4GB
> memory on the system. When I run 4 processes (and a file of this
> size takes about 1 hour to process with XML::Simple), then as soon
> as more than one process encounters a big file I am scuppered.

Have a look at XML::Twig

"XML::Twig - A perl module for processing huge XML documents in tree
mode."

http://search.cpan.org/author/MIROD/XML-Twig-3.10/Twig.pm
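A minimal sketch of the twig_handlers approach (untested; the Hit and
Hit_def element names assume NCBI BLAST XML output, and the filename
is a placeholder):

use strict;
use warnings;
use XML::Twig;

# Fire a handler on each <Hit>, then throw the parsed subtree away,
# so memory use stays flat no matter how big the report is.
my $twig = XML::Twig->new(
    twig_handlers => {
        Hit => sub {
            my ($t, $hit) = @_;
            print $hit->first_child_text('Hit_def'), "\n";
            $t->purge;    # free everything parsed so far
        },
    },
);
$twig->parsefile('blast_output.xml');    # placeholder filename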

> I am looking for a memory-light way of parsing the BLAST results
> files from XML, i.e. one HSP at a time with a print event
> for each, rather than the whole-file-at-a-time processing of
> XML::Simple....

You might also look at BioPerl to handle this.  Its Bio::SearchIO
module gives you a streaming interface to exactly this kind of report.
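
Something along these lines, for example (untested sketch; the
filename is a placeholder, and I believe the blastxml format needs an
underlying XML parser module installed):

use strict;
use warnings;
use Bio::SearchIO;

# Stream the report one hit/HSP at a time instead of slurping it.
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'report.xml');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print join("\t", $result->query_name,
                             $hit->name, $hsp->evalue), "\n";
        }
    }
}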

XML::Simple slurps the entire file into memory and builds the whole
data structure at once, which is not a good idea for big documents.
XML::SAX is possible, but you have to work harder to write your
callbacks and handler classes.  The callbacks under Twig are easy to
write as closures, as in the sketch below.
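
For example, a handler can simply close over a lexical in the
enclosing scope (again a sketch; element name and filename are
placeholders as above):

use strict;
use warnings;
use XML::Twig;

# The handler is a closure over $hit_count: no globals, no handler class.
my $hit_count = 0;
my $twig = XML::Twig->new(
    twig_handlers => {
        Hit => sub { $hit_count++; $_[0]->purge },
    },
);
$twig->parsefile('blast_output.xml');    # placeholder filename
print "saw $hit_count hits\n";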

XML::Twig::Elt's next_sibling() method may also be useful for
stepping from one record to the next.

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615
