Wow, thanks for the ideas.

On 31 Jul 2003, Joseph Landman wrote:

> On Thu, 2003-07-31 at 12:26, Dan Bolser wrote:
> > No, the problem is that a big results file can grab 50% of the 4GB
> > memory on the system. When I run 4 processes (and a file of this
> > size takes about 1 hour to process with XML::Simple) then as soon
> > as more than one process encounters a big file I am scuppered.
>
> Have a look at XML::Twig
>
> "XML::Twig - A perl module for processing huge XML documents in tree
> mode."
>
> http://search.cpan.org/author/MIROD/XML-Twig-3.10/Twig.pm

Cheers, I will have a look.

> > I am looking for a memory lite way of parsing the blast results
> > files from XML, i.e. one HSP at a time with a print event
> > for each, rather than whole file at a time processing from
> > XML::Simple....
>
> You might also look at Bioperl to handle this. They have a neat
> interface to exactly this.

Yup, I saw a neat interface with optional 'html plugins', which is
exactly the kind of thing that I love. I would like to see an
integrated bioinformatics database based around this principle of
data / display independence. Once you derive complex enough queries,
analysis becomes essential: we use custom software and (maybe)
eventually implement our findings back into a web page. I would love
to see a seamless approach to this whole business, with a 'modular'
but integrated database with web-API access and pluggable
'display/analysis' modules.

How much of your day to day 'research' is actually data integration?

The problem with pure CS approaches is that the data modeling must be
based on biological concepts, and thus is best left to distributed
experts.

>
> XML::Simple slurps the entire file into memory for parsing. This is not
> a good idea for big documents. XML::SAX is possible, but you have to
> work harder to write your callbacks and parsers. The callbacks under
> Twig are easy to write as closures.

Yup, I was planning to implement event handler subroutines with perl
XML::Parser,

>
> The XML::Twig->next_sibling() may be useful for this.

But I will give this a go. I am so nearly finished I am reluctant to
look at Bioperl right now, but I know I will need to display results
sooner or later.

Thanks very much,
Dan.
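
For reference, a minimal sketch of the one-HSP-at-a-time idea discussed in this thread. It is written in Python rather than Perl so that it needs nothing beyond the standard library, and it is not XML::Twig: it uses xml.etree.ElementTree.iterparse, which follows the same streaming pattern of handling each element as it closes and then discarding it. The element names (Hit_id, Hsp, Hsp_evalue, Hsp_bit-score) are assumed from NCBI's BLAST XML output, and iter_hsps and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_hsps(source):
    """Yield one dict per <Hsp> element without building the whole tree.

    Hypothetical helper: tag names are assumed from NCBI BLAST XML.
    """
    hit_id = None
    # "end" events fire when an element's closing tag has been read,
    # so all of its children are available at that point.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Hit_id":
            hit_id = elem.text
        elif elem.tag == "Hsp":
            yield {
                "hit_id": hit_id,
                "evalue": elem.findtext("Hsp_evalue"),
                "bit_score": elem.findtext("Hsp_bit-score"),
            }
            # Drop this HSP's children and text now that it is reported.
            elem.clear()
        elif elem.tag == "Hit":
            # Drop the emptied <Hsp> shells held by the finished hit.
            elem.clear()


# Made-up sample data, shaped like a tiny BLAST XML report.
SAMPLE = b"""<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>hit_A</Hit_id><Hit_hsps>
<Hsp><Hsp_bit-score>50.1</Hsp_bit-score><Hsp_evalue>1e-05</Hsp_evalue></Hsp>
<Hsp><Hsp_bit-score>30.2</Hsp_bit-score><Hsp_evalue>0.002</Hsp_evalue></Hsp>
</Hit_hsps></Hit>
<Hit><Hit_id>hit_B</Hit_id><Hit_hsps>
<Hsp><Hsp_bit-score>41.0</Hsp_bit-score><Hsp_evalue>3e-04</Hsp_evalue></Hsp>
</Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""

# One "print event" per HSP, as each one is completed by the parser.
for hsp in iter_hsps(io.BytesIO(SAMPLE)):
    print(hsp["hit_id"], hsp["evalue"], hsp["bit_score"], sep="\t")
```

Because each HSP is cleared as soon as it has been printed, memory use should stay roughly proportional to the largest single hit rather than to the whole results file. The same shape ought to carry over to a Twig handler that processes one element and then frees it.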