[Bioclusters] BioPerl 1.2.3 and memory handling

Malay mbasu at mail.nih.gov
Mon Nov 29 14:24:35 EST 2004

Michael Cariaso wrote:
> Michael Maibaum wrote:
>> On 10 Nov 2004, at 18:25, Al Tucker wrote:
>>> Hi everybody.
>>> We're new to the Inquiry Xserve scientific cluster and trying to iron 
>>> out a few things.
>>> One thing we seem to be coming up against is an out of memory 
>>> error when getting large sequence analysis results (5,000 seq - at 
>>> least- and above) back from BTblastall. The problem seems to be with 
>>> BioPerl.
>>> Might anyone here know if BioPerl knows enough not to try and 
>>> access more than 4 GB of RAM in a single process (an OS X limit)? I'm 
>>> told Blastall and BTblastall are and will chunk problems accordingly, 
>>> but we're not certain if BioPerl is when called to merge large Blast 
>>> results back together. It's the default version 1.2.3 that's supplied 
>>> btw, and OS X 10.3.5 with all current updates just short of the 
>>> latest 10.3.6 update.

>> BioPerl tries to slurp up the entire results set from a BLAST query, 
>> and build objects for each little bit of the result set and uses lots 
>> of memory. It doesn't have anything smart at all about breaking up the 
>> job within the result set, afaik.

This is not really true. The SearchIO module, as far as I know, works on a 
stream: it parses and hands back one result at a time rather than loading 
the whole report first.
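
To make the distinction concrete, here is a minimal sketch of the 
stream-oriented pattern. It is not BioPerl code: it is plain Python, and it 
assumes the report was written in tabular form (blastall -m 8, twelve 
tab-separated columns with e-value and bit score last). The names iter_hits 
and best_hit_per_query are made up for illustration.

```python
import io
from itertools import groupby


def iter_hits(handle):
    """Yield (query_id, subject_id, evalue, bit_score), one input line at a time."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Assumed tabular layout: column 11 is the e-value, column 12 the bit score.
        yield fields[0], fields[1], float(fields[10]), float(fields[11])


def best_hit_per_query(handle):
    """Yield the highest-scoring hit per query without storing the whole report."""
    # groupby relies on all hits for one query being adjacent in the file.
    for _query_id, hits in groupby(iter_hits(handle), key=lambda hit: hit[0]):
        yield max(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    sample = io.StringIO(
        "q1\tsA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "q1\tsB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t120\n"
        "q2\tsC\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-10\t80\n"
    )
    for hit in best_hit_per_query(sample):
        print(hit)
```

Memory use here depends on the size of a single record, not on the size of 
the report, which is the property a stream parser is meant to give you.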

>>  I ended up stripping out results that hit a certain threshold size to 
>> run on a different, large memory opteron/linux box and I'm 
>> experimenting with replacing BioPerl with BioPython etc.
>> Michael
> You may find that the BPLite parser works better when dealing with 
> large blast result files. It's not as clean or maintained, but it does 
> the job nicely for my current needs, which overloaded the usual parser.

There is basically no difference between BPLite and the other BLAST parser 
interfaces in BioPerl.

The problem lies in the core of Perl itself. Perl does not release 
memory to the system even after the reference count of an object created 
in memory goes to 0, unless the program is actually over. Perl's 
object system is highly inefficient at handling a large number of objects 
created in memory.
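
If memory is only handed back when the process ends, one possible 
workaround is to give each chunk of results its own short-lived process, 
since the operating system reclaims everything a process held once it 
exits. A rough sketch in Python, not tied to BioPerl: the chunk files, 
CHILD_CODE and both function names are hypothetical, and the child only 
counts hit lines as a stand-in for real parsing.

```python
import subprocess
import sys

# Program run by each child: count non-blank, non-comment lines in one chunk
# file and print the total. Real parsing work would go here instead.
CHILD_CODE = """
import sys
count = 0
with open(sys.argv[1]) as handle:
    for line in handle:
        if line.strip() and not line.startswith('#'):
            count += 1
print(count)
"""


def count_hits_in_child(chunk_path):
    """Process one chunk in a fresh interpreter and return its printed count."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, chunk_path],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error
    )
    return int(done.stdout.strip())


def count_hits(chunk_paths):
    """Sum the per-chunk counts; only small integers stay in the parent."""
    return sum(count_hits_in_child(path) for path in chunk_paths)
```

The parent never holds more than one small result per chunk, so its own 
footprint stays flat however many chunks there are.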

