[Bioclusters] BioPerl and memory handling

Tim Cutts tjrc at sanger.ac.uk
Tue Nov 30 04:57:17 EST 2004


On 29 Nov 2004, at 11:32 pm, Ian Korf wrote:
> Here's something odd. The following labeled block looks like it should 
> use no memory.
>
> 	BLOCK: {
> 		my  $FOO = 'N' x 100000000;
> 	}
>
> The weird thing is that after executing the block, the memory 
> footprint is still 192 Mb as if it hadn't been garbage collected.

Perl's garbage collection does not give the memory back to the OS; it 
just marks the allocated memory for internal reuse by subsequent 
allocations within perl.

This is actually true of most UNIX programs; it is not unique to 
perl.  free() does not necessarily give the memory back to the 
operating system; it typically just marks it for re-use by the current 
process the next time it calls malloc().  In that case the memory 
doesn't become available to the OS until the program exits.

This is one reason why garbage-collected languages like Perl and Java 
should not be relied on to keep memory under control; GC does *not* 
absolve the programmer from the need to keep their memory usage tight.
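
For a quick look at the effect from inside perl itself, here is a
minimal sketch.  It assumes Linux, where the resident set size can be
read from /proc/self/status; rss_kb() and report() are hypothetical
helper names, not part of any module, and on other systems the script
simply prints "n/a".  The repeat count is held in a variable so that
the string is built at run time inside the block.

```perl
use strict;
use warnings;

# Hypothetical helper: current resident set size in kB, read from
# /proc/self/status (Linux only). Returns undef if it cannot be read.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Print a label followed by the current RSS, or "n/a" off Linux.
sub report {
    my ($label) = @_;
    my $rss = rss_kb();
    printf "%-14s %s\n", $label, defined $rss ? "$rss kB" : 'n/a';
    return;
}

my $n = 100_000_000;    # same size as in the quoted example

report('start');
{
    my $foo = 'N' x $n;
    report('inside block');
}
report('after block');
```

The figures depend on the perl build and on the C library's malloc, so
treat them as an observation about one system rather than a rule.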

Consider the following C program (which you need to run on an OS which 
actually populates all the contents of the rusage struct - Linux does 
not, and neither does MacOS X, but Tru64 does):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

/* The rusage fields are long, hence %ld; do/while makes this one statement */
#define PRINT_RESOURCES(x) do { getrusage(RUSAGE_SELF, &r);\
     printf(#x "\n\nShared: %ld\nUnshared: %ld\nStack: %ld\n\n",\
            r.ru_ixrss, r.ru_idrss, r.ru_isrss); } while (0)

int main(void) {

    char *p;
    struct rusage r;
    int i;

    PRINT_RESOURCES("Program start");

    p = malloc(100000000);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    /* Use the memory */
    for (i = 0; i<100000000; i++)
         p[i] = 'N';

    PRINT_RESOURCES("After malloc");

    free(p);

    PRINT_RESOURCES("After free");

    return 0;

}

The output on this Tru64 machine is:

09:46:26 tjrc at ecs2d:~$ ./memtest
"Program start"

Shared: 0
Unshared: 0
Stack: 0

"After malloc"

Shared: 19
Unshared: 116577
Stack: 19

"After free"

Shared: 19
Unshared: 116577
Stack: 19

As you can see, free() does not actually release the memory from the 
process back to the operating system.

> sub foo {my $FOO = 'N' x 100000000}
> for (my $i = 0; $i < 50; $i++) {foo()} # 29.420u 1.040s
>
> sub bar {my $BAR = 'N' x 100000000; undef $BAR}
> for (my $i = 0; $i < 50; $i++) {bar()} # 26.880u 21.220s
>
> The increase from 1 sec to 21 sec system CPU time is all the extra 
> memory allocation and freeing associated with the undef statement. Why 
> the user time is less in the undef example is a mystery to me.

I can explain this.  It's because you're forgetting that the value of 
the last statement evaluated in a perl subroutine is its return value, 
even if you don't specify 'return'.  So if you allocate 100MB of Ns, as 
in the first case, and then return it (which you do because the 
allocation is the last statement in the subroutine), you actually force 
perl to *copy* that lexically scoped variable each time the routine is 
called.  That's why the program uses 200MB of memory, not 100MB.

In the second version, the last statement is the undef, so perl never 
has to copy the 100MB string as the return value, and its memory 
footprint is half.

Using undef has not actually freed any memory at all; it has just 
changed the return value from the function and stopped perl doubling 
its memory use.

The lesson here is therefore to be very careful in perl subroutines 
whose return value you don't care about: make sure the return value 
is something tiny.   Perl has no equivalent to a C void function.
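
To make the return-value behaviour concrete, here is a small sketch.
The sub names are made up for illustration, and the strings are 1000
characters rather than 100MB so it runs instantly.  It only shows what
each sub hands back to its caller; it does not measure memory.

```perl
use strict;
use warnings;

# The assignment is the last statement, so its value (the whole string)
# is what the caller receives.
sub implicit_return { my $foo = 'N' x 1000 }

# The last statement is the undef, so the sub returns undef.
sub undef_last { my $bar = 'N' x 1000; undef $bar }

# A bare return gives undef in scalar context and an empty list in
# list context.
sub bare_return { my $baz = 'N' x 1000; return; }

my $r1 = implicit_return();
my $r2 = undef_last();
my @r3 = bare_return();

printf "implicit_return: %d characters\n", length $r1;
printf "undef_last: %s\n", defined $r2 ? 'defined' : 'undef';
printf "bare_return: %d elements in list context\n", scalar @r3;
```

Run as written, it should print:

implicit_return: 1000 characters
undef_last: undef
bare_return: 0 elements in list context

An explicit bare 'return;' is probably the clearest way to say "this
sub returns nothing useful".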

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233


