[Bioclusters] ack. I'm getting bitten by the 2gb filesize problem on a linux cluster...

Joseph Landman bioclusters@bioinformatics.org
Thu, 30 Jan 2003 17:21:47 -0500


chris dagdigian wrote:
> Hi folks,
> 
> I thought these problems were long past me with modern kernels and 
> filesystems --

But of course not!!!

> We as a community have learned to deal with uncompressed sequence 
> databases that are greater than 2gb -- its pretty simple to gzcat the 
> file and pipe it through formatdb via STDIN to avoid having to 
> uncompress the database file at all.

Sort of: you are just writing to a pipe handle instead of a file handle, 
but at least you don't need the disk space for the uncompressed copy... in theory.
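
For reference, the pipe trick looks roughly like this (a sketch, assuming 
formatdb will accept "stdin" as the input name and that the database is 
nucleotide; adjust -p and the names for your own case):

	# stream the compressed FASTA straight into formatdb, no temp file
	gzcat nt.gz | formatdb -i stdin -n nt -t "nt" -p F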

> Now however I've got a problem that the compressed archive file that 
> someone is trying to download is greater than 2gb in size :)

... and ....

> The database in question is:
> 
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/htgs.tar.gz
> 
> The file is mirrored via 'wget' and a cron script and has recently 
> started core dumping. A ftp session for this file also seemed to bomb 
> out but I have not verified this fully.
> 
> I did the usual things that one does; verified that the wget binary core 
> dumps regardless of what shell one is using (Joe Landman found this 
> issue a while ago...). I also verified that the error occurs when 

Most shells are not compiled with the LFS options by default 
(I don't know if this has changed...).  I have taken to (defensively) 
recompiling the shell by hand.
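
Something like the following is what I mean (a rough sketch, assuming you 
are rebuilding bash from its source tree; getconf prints the LFS flags 
your glibc actually expects):

	# hypothetical rebuild of bash with large file support
	CFLAGS="-O2 $(getconf LFS_CFLAGS)" ./configure
	make && make install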

> downloading to a NFS mounted NetApp filesystem as well as a local ext3 
> formatted filesystem.  The node is running Redhat 7.2 with a 2.4.18-18.7 
> kernel.

RH7.2 had a few problems with large files.  The shells needed 
re-compilation, as did a few tools.
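
One quick way to rule the filesystem and kernel in or out is to drop a 
file past the 2GB mark with a tool you trust (a sketch; it assumes the 
local dd was built with LFS, which the stock Red Hat one should be):

	# write a single 1MB block at an offset beyond 2GB (sparse file);
	# if this works on the NetApp mount and on ext3, the limit is in
	# the downloading tool, not the kernel or the filesystem
	dd if=/dev/zero of=bigfile.test bs=1024k seek=2200 count=1
	ls -l bigfile.test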

> Next step was to recompile 'wget' from the source tarball with the usual 
>  "-D_ENABLE_64_BIT_OFFSET" and "-D_LARGE_FILES"  compiler directives.
> Still no love. The wget binary still fails once the downloaded file gets 
> a little larger than 2gb in size.
> 
> Anyone seen this before? What FTP or HTTP download clients are people 
> using to download large files?

Ok, the usual suspects:

	. shell
	. some library wget is using (do an ldd /usr/bin/wget)
	. wget itself (using an int or a long for the byte counter, a seek offset, ...)
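
A few quick checks help narrow it down (a sketch; seeing open64/lseek64 
in the dynamic symbols is a decent hint that a binary was built with LFS, 
and getconf reports the flags glibc expects for the rebuild):

	# which libraries is wget pulling in?
	ldd /usr/bin/wget
	# does the binary reference the 64-bit file interfaces at all?
	objdump -T /usr/bin/wget | grep -E 'open64|lseek64|fopen64'
	# the LFS compile flags to feed the rebuild
	getconf LFS_CFLAGS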

I'll try something here.  I assume they are doing

	wget url

and not

	wget -O - url  | some_other_command

or

	wget --output-document=- url | some_other_command

Joe

> 
> -Chris
> 

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615