[Bioclusters] What are people using to mirror large FTP repositories with 2GB+ files?

Nathan O. Siemers bioclusters@bioinformatics.org
Wed, 10 Sep 2003 16:53:08 -0400


Hello Chris,

	We use unholy combinations of wget, GET (from LWP), and the ancient 
perl4 'mirror' code here at BMS. I can't address the 32-bit limits 
because we run that code on SGI Origins, but you might try those other 
tools (GET is not recursive, though).

	If you can download a single one of those big files by straight ftp, I 
would guess that 'mirror' will also work.
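
	If you would rather script that single-file test than babysit an 
interactive ftp session, something along these lines would do it (a 
rough Python/ftplib sketch, not what we actually run here; the filename 
is just the one from your wget log):

    # Rough sketch: pull one large file from the NCBI FTP site with
    # Python's ftplib, streaming it to disk in blocks.
    from ftplib import FTP

    HOST = "ftp.ncbi.nlm.nih.gov"
    PATH = "blast/db/FASTA"
    NAME = "nt.Z"               # example: the file from the wget log

    ftp = FTP(HOST)
    ftp.login()                 # anonymous login
    ftp.cwd(PATH)
    out = open(NAME, "wb")
    # retrbinary streams the transfer in fixed-size blocks, so any 2 GB
    # ceiling would come from the local libc/filesystem, not the script.
    ftp.retrbinary("RETR " + NAME, out.write, 1024 * 1024)
    out.close()
    ftp.quit()

	If that completes with the full byte count on disk, the limit is in 
the client tools rather than your kernel or ext3.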

	Nathan
	

Chris Dagdigian wrote:
> 
> Hi folks,
> 
> I've turned a bunch of Seagate 160GB IDE disks into a large software 
> RAID5 volume and am trying to mirror the raw fasta data from 
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ for use on a personal 
> development project.
> 
> The 'wget' utility (Red Hat 9 on a dual PIII system with an ext3 
> filesystem) is bombing out on a few of the remote files, which are 
> greater than 2GB in size even when compressed.
> 
> My kernel and ext3 filesystem support large file sizes, but 'wget' or 
> my shell seems to have issues.
> 
> I've recompiled the wget .src.rpm with the usual compiler flags to add 
> large file support, and wget _seems_ to be working, but I don't really 
> trust it, as it is now reporting negative file sizes like this:
> 
>  > RETR nt.Z...done
>  > Length: -1,668,277,957 [-1,705,001,669 to go]
> 
> What are others doing? Would 'curl' be better? Any recommendations would 
> be appreciated.
> 
> -Chris
> 
> 

-- 
Nathan Siemers|Associate Director|Applied Genomics
Bristol-Myers Squibb Pharmaceutical Research Institute
HW3-0.07|P.O. Box 5400|Princeton, NJ 08543-5400
(609)818-6568|nathan.siemers@bms.com