[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Michael Cariaso cariaso at yahoo.com
Wed Feb 1 15:31:39 EST 2006


Assuming an HTTP GET of the files is also OK, 'curl -z' would be of use:
it fetches the remote file only if it is newer than the given local file.
http://curl.haxx.se/
http://curl.haxx.se/docs/manual.html

curl -z local.html http://remote.server.com/remote.html
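
As written, the one-liner above sends the document to stdout, so for mirroring you would normally add -o. Below is a minimal sketch of a wrapper, not a tested mirroring tool. The names fetch_if_newer and DRY_RUN are made up for illustration and are not part of curl. Note that this only skips files that have not changed; a changed file is still downloaded in full, so it does not replace rsync's delta transfer.

```shell
#!/bin/sh
# Sketch: download a URL only when the remote copy is newer than the local one.
# fetch_if_newer and DRY_RUN are hypothetical names, not curl features.
fetch_if_newer() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        # -z FILE  make the request conditional on FILE's modification time
        # -R       give the downloaded file the remote modification time
        # -o FILE  write to FILE instead of stdout
        # -f       treat HTTP errors as failures; -sS: quiet, but show errors
        set -- curl -sS -f -R -z "$dest" -o "$dest" "$url"
    else
        # No local copy yet, so there is nothing to compare against.
        set -- curl -sS -f -R -o "$dest" "$url"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "$@"
    else
        "$@"
    fi
}
```

Usage, with the same placeholder URL as above:

    fetch_if_newer http://remote.server.com/remote.html local.html

Setting DRY_RUN=1 prints the curl command instead of running it, which is handy for checking a cron job before pointing it at a large database.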



Jeremy Mann wrote:
> Don, we exclude FASTA from our BLAST database. We use rsync because of its
> no-whole-file function. Why download the entire 50+ gig database every
> night when all we need are the changes? Is there an FTP client that
> supports just changes in the files?
> 
> 
> Don Gilbert said:
>> How high is demand for mirroring the FASTA/ subfolder of
>> ftp://ftp.ncbi.nlm.nih.gov/blast/db/ ?   I'll be happy
>> to consider adding it to bio-mirror.net.   On the other hand,
>> it is a large data chunk, and will add to network copy load
>> which now is stretched for the blast-format tar files.
>> We have almost continuous ftp copying from ncbi:/blast/db/ now
>> due to the almost daily data turnover, ftp timeouts, and such.
>>
>> Those who want source data could instead convert the
>> GenBank dataset to FASTA, at a lower cost. For example, only
>> a few GenBank/WGS subsets are updated daily, whereas
>> the whole 18 GB blast wgs.fasta is updated daily.
>>
>> Rsync is a nice tool, but has a much higher server side
>> CPU cost than FTP. Those of you running into rsync errors at NCBI
>> would probably have better luck using an FTP mirroring
>> tool.
>>
>> - Don Gilbert
>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
>> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


