"Chris Dwan (CCGB)" wrote: > > The December 2001 version of "formatdb" will split up your targets > into chunks of arbitrary size for you, via the "-v > <max_size_of_a_chunk>" flag. I think that it was intended to get > around file size limitations on some larger datasets / older OS's, but > it also works nicely for my group to keep things under the RAM / CPU > performace transition point. Thanks Chris, you made my day! I had read the release notes and had interpreted it to mean that the -v flag created a _fixed_ max of 2 billion letters for really large custom databases. Here is the pertinent release section: 3.) A volume option ('-v') has been added to formatdb. This option breaks up large FASTA files into 'volumes' (each with a maximum size of 2 billion letters). As part of the creation of a volume formatdb writes a new type of BLAST database file, called an alias file, with the extension 'nal' or 'pal', is written. This option should be used if one wishes to formatdb large databases (e.g., over 2 billion base pairs). The README.formatdb Section C is much clearer: One may also specify a smaller size for the volume databases by using the -v option: formatdb -i hugefasta -p F -v 2000000000 This command line will format the "hugefasta" FASTA file as a number of database "volumes," each containing a maximum of two billion base pairs, as specified by the "-v" option. Two billion is the current limitation on the NCBI toolkit command-line parser. The volumes will have names consisting of the root database name, "hugefasta" followed by a two-digit volume extension, followed by the usual BLAST database extensions. These smaller databases can be searched as if they were a single entity using: blastall -i infile -d hugefasta -p blastn -o out