[ssml] BLAST fastacmd failed to fetch sequence from database stored on PVFS2 filesystem

Dan Bolser
Wed Dec 5 10:33:03 EST 2007

On 05/12/2007, Yun He wrote:
> Hi,
> Yesterday I found that I always get the message "[blastpgp] WARNING:  [000.000]  Failed to
> initialize search. ISAM Error code is -5" when I run blastpgp against a
> database stored on the parallel filesystem PVFS2, but the warning
> does not occur when I run BLAST against a database shared over NFS. Here is an example:
> 1) /data/blastdb is a directory holding some databases such as nr; this
> directory is shared from the master node of a cluster to several compute
> nodes over NFS;
> 2) /pool/blastdb is the mount point of a PVFS2 (version 2.6.3) filesystem on all
> nodes; the content of this directory is identical to that of /data/blastdb (I
> used rsync to make them identical);
> 3) I used a small test set of about 100 sequences to test blastpgp against
> the nr database in both of the two directories. All runs on /pool/blastdb
> complained "[blastpgp] WARNING:  [000.000]  Failed to initialize search. ISAM
> Error code is -5", but those on /data/blastdb did not;
> 4) It seems that BLAST failed to fetch some sequences from the database on
> the PVFS2 filesystem and therefore complained; I used fastacmd to fetch a
> sequence:
> a) fetch from database on NFS, this is OK,
> $ fastacmd -s "gi|34495614" -d /data/blastdb/nr
> >gi|34495614|ref|NP_899829.1| sulfite dehydrogenase - subunitB
> [Chromobacterium violaceum ATCC 12472] >gi|34101469|gb|AAQ57838.1| sulfite
> dehydrogenase - subunitB [Chromobacterium violaceum ATCC 12472]
> b) fetch from database on PVFS2, ohhh,
> $ fastacmd -s "gi|34495614" -d /pool/blastdb/nr
> [fastacmd] ERROR: Accesion search failed for "gi|34495614" with error code -5
> Why does this happen?

My guess is that you are crossing architectures by using different
filesystems. It doesn't look that way, because both are mounted on
the same architecture, where you run blast, fastacmd, etc. However, I
think the database files prepared on each underlying filesystem may
need to match the architecture of the machine that hosts them. I am
not certain of this.

It seems you have a complex configuration. Try running formatdb and
fastacmd separately on each disk, then try again.

Sorry for the confused reply. First, try running formatdb on each
disk and check whether the results are identical. If they are
identical after re-running formatdb, the question becomes whether you
need to recompile formatdb on each file server.
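
One way to check whether the files on the two disks are identical is to
compare checksums of every database file. What follows is only a minimal
Python sketch under some assumptions: the database files are taken to
share the prefix "nr" in both directories, and the names file_digest and
compare_db_dirs are made up for illustration. Note that some index files
may embed a creation date, so a difference after re-running formatdb is
not by itself proof of a filesystem problem.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_db_dirs(dir_a, dir_b, prefix="nr"):
    """Compare files named '<prefix>.*' in two directories.

    Returns a dict mapping each file name to one of:
    'identical', 'different', 'missing in first', 'missing in second'.
    """
    files_a = {p.name: p for p in Path(dir_a).glob(prefix + ".*") if p.is_file()}
    files_b = {p.name: p for p in Path(dir_b).glob(prefix + ".*") if p.is_file()}

    report = {}
    for name in sorted(set(files_a) | set(files_b)):
        if name not in files_a:
            report[name] = "missing in first"
        elif name not in files_b:
            report[name] = "missing in second"
        elif file_digest(files_a[name]) == file_digest(files_b[name]):
            report[name] = "identical"
        else:
            report[name] = "different"
    return report
```

Calling compare_db_dirs("/data/blastdb", "/pool/blastdb") and printing
the returned dict would show which files, if any, differ between the NFS
and PVFS2 copies.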

You could also try asking this kind of question on the bioclusters
list: bioclusters at bioinformatics.org



> A 2002 paper (J.D. Grant et al., Bioinformatics 2002, 18(5):
> 765-766) says they developed a distributed BLAST and PSI-BLAST on a
> cluster, with the database stored on PVFS.
> Is PVFS2 suitable for storing the database?
