[Bioclusters] BLAST speed mystery
jpowell at takedacam.com
Sun Feb 14 15:17:04 EST 2010
I have a couple of new fast servers, each with 24GB RAM and two 15K SAS hard drives in RAID 0. I've run some tests using BLASTN on the est_mouse database, which is 1.7GB. As one would expect, the results when repeating identical BLASTs are fairly impressive, as the database becomes effectively cached in RAM. However, I have also been timing the first BLAST, when no cached data is available. With the database on the local 15K drives, the first BLAST takes about 45 seconds with my particular query sequence. Doing the identical BLAST with the databases NFS mounted from an old Apple G4 Xserve attached to an old Apple XRaid (which has 8 parallel ATA disks), the first BLAST runs in 30 seconds. (I ensure that there is no cached data on the NFS server either.)
Measuring straight throughput off disk using dd shows that the 15K disks can deliver 300MB/sec, whereas the NFS-mounted Xserve/XRaid combination only delivers 70MB/sec. So it's not a problem with streaming throughput. Possibly it's something to do with IOPS. I'm not sure what a decent benchmarking tool would be for that, so I don't have figures currently. Is BLAST particularly sensitive to this?
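For the IOPS question, one rough way to get a number is to time small reads at random offsets in one of the database files. The following is a minimal sketch, not a calibrated benchmark: the function name `random_read_iops` and its defaults are made up for illustration, it relies on `os.pread` (so Unix only), and the figure only means something if the file is not already in the page cache on the client or on the NFS server.

```python
import os
import random
import time


def random_read_iops(path, block_size=8192, n_reads=2000, seed=0):
    """Issue n_reads block-aligned reads at random offsets; return reads per second.

    Hypothetical helper for illustration only. Reads go through the OS page
    cache, so caches must be cold for the result to reflect the disks.
    """
    size = os.path.getsize(path)
    n_blocks = max(1, size // block_size)
    rng = random.Random(seed)  # fixed seed so runs hit the same offsets
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(n_reads):
            offset = rng.randrange(n_blocks) * block_size
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_reads / elapsed if elapsed > 0 else float("inf")
```

Running it against the same file on the local SAS array and on the NFS mount, with `block_size` set to 8192 and then 32768, would give a like-for-like comparison to set beside the dd figures.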
Interestingly, I have tried mounting the database via NFS from one of the new servers across to the other. When using an 8K block size (the same as for the Xserve NFS mount) I again get 45 seconds for the first BLAST iteration. When I increase the block size to 32K, the time for the first BLAST iteration drops to 30 seconds, comparable to the Xserve case.
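To check whether request size alone changes the cold-read time on the local disks, the 8K versus 32K comparison can be approximated outside NFS by reading a file front to back with a fixed request size. This is a sketch under assumptions: `timed_sequential_read` is a hypothetical helper, unbuffered Python reads only roughly imitate what an NFS client does with its read size, and the kernel's own read-ahead may hide any difference.

```python
import time


def timed_sequential_read(path, request_size):
    """Read the whole file in request_size chunks; return (bytes_read, seconds).

    Hypothetical helper. buffering=0 turns off Python's own buffering so each
    f.read() maps to one read request of at most request_size bytes.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(request_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Comparing `timed_sequential_read(path, 8192)` with `timed_sequential_read(path, 32768)` on a cold cache would show whether the local array is sensitive to request size in the same way the NFS mount appears to be.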
I'm not sure what this means. Possibly the block size result implies that some sort of read-ahead would improve things, but turning on read-ahead on the RAID controller did not improve the performance of the BLAST run from the SAS disks. Is the problem possibly an IOPS limitation, solvable by putting more disks in the RAID 0 array? The NFS block size results imply that some sort of tuning should be possible even with the existing disks, but I'm not sure what to try.
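As a back-of-envelope check on the figures quoted above, the sketch below works out how long one full pass over the 1.7GB database would take at each measured dd rate, and how many read requests that pass needs at 8K and 32K. It assumes that 1.7GB means 1.7e9 bytes and that a cold BLAST reads the whole database exactly once, neither of which has been verified; the helper names are illustrative.

```python
import math

DB_BYTES = 1_700_000_000      # assumption: 1.7GB taken as 1.7e9 bytes
LOCAL_RATE = 300_000_000      # 300MB/sec measured with dd on the 15K disks
NFS_RATE = 70_000_000         # 70MB/sec measured with dd over NFS


def streaming_seconds(db_bytes, bytes_per_sec):
    """Time for one sequential pass at a given sustained rate."""
    return db_bytes / bytes_per_sec


def request_count(db_bytes, request_size):
    """Number of fixed-size read requests needed to cover the file once."""
    return math.ceil(db_bytes / request_size)


local_bound = streaming_seconds(DB_BYTES, LOCAL_RATE)
nfs_bound = streaming_seconds(DB_BYTES, NFS_RATE)
requests_8k = request_count(DB_BYTES, 8192)
requests_32k = request_count(DB_BYTES, 32768)

print(f"local streaming bound: {local_bound:.1f} s")
print(f"NFS streaming bound:   {nfs_bound:.1f} s")
print(f"requests at 8K:  {requests_8k}")
print(f"requests at 32K: {requests_32k}")
```

Under those assumptions a single pass takes roughly 6 seconds at 300MB/sec and roughly 24 seconds at 70MB/sec, so the 30-second NFS run is close to its streaming limit while the 45-second local run is far from it. Going from 8K to 32K requests cuts the request count from roughly 208,000 to roughly 52,000.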
Anyone have any ideas?