[Bioclusters] BLAST speed mystery
georgios at biotek.uio.no
Sat Feb 20 15:12:47 EST 2010
In general, I tend to use iozone (http://www.iozone.org/) to measure
IOPS before I put cluster nodes into production. I assume the BLAST
versions on the G4 and on the new (Linux?) servers are the same.
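A typical random-I/O run might look like the sketch below; the file path
and size are placeholders, and the file should be larger than RAM (or
opened with direct I/O, as here) so the page cache doesn't skew the numbers:

```shell
# Sketch: random-read/write IOPS with iozone (path and size are placeholders).
# -i 0 writes the test file first, -i 2 then does random read/write on it;
# -r 8k matches the NFS block size under discussion, -O reports results in
# operations/second instead of KB/s, and -I uses O_DIRECT to bypass the
# buffer cache.
iozone -i 0 -i 2 -r 8k -s 4g -O -I -f /data/iozone.tmp
```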
Running vm_stat (on Mac OS X) or vmstat (on Linux) during the BLAST run
(both cold and with est_mouse cached) can give you rough figures for
disk throughput and buffer-cache behaviour (yes, having more stripes is
useful, but something else might be happening here).
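For example (interval and sample count are arbitrary):

```shell
# Linux: print one line of counters per second, five samples, while the
# BLAST job runs in another terminal. Watch the bi/bo columns (blocks
# read in / written out per second) and the wa column (% CPU time spent
# waiting on I/O).
vmstat 1 5
# Mac OS X rough equivalent (page-ins/page-outs once per second):
#   vm_stat 1
```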
However, it would be useful to give us the software (OS/kernel version)
and hardware (RAID controller model) details for your new servers.
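On the NFS block-size and read-ahead questions in your mail, the knobs
I'd try look roughly like this (server name, mount point, and device
name are placeholders for your setup):

```shell
# Larger NFS read/write block size, as in your 32K experiment:
mount -t nfs -o rsize=32768,wsize=32768 server:/db /mnt/blastdb

# Kernel-level read-ahead on the local RAID device, which is separate
# from the RAID controller's own read-ahead setting (value is in
# 512-byte sectors, so 4096 = 2MB):
blockdev --setra 4096 /dev/sda
```

If the random-I/O numbers from iozone turn out to be the bottleneck,
more spindles in the RAID 0 set would help; if read-ahead tuning alone
closes the gap, the existing two disks may be enough.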
On 02/14/2010 09:17 PM, Justin Powell wrote:
> I have a couple of new fast servers with 24GB RAM and RAID 0 15K SAS hard drives (2 in each server). I've run some tests using BLASTN on the est_mouse database, which is 1.7GB. As one would expect, the results when repeating identical BLASTs are fairly impressive, as the database becomes effectively cached in RAM.
>
> However, I have also been timing the first BLAST, when no cached data is available. I've found that if I have the database on the local 15K drives, the first BLAST takes about 45 seconds with my particular query sequence. However, doing the identical BLAST but instead NFS-mounting the databases off an old Apple G4 Xserve attached to an old Apple XRaid (which has 8 parallel ATA disks), the first BLAST runs in 30 seconds. (I ensure that there is no cached data on the NFS server either.)
> Measuring straight throughput off disk using dd shows that the 15K disks can deliver 300MB/sec, whereas the NFS-mounted Xserve/XRaid combination only delivers 70MB/sec. So it's not a problem with streaming throughput. Possibly it's something to do with IOPS - I'm not sure what a decent benchmarking tool for that would be, so I don't have figures currently. Is BLAST particularly sensitive to this?
> Interestingly, I have tried mounting the database via NFS from one of the new servers across to the other. When using an 8K block size (the same as for the Xserve NFS mount) I again get 45 seconds for the first BLAST iteration. But when I increase the block size to 32K, the time for the first BLAST iteration drops to 30 seconds, comparable to the Xserve case.
> I'm not sure what this means. Possibly the block-size result implies some sort of read-ahead would improve things, but turning on read-ahead on the RAID controller did not improve the performance of the SAS-disk-based BLAST. Is the problem possibly an IOPS limitation, solvable by putting more disks in the RAID 0 array? The NFS block-size results imply some sort of tuning should be possible even with the existing disks, but I'm not sure what to try.
> Anyone have any ideas?
> Bioclusters maillist - Bioclusters at bioinformatics.org