[Bioclusters] mpi-blast performance on a 256-core cluster

George Magklaras georgios at biotek.uio.no
Fri Jan 4 14:59:24 EST 2008



Li Liu wrote:
> Dear biocluster members,
> 
> We've been struggling with the performance of mpi-blast 1.4 on our  
> 256-core cluster for almost a month. What we tried to do is to run  
> blastx search against NCBI NR database for 600K 454 reads. It ran for  
> a few days and stopped without giving any error message. I'm  
> wondering if any of you have any suggestions on
At what point did it stop? What state were your result files in? And, if 
possible, did your sysadmin notice anything odd in the I/O monitoring 
tools (preferably vmstat/iostat) while the job was running?
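
If you capture the output of something like "vmstat 5" on a node while 
the job runs, a few lines of Python are enough to pick out the samples 
worth a closer look. This is only a rough sketch: the function names, 
the thresholds and the sample numbers are my own assumptions for 
illustration, not measurements from any real cluster, and it assumes the 
usual Linux vmstat column layout (with the "bi" and "wa" columns).

```python
# Illustrative vmstat capture; the numbers are made up for this example.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 4  0      0  52000  12000 3500000    0    0    12     3  300  400 95  4  1  0
 2  3      0  48000   9000  900000    0    0 45000   200 1200 2500 20 10 10 60
"""


def parse_vmstat(text):
    """Turn captured vmstat text into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name line starts with "r" and contains "bi".
        if fields[0] == "r" and "bi" in fields:
            header = fields
            continue
        # Data lines are all-numeric and match the header width.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def busy_io_samples(rows, bi_threshold=1000, wa_threshold=20):
    """Return samples with heavy block reads or high I/O wait.

    Both thresholds are arbitrary assumptions; tune them for your hardware.
    """
    return [r for r in rows if r["bi"] >= bi_threshold or r["wa"] >= wa_threshold]


if __name__ == "__main__":
    for row in busy_io_samples(parse_vmstat(SAMPLE)):
        print("bi=%(bi)d wa=%(wa)d cache=%(cache)d" % row)
```

A sample where "bi" and "wa" are high while "cache" has shrunk is the 
sort of thing I would want to see the surrounding rows for.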
> 
> 1. Any alternative parallel blast program?
> 2. Did anybody observe frequent I/O operations with mpi-blast? Isn't  
> it supposed to load the database into memory and access the memory  
> from then on, rather than keep asking disk for database fragment?
> 
If you mean the filesystem buffer cache, then yes, that is normally what 
it should do, provided that the database fragment AND the input 
sequences you are accessing together do not exceed the RAM per node (I 
assume you are not running anything else on the cluster nodes while the 
job executes). We do not run such large jobs here, but on our small 
setup (10 nodes, 40 cores) I have seen buffer cache invalidation force 
some heavy I/O, purely because other jobs on the same nodes were writing 
large amounts of file data (more than 700 MB) and so evicting whatever 
buffer cache we had on 4 GB RAM nodes. To answer your question I need to 
know what exactly you class as frequent I/O, how much RAM you have per 
node/core, whether you are running anything else on each node, and how 
exactly your nodes access the blast db fragment data (local disk, FC, 
some other network FS? We just copy the fragments to local SATA-II disks 
on each node). Preferably, if you can show a few rows of vmstat output 
from some of the nodes while they run the job, so we can see what sort 
of disk activity and cache state they are in, that would help.
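
To make the "does it fit in RAM" question concrete, here is a rough 
back-of-the-envelope check in Python against the contents of 
/proc/meminfo. It is a sketch under stated assumptions: the helper names 
and the sample figures are hypothetical, and treating MemFree + Buffers 
+ Cached as the memory usable for caching is only an approximation.

```python
# Illustrative /proc/meminfo excerpt for a ~4 GB node; numbers are made up.
SAMPLE_MEMINFO = """\
MemTotal:        4046000 kB
MemFree:          200000 kB
Buffers:           50000 kB
Cached:          3000000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def cache_headroom_kb(info, fragment_kb, other_io_kb=0):
    """Estimate the cache left over once a db fragment is fully resident.

    usable = MemFree + Buffers + Cached is an assumption, not an exact
    figure. other_io_kb stands for file data that other jobs on the node
    push through the cache. A negative result suggests the fragment will
    keep being evicted and re-read from disk.
    """
    usable = info["MemFree"] + info["Buffers"] + info["Cached"]
    return usable - fragment_kb - other_io_kb


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print(cache_headroom_kb(info, fragment_kb=2000000))
    print(cache_headroom_kb(info, fragment_kb=2000000, other_io_kb=1500000))
```

On a real node you would feed it open("/proc/meminfo").read() and the 
on-disk size of the fragment files assigned to that node.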


-- 
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:	http://www.no.embnet.org/



