Duzlevski, Ognen wrote:
> Hi all,
>
> we have a 40 node cluster (2 CPUs each) and a cluster master that has
> attached storage over fibre, pretty much a standard thingie.
>
> All of the nodes get their shared space from the cluster master over
> NFS. I have a user who has set up an experiment that fragmented a
> database into 200,000 files which are then being blasted against the
> standard NCBI databases, which reside on the same shared space on the
> cluster master and are visible on the nodes (he basically rsh-es into
> all the nodes in a loop and starts jobs). He could probably go about
> his business in a better way, but for the sake of optimizing the
> setup I am actually glad that testing is being done the way it is.
>
> I noticed that the cluster master itself is under heavy load (it is a
> 2 CPU machine), and most of the load comes from the nfsd threads
> (kernel-space NFS is used).
>
> Are there any usual tricks or setup models used in setting up
> clusters? For example, all of my nodes mount the shared space with
> rw,async,rsize=8192,wsize=8192 options. How many nfsd threads usually
> run on a master node? Any advice as to the location of the NCBI
> databases vs. shared space? How would one go about
> measuring/observing for the bottlenecks?

Hi Ognen,

There are many people on this list who have similar setups and have worked around NFS-related bottlenecks in various ways, depending on the complexity of their needs.

One easy way to avoid NFS bottlenecks is to realize that BLAST is _always_ going to be performance-bound by IO speed, and that access to local disk is generally going to be far faster than your NFS connection. Done right, local IDE drives in a software RAID configuration can get you better speeds than a direct GigE connection to a NetApp filer or Fibre Channel SAN.
Another way to put this: you will NEVER (well, not without exotic storage hardware) be able to build an NFS fileserver that cannot be swamped by lots of cheap compute nodes doing long sequential reads against network-mounted BLAST databases. You need to engineer around the NFS bottleneck that is slowing you down.

All you need to do is have enough local disk in each of your compute nodes to hold all (or some) of your BLAST datasets. The idea is to use the NFS-mounted BLAST databases only as a 'staging area' for rsync'ing or copying your files to scratch or temp space on the compute nodes. Given the low cost of 40-80GB IDE disk drives, this is a quick and easy way to get around NFS-related bottlenecks. Each search can then run against local disk on each compute node rather than all nodes hitting the NFS fileserver and beating it to death... This is what most BLAST farm operators do as a "first pass" approach. It works very well and is pretty much standard practice these days.

The "second pass" approach is more complicated. It involves splitting your BLAST datasets into RAM-sized chunks, distributing them across the nodes in your cluster, and then multiplexing each query across all the nodes to get faster throughput. This is harder to implement and is useful only for long queries against big databases, since there is a certain amount of overhead in merging the multiplexed query results back into one human- or machine-parsable file. People implement the 'second pass' approach only when they really need to, usually in shops where pipelines constantly repeat the same big searches over and over again.

My $.02 of course

-Chris
www.bioteam.net
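The 'staging area' idea above can be sketched as a per-node script. The paths, the "nr" database name, and the blastall invocation are illustrative assumptions, and the sketch fakes the NFS side with mktemp so it runs end to end; on a real node you would point NFS_DB at your shared mount and SCRATCH at a local disk:

```shell
#!/bin/sh
# Sketch: stage a BLAST database from NFS-mounted shared space to local
# scratch, then search against the local copy instead of the NFS server.
# Illustrative assumptions: on a real node you would set something like
#   NFS_DB=/shared/blastdb   SCRATCH=/scratch/blastdb   DB=nr
NFS_DB=$(mktemp -d)            # stand-in for the NFS-mounted staging area
SCRATCH=$(mktemp -d)/blastdb   # stand-in for local IDE/RAID scratch space
DB=nr

# Fake formatdb output files so this sketch is runnable end to end.
touch "$NFS_DB/$DB.nhr" "$NFS_DB/$DB.nin" "$NFS_DB/$DB.nsq"

mkdir -p "$SCRATCH"

# rsync only copies what changed, so re-staging after a database update
# is cheap; fall back to a plain cp if rsync is not installed.
rsync -a "$NFS_DB/$DB".* "$SCRATCH/" 2>/dev/null \
    || cp "$NFS_DB/$DB".* "$SCRATCH/"

# The actual search then hits local disk, not the NFS fileserver:
#   blastall -p blastn -d "$SCRATCH/$DB" -i query.fa -o query.out
echo "staged $DB to $SCRATCH"
```

Run once per node (or out of your rsh loop) before kicking off the searches; repeated runs are nearly free because rsync skips files that have not changed.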
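The 'second pass' split-and-merge idea can be sketched the same way. The awk round-robin split, the two-chunk count, and the faked tabular hit files below are illustrative assumptions standing in for real per-node formatdb/blastall runs:

```shell
#!/bin/sh
# Sketch: split one big FASTA database into chunks, search each chunk on
# its own node, then merge the per-chunk hits into one ranked result.
WORK=$(mktemp -d)
cd "$WORK"

# Toy database standing in for a big NFS-hosted FASTA file.
cat > db.fa <<'EOF'
>seq1
ACGT
>seq2
GGCC
>seq3
TTAA
>seq4
CCGG
EOF

# Round-robin the FASTA records across two chunk files (chunk0.fa, chunk1.fa).
awk '/^>/ { n++ } { print > ("chunk" (n % 2) ".fa") }' db.fa

# On a real cluster each chunk would be formatdb'd and searched on a
# different node, e.g.:
#   blastall -p blastn -d chunk0 -i query.fa -m 8 -o chunk0.hits
# Faked tabular hits (subject, bit score) stand in for those runs:
printf 'seq2\t75.0\nseq4\t8.5\n'  > chunk0.hits
printf 'seq1\t50.0\nseq3\t12.0\n' > chunk1.hits

# The merge step: concatenate and re-rank by score so the multiplexed
# results come back as one machine-parsable file.
sort -k2,2nr chunk*.hits > merged.hits
head -1 merged.hits
```

Note this merge only re-sorts by score; a production pipeline would also need to correct E-values for the full database size, which is part of the overhead that makes the second pass worth doing only for big, repeated searches.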