Thank you, Joe and everyone else for the insight. I guess one never
stops learning :)

Ognen

>-----Original Message-----
>From: Joseph Landman [mailto:landman@scalableinformatics.com]
>Sent: Monday, April 21, 2003 2:56 PM
>To: biocluster
>Subject: Re: [Bioclusters] blast and nfs
>
>
>Hi Ognen:
>
>On Mon, 2003-04-21 at 14:58, Duzlevski, Ognen wrote:
>> Hi all,
>>
>> we have a 40-node cluster (2 CPUs each) and a cluster master that has
>> attached storage over fibre, pretty much a standard thingie.
>
>Bottleneck #1...
>
>> All of the nodes get their shared space from the cluster master over
>> NFS. I have a user who has set up an experiment that fragmented a
>
>Bottleneck #2.
>
>[...]
>
>> Are there any usual tricks or setup models used in setting up
>> clusters? For example, all of my nodes mount the shared space with
>> rw/async/rsize=8192,wsize=8192 options. How many nfsd threads usually
>> run on a master node? Any advice as to the locations of NCBI
>> databases vs. shared space? How would one go about measuring and
>> observing the bottlenecks?
>
>Local disk space via local IDE channels on a 40-node cluster has a
>higher aggregate bandwidth than the NAS. Assuming old, slow disks
>transferring at 15 MB/s, 40 of these running in parallel give you 600
>MB/s of IO capacity (non-blocking at that). Your NAS device gives you
>in theory 200 MB/s, though it is likely to be less. If you use more
>modern IDE drives that can talk at 30 MB/s, you will have about a 6:1
>bandwidth advantage for local IO over the NAS (more like 12:1 over the
>network) ...
>
>... and that would be true if the NAS were the weakest link in the
>chain. It is not. It is the network. If you are lucky and using a
>gigabit network, then you have a 100 MB/s connection to the head node.
>No matter how fast that disk array is, you still have 100 MB/s to the
>head node. As you increase the size of the cluster, each node's share
>of this pipe out of the head node drops as 1/N(nodes). This is the 1/N
>effect I occasionally talk about. Your scalability drops in this model
>as you increase the number of nodes or the remote disk utilization.
>
>OK, so now we know where the problem is. What can be done?
>
>1) Pre-distribute the databases.
>2) Local IO only during the run.
>3) Combine results at the end of the run, or at the end of batches of
>runs.
>
>Increasing the number of nfsd threads probably will not help.
>Increasing the read and write sizes may help in a few cases, but not
>likely this one.
>
>You might look at dividing up the network access to the head node
>between multiple network adapters. You will need different IPs for
>them, and a specific network fabric design to enable this, not to
>mention somewhat more complex mounting/routing efforts. But it can be
>done. What you are doing here is postponing the problem rather than
>solving it, by moving it back onto the PCI bus of the head node. You
>will still be network bound, and the machines will still run
>sluggishly, but likely less so than before.
>
>Joe
>
>--
>Joseph Landman, Ph.D.
>Scalable Informatics LLC
>email: landman@scalableinformatics.com
> web : http://scalableinformatics.com
>phone: +1 734 612 4615
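
For anyone who wants to play with Joe's numbers, here is a small
back-of-the-envelope sketch of the bandwidth argument and the 1/N
effect, in Python. Every figure in it is one of the hypothetical
values from the message above (15 and 30 MB/s IDE drives, a 200 MB/s
NAS, a ~100 MB/s gigabit pipe into the head node), not a measurement:

    NODES = 40
    NAS_MB_S = 200.0   # theoretical bandwidth of the fibre-attached storage
    PIPE_MB_S = 100.0  # gigabit link into the head node

    # Local IO scales with the node count: every node reads its own disk.
    for disk_mb_s in (15.0, 30.0):  # old vs. more modern IDE drives
        aggregate_local = NODES * disk_mb_s
        print(f"{disk_mb_s:.0f} MB/s drives: "
              f"{aggregate_local:.0f} MB/s aggregate local IO, "
              f"{aggregate_local / NAS_MB_S:.0f}:1 vs. the NAS, "
              f"{aggregate_local / PIPE_MB_S:.0f}:1 vs. the network")

    # NFS does not scale: all 40 nodes share the one pipe into the
    # head node, so each node's share drops as 1/N (the 1/N effect).
    print(f"per-node NFS share: {PIPE_MB_S / NODES:.1f} MB/s")

With the 30 MB/s drives this prints the 6:1 and 12:1 ratios from the
message, and a 2.5 MB/s per-node NFS share, which is why each node is
better off reading its own 15-30 MB/s disk.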
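
And a minimal sketch of the pre-distribute / local-IO / combine recipe
in steps 1-3 above. Everything in it is an assumption for
illustration: the node names, the /scratch and /shared paths, and the
use of rsync over ssh are placeholders for whatever your cluster
actually provides.

    import subprocess

    # Hypothetical layout; adjust to your cluster.
    NODES = [f"node{i:02d}" for i in range(1, 41)]
    DB_SRC = "/shared/db/nr"       # database on the head node's storage
    DB_DST = "/scratch/db/"        # local IDE disk on each compute node
    RESULTS = "/scratch/results/"  # per-node output, on local disk

    def predistribute():
        """Step 1: push the database to every node's local disk once,
        before the runs start, instead of hitting NFS during the runs."""
        for node in NODES:
            subprocess.run(["rsync", "-a", DB_SRC, f"{node}:{DB_DST}"],
                           check=True)

    def gather():
        """Step 3: pull per-node results back to shared storage at the
        end of the run, or at the end of a batch of runs."""
        for node in NODES:
            subprocess.run(["rsync", "-a", f"{node}:{RESULTS}",
                            "/shared/results/"], check=True)

    # Step 2 is simply pointing the BLAST jobs at the local copy
    # (e.g. blastall -d /scratch/db/nr), so the only NFS traffic left
    # is the copy in and the copy out.

The shape is the point: the shared storage is touched exactly twice
per batch, once on the way in and once on the way out, and all of the
IO during the run itself stays on the local IDE channels.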