Thank you, Joe and everyone else for the insight. I guess one never
stops learning :)

Ognen

>-----Original Message-----
>From: Joseph Landman [mailto:landman@scalableinformatics.com]
>Sent: Monday, April 21, 2003 2:56 PM
>To: biocluster
>Subject: Re: [Bioclusters] blast and nfs
>
>
>Hi Ognen:
>
>On Mon, 2003-04-21 at 14:58, Duzlevski, Ognen wrote:
>> Hi all,
>>
>> we have a 40-node cluster (2 CPUs each) and a cluster master that has
>> attached storage over fibre, pretty much a standard thingie.
>
>Bottleneck #1...
>
>> All of the nodes get their shared space from the cluster master over
>> NFS. I have a user who has set up an experiment that fragmented a
>
>Bottleneck #2.
>
>[...]
>
>> Are there any usual tricks or setup models used in setting up
>> clusters? For example, all of my nodes mount the shared space with
>> rw/async/rsize=8192,wsize=8192 options. How many nfsd threads usually
>> run on a master node? Any advice as to the locations of NCBI
>> databases vs. shared space? How would one go about measuring and
>> observing the bottlenecks?
>
>Local disk space via local IDE channels on a 40-node cluster has a
>higher aggregate bandwidth than the NAS. Assuming old, slow disks
>transferring at 15 MB/s, 40 of these running in parallel give you 600
>MB/s of IO capacity (non-blocking at that). Your NAS device gives you
>in theory 200 MB/s, though it is likely to be less. If you use more
>modern IDE drives that can talk at 30 MB/s, you will have about a 6:1
>bandwidth advantage for local IO over the NAS (more like 12:1 over the
>network) ...
>
>... and that would be true if the NAS were the weakest link in the
>chain. It is not. It is the network. If you are lucky and using a
>gigabit network, then you have a 100 MB/s connection to the head node.
>No matter how fast that disk array is, you still have 100 MB/s to the
>head node. As you increase the size of the cluster, each node's share
>of this pipe out of the head node drops as 1/N(nodes). This is the 1/N
>effect I occasionally talk about. Your scalability drops in this model
>as you increase the number of nodes or the remote disk utilization.
>
>OK, so now we know where the problem is. What can be done?
>
>1) Pre-distribute the databases.
>2) Local IO only during the run.
>3) Combine results at the end of the run, or at the end of batches of
>runs.
>
>Increasing the number of nfsd threads probably will not help.
>Increasing the read and write sizes may help in a few cases, but not
>likely this one.
>
>You might look at dividing up the network access to the head node
>between multiple network adapters. You will need different IPs for
>them, and a specific network fabric design to enable this, not to
>mention somewhat more complex mounting/routing efforts. But it can be
>done. What you are doing here is postponing the problem rather than
>solving it, by moving it back onto the PCI bus of the head node. You
>will still be network bound, and the machines will still run
>sluggishly, but likely less so than before.
>
>Joe
>
>--
>Joseph Landman, Ph.D.
>Scalable Informatics LLC
>email: landman@scalableinformatics.com
> web : http://scalableinformatics.com
>phone: +1 734 612 4615
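
For anyone who wants to play with Joe's numbers, here is a small
back-of-the-envelope sketch of the bandwidth argument and the 1/N
effect, in Python. Every figure in it is one of the hypothetical
values from the message above (15 and 30 MB/s IDE drives, a 200 MB/s
NAS, a ~100 MB/s gigabit pipe into the head node), not a measurement:

    NODES = 40
    NAS_MB_S = 200.0   # theoretical bandwidth of the fibre-attached storage
    PIPE_MB_S = 100.0  # gigabit link into the head node

    # Local IO scales with the node count: every node reads its own disk.
    for disk_mb_s in (15.0, 30.0):  # old vs. more modern IDE drives
        aggregate_local = NODES * disk_mb_s
        print(f"{disk_mb_s:.0f} MB/s drives: "
              f"{aggregate_local:.0f} MB/s aggregate local IO, "
              f"{aggregate_local / NAS_MB_S:.0f}:1 vs. the NAS, "
              f"{aggregate_local / PIPE_MB_S:.0f}:1 vs. the network")

    # NFS does not scale: all 40 nodes share the one pipe into the
    # head node, so each node's share drops as 1/N (the 1/N effect).
    print(f"per-node NFS share: {PIPE_MB_S / NODES:.1f} MB/s")

With the 30 MB/s drives this prints the 6:1 and 12:1 ratios from the
message, and a 2.5 MB/s per-node NFS share, which is why each node is
better off reading its own 15-30 MB/s disk.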
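
And a minimal sketch of the pre-distribute / local-IO / combine recipe
in steps 1-3 above. Everything in it is an assumption for
illustration: the node names, the /scratch and /shared paths, and the
use of rsync over ssh are placeholders for whatever your cluster
actually provides.

    import subprocess

    # Hypothetical layout; adjust to your cluster.
    NODES = [f"node{i:02d}" for i in range(1, 41)]
    DB_SRC = "/shared/db/nr"       # database on the head node's storage
    DB_DST = "/scratch/db/"        # local IDE disk on each compute node
    RESULTS = "/scratch/results/"  # per-node output, on local disk

    def predistribute():
        """Step 1: push the database to every node's local disk once,
        before the runs start, instead of hitting NFS during the runs."""
        for node in NODES:
            subprocess.run(["rsync", "-a", DB_SRC, f"{node}:{DB_DST}"],
                           check=True)

    def gather():
        """Step 3: pull per-node results back to shared storage at the
        end of the run, or at the end of a batch of runs."""
        for node in NODES:
            subprocess.run(["rsync", "-a", f"{node}:{RESULTS}",
                            "/shared/results/"], check=True)

    # Step 2 is simply pointing the BLAST jobs at the local copy
    # (e.g. blastall -d /scratch/db/nr), so the only NFS traffic left
    # is the copy in and the copy out.

The shape is the point: the shared storage is touched exactly twice
per batch, once on the way in and once on the way out, and all of the
IO during the run itself stays on the local IDE channels.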