[Bioclusters] NFS performance with multiple clients.
landman at scalableinformatics.com
Wed Nov 24 09:50:27 EST 2004
Humberto Ortiz Zuazaga wrote:
>We've only got 12 nodes or so up (AC problems), and our users are
>already complaining about lousy disk IO.
I have a rule of thumb when building clusters: never more than about 10
machines (20 CPUs) to a single pipe on the file server. If you need
more performance, you should look at alternatives for the file server and
for its connection to the cluster. A SAN will not help you here.
>I've written up a summary of some tests I've run. I'd like people to
>read it and tell me if this performance is adequate for our hardware, or
>if we should be looking for problems.
>Here is a summary of bonnie++ results for 1, 2, 4, 6 and 8 simultaneous
>bonnie runs on the cluster. These are the averages over however many bonnie
>processes were run simultaneously; results are in KB/sec.
>#Procs   ch-out  blk-out       rw    ch-in   blk-in
>1         10285    10574    12116    28753    71982
>2          4296     4386      954    16965    22997
>4          2336     2266      412     7870     7913
>6          1098      602      286     2789     3545
>8          1322      970      181     2518     2750
I see a number of issues, but first, this beautifully (and sadly)
illustrates what I have been calling the "1/N" problem for years. When
you take a single shared resource of fixed size (the single network pipe
into your server) and start sharing it among N requestors (N processors or
nodes requesting traffic on that network pipe), you introduce
contention, and each node gets less of the resource on average as you
increase the number of nodes. That is, you are sharing a slow pipe
among N heavy users of that pipe, and each heavy user will on average
get about 1/N of it.
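
To make the 1/N point concrete, here is a minimal sketch that compares the quoted bonnie++ averages against an ideal 1/N split of the single-client figure. The numbers are copied from the table above; the function names are made up for illustration, and the "ideal" model assumes the single-client result is the full capacity of the pipe, which may not hold.

```python
# Per-process bonnie++ averages from the quoted table, in KB/s.
# Columns: ch-out, blk-out, rw, ch-in, blk-in.
RESULTS = {
    1: (10285, 10574, 12116, 28753, 71982),
    2: (4296, 4386, 954, 16965, 22997),
    4: (2336, 2266, 412, 7870, 7913),
    6: (1098, 602, 286, 2789, 3545),
    8: (1322, 970, 181, 2518, 2750),
}
COLUMNS = ("ch-out", "blk-out", "rw", "ch-in", "blk-in")


def ideal_share(single_client_kbs, n):
    """Throughput each of n clients would see under a perfect 1/N split."""
    return single_client_kbs / n


def compare(column):
    """Return (n, measured, ideal, measured/ideal) rows for one column."""
    idx = COLUMNS.index(column)
    base = RESULTS[1][idx]  # assumption: one client saturates the pipe
    rows = []
    for n in sorted(RESULTS):
        measured = RESULTS[n][idx]
        ideal = ideal_share(base, n)
        rows.append((n, measured, ideal, measured / ideal))
    return rows


if __name__ == "__main__":
    for n, measured, ideal, ratio in compare("blk-out"):
        print(f"{n} procs: measured {measured:6d} KB/s, "
              f"ideal 1/N {ideal:8.1f} KB/s, ratio {ratio:.2f}")
```

For the block-write column, every multi-client row comes in below its ideal 1/N share, which is what you would expect once contention overhead is added on top of the plain division of the pipe.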
Second, it appears that your performance through your switch is
abysmal. One processor reading and writing should be able to hit about
25-35 MB/s on a gigabit network-mounted RAID 5 from a 3ware card. This
is what I see on mine, connected to an older/slower Athlon. You are
getting about 10 MB/s on the single-client write tests, which is roughly
what a 100 Mb/s link can carry (12.5 MB/s before protocol overhead). In
fact it looks suspiciously like your network has switched to 100 Mb/s
mode somewhere along the line. Check each machine with
mii-tool or ethtool to see what state the networks are in. I had a
switch that continuously renegotiated speed until I turned this off on
the head node and compute nodes. It looks like your head node may have
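
As a rough sketch of the link-speed check, the snippet below reads the negotiated speed that Linux kernels expose under /sys/class/net/<iface>/speed, where available. This is an alternative to the mii-tool/ethtool route named above, not a replacement for it; older kernels may not provide the file at all, and the helper names and the 1000 Mb/s default are assumptions for illustration.

```python
import os


def link_speeds(sysfs_root="/sys/class/net"):
    """Map interface name -> negotiated speed in Mb/s (None if unknown)."""
    speeds = {}
    for iface in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, iface, "speed")
        try:
            with open(path) as fh:
                value = int(fh.read().strip())
        except (OSError, ValueError):
            value = None  # no speed file, link down, or virtual interface
        # Some drivers report -1 when the speed is unknown.
        speeds[iface] = value if value is not None and value > 0 else None
    return speeds


def slow_links(speeds, expected_mbps=1000):
    """Return interfaces whose known speed is below expected_mbps."""
    return {name: s for name, s in speeds.items()
            if s is not None and s < expected_mbps}


if __name__ == "__main__":
    for name, mbps in slow_links(link_speeds()).items():
        print(f"{name}: negotiated {mbps} Mb/s, expected gigabit")
```

Run it on the head node and on each compute node; any interface it prints has a known speed below gigabit and is worth a closer look with ethtool.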
Third, there are a number of kernel tunables that can improve the disk
IO performance. If you bug me, I can find a link for you.
Fourth, RAID 5 is not a high performance architecture. It is safe, just
not fast on writes (even with 3ware units). RAID 5 can tolerate a
single disk failure. RAID 1 is a mirror and can tolerate the failure of
one drive. RAID 10 is a RAID 0 (stripe) of RAID 1 (mirror) sets; it
consumes lots of disk, but it is quite fast. RAID 50 is a RAID 0
(stripe) of RAID 5 (parity) sets; it consumes less disk, and is fast.
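
The disk-consumption trade-off between these levels can be sketched as a small calculation. This assumes equal-sized disks, two-way mirrors for RAID 10, and equal-sized RAID 5 sets for RAID 50; the function name and the 8-disk example are hypothetical and not taken from the hardware under discussion.

```python
def usable_disks(level, n_disks, raid5_group=None):
    """Return usable capacity in units of one disk, for equal-sized disks.

    raid5_group is the number of disks per RAID 5 set (RAID 50 only).
    """
    if level == "RAID1":
        if n_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return 1  # every disk holds the same data
    if level == "RAID5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return n_disks - 1  # one disk's worth of parity
    if level == "RAID10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return n_disks // 2  # assumes two-way mirrors
    if level == "RAID50":
        if raid5_group is None or raid5_group < 3:
            raise ValueError("RAID 50 needs RAID 5 sets of 3 or more disks")
        if n_disks % raid5_group or n_disks // raid5_group < 2:
            raise ValueError("RAID 50 needs 2 or more equal RAID 5 sets")
        # one disk's worth of parity per RAID 5 set
        return n_disks - n_disks // raid5_group
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    # Hypothetical 8-disk array, RAID 50 built from two 4-disk sets.
    print("RAID 5 :", usable_disks("RAID5", 8))
    print("RAID 10:", usable_disks("RAID10", 8))
    print("RAID 50:", usable_disks("RAID50", 8, raid5_group=4))
```

With 8 disks this gives 7, 4, and 6 disks of usable space respectively, which matches the ordering described above: RAID 10 costs the most disk, RAID 50 sits in between.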
RAID 5 is slow on writes, as each write smaller than a full stripe is a
read-modify-write operation. Best I have seen out of 3ware 75xx/85xx
series units for RAID 5 writes is about 30-40 MB/s. Considering that
your network pipe should be capable of about 100 MB/s (gigabit), you
should be bottlenecked at the disk, not the network. You might want to
consider moving to a RAID 10 if possible. You lose storage space, but
gain speed.
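
For anyone wondering why read-modify-write costs so much, here is a toy in-memory model of a RAID 5 small write. It uses the standard XOR parity update (new parity = old parity XOR old data XOR new data) and simply counts the disk operations. It is an illustration only: real controllers cache, reorder, and may do full-stripe writes instead, and all names here are invented for the sketch.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks):
    """Parity block for a stripe: XOR of all its data blocks."""
    return reduce(xor_bytes, blocks)


def small_write(stripe, parity_block, index, new_data):
    """Overwrite one data block in place.

    Returns (new_parity, reads, writes) for this toy model.
    """
    old_data = stripe[index]      # read 1: old data block
    old_parity = parity_block     # read 2: old parity block
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    stripe[index] = new_data      # write 1: new data block
    return new_parity, 2, 2       # write 2: new parity block


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = parity(stripe)
    p, reads, writes = small_write(stripe, p, 1, b"\xaa\xbb")
    print("parity still consistent:", p == parity(stripe))
    print("disk operations for one logical write:", reads + writes)
```

One logical write turns into four disk operations in this model, whereas a mirror needs only the two writes. That is the gap you are paying for on RAID 5.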
You can also look at alternative architecture disks, or server mods. If
you contact me offline, I can give you some ideas.
>The rw (rewrite) results are especially lousy. Even with two bonnie
>clients, performance drops precipitously.
>Any comments, tips or suggestions?
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 612 4615