[Bioclusters] Login & home directory strategies for PVM?

Michael Gutteridge mgutteri at fhcrc.org
Wed Feb 9 12:52:13 EST 2005


Ultimately, I'm not too worried about the ability of the network or NFS 
server to handle the traffic.  However, as the number of nodes grows, 
I'm becoming very concerned about the sheer number of mounts that are 
required, with a corresponding increase in the RSS of the NFS server 
processes.  I've got some really big hardware, so I may be making a 
bigger deal of this than necessary, but with the number of nodes 
doubling every year... I'll have to burn this bridge at some point.
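
Just to put rough numbers on that worry, here's the sort of 
back-of-envelope sketch I've been doing (the node count, mounts per 
node, and per-mount memory cost below are illustrative guesses, not 
measured figures from our servers):

# Back-of-envelope projection of NFS mount counts as the cluster
# doubles each year.  Starting node count, mounts per node, and the
# per-mount memory cost are illustrative guesses only.
nodes = 64            # assumed current node count
mounts_per_node = 4   # e.g. $HOME, databases, scratch, site software
kb_per_mount = 64     # rough guess at server-side memory per mount

for year in range(5):
    total = nodes * mounts_per_node
    print("year %d: %5d nodes, %6d mounts, ~%d MB on the server" %
          (year, nodes, total, total * kb_per_mount / 1024))
    nodes *= 2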

What I'm considering doing for now is using PVFS2 for home directories 
($HOME).  PVFS2 seems to have improved its redundancy, and home 
directories on the cluster would only be temporary storage anyway, so 
that's not too much of a concern.  Databases are another question: I'm 
debating between PVFS2 and rsync'ing/distributing the database 
directories to the nodes.  I guess I could go with AFS/OpenAFS for that 
purpose too.
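
If I go the rsync route, the distribution step would probably look 
something like this minimal sketch (the node list file and the source 
and destination paths are placeholders, not our actual layout):

#!/usr/bin/env python
# Minimal sketch: push a database directory to every compute node with
# rsync over ssh.  Hostnames and paths are placeholders, not our setup.
import subprocess

NODES_FILE = "nodes.txt"   # one hostname per line (placeholder)
SRC = "/data/blastdb/"     # master copy on the head node (placeholder)
DST = "/local/blastdb/"    # local destination on each node (placeholder)

def push(node):
    """rsync the database directory to one node; return the exit status."""
    return subprocess.call(
        ["rsync", "-a", "--delete", SRC, "%s:%s" % (node, DST)])

if __name__ == "__main__":
    nodes = [line.strip() for line in open(NODES_FILE) if line.strip()]
    failed = [node for node in nodes if push(node) != 0]
    if failed:
        print("rsync failed on: " + ", ".join(failed))

The appeal of that approach is that database reads then stay on local 
disk and never touch NFS at all.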

Long term, I'm going to evaluate Lustre and GPFS, I think.  Lustre has 
some impressive clients, but I'm using the 2.6 kernel, so I'd want to 
wait until Lustre's support for it gets a little more mature.  GPFS 
would work well with some of the AIX gear I've got...

Thanks for the advice and help

Michael



On Feb 7, 2005, at 1:14 AM, Tim Cutts wrote:

>
> On 6 Feb 2005, at 11:04 am, Tony Travis wrote:
>
>> Hello, Tim.
>>
>> We only have a 'small' 64-node cluster here :-)
>>
>> However, I've opted to use the BOBCAT architecture:
>>
>> 	http://www.epcc.ed.ac.uk/bobcat/
>>
>> Although the original EPCC BOBCAT no longer exists, its spirit lives 
>> on in our RRI/BioSS cluster:
>>
>> 	http://bobcat.rri.sari.ac.uk
>>
>> The important thing is to have TWO completely separate private 
>> network fabrics: One for DHCP/NFS, the other for IPC. The main 
>> problem we have is that IPC (i.e. Inter Process Communication) can 
>> swamp the bandwidth of a single network fabric and you rapidly lose 
>> control of the cluster.
>
> We don't have any IPC.  We don't run any parallel code.  Each job runs 
> on a single CPU.  And NFS *still* causes problems, occasionally.  It 
> really isn't a myth at this scale.  It's unusable.  For example, we 
> have to make separate copies of the LSF binaries on all of the 
> machines, because to do it the Platform-endorsed way, with everything 
> NFS-mounted, is a bit flaky.  The NFS contention from LSF's 
> housekeeping alone can be enough to break the cluster.
>
> I suspect if you're running large parallel jobs, then the number of 
> NFS operations involved is relatively low.  The issue for us is 
> sometimes hundreds of jobs completing every minute, all trying to read 
> some data files and then create three or four output files on an 
> NFS-mounted disk.  That's a lot of separate NFS operations, a large 
> proportion of which are the particularly painful directory operations. 
> I plead with the users not to write code like this, but you know what 
> users are like.
>
>> I think there are some MYTHS about NFS and clusters around because of 
>> the bandwidth contention on a single network fabric. The NFS network 
>> traffic on our cluster is completely segregated from the IPC traffic 
>> which is throttled by the bandwidth of its own network fabric. The 
>> switches on the two network fabrics are NOT connected in any way...
>
> Our approach is actually similar to yours; we're moving towards 
> cluster filesystems like GPFS and Lustre, and in those cases, we run 
> the cluster filesystem traffic over a second network.  It's actually a 
> VLAN on the same switches, but that's not the performance problem you 
> might think because the Extreme switches we use are fully 
> non-blocking.  You can throw an absolutely obscene number of packets 
> at them and they cope fine.  They even held up when a Ganglia bug 
> caused a machine to emit thousands of multicast packets to all 1000 
> machines every second: the ganglia daemons went to 100% CPU coping 
> with the incoming packets, which made the cluster almost unusable, 
> but the network itself was still going strong.
>
> Tim
>
> -- 
> Dr Tim Cutts
> Informatics Systems Group, Wellcome Trust Sanger Institute
> GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233
>


