[Bioclusters] Linux cluster storage question (SAN/NAS/GPFS)

Kumaran Rajaram bioclusters@bioinformatics.org
Wed, 18 Aug 2004 13:58:24 -0500 (CDT)


Anand,

   NAS is good for manageability of data, but it does not scale well for
the type of application you are describing unless you invest in expensive
NAS filers from NetApp or BlueArc. For a high I/O request rate + small file
access, you pay a high penalty in NFS/TCP/IP overhead with a NAS
architecture, and you are limited to roughly 100 MB/s per GigE connection
(unless you trunk multiple GigE connections in the NAS filer).
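
As a rough back-of-envelope check on the GigE limit, here is a small sketch.
It assumes about 100 MB/s usable per link, takes the 10-20 KB file size and
10,000 simultaneous operations from the question quoted below, and ignores
protocol overhead entirely, so treat the numbers as a lower bound rather than
a prediction:

```python
# Back-of-envelope estimate only: ignores NFS/TCP/IP overhead, latency
# and metadata traffic, all of which make the real picture worse.
GIGE_MB_PER_S = 100      # assumed usable bandwidth of one GigE link
CONCURRENT_OPS = 10_000  # simultaneous small-file operations (from the question)


def burst_seconds(file_kb, links=1):
    """Seconds to move one burst of CONCURRENT_OPS files of file_kb KB each
    over `links` trunked GigE connections (1 MB taken as 1000 KB)."""
    total_mb = CONCURRENT_OPS * file_kb / 1000
    return total_mb / (GIGE_MB_PER_S * links)


for file_kb in (10, 20):
    for links in (1, 4):
        secs = burst_seconds(file_kb, links)
        print(f"{file_kb} KB files, {links} link(s): {secs:.2f} s per burst")
```

Under these assumptions a single link needs 1 to 2 seconds just to move the
data for one burst; the per-request overhead on small files comes on top of
that.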

    If performance and availability of data are the primary constraints,
I would consider one of the following configurations:

Option 1: True SAN: Performance Driven

Storage Hardware: FC Disk Array Module, SAN, >= 4 I/O servers (front end),
                  Dual port FC Adaptor per host, FC switch depending on
                  number of I/O hosts.
                  To reduce cost: use SATA disks instead of FC disks.
                  E.g.: LSILogic/Engenio, EMC, etc.

Software: Red Hat (formerly Sistina) GFS, Lustre, or PolyServe File System
         to aggregate multiple volumes and export a single global namespace.

Option 2: SAN using iSCSI technology: Cheaper model compared to Option 1

Storage Hardware: SCSI Disk Array Modules, SAN, >= 4 I/O servers (front end),
                  single iSCSI + TCP Offload GigE Adaptor, GigE switch
                  depending on number of I/O hosts.
                  To reduce cost: use SATA disks instead of SCSI disks.
                  E.g.: LeftHand Networks, FalconStor, etc.

Software: Red Hat (formerly Sistina) GFS, Lustre, or PolyServe File System
         to aggregate multiple volumes and export a single global namespace.
         This still involves TCP/IP overhead, but the distributed nature
         of the file system + storage handles concurrent requests better
         than NAS does.

Option 3: Use a distributed NAS model such as Panasas.

Option 4: Direct Attached Storage with a cluster file system (Lustre, GFS,
GPFS) to aggregate the storage capacity of individual nodes. Although it is
cheaper, it is harder to manage, and availability is a concern.
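
Whichever option you shortlist, it is worth measuring it with your own
access pattern first. Below is a minimal sketch of a small-file test,
assuming Python is available on the compute nodes. The function name, file
count and file size are illustrative choices of mine, not part of any vendor
tool, and a realistic test would run it from many nodes at once against the
candidate mount point:

```python
import os
import tempfile
import time


def small_file_benchmark(directory, n_files=1000, size_kb=16):
    """Write, then read back, n_files files of size_kb KB each in `directory`.

    Returns (write_files_per_s, read_files_per_s). Reads may be served from
    the client page cache, so read numbers from a single node are optimistic.
    """
    payload = os.urandom(size_kb * 1024)
    paths = [os.path.join(directory, f"f{i:06d}.dat") for i in range(n_files)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push data to storage, not just the cache
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            data = fh.read()
        assert len(data) == len(payload)
    read_s = time.perf_counter() - start

    # Clean up so repeated runs start from an empty directory.
    for path in paths:
        os.remove(path)
    return n_files / write_s, n_files / read_s


if __name__ == "__main__":
    # Point this at the file system under test; a temporary directory is
    # used here only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        w, r = small_file_benchmark(d, n_files=200)
        print(f"write: {w:.0f} files/s, read: {r:.0f} files/s")
```

A single-node run like this mostly shows per-file overhead; the concurrency
behaviour that separates NAS from the distributed options only shows up when
many clients run it together.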

-Kums

__
Kumaran Rajaram
Verari Systems, Inc.
Phone: 205-314-3471 x208

> Hello,
>
> I wanted to know which is the better alternative for a cluster of 48 nodes
> (dual processor) that runs 24x7 on life science problems involving
> extensive small-file I/O, where performance matters. The kind of I/O I am
> talking about is small file reads and writes (say 10-20 KB each), with
> 10,000s of these operations happening simultaneously on the file system.
> How well does a distributed file system like GPFS on SAN work, compared
> with NAS storage?
>
> We are in the process of designing a cluster for a life science related
> problem that will work on 10,000s of files simultaneously from across the
> Linux cluster, and we are hung up on the storage options: the pros and cons
> of GPFS on SAN versus a NAS device. If somebody could point me in the right
> direction it would be great, because a few sites I have read say NAS
> devices are the preferred option, but I couldn't find the reasons to
> support either one of them.
>
> Thanks
>
> ASB
>