Anand,

NAS is good for manageability of data, but it cannot really scale that well for the type of application you are talking about unless you invest in expensive NAS filers from NetApp or BlueArc. For high I/O request rates with small-file access, you pay a high penalty for NFS/TCP/IP overhead with a NAS architecture, and you are limited to roughly 100 MB/s per GigE connection (unless you trunk multiple GigE connections into the NAS filer).

If performance and availability of data are the primary constraints, I would go for one of the following configurations:

Option 1: True SAN (performance driven).
Storage hardware: FC disk array module, SAN, >= 4 I/O servers (front end), a dual-port FC adapter per host, and an FC switch sized to the number of I/O hosts. To cut cost, go for SATA disks instead of FC disks. E.g. LSI Logic/Engenio, EMC, etc.
Software: Red Hat (formerly Sistina) GFS, Lustre, or the PolyServe file system to aggregate multiple volumes and export a single global namespace.

Option 2: SAN using iSCSI technology (a cheaper model than Option 1).
Storage hardware: SCSI disk array modules, SAN, >= 4 I/O servers (front end), a single iSCSI GigE adapter with TCP offload per host, and a GigE switch sized to the number of I/O hosts. To cut cost, go for SATA disks instead of SCSI disks. E.g. LeftHand Networks, FalconStor, etc.
Software: Red Hat (formerly Sistina) GFS, Lustre, or the PolyServe file system to aggregate multiple volumes and export a single global namespace.
Although this still involves TCP/IP overhead, the distributed nature of the file system and storage handles concurrent requests better than a NAS.

Option 3: A distributed NAS model such as Panasas.

Option 4: Direct-attached storage with a cluster file system (Lustre, GFS, GPFS) to aggregate the storage capacity of the individual nodes. Although it is the cheapest, it is rather difficult to manage, and availability is a concern.

-Kums
__
Kumaran Rajaram
Verari Systems, Inc.
Phone: 205-314-3471 x208

> Hello,
>
> I wanted to know which is the better alternative for a cluster of 48 nodes
> (dual processor) working 24x7 on life science problems with extensive I/O
> on small files. The kind of I/O I am talking about is small file reads and
> writes (10-20 KB each), with 10,000s of these operations hitting the file
> system simultaneously. How well does a distributed file system like GPFS
> on a SAN work, compared to NAS storage?
>
> We are in the process of designing a cluster for a life science problem
> that will work on 10,000s of files simultaneously from across the Linux
> cluster, and we are hung up on the storage options: the pros and cons of
> GPFS on SAN versus a NAS device. If somebody could point me in the right
> direction it would be great, because the few sites I have read say NAS
> devices are the preferred option, but I couldn't find the reasons to
> support either one of them.
>
> Thanks
>
> ASB
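P.S. As a rough sanity check on the numbers in the original question, here is a back-of-envelope sketch (Python) of the aggregate bandwidth such a workload implies. The 15 KB average file size, the 30% NFS/TCP/IP protocol overhead, and the ~100 MB/s usable per GigE link are illustrative assumptions, not measured figures; substitute your own benchmarks.

```python
# Back-of-envelope sizing for the small-file workload described above.
# ASSUMPTIONS (not from the thread): 10,000 ops/s averaging 15 KB each,
# ~100 MB/s usable per GigE link, ~30% of payload lost to NFS/TCP/IP
# protocol overhead on small transfers.

OPS_PER_SEC = 10_000
AVG_FILE_KB = 15
GIGE_MB_PER_SEC = 100
NFS_OVERHEAD = 0.30            # fraction of payload spent on protocol overhead

payload_mb = OPS_PER_SEC * AVG_FILE_KB / 1024   # raw file data per second
wire_mb = payload_mb * (1 + NFS_OVERHEAD)       # what actually crosses the wire
links_needed = -(-wire_mb // GIGE_MB_PER_SEC)   # ceiling division

print(f"payload: {payload_mb:.0f} MB/s, on-wire: {wire_mb:.0f} MB/s, "
      f"GigE links needed: {links_needed:.0f}")
# -> payload: 146 MB/s, on-wire: 190 MB/s, GigE links needed: 2
```

Even with these conservative assumptions, a single untrunked GigE port into a NAS filer is already the bottleneck, which is why the options above spread the load across >= 4 I/O servers.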