[Bioclusters] Re: Using semi-public PCs for heavy computation jobs

Mon, 16 Feb 2004 10:30:56 -0800

Hi Arnon,

For Windows, you might also look into some of the work being done by David
Lifka and crew at the Cornell Theory Center HPC Solutions Center,
http://www.ctc-hpc.com.

I built a database-centric job scheduling/execution environment using all
Windows-based tools (rcmd, ...) while at Perlegen, where it was used to
process a 100 TB+ microarray image repository.  You still need to get the
SysAdmins involved - but access/participation can be controlled via an Active
Directory security model and standard Windows services / apps (which may go
over better with IT).
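
To give a rough feel for what "database-centric" means here, below is a
minimal sketch in Python with SQLite. It is not the Perlegen system (which
used Windows tools such as rcmd). The table layout and the function names
(init_db, submit, claim_next, finish) are hypothetical and chosen only for
illustration. The idea is that the job table is the single source of truth,
and a worker claims a job with a conditional UPDATE so that two workers
cannot take the same one.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per job, with a status that workers update.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    conn.commit()


def submit(conn, command):
    # Queue a command and return its job id.
    cur = conn.execute("INSERT INTO jobs (command) VALUES (?)", (command,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn, worker):
    # Claim the oldest queued job for this worker; return (id, command) or None.
    while True:
        row = conn.execute(
            "SELECT id, command FROM jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause makes the claim conditional:
        # if another worker got there first, rowcount is 0 and we retry.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ?"
            " WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row


def finish(conn, job_id, ok=True):
    # Record the outcome so the database always reflects the cluster state.
    conn.execute(
        "UPDATE jobs SET status = ? WHERE id = ?",
        ("done" if ok else "failed", job_id),
    )
    conn.commit()
```

Each node (or idle desktop) would loop on claim_next, run the command, and
call finish. In a real deployment the database would be a shared server
rather than a local SQLite file, with access controlled by whatever security
model your IT group prefers.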

We had a 60+ node cluster, and at the time didn't need to cycle-steal from
our desktops, but the system was designed to do that.

Your biggest challenge will be data distribution, as described in a previous
post.  There are two basic strategies, depending on how much local disk you
have access to (or how fast your remote access is to shared storage).  

I'd be happy to share my experiences further - feel free to contact me
directly.

Bruce Moxon
Chief Solutions Architect, Panasas Inc.
Delivering the premier storage system for scalable Linux clusters

www.panasas.com
bmoxon@panasas.com
510-608-7778