[Bioclusters] RE: non linear scale-up issues?

Chris Dwan bioclusters@bioinformatics.org
Tue, 11 May 2004 18:56:46 -0500


> Slinging a terabyte or two of traffic over the same worm-rotten, 
> occasionally-managed corporate network that handles things like 
> payroll, HR, business apps etc. just to get some CPU cycles from a 
> bunch of cheap $900 desktop CPUs can be, um... problematic.

I agree with this completely.

I try to treat data motion as an "out of band" problem which is 
completely decoupled from the CPU scheduling and access problem.  I 
have found that we can get good use out of those $900 desktops provided 
that I'm allowed to reserve 20GB (or so) for my target set and that I 
can populate that 20GB with my target data via cron / rsync / whatever 
on an automatic basis.  All the scheduler really needs to know is 
whether or not the data is already on a particular node.
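
To make the "data is already on the node" test concrete, here is a minimal 
sketch in Python.  It is an illustration only, not the configuration of any 
real scheduler: the Node class, the eligible_nodes function, and the node and 
dataset names ("desk01", "nr", and so on) are all made up for the example.  
It assumes some out-of-band process (cron / rsync / whatever) keeps each 
node's "staged" set up to date.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical record the scheduler keeps for one desktop machine."""
    name: str
    idle: bool = True
    # Names of datasets already copied to this node's reserved local disk.
    # Assumed to be maintained by a separate sync job, not by the scheduler.
    staged: set = field(default_factory=set)


def eligible_nodes(nodes, dataset):
    """Return names of idle nodes that already hold the dataset locally."""
    return [n.name for n in nodes if n.idle and dataset in n.staged]


# Example inventory (all names are invented for illustration).
nodes = [
    Node("desk01", idle=True, staged={"nr", "genbank"}),
    Node("desk02", idle=True, staged=set()),
    Node("desk03", idle=False, staged={"nr"}),
]

# Only desk01 is both idle and already has "nr" on local disk.
print(eligible_nodes(nodes, "nr"))
```

The point is that the scheduler never moves data itself.  It only filters on 
a flag that the sync process sets, so job dispatch stays decoupled from 
network traffic.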

This comes back to a very old saw indeed:  Not all problems are suited 
to parallel computing.

-Chris Dwan