> Slinging a terabyte or two of traffic over the same worm-rotten,
> occasionally-managed corporate network that handles things like
> payroll, HR, business apps etc. just to get some CPU cycles from a
> bunch of cheap $900 desktop CPUs can be, um... problematic.

I agree with this completely. I try to treat data motion as an
"out of band" problem which is completely decoupled from the CPU
scheduling and access problem.

I have found that we can get good use out of those $900 desktops
provided that I'm allowed to reserve 20GB (or so) for my target set
and that I can populate that 20GB with my target data via cron /
rsync / whatever on an automatic basis. All the scheduler really
needs to know is whether or not the data is already on a particular
node.

This comes back to a very old saw indeed: Not all problems are
suited to parallel computing.

-Chris Dwan
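
To make the scheduler point concrete, here is a minimal sketch of a placement check that only asks "is the data already on this node?". It is an illustration under assumptions, not any particular scheduler's API: the `Node` class, the `pick_node` function, the node names, and the dataset name `target_set` are all hypothetical. The staging itself (cron / rsync / whatever) is assumed to happen elsewhere and simply keep `staged` up to date.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical record of one desktop in the pool."""
    name: str
    idle: bool = True
    # Names of datasets already copied to this node's reserved disk space.
    staged: set = field(default_factory=set)


def pick_node(nodes, dataset):
    """Return the first idle node that already holds `dataset`, else None.

    No data is moved here; a job is only placed where the data already is.
    """
    for node in nodes:
        if node.idle and dataset in node.staged:
            return node
    return None


# Hypothetical pool: two nodes have the target set staged, one is busy.
nodes = [
    Node("desk01", idle=True, staged={"target_set"}),
    Node("desk02", idle=False, staged={"target_set"}),
    Node("desk03", idle=True),
]

chosen = pick_node(nodes, "target_set")
print(chosen.name if chosen else "no node has the data; wait for staging")
```

Returning None rather than triggering a copy keeps data motion out of the scheduling path, which is the decoupling described above.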