[Bioclusters] Use of FPGA based BLAST accelerators?

Chris Dwan (CCGB) bioclusters@bioinformatics.org
Thu, 12 Dec 2002 20:50:35 -0600 (CST)


> if I have a new algorithm coded yesterday, how will it perform "as
> is" on these systems, and how much work (and how cooperative is TimeLogic)
> in making it screaming fast?

There is no "compiler" for the TimeLogic (TL) system.  When they
release a new version of the board (think of a flash memory upgrade),
it sometimes comes with new supported algorithms.  Usually not. 

You can't put your own algorithm on their hardware.  If you want
something added, you have to work with TimeLogic.

> Would you agree that in order to run things practically you wouldn't
> really want to "just" have the TimeLogic machine and not the normal 
> cluster?

Bingo.  The TL device frees up the cluster for custom apps.  So far as
I'm able to enforce it, the cluster never wastes any time running
BLAST, HMMer, and the other applications that are available on the
TL. 

Of course, that's a dream scenario.  In reality, I'm able to push for
large, routine BLAST and HMMer jobs to be on the TL. 

> how much extra nuisance/time do you think this two-way approach asks for?
> i.e. in a standard system there is no lag between prototyping and running,
> because if it works on one of the cluster CPUs, it will work on the rest (of
> course with a good sysadmin)

Hard to say.  On the one hand, all the tools on the TL are pretty much
commodity, so they're not the nifty and difficult part of your
system.  (How much time do you spend debugging BLAST?)

On the other hand, the value of having the same binaries on all your
cluster machines can't be overstated in terms of getting parsers to
work reliably.

A good question might be:  "What is the optimal ratio of specialized
hardware to general-purpose farm nodes, assuming that price
per throughput is pretty much equal between the two?"

> Finally, at our end, we like to have a fully automated pipeline that works
> with LSF/SGE/etc and takes care of each job, input, output, etc. How
> easy/difficult do you see this sort of system on the TimeLogic machine? i.e.
> is it a black box with its proprietary buttons that I need to push, or can I
> go in and build and use my own buttons?

TL has their own queuing system which loads the jobs onto and off the
boards.  

I have some middleware (a metascheduler, if you want to dignify it)
to manage job interactions between the dedicated cluster in the
machine room (SGE on Linux on P-IIIs), a student lab of Apple G4s
(cycle stealing using SGE), and the TimeLogic.
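
To make the routing idea concrete, here is a minimal sketch of the
kind of decision such middleware has to make.  It is not the actual
middleware: the Job fields, the size threshold, and the policy are all
hypothetical, chosen only to mirror the three resources named above
and the goal of keeping large, routine BLAST and HMMer jobs on the TL.

```python
from dataclasses import dataclass

# Applications the post names as available on the TL hardware.
TL_APPS = {"blast", "hmmer"}

@dataclass
class Job:
    app: str            # e.g. "blast", "hmmer", or a custom tool name
    n_queries: int      # rough size of the job (hypothetical measure)
    routine: bool       # True for production runs, False for prototyping
    preemptible: bool   # True if the job tolerates being killed and rerun

def choose_resource(job, large_threshold=1000):
    """Pick "timelogic", "lab", or "cluster" for one job.

    The threshold and the policy are illustrative assumptions.
    """
    is_tl_app = job.app.lower() in TL_APPS
    if is_tl_app and job.routine and job.n_queries >= large_threshold:
        return "timelogic"   # large, routine BLAST/HMMer work
    if job.preemptible:
        return "lab"         # cycle-stealing desktops can vanish at any time
    return "cluster"         # everything else, including custom apps
```

With these assumptions, choose_resource(Job("blast", 50000, True,
False)) returns "timelogic", while a custom application of the same
size lands on the cluster.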

You could imagine a really beautiful "grid"-like system if it makes
you happy to think about such things.  That's not what we have at 
all.  We really have a gross set of legacy hacks that get the job
done and use the (free) resources that I can negotiate my way onto.

It's pretty trivial to build your own hooks for the TL command line
tools and use those from a DRM like LSF, PBS, SGE, or Condor.  I have
zero experience with the TL API, because I haven't needed to mess with
it. 
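
A hook of this sort can be as small as the sketch below.  It assumes
only that the tool is an ordinary command-line program that writes its
results to stdout and signals failure with a nonzero exit status.  The
function name run_tool is made up for illustration, and no TL command
names or flags are shown because none are given here; the caller
supplies the full argument list.

```python
import subprocess
import sys

def run_tool(argv, output_path):
    """Run one command-line tool and save its stdout to output_path.

    argv is the complete command line as a list (program name first).
    Returns the tool's exit status so the DRM can record success or
    failure for the job.
    """
    with open(output_path, "w") as out:
        result = subprocess.run(
            argv, stdout=out, stderr=subprocess.PIPE, text=True
        )
    if result.returncode != 0:
        # Pass the tool's error text through to the job's stderr log.
        sys.stderr.write(result.stderr)
    return result.returncode
```

A job script submitted through the DRM (with qsub under SGE or PBS,
for example) would call run_tool with the site's own command line and
exit with the value it returns, so the scheduler sees failures as
failures.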

Someday, we may make the jump to Globus, but not today.

I hope this is helpful to you.

  Chris Dwan
  Center for Computational Genomics and Bioinformatics
  University of Minnesota