[Bioclusters] quick look see at fractal computing.

Wed Feb 22 19:00:27 EST 2006

We have limited the query size initially in order to manage a surge in
usage, which we experienced today.  If you want to really blast us, please
contact Nick Robertson at nick at massivelyparallel.com.  He'll get ya hooked
up so you can test our system with a massive query.  You can also talk with
one of our mega users who blasts us at least once a quarter.  

K

-----Original Message-----
From: James Cuff [mailto:jcuff at broad.mit.edu] 
Sent: Wednesday, February 22, 2006 3:58 PM
To: bioclusters at bioinformatics.org
Subject: [Bioclusters] quick look see at fractal computing.

Hi all,

I was reading GenomeWeb News this morning, and an article about the Howard
Fractal-based computing(tm) and fractal-based communication(tm) models
rather caught my eye.

So I decided to take the new MPT Blast Query server over at
http://www.mptbiotech.com/ for an outing, just for a quick look see.

Standard disclaimers apply: this was just a quick test and it is probably
full of holes, for which I apologise in advance.

I sort of consider myself a 'DNA man' these days, so I decided to look at
the old faithful DNA/DNA blastn code, which always runs fairly badly on
clusters because of I/O, etc. etc. yada yada.

Anyway, my first big problem started when I found that there was a limit to
the amount of DNA one can put in the 'power user portal':

Errors Encountered
# Query (1) is 207954 aa long; this exceeds maximum allowable length of 7000
aa

No worries, I'll carry on.  So as a test we compared the bottom 6,700-odd
bases of chr5 of zebrafish:

node209 /tmp/ wc -c test2.mpt
   6737 test2.mpt
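
For anyone wanting to build a similar query that fits under the portal's
length limit, here is a minimal Python sketch of taking the tail end of a
FASTA record.  The function name and the 60-column wrapping are assumptions
made for illustration; the post does not say how test2.mpt was actually
produced.

```python
def tail_of_fasta(fasta_text: str, n_bases: int, width: int = 60) -> str:
    """Return a FASTA record holding only the last n_bases of the input record.

    Assumes a single-record FASTA string: one '>' header line followed by
    sequence lines.
    """
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    lines = fasta_text.strip().splitlines()
    header = lines[0]
    # Join the wrapped sequence lines back into one string.
    seq = "".join(line.strip() for line in lines[1:])
    tail = seq[-n_bases:]
    # Re-wrap the tail at the requested column width.
    wrapped = [tail[i:i + width] for i in range(0, len(tail), width)]
    return "\n".join([f"{header} (last {len(tail)} bases)"] + wrapped) + "\n"


example = ">chr5\nACGTACGTAC\nGGTTAACC\n"
print(tail_of_fasta(example, 6, width=4), end="")
```

Note that wc -c counts bytes, so the 6737 above would include any header
line and newlines in the file as well as the bases themselves.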

As a comparison we took a single machine with 4GB memory and the current NT
database split into 5 chunks (nt.00 nt.01 nt.02 nt.03 nt.04), which were
read in over a pretty loaded production NFS server; there is not enough
memory to cache it all.  I would like to point out that this is a *really*
bad configuration, but for the test it will do.
I just wanted a worst-case baseline scenario.

This was the result of our basic run:

time blastall -a2 -nT -p blastn -i test2.mpt -d nt > ourtest.out
46.250u 7.900s 0:30.33 178.5%   0+0k 0+0io 391341pf+0w
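
The percentage in that timing line is (user + system) CPU time divided by
wall-clock time, so with two threads (-a2) the ceiling is 200%.  A small
Python sketch of the arithmetic, using only the figures printed above (the
function name is made up for illustration):

```python
def cpu_utilisation(user_s: float, sys_s: float, elapsed_s: float) -> float:
    """CPU utilisation in percent: total CPU seconds over wall-clock seconds."""
    return 100.0 * (user_s + sys_s) / elapsed_s


# Figures from the timing line: 46.250u 7.900s 0:30.33
pct = cpu_utilisation(46.250, 7.900, 30.33)
print(f"{pct:.1f}% of one CPU, i.e. {pct / 2:.1f}% of the two threads requested")
```

So the run kept both threads busy for most of the 30 seconds, even with the
database coming in over NFS.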

The two copies of NT available here and at MPT were slightly different
sizes, so I report a letters/second number below:

*  MPT total RAIS time 10.45s for 14,192,730,777 letters
   (1358156055 letters / second)

*  A dual CPU Intel box took 30.33s for 15,994,705,008 letters
   (527355918 letters / second) 

So I make that a speedup of only about 2.58 times over a single dual
processor server.
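
For anyone who wants to check the sums, here is a minimal Python sketch of
the normalisation.  It assumes nothing beyond the database sizes and
timings quoted above; the variable and function names are made up.

```python
def letters_per_second(db_letters: int, seconds: float) -> float:
    """Database letters searched per second of wall-clock time."""
    return db_letters / seconds


# Figures quoted in the post.
mpt_rate = letters_per_second(14_192_730_777, 10.45)    # MPT server
local_rate = letters_per_second(15_994_705_008, 30.33)  # dual CPU Intel box

speedup = mpt_rate / local_rate

print(f"MPT:     {mpt_rate:,.0f} letters/second")
print(f"Local:   {local_rate:,.0f} letters/second")
print(f"Speedup: {speedup:.2f}x")
```

Dividing by database size like this only corrects for the two copies of NT
being different sizes; it says nothing about the difference in hardware
behind each number, and the ratio comes out a little under 2.6.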

We also produced 250 (blast default) alignments, whereas the MPT server
only managed to find 156, even with the limits set to ask for more.  So
something might also be slightly wrong there.

I guess the proof of the pudding would be to use much larger data sizes
and do a real bake off to see the real performance difference.

I'd love to see one of the vendor-agnostic groups that hang out on this
list work with MPT to really nail this down in an independent report.

I'm sure my simple-minded test here does not reflect the true power of
the method.

Best regards,

J.

_______________________________________________
Bioclusters maillist  -  Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters