[Bioclusters] Question about grid

Dan Bolser bioclusters@bioinformatics.org
Thu, 13 May 2004 14:57:52 +0100 (BST)

On Thu, 13 May 2004, Arnon Klein wrote:

>That's a really exciting concept, but why stop at dedicated cpu-service 
>servers? Why not harness P2P into this, where each client has the 
>ability both to donate cycles and use other people's cycles.
>I can see this benefitting organisations such as universities, where 
>there is a lot of unused desktop power, and many people who need that 
>power, but for short periods only, so didn't get dedicated facilities.

Right! I was struggling with the problem of how to include non-web-server
machines in the system (behind a firewall, not running a web server, on the
same LAN as the web server). If the web server could privately talk to
'internal' nodes (via the intranet), it could then expose them via its
internet connection.

Cool idea!

>So imagine running these servers/client hybrids, that accept code (in 
>binary, bytecode, or source code format), on anywhere between thousands 
>to millions of computers...
>If the code is self-contained, then the bandwidth and latency issues are 
>not as big as they would be for small chunks of instructions. I think 
>that even with very fast networks, latencies will kill the benefits when 
>scaling into something larger than a single LAN segment, so to avoid it, 
>you have to batch the instructions together (i.e. send complete functions).
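
To put rough numbers on the batching argument above, here is a
back-of-the-envelope sketch in Java. The 100 microsecond LAN round trip and
the 1 nanosecond instruction time are assumed round figures, not
measurements, and the class and method names are made up for illustration.

```java
public class LatencyEstimate {
    /** Total seconds when each of n instructions pays its own network round trip. */
    static double perInstructionSeconds(long n, double roundTripSec, double computeSec) {
        return n * (roundTripSec + computeSec);
    }

    /** Total seconds when instructions travel in batches, one round trip per batch. */
    static double batchedSeconds(long n, long batchSize, double roundTripSec, double computeSec) {
        long batches = (n + batchSize - 1) / batchSize; // ceiling division
        return batches * roundTripSec + n * computeSec;
    }

    /** How many powers of ten separate one round trip from one local instruction. */
    static double ordersOfMagnitude(double roundTripSec, double computeSec) {
        return Math.log10(roundTripSec / computeSec);
    }

    public static void main(String[] args) {
        double roundTrip = 100e-6; // ASSUMPTION: 100 microsecond LAN round trip
        double compute = 1e-9;     // ASSUMPTION: 1 nanosecond per local instruction
        long n = 1_000_000;        // instructions in the job

        System.out.printf("one at a time: %.3f s%n",
                perInstructionSeconds(n, roundTrip, compute));
        System.out.printf("single batch:  %.4f s%n",
                batchedSeconds(n, n, roundTrip, compute));
        System.out.printf("round trip vs instruction: about %.0f orders of magnitude%n",
                ordersOfMagnitude(roundTrip, compute));
    }
}
```

With these assumed figures, one round trip costs about five orders of
magnitude more than one instruction. A million instructions sent one at a
time take roughly 100 seconds, against about a millisecond when they travel
as a single batch.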

I see, so the 'calculator' as I describe it could be an actual
programming language? Sending code is all well and good, but it adds the
complexity which I want to remove. 

However, if you could 'install' your code 'on-the-web' (i.e. standard
packages exposed on a server), then we could all use the same code /
distribute code (packages) in a P2P environment.


>Doing this in Java is actually pretty easy, since RMI lets you transport 
>an object containing both code and data over the network.
>You put up a server, exposing a method such as:
>interface Computable {
>	public Object compute() throws Exception;
>}
>public Object compute(Computable job) throws RemoteException;
>and using the Java RMI facilities, call this method on the server with 
>an object that implements a method called "compute" that does the 
>Of course, like you said, security and accounting issues will pose 
>problems for a widespread installation.
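
For reference, a minimal sketch of how the two quoted declarations might fit
together. Only Computable and the two compute signatures come from the
message above. The names ComputeSketch, ComputeServer, LocalComputeServer
and SumJob are assumptions added for illustration. To keep the example
self-contained the server is called in-process. A real deployment would
still have to export it (for example with UnicastRemoteObject.exportObject)
and make the job classes loadable on the server side.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

public class ComputeSketch {
    /** A job carrying its own code and data; Serializable so RMI can ship it. */
    public interface Computable extends Serializable {
        Object compute() throws Exception;
    }

    /** ASSUMED name for the remote interface the server would expose. */
    public interface ComputeServer extends Remote {
        Object compute(Computable job) throws RemoteException;
    }

    /** Runs the job in the server's own JVM and wraps any failure for the caller. */
    public static class LocalComputeServer implements ComputeServer {
        @Override
        public Object compute(Computable job) throws RemoteException {
            try {
                return job.compute();
            } catch (Exception e) {
                throw new RemoteException("job failed", e);
            }
        }
    }

    /** Hypothetical example job: sums the integers 1..n. */
    public static class SumJob implements Computable {
        private final int n;

        public SumJob(int n) {
            this.n = n;
        }

        @Override
        public Object compute() {
            long total = 0;
            for (int i = 1; i <= n; i++) {
                total += i;
            }
            return total; // boxed to Long
        }
    }

    public static void main(String[] args) throws Exception {
        // In-process call only; no registry lookup or network hop happens here.
        ComputeServer server = new LocalComputeServer();
        System.out.println(server.compute(new SumJob(100))); // prints 5050
    }
}
```

Note that the server runs whatever compute() contains, which is exactly
where the security and accounting concerns come from.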

This is the problem with sending code again, and why existing grid
projects are quite complex (as I understand them).

If you send low level code, security and accounting are not a problem. You
just have to deal with load balancing, client selection, etc.
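
As a rough illustration of what I mean by low level code, here is a sketch
of a 'calculator' that accepts only a fixed set of primitive operations and
never executes anything the client sends. The class name, the four
operations and the "OP a b" request format are all assumptions made up for
this example. They are not a proposed protocol, and the HTTP / CGI / XML
layer is left out entirely.

```java
public class Calculator {
    /** The complete set of operations a client may request. */
    public enum Op { ADD, SUB, MUL, DIV }

    /** Applies one primitive operation to two operands. */
    public static long apply(Op op, long a, long b) {
        switch (op) {
            case ADD: return a + b;
            case SUB: return a - b;
            case MUL: return a * b;
            case DIV:
                if (b == 0) {
                    throw new ArithmeticException("division by zero");
                }
                return a / b;
            default:
                throw new IllegalArgumentException("unknown op " + op);
        }
    }

    /** Handles a request such as "MUL 6 7" (hypothetical wire format). */
    public static long handle(String request) {
        String[] parts = request.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected: OP a b");
        }
        // valueOf throws IllegalArgumentException for anything outside the enum.
        Op op = Op.valueOf(parts[0]);
        return apply(op, Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        System.out.println(handle("MUL 6 7")); // prints 42
    }
}
```

A request naming any operation outside the enum is rejected before any work
is done, so the server never has to sandbox client code. The price is one
request per operation, which is where the latency question comes back in.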

Thanks for your comments,


>Dan Bolser wrote:
>>I had an idea to do with grid computing, but it may be total garbage.
>>I heard about some clever people who started to 'steal' computation from
>>unsuspecting web sites by hijacking the normal function of the site and
>>co-opting its computations into a different program. 
>>If these stories are true, surely we could do this with a bit more
>>civility, and set up a bunch of generic 'calculators' through the web
>>which could then be used for grid computing.
>>The way I imagine the system is this... 
>>Program starts by searching the web for calculators, the code is compiled
>>for the 'web-engine' so every single instruction is encoded as an HTTP /
>>CGI / XML request, and all instructions are performed over the web on a
>>shifting number of calculators.
>>Actually, I found something similar here...
>>I wanted to ask about the feasibility of such an idea. 
>>For example if one machine sent all its instructions to another over a
>>gigabit intranet, how much slower would this be than local computation?
>>Is a gigabit LAN 1/2/3/10/100/1000 orders of magnitude slower than
>>internal CPU communication channels?
>>The power of an open source system like this would be if someone like
>>Apache would take the idea on board and release it as part of its standard
>>distribution. However, even if every web server on the web were running
>>such a calculator (why not be ambitious), could the system be fast enough?
>>Naturally there are a lot of issues regarding distribution / allocation /
>>scheduling etc. but before we get into nasty details, is the idea remotely
>>worth consideration?
>>How difficult would it be to make a Java compiler accommodate such a 
>>Thanks very much for any feedback,
>>Bioclusters maillist  -  Bioclusters@bioinformatics.org