[Pipet Devel] Distributed Filesystems WAS Choice of ORB implementations

Sat Apr 8 10:36:04 EDT 2000

I'm going to try to summarize some of the stuff Jarl and Jeff have 
been talking about and then offer my opinions. Please correct me if I 
have misrepresented anyone.

My quick summary:

Okay, so I originally proposed a way to use the naming and trading 
services of CORBA to manage remote vsh instances and "search" for 
specific nodes within registered nodes.
    Jeff and Jarl have proposed an alternative way of doing things 
using a distributed filesystem approach. The choices that were 
mentioned for implementing this were:

Jungle Monkey ->  http://www.junglemonkey.net/
Gnutmeg -> http://sourceforge.net/project/?group_id=3965

My thoughts:
    I guess I sort of see what you guys are thinking about, but I 
don't really understand exactly how you are proposing to work this 
type of system into vsh. Let me try to think through it:

1. There is a central server (at TOL) that all instances of vsh will 
register with.

2. This central server will allow services for browsing and searching 
the registered vsh instances (like the screenshots on the Jungle 
Monkey page).

3. Each vsh instance will make files available that remote users can 
browse. Here, I don't really understand what kind of files you want to 
make available--xml files describing available nodes and subnets? 

4. A user looking through the files finds a subnet they want to run on 
a remote vsh implementation. Then they need to connect to the remote 
vsh system (via CORBA) and request the available subnet information. 
How does the user get the object reference for the remote system?

5. Then things would work the same in both of the proposals. The local 
user would incorporate the remote node into their work flow diagram 
and submit the work flow diagram for processing, and then during 
processing all of the dl -> bl and bl -> dl and dl -> dl stuff would 
have to happen to do the proper processing (I won't go into that again 
here since that's not what we are discussing).
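
As a rough sketch of steps 1, 2 and 4 above, here is one way the 
register/search/resolve flow could look. Every name in it 
(VshInstance, CentralRegistry, the placeholder reference strings) is 
hypothetical and not taken from vsh, Jungle Monkey, or Gnutmeg. It 
also assumes the central server stores an object reference for each 
instance, which is only one possible answer to the question in step 4.

```python
# Hypothetical sketch: none of these names come from vsh or any real library.
from dataclasses import dataclass, field


@dataclass
class VshInstance:
    """A vsh instance and the subnets it advertises."""
    name: str
    object_ref: str  # stand-in for a CORBA object reference string
    subnets: list = field(default_factory=list)


class CentralRegistry:
    """Steps 1 and 2: instances register, users browse and search."""

    def __init__(self):
        self._instances = {}

    def register(self, instance):
        # Step 1: an instance announces itself to the central server.
        self._instances[instance.name] = instance

    def search(self, subnet_name):
        # Step 2: find every registered instance advertising a subnet.
        return [inst for inst in self._instances.values()
                if subnet_name in inst.subnets]

    def resolve(self, instance_name):
        # Step 4 (assumption): the registry hands back the object reference.
        return self._instances[instance_name].object_ref


registry = CentralRegistry()
registry.register(VshInstance("sun1", "IOR:placeholder-1", ["blast"]))
registry.register(VshInstance("sun2", "IOR:placeholder-2", ["blast", "align"]))

matches = registry.search("align")
ref = registry.resolve(matches[0].name)
```

If the reference can be obtained some other way, the resolve step 
drops out and the registry is only a convenience for browsing.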

So, do I have it semi-right? What are your ideas for how this will 
work?

I also have a couple of other issues:

1. As Jarl mentioned, the distributed filesystem plan depends heavily 
on a central server to handle everything, and then we get into 
problems with load on the server and what happens if it goes down. 
Although I would propose that we start _initially_ with a central 
naming and trading service, it is possible (i.e., it has been done--not 
that I can do it yet :) to distribute these services over multiple 
computers. How does the distributed filesystem plan scale up?

2. How does authentication work for viewing everyone's files? Does all 
of this occur in the central server? If so, it seems like this might 
make the central server a big target for cracking, since anyone who 
broke in would have access to everyone's files. For the CORBA services 
plan, the authentication would occur at each vsh instance, which might 
at least make any single machine less of a target.

3. Jeff, you wrote this:

> Napster has an interesting security feature, even if it was
> unintentional: You don't connect directly to a remote system.  You
> connect to the Napster server, and the Napster server connects to
> the remote system.  How would something like that jive with our
> plans?

Does this imply that every connection and filesharing and everything 
has to funnel through the main server? For what I want to use vsh for, 
this would be a serious pain. For a biological example (sorry, that's 
all I can think of :-), let's say I had three Suns running local BLAST 
servers and had 30,000 sequences to query. I'd like to be able to use 
vsh to "distribute" the request over the three Suns (i.e., 10,000 on 
each) as a kind of cheap cluster or something. This is entirely a 
local use (and could even be occurring on one local network), but would 
I have to funnel all the connections through some remote central 
server? I don't like this, and would rather not even have to register 
my Suns with the remote server.
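
To make the BLAST example concrete, here is a minimal sketch of the 
splitting step only, with no central server involved. The host names 
and the split_evenly helper are made up for illustration, and the 
actual hand-off of each chunk to a BLAST server is left out.

```python
# Hypothetical sketch: split_evenly and the host names are illustrative only.
def split_evenly(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


sequences = [f"seq{i}" for i in range(30000)]  # placeholder query names
hosts = ["sun1", "sun2", "sun3"]               # the three local Suns

# Map each host to the block of queries it would process.
assignments = dict(zip(hosts, split_evenly(sequences, len(hosts))))
```

With 30,000 queries and three hosts, each host ends up with 10,000 
queries, and everything stays on the local network.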
     The thing I like about the CORBA services is that they are 
voluntary and more of a convenience--if you could get hold of an 
object reference in another way, you wouldn't even have to use them. 
How will the distributed filesystem plan work?
    Sorry to be so long--thanks if you read through all of this :-)

Brad