justin at ukans.edu said:

> That's a good point. We haven't considered slow networks or very
> large files.
>
> I have, and that's my problem with 1) how Paos passes objects -- it
> sends the whole thing. I would prefer just sending updates. Breaking
> up the data into linked objects could be an adequate compromise.

Paos makes me nervous too. It looks complex, and I can't see what it
buys us over CORBA. ORBit is already a standard part of GNOME, and we
may as well leverage as much as we can from other efforts.

> 2) the independently roaming object concept where it's passed
> directly from tool to tool. Without a "home" everything has to be
> passed, and by the end of a complex series, that could be a large
> object.
>
> I'm beginning to think the optimal solution is a virtual interface (or
> set of optional interfaces) across all junctions. It's the most
> efficient (only what the receiving end wants is sent [and only the
> receiving end really knows what it wants]).

So, data objects have a URI, and a locus can request the data it needs
by URI. The local locid can fetch remote data objects and cache them.
Each part of a pipeline of loci can request only the data objects it
needs. Your local locus requests the results it wants, and only those,
and displays them for you. This way only the necessary data objects
need be transferred.

Imagine a service that annotates a BLAST search: your locus sends the
sequence data to the BLAST server; the BLAST server sends the matching
GenBank UIDs to the annotation server; the annotation server may have a
local copy of GenBank and gets the sequences from there, then sends the
UIDs and the feature annotations back to your local locus, which may
have to fetch some of the UIDs from GenBank, then applies the
annotations and displays the result.

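To make the fetch-by-URI idea concrete, here is a rough sketch of how a local locid might serve requests from several pipeline stages while transferring each object only once. Everything in it is an assumption for illustration: the `CachingFetcher` class, the `locus://` URI scheme, and the in-memory `REMOTE` dictionary standing in for the network are all invented, and nothing here reflects an existing Loci or Paos API.

```python
from typing import Callable, Dict, List


class CachingFetcher:
    """Hypothetical stand-in for a local locid: fetch by URI, cache the result."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote      # performs the real (slow) transfer
        self._cache: Dict[str, bytes] = {}     # URI -> object already transferred
        self.transfers = 0                     # count of remote transfers made

    def get(self, uri: str) -> bytes:
        # Only go to the network the first time a URI is requested.
        if uri not in self._cache:
            self._cache[uri] = self._fetch_remote(uri)
            self.transfers += 1
        return self._cache[uri]


# Fake remote store in place of a network; these URIs are made up.
REMOTE: Dict[str, bytes] = {
    "locus://example/seq/1": b"ACGTACGT",
    "locus://example/annot/1": b"gene 1..8",
}

fetcher = CachingFetcher(lambda uri: REMOTE[uri])

# Each pipeline stage asks only for the data objects it needs.
stage_requests: List[List[str]] = [
    ["locus://example/seq/1"],
    ["locus://example/seq/1", "locus://example/annot/1"],
    ["locus://example/seq/1"],
]
for wanted in stage_requests:
    for uri in wanted:
        fetcher.get(uri)

print("objects requested:", sum(len(w) for w in stage_requests))
print("remote transfers: ", fetcher.transfers)
```

The stages make four requests between them, but only two distinct objects ever cross the (pretend) network. A real locid would also need cache invalidation and some notion of object versions, which this sketch ignores.
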
> It's completely language independent, as well as "junction"
> independent (each end has a standard interface, regardless of whether
> a C, Python, or Perl script is on the other end, or whether the two
> are communicating via CORBA, TCP/IP, UDP/IP, shared memory, a pipe, or
> a dynamically-loaded plug-in interface).

This sounds good, and can help make sure we don't overcommit to PAOS.
We just need a simple way of communicating between loci: "here's this
data, please run foo v2 on it", "have your results, formatted for bar
v1".

> This interface method requires a home location where the object
> resides throughout its processing life-time. This is what I had
> envisioned the work flow system to be (i.e. coordinating its various
> objects, where and when they connected, etc). This could be located on
> the client machine, and it allows the various other loci to be really
> dumb (which means small).

Data objects can be identified by URIs, with special URIs for data on a
local disk (the locid will have to have some way to service requests
for your local data, possibly from multiple loci). But now say we want
to run a five-step pipeline on 2GB worth of genomic sequences: each of
the five loci may want a copy of the sequence, which means our machine
will send the file five times. Try that over a modem! Caching at loci
hubs can help solve this problem.

--
Humberto Ortiz Zuazaga
Bioinformatics Specialist
Institute of Neurobiology
hortiz at neurobio.upr.clu.edu
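
P.S. A rough sketch of what the "here's this data, please run foo v2 on it" message could look like if the data travels by reference (a URI) rather than by value. The field names, the `RunRequest` class, the `locus://` URI, and the choice of JSON as the wire encoding are all assumptions made up for illustration; any encoding that both ends can parse would serve equally well.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunRequest:
    """Hypothetical request passed between loci."""
    data_uri: str       # where the input lives; the data itself is not embedded
    tool: str           # tool to run, e.g. "foo"
    tool_version: int   # requested tool version, e.g. 2
    reply_format: str   # how the sender wants results formatted, e.g. "bar v1"


def encode(req: RunRequest) -> bytes:
    # Plain text on the wire, so the same message works over any transport.
    return json.dumps(asdict(req), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> RunRequest:
    return RunRequest(**json.loads(raw.decode("utf-8")))


request = RunRequest(
    data_uri="locus://example/seq/1",   # invented URI
    tool="foo",
    tool_version=2,
    reply_format="bar v1",
)
wire = encode(request)
print(wire.decode("utf-8"))
print(decode(wire) == request)
```

The request stays a few dozen bytes no matter how large the sequence file is; the receiving locus decides whether and when to pull the data behind `data_uri`.
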