[Pipet Devel] inter-locus communication

Wed Mar 3 03:13:57 EST 1999

What exactly is the role of Paos in Loci? As I understand it, Paos is
simply a way of moving objects back and forth. I think that Justin's
original suggestion was good, with some details that need to be worked
out.

With an XML parser, it should be trivial to convert a LociML file to an
object in any language (Python, Perl, C++, etc.).
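
A minimal sketch of that conversion in Python, using the standard
xml.etree.ElementTree parser. The <lociml> root tag and the sample document
are assumptions made up for illustration; only the section names <query>,
<data>, <control> and <status> appear in this proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical LociML document: the <lociml> root and all attribute
# values are placeholders, not part of any agreed format.
SAMPLE = """
<lociml>
  <query><step id="q1" /></query>
  <data><input><dna id="seq1">ACGT</dna></input></data>
  <control sessionID="my-analysis-1"><step id="q1" server="some.locus" /></control>
  <status><step id="q1" state="pending" /></status>
</lociml>
"""

def load_sections(text):
    """Return a dict mapping each top-level section name to its element."""
    root = ET.fromstring(text)
    return {child.tag: child for child in root}

def step_in(section, step_id):
    """Return the <step> element with the given id in a section, or None."""
    for step in section.iter("step"):
        if step.get("id") == step_id:
            return step
    return None

sections = load_sections(SAMPLE)
q1_status = step_in(sections["status"], "q1")
```

Each section can then be handled on its own, and the part of a section that
belongs to one step is a single lookup by id.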

Each of the four sections should be accessible separately, and the part of
that section corresponding to a specific step should be equally easy to
access. The state is the section that will need to be passed around
frequently; the others should be static, except for the data. The data can
have one part for the original data, and then subsequent parts for the
results of the analyses. The main difference from Justin's original model
that I'm suggesting regarding the structure is that _all_ sequence and
related data be stored in the <data> section and then referred to in the
queries. Parts would be appended to that with the output of a specific
step, containing chunks of data surrounded in an identifying XML tag, such
as <protein>, <dna>, <rna>, etc. with whatever identifiers would seem
appropriate for that step. Whatever acts as the master controller for this
analysis sequence will be in charge of putting all of these pieces
together and sending them back to the Workspace. The status section could
be sent over an open socket every time the client requests it (so that the
updating occurs as fast as the client and server can handle it, but not
faster). The data can be streamed over a separate socket for each step or
even for each part of each step.

So for a generalized example:
** indicates a "URL" that is accessed
" indicates data that is transmitted over the actual socket

** lociwfs://some.wfs.server/my-analysis?create

would be used to create a query with the name my-analysis + a unique
suffix. This name would be sent back to the client and would be the
'session ID'. The client would send all the relevant information at this
time, the <query> and the <data>. The server would return
" <control sessionID="ID_name">
"   <step id="q1" server="some.locus" />
"   [more steps]
"   <step id="q5" serverpending="true" />
"   [more steps]
" </control>
after figuring out which servers will do which step, or indicating that it
doesn't yet know. The connection can now be closed or left open. This
could be specified in the request or not specified at all (if the client
closes the connection, then it's closed).
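
On the client side, reading that reply could look roughly like the sketch
below. It assumes the attribute values are quoted so that the reply is
well-formed XML, and it treats a step without a server attribute as still
pending; the function name read_control is hypothetical.

```python
import xml.etree.ElementTree as ET

# Example reply in the shape described above; values are placeholders.
CONTROL = """
<control sessionID="ID_name">
  <step id="q1" server="some.locus" />
  <step id="q5" serverpending="true" />
</control>
"""

def read_control(text):
    """Return (session_id, {step_id: server name, or None if pending})."""
    control = ET.fromstring(text)
    assignments = {}
    for step in control.findall("step"):
        # Element.get returns None when the attribute is absent.
        assignments[step.get("id")] = step.get("server")
    return control.get("sessionID"), assignments

session_id, servers = read_control(CONTROL)
pending = [step_id for step_id, server in servers.items() if server is None]
```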

** lociwfs://some.wfs.server/sessionID?status

This would be the status socket, which could be closed and then
reconnected at any time. Upon connection, the server would send a control
section and then a status section:

" <status>
"   <step id="q1" state="finished">
"     <output type="protein" id="protein" size="121331" />
"     <message>Analysis finished.</message>
"   </step>
"   <step id="q2" state="failed">
"     <message>Analysis failed. Error: ... </message>
"   </step>
"   <step id="q3" state="aborted">
"     <message>Aborted by user</message>
"   </step>
"   <step id="q4" state="processing" completion="2%">
"     <output type="dna" id="dna1" size="121331" />
"        <!-- reports size of data available so far -->
"     <output type="dna" id="dna2" size="0" />
"     <message>Processing... Reading Sequence...</message>
"   </step>
"   <step id="q5" state="waiting">
"     <message>Waiting for output from step q4</message>
"   </step>
"   <step id="q6" state="pending">
"     <message>Searching for available server...</message>
"   </step>
" </status>
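
A rough sketch of how a client might digest such a status section, again
with ElementTree. The sample is a trimmed copy of the example above, and the
dictionary layout returned by read_status is just one possible choice.

```python
import xml.etree.ElementTree as ET

STATUS = """
<status>
  <step id="q1" state="finished">
    <output type="protein" id="protein" size="121331" />
    <message>Analysis finished.</message>
  </step>
  <step id="q4" state="processing" completion="2%">
    <output type="dna" id="dna1" size="121331" />
    <output type="dna" id="dna2" size="0" />
    <message>Processing... Reading Sequence...</message>
  </step>
</status>
"""

def read_status(text):
    """Return {step_id: {"state", "message", "outputs"}} for each step."""
    result = {}
    for step in ET.fromstring(text).findall("step"):
        result[step.get("id")] = {
            "state": step.get("state"),
            "message": step.findtext("message", default=""),
            # Size of the data available so far, keyed by output id.
            "outputs": {
                out.get("id"): int(out.get("size"))
                for out in step.findall("output")
            },
        }
    return result

status = read_status(STATUS)
```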

The client would then send
" <querystatus />
to get another status section.
Every time the control section is updated (a new server is found), a new
control section would be sent. This could also be requested explicitly by
" <querycontrol />

If the client sends
" <cancel />
at any time, the whole analysis would be aborted. Cancelling a single step with
" <cancel stepid="q1" />
should also be possible.
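
The client-side commands are all empty elements, so building them is a
one-liner. A small sketch, where the helper name command is hypothetical
and the element names are the ones used above:

```python
import xml.etree.ElementTree as ET

def command(name, **attrs):
    """Serialize an empty command element, e.g. a cancel for one step."""
    return ET.tostring(ET.Element(name, attrs), encoding="unicode")

query_status = command("querystatus")
query_control = command("querycontrol")
cancel_all = command("cancel")
cancel_q1 = command("cancel", stepid="q1")
```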

The socket would be closed by the server when all of the steps are either
aborted, failed, or finished. It could be closed by the client at any
time.
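
The closing rule amounts to a check over the step states. A sketch, assuming
the state names from the status example; how an empty session is treated is
an arbitrary choice here.

```python
# Terminal states as listed above. "processing", "waiting" and "pending"
# come from the status example and count as still running.
TERMINAL_STATES = frozenset({"aborted", "failed", "finished"})

def session_done(step_states):
    """Return True when every step is aborted, failed, or finished.

    step_states maps step id -> state string. An empty mapping counts as
    done, which is an assumption of this sketch.
    """
    return all(state in TERMINAL_STATES for state in step_states.values())

example = {"q1": "finished", "q2": "failed", "q3": "aborted", "q4": "processing"}
still_running = sorted(
    step_id for step_id, state in example.items() if state not in TERMINAL_STATES
)
```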

** lociwfs://some.wfs.server/sessionID?cancel[.stepid]

would do the same as the <cancel ... /> 'command'. The server would then
send a complete <control> and <status> section.

** lociwfs://some.wfs.server/sessionID?data[.stepid[.blockid#offset]]

would send all available data (or from a specific set or block of a set).
Offsets only make sense for the specific blocks of the step, as they
change as the output grows. The data would be streamed to the client as
more is made available if the request was for a specific block. Actually,
now that I think about it, the data for a step or all the data shouldn't
be made available like this until all of the parts/steps are finished
processing. Partial data should only be available for a specific part.
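
To make the request syntax concrete, here is a sketch of a parser for the
data "URL". It assumes that step and block ids contain neither '.' nor '#',
and it lets the #offset be omitted when a block is named; both are
assumptions of this sketch, not something settled above.

```python
import re

# sessionID?data[.stepid[.blockid#offset]]
_DATA_REQUEST = re.compile(
    r"^lociwfs://(?P<host>[^/]+)/(?P<session>[^?]+)\?data"
    r"(?:\.(?P<step>[^.#]+)(?:\.(?P<block>[^.#]+)(?:#(?P<offset>\d+))?)?)?$"
)

def parse_data_request(url):
    """Split a data request into host, session, step, block and offset.

    Parts that were not given are returned as None. An offset is only
    accepted together with a block, since offsets only make sense there.
    """
    match = _DATA_REQUEST.match(url)
    if match is None:
        raise ValueError("not a data request: %r" % url)
    parts = match.groupdict()
    if parts["offset"] is not None:
        parts["offset"] = int(parts["offset"])
    return parts

block_request = parse_data_request(
    "lociwfs://some.wfs.server/my-analysis-1?data.q4.dna1#1024"
)
all_data_request = parse_data_request("lociwfs://some.wfs.server/my-analysis-1?data")
```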

** lociwfs://some.wfs.server/sessionID?reject

I think we should include this to prevent the server from being loaded up
with unnecessary sessions. However, that brings up an important point:

Should we implement a login/password system and the ability to control
read permissions? I guess permissions should be rw for the creator
of the session and ro for others. If a wfs wants to keep certain data
unreadable to certain people, I don't think we need to implement that;
they should either allow general access or access only to those with
accounts. This information should also be available, possibly in a
separate <meta> section that would be sent in the final type of request:

** lociwfs://some.wfs.server/sessionID?info
** lociwfs://some.wfs.server/sessionID?report
** lociwfs://some.wfs.server/sessionID?fullreport

would send an entire report of the session, complete with all control
information, the final status information (to keep error messages
available), and the query. If ...?report is specified, then the server
would send output data as well. If ...?fullreport is specified, input data
would also be sent. This would only be accessible after the analysis is
complete and the session is closed.
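
One way to read the three report levels is as cumulative sets of sections.
The sketch below assumes that reading, and also assumes the "only after the
session is closed" rule applies to all three requests; the part names are
labels chosen for this sketch.

```python
# Which parts each request returns, under the cumulative reading above.
REPORT_CONTENTS = {
    "info": ("control", "status", "query"),
    "report": ("control", "status", "query", "output"),
    "fullreport": ("control", "status", "query", "output", "input"),
}

def report_contents(request, session_closed):
    """Return the parts to send for an info/report/fullreport request."""
    if request not in REPORT_CONTENTS:
        raise ValueError("unknown report request: %r" % request)
    if not session_closed:
        # Reports are only available once the analysis is complete.
        raise RuntimeError("session is still open")
    return REPORT_CONTENTS[request]

full = report_contents("fullreport", session_closed=True)
```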

The structure of the data section would be as follows:
<data>
  <input>
    [data block]
  </input>
  <output>
    <step id="q1">
      [data block]
    </step>
    [more steps]
  </output>
</data>

A data block would be structured as follows (I'm open to better ideas
here):
<protein id="ID_name" [other parameters specific to proteins]>
  [data]
</protein>

Likewise, you can have <dna> and <rna> data blocks.
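
As a sketch of how the master controller might assemble this section, the
code below builds the <data> tree and appends output blocks per step. The
function names, ids and sequences are made up for illustration.

```python
import xml.etree.ElementTree as ET

def new_data_section(input_blocks):
    """Build <data> with the given input blocks and an empty <output>.

    input_blocks is a list of (tag, block_id, text) tuples, where tag is
    a block type such as "protein", "dna" or "rna".
    """
    data = ET.Element("data")
    inp = ET.SubElement(data, "input")
    for tag, block_id, text in input_blocks:
        block = ET.SubElement(inp, tag, id=block_id)
        block.text = text
    ET.SubElement(data, "output")
    return data

def append_output(data, step_id, tag, block_id, text):
    """Append a data block to the <step> for step_id, creating it if needed."""
    output = data.find("output")
    step = output.find("step[@id='%s']" % step_id)
    if step is None:
        step = ET.SubElement(output, "step", id=step_id)
    block = ET.SubElement(step, tag, id=block_id)
    block.text = text
    return block

data = new_data_section([("dna", "seq1", "ATGGCC")])
append_output(data, "q1", "protein", "prot1", "MA")
xml_text = ET.tostring(data, encoding="unicode")
```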

Of course we'll have to devise a protocol-level error reporting system to
report, for example, that a sessionID is non-existent or that data is not
available, or that the offset is larger than the current data.

I included a lot of detail here, but I'm open to discussion; I just put
the details in so we had a concrete example on the table to work
with/argue about. Nothing here is set in stone, especially since I haven't
a clue what I'm talking about :).

This is only for the communication between the wfs server and the
workspace. The communication between the wfs server and the loci can be
done in a different way. That can and maybe should involve Paos
specifically. We can worry about that later.

Also, I think that the definition for transfer in the glossary should also
include objects. Whatever, it's 2AM and I feel like nitpicking. There, I
feel much better. :)P

Whew, my brain is _tired_. Now to get some sleep...

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain at usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|