What exactly is the role of Paos in Loci? As I understand it, Paos is simply a way of moving objects back and forth.

I think that Justin's original suggestion was good, with some details that need to be worked out. With an XML parser, it should be trivial to convert a LociML file to an object in any language (Python, Perl, C++, etc.). Each of the four sections should be accessible separately, and the part of a section corresponding to a specific step should be equally easy to access. The state is the section that will need to be passed around frequently; the others should be static, except for the data. The data can have one part for the original data and then subsequent parts for the results of the analyses.

The main structural difference from Justin's original model that I'm suggesting is that _all_ sequence and related data be stored in the <data> section and then referred to in the queries. Parts would be appended to it with the output of each step, containing chunks of data wrapped in an identifying XML tag, such as <protein>, <dna>, <rna>, etc., with whatever identifiers seem appropriate for that step. Whatever acts as the master controller for this analysis sequence will be in charge of putting all of these pieces together and sending them back to the Workspace.

The status section could be sent over an open socket every time the client requests it (so that updates occur as fast as the client and server can handle them, but no faster). The data can be streamed over a separate socket for each step, or even for each part of each step.

So for a generalized example:

** indicates a "URL" that is accessed
"  indicates data that is transmitted over the actual socket

** lociwfs://some.wfs.server/my-analysis?create
would be used to create a query with the name my-analysis + a unique suffix. This name would be sent back to the client and would be the 'session ID'.
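To make the point above about XML parsers a bit more concrete, here is a rough Python sketch of loading a LociML document, getting at each section and at the part for one step, and appending a step's output to the data section. The root element name <lociml>, the choice of query/data/control/status as the four sections, and the helper names are all assumptions for illustration; nothing here is a defined LociML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical LociML document: the <lociml> root and the attributes used
# here are assumptions for illustration, not a defined schema.
SAMPLE = """
<lociml>
  <query><step id="q1" /></query>
  <data>
    <input><dna id="seq1">ATGGCC</dna></input>
    <output><step id="q1"><protein id="p1">MA</protein></step></output>
  </data>
  <control sessionID="my-analysis-0001"><step id="q1" server="some.locus" /></control>
  <status><step id="q1" state="finished" /></status>
</lociml>
"""

# Assumed names of the four sections.
SECTIONS = ("query", "data", "control", "status")


def load_sections(text):
    """Return a dict mapping each section name to its element (or None)."""
    root = ET.fromstring(text)
    return {name: root.find(name) for name in SECTIONS}


def step_part(section, step_id):
    """Find the <step> with the given id anywhere inside one section."""
    for step in section.iter("step"):
        if step.get("id") == step_id:
            return step
    return None


def append_output(data, step_id, kind, block_id, payload):
    """Append a tagged data block (e.g. <protein>) to a step's output."""
    output = data.find("output")
    if output is None:
        output = ET.SubElement(data, "output")
    step = step_part(output, step_id)
    if step is None:
        step = ET.SubElement(output, "step", {"id": step_id})
    block = ET.SubElement(step, kind, {"id": block_id})
    block.text = payload
    return block
```

For example, `load_sections(SAMPLE)["status"]` gives just the status element, and `append_output(sections["data"], "q2", "rna", "r1", "AUGG")` adds a new <rna> block under a new output step without touching the input data.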
The client would send all the relevant information at this time: the <query> and the <data>. The server would return

" <control sessionID="ID_name">
"   <step id="q1" server="some.locus" />
"   [more steps]
"   <step id="q5" serverpending="true" />
"   [more steps]
" </control>

after figuring out which servers will do which step, or indicating that it doesn't yet know. The connection can now be closed or left open. This could be specified in the request or not specified at all (if the client closes the connection, then it's closed).

** lociwfs://some.wfs.server/sessionID?status
This would be the status socket, which could be closed and then reconnected at any time. Upon connection, the server would send a control section and then a status section

" <status>
"   <step id="q1" state="finished">
"     <output type="protein" id="protein" size="121331" />
"     <message>Analysis finished.</message>
"   </step>
"   <step id="q2" state="failed">
"     <message>Analysis failed. Error: ... </message>
"   </step>
"   <step id="q3" state="aborted">
"     <message>Aborted by user</message>
"   </step>
"   <step id="q4" state="processing" completion="2%">
"     <output type="dna" id="dna1" size="121331" />
"     <!-- reports size of data available so far -->
"     <output type="dna" id="dna2" size="0" />
"     <message>Processing... Reading Sequence...</message>
"   </step>
"   <step id="q5" state="waiting">
"     <message>Waiting for output from step q4</message>
"   </step>
"   <step id="q6" state="pending">
"     <message>Searching for available server...</message>
"   </step>
" </status>

The client would then send
" <querystatus />
to get another status section. Every time the control section is updated (a new server is found), a new control section would be sent. This could also be requested explicitly by
" <querycontrol />
If the client sends
" <cancel />
at any time, the analysis would be aborted.
" <cancel stepid="q1" />
should also be possible.
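As a rough illustration of how a client might consume the status section above, here is a small Python sketch that turns it into plain dicts and checks whether anything is still running. The function names are made up, and treating finished, failed, and aborted as the only terminal states is an assumption on my part; the sample is a shortened copy of the example status section.

```python
import xml.etree.ElementTree as ET

# Assumed terminal states; everything else counts as still in progress.
TERMINAL = {"finished", "failed", "aborted"}

# Shortened copy of the example status section.
STATUS = """
<status>
  <step id="q1" state="finished">
    <output type="protein" id="protein" size="121331" />
    <message>Analysis finished.</message>
  </step>
  <step id="q4" state="processing" completion="2%">
    <output type="dna" id="dna1" size="121331" />
    <output type="dna" id="dna2" size="0" />
    <message>Processing... Reading Sequence...</message>
  </step>
  <step id="q5" state="waiting">
    <message>Waiting for output from step q4</message>
  </step>
</status>
"""


def parse_status(text):
    """Turn a <status> section into a list of plain dicts, one per step."""
    steps = []
    for step in ET.fromstring(text).findall("step"):
        steps.append({
            "id": step.get("id"),
            "state": step.get("state"),
            # Only present on steps that report progress; otherwise None.
            "completion": step.get("completion"),
            "message": (step.findtext("message") or "").strip(),
            # Map each output id to the number of bytes available so far.
            "outputs": {o.get("id"): int(o.get("size", "0"))
                        for o in step.findall("output")},
        })
    return steps


def session_done(steps):
    """True when every step is in one of the assumed terminal states."""
    return all(s["state"] in TERMINAL for s in steps)
```

A client could call `parse_status` on each status section it receives and keep sending <querystatus /> until `session_done` returns True.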
The socket would be closed by the server when all of the steps are either aborted, failed, or finished. It could be closed by the client at any time.

** lociwfs://some.wfs.server/sessionID?cancel[.stepid]
would do the same as the <cancel ... /> 'command'. The server would then send a complete <control> and <status> section.

** lociwfs://some.wfs.server/sessionID?data[.stepid[.blockid#offset]]
would send all available data (or the data from a specific step, or from a specific block of a step). Offsets only make sense for specific blocks of a step, as they change as the output grows. The data would be streamed to the client as more is made available if the request was for a specific block. Actually, now that I think about it, the data for a whole step or for all steps shouldn't be made available like this until all of the parts/steps are finished processing. Partial data should only be available for a specific part.

** lociwfs://some.wfs.server/sessionID?reject
I think we should include this to prevent the server from being loaded up with unnecessary sessions. However, that brings up an important point: should we implement a login/password system and the ability to control read permissions? I guess permissions should be rw for the creator of the session and ro for others. If a wfs wants to keep certain data unreadable to certain people, I don't think we need to implement that; they should either allow general access or access only to those with accounts. This information should also be available, possibly in a separate <meta> section that would be sent in the final type of request:

** lociwfs://some.wfs.server/sessionID?info
** lociwfs://some.wfs.server/sessionID?report
** lociwfs://some.wfs.server/sessionID?fullreport
would send an entire report of the session, complete with all control information, the final status information (to keep error messages available), and the query. If ...?report is specified, then the server would send output data as well.
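The request forms listed so far all share one shape (server, session name, an action after the "?", and optionally a step, block, and offset), so one parser can cover them. Here is a minimal Python sketch of that, assuming step and block ids never contain a "." and that an offset is only accepted together with a block id; the Request type and parse_request name are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

ACTIONS = {"create", "status", "cancel", "data",
           "reject", "info", "report", "fullreport"}


class Request(NamedTuple):
    server: str
    name: str                      # session ID, or requested name for ?create
    action: str
    step_id: Optional[str] = None
    block_id: Optional[str] = None
    offset: Optional[int] = None


def parse_request(url):
    """Split a lociwfs:// URL into its parts, rejecting malformed forms."""
    parts = urlsplit(url)
    if parts.scheme != "lociwfs":
        raise ValueError("not a lociwfs URL: %r" % url)
    # Assumption: ids never contain ".", so it can act as the separator.
    action, *rest = parts.query.split(".")
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % action)
    if rest and action not in {"cancel", "data"}:
        raise ValueError("%s takes no step id" % action)
    if len(rest) > (1 if action == "cancel" else 2):
        raise ValueError("too many parts after %s" % action)
    step_id = rest[0] if len(rest) >= 1 else None
    block_id = rest[1] if len(rest) == 2 else None
    offset = None
    if parts.fragment:
        # Offsets only make sense for a specific block of a step.
        if block_id is None:
            raise ValueError("offset given without a block id")
        offset = int(parts.fragment)
    return Request(parts.netloc, parts.path.lstrip("/"),
                   action, step_id, block_id, offset)
```

With this, `parse_request("lociwfs://some.wfs.server/sessionID?data.q4.dna1#1024")` yields the step "q4", block "dna1", and offset 1024, while an offset on a whole-step request is refused.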
If ...?fullreport is specified, input data would also be sent. This would only be accessible after the analysis is complete and the session is closed.

The structure of the data section would be as follows:

<data>
  <input>
    [data block]
  </input>
  <output>
    <step id="q1">
      [data block]
    </step>
    [more steps]
  </output>
</data>

A data block would be structured as follows (I'm open to better ideas here):

<protein id="ID_name" [other parameters specific to proteins]>
  [data]
</protein>

Likewise, you can have <dna> and <rna> data blocks.

Of course we'll have to devise a protocol-level error reporting system to report, for example, that a sessionID is non-existent, that data is not available, or that the offset is larger than the current data.

I included a lot of detail here, but I'm open to discussion; I just put the details in so we had a concrete example on the table to work with/argue about. Nothing here is set in stone, especially since I haven't a clue what I'm talking about :).

This is only for the communication between the wfs server and the workspace. The communication between the wfs server and the loci can be done in a different way. That can and maybe should involve Paos specifically. We can worry about that later.

Also, I think that the definition for transfer in the glossary should also include objects. Whatever, it's 2AM and I feel like nitpicking. There, I feel much better. :)P

Wheew, my brain is _tired_. Now to get some sleep...

--
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ { Rahul -<>- Jain } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- mailto:rahul-jain at usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GED/S/CS/MD/M/P/O/U/! d->-- s:-(--) a--->? C++(+++)$ UL++++$ P+++$>++++
L+++$>++++ !E--(----)? W++$>+++ N+(--) o>++++$ K?
!w---()>? !O? M+ !V--? !PS+? PE+() Y+(++) PGP>+ t !5-->? !X-- R>+ !tv-(+)
b+>++ DI+(+++)>++++ D++@>$ G e(*)>++++>+++++>$ h-()>++ r? y?
------END GEEK CODE BLOCK------
See also: http://www.hewgill.com/ogr/ http://www.douglasadams.com
Version 11.423.999.210000101.23.50110101.042
(c)1996-1999, All rights reserved. Disclaimer available upon request.