[Pipet Devel] WHAX and Loci storage ideas

J.W. Bizzaro bizzaro at geoserve.net
Tue Dec 7 09:51:02 EST 1999

Gary Van Domselaar wrote:
> It's good to know that someone is thinking about data storage issues for
> Loci.

...cuz we can't rely on Jeff ;-)

> 'middleware'.  The database used to store the individual loci contained
> within a 'container locus' would be another example.

Interesting point.  A database for a locus's workflow data is middleware, but
a database for a locus's processable data is back-endware.

> mention of data-type:  Loci can work for physicists as well as it can
> for bioinformaticists, but we are all bioinformaticists here, so we

And Brad, this is why you can give an example of ant-eater physiology.  If any
one of us designed Loci for ourselves, the audience would be very small.  Even
within the scope of bioinformatics, it would be limited.

> Although I'm not the absolute authority on Loci's architecture,

No one is ;-)

> On the other hand, what if we planned to do our entire thesis project
> based upon the information kept in that 2 Terabyte file? Would we want
> to retrieve it from the NCBI database every time we wanted to do an
> analysis on it, especially if we wanted only to search a small segment
> of it? No way! We would want to have that file stored in a fashion
> wherein we could easily extract only the parts that we are interested in
> performing an analysis on. This is where Loci's ability to store
> sequence data in a database becomes important.

Every time Loci 'points' to a locus (see my last message), the user should have
the option to download the whole thing.  If remote_locus_1 is a processor and
remote_locus_2 is the data, and they both reside on the same remote computer,
NOTHING should be passed back to the user but the results of the process. 
This is why we use pointers (URIs, not C pointers): low bandwidth usage and
convenience.  But if the user really wants remote_locus_2 on his/her computer,
he/she should be able to 'get it'.  I haven't thought about how the user
interface for this would work.

> The OMG LSR ( http://www.omg.org/homepages/lsr/) Biomolecular Sequence
> Analysis working group has a nearly complete RFP
> (http://www.omg.org/techprocess/meetings/schedule/Biomolecular_Sequ._Analysis_RFP.html)
> for sequences and their alignment and annotation.  Loci plans to adopt
> their CORBA IDL for passing biomolecular sequence objects to
> CORBA-compliant backend apps.  This RFP has 'XML extensions' for future
> compatibility, btw.

Right, and AppLab and some others have adopted the RFP.

> My understanding is that Loci will come with 'data translators'
> (middleware) that will be placed between a document / database to
> accommodate the formatting requirements of the analysis program that will
> operate on the document.

Again, it depends on whether Brad was talking about workflow data or
processable data.

> I think this is appropriate only for Loci's own internal data
> requirements, but violates Loci's 'laissez-faire' paradigm for operating
> on 'exogenous' data. Jeff explained it to me best when he said that Loci
> should be like the Bash shell: the bash shell has redirection operators
> and pipes, which you can combine to do some fairly sophisticated data
> processing, for example:
> bash$ cat /var/adm/messages | grep "root" > /tmp/root.txt
> Here bash will pipe the contents of /var/adm/messages to grep, which
> will extract all the lines containing the word 'root' and place them in
> the /tmp/root.txt file.  Bash itself cares not about the contents of
> /var/adm/messages, doesn't reformat it, doesn't store it in an
> intermediate database, then re-extract it from the database, reformat it
> once again, and finally pump out the /tmp/root.txt file according to
> some XML DTD.  Neither should Loci, in its most abstracted form.
> Instead, the data conversions and XML operations should be the modular
> extensions to Loci that we provide as valuable options for the end-user,
> so that Loci becomes not just a graphical 'bash', but a sophisticated
> distributed data processing system.

You said it so well!

> Not that a graphical bash wouldn't
> be nice:  the gnome dudes have talked about using Loci's graphical shell
> to do just that!  Bottom line:  maximum abstraction + maximum
> modularization = maximum flexibility = maximum power!

So, Loci is more like a graphical bash + some nifty programs to go with it.

Regarding processable data format conversions, a bash command might work like
this:

  cat data.fasta | fasta2xml | bioxmlview.py

Does bash need to know ANYTHING about biological data???
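The shell only wires stdin to stdout, so all format knowledge lives in the filters. As a rough illustration, here is a minimal sketch of what a hypothetical `fasta2xml` filter could look like, written as a shell function wrapping POSIX awk. It assumes plain multi-record FASTA input, and the XML element names (`<sequences>`, `<sequence header="...">`) are invented for this example; they do not come from Loci, BioXML, or the OMG RFP.

```shell
#!/bin/sh
# fasta2xml: HYPOTHETICAL filter, a sketch only. Reads FASTA on stdin and
# writes a simple XML document on stdout. Element names are assumptions
# made up for illustration, not an existing schema.
fasta2xml() {
  awk '
    # Escape the characters that are special in XML text and attributes.
    # "&" must be replaced first so later entities are not double-escaped.
    function esc(s) {
      gsub(/&/, "\\&amp;", s)
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      gsub(/"/, "\\&quot;", s)
      return s
    }
    BEGIN { print "<sequences>"; inseq = 0 }
    # A header line starts a new record; close the previous one if open.
    /^>/ {
      if (inseq) print "  </sequence>"
      print "  <sequence header=\"" esc(substr($0, 2)) "\">"
      inseq = 1
      next
    }
    # Non-empty lines are sequence data; blank lines are skipped.
    NF { print "    " esc($0) }
    END {
      if (inseq) print "  </sequence>"
      print "</sequences>"
    }
  '
}
```

With the function defined, `fasta2xml < data.fasta` prints the XML, and it can sit in the middle of a pipeline like any other command. Escaping `&`, `<`, `>` and `"` keeps the output well-formed even when a FASTA header contains those characters; the shell itself never inspects the data.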

                      |           J.W. Bizzaro           |
                      |                                  |
                      | http://bioinformatics.org/~jeff/ |
                      |                                  |
                      |           THE OPEN LAB           |
                      |    Open Source Bioinformatics    |
                      |                                  |
                      |    http://bioinformatics.org/    |
