[Biodevelopers] NCBI XML

Alex Milowski alex at milowski.com
Thu Jan 16 01:05:16 EST 2003


On Wednesday, January 15, 2003, at 09:51 PM, Joe Landman wrote:

>   The other problem for structured documents of this nature is that the
> size of them almost precludes real parsing efforts.  A parser is going
> to build up data structures which represent the content of the document,
> and these structures should be of comparable size to the document in
> various cases.
>
>   We probably need to start looking at things differently in the file
> systems, and handling the output somewhat differently (and more
> succinctly).
>

Part of my interest is that I've been working on event-parsing schemes
for XML that should be of good use in this area.  There are lots of
useful things you can do in an event-oriented environment where you
only look at small subtrees at any point in time.  This would then
allow you to traverse a large document (e.g. genome data), doing
whatever you do, without having to try to "load" it into some data
structure first.
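
As a minimal sketch of the event-oriented approach described above, the
following uses Python's standard xml.etree.ElementTree.iterparse to visit
one small subtree at a time and discard it afterwards. The element names
(records, record, seq) and the helper names are hypothetical, chosen only
for illustration; they are not taken from the NCBI or BSML formats.

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(source, tag):
    """Yield each completed <tag> element, then discard what was parsed."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Drop already-processed children so memory use stays bounded.
            root.clear()


# Hypothetical sample data; a real input would be a large file on disk.
SAMPLE = b"""<records>
  <record id="a"><seq>ACGT</seq></record>
  <record id="b"><seq>GGCCTA</seq></record>
</records>"""


def sequence_lengths(source):
    """Map each record id to the length of its sequence text."""
    return {
        rec.get("id"): len(rec.findtext("seq"))
        for rec in iter_subtrees(source, "record")
    }


print(sequence_lengths(io.BytesIO(SAMPLE)))  # {'a': 4, 'b': 6}
```

Only one record subtree needs to be held at any moment, so the same loop
should work on a document far larger than memory, assuming each individual
record is small.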

I've just found BSML [1] so I'm going to take a look at that to see if
it is any better.

[1] http://www.bsml.org     Bioinformatic Sequence Markup Language

Alex Milowski                FAX: (707) 598-7649                        
  alex at milowski.com

"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics




