[Biodevelopers] NCBI XML

Joe Landman landman at scalableinformatics.com
Thu Jan 16 00:51:32 EST 2003


Hi Alex:

  I agree.  It seems that the XML output is the result of walking a
tree, and automatically emitting an XML tag regardless of whether or not
there are leaves at the node or more branches.  

  When done wrong, XML can be very verbose.  When done right, it can be
merely verbose...  I wonder if this is why Ewan Birney has been somewhat
... ah ... muffled in his XML exuberance. 

  The other problem for structured documents of this nature is that
their size almost precludes real parsing efforts.  A parser is going to
build up data structures which represent the content of the document,
and in many cases these structures will be comparable in size to the
document itself.
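
  One way around holding the whole document in memory is to stream it
and throw away each record once it has been read.  What follows is only
a rough sketch, assuming Python's standard xml.etree.ElementTree module
and the Date-std element names from the example quoted below; the
function name iter_dates is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_dates(source):
    """Yield (year, month, day) tuples from Date-std elements.

    The document is read incrementally with iterparse, and each
    Date-std element is emptied as soon as its values are extracted.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Date-std":
            yield (
                int(elem.findtext("Date-std_year")),
                int(elem.findtext("Date-std_month")),
                int(elem.findtext("Date-std_day")),
            )
            # Drop the children we have already read.  The emptied
            # element itself stays attached to its parent.
            elem.clear()


sample = b"""<Date>
  <Date_std>
    <Date-std>
      <Date-std_year>2002</Date-std_year>
      <Date-std_month>8</Date-std_month>
      <Date-std_day>14</Date-std_day>
    </Date-std>
  </Date_std>
</Date>"""

dates = list(iter_dates(io.BytesIO(sample)))
```

  This only frees the children of each Date-std element.  A run over
multi-gigabyte input would probably also need to detach finished
records from their parent, since the emptied elements still accumulate.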

  We probably need to start looking at things differently in the file
systems, and handling the output somewhat differently (and more
succinctly).
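
  To put a number on "more succinctly", here is a small sketch that
rewrites the nested date record quoted below into the attribute form
Alex suggests and compares the two sizes.  It assumes Python's standard
xml.etree.ElementTree module; compact_date is a hypothetical helper
name, not part of any NCBI tool.

```python
import xml.etree.ElementTree as ET

VERBOSE = """<Date>
  <Date_std>
    <Date-std>
      <Date-std_year>2002</Date-std_year>
      <Date-std_month>8</Date-std_month>
      <Date-std_day>14</Date-std_day>
    </Date-std>
  </Date_std>
</Date>"""


def compact_date(date_elem):
    """Return a <date year=.. month=.. day=../> element for a nested Date."""
    std = date_elem.find("Date_std/Date-std")
    return ET.Element(
        "date",
        {
            "year": std.findtext("Date-std_year"),
            "month": std.findtext("Date-std_month"),
            "day": std.findtext("Date-std_day"),
        },
    )


compact = ET.tostring(compact_date(ET.fromstring(VERBOSE)), encoding="unicode")

# Character counts of the two encodings of the same date.
verbose_size = len(VERBOSE)
compact_size = len(compact)
```

  The verbose size here includes the pretty-printing whitespace, which
is the same kind of overhead Alex mentions stripping from the full file.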

On Thu, 2003-01-16 at 00:11, Alex Milowski wrote:

> Thanks.
> 
> Sheesh... this stuff is really verbose.  For example, the following
> encodes a single date instance *every* time you use a date:
> 
> 
>      <Date>
>        <Date_std>
>          <Date-std>
>            <Date-std_year>2002</Date-std_year>
>            <Date-std_month>8</Date-std_month>
>            <Date-std_day>14</Date-std_day>
>          </Date-std>
>        </Date_std>
>      </Date>
> 
> 
> Why not:
> 
>     date="2002/8/14"
> 
>     or
> 
>     <date>2002/8/14</date>
> 
>     or
> 
>    <date year="2002" month="8" day="14"/>
> 
> ?
> 
> Then this XML wouldn't be over 3GB in size!  Actually, 8GB... but I
> stripped the ignorable whitespace (5GB of pretty printing...)
> 
> People can't possibly use this data in XML format...
> 
> Alex Milowski                FAX: (707) 598-7649                        
>   alex at milowski.com
> 
> "The excellence of grammar as a guide is proportional to the paucity of
> the inflexions, i.e. to the degree of analysis effected by the language
> considered."
> 
> Bertrand Russell in a footnote of Principles of Mathematics
> 
> 



