[Biodevelopers] Another point of discussion: XML

Joe Landman landman at scientificappliance.com
Fri May 3 10:08:37 EDT 2002


(warning: long, lots of points for potential discussion) 

On Fri, 2002-05-03 at 05:09, martin goodson wrote:
> 
> 
> >parses the document.  Of course, as I am using SOAP, I am using XML
> >implicitly for the structure of my object transport layer.
> 
> 
> Hi Joe
> what is your feeling for the performance of SOAP and how much data are you 
> transferring in one go?
> Martin

Hi Martin:

 I am using SOAP via the SOAP::Lite (http://www.soaplite.com) Perl
module.  On the server side I am using the standalone web server that
SOAP::Lite provides, along with my own client.  Note:  You can find all
the modules I mention for Perl at CPAN (http://search.cpan.org).

 What I am doing is writing a module with a simple interface, which
allows me to completely abstract data motion, so my application
developers don't have to think about it unless they want to.  I will
give them the knobs to turn if they want them, but for those who simply
want it to work without thinking really hard, I have that working quite
well.

 For small LWP based web servers (SOAP::Lite sits atop and abstracts
HTTP::Daemon from libwww-perl), I have found limits of about 4 MB for
data transfers before the web server starts choking.  It does much of
its work in RAM, as it was designed to be a small web server.

  What I did was to build effectively an XMODEM-like layer (i.e. a
trivial file transfer protocol atop an unreliable medium).  The client
passes a header packet with the filename, the MD5 sum of the complete
file (for integrity checking, quite useful), the block size, and the
number of blocks, then large data packets of about 1 MB each, followed
by a stop packet.  The server side gets the header, builds the file
object, and then receives packets one at a time, populating the file
and comparing the MD5 of each packet received with the MD5 sent by the
client for that packet.  The stop packet (the "stop codon") carries the
full file MD5 again, which is checked against the digest computed on
the fly with Digest::MD5 from all the inbound packets.
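
The packet scheme described above can be sketched in a few lines of
Perl.  This is a minimal in-memory illustration, not the actual module:
the function names (make_packets, receive_packets) and the packet field
names (type, seq, md5, payload) are assumptions made for the example,
and the SOAP transport is left out entirely so that only core modules
are needed.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: split $data into a header packet, one data packet
# per block, and a stop packet. Field names are illustrative assumptions.
sub make_packets {
    my ($filename, $data, $block_size) = @_;
    my @blocks;
    for (my $off = 0; $off < length($data); $off += $block_size) {
        push @blocks, substr($data, $off, $block_size);
    }
    my $file_md5 = md5_hex($data);
    my @packets  = ({
        type       => 'header',
        filename   => $filename,
        md5        => $file_md5,
        block_size => $block_size,
        blocks     => scalar @blocks,
    });
    for my $seq (0 .. $#blocks) {
        push @packets, {
            type    => 'data',
            seq     => $seq,
            md5     => md5_hex($blocks[$seq]),
            payload => $blocks[$seq],
        };
    }
    push @packets, { type => 'stop', md5 => $file_md5 };
    return @packets;
}

# Hypothetical receiver: verify each block's MD5, keep a running digest of
# everything received, and compare it with the header and stop packets.
sub receive_packets {
    my (@packets) = @_;
    my $header = shift @packets;
    die "expected header packet\n"
        unless $header && $header->{type} eq 'header';
    my $running = Digest::MD5->new;
    my $file    = '';    # a real server would append to a file on disk
    my $stop;
    for my $p (@packets) {
        if ($p->{type} eq 'stop') { $stop = $p; last; }
        die "block $p->{seq}: MD5 mismatch\n"
            unless md5_hex($p->{payload}) eq $p->{md5};
        $running->add($p->{payload});
        $file .= $p->{payload};
    }
    die "missing stop packet\n" unless $stop;
    my $digest = $running->hexdigest;
    die "whole-file MD5 mismatch\n"
        unless $digest eq $stop->{md5} && $digest eq $header->{md5};
    return ($header->{filename}, $file);
}

# Usage: 10,000 bytes in 4096-byte blocks gives three data packets.
my $data    = join '', map { chr(65 + $_ % 26) } 0 .. 9999;
my @packets = make_packets('example.dat', $data, 4096);
my ($name, $copy) = receive_packets(@packets);
print "received $name intact\n" if $copy eq $data;
```

The running Digest::MD5 object is what lets the receiver check the whole
file without re-reading it after the last block arrives.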

  I have not done extensive performance studies yet, but on slow
machines I can hit about 0.5 MB/s transfer rates, and on fast machines I
can hit 2-3 MB/s.  For a client and server on the same machine, I hit
about 5 MB/s.  The wire is not the limiting factor.  The limiting
factors appear to be a) the processing speed of the packet decoding (the
SOAP layer uses the older expat XML parser from J. Clark, which is not
known for speed), b) the processing speed of the MD5 computation, and c)
the physical IO time.
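
One way to separate factor b) from the others would be to time the MD5
computation on its own, with no parsing and no IO.  The sketch below
does that with core modules only; md5_rate is a hypothetical helper
name, and the block size and count are arbitrary choices for
illustration, not measurements from the setup described here.

```perl
use strict;
use warnings;
use Digest::MD5;
use Time::HiRes ();

# Hypothetical benchmark: digest $blocks blocks of $block_size bytes held
# in memory and return the throughput in MB/s.
sub md5_rate {
    my ($block_size, $blocks) = @_;
    my $block = 'x' x $block_size;
    my $ctx   = Digest::MD5->new;
    my $start = Time::HiRes::time();
    $ctx->add($block) for 1 .. $blocks;
    my $digest  = $ctx->hexdigest;      # force the digest to be finalised
    my $elapsed = Time::HiRes::time() - $start;
    $elapsed = 1e-9 if $elapsed <= 0;   # guard against a zero reading
    return ($block_size * $blocks) / (1024 * 1024) / $elapsed;
}

# Usage: 32 blocks of 1 MB, mirroring the packet size mentioned above.
printf "MD5 alone: %.1f MB/s\n", md5_rate(1024 * 1024, 32);
```

If this number is far above the observed transfer rates, the digest is
unlikely to be the main bottleneck.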

  I would like to make this multithreaded, but our group has seen some
strange bugs with threaded Perl (5.6.x) on SMP machines.  These bugs are
marked as "wont-fix", and they cause spurious errors.

  I am thinking of trying a Python server here, as Python should have
somewhat better threading, and I have been looking for a good excuse to
play with it.  Does anyone have opinions as to the quality of the SOAP
or XML implementations in Python?  Support is very good in Perl, and I
would like to know if there is a similar broad level of support in
Python.

  I could use the Apache mod_soap (has anyone played with that?), and I
think the performance would change.  I am not sure how large a POST or
GET I can do under Apache and not have it choke.

> >How are you/arent you using this or related technologies?

  I have become smitten with SOAP :)  It makes interop and
communications tremendously easy.  It is portable, and language
invariant (you can write SOAP calls in practically any language).  It is
quite easy to use it correctly (which if you think about technology, is
odd, as most technology has a steep learning curve).

  SOAP sits atop XML and related specs.  I am using SOAP as a data and
object transport layer.  As my clients and servers can be in a variety
of languages, I need to make sure I am not shipping objects with methods
back and forth... just the data.  The beauty of XML is that it is easy
to represent very complex data structures within it.  That is similar to
what I like about Perl.  

  One of the things I am working on is a SOAP based interface to the
Perl DBI.  I don't really like the way the DBI is set up... it is
actually quite painful to use for those who are occasional database
users rather than database programmers.  So I wrote a nice (very
perl-ish) abstraction layer over it that makes using DBI actually
simple (it is not hard to figure out what I am doing here...)

    my $jobdb = $db->open_database
        (
          'database'   => "jobdb",
          'autocommit' => 0
        );
    .
    .
    .

    %query = $db->query_records_from_table
        (
          'handle' => $jobdb,
          'table'  => $table,
          'search' => {
                        'status' => "complete",
                      }
        );
    .
    .
    .

    $db->update_record_in_table
        (
          'handle' => $jobdb,
          'table'  => $table,
          'search' => {
                        'job_number' => $query{$i}->{job_number}
                      },
          'record' => {
                        'returned_to_submitter' => "yes"
                      }
        );

These are code snippets which illustrate the layer.  I am currently
working on making these SOAP calls to a remote DB server implementing
the same interface.  I do not expect this SOAP-ized version to require
coding changes, just a module change.  
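
To give a feel for what a layer like this might do underneath, here is
a minimal sketch of turning a 'search' hash into SQL with placeholders
and a list of bind values, the form DBI expects for prepare/execute.
It is an assumption about how such a layer could work, not the code
behind the snippets above; build_select and the table name 'jobs' are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): build a SELECT statement with
# placeholders from a table name and a 'search' hash, and return it with
# the bind values in matching order. Table and column names are
# interpolated as-is, so real code should validate them first.
sub build_select {
    my (%args) = @_;
    my $search = $args{search} || {};
    my @cols   = sort keys %$search;    # sorted for a stable statement
    my $sql    = "SELECT * FROM $args{table}";
    $sql .= ' WHERE ' . join(' AND ', map { "$_ = ?" } @cols) if @cols;
    return ($sql, map { $search->{$_} } @cols);
}

# Usage: the same kind of hash as in the query snippet above.
my ($sql, @bind) = build_select(
    table  => 'jobs',
    search => { status => 'complete' },
);
print "$sql\n";
print "bind values: @bind\n";
```

Because the caller only ever passes a hash, the same call could be
served locally or shipped to a remote server without changing its shape.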

The database interactions are represented at this moment by an anonymous
hash passed to the database routine.   This hash is a complex data
structure.  The beauty of this is that I can take that hash, do
something like 

    my $encoded = SOAP::Data->type(base64 => $buffer);
    
to convert the buffer to a base64 encoded representation (so the parser
won't try to parse the complex object), and then use a SOAP method to
pass it over to the remote server.  Or, I could use the XML::Simple
module and do a


    use XML::Simple;
    my $xml = XMLout(\%complex_hash);

and send the complex hash back and forth.
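
For the base64 route the hash first has to become a flat buffer.  The
sketch below shows one possible round trip using only core modules,
with Storable doing the flattening and MIME::Base64 doing the encoding
explicitly; both choices are assumptions for illustration, as is the
example hash.  Storable's format is Perl-specific, so this only suits
Perl-to-Perl traffic; for clients in other languages the XML route is
the better fit.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use MIME::Base64 qw(encode_base64 decode_base64);

# Illustrative request hash, similar in shape to the database calls above.
my %request = (
    table  => 'jobs',
    search => { status => 'complete' },
);

# Sending side: flatten the nested hash to bytes (network byte order),
# then base64-encode so the payload is plain ASCII inside the XML.
my $buffer  = nfreeze(\%request);
my $encoded = encode_base64($buffer, '');    # '' suppresses line breaks

# Receiving side: decode and rebuild the nested structure.
my $copy = thaw(decode_base64($encoded));
print "status filter: $copy->{search}{status}\n";
```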


Joe



