PyMsXML
Introduction
PyMsXML is a Python script that converts vendor-specific mass
spectrometry data files for Applied Biosystems' Q-Star, 4700, Mariner,
and Voyager mass spectrometers from their raw binary form to either
of the emerging XML file formats for mass spectra: mzXML, from the
Sashimi Glossolalia project of the Institute for Systems Biology (ISB),
and mzData, from the Proteomics Standards Initiative (PSI) project
of the Human Proteome Organization (HUPO).
PyMsXML uses installed vendor software under Windows to access the
proprietary raw mass spectra file format, interfacing with the
vendor-supplied libraries via their COM interface. Unlike other
software solutions that use this approach, PyMsXML is written in a
free, open-source language called Python. As such, no installation of
Microsoft Visual C++ or Visual Basic is necessary to use, alter, or
improve the PyMsXML script.
PyMsXML is easily extended to new instruments and vendor software,
and to new or changed XML file formats. The code that interfaces with
the vendor software is separate from the code that formats the data
as XML, so adding support for a new instrument does not require
rewriting the XML formatting code. Similarly, as new XML file formats
emerge, the code that interfaces with the instrument software need not
change.
Installation
- Download and install the latest version of ActiveState ActivePython for Windows.
- Start the Pythonwin IDE (All Programs -> ActiveState ActivePython
2.4 -> Pythonwin IDE). From the Tools menu, select the "COM Makepy
utility" entry. In the popup window, select "ExploreDataObjects 1.0
Type Library (1.0)" to build a Python interface to Analyst's COM
libraries for reading ".wiff" files, or select "IDAExplorer 1.0 Type
Library (1.0)" to build a Python interface to Data Explorer's COM
libraries for reading ".dat" and ".t2d" files, then click OK. If you
have both pieces of software, repeat this step for each software package.
- Check the installation of the COM library interfaces. If any of these
tests is unsuccessful, then PyMsXML will be unable to read the
corresponding raw data files.
For Analyst, these commands at the Pythonwin IDE command-line
(copy-and-paste!) should elicit similar responses:
>>> from win32com.client import Dispatch
>>> Dispatch('Analyst.FMANSpecData')
<win32com.gen_py.ExploreDataObjects 1.0 Type Library.IFMANSpecData instance at 0x14421558>
>>> Dispatch('Analyst.FMANChromData')
<win32com.gen_py.ExploreDataObjects 1.0 Type Library.IFMANChromData instance at 0x14418408>
For Data Explorer, these commands at the Pythonwin IDE command-line (copy-and-paste!) should elicit similar responses:
>>> from win32com.client import Dispatch, gencache
>>> Dispatch('DataExplorer.Application',resultCLSID='{3FED40F1-D409-11D1-8B56-0060971CB54B}')
<COMObject DataExplorer.Application>
>>> gencache.EnsureModule('{06972F50-13F6-11D3-A5CB-0060971CB54B}',0,4,2)
<module 'win32com.gen_py.06972F50-13F6-11D3-A5CB-0060971CB54Bx0x4x2' from 'C:\Python24\lib\site-packages\win32com\gen_py\06972F50-13F6-11D3-A5CB-0060971CB54Bx0x4x2.py'>
- Download and unpack the PyMsXML scripts and examples. After
unzipping PyMsXML, edit the file pymsxml.cmd to
point to your Python installation (usually C:\Python24\python.exe)
and your PyMsXML installation.
Usage
PyMsXML consists of a single Python script. A Windows cmd file wrapper is provided to take care of calling the Python interpreter appropriately.
- pymsxml [ options ] raw-spectra-data-file
Options:
- -R raw-format, --rawdata raw-format
- Valid raw-format values: wiff, qstar, t2d, ab4700, voyager, mariner, mzXML. Optional if raw-spectra-data-file ends in .wiff, .t2m, or .mzXML.
- -X xml-format, --xmlformat xml-format
- Valid xml-format values: mzXML (ISB), mzData (HUPO). Optional if output-file ends in .mzXML or .mzData.
- -o output-file, --output output-file
- Name of output file. If omitted, and xml-format is supplied, then the output file name is inferred by changing the file extension of raw-spectra-data-file to xml-format.
- -p ms-levels, --peaks ms-levels
- Apply (vendor library) peak detection to spectra with level in ms-levels (comma separated). Supported only for the Q-Star raw format (MS/MS spectra only) and the 4700 raw format. Default: 2.
- -f filter-spec, --filter filter-spec
- Filter output scans by their meta-data. Filters are specified as a
comma-separated list of filter tokens. Each filter token has the form
field.comparison.value, where field must be
an attribute of the scan object and comparison must be one of
eq, ne, lt, le, gt, or
ge, specifying =, ≠, <, ≤, >, and ≥
respectively.
- -V xml-version, --version xml-version
- XML version. mzXML only. Valid values: 2.1, 2.2, and 3.0. Default: 3.0.
- -z, --compress_peaks
- Compress mzXML peaks data using zlib. mzXML version ≥ 3.0 only. Default: False.
- -Z compress-format, --compress compress-format
- Compress output file. Valid values: gz. Default: None, unless output file ends with .gz, then gz.
- -d, --debug
- Debug. Output XML for first 10 spectra only. Truncate spectral data, too. Useful to verify that the output is formatted correctly.
- -h, --help
- Help.
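The field.comparison.value syntax accepted by --filter can be illustrated with a minimal Python sketch. This is not PyMsXML's own implementation: the function names parse_filter_spec and scan_passes, the numeric conversion of values, and the assumption that a scan must satisfy every filter in the list are all hypothetical.

```python
import operator

# Comparison tokens from the option description, mapped to Python operators.
COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def parse_filter_spec(spec):
    """Split 'field.cmp.value,...' into (field, compare_function, value) triples."""
    filters = []
    for token in spec.split(","):
        # Split on the first two dots only, so a value such as 500.5 stays intact.
        field, comparison, value = token.strip().split(".", 2)
        try:
            value = float(value)  # assumption: numeric values compare numerically
        except ValueError:
            pass  # non-numeric values are left as strings
        filters.append((field, COMPARISONS[comparison], value))
    return filters


def scan_passes(scan, filters):
    """Return True when the scan satisfies every filter (assumed AND semantics)."""
    return all(compare(getattr(scan, field), value)
               for field, compare, value in filters)
```

With a stand-in scan object carrying the made-up attributes level and mz, the spec "level.eq.2,mz.gt.500.5" would keep only scans whose level equals 2 and whose mz exceeds 500.5. The real attribute names depend on the scan object inside PyMsXML.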
Applied Biosystems Q-Star Spectra
The raw spectra data files for the ESI spectra from Applied
Biosystems' Q-Star instruments are usually extracted as ".wiff"
files. These can be opened using Applied Biosystems' Analyst or
BioAnalyst programs. PyMsXML uses Analyst's support libraries
to extract mass spectra from these files.
Applied Biosystems Mariner, Voyager, 4700 Spectra
The raw spectra data files for the MALDI spectra from Applied
Biosystems' Mariner, Voyager and 4700 instruments are usually
extracted as ".t2d" or ".dat" files. These can be opened using Applied
Biosystems' Data Explorer program. PyMsXML uses Data Explorer's
support libraries to extract mass spectra from these
files. These file formats store very little meta-data in addition to
the mass spectrum. As such, additional information must be supplied in
a meta-data text file, which is given on the command line as raw-spectra-data-file.
The meta-data file is most easily constructed in Excel and saved as
tab-separated values, but it can be written by hand too, if
desired. Each line of the meta-data file specifies a record
describing the MALDI plate, the plate's spots, or the scans acquired
from these spots. A shortcut record that defines the plate and its spot
naming convention is also provided.
The plate definition record consists of the word PLATE (case
insensitive) in the first column, followed by alternating key-value
pairs in subsequent columns. The key-value pairs may be given in any
order. The following keys must be
provided:
plateID, spotXCount, spotYCount,
plateManufacturer, and plateModel. The
plateID value is referenced by the spot and scan definition
records. The spotXCount is the number of MALDI spots in
the horizontal dimension (integer). The spotYCount is
the number of MALDI spots in the vertical dimension (integer). The
plateManufacturer and plateModel values are inserted verbatim in the output XML.
The spot definition record consists of the word SPOT (case
insensitive) in the first column, followed by alternating key-value
pairs in subsequent columns. The key-value pairs may be given in any
order. The following keys must be
provided:
plateID, spotID, spotXPosition,
spotYPosition, and maldiMatrix. The
plateID value must be defined by some plate definition
record. The spotID is referenced by the scan definition
records. The spotXPosition is the horizontal position of the
spot on the plate (integer). The spotYPosition is the
vertical position of the spot on the plate (integer). Spot positions
can be numbered beginning at 0 or 1. The
maldiMatrix value is inserted verbatim in the output XML.
The scan definition record consists of the word SCAN (case
insensitive) in the first column, followed by alternating key-value
pairs in subsequent columns. The key-value pairs may be given in any
order. The following keys must be
provided:
plateID, spotID, filename, and index.
The
plateID must be defined by some plate definition record. The
spotID must be defined by some spot definition record. The
filename is the name of the ".dat" or ".t2d" file containing
the corresponding scan's spectrum. The index is the ordinal
of the corresponding spectrum in the provided file. Spectra within
files should be referenced beginning at 1.
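The record layout described above (a keyword in the first column, then alternating keys and values) can be read with a few lines of Python. The following is a minimal sketch, not the parser PyMsXML itself uses: the function name parse_record is hypothetical, and it assumes that keys are matched exactly as written and that lines carry no trailing empty columns.

```python
# Required keys for each record type, taken from the descriptions above.
REQUIRED_KEYS = {
    "PLATE": {"plateID", "spotXCount", "spotYCount",
              "plateManufacturer", "plateModel"},
    "SPOT": {"plateID", "spotID", "spotXPosition",
             "spotYPosition", "maldiMatrix"},
    "SCAN": {"plateID", "spotID", "filename", "index"},
}


def parse_record(line):
    """Parse one tab-separated record into (record_type, {key: value})."""
    columns = line.rstrip("\r\n").split("\t")
    record_type = columns[0].strip().upper()  # the keyword is case insensitive
    if record_type not in REQUIRED_KEYS:
        raise ValueError("unknown record type: %r" % columns[0])
    rest = columns[1:]
    if len(rest) % 2:
        raise ValueError("key without a value in %s record" % record_type)
    # Columns after the first alternate key, value, key, value, ...
    fields = dict(zip(rest[0::2], rest[1::2]))
    missing = REQUIRED_KEYS[record_type] - set(fields)
    if missing:
        raise ValueError("%s record is missing: %s"
                         % (record_type, ", ".join(sorted(missing))))
    return record_type, fields
```

For instance, a made-up line such as "scan", "plateID", "P1", "spotID", "A1", "filename", "spectra.t2d", "index", "1" (joined by tabs) would parse to the record type SCAN and a dictionary of the four required keys. The values P1, A1, and spectra.t2d are illustrative only.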
To alleviate some of the tedium of specifying the spot definition records, a shortcut plate definition record is provided. The platedef definition record consists of the word PLATEDEF (case insensitive) in the first column, followed by alternating key-value
pairs in subsequent columns. The key-value pairs may be given in any
order. The following keys must be
provided:
plateID, plateManufacturer, plateModel, spotNaming, and maldiMatrix.
The
plateID value is referenced by the spot and scan definition
records. The plateManufacturer and plateModel are
used to identify the properties of the MALDI plate. Currently, only
the values ABI / SCIEX and 01-192+06-BB are
recognized, but others are easily added on request. The "ABI / SCIEX
01-192+06-BB" plate consists of 8 rows of 24 spots (plus 6 calibration
spots). The spotNaming value must be one of alpha,
parallel, or antiparallel. At this time, only
alpha is implemented. The alpha spot naming
constructs spotID values for all spots on the plate, with the
row specified by a letter from A to H and the column specified by a
number from 1 to 24. The
maldiMatrix value is assumed to be the same for each spot and is
inserted verbatim in the output XML.
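As a rough illustration of the alpha naming scheme for the 8 by 24 plate, the sketch below enumerates one spotID per spot. The function name alpha_spot_ids is hypothetical, and the exact form of the identifier (letter followed directly by an unpadded number, such as A1) is an assumption. The text above specifies only that rows use letters A to H and columns use numbers 1 to 24.

```python
import string


def alpha_spot_ids(rows=8, columns=24):
    """Return spot IDs row by row: a row letter followed by a column number."""
    ids = []
    for letter in string.ascii_uppercase[:rows]:
        for number in range(1, columns + 1):
            # Assumed identifier format: letter plus unpadded number, e.g. "A1".
            ids.append("%s%d" % (letter, number))
    return ids
```

Called with the defaults, this yields 192 identifiers, matching the 8 rows of 24 spots on the plate (the 6 calibration spots are not included).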
NOTE: The order of scan definition lines is important! MS/MS
spectra must appear immediately after the MS spectrum containing their
precursors.
Example meta-data files are provided in the distribution. example1.t2m explicitly defines the MALDI plate, spots and scans. example2.t2m is equivalent, but uses the platedef keyword shortcut.
Examples
Convert the Q-Star example.wiff file to mzXML format, placing the output in the file qstar-example.xml.
C:\PyMsXML\example\wiff> pymsxml.cmd -R wiff -X mzXML -o qstar-example.xml example.wiff
Convert the Q-Star example.wiff file to mzData format, placing the output in the file example.mzData.
C:\PyMsXML\example\wiff> pymsxml.cmd -X mzData example.wiff
Convert the AB4700 spectra listed in example1.t2m and example2.t2m to mzXML format, placing the output in the files example1.mzXML and example2.mzXML.
C:\PyMsXML\example\t2d> pymsxml.cmd -X mzXML example*.t2m
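The output file names in the last two examples follow from the rule given for the -o option: the extension of the input file is replaced by the value of xml-format. A minimal sketch of that rule, using a hypothetical helper name (infer_output_file) that is not taken from the PyMsXML source:

```python
import os.path


def infer_output_file(raw_file, xml_format):
    """Swap the extension of raw_file for xml_format, as described for -o."""
    base, _extension = os.path.splitext(raw_file)
    return base + "." + xml_format
```

Under this rule, example.wiff with -X mzData maps to example.mzData, and example1.t2m with -X mzXML maps to example1.mzXML.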
Release Notes
The Analyst COM libraries seem to have trouble with long pathnames. If you consistently have trouble getting PyMsXML to read ".wiff" files, try moving the files to a shorter directory path.
Credits
Development of PyMsXML was significantly helped by the
open-source Visual Basic source code from the MzStar program of the
ISB Glossolalia project.