Tutorial: Metannogen Data Management
This advanced tutorial demonstrates how metabolic networks are generated from the
Metannogen datasets and exported in text mode.
Text mode does not involve the graphical user interface, which allows integration into script pipelines.
If you want to annotate an existing SBML file with Metannogen, then you need another tutorial:
Annotate SBML files.
If you just want to export a plain SBML file without additional information, you do not need this tutorial.
Just use Menu-bar>File>Export ... .
This tutorial explains how two types of information are merged to generate
the export file:
- Expert opinion stored in datasets
- Programmatically generated information
It does not explain how an existing SBML file is annotated with Metannogen annotations.
This tutorial explains SBML output.
However, because the command line option -toSBML can be replaced by another export option, this tutorial
applies to any output format.
SBML is a standard format for metabolic networks and the exported
SBML file is probably sufficient for your needs.
However, for self-defined data formats
you will need to change the export plugin.
This is described in the tutorial Customize the export.
The tutorial requires basic PC knowledge and familiarity with the echo command.
Background
The Metannogen datasets hold the expert knowledge which is entered
by the data curators through Metannogen. Usually, datasets encode
one or several metabolic reactions and contain the evidence
in form of citations and database references that a certain reaction
exists in the described biological system.
Though the datasets are stored as flat text lines,
Metannogen is internally object oriented.
The object model is created from the dataset and from optional files
given as command line argument. It contains reaction and metabolite objects.
These two object types are exposed by the API and can be referred to
by user-defined scripts.
This API is also used to generate the output files in different
formats and allows the user to customize the output to her/his
specific needs.
There exist two SBML exporters. One creates SBML directly from the
reaction and species objects. The other first creates libSBML objects to use the export function of libSBML.
Metannogen provides only a few specific fields:
dataset identifier, equation of the metabolic reaction, name of the curator, compartment, and EC class.
To enter information in a structured way for which no specific field in Metannogen exists,
variable declarations inside the multi-line text-field are used.
The variable declarations follow the
syntax.
The information is assigned to Reaction-objects where it can be retrieved with the
getAttribute("Name of variable") method.
This makes the information available for the export of the network.
An increasing amount of information used for network simulations
may be automatically computed or extracted from existing data
sources. It is not recommended to store automatically generated
information together with manually curated information.
Instead, it is recommended to keep automatically generated
information in separate files. Metannogen is able to merge the
information from text files with those from datasets to generate
the final model.
Preparation
All command lines shown in this tutorial can be executed directly in the
terminal.
They require a compatible command line interpreter, as included in modern computer systems such as
Macintosh and Linux.
MS-Windows users need to install a compatible command line interpreter.
The first command "mkdir" creates the directory "~/metannogenTutorial/" in the home directory.
The wget-line downloads Metannogen and the example dataset file.
On Macintosh the command for downloading files is not wget but curl;
please substitute "wget -N" by "curl -O".
mkdir ~/metannogenTutorial
cd ~/metannogenTutorial
wget -N http://www.bioinformatics.org/strap/metannogen/metannogen.jar
wget -N http://www.bioinformatics.org/strap/metannogen/tutorialData/myDatasets.datasets
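If the same commands should work on both Linux and Macintosh, the choice between wget and curl can be made by the shell. The following is a minimal sketch, not part of Metannogen; the function names pick_downloader and fetch_tutorial_files are made up for this illustration. It uses the two URLs given above.

```shell
# Print the download command that is available on this machine.
pick_downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo "wget -N"
    elif command -v curl >/dev/null 2>&1; then
        echo "curl -O"
    else
        echo "neither wget nor curl found" >&2
        return 1
    fi
}

# Download both tutorial files into the current directory.
fetch_tutorial_files() {
    downloader=$(pick_downloader) || return 1
    for url in \
        http://www.bioinformatics.org/strap/metannogen/metannogen.jar \
        http://www.bioinformatics.org/strap/metannogen/tutorialData/myDatasets.datasets
    do
        # $downloader is left unquoted on purpose so that "wget -N" splits into command and option.
        $downloader "$url" || return 1
    done
}
```

After "cd ~/metannogenTutorial", calling fetch_tutorial_files would then replace the two wget lines.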
Starting the graphical user interface
The file "myDatasets.datasets" contains a few datasets which can
be viewed and edited with Metannogen.
Start Metannogen by typing the following command and be prepared
that large amounts of data will be loaded when Metannogen is
run for the first time.
java -Xmx200M -jar metannogen.jar -networks KEGG
A form for selecting the dataset source appears.
When the program is started for the first time, the KEGG metabolic
network (60 megabytes) is downloaded, as specified by the command
line option "-networks KEGG".
There are two radio buttons specifying whether datasets should be loaded from a file or from a URL.
Select "File" and
enter the file path:
~/metannogenTutorial/myDatasets.datasets
For your convenience, this text field offers tab-file-path completion.
When Metannogen is running, the KEGG database is displayed in the left panel.
Expand the Glycolysis tree node for a list of all reactions in the pathway Glycolysis.
Those with a traffic light have a corresponding dataset. These
datasets can be opened by clicking the tree nodes. Datasets can
also be accessed from the tab "Datasets".
Now leave the program. At this point modifications can be saved to the file "myDatasets.datasets".
This tutorial is about using the program in batch mode.
From now on we will not use the graphical features any more.
SBML Output
To reduce the length of the command lines it is recommended to define an alias with the constant part of the
command lines:
alias RunMetannogen="java -Xmx100M -jar metannogen.jar -stdout -networks KEGG -datasets myDatasets.datasets "
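By default, shells such as bash expand aliases only in interactive sessions. Inside a script pipeline a shell function is therefore a common alternative. The following is a minimal sketch with the same constant options as the alias; the variable JAVA_CMD is a hypothetical addition for dry runs and is not a Metannogen feature.

```shell
# Hypothetical switch: set JAVA_CMD=echo to print the arguments instead of starting Java.
JAVA_CMD="${JAVA_CMD:-java}"

# Same constant part as the alias; "$@" appends the options given by the caller.
RunMetannogen() {
    "$JAVA_CMD" -Xmx100M -jar metannogen.jar -stdout -networks KEGG -datasets myDatasets.datasets "$@"
}
```

If the alias of the same name is already defined in the current interactive shell, remove it with "unalias RunMetannogen" before defining the function. Otherwise the alias is expanded inside the function definition.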
Now use the -toSBML option to create the SBML output.
RunMetannogen -toSBML output.sbml
This line produces the SBML file "output.sbml" from the dataset file "myDatasets.datasets". Please look at it with a text viewer to get an understanding of the XML structure.
less output.sbml
or
more output.sbml
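For a quick check in a script, the number of reactions in the export can be counted instead of reading the whole file. This is only a sketch: it assumes that each reaction element starts on its own line with "<reaction", as is usual in SBML files, and the function name count_reactions is made up for this illustration.

```shell
# Count the lines that open a reaction element; closing tags and <listOfReactions> do not match.
count_reactions() {
    grep -c '<reaction[ >]' "$1"
}
```

Calling "count_reactions output.sbml" prints the count. A count of 0 suggests that the export did not contain any reactions.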
Additional XML attributes
In the comment field of dataset R00235 the attribute "myAttribute" is defined.
Please view the myDatasets.datasets file and locate these variable declarations.
In the SBML specification, "myAttribute" does not denote a specific attribute.
To include this non-standard attribute in the output, the option "-useReactionAttributes" is used.
It takes a space-separated list of attribute names.
RunMetannogen -toSBML output.sbml -useReactionAttributes myAttribute
fgrep myAttribute output.sbml
As a result, the attribute myAttribute defined in the dataset R00235 is included in the two localized
reactions of this dataset: Nucleus and mitochondrial matrix.
But attributes can also be defined in separate files.
This allows separation of computed data from manually curated data.
echo -e 'R00235\t $anotherAttribute="one more" $yetAnother="another"' > attributes.txt
RunMetannogen -toSBML output.sbml -useReactionAttributes myAttribute anotherAttribute yetAnother -attributesOverride attributes.txt
fgrep anotherAttribute output.sbml
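The option -e of echo is not understood by every shell; some print "-e" literally into the file. If that happens, printf is a portable alternative. The following sketch writes the same line as the echo command above:

```shell
# Same content as the echo -e example: reaction id, a tab, then the variable declarations.
printf 'R00235\t $anotherAttribute="one more" $yetAnother="another"\n' > attributes.txt
```

The single quotes keep the shell from interpreting the $ signs, and printf itself converts \t into a tab character.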
It is also possible to confine attributes to one compartment. Example with mitochondrial matrix:
echo -e 'R00235\t $anotherAttribute@mitoMx="one more" $yetAnother="another"' > attributes.txt
RunMetannogen -toSBML output.sbml -useReactionAttributes myAttribute anotherAttribute yetAnother -attributesOverride attributes.txt
fgrep anotherAttribute output.sbml
To reduce typing, it is possible to specify more than one compartment suffix.
The following three declarations are equivalent. Obviously, the last variant is the shortest:
-
echo -e 'R00235\t $myAttribute@nuc="hello world" '> attributes.txt
echo -e 'R00235\t $myAttribute@mitoMx="hello world"'>> attributes.txt
-
echo -e 'R00235\t $myAttribute@nuc="hello world" $myAttribute@mitoMx="hello world"'> attributes.txt
-
echo -e 'R00235\t $myAttribute@nuc,mitoMx="hello world"'> attributes.txt
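When the attribute values come from a computation, the attributes file can be generated from a table instead of being typed by hand. The following is a sketch under assumptions: the file names computed.tsv and computedAttributes.txt, the attribute name "deltaG" and the numbers are invented for this illustration. Only the line format (reaction identifier, tab, declaration with compartment suffix) is taken from the examples above.

```shell
# Hypothetical computed values: reaction id, compartment, value (tab separated).
printf 'R00235\tnuc\t1.5\n' > computed.tsv
printf 'R00235\tmitoMx\t0.7\n' >> computed.tsv

# Turn each row into one declaration of the form: id<TAB> $deltaG@compartment="value"
awk -F '\t' '{ printf "%s\t $deltaG@%s=\"%s\"\n", $1, $2, $3 }' computed.tsv > computedAttributes.txt
```

The resulting file could then be passed with -attributesOverride, with deltaG added to the list after -useReactionAttributes.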
The attribute name "$HL" is reserved: It contains space separated
strings to be highlighted in publication abstracts.
Substitutions of Metabolites
Under certain circumstances some metabolites with
distinct identifiers might be considered as being identical. This is when
metabolite dictionaries come into play. Please consider D-glucose,
beta-D-glucose and alpha-D-glucose which have the KEGG identifiers C00031,
C00221 and C00267, respectively. These three forms are inter-converted
spontaneously by a process called
mutarotation.
In a stoichiometric network they might be considered as one pool.
This is achieved by representing all forms by only one identifier, here C00031.
The following commands create the appropriate hash map (synonyms: hash table, dictionary).
echo C00221 C00031 > dictGlucose.txt
echo C00267 C00031 >> dictGlucose.txt
Or, if you prefer a compact form, the following is equivalent:
echo C00221 C00267 C00031 > dictGlucose.txt
This hash-table is loaded with the option -dictionaryOfSpecies:
RunMetannogen -toSBML output.sbml -dictionaryOfSpecies dictGlucose.txt
fgrep "speciesType id" output.sbml
Inspect the list of species in the output.sbml file. It should contain only C00031, and neither C00221 nor C00267.
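To see why the compact form is equivalent to the two-line form, it can be expanded back into pairs. The sketch below assumes, as the examples above suggest, that the last identifier on each line is the replacement for all identifiers before it. The function expand_dict is a made-up helper and does not show how Metannogen itself reads the file.

```shell
# Expand "key1 key2 ... target" lines into one "key target" pair per line.
expand_dict() {
    awk 'NF >= 2 { for (i = 1; i < NF; i++) print $i, $NF }' "$1"
}
```

Applied to the compact dictGlucose.txt, "expand_dict dictGlucose.txt" prints the pairs "C00221 C00031" and "C00267 C00031".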
Substitutions of Compartments
Imagine the nucleus and the cytosol should be treated as one compartment in the simulation.
We give the union of both compartments the name "cytoOrNuc":
echo cyto cytoOrNuc > dictCompart.txt
echo nuc cytoOrNuc >> dictCompart.txt
Again, this can be contracted to:
echo cyto nuc cytoOrNuc > dictCompart.txt
RunMetannogen -toSBML output.sbml -dictionaryOfCompartments dictCompart.txt
fgrep cytoOrNuc output.sbml
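When dictionary files grow or are generated by scripts, the same name may accidentally be mapped to two different replacements. This tutorial does not state how Metannogen treats such a case, so it may be worth checking a dictionary before use. The following sketch uses a made-up helper check_dict and again assumes that the last name on each line is the replacement.

```shell
# Print every name that is mapped to more than one replacement; exit status 1 if any is found.
check_dict() {
    awk '
        NF >= 2 {
            for (i = 1; i < NF; i++) {
                if (($i in target) && target[$i] != $NF) { print $i; bad = 1 }
                target[$i] = $NF
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, "check_dict dictCompart.txt && echo consistent" prints "consistent" only if no name has conflicting replacements.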
Customization of the SBML-output
The SBML format is under continuous development to include novel data types
that are required by the bioinformatics community.
Metannogen offers the possibility to adapt its export
functions to special needs.
Adapting the output format at source code level is relatively easy
and will be discussed in the tutorial Customize Export.