Metannogen


Overview

Metannogen is a customizable graphical work-bench to facilitate curation of biological networks for a single user or for a team of data curators. It consists of a graphical browser for metabolic networks and an editor for user defined collections of biochemical equations. Metannogen can be used in two ways:
  1. Reconstruction of metabolic networks
  2. Annotation of existing metabolic networks given as SBML

1. Annotation of SBML-files

Metannogen can load biological models as SBML. These files are not changed by Metannogen, but Metannogen lets the user add information to reactions. An annotation may be free text, which does not affect the exported SBML, or it can provide information that is included in the exported SBML file.
  1. Free text annotations not influencing the model: The user can add notes, comments including hyper-references, links to pubmed IDs etc.
  2. Annotations that are included in the output SBML: The output SBML is generated from the read-only network SBML by inserting XML code. Attributes are written like Perl variables into the annotation text. XML tag structures are enclosed between "XML_BEGIN" and "XML_END" marks.
Annotations may be accessible to only one user or to many users over the Internet.
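
The separation of free text from XML destined for the exported SBML can be illustrated with a small sketch. It is not Metannogen's own code: it assumes that the "XML_BEGIN" and "XML_END" marks appear literally in the annotation text and that everything between a pair of marks is raw XML. The function name split_annotation and the sample tag are hypothetical.

```python
import re

# Assumption: the marks appear literally and enclose raw XML.
XML_BLOCK = re.compile(r"XML_BEGIN(.*?)XML_END", re.DOTALL)


def split_annotation(text):
    """Return (free_text, xml_fragments) for one annotation text."""
    xml_fragments = [block.strip() for block in XML_BLOCK.findall(text)]
    free_text = XML_BLOCK.sub("", text).strip()
    return free_text, xml_fragments


# Hypothetical annotation: one free-text note plus one XML fragment.
note = 'Checked against PMID16020471.\nXML_BEGIN\n<myTag value="1"/>\nXML_END'
free_text, fragments = split_annotation(note)
print(free_text)
print(fragments)
```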

2. Reconstruction of metabolic networks

Reconstruction of metabolic networks is the first step of network modelling and comprises the collection and curation of biochemical reactions. The program design allows efficient use of existing knowledge, in the form of already published metabolic networks, for the reconstruction of new metabolic networks. One advantage is that biochemical reactions can be copied directly to the user data. The resulting datasets can be modified: for example, the biochemical equation and compartmentalization can be changed, and Web links, notes, remarks and variable declarations can be added.

Context-menu

The items in the browsable trees have a context-menu (right-click) which gives access to specific program functions.

Cross-linking by ID

Datasets may have the same ID as a reaction in a loaded immutable network. In this case the dataset and the reaction are linked together. The dataset acts as an annotation of the reaction.

Cross-linking by equivalence of the reaction equation

Reactions and datasets are also linked if they have the same biochemical equation: all reactions and datasets with equivalent biochemical equations are cross-linked. Reactions with a linked dataset are marked with a green (reaction exists) or red (reaction is not part of the model) traffic light sign. Unfortunately, different authorities such as KEGG, the Palsson group, ChEBI, HepatoNet and Reactome prefer different identifier systems for metabolites. Nevertheless, tables of equivalent metabolite identifiers can be provided so that reactions are recognized as identical even if they are written with different identifiers. The equals relation for biochemical equations is important for the graphical user interface. Metannogen uses the graphical pathway maps from KEGG. In the pathway maps, reactions and metabolites can be highlighted, and quantitative data such as flux strength can be shown. Metannogen can also be run without the graphical user interface, in a non-interactive manner, to generate output files as part of a shell script.
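
The equivalence test for biochemical equations described above can be sketched as follows. This is only an illustration of the idea, not Metannogen's algorithm. It assumes that metabolites are separated by " + ", that both sides are separated by "<=>", that stoichiometric coefficients are absent, and that reaction direction is ignored. The synonym table and the short names in it are illustrative.

```python
import re


def canonical(equation, synonyms):
    """Reduce an equation to a form that ignores order and identifier system."""
    def side(text):
        names = (token.strip() for token in re.split(r"\s\+\s", text))
        # Map every name to its preferred identifier, if one is known.
        return tuple(sorted(synonyms.get(name, name) for name in names))

    left, right = equation.split("<=>")
    # A frozenset of both sides makes the comparison direction-independent.
    return frozenset([side(left), side(right)])


def equivalent(equation_a, equation_b, synonyms):
    return canonical(equation_a, synonyms) == canonical(equation_b, synonyms)


# Illustrative table mapping short names to KEGG-style identifiers.
table = {"gal": "C00984", "atp": "C00002", "adp": "C00008", "gal1p": "C00446"}
print(equivalent("C00984 + C00002 <=> C00446 + C00008",
                 "atp + gal <=> adp + gal1p", table))
```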

Program Features

The user interface follows conventions of typical graphical applications and implements advanced GUI concepts like browsable trees, search and highlighting, and automatic Web links. Metannogen does not need to be installed because it can be started directly by clicking the Java Web Start button on the Metannogen web site. Using a central data store, several people can work simultaneously on the same project. To evaluate Metannogen, the demo database can be selected; the button to select it is found in the start frame. The demo database is configured to accept simultaneous changes without a password, whereas real projects should be password protected. Metannogen supports interprocess communication via a socket connection to work in concert with other applications. SBML is an XML-based language for representing models of biological processes. It can be imported by many modelling programs.

Central storage of datasets

A Metannogen dataset contains a biochemical equation and expert opinion edited by the curator. The datasets can be stored either in a local file on the PC or on a central HTTP server, so that several investigators can work on the same data. When Metannogen is started with the GUI, the start dialog appears. The access data for a central data repository is entered into the top panel of the start dialog. To store the data locally, the user enters the absolute file path in the bottom part of the dialog.

To use a central data repository, the user may create a repository on the www.bioinformatics.org server. The access data should be written down, because it needs to be entered into the multi-line text field. Alternatively, it is possible to store data on any other web server. This requires the installation of a Perl script. The following may be used as a template:
    http://www.bioinformatics.org/strap/metannogen/metannogen.perl.txt.
    
Figure: Input form for server URLs. The first line contains the server URL followed by white space and the URL for single datasets. The URL of a single dataset has an asterisk as the placeholder for the dataset ID. It is important in a multiuser environment to avoid concurrent modification by different users. The second line is ignored because it starts with a "#".
MetannogenMain#newPnlServer()


Server address: The multi-line text field holds the URL of the Perl script. The first valid URL is the master server. Optional secondary URLs on the lines below may serve as backup servers, since they receive the same data from Metannogen. The server can be tested with a Web browser: visiting the address should show a file with all datasets. For example, visit the demo server address:
http://www.bioinformatics.org/strap/cgi-bin/metannogenDemo.pl
Lines that are commented out by a leading "#" are ignored.
Address of a single dataset: The server address may optionally be followed on the same line by an expression denoting the URL for single datasets. The asterisk in this mask stands for the dataset ID; the URL of a single dataset is formed by replacing the asterisk with the ID. For example, consider the URL mask of the demo server:
http://www.bioinformatics.org/strap/metannogen/demo_datasets/*.dataset
Replace the asterisk with a dataset ID such as "R08689" and visit the resulting address in the Web browser. This address allows Metannogen to observe the "Last-Modified" and "Date" attributes of the current dataset. If another investigator uploads a dataset while it is being modified, a warning message appears. The goal is to prevent data loss when datasets are modified simultaneously by different users.
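
How the content of the server text field might be interpreted can be sketched in a few lines. The sketch follows the rules stated above (comment lines start with "#", the first valid URL is the master server, an optional second entry on the same line is the dataset mask). The function names parse_server_field and dataset_url are hypothetical and not part of Metannogen.

```python
def parse_server_field(text):
    """Return a list of (server_url, dataset_mask_or_None), master server first."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out lines are ignored
        parts = line.split()
        mask = parts[1] if len(parts) > 1 else None
        servers.append((parts[0], mask))
    return servers


def dataset_url(mask, dataset_id):
    """Form the URL of a single dataset by replacing the asterisk with the ID."""
    return mask.replace("*", dataset_id)


field = (
    "http://www.bioinformatics.org/strap/cgi-bin/metannogenDemo.pl "
    "http://www.bioinformatics.org/strap/metannogen/demo_datasets/*.dataset\n"
    "# this line is ignored\n"
)
master_url, master_mask = parse_server_field(field)[0]
print(master_url)
print(dataset_url(master_mask, "R08689"))
```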

Password: For password-protected repositories, the server URL may contain a password in the form &passwd=xxxx. If the password is not contained in the URL, the user can specify a password before uploading. If the password is not validated by the Perl script on the server, the user will not be allowed to change datasets.

Metannogen dataset

The datasets contain the data entered manually by the user. They have two different functions, depending on whether Metannogen is used for network reconstruction or for SBML file annotation. For network reconstruction, the datasets contain all reactions of the reconstructed model. When annotating an SBML file, a dataset is an annotation of the reaction in the SBML model that has the same ID. A dataset contains a biochemical reaction written with metabolite identifiers.

ID: Each dataset should have a unique identifier. If the ID matches the ID of a reaction, then the dataset acts as an annotation of that reaction.

Creating datasets: An empty dataset can be created from the dataset-menu (Menu-bar>Datasets>MetannogenStatics#CMD_NEW_EMPTY_DATASET). A panel with the dataset form is opened (see figure dataset-view). It is also possible to use a reaction of one of the loaded networks as a template, so that the identifier field and the equation field are filled in. The respective menu item is found in the context menu (right click) for reactions.
Figure: Snapshot of the graphical dataset view. The top panel contains tool-buttons and text fields. The biochemical equation contains identifiers such as C01089 instead of metabolite names. For convenience, the user can type names which are converted to identifiers. The large text area can contain comments and notes in free text form.
MetannogenStatics#docuMetannogenDatasetView()


User interface: A dataset view is opened when the respective tree node is activated in the tree view or with the respective context menu. A dataset has several fields containing text. The different fields of the dataset are manipulated by appropriate graphical elements such as single and multi-line text fields, choice menus and toggle buttons. Usually, a dataset describes a biochemical reaction in one or several compartments. For unspecific reactions and unspecific transporters: See Brace Expansion of the biochemical reaction.

PUBMED links: Like all x-refs, Pubmed identifiers in the text area such as "PMID16020471" act as hyperlinks. A prompt display of Pubmed abstracts (and Uniprot entries) is implemented using a cache: when the mouse pointer hovers over a Pubmed identifier, the abstract is downloaded and displayed. Once it has been downloaded, it is shown without delay whenever the mouse is over the Pubmed identifier. The same applies to Uniprot identifiers. PDFs can be associated with Pubmed identifiers using the context-menu (right-clicking the Pubmed-ID).

As the text highlighting capabilities of document viewers like Evince or Acrobat are still limited, advanced highlighting features have been implemented for abstracts as well as for publication full texts within the Metannogen environment. The user can define a set of text patterns to be highlighted in the publication text (Menu-bar>Options>MetannogenStatics#BUT_LAB_highlightInPubmed). These patterns apply to all datasets. Dataset-specific catch words to be highlighted can be defined by declaring a variable $HL=" pattern1 #00FF00 pattern2 ... " in the comment text field. Each token may be preceded by a color written in hexadecimal HTML syntax. For example, #00FF00 denotes the color green.
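
The $HL declaration can be read with a short parser. The following sketch assumes that tokens are separated by white space and that a color token applies only to the pattern immediately following it; the text above does not specify either detail further. The function name parse_highlights is hypothetical.

```python
import re

HL_DECLARATION = re.compile(r'\$HL="([^"]*)"')
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_highlights(comment_text):
    """Return a list of (pattern, color_or_None) from the $HL declaration."""
    match = HL_DECLARATION.search(comment_text)
    if match is None:
        return []
    result = []
    pending_color = None
    for token in match.group(1).split():
        if HEX_COLOR.fullmatch(token):
            pending_color = token  # assumed to apply to the next pattern only
        else:
            result.append((token, pending_color))
            pending_color = None
    return result


print(parse_highlights('Some notes. $HL=" kinase #00FF00 galactose "'))
```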

Internal links: In the comment text of a dataset, references to other datasets can be included such that the referenced dataset is opened when the link is clicked. The links are typed in the form DATASET#datasetIdentifier. In addition, paragraphs in the comment text field can be referenced. This is useful when two datasets share parts of the same text note. The paragraph must have at least a start anchor of the form BEGIN#myLabelText and may optionally be terminated with an end anchor like END#myLabelText. References to such a tagged paragraph have the form #myLabelText or INCLUDE#myLabelText. In a similar way, links of the form NETWORK#R00149 act as a hyperlink to all reactions (KEGG, Recon1, EHMN) with the given ID. Any metabolite identifier or metabolite name has a context menu.
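
The anchor syntax can be illustrated with a small sketch that locates a tagged paragraph and collects dataset links. It is an illustration only. In particular, the text above does not say where a paragraph without an END anchor stops; the sketch assumes it runs to the next blank line. The function names are hypothetical.

```python
import re


def tagged_paragraph(text, label):
    """Return the text between BEGIN#label and END#label, or None if absent."""
    begin = re.search(r"BEGIN#" + re.escape(label) + r"(?!\w)", text)
    if begin is None:
        return None
    rest = text[begin.end():]
    end = re.search(r"END#" + re.escape(label) + r"(?!\w)", rest)
    if end is None:
        # Assumption: without an end anchor the paragraph ends at a blank line.
        end = re.search(r"\n\s*\n", rest)
    body = rest[:end.start()] if end else rest
    return body.strip()


def dataset_links(text):
    """Return the identifiers of all DATASET#... links in the text."""
    return re.findall(r"DATASET#(\w+)", text)


comment = ("Intro.\nBEGIN#shared Galactose is phosphorylated first. END#shared\n"
           "See DATASET#R01092.")
print(tagged_paragraph(comment, "shared"))
print(dataset_links(comment))
```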

Additional GUI elements: It is possible to add additional GUI elements which correspond to specific columns in the Tab-separated Metannogen file format. These customizations must be performed by the project manager with sufficient programming skills. See: Menu-bar>Options>AbstractDatasetView#BUT_LAB_changeSource.

Additional dataset fields: The above approach requires programming skills. A much simpler way to structure the data is to use variable declarations in the comment text field. Example:
 $myVariable="my text"
It is possible to define an input form with variable declarations. A form may contain free text with embedded empty variable declarations. The curator can fill in the form by adding text to the empty declarations. The form is prepared with a text editor. No programming skills are required. The file path or URL of the form file is provided as a command line option.
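
Variable declarations of the form $myVariable="my text" are easy to extract with a regular expression. The sketch below assumes that values never contain a double quote and that a form field counts as empty when its value is the empty string. The function names and the sample form are hypothetical.

```python
import re

DECLARATION = re.compile(r'\$(\w+)="([^"]*)"')


def read_variables(comment_text):
    """Return a dict of all variable declarations found in the comment text."""
    return {name: value for name, value in DECLARATION.findall(comment_text)}


def unfilled_fields(comment_text):
    """Return the names of declarations the curator has not filled in yet."""
    return [name for name, value in DECLARATION.findall(comment_text) if value == ""]


# Hypothetical form: free text with one empty and one filled declaration.
form = 'Tissue: $tissue=""\nEvidence: $evidence="PMID16020471"\n'
print(read_variables(form))
print(unfilled_fields(form))
```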

Starting Metannogen - command line options

Metannogen requires Java version 1.5 or higher. It consists of a single JAR file, metannogen.jar. Further data files or program parts are automatically downloaded at runtime from the Web server. Therefore the computer needs to be connected to the Internet. The file metannogen.jar can be started in two different ways:
  1. Using the java-command:
      java -Xmx200M -jar metannogen.jar  [options]
    The option -Xmx200M increases the maximum amount of memory allocated by the Java Virtual Machine to 200 MBytes.
  2. Using Java Web Start (javaws): The command javaws acts on the file metannogen.jnlp. The JNLP file may be inspected with any text editor. A modified copy can be saved. It contains the command line parameters and the value for the maximum memory.
      javaws http://www.charite.de/bioinf/strap/metannogen/metannogen.jnlp

Web proxy
In some institutions all Internet connections go through a Web proxy. In this case the proxy must be set correctly, otherwise downloading files from the Web will fail. A detailed explanation is found in "Menu-bar>Options>Internet settings" or by clicking the button "Test Web Proxy" in the start frame of Metannogen.

Command line parameters of Metannogen
Metannogen takes a number of optional command line parameters. If Metannogen is started using the Java Web Start mechanism, the command line options are defined in the JNLP file. Some command line options are followed by one or more text files. Files can be given as a relative or absolute file path or in the form of a URL.

General format of dictionaries:
For the syntax of most dictionaries two options exist:
  1. Lines with Tab-separated entries: If the dictionary file contains tabulator characters then Metannogen assumes that each line contains a key-value pair such that both are separated by tabulator. The first column in the tab-separated file contains the keys and the second column the values.
    This can be altered with a suffix of the URL or file path. For example, when "(3,4)" is appended to the URL, the key is read from the 3rd and the value from the 4th column (numbering starts with "1").
  2. Lines with Space-separated entries: If the file does not contain any tab, then Metannogen assumes that white space is the separator of keys and values. With space as the separator, lines with different keys but identical values can be combined into one line. The advantage is that the resulting file is more compact. For example the two lines:
          C00124  C00962
          C01582  C00962
        
    can be written as one line (provided the file does not contain any tab character):
          C01582 C00124  C00962
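
The two dictionary syntaxes can be summarized in a short reader. This is a sketch of the rules described above, not Metannogen's implementation. It assumes that in the space-separated form the last entry of a line is the value and all preceding entries are keys, and that the column suffix only matters for tab-separated files. The function names are hypothetical.

```python
import re


def split_column_suffix(location):
    """Split 'file(3,4)' into ('file', 3, 4); the default columns are 1 and 2."""
    match = re.search(r"\((\d+),(\d+)\)$", location)
    if match:
        return location[:match.start()], int(match.group(1)), int(match.group(2))
    return location, 1, 2


def parse_dictionary(content, key_col=1, value_col=2):
    """Parse dictionary text; any tab in the content selects the tab syntax."""
    mapping = {}
    lines = [line for line in content.splitlines() if line.strip()]
    if "\t" in content:
        for line in lines:
            cols = line.split("\t")
            if len(cols) >= max(key_col, value_col):
                mapping[cols[key_col - 1].strip()] = cols[value_col - 1].strip()
    else:
        for line in lines:
            *keys, value = line.split()
            for key in keys:  # several keys may share one value
                mapping[key] = value
    return mapping


print(parse_dictionary("C01582 C00124  C00962\n"))
print(split_column_suffix("dictionary.tsv(3,4)"))
```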
        

Metannogen for network reconstruction - non-graphical mode

This chapter explains how a network reconstructed with Metannogen can be exported using different file formats like SBML. It does not apply to the case that Metannogen is used to annotate an SBML file. Though the file export function is accessible in the GUI (Menu-bar>File), the preferred way is to run Metannogen as a command line tool without the GUI. This allows Metannogen to be included in shell scripts. Complex pipelines can be constructed to couple Metannogen directly with data analysis. The option "-datasets" and an option like "-toSBML1" are required. The other options are optional. The following command line options are only important under very special circumstances:

Specifying the command line options in the JNLP file: The command line options are specified in the JNLP file that is behind the Web Start button. This file needs to be modified when other program options are required. It can be downloaded to the Desktop and edited with a text editor. When the file is clicked on the Desktop, Metannogen is started.

Brace Expansion of the biochemical reaction

This paragraph applies when Metannogen is used for the reconstruction of metabolic networks, but not for the annotation of SBML files. Some enzymes and transport carriers are unspecific. To avoid the creation of many similar datasets, brace expansion can be used. Brace expansion makes it possible to define several similar biochemical conversions within one single text string. If the reaction string entered by the curator contains curly braces, then Metannogen applies brace expansion to determine the set of reactions resulting from the dataset. The result is several biochemical reactions, each having a different metabolite from the group enclosed in curly braces. The number of resulting equations is the number of metabolites within the braces. If there are several groups of metabolites enclosed in braces, then the i-th metabolites of all groups belong together. If the number of elements is not the same in all groups, an error message is written out. Example for a transport reaction:
{ Alanine Proline Glutamine }@cyto  <=> { Alanine Proline Glutamine }@mito
is expanded to
Alanine@cyto  <=> Alanine@mito
Proline@cyto  <=> Proline@mito
Glutamine@cyto  <=> Glutamine@mito
Example for a transaminase reaction:
{Alanine Aspartate} + Alpha-Ketoglutarate   <=> { Pyruvate Oxalacetate} + Glutamate
is expanded to
Alanine   + Alpha-Ketoglutarate   <=>  Pyruvate    + Glutamate
Aspartate + Alpha-Ketoglutarate   <=>  Oxalacetate + Glutamate
If the second braced group is identical to the first one, it can be abbreviated by "{}". This is usually the case for transport reactions. There must be no space between the two braces:
{ Alanine Proline Glutamine }@cyto  <=> {}@mito
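
The expansion rules above can be sketched in a few lines. This is a minimal illustration, not Metannogen's implementation. It assumes that metabolite names inside braces are separated by white space and that an empty group "{}" repeats the first group. Double braces are not handled. The function name brace_expand is hypothetical.

```python
import re

GROUP = re.compile(r"\{([^{}]*)\}")


def brace_expand(equation):
    """Expand single-brace groups; the i-th members of all groups belong together."""
    groups = [match.group(1).split() for match in GROUP.finditer(equation)]
    if not groups:
        return [equation]
    # An empty group "{}" is an abbreviation for the first group.
    groups = [group if group else groups[0] for group in groups]
    if len({len(group) for group in groups}) != 1:
        raise ValueError("all brace groups must have the same number of elements")
    expanded = []
    for i in range(len(groups[0])):
        members = iter(group[i] for group in groups)
        # Replace each braced group, left to right, by its i-th member.
        expanded.append(GROUP.sub(lambda match: next(members), equation))
    return expanded


for reaction in brace_expand("{ Alanine Proline Glutamine }@cyto  <=> {}@mito"):
    print(reaction)
```

Raising an error for groups of unequal size mirrors the error message mentioned in the text.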


Advanced: Double brace expansion: A second independent group of metabolites can be included in double curly brackets {{ ... }}. The number of resulting reactions corresponds to the product of the sizes of the two sets.

Text-Pane short-cuts

The multi-line text-pane is derived from the standard Java text-pane and provides additional functionality. Shortcuts with the CTRL key: The control key is located at the left of the keyboard and is labeled "Control", "Ctrl" or "Strg". The text-pane receives key-strokes only if it has the input focus. Click to set the focus. Function keys: For editable text fields the following features may or may not be active:

Hyperlinks

In most text views URLs like http://www.google.de or database references like 12166070 or PUBMED{fructose glucose} or EC classes like "1.2.2.1" can be clicked to open the selected item in the browser or document viewer. To specify another Web-browser hold the CTRL-key while clicking. The list of database links can be customized: Customize#webLinks, Customize#databases, Customize#proteinDatabases.

Context Menus

The context menu is opened by right-clicking a word. Pubmed links like PMID2840859 have a specific context menu.

Auto-word-completion

Auto-word-completion is available for certain text views. Typing the beginning of a word and hitting the Tab key or Alt-/ completes the word. In case of ambiguity, the desired word may come up after pressing the key several times.

Context menus and Balloon text

As the mouse is moved over an identifier in an editable or read-only text field, the name of the respective object is shown. For example, moving the mouse over the metabolite identifier "C00984" brings up a balloon message with "D-Galactose". Metannogen associates "C00984" with "D-Galactose" if KEGG is loaded. The same happens if the mouse is moved over "gal" when a Palsson network has been loaded. Names are not only found in loaded networks, but may also be provided by a file given after the command line option "-names".

Identifiers or names of metabolites, reactions and datasets have a context menu. This context menu makes it possible to locate the object in the object hierarchy of the network and to highlight it in the graphical pathway map. Related reactions and datasets can be opened via the context menu.

Browsable Tree

Several browsable trees are available in different tabs of the tabbed pane: The items in the trees have a context menu (right-click).
Multiple selection of tree nodes follows the conventions of Windows and X-Windows and requires the Shift key and the Ctrl key.

Importing biological Networks

Metannogen can load and browse finished networks like KEGG or Recon1. The immutable network data is shown as a hierarchical tree with expandable nodes: pathway, reaction, EC class, metabolites.

Datasets

The datasets are listed in two trees located in adjacent tabs:
  1. Datasets: Datasets are shown in flat lists. Several branches are available:
    • MetannogenStatics#NODE_DATASETS: This branch contains all datasets edited and modified by the user.
    • MetannogenStatics#NODE_DATASETS_KILL: This branch contains all datasets that have been deleted during the current session. Deleted datasets can be rescued with the menu-item "Duplicate focused dataset" in the dataset-menu or by clicking the IC_KILL button. All deleted datasets are finally lost after the session.
    • MetannogenStatics#NODE_ORPHAN_DATASETS: This branch contains all datasets that are not assigned to a reaction because the ID does not match to any reaction.
    • MetannogenStatics#NODE_TRANSPORTER_DATASETS: This branch contains all datasets with more than one compartment.
  2. By PW: Datasets are ordered by pathway. Only those datasets are included where a pathway is defined in the Pathway-text-field.
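
The membership rules for the orphan and transporter branches can be written down as a small sketch. It assumes that compartments are written as "@name" suffixes, as in the brace expansion examples of this document, and that a dataset may appear in several branches at once. The function names and branch labels are hypothetical.

```python
import re


def compartments(equation):
    """Return the set of compartment names written as '@name' in the equation."""
    return set(re.findall(r"@(\w+)", equation))


def branches(dataset_id, equation, reaction_ids):
    """Return the labels of the flat-list branches a dataset would appear in."""
    result = ["datasets"]
    if dataset_id not in reaction_ids:
        result.append("orphan")  # the ID matches no reaction of a loaded network
    if len(compartments(equation)) > 1:
        result.append("transporter")  # more than one compartment
    return result


print(branches("myTransport1", "Alanine@cyto <=> Alanine@mito", {"R00149"}))
```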

Figure: Components of the KEGG network in a graphical tree. Reaction nodes can be expanded to see substrates and products, which themselves can be expanded to see all reactions for a metabolite.
MetannogenTree#docuTree()


Bugs: Sometimes the tree is no longer displayed. Workaround: "Redraw"-button.

Graphical KEGG pathway maps

The KEGG database provides graphical representations for several classical pathways. These pathway maps often resemble those in text books. They can be viewed in Metannogen by opening the context menu. The pathway maps are interactive i.e. the reactions and metabolites have a balloon message and a context menu. It is possible to view quantitative data of reactions on the maps.

Export Data

The export dialog is opened from the File menu. The dialog is organized as a tabbed pane with a tab for each category. Each tab has a button to perform the specific action.
Figure: Snapshot of the script dialog. Customization of network scripts and exports at source code level. The source code is opened in a text editor. The changed code is active as soon as the modified text is saved to hard disk.
charite.christo.metannogen.MetannogenUserScripts#newPanel()
The user can modify the source code. For understanding the source code, only three Java interfaces and one Java class are important. They are similar to libSBML but much simpler.

Executing shell command

This dialog makes it possible to apply shell commands to selected text regions and object IDs. It provides one way of linking other programs. The list of shell scripts is customizable. The user can choose the shell script to be applied to a certain text string from a browsable tree. This hierarchical tree contains the menus, sub-menus, shell scripts and Web addresses.
Figure: Dialog to apply shell scripts or Web addresses to selected text. Each node in the tree represents a shell script which can be started by left click. The text in the text box "Argument for shell scripts" replaces the placeholder in the script (asterisk). Users can define their own scripts.
ExecByRegex#screenshot()


The list of shell-scripts is customizable at run-time. Alternatively, the program option "-customizeAddScriptByRegex" can be used to specify a file with shell-scripts. Example:
   demo\Example google%09.*%09http://www.google.com/search?q=*
   demo\Example: Matches starting with M%09^M.*$%09 echo starts with letter M *
   
Each line associates a shell script (3rd column) with a regular expression (2nd column). The asterisk is replaced by the text. The shell script is presented to the user as a menu item. The menu item path is given in the 1st column, with the back-slash as separator. If the value in the 3rd column starts with http://, then it is not taken as a shell script but as a Web address.
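
How such a table could be interpreted is sketched below. The sketch assumes that "%09" in the example lines stands for a tab character separating the three columns, and that a regular expression applies when it is found anywhere in the text. Neither detail is confirmed by the description above, and the text is inserted without any shell quoting. The function names are hypothetical.

```python
import re


def parse_script_table(content):
    """Return a list of (menu_path, compiled_regex, action) entries."""
    entries = []
    for line in content.splitlines():
        # Assumption: "%09" encodes the tab that separates the columns.
        cols = line.strip().replace("%09", "\t").split("\t")
        if len(cols) != 3:
            continue
        path, pattern, action = cols
        entries.append((path.split("\\"), re.compile(pattern), action.strip()))
    return entries


def actions_for(text, entries):
    """Return (menu_item, kind, command_or_url) for every entry matching the text."""
    result = []
    for path, pattern, action in entries:
        if pattern.search(text):
            kind = "web" if action.startswith("http://") else "shell"
            result.append((path[-1], kind, action.replace("*", text)))
    return result


table = "\n".join([
    r"demo\Example google%09.*%09http://www.google.com/search?q=*",
    r"demo\Example: Matches starting with M%09^M.*$%09 echo starts with letter M *",
])
for item in actions_for("Malate", parse_script_table(table)):
    print(item)
```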