Bioinformatics
Jannotatix
 

How to write a configuration file for Jannotatix

This text describes, step by step, how to add a new algorithm to Jannotatix by writing an .alg file for it. The example uses a local binary executable, pratt. We think that it is easier to follow an example than to read a formal syntax for the file format.

General layout of Algorithm Description Files (alg)

Note

In the following, we assume that you are already familiar with basic XML terminology. If you are not, it may help to take a quick look at one of the XML tutorials on the web.

Algorithm Description Files (alg) are simple XML-Files that consist of five parts:

  • A header, which states the name and category of the algorithm. It is used to build menus and to tag features that originate from this algorithm.

  • General information about the algorithm, such as the author's name, the article that describes it, its website, etc. This information is only displayed to users to give them some background about the algorithm. It is optional: this part of the XML file can be left out completely.

  • How to invoke the algorithm. Currently, this can be either a local binary executable file or a website that accepts HTTP POST requests and returns some result.

  • A list of all parameters that the algorithm accepts, with their type (number, text, ...), description and help. (In the future, it will also include combinations of parameters that are not allowed.) From this part, Jannotatix builds the dialogues in which the user chooses the parameters or requests additional information.

  • A list of parsers that are used to convert the algorithm's results back to Jannotatix in GFF-format. Every entry of the list is an (improved) regular expression that contains named groups, a concept borrowed from Python. So you write a regular expression and mark certain parts of it to be extracted into the final GFF-file (the format of the results).
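As a rough orientation, the five parts could be laid out as in the following sketch. It uses only element names that appear later in this text; the order of the parts and the empty placeholders are assumptions, and the element names for the parsers are not covered here, so that part is only marked by a comment.

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- Part 1: the header is the top element with its two attributes -->
  <AlgorithmDescription AlgorithmName="Pratt" Category="Motif Discovery">

    <!-- Part 2: general information (optional) -->
    <Info>
      <!-- FullName, Description, Authors, PackageUrl, ... -->
    </Info>

    <!-- Part 3: how to invoke the algorithm -->
    <LocalFileInvocation LocalBaseDir="PRATT">
      <!-- one FileName element per operating system -->
    </LocalFileInvocation>

    <!-- Part 4: the parameters that the algorithm accepts -->
    <Arguments>
      <!-- ConstantArgument, InfileArgument, ... -->
    </Arguments>

    <!-- Part 5: the parsers (element names not shown in this text) -->

  </AlgorithmDescription>

The sections below fill in these parts one by one for PRATT.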

In the following, we will write a file for the PRATT algorithm. We go to ~/JannotatixPlugins or to C:\Documents and Settings\(yourname)\Jannotatix and create a file named pratt.alg. Now we fire up our favorite editor and start typing:

The Header

This is by far the simplest and quickest part. We just write:

  <?xml version="1.0" encoding="UTF-8"?>
  <AlgorithmDescription AlgorithmName="Pratt" Category="Motif Discovery">
  

The first line is the standard XML declaration and must be present. The second line opens the top element of our file, which has two required attributes: AlgorithmName (the name of the algorithm) and Category (its category).

General info

Next, we give some general information about our algorithm. It will be shown to the user when the algorithm is selected in the interface:

  
  <Info>
    <FullName>PRATT - Protein </FullName>
    <Description>Pratt was designed for proteins but </Description>
    <Authors></Authors>
    <HomepageUrl></HomepageUrl>
    <PubmedId></PubmedId>
    <ArticleFulltextUrl></ArticleFulltextUrl>
    <Availability>Source for Unix, Website</Availability>
    <LicenseFilename>license.txt</LicenseFilename>
    <PackageUrl OS="Linux">http://ftp.bioinformatics.org/jannotatix/pratt.zip</PackageUrl>
    <PackageUrl OS="Windows">http://ftp.bioinformatics.org/jannotatix/pratt.zip</PackageUrl>
  </Info>
  

Most of this information is optional, and the meaning of the tags is fairly obvious. However, you are well advised to specify the PackageUrl tag. This is the address where you will later store the file on the Internet, so that all users can download it from within Jannotatix's plugin manager. Create a PackageUrl entry for every operating system that you are going to support, even if they all point to the same file in the end.

Invocation

Let's assume that we have downloaded the PRATT program and compiled it statically on Windows with the MinGW compiler (which, as opposed to Cygwin, allows us to compile non-open-source software as well) and on a Linux machine (good old "make" + Return is sufficient in either case). Now we just have to tell Jannotatix how to run the programs, so we write the following:

  
  <LocalFileInvocation LocalBaseDir="PRATT">
    <FileName OS="Linux">pratt</FileName>
    <FileName OS="Windows">pratt.exe</FileName>
  </LocalFileInvocation>
  
  

This tells Jannotatix to check which operating system it is currently running on and to run the matching program.

Note

If your algorithm refers to sequences only by number instead of by their names from the FASTA file, you have to add the attribute ResolveSeqNameNumber="true" to the LocalFileInvocation tag. Jannotatix will then try to resolve any numbers in sequence fields to sequence names. If you don't know what this means, simply continue with the tutorial; it should not matter for most algorithms.
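For illustration only, the invocation block from above with this attribute added would look like the following sketch. Whether PRATT itself needs the attribute is not stated here; the example merely shows where it goes.

  <!-- Same invocation as before, plus ResolveSeqNameNumber -->
  <LocalFileInvocation LocalBaseDir="PRATT" ResolveSeqNameNumber="true">
    <FileName OS="Linux">pratt</FileName>
    <FileName OS="Windows">pratt.exe</FileName>
  </LocalFileInvocation>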

The Arguments

We will now indicate all parameters that our PRATT-executable accepts. Take this paragraph from PRATT's documentation, for instance:


  Command line:

      Pratt <format> <filename> [options]

  where <format> is one of
      fasta
      swissprot
  and <filename> is the name of a file containing the sequences
  in the given format

So we know that the first parameter will always be "fasta", since Jannotatix can only export to fasta. So we write:


  <Arguments>
    <ConstantArgument Parameter="fasta">
      <ShortDescription>Choose fasta as format</ShortDescription>
      <Value></Value>
    </ConstantArgument>
  

The second parameter has to be the filename of the sequences that were exported from Jannotatix. So we add to our configuration file:


  <InfileArgument Parameter="">
      <Value></Value>
  </InfileArgument>

  

Nothing has to be written on the command line in front of the input file name, so we simply leave the Parameter attribute of the InfileArgument blank. The <Value> tag is, unfortunately, required at the moment. It will also be useful if we ever want to fill in real values here (see later on).
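Put together, the two arguments written so far could look like the following sketch. The closing </Arguments> tag is added here only to keep the fragment well-formed; in the real file, any further arguments would come before it. The command line given in the comment is an assumption about how Jannotatix combines the arguments, based on the PRATT documentation quoted above.

  <!-- Assumed resulting call: pratt fasta (exported sequence file) -->
  <Arguments>
    <!-- First argument: always the constant "fasta" -->
    <ConstantArgument Parameter="fasta">
      <ShortDescription>Choose fasta as format</ShortDescription>
      <Value></Value>
    </ConstantArgument>
    <!-- Second argument: the file exported from Jannotatix -->
    <InfileArgument Parameter="">
      <Value></Value>
    </InfileArgument>
  </Arguments>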

TODO: Parsers for PRATT

by Maximilian Haeussler