
Chapter 1   Introduction

1.1   What is Biopython?

The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology. The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research.

Basically, we just like to program in Python and want to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and scripts.

1.1.1   What can I find in the Biopython package?

The main biopython releases have lots of functionality, including:

We hope this gives you plenty of reasons to download and start using Biopython!

1.2   Obtaining Biopython

Biopython's internet home is at, naturally enough, http://www.biopython.org. This is the home of all things Biopython, so it is the best place to start looking around if you are interested. When you feel ready to dive in and start working with the code, you have two choices:

  1. Release code -- We make both stable and developer's releases available on the download page (http://www.biopython.org/Download/). The stable releases are likely to be better tested, while the development releases are closer to what is in CVS, and so will probably have more features. The releases are available both as source and as installers (rpms and Windows installers, right now), so you have some choices if you prefer not to deal with source code directly.

  2. CVS -- The current working copy of the Biopython sources is always available via CVS (Concurrent Versions System -- http://www.cvshome.org/). Concise instructions for accessing this copy are available at http://cvs.biopython.org.

Depending on which option you choose, you'll need to follow one of the installation options below. Read on for the information you are interested in.

1.3   Installation

1.3.1   Installing from source on UNIX

Biopython uses Distutils, which is the new standard Python installation package. Distutils is available at http://www.python.org/sigs/distutils-sig/download.html and also comes standard with Python 1.6 and beyond. Distutils will make installation a snap, as you will see in a second.

Now that we've got what we need, let's get into the installation:

  1. First you need to unpack the distribution. If you got the CVS version, you are all set to go and can skip on ahead. Otherwise, you'll need to unpack it. On UN*X machines, a tar.gz package is provided, which you can unpack with tar -xzvpf biopython-X.X.tar.gz. A zip file is also provided for other platforms.

  2. Now that everything is unpacked, move into the biopython* directory (this will just be biopython for CVS users, and will be biopython-X.X for those using a packaged download).

  3. Now you are ready for your one step install -- python setup.py install. This performs the default install, and will put Biopython into the site-packages directory of your python library tree (on my machine this is /usr/local/python2.1/site-packages). You will have to have permissions to write to this directory.

    1. This install requires that you have the python source available. You can check this by looking for Python.h and config.h in some place like /usr/local/include/python1.5.

    2. The distutils setup process allows you to do some customization of your install so you don't have to stick everything in the default location (in case you don't have write permissions there, or just want to test Biopython out). You have quite a few choices, which are covered in detail in the distutils installation manual (http://www.python.org/sigs/distutils-sig/doc/inst/inst.html), specifically in the Alternative installation section.

  4. That's it! Biopython is installed. Wasn't that easy? Now let's check and make sure it worked properly. Skip on ahead to section 1.4.
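
Steps 3 and 3.1 above depend on two things: a writable site-packages directory and the Python header files. The following is only a rough sketch of how you might check both before running the install. It assumes a modern Python 3 interpreter with the standard sysconfig module, which is newer than the Python versions this chapter was written for, and describe_install_target is a made-up helper name, not part of Biopython or Distutils.

```python
import os
import sysconfig


def describe_install_target():
    """Report where a default install would go and whether it looks usable."""
    paths = sysconfig.get_paths()
    # "purelib" is the site-packages directory for pure Python modules.
    site_packages = paths["purelib"]
    # C extensions need Python.h, which lives in the "include" directory.
    header = os.path.join(paths["include"], "Python.h")
    return {
        "site_packages": site_packages,
        "writable": os.access(site_packages, os.W_OK),
        "python_h_found": os.path.isfile(header),
    }


for key, value in describe_install_target().items():
    print(key, "=", value)
```

If writable comes back False, the alternative installation options mentioned in step 3.2 are probably what you want.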

1.3.1.1   Installation on FreeBSD

Johann Visagie has been kind enough to create (and keep updated) a FreeBSD port of Biopython. Thanks to the wonders of the ports system, this means that all you need to do to install Biopython on FreeBSD is do the following as root:

# cd /usr/ports/biology/py-biopython
# make install

And voila! It's installed.

If you want more information on FreeBSD, Johann has written a nice primer for his FreeBSD EMBOSS port. This has lots of generally useful information, such as how to keep your ports tree up to date. If you are new to FreeBSD, you should definitely check it out at ftp://ftp.no.embnet.org/pub/EMBOSS-extras/EMBOSS-FreeBSD-HOWTO.txt.

1.3.2   Installing from source on Windows

This section deals with installing the source (i.e. from CVS or from a source zip file) on a Windows machine. Much of the information from the UNIX install applies here, so it would be good to read section 1.3.1 before starting. Also, a little warning -- I (Brad) am writing these instructions based on very limited experience with Windows; I am basically a UNIX geek. So if you know more about Windows and want to add/correct things in this section, please let us know!

I have successfully managed to use distutils to compile Biopython with Borland's free C++ compiler (available from http://www.inprise.com/bcppbuilder/freecompiler/). It should also be possible with other Distutils supported compilers (please provide info if you've done this!).

  1. Borland C++ compiler

Now that you've got everything installed, skip on ahead to section 1.4 to make sure everything worked.

1.3.3   Installing using RPMs

Warning. Right now we're not making RPMs for biopython (because I stopped using an RPM system, basically). If anyone wants to pick this up, or feels especially strongly that they'd like RPMs, please let us know.

To simplify things for people running RPM-based systems, biopython can also be installed via the RPM system. Additionally, this saves the necessity of having a C compiler to install biopython.

Installing Biopython from an RPM package should be much the same process as used for other RPMs. If you need general information about how RPMs work, the best place to go is http://www.rpm.org.

To install it, you should just need to do:

rpm -i your_biopython.rpm

To see what you installed, try running rpm -qpl your_biopython.rpm, which lists all of the files in the package.

RPMs do not install the documentation, tests, or example code, so you might want to also grab a source distribution, so you can use these resources (and also look at the source code if you want to).

1.3.4   Installing with a Windows Installer

Installing things on Windows with the installer should be really easy (hey, that's why they've got graphical installers, right?). You should just need to download the Biopython-version.exe installer from the Biopython web site. Then you just need to double click and voila, a nice little installer will come up and you can stick the libraries where you need to. No need for a C compiler or anything fancy.

This does not install the documentation, tests, example code or source code, so it is probably also a good idea to download the zip file containing these so you can test your installation and learn how to use it.

1.3.5   Installing on Macintosh

Biopython code should work like a charm on the Macintosh, using the MacPython distribution. I (Brad) am not a big Mac user, but have had good luck using several of the modules on the Macintosh.

Basically, installation should be very easy. You need to download either the biopython-version.tar.gz or biopython-version.zip file from the download page, and unpack these. This can be done with tools such as Aladdin's Stuff-It expander. It will unpack into a directory called biopython-version. If you open up this directory, you will find the main directory of modules, called Bio. You should then open up your python installation (which should be in some place like Macintosh HD::Python2.0) to the directory Lib::site-packages, and copy the Bio directory there by dragging it. Bam! You're done! By default, site-packages is included in your PYTHONPATH, so you should be ready to use it.

Some notes: Obviously this will not compile any of the C extensions in biopython. There are pure python implementations of all of these extensions, though, so you shouldn't need to worry about lack of functionality, only lack of speed. Jack Jansen (the MacPython god) has made patches to distutils which allow it to work on the Mac with the Metrowerks CodeWarrior compiler. I don't have this compiler (it costs money, oh no!), so I can't speak of how well it works. If anyone who codes more on the Mac has more information, I would be very happy to include it here.

1.4   Making sure it worked

First, we'll do a quick test to make sure Python can find the Biopython installation, which is the most important thing. Biopython installs everything into a top-level Bio directory, and you want to make sure the directory containing Bio is listed in your $PYTHONPATH environment variable. If you used the default install, this shouldn't be a problem, but if not, you'll need to set PYTHONPATH with something like export PYTHONPATH=$PYTHONPATH:/directory/where/you/put/Biopython (on UNIX; note that there are no spaces around the = sign). Now that we think we are ready, fire up your Python interpreter and follow along with the following code:


[chapmanb@taxus chapmanb]$ python
Python 1.6a2 (#1, Jul 31 2000, 09:04:26)  [GCC 2.95.2 19991024 (release/franzo)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from Bio.Seq import Seq
>>> new_seq = Seq('GATC') 
>>> new_seq[0:2]
Seq('GA', Alphabet())
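
If the import in the session above fails, it can help to ask Python where it would load the Bio package from. The snippet below is only a sketch of one way to do that. It assumes a modern Python 3 interpreter with importlib.util (much newer than the interpreter shown above), and locate_package is a hypothetical helper name, not a Biopython function.

```python
import importlib.util


def locate_package(name):
    """Return where a top-level package would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing on the module search path provides this name.
        return None
    # Namespace packages have no single origin file, so fall back to
    # their list of search locations.
    return spec.origin or str(list(spec.submodule_search_locations))


location = locate_package("Bio")
if location is None:
    print("Bio was not found; check your PYTHONPATH")
else:
    print("Bio would be imported from", location)
```

Nothing is actually imported here, so the check is safe to run even on a half-finished install.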

If this worked properly, then it looks like Biopython is in a happy place where python can find it, so now you might want to do some more rigorous tests. The Tests directory inside the distribution contains a number of tests you can run to make sure all of the different parts of biopython are working. These should all work just by running python test_WhateverTheTestIs.py.

You can also run all of the tests using a nice graphical interface supplied by using PyUnit. To do this, you just need to be in the installation directory and type:

python setup.py test

This should start up a Tk based graphical user interface (or default to the command line if you don't have Tkinter installed), which you can run the tests from. You can also run them by typing python run_tests.py in the Tests directory.
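
As a rough illustration of running the individual test_*.py scripts one after another, here is a small sketch. It is not the run_tests.py script from the distribution. It assumes a modern Python 3 interpreter, and it assumes that a test script signals failure with a non-zero exit status, which may not hold for every test. The function names are invented for this example.

```python
import glob
import os
import subprocess
import sys


def find_test_scripts(tests_dir):
    """Return the test_*.py scripts in tests_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(tests_dir, "test_*.py")))


def run_test_scripts(tests_dir):
    """Run each script with the current interpreter; return names that failed."""
    failed = []
    for script in find_test_scripts(tests_dir):
        name = os.path.basename(script)
        # Run from inside the directory so relative data paths resolve.
        result = subprocess.run([sys.executable, name], cwd=tests_dir)
        if result.returncode != 0:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # "Tests" is the directory named in the text; adjust the path as needed.
    print("Failed scripts:", run_test_scripts("Tests"))
```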

Well, now you've gotten Biopython installed and running, you are probably ready to get working with it, so continue reading...

1.5   FAQ

  1. I looked in a directory for code, but I couldn't seem to find the code that does something. Where's it hidden?
    One thing to know is that we put code in __init__.py files. If you are not used to looking for code in this file this can be confusing. The reason we do this is to make the imports easier for users. For instance, instead of having to do a "repetitive" import like from Bio.GenBank import GenBank, you can just import like from Bio import GenBank.

  2. What happened to the br_regrtest.py regression tests?
    We updated the regression testing framework to use PyUnit, and also to fix newline problems. br_regrtest.py is still there, but almost all of its functionality has been moved (well, copy and pasted) to run_tests.py.

  3. Why do some of the tests fail when running the regression tests with output like:
    Writing: '\012', expected: '\015'
    This shouldn't happen any more! We updated the regression testing suite so that it uses PyUnit and we hopefully have fixed newline problems. Please let us know if any tests fail.
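
To make the first FAQ answer above more concrete, here is a small sketch of a package that keeps its code in __init__.py. It builds a throwaway package in a temporary directory and imports from it. The names demo_pkg and describe are invented for this illustration (only the GenBank subpackage name echoes the FAQ), the layout is not Biopython's actual source, and a modern Python 3 interpreter is assumed.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create a throwaway package whose code lives in __init__.py files."""
    # demo_pkg/ and demo_pkg/GenBank/ are hypothetical names for this demo.
    sub_dir = os.path.join(root, "demo_pkg", "GenBank")
    os.makedirs(sub_dir)
    # The top-level __init__.py can be empty; it only marks the package.
    with open(os.path.join(root, "demo_pkg", "__init__.py"), "w") as handle:
        handle.write("")
    # The subpackage keeps its real code in __init__.py.
    with open(os.path.join(sub_dir, "__init__.py"), "w") as handle:
        handle.write("def describe():\n    return 'code from GenBank/__init__.py'\n")


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # No "from demo_pkg.GenBank import GenBank" needed: the code is
        # reachable directly on the subpackage itself.
        from demo_pkg import GenBank
        message = GenBank.describe()
    finally:
        sys.path.remove(root)

print(message)
```

Because describe is defined in GenBank/__init__.py, it is available as soon as the GenBank subpackage is imported, which is the convenience the FAQ answer describes.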
