Table of Contents

Module: __init__ Bio/GenBank/__init__.py
Code to work with GenBank
http://www.ncbi.nlm.nih.gov/

Classes:

  • Iterator - Iterate through a file of GenBank entries.
  • Dictionary - Access a GenBank file using a dictionary interface.
  • ErrorFeatureParser - Catch errors caused during parsing.
  • FeatureParser - Parse GenBank data into Seq and SeqFeature objects.
  • RecordParser - Parse GenBank data into a Record object.
  • NCBIDictionary - Access GenBank using a dictionary interface.

  • _BaseGenBankConsumer - A base class for GenBank consumers that implements helper functions common to all consumers.
  • _FeatureConsumer - Create SeqFeature objects from info generated by the Scanner.
  • _RecordConsumer - Create a GenBank record object from Scanner info.
  • _PrintingConsumer - A debugging consumer.

  • _Scanner - Set up a Martel-based GenBank parser to parse a record.

  • ParserFailureError - Exception indicating a failure in the parser (i.e. scanner or consumer).
  • LocationParserError - Exception indicating a problem with the spark-based location parser.

Functions:

  • index_file - Get a GenBank file ready to be used as a Dictionary.
  • search_for - Do a query against GenBank.
  • download_many - Download many GenBank records.

Imported modules   
from Bio import Alphabet, File, Index, SeqFeature
from Bio.Alphabet import IUPAC
from Bio.GenBank import LocationParser
from Bio.ParserSupport import AbstractConsumer, EventGenerator
from Bio.Seq import Seq
from Bio.SeqFeature import Reference
from Bio.SeqRecord import SeqRecord
from Bio.WWW import NCBI, RequestLimiter
import Martel
from Martel import RecordReader
import Record
import genbank_format
import os
import re
import sgmllib
import string
import urlparse
import utils
from xml.sax import handler
Functions   
_strip_and_combine
download_many
index_file
index_file_db
search_for
  _strip_and_combine 
_strip_and_combine ( line_list )

Combine multiple lines of content separated by spaces.

This function is used by the EventGenerator callback function to combine multiple lines of information. The lines are first stripped to remove whitespace, and then combined so they are separated by a space. This is a simple-minded way to combine lines, but should work for most cases.
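The strip-then-join behaviour described above can be sketched in a few lines. This is an illustrative stand-alone version, not the module's actual source; the name strip_and_combine and the sample lines are assumptions made for the example.

```python
def strip_and_combine(line_list):
    """Strip each line, then join the results with single spaces.

    Illustrative sketch only; not the module's actual source.
    """
    stripped = [line.strip() for line in line_list]
    return " ".join(stripped)


# Hypothetical multi-line content, as a scanner might hand it over.
lines = ["  Homo sapiens  ", "\tchromosome 1\n", "genomic contig"]
print(strip_and_combine(lines))
```

Note that whitespace inside a line is left untouched; only the leading and trailing whitespace of each line is removed.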

  download_many 
download_many (
        gis,
        callback_fn,
        broken_fn=None,
        db='Nucleotide',
        delay=127.0,
        batchsize=500,
        parser=None,
        )

download_many(gis, callback_fn[, delay][, batchsize])

Download many records from GenBank. gis is a list of GenBank gi's. Each time a record is downloaded, callback_fn is called with the text of the record. delay is the number of seconds to wait between requests (127 seconds by default). batchsize is the number of records to request each time; the default is 500 records, which is the maximum NCBI can handle.

This does not check to make sure all gi's are returned. The client must make sure that the gi's are valid. Such a check may be implemented in the future.
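The batching and delay behaviour described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the module's implementation: download_in_batches and fetch_fn are hypothetical names, and fetch_fn stands in for whatever actually retrieves the record text from NCBI.

```python
import time


def download_in_batches(gis, fetch_fn, callback_fn, delay=127.0, batchsize=500):
    """Request gis in groups of batchsize, pausing delay seconds between groups.

    fetch_fn is a hypothetical stand-in for the network request: it takes
    a list of gi's and returns one text record per gi it could retrieve.
    """
    for start in range(0, len(gis), batchsize):
        if start > 0:
            # Wait between requests, but not before the first one.
            time.sleep(delay)
        batch = gis[start:start + batchsize]
        for record_text in fetch_fn(batch):
            callback_fn(record_text)


def fake_fetch(batch):
    # Pretend every gi resolves to a one-line record.
    return ["record for gi %d" % gi for gi in batch]


received = []
download_in_batches([11, 12, 13, 14, 15], fake_fetch, received.append,
                    delay=0.0, batchsize=2)
print(len(received))
```

As in the description, nothing here verifies that every requested gi produced a record; a caller who needs that guarantee would have to compare the gi's requested against the records received.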

  index_file 
index_file (
        genbank_file,
        index_file,
        rec_to_key=None,
        )

Index a GenBank file to prepare it for use as a dictionary.

Arguments:

  • genbank_file - The name of the GenBank file to be indexed.

  • index_file - The name of the index file that will be created.

  • rec_to_key - A function object which, when called with a GenBank record object, returns the key to use for that record. If no function is specified, the accession numbers are used as the keys.

Exceptions   
KeyError( "Duplicate key %s found" % key )
KeyError( "Empty sequence key produced" )
ValueError( "%s does not exist" % genbank_file )
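The role of rec_to_key, and the two KeyError cases listed above, can be illustrated with a small in-memory sketch. Everything here is an assumption made for the example: build_key_index is a hypothetical helper, and the records are plain dictionaries rather than GenBank record objects. The real function also writes an index file, which this sketch does not attempt.

```python
def build_key_index(records, rec_to_key=None):
    """Map each record's key to its position in records.

    Hypothetical helper: records are plain dicts here, and the default
    key is the "accession" entry, mirroring the documented default.
    """
    if rec_to_key is None:
        rec_to_key = lambda rec: rec["accession"]
    index = {}
    for position, rec in enumerate(records):
        key = rec_to_key(rec)
        if not key:
            raise KeyError("Empty sequence key produced")
        if key in index:
            raise KeyError("Duplicate key %s found" % key)
        index[key] = position
    return index


# Made-up records for illustration only.
records = [
    {"locus": "LOCUS_A", "accession": "ACC0001"},
    {"locus": "LOCUS_B", "accession": "ACC0002"},
]
print(build_key_index(records))
print(build_key_index(records, rec_to_key=lambda rec: rec["locus"]))
```

Passing a rec_to_key that returns the same value for two records, or an empty value for any record, triggers the corresponding KeyError.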
  index_file_db 
index_file_db (
        genbank_file,
        db_name,
        db_directory,
        identifier="locus",
        aliases=[ "accession" ],
        keywords=[],
        always_index=0,
        )

Index a GenBank file into a database for quick loading.

WARNING: This is very experimental and subject to change. It requires the use of Andrew Dalke's mindy.

This is very similar to index_file, but uses a database instead of a flat file to store the information about the genbank_file.

Arguments:

  • genbank_file - The GenBank formatted file that we want to index.

  • db_name - The name of the database to create. This name will allow you to retrieve the file later.

  • db_directory - The directory where the database information should be stored.

  • identifier - The primary identifier under which records in the file are stored. This will be used for retrieving them later.

  • aliases - Secondary identifiers that point to the record. These can be used for searching if a primary identifier is not found. This is useful for GenBank since we'll index by a single identifier (the LOCUS identifier by default) but might want to search by some other identifier.

  • keywords - More advanced Mindy features that I'm not positive how to make full use of right now.

  • always_index - A flag indicating whether or not to index a file even if the file appears not to have changed. By default, the function will try to skip indexing if it thinks the file hasn't changed.

Exceptions   
SystemExit( "You must have mindy installed:\n" + "http://www.biopython.org/~dalke/mindy-0.1.tar.gz" )
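The relationship between identifier and aliases can be pictured as a primary lookup with a secondary fallback. The sketch below is only a conceptual model using dictionaries; it makes no claim about how Mindy stores or searches its data, and build_lookup, find, and the sample records are all hypothetical.

```python
def build_lookup(records, identifier="locus", aliases=("accession",)):
    """Build a primary table plus an alias table pointing into it.

    Conceptual model only; records are plain dicts for illustration.
    """
    primary = {}
    secondary = {}
    for rec in records:
        primary[rec[identifier]] = rec
        for alias in aliases:
            # Each alias value points back to the primary identifier.
            secondary[rec[alias]] = rec[identifier]
    return primary, secondary


def find(primary, secondary, name):
    """Look name up as a primary identifier first, then as an alias."""
    if name in primary:
        return primary[name]
    return primary[secondary[name]]


# Made-up records for illustration only.
records = [
    {"locus": "LOCUS_A", "accession": "ACC0001"},
    {"locus": "LOCUS_B", "accession": "ACC0002"},
]
primary, secondary = build_lookup(records)
print(find(primary, secondary, "LOCUS_A")["accession"])
print(find(primary, secondary, "ACC0002")["locus"])
```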
  search_for 
search_for (
        search,
        database='Nucleotide',
        max_ids=500,
        )

search_for(search[, database][, max_ids])

Search GenBank and return a list of GenBank identifiers (gi's). search is the search string used to search the database. database should be either Nucleotide or Protein. max_ids is the maximum number of ids to retrieve (default 500).

Exceptions   
ValueError, "database must be 'Nucleotide' or 'Protein'"
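The constraint on database can be checked before any query is sent. The following sketch shows one way to express that check, reusing the error message listed above; check_database is a hypothetical helper and not part of the module.

```python
def check_database(database="Nucleotide"):
    """Return database unchanged if it is an accepted value.

    Hypothetical helper illustrating the documented constraint.
    """
    if database not in ("Nucleotide", "Protein"):
        raise ValueError("database must be 'Nucleotide' or 'Protein'")
    return database


print(check_database("Protein"))
try:
    check_database("Genome")
except ValueError as err:
    print(err)
```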
Classes   
Dictionary

Allow a GenBank file to be accessed using a dictionary interface.

ErrorParser

Parse GenBank files and attempt to catch errors.

FeatureParser

Parse GenBank files into Seq + Feature objects.

Iterator

Iterator interface to move over a file of GenBank entries one at a time.

LocationParserError

Could not properly parse out a location from a GenBank file.

MindyDictionary

Access a GenBank file using a dictionary interface, through a Mindy DB.

NCBIDictionary

Access GenBank using a read-only dictionary interface.

ParserFailureError

Failure caused by some kind of problem in the parser.

RecordParser

Parse GenBank files into Record objects.

_BaseGenBankConsumer

Abstract GenBank consumer providing useful general functions.

_FeatureConsumer

Create a SeqRecord object with Features to return.

_RecordConsumer

Create a GenBank Record object from scanner generated information.

_Scanner

Start up Martel to do the scanning of the file.



This document was automatically generated on Mon Jul 1 12:02:49 2002 by HappyDoc version 2.0.1