Table of Contents

Module: NLMMedlineXML Bio/Medline/NLMMedlineXML.py

NLMMedlineXML.py

This module provides code to work with NCBI's XML format for Medline.

Functions:
choose_format  Pick the right data format to use to index an XML file.
index          Index a Medline XML file.
index_many     Index multiple Medline XML files.

Imported modules   
from Bio.ParserSupport import *
from Bio.Tools import MultiProc
import Martel
import os
import types
from xml.sax import handler
Functions   
choose_format
index
index_many
  choose_format 
choose_format ( data )

choose_format(data) -> module

Look at some data and choose the right format to parse it. data should be the first 1000 characters or so of the file. The module will contain 2 attributes: citation_format and format. citation_format is a Martel format to parse one citation. format will parse the whole file.

Exceptions   
AssertionError, "I could not identify that format."
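A caller usually has an open handle rather than a ready-made sample, so the sample has to be read and the handle rewound before parsing. The following is a minimal sketch of that step, not part of this module: peek_and_choose and fake_chooser are hypothetical names, and fake_chooser only stands in for choose_format (it returns a string, whereas choose_format is documented to return a module).

```python
import io


def peek_and_choose(handle, chooser, sample_size=1000):
    """Read a sample from the current position, rewind, and pass it to chooser.

    chooser stands in for choose_format. Returns None when the chooser
    raises AssertionError, i.e. when the format is not recognized.
    """
    start = handle.tell()
    sample = handle.read(sample_size)
    handle.seek(start)  # rewind so the data can later be parsed from the start
    try:
        return chooser(sample)
    except AssertionError:
        return None


def fake_chooser(data):
    # Hypothetical stand-in for choose_format, for demonstration only.
    if "<MedlineCitation" not in data:
        raise AssertionError("I could not identify that format.")
    return "medline-xml"


handle = io.StringIO("<MedlineCitationSet><MedlineCitation>...</MedlineCitation>")
result = peek_and_choose(handle, fake_chooser)
```

Passing the chooser as a parameter keeps the sketch runnable without the module installed; in real use the argument would presumably be choose_format itself.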
  index 
index ( handle,  index_fn=None )

index(handle[, index_fn]) -> list of (PMID, MedlineID, start, end)

Index a Medline XML file. Returns where the records are, as offsets from the beginning of the handle. index_fn is a callback function with parameters (PMID, MedlineID, start, end) and is called as soon as each record is indexed.
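The callback signature and the use of the returned offsets can be illustrated with a small sketch. OffsetIndex is a hypothetical helper, not part of this module, and the calls to it are simulated by hand instead of coming from index. The sketch also assumes that end is an exclusive offset and that offsets count bytes; neither is stated above.

```python
import io


class OffsetIndex:
    """Collects the values passed to index_fn and fetches records by PMID."""

    def __init__(self):
        self.by_pmid = {}

    def __call__(self, pmid, medline_id, start, end):
        # Same parameters as the documented index_fn callback.
        self.by_pmid[pmid] = (medline_id, start, end)

    def fetch(self, handle, pmid):
        medline_id, start, end = self.by_pmid[pmid]
        handle.seek(start)
        # Assumption: end is exclusive, so the record spans end - start bytes.
        return handle.read(end - start)


# Stand-in data: two fake "records" of 12 and 13 bytes.
handle = io.BytesIO(b"<A>first</A><B>second</B>")

idx = OffsetIndex()
# Simulated callback invocations, as index might make them.
idx("111", "222", 0, 12)
idx("333", "444", 12, 25)

record = idx.fetch(handle, "333")
```

Storing only offsets keeps the index small; the record text is read back from the handle on demand.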

  index_many 
index_many (
        files_or_paths,
        index_fn,
        nprocs=1,
        )

index_many(files_or_paths, index_fn[, nprocs])

Index multiple Medline XML files. files_or_paths can be a single file, a path, a list of files, or a list of paths.

index_fn is a callback function that should take the following parameters: index_fn(file, event, data)

where file is the file being indexed, event is one of "START", "RECORD", "END", and data is extra data dependent upon the event. "START" and "END" events are passed to indicate when a file is being indexed. "RECORD" is passed whenever a new record has been indexed. When a "RECORD" event is passed, then data is set to a tuple of (pmid, medline_id, start, end). Otherwise it is None. start and end indicate the location of the record as offsets from the beginning of the file.

Exceptions   
ValueError, "I can't find %s" % f
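The event protocol described for index_fn can be handled with a small dispatcher. The sketch below is illustrative only: EventCollector is a hypothetical name, the file name and record values are made up, and the events are fed in by hand instead of coming from index_many.

```python
class EventCollector:
    """Groups RECORD data by file and tracks which files have finished."""

    def __init__(self):
        self.records = {}   # file -> list of (pmid, medline_id, start, end)
        self.finished = []  # files for which "END" has been seen

    def __call__(self, file, event, data):
        if event == "START":
            self.records[file] = []
        elif event == "RECORD":
            # data is a (pmid, medline_id, start, end) tuple for this event.
            self.records[file].append(data)
        elif event == "END":
            self.finished.append(file)
        else:
            raise ValueError("unexpected event: %r" % (event,))


collector = EventCollector()
# Simulated event sequence for one file.
collector("a.xml", "START", None)
collector("a.xml", "RECORD", ("111", "222", 0, 12))
collector("a.xml", "END", None)
```

Whether a callback like this is safe when nprocs is greater than 1 depends on how results are passed back between processes, which is not described here.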
Classes   
Citation

Holds information about a Medline citation.

CitationParser

Parses a citation into a Record object.

_IndexerHandler

Handles the results from the nlmmedline_format. Saves the begin

_SavedDataHandle

This document was automatically generated on Mon Jul 1 12:02:51 2002 by HappyDoc version 2.0.1