Table of Contents

Module: SGMLExtractor Bio/SGMLExtractor.py

Code for more fancy file handles.

Classes: SGMLExtractorHandle File object that strips tags and returns content from the specified tag blocks.

SGMLExtractor Object that scans for specified SGML tag pairs, removes any inner tags and returns the raw content. For example, on the following HTML file:
SGMLExtractor( [ 'h1' ] ) would return 'House that Jack Built'
SGMLExtractor( [ 'dt' ] ) would return 'ratcatdogcowmaiden'
SGMLExtractor( [ 'dt', 'dd' ] ) would return 'ratate the maltcatthat ate the rat' etc.

<h1>House that Jack Built</h1>
<dl>
  <dt><big>rat</big></dt>
    <dd><big>ate the malt</big></dd>
  <dt><big>cat</big></dt>
    <dd><big>that ate the rat</big></dd>
  <dt><big>dog</big></dt>
    <dd><big>that worried the cat</big></dd>
  <dt><big>cow</big></dt>
    <dd><big>with crumpled horn</big></dd>
  <dt><big>maiden</big></dt>
    <dd><big>all forlorn</big></dd>
</dl>
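The behaviour described above can be illustrated with a minimal sketch. It is not the code in Bio/SGMLExtractor.py: that module builds on sgmllib, which is not available in Python 3, so the sketch substitutes html.parser from the standard library. The names TagContentExtractor, extract and SAMPLE are hypothetical and chosen only for illustration, and the sample is a shortened copy of the HTML file shown above.

```python
from html.parser import HTMLParser


class TagContentExtractor(HTMLParser):
    """Hypothetical stand-in: collect text inside the wanted tags, dropping inner tags."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted = {tag.lower() for tag in wanted_tags}
        self.depth = 0    # greater than zero while inside a wanted tag
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        # Inner tags such as <big> are ignored; only wanted tags change the state.
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep raw text only while inside at least one wanted tag.
        if self.depth > 0:
            self.chunks.append(data)


def extract(wanted_tags, text):
    """Return the concatenated content of every wanted tag block in text."""
    parser = TagContentExtractor(wanted_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


SAMPLE = """<h1>House that Jack Built</h1>
<dl>
<dt><big>rat</big></dt>
<dd><big>ate the malt</big></dd>
<dt><big>cat</big></dt>
<dd><big>that ate the rat</big></dd>
</dl>"""

print(extract(["h1"], SAMPLE))        # House that Jack Built
print(extract(["dt"], SAMPLE))        # ratcat
print(extract(["dt", "dd"], SAMPLE))  # ratate the maltcatthat ate the rat
```

Text between wanted blocks, including the line breaks in the sample, is discarded, which is why the extracted pieces run together without separators.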

Imported modules   
import StringIO
import os
import sgmllib
import string
Functions   
is_empty ( items )

Classes   
SGMLExtractor
SGMLExtractorHandle

A Python handle that automatically strips SGML tags and returns data from the specified tag blocks.



This document was automatically generated on Mon Jul 1 12:02:43 2002 by HappyDoc version 2.0.1