The Record class holds GenBank data in a form that is easy to access
when you are only interested in looking at the contents of a record.
Attributes:
o locus - The name specified after the LOCUS keyword in the GenBank
record. This may be the accession number, or a clone id or something else.
o size - The size of the record.
o residue_type - The type of residues making up the sequence in this
record. Normally something like RNA, DNA or PROTEIN, but may be as
esoteric as 'ss-RNA circular'.
o data_file_division - The division this record is stored under in
GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria...)
o date - The date of submission of the record, in a form like 28-JUL-1998
o accession - list of all accession numbers for the sequence.
o nid - Nucleotide identifier number.
o pid - Protein identifier number.
o version - The accession number + version (e.g. AB01234.2)
o db_source - Information about the database the record came from
o gi - The NCBI gi identifier for the record.
o keywords - A list of keywords related to the record.
o segment - If the record is one of a series, this is info about which
segment this record is (something like '1 of 6').
o source - The source of material where the sequence came from.
o organism - The genus and species of the organism (e.g. 'Homo sapiens')
o taxonomy - A listing of the taxonomic classification of the organism,
starting general and getting more specific.
o references - A list of Reference objects.
o comment - Text with any kind of comment about the record.
o features - A listing of Features making up the feature table.
o base_counts - A string with the counts of bases for the sequence.
o origin - A string specifying info about the origin of the sequence.
o sequence - A string with the sequence itself.
o contig - A string of location information for a CONTIG in a RefSeq
file.
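To make the attribute list above more concrete, here is a minimal sketch using a hypothetical stand-in class, `RecordSketch`, which is not the actual Record class. It mirrors only a handful of the attribute names; the example values are invented, and the types are assumptions wherever the list above does not state them (for instance, `size` is held as a string here).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordSketch:
    """Hypothetical stand-in mirroring a few of the Record attributes."""
    locus: str = ""
    size: str = ""                 # assumption: kept as text, as read from the file
    residue_type: str = ""
    data_file_division: str = ""
    date: str = ""
    accession: List[str] = field(default_factory=list)  # list of accession numbers
    version: str = ""
    organism: str = ""
    taxonomy: List[str] = field(default_factory=list)   # general -> specific
    sequence: str = ""


# Invented example values, for illustration only.
rec = RecordSketch(
    locus="EXAMPLE01",
    size="12",
    residue_type="DNA",
    data_file_division="PRI",
    date="28-JUL-1998",
    accession=["AB01234"],
    version="AB01234.2",
    organism="Homo sapiens",
    taxonomy=["Eukaryota", "Metazoa", "Chordata"],
    sequence="acgtacgtacgt",
)

# Attributes are plain data, so looking at a record is just attribute access.
summary = (f"{rec.locus} ({rec.organism}): {rec.size} bp {rec.residue_type}, "
           f"division {rec.data_file_division}")
print(summary)
```

List-valued attributes such as `accession` and `taxonomy` can be indexed or iterated directly, while the remaining attributes in this sketch are simple strings.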
Methods
|
|
__init__
|
__init__ ( self )
|
|
__str__
|
__str__ ( self )
Provide a GenBank formatted output option for a Record.
The objective of this is to provide an easy way to read in a GenBank
record, modify it somehow, and then output it in GenBank format.
We are striving to make this work so that a parsed Record that is
output using this function will look exactly like the original
record.
Much of the output is based on format description info at:
ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
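The read, modify, and output workflow described above can be sketched as follows. This is an illustrative stand-in (`RecordSketch` is hypothetical, not the actual Record implementation) that assumes `__str__` is assembled from per-line helper methods like those listed below. The line layouts are heavily simplified; only the 12-column keyword header follows the GenBank flat file convention.

```python
class RecordSketch:
    """Hypothetical stand-in showing __str__ built from per-line helpers."""

    HEADER_WIDTH = 12  # GenBank keywords are padded to the first 12 columns

    def __init__(self):
        self.locus = ""
        self.accession = []
        self.source = ""

    def _header(self, keyword):
        # Left-justify the keyword so the data starts in column 13.
        return keyword.ljust(self.HEADER_WIDTH)

    def _locus_line(self):
        # Simplified: a real LOCUS line also carries size, residue type,
        # division and date.
        return self._header("LOCUS") + self.locus + "\n"

    def _accession_line(self):
        return self._header("ACCESSION") + " ".join(self.accession) + "\n"

    def _source_line(self):
        return self._header("SOURCE") + self.source + "\n"

    def __str__(self):
        # Concatenate the individual lines and close the record with "//".
        return (self._locus_line() + self._accession_line()
                + self._source_line() + "//\n")


rec = RecordSketch()
rec.locus = "EXAMPLE01"
rec.accession = ["AB01234"]
rec.source = "human"

# Modify the record, then write it back out as text.
rec.accession.append("AB01235")
text = str(rec)
print(text)
```

Keeping one helper per GenBank line type makes it straightforward to compare each output line against the corresponding line of the original file when checking that the round trip is faithful.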
|
|
_accession_line
|
_accession_line ( self )
Output for the ACCESSION line.
|
|
_base_count_line
|
_base_count_line ( self )
Output for the BASE COUNT line with base information.
|
|
_comment_line
|
_comment_line ( self )
Output for the COMMENT lines.
|
|
_contig_line
|
_contig_line ( self )
Output for CONTIG location information from RefSeq.
|
|
_db_source_line
|
_db_source_line ( self )
Output for DBSOURCE line.
|
|
_definition_line
|
_definition_line ( self )
Provide output for the DEFINITION line.
|
|
_features_line
|
_features_line ( self )
Output for the FEATURES line.
|
|
_keywords_line
|
_keywords_line ( self )
Output for the KEYWORDS line.
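As a rough illustration of what KEYWORDS output looks like, the sketch below formats a list of keywords in the usual GenBank style: entries separated by semicolons and terminated by a period. The function name `keywords_line` is hypothetical, and line wrapping for long keyword lists is deliberately left out; this is not the actual implementation of `_keywords_line`.

```python
def keywords_line(keywords):
    """Format a list of keywords as a single, unwrapped KEYWORDS line."""
    header = "KEYWORDS".ljust(12)  # keyword header padded to 12 columns
    # Keywords are joined with "; " and the line always ends with a period,
    # so an empty keyword list yields just ".".
    return header + "; ".join(keywords) + ".\n"


line = keywords_line(["kinase", "signal transduction"])
empty = keywords_line([])
print(line, end="")
print(empty, end="")
```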
|
|
_locus_line
|
_locus_line ( self )
Provide the output string for the LOCUS line.
|
|
_nid_line
|
_nid_line ( self )
Output for the NID line. Use of NID is obsolete in GenBank files.
|
|
_organism_line
|
_organism_line ( self )
Output for ORGANISM line with taxonomy info.
|
|
_origin_line
|
_origin_line ( self )
Output for the ORIGIN line.
|
|
_pid_line
|
_pid_line ( self )
Output for the PID line. Presumably, PID usage is also obsolete.
|
|
_segment_line
|
_segment_line ( self )
Output for the SEGMENT line.
|
|
_sequence_line
|
_sequence_line ( self )
Output for all of the sequence.
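For orientation, the sequence block of a GenBank flat file is conventionally written 60 residues per line, in groups of 10, with each line prefixed by the right-justified position of its first residue. The sketch below reproduces that layout; `sequence_lines` and its parameters are hypothetical names chosen for illustration and do not describe how `_sequence_line` is actually written.

```python
def sequence_lines(sequence, per_line=60, block=10):
    """Lay out a sequence string in GenBank-style numbered lines."""
    lines = []
    for start in range(0, len(sequence), per_line):
        chunk = sequence[start:start + per_line]
        # Split the line's residues into space-separated blocks of 10.
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The 1-based position of the first residue, right-justified in 9 columns.
        lines.append(str(start + 1).rjust(9) + " " + " ".join(blocks))
    return "\n".join(lines)


seq = "acgt" * 16  # 64 residues, so the output spans two lines
out = sequence_lines(seq)
print(out)
```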
|
|
_source_line
|
_source_line ( self )
Output for SOURCE line on where the sample came from.
|
|
_version_line
|
_version_line ( self )
Output for the VERSION line.
|