Translate Genbank nucleotide files
[Menu-bar>Protein>Nucleotide sequence]
1 AAACATGGCG CTGGCTAGCG TGTTGCAGCG ...
51 ACGGGTTTTT TGGGCTCGGA GGTGGTGCAG ...
101 GGGAGTCCTG GTGATGGGCT GAGCCTAGCC ...
...
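Each sequence line above is a position number followed by blocks of ten bases. The Python sketch below reduces such lines to one plain nucleotide string by keeping only letters. It assumes you pass only the sequence lines. In real Genbank files these follow the ORIGIN keyword, use lower-case bases and end with //, so the letters of the word ORIGIN itself must not be included. The name parse_origin is hypothetical and is not part of STRAP.

```python
import re
from typing import Iterable

def parse_origin(lines: Iterable[str]) -> str:
    """Join Genbank-style sequence lines into one upper-case nucleotide string.

    Hypothetical helper: pass only the lines after ORIGIN. Position numbers,
    spaces, dots and the '//' terminator are dropped by keeping only letters.
    """
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()

example = ["1 aaacatggcg ctggctagcg tgttgcagcg", "//"]
print(parse_origin(example))  # AAACATGGCGCTGGCTAGCGTGTTGCAGCG
```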
The Genbank and EMBL file formats are widely used for annotated
nucleotide sequences (format specifications:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html and
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt and
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html).
When a Genbank file is loaded into STRAP, the entire nucleotide
sequence is extracted and appears in the multiple sequence
alignment pane.
These nucleotides can be translated into
amino acids using the CDS sequence features contained in the file.
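The Genbank file contains fields with the keyword "CDS" (coding sequence).
The location following "CDS" specifies the translation direction and which nucleotides are translated.
The coding sequence may be continuous or interrupted by introns; exons are combined with join(...), and a gene on the reverse strand is wrapped in complement(...).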
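The location after the CDS keyword follows the Genbank feature-table syntax. It can be a plain range such as 5..799, a join(...) of exons separated by introns, or a complement(...) for a gene on the reverse strand. The Python sketch below parses only this common subset, plus the < and > markers for partial ends. The name parse_location is hypothetical. Other forms, such as join(complement(...),complement(...)) or references to other entries, are rejected with an error rather than handled.

```python
import re
from typing import List, Tuple

def parse_location(loc: str) -> Tuple[List[Tuple[int, int]], bool]:
    """Parse a simple CDS location into 1-based inclusive ranges.

    Hypothetical helper. Handles 'a..b', 'join(a..b,c..d)', a
    'complement(...)' wrapping either of those, and the partial-end
    markers '<' and '>'. Returns the ranges in the order listed and a
    flag that is True when the gene lies on the reverse strand.
    """
    loc = loc.replace(" ", "")
    reverse = False
    if loc.startswith("complement(") and loc.endswith(")"):
        reverse = True
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    ranges = []
    for part in loc.split(","):
        # A single base ('5') or a range ('5..799'), optionally partial.
        m = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", part)
        if m is None:
            raise ValueError(f"unsupported location part: {part!r}")
        start = int(m.group(1))
        end = int(m.group(2) or m.group(1))
        ranges.append((start, end))
    return ranges, reverse

print(parse_location("5..799"))  # ([(5, 799)], False)
print(parse_location("complement(join(2691..4571,4918..5163))"))
```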
The following example of a CDS field is from the file NCBI_NT:5757659.
CDS 5..799
/gene="Psmb5"
/codon_start=1
/product="proteasome subunit X"
/protein_id="AAD50536.1"
/db_xref="GI:5757659"
/translation="MALA ...VSVP"
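Once the exon ranges and the strand are known, the translation is mechanical. You splice the exons in the listed order, reverse-complement the result for complement(...) locations, skip codon_start - 1 bases and then read codons. The Python sketch below assumes the standard genetic code (NCBI translation table 1). Real entries can request another table with the /transl_table qualifier, which this sketch ignores. translate_cds is a hypothetical name and not STRAP's implementation.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate_cds(seq: str, ranges: List[Tuple[int, int]],
                  reverse: bool = False, codon_start: int = 1) -> str:
    """Translate a CDS given 1-based inclusive exon ranges (hypothetical helper)."""
    # Splice the exons together in the order they are listed.
    coding = "".join(seq[s - 1:e] for s, e in ranges).upper()
    if reverse:
        # complement(...): read the reverse complement of the spliced sequence.
        coding = coding.translate(COMPLEMENT)[::-1]
    # codon_start counts from the 5' end in the direction of translation.
    coding = coding[codon_start - 1:]
    # Read complete codons only; unknown codons (e.g. containing N) become 'X'.
    return "".join(CODON_TABLE.get(coding[i:i + 3], "X")
                   for i in range(0, len(coding) - 2, 3))

print(translate_cds("CCATGGCTTAACC", [(3, 11)]))  # MA*
```

A stop codon is kept as '*' so that a caller can decide whether to strip it.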
If the Genbank file contains several genes, the user can choose one of them.
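To offer a choice between several genes, a program first has to collect the CDS features of the file. The Python sketch below assumes the standard Genbank column layout. Feature keys start in column 6, and qualifiers and continuation lines start in column 22 (21 leading spaces). Locations that wrap onto a second line are not reassembled. list_cds is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

def list_cds(feature_lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """List (location, gene) for every CDS in a Genbank feature table.

    Hypothetical helper; assumes feature keys are indented by fewer than
    21 spaces and qualifier/continuation lines by 21 spaces.
    """
    result: List[Tuple[str, Optional[str]]] = []
    current = None  # index in result of the CDS currently being read
    for line in feature_lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        text = line.strip()
        if indent < 21:
            # A new feature key begins; only CDS features are recorded.
            key, _, location = text.partition(" ")
            if key == "CDS":
                result.append((location.strip(), None))
                current = len(result) - 1
            else:
                current = None
        elif current is not None:
            m = re.match(r'/gene="([^"]*)"', text)
            if m:
                result[current] = (result[current][0], m.group(1))
    return result

table = [
    " " * 5 + "CDS" + " " * 13 + "5..799",
    " " * 21 + '/gene="Psmb5"',
    " " * 21 + "/codon_start=1",
]
print(list_cds(table))  # [('5..799', 'Psmb5')]
```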
Some Genbank files already contain the translation in a
/translation qualifier. The sequence computed by STRAP should be identical to
this sequence.
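A consistency check of this kind might look as follows in Python. The sketch assumes the usual Genbank conventions: the /translation value leaves out the terminal stop codon, and long values are wrapped over several lines. matches_annotation is a hypothetical name.

```python
def matches_annotation(computed: str, annotated: str) -> bool:
    """Return True if a computed translation agrees with a /translation value.

    Hypothetical helper; the annotated value is passed without its quotes.
    """
    # /translation normally omits the '*' produced by the stop codon.
    computed = computed.rstrip("*")
    # Long qualifier values are wrapped over several lines; drop the whitespace.
    annotated = "".join(annotated.split())
    return computed == annotated

print(matches_annotation("MALA*", "MA\n LA"))  # True
```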
When the project is saved to disk, an additional file with the
extension ".dna" is created. It records the translation direction (forward or reverse
complement) and which positions are translated and which are not.