fmtseq: A BioSeq File Format Converter

This applet implements the base functionality of the fmtseq format conversion program (an extension of Don Gilbert's readseq program). Paste in sequence data in one of the input formats (see the popup choice), select the transformations and output format, and hit the convert button.

The applet should be straightforward, except possibly for the "* (Any)" choice of input format, the Normal, Raw and Strip modes, and the "FASTA-old" output format. The "* (Any)" choice tells the applet to automatically determine the file format, which it can do for all of the input formats.

The Normal, Raw and Strip modes give finer control over which characters appear in the output sequences (above and beyond the input-to-output gap character setting). In Raw Mode, all sequences, for both input and output, are assumed to consist of all characters except digits and spaces. In Strip Mode, all sequences are assumed to consist only of alphabetic characters. Thus, you can leave in or strip out all gap and annotation characters by using one of these two modes.
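As a rough illustration of the difference between Raw Mode and Strip Mode, here is a minimal Python sketch. It is not the applet's actual code: the function name filter_sequence and the mode strings are hypothetical, and it assumes that "spaces" covers all whitespace and that "alphabetic" means the ASCII letters.

```python
import string

def filter_sequence(text: str, mode: str) -> str:
    """Keep only the characters that count as sequence data in the given mode.

    Hypothetical helper: the name and the mode strings are illustrative only.
    """
    if mode == "raw":
        # Raw Mode: everything except digits and whitespace is sequence data.
        return "".join(
            ch for ch in text if ch not in string.digits and not ch.isspace()
        )
    if mode == "strip":
        # Strip Mode: only (ASCII) alphabetic characters are sequence data.
        return "".join(ch for ch in text if ch in string.ascii_letters)
    raise ValueError(f"unknown mode: {mode!r}")

line = "1 ACG-T..A*  60"
print(filter_sequence(line, "raw"))    # ACG-T..A*
print(filter_sequence(line, "strip"))  # ACGTA
```

In this sketch the gap and annotation characters ('-', '.', '*') survive Raw Mode and are removed by Strip Mode, while position numbers and spacing are dropped in both.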

In Normal Mode, the input and output sequences for the GenBank, EMBL, Swiss-Prot and PIR formats are assumed to consist only of alphabetic characters (since these formats are primarily used for databank storage of sequences only), and the input and output sequences for the other formats are assumed to consist of all characters except digits and spaces (since they commonly contain alignments and other annotation characters). Depending on the particular conversion, the non-alphabetic characters may or may not be stripped from the sequences. Play around with the three modes and the gap input and gap output character settings until you get the sequence you want.
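The per-format rule in Normal Mode can be sketched as follows. This is an assumption-laden Python illustration, not the applet's implementation: the function name normal_mode_filter and the format label strings are hypothetical, "spaces" is taken to mean all whitespace, and "alphabetic" is taken to mean ASCII letters. It only models how a single sequence is read in one format, not the interaction between an input and an output format.

```python
import string

# Formats that Normal Mode treats as alphabetic-only (labels are illustrative).
ALPHA_ONLY_FORMATS = {"GenBank", "EMBL", "Swiss-Prot", "PIR"}

def normal_mode_filter(text: str, fmt: str) -> str:
    """Return the characters Normal Mode would treat as sequence data for fmt."""
    if fmt in ALPHA_ONLY_FORMATS:
        # Databank formats: only alphabetic characters belong to the sequence.
        return "".join(ch for ch in text if ch in string.ascii_letters)
    # Other formats: keep everything except digits and whitespace, so that
    # alignment gaps and annotation characters are preserved.
    return "".join(
        ch for ch in text if ch not in string.digits and not ch.isspace()
    )

print(normal_mode_filter("acg-t 10", "GenBank"))  # acgt
print(normal_mode_filter("acg-t 10", "FASTA"))    # acg-t
```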

The "FASTA-old" output format is included in the applet in order to generate FASTA-formatted sequences for programs that can't handle the full FASTA format (in particular, any comment lines beginning with ';' that appear in the sequence). The FASTA-old format is guaranteed to have one and only one header line, followed by the sequence lines, where the width of each sequence line is the same (except possibly for the last line). Thus, this format is compatible with the format required by the BLAST preprocessing programs setdb and pressdb.
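The FASTA-old guarantees (exactly one header line, then fixed-width sequence lines with a possibly shorter last line) can be illustrated with a short Python sketch. The function name to_fasta_old and the default width of 60 are assumptions made for this example; the applet's actual line width is not stated here.

```python
def to_fasta_old(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence with a single header line and fixed-width lines.

    Hypothetical helper: the name and the default width are illustrative.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Collapse the header onto one line so there is exactly one header line.
    one_line_header = " ".join(header.split())
    lines = [">" + one_line_header]
    # Every sequence line has the same width, except possibly the last one.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

record = to_fasta_old("seq1 example\nsecond header line", "ACGT" * 35, width=60)
print(record)
```

No ';' comment lines are ever written by this sketch, which is the property that programs limited to the older format rely on.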

The source.
Author: James Knight
Address: jknight@curagen.com at CuraGen Corporation