SWeBLAST (Sliding Window WeB-based BLAST) is a command-line program written in Perl for use on Windows, Linux and Mac. Its main component, SWeBLAST.pl, divides a query sequence, or sequences, supplied in FASTA format, into a series of subsequences of chosen length and overlap, and sends these successively to GenBank’s BLAST facility. The search results are collated into a single file, BLAST.TXT, in a sub-folder. This file is analyzed by the second component, SWeBLAST_parser.pl, and the results are written to a sub-folder as a series of CSV files that can be examined using a standard spreadsheet program, such as Excel.
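The slicing step described above can be sketched in a few lines of Python. This is an illustration only, not the code used by SWeBLAST.pl: the function name `sliding_windows` is hypothetical, the defaults mirror the window (100) and step (50) values listed in the help output, and the handling of the final, shorter slice is an assumption.

```python
def sliding_windows(sequence, window=100, step=50):
    """Yield (first, last, subsequence) tuples with 1-based, inclusive positions.

    Assumption: a final window shorter than `window` is still emitted so that
    the end of the query is covered. SWeBLAST.pl may treat the tail differently.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    n = len(sequence)
    start = 0
    while start < n:
        chunk = sequence[start:start + window]
        yield start + 1, start + len(chunk), chunk
        if start + window >= n:
            break  # this window already reached the end of the sequence
        start += step


# Toy 230-residue query sliced with the default window and step.
toy = "ACGT" * 57 + "AC"
for first, last, sub in sliding_windows(toy):
    print(first, last, len(sub))
```

With a window of 100 and a step of 50, consecutive slices overlap by 50 residues, and the toy query yields four slices, the last covering positions 151 to 230.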
The programs require Perl to be installed on the computer. The libwww-perl-xxxx library is also required; if it is not already installed, it has to be downloaded from the CPAN website.
Download the SWeBLAST.ZIP file and extract it into a single folder, then add the FASTA file of the sequence(s) to be analysed to that folder. Open a command line window and navigate to the folder containing the SWeBLAST package.
To look at the available arguments or options, run the program with the option -help:
> perl SWeBLAST.pl -help
-help      or -h :  Show this message.
-input     or -i :  Input (data) file.
-output    or -o :  Output folder (OPTIONAL).
-window    or -w :  Window length - default 100 (OPTIONAL).
-step      or -s :  Step length - default 50 (OPTIONAL).
-database  or -d :  GenBank database to be searched - default nr (OPTIONAL).
-program   or -p :  Program to be used: blastn, blastp, blastx - default blastn (OPTIONAL).
-expect    or -e :  EXPECT (OPTIONAL).
-entrez    or -q :  ENTREZ QUERY (OPTIONAL). Boolean command to limit BLAST search (e.g. "virus NOT potato virus Y [ALL]").
-parse     or -r :  To call SWeBLAST_parser.pl (OPTIONAL).
-cutoff    or -c :  Discards hits with a greater e-value (OPTIONAL).
Or use the interactive menu by running the program without arguments and following the prompts:
> perl SWeBLAST.pl
Options can also be supplied directly on the command line, for example:
> perl SWeBLAST.pl -p blastn -d nr -i dna.fasta -o myRESULT -w 100 -s 50 -r -c 1.0
This will automatically start the parser and produce the Blast.txt and Seq.fas files in the sub-folder DNA.
The parser may be re-run using a smaller cutoff with the following command:
> perl SWeBLAST_parser.pl -i DNA/blast.txt -c 0.01
The files from the first parsing will be overwritten unless the sub-folder is renamed first.
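The effect of lowering the cutoff can be pictured with a small Python sketch. It is not taken from SWeBLAST_parser.pl: the function `apply_cutoff` and the hit names are hypothetical, and it assumes a hit is kept when its e-value is less than or equal to the cutoff, in line with the help text ("discards hits with a greater e-value").

```python
def apply_cutoff(hits, cutoff):
    """Keep (name, e-value) pairs whose e-value does not exceed the cutoff.

    Assumption: a hit exactly equal to the cutoff is kept.
    """
    return [(name, evalue) for name, evalue in hits if evalue <= cutoff]


# Hypothetical hits for a single slice of the query.
hits = [("hit_A", 2e-40), ("hit_B", 0.005), ("hit_C", 0.3)]

first_pass = apply_cutoff(hits, 1.0)    # cutoff used in the initial run
second_pass = apply_cutoff(hits, 0.01)  # stricter cutoff for the re-run
print(len(first_pass), len(second_pass))
```

In this toy data the stricter cutoff removes the weakest hit (`hit_C`) while the two stronger hits remain.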
Note: The Entrez query must be enclosed in double quotes, as must any other argument containing spaces, such as an input file path.
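When the program is launched from a script rather than typed at the prompt, the quoting issue can be sidestepped by passing the arguments as a list. The Python sketch below is a hypothetical wrapper, not part of the SWeBLAST package: `build_command` and the file path are made up for illustration, and only options shown in the help output (-p, -d, -i, -q) are used.

```python
def build_command(input_path, entrez_query=None, program="blastn", database="nr"):
    """Return the argument list for a SWeBLAST.pl run.

    Each list element is handed to the program as one argument, so values
    containing spaces need no manual quoting.
    """
    cmd = ["perl", "SWeBLAST.pl", "-p", program, "-d", database, "-i", input_path]
    if entrez_query:
        cmd += ["-q", entrez_query]
    return cmd


# Hypothetical input path containing a space, plus the Entrez query from the help text.
cmd = build_command("my data/dna.fasta", "virus NOT potato virus Y [ALL]")
print(cmd)

# To actually launch the search (requires Perl and SWeBLAST.pl in the current folder):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

When the command is typed by hand, the double quotes described in the note are still needed.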
The output files are:
- seq.fas: A FASTA-format file of the sub-sequences that were sent to BLAST as query sequences.
- BLAST.txt: A file containing a concatenation of all the BLAST output files.
- evalue.csv: A table of the e-values (exponential term only) for all the reported matches found by the BLAST program. Each row of this table corresponds to a sequence in the GenBank database that BLAST matched with any of the submitted subsequences. Column A lists the matched sequences, and the other columns correspond to the successive slices of the query sequence. The body of the table contains the reported e-values.
- rank.csv: This table is like ‘evalue.csv’, but records the rank position of each match in the BLAST results for each sub-sequence submitted to BLAST.
- stat.csv: The rows in this table also correspond to each database sequence found to match the query sequence, and these are in the same order as in the ‘evalue.csv’ and ‘rank.csv’ files. The first column again records the names of the matched sequences, the second column records the number of matches obtained with all slices of the query sequence, and subsequent columns contain a series of statistics of the e-values and rankings obtained by the database sequence with every slice of the query sequence: their minima, maxima, ranges, medians, means and standard deviations.
- summary.csv: This table differs from the others in that each row contains the results for one subsequence of the query sequence. Column A gives the name and position of the slice, and successive columns give the names of the sequences with which it matched and the e-value of each match.
- frame.csv: Generated only when blastx is used. A table of the frames for all the reported matches found by the BLAST program. Each row of this table corresponds to a sequence in the GenBank database that BLAST matched with any of the submitted subsequences. Column A lists the matched sequences, and the other columns correspond to the successive slices of the query sequence. The body of the table contains the reported frames.
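The per-sequence statistics described for stat.csv can be reproduced from a table shaped like evalue.csv. The Python sketch below is an illustration under stated assumptions, not the parser's own code: the sample data, the presence of a header row, the use of empty cells for slices with no match, and the choice of the sample standard deviation are all assumptions, and `summarise` is a hypothetical name.

```python
import csv
import io
import statistics

# Hypothetical table in the assumed evalue.csv layout: column A holds the
# matched sequence, the other columns hold one value per slice of the query.
SAMPLE = """name,slice_1,slice_2,slice_3,slice_4
seq_A,-40,-35,,-20
seq_B,,-12,-10,
"""


def summarise(csv_text):
    """Return per-sequence statistics over the non-empty cells of each row."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the (assumed) header row
    stats = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        name, cells = row[0], row[1:]
        values = [float(c) for c in cells if c.strip()]
        if not values:
            continue  # no slice matched this sequence
        stats[name] = {
            "matches": len(values),
            "min": min(values),
            "max": max(values),
            "range": max(values) - min(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
            # Sample standard deviation; undefined for a single value.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return stats


for name, row_stats in summarise(SAMPLE).items():
    print(name, row_stats)
```

The same function could be pointed at a table of ranks, since rank.csv is described as having the same layout.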