gp_qs -i [-n [value]] [-m|-p] [-q] [-v] [-d] [-h] inputfile
gp_qs -n 2 ATCGATCG
will also find CGAT.
The value parameter tells how many bases at each end of the
sequence string may differ from it.
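
One way to picture the value parameter is sketched below. It assumes that "can differ" means the outer bases are treated as free positions, so that only the inner core must match exactly. The helper relax_ends is hypothetical and not part of Genpak. It uses only standard shell tools and grep, and may not reflect how gp_qs works internally. It also assumes that twice the value is smaller than the length of the sequence.

```shell
# relax_ends SEQ N: hypothetical helper, not part of Genpak.
# Prints SEQ with its first and last N bases replaced by '.',
# which grep treats as "any single character".
relax_ends() {
    seq=$1
    n=$2
    len=${#seq}
    # Build a string of N dots.
    dots=""
    i=0
    while [ "$i" -lt "$n" ]; do
        dots="$dots."
        i=$((i + 1))
    done
    # Keep only the inner core, positions N+1 .. len-N.
    core=$(printf '%s' "$seq" | cut -c "$((n + 1))-$((len - n))")
    printf '%s%s%s\n' "$dots" "$core" "$dots"
}

relax_ends ATCGATCG 2                                      # prints ..CGAT..
printf 'GGCGATTT\n' | grep -c "$(relax_ends ATCGATCG 2)"   # prints 1
```

Under this reading, ATCGATCG with a value of 2 behaves like a search for CGAT flanked by two arbitrary bases on each side.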
The gp_qs interface differs from that of most other Genpak programs.
To make everyday use simpler, the sequence to be retrieved can be
given directly on the command line; it is read from standard
input only if the -i option is used. You cannot read the
search sequences directly from an input file; however, this restriction
is easily circumvented by typing something like:
cat some_file | gp_qs -i search_file
Without the -i option, the sequence string is mandatory. With the -i option, the inputfile containing the sequence(s) to be searched is mandatory.
Note that the sequences can have wildcards, so you can use, for example, NNNNNNN as the search string.
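
As a rough illustration of what a wildcard search string means, the sketch below turns a search string into a grep pattern. It assumes that N is the only wildcard and stands for any one of A, C, G or T; this page does not say which other wildcard codes gp_qs accepts. The helper wildcard_to_regex is hypothetical and not part of Genpak.

```shell
# wildcard_to_regex PATTERN: hypothetical helper, not part of Genpak.
# Assumes N is the only wildcard and means "any one of A, C, G, T".
wildcard_to_regex() {
    printf '%s\n' "$1" | sed 's/N/[ACGT]/g'
}

wildcard_to_regex TANNAT                                        # prints TA[ACGT][ACGT]AT
printf 'GGTACGATCC\n' | grep -c "$(wildcard_to_regex TANNAT)"   # prints 1
```

A search string made up entirely of N, such as NNNNNNN, would under this assumption match any stretch of seven bases.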
gp_qs TAATAT orfs.fasta
or
cat orfs.fasta | gp_qs TAATAT
cat promoter.seq | gp_qs -i large.database.fasta
All Genpak programs complain in situations where you would complain too, such as when they cannot find a sequence you gave them or when the sequence is not valid.
The Genpak programs do not write over existing files. I have found this feature very useful :-)
I tried to clean up every bug I could find, but I'm sure there are plenty left, so please mail me if you find any.
January Weiner III <january@bioinformatics.org>