[BiO BB] Matching and Filtering -- try grep
Dmitri I GOULIAEV
dig at bioinformatics.org
Mon Nov 17 07:12:13 EST 2003
Hi, Pooja Jain!
On Mon, Nov 17, 2003 at 11:15:10AM -0000, Pooja Jain wrote:
> I have a text file with a list of accession numbers for a few of the
> sequences from the entire Arabidopsis thaliana genome. I have another
> tab-delimited text file with all the accession numbers and other details
> about every sequence present in its genome (one sequence per row). From
> this latter file I want to extract the details of only those sequences
> whose accession numbers appear in the former file.
> Could someone please suggest a simple way to do this matching and
> filtering? I tried simple shell commands like cmp and diff, but neither
> of them worked. Is there any other command I can use in the shell? Any
> other way to do so with Perl is also welcome.
From the grep man page:
grep, egrep, fgrep - print lines matching a pattern
You should use grep.
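grep can also read every pattern from a file with -f, so the details file is scanned once instead of once per accession number. The script below is a minimal sketch of that variant, not the exact command discussed here; it reuses the file names from this reply, and the sample accession numbers, placeholder descriptions and temporary directory are made up for the demo. It assumes GNU or BSD grep, where -w is available. Note that -w matches the accession number as a whole word anywhere on the line, not only in the first column, and that "." counts as a word boundary, so AT1G01010 would still match AT1G01010.1.

```shell
#!/bin/sh
# Sketch: select detail lines whose accession number is in a list,
# using a single grep pass. All sample data below is hypothetical.
dir=$(mktemp -d)

# hypothetical list of wanted accession numbers, one per line
printf 'AT1G01010\nAT1G01030\n' > "$dir/file-with-a-list"

# hypothetical tab-delimited details file, accession number in column 1
printf 'AT1G01010\tplaceholder 1\nAT1G01020\tplaceholder 2\nAT1G01030\tplaceholder 3\nAT1G010100\tplaceholder 4\n' \
  > "$dir/file-with-all-the-details"

# -F: treat each list entry as a fixed string, not a regular expression
# -w: match whole words only, so AT1G01010 does not match AT1G010100
# -f: read all the patterns from the list file
grep -F -w -f "$dir/file-with-a-list" "$dir/file-with-all-the-details" \
  > "$dir/file-with-the-details-for-the-listed-accnum"

cat "$dir/file-with-the-details-for-the-listed-accnum"
```

With this sample data only the AT1G01010 and AT1G01030 lines are printed; the AT1G010100 line is rejected by -w.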
If file-with-a-list is the text file with the list of accession numbers
and file-with-all-the-details is the other file,
then this shell one-liner
user@host$ while read -r AN ; do \
grep "^$AN" file-with-all-the-details ; \
done < file-with-a-list > file-with-the-details-for-the-listed-accnum
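should work for you, provided the accession numbers are at the beginning of the lines in the "other" file. If each line instead starts with some whitespace characters, change "^$AN" to "^[[:space:]]*$AN" (keeping the quotation marks). Note that either pattern also matches a longer accession number that merely begins with $AN; since the file is tab-delimited, appending [[:space:]] (e.g. "^$AN[[:space:]]") restricts the match to the exact number.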
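If exact matching on the accession-number column matters more than sticking to grep, awk can do the whole job in one pass over both files. This is a minimal sketch that swaps in awk; it assumes the accession number is the first tab-separated column of the details file and that the list has one number per line. The sample data is hypothetical, and the DOS line ending in the list is there only to show that trailing blanks and carriage returns are stripped.

```shell
#!/bin/sh
# Sketch: exact match on column 1 using awk. Sample data is hypothetical.
dir=$(mktemp -d)

# hypothetical list; the first line deliberately ends in CR LF
printf 'AT1G01010\r\nAT1G01030\n' > "$dir/file-with-a-list"

# hypothetical tab-delimited details file, accession number in column 1
printf 'AT1G01010\tplaceholder 1\nAT1G01020\tplaceholder 2\nAT1G01030\tplaceholder 3\nAT1G010100\tplaceholder 4\n' \
  > "$dir/file-with-all-the-details"

awk -F '\t' '
  # First file (NR == FNR): remember each accession number,
  # stripping trailing blanks or a DOS carriage return.
  NR == FNR { sub(/[ \t\r]+$/, ""); want[$0]; next }
  # Second file: print the line if column 1 is a wanted number.
  $1 in want
' "$dir/file-with-a-list" "$dir/file-with-all-the-details" \
  > "$dir/file-with-the-details-for-the-listed-accnum"

cat "$dir/file-with-the-details-for-the-listed-accnum"
```

Because the comparison is an exact string lookup on column 1, AT1G010100 is never mistaken for AT1G01010, and memory use grows only with the size of the list.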
Hope this helps,
DIG (Dmitri I GOULIAEV) http://www.bioinformatics.org/~dig/
1024D/63A6C649: 26A0 E4D5 AB3F C2D4 0112 66CD 4343 C0AF 63A6 C649