[BiO BB] Matching and Filtering -- try grep- thanks
pooja at igc.gulbenkian.pt
Mon Nov 17 13:30:25 EST 2003
Hi Dmitri I Gouliaev,
Thank you for your suggestion. I followed the grep man page, used
grep -f, and it worked:
grep -f 'file1.txt' file2.txt > file3.txt
Here file1.txt holds the list of accession numbers whose details I want
to extract from file2.txt, and the command above writes the matching
lines of file2.txt to file3.txt.
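A minimal sketch of a slightly stricter variant, assuming file1.txt has one accession number per line and file2.txt is tab-delimited. By default grep -f treats every line of file1.txt as a regular expression that may match anywhere in a line, so a short ID can also pick up a longer ID that starts with it, and a blank line in file1.txt matches every line of file2.txt. The sample IDs (ACC1, ACC12, ...), the scratch directory and the intermediate file ids.txt are invented for illustration.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory; replace with your own files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# file1.txt: one accession number per line (note the trailing blank line)
printf 'ACC1\nACC3\n\n' > file1.txt
# file2.txt: tab-delimited, accession number in the first column
printf 'ACC1\tgene one\nACC12\tgene twelve\nACC2\tgene two\nACC3\tgene three\n' > file2.txt

# Drop carriage returns (DOS line endings) and blank lines; an empty
# pattern would otherwise match every line of file2.txt.
tr -d '\r' < file1.txt | grep -v '^[[:space:]]*$' > ids.txt

# -F: treat each ID as a fixed string, not a regular expression
# -w: match whole words only, so ACC1 does not also match ACC12
grep -F -w -f ids.txt file2.txt > file3.txt

cat file3.txt
```

Note that -w still matches an ID anywhere in the line, not only in the first column, and treats "." as a word boundary, so ACC1 would also match ACC1.1.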
> Hi, Pooja Jain !
> On Mon, Nov 17, 2003 at 11:15:10AM -0000, Pooja Jain wrote:
>> I have a text file with a list of accession numbers for a few of the
>> sequences from the entire Arabidopsis thaliana genome. I have another
>> tab-delimited text file with the accession numbers and other details
>> of every sequence present in the genome (one sequence per row). From
>> this latter file I want to extract the details of only those sequences
>> whose accession numbers appear in the former file.
>> Could someone please suggest a simple way to do this matching and
>> filtering? I tried simple shell commands like cmp and diff, but
>> neither worked. Is there any other command I can use in the shell?
>> A way to do it with Perl is also welcome.
> From man pages:
> grep, egrep, fgrep - print lines matching a pattern
> You should use grep.
> If file-with-a-list is the text file with the list of accession numbers
> and file-with-all-the-details is the other file,
> then this shell one-liner
> user@host$ while read -r AN ; do \
>     grep "^$AN" file-with-all-the-details ; \
> done < file-with-a-list > file-with-the-details-for-the-listed-accnum
> should work for you, provided the accession numbers are at the beginning
> of the lines in the "other" file. If they are not, but each line starts
> with some white-space characters, change "^$AN" to "^[[:space:]]*$AN"
> (keeping the quotation marks).
> Hope this helps,
> DIG (Dmitri I GOULIAEV) http://www.bioinformatics.org/~dig/
> 1024D/63A6C649: 26A0 E4D5 AB3F C2D4 0112 66CD 4343 C0AF 63A6 C649
> BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
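The loop in the reply above runs grep once per accession number, and because "^$AN" is anchored only at the start of the line, ACC1 would also match a line beginning with ACC12. A single awk pass can compare the whole first column instead. This is a minimal sketch, assuming the accession number is the first tab-separated field of the details file; the file names are the placeholders from the reply, and the sample data and scratch directory are invented for illustration.

```shell
#!/bin/sh
# Invented sample files in a scratch directory; substitute your own.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'ACC1\nACC3\n' > file-with-a-list
# Tab-delimited details; the ACC3 line has leading spaces on purpose.
printf 'ACC1\tgene one\nACC12\tgene twelve\nACC2\tgene two\n  ACC3\tgene three\n' > file-with-all-the-details

awk -F '\t' '
    # First file (NR == FNR): strip trailing spaces/CRs and remember each ID.
    NR == FNR { sub(/[ \r]+$/, ""); if ($0 != "") want[$0] = 1; next }
    # Second file: trim spaces around the first field and print the whole
    # line only if that field exactly equals one of the wanted IDs.
    { key = $1; gsub(/^ +| +$/, "", key); if (key in want) print }
' file-with-a-list file-with-all-the-details > file-with-the-details-for-the-listed-accnum

cat file-with-the-details-for-the-listed-accnum
```

Loading the short list into an awk array keeps the lookup exact and reads each file only once, which matters when the list holds thousands of accession numbers.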