[BiO BB] FW/Re: Fasta conversion in large EST assemblies
Bossers, A.
A.Bossers at id.dlo.nl
Wed Apr 2 01:31:52 EST 2003
Dear all,
Thanks for the quick replies and help with the fasta conversion problem. I
had already started fiddling around in Perl to convert the fasta files into
files acceptable to tgicl for EST assembly, but Eitan provided the
simplest solution: a one-line Perl 'script' that did exactly what I
needed. BIG THANKS. The script just strips everything after the filename
on each header line (as long as the filename contains no spaces) and leaves
all sequence or quality data that follows untouched. His solution is below.
I still don't understand why tgicl doesn't accept files in valid FASTA format, but
it no longer bothers me. My EST assembly is one step further.
Thanks again to everyone who sent me Perl solutions!
Alex
-----Original Message-----
From: Eitan Rubin [mailto:Eitan.Rubin at weizmann.ac.il]
Sent: Tuesday, April 1, 2003 20:28
To: A.Bossers at ID.DLO.NL
Subject: Fasta conversion
Hi,
If I am not mistaken, your question is "how do I convert format A below to
format B". If this is indeed what you need, the following should do the trick:
perl -pe 's/^>(\S+).*/>$1/;' old_format_file > new_format_file
Format A:
==========
>seqname1 some text with spaces
ACGTAGACTGACT..
>seqname2 some other text etc
ACGATCGATAGCT
Format B:
========
>seqname1
ACGTAGACTGACT..
>seqname2
ACGATCGATAGCT
Eitan
------------------------------------------------------------------------
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456
Fax: +972-8-9346006