[BiO BB] gff to sequence

Mike Marchywka marchywka at hotmail.com
Sat Oct 3 08:59:10 EDT 2009


----------------------------------------
> Date: Sat, 3 Oct 2009 12:54:51 +0100
> From: dan.bolser at gmail.com
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] gff to sequence
>
> You can do this easily in Perl... Here is some 'pseudo code' to
> (roughly) do it...
>
>
> ## Get a hash of sequences, keys = IDs, values = sequence strings;
> my %sequences;
> ...
>
> # open the GFF file ...
>
> while (my $gff = <GFF>) {
> my @gffcols = split(/\t/, $gff);
>
> # GFF coordinates are 1-based and inclusive; substr() offsets are 0-based
> print substr($sequences{$gffcols[0]}, $gffcols[3] - 1,
> $gffcols[4] - $gffcols[3] + 1), "\n";
> ...
> }
>
>
> Or something roughly similar to the above ;-)
>
> Dan.
>
>
> 2009/10/3 Kie Kyon Huang :
>> Hi,
>>
>> Is there a way to quickly extract out the coordinates from a gff file
>> and the corresponding sequence from a fasta file?
>>

I guess it depends on what you mean by quick. If you mean quick to write, you
could use awk, but then it depends on what else you want to do with the
results. I ended up writing a C++ FASTA utility program, since Perl can slow
down sometimes, but I also ended up grabbing a couple of regex libraries to let
me grep names, etc.
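
For illustration only, here is a minimal C++ sketch of the approach discussed
in this thread: load the FASTA records into a map, then slice out the range
given by columns 4 and 5 of each GFF line. It is not the utility mentioned
above. The function names read_fasta and extract_features are hypothetical,
and the sketch assumes the whole FASTA file fits in memory, that the GFF file
is tab-delimited with 1-based inclusive coordinates, and that the strand
column can be ignored (no reverse complement is taken).

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read FASTA records into a map of ID -> sequence.
// The ID is the header text after '>' up to the first space or tab.
std::map<std::string, std::string> read_fasta(std::istream& in) {
    std::map<std::string, std::string> seqs;
    std::string line, id;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            std::size_t end = line.find_first_of(" \t", 1);
            id = line.substr(1, end == std::string::npos ? end : end - 1);
            if (!id.empty()) seqs[id];   // create an empty entry for this ID
        } else if (!id.empty()) {
            seqs[id] += line;            // sequence may span several lines
        }
    }
    return seqs;
}

// Hypothetical helper: for each GFF feature line, return the subsequence
// covered by the start (column 4) and end (column 5) coordinates.
// Comment lines, malformed lines, unknown sequence IDs, and out-of-range
// coordinates are skipped silently.
std::vector<std::string> extract_features(
        std::istream& gff,
        const std::map<std::string, std::string>& seqs) {
    std::vector<std::string> out;
    std::string line;
    while (std::getline(gff, line)) {
        if (line.empty() || line[0] == '#') continue;

        // Split the line on tabs.
        std::vector<std::string> cols;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, '\t')) cols.push_back(field);
        if (cols.size() < 5) continue;

        auto it = seqs.find(cols[0]);
        if (it == seqs.end()) continue;

        std::size_t start = 0, end = 0;
        try {
            start = std::stoul(cols[3]);
            end = std::stoul(cols[4]);
        } catch (const std::exception&) {
            continue;  // non-numeric coordinates
        }
        if (start < 1 || end < start || end > it->second.size()) continue;

        // Convert 1-based inclusive coordinates to a 0-based offset and length.
        out.push_back(it->second.substr(start - 1, end - start + 1));
    }
    return out;
}
```

To use it, open the two files as std::ifstream objects, pass the FASTA stream
to read_fasta, and pass the GFF stream plus the resulting map to
extract_features. Filtering features by name with a regex, as mentioned above,
would be a separate step on the GFF columns.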



