[BiO BB] Bioperl FASTA to NEXUS with concatenation of genes
basm101
basm101 at york.ac.uk
Tue May 20 05:24:55 EDT 2003
Hi there,
I'm new to this board, so I'm not sure whether people use it to ask for
help; if not, I apologise for posting in the wrong place.
I want to concatenate gene datasets and wondered whether anyone knew
how to do this with BioPerl.
My input files are aligned FASTA, and I would like the output to be
NEXUS in non-interleaved format, i.e.:
#NEXUS
begin taxa;
dimensions ntax=number of taxa;
taxlabels
labels here
;
end;
begin characters;
dimensions nchar=number of chars;
format symbols = "" missing=?;
matrix
[gene1]
species1 AAA
species2 BBB
species3 AAA
[gene2]
species1 CCC
species2 DDD
species3 CCC
;
end;
[PAUP block here]
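One point about the layout above: in NEXUS terms, a matrix in which each taxon appears once per gene block is interleaved, so the FORMAT command needs the `interleave` keyword for a NEXUS reader such as PAUP* to parse it. What distinguishes it from the BioPerl output is that the blocks break at gene boundaries rather than wherever the writer wraps lines. A minimal sketch in core Perl (no BioPerl), assuming DNA data, whitespace-free taxon labels and genes that are already aligned; the sub name `gene_blocks_to_nexus` and its argument layout are hypothetical:

```perl
#!/usr/bin/perl
# Sketch: write a NEXUS matrix with one block per gene, each headed by a
# [gene] comment. gene_blocks_to_nexus is a hypothetical helper, not BioPerl.
use strict;
use warnings;

# $taxa:  arrayref of taxon labels in output order (no whitespace assumed)
# $genes: arrayref of [ gene_name, { taxon => aligned_sequence } ]
sub gene_blocks_to_nexus {
    my ($taxa, $genes) = @_;
    my $nchar = 0;
    my @rows;
    for my $gene (@$genes) {
        my ($name, $seqs) = @$gene;
        # all sequences of one aligned gene share a length; take the first present
        my ($first) = grep { defined } map { $seqs->{$_} } @$taxa;
        die "gene $name has no sequences for the listed taxa\n" unless defined $first;
        my $len = length $first;
        push @rows, "[$name]";
        for my $taxon (@$taxa) {
            # taxa absent from this gene are padded with the missing symbol '?'
            my $seq = defined $seqs->{$taxon} ? $seqs->{$taxon} : '?' x $len;
            die "gene $name: $taxon has length ", length($seq), ", expected $len\n"
                unless length($seq) == $len;
            push @rows, "$taxon $seq";
        }
        $nchar += $len;
    }
    my $ntax = scalar @$taxa;
    return join "\n",
        '#NEXUS',
        'begin taxa;', "dimensions ntax=$ntax;", 'taxlabels', @$taxa, ';', 'end;',
        'begin characters;', "dimensions nchar=$nchar;",
        # gene-sized blocks still count as interleaved in NEXUS (DNA assumed)
        'format datatype=dna missing=? gap=- interleave;',
        'matrix', @rows, ';', 'end;', '';
}

# Example with the toy data from the template
print gene_blocks_to_nexus(
    [qw(species1 species2 species3)],
    [ [ gene1 => { species1 => 'AAA', species2 => 'BBB', species3 => 'AAA' } ],
      [ gene2 => { species1 => 'CCC', species2 => 'DDD', species3 => 'CCC' } ] ],
);
```

The PAUP block can be appended after the returned text unchanged.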
I have a BioPerl script that writes interleaved NEXUS, but I don't want
that, because I want to see clearly where one gene ends and the next
begins.
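If a strictly non-interleaved matrix is preferred (one row per taxon holding the whole concatenated sequence), gene boundaries can instead be recorded as CHARSET commands in a NEXUS SETS block placed after the characters block. A small sketch, assuming the genes are listed in the same order as they appear in the matrix; `gene_charsets` is a hypothetical helper, not a BioPerl function:

```perl
#!/usr/bin/perl
# Sketch: turn [gene_name, length] pairs (in matrix order) into a SETS block
# whose CHARSETs give each gene's column range. Hypothetical helper.
use strict;
use warnings;

sub gene_charsets {
    my @genes = @_;
    my $start = 1;                      # NEXUS character positions are 1-based
    my @lines = ('begin sets;');
    for my $g (@genes) {
        my ($name, $len) = @$g;
        my $end = $start + $len - 1;    # inclusive end column of this gene
        push @lines, "charset $name = $start-$end;";
        $start = $end + 1;              # next gene starts right after this one
    }
    push @lines, 'end;';
    return join("\n", @lines) . "\n";
}

print gene_charsets([gene1 => 3], [gene2 => 3]);
# prints:
# begin sets;
# charset gene1 = 1-3;
# charset gene2 = 4-6;
# end;
```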
#!/usr/bin/perl -w
# Bioperl for format conversions: aligned FASTA in, NEXUS out
use strict;
use Bio::AlignIO;

print "Which input file?\n";
chomp(my $infile = <STDIN>);
print "Output filename:\n";
chomp(my $output = <STDIN>);

# Bio::AlignIO opens both files itself, so no separate open() calls are needed
# note: we quote -format to keep older perls from complaining.
my $in  = Bio::AlignIO->new(-file => $infile,    '-format' => 'fasta');
my $out = Bio::AlignIO->new(-file => ">$output", '-format' => 'nexus');

# copy each alignment from the input file to the output file
while ( my $aln = $in->next_aln() ) {
    $out->write_aln($aln);
}
I also need the AlignIO object to take multiple input files.
Any ideas?
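A Bio::AlignIO object created with -file reads from that one file, so the usual approach is one reader per input file, looped over the file names. The sketch below does the same reading and concatenation in core Perl so that it runs without BioPerl installed. It assumes one aligned gene per FASTA file, takes the first word of each header line as the taxon label, names each gene after its file (extension dropped), and pads taxa missing from a gene with `?`, the missing symbol in the template. The sub names and the script name in the usage line are hypothetical:

```perl
#!/usr/bin/perl
# Sketch: read several aligned FASTA files (one gene each) and concatenate
# them per taxon. Core Perl only; sub names are hypothetical.
use strict;
use warnings;
use File::Basename qw(basename);

# Returns (\@ids_in_file_order, \%id_to_sequence) for one aligned FASTA file.
sub read_aligned_fasta {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my (@ids, %seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                   # tolerate DOS line endings
        next unless $line =~ /\S/;          # skip blank lines
        if ($line =~ /^>\s*(\S+)/) {        # taxon label = first word of header
            $id = $1;
            die "$file: duplicate taxon $id\n" if exists $seq{$id};
            push @ids, $id;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;              # drop whitespace inside sequence lines
            $seq{$id} .= $line;
        }
        else { die "$file: sequence data before the first '>' header\n" }
    }
    close $fh;
    die "$file: no sequences found\n" unless @ids;
    my %lengths;
    $lengths{ length $seq{$_} } = 1 for @ids;
    die "$file: sequences differ in length, so it is not aligned\n"
        if scalar(keys %lengths) > 1;
    return (\@ids, \%seq);
}

# Returns (\@all_taxa, [ [gene_name, \%seqs], ... ]) for a list of files.
sub read_genes {
    my @files = @_;
    my (@taxa, %seen, @genes);
    for my $file (@files) {
        my ($ids, $seqs) = read_aligned_fasta($file);
        push @taxa, grep { !$seen{$_}++ } @$ids;  # union of taxa, first-seen order
        (my $name = basename($file)) =~ s/\.[^.]+$//;
        push @genes, [ $name, $seqs ];
    }
    return (\@taxa, \@genes);
}

# Concatenate the genes for every taxon, padding absent taxa with '?'.
sub concatenate {
    my ($taxa, $genes) = @_;
    my %cat = map { $_ => '' } @$taxa;
    for my $g (@$genes) {
        my ($name, $seqs) = @$g;
        my $len = length((values %$seqs)[0]);     # all rows share this length
        for my $taxon (@$taxa) {
            $cat{$taxon} .= exists $seqs->{$taxon} ? $seqs->{$taxon} : '?' x $len;
        }
    }
    return \%cat;
}

# Usage: perl concat_fasta.pl gene1.fasta gene2.fasta
if (@ARGV) {
    my ($taxa, $genes) = read_genes(@ARGV);
    my $cat = concatenate($taxa, $genes);
    printf "%s %s\n", $_, $cat->{$_} for @$taxa;
}
```

Keeping the per-gene hashes from read_genes, rather than only the concatenated strings, leaves the gene boundaries available for labelling each block or writing charsets.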
Thanks,
basm101
University of York