[BiO BB] Bioperl FASTA to NEXUS with concatenation of genes

basm101 basm101 at york.ac.uk
Tue May 20 05:24:55 EDT 2003


Hi there,

I'm new to this list, so I'm not sure whether people use it to ask for
help; if not, I apologise for posting in the wrong place.

I want to concatenate gene datasets and wondered if anyone knows how to
use bioperl to do this.
My input files are aligned FASTA files, one per gene, and I would like
the output to be NEXUS in non-interleaved format.

i.e.

#NEXUS
begin taxa;
dimensions ntax=number of taxa;
taxlabels
labels here
;
end;

begin characters;
dimensions nchar=number of chars;
format symbols = "" missing=?;

matrix
[gene1]
species1 AAA
species2 BBB
species3 AAA

[gene2]
species1 CCC
species2 DDD
species3 CCC
;
end;

paup block here.

I have a bioperl script that prints out in interleaved format, but I
don't want this as I want to clearly see where one gene ends and the
next begins.

#!/usr/bin/perl
use strict;
use warnings;

use Bio::AlignIO;

# Bioperl for format conversions
print "Which input file ?\n";
chomp(my $infile = <STDIN>);

print "output filename:\n";
chomp(my $output = <STDIN>);

# AlignIO opens both files itself, so no separate open() calls are needed.
my $in  = Bio::AlignIO->new(-file => $infile,     '-format' => 'fasta');
my $out = Bio::AlignIO->new(-file => ">$output", '-format' => 'nexus');
# note: we quote -format to keep older perls from complaining.

# Convert every alignment found in the input file.
while ( my $aln = $in->next_aln() ) {
    $out->write_aln($aln);
}

Also I need to make the AlignIO object take multiple input files.
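
To make the goal concrete, here is a rough sketch of one possible
approach, under several assumptions. It reads each file with
Bio::AlignIO but builds the NEXUS text by hand in plain Perl instead of
using the nexus writer. The names read_gene and concat_nexus are made
up for illustration. datatype=dna, gap=- and filling a taxon that is
absent from a gene with ? are assumptions. As far as I understand
NEXUS, repeating the taxon names under each gene means the format line
needs the interleave keyword, so the sketch adds it; that is an
assumption too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one aligned FASTA file; returns a hashref { taxon => sequence }.
# (read_gene is a hypothetical helper name.)
sub read_gene {
    my ($file) = @_;
    require Bio::AlignIO;    # loaded only when a file is actually read
    my $in  = Bio::AlignIO->new(-file => $file, '-format' => 'fasta');
    my $aln = $in->next_aln()
        or die "No alignment found in $file\n";
    return { map { $_->display_id => $_->seq } $aln->each_seq };
}

# Build the NEXUS text from a list of [ gene name, { taxon => sequence } ].
# (concat_nexus is a hypothetical helper name.)
sub concat_nexus {
    my (@genes) = @_;
    my (%all, @lengths);

    # Collect every taxon and check that each gene is a proper alignment.
    for my $gene (@genes) {
        my ($name, $seqs) = @$gene;
        my $len;
        for my $taxon (keys %$seqs) {
            $all{$taxon} = 1;
            $len //= length($seqs->{$taxon});
            die "Unequal sequence lengths in $name\n"
                if length($seqs->{$taxon}) != $len;
        }
        push @lengths, $len // 0;
    }
    my @taxa  = sort keys %all;
    my $nchar = 0;
    $nchar += $_ for @lengths;

    my @out = (
        '#NEXUS',
        'begin taxa;',
        'dimensions ntax=' . scalar(@taxa) . ';',
        'taxlabels',
        @taxa,
        ';',
        'end;',
        '',
        'begin characters;',
        "dimensions nchar=$nchar;",
        # Assumptions: DNA data, '-' for gaps, interleave for repeated names.
        'format datatype=dna interleave missing=? gap=-;',
        '',
        'matrix',
    );

    # One labelled block per gene, so the gene boundaries stay visible.
    for my $i (0 .. $#genes) {
        my ($name, $seqs) = @{ $genes[$i] };
        push @out, "[$name]";
        for my $taxon (@taxa) {
            # Assumption: a taxon absent from this gene is filled with '?'.
            push @out, "$taxon " . ($seqs->{$taxon} // '?' x $lengths[$i]);
        }
        push @out, '';
    }
    push @out, ';', 'end;';
    return join("\n", @out) . "\n";
}

# Each command-line argument is one gene's aligned FASTA file.
if (@ARGV) {
    print concat_nexus(map { [ $_, read_gene($_) ] } @ARGV);
}
```

It would be run as something like
perl concat.pl gene1.fasta gene2.fasta > all.nex
(concat.pl is a made-up script name), with the file names ending up as
the [gene] labels. The paup block would still need to be added
afterwards.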

Any ideas?

Thanks,
basm101
University of York




