[BiO BB] time efficient global alignment algorithm

Tue Aug 4 04:41:23 EDT 2009

2009/8/3 Ryan Golhar <golharam at umdnj.edu>:
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length.  I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a large
> amount of memory...about 100 GB of memory.  This obviously won't work.
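
As a rough sanity check on that figure, here is a minimal Python sketch,
assuming 4-byte integer cells and a simple linear gap penalty (the function
names and scoring values are my own, purely for illustration). It shows why
the full matrix is so large, and that the score alone can be computed while
keeping only two rows in memory. It does not recover the alignment itself,
and it is still quadratic in time.

```python
def full_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory needed to hold a full (len_a + 1) x (len_b + 1) DP matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score using only two rows.

    Scoring values are illustrative assumptions, not taken from any tool.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of `a` against empty `b`
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    n = 163_000
    print(f"full matrix: {full_matrix_bytes(n, n) / 1e9:.1f} GB")
    print("score:", nw_score("GATTACA", "GATTACA"))
```

With two 163,000 bp sequences the full matrix comes out at a little over
100 GB, which matches the number quoted above.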

For two sequences with more than about 85% similarity, MUMmer [1] works
very well.

For example, aligning two strains of E. coli on my desktop, both in
the region of 4.6 Mb:

* U00096 (Escherichia coli str. K-12 substr. MG1655)
* CP000948 (Escherichia coli str. K-12 substr. DH10B)

time nucmer U00096.fasta CP000948.fasta

real    0m14.035s
user    0m11.370s
sys     0m0.400s

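Rather than filling a full dynamic-programming matrix, it finds maximal
exact matches with a suffix tree, clusters them, and extends alignments
only around those clusters, which makes it very quick and memory-efficient.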
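
To give a feel for the anchoring idea, here is a toy Python sketch. It is
not MUMmer's actual algorithm: unique shared k-mers stand in for the
maximal matches that MUMmer finds with a suffix tree, and the chaining step
is a plain O(n^2) longest-increasing-subsequence. All function names and
the choice of k are my own assumptions.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in `seq` to its position."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def shared_unique_anchors(ref, qry, k):
    """(ref_pos, qry_pos) pairs for k-mers unique in both sequences."""
    ref_u = unique_kmer_positions(ref, k)
    qry_u = unique_kmer_positions(qry, k)
    shared = ref_u.keys() & qry_u.keys()
    return sorted((ref_u[kmer], qry_u[kmer]) for kmer in shared)


def longest_colinear_chain(anchors):
    """Largest subset of anchors that increases in both coordinates."""
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best = [1] * n   # best[i]: length of the best chain ending at anchor i
    back = [-1] * n  # back[i]: previous anchor in that chain, or -1
    for i in range(n):
        for j in range(i):
            colinear = (anchors[j][0] < anchors[i][0]
                        and anchors[j][1] < anchors[i][1])
            if colinear and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                back[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = back[end]
    return chain[::-1]


if __name__ == "__main__":
    anchors = shared_unique_anchors("ACGTGA", "ACGTGA", 3)
    print(longest_colinear_chain(anchors))
```

Only the stretches between consecutive anchors in the chain would then need
dynamic programming, so the expensive step runs on small pieces instead of
on the whole 163,000 x 163,000 matrix.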

HTH,
Dan.

[1] http://mummer.sourceforge.net/

> I tried using stretcher from the EMBOSS package, but it takes way too long
> to align each pair of sequences.  I'm looking for something that can perform
> alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it to
> run to the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone have or
> recommend a program to perform a large number of global pairwise alignments
> for long sequences?
>
> Ideally, something with the speed similar to BLAST.
>
> Ryan