[BiO BB] time efficient global alignment algorithm
Dan Bolser
dan.bolser at gmail.com
Tue Aug 4 04:41:23 EDT 2009
2009/8/3 Ryan Golhar <golharam at umdnj.edu>:
> I'm trying to perform a large number of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a large
> amount of memory...about 100 GB of memory. This obviously won't work.
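The 100 GB figure is about what you'd expect from a full dynamic-programming matrix. A quick back-of-the-envelope sketch, assuming 4-byte integer cells (the cell size and both function names are my assumptions, not anything from a specific Needleman-Wunsch implementation):

```python
def nw_matrix_bytes(len_a: int, len_b: int, bytes_per_cell: int = 4) -> int:
    """Bytes for the full (len_a + 1) x (len_b + 1) score matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


def nw_two_row_bytes(len_a: int, len_b: int, bytes_per_cell: int = 4) -> int:
    """Bytes if only the previous and current rows are kept in memory."""
    return 2 * (min(len_a, len_b) + 1) * bytes_per_cell


if __name__ == "__main__":
    n = 163_000  # sequence length quoted in the original question
    print(f"full matrix: {nw_matrix_bytes(n, n) / 1e9:.1f} GB")
    print(f"two rows:    {nw_two_row_bytes(n, n) / 1e6:.1f} MB")
```

Keeping only two rows is enough to get the optimal score but not the alignment itself; recovering the alignment in linear space needs a divide-and-conquer scheme such as Hirschberg's, which costs extra time.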
For two sequences with more than roughly 85% similarity, MUMmer [1] works
very well.
For example, aligning two strains of E. coli on my desktop, both in
the region of 4.6 Mb:
* U00096 (Escherichia coli str. K-12 substr. MG1655)
* CP000948 (Escherichia coli str. K12 substr. DH10B)
time nucmer U00096.fasta CP000948.fasta
real 0m14.035s
user 0m11.370s
sys 0m0.400s
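It finds exact-match seeds shared by the two sequences and then
clusters and extends them, instead of filling a full dynamic-programming
matrix, which is why it is so quick and memory-efficient.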
HTH,
Dan.
[1] http://mummer.sourceforge.net/
> I tried using stretcher from the EMBOSS package, but it takes way too long
> to align each pair of sequences. I'm looking for something that can perform
> alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it to
> run on the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone have or
> recommend a program to perform a large number of global pairwise alignments
> for long sequences?
>
> Ideally, something with the speed similar to BLAST.
>
> Ryan