[Bioclusters] Parallel Sequence Alignment tool
nuin at genedrift.org
Mon Aug 3 11:50:07 EDT 2009
Just my two cents. Aligning rRNA is not a straightforward process, and
it shouldn't be done fully automatically. Muscle, MAFFT and other fast
algorithms will produce very low-quality alignments if they are run
blindly. Given the number of sequences you have, and their nature, you
would be OK wrapping a script around ClustalW or ClustalW-MPI.
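Part of why a ClustalW-style run gets expensive is the all-against-all comparison stage: ~5000 sequences means 5000 × 4999 / 2 = 12,497,500 pairs. As far as I know, that stage is the part ClustalW-MPI distributes most effectively. The minimal Python sketch below shows the idea of dealing those pairs out to worker processes. It is only an illustration under stated assumptions: a simple k-mer distance (k = 6) stands in for the pairwise alignment scores ClustalW really computes, and all function names are hypothetical.

```python
"""Sketch: splitting an all-against-all distance stage across worker processes.

Assumption: a simple k-mer distance stands in for the pairwise alignment
scores that ClustalW(-MPI) actually computes.
"""
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def kmer_set(seq: str, k: int = 6) -> set:
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 6) -> float:
    """1 - (shared k-mers / size of the smaller k-mer set); 0.0 = same k-mer content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def split_pairs(n: int, workers: int) -> list:
    """Deal the n*(n-1)/2 index pairs round-robin into one chunk per worker."""
    chunks = [[] for _ in range(workers)]
    for idx, pair in enumerate(combinations(range(n), 2)):
        chunks[idx % workers].append(pair)
    return chunks


def _score_chunk(args):
    """Score every (i, j) pair in one chunk; runs inside a worker process."""
    seqs, pairs = args
    return [(i, j, kmer_distance(seqs[i], seqs[j])) for i, j in pairs]


def distance_matrix(seqs: list, workers: int = 1) -> list:
    """Symmetric distance matrix; workers > 1 farms the chunks out to processes."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    jobs = [(seqs, chunk) for chunk in split_pairs(n, workers)]
    if workers == 1:
        results = map(_score_chunk, jobs)
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(_score_chunk, jobs))
    for chunk_result in results:
        for i, j, d in chunk_result:
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

With `workers > 1`, call `distance_matrix` from under an `if __name__ == "__main__":` guard so worker start-up works on platforms that spawn rather than fork. Each job pickles the full sequence list, which keeps the sketch short but is wasteful for thousands of sequences.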
A good protocol for aligning rRNA is as follows:
- align two sequences
- add a third sequence, using the first two as a profile
- add a fourth sequence, using the first three as a profile
- add a fifth sequence ...
- at some point you will have a profile good enough that you can use
  the aligned sequences as a model for the ones added to the
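A toy Python sketch of the add-one-sequence-at-a-time protocol above. The scoring here is an assumption, not something from the post: +1 for a match and -1 for a mismatch, averaged over the non-gap residues of a profile column, and a linear gap penalty of -2. The names `add_to_profile` and `incremental_alignment` are hypothetical. It only demonstrates the loop of aligning each new sequence against the growing profile; it knows nothing about rRNA structure.

```python
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # assumed toy scores


def column_score(column: str, base: str) -> float:
    """Average match/mismatch score of `base` against the non-gap residues of a column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return sum(MATCH if c == base else MISMATCH for c in residues) / len(residues)


def add_to_profile(profile: list, seq: str) -> list:
    """Globally align one ungapped sequence to an existing alignment (the profile)
    and return the enlarged alignment."""
    cols = ["".join(row[i] for row in profile) for i in range(len(profile[0]))]
    n, m = len(cols), len(seq)
    # score[i][j]: best score aligning the first i columns with the first j bases
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1]),
                score[i - 1][j] + GAP,  # gap in the new sequence
                score[i][j - 1] + GAP,  # new all-gap column inserted into the profile
            )
    # Traceback from the bottom-right corner, collecting columns right to left
    out_cols, new_row = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1]):
            out_cols.append(cols[i - 1])
            new_row.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_cols.append(cols[i - 1])
            new_row.append("-")
            i -= 1
        else:
            out_cols.append("-" * len(profile))
            new_row.append(seq[j - 1])
            j -= 1
    out_cols.reverse()
    new_row.reverse()
    rows = ["".join(col[r] for col in out_cols) for r in range(len(profile))]
    return rows + ["".join(new_row)]


def incremental_alignment(seqs: list) -> list:
    """Start from the first sequence and add the rest one at a time."""
    profile = [seqs[0]]
    for seq in seqs[1:]:
        profile = add_to_profile(profile, seq)
    return profile
```

For example, `incremental_alignment(["ACGTACGT", "ACGTACGT", "ACGACGT"])` keeps the first two rows ungapped and places the third as `ACG-ACGT`. A real run would use ClustalW's profile mode and a proper substitution and gap model instead of these toy scores.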
The reason is that rRNA has a secondary (and tertiary) structure made
of stems and loops. A stem forms when a short segment pairs with a
roughly complementary segment elsewhere in the linear sequence as the
molecule folds. These pairings do not always follow the canonical A-U
(A-T in DNA) and C-G rules; G-U wobble pairs, for example, are common.
Because of the stems there is a pattern in the primary structure that
has to be respected to produce a good (but not excellent) alignment.
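To make the stem pattern concrete, here is a small Python check of whether two stretches of a sequence could close a stem, allowing G-U wobble pairs. It is an illustration only, not a structure predictor: the stem length and the minimum loop size of 3 bases are assumed parameters, and the function names are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U "wobble" pair common in rRNA stems.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """True if bases x and y can pair (RNA alphabet)."""
    pairs = (CANONICAL | WOBBLE) if allow_wobble else CANONICAL
    return (x, y) in pairs


def forms_stem(left: str, right: str, allow_wobble: bool = True) -> bool:
    """True if `left` (5' arm) can pair along its full length with `right` (3' arm).

    The arms pair antiparallel: left[0] with right[-1], left[1] with right[-2], ...
    DNA input is handled by reading T as U.
    """
    left = left.upper().replace("T", "U")
    right = right.upper().replace("T", "U")
    if len(left) != len(right):
        return False
    return all(can_pair(a, b, allow_wobble) for a, b in zip(left, reversed(right)))


def find_stems(seq: str, stem_len: int, min_loop: int = 3, allow_wobble: bool = True) -> list:
    """All (i, j) where seq[i:i+stem_len] can pair with seq[j:j+stem_len],
    leaving at least `min_loop` unpaired bases between the two arms."""
    hits = []
    for i in range(len(seq) - stem_len + 1):
        for j in range(i + stem_len + min_loop, len(seq) - stem_len + 1):
            if forms_stem(seq[i:i + stem_len], seq[j:j + stem_len], allow_wobble):
                hits.append((i, j))
    return hits
```

For instance, `forms_stem("GGAU", "AUUC")` is `True` only because G-U wobble is allowed; with `allow_wobble=False` it is `False`. That is the kind of non-canonical pairing a purely sequence-based aligner does not know about.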
I suspect dedicated rRNA alignment software would be too slow for your
requirements, but by using ClustalW-MPI with some sequences as a
profile you would probably get a fairly good alignment in maybe a couple of
Hope that helps.
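A minimal sketch of the kind of wrapper script suggested here, assuming ClustalW's sequence-to-profile mode: `-PROFILE1` (existing alignment), `-PROFILE2` (sequences to add), `-SEQUENCES` (add profile2 sequences one at a time) and `-OUTFILE`, as listed in ClustalW's command-line help. Check these against your installed version. The executable name (`clustalw2`; older installs use `clustalw`), the batch file names, the `mpirun -np 8 clustalw-mpi` launcher, and all function names are assumptions. The post does not say whether ClustalW-MPI parallelises this particular mode.

```python
import subprocess
from pathlib import Path


def profile_add_command(profile_aln: str, new_seqs: str, out_aln: str,
                        executable=("clustalw2",)) -> list:
    """Build a ClustalW sequence-to-profile command: the sequences in `new_seqs`
    are added one at a time to the existing alignment `profile_aln`."""
    return [*executable,
            f"-PROFILE1={profile_aln}",
            f"-PROFILE2={new_seqs}",
            "-SEQUENCES",
            f"-OUTFILE={out_aln}"]


def run_clustal(cmd: list) -> None:
    """Run the command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def add_in_batches(seed_aln: str, batch_files: list, workdir: str = ".",
                   executable=("clustalw2",), runner=run_clustal) -> str:
    """Grow the alignment batch by batch; each round's output becomes the
    next round's profile. Returns the path of the final alignment."""
    current = seed_aln
    for k, batch in enumerate(batch_files, start=1):
        out = str(Path(workdir) / f"round{k}.aln")
        runner(profile_add_command(current, batch, out, executable))
        current = out
    return current
```

Usage might look like `add_in_batches("seed.aln", ["batch1.fasta", "batch2.fasta"])`, where `seed.aln` is a small, hand-checked alignment. Passing `runner` as a parameter lets you print or log the commands for review before anything runs.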
On 30-Jul-09, at 12:19 PM, Nick Holway wrote:
> Steve actually posted this on my behalf, so to cut out the middleman
> I'll answer.
> I'm trying to assist a scientist with a bioinformatics project. He's
> trying to align 16S rDNA sequences to identify the bacterial species.
> I launched a Muscle job on his behalf which took ~5.5 days to run (on
> 3 GHz "Harpertown" Xeons). The file the scientist gave me had ~5000
> sequences, mostly 1000-1500 bases long.
> I'm trying to persuade the scientist to see if he can reduce the
> number of sequences he needs to align, and also whether his data
> really needs Muscle to run to completion rather than just the first two
> My reason for wanting to know if there are any good parallel sequence
> alignment tools is that we've seen some excellent speed increases with
> our MD code. Knowing this scientist I imagine he'll need the entire
> data set to be aligned :)
> If you need me to find out any more information from the scientist
> please let me know.
> 2009/7/22 Juan Carlos Perin <bic at genome.chop.edu>:
>> Are you looking to align short reads from NGS, or other data?
>> ~ juan
>> On Jul 17, 2009, at 10:41, <slitster at rcn.com> wrote:
>>> Does anyone have recommendations for a parallel sequence alignment
>>> User investigation so far has turned up ClustalW-MPI, but it seems
>>> to be using an older version of ClustalW.
>>> Any input much appreciated.
>>> Bioclusters maillist - Bioclusters at bioinformatics.org