[Bioclusters] Parallel Sequence Alignment tool

Liu,Li lli at ufl.edu
Thu Aug 13 21:12:59 EDT 2009


You may want to try this (http://www.biotech.ufl.edu/people/sun/esprit.html)

Sent from Li's iPhone

On Aug 13, 2009, at 8:30 PM, "Paulo Nuin" <nuin at genedrift.org> wrote:

> Hi
>
> Just my two cents. Aligning rRNA is not a straightforward process, and
> it shouldn't be attempted fully automatically. Muscle, MAFFT and other
> fast algorithms will generate very low quality alignments if run
> blindly. Given the number of sequences you have, and their nature, you
> would be OK wrapping a script around ClustalW or ClustalW-MPI.
>
> A good protocol to align rRNA is as follows:
>
> - align two sequences
> - add a third sequence to it by using the first two as a profile
> - add a fourth sequence using the first three as a profile
> - add a fifth sequence ...
> - at some point you will have a good enough profile that you can use
> the aligned sequences as a model for the remaining ones added to the
> alignment
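
A rough sketch of how the add-one-sequence-at-a-time protocol above could be scripted around ClustalW. It only builds the command lines and runs nothing. The function name, the `aln/step_NNN.aln` file layout and the `clustalw2` binary name are my own assumptions for illustration. The flags (-INFILE, -ALIGN, -PROFILE1, -PROFILE2, -SEQUENCES, -OUTFILE) are the ones I recall from the ClustalW command-line help, so please check them against `clustalw -help` on your install.

```python
from typing import List


def plan_profile_alignment(seq_files: List[str], workdir: str = "aln",
                           clustalw: str = "clustalw2") -> List[List[str]]:
    """Build ClustalW commands for incremental profile alignment.

    seq_files[0] is assumed to hold the first two sequences (the seed);
    each later file holds one sequence to add. Nothing is executed here.
    """
    if not seq_files:
        raise ValueError("need at least the seed file")

    profile = f"{workdir}/step_000.aln"
    # Step 0: ordinary alignment of the seed sequences.
    commands = [[clustalw, f"-INFILE={seq_files[0]}", "-ALIGN",
                 f"-OUTFILE={profile}"]]

    for i, new_seq in enumerate(seq_files[1:], start=1):
        next_profile = f"{workdir}/step_{i:03d}.aln"
        # Align the new sequence against the alignment built so far,
        # which ClustalW treats as a fixed profile.
        commands.append([clustalw, f"-PROFILE1={profile}",
                         f"-PROFILE2={new_seq}", "-SEQUENCES",
                         f"-OUTFILE={next_profile}"])
        profile = next_profile  # the output becomes the next profile
    return commands
```

Each command could then be passed to `subprocess.run(cmd, check=True)` in order. Once the profile looks trustworthy, the remaining sequences can go into a single PROFILE2 file instead of one file each.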
>
> The reason is that rRNA has a secondary (and tertiary) structure that
> contains stems and loops. Stems are short segments that are somewhat
> "duplicated" along the flat sequence and attach to each other when
> forming the secondary structure. This pairing sometimes doesn't
> follow the usual A-T(U), C-G rules. Because of the stems, there is a
> pattern in the primary structure that has to be followed to generate a
> good (but not excellent) alignment.
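
To make the stem point concrete, here is a minimal sketch that pairs the two halves of a candidate stem and labels each pair. It assumes the most common non-standard pair in rRNA, the G-U wobble, as the example of pairing that breaks the usual rules. The function name and the toy sequences are made up for illustration.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # common non-Watson-Crick pair in rRNA


def stem_pairs(left: str, right: str):
    """Pair `left` (read 5'->3') with `right` (read 3'->5').

    Returns a list of (base_left, base_right, kind) tuples, where kind is
    'canonical', 'wobble' or 'mismatch'. DNA input (T) is treated as RNA (U).
    """
    left = left.upper().replace("T", "U")
    right = right.upper().replace("T", "U")
    if len(left) != len(right):
        raise ValueError("stem halves must have equal length")

    pairs = []
    # The two strands of a stem run antiparallel, so reverse one of them.
    for a, b in zip(left, reversed(right)):
        if (a, b) in CANONICAL:
            kind = "canonical"
        elif (a, b) in WOBBLE:
            kind = "wobble"
        else:
            kind = "mismatch"
        pairs.append((a, b, kind))
    return pairs


for a, b, kind in stem_pairs("GGCU", "AGUC"):
    print(f"{a}-{b}  {kind}")
```

A plain sequence aligner scores the G-U position as a mismatch, although the stem is intact. That is one reason structure-unaware alignments of rRNA go wrong.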
>
> I guess dedicated rRNA alignment software would be too slow for your
> requirements, but by using ClustalW-MPI with some sequences as a
> profile you would get a fairly good alignment in maybe a couple of
> days.
>
>
> Hope that helps
> Paulo
>
>
>
> On 30-Jul-09, at 12:19 PM, Nick Holway wrote:
>
>> Hello,
>>
>> Steve actually posted this on behalf of me, so to cut out the middle
>> man I'll answer.
>>
>> I'm trying to assist a scientist with a bioinformatics project. He's
>> trying to align 16S rDNA sequences to identify the bacterial species.
>> I launched a Muscle job on his behalf which took ~5.5 days to run (on
>> 3GHz "Harpertown" Xeons). The file the scientist gave me had ~5000
>> sequences in it, mostly 1000-1500 bases long.
>>
>> I'm trying to persuade the scientist to see if he can reduce the
>> number of sequences that he needs to align, and also to see whether
>> his data needs Muscle to run to completion rather than just the
>> first two iterations.
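
For reference, stopping Muscle after the first two iterations is a matter of one option. The sketch below assumes the MUSCLE 3.x command-line syntax (`-in`, `-out`, `-maxiters`); newer major versions use different option names, so treat the flags as an assumption to verify. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def muscle_command(infile: str, outfile: str, max_iters: int = 2,
                   muscle: str = "muscle") -> List[str]:
    """Build a MUSCLE 3.x command limited to `max_iters` iterations."""
    return [muscle, "-in", infile, "-out", outfile,
            "-maxiters", str(max_iters)]


def run_if_available(cmd: List[str]) -> bool:
    """Run the command only if its executable is on PATH."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


print(" ".join(muscle_command("16s.fasta", "16s.afa")))
```

Comparing the two-iteration output with the full run on a subset of the data would show whether the refinement stage changes anything that matters for species identification.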
>>
>> My reason for wanting to know if there are any good parallel sequence
>> alignment tools is that we've seen some excellent speed increases with
>> our MD code. Knowing this scientist I imagine he'll need the entire
>> data set to be aligned :)
>>
>> If you need me to find out any more information from the scientist
>> please let me know.
>>
>> Thanks
>>
>> Nick
>>
>> 2009/7/22 Juan Carlos Perin <bic at genome.chop.edu>:
>>> Are you looking to align short reads from NGS, or other data?
>>>
>>> ~ juan
>>>
>>> On Jul 17, 2009, at 10:41, <slitster at rcn.com> wrote:
>>>
>>>> Does anyone have recommendations for a parallel sequence alignment
>>>> tool?
>>>>
>>>> User investigation so far has turned up ClustalW-MPI, but it seems to
>>>> be using an older version of ClustalW.
>>>>
>>>> Any input much appreciated.
>>>>
>>>> Cheers
>>>>
>>>> Steve
>>>>
>>>> _______________________________________________
>>>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>>> http://www.bioinformatics.org/mailman/listinfo/bioclusters
>>>>
