[BiO BB] Clustering small DNA sequences into groups
dmb at mrc-dunn.cam.ac.uk
Tue Aug 9 14:53:45 EDT 2005
On Tue, 9 Aug 2005, Samantha Fox wrote:
>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
>group them into clusters based on sequence.
>Any suggestions for doing that ?
I have never tried using CD-HIT to cluster DNA, but it should work. You
will have to lower the 'throwaway' length to something like 4 to stop all
your sequences being filtered out as too short.
I found that blastclust (which can be explicitly set to cluster
DNA) automatically ignores any protein sequence of fewer than 30
residues. While it could cluster those together (when they are 100%
identical, for example), it always seems to put any protein fragment
shorter than 30 residues into a new cluster.
Not sure if the behaviour is the same in DNA mode.
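
With only about 40 sequences of 6-10 bp, an all-against-all comparison is
cheap, so a few lines of script may be enough. The following is a minimal
sketch of one possible approach, not what CD-HIT or blastclust do
internally: it links any two sequences whose edit distance is at most a
cutoff and reports the connected groups (single linkage). The function
names, the example sequences and the max_dist=2 cutoff are all
assumptions made for illustration; pick a cutoff that suits your data.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cluster_sequences(seqs, max_dist=2):
    """Single-linkage clustering of short sequences.

    Two sequences end up in the same cluster if a chain of neighbours,
    each within max_dist edits of the next, connects them.
    max_dist is an illustrative cutoff, not a recommended value.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-against-all comparison; fine for a few dozen short sequences.
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if edit_distance(seqs[i], seqs[j]) <= max_dist:
                parent[find(j)] = find(i)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


if __name__ == "__main__":
    # Made-up example sequences, not real data.
    example = ["ACGTAC", "ACGTAA", "ACGTTAC",
               "TTGGCCAA", "TTGGCCAT", "GGGGGG"]
    for group in cluster_sequences(example, max_dist=2):
        print(group)
```

Single linkage can chain dissimilar sequences together through
intermediates, so with sequences this short it is worth trying a cutoff
of 1 as well and comparing the groups.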