[BiO BB] Clustering small DNA sequences into groups

Iddo Friedberg idoerg at burnham.org
Tue Aug 9 18:07:34 EDT 2005


CD-HIT does not work on DNA, or on short sequences for that matter.
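
For a set this small (about 40 sequences of 6-10 bp, as in the question quoted below), one option that avoids length cutoffs altogether is to compute all pairwise edit distances and group sequences that fall within a threshold. What follows is only a minimal sketch in Python, assuming that Levenshtein distance and single-linkage grouping suit the data. The function names, the example sequences and the max_dist value are illustrative assumptions, not part of CD-HIT or blastclust.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def cluster(seqs, max_dist=1):
    """Single-linkage grouping: any two sequences within max_dist edits
    end up in the same cluster (tracked with a small union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if edit_distance(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


# Made-up example sequences, for illustration only.
seqs = ["ACGTAC", "ACGTAA", "ACGTTAC", "TTTGGGCC", "TTTGGGCA", "GGGGGG"]
clusters = cluster(seqs, max_dist=1)
for members in clusters:
    print(members)
```

With only 40 sequences the all-against-all comparison is 780 pairs, so speed is not a concern. Note that single linkage can chain loosely related sequences together, so the threshold is worth checking by eye on the real data.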


Dan Bolser wrote:

>On Tue, 9 Aug 2005, Samantha Fox wrote:
>
>>Hi,
>>
>>I have a set of small DNA sequences (about 40) 6-10 bp, and wish to
>>group them into clusters based on sequence.
>>
>>Any suggestions for doing that ?
>
>I never tried using CD-HIT to cluster DNA, but it should work (you will
>have to alter the 'throwaway' length to something like 4 to stop all your
>sequences being filtered as too short).
>
>I found blastclust (which can be explicitly set to cluster
>DNA) automatically ignores any protein sequence of less than 30
>residues. While it could cluster those together (100% identical for
>example) it always seems to put any protein fragment less than 30 residues
>into a new cluster.
>
>Not sure if the behaviour is the same in DNA mode.
>
>
>>Thanks,
>>
>>Samantha
>>_______________________________________________
>>Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
>>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


-- 

Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9930
http://ffas.ljcrf.edu/~iddo



