CD-HIT: Sequence clustering software - Summary
CD-HI/CD-HIT clusters protein sequence databases at a high sequence identity threshold. The program efficiently removes highly redundant sequences.
Program written by: Weizhong Li
License: GNU General Public License
Latest announcements
Weizhong Li has moved his popular cd-hit software to and created a new open source project!
cd-hit is used in a wide variety of applications, helping many people quickly and efficiently create non-redundant sequence databases at high sequence identity. Now the software is open for community development - which means you too can help improve this already excellent package!
We are looking for developers and researchers of all experience levels to...
- Make cd-hit compatible with existing sequence IO libraries, to expand the range of allowed input formats.
- Develop a range of useful output formats, including XML.
- Package cd-hit with the GNU configure utilities to expand the range of platforms on which cd-hit can be reliably used.
- Research the all-important sequence clustering benchmark sub-project of cd-hit, working to develop rigorous measures of sensitivity, selectivity and optimisation for a range of clustering tools and parameters.
- Begin fixing the few existing bugs in the cd-hit bug list.
If you use cd-hit, we would like to know!