Weizhong Li has moved his popular cd-hit software to bioinformatics.org and created a new open source project!
cd-hit is used in a wide variety of applications, helping many people quickly and efficiently create non-redundant sequence databases at high sequence identity. Now the software is open for community development, which means you too can help improve this already excellent package!
We are looking for developers and researchers of all experience levels to:
Make cd-hit compatible with existing sequence IO libraries, to expand the range of allowed input formats.
Develop a range of useful output formats, including XML.
Package cd-hit with the GNU configure utilities to expand the range of platforms on which cd-hit can be reliably used.
Research the all-important sequence clustering benchmark sub-project of cd-hit, working to develop rigorous measures of sensitivity, selectivity and optimisation for a range of clustering tools and parameters.
Begin to knock out the few existing bugs in the cd-hit bug list.
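
For anyone curious about the benchmark sub-project, here is a minimal sketch of one possible way to score a clustering against a trusted reference grouping. It is not part of cd-hit and is not the project's agreed method. It assumes pairwise definitions: sensitivity is the fraction of reference pairs that the tool also places together, and selectivity is the fraction of the tool's pairs that the reference confirms. The function names, sequence IDs and family labels are all hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of unordered sequence-ID pairs that share a label."""
    groups = {}
    for seq_id, label in labels.items():
        groups.setdefault(label, []).append(seq_id)
    pairs = set()
    for members in groups.values():
        pairs.update(frozenset(p) for p in combinations(members, 2))
    return pairs


def pairwise_scores(predicted, reference):
    """Return (sensitivity, selectivity) under the pairwise assumption above."""
    pred = co_clustered_pairs(predicted)
    ref = co_clustered_pairs(reference)
    true_pos = len(pred & ref)
    # Guard against empty pair sets (e.g. every cluster is a singleton).
    sensitivity = true_pos / len(ref) if ref else 0.0
    selectivity = true_pos / len(pred) if pred else 0.0
    return sensitivity, selectivity


# Toy data: hypothetical sequence IDs mapped to reference families
# and to the cluster numbers produced by some clustering tool.
reference = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB"}
predicted = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}

sens, sel = pairwise_scores(predicted, reference)
print(f"sensitivity={sens:.2f} selectivity={sel:.2f}")
```

Counting pairs keeps the score independent of how clusters happen to be numbered, but it weights large families heavily; other definitions may suit the benchmark better.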