> I'm helping to write
> gene expression database analysis tools that will help people see what
> their data means in a larger context. Even in tiny genomes like bacteria,
> you have 4.6MB of DNA, ~4300 ORFs, very complex interrelationships on a
> gene regulation level, more on a biochemical pathway level. How do you
> hope to have people conceptualize those relations?

If you're doing gene expression stuff, have you checked out the code
available from biojava (http://www.biojava.org with API docs at
http://www.sanger.ac.uk/Users/td2/biojava-docs/)? I know that they have
some Support Vector Machine (SVM) code, and SVMs seem to be the currently
"popular" method for classifying expression data. The PNAS paper that I
read on this also has an SVM implementation
(http://www.cse.ucsc.edu/research/compbio/genex/genex.html), so the two
might be interesting to compare.

> but we (NCGR) need something that works right
> now and even tho DX was born years ago (like unix), it's been
> well-thought-out and debugged (like unix) and may, with only a little
> meddling at an external interface, support a tremendous amount of the
> functionality that we need. And I personally would rather design nifty
> analytical tools than infrastructure.

Makes sense, especially when you're on a limited time frame. I'll be
interested to hear how it works out.

Brad