One simple approach would be to use Neighbor Joining to build a tree in which each leaf represents one of your CCP-modules and the path length between two leaves reflects how dissimilar that pair of modules is. A number of phylogenetic analysis packages implement Neighbor Joining (let Google be your guide), but they typically take a distance matrix as input rather than a similarity matrix.

Of course, your similarity matrix can be transformed into a 'relative distance matrix'. If the largest value in the matrix is k, then replace each entry x with

    x_new = 1 - (x/k)

Since all your scores are positive, the most similar pair then gets distance 0 and every other pair gets a distance between 0 and 1.

Although there are certainly more rigorous approaches, this ought to be simple and should suffice as a first approximation to a clustering. It really depends on what you plan to do with the clustering once you've created it.

Hope that helps
-Aaron

--------------- Original Message Follows ---------------
Date: Fri, 6 Aug 2004 00:16:47 +0100
From: FJPB Asselbergs <s0340567@sms.ed.ac.uk>
Reply-To: bio_bulletin_board@bioinformatics.org
To: bio_bulletin_board@bioinformatics.org
Subject: [BiO BB] question on clustering

Hi all,

I have a question that concerns my MSc project. I am trying to cluster 30 CCP-modules (Complement Receptor 1) after having used a novel approach that looks at the electrostatic surfaces. I have reached the stage where I have obtained a 30 by 30 similarity matrix filled with positive scores. The higher the score, the more similar two modules are. For example, if matrix entry (3,6) = 13 and entry (3,8) = 24, then module 3 is more similar to module 8 than to module 6.

My problem now is to cluster these 30 modules based on this one similarity matrix. I am not used to clustering small datasets, or, as in this case, working from a similarity matrix without any training data. I have searched a lot on Google for programs that could cluster my modules using the similarity matrix, but so far I have not found anything very helpful.
Does anyone know of a program (preferably free software) that could help me out here, or of another valid method that I could easily implement myself in a script?

I would really appreciate any replies to this message, and thank you all in advance for looking at this and thinking about it.

Thanks and regards,
Floris

On Thu, 5 Aug 2004, J.W. Bizzaro wrote: