[BiO BB] Clustering
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Wed Sep 3 12:46:56 EDT 2003
> > What packages support clustering of points
> > with a similarity matrix?
>
> I don't think I quite understand the question, can you elaborate on that?
Yup... I often find that I have similarity scores between pairs of things,
and I would like to be able to do a simple clustering of the points,
but I am not familiar with the algorithms, so I would just like to play
around a bit.
I know you can build a phylogenetic tree from any similarity matrix, but
I don't need that level of resolution (many similar points closely linked
on one short branch). I would just like to see roughly what 'blobs' of
data I have without investing too much time in the analysis (or the
computation!).
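For anyone who wants to play around, here is a minimal sketch of one common approach: average-linkage agglomerative clustering that stops at a distance cut-off, so it returns flat 'blobs' instead of a full high-resolution tree. It assumes the similarities lie in [0, 1] and can be turned into distances as 1 - s; `cluster_by_similarity` and `threshold` are hypothetical names, not any toolkit's API. The loop is roughly cubic in N, so it is fine for a few hundred points but slow for thousands.

```python
# Minimal average-linkage agglomerative clustering from a similarity matrix.
# Assumption: similarities lie in [0, 1]; distance is taken as 1 - similarity.

def cluster_by_similarity(sim, threshold):
    """Repeatedly merge the two closest clusters (average linkage) until no
    pair of clusters is closer than `threshold`; return lists of point indices."""
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_dist(a, b):
        # Mean of all point-to-point distances between clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d >= threshold:
            break  # remaining clusters are too far apart: these are the 'blobs'
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    sim = [
        [1.0, 0.9, 0.8, 0.1, 0.2],
        [0.9, 1.0, 0.85, 0.15, 0.1],
        [0.8, 0.85, 1.0, 0.2, 0.1],
        [0.1, 0.15, 0.2, 1.0, 0.95],
        [0.2, 0.1, 0.1, 0.95, 1.0],
    ]
    print(cluster_by_similarity(sim, 0.5))  # [[0, 1, 2], [3, 4]]
```

Raising the cut-off towards 1.0 merges everything into one blob, and lowering it towards 0.0 leaves every point on its own, so the threshold is the single knob for how coarse the blobs are.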
For example, I might have the amino-acid (AA) composition of 1000
sequences, and suspect that the composition is biased across these
sequences (not uniform). So I think: maybe I should break them up by
secondary structure, maybe into families, maybe I should perform a
chi-squared test between every possible combination of groups of the 1000
to find subpopulations within which the composition isn't biased...
If I take each protein and compare its composition to every other one, I
get a similarity matrix with roughly N**2/2 distinct pairwise values,
which I would like to cluster, just to see if any protein families,
structural classes or taxonomic groups have a particular bias in AA
composition. But this is a long, complicated analysis (I think to
myself), so I don't bother.
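Building the composition similarity matrix itself is only a few lines. A minimal sketch, assuming composition is measured over the 20 standard residues and using 1 minus the total variation distance (half the L1 distance between two composition vectors) as the similarity; that choice of measure is an assumption, and any distance between frequency vectors would do. `composition`, `composition_similarity` and `pairwise_similarities` are hypothetical names. Only the N*(N-1)/2 pairs above the diagonal are computed, then mirrored.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only


def composition(seq):
    """Fraction of each standard amino acid in seq (other letters ignored).
    Assumes seq contains at least one standard residue."""
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def composition_similarity(p, q):
    """1 - total variation distance: 1.0 for identical compositions,
    0.0 for compositions sharing no residue types."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pairwise_similarities(seqs):
    """Symmetric N x N similarity matrix; only pairs i < j are computed."""
    comps = [composition(s) for s in seqs]
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = composition_similarity(comps[i], comps[j])
    return sim


if __name__ == "__main__":
    print(pairwise_similarities(["AAAA", "AAGG", "GGGG"]))
    # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```

Because the values fall in [0, 1], the resulting matrix can be fed straight into any clustering routine that expects similarities on that scale.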
So now I ask. I am sure there are 1000's of clustering toolkits out there
and I should just google, but does anyone have any recommendations?
> > How can I derive the similarity of two matrices?
> >
>
> If you mean that you would like to check how "close" two similarity
> matrices (e.g. BLOSUM, PAM) are to each other, then one method is to
> compare the amino-acid pair frequency distributions used to construct
> these matrices.
You mean the similarity of two distributions? Sounds interesting...
> Look to the following paper (fig 4, and the last
> paragraph in the "methods" section) for one example on how to do this,
> although other methods of comparing distributions may be used just as
> effectively:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=11790845&dopt=Abstract
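For comparing two such frequency distributions, one generic option (not necessarily the method used in the paper above) is the Jensen-Shannon divergence. A minimal sketch, assuming each matrix's amino-acid pair frequencies have been flattened into lists in the same pair order (e.g. the 210 unordered pairs of the 20 amino acids); `jensen_shannon` is a hypothetical helper, not a library function. With base-2 logs it is 0 for identical distributions and 1 for distributions with no overlap.

```python
import math


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (bits) between two discrete distributions given
    as equal-length lists of non-negative weights (normalised here first)."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


if __name__ == "__main__":
    print(jensen_shannon([1, 2, 3], [1, 2, 3]))  # 0.0
    print(jensen_shannon([1, 0], [0, 1]))        # 1.0
```

Unlike plain KL divergence, this measure is symmetric and stays finite when one distribution has zero frequencies where the other does not.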
Thanks very much,
Dan.
> ./I
More information about the BBB
mailing list