[BiO BB] About clustering genes to gene family
dmb at mrc-dunn.cam.ac.uk
Thu Aug 7 14:57:19 EDT 2003
What you describe can occur for 2 good reasons...
1) You are forming a 'complex cluster', created by *multiple domain* proteins:
   A has domains in common with B,
   B has domains in common with C.
   A and C have no domains in common, and hence no homology.
2) A and C are too distantly related for sequence searches to uncover their
   true homology. However, sequence B is *intermediate* to A and C,
   having homology to both...
NB: Sequence similarity is not a metric, as it does not obey the triangle
inequality. (I think it is a metric at high levels of similarity though?)
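
To make the triangle inequality point concrete, here is a minimal sketch in Python. The distances are made up (think of them as 1 minus fractional identity, with 1.0 meaning "no detectable similarity"), and `triangle_violations` is a hypothetical helper, not part of any package mentioned here.

```python
from itertools import permutations

# Made-up pairwise distances; 1.0 stands for "no detectable similarity".
dist = {
    frozenset("AB"): 0.4,
    frozenset("BC"): 0.4,
    frozenset("AC"): 1.0,
}

def triangle_violations(dist, names):
    """Return ordered triples (x, y, z) where d(x, z) > d(x, y) + d(y, z)."""
    bad = []
    for x, y, z in permutations(names, 3):
        d_xz = dist[frozenset((x, z))]
        d_xy = dist[frozenset((x, y))]
        d_yz = dist[frozenset((y, z))]
        if d_xz > d_xy + d_yz:
            bad.append((x, y, z))
    return bad

print(triangle_violations(dist, "ABC"))
```

With these numbers the direct A to C distance is larger than the detour through B, so the triple with B in the middle is reported in both directions.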
In this case you have used the transitive nature of sequence similarity to detect
distant homology via an intermediate sequence.
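
A rough sketch of the intermediate-sequence idea, assuming you already have a list of pairwise hits that passed whatever cutoff you chose. The `hits` list and the `linking_chain` function are hypothetical illustrations, not how any of the tools below do it.

```python
from collections import deque

# Hypothetical pairwise hits that passed some significance cutoff.
hits = [("A", "B"), ("B", "C")]

def linking_chain(hits, start, goal):
    """Breadth-first search for the shortest chain of hits from start to goal."""
    neighbours = {}
    for x, y in hits:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of intermediates found

print(linking_chain(hits, "A", "C"))
```

A and C never hit each other directly, but the search finds the chain through B. Note that this alone cannot tell reason 1 from reason 2: a multi-domain B gives exactly the same chain.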
Jong Park and Sarah Teichmann worked on both these ideas, and have a
family clustering package called GEANFAMMER. Specifically, DIVCLUS breaks up
complex clusters into domain families. Transitivity is implemented
(kinda) in psiblast /
HMM models, all three of which are used in PFAM, so you might want to look there
for your families.
Or you could insist your alignments cover 90% of the shortest sequence, and then
cluster using single linkage.
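
The coverage-then-single-linkage approach could look something like the sketch below. Everything in it is an assumption for illustration: the sequence lengths, the `(seq1, seq2, aligned_length)` records (aligned length taken as the number of residues of the shorter sequence covered), and the `single_linkage` function itself.

```python
# Hypothetical sequence lengths and alignment records: (seq1, seq2, aligned_length).
lengths = {"A": 300, "B": 320, "C": 600, "D": 310}
alignments = [("A", "B", 290), ("B", "D", 295), ("B", "C", 150)]

def single_linkage(lengths, alignments, min_cov=0.9):
    """Cluster sequences, linking pairs whose alignment covers at least
    min_cov of the shorter sequence (union-find gives single linkage)."""
    parent = {name: name for name in lengths}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, aln_len in alignments:
        shorter = min(lengths[a], lengths[b])
        if aln_len >= min_cov * shorter:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in lengths:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())

print(single_linkage(lengths, alignments))
```

Here the B to C alignment covers less than half of B, so it is dropped and C stays in its own cluster, while A, B and D are joined. The coverage filter is what keeps a shared single domain from chaining unrelated multi-domain proteins together.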
Zheng Fu wrote:
>Does anyone know how to cluster genes into a gene family based on the
>For two genes, we can define a threshold to separate the homologs and
>non-homologs. But for three or more genes, how do we define the homologs? (Such
>as Gene A and Gene B have a high alignment score, A and C also have a high score,
>but B and C don't have a high score; can we say A, B and C are homologs?)