[BiO BB] About clustering genes to gene family

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Fri Aug 8 13:23:53 EDT 2003



Zheng Fu wrote:

>How to differentiate the first case (complex cluster) and the
>second (distantly related with homolog)?
>  
>

You have to look at the alignments ... second case would look like...


--------------------------A---------------------------------
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---------------------------B---------------------------------
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
----------------------------C--------------------------------

And first case would be...

A: |------W------/-----X-----|
                 |||||||||||||
B:              |------x-----/-----Y-------|
                               ||||||||||||
C:                            |------y-------/--------hello mum!------|
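
If you have lots of these to check, here is a rough sketch of the same
test in Python. It assumes you already have, for the A-B and B-C
alignments, the start and end of the aligned region in B's coordinates
(e.g. from tabular BLAST output). The function name, the 50% overlap
cut-off and the example coordinates are made up for illustration, not
taken from any package.

```python
def classify_chain(ab_on_b, bc_on_b, min_overlap=0.5):
    """Classify an A-B-C chain in which A hits B and B hits C, but A misses C.

    ab_on_b, bc_on_b: (start, end) of each alignment in B's coordinates,
    1-based and inclusive. min_overlap is an arbitrary, hypothetical cut-off.
    """
    ab_start, ab_end = ab_on_b
    bc_start, bc_end = bc_on_b

    # Number of residues of B covered by both alignments (0 if disjoint).
    shared = max(min(ab_end, bc_end) - max(ab_start, bc_start) + 1, 0)

    # Compare against the shorter of the two aligned regions.
    shorter = min(ab_end - ab_start + 1, bc_end - bc_start + 1)

    if shared / shorter >= min_overlap:
        # Same stretch of B matches both A and C: B looks like an intermediate.
        return "intermediate"
    # Different stretches of B match A and C: multi-domain 'complex cluster'.
    return "complex"


# Second case: both hits span nearly all of B.
print(classify_chain((1, 300), (10, 290)))
# First case: A matches the front of B, C matches the back.
print(classify_chain((1, 120), (130, 260)))
```

The first call falls in the "intermediate" branch and the second in the
"complex" branch. You would still want to eyeball the borderline ones.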




>And where can I find the information about GENEFAMMER?
>  
>

Bioinformatics 1998 14: 144-150


>Thank you.
>  
>
:)

Dan.
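
P.S. A minimal sketch of the "90% coverage, then single linkage" idea
from my earlier mail (quoted below), assuming you have each hit as
(id1, id2, aligned length) plus a table of sequence lengths. The function
name, input layout and example numbers are hypothetical; real clustering
packages do a lot more than this.

```python
def single_linkage(hits, lengths, min_cov=0.9):
    """Single-linkage clustering of sequences from pairwise hits.

    hits:    iterable of (id1, id2, aligned_len) tuples (assumed layout)
    lengths: dict mapping sequence id -> sequence length
    A hit is kept only if it covers >= min_cov of the shorter sequence.
    """
    parent = {name: name for name in lengths}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, aligned_len in hits:
        shortest = min(lengths[a], lengths[b])
        if aligned_len / shortest >= min_cov:
            # One good link is enough to merge two clusters (single linkage).
            parent[find(a)] = find(b)

    clusters = {}
    for name in lengths:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


lengths = {"A": 200, "B": 210, "C": 190, "D": 400}
hits = [("A", "B", 195), ("B", "C", 185), ("C", "D", 100)]
print(single_linkage(hits, lengths))
```

Here A, B and C end up in one family even with no direct A-C hit, while
the short C-D hit (a shared domain, say) is filtered out by the coverage
rule and D stays on its own.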


>
>On Thu, 7 Aug 2003, Dan Bolser wrote:
>
>  
>
>>What you describe can occur for 2 good reasons...
>>
>>You are forming a 'complex cluster', created by *multiple domain*
>>proteins...
>>
>>A has domains in common with B,
>>B has domains in common with C.
>>
>>A and C have no domains in common, and hence no homology.
>>
>>I.e.
>>
>>A: |------W------/-----X-----|
>>B:                       |------x-----/-----Y-------|
>>C:                                         |------y-------/--------hello mum!------|
>>
>>OR
>>
>>A and C are too distantly related for sequence searches to uncover their
>>true homology. However, sequence B is *intermediate* to A and C,
>>having homology to both...
>>
>>         B
>>        /   \
>>      /       \
>>    /           \
>> A              C
>>
>>NB: Sequence similarity is not a metric, as it does not obey the triangle
>>inequality.
>>(I think it behaves like a metric at high levels of similarity though?)
>>
>>In this case you have used the transitive nature of sequence similarity
>>to uncover distant homology via an intermediate sequence.
>>
>>Jong Park and Sarah Teichmann worked on both these ideas, and have
>>created a family clustering package called GENEFAMMER. Specifically,
>>DIVCLUS breaks up complex clusters into domain families. Transitivity is
>>implemented (kinda) in psiblast / hmm models, all three of which are used
>>in PFAM, so you might want to look there for your families.
>>
>>Or you could insist your alignments cover 90% of the shortest sequence,
>>and then cluster using single linkage.
>>
>>Dan.
>>
>>
>>Zheng Fu wrote:
>>
>>    
>>
>>>Hi everyone,
>>>
>>>Does anyone know how to cluster genes into a gene family based on
>>>sequence alignments?
>>>For two genes, we can define a threshold to separate homologs from
>>>non-homologs. But for three or more genes, how do we define the homologs?
>>>(For example, if Gene A and Gene B have a high alignment score, and A and C
>>>also have a high score, but B and C don't have a high score, can we say
>>>A, B and C are homologs?)
>>>
>>>Thank you.
>>>
>>>Carol
>>>
>>>
>>>
>>>      
>>>
