[BiO BB] protein clustering threshold

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue Feb 10 21:21:46 EST 2004

On Tue, 10 Feb 2004, Hongyu Zhang wrote:

> I also remember hearing from somewhere that a single mutation at active sites
> can change the activity of a protein from catalyzing one reaction to another.
> I need to find the exact example. Can anyone help me, please? I don't think
> Prion is a good example, because my question is not about whether one protein
> can have multiple functions.

I think the above 'infamous' example was actually engineered, rather than
existing naturally.

> Dan, thanks for the Uniprot example, but I found that some of the pair-wise
> percentages of identities within the cluster are far less than 90% (e.g.,
> between representative P51857 and member P52895 is 57.3% based on CLUSTALW). I

Sorry! My fault - I was using UniRef50!

> think it was caused by the clustering algorithm used in the data set. Another
> problem in the UniRef90 XML file is that it doesn't come with Evidence code,
> which makes it hard to tell whether the annotations are from experimental
> results or electronic annotations. The latest TrEMBL provides this evidence

Yup, this is why you need to link to main record data somehow.

> code, which I am going to give a try. I will keep you guys posted. BTW, I also
> have a Perl XML parser, which is based on the XML::Twig module, but I am not

I am not sure if this works on massive XML files, but maybe it is OK if you
are only looking at a few identifiers.
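
For what it is worth, one way to cope with a massive file when you only want
a few identifiers is to stream it and throw away each record once it has been
checked. The sketch below is in Python rather than Perl and is only an
illustration: the element name "entry", the attribute name "id" and the
helper name find_entries are assumptions of mine, not something checked
against the real UniRef files.

```python
import xml.etree.ElementTree as ET


def find_entries(source, wanted_ids, entry_tag="entry", id_attr="id"):
    """Stream an XML file and yield (id, xml_text) for the wanted records.

    entry_tag and id_attr are assumed names; adjust them to the real schema.
    """
    wanted = set(wanted_ids)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # Compare the local name only, so a default namespace does not matter.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if event == "end" and local_name == entry_tag:
            if elem.get(id_attr) in wanted:
                yield elem.get(id_attr), ET.tostring(elem, encoding="unicode")
            # Discard the records read so far so memory use stays roughly flat.
            root.clear()
```

You would call it with a file name or an open file object plus the list of
identifiers, and loop over the results. Because every finished record is
dropped straight away, the whole document is never held in memory at once.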

> familiar with the automatic conversion between XML schema and SQL tables.

I posted a question at comp.text.xml, so I will let you know what results
I get back (if any!).


> Thanks again.
> --Hongyu
