[BiO BB] Finding (Predicting) a Protein family in a new organism using only Bioinformatics approach

Boris Steipe boris.steipe at utoronto.ca
Sat Apr 29 09:35:33 EDT 2006

Let me rephrase your question so I can be sure I understand it:

   You are looking for evidence for the presence of a certain system
   (pathway?) in an organism. You know the components of that
   system in another organism.

I am not sure why you would do a multiple alignment for this task.
What you are looking for is the presence or absence of particular
genes - multiple alignments are useful to compare genes, not to
search for them. Surely you have not concatenated the whole proteome
into one string and then tried to align that? That approach must fail
because you would be trying to align similar fragments that appear in
shuffled order.

The correct approach is to generate a phylogenetic profile.

(1) To begin with, you need a set of reference sequences that are
components of your system S. I assume you can take these from the
literature. Then, for several proteomes that you know contain your
system S, you compare all sequences with each of your reference
sequences (by BLAST, FASTA, or, if you have the computational
resources, by full dynamic programming alignment with the EMBOSS
program "needle"). For each reference sequence, you note whether it
has an orthologue in the proteome you are analysing (using the common
definition of "reciprocal best matching pairs" to find orthologues).
The result is a profile with a system component in every column, an
organism in every row, and a yes/no value in every cell that states
whether an orthologue is present or absent. In this first step you
establish that your reference set is indeed correct, in the sense
that the components on your reference list really have orthologues in
(nearly) all organisms that you believe have that system.

(2) Next, you need to understand how the absence of your system
affects a proteome. You do the same analysis, but this time with
organisms for which you know that system S is absent. As a result you
will understand which components are affected by the presence or
absence of the system you are looking for.

(3) Finally, you do the analysis with your query organism. It is
likely going to be obvious which class this proteome belongs to.
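
To make the three steps concrete, here is a minimal Python sketch. It
is only an illustration under stated assumptions, not a finished
pipeline: it assumes you have already run the searches in both
directions (reference set against proteome, and proteome against
reference set) and saved them as BLAST tabular output (-outfmt 6, with
query ID in column 1, subject ID in column 2 and bit score in column
12). The function names and the simple distance-based comparison in
classify() are my own choices for the example.

```python
from typing import Dict, List, Tuple

BestHits = Dict[str, str]  # query ID -> ID of its top-scoring hit


def best_hits(blast_tabular: str) -> BestHits:
    """Keep, for every query, the subject with the highest bit score.

    Assumes BLAST tabular text (-outfmt 6): tab-separated, query ID
    first, subject ID second, bit score in the twelfth column.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, score = fields[0], fields[1], float(fields[11])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: BestHits, backward: BestHits) -> BestHits:
    """Return reference -> proteome pairs that are each other's best hit."""
    return {ref: hit for ref, hit in forward.items()
            if backward.get(hit) == ref}


def build_profile(
    references: List[str],
    searches: Dict[str, Tuple[BestHits, BestHits]],
) -> Dict[str, List[bool]]:
    """One row per organism, one yes/no column per system component.

    `searches` maps an organism name to its (forward, backward)
    best-hit tables.
    """
    profile = {}
    for organism, (forward, backward) in searches.items():
        orthologues = reciprocal_best_hits(forward, backward)
        profile[organism] = [ref in orthologues for ref in references]
    return profile


def mean_distance(row: List[bool], rows: List[List[bool]]) -> float:
    """Average number of columns in which `row` differs from `rows`."""
    return sum(sum(a != b for a, b in zip(row, other))
               for other in rows) / len(rows)


def classify(
    query_row: List[bool],
    present_rows: List[List[bool]],
    absent_rows: List[List[bool]],
) -> str:
    """Say which class of profiles the query row resembles more."""
    d_present = mean_distance(query_row, present_rows)
    d_absent = mean_distance(query_row, absent_rows)
    if d_present < d_absent:
        return "present"
    if d_absent < d_present:
        return "absent"
    return "unclear"
```

You would call build_profile() once for the organisms known to have S
(step 1), once for those known to lack it (step 2), and once for the
query organism (step 3), then pass the resulting rows to classify().
In practice you should also look at the profile by eye, and apply an
E-value or coverage cutoff before accepting a best hit; the sketch
leaves that out.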

Searching for "phylogenetic profiles" in PubMed will give you some  
background reading, in particular

seems to me to discuss something similar to what you want to do...

... unless I have misunderstood your question, then you'll just have  
to spend more time explaining it clearly :-)

Hope this helps,


On 29 Apr 2006, at 01:24, hamid wrote:

> Hi buddy,
> I have a question about finding a protein family in a new
> organism.
> Suppose we have the whole proteome of organism "O", and we also
> have 200 proteins of protein family "F", which belong to
> several subunits of the complete system "S". We are unsure whether
> system "S" exists in organism "O". I have done a multiple alignment
> of the whole proteome of "O" (almost 1200 proteins) with all the
> proteins belonging to "F", obtained from several organisms that
> are proven to have system "S". The result is a huge alignment
> file. Now I do not know how to find out whether any proteins
> in the "O" proteome are related to the "F" family.
> Please guide me in this regard.
> Yours
> Behnam
> /*
> Hamid( Behnam )Nikbakht,
> M.Sc of Cell and Molecular Sciences
> Bioinformatics Center
> Laboratory of Biophysics and Molecular Biology
> Institute of Biochemistry and Biophysics
> University of Tehran
> P.O.Box 13145-1384
> Tehran, Iran.
> Tel: (+98 21) 664-98672
> Fax: (+98 21) 669-56985
> Alt. E-Mail : hamid at ibb.ut.ac.ir
> */
