[BiO BB] Finding (Predicting) a Protein family in a new organism using only Bioinformatics approach
boris.steipe at utoronto.ca
Sat Apr 29 09:35:33 EDT 2006
Let me rephrase your question so I can be sure I understand it:
You are looking for evidence for the presence of a certain system
(pathway?) in an organism. You know the components of that
system in another organism.
I am not sure why you would do a multiple alignment for this task.
What you are looking for is the presence or absence of particular
genes - multiple alignments are useful to compare genes, not to
search for them. Surely you have not strung the whole proteome into a
string and then tried to align that? That approach must fail because
you would be trying to align similar fragments with shuffled order.
The correct approach is to generate a phylogenetic profile.
(1) To begin with, you need a set of reference sequences that are
components of your system S. I assume you can take that from literature
references. Then, for several proteomes for which you know that they
contain your system S, you compare all sequences with each of your
reference sequences (by BLAST, FASTA, or, if you have the
computational resources, by full dynamic programming alignment (EMBOSS
program "Needle")). For each reference sequence, you note whether it
has an orthologue in the proteome you are analysing (using the common
definition of "reciprocal best matching pairs" to find orthologues).
The result is a profile that has a system component in every
column, an organism in every row, and a yes/no value in every
cell that describes whether an orthologue is present or absent. In
this first step you establish that your idea of the reference set is
indeed correct, in the sense that the components from your reference
list really have orthologues in (nearly) all organisms that you
believe have that system.
(2) Next you need to understand how the absence of your system
affects a proteome. You do the same analysis, but this time with
organisms for which you know that the system S is absent. As a result
you will understand which components are affected by the presence or
absence of the system you are looking for.
(3) Finally, you do the analysis with your query organism. It is
likely going to be obvious which class this proteome belongs to.
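To make the three steps concrete, here is a minimal Python sketch, not a finished tool. It assumes you have already run your searches and parsed them into dictionaries that map each query ID to a list of (subject ID, score) pairs; that input format and all function names are hypothetical. The reverse search must be run against the complete proteome of the reference organism, not only against the reference set. The nearest-class rule by mean Hamming distance in classify() is my own illustrative stand-in for "it is likely going to be obvious" and is not part of the published method.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, pre-parsed search results: query id -> [(subject id, score), ...]
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Return the highest-scoring subject for a query, or None if it has no hits."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit[1])[0]


def reciprocal_best_hit(forward: Hits, reverse: Hits, ref_id: str) -> Optional[str]:
    """Return the proteome sequence forming a reciprocal best matching pair
    with ref_id, or None if there is no such pair.

    forward: reference sequences searched against the proteome under analysis.
    reverse: proteome sequences searched back against the whole proteome of
             the reference organism (assumption: that search was already run).
    """
    target = best_hit(forward, ref_id)
    if target is None:
        return None
    return target if best_hit(reverse, target) == ref_id else None


def profile_row(forward: Hits, reverse: Hits, reference_ids: List[str]) -> List[bool]:
    """One row of the profile: for each system component, orthologue present?"""
    return [reciprocal_best_hit(forward, reverse, r) is not None for r in reference_ids]


def classify(query_row: List[bool],
             present_rows: List[List[bool]],
             absent_rows: List[List[bool]]) -> str:
    """Illustrative rule only: pick the class whose known rows are closer to
    the query row on average (Hamming distance)."""
    if not present_rows or not absent_rows:
        raise ValueError("need at least one row for each class")

    def mean_distance(rows: List[List[bool]]) -> float:
        total = sum(sum(a != b for a, b in zip(query_row, row)) for row in rows)
        return total / len(rows)

    d_present = mean_distance(present_rows)
    d_absent = mean_distance(absent_rows)
    if d_present < d_absent:
        return "present"
    if d_absent < d_present:
        return "absent"
    return "undecided"


if __name__ == "__main__":
    # Made-up IDs and scores, purely to show the call pattern.
    refs = ["S1", "S2", "S3"]
    fwd = {"S1": [("o1", 200.0), ("o2", 50.0)], "S2": [("o3", 80.0)], "S3": []}
    rev = {"o1": [("S1", 198.0)], "o3": [("X9", 120.0), ("S2", 75.0)]}
    row = profile_row(fwd, rev, refs)
    print(row)
    print(classify(row, [[True, True, True]], [[False, False, False]]))
```

In practice you would look at the whole table by eye before trusting any automatic rule: step (1) tells you which columns are reliably "yes" when S is present, and step (2) tells you which of those actually switch to "no" when S is absent. Only those columns carry information about your query organism.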
Searching for "phylogenetic profiles" in PubMed will give you some
background reading, in particular
seems to me to discuss something similar to what you want to do...
... unless I have misunderstood your question, then you'll just have
to spend more time explaining it clearly :-)
Hope this helps,
On 29 Apr 2006, at 01:24, hamid wrote:
> Hi buddy,
> I have a question regarding to find a new protein family in a new
> organism.
> Suppose we have the whole proteome of organism "O", although we
> have 200 proteins regarding to protein family "F" those belongs to
> several subunits of complete system "S". We doubt whether there is
> system "S" in organism "O" or not. I have done a multiple alignment
> among whole proteome of "O" (almost 1200 proteins) and all the
> proteins belonging to "F" achieved from several organisms those
> are proven to have system "S" in them. I achieved a huge alignment
> file. Now I do not know how can I find that are there any proteins
> in "O" proteome relating to "F" family.
> Please guide me in this regard.
> Hamid( Behnam )Nikbakht,
> M.Sc of Cell and Molecular Sciences
> Bioinformatics Center
> Laboratory of Biophysics and Molecular Biology
> Institute of Biochemistry and Biophysics
> University of Tehran
> P.O.Box 13145-1384
> Tehran, Iran.
> Tel: (+98 21) 664-98672
> Fax: (+98 21) 669-56985
> Alt. E-Mail : hamid at ibb.ut.ac.ir
> Bioinformatics.Org general forum -
> BiO_Bulletin_Board at bioinformatics.org