[BiO BB] Extracting gene names from a GO class and their siblings

Sat Dec 1 15:19:08 EST 2007

On Dec 1, 2007, at 2:56 AM, Dan Bolser wrote:

> On 28/11/2007, Gaj Stan (BIGCAT) <Stan.Gaj at bigcat.unimaas.nl> wrote:
>> Dear all,
>>
>> My question today concerns Gene Ontology annotation extraction. Is it
>> possible to extract a list of genes which belong to a specific
>> GO process and its children?
>>
>> Or, to put it in a clearer context:
>>
>> - I'm interested in all genes belonging to the lipid metabolic process
>> category (GO:0006629 : lipid metabolic process
>> <http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&search_constraint=terms&depth=0&query=GO:0006629&session_id=4045b1196258143&show_associations=list>).
>>
>> - I aim to have a list of ALL genes that belong to this GO category or
>> below, down to the smallest, most specific node (e.g. cellular lipid
>> metabolic process
>> <http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&search_constraint=terms&depth=0&query=GO:0044255&session_id=4045b1196258143&show_associations=list>
>> -> cellular lipid catabolic process
>> <http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&search_constraint=terms&depth=0&query=GO:0044242&session_id=4045b1196258143&show_associations=list>
>> --> etc.).
>>
>> - This list can afterwards be filtered for duplicate names / IDs using
>> Perl or something similar (I can do that part myself (-; ).
>>
>> - In the end, I would like to have a list that consists of gene names or
>> any other usable ID (e.g. EnsEMBL, UniProt, EntrezGene, ...) which
>> are classified as having a lipid metabolic activity!
>>
>> Is there a specific GO tool available (or am I unaware that this is even
>> possible in GO itself) that can do this? If the solution involves R and
>> specific GO libraries, then I'm eager to hear about it as well (-:
>> (since I know it is possible to extract both parent and children nodes,
>> but am unaware of how to do this for gene names/IDs).
>
> Sounds like an interesting (if not uncommon) request... You should be
> able to achieve this using the 'go-database', which you can find
> here:
>
> http://www.geneontology.org/

The database page is here:
http://www.geneontology.org/GO.database.shtml

> Seems that you need to traverse the hierarchy, querying a specific
> mapping as you go (performing iterative SQL queries via Perl should
> allow you to do this).

Actually, no iterative queries or programming are required, as the
closure of the relations is pre-computed in the database; see the
explanation on the page above.
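
To illustrate why a stored closure removes the need for iteration, here is a minimal sketch using Python's built-in sqlite3 and a tiny in-memory database. The table and column names (term, graph_path, association, gene_product) are modelled loosely on the GO database but are assumptions here; check the schema documentation on the database page before adapting the query to a real instance. The gene symbols are hypothetical placeholders.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a pre-computed closure table.

    Simplified, assumed schema: graph_path holds one row for every
    (ancestor, descendant) pair, including each term paired with itself
    at distance 0, so a single join reaches all descendants.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
        CREATE TABLE graph_path (term1_id INTEGER, term2_id INTEGER, distance INTEGER);
        CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE association (term_id INTEGER, gene_product_id INTEGER);
    """)
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", [
        (1, "GO:0006629", "lipid metabolic process"),
        (2, "GO:0044255", "cellular lipid metabolic process"),
        (3, "GO:0044242", "cellular lipid catabolic process"),
    ])
    # Closure rows: term1 is an ancestor of (or the same as) term2.
    conn.executemany("INSERT INTO graph_path VALUES (?, ?, ?)", [
        (1, 1, 0), (2, 2, 0), (3, 3, 0),
        (1, 2, 1), (2, 3, 1), (1, 3, 2),
    ])
    # Hypothetical gene products, annotated directly to different levels.
    conn.executemany("INSERT INTO gene_product VALUES (?, ?)", [
        (10, "geneA"), (11, "geneB"), (12, "geneC"),
    ])
    conn.executemany("INSERT INTO association VALUES (?, ?)", [
        (1, 10), (2, 11), (3, 12), (3, 10),
    ])
    return conn


# One query, no recursion: join through the closure table.
QUERY = """
    SELECT DISTINCT gene_product.symbol
    FROM term
    JOIN graph_path ON term.id = graph_path.term1_id
    JOIN association ON graph_path.term2_id = association.term_id
    JOIN gene_product ON association.gene_product_id = gene_product.id
    WHERE term.acc = ?
    ORDER BY gene_product.symbol
"""


def genes_under(conn: sqlite3.Connection, go_acc: str) -> list:
    """Return symbols annotated to go_acc or any of its descendants."""
    return [row[0] for row in conn.execute(QUERY, (go_acc,))]


if __name__ == "__main__":
    db = build_toy_db()
    print(genes_under(db, "GO:0006629"))  # ['geneA', 'geneB', 'geneC']
    print(genes_under(db, "GO:0044242"))  # ['geneA', 'geneC']
```

Because every ancestor/descendant pair is already a row, asking for GO:0006629 picks up genes annotated only to its more specific children without any loop in the client code.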

There are some example queries here:
http://wiki.geneontology.org/index.php/Example_Queries

and a link to a web form for executing SQL queries and getting back
results as tab-delimited files or HTML.

Of course, if you want to compute the closure of the relations in the
graph programmatically, there are APIs for doing so.
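
Computing the closure yourself amounts to a graph traversal. The sketch below is plain Python, not any of the GO APIs referred to above; it assumes the ontology has already been loaded as (child, parent) edge pairs and the annotations as (gene, GO ID) pairs. Which relation types to follow (is_a only, or is_a plus part_of) is left to you, and the gene names are hypothetical.

```python
from collections import defaultdict, deque


def descendants(root, edges):
    """Return root plus every term reachable below it.

    edges: iterable of (child, parent) pairs. GO is a DAG, so a term can
    be reached along several paths; the 'seen' set ensures each term is
    visited only once.
    """
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_for_term(root, edges, annotations):
    """Collect genes annotated to root or to any of its descendants."""
    terms = descendants(root, edges)
    return sorted({gene for gene, go_id in annotations if go_id in terms})


# Illustrative toy data; the gene names are hypothetical.
EDGES = [
    ("GO:0044255", "GO:0006629"),  # cellular lipid metabolic process -> lipid metabolic process
    ("GO:0044242", "GO:0044255"),  # cellular lipid catabolic process -> cellular lipid metabolic process
]
ANNOTATIONS = [
    ("geneA", "GO:0006629"),
    ("geneB", "GO:0044242"),
    ("geneC", "GO:0008150"),  # a term outside the lipid metabolism subgraph
]

if __name__ == "__main__":
    print(genes_for_term("GO:0006629", EDGES, ANNOTATIONS))  # ['geneA', 'geneB']
```

Breadth-first search is used here only because it is simple; any traversal that tracks visited terms gives the same descendant set.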

Note also that exporting query results as tab-delimited files will be
available in the next release of AmiGO; this will likely be the most
convenient route for users who are not SQL- or Perl-savvy.

> I don't know why the database dump isn't
> provided in a convenient tab delimited format.

Well, technically the MySQL database dump is tab-delimited, but we
recommend against using it for any purpose other than building a
MySQL instance. The relatively normalized schema and the prevalence of
surrogate IDs make these dumps a poor choice for writing parsers
against.

And of course the association files are tab-delimited but these are  
not sufficient in and of themselves to answer the query above.
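
The reason is that each association line ties one gene product to one GO ID, usually the most specific term the curator chose, so filtering the file on GO:0006629 alone would miss genes annotated only to its descendants. A minimal sketch, assuming the GAF 1.0 column layout (column 3 is the object symbol, column 4 the qualifier, column 5 the GO ID) and that the set of GO IDs for the term and all its descendants has already been obtained from the ontology. The sample rows and gene symbols are hypothetical.

```python
def genes_in_terms(gaf_lines, wanted_terms):
    """Return symbols from GAF lines whose GO ID is in wanted_terms.

    Assumes GAF 1.0 layout: tab-separated, column 3 (index 2) is the
    DB object symbol, column 4 (index 3) the qualifier, column 5
    (index 4) the GO ID. Lines starting with '!' are comments.
    """
    symbols = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip negative annotations ("NOT" in the pipe-separated qualifier).
        if "NOT" in cols[3].split("|"):
            continue
        if cols[4] in wanted_terms:
            symbols.add(cols[2])
    return sorted(symbols)


def gaf_line(symbol, qualifier, go_id):
    """Build a hypothetical 15-column row; only the columns used above matter."""
    cols = ["DB", "ID:" + symbol, symbol, qualifier, go_id] + [""] * 10
    return "\t".join(cols) + "\n"


# The term plus its descendants; this set comes from the ontology, not the GAF.
WANTED = {"GO:0006629", "GO:0044255", "GO:0044242"}
LINES = [
    "!gaf-version: 1.0\n",
    gaf_line("geneA", "", "GO:0044242"),
    gaf_line("geneB", "NOT", "GO:0006629"),
    gaf_line("geneC", "", "GO:0008150"),
]

if __name__ == "__main__":
    print(genes_in_terms(LINES, WANTED))  # ['geneA']
```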

> You can try asking for a more direct solution on either the "GO
> Friends" mailing list or the "GO Database" mailing list, here;
>
> http://www.geneontology.org/GO.mailing.lists.shtml

There is also the GO helpdesk, linked from the front page.

> Which reminds me... (more in the next email).
>
>
> Dan.