How Should We Do It

This is a place to brainstorm how we could make the gene wiki useful.

The first thing that came to my mind is to build a skeleton of gene information by automatically aggregating information from the public databases. We could set up a set of modular sections, one for each data source, and populate them independently. A few things we need to do for this:

1. Decide on the major modules we want to include in the gene wiki and the data source for each.

2. We need an ID mapping scheme to ensure that gene IDs from different sources are mapped properly. We could adopt Entrez Gene ID as our starting point.

3. We release the first skeleton to the public for human curation.
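
As a rough sketch of how the steps above might fit together, here is one way to merge records from separate modules onto pages keyed by Entrez Gene ID. Everything here is an assumption for illustration: the `ID_MAP` table, the function names, the module names, and the record layout are hypothetical, and a real mapping table would be loaded from the source databases rather than typed in by hand.

```python
from collections import defaultdict

# Hypothetical mapping table: (source, source-specific ID) -> Entrez Gene ID.
# The two entries below are the Ensembl and UniProt identifiers for human TP53.
ID_MAP = {
    ("ensembl", "ENSG00000141510"): "7157",
    ("uniprot", "P04637"): "7157",
}


def to_entrez(source, source_id):
    """Return the Entrez Gene ID for a source-specific ID, or None if unmapped."""
    return ID_MAP.get((source, source_id))


def build_skeleton(module_records):
    """Merge per-module records into one page per Entrez Gene ID.

    module_records: {module_name: [(source, source_id, text), ...]}
    Returns (pages, unmapped), where pages is
    {entrez_id: {module_name: [text, ...]}} and unmapped lists the
    records whose IDs could not be mapped, so a curator can review them.
    """
    pages = defaultdict(lambda: defaultdict(list))
    unmapped = []
    for module, records in module_records.items():
        for source, source_id, text in records:
            entrez_id = to_entrez(source, source_id)
            if entrez_id is None:
                unmapped.append((module, source, source_id))
                continue
            pages[entrez_id][module].append(text)
    # Convert nested defaultdicts to plain dicts before returning.
    return {gid: dict(mods) for gid, mods in pages.items()}, unmapped
```

Keeping the unmapped records in a separate list, rather than dropping them, would give human curators an obvious place to start once the skeleton is released.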


One idea would be to go through the NCBI GEO database and mark down which gene is found in which cell line or tissue, and on which array. That would be a "low-hanging fruit" way to provide novel and valuable data, and it should also be reasonably automatable.
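
To make the idea concrete, here is a minimal sketch of the bookkeeping step only. It assumes the GEO records have already been downloaded and reduced to (gene ID, tissue or cell line, array platform) rows; that row layout and the function name `index_expression` are assumptions, and the sample rows in the usage note are invented rather than real GEO results.

```python
from collections import defaultdict


def index_expression(rows):
    """Group (gene_id, tissue, platform) rows by gene.

    Returns {gene_id: sorted list of unique (tissue, platform) pairs},
    which is the per-gene list a wiki page could display.
    """
    index = defaultdict(set)
    for gene_id, tissue, platform in rows:
        # A set drops repeated observations of the same gene/tissue/array.
        index[gene_id].add((tissue, platform))
    return {gene_id: sorted(pairs) for gene_id, pairs in index.items()}
```

For example, calling `index_expression([("7157", "liver", "GPL570"), ("7157", "HeLa", "GPL96")])` would give a single entry for gene 7157 listing both tissue and array pairs.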

A second idea would be to build up a robust Synonym section on each gene. Not only would this increase traffic, it would also provide a good service to people learning their way around the genome.

One more thing I'd add: it might be worth changing the gene naming system used. If you key pages by gene name, as is currently the system, you will have a much more difficult time ensuring that no duplicates occur. Instead, it might be wise to create each page for an Ensembl ID or some other stable identifier for each gene, and then beef up the Synonym sections. That way, you will only get one gene per page, and one page per gene.
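
Here is a small sketch of what "one page per ID, with synonyms pointing at it" could look like. The page IDs and gene names are placeholders, and the function names are hypothetical; the point is only that a name lookup goes through a synonym index and may return more than one page, while the stable ID always identifies exactly one.

```python
from collections import defaultdict

# Placeholder pages keyed by a stable ID; the names are made up for illustration.
PAGES = {
    "ID_EXAMPLE_1": ["GENE1", "ABC1"],
    "ID_EXAMPLE_2": ["GENE2", "ABC1"],  # "ABC1" is deliberately ambiguous
}


def build_synonym_index(pages):
    """Map each lowercased name or synonym to the set of page IDs that use it."""
    index = defaultdict(set)
    for stable_id, names in pages.items():
        for name in names:
            index[name.lower()].add(stable_id)
    return index


def resolve(index, name):
    """Return the sorted page IDs matching a name (case-insensitive).

    An empty list means the name is unknown; more than one ID means the
    synonym is ambiguous and the wiki should show a disambiguation list.
    """
    return sorted(index.get(name.lower(), ()))
```

With this layout, a shared synonym never creates a duplicate page: `resolve(build_synonym_index(PAGES), "abc1")` would simply return both IDs, and the reader picks the gene they meant.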