[BiO BB] "The" mRNA for a gene

Ryan Golhar golharam at umdnj.edu
Fri Apr 20 19:41:43 EDT 2007


You can't always rely on a single mRNA to represent a gene, because of splice
variants and so on.  Your best bet is to use the RefSeq NM_ sequence and keep
only the coding portion, i.e. from the start codon to the stop codon.  The
RefSeq NM_ sequence will give you a single reference sequence per gene.  In
the GenBank entry, pull out the CDS portion.  That will give you what you are
looking for.
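
As a rough sketch of that last step, something like the following would pull
the CDS out of a GenBank flat-file record using only the Python standard
library.  It assumes the record is an mRNA entry whose CDS feature has a
simple "start..end" location (no join(...) or complement(...)), and the
function name extract_cds and the toy record TOY_RECORD are made up here for
illustration; they are not real RefSeq data or part of any library.

```python
import re


def extract_cds(genbank_text):
    """Return the CDS (start codon through stop codon) of one GenBank record.

    Assumption: the CDS feature uses a plain 1-based "start..end" location,
    which is the usual case for an mRNA record but not for genomic records.
    """
    # Locate the CDS feature line in the FEATURES table.
    cds = re.search(r"^ {5}CDS +(\d+)\.\.(\d+)[ \t]*$", genbank_text, re.MULTILINE)
    if cds is None:
        raise ValueError("no simple start..end CDS location found")
    start, end = int(cds.group(1)), int(cds.group(2))

    # The sequence sits between the ORIGIN line and the closing "//" line.
    origin = re.search(r"^ORIGIN[^\n]*\n(.*?)^//", genbank_text,
                       re.MULTILINE | re.DOTALL)
    if origin is None:
        raise ValueError("no ORIGIN sequence block found")

    # Drop the position numbers and spaces, keeping only the bases.
    sequence = re.sub(r"[^A-Za-z]", "", origin.group(1)).upper()

    # GenBank coordinates are 1-based and inclusive.
    return sequence[start - 1:end]


# Hypothetical toy record, shaped like a GenBank flat file.
TOY_RECORD = """\
LOCUS       TOY_0001                  24 bp    mRNA    linear
FEATURES             Location/Qualifiers
     CDS             4..15
                     /gene="TOY"
ORIGIN
        1 ggcatgaaac cctaaggttt tttt
//
"""

print(extract_cds(TOY_RECORD))
```

For anything beyond a quick high-throughput pass, a proper GenBank parser
would be a safer choice than regular expressions, since it also copes with
compound locations.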


-----Original Message-----
From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
[mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org] On
Behalf Of Amir Karger
Sent: Friday, April 20, 2007 10:01 AM
To: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] "The" mRNA for a gene

I'd like to get exactly one "representative" transcript (Ensembl ENST or
GenBank mRNA, say) for every gene. Due to alternative splicing and other
messy biology, Ensembl and GenBank will often have lots of mRNAs associated
with single genes. But I think it's true that for most genes, we know of one
"main" protein that the gene makes, and one isoform of that protein will be
the most studied. Swiss-PROT gives me just one mRNA for BRCA1_HUMAN, which
is nice, but for 1433E_HUMAN it gives me 7 mRNAs, which have different

Does anyone know of a place I could get something like this? I'm happy to do
some coding if necessary. The results don't have to be perfect. I'm just
trying to do something high-throughput that gives one believable result per

- Amir Karger
Research Computing
Life Sciences Division
Harvard University
General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org
