[BiO BB] "The" mRNA for a gene

Ryan Golhar golharam at umdnj.edu
Fri Apr 20 19:41:43 EDT 2007


Amir,

You can't always rely on one mRNA to represent a gene because of splice
variants, etc.  Your best bet is to use the RefSeq NM_ sequence and only use
the coding portion, i.e. from start codon to stop codon.  The RefSeq NM_
sequence will give you a single reference sequence per gene.  In the GenBank
entry, pull out the CDS portion.  That will give you what you are looking for.
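
As a rough illustration of "pull out the CDS portion", here is a minimal
Python sketch. It assumes a GenBank flat-file record whose CDS feature has a
simple start..end location (1-based, inclusive). The record below is a toy
example invented for this sketch, not a real accession, and extract_cds is a
hypothetical helper name. Locations written as join(...) or complement(...)
are not handled; for real data a dedicated parser such as Biopython's SeqIO
is probably the safer choice.

```python
import re

# Toy GenBank-style record, invented for illustration (not a real accession).
TOY_RECORD = """\
LOCUS       TOY_0001                  20 bp    mRNA    linear   PRI 01-JAN-2000
FEATURES             Location/Qualifiers
     CDS             4..15
                     /gene="TOY"
ORIGIN
        1 gggatgaaat tttaaccccc
//
"""


def extract_cds(record_text):
    """Return the CDS of a record with a simple start..end CDS location."""
    # Find a CDS feature line of the form 'CDS   start..end'
    # (coordinates are 1-based and inclusive in GenBank records).
    match = re.search(
        r"^[ \t]+CDS[ \t]+(\d+)\.\.(\d+)[ \t]*$", record_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no simple start..end CDS location found")
    start, end = int(match.group(1)), int(match.group(2))

    # Everything between ORIGIN and the closing '//' is sequence text;
    # strip the position numbers and whitespace, keeping only letters.
    origin = record_text.split("ORIGIN", 1)[1].split("//", 1)[0]
    sequence = re.sub(r"[^a-zA-Z]", "", origin).lower()

    # Convert the 1-based inclusive range to a Python slice.
    return sequence[start - 1:end]


cds = extract_cds(TOY_RECORD)
print(cds)
```

On the toy record this prints the 12 bases from the start codon through the
stop codon, leaving out the untranslated bases on either side.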

Ryan


-----Original Message-----
From: bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org
[mailto:bio_bulletin_board-bounces+golharam=umdnj.edu at bioinformatics.org] On
Behalf Of Amir Karger
Sent: Friday, April 20, 2007 10:01 AM
To: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] "The" mRNA for a gene


I'd like to get exactly one "representative" transcript (Ensembl ENST or
GenBank mRNA, say) for every gene. Due to alternative splicing and other
messy biology, Ensembl and GenBank will often have lots of mRNAs associated
with a single gene. But I think it's true that for most genes, we know of one
"main" protein that the gene makes, and one isoform of that protein will be
the most studied. Swiss-Prot gives me just one mRNA for BRCA1_HUMAN, which
is nice, but for 1433E_HUMAN it gives me 7 mRNAs, which have different
lengths!

Does anyone know of a place I could get something like this? I'm happy to do
some coding if necessary. The results don't have to be perfect. I'm just
trying to do something high-throughput that gives one believable result per
gene.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University
617-496-0626
_______________________________________________
General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board




