[BiO BB] Comparing sequences from GenBank and RefSeq...
Ryan Raaum
ryan.raaum at gmail.com
Thu Apr 23 12:15:34 EDT 2009
The RefSeq entry tells you which non-RefSeq entry or entries it was
derived from. In this case it says DQ386163, not DQ231562, which
suggests there are at least 2 potato chloroplast sequences available -
one by an Italian group and one by a Korean group.
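
A minimal sketch of that check, in case it is useful: the source
accession is stated in the COMMENT block of the RefSeq flat file, so it
can be pulled out with a regular expression. The function name
refseq_sources, the accession pattern, and the sample COMMENT wording
are my own assumptions for illustration. The wording in the real record
may differ, so check the actual entry.

```python
import re

# Assumed pattern for classic GenBank nucleotide accessions
# (1-2 letters + 5-6 digits, optional ".version"); newer accession
# formats are not covered.
ACC = r"[A-Z]{1,2}\d{5,6}(?:\.\d+)?"

# Assumed phrasing: "... derived from X" or "... identical to X",
# optionally followed by more accessions joined by "," or "and".
SOURCE_RE = re.compile(
    r"(?:derived from|identical to)\s+"
    r"(" + ACC + r"(?:(?:\s*,\s*(?:and\s+)?|\s+and\s+)" + ACC + r")*)"
)


def refseq_sources(comment):
    """Return the source accessions named in a RefSeq COMMENT block."""
    text = " ".join(comment.split())  # undo flat-file line wrapping
    match = SOURCE_RE.search(text)
    if match is None:
        return []
    return re.findall(ACC, match.group(1))


# Illustrative COMMENT text only; not copied from NC_008096.
SAMPLE_COMMENT = """PROVISIONAL REFSEQ: This record has not yet been
subject to final NCBI review. The reference sequence is identical to
DQ386163."""

print(refseq_sources(SAMPLE_COMMENT))  # ['DQ386163']
```

If the returned list does not include the GenBank accession you are
comparing against (here DQ231562), the two records come from separate
submissions, and differences between them may simply reflect that.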
On Thu, Apr 23, 2009 at 11:42 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
> Hi,
>
> I found that the potato chloroplast sequence from GenBank (DQ231562.1)
> has several differences (260 SNPs and 30 indels) relative to the same
> sequence in RefSeq (NC_008096.1). As far as I am aware, this sequence
> has only been obtained once, so why would the two differ? In general,
> should I trust the RefSeq sequence?
>
>
> For your reference here is the output of dnadiff over the two files:
>
> Reference/DQ231562.fasta Query/NC_008096.fasta
> NUCMER
>
> [REF] [QRY]
> [Sequences]
> TotalSeqs 1 1
> AlignedSeqs 1(100.00%) 1(100.00%)
> UnalignedSeqs 0(0.00%) 0(0.00%)
>
> [Bases]
> TotalBases 155312 155298
> AlignedBases 155312(100.00%) 155298(100.00%)
> UnalignedBases 0(0.00%) 0(0.00%)
>
> [Alignments]
> 1-to-1 1 1
> TotalLength 155312 155298
> AvgLength 155312.00 155298.00
> AvgIdentity 99.81 99.81
>
> M-to-M 1 1
> TotalLength 155312 155298
> AvgLength 155312.00 155298.00
> AvgIdentity 99.81 99.81
>
> [Feature Estimates]
> Breakpoints 0 0
> Relocations 0 0
> Translocations 0 0
> Inversions 0 0
>
> Insertions 0 0
> InsertionSum 0 0
> InsertionAvg 0.00 0.00
>
> TandemIns 0 0
> TandemInsSum 0 0
> TandemInsAvg 0.00 0.00
>
> [SNPs]
> TotalSNPs 260 260
> AC 23(8.85%) 14(5.38%)
> AG 24(9.23%) 30(11.54%)
> AT 15(5.77%) 14(5.38%)
> CA 14(5.38%) 23(8.85%)
> CG 24(9.23%) 18(6.92%)
> CT 32(12.31%) 19(7.31%)
> GA 30(11.54%) 24(9.23%)
> GC 18(6.92%) 24(9.23%)
> GT 13(5.00%) 34(13.08%)
> TA 14(5.38%) 15(5.77%)
> TC 19(7.31%) 32(12.31%)
> TG 34(13.08%) 13(5.00%)
>
> TotalGSNPs 113 113
> AC 9(7.96%) 8(7.08%)
> AG 17(15.04%) 17(15.04%)
> AT 5(4.42%) 3(2.65%)
> CA 8(7.08%) 9(7.96%)
> CG 6(5.31%) 7(6.19%)
> CT 15(13.27%) 8(7.08%)
> GA 17(15.04%) 17(15.04%)
> GC 7(6.19%) 6(5.31%)
> GT 6(5.31%) 12(10.62%)
> TA 3(2.65%) 5(4.42%)
> TC 8(7.08%) 15(13.27%)
> TG 12(10.62%) 6(5.31%)
>
> TotalIndels 30 30
> A. 14(46.67%) 4(13.33%)
> C. 1(3.33%) 0(0.00%)
> G. 0(0.00%) 0(0.00%)
> T. 7(23.33%) 4(13.33%)
>
> TotalGIndels 24 24
> A. 10(41.67%) 4(16.67%)
> C. 1(4.17%) 0(0.00%)
> G. 0(0.00%) 0(0.00%)
> T. 5(20.83%) 4(16.67%)
>
>
> Thanks for any pointers,
> Dan.
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>
--
Ryan Raaum
Assistant Professor
Department of Anthropology
Lehman College, The City University of New York
250 Bedford Park Blvd. West
Bronx, NY 10468
e: ryan.raaum at lehman.cuny.edu
w: http://www.raaum.org
o: (718) 960-8845
f: (718) 960-8406