[BiO BB] Comparing sequences from GenBank and RefSeq...
Dan Bolser
dan.bolser at gmail.com
Thu Apr 23 11:42:42 EDT 2009
Hi,
I found that the potato chloroplast sequence from GenBank (DQ231562.1)
has several differences (260 SNPs and 30 indels) relative to the same
sequence in RefSeq (NC_008096.1). As far as I am aware, this sequence
has only been obtained once, so why would the two differ? In general,
should I trust the RefSeq sequence?
For reference, here is the dnadiff report for the two files:
Reference/DQ231562.fasta Query/NC_008096.fasta
NUCMER

                    [REF]               [QRY]
[Sequences]
TotalSeqs           1                   1
AlignedSeqs         1(100.00%)          1(100.00%)
UnalignedSeqs       0(0.00%)            0(0.00%)

[Bases]
TotalBases          155312              155298
AlignedBases        155312(100.00%)     155298(100.00%)
UnalignedBases      0(0.00%)            0(0.00%)

[Alignments]
1-to-1              1                   1
TotalLength         155312              155298
AvgLength           155312.00           155298.00
AvgIdentity         99.81               99.81

M-to-M              1                   1
TotalLength         155312              155298
AvgLength           155312.00           155298.00
AvgIdentity         99.81               99.81

[Feature Estimates]
Breakpoints         0                   0
Relocations         0                   0
Translocations      0                   0
Inversions          0                   0

Insertions          0                   0
InsertionSum        0                   0
InsertionAvg        0.00                0.00

TandemIns           0                   0
TandemInsSum        0                   0
TandemInsAvg        0.00                0.00

[SNPs]
TotalSNPs           260                 260
AC                  23(8.85%)           14(5.38%)
AG                  24(9.23%)           30(11.54%)
AT                  15(5.77%)           14(5.38%)
CA                  14(5.38%)           23(8.85%)
CG                  24(9.23%)           18(6.92%)
CT                  32(12.31%)          19(7.31%)
GA                  30(11.54%)          24(9.23%)
GC                  18(6.92%)           24(9.23%)
GT                  13(5.00%)           34(13.08%)
TA                  14(5.38%)           15(5.77%)
TC                  19(7.31%)           32(12.31%)
TG                  34(13.08%)          13(5.00%)

TotalGSNPs          113                 113
AC                  9(7.96%)            8(7.08%)
AG                  17(15.04%)          17(15.04%)
AT                  5(4.42%)            3(2.65%)
CA                  8(7.08%)            9(7.96%)
CG                  6(5.31%)            7(6.19%)
CT                  15(13.27%)          8(7.08%)
GA                  17(15.04%)          17(15.04%)
GC                  7(6.19%)            6(5.31%)
GT                  6(5.31%)            12(10.62%)
TA                  3(2.65%)            5(4.42%)
TC                  8(7.08%)            15(13.27%)
TG                  12(10.62%)          6(5.31%)

TotalIndels         30                  30
A.                  14(46.67%)          4(13.33%)
C.                  1(3.33%)            0(0.00%)
G.                  0(0.00%)            0(0.00%)
T.                  7(23.33%)           4(13.33%)

TotalGIndels        24                  24
A.                  10(41.67%)          4(16.67%)
C.                  1(4.17%)            0(0.00%)
G.                  0(0.00%)            0(0.00%)
T.                  5(20.83%)           4(16.67%)
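
As a quick sanity check on the figures above, here is a minimal Python sketch that tests whether the report is internally consistent. All numbers are copied by hand from the report. Two things are my assumptions rather than anything dnadiff states: that the [QRY] column of the "A.", "C.", "G.", "T." rows counts bases present only in the query (mirroring the way the SNP rows swap between columns), and that counting each SNP and indel as one mismatched position gives a fair approximation of AvgIdentity. The variable names are just illustrative.

```python
# Figures copied by hand from the dnadiff report above.
REF_LEN = 155312   # TotalBases, [REF] column (DQ231562.1)
QRY_LEN = 155298   # TotalBases, [QRY] column (NC_008096.1)
TOTAL_SNPS = 260
TOTAL_INDELS = 30

# [SNPs] breakdown, [REF] column only.
snp_counts = {
    "AC": 23, "AG": 24, "AT": 15,
    "CA": 14, "CG": 24, "CT": 32,
    "GA": 30, "GC": 18, "GT": 13,
    "TA": 14, "TC": 19, "TG": 34,
}

# Indel rows as (REF column, QRY column).
indel_rows = {"A.": (14, 4), "C.": (1, 0), "G.": (0, 0), "T.": (7, 4)}

snp_sum = sum(snp_counts.values())

# Bases present in the reference with a gap in the query.
ref_only = sum(ref for ref, _ in indel_rows.values())
# ASSUMPTION: the QRY column counts bases present only in the query.
qry_only = sum(qry for _, qry in indel_rows.values())

# ASSUMPTION: crude identity estimate, one mismatched position per SNP/indel.
# This is not necessarily how nucmer computes AvgIdentity.
approx_identity = round(100 * (1 - (TOTAL_SNPS + TOTAL_INDELS) / REF_LEN), 2)

print("SNP rows sum to TotalSNPs:", snp_sum == TOTAL_SNPS)
print("Indel rows sum to TotalIndels:", ref_only + qry_only == TOTAL_INDELS)
print("Net indel bases equal length difference:",
      ref_only - qry_only == REF_LEN - QRY_LEN)
print("Approximate identity (%):", approx_identity)
```

Under those assumptions the totals, the 14-base length difference and the 99.81% identity all agree with each other, so the report itself looks coherent. That says nothing about which of the two records is correct.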
Thanks for any pointers,
Dan.