[Bio-Linux] tophat with -p > 1 results in missing reads
Josh Thackray
thackray at rci.rutgers.edu
Fri Aug 21 13:48:32 EDT 2015
Hi All,
I am running tophat (version 2.0.13, from the Bio-Linux distribution) and
have run into a problem: as I increase the value of -p (number of
threads), more and more reads are missing from the final output. I'm
starting with an uncompressed fastq file containing 18,115,321 reads,
and running tophat with default parameters except for -p and -o.
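As a sanity check on the input size, the read count of an uncompressed FASTQ file can be derived from its line count. The sketch below assumes the common four-lines-per-record layout (no wrapped sequence lines); the function name is illustrative and not part of tophat.

```python
def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file.

    Assumes exactly four lines per record:
    header, sequence, '+' separator, qualities.
    """
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A remainder suggests a truncated file or wrapped sequence lines.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling `count_fastq_reads` on the input file gives the figure to compare against the "Input" line of align_summary.txt.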
Running with -p 8 gives the following in align_summary.txt:
Input : 318640
Mapped: 191949 (60.2% of input)
of these: 29316 (15.3%) have multiple alignments (0 have >20)
60.2% overall read mapping rate.
Running with -p 4 gives the following in align_summary.txt:
Input : 1302700
Mapped: 759998 (58.3% of input)
of these: 115861 (15.2%) have multiple alignments (1 have >20)
58.3% overall read mapping rate.
Running with -p 1 gives the following in align_summary.txt:
Input : 18115321
Mapped: 12014534 (66.3% of input)
of these: 1867188 (15.5%) have multiple alignments (13 have >20)
66.3% overall read mapping rate.
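To make the scale of the loss explicit, the "Input" counts above can be compared against the 18,115,321 reads in the FASTQ file. This is a small sketch using only the numbers reported in this message; the variable and function names are illustrative.

```python
TOTAL_READS = 18_115_321  # reads in the input FASTQ file

# "Input" values from align_summary.txt, keyed by the -p setting used.
input_by_threads = {8: 318_640, 4: 1_302_700, 1: 18_115_321}


def fraction_seen(n_input, total=TOTAL_READS):
    """Fraction of the original reads that tophat reported as input."""
    return n_input / total


for threads, n_input in sorted(input_by_threads.items()):
    pct = 100 * fraction_seen(n_input)
    print(f"-p {threads}: {n_input} of {TOTAL_READS} reads ({pct:.2f}%)")
```

With these numbers, -p 8 accounts for under 2% of the reads and -p 4 for roughly 7%, while -p 1 sees all of them.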
I also tried running tophat with the --no-sort-bam option to check
whether samtools was somehow screwing up during the mergesort operation,
but I got the same result. I also confirmed the numbers reported in
align_summary.txt using the samtools flagstat command. Furthermore,
using bowtie1 instead of bowtie2 as the alignment engine did not
resolve the problem of the missing reads.
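One way to cross-check the mapped-read count independently of align_summary.txt is to count distinct read names among the mapped records. The sketch below assumes single-end reads and SAM-formatted text lines as input, for example the output of `samtools view` on the accepted_hits.bam file in the tophat output directory; the function name is illustrative.

```python
def count_mapped_reads(sam_lines):
    """Count distinct read names among mapped SAM records.

    Assumes single-end reads, so one read name corresponds to one read.
    Multiple alignments of the same read are counted once.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # FLAG bit 0x4 marks an unmapped read
        names.add(fields[0])
    return len(names)
```

Passing `sys.stdin` to the function while piping in the `samtools view` output would give the count for a whole BAM file; note that the set of read names is held in memory.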
Any ideas?
Thanks,
Josh
--
Josh Thackray
Laboratory Researcher III
Human Genetics Institute of NJ
Department of Genetics
Rutgers University