[Bio-Linux] tophat with -p > 1 results in missing reads

Josh Thackray thackray at rci.rutgers.edu
Fri Aug 21 13:48:32 EDT 2015


Hi All,

I am running tophat (version 2.0.13, from the biolinux distribution) and 
have hit a problem: the higher I set -p (number of threads), the more 
reads go missing from the final output. I'm starting with an 
uncompressed fastq file containing 18,115,321 reads, and running tophat 
with default parameters except for -p and -o.

Running with -p 8 gives the following in align_summary.txt:
     Input     :    318640
     Mapped:    191949 (60.2% of input)
     of these:     29316 (15.3%) have multiple alignments (0 have >20)
     60.2% overall read mapping rate.

Running with -p 4 gives the following in align_summary.txt:
     Input     :   1302700
     Mapped:     759998 (58.3% of input)
     of these:     115861 (15.2%) have multiple alignments (1 have >20)
     58.3% overall read mapping rate.

Running with -p 1 gives the following in align_summary.txt:
     Input     :  18115321
     Mapped:  12014534 (66.3% of input)
     of these:   1867188 (15.5%) have multiple alignments (13 have >20)
     66.3% overall read mapping rate.
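
To put the three summaries side by side, here is a quick Python sketch 
that computes what fraction of the original 18,115,321 reads shows up as 
Input for each -p value. The counts are copied from the summaries above; 
the variable and function names are only illustrative.

```python
# Input counts reported in align_summary.txt for each -p value
# (copied from the three runs quoted above).
TOTAL_READS = 18_115_321  # reads in the original fastq file

input_by_threads = {8: 318_640, 4: 1_302_700, 1: 18_115_321}


def fraction_seen(n_input, total=TOTAL_READS):
    """Return the fraction of the original reads that tophat reported as input."""
    return n_input / total


for threads, n_input in sorted(input_by_threads.items()):
    pct = 100 * fraction_seen(n_input)
    print(f"-p {threads}: {n_input} of {TOTAL_READS} reads seen ({pct:.2f}%)")
```

The point is that the Input count itself shrinks as -p goes up, not only 
the Mapped count.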

I also tried running tophat with the --no-sort-bam option to check 
whether samtools was somehow screwing up during the mergesort operation, 
but I get the same result. I also confirmed the numbers reported in 
align_summary.txt using the samtools flagstat command. Furthermore, 
using bowtie1 instead of bowtie2 as the alignment engine did not resolve 
the problem of these reads going missing.
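
For anyone who wants to repeat the check, this is roughly how the 
comparison could be scripted in Python. It is only a sketch: it assumes a 
plain fastq file with four lines per record and an align_summary.txt 
containing an "Input : N" line like the ones shown above. The function 
names are made up for illustration.

```python
import re


def count_fastq_reads(path):
    """Count reads in an uncompressed fastq file, assuming 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def parse_input_count(summary_text):
    """Pull the 'Input' read count out of the text of align_summary.txt."""
    match = re.search(r"Input\s*:\s*(\d+)", summary_text)
    if match is None:
        raise ValueError("no 'Input' line found in summary")
    return int(match.group(1))


def missing_reads(fastq_path, summary_path):
    """Return how many fastq reads are not accounted for in the summary."""
    with open(summary_path) as handle:
        seen = parse_input_count(handle.read())
    return count_fastq_reads(fastq_path) - seen
```

Calling missing_reads() with the fastq file and the align_summary.txt 
from a given run should return 0 if no reads were dropped.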

Any ideas?

Thanks,

Josh

-- 
Josh Thackray
Laboratory Researcher III
Human Genetics Institute of NJ
Department of Genetics
Rutgers University



