Genomics teams often rely on static cloud tags (e.g., project, pipeline_run) and service-level metrics to monitor compute costs. These signals provide limited visibility into how workloads behave at the execution layer: there is no default mapping from a pipeline task to an EC2 process, so it is difficult to determine whether slowdowns are caused by application logic or by infrastructure constraints such as disk or network I/O.
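One low-level way to build such a mapping yourself is to inject a run identifier into each task's environment and recover it from /proc. A minimal Python sketch, assuming Linux; the PIPELINE_RUN_ID variable name and the `sleep` stand-in task are illustrative, not part of any real pipeline or of Tracer's API:

```python
import os
import subprocess

# Hypothetical tagging scheme: PIPELINE_RUN_ID is an illustrative variable
# name, and `sleep 30` stands in for a real pipeline task.
RUN_ID = "rnaseq-2024-001"
task = subprocess.Popen(["sleep", "30"],
                        env={**os.environ, "PIPELINE_RUN_ID": RUN_ID})

def pids_for_run(run_id):
    """Scan /proc/<pid>/environ for processes tagged with run_id (Linux only)."""
    tag = f"PIPELINE_RUN_ID={run_id}".encode()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/environ", "rb") as f:
                env_block = f.read()
        except OSError:
            continue  # process exited or environ not readable
        if tag in env_block.split(b"\0"):
            pids.append(int(entry))
    return pids

found = pids_for_run(RUN_ID)
print(task.pid in found)  # True on Linux: the tagged task is recoverable by PID
task.terminate()
task.wait()
```

Because the tag lives in the process environment, any kernel-level trace keyed by PID can then be joined back to the pipeline run that spawned it.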
This guide analyzes a real-world RNA-seq pipeline that was initially assumed to be memory- or compute-bound. By correlating pipeline run identifiers with kernel-level execution data, the team identified a different root cause: sustained disk and network I/O saturation. The pipeline consistently ran for more than three hours per sample. Cloud metrics showed high memory reservations, leading the team to scale the infrastructure vertically:
- Baseline: r6i.8xlarge (256 GB RAM, EBS-backed)
- Scaled: r6i.16xlarge (512 GB RAM, EBS-backed)
On the larger instance, the kernel-level execution data told a different story:
- CPU utilization remained below ~25%
- Peak memory usage stayed well below requested limits
- Disk and network throughput were saturated for large portions of the run
- The STAR aligner frequently stalled waiting on data rather than on compute
Re-benchmarking on alternative instance types better matched to the I/O profile produced:
- r7a.12xlarge: ~33% faster runtime at ~37% lower cost
- r8i.8xlarge: near-baseline runtime at ~61% lower cost
Net results after right-sizing:
- Runtime decreased from 3+ hours to ~2 hours (~30% faster)
- Cost per pipeline dropped by 36-60%, depending on configuration
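The cost arithmetic behind such comparisons is simple: cost per run is the instance's hourly rate times runtime. A sketch with placeholder numbers (the dollar figures are invented for illustration, not actual AWS pricing):

```python
# Placeholder hourly rates -- NOT real AWS prices, purely illustrative.
def cost_per_run(hourly_rate_usd, runtime_hours):
    return hourly_rate_usd * runtime_hours

baseline = cost_per_run(4.00, 3.2)  # e.g. the oversized memory-optimized instance
tuned = cost_per_run(2.50, 2.0)     # e.g. a smaller instance with better I/O

savings = 1 - tuned / baseline
print(f"cost reduction per run: {savings:.0%}")  # prints "cost reduction per run: 61%"
```

The point is that savings compound: a cheaper instance that also finishes sooner is discounted on both factors at once.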
Using Tracer to Diagnose I/O Bottlenecks in Genomics Pipelines: Correlating Pipeline Metadata with Kernel Traces