Genomics teams often rely on static cloud tags (e.g., project, pipeline_run) and service-level metrics to monitor compute costs. These signals provide limited visibility into how workloads behave at the execution layer: there is no default mapping from a pipeline task to an EC2 process, so it is difficult to determine whether slowdowns are caused by application logic or by infrastructure constraints such as disk or network I/O.
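One low-level way to build such a mapping yourself is to inject a run identifier into each task's environment and recover it from /proc. A minimal Python sketch, assuming Linux; the PIPELINE_RUN_ID variable name and the `sleep` stand-in task are illustrative, not part of any real pipeline or of Tracer's API:

```python
import os
import subprocess

# Hypothetical tagging scheme: PIPELINE_RUN_ID is an illustrative variable
# name, and `sleep 30` stands in for a real pipeline task.
RUN_ID = "rnaseq-2024-001"
task = subprocess.Popen(["sleep", "30"],
                        env={**os.environ, "PIPELINE_RUN_ID": RUN_ID})

def pids_for_run(run_id):
    """Scan /proc/<pid>/environ for processes tagged with run_id (Linux only)."""
    tag = f"PIPELINE_RUN_ID={run_id}".encode()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/environ", "rb") as f:
                env_block = f.read()
        except OSError:
            continue  # process exited or environ not readable
        if tag in env_block.split(b"\0"):
            pids.append(int(entry))
    return pids

found = pids_for_run(RUN_ID)
print(task.pid in found)  # True on Linux: the tagged task is recoverable by PID
task.terminate()
task.wait()
```

Because the tag lives in the process environment, any kernel-level trace keyed by PID can then be joined back to the pipeline run that spawned it.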
This guide analyzes a real-world RNA-seq pipeline that was initially assumed to be memory- or compute-bound. By correlating pipeline run identifiers with kernel-level execution data, the team identified a different root cause: sustained disk and network I/O saturation. The pipeline consistently ran for more than three hours per sample. Cloud metrics showed high memory reservations, leading the team to scale the infrastructure vertically:
- Baseline: r6i.8xlarge (256 GB RAM, EBS-backed)
- Scaled: r6i.16xlarge (512 GB RAM, EBS-backed)
On the larger instance, the kernel-level execution data told a different story:
- CPU utilization remained below ~25%
- Peak memory usage stayed well below requested limits
- Disk and network throughput were saturated for large portions of the run
- The STAR aligner frequently stalled waiting on data rather than on compute
Re-benchmarking on alternative instance types better matched to the I/O profile produced:
- r7a.12xlarge: ~33% faster runtime at ~37% lower cost
- r8i.8xlarge: near-baseline runtime at ~61% lower cost
Net results after right-sizing:
- Runtime decreased from 3+ hours to ~2 hours (~30% faster)
- Cost per pipeline dropped by 36-60%, depending on configuration
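The cost arithmetic behind such comparisons is simple: cost per run is the instance's hourly rate times runtime. A sketch with placeholder numbers (the dollar figures are invented for illustration, not actual AWS pricing):

```python
# Placeholder hourly rates -- NOT real AWS prices, purely illustrative.
def cost_per_run(hourly_rate_usd, runtime_hours):
    return hourly_rate_usd * runtime_hours

baseline = cost_per_run(4.00, 3.2)  # e.g. the oversized memory-optimized instance
tuned = cost_per_run(2.50, 2.0)     # e.g. a smaller instance with better I/O

savings = 1 - tuned / baseline
print(f"cost reduction per run: {savings:.0%}")  # prints "cost reduction per run: 61%"
```

The point is that savings compound: a cheaper instance that also finishes sooner is discounted on both factors at once.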
Using Tracer to Diagnose I/O Bottlenecks in Genomics Pipelines: Correlating Pipeline Metadata with Kernel Traces