• Editor, January 8, 2026
    Genomics teams often rely on static cloud tags (e.g., project, pipeline_run) and service-level metrics to monitor compute costs. These signals provide limited visibility into how workloads behave at the execution layer. Because there is no default mapping from a pipeline task to the processes running on an EC2 instance, it is difficult to determine whether slowdowns are caused by application logic or by infrastructure constraints such as disk or network I/O.

    This guide analyzes a real-world RNA-seq pipeline that was initially assumed to be memory- or compute-bound. By correlating pipeline run identifiers with kernel-level execution data, the team identified a different root cause: sustained disk and network I/O saturation. The pipeline consistently ran for more than three hours per sample. Cloud metrics showed high memory reservations, leading the team to vertically scale the infrastructure:
    • Baseline: r6i.8xlarge (256 GB RAM, EBS-backed)
    • Scaled: r6i.16xlarge (512 GB RAM, EBS-backed)
    Costs doubled, but runtime did not improve. By correlating the specific pipeline_run ID with kernel-level traces using Tracer (https://www.tracer.cloud), the team found the following:
    • CPU utilization remained below ~25%
    • Peak memory usage stayed well below requested limits
    • Disk and network throughput were saturated for large portions of the run
    • The STAR aligner frequently stalled waiting for data instead of computing
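    The observations above amount to a simple rule: a resource is a likely bottleneck when it sits near capacity for a large share of the run. A minimal sketch of that rule follows. The Sample type, the 0.9 saturation threshold, and the 50% share cutoff are illustrative assumptions, not values from the team's analysis or from Tracer.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One utilization sample; each field is a fraction of capacity in [0, 1]."""
    cpu: float
    mem: float
    disk: float
    net: float


def saturated_resources(samples, saturated=0.9, min_share=0.5):
    """Return the resources that were at or above `saturated` utilization
    in at least `min_share` of the samples (hypothetical thresholds)."""
    if not samples:
        return []
    result = []
    for name in ("cpu", "mem", "disk", "net"):
        hits = sum(1 for s in samples if getattr(s, name) >= saturated)
        if hits / len(samples) >= min_share:
            result.append(name)
    return result


# Made-up samples shaped like the run described above:
# low CPU, moderate memory, disk and network near capacity most of the time.
example = [
    Sample(cpu=0.20, mem=0.30, disk=0.95, net=0.92),
    Sample(cpu=0.22, mem=0.31, disk=0.97, net=0.94),
    Sample(cpu=0.18, mem=0.29, disk=0.96, net=0.91),
    Sample(cpu=0.24, mem=0.30, disk=0.40, net=0.35),
]
bottlenecks = saturated_resources(example)
```

    With these made-up samples, only disk and net are flagged. On a run like that, adding RAM or cores would not be expected to shorten runtime.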
    Because the bottleneck was I/O rather than memory capacity, the team evaluated newer memory-optimized instance families with improved CPU generation, memory bandwidth, and network characteristics:
    • r7a.12xlarge: ~33% faster runtime at ~37% lower cost
    • r8i.8xlarge: near-baseline runtime at ~61% lower cost
    After the change:
    • Runtime decreased from 3+ hours to ~2 hours (~30% faster)
    • Cost per pipeline dropped by 36-60%, depending on configuration
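    The arithmetic behind these comparisons is cost per run = hourly rate × runtime. The sketch below uses relative price units (1.0 for the baseline instance, 2.0 for the instance twice its size) as an assumption. It does not use real AWS prices, and the function names are illustrative.

```python
def cost_per_run(hourly_rate, runtime_hours):
    """Compute cost of one pipeline run, ignoring storage and data transfer."""
    return hourly_rate * runtime_hours


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Vertical scaling: the rate doubles and the runtime stays the same,
# so cost per run doubles.
baseline = cost_per_run(hourly_rate=1.0, runtime_hours=3.0)
scaled = cost_per_run(hourly_rate=2.0, runtime_hours=3.0)
scaling_change = percent_change(baseline, scaled)

# Going from 3 hours to 2 hours is roughly a one-third reduction in runtime.
runtime_change = percent_change(3.0, 2.0)
```

    Because cost is a product of the two factors, a faster instance lowers cost per run only if its hourly rate rises by less than its runtime falls.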
    Effective optimization needs more signals than static tags and service-level metrics provide. Tracing execution from pipeline-level identifiers down to kernel behavior helps teams avoid paying for unused resources and select infrastructure that matches how workloads actually run.
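    To make the idea of linking a run identifier to kernel-level counters concrete, here is a rough Linux-only sketch that polls /proc. It is not how Tracer works. It assumes, hypothetically, that the pipeline exports the run ID to its tasks in an environment variable named PIPELINE_RUN. It covers only per-process disk counters from /proc/<pid>/io, not network traffic.

```python
import os


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE contents of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def parse_proc_io(text: str) -> dict:
    """Parse /proc/<pid>/io ("name: value" per line) into {name: int}."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def io_for_run(run_id, var="PIPELINE_RUN", proc="/proc"):
    """Return {pid: io_counters} for processes tagged with `run_id`.

    `var` is a hypothetical variable name; use whatever your pipeline sets.
    """
    results = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "environ"), "rb") as f:
                env = parse_environ(f.read())
            if env.get(var) != run_id:
                continue
            with open(os.path.join(proc, entry, "io")) as f:
                results[int(entry)] = parse_proc_io(f.read())
        except OSError:
            continue  # process exited or is not readable by this user
    return results
```

    Calling io_for_run("run-42") periodically and taking differences of read_bytes and write_bytes gives a per-run view of disk throughput. Reading other users' processes usually requires elevated privileges.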

Discussion forums: Using Tracer to Diagnose I/O Bottlenecks in Genomics Pipelines: Correlating Pipeline Metadata with Kernel Traces [Promoted]


© 1998-2025 Scilico, LLC. All rights reserved.