Parabricks

Overview

This tutorial shows how to use Clara Parabricks v4.5.1 on the Betty cluster to run GPU-accelerated genomics workflows, including alignment and variant calling. We’ll walk through how to prepare sample data, enter the container environment, and run two basic tools: fq2bam and haplotypecaller.

Pre-requisites

You should be comfortable with the NVIDIA Enroot environment and have an NGC API key set up. If you do not, please see the tutorial here.

Step 1: Setup Data

We’ll use the Clara Parabricks sample data to test the workflow. Choose a directory for this project that has about 25 GB of free space.

export PROJECT_DIR=$HOME/parabricks_test
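Before downloading, it may help to confirm that the chosen location really has the space. The sketch below is one way to do that with df; has_free_space is a hypothetical helper written for this tutorial, not part of Parabricks, and it only sees filesystem-level free space, so it will not catch a per-user quota if Betty enforces one.

```shell
# Hypothetical helper (not part of Parabricks): succeeds when DIR has at
# least MIN_GB gigabytes available.
has_free_space() {
  dir=$1
  min_gb=$2
  # df -P prints one line per filesystem; -k reports 1 KiB blocks.
  # Field 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# The project directory may not exist yet, so check its parent.
PROJECT_DIR=${PROJECT_DIR:-$HOME/parabricks_test}
if has_free_space "$(dirname "$PROJECT_DIR")" 25; then
  echo "Enough free space for the sample data."
else
  echo "Less than 25 GB free near $PROJECT_DIR; choose another location." >&2
fi
```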

Now we can create the folder, download the data, and create a directory to store the results.

mkdir -p $PROJECT_DIR
pushd $PROJECT_DIR

# Download sample data
wget -O parabricks_sample.tar.gz \
  "https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"

# Extract the contents
tar xvf parabricks_sample.tar.gz

# Create an output directory for results
mkdir -p outputdir
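Once extraction finishes, a quick check that the files used later in this tutorial are present can save a failed run inside the container. This is a minimal sketch: check_inputs is a hypothetical helper, and it only looks for the three paths that the fq2bam and haplotypecaller commands in this tutorial reference, not for any index files that sit alongside the reference.

```shell
# Hypothetical helper: report any input file from this tutorial that is
# missing or empty under the extracted sample directory.
check_inputs() {
  base=$1
  missing=0
  for rel in \
    Ref/Homo_sapiens_assembly38.fasta \
    Data/sample_1.fq.gz \
    Data/sample_2.fq.gz
  do
    if [ ! -s "$base/$rel" ]; then
      echo "missing or empty: $base/$rel" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

if check_inputs "${PROJECT_DIR:-.}/parabricks_sample"; then
  echo "All sample inputs found."
fi
```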

Step 2: Launch Container

Before entering the container, make sure the sample data from Step 1 is in place under $PROJECT_DIR.

To run Clara Parabricks with full access to your project and home directories, use the following srun command:

pushd $PROJECT_DIR

srun --container-image='nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1' \
     --cpus-per-gpu=16 \
     --mem-per-gpu=128G \
     --gpus=1 \
     --container-mounts=/tmp/$(id -u):/opt/nim/.cache,$PROJECT_DIR:$PROJECT_DIR \
     --container-mount-home \
     --pty bash

Wait for the container image to download and launch. You will then be placed in a bash shell inside the container.

Notes

  • This command requests one B200 GPU on Betty, along with 128 GB of RAM and 16 CPUs.
  • The --container-mounts flag ensures your project and temporary cache directories are available inside the container.
  • --container-mount-home gives you access to your home directory as well.
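After the shell prompt appears, you may want to confirm that the container can see both the Parabricks launcher and the GPU before starting a run. The following is a sketch under the assumption that pbrun and nvidia-smi are on the PATH inside the image; require_cmds is a hypothetical helper, not a Parabricks or Slurm command.

```shell
# Hypothetical helper: print each named command that is not on PATH and
# return non-zero if any is missing.
require_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Inside the container, both tools used in this tutorial should resolve.
if require_cmds pbrun nvidia-smi; then
  nvidia-smi -L   # list the GPUs visible to this job
fi
```

If nvidia-smi -L lists no devices, the job most likely did not receive a GPU, and the pbrun steps that follow are expected to fail.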

Step 3: Align Reads with fq2bam

The fq2bam tool performs alignment, sorting, and duplicate marking from paired-end FASTQ files to a BAM file.

Inside the Parabricks container, with $PROJECT_DIR as the working directory (the relative paths below depend on it):

pbrun fq2bam \
  --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
  --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
  --out-bam outputdir/fq2bam_output.bam

Expected time: ~2 minutes
Output: outputdir/fq2bam_output.bam
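A lightweight way to sanity-check the result is to confirm the file is non-empty and starts with the gzip magic bytes, since BAM files are BGZF-compressed. This is only a rough sketch: looks_like_bam is a hypothetical helper, and it does not validate the BAM contents. If samtools happens to be available in your environment, samtools quickcheck is a more thorough test.

```shell
# Hypothetical helper: succeed if FILE is non-empty and begins with the
# gzip magic bytes 1f 8b, as BGZF-compressed BAM files do.
looks_like_bam() {
  file=$1
  [ -s "$file" ] || return 1
  # Dump the first two bytes as hex and strip the whitespace od adds.
  magic=$(od -An -tx1 -N2 "$file" | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

if looks_like_bam outputdir/fq2bam_output.bam; then
  echo "fq2bam output looks like a compressed BAM file."
else
  echo "fq2bam output is missing, empty, or not BGZF-compressed." >&2
fi
```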

Step 4: Call Variants with haplotypecaller

Use the haplotypecaller tool to generate variant calls (VCF) from the BAM file.

Still inside the container:

pbrun haplotypecaller \
  --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
  --in-bam outputdir/fq2bam_output.bam \
  --out-variants outputdir/variants.vcf

Output: outputdir/variants.vcf
This step is typically even faster than fq2bam.
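To get a first impression of the result, you can count the variant records in the VCF. The sketch below relies only on the VCF convention that header lines start with '#'; count_variants is a hypothetical helper, and the number it prints depends on the sample data, so no expected value is given here.

```shell
# Hypothetical helper: print the number of variant records in a VCF file.
count_variants() {
  # Header lines in a VCF start with '#'; every other line is one record.
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -vc '^#' "$1" || true
}

vcf=outputdir/variants.vcf
if [ -s "$vcf" ]; then
  echo "variant records: $(count_variants "$vcf")"
fi
```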

Summary

You’ve now run an end-to-end GPU-accelerated variant calling pipeline using Clara Parabricks on Betty:

  • Launched the Parabricks container using Slurm (srun)
  • Aligned FASTQ files to a reference genome with fq2bam
  • Called variants with haplotypecaller

For additional tools and pipelines, refer to the official Clara Parabricks documentation.