Parabricks
Overview
This tutorial will guide you through using Clara Parabricks v4.5.1 on the Betty cluster to run GPU-accelerated genomics workflows, including alignment and variant calling. We’ll walk through how to prepare sample data, enter the container environment, and run two basic tools: fq2bam and haplotypecaller.
Pre-requisites
You should be comfortable with the NVIDIA Enroot environment and have an NGC API key set up. If you have not done this yet, please see the tutorial here.
Step 1: Setup Data
We’ll use the Clara Parabricks sample data to test the workflow. Please select a folder with about 25 GB of free space to run this project from.
export PROJECT_DIR=$HOME/parabricks_test
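Before downloading anything, it may help to confirm that the chosen location really has the 25 GB mentioned above. The following is a minimal sketch, not part of the official workflow: has_free_space is a hypothetical helper name, and the check assumes the standard df and awk utilities are available on the login node.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least MIN_GB gigabytes free.
has_free_space() {
    dir="$1"
    min_gb="$2"
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# The project folder may not exist yet, so check its parent directory instead.
if has_free_space "$(dirname "${PROJECT_DIR:-$HOME/parabricks_test}")" 25; then
    echo "Enough free space for the sample data."
else
    echo "Less than 25 GB free; choose a different PROJECT_DIR."
fi
```

If the check fails, point PROJECT_DIR at a location with more room before continuing.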
Now we can create the folder, download the data, and create a directory to store the results.
mkdir -p "$PROJECT_DIR"
pushd "$PROJECT_DIR"
# Download sample data
wget -O parabricks_sample.tar.gz \
"https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"
# Extract the contents
tar xvf parabricks_sample.tar.gz
# Create an output directory for results
mkdir -p outputdir
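Since the later steps read three specific files from the extracted archive, a quick check that they exist can catch an interrupted download early. This is a sketch under the assumption that the archive unpacks to the paths used in Steps 3 and 4; missing_inputs is a hypothetical helper name.

```shell
# Hypothetical helper: print each expected sample file that is missing or empty under DIR.
missing_inputs() {
    for f in \
        parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
        parabricks_sample/Data/sample_1.fq.gz \
        parabricks_sample/Data/sample_2.fq.gz
    do
        # -s is true only for files that exist and are non-empty.
        [ -s "$1/$f" ] || echo "$f"
    done
}

missing=$(missing_inputs "${PROJECT_DIR:-$HOME/parabricks_test}")
if [ -z "$missing" ]; then
    echo "All sample inputs are in place."
else
    echo "Missing or empty inputs:"
    echo "$missing"
fi
```

If anything is listed as missing, re-run the download and extraction commands before launching the container.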
Step 2: Launch Container
From your $PROJECT_DIR, ensure your input data is in place before entering the container.
To run Clara Parabricks with full access to your project and home directories, use the following srun command:
pushd "$PROJECT_DIR"
srun --container-image='nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1' \
--cpus-per-gpu=16 \
--mem-per-gpu=128G \
--gpus=1 \
--container-mounts=/tmp/$(id -u):/opt/nim/.cache,$PROJECT_DIR:$PROJECT_DIR \
--container-mount-home \
--pty bash
Wait for the container image to download and launch. You will then be placed in a bash shell inside the container.
Notes
- This command allocates one B200 GPU on Betty, along with 128 GB of RAM and 16 CPUs.
- The --container-mounts flag ensures your project and temporary cache directories are available inside the container. The --container-mount-home flag gives you access to your home directory as well.
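Once the shell prompt appears, a short sanity check can confirm that the allocated GPU and the pbrun launcher are both visible. This is only a sketch: have is a hypothetical helper, and it assumes nvidia-smi is exposed inside the container, which is typical for GPU jobs but not guaranteed by this tutorial.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

if have nvidia-smi; then
    # -L lists each visible GPU on one line.
    nvidia-smi -L
else
    echo "nvidia-smi not found; the GPU may not be visible in this shell."
fi

if have pbrun; then
    echo "pbrun found at $(command -v pbrun)"
else
    echo "pbrun not found on PATH; you may not be inside the Parabricks container."
fi
```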
Step 3: Align Reads with fq2bam
The fq2bam tool performs alignment, sorting, and duplicate marking from paired-end FASTQ files to a BAM file.
Inside the Parabricks container:
pbrun fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--out-bam outputdir/fq2bam_output.bam
Expected time: ~2 minutes
Output: outputdir/fq2bam_output.bam
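To confirm that the alignment actually produced a usable file, one option is to check that the output is non-empty and begins with the gzip magic bytes (1f 8b), since BAM files are stored in the gzip-compatible BGZF format. This is a lightweight sketch rather than a full validation, and looks_like_bgzf is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if FILE is non-empty and starts with the gzip magic bytes 1f 8b.
looks_like_bgzf() {
    [ -s "$1" ] || return 1
    # Read the first two bytes and render them as lowercase hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

bam=outputdir/fq2bam_output.bam
if looks_like_bgzf "$bam"; then
    echo "$bam exists and looks like a compressed BAM ($(du -h "$bam" | cut -f1))."
else
    echo "$bam is missing, empty, or not BGZF-compressed."
fi
```

A passing check only shows that the file has the right container format; it does not verify the alignments themselves.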
Step 4: Call Variants with haplotypecaller
Use the haplotypecaller tool to generate variant calls (VCF) from the BAM file.
Still inside the container:
pbrun haplotypecaller \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-bam outputdir/fq2bam_output.bam \
--out-variants outputdir/variants.vcf
Output: outputdir/variants.vcf
This step is typically even faster than fq2bam.
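A simple way to inspect the result is to count the variant records in the VCF. In an uncompressed VCF, header lines start with '#' and every other line is one record, so counting the non-header lines gives the number of calls. The sketch below assumes the uncompressed output path used above; count_variants is a hypothetical helper name.

```shell
# Hypothetical helper: count data records (non-header lines) in an uncompressed VCF.
count_variants() {
    # -v inverts the match, -c prints the number of matching lines.
    grep -vc '^#' "$1"
}

vcf=outputdir/variants.vcf
if [ -s "$vcf" ]; then
    echo "$vcf contains $(count_variants "$vcf") variant records."
else
    echo "$vcf is missing or empty."
fi
```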
Summary
You’ve now run a full end-to-end GPU-accelerated variant calling pipeline using Clara Parabricks on Betty:
- Launched the Parabricks container using SLURM
- Aligned FASTQ files to a reference genome with fq2bam
- Called variants with haplotypecaller
For additional tools and pipelines, refer to the official Clara Parabricks documentation.