Alignment

This section describes the alignment steps in the Poppy pipeline. It takes the trimmed and merged FASTQ files produced by the pre‑alignment module and produces a single, sorted, duplicate‑marked BAM file per sample that is used by downstream modules (SNV/indel calling, CNV/SV detection, and QC).

All alignment rules are provided by the Hydra‑Genetics alignment module (v0.5.1).

Alignment Workflow

Input Files

The alignment module takes the trimmed and merged FASTQ files produced by the pre‑alignment module as input.

Input File	Source
`prealignment/merged/{sample}_{type}_fastq1.fastq.gz`	Pre‑alignment module
`prealignment/merged/{sample}_{type}_fastq2.fastq.gz`	Pre‑alignment module

Workflow Steps

1. BWA‑MEM — Read Alignment

Each FASTQ pair (one per flowcell / lane / barcode) is aligned to the reference genome using BWA‑MEM.

Item	Value
Container	`hydragenetics/bwa_mem:0.7.17`
Input	`prealignment/merged/{sample}_{type}_{read}.fastq.gz`
Output	`alignment/bwa_mem/{sample}_{type}_{flowcell}_{lane}_{barcode}.bam`

2. Samtools Merge (per‑lane) — Merge Lane BAMs

When a sample has been sequenced across multiple flowcells or lanes, the per‑lane BAM files are merged into a single unsorted BAM with samtools merge.

Item	Value
Output	`alignment/bwa_mem/{sample}_{type}.bam` (after sorting)

3. Samtools Extract Reads — Split by Chromosome

The merged BAM is split into per‑chromosome BAM files. This allows duplicate marking to run in parallel for each chromosome, significantly reducing wall‑clock time.

Item	Value
Output	`alignment/samtools_extract_reads/{sample}_{type}_{chr}.bam`

4. Picard MarkDuplicates — Duplicate Marking

Duplicate reads are flagged independently per chromosome using Picard MarkDuplicates.

Item	Value
Container	`hydragenetics/picard:2.25.0`
Output	`alignment/picard_mark_duplicates/{sample}_{type}_{chr}.bam`

5. Samtools Merge — Combine Chromosomes

The per‑chromosome, duplicate‑marked BAM files are merged back into a single BAM.

Item	Value
Output	`alignment/samtools_merge_bam/{sample}_{type}.bam` (after sorting)

6. Samtools Index — BAM Indexing

The final merged BAM is indexed so that it can be efficiently queried by downstream tools.

Item	Value
Output	`alignment/samtools_merge_bam/{sample}_{type}.bam.bai`

Key Output Files

Output File	Description
`alignment/samtools_merge_bam/{sample}_{type}.bam`	Merged, sorted, duplicate‑marked BAM
`alignment/samtools_merge_bam/{sample}_{type}.bam.bai`	BAM index

Downstream Consumers

The final BAM and its index are copied into the results/bam/ output directory as final pipeline outputs:

bam/{sample}_{type}.bam — Final merged BAM
bam/{sample}_{type}.bam.bai — BAM index

They are also used by multiple downstream modules:

SNV / Indels — GATK Mutect2, VarDict
CNV / SV — CNVkit, GATK CNV, Pindel
QC — Mosdepth, Picard CollectHsMetrics, samtools stats, and others

Configuration

The relevant sections in config.yaml:

bwa_mem:
  amb: "/path/to/reference.amb"
  ann: "/path/to/reference.ann"
  bwt: "/path/to/reference.bwt"
  pac: "/path/to/reference.pac"
  sa: "/path/to/reference.sa"
  container: "docker://hydragenetics/bwa_mem:0.7.17"

picard_mark_duplicates:
  container: "docker://hydragenetics/picard:2.25.0"

See the full config.yaml for all available settings.