poppy-light

Poppy Pipeline Overview

Poppy uses Hydra-Genetics modules to analyse hybrid capture short-read sequencing data from the Genomic Medicine Sweden myeloid gene panels.

Poppy Pipeline Overview

Hydra-Genetics Module Versions

Module Version
alignment v0.5.1
annotation v1.0.0
cnv_sv v3.1.0
filtering v0.3.0
prealignment v1.2.0
reports v1.1.1
snv_indels_gms v0.6.0
qc v0.4.1

Alignment

Description: Align reads to the reference genome using BWA‑MEM (Docker image hydragenetics/bwa_mem:0.7.17) and marks duplicates using Picard's MarkDuplicates (Docker image hydragenetics/picard:2.25.0). Key Outputs

Output File Description
alignment/samtools_merge_bam/{sample}_{type}.bam Merged, sorted, duplicate marked BAM
alignment/bam_index/{sample}_{type}.bam.bai BAM index

Annotation

Description: Annotate VCF files with VEP (Docker image hydragenetics/vep:111.0) and/or custom annotations such as artifact and background annotation from the reference pipeline. Key Outputs

Output File Description
snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.vcf.gz VEP‑annotated VCF
snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.background_annotated.vcf.gz VCF annotated with both artifact and background annotations from the reference pipeline

CNV / SV (cnv_sv)

Description: Detect copy‑number and structural variants using CNVkit and GATK. SVDB is used to merge and annotate the SV calls. Pindel is also used on a smaller set of genes to detect SVs. Key Outputs

Output File Description
cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cns CNVkit segmentation
cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_genes.filter.cnv_hard_filter.vcf.gz Filter‑hard‑filtered SV VCF
cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vcf.gz Normalized Pindel VCF

Filtering

Description: Apply hard/soft filters to germline and somatic VCFs (config files under config/). Key Outputs

Output File Description
filter_vcf/{sample}_{type}.filter.germline.vcf.gz Germline filtered VCF
filter_vcf/{sample}_{type}.filter.somatic.vcf.gz Somatic filtered VCF

Pre‑alignment

Description: Perform initial QC and trimming (FastQC, Fastp). Key Outputs

Output File Description
prealignment/fastp_pe/{sample}_{type}_fastp.json Fastp QC JSON
qc/fastqc/{sample}_{type}_fastqc.zip FastQC report archive

Reports

Description: Generate multi‑QC and HTML reports (MultiQC, custom HTML). Key Outputs

Output File Description
report/multiqc/multiqc_report.html Consolidated MultiQC report
report/html/report.html Final HTML pipeline report

SNV / Indels (snv_indels_gms)

Description: Call and annotate SNVs/indels using GATK Mutect2, VarDict, and VEP. Key Outputs

Output File Description
snv_indels/gatk_mutect2/{sample}_{type}.normalized.sorted.vep_annotated.filter.germline.bcftools_annotated.vcf.gz Final germline VCF
snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.vcf.gz Ensemble VCF (pre‑filter)

QC (qc)

Description: Comprehensive quality control using FastQC, Mosdepth, Picard, and MultiQC. Key Outputs

Output File Description
qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt Mosdepth summary
qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt Hybrid selection metrics
qc/multiqc_report.html MultiQC HTML report