Quality Control (QC)
This section describes the quality control (QC) steps in the Poppy pipeline. The pipeline uses the Hydra-Genetics qc module (v0.4.1) to generate metrics from the sequencing reads and the alignment files. FastQC, Mosdepth, Picard, and Samtools each produce their own metrics, which MultiQC then aggregates into a single interactive HTML report.
Input Files
The QC module relies on outputs generated by the pre-alignment and alignment modules:
| Input | Source |
|---|---|
| alignment/samtools_merge_bam/{sample}_{type}.bam | Alignment module |
| Raw FASTQ files (for FastQC) | Defined in units.tsv |
| prealignment/fastp_pe/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastp.json | Pre-alignment module |
Workflow Steps
1. FastQC (Raw Read QC)
FastQC runs on the raw FASTQ sequences to provide basic read quality, adapter content, and sequence composition metrics before trimming.
| Item | Value |
|---|---|
| Container | hydragenetics/fastqc:0.11.9 |
| Output | qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip |
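The wildcards in the output pattern are filled in once per FASTQ file. As a minimal sketch of how one such path resolves, the following uses plain Python string formatting. The wildcard values, including the `fastq1`/`fastq2` read labels, are hypothetical placeholders; in a real run they come from units.tsv.

```python
# Output pattern copied from the FastQC table above.
FASTQC_PATTERN = (
    "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip"
)


def fastqc_output(unit: dict, read: str) -> str:
    """Fill the wildcards of the FastQC output pattern for one FASTQ file."""
    return FASTQC_PATTERN.format(read=read, **unit)


# Hypothetical unit; real values are read from units.tsv.
unit = {
    "sample": "sampleA",
    "type": "T",
    "flowcell": "HXXXXXXXX",
    "lane": "L001",
    "barcode": "ACGTACGT+TGCATGCA",
}

# Assumed read labels, one per FASTQ file of the pair.
for read in ("fastq1", "fastq2"):
    print(fastqc_output(unit, read))
```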
2. Mosdepth (Coverage & Target QC)
Mosdepth calculates coverage statistics within the regions defined by the design BED file. In Poppy it is run with `--mapq 20`, so reads with a mapping quality below 20 are ignored.
| Item | Value |
|---|---|
| Container | hydragenetics/mosdepth:0.3.2 |
| Outputs | qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt<br>qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz<br>qc/mosdepth_bed/{sample}_{type}.regions.bed.gz<br>qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz |
Note: In Poppy, the thresholds are configured specifically in config.yaml to evaluate depths at 100x, 200x, and 1000x.
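To make the thresholds output concrete, here is a minimal sketch that turns a thresholds file into the fraction of target bases reaching each depth. It assumes the tab-separated layout mosdepth writes for `--thresholds` (a `#chrom start end region 100X 200X 1000X` header followed by per-region base counts). The function names and the example rows are made up for illustration.

```python
import gzip
from typing import Dict, Iterable


def fraction_at_thresholds(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of target bases that reach each depth threshold."""
    labels: list = []
    covered: list = []
    total_bases = 0
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            # Header: columns after 'region' are the threshold labels.
            labels = fields[4:]
            covered = [0] * len(labels)
            continue
        if not labels:
            raise ValueError("no '#chrom ...' header seen before the first data row")
        # Region length is end minus start (BED coordinates are half-open).
        total_bases += int(fields[2]) - int(fields[1])
        for i, count in enumerate(fields[4:]):
            covered[i] += int(count)
    if total_bases == 0:
        return {label: 0.0 for label in labels}
    return {label: n / total_bases for label, n in zip(labels, covered)}


def fraction_from_file(path: str) -> Dict[str, float]:
    """Convenience wrapper for a gzip-compressed thresholds file."""
    with gzip.open(path, "rt") as handle:
        return fraction_at_thresholds(handle)


# Made-up rows: two 100 bp regions.
example = [
    "#chrom\tstart\tend\tregion\t100X\t200X\t1000X\n",
    "chr1\t100\t200\texon1\t100\t80\t0\n",
    "chr1\t300\t400\texon2\t50\t20\t0\n",
]
print(fraction_at_thresholds(example))
# {'100X': 0.75, '200X': 0.5, '1000X': 0.0}
```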
3. Picard Metrics (Alignment QC)
Several Picard (v2.25.0) tools are executed simultaneously to assess different alignment statistics:
- CollectAlignmentSummaryMetrics: Reports mapping rates and error rates.
- CollectDuplicationMetrics: Measures the level of read duplication.
- CollectGcBiasMetrics: Highlights coverage bias in GC-rich or GC-poor regions.
- CollectHsMetrics: Reports metrics specific to hybrid selection (targeted sequencing), including on-target rates.
- CollectInsertSizeMetrics: Calculates the distribution of insert sizes across read pairs.
| Item | Value |
|---|---|
| Container | hydragenetics/picard:2.25.0 |
| Outputs | qc/picard_collect_{metric_tool}/{sample}_{type}.{metric_extension} |
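Picard metrics files share a common text layout: comment lines, a `## METRICS CLASS` line, a tab-separated header, and then the data rows up to the next blank line. The sketch below reads that table into dictionaries. The function name is hypothetical, and the example is a heavily trimmed, made-up HsMetrics file that keeps only three of the columns named in the MultiQC configuration.

```python
from typing import Dict, Iterable, List


def read_picard_metrics(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows of the metrics table as dicts keyed by column name."""
    rows: List[Dict[str, str]] = []
    header = None
    in_metrics = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue  # still in the leading comment block
        if not line.strip():
            break  # a blank line ends the metrics table
        if header is None:
            header = line.split("\t")
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Trimmed, made-up example; a real file has many more columns.
example = [
    "## htsjdk.samtools.metrics.StringHeader\n",
    "# CollectHsMetrics (command line elided)\n",
    "\n",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n",
    "PCT_SELECTED_BASES\tFOLD_80_BASE_PENALTY\tZERO_CVG_TARGETS_PCT\n",
    "0.85\t1.6\t0.01\n",
    "\n",
    "## HISTOGRAM\tjava.lang.Integer\n",
]

metrics = read_picard_metrics(example)
print(metrics[0]["PCT_SELECTED_BASES"])
```

Values are returned as strings, so the caller decides which columns to convert to numbers.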
4. Samtools Stats
Samtools stats produces a general statistics summary of the aligned BAM file.
| Item | Value |
|---|---|
| Output | qc/samtools_stats/{sample}_{type}.samtools-stats.txt |
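The summary numbers in a samtools stats file are the lines that start with `SN`, each holding a label and a value separated by tabs. As a rough sketch, the following collects them into a dictionary and derives a mapped-read percentage. The function name and the example values are made up, and the percentage shown in the MultiQC report may be calculated differently.

```python
from typing import Dict, Iterable, Union


def read_summary_numbers(lines: Iterable[str]) -> Dict[str, Union[float, str]]:
    """Collect the 'SN' (summary numbers) lines of a samtools stats file."""
    summary: Dict[str, Union[float, str]] = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        key = fields[1].rstrip(":")
        try:
            summary[key] = float(fields[2])
        except ValueError:
            summary[key] = fields[2]  # keep non-numeric values as text
    return summary


# Made-up excerpt; trailing '# ...' comments are ignored by the parser.
example = [
    "# This file was produced by samtools stats\n",
    "SN\traw total sequences:\t2000\n",
    "SN\treads mapped:\t1900\n",
    "SN\terror rate:\t1.0e-03\t# mismatches / bases mapped (cigar)\n",
    "FFQ\t1\t0\t0\n",
]

summary = read_summary_numbers(example)
mapped_pct = 100 * summary["reads mapped"] / summary["raw total sequences"]
print(f"{mapped_pct:.1f}% of reads mapped")
```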
5. MultiQC (Report Aggregation)
MultiQC compiles all generated logs and metrics, including the fastp quality metrics produced during pre-alignment, into one report. It is configured by the settings in config_multiqc.yaml (see below).
| Item | Value |
|---|---|
| Container | hydragenetics/multiqc:1.21 |
| Output | qc/multiqc/multiqc_DNA.html (exported to qc/multiqc_DNA.html by end user) |
The current MultiQC config file:
```yaml
sp:
  fastp:
    fn: "*.json"
extra_fn_clean_exts:
  - ".duplication_metrics"
mosdepth_config:
  include_contigs:
    - "chr*"
  exclude_contigs:
    - "*_alt"
    - "*_decoy"
    - "*_random"
    - "chrUn*"
    - "HLA*"
    - "chrM"
    - "chrEBV"
    - "MT"
    - "NC_007605"
    - "GL000*"
  general_stats_coverage:
    - 100
    - 200
    - 1000
table_columns_visible:
  FastQC:
    percent_duplicates: False
    percent_gc: False
    avg_sequence_length: False
    percent_fails: False
    total_sequences: False
  fastp:
    pct_adapter: False
    after_filtering_q30_rate: False
    after_filtering_q30_bases: False
    filtering_result_passed_filter_reads: False
    after_filtering_gc_content: False
    pct_surviving: False
    pct_duplication: False
  mosdepth:
    median_coverage: True
    mean_coverage: False
    1_x_pc: False
    5_x_pc: False
    10_x_pc: False
    20_x_pc: False
    30_x_pc: False
    50_x_pc: False
    100_x_pc: True
    200_x_pc: True
    1000_x_pc: False
  "Picard: HsMetrics":
    FOLD_ENRICHMENT: False
    MEDIAN_TARGET_COVERAGE: False
    PCT_TARGET_BASES_30X: False
  "Picard: InsertSizeMetrics":
    summed_median: False
    summed_mean: True
  "Picard: Mark Duplicates":
    PERCENT_DUPLICATION: True
  "Samtools: stats":
    error_rate: False
    non-primary_alignments: False
    reads_mapped: False
    reads_mapped_percent: True
    reads_properly_paired_percent: True
    reads_MQ0_percent: False
    raw_total_sequences: True # only on bedfile, not total of fastq; bases on target only
# Custom columns to general stats
multiqc_cgs:
  "Picard: HsMetrics":
    FOLD_80_BASE_PENALTY:
      title: "Fold80"
      description: "Fold80 penalty from picard hs metrics"
      min: 1
      max: 3
      scale: "RdYlGn-rev"
      format: "{:.1f}"
    PCT_SELECTED_BASES:
      title: "Bases on Target"
      description: "On+Near Bait Bases / PF Bases Aligned from Picard HsMetrics"
      format: "{:.2%}"
    ZERO_CVG_TARGETS_PCT:
      title: "Target bases with zero coverage [%]"
      description: "Target bases with zero coverage [%] from Picard HsMetrics"
      min: 0
      max: 100
      scale: "RdYlGn-rev"
      format: "{:.2%}"
  "Samtools: stats":
    average_quality:
      title: "Average Quality"
      description: "Ratio between the sum of base qualities and total length from Samtools stats"
      min: 0
      max: 60
      scale: "RdYlGn"
table_columns_placement:
  mosdepth:
    median_coverage: 601
    1_x_pc: 666
    5_x_pc: 666
    10_x_pc: 602
    20_x_pc: 603
    30_x_pc: 604
    50_x_pc: 605
    100_x_pc: 606
    200_x_pc: 607
    1000_x_pc: 608
  "Samtools: stats":
    raw_total_sequences: 500
    reads_mapped: 501
    reads_mapped_percent: 502
    reads_properly_paired_percent: 503
    average_quality: 504
    error_rate: 555
    reads_MQ0_percent: 555
    non-primary_alignments: 555
  "Picard: HsMetrics":
    FOLD_ENRICHMENT: 888
    MEDIAN_TARGET_COVERAGE: 888
    PCT_TARGET_BASES_30X: 888
    FOLD_80_BASE_PENALTY: 801
    PCT_SELECTED_BASES: 800
    ZERO_CVG_TARGETS_PCT: 803
  "Picard: InsertSizeMetrics":
    summed_median: 803
    summed_mean: 803
  "Picard: Mark Duplicates":
    PERCENT_DUPLICATION: 802
  Picard:
    TOTAL_READS: 500
    PCT_SELECTED_BASES: 801
    FOLD_80_BASE_PENALTY: 802
    PCT_PF_READS_ALIGNED: 888
    summed_median: 888
    PERCENT_DUPLICATION: 803
    summed_mean: 804
    STANDARD_DEVIATION: 805
    ZERO_CVG_TARGETS_PCT: 888
    MEDIAN_COVERAGE: 888
    MEAN_COVERAGE: 888
    SD_COVERAGE: 888
    PCT_30X: 888
    PCT_TARGET_BASES_30X: 888
    FOLD_ENRICHMENT: 888
```
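The `table_columns_placement` values act as sort weights for the General Statistics table. The sketch below shows the resulting column order for a subset of the entries above, on the assumption that MultiQC places columns with lower values further to the left. The helper name is hypothetical, and MultiQC's own ordering logic (for example how it breaks ties between equal weights) is not reproduced here.

```python
from typing import Dict, List

# Subset of table_columns_placement, limited to columns marked visible above.
placement: Dict[str, Dict[str, int]] = {
    "mosdepth": {"median_coverage": 601, "100_x_pc": 606, "200_x_pc": 607},
    "Samtools: stats": {
        "raw_total_sequences": 500,
        "reads_mapped_percent": 502,
        "reads_properly_paired_percent": 503,
        "average_quality": 504,
    },
    "Picard: HsMetrics": {
        "PCT_SELECTED_BASES": 800,
        "FOLD_80_BASE_PENALTY": 801,
        "ZERO_CVG_TARGETS_PCT": 803,
    },
    "Picard: Mark Duplicates": {"PERCENT_DUPLICATION": 802},
}


def column_order(placement: Dict[str, Dict[str, int]]) -> List[str]:
    """List 'module / column' names sorted by ascending placement weight."""
    weighted = [
        (weight, module, column)
        for module, columns in placement.items()
        for column, weight in columns.items()
    ]
    return [f"{module} / {column}" for _, module, column in sorted(weighted)]


for name in column_order(placement):
    print(name)
```

Under that assumption, the Samtools read counts come first, followed by the mosdepth coverage columns and then the Picard metrics.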
Key Output Files
| Output File | Description |
|---|---|
| qc/multiqc_DNA.html | The final aggregated pipeline QC HTML report |
| qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt | Summary coverage statistics for Mosdepth |
| qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt | Target enrichment metrics |
Configuration
The tools that are run and the parameters passed to them in the Poppy pipeline are defined in config.yaml. The key QC-related parameters are:
```yaml
fastqc:
  container: "docker://hydragenetics/fastqc:0.11.9"

mosdepth_bed:
  container: "docker://hydragenetics/mosdepth:0.3.2"
  thresholds: "100,200,1000"
  extra: " --mapq 20 "

multiqc:
  container: "docker://hydragenetics/multiqc:1.21"
  reports:
    DNA:
      config: "{{POPPY_HOME}}/config/config_multiqc.yaml"
      included_unit_types:
        - T
        - N
      qc_files:
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip"
        - "prealignment/fastp_pe/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastp.json"
        - "qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt"
        - "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz"
        - "qc/mosdepth_bed/{sample}_{type}.mosdepth.region.dist.txt"
        - "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz"
        - "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz"
        - "qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt"
        - "qc/picard_collect_alignment_summary_metrics/{sample}_{type}.alignment_summary_metrics.txt"
        - "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"
        - "qc/picard_collect_insert_size_metrics/{sample}_{type}.insert_size_metrics.txt"
        - "qc/picard_collect_gc_bias_metrics/{sample}_{type}.gc_bias.summary_metrics"
        - "qc/samtools_stats/{sample}_{type}.samtools-stats.txt"

picard_collect_alignment_summary_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_duplication_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_gc_bias_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_hs_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_insert_size_metrics:
  container: "docker://hydragenetics/picard:2.25.0"
```
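The depth thresholds appear in two places: `thresholds` under `mosdepth_bed` in config.yaml and `general_stats_coverage` in config_multiqc.yaml. The sketch below checks that the two lists name the same depths. The dictionaries stand in for the parsed YAML files and the helper names are hypothetical. This is a convenience check under those assumptions, not something either tool is documented here as enforcing.

```python
from typing import List


def parse_thresholds(value: str) -> List[int]:
    """Turn a comma-separated thresholds string such as '100,200,1000' into ints."""
    return [int(part) for part in value.split(",") if part.strip()]


def unmatched_depths(config: dict, multiqc_config: dict) -> List[int]:
    """Return depths listed for the report but absent from the mosdepth thresholds."""
    computed = set(parse_thresholds(config["mosdepth_bed"]["thresholds"]))
    reported = multiqc_config["mosdepth_config"]["general_stats_coverage"]
    return [depth for depth in reported if depth not in computed]


# Stand-ins for the parsed config.yaml and config_multiqc.yaml.
config = {"mosdepth_bed": {"thresholds": "100,200,1000"}}
multiqc_config = {"mosdepth_config": {"general_stats_coverage": [100, 200, 1000]}}

print(unmatched_depths(config, multiqc_config))  # [] when the two lists agree
```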
For the comprehensive configuration of Hydra-Genetics QC tools, see the full config.yaml.