Quality Control (QC)

This section describes the quality control (QC) steps in the Poppy pipeline. The pipeline uses the Hydra-Genetics qc module (v0.4.1) to generate metrics from sequencing reads and alignment files. FastQC, Mosdepth, Picard, and Samtools each produce their own metrics, which MultiQC then aggregates into a single interactive HTML report.

Quality Control Workflow


Input Files

The QC module relies on outputs generated by the pre-alignment and alignment modules:

| Input | Source |
|---|---|
| alignment/samtools_merge_bam/{sample}_{type}.bam | Alignment module |
| Raw FASTQ files (for FastQC) | Defined in units.tsv |
| prealignment/fastp_pe/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastp.json | Pre-alignment module |
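
The path patterns above use Snakemake-style wildcards in curly braces. As a minimal sketch of how such a pattern resolves to concrete files, the snippet below fills the BAM pattern for a tumour (T) and a normal (N) unit. The sample name `sampleA` is a made-up value and `resolve` is a hypothetical helper, not part of Poppy or Hydra-Genetics.

```python
# Pattern copied from the input table; the wildcard values are hypothetical.
BAM_PATTERN = "alignment/samtools_merge_bam/{sample}_{type}.bam"


def resolve(pattern: str, **wildcards: str) -> str:
    """Fill a Snakemake-style path pattern with concrete wildcard values."""
    return pattern.format(**wildcards)


# One path per unit type for a single, made-up sample.
bam_paths = [resolve(BAM_PATTERN, sample="sampleA", type=t) for t in ("T", "N")]
for path in bam_paths:
    print(path)
```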

Workflow Steps

1. FastQC (Raw Read QC)

FastQC runs on the raw FASTQ sequences to provide basic read quality, adapter content, and sequence composition metrics before trimming.

| Item | Value |
|---|---|
| Container | hydragenetics/fastqc:0.11.9 |
| Output | qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip |
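
Each FastQC zip archive contains a tab-separated summary.txt with one PASS/WARN/FAIL status per analysis module. The sketch below shows one way to read those statuses outside MultiQC. `read_fastqc_summary` is a hypothetical helper, and the in-memory archive and its contents are made up so that the example runs on its own.

```python
import io
import zipfile


def read_fastqc_summary(zip_file) -> dict:
    """Return {module name: status} from summary.txt inside a FastQC zip."""
    with zipfile.ZipFile(zip_file) as archive:
        # FastQC places its files under a single "<name>_fastqc/" folder.
        name = next(n for n in archive.namelist() if n.endswith("/summary.txt"))
        text = archive.read(name).decode("utf-8")
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Columns: status, module name, FASTQ file name.
        status, module = line.split("\t")[:2]
        statuses[module] = status
    return statuses


# Tiny in-memory archive standing in for a real FastQC zip (made-up content).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "demo_fastqc/summary.txt",
        "PASS\tBasic Statistics\tdemo.fastq.gz\n"
        "WARN\tAdapter Content\tdemo.fastq.gz\n",
    )
demo.seek(0)

fastqc_statuses = read_fastqc_summary(demo)
print(fastqc_statuses)
```

For a real run, a path to one of the qc/fastqc/*_fastqc.zip files can be passed instead of the in-memory buffer.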

2. Mosdepth (Coverage & Target QC)

Mosdepth calculates coverage statistics over the regions defined by the design BED file. Poppy passes --mapq 20 (see config.yaml), so reads with a mapping quality below 20 are ignored.

| Item | Value |
|---|---|
| Container | hydragenetics/mosdepth:0.3.2 |
| Outputs | qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt |
| | qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz |
| | qc/mosdepth_bed/{sample}_{type}.regions.bed.gz |
| | qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz |

Note: In Poppy, the thresholds are configured specifically in config.yaml to evaluate depths at 100x, 200x, and 1000x.
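
For each region, the thresholds file reports how many bases reach each depth. The sketch below turns those counts into the fraction of targeted bases at or above 100x, 200x, and 1000x. It assumes the usual mosdepth layout (a `#chrom start end region` header followed by one column per threshold, labelled `100X`, `200X`, `1000X`). `fraction_covered` is a hypothetical helper and the two regions are made-up data.

```python
def fraction_covered(lines, thresholds=("100X", "200X", "1000X")):
    """Fraction of targeted bases at or above each depth threshold."""
    header = None
    total_bases = 0
    covered = {label: 0 for label in thresholds}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            header = fields  # e.g. ["#chrom", "start", "end", "region", "100X", ...]
            continue
        if header is None:
            raise ValueError("thresholds file has no header line")
        row = dict(zip(header, fields))
        total_bases += int(row["end"]) - int(row["start"])
        for label in thresholds:
            covered[label] += int(row[label])
    if total_bases == 0:
        return {label: 0.0 for label in thresholds}
    return {label: covered[label] / total_bases for label in thresholds}


# Two made-up 100 bp regions in the thresholds.bed layout.
demo_lines = [
    "#chrom\tstart\tend\tregion\t100X\t200X\t1000X\n",
    "chr1\t100\t200\texon1\t100\t80\t0\n",
    "chr1\t300\t400\texon2\t100\t20\t0\n",
]
coverage_fractions = fraction_covered(demo_lines)
print(coverage_fractions)
```

For a real file, the same function can be given an open handle, for example the result of `gzip.open(path, "rt")` on a thresholds.bed.gz file.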

3. Picard Metrics (Alignment QC)

Several Picard (v2.25.0) tools are run independently, each assessing a different aspect of the alignment:

  • CollectAlignmentSummaryMetrics: Details mapping rates and error rates.
  • CollectDuplicationMetrics: Measures read duplication levels.
  • CollectGcBiasMetrics: Highlights coverage bias in GC-rich or GC-poor regions.
  • CollectHsMetrics: Hybrid-selection (targeted sequencing) metrics, including on-target rates.
  • CollectInsertSizeMetrics: Calculates the distribution of insert sizes across read pairs.

| Item | Value |
|---|---|
| Container | hydragenetics/picard:2.25.0 |
| Outputs | qc/picard_collect_{metric_tool}/{sample}_{type}.{metric_extension} |
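
Picard metrics files are plain text: comment lines starting with `#`, a `## METRICS CLASS` line, a tab-separated header row, and one or more data rows, optionally followed by a histogram section. The sketch below reads only the metrics table. `parse_picard_metrics` is a hypothetical helper and the values are made up; the two column names are taken from the MultiQC config on this page.

```python
def parse_picard_metrics(lines):
    """Return the rows of the METRICS CLASS table as a list of dicts."""
    rows = []
    header = None
    in_metrics = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue  # still inside the leading comment block
        if not line.strip() or line.startswith("#"):
            break  # a blank line or the next section ends the table
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Made-up excerpt in the layout of an HsMetrics file.
demo_lines = [
    "## htsjdk.samtools.metrics.StringHeader\n",
    "# CollectHsMetrics\n",
    "\n",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n",
    "PCT_SELECTED_BASES\tFOLD_80_BASE_PENALTY\n",
    "0.85\t1.6\n",
    "\n",
    "## HISTOGRAM\tjava.lang.Integer\n",
]
hs_rows = parse_picard_metrics(demo_lines)
print(hs_rows)
```

Values are kept as strings here; converting them to numbers is left to the caller because Picard tables mix numeric and text columns.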

4. Samtools Stats

Samtools stats produces a general summary of alignment statistics for the BAM file.

| Item | Value |
|---|---|
| Output | qc/samtools_stats/{sample}_{type}.samtools-stats.txt |
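
The headline numbers in a samtools stats file sit on lines that start with `SN`, in the form `SN<TAB>name:<TAB>value`. As a minimal sketch, the snippet below collects them into a dictionary and derives the mapped-read percentage. `parse_summary_numbers` is a hypothetical helper and the input lines are made-up data.

```python
def parse_summary_numbers(lines):
    """Collect the 'SN' summary-number lines of samtools stats into a dict."""
    summary = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue  # skip comments and the other sections (FFQ, GCF, ...)
        fields = line.rstrip("\n").split("\t")
        key = fields[1].rstrip(":")
        summary[key] = float(fields[2])  # any trailing "# ..." field is ignored
    return summary


# Made-up excerpt in the samtools stats layout.
demo_lines = [
    "# This file was produced by samtools stats\n",
    "SN\traw total sequences:\t1000\n",
    "SN\treads mapped:\t990\n",
    "SN\terror rate:\t1.5e-03\t# mismatches / bases mapped (cigar)\n",
    "FFQ\t1\t0\t0\n",
]
stats = parse_summary_numbers(demo_lines)
mapped_percent = 100 * stats["reads mapped"] / stats["raw total sequences"]
print(stats, mapped_percent)
```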

5. MultiQC (Report Aggregation)

All generated logs and metrics (including the fastp quality metrics produced earlier by the pre-alignment module) are compiled by MultiQC, which is configured according to config_multiqc.yaml (see below).

| Item | Value |
|---|---|
| Container | hydragenetics/multiqc:1.21 |
| Output | qc/multiqc/multiqc_DNA.html (exported to qc/multiqc_DNA.html by end user) |

Current MultiQC config file (config_multiqc.yaml):

```yaml
sp:
  fastp:
    fn: "*.json"

extra_fn_clean_exts:
  - ".duplication_metrics"

mosdepth_config:
  include_contigs:
    - "chr*"
  exclude_contigs:
    - "*_alt"
    - "*_decoy"
    - "*_random"
    - "chrUn*"
    - "HLA*"
    - "chrM"
    - "chrEBV"
    - "MT"
    - "NC_007605"
    - "GL000*"

  general_stats_coverage:
    - 100
    - 200
    - 1000

table_columns_visible:
  FastQC:
    percent_duplicates: False
    percent_gc: False
    avg_sequence_length: False
    percent_fails: False
    total_sequences: False
  fastp:
    pct_adapter: False
    after_filtering_q30_rate: False
    after_filtering_q30_bases: False
    filtering_result_passed_filter_reads: False
    after_filtering_gc_content: False
    pct_surviving: False
    pct_duplication: False
  mosdepth:
    median_coverage: True
    mean_coverage: False
    1_x_pc: False
    5_x_pc: False
    10_x_pc: False
    20_x_pc: False
    30_x_pc: False
    50_x_pc: False
    100_x_pc: True
    200_x_pc: True
    1000_x_pc: False
  "Picard: HsMetrics":
    FOLD_ENRICHMENT: False
    MEDIAN_TARGET_COVERAGE: False
    PCT_TARGET_BASES_30X: False
  "Picard: InsertSizeMetrics":
    summed_median: False
    summed_mean: True
  "Picard: Mark Duplicates":
    PERCENT_DUPLICATION: True
  "Samtools: stats":
    error_rate: False
    non-primary_alignments: False
    reads_mapped: False
    reads_mapped_percent: True
    reads_properly_paired_percent: True
    reads_MQ0_percent: False
    raw_total_sequences: True #only on bedfile not total of fastq, bases on target only

# Custom columns to general stats
multiqc_cgs:
  "Picard: HsMetrics":
    FOLD_80_BASE_PENALTY:
      title: "Fold80"
      description: "Fold80 penalty from picard hs metrics"
      min: 1
      max: 3
      scale: "RdYlGn-rev"
      format: "{:.1f}"
    PCT_SELECTED_BASES:
      title: "Bases on Target"
      description: "On+Near Bait Bases / PF Bases Aligned from Picard HsMetrics"
      format: "{:.2%}"
    ZERO_CVG_TARGETS_PCT:
      title: "Target bases with zero coverage [%]"
      description: "Target bases with zero coverage [%] from Picard HsMetrics"
      min: 0
      max: 100
      scale: "RdYlGn-rev"
      format: "{:.2%}"
  "Samtools: stats":
    average_quality:
      title: "Average Quality"
      description: "Ratio between the sum of base qualities and total length from Samtools stats"
      min: 0
      max: 60
      scale: "RdYlGn"

table_columns_placement:
  mosdepth:
    median_coverage: 601
    1_x_pc: 666
    5_x_pc: 666
    10_x_pc: 602
    20_x_pc: 603
    30_x_pc: 604
    50_x_pc: 605
    100_x_pc: 606
    200_x_pc: 607
    1000_x_pc: 608
  "Samtools: stats":
    raw_total_sequences: 500
    reads_mapped: 501
    reads_mapped_percent: 502
    reads_properly_paired_percent: 503
    average_quality: 504
    error_rate: 555
    reads_MQ0_percent: 555
    non-primary_alignments: 555
  "Picard: HsMetrics":
    FOLD_ENRICHMENT: 888
    MEDIAN_TARGET_COVERAGE: 888
    PCT_TARGET_BASES_30X: 888
    FOLD_80_BASE_PENALTY: 801
    PCT_SELECTED_BASES: 800
    ZERO_CVG_TARGETS_PCT: 803
  "Picard: InsertSizeMetrics":
    summed_median: 803
    summed_mean: 803
  "Picard: Mark Duplicates":
    PERCENT_DUPLICATION: 802
  Picard:
    TOTAL_READS: 500
    PCT_SELECTED_BASES: 801
    FOLD_80_BASE_PENALTY: 802
    PCT_PF_READS_ALIGNED: 888
    summed_median: 888
    PERCENT_DUPLICATION: 803
    summed_mean: 804
    STANDARD_DEVIATION: 805
    ZERO_CVG_TARGETS_PCT: 888
    MEDIAN_COVERAGE: 888
    MEAN_COVERAGE: 888
    SD_COVERAGE: 888
    PCT_30X: 888
    PCT_TARGET_BASES_30X: 888
    FOLD_ENRICHMENT: 888

```

Key Output Files

| Output File | Description |
|---|---|
| qc/multiqc_DNA.html | The final aggregated pipeline QC HTML report |
| qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt | Summary coverage statistics from Mosdepth |
| qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt | Target enrichment metrics |

Configuration

The tools executed in the Poppy pipeline and the parameters passed to them are defined in config.yaml. The key QC-related parameters are:

```yaml
fastqc:
  container: "docker://hydragenetics/fastqc:0.11.9"

mosdepth_bed:
  container: "docker://hydragenetics/mosdepth:0.3.2"
  thresholds: "100,200,1000"
  extra: " --mapq 20 "

multiqc:
  container: "docker://hydragenetics/multiqc:1.21"
  reports:
    DNA:
      config: "{{POPPY_HOME}}/config/config_multiqc.yaml"
      included_unit_types:
        - T
        - N
      qc_files:
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip"
        - "prealignment/fastp_pe/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastp.json"
        - "qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt"
        - "qc/mosdepth_bed/{sample}_{type}.per-base.bed.gz"
        - "qc/mosdepth_bed/{sample}_{type}.mosdepth.region.dist.txt"
        - "qc/mosdepth_bed/{sample}_{type}.regions.bed.gz"
        - "qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz"
        - "qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt"
        - "qc/picard_collect_alignment_summary_metrics/{sample}_{type}.alignment_summary_metrics.txt"
        - "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"
        - "qc/picard_collect_insert_size_metrics/{sample}_{type}.insert_size_metrics.txt"
        - "qc/picard_collect_gc_bias_metrics/{sample}_{type}.gc_bias.summary_metrics"
        - "qc/samtools_stats/{sample}_{type}.samtools-stats.txt"

picard_collect_alignment_summary_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_duplication_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_gc_bias_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_hs_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_insert_size_metrics:
  container: "docker://hydragenetics/picard:2.25.0"
```

For the comprehensive configuration of Hydra-Genetics QC tools, see the full config.yaml.
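
To make the mosdepth_bed settings more concrete, the sketch below shows one way the thresholds and extra values could end up on a mosdepth command line. This is an illustration only: how the Hydra-Genetics rule actually assembles its command is not shown on this page. `build_mosdepth_command`, the BED file name `design.bed`, and the sample name are hypothetical. The options `--by`, `--thresholds`, and `--mapq` are standard mosdepth flags.

```python
import shlex

# Values copied from the mosdepth_bed block of config.yaml.
mosdepth_config = {"thresholds": "100,200,1000", "extra": " --mapq 20 "}


def build_mosdepth_command(config, bed, bam, prefix):
    """Assemble a mosdepth argument list from the config values (illustrative)."""
    command = ["mosdepth", "--by", bed, "--thresholds", config["thresholds"]]
    command += shlex.split(config.get("extra", ""))  # e.g. ["--mapq", "20"]
    command += [prefix, bam]  # mosdepth expects <prefix> then <BAM-or-CRAM>
    return command


mosdepth_command = build_mosdepth_command(
    mosdepth_config,
    bed="design.bed",  # hypothetical design BED path
    bam="alignment/samtools_merge_bam/sampleA_T.bam",  # hypothetical sample
    prefix="qc/mosdepth_bed/sampleA_T",
)
print(shlex.join(mosdepth_command))
```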