Read quality report

Generate a summary HTML report with quality statistics of sequenced paired-end reads, including:

  • Overall sequence quality
  • Per base sequence quality
  • GC content
  • Sequence length
  • Sequence duplication
  • Adapter content

The pipeline uses the FastQC utility to generate a quality report for each sample. The individual reports are then aggregated into the summary HTML report using custom scripts.
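
The aggregation step can be illustrated with a minimal Python sketch. It is not the pipeline's actual custom script. It assumes only the tab-separated `summary.txt` file that FastQC writes for each input (status, module name, file name); the function names `parse_fastqc_summary` and `render_summary_table` are hypothetical.

```python
from html import escape


def parse_fastqc_summary(text):
    """Parse the contents of a FastQC summary.txt into {module: status}."""
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is: STATUS <tab> module name <tab> input file name
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def render_summary_table(per_sample):
    """Render {sample: {module: status}} as a minimal HTML table."""
    # Collect module names in first-seen order to use as column headers.
    modules = []
    for statuses in per_sample.values():
        for module in statuses:
            if module not in modules:
                modules.append(module)

    header = "".join(f"<th>{escape(m)}</th>" for m in modules)
    rows = []
    for sample, statuses in per_sample.items():
        cells = "".join(
            f"<td>{escape(statuses.get(m, 'NA'))}</td>" for m in modules
        )
        rows.append(f"<tr><td>{escape(sample)}</td>{cells}</tr>")
    return f"<table><tr><th>Sample</th>{header}</tr>{''.join(rows)}</table>"
```

Each row of the resulting table corresponds to one FastQC report, and each column to one FastQC module, so failing modules can be spotted across samples at a glance.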

Purpose

  • Quickly assess the quality of a sequencing run
  • Identify sequencing artefacts that could cause problems in downstream analysis
  • Provide the information needed to properly configure downstream read trimming

Required inputs

  • Sequenced paired-end reads from an Illumina sequencer in gzipped FASTQ format.
    • each sample is represented by two gzipped FASTQ files (R1 and R2)
    • these are the standard output files of paired-end sequencing
|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
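
Before starting the pipeline, it can be useful to confirm that every sample really has both read files. The following is a minimal sketch of such a check, assuming the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming shown above; the helper `find_unpaired` is hypothetical and is not part of the pipeline.

```python
import re
from collections import defaultdict

# Matches <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_FILE = re.compile(r"^(?P<sample>.+)_(?P<mate>R[12])\.fastq\.gz$")


def find_unpaired(filenames):
    """Return sorted sample names that lack either the R1 or the R2 file."""
    mates = defaultdict(set)
    for name in filenames:
        match = READ_FILE.match(name)
        if match:
            mates[match.group("sample")].add(match.group("mate"))
    return sorted(s for s, found in mates.items() if found != {"R1", "R2"})


if __name__ == "__main__":
    from pathlib import Path

    # Directory name taken from the layout above.
    reads_dir = Path("reads/original")
    if reads_dir.is_dir():
        print(find_unpaired(p.name for p in reads_dir.iterdir()))
```

An empty list means that all detected samples are complete pairs.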

Generated outputs

  • Summary HTML table with quality statistics of sequenced reads of multiple samples
  • Individual FastQC reports

Example

How to run the example:

cd /usr/local/snakelines/example/genomic

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_quality_report.yaml \
   --use-conda

Example configuration:

sequencing: paired_end
samples:                            # List of sample categories to be analysed
    - name: example.*               # Regular expression matching sample names to be analysed (reads/original/example.*_R1.fastq.gz)

report_dir: report/public/01-quality_report # Generated reports and essential output files are stored here
threads: 16                                 # Number of threads to use in analysis

reads:
    report:                         # Summary reports of read characteristics to assess their quality
        quality_report:             # HTML summary report of read quality
            method: fastqc          # Supported values: fastqc
            read_types:             # List of preprocessing steps to generate quality reports for
                - original
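
To illustrate how the `name` pattern in the configuration selects samples, here is a minimal sketch. It assumes that the pattern is embedded into the R1 file name as the configuration comment suggests (`reads/original/example.*_R1.fastq.gz`) and must match the whole file name; the exact matching rules used by the pipeline may differ, and `select_samples` is a hypothetical helper.

```python
import re


def select_samples(name_pattern, filenames):
    """Return sorted sample names whose R1 file matches the configured pattern."""
    # Assumption: the sample pattern is followed by the fixed _R1.fastq.gz suffix.
    file_regex = re.compile(rf"^({name_pattern})_R1\.fastq\.gz$")
    selected = set()
    for filename in filenames:
        match = file_regex.match(filename)
        if match:
            selected.add(match.group(1))
    return sorted(selected)


# File names below are made up for illustration.
files = [
    "example_A_R1.fastq.gz",
    "example_A_R2.fastq.gz",
    "example_B_R1.fastq.gz",
    "control_R1.fastq.gz",
]
print(select_samples("example.*", files))
```

With the pattern `example.*`, only samples whose names start with `example` are selected; `control` is ignored.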

Planned improvements

  • Aggregate quality statistics of multiple samples with MultiQC