Read quality report
Generate a summary HTML report with quality statistics of sequenced paired-end reads, including:
- Overall sequence quality
- Per base sequence quality
- GC content
- Sequence length
- Sequence duplication
- Adapter content
The pipeline uses the FastQC utility to generate a quality report for each sample. The individual reports are then aggregated into the summary HTML report using custom scripts.
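The aggregation step can be pictured with a minimal sketch. It is not the pipeline's own script; it only assumes the tab-separated `summary.txt` that FastQC writes for each input file (status, module name, file name). The function names `parse_fastqc_summary` and `aggregate` are hypothetical.

```python
def parse_fastqc_summary(text):
    """Map FastQC module name -> status (PASS/WARN/FAIL) for one summary.txt body."""
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is: STATUS <tab> module name <tab> input file name
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses


def aggregate(summaries):
    """Combine several summaries into one table.

    summaries: dict mapping a read-file label to its summary.txt text
    (hypothetical input shape, chosen for illustration).
    Returns (modules, rows), where each row is [label, status, status, ...].
    """
    parsed = {label: parse_fastqc_summary(text) for label, text in summaries.items()}
    modules = []
    for statuses in parsed.values():
        for module in statuses:
            if module not in modules:
                modules.append(module)  # keep first-seen column order
    rows = [
        [label] + [parsed[label].get(module, "NA") for module in modules]
        for label in sorted(parsed)
    ]
    return modules, rows
```

The resulting rows could then be rendered as an HTML table, one row per read file and one column per FastQC module.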
Purpose
- Quickly assess the quality of a sequencing run
- Identify potential problems, such as sequencing artefacts, before they affect downstream analysis
- Provide the information needed to configure downstream read trimming properly
Required inputs
- Sequenced paired-end reads from an Illumina sequencer in gzipped FASTQ format.
  - Each sample is represented by two gzipped FASTQ files.
  - These are the standard output files of paired-end sequencing.
|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
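Before running the pipeline, it can help to confirm that every sample has both mate files. The following is a minimal sketch that assumes only the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming shown above; `group_pairs` is a hypothetical helper, not part of the pipeline.

```python
import re
from pathlib import Path

# Naming convention taken from the directory layout above.
READ_FILE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def group_pairs(filenames):
    """Return (complete, incomplete) sample names.

    complete:   samples with both an R1 and an R2 file
    incomplete: samples with only one of the two mates
    Files that do not follow the naming convention are ignored.
    """
    mates = {}
    for filename in filenames:
        match = READ_FILE.match(Path(filename).name)
        if match:
            mates.setdefault(match["sample"], set()).add(match["mate"])
    complete = sorted(s for s, found in mates.items() if found == {"1", "2"})
    incomplete = sorted(s for s, found in mates.items() if found != {"1", "2"})
    return complete, incomplete


# Possible usage against the input directory:
#   complete, incomplete = group_pairs(Path("reads/original").glob("*.fastq.gz"))
```

Any sample reported as incomplete is missing its `_R1` or `_R2` file.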
Generated outputs
- Summary HTML table with quality statistics of sequenced reads of multiple samples
- Individual FastQC reports
Example
How to run the example:
cd /usr/local/snakelines/example/genomic
snakemake \
    --snakefile ../../snakelines.snake \
    --configfile config_quality_report.yaml \
    --use-conda
Example configuration:
sequencing: paired_end
samples:                       # List of sample categories to be analysed
    - name: example.*          # Regex expression of sample names to be analysed (reads/original/example.*_R1.fastq.gz)
report_dir: report/public/01-quality_report   # Generated reports and essential output files would be stored there
threads: 16                    # Number of threads to use in analysis
reads:
    report:                    # Summary reports of read characteristics to assess their quality
        quality_report:        # HTML summary report of read quality
            method: fastqc     # Supported values: fastqc
            read_types:        # List of preprocess steps for quality reports
                - original
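To illustrate how the `name` pattern in the configuration selects samples, here is a small sketch. It assumes the pattern is combined with the `_R1.fastq.gz` suffix, as the configuration comment suggests, and that the whole file name must match; how the pipeline itself anchors the expression is not stated here. `select_samples` is a hypothetical helper.

```python
import re

R1_SUFFIX = "_R1.fastq.gz"


def select_samples(pattern, filenames):
    """Return sorted sample names whose R1 file name matches `pattern`.

    Assumption: the whole file name must match pattern + "_R1.fastq.gz".
    """
    regex = re.compile(pattern + re.escape(R1_SUFFIX))
    selected = []
    for filename in filenames:
        if regex.fullmatch(filename):
            # Strip the suffix to recover the sample name.
            selected.append(filename[: -len(R1_SUFFIX)])
    return sorted(selected)
```

With the pattern `example.*`, files such as `example_A_R1.fastq.gz` would be selected, while `control_R1.fastq.gz` would not.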