Map reads to a reference genome

Align sequenced reads to a reference genome to find their most probable genomic positions of origin. The reference genome may represent the DNA of a single organism, multiple organisms, transcripts…
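To make "most probable position of origin" concrete, here is a toy sketch that places a read at the ungapped position with the fewest mismatches. It is not how the pipeline's mapper works: real mappers index the reference, allow gaps, and use base qualities. The function names `hamming` and `best_position` are hypothetical and chosen only for this illustration.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_position(read: str, reference: str) -> tuple[int, int]:
    """Return (0-based position, mismatches) of the best ungapped placement.

    Toy illustration only: every window of the reference is scanned,
    which is far too slow for real genomes.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    candidates = (
        (hamming(read, reference[i:i + len(read)]), i)
        for i in range(len(reference) - len(read) + 1)
    )
    # min() compares (mismatches, position), so ties go to the leftmost position
    mismatches, position = min(candidates)
    return position, mismatches
```

A read that fits equally well at several positions is the toy analogue of an ambiguously mapped read; here the tie is simply broken by taking the leftmost position.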

Purpose

Identify the source of sequenced material
Assess the composition of sequenced material
Remove known contamination
Purify sequenced reads by selecting only reads from the target organism
Serve as an essential step for several downstream analysis types: variant calling, transcriptomics …

Required inputs

Sequenced reads in gzipped FASTQ format
- each sample is represented by two gzipped FASTQ files (R1 and R2)
- these are the standard output files of paired-end sequencing
Reference genome in FASTA format

|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
|-- reference/<reference>
        |-- <reference>.fa
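Before starting a run, it can help to confirm that the inputs follow the layout above. The following is a minimal sketch, assuming only the `_R1.fastq.gz` / `_R2.fastq.gz` naming and the `reference/<reference>/<reference>.fa` path shown in the tree. The helper names `find_incomplete_samples` and `reference_path` are hypothetical and not part of the pipeline.

```python
from __future__ import annotations

from pathlib import Path

# Mate suffixes taken from the directory layout above
MATE_SUFFIXES = ("_R1.fastq.gz", "_R2.fastq.gz")


def find_incomplete_samples(reads_dir: Path) -> dict[str, list[str]]:
    """Map each sample with a missing mate to the file names it lacks."""
    found: dict[str, set[str]] = {}
    for path in reads_dir.glob("*.fastq.gz"):
        for suffix in MATE_SUFFIXES:
            if path.name.endswith(suffix):
                sample = path.name[: -len(suffix)]
                found.setdefault(sample, set()).add(suffix)
    return {
        sample: [sample + s for s in MATE_SUFFIXES if s not in present]
        for sample, present in sorted(found.items())
        if len(present) < len(MATE_SUFFIXES)
    }


def reference_path(project_dir: Path, reference: str) -> Path:
    """Build the expected FASTA path: reference/<reference>/<reference>.fa."""
    return project_dir / "reference" / reference / f"{reference}.fa"
```

Calling `find_incomplete_samples(Path("reads/original"))` returns an empty dictionary when every sample has both mates, and `reference_path(...).is_file()` can be used to check that the reference is in place.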

Generated outputs

Mapped reads in sorted, indexed BAM format with marked duplicates
Reports to assess mapping quality
- individual report for each sample
- summary report for comparison of multiple samples
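In BAM files, "marked duplicates" means that duplicate reads are kept but carry the duplicate bit (0x400) in their FLAG field, while unmapped reads carry bit 0x4, as defined by the SAM specification. The sketch below summarizes a list of FLAG values under that assumption. The function name `summarize_flags` is hypothetical, and the flags would have to be extracted from the BAM separately, for example from the second column of `samtools view` output.

```python
from __future__ import annotations

from typing import Iterable

# Bit values defined by the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_DUPLICATE = 0x400


def summarize_flags(flags: Iterable[int]) -> dict[str, int]:
    """Count total, mapped, and duplicate-marked records from SAM FLAG values."""
    counts = {"total": 0, "mapped": 0, "duplicates": 0}
    for flag in flags:
        counts["total"] += 1
        if not flag & FLAG_UNMAPPED:
            counts["mapped"] += 1
        if flag & FLAG_DUPLICATE:
            counts["duplicates"] += 1
    return counts
```

The counts are per alignment record, not per read, so secondary and supplementary alignments would be included if present; the generated reports remain the authoritative source for mapping quality.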

Example

How to run the example:

cd /usr/local/snakelines/example/mhv

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_mapping.yaml

Example configuration:

Planned improvements

Aggregate quality statistics from preprocessing and mapping with MultiQC
Add a realignment post-processing step to refine alignments in indel regions
