Call variants

Identify alterations in the genomic material of a sequenced individual with respect to a reference genome. This includes point mutations (SNPs) and small insertions and deletions (indels).

Purpose

  • Identify the genomic mutation that causes a phenotypic trait of interest
  • Identify genomic differences between closely related species
  • Important for various genetic tests
    • inherited diseases
    • de novo mutations
    • oncological diseases - screening and assessing their type

Required inputs

  • Sequenced reads in gzipped fastq format.
    • each sample is represented by two gzipped fastq files
    • standard output files of paired-end sequencing
  • Reference genome in fasta format
|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
|-- reference/<reference>
        |-- <reference>.fa
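
Before starting the pipeline, it can help to confirm that every sample really has both mate files. The following is only a minimal sketch, assuming the `_R1.fastq.gz` / `_R2.fastq.gz` naming shown in the layout above; the function name `find_paired_samples` is hypothetical and is not part of SnakeLines.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def find_paired_samples(reads_dir):
    """Map each sample name to its (R1, R2) paths.

    Hypothetical helper, not part of SnakeLines. Raises FileNotFoundError
    if any sample has only one of its two mate files.
    """
    reads_dir = Path(reads_dir)
    samples = {}

    # Every R1 file must have a matching R2 file in the same directory.
    for r1 in sorted(reads_dir.glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = reads_dir / (sample + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"sample {sample!r} has no R2 file: {r2}")
        samples[sample] = (r1, r2)

    # Every R2 file must belong to a sample already found through its R1 file.
    for r2 in sorted(reads_dir.glob("*" + R2_SUFFIX)):
        sample = r2.name[: -len(R2_SUFFIX)]
        if sample not in samples:
            raise FileNotFoundError(f"sample {sample!r} has no R1 file")

    return samples
```

Calling `find_paired_samples("reads/original")` from the project directory would return one entry per sample, or fail early with a message naming the incomplete sample.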

Generated outputs

  • List of identified variants in a VCF file, filtered by user-defined criteria
  • Summary PDF report for assessing the quality of reads, mapping and variant calling
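
To illustrate what filtering a VCF file by a user-defined criterion can look like, here is a minimal sketch that keeps only records whose QUAL value reaches a threshold. It is an illustration of the idea, not the filtering step the pipeline itself runs; the function name `filter_vcf_lines` and the `min_qual` parameter are hypothetical. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Yield VCF header lines unchanged and records with QUAL >= min_qual.

    Hypothetical helper for illustration; `min_qual` is an assumed threshold.
    Records with a missing QUAL value (".") are dropped.
    """
    for line in lines:
        # Meta-information and column header lines start with '#'.
        if line.startswith("#"):
            yield line
            continue

        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth tab-separated column
        if qual != "." and float(qual) >= min_qual:
            yield line
```

For an uncompressed file, the generator can be applied directly to an open file handle, for example `filtered = list(filter_vcf_lines(open("variants.vcf")))`, where `variants.vcf` is a placeholder file name.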

Example

How to run example:

cd /usr/local/snakelines/example/mhv

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_variant_calling.yaml

Example configuration:

Planned improvements

  • Call variants only in pre-defined regions (BED file)
  • Aggregate quality statistics from preprocessing and mapping with MultiQC
  • Annotation of variants
  • Filtering of variants based on external annotations
  • Interpretation of variants