Analyse methylation profiles¶

Identify genomic regions with and without methylation. The pipeline expects paired-end Illumina reads with the bisulfide conversion.

Purpose¶

Epigenetic marker for several diseases (e.g. oncology)
Compare between samples with different phenotype (e.g. tissues)

Required inputs¶

Sequenced paired-end reads from Illumina sequencer in gzipped fastq format.
- each sample is represented by two gzipped fastq files
- standard output files of paired-end sequencing
Reference genome in fasta format

|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
|-- reference/<reference>
        |-- <reference>.fa

Generated outputs¶

Summary report of methylation profiles in sequenced samples

Example¶

How to run example:

cd /usr/local/snakelines/example/genomic

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_methylseq.yaml \
   --use-conda

Example configuration:

sequencing: paired_end
samples:                            # List of sample categories to be analysed
    - name: example.*               # Regex expression of sample names to be analysed (reads/original/example.*_R1.fastq.gz)
      reference: mhv                # Reference genome for reads in the category (reference/mhv/mhv.fa)

report_dir: report/public/01-methyl-seq  # Generated reports and essential output files would be stored there
threads: 16                         # Number of threads to use in analysis

reads:                              # Prepare reads and quality reports for downstream analysis
    preprocess:                     # Pre-process of reads, eliminate sequencing artifacts, contamination ...

        trimmed:                    # Remove low quality parts of reads
            method: trimmomatic     # Supported values: trimmomatic
            temporary: False        # If True, generated files would be removed after successful analysis
            crop: 500               # Maximal number of bases in read to keep. Longer reads would be truncated.
            quality: 20             # Minimal average quality of read bases to keep (inside sliding window of length 5)
            headcrop: 20            # Number of bases to remove from the start of read
            minlen: 35              # Minimal length of trimmed read. Shorter reads would be removed.

        deduplicated:               # Remove fragments with the same sequence (PCR duplicated)
            method: fastuniq        # Supported values: fastuniq
            temporary: False        # If True, generated files would be removed after successful analysis

    report:                         # Summary reports of read characteristics to assess their quality
        quality_report:             # HTML summary report of read quality
            method: fastqc          # Supported values: fastqc
            read_types:             # List of preprocess steps for quality reports
                - original
                - trimmed
                - deduplicated

mapping:                            # Find the most similar genomic region to reads in reference (mapping process)
    mapper:                         # Method for mapping
        method: bismark             # Supported values: bowtie2, bwa, bismark
        temporary: True

    index:                          # Generate .bai index for mapped reads in .bam files
        method: samtools            # Supported values: samtools

    postprocess:                    # Successive steps to refine mapped reads
        sorted:
            method: samtools
            temporary: False

    report:                         # Summary reports of mapping process and results
        quality_report:             # HTML summary with quality of mappings
            method: qualimap        # Supported values: qualimap
            map_types:              # List of post-process steps for quality reports
                - sorted
        methylation:
            method: bismark

Planned improvements¶

Aggregate quality statistics of preprocess and mapping with the MultiQC
Include coverage tracks (Bismark can produce them as well)

Analyse methylation profiles¶

Purpose¶

Required inputs¶

Generated outputs¶

Example¶

Planned improvements¶

Included pipelines¶

Table of Contents

Previous topic

Next topic

This Page