Analyse methylation profiles
Identify genomic regions with and without methylation. The pipeline expects paired-end Illumina reads sequenced from bisulfite-converted DNA.
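As background on why bisulfite-treated reads are needed: bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after amplification, while methylated cytosines stay unchanged. The following is a toy sketch of that principle only, not how the pipeline or Bismark is implemented. The function names and the example sequence are made up for illustration, and the sketch assumes a single strand and a gap-free alignment.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite conversion of one strand: unmethylated C is read as T."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, read):
    """Compare a converted read with the reference it aligns to (same length, no gaps).

    Returns {position: bool} for every reference cytosine covered by C or T:
    True if the read still shows C (methylated), False if it shows T (unmethylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


# Hypothetical example: only the cytosine at position 1 is methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, read)
print(read)
print(calls)
```

Real methylation callers additionally handle both strands, mismatches, and coverage across many reads; this sketch only shows the C/T comparison at the core of the idea.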
Purpose
- Methylation is an epigenetic marker for several diseases (e.g. in oncology)
- Compare methylation between samples with different phenotypes (e.g. tissues)
Required inputs
- Sequenced paired-end reads from an Illumina sequencer in gzipped fastq format
    - each sample is represented by two gzipped fastq files
    - these are the standard output files of paired-end sequencing
- Reference genome in fasta format
|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
|-- reference/<reference>
        |-- <reference>.fa
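The layout above can be checked before starting a run. The following is a minimal sketch, not part of the pipeline: the helper names `find_paired_samples` and `reference_path` are hypothetical, and the only assumption taken from this page is the `_R1.fastq.gz` / `_R2.fastq.gz` naming and the `reference/<reference>/<reference>.fa` location.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def find_paired_samples(reads_dir):
    """Return (complete, incomplete) sample names found in reads_dir.

    complete:   samples with both an _R1 and an _R2 file
    incomplete: samples where one of the two mates is missing
    """
    reads_dir = Path(reads_dir)
    r1 = {p.name[: -len(R1_SUFFIX)] for p in reads_dir.glob("*" + R1_SUFFIX)}
    r2 = {p.name[: -len(R2_SUFFIX)] for p in reads_dir.glob("*" + R2_SUFFIX)}
    return sorted(r1 & r2), sorted(r1 ^ r2)


def reference_path(project_dir, reference):
    """Path where the layout above places the reference fasta."""
    return Path(project_dir) / "reference" / reference / f"{reference}.fa"
```

For example, `find_paired_samples("reads/original")` lists the samples that are ready for analysis, and `reference_path(".", "mhv").exists()` tells whether the reference used in the example configuration is in place.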
Generated outputs
- Summary report of methylation profiles in sequenced samples
Example
How to run the example:
cd /usr/local/snakelines/example/genomic
snakemake \
    --snakefile ../../snakelines.snake \
    --configfile config_methylseq.yaml \
    --use-conda
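When the run is launched from a script, the same command can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper names `snakemake_command` and `run_example` are hypothetical, the paths are the ones from the example above, and `--dry-run` is Snakemake's standard option for listing the planned jobs without executing them.

```python
import subprocess


def snakemake_command(configfile, snakefile="../../snakelines.snake", dry_run=False):
    """Build the argument list for the example command shown above."""
    command = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--use-conda",
    ]
    if dry_run:
        # Only list the jobs that would be executed.
        command.append("--dry-run")
    return command


def run_example(workdir="/usr/local/snakelines/example/genomic", dry_run=True):
    """Run the methylation example from its directory; raises if snakemake fails."""
    subprocess.run(
        snakemake_command("config_methylseq.yaml", dry_run=dry_run),
        cwd=workdir,
        check=True,
    )
```

Calling `run_example()` first with the default `dry_run=True` is a cheap way to confirm that the configuration and input files are found before committing to the full analysis.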
Example configuration:
sequencing: paired_end
samples:                            # List of sample categories to be analysed
    - name: example.*               # Regex expression of sample names to be analysed (reads/original/example.*_R1.fastq.gz)
      reference: mhv                # Reference genome for reads in the category (reference/mhv/mhv.fa)

report_dir: report/public/01-methyl-seq   # Generated reports and essential output files would be stored there
threads: 16                         # Number of threads to use in analysis

reads:                              # Prepare reads and quality reports for downstream analysis
    preprocess:                     # Pre-process of reads, eliminate sequencing artifacts, contamination ...
        trimmed:                    # Remove low quality parts of reads
            method: trimmomatic     # Supported values: trimmomatic
            temporary: False        # If True, generated files would be removed after successful analysis
            crop: 500               # Maximal number of bases in read to keep. Longer reads would be truncated.
            quality: 20             # Minimal average quality of read bases to keep (inside sliding window of length 5)
            headcrop: 20            # Number of bases to remove from the start of read
            minlen: 35              # Minimal length of trimmed read. Shorter reads would be removed.

        deduplicated:               # Remove fragments with the same sequence (PCR duplicated)
            method: fastuniq        # Supported values: fastuniq
            temporary: False        # If True, generated files would be removed after successful analysis

    report:                         # Summary reports of read characteristics to assess their quality
        quality_report:             # HTML summary report of read quality
            method: fastqc          # Supported values: fastqc
            read_types:             # List of preprocess steps for quality reports
                - original
                - trimmed
                - deduplicated

mapping:                            # Find the most similar genomic region to reads in reference (mapping process)
    mapper:                         # Method for mapping
        method: bismark             # Supported values: bowtie2, bwa, bismark
        temporary: True

    index:                          # Generate .bai index for mapped reads in .bam files
        method: samtools            # Supported values: samtools

    postprocess:                    # Successive steps to refine mapped reads
        sorted:
            method: samtools
            temporary: False

    report:                         # Summary reports of mapping process and results
        quality_report:             # HTML summary with quality of mappings
            method: qualimap        # Supported values: qualimap
            map_types:              # List of post-process steps for quality reports
                - sorted

        methylation:
            method: bismark
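The `name` entry in the configuration above is a regular expression over sample names, where a sample name is the file name in reads/original without the `_R1.fastq.gz` suffix. The sketch below illustrates which files such a pattern would select. It is not the pipeline's own code: the function `select_samples` and the file names are hypothetical, and matching the whole sample name (`re.fullmatch`) is an assumption.

```python
import re

R1_SUFFIX = "_R1.fastq.gz"


def select_samples(filenames, name_pattern):
    """Return sorted sample names whose _R1 file matches the configured pattern."""
    pattern = re.compile(name_pattern)
    names = [f[: -len(R1_SUFFIX)] for f in filenames if f.endswith(R1_SUFFIX)]
    # Assumption: the pattern has to match the complete sample name.
    return sorted(n for n in names if pattern.fullmatch(n))


# Hypothetical content of reads/original
files = [
    "example_A_R1.fastq.gz", "example_A_R2.fastq.gz",
    "example_B_R1.fastq.gz", "example_B_R2.fastq.gz",
    "control_R1.fastq.gz", "control_R2.fastq.gz",
]
selected = select_samples(files, "example.*")
print(selected)
```

With the pattern `example.*` from the example configuration, the two `example_` samples are selected and `control` is left out.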
Planned improvements
- Aggregate quality statistics of preprocessing and mapping with MultiQC
- Include coverage tracks (Bismark can produce them as well)