Welcome to SnakeLines
Motivation
With the decreasing price of massively parallel sequencing technologies, more and more laboratories use the resulting sequences of DNA fragments for genomic analysis. A substantial obstacle to interpretation is transforming the sequenced data into results that clinicians and researchers without a computational background can interpret. Laboratories generally rely on computational pipelines consisting of several bioinformatic tools.
We propose several computational pipelines for processing paired-end Illumina reads, including mapping, assembly, variant calling, viral identification, RNA-seq and metagenomics analysis. All provided pipelines are embedded in virtual environments that ensure isolation of the required resources from the host operating system, rapid deployment and reproducibility of results across different platforms.
How to execute pipelines
The source code of the SnakeLines pipelines can be downloaded from the GitHub repository. The documentation is available on ReadTheDocs. See the Quick start or the Running SnakeLines section of the documentation for instructions on executing the pipelines.
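As a rough illustration of what such an execution might look like, the sketch below assembles a Snakemake command line in Python without running it. The flags (`--snakefile`, `--configfile`, `--use-conda`, `--cores`) are standard Snakemake options, although their exact behaviour can differ between Snakemake versions. The helper name `build_snakemake_command` and both paths are hypothetical placeholders, not taken from the SnakeLines documentation. Consult the Quick start section for the actual invocation.

```python
import shlex


def build_snakemake_command(snakefile, configfile, cores=1):
    """Assemble a Snakemake invocation as an argument list (nothing is executed)."""
    return [
        "snakemake",
        "--snakefile", snakefile,    # entry-point Snakefile of the pipeline sources
        "--configfile", configfile,  # YAML file that parametrizes the pipeline
        "--use-conda",               # let Snakemake create per-rule Conda environments
        "--cores", str(cores),       # upper limit on CPU cores used in parallel
    ]


# Hypothetical paths; substitute the locations from your own checkout.
command = build_snakemake_command(
    "path/to/snakelines/Snakefile", "path/to/config.yaml", cores=4
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` once the paths point at real files.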
Why use SnakeLines
Several workflow management systems have been proposed to date, most notably Galaxy and Snakemake. SnakeLines extends traditional Snakemake rules with the following benefits:
Ready-to-use pipelines
SnakeLines contains a wide set of computational pipelines that are ready to use immediately after downloading the SnakeLines sources. In addition to standard secondary analysis, such as mapping and assembly, SnakeLines facilitates:
- identification of variation in mapped reads
- metagenomics analysis
- transcriptomics
- methylation profiles
Reporting
Each step of the analysis is supported by graphical or tabular reports. Essential output and report files are stored in a separate report directory. In addition to the reports generated for each sample individually, SnakeLines also stores aggregated reports for examining the quality of several samples at once.
No need to install additional tools
SnakeLines requires only Snakemake and Miniconda to be installed. The remaining bioinformatic tools required for the analysis are set up automatically and installed into separate virtual environments. This way, they do not break any existing installation or dependencies on the operating system.
Easily configurable
Each pipeline is parametrized in a single YAML-based configuration file. The configuration also serves as a useful overview of the analysis, since every step is configured there in sequential order. You can easily swap tools or change their parameters by adjusting the configuration file.
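To make the idea concrete, here is a minimal sketch of a configuration expressed as the nested Python dictionary that a YAML file of the same shape would load into. The key names (`reads`, `mapping`, `variant`, `method`, and so on) and the helpers `swap_method` and `list_steps` are assumptions made for illustration and do not reproduce the actual SnakeLines configuration schema. Only the tool names are taken from the rules listed in this documentation.

```python
import copy

# Hypothetical configuration; key names are illustrative only.
config = {
    "reads": {"trimming": {"method": "trimmomatic"}},
    "mapping": {"mapper": {"method": "bowtie2"}},
    "variant": {"caller": {"method": "vardict"}},
}


def swap_method(cfg, stage, step, method):
    """Return a copy of cfg in which one step uses a different tool."""
    updated = copy.deepcopy(cfg)  # leave the original configuration untouched
    updated[stage][step]["method"] = method
    return updated


def list_steps(cfg):
    """List (stage, step, method) triples in the order they are configured."""
    return [
        (stage, step, settings["method"])
        for stage, steps in cfg.items()
        for step, settings in steps.items()
    ]


# Swap the mapper from bowtie2 to bwa and print the resulting analysis overview.
bwa_config = swap_method(config, "mapping", "mapper", "bwa")
for stage, step, method in list_steps(bwa_config):
    print(f"{stage}.{step}: {method}")
```

Because the steps keep the order in which they are written, reading the configuration from top to bottom gives the sequence of the analysis, and changing a single `method` value is enough to swap one tool for another in this sketch.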
Easily extensible
The structure of the SnakeLines sources makes it easy to extend existing pipelines, or to replace the method used in a given step with a custom solution.
Contents
User documentation
Developer notes
Pipelines
- Read quality report
- Preprocess sequencing reads
- Map reads to a reference genome
- Download reference and map reads
- Assemble reads
- Call variants
- Analyse methylation profiles
- Analyse gene expression
- Analyse microbial composition
- Identify viruses
- Download reference
- Infer phylogeny between genomic sequences
Documentation
- Rules
- Custom - Prepare Description File
- Entrez - Download Sequences By Genbank Id
- Picard - Prepare Dict Index
- Samtools - Prepare Fai Index
- Fasta_Summary - Blast
- Fasta_Summary - Coverage
- Fasta_Summary - Mapped Reads
- Fasta_Summary - Copy Contigs
- Fasta_Summary - Summarize Annotations
- Undocumented rules
- Custom - Alpha Diversity
- Bandage - Visualise Contig Overlaps
- Spades - Assemble Reads Into Contigs
- Unicycler - Assemble Reads Into Contigs
- Gatk - Fix Vcf Header
- Tabix - Index Vcf
- Picard - Bed To Interval List
- Gatk - Collect Variant Calling Metrics
- Custom - Summary Report
- Vardict - Create Wgs Bed File
- Vardict - Prepare Bed File
- Vardict - Call Germline Variants
- Vardict - Test Strand Bias
- Vardict - Tsb To Vcf
- Picard - Mark Duplicates
- Samtools - Sort Mapped Reads
- Bowtie2 - Map Reads To Reference
- Bwa - Map Reads To Reference
- Bismark - Map Methyl Seq Reads To Reference
- Bismark - Prepare Index
- Bwa - Prepare Index
- Bowtie2 - Prepare Index
- Qualimap - Mapping Quality Report Across Reference
- Qualimap - Mapping Quality Report Across Panel
- Qualimap - Summarize Quality Reports
- Bismark - Methylation Extractor
- Bismark - Summary Report
- Samtools - Bam Index
- Fastqc - Html Summary For Joined Reads
- Fastqc - Html Summary For Paired Reads
- Fastqc - Quality Report
- Bowtie2 - Filter Reads From Reference
- Fastuniq - Deduplicate Reads
- Seqtk - Subsample Reads
- Pear - Join Read Pairs
- Pear - Concat Joined With Single
- Trimmomatic - Trim Reads
- Pigz - Unzip File
- Salmon - Classify Reads
- Salmon - Create Reference Index
- Salmon - Prepare For Krona
- Custom - Visualise Taxonomic Counts As Barplot
- Custom - Summarize Taxonomic Counts Into Tsv Table
- Custom - Extract Taxonomic Level From Taxonomic Table
- Krona - Single Sample Pieplot
- Krona - Multi Sample Pieplot
- Sklearn - Pca Comparison
- Custom - Export De Genes For Revigo
- Custom - Summarize Transcriptomic Counts Into Tsv Table
- Custom - Visualise Transcriptomic Counts In Html Table
- Fasta_Summary - Blast
- Fasta_Summary - Coverage
- Fasta_Summary - Mapped Reads
- Fasta_Summary - Copy Contigs
- Fasta_Summary - Summarize Annotations
- Undocumented rules
- Fast_Virome_Explorer - Estimate Virome Composition
- Custom - Fill Na Values With Virusnames
- Custom - Convert To Tpm Metric
- Custom - Convert To Krona
- Metaxa2 - Classify Reads
- Metaxa2 - Create Reference Index
- Metaxa2 - Summarize Classification
- Metaxa2 - Prepare For Krona
- Rdp - Classify Reads
- Rdp - Prepare For Krona
- Undocumented rules
- Virfinder - Identify Viral Sequences
- Blast - Find Homologues
- Blast - Annotate With Taxonomy
- Blast - Prepare Reference Index For Nucleotide
- Blast - Prepare Reference Index For Protein
- Edger - Identify Transcripts With Changed Expression
- Custom - Filter Significantly Expressed Transcripts