Welcome to SnakeLines¶
Motivation¶
With the decreasing price of massively parallel sequencing technologies, more and more laboratories are using the resulting sequences of DNA fragments for genomic analysis. A substantial obstacle to interpretation is transforming sequenced data into results that clinicians and researchers without a computational background can interpret. Laboratories generally rely on computational pipelines consisting of several bioinformatic tools.
We propose several computational pipelines for processing paired-end Illumina reads, including mapping, assembly, variant calling, viral identification, RNA-seq and metagenomics analysis. All provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment and reproducibility of results across different platforms.
How to execute pipelines¶
The source code of the SnakeLines pipelines can be downloaded from the GitHub repository. The documentation is available on ReadTheDocs. See the Quick start or the Running SnakeLines section of the documentation for instructions on executing the pipelines.
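As a rough illustration of what such an invocation might look like, the sketch below assembles a Snakemake command line in Python without executing it. The flags `--snakefile`, `--configfile`, `--use-conda` and `--cores` are standard Snakemake options; the helper name `build_snakemake_command` and both file paths are placeholders assumed for illustration, so consult the Quick start for the actual locations.

```python
import shlex


def build_snakemake_command(snakefile, configfile, cores=1):
    """Assemble a Snakemake invocation as an argument list (nothing is executed)."""
    return [
        "snakemake",
        "--snakefile", str(snakefile),    # workflow entry point
        "--configfile", str(configfile),  # pipeline configuration (YAML)
        "--use-conda",                    # let Snakemake create per-rule Conda environments
        "--cores", str(cores),            # number of CPU cores Snakemake may use
    ]


# Placeholder paths, assumed for illustration only.
command = build_snakemake_command("path/to/snakelines.snake", "path/to/config.yaml", cores=4)
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run(command, check=True)` once the placeholder paths are replaced with real ones.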
Why use SnakeLines¶
Several workflow management systems have been proposed to date, most notably Galaxy and Snakemake. SnakeLines extends traditional Snakemake rules with the following benefits:
Ready-to-use pipelines¶
SnakeLines contains a wide set of computational pipelines that are ready to use immediately after downloading the SnakeLines sources. In addition to standard secondary analysis, such as mapping and assembly, SnakeLines facilitates:
- identification of variation in mapped reads
- metagenomics analysis
- transcriptomics
- methylation profiles
Reporting¶
Each step of the analysis is supported by graphical or tabular reports. Essential output and report files are stored in a separate report directory. In addition to reports generated for each sample individually, SnakeLines also stores aggregated reports for examining the quality of several samples at once.
No need to install additional tools¶
SnakeLines requires only Snakemake and Miniconda to be installed. The remaining bioinformatic tools required for analysis are compiled automatically and installed into separate virtual environments, so they do not break any existing installation or dependency on the operating system.
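Before running a pipeline, it can be handy to confirm that both prerequisites are reachable. The following is a minimal sketch, not part of SnakeLines itself: it assumes the two prerequisites expose the executables `snakemake` and `conda` on the `PATH`, and the helper name `missing_prerequisites` is hypothetical.

```python
import shutil


def missing_prerequisites(executables=("snakemake", "conda"), which=shutil.which):
    """Return the executables from `executables` that cannot be found on PATH."""
    return [name for name in executables if which(name) is None]


missing = missing_prerequisites()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Snakemake and Conda are both available.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.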
Easily configurable¶
Each pipeline is parametrized in a single YAML-based configuration file. The configuration also serves as a useful overview of the analysis, since every step is configured there in sequential order. You can easily swap tools or change their parameters by adjusting the configuration file.
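To make the idea concrete, here is a small sketch of how a step-ordered configuration can be inspected and how one tool can be swapped for another. The configuration is written as the nested dictionary a YAML parser would produce; the stage, step and parameter names (`reads`, `trimming`, `min_length` and so on) and both helper functions are hypothetical and do not reproduce the actual SnakeLines schema.

```python
import copy

# Hypothetical configuration, written as the nested dictionary a YAML parser
# would produce. Stage, step, tool and parameter names are illustrative only.
config = {
    "reads": {
        "trimming": {"method": "trimmomatic", "min_length": 35},
        "report": {"method": "fastqc"},
    },
    "mapping": {
        "mapper": {"method": "bowtie2"},
    },
}


def list_steps(cfg):
    """Return (stage, step, method) tuples in the order they are configured."""
    return [
        (stage, step, settings["method"])
        for stage, steps in cfg.items()
        for step, settings in steps.items()
    ]


def swap_method(cfg, stage, step, method, **params):
    """Return a copy of cfg in which one step uses a different tool and parameters."""
    updated = copy.deepcopy(cfg)
    updated[stage][step] = {"method": method, **params}
    return updated


# Replace the mapper and print the resulting sequence of steps.
bwa_config = swap_method(config, "mapping", "mapper", "bwa")
for stage, step, method in list_steps(bwa_config):
    print(f"{stage}/{step}: {method}")
```

Because Python dictionaries preserve insertion order, the listing follows the order in which steps appear in the configuration, which is what makes the file readable as an overview of the analysis.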
Easily extensible¶
The structure of the SnakeLines sources makes it easy to extend existing pipelines, or simply to replace the method used in a step with a custom solution.
Contents¶
- Read quality report
- Preprocess sequencing paired-end reads
- Map paired-end reads to a reference genome
- Download reference and map paired-end reads
- Assemble reads
- Call variants
- Analyse methylation profiles
- Analyse gene expression
- Analyse microbial composition
- Identify viruses
- Download reference
- Infer phylogeny between genomic sequences
- Analyse single-end Illumina samples
- Analyse variants from nanopore reads
- Variant detection in SARS-CoV-2
- Rules
- Undocumented rules
- Virfinder - Identify Viral Sequences
- Sklearn - Pca Comparison
- Custom - Summarize Transcriptomic Counts Into Tsv Table
- Custom - Export De Genes For Revigo
- Custom - Visualise Transcriptomic Counts In Html Table
- Krona - Single Sample Pieplot
- Krona - Multi Sample Pieplot
- Custom - Visualise Taxonomic Counts As Barplot
- Custom - Alpha Diversity
- Custom - Summarize Taxonomic Counts Into Tsv Table
- Custom - Extract Taxonomic Level From Taxonomic Table
- Fasta_Summary - Blast
- Fasta_Summary - Coverage
- Fasta_Summary - Mapped Reads
- Fasta_Summary - Copy Contigs
- Fasta_Summary - Summarize Annotations
- Undocumented rules
- Blast - Find Homologues
- Blast - Annotate With Taxonomy
- Blast - Prepare Reference Index For Nucleotide
- Blast - Prepare Reference Index For Protein
- Salmon - Classify Reads
- Salmon - Create Reference Index
- Salmon - Prepare For Krona
- Edger - Identify Transcripts With Changed Expression
- Edger - Convert Summary Tsv Table To Xlsx
- Custom - Filter Significantly Expressed Transcripts
- Fast_Virome_Explorer - Estimate Virome Composition
- Custom - Fill Na Values With Virusnames
- Custom - Convert To Tpm Metric
- Custom - Convert To Krona
- Metaxa2 - Classify Reads
- Metaxa2 - Create Reference Index
- Metaxa2 - Summarize Classification
- Metaxa2 - Prepare For Krona
- Rdp - Classify Reads
- Rdp - Prepare For Krona
- Bwa - Map Reads To Reference
- Bismark - Map Methyl Seq Reads To Reference
- Bowtie2 - Map Reads To Reference
- Quast - Quality Report For Assembled Contigs
- Bandage - Visualise Contig Overlaps
- Megahit - Assemble Reads Into Contigs
- Megahit - Generate Contig Graph
- Spades - Assemble Reads Into Contigs
- Unicycler - Assemble Reads Into Contigs
- Fastqc - Html Summary For Joined Reads
- Fastqc - Html Summary For Paired Reads
- Fastqc - Quality Report
- Pear - Join Read Pairs
- Pear - Concat Joined With Single
- Fastuniq - Deduplicate Reads
- Bowtie2 - Filter Reads From Reference
- Seqtk - Subsample Reads
- Trimmomatic - Trim Reads
- Bcftools - Build Consensus Sequence
- Qualimap - Mapping Quality Report Across Reference
- Qualimap - Mapping Quality Report Across Panel
- Qualimap - Summarize Quality Reports
- Bismark - Methylation Extractor
- Bismark - Summary Report
- Custom - Remove Overlapping Reads
- Custom - Infer Read Groups
- Picard - Mark Duplicates
- Samtools - Sort Mapped Reads
- Bamtools - Filter Fragments
- Bamtools - Merge Bams Into Single Bam File
- Samtools - Convert Sam To Bam
- Samtools - Bam Index
- Bowtie2 - Prepare Index
- Bwa - Prepare Index
- Bismark - Prepare Index
- Phylo - Visualise Phylogenetic Tree
- Msaviewer - Visualise Alignment
- Entrez - Download Sequences By Genbank Id
- Mafft - Align Sequences In Reference
- Custom - Prepare Description File
- Picard - Prepare Dict Index
- Samtools - Prepare Fai Index
- Iqtree - Infer Phylogeny
- Custom - Summary Report
- Gatk - Fix Vcf Header
- Tabix - Index Vcf
- Picard - Bed To Interval List
- Gatk - Collect Variant Calling Metrics
- Vardict - Create Wgs Bed File
- Vardict - Prepare Bed File
- Vardict - Call Germline Variants
- Vardict - Test Strand Bias
- Vardict - Tsb To Vcf
- Clair - Variant Call Mapped Reads
- Vcfcat - Merge Vcf Files
- Medaka - Variant Call Mapped Reads
- Freebayes - Variant Call Mapped Reads
- Pigz - Unzip File
- Minimap2 - Map Reads To Reference
- Bowtie2 - Map Reads To Reference
- Fastqc - Html Summary For Joined Reads
- Fastqc - Html Summary For Paired Reads
- Fastqc - Quality Report
- Bowtie2 - Filter Reads From Reference
- Trimmomatic - Trim Reads