Fast_Virome_Explorer - Estimate Virome Composition

Assess the viral composition of a sample based on the read counts of particular taxonomic units.

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/fast_virome_explorer.snake
  • Rule name: fast_virome_explorer__estimate_virome_composition

Input(s):

  • reads_f: fastq file with sequences from forward strand
  • reads_r: fastq file with sequences from reverse strand
  • index: kallisto index created from reference database
  • ref_lens: lengths of the individual reference genomes in the database

Output(s):

  • composition: TSV table containing information about number of reads assigned to taxonomic units (most common species)
  • abundance: TSV table containing NCBI IDs of all found taxonomic units with assigned read counts and transcripts per million (TPM)

Custom - Fill Na Values With Virusnames

Python script that replaces blank cells in the input TSV file with the virus name from the same row and writes the result to a new TSV file.

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/fast_virome_explorer.snake
  • Rule name: custom__fill_na_values_with_virusnames

Input(s):

  • composition: TSV table containing information about number of reads assigned to taxonomic units (most common species), generated as output of previous rule

Output(s):

  • checked_composition: new TSV table in which NA values are replaced with virus names from the first column
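
The replacement described above can be sketched in a few lines of Python. This is a minimal illustration, not the script shipped with the rule: the function names, the set of tokens treated as missing ("" and "NA"), and the assumption that the virus name sits in the first column of every row are all hypothetical.

```python
import csv
import io


def fill_na_with_virus_names(rows, na_tokens=("", "NA")):
    """Replace blank/NA cells with the value from the first column of the same row.

    Assumption: the first column holds the virus name (hypothetical layout).
    """
    filled = []
    for row in rows:
        name = row[0]
        filled.append([name if cell.strip() in na_tokens else cell for cell in row])
    return filled


def fill_tsv(text):
    """Apply the replacement to TSV content given as a string and return new TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(fill_na_with_virus_names(list(reader)))
    return out.getvalue()
```

Working on strings keeps the sketch independent of file paths; in a real rule the input and output would be the composition and checked_composition files.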

Custom - Convert To Tpm Metric

Python script (enabled by setting count_type: tpm in the configuration) that creates a new TSV table with the count metric converted to TPM (transcripts per million).

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/fast_virome_explorer.snake
  • Rule name: custom__convert_to_tpm_metric

Input(s):

  • checked_composition: TSV table checked by the previous rule, containing information about number of reads assigned to taxonomic units (most common species)
  • abundance: TSV table containing NCBI IDs of all found taxonomic units with assigned read counts and transcripts per million (TPM), output of rule fast_virome_explorer__estimate_virome_composition

Output(s):

  • checked_tpm_composition: new TSV table in which the count metric is changed from read count to TPM
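
For reference, TPM is conventionally defined as each unit's length-normalised read rate, scaled so that all values sum to one million. The sketch below shows that standard definition only. The function name and the dictionary inputs are hypothetical, and the rule itself may simply take the TPM values already present in the abundance table rather than recompute them.

```python
def reads_to_tpm(counts, lengths):
    """Convert read counts to TPM using reference lengths.

    counts and lengths are dicts keyed by taxonomic unit (hypothetical inputs).
    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
    """
    # Reads per base for every unit; this removes the bias towards long genomes.
    rates = {unit: counts[unit] / lengths[unit] for unit in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads assigned at all: every unit gets a TPM of zero.
        return {unit: 0.0 for unit in counts}
    return {unit: rate / total * 1e6 for unit, rate in rates.items()}


# Example call with made-up numbers.
tpm = reads_to_tpm({"A": 10, "B": 20}, {"A": 1000, "B": 4000})
```

In the example, unit A has fewer reads than B but a higher TPM, because its reference genome is four times shorter.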

Custom - Convert To Krona

Create a new Krona file from the input file.

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/fast_virome_explorer.snake
  • Rule name: custom__convert_to_krona

Input(s):

  • composition: TSV table containing information about number of reads assigned to taxonomic units (most common species), output of one of the two previous rules (according to the selected count metric)

Output(s):

  • krona: new krona file
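
As an illustration, assume the krona output is the tab-separated text input accepted by the ktImportText tool from KronaTools, where each line holds a quantity followed by the taxonomic levels from broadest to most specific. Under that assumption, the conversion could be sketched as follows; the function name and the shape of the input records are hypothetical.

```python
def to_krona_lines(records):
    """Format (count, lineage) pairs as Krona text-import lines.

    records: iterable of (count, [level1, level2, ...]) tuples (hypothetical shape).
    Empty levels are skipped so that no blank wedge names are produced.
    """
    lines = []
    for count, lineage in records:
        fields = [str(count)] + [level for level in lineage if level]
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)
```

The count stays first on each line, so it works unchanged for either metric (read counts or TPM).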

Metaxa2 - Classify Reads

Find the closest homologous sequence for each sequenced fragment

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/metaxa2.snake
  • Rule name: metaxa2__classify_reads

Input(s):

  • r1: Left side of sequenced fragments in gzipped fastq format
  • r2: Right side of sequenced fragments in gzipped fastq format
  • blast: Blast index of reference sequences (generated by Metaxa2 database builder)
  • cutoffs: Auxiliary files from reference sequences (generated by Metaxa2 database builder)
  • hmm: Auxiliary file from reference sequences (generated by Metaxa2 database builder)

Output(s):

  • taxonomy: Summary taxonomies of classified sequenced fragments

Metaxa2 - Create Reference Index

Transform genomic sequences into Metaxa2 index for faster classification

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/metaxa2.snake
  • Rule name: metaxa2__create_reference_index

Input(s):

  • fasta: Genomic reference sequences in Fasta format
  • tax: Taxonomies for each reference sequence

Output(s):

  • blast: Blast index of reference sequences
  • cutoffs: Auxiliary files from reference sequences
  • hmm: Auxiliary file from reference sequences

Metaxa2 - Summarize Classification

Summarize taxonomies at individual taxonomic levels, e.g. species, order, …

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/metaxa2.snake
  • Rule name: metaxa2__summarize_classification

Input(s):

  • taxonomy: Classified fragments - output of metaxa2 tool
  • nomatch_template: Auxiliary file for the edge case in which no fragment is classified
  • nomatch_tax_template: Auxiliary file for the edge case in which no fragment is classified

Output(s):

  • summary: Summarized taxonomy at the species level (summaries for other levels are generated analogously)
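
The idea of summarizing at a given taxonomic level can be illustrated by truncating each lineage to a fixed depth and adding up the counts. The sketch below is not the Metaxa2 implementation: the function name, the semicolon-separated lineage strings, and the dictionary input are assumptions made for the example.

```python
from collections import Counter


def summarize_at_level(lineage_counts, depth, sep=";"):
    """Sum fragment counts over lineages truncated to the first `depth` levels.

    lineage_counts: dict mapping a lineage string to a count (hypothetical input).
    Lineages shorter than `depth` are kept at their deepest available level.
    """
    summary = Counter()
    for lineage, count in lineage_counts.items():
        levels = [level for level in lineage.split(sep) if level]
        summary[sep.join(levels[:depth])] += count
    return dict(summary)
```

Calling the function once per depth gives one summary per taxonomic level, which mirrors how a separate table is produced for each level.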

Metaxa2 - Prepare For Krona

Convert metaxa2 classification files into standardised format suitable for generation of Krona reports

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/metaxa2.snake
  • Rule name: metaxa2__prepare_for_krona

Input(s):

  • classification: Summarized classification from Metaxa2 classifier

Output(s):

  • krona: Tabular format suitable for Krona report generation

Rdp - Classify Reads

Find the closest homologous sequence for each sequenced fragment

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/rdp.snake
  • Rule name: rdp__classify_reads

Input(s):

  • reads: Joined sequenced fragments in fasta format

Output(s):

  • readtax: Individual taxonomy for each analysed fragment
  • taxonomy: Summary taxonomies of classified sequenced fragments

Rdp - Prepare For Krona

Convert RDP classification files into standardised format suitable for generation of Krona reports

Location

  • Filepath: <SnakeLines_dir>/rules/paired_end/classification/read_based/rdp.snake
  • Rule name: rdp__prepare_for_krona

Input(s):

  • classification: Summarized classification from RDP classifier

Output(s):

  • krona: Tabular format suitable for Krona report generation