Infer phylogeny of genomic sequences

Compare sequences from a reference FASTA file and visualise their relationships as a phylogenetic tree and an interactive multiple alignment.

Purpose

  • Assess relationships and similarity between genomic sequences

Required inputs

  • Reference sequences in FASTA format (all sequences to compare in a single file)
|-- reference/<reference>
        |-- <reference>.fa
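Before running the pipeline, it can help to confirm two things: the FASTA file sits where the layout above expects it, and it holds more than one sequence to compare. The following is a minimal Python sketch that assumes it is run from the directory containing `reference/`. The helper names `reference_fasta_path` and `count_fasta_records` are hypothetical and are not part of SnakeLines.

```python
from pathlib import Path


def reference_fasta_path(reference: str, root: str = ".") -> Path:
    """Build the expected path reference/<reference>/<reference>.fa (hypothetical helper)."""
    return Path(root) / "reference" / reference / f"{reference}.fa"


def count_fasta_records(text: str) -> int:
    """Count FASTA records; each record begins with a '>' header line."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


if __name__ == "__main__":
    # "lacto_supplied" is the reference name used in the example configuration.
    path = reference_fasta_path("lacto_supplied")
    if path.exists():
        print(f"{path}: {count_fasta_records(path.read_text())} sequences")
    else:
        print(f"{path} not found")
```

A phylogeny needs several sequences, so a count of one usually means the wrong file was supplied.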

Generated outputs

  • Phylogenetic tree with distances between genomic sequences
  • Interactive visualization of multiple alignment of reference sequences
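In the tree, IQ-TREE estimates the distances (branch lengths) under the configured substitution model. A simpler measure gives some intuition for what a distance between aligned sequences means: the uncorrected p-distance, which is the fraction of compared alignment columns where two sequences differ. The Python sketch below is an illustration only and is not what the pipeline computes. It assumes gapped columns (`-`) are skipped, and the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of named aligned sequences."""
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration, not real data.
    aln = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "AC-TTCGA"}
    for pair, d in distance_matrix(aln).items():
        print(pair, round(d, 3))
```

Model-based distances such as those under GTR+I+G4 correct for multiple substitutions at the same site. For divergent sequences they are therefore typically larger than the raw p-distance.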

Example

How to run the example:

cd /usr/local/snakelines/example/fasta_processing

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_infer_phylogeny.yaml \
   --use-conda

Example configuration:

sequencing: paired_end
samples:                                   # List of sample categories to be analysed
    - reference: lacto_supplied            # Reference genome for reads in the category (reference/lacto_supplied/lacto_supplied.fa)

report_dir: report/public/01-phylogeny     # Generated reports and essential output files are stored here

reference:                                 # Prepare and analyse reference sequences
    alignment:                             # Multiple alignment of reference sequences
        method: mafft                      # Supported values: mafft

    phylogeny:                             # Assess phylogenetic relationship between sequences
        method: iqtree                     # Supported values: iqtree
        model: GTR+I+G4                    # Substitution model for tree inference; see the IQ-TREE documentation

    report:                                # Visually assess relationship between reference sequences
        phylogenetic_tree:                 # Visual inspection of distances between sequences in tree graph structure
            method: phylo                  # Supported values: phylo

        comparison:                        # Interactive HTML visualization of multiple alignment
            method: msaviewer              # Supported values: msaviewer
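The `model` value follows IQ-TREE's model syntax, with components joined by `+`. In `GTR+I+G4`, the components are:

  • GTR: the general time reversible substitution model
  • +I: a proportion of invariable sites
  • +G4: discrete Gamma rate heterogeneity across sites with 4 categories

The Python sketch below decodes such a string into readable descriptions. It covers only a handful of components and does not validate models. The helper `describe_model` is hypothetical and is not part of SnakeLines or IQ-TREE.

```python
import re

# Descriptions of a few IQ-TREE nucleotide models (not an exhaustive list).
BASE_MODELS = {
    "JC": "JC: equal base frequencies and equal substitution rates",
    "HKY": "HKY: unequal base frequencies, separate transition/transversion rates",
    "GTR": "General time reversible: unequal base frequencies, six substitution rates",
}


def describe_model(model: str) -> list[str]:
    """Split a model string like 'GTR+I+G4' into human-readable parts."""
    base, *modifiers = model.split("+")
    parts = [BASE_MODELS.get(base, f"{base}: see the IQ-TREE documentation")]
    for mod in modifiers:
        gamma = re.fullmatch(r"G(\d*)", mod)
        if mod == "I":
            parts.append("+I: a proportion of sites is invariable")
        elif gamma:
            # A bare +G uses 4 rate categories by default.
            cats = gamma.group(1) or "4"
            parts.append(f"+{mod}: discrete Gamma rate heterogeneity with {cats} categories")
        else:
            parts.append(f"+{mod}: see the IQ-TREE documentation")
    return parts


if __name__ == "__main__":
    for line in describe_model("GTR+I+G4"):
        print(line)
```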

Planned improvements

  • SVG figure size should scale with the number of sequences

Included pipelines

  • None