Analyse microbial composition¶
Determine composition of organisms (typically microbial) in sequenced samples. Data analysis pipeline expect data from shotgun sequencing of a single target gene, for example 16S. Visually assess changes in compositions between several samples.
Purpose¶
- Identify which organisms are present in sequenced samples
- Compare an abundance of organisms with each other in a single sample
- Compare abundances of organisms between samples
Required inputs¶
- Sequenced reads in gzipped fastq format.
- each sample is represented by two gzipped fastq files
- standard output files of paired-end sequencing
- .tax file with taxonomic description of each organism in reference fasta
- TSV format
- no header
- each row represents a single target gene from the reference fasta
- the first column represents the identifier of the gene (the first word after > symbol)
- the second column represents taxonomy of the organism
- for example
KF494428.1.1396 Bacteria;Epsilonbacteraeota;Campylobacteria;Campylobacterales;Thiovulaceae;Sulfuricurvum;Sulfuricurvum sp. EW1;;;;;;;;
AF506248.1.1375 Bacteria;Cyanobacteria;Oxyphotobacteria;Nostocales;Nostocaceae;Nostoc PCC-73102;Nostoc sp. 'Nephroma expallidum cyanobiont 23';;;;;;;;
EF603722.1.1487 Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Muribaculaceae;;;;;;;;;;
...
|-- reads/original
|-- <sample_1>_R1.fastq.gz
|-- <sample_1>_R2.fastq.gz
|-- <sample_2>_R1.fastq.gz
|-- <sample_2>_R2.fastq.gz
In case, you are using Metaxa2 classifier, you also need reference genomes of target genes in fasta format.
|-- reference/<reference>
|-- <reference>.fa
|-- <reference>.tax
Generated outputs¶
- Summary table with the number of reads per organism in sequenced sample
- Pie plot with hierarchical taxonomical composition of the samples
- Barplots for visual comparison of organisms abundance across sequenced samples
Example¶
Metaxa2 classifier¶
How to run example:
You need to provide reference sequences for this example to run. Sequences of the 16S gene (store them as reference/silva-16S/silva-16S.fa) may be downloaded from the Silva database. Sequences of the ITS gene (store them as reference/unite/unite.fa) may be downloaded from the Unite database.
cd /usr/local/snakelines/example/tatry
snakemake \
--snakefile ../../snakelines.snake \
--configfile config_metagenomics.yaml
Example configuration:
RDP classifier¶
How to run example:
RDP classifier has already prebuilt databases that may be used, specifically 16srrna, fungallsu, fungalits_warcup, fungalits_unite
cd /usr/local/snakelines/example/tatry
snakemake \
--snakefile ../../snakelines.snake \
--configfile config_metagenomics_rdp.yaml
Example configuration: