Analyse gene expression

Determine the expression level of individual transcripts from sequenced RNA. Compare expression between samples from two conditions to identify transcripts whose expression has changed.

Purpose

  • Identify which transcripts are expressed
  • Compare expression of transcripts with each other in a single sample
  • Compare changes of expression between samples with different conditions, e.g.
    • environment
    • tissues
    • phenotypic trait
  • Assess impact of a gene on biochemical pathways (compare wild type with sample without the gene)

Required inputs

  • Sequenced reads in gzipped fastq format.
    • each sample is represented by two gzipped fastq files
    • standard output files of paired-end sequencing
  • Transcripts of reference genome in fasta format
  • Sample metadata file in TSV format
    • each row describes a single sequenced sample
    • the first column, ‘sample-id’, holds the names of the fastq files without the _R[12].fastq.gz suffix, e.g. sample_1, sample_2
    • other columns represent attributes of sequenced samples that may be used to separate samples into categories for comparison, e.g. environment, tissue
|-- description
        |-- sample-metadata.csv
|-- reads/original
        |-- <sample_1>_R1.fastq.gz
        |-- <sample_1>_R2.fastq.gz
        |-- <sample_2>_R1.fastq.gz
        |-- <sample_2>_R2.fastq.gz
|-- reference/<reference>
        |-- <reference>.transcripts.fa
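
Before starting the pipeline, it can help to check that every sample listed in the metadata has both of its read files. The following is a minimal sketch and not part of the pipeline: the function name find_missing_reads is hypothetical, and it assumes that the metadata file is tab-separated, starts with a header row, and keeps sample identifiers in the first column.

```python
import csv
from pathlib import Path


def find_missing_reads(metadata_path, reads_dir):
    """Return fastq file names expected from the metadata but absent on disk."""
    missing = []
    with open(metadata_path, newline="") as handle:
        # Assumption: tab-separated table with sample identifiers in column one.
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # assumption: the first row is a header
        for row in rows:
            if not row or not row[0].strip():
                continue  # ignore blank lines
            sample_id = row[0].strip()
            # Paired-end sequencing gives one R1 and one R2 file per sample.
            for mate in ("R1", "R2"):
                name = f"{sample_id}_{mate}.fastq.gz"
                if not (Path(reads_dir) / name).is_file():
                    missing.append(name)
    return missing
```

Calling find_missing_reads with the metadata file and the reads/original directory returns an empty list when all expected files are present.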

Generated outputs

  • Summary table of number of transcripts per sample, normalised by read count and length of transcripts
  • Graphical, 2D PCA comparison of samples based on their expression to visually assess relationships between them
  • Set of transcripts with a significant change of expression between selected conditions
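
To illustrate what normalising by read count and transcript length means, here is a minimal sketch of the transcripts-per-million (TPM) calculation. TPM is used here only as a common example of such a normalisation; the measure reported by the pipeline may differ. The function name and the dictionary-based inputs are hypothetical.

```python
def transcripts_per_million(read_counts, lengths):
    """Normalise raw read counts by transcript length, then scale to one million."""
    # Reads per base: corrects for longer transcripts collecting more reads.
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    if total == 0:
        # No reads at all in this sample; report zero expression everywhere.
        return {name: 0.0 for name in rates}
    # Dividing by the sample total corrects for sequencing depth.
    return {name: rate / total * 1_000_000 for name, rate in rates.items()}
```

For example, two transcripts with 100 reads each and lengths of 1000 and 3000 bases get values of roughly 750000 and 250000: the shorter transcript is considered three times more expressed, because the same number of reads covers a third of the length.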

Example

How to run the example:

cd /usr/local/snakelines/example/mhv

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_transcriptomics.yaml

Example configuration not ready yet.

Planned improvements

  • Aggregate quality statistics of preprocessing and mapping with MultiQC
  • Add a mapping-based approach with resulting coverage tracks