Download reference

First, download sequences from NCBI for the selected GenBank IDs and prepare a reference FASTA file. Taxonomic IDs are also extracted and stored in a TAX file that is used by downstream steps, particularly in metagenomics.
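The download step talks to NCBI through Entrez (method: entrez in the example configuration). The following Python sketch only illustrates what such a request looks like: it builds an E-utilities efetch URL for a list of GenBank accessions in the nuccore database. It is not the SnakeLines implementation. The function name build_efetch_url is hypothetical; only the endpoint and the db, id, rettype, retmode and email parameters come from NCBI's E-utilities.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(genbank_ids, email, rettype="fasta"):
    """Build an efetch URL for the given nucleotide accessions (hypothetical helper).

    rettype="fasta" returns plain sequences, suitable for a reference FASTA;
    rettype="gb" returns full GenBank records, which also carry taxonomy.
    """
    if not genbank_ids:
        raise ValueError("at least one GenBank ID is required")
    params = {
        "db": "nuccore",              # NCBI nucleotide database
        "id": ",".join(genbank_ids),  # comma-separated accession list
        "rettype": rettype,
        "retmode": "text",
        "email": email,               # lets NCBI contact you about heavy use
    }
    # urlencode percent-encodes reserved characters such as "," and "@".
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_efetch_url(["U32971", "AF182732"], "FILLME@SOMEMAIL.COM"))
```

Requesting rettype=fasta gives the sequences for the reference FASTA, while rettype=gb returns GenBank records whose source feature also holds the taxonomic ID needed for the TAX file.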

Purpose

  • Download genomic sequences and prepare a reference database

Required inputs

  • None

Generated outputs

  • Reference FASTA file with the downloaded sequences
  • Reference TAX file with taxonomic IDs for the sequences in the FASTA file
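In a GenBank flat-file record, the taxonomic ID is stored in the source feature as a /db_xref="taxon:<id>" qualifier. A minimal sketch of extracting it is shown below, assuming the records were fetched in GenBank format. The helper names and the two-column, tab-separated layout written by write_tax_file are hypothetical; the actual SnakeLines TAX file may use a different layout.

```python
import re

# The "source" feature of a GenBank record carries the NCBI taxonomy ID
# as a qualifier of the form: /db_xref="taxon:1234"
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S+)", re.MULTILINE)


def taxids_from_genbank(text):
    """Map each record's first accession to the taxon ID in its source feature."""
    mapping = {}
    # Records in a multi-record GenBank file end with a "//" line.
    for record in text.split("\n//"):
        accession = ACCESSION_RE.search(record)
        taxon = TAXON_RE.search(record)
        if accession and taxon:
            mapping[accession.group(1)] = taxon.group(1)
    return mapping


def write_tax_file(mapping, path):
    """Write a hypothetical tab-separated TAX file: accession<TAB>taxid."""
    with open(path, "w") as handle:
        for accession, taxid in mapping.items():
            handle.write(f"{accession}\t{taxid}\n")


if __name__ == "__main__":
    # Toy input with made-up accessions and taxon IDs, for illustration only.
    toy = (
        "LOCUS       TOY1\n"
        "ACCESSION   TOY1\n"
        "FEATURES             Location/Qualifiers\n"
        "     source          1..10\n"
        '                     /db_xref="taxon:1111"\n'
        "//\n"
    )
    print(taxids_from_genbank(toy))
```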

Example

How to run the example:

cd /usr/local/snakelines/example/fasta_processing

snakemake \
   --snakefile ../../snakelines.snake \
   --configfile config_download_reference.yaml \
   --use-conda

Example configuration:

sequencing: paired_end
samples:                             # List of sample categories to be analysed
    - reference: lacto_download1     # Reference genome in the category (will be downloaded into reference/lacto_download1/lacto_download1.fa)
    - reference: lacto_download2     # Reference genome in the category (will be downloaded into reference/lacto_download2/lacto_download2.fa)

report_dir: report/public/01-ncbi_reference     # Generated reports and essential output files will be stored here

reference:                           # Prepare and analyse reference sequences
    download:                        # Download reference sequences
        method: entrez               # Supported values: entrez
        email: FILLME@SOMEMAIL.COM   # Your email address, so NCBI can contact you in case of excessive use
        lacto_download1:             # List of GenBank IDs for the lacto_download1 reference
            - U32971
            - AF182732
            - X74221
            - Z75478
        lacto_download2:             # List of GenBank IDs for the lacto_download2 reference
            - AF074857
            - U97135
            - Z75475
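The example configuration ties each reference named under samples to a list of GenBank IDs under reference.download, and the comments give the path each FASTA file is written to. The Python sketch below restates that relationship: it takes the configuration as a dict (the structure a YAML loader would produce) and lists which accessions end up in which FASTA file. The helper planned_downloads and the SETTINGS set are hypothetical illustrations, not how SnakeLines parses its configuration.

```python
# The example configuration, written as a Python dict.
CONFIG = {
    "samples": [
        {"reference": "lacto_download1"},
        {"reference": "lacto_download2"},
    ],
    "reference": {
        "download": {
            "method": "entrez",
            "email": "FILLME@SOMEMAIL.COM",
            "lacto_download1": ["U32971", "AF182732", "X74221", "Z75478"],
            "lacto_download2": ["AF074857", "U97135", "Z75475"],
        }
    },
}

# Keys under "download" that are settings rather than reference names.
SETTINGS = {"method", "email"}


def planned_downloads(config):
    """Return {fasta_path: [GenBank IDs]} for every reference listed in samples."""
    download = config["reference"]["download"]
    plan = {}
    for sample in config["samples"]:
        name = sample["reference"]
        if name in SETTINGS or name not in download:
            raise KeyError(f"no GenBank ID list configured for reference {name!r}")
        # Output path pattern taken from the comments in the example configuration.
        plan[f"reference/{name}/{name}.fa"] = list(download[name])
    return plan


if __name__ == "__main__":
    for path, ids in planned_downloads(CONFIG).items():
        print(path, ids)
```

Under this reading, every reference listed in samples needs a matching ID list under reference.download; the sketch raises KeyError when one is missing.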

Planned improvements

  • None

Included pipelines

  • None