Download reference
First, download sequences from NCBI for the selected GenBank ids and combine them into a reference FASTA file. The taxonomic ids of the downloaded sequences are also extracted and stored in a TAX file, which later steps use, particularly in metagenomic analyses.
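To make the download step concrete, the sketch below builds the kind of request that an Entrez-based download relies on. It only constructs an NCBI E-utilities `efetch` URL and does not contact NCBI. It is an illustration under assumptions, not the code SnakeLines itself runs: the function name `build_efetch_url` is hypothetical, and the choice of the `nuccore` database is an assumption based on the nucleotide accessions used in the example configuration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(genbank_ids, email):
    """Return an efetch URL requesting FASTA records for the given accessions.

    Hypothetical helper for illustration; SnakeLines may issue the request
    differently (for example through a client library).
    """
    if not genbank_ids:
        raise ValueError("at least one GenBank id is required")
    params = {
        "db": "nuccore",              # assumption: nucleotide database
        "id": ",".join(genbank_ids),  # several accessions in one request
        "rettype": "fasta",           # ask for FASTA-formatted records
        "retmode": "text",
        "email": email,               # lets NCBI contact heavy users
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


url = build_efetch_url(
    ["U32971", "AF182732", "X74221", "Z75478"], "FILLME@SOMEMAIL.COM"
)
print(url)
```

Fetching that URL would return the four records as one FASTA text stream, which is roughly what ends up in the reference FASTA file.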
Purpose
- Download genomic sequences and prepare a reference database
Required inputs
- None
Generated outputs
- Reference FASTA file with the downloaded sequences
- Reference TAX file with taxonomies for the sequences in the FASTA file
Example
To run the example:
cd /usr/local/snakelines/example/fasta_processing
snakemake \
    --snakefile ../../snakelines.snake \
    --configfile config_download_reference.yaml \
    --use-conda
Example configuration:
sequencing: paired_end
samples:                               # List of sample categories to be analysed
    - reference: lacto_download1       # Reference genome in the category (would be downloaded into reference/lacto_download1/lacto_download1.fa)
    - reference: lacto_download2       # Reference genome in the category (would be downloaded into reference/lacto_download2/lacto_download2.fa)

report_dir: report/public/01-ncbi_reference  # Generated reports and essential output files would be stored there

reference:                             # Prepare and analyse reference sequences
    download:                          # Download reference sequences
        method: entrez                 # Supported values: entrez
        email: FILLME@SOMEMAIL.COM     # Inform NCBI who you are to contact you in case of excessive use.
        lacto_download1:               # List of genbank ids for the lacto_download1 reference
            - U32971
            - AF182732
            - X74221
            - Z75478
        lacto_download2:               # List of genbank ids for the lacto_download2 reference
            - AF074857
            - U97135
            - Z75475
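As a quick sanity check of such a configuration, the sketch below maps every reference named under `samples` to the FASTA path given in the comments above (`reference/<name>/<name>.fa`) and to its list of GenBank ids. The configuration is written as a plain Python dictionary so the example needs no YAML parser. The helper name `planned_fasta_paths` is hypothetical, and placing the id lists under `reference.download` is an assumption about the nesting of the configuration, not something SnakeLines guarantees.

```python
# Dictionary mirroring the example configuration (assumed nesting).
config = {
    "samples": [
        {"reference": "lacto_download1"},
        {"reference": "lacto_download2"},
    ],
    "reference": {
        "download": {
            "method": "entrez",
            "email": "FILLME@SOMEMAIL.COM",
            "lacto_download1": ["U32971", "AF182732", "X74221", "Z75478"],
            "lacto_download2": ["AF074857", "U97135", "Z75475"],
        }
    },
}


def planned_fasta_paths(config):
    """Map each reference name to (expected FASTA path, GenBank ids).

    Hypothetical helper: raises KeyError when a reference listed under
    'samples' has no id list in the download section.
    """
    download = config["reference"]["download"]
    planned = {}
    for sample in config["samples"]:
        name = sample["reference"]
        ids = download.get(name)
        if not ids:
            raise KeyError(f"no genbank ids configured for reference '{name}'")
        planned[name] = (f"reference/{name}/{name}.fa", list(ids))
    return planned


planned = planned_fasta_paths(config)
for name, (path, ids) in planned.items():
    print(f"{name}: {len(ids)} sequences -> {path}")
```

A check like this catches a reference that is listed under `samples` but has no ids to download before any request is sent to NCBI.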
Planned improvements
- None
Included pipelines
- None