Assemble reads
==============

Join overlapping paired-end reads into longer contiguous genomic sequences (contigs).

Purpose
-------

* Determine the genomic sequence of the sequenced organism
* Reduce the huge number of short reads to a smaller set of longer, more manageable genomic contigs

Required inputs
---------------

* Sequenced paired-end reads from an Illumina sequencer in gzipped fastq format.

  * each sample is represented by two gzipped fastq files
  * standard output files of paired-end sequencing

::

   |-- reads/original
           |-- _R1.fastq.gz
           |-- _R2.fastq.gz
           |-- _R1.fastq.gz
           |-- _R2.fastq.gz

Generated outputs
-----------------

* Assembled genomic sequences (contigs) in fasta format
* Reports to assess the quality of the assembly
* Graph visualisation of the assembly to visually assess its complexity

Example
-------

How to run the example:

.. code-block:: bash

   cd /usr/local/snakelines/example/genomic

   snakemake \
      --snakefile ../../snakelines.snake \
      --configfile config_assembly.yaml \
      --use-conda

Example configuration:

.. literalinclude:: ../../example/genomic/config_assembly.yaml
   :language: yaml

Planned improvements
--------------------

* Aggregate quality statistics of preprocessing and mapping with MultiQC
* Connect contigs into scaffolds based on the known genomic sequence of a related organism
* Aggregate quast results of individual samples into a summary report

Included pipelines
------------------

.. toctree::
   :maxdepth: 2

   /pipelines/quality_report
   /pipelines/preprocess_paired_end
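
Input check (sketch)
--------------------

Because every sample must be represented by two gzipped fastq files, it can help to confirm the pairing before starting the pipeline. The following is a minimal sketch and not part of the pipeline itself: the function name ``check_pairs`` is hypothetical, and it assumes that the files sit directly in ``reads/original`` and end in ``_R1.fastq.gz`` and ``_R2.fastq.gz``.

.. code-block:: bash

   # Hypothetical helper, not part of the pipeline.
   # Reports every R1 file that has no matching R2 file in the same directory.
   check_pairs() {
       # Directory to scan; defaults to the input location described above
       local dir="${1:-reads/original}"
       local r1 r2 missing=0

       for r1 in "$dir"/*_R1.fastq.gz; do
           # Skip the unexpanded pattern when no R1 files exist
           [ -e "$r1" ] || continue

           # Derive the expected mate name by swapping the suffix
           r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
           if [ ! -e "$r2" ]; then
               echo "missing mate for: $r1"
               missing=1
           fi
       done

       # Exit status 0 when all R1 files are paired, 1 otherwise
       return "$missing"
   }

Calling ``check_pairs reads/original`` from the directory that contains ``reads`` prints one line per unpaired R1 file and returns a non-zero status if any are found. The sketch only checks from R1 to R2, so an R2 file without an R1 mate would go unreported.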