Data processing with MIMA

How to use MIMA to process your shotgun metagenomics data.

This section shows you how to use the MIMA pipeline for data processing of shotgun metagenomics sequenced reads using the assembly-free approach.

MIMA: data-processing

This section covers the data processing pipeline, which consists of three main modules:

  1. Quality control (QC) of the sequenced reads
  2. Taxonomy profiling after QC, for assigning reads to taxa (this step can be run in parallel with step 3)
  3. Functional profiling after QC, for assigning reads to genes (this step can be run in parallel with step 2)
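The ordering of the three modules can be written as a small dependency map. The Python sketch below is illustrative only. The module names `qc`, `taxonomy` and `functional` and the helper `execution_stages` are hypothetical and not part of MIMA. The helper groups modules into stages so that each stage depends only on earlier ones. This shows why taxonomy and functional profiling can run side by side once QC has finished.

```python
# Hypothetical dependency map of the three data-processing modules
# (names are illustrative, not MIMA identifiers).
DEPENDS_ON = {
    "qc": [],
    "taxonomy": ["qc"],
    "functional": ["qc"],
}


def execution_stages(deps):
    """Group modules into stages; modules in the same stage can run in parallel."""
    remaining = dict(deps)
    done = set()
    stages = []
    while remaining:
        # A module is ready once every module it depends on has finished.
        ready = sorted(m for m, reqs in remaining.items() if all(r in done for r in reqs))
        if not ready:
            raise ValueError("circular dependency between modules")
        stages.append(ready)
        done.update(ready)
        for m in ready:
            del remaining[m]
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(DEPENDS_ON), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```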

How the tutorials work

The tutorials are split into the three modules.

Each module has six sub-sections, and steps 2 to 6 require action from you:

  1. A brief introduction
  2. RUN command to generate PBS scripts
  3. Check the generated PBS scripts
  4. RUN command to submit PBS scripts as jobs
  5. Check the expected outputs after PBS job completes
  6. RUN Post-processing step (optional steps are marked)
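Steps 2 to 4 follow a generate, check, submit pattern. The Python sketch below is hedged and assumes the generated PBS scripts sit in one directory with a `.pbs` extension. Check the file names the MIMA commands actually produce. The sketch prints a `qsub <script>` command for each script and only calls `qsub` when `dry_run=False`. The helpers `qsub_commands` and `submit` are hypothetical and not part of MIMA.

```python
import subprocess
import tempfile
from pathlib import Path


def qsub_commands(script_dir, pattern="*.pbs"):
    """Build one ["qsub", script] command per matching script, in sorted order."""
    return [["qsub", str(path)] for path in sorted(Path(script_dir).glob(pattern))]


def submit(script_dir, dry_run=True):
    """Print each qsub command; only run qsub when dry_run is False."""
    commands = qsub_commands(script_dir)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if qsub rejects the job.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Demo on a throwaway directory with two placeholder scripts; nothing is submitted.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("example_b.pbs", "example_a.pbs"):
            Path(tmp, name).write_text("#!/bin/bash\n")
        submit(tmp)
```

Keeping `dry_run=True` as the default mirrors step 3. You review what would be submitted before anything reaches the scheduler.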

Checklist

For this set of tutorials, you need:

  • Access to a high-performance computing (HPC) cluster with a job scheduling system such as OpenPBS
    • the HPC system must have Apptainer or Singularity installed
  • Install the MIMA container image
    • start an interactive PBS job
    • create the image sandbox
    • set the SANDBOX environment variable
  • Take note of the paths to the reference databases, namely:
    • MiniMap2 host reference genome file
    • Kraken2 and Bracken reference database (same directory for the two tools)
    • HUMAnN reference databases
    • MetaPhlAn reference databases
  • Read and understand the Need to know points
  • Data processing works on paired-end sequenced reads, with two files per sample:
    • forward-read FASTQ files usually have some variation of the _R1.fq.gz or _1.fq.gz filename suffix, and
    • reverse-read FASTQ files usually have some variation of the _R2.fq.gz or _2.fq.gz filename suffix
  • Download tutorial data and check the manifest file
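Each sample needs both a forward and a reverse file. It can therefore help to check that every file has its mate before checking the manifest file. The Python sketch below is a hypothetical helper, not part of MIMA. It assumes filenames end in `_R1`/`_1` or `_R2`/`_2` followed by `.fq.gz` or `.fastq.gz`. Other suffix variants would need the hypothetical `READ_SUFFIX` pattern adjusted.

```python
import re
from collections import defaultdict

# Assumed suffix variants: _R1/_1 (forward) and _R2/_2 (reverse), then .fq.gz or .fastq.gz.
READ_SUFFIX = re.compile(r"^(?P<sample>.+?)_R?(?P<read>[12])\.f(ast)?q\.gz$")


def pair_reads(filenames):
    """Return {sample_id: (forward_file, reverse_file)}; raise if a mate is missing."""
    mates = defaultdict(dict)
    for name in filenames:
        m = READ_SUFFIX.match(name)
        if m is None:
            continue  # not a recognised paired-end FASTQ filename, e.g. the manifest
        mates[m.group("sample")][m.group("read")] = name
    pairs = {}
    for sample, reads in sorted(mates.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a forward or reverse file")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs


if __name__ == "__main__":
    print(pair_reads(["S1_R1.fq.gz", "S1_R2.fq.gz", "S2_1.fq.gz", "S2_2.fq.gz"]))
```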

Download tutorial data

metadata and sequence data files for shotgun metagenomics data-processing

Need to know

preparation for data-processing tutorials

Quality control (QC)

QC module in MIMA: check that reads are of high quality

Taxonomy Profiling

assign reads to taxa, generating a taxonomy feature table ready for analysis

Functional Profiling

assign reads to gene families and pathways, generating a function feature table ready for analysis



Last modified 25.09.2024