Functional Profiling

assign reads to gene families and pathways, generating a function feature table ready for analysis

Introduction to Functional Profiling

Functional profiling, like taxonomy profiling, takes the cleaned sequenced reads after QC checking as input and matches them against a reference database of previously characterised gene sequences.

There are different types of functional classification tools available. This pipeline uses the HUMAnN3 pipeline, which also comes with its own reference databases.

Workflow description

One PBS script per sample is generated by this module. In this tutorial, we have 9 samples, so there will be 9 PBS scripts.

Steps in the functional profiling module:

Steps
HUMAnN3 is used for processing and generates three outputs for each sample:
  1. gene families
  2. pathway abundances
  3. pathway coverage
Generate feature table is performed after all samples have been processed. This combines the output and generates a feature table that contains the abundance of gene/pathway X in sample Y.

Step 1. Generate PBS scripts

Before starting, have you understood the need to know points?

  • After QC checking your sequence samples and generating a set of CleanReads
  • Find the absolute paths for the HUMAnN3 and MetaPhlAn reference databases depending on the MIMA version you installed
  • Replace the highlighted lines with the absolute path for the reference databases
    • path/to/humann3/chocophlan
    • path/to/humann3/uniref
    • path/to/humann3/utility_mapping
    • path/to/metaphlan_databases
  • the backslash (\) at the end of each line informs the terminal that the command has not finished and there’s more to come (we broke up the command for readability purposes to explain each parameter below)
  • note if you enter the command as one line, remove the backslashes
1
2
3
4
5
6
7
8
9
apptainer run --app mima-function-profiling $SANDBOX \
-i ~/mima_tutorial/output/QC_module/CleanReads \
-o ~/mima_tutorial/output \
--nucleotide-database </path/to>/humann3/chocophlan \
--protein-database /<path/to>/humann3/uniref \
--utility-database /<path/to>/humann3/utility_mapping \
--metaphlan-options "++bowtie2db /<path/to>/metaphlan_databases" \
--mode container \
--pbs-config ~/mima_tutorial/pbs_header_func.cfg

Parameters explained

Parameter
Required?Description
-i <input>yesfull path to the ~/mima_tutorial/output/QC_module/CleanReads directory that was generated from Step 1) QC, above. This directory should hold all the *_clean.fastq files
-o <output>yesfull path to the ~/mima_tutorial/output output directory where you would like the output files to be saved, can be the same as Step 1) QC
--nucleotide-database <path>yesdirectory containing the nucleotide database, (default=/refdb/humann/chocophlan)
--protein-database <path>yesdirectory containing the protein database, (default=/refdb/humann/uniref)
--utility-database <path>yesdirectory containing the protein database, (default=/refdb/humann/utility_mapping)
--metaphlan-options <string>yesadditional MetaPhlAn settings, like specifying the bowtie2 reference database (default="++bowtie2db /refdb/humann/metaphlan_databases"). Enclose parameters in double quotes and use ++ for parameter settings.
--mode containerno (default=‘single’)set this if you are running in the Container mode
--pbs-configyes if --mode containerpath to the pbs configuration file (see below). You must specify this parameter if --mode container is set. You do not need to set this parameter if running outside of Singularity
--mpa3nothis parameter is only applicable for MIMA container with MetaPhlAn4. You can set --mpa3 for backward compatibility with MetaPhlAn3 databases

Step 2. Check PBS scripts output

  • After step 1, you should see in the output directory: ~/mima_tutorial/output/Function_profiling
tree ~/mima_tutorial/output/Function_profiling
  • There should be one PBS script/sample (total of 9 bash scripts in this tutorial)
...
├── featureTables
│   ├── generate_func_feature_tables.pbs
│   └── generate_func_feature_tables.sh
├── SRR17380209.pbs
├── SRR17380218.pbs
├── SRR17380222.pbs
├── SRR17380231.pbs
├── SRR17380232.pbs
└── SRR17380236.pbs

Step 3. Submit Function jobs

  • Let’s examine one of the PBS scripts
$ cat ~/mima_tutorial/output/Function_profiling/SRR17380209.pbs
  • Your PBS script should look something like below, with some differences
    • line 10: IMAGE_DIR should be where you installed MIMA and build the sandbox
    • line 11: APPTAINER_BIND should be setup during installation when binding paths
      • make sure to include the path where the host reference genome file is located
    • line 14: /home/user is replaced with the absolute path to your actual home directory
    • lines 22-25: ensure the paths to the reference databases are all replaced (the example below uses the vJan21 database version and the parameter index is sent to MetaPhlAn to disable autocheck of index)
    • note that the walltime is set to 8 hours
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#!/bin/bash
#PBS -N mima-func
#PBS -l ncpus=8
#PBS -l walltime=8:00:00
#PBS -l mem=64GB
#PBS -j oe

set -x

IMAGE_DIR=~/mima-pipeline
export APPTAINER_BIND="/path/to/humann3_database:/path/to/humann3_database,/path/to/metaphlan_databases:/path/to/metaphlan_databases"


cd /home/user/mima_tutorial/output/Function_profiling/

# Execute HUMAnN3
cat /home/user/mima_tutorial/output/QC_module/CleanReads/SRR17380209_clean_1.fq.gz ~/mima_tutorial/output/QC_module/CleanReads/SRR17380209_clean_2.fq.gz > ~/mima_tutorial/output/Function_profiling/SRR17380209_combine.fq.gz

outdir=/home/user/mima_tutorial/output/Function_profiling/
apptainer exec ${IMAGE_DIR} humann -i ${outdir}SRR17380209_combine.fq.gz --threads 28 \
-o $outdir --memory-use maximum \
--nucleotide-database </path/to/humann3>/chocophlan \
--protein-database </path/to/humann3>/uniref \
--utility-database </path/to/humann3>/utility_mapping \
--metaphlan-options \"++bowtie2db /path/to/metaphlan_databases ++index mpa_vJan21_CHOCOPhlAnSGB_202103\" \
--search-mode uniref90
  • Change directory to ~/mima_tutorial/output/Functional_profiling
  • Submit PBS job using qsub
$ cd ~/mima_tutorial/output/Functional_profiling
$ qsub SRR17380209.pbs
  • Repeat this for each *.pbs file
  • You can check your job statuses using qstat
  • Wait until all PBS jobs have completed

Step 4. Check Function outputs

  • After all PBS jobs have completed, you should have the following outputs
$ ls -1 ~/mima_tutorial/output/Function_profiling
  • Only a subset of the outputs are shown below with ... meaning “and others”
  • Each sample that passed QC, will have a set of functional outputs
featureTables
...
SRR17380115_combine_genefamilies.tsv
SRR17380115_combine_humann_temp
SRR17380115_combine_pathabundance.tsv
SRR17380115_combine_pathcoverage.tsv
SRR17380115.pbs
SRR17380118_combine_genefamilies.tsv
SRR17380118_combine_humann_temp
SRR17380118_combine_pathabundance.tsv
SRR17380118_combine_pathcoverage.tsv
SRR17380118.pbs
...
SRR17380236.pbs

Step 5. Generate Function feature tables

  • After all samples have been functionally annotated, we need to combine the tables together and normalise the abundances
  • Navigate to ~/mima_tutorial/output/Fuctional_profiling/featureTables
  • Submit the PBS script file generate_func_feature_tables.pbs
$ cd <PROJECT_PATH>/output/Functional_profiling/featureTables
$ qsub generate_func_feature_tables.pbs
  • Check the output
  • Once the job completes you should have 7 output files beginning with the merge_humann3table_ prefix
~/mima_tutorial/
└── output/
    ├── Function_profiling
    │   ├── featureTables
    │   ├── func_table.o2807928
    │   ├── generate_func_feature_tables.pbs
    │   ├── generate_func_feature_tables.sh
    │   ├── merge_humann3table_genefamilies.cmp_uniref90_KO.txt
    │   ├── merge_humann3table_genefamilies.cpm.txt
    │   ├── merge_humann3table_genefamilies.txt
    │   ├── merge_humann3table_pathabundance.cmp.txt
    │   ├── merge_humann3table_pathabundance.txt
    │   ├── merge_humann3table_pathcoverage.cmp.txt
    │   └── merge_humann3table_pathcoverage.txt
    ├── ...

Congratulations !

You have completed processing your shotgun metagenomics data and are now ready for further analyses

Analyses usually take in either the taxonomy feature tables or functional feature tables



Last modified 25.09.2024