Need to know
Project working directory
After downloading the tutorial data, we assume that mima_tutorial is the working directory and that it is located in your home directory (indicated by the tilde, ~). We therefore always make sure we are in the right directory before executing a command. For example, run the following commands:
$ cd ~/mima_tutorial
$ tree .
- The starting directory structure for mima_tutorial should look something like:
mima_tutorial
├── ftp_download_files.sh
├── manifest.csv
├── pbs_header_func.cfg
├── pbs_header_qc.cfg
├── pbs_header_taxa.cfg
├── raw_data/
│   ├── SRR17380115_1.fastq.gz
│   ├── SRR17380115_2.fastq.gz
│   ├── ...
...
From here on, ~/mima_tutorial refers to the project directory depicted above. Replace this path if you saved the tutorial data in another location.
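If you saved the tutorial data somewhere else, a small helper can make the "change to the project directory first" step explicit. The sketch below is only an illustration: goto_project and mima_dir are hypothetical names and are not part of MIMA. The default path is the ~/mima_tutorial location assumed above.

```shell
# goto_project: hypothetical helper (not part of MIMA).
# Changes into the project directory, or reports that it is missing.
# With no argument it falls back to ~/mima_tutorial.
goto_project() {
    mima_dir="${1:-$HOME/mima_tutorial}"
    if [ -d "$mima_dir" ]; then
        cd "$mima_dir" && pwd          # print where we ended up
    else
        echo "Project directory not found: $mima_dir" >&2
        return 1
    fi
}
```

Running goto_project with no arguments is equivalent to cd ~/mima_tutorial followed by pwd. Pass a different path as the first argument if your data lives elsewhere.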
TIP: Text editors
There are several places where you may need to edit commands, scripts or files. You can use the vim text editor to edit text files directly in the terminal.
For example, the command below lets you edit the pbs_header_qc.cfg text file:
vim pbs_header_qc.cfg
Containers and binding paths
When deploying images, check whether you need to bind any paths so that the container can access the files and directories you refer to.
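As a rough sketch of what binding can look like, the helper below joins a list of directories into the comma-separated form that Apptainer reads from the APPTAINER_BIND environment variable. The function name make_bind_list and the example paths are assumptions made for illustration. Substitute the directories your own jobs need.

```shell
# make_bind_list: hypothetical helper (not part of MIMA or Apptainer).
# Joins the directories that exist into a comma-separated bind list
# and warns about any that are missing.
make_bind_list() {
    bind_list=""
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            bind_list="${bind_list:+$bind_list,}$dir"
        else
            echo "Skipping missing directory: $dir" >&2
        fi
    done
    printf '%s\n' "$bind_list"
}

# Example call; both paths are placeholders.
APPTAINER_BIND="$(make_bind_list "$HOME/mima_tutorial" /path/to/reference_data)"
export APPTAINER_BIND
```

Skipping missing directories up front is a design choice of this sketch. It surfaces a mistyped path before a job is submitted rather than when the container starts.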
PBS configuration files
The three modules (QC, Taxonomy profiling and Function profiling) in the data-processing pipeline require access to a job queue and instructions about the resources required for the job, for example the number of CPUs, the amount of RAM and the time required for execution. These parameters are defined in PBS configuration text files.
Three such configuration files are provided with the tutorial data, one for each module, because each module requires different PBS settings. The settings are indicated by lines starting with the #PBS tag.
PBS settings | Description |
---|---|
#PBS -N | name of the job |
#PBS -l ncpus | number of CPUs required |
#PBS -l walltime | maximum time the job is allowed to run, here it’s 2 hours. Note: check the log files to see whether your jobs completed correctly or failed because they ran out of time |
#PBS -l mem=64GB | how much RAM the job needs, here it’s 64GB |
#PBS -j oe | standard output logs and error logs are joined into one file |
- IMAGE_DIR refers to where you installed MIMA and built your sandbox.
- APPTAINER_BIND is the environment variable you set when binding file paths to the container.
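To make the settings above concrete, here is a minimal sketch of what a PBS header file could contain. Only the 2-hour walltime and the 64GB memory request come from the descriptions above. The job name, the CPU count and both paths are placeholders assumed for illustration, so check them against the .cfg files shipped with the tutorial data and against your own cluster's limits.

```shell
#!/bin/bash
# The job name and CPU count below are assumed values, not taken from the tutorial.
#PBS -N mima-qc
#PBS -l ncpus=8
#PBS -l walltime=02:00:00
#PBS -l mem=64GB
#PBS -j oe

# Placeholder: where you installed MIMA and built your sandbox.
IMAGE_DIR="/path/to/mima_sandbox"

# Placeholder: host path(s) to make visible inside the container.
export APPTAINER_BIND="/path/to/mima_tutorial"
```

Because the #PBS lines are ordinary shell comments, the file is still a valid shell script. Only the PBS scheduler interprets them as resource requests.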
Use absolute paths
When running the pipeline it is best to use full paths when specifying the locations of input files, output files and reference data to avoid any ambiguity.
Absolute/Full paths always start with the root directory, indicated by the forward slash (/) on Unix-based systems.
- e.g., the command below changes directory (cd) to a folder named scripts that is located in the user jsmith’s home directory. Provided this folder exists, this command will work from anywhere on the filesystem.
[~/metadata] $ cd /home/jsmith/scripts
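If you only have a relative path to hand, one way to obtain the full path is sketched below. The function name abs_dir is hypothetical. It relies only on the rule above that absolute paths start with /, and it assumes the argument is a directory that exists.

```shell
# abs_dir: hypothetical helper that prints the absolute form of a directory path.
abs_dir() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;             # starts with "/": already absolute
        *)  (cd "$1" 2>/dev/null && pwd) ;;   # resolve against the current directory
    esac
}
```

For example, running abs_dir raw_data from inside the project directory prints the full path of the raw_data folder, which can then be used when specifying input locations.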
Relative paths are relative to the current working directory.
- Now imagine the following file system structure in the user john_doe’s home directory
- The asterisks mark his current location, which is inside the /home/user/john_doe/studyABC/metadata sub-folder
/home/user/john_doe
├── apps
├── reference_data
├── studyABC
│   ├── metadata **
│   ├── raw_reads
│   └── analysis
├── study_123
├── scripts
└── templates
- In this example we are currently in the metadata directory, and try to change directory to a folder named data that is located in the parent directory (..)
- This command only works provided there is a data folder in the parent directory above metadata
- According to the filesystem above, the parent directory of metadata is studyABC and there is no data subfolder in this directory, so this command will fail with an error message
[/home/user/john_doe/studyABC/metadata] $ cd ../data
-bash: cd: ../data: No such file or directory
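The behaviour above can be reproduced safely in a throwaway directory. The sketch below rebuilds only the studyABC part of the example tree under a temporary location (demo_root is a name made up for this demo), then tries one relative path that does not exist and one that does.

```shell
# Recreate the studyABC part of the example tree in a temporary directory (demo only).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/studyABC/metadata" \
         "$demo_root/studyABC/raw_reads" \
         "$demo_root/studyABC/analysis"

# Start in the metadata folder, as in the example.
cd "$demo_root/studyABC/metadata"

# Fails: studyABC has no "data" subfolder, so we stay in metadata.
cd ../data 2>/dev/null || echo "cd ../data failed"

# Works: raw_reads sits next to metadata inside studyABC.
cd ../raw_reads && echo "now in: $(basename "$(pwd)")"
```

The same relative path can succeed or fail depending on where you run it from, which is why full paths are the safer choice when running the pipeline.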
Now that you have downloaded the data and know the basics, you can begin data processing with quality control.