PEMA: A pipeline for environmental DNA metabarcoding analysis

Submitted by katrina.exter@v... on Wed, 08/14/2019 - 09:47

PEMA supports the metabarcoding analysis of four marker genes: 16S rRNA (Bacteria), ITS (Fungi), and COI and 18S rRNA (Metazoa). It now also supports 12S rRNA. As input, PEMA accepts .fastq.gz files as returned by Illumina sequencing platforms. PEMA processes the reads from each sample and returns an OTU table with the taxonomies of the organisms found and their abundances in each sample. It also returns statistics and a FastQC diagram of the read quality for each sample.

In the case of 16S, PEMA also returns alpha and beta diversities and computes correlations between samples. This last step relies on the phyloseq R package, which supports downstream analysis of 16S amplicon microbial profiles. For COI, PEMA offers two clustering algorithms: CROP and SWARM. For 16S, two approaches to taxonomy assignment are supported: alignment-based and phylogeny-based. For the latter, a reference tree of 1000 taxa was built using SILVA_132_SSURef, EPA-ng and RAxML-NG.
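
To make the diversity measures mentioned above concrete, here is a minimal sketch in Python of how an alpha diversity (Shannon index) and a beta diversity (Bray-Curtis dissimilarity) can be computed from an OTU table. This is not PEMA's code: PEMA delegates this step to the phyloseq R package, and the choice of these two particular indices is an assumption made for illustration. The OTU names, sample names and counts are hypothetical.

```python
import math

# Hypothetical toy OTU table: OTU -> read counts per sample.
# The names and numbers are invented for illustration only.
otu_table = {
    "OTU_1": {"sampleA": 10, "sampleB": 0},
    "OTU_2": {"sampleA": 10, "sampleB": 5},
    "OTU_3": {"sampleA": 0, "sampleB": 15},
}


def sample_counts(table, sample):
    """Return the abundance vector of one sample, in OTU order."""
    return [row[sample] for row in table.values()]


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


counts_a = sample_counts(otu_table, "sampleA")
counts_b = sample_counts(otu_table, "sampleB")

print("Shannon (sampleA):", round(shannon(counts_a), 4))
print("Shannon (sampleB):", round(shannon(counts_b), 4))
print("Bray-Curtis (A vs B):", bray_curtis(counts_a, counts_b))
```

Alpha diversity summarises the diversity within a single sample, whereas beta diversity compares the composition of two samples. In the toy table, sampleA has two equally abundant OTUs, so its Shannon index equals ln 2.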

The most recent developments in PEMA (2022) package its outputs as an RO-Crate, allowing more machine-accessible (i.e. FAIR) handling of its output files.

PEMA can be downloaded from its GitHub repository, and it can also be run from its Docker Hub and Singularity Hub images. In addition, PEMA is to be included in the LifeWatch Tesseract workflow on NIS (invasive species). Because this is a cloud installation, users will be able to run PEMA without installing it themselves. More information will be posted here when this is available for general users.