Session 2: Metagenomics

In this use case, we use the metagenomics tools included in OmicsBox to analyze the microbial communities of two soda lakes in Brazil. The original study was carried out by Ana P. D. Andreote et al. (2018; doi: 10.3389/fmicb.2018.00244).

Soda lakes are extreme ecosystems found in Africa, Europe, Asia, and elsewhere. They are characterized by high concentrations of sodium carbonate, elevated salinity, and high pH. Their unusual chemistry makes their taxonomic composition and functional patterns particularly interesting to examine.

Salina Preta and Salina Verde (Mato Grosso do Sul, Brazil) are two soda lakes that differ mainly in that Salina Verde has permanent cyanobacterial blooms, whereas Salina Preta has no record of cyanobacterial blooms. We can therefore expect the two lakes to harbor markedly different microbial communities and functional compositions.

The objectives of the study are:

  • To describe the bacterial diversity of Salina Preta and Salina Verde, and to identify the microorganisms responsible for the blooms in Salina Verde.

  • To characterize the functional genetic potential of these microbial communities.

Experimental design and available sequencing data

Twelve single-end metagenomic samples were considered for the analysis: three replicates for each lake at each of two sampling times, morning and afternoon. Samples were collected at a depth of 0.25 m in Salina Verde and 0.35 m in Salina Preta.

Time of Sampling      Sample Names
Morning (10 AM)
Afternoon (3 PM)
Morning (10 AM)
Afternoon (3 PM)
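To make the factorial structure of the design explicit (2 lakes x 2 sampling times x 3 replicates = 12 samples), it can help to write it down as a small table in code. The sketch below is a minimal Python illustration; the sample IDs it builds are hypothetical placeholders, not the sample names used in the study.

```python
from itertools import product

# Factors of the design described in the text.
LAKES = ["Salina Verde", "Salina Preta"]
TIMES = ["Morning (10 AM)", "Afternoon (3 PM)"]
REPLICATES = [1, 2, 3]
# Sampling depth in metres for each lake, as stated in the text.
DEPTH_M = {"Salina Verde": 0.25, "Salina Preta": 0.35}


def build_design():
    """Return one record per sample of the full lake x time x replicate design."""
    design = []
    for lake, time, rep in product(LAKES, TIMES, REPLICATES):
        design.append({
            # Hypothetical ID such as "Verde_Morning_1"; not the study's real sample name.
            "sample_id": f"{lake.split()[1]}_{time.split()[0]}_{rep}",
            "lake": lake,
            "time": time,
            "replicate": rep,
            "depth_m": DEPTH_M[lake],
        })
    return design


if __name__ == "__main__":
    design = build_design()
    print(len(design))  # prints 12
```

Keeping the design as one record per sample makes it easy to group samples later by lake or by sampling time.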


Use case: Taxonomic Classification with OmicsBox

To study the taxonomic composition of the microbial communities in both lakes, we use the taxonomic classification functionality of OmicsBox (Metagenomics > Taxonomic Classification), which is based on Kraken, with the preprocessed reads as input.
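Kraken classifies each read by splitting it into k-mers and looking up the taxon associated with each k-mer in a pre-built database. The sketch below illustrates the classic (Kraken 1) scheme on a toy example: every k-mer is mapped to the lowest common ancestor (LCA) of the reference sequences that contain it, and the read is assigned to the taxon whose root-to-node path collects the most k-mer hits, with ties resolved by taking the LCA. The taxonomy, reference sequences, and k value are hypothetical. Kraken 2, which OmicsBox may rely on, differs in details (minimizers, a compact hash table, an optional confidence threshold), so this is only an illustration of the idea, not the OmicsBox implementation.

```python
from collections import Counter

# Hypothetical toy taxonomy: taxon -> parent (None marks the root).
PARENT = {
    "root": None,
    "Cyanobacteria": "root",
    "Nodularia": "Cyanobacteria",
    "Nostoc": "Cyanobacteria",
    "Proteobacteria": "root",
    "Pseudomonas": "Proteobacteria",
}

# Hypothetical toy reference "genomes" (real databases hold whole genomes).
REFERENCES = {
    "Nodularia": "ACGTACGTTG",
    "Nostoc": "ACGTACGAAC",
    "Pseudomonas": "TTTTGGGGCC",
}


def lineage(taxon):
    """Taxa on the path from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a))
    return next(t for t in lineage(b) if t in ancestors)


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(references, k):
    """Map each k-mer to the LCA of all references that contain it."""
    db = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


def classify(read, db, k):
    """Assign a read to the taxon with the highest-scoring root-to-node path."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return "unclassified"
    # Score each hit taxon by summing k-mer hits along its path to the root.
    scores = {t: sum(hits[a] for a in lineage(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    # Kraken 1 resolves ties by taking the LCA of the tied taxa.
    result = tied[0]
    for t in tied[1:]:
        result = lca(result, t)
    return result


if __name__ == "__main__":
    db = build_db(REFERENCES, k=4)
    print(classify("ACGTACGTTG", db, k=4))  # Nodularia
    print(classify("ACGTACGT", db, k=4))    # Cyanobacteria
    print(classify("AAAAAAAA", db, k=4))    # unclassified
```

With these toy references, a read copied from the Nodularia sequence is assigned to Nodularia, a read containing only k-mers shared by Nodularia and Nostoc is assigned to their common ancestor, and a read with no database hits stays unclassified.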

To explore the classification results, OmicsBox provides the plain result table, a PDF report, and various charts and plots.

Exercise 1: Exploring the dataset

  • Preprocessed Reads (not necessary to complete this exercise)

  • Experimental Design

  • Taxonomic Classification Result

Download the experimental design and the taxonomic classification results, and use them to answer the following questions:

  1. Was the sequencing depth sufficient to capture all the different OTUs?

  2. Does the number of samples give us a good overview of taxonomic diversity?

  3. Are there clear differences in taxonomic compositions between samples? How do the samples separate?

  4. Are these differences in abundances statistically significant?



After the taxonomic classification results are loaded, the result table is displayed, and its side panel offers various options. The Rarefaction Curves suggest that the sequencing depth may not have been sufficient to cover all the species present in these environments, as the curves have not clearly leveled off. This is especially the case for the Salina Preta samples, which show a much higher level of microbial diversity.
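A rarefaction curve reports how many distinct OTUs we would expect to observe in a random subsample of n reads; if it is still rising at the full sequencing depth, deeper sequencing would likely reveal more OTUs. The sketch below computes the expected richness analytically with Hurlbert's formula, E[S_n] = sum over OTUs i of [1 - C(N - N_i, n) / C(N, n)], where N is the total read count and N_i the read count of OTU i. Whether OmicsBox uses this formula or repeated random subsampling is an assumption on our part, and the counts in the example are hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of OTUs in a random subsample of n reads (Hurlbert's formula)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of reads")
    # An OTU with c reads is missed only if all n reads come from the other total - c reads.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """(depth, expected richness) pairs from 0 up to the full sequencing depth."""
    total = sum(counts)
    depths = list(range(0, total + 1, step))
    if depths[-1] != total:
        depths.append(total)
    return [(d, expected_richness(counts, d)) for d in depths]


if __name__ == "__main__":
    otu_counts = [50, 30, 10, 5, 2, 1, 1, 1]  # hypothetical read counts per OTU
    for depth, richness in rarefaction_curve(otu_counts, step=20):
        print(depth, round(richness, 2))
```

The exact binomial coefficients keep the sketch simple but become slow for millions of reads; a log-gamma formulation or random subsampling is more practical at real sequencing depths.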


From the same side panel, the Diversity Curve plots the cumulative number of distinct OTUs as a function of the number of samples examined. This accumulation curve flattens out, suggesting that the number of samples was adequate to obtain a good picture of the whole environment and that adding more samples to the dataset would not substantially improve the representation.
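The accumulation curve described above can be sketched in a few lines: add samples one at a time and count how many distinct OTUs have been seen so far. Because the result depends on the order in which samples are added, such curves are commonly averaged over random sample orders; we assume that convention here, and the OmicsBox implementation may differ. The sample contents in the test are hypothetical.

```python
import random


def accumulation_curve(samples):
    """Cumulative number of distinct OTUs after adding each sample, in the given order.

    samples: list of dicts mapping OTU -> read count.
    """
    seen = set()
    curve = []
    for otu_counts in samples:
        seen.update(otu for otu, count in otu_counts.items() if count > 0)
        curve.append(len(seen))
    return curve


def mean_accumulation(samples, permutations=100, seed=0):
    """Average the accumulation curve over random sample orders."""
    rng = random.Random(seed)
    order = list(samples)  # shuffle a copy, not the caller's list
    totals = [0.0] * len(order)
    for _ in range(permutations):
        rng.shuffle(order)
        for i, value in enumerate(accumulation_curve(order)):
            totals[i] += value
    return [t / permutations for t in totals]
```

A curve whose last points are nearly flat indicates that each extra sample contributes few new OTUs, which is the pattern the text describes.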


Differences between the samples are not obvious from the result table, but the Principal Coordinate Analysis (PCoA) plot suggests that the taxonomic compositions of Salina Verde and Salina Preta differ substantially. The first coordinate separates the two lakes perfectly, and the x-axis label indicates that this coordinate explains about 80% of the variance in the whole dataset.
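The percentage shown on a PCoA axis label is the share of that axis's eigenvalue in the total. The sketch below shows classical PCoA: compute a pairwise dissimilarity matrix, double-center the squared distances, and take the eigendecomposition. We assume Bray-Curtis dissimilarity, a common choice for OTU counts, since the text does not state which metric OmicsBox uses. Non-Euclidean dissimilarities can produce negative eigenvalues; here we keep only the positive ones when computing proportions, which is one common convention among several.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a count matrix."""
    x = np.asarray(counts, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / den if den > 0 else 0.0
    return d


def pcoa(dist):
    """Classical PCoA: sample coordinates and proportion of variance per axis."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ d2 @ j                      # double-centered Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # sort axes by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-12
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    explained = eigvals[positive] / eigvals[positive].sum()
    return coords, explained


if __name__ == "__main__":
    # Hypothetical OTU counts: rows are samples, columns are OTUs.
    counts = [[90, 10, 0], [85, 15, 0], [88, 12, 1],
              [5, 40, 55], [3, 45, 52], [6, 38, 56]]
    coords, explained = pcoa(bray_curtis(counts))
    print(coords[:, 0], explained[0])
```

With two well-separated groups of samples, the first axis captures most of the variance and splits the groups, mirroring the separation of the two lakes described above.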

In addition, the Taxa Bar Chart shows the taxonomic composition of each sample at the seven main taxonomic ranks, allowing comparisons between samples.

Hiding unclassified reads shows that the main cyanobacterial genus in Salina Verde is Nodularia (40%), a genus of filamentous, nitrogen-fixing cyanobacteria commonly associated with algal blooms in saline water systems. Other cyanobacterial genera found in Salina Verde are Nostoc, Calothrix, Anabaena, and Fischerella. None of them occurs at significant abundance in Salina Preta.

The dominant genera in Salina Preta are Pseudomonas (7%), Hydrogenophaga (4%), Acidovorax (3%), and Burkholderia (3%).
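Hiding unclassified reads changes the denominator of the percentages: presumably each genus is then expressed relative to the classified reads only, so its share rises. The sketch below shows that renormalization and how to list the top genera; the read counts are hypothetical and are not the numbers from this dataset, and the exact OmicsBox computation is assumed rather than documented here.

```python
def relative_abundance(counts, hide_unclassified=True):
    """Percentage of reads per taxon, optionally dropping unclassified reads from the total."""
    kept = {taxon: c for taxon, c in counts.items()
            if not (hide_unclassified and taxon == "unclassified")}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * c / total for taxon, c in kept.items()}


def top_taxa(abundances, n=3):
    """The n most abundant taxa, highest first."""
    return sorted(abundances.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical genus-level read counts for one sample.
    counts = {"unclassified": 600, "Nodularia": 160, "Nostoc": 60,
              "Calothrix": 40, "Other": 140}
    print(relative_abundance(counts)["Nodularia"])                           # 40.0
    print(relative_abundance(counts, hide_unclassified=False)["Nodularia"])  # 16.0
```

In this toy sample, the same 160 Nodularia reads represent 16% of all reads but 40% of the classified reads, which is why percentages should always be read together with the chosen denominator.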


To better understand the taxonomic differences between the lakes, we can run a differential abundance test in OmicsBox (Metagenomics > Comparative Analysis > OTU Differential Abundance Testing). We choose Salina Verde as the contrast group and Salina Preta as the reference group.
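To illustrate what "contrast" and "reference" mean in such a test, the sketch below compares one OTU between two groups of samples: it converts counts to log2 counts-per-million (log-CPM), reports the log2 fold change of the contrast group over the reference group, and computes an exact two-sided permutation p-value. This is a deliberately simple, swapped-in technique for illustration; the statistical model behind the OmicsBox tool is not described in this text and is likely more elaborate. The sketch also applies no multiple-testing correction, which a real analysis over many OTUs would need. All counts are hypothetical.

```python
import math
from itertools import combinations


def log_cpm(count, library_size, pseudocount=0.5):
    """log2 counts-per-million with a small pseudocount to avoid log(0)."""
    return math.log2((count + pseudocount) / library_size * 1e6)


def permutation_test(contrast, reference, otu):
    """Return (log2 fold change, two-sided exact permutation p-value) for one OTU.

    contrast, reference: lists of samples, each a dict mapping OTU -> read count.
    """
    values = [log_cpm(s.get(otu, 0), sum(s.values())) for s in contrast + reference]
    k = len(contrast)

    def mean_difference(group):
        inside = [values[i] for i in group]
        outside = [values[i] for i in range(len(values)) if i not in group]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_difference(set(range(k)))
    # Exact test: consider every way of labelling k of the samples as "contrast".
    null = [mean_difference(set(g)) for g in combinations(range(len(values)), k)]
    extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in null)
    return observed, extreme / len(null)
```

A positive fold change means the OTU is more abundant in the contrast group (here, Salina Verde). With three samples per group the smallest attainable p-value is 2/20 = 0.1; with six samples per lake, as in this dataset, it would be 2/924.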

A report and a heatmap (side panel) show the differentially abundant OTUs. As expected, the taxa most enriched in Salina Verde relative to Salina Preta are cyanobacteria (Nostoc, Nodularia, Calothrix, …). In addition, the hierarchical clustering of the samples confirms what the PCoA already revealed: the two lakes differ taxonomically from each other, but there is no clear separation between the sampling times (morning and afternoon), even within a single lake.
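The sample dendrogram next to such a heatmap comes from hierarchical clustering. The sketch below implements average-linkage agglomerative clustering on a precomputed sample distance matrix (for example, Bray-Curtis dissimilarities): it repeatedly merges the two clusters with the smallest mean pairwise distance until the requested number of clusters remains. The distance metric and linkage method used by OmicsBox are not stated in the text, so both are assumptions, and the distances in the test are hypothetical.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a square distance matrix.

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean distance over all pairs of samples drawn from the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]
```

Cutting the tree at two clusters and checking whether the clusters coincide with the lakes, or with the sampling times, is a quick way to reproduce the kind of conclusion drawn above.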

Exercise 2: