Exercise 5: Differential Expression Analysis
This task consists of performing a pairwise differential expression analysis from RNA-seq expression data. This analysis involves the identification of differentially expressed genomic features in a pairwise comparison of two different experimental conditions. To understand the effect of flubendazole on the Ascaridia galli species, a differential expression analysis will be carried out to compare the expression levels of the samples taken from the treated organisms with those of the untreated (control) organisms.
The Pairwise Differential Expression Analysis tool is based on edgeR (empirical analysis of DGE in R), a software package from the Bioconductor project. edgeR implements quantitative statistical methods to evaluate whether the expression of individual genomic features (e.g. genes) differs significantly between two experimental conditions.
First, open the “count_table_a_galli” project. It contains a count table obtained from the six samples of the complete Ascaridia galli dataset. The samples belong to two experimental groups, labeled “flubendazole” (treated organisms) and “control” (untreated organisms). The side panel offers visualization options for inspecting properties of each sample, such as its library size or its distribution of counts.
Go to Transcriptomics → Run Differential Expression Analysis, and select the Pairwise Differential Expression Analysis option. Adjust the options as follows:
In the first wizard page:
Set the CPM filter to 0.5.
Set the number of samples that must reach the CPM filter to 3.
Select the TMM normalization method.
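To illustrate what the two filter settings mean, the following is a minimal sketch of a CPM filter in plain Python. It is not the code run by the tool or by edgeR; the gene names, the counts, and the function name filter_by_cpm are hypothetical, and the toy library sizes are far smaller than those of a real RNA-seq experiment. It assumes that CPM is computed from the raw library size (the column total) and that a feature is kept when its CPM is at least 0.5 in at least 3 samples.

```python
# Hypothetical toy count table: feature -> raw counts in six samples.
counts = {
    "gene_a": [10, 12, 9, 30, 28, 35],
    "gene_b": [0, 1, 0, 0, 0, 1],
    "gene_c": [500, 480, 510, 100, 90, 120],
}


def filter_by_cpm(counts, min_cpm=0.5, min_samples=3):
    """Keep features whose CPM reaches min_cpm in at least min_samples samples."""
    n_samples = len(next(iter(counts.values())))
    # Library size of a sample = total counts in that column.
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        # Counts per million: raw count scaled by the sample's library size.
        cpms = [c / lib * 1e6 for c, lib in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpms) >= min_samples:
            kept[gene] = row
    return kept


kept = filter_by_cpm(counts, min_cpm=0.5, min_samples=3)
print(sorted(kept))
```

In this toy table, gene_b has non-zero counts in only two samples, so it cannot reach the threshold in three samples and is dropped, while the other two features are kept.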
In the second wizard page, provide the experimental_design.txt file supplied with the input data. A table should be displayed.
In the third wizard page, select the simple design option, and establish the remaining parameters as follows:
Primary Experimental Factor: Treatment.
Primary Contrast Condition: flubendazole.
Primary Reference Condition: control.
Select the Exact Test option.
Check the robust parameter.
Once the input counts have been processed and analyzed, a new tab is opened containing the results (Figure 8). The results table contains the differential expression statistics, where each row corresponds to a feature:
logFC: A measure that describes how much the expression changes between conditions (log2-fold-changes are shown).
logCPM: The average log2 counts per million.
P-Value: Indicates whether the expression difference is statistically significant (the difference is judged relative to the variance).
FDR: False Discovery Rate calculated by the Benjamini-Hochberg method (multiple hypothesis testing corrections).
Tags: Indicate whether a gene is upregulated (FDR ≤ 0.05, logFC ≥ 0) or downregulated (FDR ≤ 0.05, logFC < 0).
Genes that have not passed the filtering step are not shown in the new tab.
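The FDR and Tags columns can be illustrated with a short sketch in plain Python. It assumes the standard Benjamini-Hochberg adjustment and the tagging thresholds listed above; the function names and the "UP", "DOWN" and "NONE" labels are hypothetical and may differ from what the tool displays.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def tag(log_fc, fdr, alpha=0.05):
    """Label a feature from its logFC and FDR (labels are hypothetical)."""
    if fdr <= alpha:
        return "UP" if log_fc >= 0 else "DOWN"
    return "NONE"


# Hypothetical p-values and log2 fold changes for four features.
pvalues = [0.01, 0.04, 0.03, 0.005]
log_fcs = [2.1, -1.3, 0.4, -3.0]

fdrs = benjamini_hochberg(pvalues)
tags = [tag(lfc, fdr) for lfc, fdr in zip(log_fcs, fdrs)]
print(list(zip(fdrs, tags)))
```

Because the adjustment depends on the number of tested features, the FDR of a gene can change when the filtering settings change, even if its P-Value does not.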
In the sidebar, there are several options to explore differential expression results, including a summary and visualizations.