Exercise 6: Functional Enrichment Analysis

The aim of this exercise is to learn how functional enrichment analysis can be performed from differential expression results and how to interpret the results to gain biological insights. Functional enrichment analysis is a procedure to identify functions that are over-represented in a set of genes and may have an association with an experimental condition (e.g. phenotype, treatment…).

Building on the results of the differential expression analysis performed in exercise 5, Fisher’s Exact Test will be used to find out which functions are over-represented among the genes that show significant differences in expression.

OmicsBox integrates the FatiGO package for the statistical assessment of annotation differences between two sets of sequences. This package uses Fisher's Exact Test and corrects for multiple testing. For this analysis, the complete set of involved sequences (though not exclusively those sequences), together with their annotations, must be loaded in the application. The annotations can either be the result of an OmicsBox annotation or be imported from a file (.annot).
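To make the test itself more concrete, the following is a minimal sketch of a one-sided Fisher's Exact Test for over-representation, written with the Python standard library only. It is not the FatiGO or OmicsBox implementation. The function name `fisher_enrichment_p` is hypothetical, the two sets are assumed to be disjoint, and the reference counts in the example (40 and 1900) are made up for illustration.

```python
from math import comb


def fisher_enrichment_p(test_annot, test_not, ref_annot, ref_not):
    """One-sided Fisher's Exact Test for over-representation in the test set.

    The four arguments are the cells of a 2x2 contingency table:
                      with term    without term
        test set      test_annot   test_not
        reference     ref_annot    ref_not
    The two sets are assumed to be disjoint (an assumption of this sketch).
    """
    n_test = test_annot + test_not            # size of the test set
    n_annot = test_annot + ref_annot          # sequences carrying the term
    n_total = n_test + ref_annot + ref_not    # all sequences in the table

    # Hypergeometric upper tail: probability of seeing at least
    # `test_annot` annotated sequences in a random draw of size n_test.
    upper = min(n_test, n_annot)
    tail = sum(
        comb(n_annot, k) * comb(n_total - n_annot, n_test - k)
        for k in range(test_annot, upper + 1)
    )
    return tail / comb(n_total, n_test)


# Hypothetical example: 9 of 61 test sequences carry the term,
# compared with 40 of 1940 reference sequences (made-up numbers).
p_enriched = fisher_enrichment_p(9, 52, 40, 1900)
p_not_enriched = fisher_enrichment_p(1, 60, 40, 1900)
print(p_enriched < p_not_enriched)
```

A small p-value means that the term appears in the test set more often than would be expected if sequences were drawn at random from the combined sets. Because one test is run per function, the raw p-values still need a multiple testing correction before they are interpreted.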

Open the “count_table_a_galli_results” project containing the differential expression results from the previous task. Click on the Enrichment Analysis (Fisher's Exact Test) button (sidebar), and configure the analysis as follows: 

  • Select the "Down-regulated genes" option as Test-set genes. 

  • Select the "a_galli_annot.annot" project as Reference Annotation. 

  • Leave the remaining parameters as default. 
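For readers who want to see what the reference annotation contributes to the analysis, here is a minimal sketch that reads annotation records and counts, for one GO term, how many annotated test-set sequences carry it. It assumes a tab-separated layout of sequence ID, annotation ID and an optional description. The sequence names, the test set and the helper `read_annot` are hypothetical, and this is not how OmicsBox handles the file internally.

```python
import csv
from collections import defaultdict
from io import StringIO


def read_annot(handle):
    """Map each sequence ID to its set of GO IDs.

    Assumes tab-separated rows: sequence ID, annotation ID, optional description.
    Rows whose annotation ID is not a GO term (e.g. EC numbers) are skipped.
    """
    go_by_seq = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[1].startswith("GO:"):
            go_by_seq[row[0]].add(row[1])
    return dict(go_by_seq)


# Made-up records standing in for the contents of an annotation file.
example = StringIO(
    "seq1\tGO:0061135\tendopeptidase regulator activity\n"
    "seq1\tGO:0005576\textracellular region\n"
    "seq2\tGO:0005576\textracellular region\n"
    "seq3\tEC:3.4.21.4\n"
)
annotations = read_annot(example)

test_set = {"seq1", "seq2"}   # hypothetical list of down-regulated genes
term = "GO:0061135"

# Only sequences that have at least one GO annotation are counted.
annotated_test = [s for s in test_set if s in annotations]
n_with_term = sum(term in annotations[s] for s in annotated_test)
n_without_term = len(annotated_test) - n_with_term
print(n_with_term, n_without_term)
```

The two counts correspond to one row of the contingency table that is evaluated for each function. The same counting is repeated for the reference set.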

Once the analysis has completed, the results table is shown in a new tab (see image below). It lists the annotations that pass the given threshold together with their adjusted p-values. The main columns are:

  • FDR: P-value corrected for multiple testing by False Discovery Rate control according to Benjamini-Hochberg.

  • P-Value: P-value without multiple testing correction.

  • #Test: the number of sequences in the test set that are annotated with the GO term.

  • #NotAnnotTest: the number of sequences in the test set that are not annotated with that GO term.

Adding these two numbers gives the total number of sequences in the test set that have any annotation at all, e.g. for GO:0061135: 9 + 52 = 61.

Using the context menu of each row, it is possible to get more details about the annotation and to create an ID-List with the sequences annotated in the Test-Set or the Reference-Set.
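The relationship between the P-Value and FDR columns can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. The p-values below are made up and the function name `benjamini_hochberg` is hypothetical. The sketch shows the general procedure, not the exact code used by OmicsBox.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each one by m / rank
    # and enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]   # made-up raw p-values, one per function
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"P-Value={p:.3f}  FDR={q:.3f}")
```

An adjusted value is never smaller than the raw p-value it comes from, which is why fewer functions pass a threshold on FDR than on the uncorrected P-Value.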

In the sidebar, there are several options to explore functional enrichment results, including the enriched graph.

Questions:

  1. How many functions related to down-regulated genes have been detected as enriched? Can you find any interesting functions related to the purpose of the study?

  2. Click on the Show Bar Chart button. Interpret the chart.

  3. Click on the Make Enriched graph button. How many charts are generated? Why? Try to understand the charts.

  4. Click on the Reduce to Most Specific button. How many enriched functions are there now? 

  5. Open the remaining visualizations (Treemap and WordCloud) to see the results of the enrichment analysis in different ways.