Exercise 3: Functional Annotation
The goal of this exercise is to perform all steps of the Blast2GO functional annotation pipeline for a set of 100 Ascaridia galli assembled transcript sequences. This lets us learn more about each annotation step (BLAST, mapping and annotation) and explore and interpret the results. Please perform the following tasks and answer the questions.
Task 1: Blast search
First, load the sequences into OmicsBox. For this, go to File → Load → Load Sequences → Load Fasta file. Select the “100_seqs_a_galli.fasta” file.
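As an optional sanity check before loading, the number of records in the FASTA file can be counted outside OmicsBox. The snippet below is a minimal sketch that assumes the file sits in the current working directory; the helper name count_fasta_records is made up for this example and is not part of OmicsBox.

```python
from pathlib import Path


def count_fasta_records(text: str) -> int:
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


if __name__ == "__main__":
    # File name taken from the exercise; adjust the path to where you saved it.
    fasta_path = Path("100_seqs_a_galli.fasta")
    if fasta_path.exists():
        print(count_fasta_records(fasta_path.read_text()), "sequences found")
    else:
        print(f"{fasta_path} not found in the current directory")
```

If the count matches the 100 sequences expected for this exercise, the file is complete and ready to load.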
OmicsBox uses the Basic Local Alignment Search Tool (BLAST) to find sequences similar to your query set. In this case, we will use the CloudBlast option to blast against the NCBI Nematoda (nr subset) database. To do this, go to functional analysis → Blast → Run Blast → CloudBlast, and configure the analysis as follows:
Select the blastx-fast option as Blast Program.
Select the Non-redundant protein sequences (nr_v5) blast database.
Set "nematoda" as the Taxonomy filter by entering the NCBI taxonomy ID 6231 and clicking the “Add” button.
Set the "Blast against a subset of taxonomies" option.
Leave the remaining parameters as default.
On the last wizard page, choose an output directory for the XML2 outputs.
As the BLAST search progresses, sequences with successful BLAST results change color in the Main Sequence Table from white to orange, and the BLAST-related columns are filled in. If no results could be retrieved for a given sequence, its row turns dark red. Right-clicking a sequence opens the Single Sequence Menu, from which the BLAST results of that sequence can be viewed individually.
Task 2: Ontology mapping
Once Blast is finished, you can launch the Gene Ontology Mapping step. Mapping is the process of retrieving GO terms associated with the Hits obtained by the BLAST search. OmicsBox performs four different mapping steps:
BLAST result accessions are used to retrieve gene names or symbols making use of two mapping files provided by the NCBI (gene_info, gene2accession). Identified gene names are then searched in the species-specific entries of the gene-product table of the GO database.
GenBank identifiers (gi), the primary blast Hit ids, are used to retrieve UniProt IDs making use of a mapping file from PIR (Non-redundant Reference Protein Database) including PSD, UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept and PDB.
Accessions are searched directly in the dbxref table of the GO database.
BLAST result accessions are searched directly in the gene-product table of the GO database.
Go to functional analysis → Blast2GO Mapping → Run GO Mapping, and run the analysis.
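The four lookups described above can be pictured as a chain of table lookups whose results are pooled per hit. The following is only a toy sketch: the dictionaries stand in for the NCBI and PIR mapping files and the GO database tables, and all accessions, gene names and accession-to-GO pairings are invented for illustration. It also assumes that UniProt IDs retrieved in the second step are looked up in the same gene-product table, which the description above does not state explicitly.

```python
# Toy stand-ins for the real resources; all entries are invented for illustration.
ACCESSION_TO_GENE = {"ACC_0001": "gene-a"}       # role of gene_info / gene2accession
ACCESSION_TO_UNIPROT = {"ACC_0002": "UP_0002"}   # role of the PIR mapping file
GENE_PRODUCT_GO = {                              # role of the GO gene-product table
    "gene-a": {"GO:0005524"},
    "UP_0002": {"GO:0016301"},
    "ACC_0003": {"GO:0006468"},
}
DBXREF_GO = {"ACC_0004": {"GO:0005515"}}         # role of the GO dbxref table


def map_hit_to_go(accession: str) -> set:
    """Pool the GO terms found for one BLAST hit accession by all four lookups."""
    go_terms = set()

    # 1. Accession -> gene name -> gene-product table.
    gene = ACCESSION_TO_GENE.get(accession)
    if gene is not None:
        go_terms |= GENE_PRODUCT_GO.get(gene, set())

    # 2. Hit identifier -> UniProt ID (assumed to be searched in the same table).
    uniprot_id = ACCESSION_TO_UNIPROT.get(accession)
    if uniprot_id is not None:
        go_terms |= GENE_PRODUCT_GO.get(uniprot_id, set())

    # 3. Accession searched directly in the dbxref table.
    go_terms |= DBXREF_GO.get(accession, set())

    # 4. Accession searched directly in the gene-product table.
    go_terms |= GENE_PRODUCT_GO.get(accession, set())

    return go_terms


for acc in ["ACC_0001", "ACC_0002", "ACC_0003", "ACC_0004", "ACC_9999"]:
    print(acc, sorted(map_hit_to_go(acc)))
```

A hit that matches none of the tables (ACC_9999 here) ends up with an empty GO pool, so it contributes no candidate terms to the annotation step.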
Task 3: Annotation step
This is the process of selecting GO terms from the GO pool obtained by the Mapping step and assigning them to the query sequences. GO annotation is carried out by applying an annotation rule (AR) on the found ontology terms. The rule seeks to find the most specific annotations with a certain level of reliability. This process is adjustable in specificity and stringency.
For each candidate GO term, an annotation score (AS) is computed. The AS is composed of two additive terms:
The first, direct term (DT), represents the highest hit similarity of this GO weighted by a factor corresponding to its evidence code (EC).
The second term (AT) of the AS provides the possibility of abstraction. This is defined as an annotation to a parent node when several child nodes are present in the GO candidate collection. This term multiplies the number of total GOs unified at the node by a user-defined GO weight factor that controls the possibility and strength of abstraction. When GO weight is set to 0, no abstraction is done.
Finally, the AR selects the lowest term per branch that lies over a user-defined threshold.
Go to functional analysis → Blast2GO Annotation → Run Annotation, and configure the analysis using the default parameters.
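To make the annotation rule more concrete, here is a small sketch of AS = DT + AT on a toy hierarchy. It is one possible reading of the description above, not the OmicsBox implementation. The assumptions are: the evidence-code weights, threshold (55) and GO weight (5) are illustrative values; the hierarchy is a simple tree with placeholder term names, whereas the real GO is a graph in which terms can have several parents; a node's DT also considers hits of its candidate descendants; the "number of GOs unified at the node" is taken as the number of candidate terms below it; and a term passes when its score is greater than or equal to the threshold.

```python
# Illustrative evidence-code weights (assumed values, not OmicsBox settings).
EC_WEIGHT = {"IDA": 1.0, "ISS": 0.75, "IEA": 0.5}

# Toy hierarchy as child -> parent; placeholder names instead of real GO IDs.
PARENT = {"parent": "root", "child_1": "parent", "child_2": "parent", "child_3": "root"}

# Candidate GO terms from mapping: term -> list of (hit similarity in %, evidence code).
CANDIDATES = {
    "child_1": [(80, "IEA")],
    "child_2": [(96, "IEA")],
    "child_3": [(70, "IDA")],
}

ALL_TERMS = set(PARENT) | set(PARENT.values())


def ancestors(term):
    """Return all terms above `term` in the toy tree."""
    out = []
    while term in PARENT:
        term = PARENT[term]
        out.append(term)
    return out


def descendants(node):
    """Return all terms below `node` in the toy tree."""
    return {t for t in ALL_TERMS if node in ancestors(t)}


def annotation_score(node, go_weight):
    """AS = DT + AT for one node, under the assumptions stated in the lead-in."""
    below = descendants(node) & set(CANDIDATES)
    hits = [h for t in below | {node} for h in CANDIDATES.get(t, [])]
    if not hits:
        return 0.0
    direct_term = max(sim * EC_WEIGHT[ec] for sim, ec in hits)
    abstraction_term = len(below) * go_weight
    return direct_term + abstraction_term


def select_annotations(threshold=55, go_weight=5):
    """Keep, per branch, the lowest terms whose score reaches the threshold."""
    passing = {t for t in ALL_TERMS if annotation_score(t, go_weight) >= threshold}
    return sorted(t for t in passing if not (descendants(t) & passing))


print(select_annotations())             # ['child_3', 'parent']
print(select_annotations(go_weight=0))  # ['child_3']
```

In this toy case child_1 (score 40) and child_2 (score 48) fall below the threshold on their own, but their shared parent reaches 48 + 2 × 5 = 58 and is annotated instead. With the GO weight set to 0 no abstraction happens and only child_3 is kept.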