Gene Set Test

The Gene Set Test tool uses a hypergeometric test to determine whether GO terms are over-represented in a set of differentially expressed genes, which is provided as a statistical comparison track. The tool requires a GO annotation file that has already been saved in the Navigation Area of the workbench.
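
As a rough illustration of the statistics involved, the sketch below computes a one-sided hypergeometric p-value for a single GO term. It is a minimal sketch, not the tool's implementation. It assumes the background population is all genes in the statistical comparison track and the tested set is the filtered differentially expressed genes. The function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (assumed: all genes in the comparison track)
    K: background genes annotated with the GO term
    n: genes in the tested subset (assumed: the differentially expressed genes)
    k: tested-subset genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent gene counts")
    total = comb(N, n)  # number of ways to draw n genes from the background
    upper = min(K, n)   # the overlap can never exceed K or n
    # Sum the probabilities of observing k or more annotated genes in the subset.
    tail = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return tail / total
```

For example, with 20 background genes, 5 of which carry a GO term, and 4 selected genes of which 3 carry it, `hypergeom_upper_tail(3, 20, 5, 4)` returns 155/4845, roughly 0.032. A small value suggests the term appears in the selected genes more often than random sampling would predict.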

GO annotation files are available from several sources, such as Blast2GO and the Gene Ontology database. Before importing a table, check that it has a column named GO. If it does not, edit the table so that the header of the relevant column reads GO, then import it into the workbench using the Standard Import function. For GO annotation files in GAF format, choose the option "Force import as type: Gene Ontology Annotation file" from the drop-down menu at the bottom of the Standard Import dialog.
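
If the annotation table is a tab-delimited text file, the header edit can be scripted before import. The sketch below is a hypothetical helper (`rename_column_header`), not part of the workbench. It assumes tab-separated input whose first row is the header. `GO_terms` in the usage example is a made-up column name.

```python
import csv
import io


def rename_column_header(tsv_text: str, old_name: str, new_name: str = "GO") -> str:
    """Return tab-separated text with header cell `old_name` renamed to `new_name`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        raise ValueError("empty table")
    header = rows[0]
    if new_name in header:
        return tsv_text  # the table already has a GO column; leave it unchanged
    try:
        header[header.index(old_name)] = new_name
    except ValueError:
        raise ValueError(f"column {old_name!r} not found in header {header}") from None
    out = io.StringIO()
    # Write the rows back out with tabs and Unix line endings.
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

For example, `rename_column_header("Gene\tGO_terms\nABC1\tGO:0008150\n", "GO_terms")` returns `"Gene\tGO\nABC1\tGO:0008150\n"`, which can then be saved and imported with Standard Import.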

RefSeq files are available via the Data Manager, and are saved in the "CLC_Reference" folder in the Navigation Area if you have already downloaded a Reference Data Set.

It is also possible to convert a text file of custom annotations into a format that the Gene Set Test tool can use (see Generic annotation file for expression data format and Generic ontology annotation files).

This custom annotation file can be imported using the Standard Import functionality.

To start the tool:

        Toolbox | RNA-Seq Analysis | Gene Set Test

Select a statistical comparison track and click Next (see figure 29.40). To run several statistical comparisons at once, use the batch function.

Figure 29.40: Select one statistical comparison.

In the "Annotation testing parameters" dialog, specify a GO annotation file and set the annotation testing options (see figure 29.41).

Figure 29.41: Select annotation testing parameters.

Click Next to access the "Filtering parameters" dialog (see figure 29.42).

Figure 29.42: Specify filtering parameters.

Instead of annotating all genes present in the statistical comparison track, you can focus on the subset of genes that are differentially expressed. The filtering parameters allow you to define this subset.
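
Conceptually, selecting this subset amounts to keeping genes that pass a significance threshold and an effect-size threshold. The sketch below assumes filtering on an FDR-corrected p-value and on the absolute value of a signed fold change. The `ComparisonRow` fields, the `select_de_genes` helper and its default thresholds are hypothetical and do not mirror the dialog's actual options.

```python
from dataclasses import dataclass


@dataclass
class ComparisonRow:
    gene: str
    fold_change: float  # assumed signed: -2.5 means 2.5-fold down-regulated
    fdr_p_value: float


def select_de_genes(rows, max_fdr=0.05, min_abs_fold_change=2.0):
    """Return the names of genes passing both (hypothetical) thresholds."""
    return {
        r.gene
        for r in rows
        if r.fdr_p_value <= max_fdr and abs(r.fold_change) >= min_abs_fold_change
    }
```

With rows for genes A (fold change 3.1, FDR 0.01), B (-2.5, 0.04), C (1.2, 0.001) and D (4.0, 0.2), only A and B are kept: C changes too little and D is not significant.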

Click Finish to open the output or save it in a specified location of the Navigation Area.

During analysis, a black banner on the left-hand side of the workbench warns if duplicate features were found while processing the file. If you get this warning, consider unchecking the "Ignore gene name capitalization" option, because with that option enabled, gene names that differ only in capitalization are treated as the same gene.
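
To check a gene list ahead of time for names that would collide when capitalization is ignored, you can group the names by their case-folded form. This is a hypothetical helper for illustration. It assumes the names are available as a plain Python list, and it does not reproduce the workbench's own duplicate check.

```python
from collections import defaultdict


def case_insensitive_collisions(gene_names):
    """Map each case-folded name to the distinct spellings that share it.

    Only groups with two or more distinct spellings are returned, i.e. names
    that are unique as written but identical once capitalization is ignored.
    """
    groups = defaultdict(set)
    for name in gene_names:
        groups[name.casefold()].add(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

For example, `case_insensitive_collisions(["Abc1", "ABC1", "Xyz2", "xyz3"])` returns `{"abc1": {"Abc1", "ABC1"}}`.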

The output is a table called "GO enrichment analysis" (see figure 29.43). The table is sorted by ascending p-value, but it can be re-sorted on other columns and filtered to show only the over-represented GO terms. The table also provides FDR and Bonferroni-corrected p-values. Note that these p-values are meant only as a guide, because GO annotations are not strictly independent of each other. For example, "reproduction" is a broad category that encompasses a nested set of terms from other categories, such as "pheromone biosynthetic process".
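
For reference, the two corrections can be sketched as follows. The manual does not state which FDR procedure the tool uses. This sketch assumes the Benjamini-Hochberg step-up procedure, and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values [0.01, 0.04, 0.03, 0.5], `bonferroni` gives, up to floating-point rounding, [0.04, 0.16, 0.12, 1.0], and `benjamini_hochberg` gives approximately [0.04, 0.0533, 0.0533, 0.5]. The Bonferroni values are the more conservative of the two.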

Figure 29.43: The GO enrichment analysis table generated by the Gene Set Test tool.