For this analysis you need a GO association file, which lists gene names together with their associated Gene Ontology (GO) terms. Association files for many species can be downloaded from the Gene Ontology web site (http://www.geneontology.org/GO.downloads.annotations.shtml). However, it is better to use a file annotated with only the top-level GO terms. For some species such a file is available directly; otherwise you can create one yourself with the QuickGO tool (http://www.ebi.ac.uk/QuickGO/GMultiTerm).
When you run the GO Enrichment Analysis, you must specify the GO association file, a gene track, and the ontology you would like to test: cellular component, biological process, or molecular function (see figure 26.40).
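To make the role of the association file and the ontology choice concrete, here is a minimal sketch of reading an association file and keeping only the terms of one ontology. It assumes the file follows the GO Consortium's GAF 2.x tab-separated layout (column 3 = gene symbol, column 5 = GO ID, column 9 = aspect, where P, F and C stand for biological process, molecular function and cellular component). The function name `parse_gaf` and the sample lines are made up for illustration; this is not how the Workbench itself reads the file.

```python
import io
from collections import defaultdict


def parse_gaf(handle, aspect):
    """Map gene symbol -> set of GO IDs for one ontology (aspect).

    Assumes the GAF 2.x column layout: tab-separated, '!' header lines,
    column 3 = gene symbol, column 5 = GO ID, column 9 = aspect
    (P = biological process, F = molecular function, C = cellular component).
    """
    gene_to_terms = defaultdict(set)
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed lines with too few columns
        symbol, go_id, asp = cols[2], cols[4], cols[8]
        if asp == aspect:
            gene_to_terms[symbol].add(go_id)
    return dict(gene_to_terms)


# Made-up example lines, truncated to the first 9 GAF columns.
sample = (
    "!gaf-version: 2.2\n"
    "UniProtKB\tP04637\tTP53\tinvolved_in\tGO:0006915\tPMID:0\tIDA\t\tP\n"
    "UniProtKB\tP04637\tTP53\tlocated_in\tGO:0005634\tPMID:0\tIDA\t\tC\n"
    "UniProtKB\tP38398\tBRCA1\tinvolved_in\tGO:0006281\tPMID:0\tIMP\t\tP\n"
)
print(parse_gaf(io.StringIO(sample), "P"))
```

Passing `"C"` or `"F"` instead of `"P"` selects the cellular component or molecular function ontology.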
The analysis starts by associating all of the variants in the input track with genes in the gene track, based on overlap with the gene annotations. Next, the Workbench tries to match the gene names from the gene track with the gene names in the GO association file. Make sure that both files use the same gene name definition (for example, the same gene symbols); otherwise the names cannot be matched.
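The two steps above (variant-to-gene overlap, then gene-name matching) can be sketched as follows. This assumes half-open coordinates `[start, end)` and a plain name lookup; the Workbench's exact overlap rules are not described here, and the class names, gene names and coordinates are hypothetical. The naive scan over every variant and gene is chosen for clarity, not speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class Variant:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(v, g):
    """True if the variant and gene share at least one position."""
    return v.chrom == g.chrom and v.start < g.end and g.start < v.end


def genes_hit_by_variants(variants, genes):
    """Names of genes overlapped by at least one variant (naive O(V*G) scan)."""
    return {g.name for v in variants for g in genes if overlaps(v, g)}


def match_to_go(gene_names, gene_to_terms):
    """Split gene names into those found in the association data and those not found."""
    matched = {n for n in gene_names if n in gene_to_terms}
    return matched, set(gene_names) - matched


# Hypothetical genes and variants with made-up coordinates.
genes = [
    Gene("TP53", "chr17", 1000, 2000),
    Gene("BRCA1", "chr17", 5000, 6000),
    Gene("MYGENE", "chr1", 100, 200),
]
variants = [
    Variant("chr17", 1500, 1501),
    Variant("chr17", 5999, 6000),
    Variant("chr1", 150, 151),
    Variant("chr2", 10, 11),  # overlaps no gene
]
gene_to_terms = {"TP53": {"GO:0006915"}, "BRCA1": {"GO:0006281"}}

hit = genes_hit_by_variants(variants, genes)
matched, unmatched = match_to_go(hit, gene_to_terms)
print(sorted(matched), sorted(unmatched))
```

Genes that end up in `unmatched` are where a naming mismatch between the gene track and the association file would show up.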
Based on this, the Workbench identifies GO terms that are over-represented among the altered genes. Over-representation is assessed with a hypergeometric test, which compares the number of altered genes annotated with a given GO term X to the number of all genes in the GO association file annotated with the same term.
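As an illustration of the hypergeometric test, the sketch below computes the upper-tail probability P(X ≥ k) of seeing at least k altered genes with term X. It assumes N is the number of genes in the association file, K the number of those annotated with term X, n the number of altered genes, and k the number of altered genes with term X. The function name is hypothetical and the Workbench may differ in details such as how the background is defined.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K with the term, n drawn).

    N: genes in the GO association file (background)
    K: background genes annotated with GO term X
    n: altered genes
    k: altered genes annotated with GO term X
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid counts")
    # Sum the probabilities of every outcome at least as extreme as k.
    # math.comb returns 0 when asked for more items than exist.
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), min(K, n) + 1))
    return numerator / comb(N, n)


# 20 genes in the association file, 5 carry term X;
# 4 altered genes, 3 of which carry term X.
p = hypergeom_upper_tail(3, 4, 5, 20)
print(f"{p:.4f}")
```

Here the p-value is (C(5,3)·C(15,1) + C(5,4)·C(15,0)) / C(20,4) = 155/4845, roughly 0.032.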
The result is a table listing the GO terms and the p-values calculated for the candidate variants, together with a new variant file in which the variants are annotated with their GO terms and the corresponding p-values. The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming there is no real enrichment (the null hypothesis). It therefore indicates how significant a result is: a small p-value means that an over-representation at least this strong would be very unlikely to arise by chance alone.
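The "at least as extreme, by chance" definition of the p-value can be checked with a small simulation. It repeatedly draws random gene lists of the same size as the altered list and counts how often they contain at least as many genes with the term as were observed. The counts are made up, and the simulation is only an illustration of the definition, not how the Workbench computes p-values.

```python
import random
from math import comb


def empirical_p_value(k_obs, n, K, N, trials=100_000, seed=0):
    """Fraction of random n-gene lists (drawn from N genes, K of which carry
    the term) containing at least k_obs genes with the term, i.e. a result
    at least as extreme as the observed one when there is no real enrichment."""
    rng = random.Random(seed)
    genes = range(N)  # genes 0..K-1 carry the GO term, the rest do not
    hits = 0
    for _ in range(trials):
        drawn = rng.sample(genes, n)
        if sum(1 for g in drawn if g < K) >= k_obs:
            hits += 1
    return hits / trials


# 20 genes in the association file, 5 with the term;
# 4 altered genes, 3 of which carry the term.
empirical = empirical_p_value(3, 4, 5, 20)
exact = (comb(5, 3) * comb(15, 1) + comb(5, 4) * comb(15, 0)) / comb(20, 4)
print(f"empirical {empirical:.4f}, exact {exact:.4f}")
```

The simulated fraction approaches the exact hypergeometric value (about 0.032) as the number of trials grows: only about 3 in 100 random gene lists would show this much over-representation by chance.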