Filter
Underneath the volcano plot you find the Filter section (figure 54). You can filter the feature table, volcano plot and heat map based on minimum and maximum fold changes, and on regular p-values or FDR p-values, depending on the volcano plot p-value selection. You can enter or paste values in the fields to adjust the thresholds, or you can use the horizontal and vertical bars on the volcano plot to do this.
Figure 54: Adjust fold change and p-value, apply advanced filtering and select a biological insights filter to narrow down the list of genes or miRNAs.
Additional filtering options are available from the Advanced filters button. In the resulting dialog, you can filter on biotypes (when available), on minimum, mean and maximum CPM (Counts per Million, TMM-adjusted) values for each sample group, and on minimum, mean and maximum CPM values across groups (figure 55). You can add multiple filters, one at a time:
- Select criterion
- Select operator and enter value, or select one or more biotypes
- Click Add filter
Figure 55: With Advanced filters, you can filter on biotypes and CPM expression values.
Each advanced filter is added as a filter tag below the dialog table and can be removed again by clicking the 'x'.
Once you have added the desired filters, click Apply filter criteria to feature list to close the dialog.
A badge next to the Advanced filters button indicates the number of applied advanced filters (figure 54).
Text below the regular filter fields signals whether a biological insights filter has been applied.
Click on Save to save the filter including biological insights and advanced filtering.
Click on Reset to reset filtering, or choose one of your previously saved filters from the drop-down list on the right.
P-value and FDR p-value
- P-value. Standard, uncorrected p-value. Genes/transcripts that are not observed in any sample have undefined p-values and are reported as '-'.
- FDR p-value. The false discovery rate corrected p-value. The FDR-corrected p-value is always equal to or larger than the uncorrected p-value.
The differential expression analysis includes the step Filter on average expression for FDR correction. This filters away some genes prior to the FDR correction. As a result, those genes will have undefined FDR p-values, reported as '-'.
The p-value cut-off controls the chance of getting a false positive result in a single test. When you apply a statistical test and use a p-value cut-off of 0.05, there is a 5% chance of calling the result significant when there is in fact no true difference.
When you carry out many tests, e.g. test differential expression for many genes in one experiment, you run into the multiple testing problem: if you test 10000 genes that are not truly differentially expressed and use a 0.05 p-value cut-off for each of them, you should expect about 500 of them to turn out significant, all of them false positives. A list of significant results obtained this way may therefore not be useful in practice.
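The arithmetic behind the multiple testing problem can be checked with a small simulation. The sketch below is illustrative only and is not part of the application: it assumes that none of the tested genes is truly differentially expressed, in which case p-values are uniformly distributed between 0 and 1. The function name `count_false_positives` and the fixed random seed are hypothetical choices made for this example.

```python
import random


def count_false_positives(n_tests, cutoff, seed=0):
    """Count tests called significant when no true difference exists."""
    rng = random.Random(seed)
    # Assumption: under the null hypothesis, p-values are uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < cutoff for p in p_values)


n_tests = 10000
cutoff = 0.05

# Expected number of false positives is simply n_tests * cutoff.
expected = n_tests * cutoff
observed = count_false_positives(n_tests, cutoff)

print(f"expected false positives:  {expected:.0f}")
print(f"simulated false positives: {observed}")
```

The simulated count varies with the seed but stays close to the expected value of 500.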
The False Discovery Rate (FDR) p-value addresses this problem and allows you to control the overall false positive rate in the multiple testing scenario. In this approach it is not the chance of making a false positive call in each individual test that is being controlled, but rather the proportion of false positive tests among all significant tests. When you use an FDR-corrected p-value cut-off of 0.05, you should expect 5% of your significant tests to be false positives. Mathematically, the FDR-corrected p-values are obtained from examining the distribution of the traditional p-values across all tests performed, and identifying cut-offs between significant and non-significant values.
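As an illustration of how corrected p-values can be derived from the full set of uncorrected p-values, here is a minimal sketch of the Benjamini-Hochberg procedure. This is an assumption: the text above does not name the correction method used, and Benjamini-Hochberg is shown only because it is a widely used FDR correction. The function name `benjamini_hochberg` and the example p-values are made up for this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values for eight genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
fdr = benjamini_hochberg(raw)

for p, q in zip(raw, fdr):
    print(f"p = {p:.3f}   FDR p = {q:.4f}")
```

In this sketch every adjusted value is at least as large as the corresponding uncorrected p-value, and each adjusted value depends on the rank of the p-value among all tests performed.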
Typically, the standard p-value is used in cases where only a few tests are being performed in parallel, whereas the FDR-corrected p-value is used when many tests are being performed. Analyzing differential expression across many genes or miRNAs is an example of the latter, i.e. for this use case, the FDR p-value is normally recommended. Using the FDR p-value will result in fewer false positive calls, without much loss of sensitivity.