Empirical analysis of DGE - implementation parameters
The 'Empirical analysis of DGE' algorithm in the CLC Genomics Workbench is a re-implementation of the "Exact Test" available as part of the edgeR Bioconductor package.
The parameter values used in the CLC Genomics Workbench implementation are the edgeR defaults for the equivalent parameters in all but one case. The exception is the tol parameter of estimateCommonDisp, which is set to 1e-14, a more stringent value than the edgeR default of 1e-6. The advantage of the more stringent tolerance is that the results will be more accurate. The disadvantage is that the algorithm is slightly slower; however, according to our performance tests, this change has only a marginal impact on the run time of the tool. Overall, the user accepts a marginally longer run time in exchange for greater confidence in the results.
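To make the trade-off concrete, the following is a minimal Python sketch of how a stopping tolerance affects a one-dimensional minimization. It is a toy illustration only and is not the CLC Genomics Workbench or edgeR code: the function bracket_minimize, the objective toy_objective and the optimum TRUE_MIN are hypothetical names introduced here, and the search simply shrinks a bracket using golden-ratio splits. The two tolerances are the values listed for estimateCommonDisp (1e-6 in edgeR, 1e-14 here).

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden-ratio factor, about 0.618


def bracket_minimize(f, lo, hi, tol, max_iter=500):
    """Shrink [lo, hi] around the minimum of a unimodal f until hi - lo <= tol.

    Hypothetical helper for illustration; returns (midpoint, iterations used).
    """
    iterations = 0
    while hi - lo > tol and iterations < max_iter:
        width = hi - lo
        c = hi - INV_PHI * width  # left interior point
        d = lo + INV_PHI * width  # right interior point
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        iterations += 1
    return (lo + hi) / 2, iterations


TRUE_MIN = 0.3  # assumed optimum of the toy objective, not a real dispersion


def toy_objective(x):
    # Stand-in for the quantity being optimized; chosen only for simplicity.
    return (x - TRUE_MIN) ** 2


loose_x, loose_iters = bracket_minimize(toy_objective, 0.0, 1.0, tol=1e-6)
tight_x, tight_iters = bracket_minimize(toy_objective, 0.0, 1.0, tol=1e-14)

print(f"tol=1e-6 : error={abs(loose_x - TRUE_MIN):.1e}, iterations={loose_iters}")
print(f"tol=1e-14: error={abs(tight_x - TRUE_MIN):.1e}, iterations={tight_iters}")
```

In this sketch the tighter tolerance needs more iterations, but the count grows only with the logarithm of the tolerance, which is in line with the marginal run-time impact reported above.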
The parameter values used in the CLC Genomics Workbench implementation are provided in the table below, with the corresponding edgeR function names given for clarity.
Function in BioC package | Parameter name | Value used and comments |
calcNormFactors | method | "TMM" |
calcNormFactors | refColumn | NULL (automatically selected) |
calcNormFactors | logratioTrim | 0.3 |
calcNormFactors | sumTrim | 0.05 |
calcNormFactors | doWeighting | TRUE |
calcNormFactors | Acutoff | -1e10 |
estimateCommonDisp | tol | 1e-14 (default in edgeR: 1e-6) |
estimateCommonDisp | rowsum.filter | Set by user in wizard ("Total count filter cutoff", default 5) |
estimateTagwiseDisp | prior.df | 10 |
estimateTagwiseDisp | trend | "movingave" |
estimateTagwiseDisp | span | NULL |
estimateTagwiseDisp | method | "grid" |
estimateTagwiseDisp | grid.length | 11 |
estimateTagwiseDisp | grid.range | c(-6, 6) |
mglmOneGroup | maxit | 50 |
mglmOneGroup | tol | 1e-10 |
aveLogCPM | prior.count | 2 |
aveLogCPM | dispersion | 0.05 |
exactTest | pair | Set by user in wizard ("Exact test comparisons") |
exactTest | dispersion | "auto" (tagwise if available, otherwise common) |
exactTest | rejection.region | "doubletail" |
exactTest | big.count | 900 |
exactTest | prior.count | 0.125 |