Perform independent filtering to threshold the over-representation testing on the number of features in each category to limit the multiple testing burden
When testing for enrichment of feature annotations, we frequently test thousands of annotations simultaneously, which increases the multiple testing burden considerably. Because many annotations have very few annotated features, there is often too little power to detect over-representation for them. Inspired by DESeq2, which was in turn inspired by Bourgon, Gentleman, and Huber (2010), this function applies the principle of 'independent filtering' to choose a minimum number of features per annotation (in the background) that optimally limits the multiple testing burden. This is statistically valid because the filtering criterion (number of features per annotation) is statistically independent of the test statistic under the null hypothesis. Independent filtering is effective because the filtering criterion is correlated with the test statistic under the alternative hypothesis. For further justification and discussion of independent filtering, see Bourgon, Gentleman, and Huber (2010) and the DESeq2 vignette.
The code for this function is largely lifted from DESeq2:::pvalueAdjustment.
Please cite Bourgon, Gentleman, and Huber (2010) when using this function.
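The principle described above can be illustrated with a minimal sketch. This is not the package's implementation: sketch_independent_filtering is a hypothetical helper, the data are simulated, and the threshold is chosen simply as the first one that maximises the number of Benjamini-Hochberg rejections, whereas the actual function may select it differently (for example, after smoothing the curve).

```r
# Hypothetical helper: a simplified sketch of independent filtering.
# pvalue      - raw p-values, one per annotation (assumed to contain no NAs)
# filter_stat - filter statistic, e.g. number of features per annotation
sketch_independent_filtering <- function(pvalue, filter_stat, alpha = 0.1,
                                         theta = seq(0, 0.95, 0.05)) {
  stopifnot(length(pvalue) == length(filter_stat))

  # Candidate cutoffs: the theta-quantiles of the filter statistic.
  cutoffs <- quantile(filter_stat, probs = theta, names = FALSE)

  # For each cutoff, adjust only the p-values that pass the filter
  # and count how many fall below alpha.
  num_rej <- vapply(cutoffs, function(cutoff) {
    keep <- filter_stat >= cutoff
    sum(p.adjust(pvalue[keep], method = "BH") < alpha)
  }, integer(1))

  # Simplification (assumption): use the first cutoff with the most rejections.
  best <- which.max(num_rej)
  keep <- filter_stat >= cutoffs[best]

  # Annotations removed by the filter get NA as their adjusted p-value.
  padj <- rep(NA_real_, length(pvalue))
  padj[keep] <- p.adjust(pvalue[keep], method = "BH")

  list(padj = padj, cutoff = cutoffs[best], theta = theta[best],
       num_rej = num_rej)
}

# Simulated example: only some of the larger categories carry signal.
set.seed(1)
n <- 2000
num_in_cat <- sample(1:200, n, replace = TRUE)
pvalue <- runif(n)
enriched <- num_in_cat > 100 & runif(n) < 0.2
pvalue[enriched] <- rbeta(sum(enriched), 0.1, 10)

res <- sketch_independent_filtering(pvalue, num_in_cat)
```

In this sketch, res$num_rej holds the number of rejections at each value of theta, and res$cutoff is the minimum number of features per annotation that was retained.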
Usage
add_independent_filtering_padj(
  obj,
  alpha = 0.1,
  theta = seq(0, 1, 0.01),
  p_value_col = "over_represented_pvalue",
  filter_col = "numInCat",
  plot = TRUE
)
Arguments
- obj
data.frame containing goseq results as generated by get_enriched_go. See below for an example.
- alpha
numeric alpha value to use for rejection of the null hypothesis. Note this is only used for optimising the threshold; any alternative value for alpha can be used downstream.
- theta
numeric vector of thresholds (fractions of the data to remove)
- p_value_col
character Column with p-values
- filter_col
character Column with filtering criteria values
- plot
logical Plot the relationship between the