
When testing for enrichment of feature annotations, we frequently test thousands of annotations simultaneously, which increases the multiple testing burden considerably. Because many annotations have very few annotated features, the tests for those annotations have too little power to detect over-representation, yet they still count towards the multiple testing correction. Inspired by DESeq2, which was in turn inspired by Bourgon, Gentleman, and Huber (2010), this function applies the principle of 'independent filtering' to choose a threshold on the minimum number of features per annotation (in the background) that optimally limits the multiple testing burden. This is statistically valid because the filtering criterion (number of features per annotation) is independent of the test statistic under the null hypothesis. Independent filtering is effective because the filtering criterion is correlated with the test statistic under the alternative hypothesis. For further justification and discussion of independent filtering, see Bourgon, Gentleman, and Huber (2010) and the DESeq2 vignette.

The code for this function is largely adapted from DESeq2:::pvalueAdjustment.

Please cite Bourgon, Gentleman, and Huber (2010) when using this function.
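The principle described above can be illustrated with a minimal sketch. This is not the code of this function or of DESeq2: the name independent_filter_sketch is hypothetical, and it simply picks the first threshold that maximises the number of rejections, whereas DESeq2 smooths the rejection curve before choosing a threshold. It assumes Benjamini-Hochberg adjustment via p.adjust.

```r
# Simplified, hypothetical sketch of independent filtering.
# Not the implementation used by this package or by DESeq2.
independent_filter_sketch <- function(pvalues, filter_values,
                                      alpha = 0.1,
                                      theta = seq(0, 0.95, 0.05)) {
  # Candidate cutoffs: the theta-quantiles of the filter statistic
  cutoffs <- quantile(filter_values, probs = theta, names = FALSE)

  # For each cutoff, BH-adjust only the p-values that pass the filter.
  # Tests that are filtered out keep NA as their adjusted p-value.
  padj_matrix <- sapply(cutoffs, function(cutoff) {
    keep <- filter_values >= cutoff
    padj <- rep(NA_real_, length(pvalues))
    padj[keep] <- p.adjust(pvalues[keep], method = "BH")
    padj
  })

  # Number of rejections at level alpha for each candidate cutoff
  n_rejections <- colSums(padj_matrix < alpha, na.rm = TRUE)

  # Simplification: take the first cutoff that maximises rejections
  best <- which.max(n_rejections)

  list(
    padj = padj_matrix[, best],
    theta = theta[best],
    cutoff = cutoffs[best],
    n_rejections = n_rejections
  )
}

# Usage on simulated data (values are illustrative only)
set.seed(1)
sim_p <- runif(500)
sim_n <- sample(1:100, 500, replace = TRUE)
res <- independent_filter_sketch(sim_p, sim_n)
str(res)
```

Because theta = 0 (no filtering) is among the candidates, the selected threshold never yields fewer rejections than adjusting all p-values together.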

Usage

add_independent_filtering_padj(
  obj,
  alpha = 0.1,
  theta = seq(0, 1, 0.01),
  p_value_col = "over_represented_pvalue",
  filter_col = "numInCat",
  plot = TRUE
)

Arguments

obj

data.frame containing goseq results as generated by get_enriched_go. See below for an example.

alpha

numeric alpha value to use for rejection of the null hypothesis. Note that this is only used for optimising the threshold; any alternative value for alpha can be used downstream.

theta

numeric vector of thresholds (fractions of the data to remove)

p_value_col

character Column with p-values

filter_col

character Column with filtering criteria values

plot

logical Plot the relationship between the

Value

Returns the data.frame with an added column ('padj_if') containing the optimised adjusted p-value.