Estimate effect size of over-representation — estimate

This is a crude function to estimate the effect size of over-representation: given a feature that is known to be over-represented, it estimates how over-represented that feature is. It is typically run after get_enriched_go, but it works on any goseq output, regardless of whether the functional enrichment tested was for GO terms.

Usage

estimate_overrep(obj, pwf, gene2cat)

Arguments

obj: data.frame containing goseq results as generated by get_enriched_go or goseq.
pwf: data.frame as used in get_enriched_go or goseq.
gene2cat: data.frame as used in get_enriched_go or goseq.

Value

Returns obj with an additional column, adj_overrep, calculated for each GO term as:

(numDEInCat / numInCat) / (avgTermWeight / avgNonTermWeight) / (totalDEFeatures / totalFeatures)

where:

numDEInCat is the number of differentially expressed genes (or proteins) assigned to that GO term.
numInCat is the total number of genes (or proteins) annotated to that GO term.
avgTermWeight is the average pwf$pwf value for all the differentially expressed genes that were assigned to that GO term.
avgNonTermWeight is the average pwf$pwf value for all the other genes supplied in pwf.
totalDEFeatures is the total number of differentially expressed genes indicated in pwf.
totalFeatures is the total number of genes indicated in pwf, i.e. the number of rows.
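
As a rough illustration of the calculation above, the following sketch computes the value by hand for a single term using made-up data. It is not the function's actual implementation. The DEgenes and pwf columns follow the layout produced by goseq's nullp; the gene2cat column names (gene, category), the term ID, and all numbers are hypothetical. It also assumes that "all the other genes" means every gene in pwf except the differentially expressed genes assigned to the term.

```r
# Toy probability weighting function table (values are invented).
# Row names are gene IDs; DEgenes flags differential expression (1 = DE).
pwf <- data.frame(
  DEgenes = c(1, 1, 0, 1, 0, 0, 0, 0),
  pwf     = c(0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1),
  row.names = paste0("g", 1:8)
)

# Toy gene-to-category mapping; column names are assumptions.
gene2cat <- data.frame(
  gene     = c("g1", "g2", "g3"),
  category = "GO:0000001"
)

term <- "GO:0000001"
term_genes <- gene2cat$gene[gene2cat$category == term]

in_term    <- rownames(pwf) %in% term_genes
is_de      <- pwf$DEgenes == 1
de_in_term <- in_term & is_de

numDEInCat <- sum(de_in_term)  # DE genes annotated to the term
numInCat   <- sum(in_term)     # all genes annotated to the term

# Average weights for the DE genes in the term and for everything else
# (assumed reading of "all the other genes").
avgTermWeight    <- mean(pwf$pwf[de_in_term])
avgNonTermWeight <- mean(pwf$pwf[!de_in_term])

totalDEFeatures <- sum(is_de)
totalFeatures   <- nrow(pwf)

adj_overrep <- (numDEInCat / numInCat) /
  (avgTermWeight / avgNonTermWeight) /
  (totalDEFeatures / totalFeatures)

print(adj_overrep)
```

In this toy case, dividing by the weight ratio reduces the raw DE proportion for the term because its DE genes carry higher weights than the remaining genes. The result is then scaled by the overall proportion of DE genes.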