Summarise per-feature MNAR scores into a single dataset-level index
mnar_index.RdAggregates the per-feature Tjur R² scores produced by mnar_score()
into a single value in 0-1 representing how strongly experimental
condition predicts missingness across the dataset.
Usage
mnar_index(
summary_df,
weight_by = c("miss_frac", "equal"),
coverage_penalty = TRUE
)Arguments
- summary_df
A data.frame as returned in the
$summaryelement ofmnar_score(). Must contain columns:mnar_score,miss_frac, andmnar_class.- weight_by
Character. How to weight features when computing the mean score. One of:
- "miss_frac"
(Default) Weight each feature by its missingness fraction. Features with more missing values contribute more — they represent a larger share of the imputation/modelling challenge.
- "equal"
Unweighted mean across all informative features.
- coverage_penalty
Logical. If TRUE (default), scales the weighted mean score by the fraction of features that are informative (i.e. have at least
min_observedobserved andmin_missingmissing values as set inmnar_score()). When most features are fully observed and therefore uninformative, the dataset-level index is attenuated accordingly. If FALSE, the index reflects only the informative features without penalising for their proportion.
Value
A list with the following elements:
- index
Numeric in 0-1. The dataset-level MNAR index. Values near 0 indicate MAR-like missingness; values near 1 indicate missingness that is strongly structured by experimental condition.
- weighted_mean_score
Numeric. The weighted mean Tjur R² across informative features, before any coverage penalty is applied.
- coverage
Numeric. Fraction of features that were informative (had scoreable missingness).
- n_informative
Integer. Number of informative features.
- n_total
Integer. Total number of features.
- weight_by
Character. The weighting scheme used.
- coverage_penalty
Logical. Whether the coverage penalty was applied.
Details
Unlike mnar_global_score(), which fits a pooled logistic regression
and can miss condition effects that cancel across features (e.g. peptide A
missing in condition X and peptide B missing in condition Y), this function
aggregates absolute per-feature scores and correctly identifies such
opposing patterns as condition-structured missingness.
The index is computed as follows:
Uninformative features (NA score) are excluded.
The weighted mean Tjur R² is computed across informative features, using
miss_fracor equal weights depending onweight_by.If
coverage_penalty = TRUE, the result is multiplied by the fraction of all features that were informative.
The coverage penalty ensures that a dataset where only 5\ have any missingness does not score the same as one where 80\ are missing in a condition-structured way, even if the informative features have identical per-feature scores.
When coverage_penalty = FALSE the index reflects the strength of
condition-missingness association among features that do have
missing values, which may be preferable when missingness prevalence is not
itself of interest.
Examples
if (FALSE) { # \dontrun{
result <- mnar_score(obj, i = "peptides", group_cols = "condition")
# Default: miss_frac weighted, with coverage penalty
idx <- mnar_index(result$summary)
cat("MNAR index:", round(idx$index, 3), "\n")
# Strength among missing features only, no coverage penalty
idx2 <- mnar_index(result$summary,
weight_by = "equal",
coverage_penalty = FALSE)
} # }