Prepare Data with Missing Values for Mediation Analysis
Source: R/PrepareMissingData.R
Handles missing values in the dataset through multiple imputation and prepares the imputed datasets for within-subject mediation analysis. The function imputes missing data, processes each imputed dataset, and provides diagnostics for the imputation process.
Usage
PrepareMissingData(
data_missing,
m = 5,
method_num = "pmm",
seed = 123,
M_C1,
M_C2,
Y_C1,
Y_C2,
C_C1 = NULL,
C_C2 = NULL,
C = NULL,
C_type = NULL,
W = NULL,
W_type = NULL,
center_W = TRUE,
keep_W_raw = TRUE,
keep_C_raw = TRUE
)
Arguments
- data_missing
A data frame containing the raw dataset with missing values.
- m
An integer specifying the number of imputations to perform. Default is 5.
- method_num
Character; imputation method for numeric variables (for example, "pmm" or "norm"). Default is "pmm".
- seed
An integer specifying the random seed for reproducibility. Default is 123.
- M_C1
A character vector of column names representing mediators at condition 1.
- M_C2
A character vector of column names representing mediators at condition 2. Must match the length of M_C1.
- Y_C1
A character string representing the column name of the outcome variable at condition 1.
- Y_C2
A character string representing the column name of the outcome variable at condition 2.
- C_C1
Optional character vector of within-subject control variable names at condition 1. Default is NULL.
- C_C2
Optional character vector of within-subject control variable names at condition 2. Default is NULL.
- C
Optional character vector of between-subject control variable names. Default is NULL.
- C_type
Optional vector of the same length as C. Each element is "continuous", "categorical", or "auto" (default). Ignored when C = NULL.
- W
Optional character vector: moderator names (at most J).
- W_type
Optional vector of the same length as W. Same coding as C_type. Ignored when W = NULL.
- center_W
Logical. Whether to center the moderator variable W. Default is TRUE.
- keep_W_raw, keep_C_raw
Logical. Whether to keep the original W / C columns in the returned data. Default is TRUE for both.
Value
A list containing:
- processed_data_list
A list of m data frames, each representing an imputed and processed dataset ready for within-subject mediation analysis.
- imputation_summary
A summary of the imputation process, including diagnostics and convergence information.
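Because processed_data_list is an ordinary list of m data frames, downstream code can fit the same model to every element and then combine the results. The following is a minimal sketch under stated assumptions: the list is simulated stand-in data (not output from PrepareMissingData), the column names Mdiff and Ydiff are hypothetical, and the pooling step is Rubin's rules written by hand in base R. The package's own analysis functions may handle pooling differently.

```r
set.seed(1)

# Stand-in for processed_data_list: m = 5 simulated data frames.
# Columns Mdiff and Ydiff are hypothetical names chosen for illustration.
m <- 5
fake_list <- lapply(seq_len(m), function(i) {
  Mdiff <- rnorm(50)
  data.frame(Mdiff = Mdiff, Ydiff = 0.5 * Mdiff + rnorm(50))
})

# Fit the same regression in every dataset
fits <- lapply(fake_list, function(d) lm(Ydiff ~ Mdiff, data = d))

# Per-dataset slope estimates and their sampling variances
est <- vapply(fits, function(f) coef(f)[["Mdiff"]], numeric(1))
within_var <- vapply(fits, function(f) vcov(f)["Mdiff", "Mdiff"], numeric(1))

# Rubin's rules: pooled estimate and total variance
pooled_est  <- mean(est)
mean_within <- mean(within_var)   # average within-imputation variance
between_var <- var(est)           # between-imputation variance
total_var   <- mean_within + (1 + 1 / m) * between_var

c(estimate = pooled_est, se = sqrt(total_var))
```

With real output, fake_list would be replaced by prepared_missing_data$processed_data_list and the model formula by the variables actually present in those data frames.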
Details
This function is designed to preprocess datasets with missing values for mediation analysis. It performs the following steps:
- Multiple imputation: Uses specified imputation methods (for example, predictive mean matching) to generate m imputed datasets.
- Data preparation: Applies PrepareData to each of the m imputed datasets to calculate difference scores and centered averages for mediators and the outcome variable.
- Imputation diagnostics: Provides summary diagnostics for the imputation process, including information about missing data patterns and convergence.
This function integrates imputation and data preparation, ensuring that the resulting datasets are ready for subsequent mediation analysis.
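The data-preparation step can be pictured with a minimal base-R sketch. It assumes the usual two-condition within-subject setup, in which each mediator and the outcome are reduced to a difference score and a mean-centered average of the two conditions. The direction of the difference (condition 2 minus condition 1), the toy values, and the column names Mdiff, Mavg, and Ydiff are assumptions for illustration, not necessarily the conventions PrepareData uses.

```r
# Toy wide-format data: one row per subject (values are made up)
toy <- data.frame(
  M_1 = c(2, 4, 6, 8),     # mediator, condition 1
  M_2 = c(3, 7, 7, 11),    # mediator, condition 2
  Y_1 = c(10, 12, 9, 15),  # outcome, condition 1
  Y_2 = c(12, 15, 10, 19)  # outcome, condition 2
)

# Difference scores (assumed direction: condition 2 minus condition 1)
toy$Mdiff <- toy$M_2 - toy$M_1
toy$Ydiff <- toy$Y_2 - toy$Y_1

# Average of the two conditions, then centered at its sample mean
m_avg <- (toy$M_1 + toy$M_2) / 2
toy$Mavg <- m_avg - mean(m_avg)

toy[, c("Mdiff", "Mavg", "Ydiff")]
```

In the multiple-imputation setting, a transformation of this kind would be applied separately to each of the m completed datasets rather than once to the incomplete data.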
Examples
# Example dataset with missing values
data("example_data", package = "wsMed")
set.seed(123)
example_dataN <- mice::ampute(
  data = example_data,
  prop = 0.1
)$amp
#> Warning: Data is made numeric because the calculation of weights requires numeric data
# Prepare the dataset with multiple imputations
prepared_missing_data <- PrepareMissingData(
  data_missing = example_dataN,
  m = 5,
  M_C1 = c("A2", "B2"),
  M_C2 = c("A1", "B1"),
  Y_C1 = "C2",
  Y_C2 = "C1"
)
#> Class: mids
#> Number of multiple imputations: 5
#> Imputation methods:
#>    C2    C1    A2    B2    A1    B1
#> "pmm" "pmm"    "" "pmm"    "" "pmm"
#> PredictorMatrix:
#>    C2 C1 A2 B2 A1 B1
#> C2  0  1  1  1  1  1
#> C1  1  0  1  1  1  1
#> A2  1  1  0  1  1  1
#> B2  1  1  1  0  1  1
#> A1  1  1  1  1  0  1
#> B1  1  1  1  1  1  0
# Access processed datasets
processed_data_list <- prepared_missing_data$processed_data_list
imputation_summary <- prepared_missing_data$imputation_summary