These tools will no longer be maintained as of December 31, 2024. Archived website can be found here. PubMed4Hh GitHub repository can be found here. Contact NLM Customer Service if you have questions.


PUBMED FOR HANDHELDS

Search MEDLINE/PubMed


  • Title: Multiple imputation in the presence of an incomplete binary variable created from an underlying continuous variable.
    Author: Grobler AC, Lee K.
    Journal: Biom J; 2020 Mar; 62(2):467-478. PubMed ID: 31304611.
    Abstract:
    Multiple imputation (MI) is used to handle missing at random (MAR) data. Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form they appear in the analysis model. With an encoded binary variable more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% and 50% of data in the covariate to MAR dependent on the outcome and the auxiliary variable. We compared the performance of five different imputation methods: (a) Imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, then categorizing into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation; (e) substantive-model compatible (SMC) FCS. Bias and standard errors were large when the continuous variable only was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method as it performed best in our simulation studies.
    [Abstract] [Full Text] [Related] [New Search]