The Impact Of Imputation Timing On Model Performance Estimation
Document Type
Conference Proceeding
Publication Date
8-19-2025
Published In
2025 IEEE International Conference On Artificial Intelligence Testing (AITest)
Abstract
Handling missing data is a critical challenge in applied machine learning, as most algorithms assume complete data. Imputation, the process of replacing missing values with estimates derived from the available data, is a common solution. This study investigates how imputation timing (before vs. after the train-test split) affects the performance estimates of machine learning classifiers, focusing on the biases introduced by different imputation strategies. We evaluate imputation before the train-test split (IBS) and imputation after the train-test split (IAS) across multiple datasets and imputation methods, including random forest (RF), k-nearest neighbors (KNN), and mean imputation. Our findings show that IBS consistently overestimates generalization performance while IAS underestimates it, and that both biases grow as the proportion of missing data increases. These discrepancies highlight the potential for bias and instability in performance estimates and underscore the need to handle imputation carefully to avoid misleading conclusions about model robustness. Our results further show that missing-data rates and dataset characteristics influence classifier performance, suggesting that no single imputation method is universally appropriate.
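To make the IBS/IAS distinction concrete, here is a minimal sketch using mean imputation on a single feature column. It is illustrative only and is not the authors' experimental code. Assumptions: missing values are represented as `None`, the first `n_train` rows form the training partition (no shuffling), and "IAS" is taken to mean that the imputation statistic is estimated on the training rows only and then applied to both partitions. The function names `impute_before_split` and `impute_after_split` are hypothetical. The RF and KNN imputers evaluated in the paper are not shown.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Split = Tuple[List[float], List[float]]


def column_mean(values: Sequence[Optional[float]]) -> float:
    """Mean of the observed (non-missing) entries; raises if none are observed."""
    return mean(v for v in values if v is not None)


def fill(values: Sequence[Optional[float]], fill_value: float) -> List[float]:
    """Replace every missing entry (None) with fill_value."""
    return [fill_value if v is None else v for v in values]


def impute_before_split(values: Sequence[Optional[float]], n_train: int) -> Split:
    """IBS: the mean is computed over ALL rows, so test rows influence training data."""
    filled = fill(values, column_mean(values))
    return filled[:n_train], filled[n_train:]


def impute_after_split(values: Sequence[Optional[float]], n_train: int) -> Split:
    """IAS (assumed form): the mean comes from training rows only, then is reused on test rows."""
    train, test = list(values[:n_train]), list(values[n_train:])
    train_mean = column_mean(train)
    return fill(train, train_mean), fill(test, train_mean)


# Hypothetical toy column: rows 0-3 are training, rows 4-6 are test.
feature: List[Optional[float]] = [2.0, None, 4.0, 6.0, 12.0, None, 16.0]

if __name__ == "__main__":
    print("IBS:", impute_before_split(feature, 4))
    print("IAS:", impute_after_split(feature, 4))
```

With this column, IBS fills both gaps with 8.0, a value pulled upward by the test rows 12.0 and 16.0, while IAS fills both with 4.0, the mean of the training rows alone. Under IBS, the training data therefore carry information about the test set, which is one plausible mechanism behind the optimistic estimates the abstract reports.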
Keywords
Missing Data, Imputation, Performance Estimation, Data Preprocessing, Data Quality
Published By
IEEE
Conference
IEEE AITest 2025
Conference Dates
July 21-24, 2025
Conference Location
Tucson, AZ
Recommended Citation
Ben Mitchell, '05 and Shikha Shrestha, '24. (2025). "The Impact Of Imputation Timing On Model Performance Estimation". 2025 IEEE International Conference On Artificial Intelligence Testing (AITest). 201-208.
DOI: 10.1109/AITest66680.2025.00032
https://works.swarthmore.edu/fac-comp-sci/130