Computer Science Faculty Works

The Impact Of Imputation Timing On Model Performance Estimation

Ben Mitchell, '05, Swarthmore College
Shikha Shrestha, '24

Document Type

Conference Proceeding

Publication Date

8-19-2025

Published In

2025 IEEE International Conference On Artificial Intelligence Testing (AITest)

Abstract

Handling missing data is a critical challenge in applying machine learning, as most algorithms assume complete data. Imputation, the process of replacing missing values with estimates from available data, is a common solution. This study investigates the impact of imputation timing (before vs. after train-test split) on machine learning classifier performance estimates, particularly focusing on the biases introduced by different imputation strategies. We evaluate the effects of imputation before train-test split (IBS) and imputation after train-test split (IAS) across multiple datasets and imputation methods, including Random Forest (RF), KNN, and Mean Imputation. Our findings reveal that IBS consistently overestimates generalization performance, with severity worsening as the proportion of missing data increases, while IAS underestimates performance, again worsening as missing data fractions grow. These discrepancies highlight the potential for bias and instability in performance estimates, emphasizing the need for careful handling of imputation techniques to avoid misleading conclusions about model robustness. Our results further underscore the influence of missing data rates and dataset characteristics on classifier performance, suggesting that no single imputation method is universally appropriate.
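
The difference between the two timings can be sketched in a few lines. The following is a minimal illustration using mean imputation only, not the authors' experimental code. The function names (`impute_before_split`, `impute_after_split`, `column_means`, `fill`, `split`) are hypothetical. It also assumes that IAS means fitting the imputation statistics on the training rows and applying them to both partitions, which is one common reading and may differ from the paper's exact protocol.

```python
import random
from statistics import fmean


def column_means(rows):
    """Per-column mean over observed (non-None) values."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(fmean(observed) if observed else 0.0)
    return means


def fill(rows, means):
    """Replace each missing value (None) with the given column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def split(rows, test_fraction, seed):
    """Shuffle row indices with a fixed seed and cut off a test partition."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


def impute_before_split(rows, test_fraction=0.25, seed=0):
    """IBS: the fill values are computed from ALL rows, so information
    from the future test rows leaks into the training data."""
    filled = fill(rows, column_means(rows))
    return split(filled, test_fraction, seed)


def impute_after_split(rows, test_fraction=0.25, seed=0):
    """IAS (assumed variant): the fill values come from the training rows
    only and are then applied to both partitions."""
    train, test = split(rows, test_fraction, seed)
    means = column_means(train)
    return fill(train, means), fill(test, means)
```

With the same seed, both functions select the same rows for each partition, so any difference in a downstream classifier's test score is attributable to where the imputation statistics were computed. The same pattern applies to KNN or Random Forest imputation by swapping out `column_means` and `fill` for a fitted imputer.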

Keywords

Missing Data, Imputation, Performance Estimation, Data Preprocessing, Data Quality

Published By

IEEE

Conference

IEEE AITest 2025

Conference Dates

July 21-24, 2025

Conference Location

Tucson, AZ

Recommended Citation

Ben Mitchell, '05 and Shikha Shrestha, '24. (2025). "The Impact Of Imputation Timing On Model Performance Estimation". 2025 IEEE International Conference On Artificial Intelligence Testing (AITest). 201-208. DOI: 10.1109/AITest66680.2025.00032
https://works.swarthmore.edu/fac-comp-sci/130

This document is currently not available here.
