How Close Is Close Enough? Testing Nonexperimental Estimates of Impact against Experimental Estimates of Impact with Education Test Scores as Outcomes

Document Type

Paper

Publication Date

1-1-2002

Series Title

Institute for Research on Poverty Discussion Paper

Abstract

In this study we test the performance of several nonexperimental estimators of impact applied to an educational intervention, a reduction in class size, for which achievement test scores were the outcome. We compare the nonexperimental estimates of the impacts to the "true impact" estimates provided by a random-assignment design used to assess the effects of that intervention. Our primary focus is on a nonexperimental estimator based on a complex procedure called propensity score matching. Previous studies that tested nonexperimental estimators against experimental ones all had employment or welfare use as the outcome variable; we sought to determine whether their conclusions about the performance of nonexperimental estimators carried over into the education domain. Project Star is the source of data both for the experimental estimates and for the nonexperimental comparison groups used to construct the nonexperimental estimates. Project Star was an experiment in Tennessee involving 79 schools in which students in kindergarten through third grade were randomly assigned to small classes (the treatment group) or to regular-size classes (the control group). The outcome variables from the data set were the math and reading achievement test scores. We carried out the propensity-score-matching estimation procedure separately for the kindergartens of each of 11 schools and used it to derive nonexperimental estimates of the impact of smaller class size. We also developed proper standard errors for the propensity-score-matched estimators by using bootstrapping procedures. We found that in most cases the propensity-score estimate of the impact differed substantially from the "true impact" estimated by the experiment. We then attempted to assess how close the nonexperimental estimates were to the experimental ones, suggesting several different ways of assessing "closeness." Most of them led us to conclude that the nonexperimental estimates were not very "close" and therefore were not reliable guides to the "true impact." We put the greatest emphasis on examining the question of "how close is close enough" from the perspective of a decision-maker trying to use the evaluation to determine whether to invest in wider application of the intervention being assessed, in this case a reduction in class size. We illustrate this with a rough cost-benefit framework for small class size as applied to Project Star, and we find that in 30 to 45 percent of the 11 cases the propensity-score-matching nonexperimental estimators would have led to the "wrong" decision.
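To make the general approach concrete, the sketch below shows a minimal, generic version of propensity-score matching with bootstrapped standard errors: a treatment model is fit on covariates, each treated unit is matched to the comparison unit with the nearest estimated propensity score, and the whole procedure is re-run on bootstrap resamples to obtain a standard error. This is an illustration on synthetic data only; the covariates, the logistic treatment model, the one-nearest-neighbor matching rule, and all numbers are assumptions for exposition, not the paper's actual specification, data, or results.

```python
# Illustrative sketch: generic propensity-score matching with a bootstrap SE,
# run on synthetic data. Not the paper's specification or data.
import numpy as np

rng = np.random.default_rng(0)

def estimate_propensity(X, treated):
    """Fit a simple logistic regression of treatment on covariates via
    gradient ascent and return predicted propensity scores."""
    n, k = X.shape
    Xb = np.column_stack([np.ones(n), X])
    beta = np.zeros(k + 1)
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += 0.5 * (Xb.T @ (treated - p) / n)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def psm_impact(y, treated, pscore):
    """Match each treated unit to the comparison unit with the closest
    propensity score and average the outcome differences."""
    t_idx = np.where(treated == 1)[0]
    c_idx = np.where(treated == 0)[0]
    diffs = [y[i] - y[c_idx[np.argmin(np.abs(pscore[c_idx] - pscore[i]))]]
             for i in t_idx]
    return float(np.mean(diffs))

def bootstrap_se(y, treated, X, n_boot=200):
    """Bootstrap the full procedure (propensity model plus matching)
    to obtain a standard error for the matched impact estimate."""
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        yb, tb, Xr = y[idx], treated[idx], X[idx]
        if tb.sum() in (0, n):
            continue  # skip resamples lacking treated or comparison units
        estimates.append(psm_impact(yb, tb, estimate_propensity(Xr, tb)))
    return float(np.std(estimates, ddof=1))

# Synthetic example: covariates, nonrandom "treatment", test-score outcome.
n = 400
X = rng.normal(size=(n, 2))
treated = (rng.uniform(size=n) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)
y = 50 + 5 * treated + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=5, size=n)

pscore = estimate_propensity(X, treated)
print(f"matched impact estimate: {psm_impact(y, treated, pscore):.2f} "
      f"(bootstrap SE {bootstrap_se(y, treated, X):.2f})")
```

In an experimental benchmark exercise like the one described above, an estimate of this kind would then be compared with the difference in mean outcomes between the randomized treatment and control groups, which serves as the "true impact."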
