Correspondence  |   October 2008
Propensity Scores Do Not Necessarily Lie!
Author Affiliations & Notes
  • Yvonne Vergouwe, Ph.D.
  • *Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands.
Anesthesiology October 2008, Vol. 109, 746–747. doi:10.1097/ALN.0b013e3181863894
To the Editor: Recently, an Editorial View was published on propensity score methods.1 The editorial describes strengths and weaknesses of propensity score methods in observational therapeutic studies. In their title, the authors apparently allude to a quotation attributed to the British Prime Minister Benjamin Disraeli (1804–1881): “There are lies, damn lies and statistics.” In general, we appreciate links to statements from outside the clinical research world. However, the title of the Editorial View may be misinterpreted as a statement against propensity scores. Most readers of Anesthesiology are not professional statisticians and may become reluctant to use propensity scores, even in appropriate situations, because of such a title.
The editorial is of value because it reviews an important problem of observational therapeutic studies. In such studies, investigators do not control who does or does not receive the index treatment, which can result in an imbalance of prognostic factors across the treatment arms. In the absence of randomization, treatment indication and assignment are typically related to the prognosis of the patient. For example, more patients with advanced disease than patients with early disease stages may be given the index treatment. As a consequence, the estimated treatment effect can be biased. This bias is known as confounding by indication and can be adjusted for in the statistical analysis. However, adjustments can be made only for prognostic factors that were measured in the study. Prognostic factors that were not measured may introduce hidden bias, for which adjustment is not possible. Any statistical method that aims to adjust for confounding by indication suffers from this problem, which is by no means restricted to propensity score methods! Propensity score methods may even have particular advantages over other correction methods. Therefore, we consider the chosen title of the Editorial View very unfortunate.1
Prognostic factors can confound the estimated treatment effect only if they are related both to the patient outcome and to the assignment of treatment. This implies that two different analytical strategies are possible. Conventionally, the measured prognostic factors are included directly in a regression model together with the assigned treatment, with the patient outcome as the dependent variable (treatment model). The propensity score method consists of two steps. First, the focus is on the association between the assigned treatment (dependent variable) and the prognostic factors, to develop a so-called propensity score. The propensity score predicts the probability of having received the index treatment based on the prognostic factors. Second, the focus is on the association between the patient outcome and the assigned treatment, with the prognostic factors included as one combined variable (i.e., the propensity score). The propensity score is used here to adjust the treatment effect for all prognostic factors.2
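The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in any of the cited studies: the data are simulated, the factor names (severity, age), the coefficients, and the sample size are hypothetical, and the logistic models are fitted with a small hand-written Newton-Raphson routine so that the example needs only the standard library. The simulated treatment has no true effect, so any crude association between treatment and outcome is due to confounding by indication.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iterations=25):
    """Logistic regression by Newton-Raphson; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(design, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Hypothetical simulated cohort: sicker patients are treated more often.
random.seed(2008)
factors, treated, outcome = [], [], []
for _ in range(1000):
    severity, age = random.gauss(0, 1), random.gauss(0, 1)
    t = int(random.random() < sigmoid(1.5 * severity + 0.5 * age))
    # The outcome depends on severity only: the true treatment effect is zero.
    o = int(random.random() < sigmoid(-1.0 + 1.2 * severity))
    factors.append([severity, age])
    treated.append(t)
    outcome.append(o)

# Step 1: model treatment assignment from the prognostic factors.
ps_beta = fit_logistic(factors, treated)
propensity = [sigmoid(ps_beta[0] + sum(b * v for b, v in zip(ps_beta[1:], x)))
              for x in factors]

# Step 2: model the outcome from treatment plus the single propensity score.
adjusted_log_or = fit_logistic([[t, p] for t, p in zip(treated, propensity)], outcome)[1]

# For comparison: the crude (unadjusted) treatment effect.
crude_log_or = fit_logistic([[t] for t in treated], outcome)[1]

print("crude log odds ratio:   ", round(crude_log_or, 2))
print("adjusted log odds ratio:", round(adjusted_log_or, 2))
```

In practice the propensity score is also used for matching or stratification rather than as a regression covariate; the covariate form is shown here only because it mirrors the two-step description most directly.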
Nuttall et al. seem to suggest that both analytical methods are equally insufficient. We would like to stress that propensity score methods have particular advantages when the outcome event is rare, the treatment is common, and many prognostic factors are collected.3 The low number of outcome events limits the number of prognostic factors that can be included in the conventional treatment model. A low ratio of “number of events over number of included factors” jeopardizes proper estimation of the treatment effect in the regression analysis. In contrast, the numbers of patients in the two treatment groups are generally high. This allows for adequate modeling of the association between the treatment assignment and many prognostic factors, because the ratio of “number of patients with the treatment over number of included factors” is high. Subsequently, the treatment model includes only the assigned treatment and the propensity score, allowing for a proper and adjusted estimation of the treatment effect despite the low number of outcome events. The efficiency of propensity scores in relation to the number of outcome events has been shown in a previous study, in which propensity scores produced less biased, more robust, and more precise estimates when fewer than seven events were available for each prognostic factor.4
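The arithmetic behind the two ratios can be made concrete with a small sketch. All counts below are hypothetical and chosen only for illustration, and the helper name `per_factor_ratio` is an assumption of this example; the threshold of seven events per factor is the one quoted above from the cited comparison study.

```python
def per_factor_ratio(count, n_factors):
    """Number of events (or patients) available per factor in a model."""
    if n_factors <= 0:
        raise ValueError("n_factors must be positive")
    return count / n_factors


# Hypothetical study: rare outcome, common treatment, many prognostic factors.
n_events = 40    # patients with the outcome event
n_treated = 600  # patients who received the index treatment
n_factors = 12   # measured prognostic factors

# Conventional treatment model: outcome regressed on all prognostic factors
# (the treatment term is left out of the denominator to follow the
# "per prognostic factor" wording of the text).
conventional_ratio = per_factor_ratio(n_events, n_factors)

# Propensity model (step 1): treatment regressed on all prognostic factors.
propensity_ratio = per_factor_ratio(n_treated, n_factors)

# Treatment model after step 1: outcome regressed on only two variables,
# the assigned treatment and the propensity score.
reduced_ratio = per_factor_ratio(n_events, 2)

print("events per factor, conventional model:", round(conventional_ratio, 1))
print("treated patients per factor, propensity model:", propensity_ratio)
print("events per variable, treatment model with propensity score:", reduced_ratio)
print("conventional model below seven events per factor:", conventional_ratio < 7)
```

With these assumed counts, the conventional model falls below seven events per factor, whereas both models of the propensity score approach have far more observations per included variable.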
Like any other correction method in observational therapeutic studies, propensity scores cannot control for hidden bias. However, sensitivity analysis has been proposed to indicate the magnitude of hidden bias that would need to be present to alter the conclusion of the study.5 Furthermore, propensity scores cannot fix the other potential methodologic biases discussed by Nuttall et al., a limitation that applies equally to the conventional approach. Propensity scores do not pretend to solve these problems. Hence, propensity scores cannot be considered “liars.”
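One simple form of such a sensitivity analysis can be sketched as follows. The letter does not specify a particular procedure; the sketch assumes a matched-pairs design with a binary outcome and uses the standard bound in which hidden bias of magnitude gamma can raise the chance that the treated patient is the one with the event in a discordant pair to at most gamma / (1 + gamma). The pair counts and the function name are hypothetical.

```python
from math import comb


def upper_bound_p_value(discordant_pairs, treated_event_pairs, gamma):
    """Largest one-sided p-value compatible with hidden bias of size gamma.

    discordant_pairs: matched pairs in which exactly one patient had the event.
    treated_event_pairs: discordant pairs in which the treated patient had it.
    gamma: factor by which an unmeasured factor may multiply the odds of
           treatment within a pair (gamma = 1 means no hidden bias).
    """
    p_plus = gamma / (1.0 + gamma)
    return sum(
        comb(discordant_pairs, k) * p_plus ** k * (1.0 - p_plus) ** (discordant_pairs - k)
        for k in range(treated_event_pairs, discordant_pairs + 1)
    )


# Hypothetical result: 50 discordant pairs, treated patient had the event in 35.
D, T = 50, 35
for gamma in (1.0, 1.25, 1.5, 1.75, 2.0):
    print("gamma =", gamma, " upper bound on p:", round(upper_bound_p_value(D, T, gamma), 4))

# Smallest gamma on the grid at which the conclusion is no longer significant.
tipping = next((g / 100 for g in range(100, 1001)
                if upper_bound_p_value(D, T, g / 100) > 0.05), None)
print("bound exceeds 0.05 from gamma =", tipping)
```

A conclusion that only changes at a large gamma is robust to hidden bias of plausible size; one that changes at a gamma barely above 1 is fragile.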
In conclusion, Nuttall et al. discussed confounding by indication as an important weakness of observational therapeutic studies. However, when randomized trials cannot be conducted for ethical, economic, or practical reasons, observational studies are the only appropriate alternative.6 Imbalance in prognostic factors can be adjusted for in the analysis. Particularly when the number of outcome events is small, propensity score methods can adjust for the imbalance more efficiently than conventional methods can. Sensitivity analysis may complement the statistical analysis by examining the possible effects of hidden bias.
References
1. Nuttall GA, Houle TT: Liars, damn liars, and propensity scores. Anesthesiology 2008; 108:3–4
2. Rosenbaum PR, Rubin DB: Reducing bias in observational studies. J Am Stat Assoc 1984; 79:516–24
3. Braitman LE, Rosenbaum PR: Rare outcomes, common treatments: Analytic strategies using propensity scores. Ann Intern Med 2002; 137:693–5
4. Cepeda MS, Boston R, Farrar JT, Strom BL: Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158:280–7
5. Rosenbaum PR: Discussing hidden bias in observational studies. Ann Intern Med 1991; 115:901–5
6. Vandenbroucke JP: When are observational studies as credible as randomised trials? Lancet 2004; 363:1728–31