Newly Published
Education  |   October 2017
Implementation and Evaluation of the Z-Score System for Normalizing Residency Evaluations
Author Notes
  • From the Departments of Anesthesiology (J.P.W., B.S.R., W.S.S., M.D.M.) and Biomedical Informatics (J.P.W.), Vanderbilt University Medical Center, Nashville, Tennessee; and Department of Surgery, Federal University of Santa Catarina, Florianópolis, Brazil (G.R.d.O.F.).
  • Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are available in both the HTML and PDF versions of this article. Links to the digital files are provided in the HTML text of this article on the Journal’s Web site (www.anesthesiology.org).
  • Submitted for publication June 20, 2017. Accepted for publication September 11, 2017.
  • Acknowledgment: The authors thank Nimesh Patel for his efforts in developing our SQL implementation of the Z-score system.
  • Research Support: This work was supported by the Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee. Dr. Wanderer was funded by the Foundation for Anesthesia Education and Research and the Anesthesia Quality Institute’s Mentored Research Training Grant-Health Services Research.
  • Competing Interests: The authors declare no competing interests. Dr. McEvoy received funding (not related to this article) from the GE Foundation for educational research work in Kenya, from Edwards Lifesciences for research in goal-directed fluid therapy, and from Cheetah Medical for research in goal-directed fluid therapy.
  • Correspondence: Address correspondence to Dr. Wanderer: Vanderbilt University Medical Center, 1301 Medical Center Drive, TVC 4648, Nashville, Tennessee 37204. jonathan.p.wanderer@vanderbilt.edu. Information on purchasing reprints may be found at www.anesthesiology.org or on the masthead page at the beginning of this issue. Anesthesiology’s articles are made freely accessible to all readers, for personal use only, 6 months from the cover date of the issue.
Article Information
Anesthesiology Newly Published on October 19, 2017. doi:10.1097/ALN.0000000000001919
Abstract

Background: Assessment of clinical competence is essential for residency programs and should be guided by valid, reliable measurements. We implemented Baker’s Z-score system, which produces measures of traditional core competency assessments and clinical performance summative scores. Our goal was to validate use of summative scores and estimate the number of evaluations needed for reliable measures.
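As a rough illustration of what Z-score normalization of evaluation scores can look like, the sketch below rescales each raw score by the mean and standard deviation of all scores given by the same evaluator. This per-evaluator scheme is an assumption made for illustration; the abstract does not spell out the transformation, and the function name `z_transform` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_transform(evaluations):
    """Normalize raw scores within each evaluator (assumed scheme).

    evaluations: list of (rater_id, resident_id, raw_score) tuples.
    Returns a list of (rater_id, resident_id, z_score) in the same order.
    """
    # Collect every score each rater has given.
    scores_by_rater = defaultdict(list)
    for rater, _resident, score in evaluations:
        scores_by_rater[rater].append(score)

    # Per-rater mean and population standard deviation.
    stats = {r: (mean(s), pstdev(s)) for r, s in scores_by_rater.items()}

    normalized = []
    for rater, resident, score in evaluations:
        mu, sd = stats[rater]
        # A rater who always gives the same score carries no ranking
        # information, so map those scores to 0.0 instead of dividing by 0.
        z = (score - mu) / sd if sd > 0 else 0.0
        normalized.append((rater, resident, z))
    return normalized
```

The design intent of such a transformation is to remove differences in leniency and score spread between evaluators, so that a 4 from a strict evaluator and a 4 from a lenient one are no longer treated as equivalent.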

Methods: We performed generalizability studies to estimate the variance components of raw and Z-transformed absolute and peer-relative scores, and decision studies to estimate the number of evaluations needed to produce at least 90% reliable measures for classification and for high-stakes decisions. A subset of evaluations was selected representing residents who were evaluated frequently by faculty who provided the majority of evaluations. Variance components were estimated using ANOVA.
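The arithmetic behind a decision study can be sketched as follows. The sketch assumes a simplified design in which reliability is the resident variance divided by the sum of resident variance and error variance averaged over n occasions; the study's actual design and variance components are not given in the abstract, and the names `reliability` and `occasions_needed` are hypothetical. The argument `var_error` stands for whichever error variance suits the decision type (relative error for classification, absolute error for high-stakes decisions).

```python
import math


def reliability(var_resident, var_error, n_occasions):
    """Reliability of a mean score over n_occasions (simplified design)."""
    return var_resident / (var_resident + var_error / n_occasions)


def occasions_needed(var_resident, var_error, target=0.90):
    """Smallest n with reliability(var_resident, var_error, n) >= target.

    Solving var_p / (var_p + var_e / n) >= target for n gives
    n >= target / (1 - target) * var_e / var_p.
    """
    n = (target / (1.0 - target)) * (var_error / var_resident)
    # Subtract a small tolerance so floating-point round-off does not
    # push an exact integer answer up to the next integer.
    return math.ceil(n - 1e-9)
```

For example, with made-up variance components where error variance is four times the resident variance, `occasions_needed(1.0, 4.0)` returns 36, since a 90% target requires nine times the error-to-resident variance ratio.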

Results: Principal component extraction from 8,754 complete evaluations demonstrated that a single factor explained 91% and 85% of variance for absolute and peer-relative scores, respectively. In total, 1,200 evaluations were selected for generalizability and decision studies. The major variance component for all scores was resident interaction with measurement occasions. Variance due to the resident component was strongest with raw scores, where 30 evaluation occasions produced 90% reliable measurements with absolute scores and 58 with peer-relative scores. For Z-transformed scores, 57 evaluation occasions produced 90% reliable measurements with absolute scores and 55 with peer-relative scores. The results were similar for high-stakes decisions.

Conclusions: The Baker system produced moderately reliable measures at our institution, suggesting that it may be generalizable to other training programs. Raw absolute scores required the fewest assessment occasions (30) to achieve 90% reliable measurements.