Correspondence  |   January 2008
The Use of Simulation Education in Competency Assessment: More Questions than Answers
Anesthesiology 2008; 108:167.
To the Editor:—
Many anesthesia educators have read with great interest three manuscripts in previous issues of Anesthesiology. Morgan, a well-known and respected researcher in simulation education, and her coauthors presented a thoughtful description of the use of simulation in the evaluation of teamwork in obstetrical practice.1 However, in light of their conclusion that the "study does not support the use of Human Factors Rating Scale for assessment of obstetrical teams" and their recommendation of only limited use of the Global Rating Scale, taken together with Murray and Enarson's reflective editorial2 on the difficulty of assessing teamwork and communication skills, I fear that some anesthesia educators might be tempted to throw the newly born discipline of "simulation-based assessment" out with the proverbial bathwater.
Morgan et al.'s investigation1 raises several issues that must be addressed in light of our urgent need to develop authentic teaching and assessment of clinical competency in anesthesiology.3 The most pressing is the need for reliable and valid performance assessment tools for use in anesthesia education, training, and practice.
Morgan et al.1 found the Human Factors Rating Scale and Global Rating Scale to be of limited reliability in the obstetrical setting; however, they do not examine sources of unreliability other than the raters themselves. Although the numbers of raters, items, and testing occasions seem sufficient, we have no definitive analysis of this. Classical Test Theory, with its interrater reliability coefficients and inter- and intraclass correlations for diagnosing measurement error, does not reveal the relative importance or interactions of these and other sources of variance. Modern Test Theory, specifically Generalizability Theory, provides an analysis of multiple sources of variance and a determination of optimal sampling not only of raters but also of subjects, items, and testing occasions.4–6 In this investigation,1 the nonconcordance of the correlations of the Human Factors Rating Scale and the Global Rating Scale suggests that something else is going on. It could be the result of nonparallel scenarios, a lack of rater training, or, more significantly, faulty construct validity of the Human Factors Rating Scale and Global Rating Scale for the anesthesia Crisis Resource Management trait. We cannot tell from this report.
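To make the contrast with Classical Test Theory concrete, the variance-component decomposition at the heart of Generalizability Theory can be sketched in a few lines for the simplest case: a fully crossed teams × raters design. The scores below are illustrative assumptions, not data from Morgan et al.; a real G-study would also model items and occasions as facets.

```python
# Hypothetical G-study sketch: teams (objects of measurement) fully crossed
# with raters, one score per cell. Variance components are estimated from
# the mean squares of a two-way random-effects ANOVA without replication.
scores = [  # rows = teams, columns = raters (illustrative values)
    [4.0, 5.0, 4.0],
    [2.0, 3.0, 2.0],
    [5.0, 5.0, 4.0],
    [3.0, 2.0, 3.0],
]
n_p, n_r = len(scores), len(scores[0])

grand = sum(sum(row) for row in scores) / (n_p * n_r)
p_means = [sum(row) / n_r for row in scores]
r_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

# Sums of squares for teams, raters, and the residual interaction term
ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
ss_total = sum((x - grand) ** 2 for row in scores for x in row)
ss_pr = ss_total - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

# Expected-mean-square solutions for the variance components
var_pr_e = ms_pr                           # interaction + residual error
var_p = max((ms_p - ms_pr) / n_r, 0.0)     # true (team) variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)     # rater leniency/severity variance

def g_coefficient(k):
    """Relative G coefficient for a D-study averaging k raters per team."""
    return var_p / (var_p + var_pr_e / k)

print(f"teams: {var_p:.3f}  raters: {var_r:.3f}  residual: {var_pr_e:.3f}")
print(f"G with 1 rater: {g_coefficient(1):.3f}, with 3: {g_coefficient(3):.3f}")
```

Unlike a single interrater correlation, this decomposition shows how much unreliability is attributable to raters versus the team-by-rater interaction, and the decision-study function projects how many raters (or, with more facets, scenarios and occasions) would be needed to reach an acceptable coefficient.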
On reviewing the original development of the Human Factors Rating Scale7 and its 2000 revision,8 we still have no formal psychometric analysis of its construct validity and factor structure "due to limited sample size."8 Although its authors claim that the items cluster around "team roles, authority/command structure, stress recognition and organizational climate,"8 these highly complex behaviors warrant formal factor analysis before general use. As anesthesiologists, we would not adopt a new clinical test without knowing its sensitivity and specificity; we should be equally rigorous about validity and reliability in high-stakes testing and in resource-intensive instruction. Modern Test Theory offers many of the advantages necessary for the authentic assessment of the complex cognitive, technical, and behavioral skills involved in simulation-based education and performance assessment.9
Stanford University School of Medicine, Stanford, California.
1. Morgan PJ, Pittini R, Regehr G, Marrs C, Haley MF: Evaluating teamwork in a simulated obstetric environment. Anesthesiology 2007; 106:907–15
2. Murray D, Enarson C: Communication and teamwork: Essential to learn but difficult to measure. Anesthesiology 2007; 106:895–6
3. Tetzlaff JE: Assessment of competency in anesthesiology. Anesthesiology 2007; 106:812–25
4. Shavelson RJ, Webb NM: Generalizability Theory: A Primer. Newbury Park, CA, Sage Publications, 1991
5. Handbook of Complementary Methods in Education Research. Washington, DC, American Educational Research Association, 2006
6. Boulet JR, Murray D, Kras J, Woodhouse J, McAllister J, Ziv A: Reliability and validity of a simulation-based acute care skills assessment for medical students and residents. Anesthesiology 2003; 99:1270–80
7. Helmreich RL, Merritt AC, Sherman PJ, Gregorich SE, Wiener EL: The Flight Management Attitudes Questionnaire (FMAQ), technical report 93-4. Austin, NASA/UT/FAA, 1993
8. Sexton J: Operating Room Management Attitudes Questionnaire, technical report 00-2. Austin, University of Texas at Austin, 2000
9. Colliver JA: Effect-size measures and research in developmental and behavioral pediatrics. J Dev Behav Pediatr 2007; 28:145–50