Clinical Science  |   November 1996
Variation in Expert Opinion in Medical Malpractice Review
Author Notes
  • (Posner) Research Associate Professor, Anesthesiology.
  • (Caplan) Clinical Professor, Anesthesiology; and Chair, Committee on Professional Liability, American Society of Anesthesiologists.
  • (Cheney) Professor and Chair, Anesthesiology.
  • Received from the Departments of Anesthesiology, University of Washington School of Medicine, and the Virginia Mason Medical Center, Seattle, Washington. Submitted for publication May 1, 1996. Accepted for publication July 19, 1996. Supported by the American Society of Anesthesiologists. Preliminary results presented at the 1991 Annual Meeting of the American Society of Anesthesiologists, San Francisco, California, October 29, 1991. The opinions expressed herein are those of the authors and do not represent the policy of the American Society of Anesthesiologists.
  • Address reprint requests to Dr. Posner: Department of Anesthesiology, University of Washington School of Medicine, Box 356540, 1959 NE Pacific Ave., Seattle, Washington 98195-6540. Address electronic mail to:
Anesthesiology, November 1996, Vol. 85, 1049-1054.
Key words: Anesthesiology: liability; peer review. Insurance: claim review; liability. Malpractice, medical: expert testimony.
The role of the medical expert in malpractice litigation is to assist the court in determining the cause of injury and whether the applicable standard of care was met. Disagreement among medical experts not only plays a prominent role in court proceedings but also influences pretrial decisions on whether a claim of injury is pursued and the manner in which it is pursued. [1,2] An expert opinion that the standard of care was breached may lead a plaintiff to aggressively pursue a case or may encourage a defendant to settle the suit. Conversely, an expert opinion that the standard of care was met may lead to an aggressive defense by the defendant or withdrawal of the suit by the plaintiff.
The existence of conflicting expert opinion is often attributed to bias arising from monetary compensation by the plaintiff or defendant. Less obvious bias may occur when the expert develops a personal affinity for the plaintiff or defendant or for one or more members of the litigation team, introducing advocacy rather than objectivity into the opinions rendered. [3] 
Another important source of conflicting expert opinion may be the nature of expert review. Expert opinion is a form of implicit judgment. Each expert may use his or her own unstated criteria to assess quality of care. [4] This contrasts with an explicit process based on criteria specified before the assessment. [5] 
Few studies of the reliability of quality of care judgments based on implicit peer review criteria have incorporated sound statistical methods. [6] Those that have been conducted indicate that agreement is generally poor. [6] However, most of these studies addressed the issue of peer review of quality of care in some general sense or in the context of quality assurance review, [7-16] which differs from a judgment of whether the care met current standards in medical-legal review. [17,18] Of the studies investigating the reliability of medical-legal judgments of appropriateness of care, few [17,19,20] provided reviewers access to original medical records. In the remaining studies, assessments relied on case abstracts rather than complete and original documents in the peer review process. [21,22] 
This study measures the level of agreement among objective medical expert reviewers of actual malpractice claim files. The review was sponsored by an outside party with no role in the litigation, and the medical experts received no compensation for their reviews. All expert reviewers were members of the same specialty at issue in the claim. They had access to complete claim files, as would a medical expert participating in litigation. With other sources of bias eliminated or held constant, the level of agreement among expert implicit judgments of the appropriateness of care was measured.
Data were collected as part of the American Society of Anesthesiologists (ASA) Closed Claims Project. This project, carried out by the ASA Committee on Professional Liability, is an ongoing study of adverse anesthetic outcomes based on information contained in the closed claim files of 34 professional liability insurance companies throughout the United States that insure approximately 14,500 anesthesiologists. The project and data collection procedures were described previously. [23-27] Procedures specific to this investigation were carried out during ongoing data collection for the Closed Claims Project and are described below.
Between December 1988 and October 1994, teams of anesthesiologist-reviewers visited the offices of ten insurance companies to review files of closed malpractice claims against anesthesiologists. All claims except those for dental damage were eligible for inclusion in the study. At each company, sets of three files each were randomly selected by one of the reviewers from the entire set of files available for review at the start of the visit. Selection of three-file sets was made using a random-number list provided by the investigators. Each three-file set was reviewed by two anesthesiologists who were instructed to refrain from consulting with each other. When possible, paired reviews occurred on different dates, with time of day matched. Each participating anesthesiologist was eligible to review multiple sets of files with different partners on the same or different review visits, up to a maximum of two three-file sets (six files) per reviewer per visit and five three-file sets (15 files) total per reviewer over the course of the study. Reviewers with previous experience reviewing files for the ASA Closed Claims Project were eligible to participate in this study if they were active in the clinical practice of anesthesia and had been in practice at least 3 y. Completed reviews were sent to the central project office for analysis.
The claim review process consists of review of all materials in the file and completion of the ASA Closed Claims Project data collection form according to a standardized set of instructions. Typically, a closed claim file contains the hospital record, anesthesia record, narrative statements, statements of involved health care personnel, expert and peer reviews, deposition summaries, outcome reports, and cost of settlement or jury award. Reviewers completed a data collection form for each claim in which there was enough information to reconstruct the sequence of events and nature of the injury. The data collection form consists of more than 140 items covering basic demographics, anesthetic techniques and personnel, damaging events, patient injury, settlement information, assessments of preventability, judgment of appropriateness of care, and a narrative description of the sequence of events.
For this study, we measured agreement on only a single item on the data collection form: Was the anesthesia care appropriate, less than appropriate, or impossible to judge? Appropriate care was defined as care that was reasonable and prudent by the standards of anesthetic care at the time of the event. Less-than-appropriate care was defined as care that was less than that standard of a reasonable and prudent practitioner at the time of the event. Reviewers were instructed to render a judgment of "impossible to judge" if, due to inadequate or conflicting information, they could not determine if the standard of care had been met. Claims for which this question was unanswered by reviewers were excluded from analysis.
Because severity of patient injury can influence reviewer judgments of appropriateness of care, [21] we separated the claims into two subsets for analysis: (1) temporary or nondisabling injuries and (2) permanent and disabling injuries. We also analyzed the entire set of claims as a group. Temporary or nondisabling injuries include such complications as emotional distress, sore throat, corneal abrasion, and uncomplicated pneumothorax. Permanent and disabling injuries include death, permanent brain damage, major nerve damage, and other injuries from which full recovery cannot occur or is not expected. Claims on which reviewers disagreed on this generalized level of severity of injury were excluded from the subset analysis.
Agreement between paired reviewers was measured using the kappa statistic (Appendix 1). [28,29] Kappa values provide an index of the amount of agreement beyond that expected purely by chance. Because agreement is expected to exceed chance levels among a group of specialists, it is the amount of agreement beyond chance, rather than simple statistical significance, that serves as the measure of reliability. A kappa value less than 0.40 is considered poor agreement, and 0.40 to 0.75 is fair to good agreement beyond chance. A kappa value greater than 0.75 is considered excellent. [30] Because a sample size of 25 to 30 is required for significance testing of kappa, [29] data collection continued until at least 30 claims in each of the analysis subsets were reviewed. Confidence intervals of kappa were calculated using jackknife calculations of standard error. [22] Probability values less than or equal to 0.05 were considered statistically significant.
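As an illustrative sketch of this analysis (not the authors' code), the kappa statistic with a leave-one-claim-out jackknife confidence interval can be computed as follows; the paired ratings used in the usage example are hypothetical:

```python
from math import sqrt

def kappa(pairs):
    """Chance-corrected agreement for a list of (rater1, rater2) ratings."""
    n = len(pairs)
    cats = sorted({c for p in pairs for c in p})
    p_obs = sum(a == b for a, b in pairs) / n
    # Expected (chance) agreement from each rater's marginal proportions
    p_exp = sum(
        (sum(a == c for a, _ in pairs) / n) * (sum(b == c for _, b in pairs) / n)
        for c in cats
    )
    return (p_obs - p_exp) / (1 - p_exp)

def jackknife_ci(pairs, z=1.96):
    """Kappa with an approximate 95% CI from the leave-one-out jackknife."""
    n = len(pairs)
    k_full = kappa(pairs)
    k_loo = [kappa(pairs[:i] + pairs[i + 1:]) for i in range(n)]
    k_bar = sum(k_loo) / n
    se = sqrt((n - 1) / n * sum((k - k_bar) ** 2 for k in k_loo))
    return k_full, (k_full - z * se, k_full + z * se)
```

For example, 200 hypothetical claims in which two reviewers agree on 120 give `kappa` = 0.20 when each reviewer uses the two categories equally often.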
One hundred three claims, each independently reviewed by two anesthesiologists, were eligible for inclusion in the study. In all, 30 anesthesiologists reviewed 2 to 15 claims each (median of five claims). The median age of reviewers was 48 y (range, 31 to 68 y) and all were board certified. Most had previous experience in an expert witness capacity (25 or 83%). Sixteen (53%) practiced in an academic setting, five (17%) were in private practice, and nine (30%) engaged in private practice with teaching responsibilities. Reviewers had been in practice 5 to 41 y (median, 16 y).
Overall, reviewers agreed on whether the care was appropriate in 64 (62%) of the claims and disagreed in 39 (38%; Table 1). This level of agreement exceeded chance levels but was in the poor-to-good range (kappa = 0.37; 95% CI = 0.22 to 0.52). Reviewers agreed that care was appropriate in 27% of claims, less than appropriate in 32%, and impossible to judge in 3% (Table 1).
Table 1. Paired Ratings of Appropriateness of Care
(Table image not available.)
Forty-two (41%) of the claims reviewed were for temporary or nondisabling injuries, whereas 50 (49%) involved permanent and disabling injuries. Agreement on severity of injury was excellent (kappa = 0.80). However, in 11 claims (11%), the reviewers disagreed on the general severity of injury. These claims were not included in the subset analysis.
On claims for temporary or nondisabling injuries, the proportion of claims with agreement on appropriateness of care (64%) was similar to that for the overall group, although the chance-corrected level of agreement was lower (kappa = 0.32). The level of agreement was lower for permanent and disabling injuries (60%; kappa = 0.27) than for the overall group. All kappa values were statistically significant.
Although the distinction between appropriate and less-than-appropriate care may seem evident to individual physicians, the results of the present study suggest that practicing anesthesiologists exhibit only fair agreement on this issue. When presented with identical malpractice claim files containing extensive documentation and records, objective reviewers agreed on appropriateness of care in 62% of claims (kappa = 0.37). Agreement was not improved by controlling for severity of injury. Although this level of agreement was statistically significant and the upper 95% confidence limit (0.52) did fall into the "good" range, it did not approach the excellent level (0.75). Nonrandom agreement (i.e., statistical significance of kappa) is expected among any group of reviewers sharing similar training and is not a particularly meaningful assessment tool. A kappa value of 0.40 is generally considered the minimally acceptable level, and kappa values in the 0.4 to 0.6 range are common in studies of medical diagnosis and tests.
The use of multiple experts has the potential to improve the reliability of reviewer assessments. [31] Application of the Spearman-Brown formula for stepped-up reliability to the findings of this study suggests that five objective reviewers would be needed to increase reliability to the "excellent" level of kappa = 0.75. [30,31] Although a proposal to incorporate yet more experts into medical malpractice review might at first seem a costly alternative, the long-term consequences of repeatedly consulting a single expert whose opinions deviate from the community norm may be much greater. Incorporation of explicit criteria such as clinical practice guidelines into a structured review process might reduce the number of reviewers needed for a reliable assessment.
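The stepped-up reliability calculation cited above can be sketched with the standard Spearman-Brown formula (a hedged illustration; the article does not print its exact computation):

```python
# Spearman-Brown prophecy formula: reliability of the pooled judgment of
# m independent reviewers, given a single-reviewer reliability r.
def stepped_up(r, m):
    return m * r / (1 + (m - 1) * r)

def m_for_target(r, target):
    """Exact (non-integer) number of reviewers needed to reach `target`,
    from inverting the Spearman-Brown formula."""
    return target * (1 - r) / (r * (1 - target))
```

With the study's single-reviewer kappa of 0.37 and a target of 0.75, `m_for_target` gives about 5.1, and `stepped_up(0.37, 5)` is roughly 0.75, consistent with the paper's figure of five reviewers.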
The process of review in the present study was analogous to that which occurs when a medical expert reviews a case for a malpractice proceeding, except that sources of bias were minimized or held constant. As in actual malpractice review, each reviewer was provided with a detailed set of original records and related documents. Judgments of appropriateness of care were based on the conventional yardstick of reasonable and prudent practice. [32] In addition, each reviewer was specifically instructed to refrain from consulting with colleagues during the process of formulating an opinion. This is the usual practice in litigation review. The reviewers were all experienced in claims review, board certified, and active in the clinical practice of anesthesiology. Reviewers all met the current ASA guidelines for expert witnesses.* Unlike actual medical experts, these reviewers were not paid for rendering their opinions, thus avoiding the potential advocacy relationship that may result from the economic framework of actual expert review in malpractice proceedings.
Previously we showed that standard-of-care judgments based on implicit criteria are influenced by case outcome. [21] When 112 anesthesiologists were presented with identical clinical scenarios but differing outcomes, it was observed that reviewers were more likely to judge anesthesia care as appropriate if the injury was temporary; conversely, reviewers were more likely to judge anesthesia care as substandard or impossible to judge if the injury was permanent. [21] Any such systematic bias in reviewer judgments would be expected to increase agreement among reviewers. Although the biasing effect of knowledge of the injury might have been avoided in this study by blinding reviewers to the outcome of the cases, the study procedures replicated the real-life situation in which an expert has access to outcome information when forming an opinion.
We must be cautious in making generalizations from the results of this study because of several limitations of the study design. Reviewers were not selected at random but rather represent an opportunity sample selected from a national set of volunteers. Criteria for selection were geographic proximity to the site of claim files (insurance companies) and review experience. Although reviewers were not matched with claims by subspecialty, exclusion of cardiac, pediatric, and obstetric claims from the analysis did not change the results. Similarly, claims were not selected purely at random. Although claims reviewed at each site were selected by a random process, the companies were not. Only companies that allow access to their files and had more than 25 closed claims available for review were included in the study. The distribution of injuries and their severity in this study closely matches the distribution of claims in the national ASA Closed Claims database. [23,33] Because of limitations of that database, [23] we do not know how closely this reflects the distribution of all anesthesia claims. However, because the claims in the database are derived from carriers that provide coverage for approximately 50% of all practicing anesthesiologists in the United States, the distribution may be reasonably representative.
This study introduces a new perspective into the expert witness problem in medical liability. Previous discussion has focused on the issue of objectivity. [3,34-36] Objectivity has been questioned on the basis of the advocacy relationship that may develop between an attorney and an expert witness. [3] Objectivity is certainly an issue in the case of the physician who is willing to provide expert testimony even if such testimony requires disregard for the objective facts of the case. [34] Although remedies to the so-called expert witness problem have been proposed (e.g., appointment of experts by the court, peer review of expert testimony, guidelines for expert witnesses), these have not met with significant approval or success. [3,35] The results of this study suggest that the variability inherent in the implicit judgment process also plays an important role in producing divergent expert opinions.
Some of the proposals to mitigate the problem of poor agreement on implicit quality-of-care judgments might be applicable to expert review in medical malpractice. These include the use of a structured review process, acknowledged experts, multiple reviewers, practice guidelines, and separation of process and outcome assessments. [6] The review process in our study was structured in that each reviewer completed the entire data form, which focused the review on a consistent set of case elements, a process similar to other studies. [17,19,20] The experts in the present study were experienced in this review process and were practicing in the specialty being reviewed, thus fitting some of the criteria proposed for acknowledged experts. [6] Although clinical practice guidelines are available in anesthesiology, the role of guidelines in the claims included in this study was not specifically assessed. Clinical practice guidelines have not replaced the medical expert in malpractice proceedings and do not appear to be the solution to the expert witness problem. [36,37] Although clinical practice guidelines introduce explicit criteria into the review process, the expert may be relied on to determine whether the clinical practice guidelines apply to the case at hand or whether other evidence or factors are applicable, [36-38] again introducing implicit judgments into the review process.
Anesthesiologists commonly disagree on the appropriateness of care when approaching the task of expert review with the intent of objectivity. This finding suggests that divergent expert opinions may be easily found by seeking opinions from multiple experts.
The authors thank the members of the American Society of Anesthesiologists who served as claims reviewers for this study and the insurance organizations that served as sources of closed claims.
Appendix: The Kappa Statistic
The kappa statistic provides a measure of agreement between raters applicable to nominal-level ratings (categories). Some agreement may be expected purely by chance. The amount of agreement expected by chance varies with several factors: the number of categories, the number of raters, the prevalence of the different categories, and how the categories are used by different raters. Kappa is indexed to correct for this chance agreement, typically yielding a value between 0 and 1. A kappa of 0 indicates no agreement beyond chance, whereas a kappa of 1 indicates perfect agreement.
The general expression for kappa is: kappa = (observed agreement - expected agreement)/(1 - expected agreement) (Equation 1).
The following example (two raters, 200 cases, and two categories) illustrates how expected agreement (and kappa) may vary even when raters agree on the same proportion of cases. In Table 2 and Table 3, reviewers agree that 60 of the 200 cases are category A and 60 cases are category B, giving an observed agreement of (60 + 60)/200 = 0.60 (Equation 2).
Table 2.

                     Reviewer 2: A    Reviewer 2: B    Total
Reviewer 1: A              60               40           100
Reviewer 1: B              40               60           100
Total                     100              100           200
Table 3.

                     Reviewer 2: A    Reviewer 2: B    Total
Reviewer 1: A              60                0            60
Reviewer 1: B              80               60           140
Total                     140               60           200
The tables differ in the total ratings for each reviewer in each category. In Table 2, each reviewer puts 100 (0.50) of the cases in category A and 100 (0.50) in category B. This gives an expected agreement of (0.50 x 0.50) + (0.50 x 0.50) = 0.50 (Equation 3).
In Table 3, reviewer 1 puts 60 (0.30) of the cases in category A and 140 (0.70) in category B. Reviewer 2 does just the opposite, putting 140 (0.70) in category A and 60 (0.30) in category B. Expected agreement in this case is (0.30 x 0.70) + (0.70 x 0.30) = 0.42 (Equation 4).
The chance that the two raters will agree (expected agreement) in Table 3 (0.42) is less than in Table 2 (0.50). The kappa value is affected by this difference. Table 3 has a larger kappa (0.31 vs. 0.20) because agreement by chance (expected agreement) is less than in Table 2, leaving more room for agreement beyond chance.
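The appendix arithmetic can be reproduced in a few lines (an illustrative sketch; the original appendix contains no code):

```python
def kappa(p_obs, p_exp):
    """Kappa from observed and expected (chance) agreement."""
    return (p_obs - p_exp) / (1 - p_exp)

p_obs = (60 + 60) / 200                 # observed agreement, both tables: 0.60

p_exp_t2 = 0.50 * 0.50 + 0.50 * 0.50    # Table 2 expected agreement: 0.50
p_exp_t3 = 0.30 * 0.70 + 0.70 * 0.30    # Table 3 expected agreement: 0.42

k_t2 = kappa(p_obs, p_exp_t2)           # 0.20
k_t3 = kappa(p_obs, p_exp_t3)           # ~0.31
```

Despite identical observed agreement, the lower chance agreement in Table 3 yields the larger kappa, exactly as described above.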
For an introduction to the literature on kappa, see Posner and colleagues.
* American Society of Anesthesiologists: 1995 Directory of Members. Washington, DC: American Society of Anesthesiologists, 1995.
Hyams AL, Brandenburg JA, Lipsitz SR, Shapiro DW, Brennan TA: Practice guidelines and malpractice litigation: A two-way street. Ann Intern Med 1995; 122:450-5.
Sloan FA, Hsieh CR: Injury, liability, and the decision to file a medical malpractice claim. Law Society Review 1995; 29:413-35.
Katz J: The fallacy of the impartial expert revisited. Bull Am Acad Psychiatry Law 1992; 20:141-52.
Angell M: Shattuck Lecture-Evaluating the health risks of breast implants: The interplay of medical science, the law, and public opinion. N Engl J Med 1996; 334:1513-18.
Donabedian A: The quality of care. How can it be assessed? JAMA 1988; 260:1743-8.
Goldman RL: The reliability of peer assessments of quality of care. JAMA 1992; 267:958-60.
Bates DW, O'Neill AC, Petersen LA, Lee TH, Brennan TA: Evaluation of screening criteria for adverse events in medical patients. Med Care 1995; 33:452-62.
Bigby JA, Dunn J, Adams JB, Jen P, Landefeld CS, Komaroff AL: Assessing the preventability of emergency hospital admissions. Am J Med 1987; 83:1031-36.
Brook RH, Appel FA: Quality of care assessment: Choosing a method for peer review. N Engl J Med 1973; 288:1323-9.
Dubois RW, Brook RH: Preventable deaths: Who, how often, and why? Ann Intern Med 1988; 109:582-9.
Hastings GE, Sonneborn R, Lee GH, Bick L: Peer review checklist: Reproducibility and validity of a method for evaluating the quality of ambulatory care. Am J Public Health 1980; 70:222-8.
Hayward RA, McMahon LF, Bernard AM: Evaluating the care of general medicine inpatients: How good is implicit review? Ann Intern Med 1993; 118:550-6.
Horn SD, Pozen MW: An interpretation of implicit judgments in chart review. J Community Health 1977; 2:251-8.
Medical Review Project, Empire State Medical Scientific and Educational Foundation, Inc: Rochester region perinatal study. N Y State J Med 1976; 67:1205-10.
Richardson FM: Peer review of medical care. Med Care 1972; 10:29-39.
Rosenfeld LS: Quality of medical care in hospitals. Am J Public Health 1957; 47:856-65.
Brennan TA, Localio RJ, Laird NL: Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care 1989; 27:1148-58.
Leape LL, Brennan TA, Laird N, Lawthers AG, Localio AR, Barnes BA, Hebert L, Newhouse JP, Weiler PC, Hiatt H. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med 1991; 324:377-84.
Brennan TA, Leape LL, Laird NM, Hebert L, Localio AR, Lawthers AG, Newhouse JP, Weiler PC, Hiatt HH: Incidence of adverse events and negligence in hospitalized patients. N Engl J Med 1991; 324:370-6.
Brennan TA, Localio AR, Leape LL, Laird NM, Peterson L, Hiatt HH, Barnes BA: Identification of adverse events occurring during hospitalization. A cross-sectional study of litigation, quality assurance, and medical records at two teaching hospitals. Ann Intern Med 1990; 112:221-6.
Caplan RA, Posner KL, Cheney FW: Effect of outcome on physician judgments of appropriateness of care. JAMA 1991; 265:1957-60.
Posner KL, Sampson PD, Caplan RA, Ward RJ, Cheney FW: Measuring interrater reliability among multiple raters: An example of methods for nominal data. Stat Med 1990; 9:1103-15.
Cheney FW, Posner K, Caplan RA, Ward RJ: Standard of care and anesthesia liability. JAMA 1989; 261:1599-1603.
Tinker JH, Dull DL, Caplan RA, Ward RJ, Cheney FW: Role of monitoring devices in prevention of anesthetic mishaps: A closed claims analysis. Anesthesiology 1989; 71:541-6.
Caplan RA, Posner KL, Ward RJ, Cheney FW: Adverse respiratory events in anesthesia: A closed claims analysis. Anesthesiology 1990; 72:828-33.
Kroll DA, Caplan RA, Posner K, Ward RJ, Cheney FW: Nerve injury associated with anesthesia. Anesthesiology 1990; 73:202-7.
Chadwick HS, Posner K, Caplan RA, Ward RJ, Cheney FW: Comparison of obstetric and non-obstetric anesthesia malpractice claims. Anesthesiology 1991; 74:242-9.
Fleiss JL: Measuring nominal scale agreement among many raters. Psychol Bull 1971; 76:378-83.
Fleiss JL, Nee JCM, Landis JR: Large sample variance of kappa in the case of different sets of raters. Psychol Bull 1979; 86:974-7.
Fleiss JL: Statistical Methods for Rates and Proportions, Second edition. New York, John Wiley and Sons, 1981, p 218.
Fleiss JL: The Design and Analysis of Clinical Experiments. New York, John Wiley and Sons, 1986, pp 14-5.
Hirshfeld EB: Economic considerations in treatment decisions and the standard of care in medical malpractice litigation. JAMA 1990; 264:2004-12.
Cheney FW, Posner KL, Caplan RA: Adverse respiratory events infrequently leading to malpractice suits. Anesthesiology 1991; 75:932-9.
Beck M: The hired gun expert witness. Missouri Medicine 1994; 91:179-82.
Lundberg GD: Expert witness for whom? JAMA 1984; 252:251.
Hyams AL, Shapiro DW, Brennan TA: Medical practice guidelines in malpractice litigation: An early retrospective. J Health Polit Policy Law 1996; 21:289-313.
Garnick DW, Hendricks AM, Brennan TA: Can practice guidelines reduce the number and costs of malpractice claims? JAMA 1991; 266:2856-60.
Hirshfeld EB: Should practice parameters be the standard of care in malpractice litigation? JAMA 1991; 266:2886-91.