International Journal of Medical Informatics
Volume 77, Issue 9 , Pages 602-612, September 2008

Randomized controlled trial of an automated problem list with improved sensitivity

  • Stéphane M. Meystre

    Corresponding author at: University of Utah, Department of Biomedical Informatics, 26 South 2000 East, HSEB Suite 5700, Salt Lake City, UT 84112-5750, United States. Tel.: +1 801 581 4080; fax: +1 801 581 4297.
  • Peter J. Haug

Department of Biomedical Informatics, University of Utah, School of Medicine, Salt Lake City, UT, United States

Received 1 November 2006; received in revised form 10 December 2007; accepted 10 December 2007; published online 30 January 2009.


Abstract 

Purpose

To improve the completeness and timeliness of an electronic problem list, we have developed a system using Natural Language Processing (NLP) to automatically extract potential medical problems from clinical, free-text documents; these problems are then proposed for inclusion in an electronic problem list management application.

Methods

A prospective randomized controlled evaluation of the Automatic Problem List (APL) system in an intensive care unit and in a cardiovascular surgery unit is reported here. A total of 247 patients were enrolled: 76 in an initial control phase and 171 in the randomized controlled trial that followed. During this latter phase, patients were randomly assigned to a control or an intervention group. All patients had their documents analyzed by the system, but the medical problems discovered were only proposed in the problem list for intervention patients. We measured the sensitivity, specificity, positive and negative predictive values, likelihood ratios and the timeliness of the problem lists.

Results

Our system significantly increased the sensitivity of the problem lists in the intensive care unit, from about 9% to 41%, and even 77% if problems automatically proposed but not acknowledged by users were also considered. Timeliness of addition of problems to the list was greatly improved, with a time between a problem's first mention in a clinical document and its addition to the problem list reduced from about 6 days to less than 2 days. No significant effect was observed in the cardiovascular surgery unit.

Keywords: Medical records, problem-oriented [MeSH E05.318.308.940.968.750], Problem list, Natural Language Processing [MeSH L01.224.065.580], MetaMap transfer, MMTx, Program evaluation [MeSH E05.337.820]

 


1. Introduction 

The last decade has witnessed a substantial growth in the amount of medical data recorded for a given patient, along with an increasing pressure to improve the quality of healthcare and reduce medical errors. The problem-oriented Electronic Health Record (EHR), centered on the problem list, is seen by many as a possible answer to these growing challenges. The problem list is a central place for clinicians to have a concise view of all of a patient's medical problems. The problem list also encourages an orderly process of clinical problem solving, prevents redundant actions [1], supports clear documentation of the patient's condition and of clinical decision-making, and improves communication among caregivers.

At Intermountain Health Care (IHC), a health maintenance organization serving Utah, the problem list is an important piece of the medical record, and a central component of the clinical information system called HELP2 [2]. To enable its potential benefits, the problem list has to be as accurate, complete and timely as possible. Unfortunately, problem lists are usually incomplete and inaccurate, and are often totally unused. To address this deficiency, we have created an application using Natural Language Processing (NLP) to harvest potential problem list entries from the multiple free-text electronic documents available in a patient's EHR [3], [4]. The medical problems identified are then proposed to the physicians for addition to the official problem list. This system, referred to as the Automatic Problem List (APL) system, is evaluated here. We hypothesize that the use of NLP to automatically provide potential medical problems will improve the completeness, accuracy and timeliness (a shorter time between a problem's identification and its addition to the list) of this Automated Problem List.


2. Background 

The problem list in a Problem-Oriented Medical Record (POMR) was proposed more than three decades ago by Weed [5], [6] as an answer to the complexity of medical knowledge and clinical data, and to address weaknesses in the documentation of medical care. In recent years, the problem-oriented, Computer-based Patient Record (CPR) and the problem list have seen renewed interest as an organizational tool [1], [7], [8], [9], [10], [11], [12], [13]. Advantages to this approach are that the problem list provides a central place for clinicians to obtain a concise view of each patient's problems. This approach also facilitates associating clinical information in the record to a specific problem, and encourages an orderly process of clinical problem solving and clinical judgment. The problem list in a problem-oriented patient record also provides a context in which continuity of care is supported, preventing redundant actions [1]. The problem list reminds clinicians of issues often forgotten, helps reduce errors [14], and improves communication between healthcare providers [15]. The Institute of Medicine [16] recommends that the CPR contain a problem list that specifies the patient's clinical problems and the status of each. Also, convinced of the benefits of the problem list, the Joint Commission for the Accreditation of Hospitals (JCAHO [17]) has established the problem list as a required feature of hospital records.

To enable many of the potential advantages of a computerized problem list, problem list entries must be coded, which means that each problem entered will have a corresponding code in a controlled vocabulary. Advantages of coded data are that data are classified and standardized, facilitating storage and retrieval, clinical research, and administrative functions like billing. Coded data are also desirable to enable exchange and sharing of data [18]. Medical vocabularies used in problem lists are numerous, ranging from ICD-9-CM [19], [20], to SNOMED [21], the Unified Medical Language System (UMLS®) [11], [22], and locally developed vocabularies [23], sometimes mapped to multiple medical vocabularies, like the University of Nebraska Medical Center's multidisciplinary problem list [24]. Coding of problems may be achieved by manually assigning a code when the problem is entered, or by using NLP techniques to map free-text problem list entries to appropriate codes. The process of manual coding is usually eased by the use of pick lists or search engines [25]; both of these features are available in our institution's application for management of the problem list. NLP techniques, where available, allow users of the problem list to use natural language, still the most user-friendly and expressive way of recording information.

The patient record contains a large amount of information captured as narrative text. These free-text documents represent the majority of the information used for medical care [26] and have the advantage of relating findings, interpretations, and decisions as a part of the documentation process. Decision-support, research, and quality improvement activities create a need for structured and coded data instead. As a possible answer to this issue, NLP can be used to convert free-text into coded data [27].

At Intermountain Health Care, a web-based clinical information system has been developed and is in use. This system is called HELP2 [2] and offers secured access to clinical data through specialized modules like “Patient search,” “Labs,” “Medications,” and “Problems.” The “Problems” module allows viewing, modifying, and adding medical problems along with their status (active, inactive, resolved, or error) and other information. The records created include the user entering the problem, the date it was entered, and relevant comments. Filters control the display of problems based on their status and other personal preferences. Active problems can also be assigned a priority order. Access to medical knowledge specific to each problem is provided through the “Infobutton” [28]. This problem list features most of the attributes recommended by Campbell [7]: it is clinically focused, with coded problems and a modifiable problem status, and, to a degree, allows tailoring its presentation to the users’ needs. The only missing feature is an audit trail of problem modifications that is easily accessible by the users. This electronic problem list has been used regularly in the outpatient setting, but was in limited use for hospitalized patients. Only one of the wards that participated in our study, the medical and surgical ICU, had significant experience with this application. This site was piloting the inpatient use of the problem list.

A second ward, the cardiovascular surgery unit, was not using an electronic problem list but expressed a willingness to begin using it and to help test our system. All other inpatient wards were using a paper-based problem list, or no problem list at all. We should also mention that, except for the “Infobutton” cited above, this electronic problem list has limited integration with other computer-based clinical activities. In the future, order sets, physician documentation, and nursing intervention and documentation will be linked with the problem list and provide far more incentives to use it.


3. Methods 

As mentioned earlier, the Automated Problem List system extracts potential medical problems from free-text medical documents, and uses NLP to achieve this task. The two main components comprising the system are a background application for problem discovery and the problem list management application mentioned above. The background application is responsible for the text processing and analysis and stores extracted problems in the central clinical database. These problems can then be accessed by the problem list management application integrated into HELP2. For the study described here, we developed the background application to recognize 80 different diagnosis problems, which were selected based on their frequency in the clinical environments chosen for our evaluation (the cardiovascular surgery unit and the intensive care unit). Those 80 problems represented about 64% of all coded medical problem instances in our EHR in 2003. The NLP tools used in this experiment were based on the Java™ version of MetaMap [29], [30] called MMTx (MetaMap Transfer) and on a negation detection algorithm called NegEx2 [31], and are described in another publication [4]. The problem list management application was based on the “Problems” module described above, and was enhanced to take advantage of the problems automatically detected by the background application. These problems were listed with a new proposed status, and included a link back to the source document(s) and the sentence(s) that each problem was extracted from, as seen in Fig. 1.

  • Fig. 1. 

    Screen capture of the “Problems” module in the HELP2 system. The source document for the problem pulmonary edema is displayed, with the source sentence highlighted in red (underlined in this grayscale figure). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

3.1. Study design 

We evaluated the Automated Problem List system with a prospective study in a clinical environment. The APL system includes a human intervention to accept or reject problems proposed by the background application. The evaluation of this intervention therefore requires a prospective study. Patients were included if they met the following criteria: inpatient for at least 48 h in one of the two study wards (cardiovascular surgery or medical and surgical ICU at the LDS Hospital, Salt Lake City, Utah), older than 18 years, and not already enrolled in a previous phase of this study.

The study started with an initial control phase with all patients assigned to a control group, followed by a Randomized Controlled Trial (RCT) phase with patients randomly assigned to a control or to an intervention group. In the control group, patients received care from physicians using the standard electronic problem list (without proposed problems). In the intervention group, patients were treated by physicians with access to the Automated Problem List system. Their documents were analyzed by the background application, and medical problems extracted were proposed for inclusion into their electronic problem list.

This study was single blinded. The information used was that routinely collected as part of the patient work-up, and patients were not aware of the study. Users of the problem list were physicians and they could not be blinded, since the difference in content of the problem list between the two groups was obvious. All measurements and calculations are described in the following sections and are detailed in Table 1.

Table 1. Study measurements

Measurement                            Abbreviation   Definition
True positive                          TP             Problem recorded in the patient's documents and in the problem list
False negative                         FN             Problem recorded in the patient's documents but absent from the problem list
False positive                         FP             Problem not recorded but listed in the problem list
True negative                          TN             Problem not recorded and not listed in the problem list

Sensitivity                                           TP/(TP+FN)
Specificity                                           TN/(TN+FP)
Positive predictive value              PPV            TP/(TP+FP)
Negative predictive value              NPV            TN/(TN+FN)
Likelihood ratio for a positive test   LR+            Sensitivity/(1 − specificity)
Likelihood ratio for a negative test   LR−            (1 − sensitivity)/specificity
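
The definitions in Table 1 can be written out as a short Python sketch. This is only an illustration of the formulas, not the code used in the study; the function names and the example counts are hypothetical, and returning `None` for a zero denominator is our assumption about how an undefined measurement (such as an LR+ when specificity equals 1) might be represented.

```python
from typing import Dict, Optional


def ratio(num: float, den: float) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (measure undefined)."""
    return num / den if den else None


def problem_list_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Optional[float]]:
    """Apply the Table 1 formulas to one set of TP/FN/FP/TN counts."""
    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    both_defined = sens is not None and spec is not None
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # LR+ = sensitivity / (1 - specificity); undefined when specificity is 1
        "lr_pos": ratio(sens, 1 - spec) if both_defined else None,
        # LR- = (1 - sensitivity) / specificity
        "lr_neg": ratio(1 - sens, spec) if both_defined else None,
    }


# Hypothetical patient: 80 targeted problems, 10 documented, 4 of those listed,
# and 1 problem listed without being documented.
m = problem_list_metrics(tp=4, fn=6, fp=1, tn=69)
print(m)
```
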

3.2. Problem list completeness and accuracy 

Three different problem lists were considered for each patient: the reference standard (i.e., what should be in the problem list), the “official” problem list (i.e., problems manually entered and listed as active, inactive, or resolved, and problems proposed by the APL system and changed to active, inactive, or resolved by the user), and the “potential” problem list (i.e., problems listed in the “official” list, combined with the problems that had remained proposed at patient discharge), as seen in Fig. 2. We compared the content of each patient's official and potential problem list with the relevant reference standard. We counted and categorized each problem list entry as true positive, false negative, false positive, or true negative. Only our 80 targeted problems were considered in this study.
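
The comparison of a problem list with the reference standard amounts to set operations over the targeted problems. The sketch below is a hypothetical illustration in Python: the function name is ours, and a five-item target set stands in for the 80 targeted problems.

```python
def categorize(targets: set, reference: set, listed: set) -> dict:
    """Count TP/FN/FP/TN for one patient, restricted to the targeted problems."""
    reference = reference & targets
    listed = listed & targets
    return {
        "TP": len(reference & listed),            # documented and listed
        "FN": len(reference - listed),            # documented but not listed
        "FP": len(listed - reference),            # listed but not documented
        "TN": len(targets - reference - listed),  # neither documented nor listed
    }


# Hypothetical data for one patient.
targets = {"diabetes", "pulmonary edema", "pericarditis", "pneumonia", "anemia"}
reference = {"diabetes", "pulmonary edema", "pneumonia"}
official = {"pulmonary edema"}           # accepted or manually entered problems
still_proposed = {"diabetes", "anemia"}  # proposed by the system, never reviewed
potential = official | still_proposed    # "potential" list = official + still proposed

official_counts = categorize(targets, reference, official)
potential_counts = categorize(targets, reference, potential)
print(official_counts, potential_counts)
```

In this toy case the potential list gains one true positive (diabetes) and one false positive (anemia) over the official list, which mirrors the trade-off between sensitivity and specificity that the two lists are meant to expose.
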

We then calculated different standard measurements listed in Table 1 to assess the quality of the problem list's content. These calculations were made for each patient, and then averaged across the group or subgroup of patients analyzed.
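
A minimal Python sketch of the per-patient averaging follows. How undefined per-patient values were handled and how confidence intervals were derived are not specified here, so both choices in the sketch are assumptions: undefined values (`None`) are skipped, and the interval is a symmetric normal approximation around the mean. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, level=0.95):
    """Mean of the defined per-patient values with a normal-approximation CI.

    Assumption: patients whose measurement is undefined (None) are skipped.
    Returns (mean, lower, upper); bounds are None with fewer than two values.
    """
    defined = [v for v in values if v is not None]
    if not defined:
        return (None, None, None)
    m = mean(defined)
    if len(defined) < 2:
        return (m, None, None)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(defined) / sqrt(len(defined))
    return (m, m - half_width, m + half_width)


# Hypothetical per-patient sensitivities; one patient has no documented problem.
sensitivities = [0.0, 0.25, 0.5, None, 0.25]
avg, low, high = mean_with_ci(sensitivities)
print(avg, low, high)
```

A symmetric interval of this kind is not bounded by 0 and 1, so with few patients it can extend past the valid range of a proportion.
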

Because the measurements were not normally distributed, statistical comparisons used a nonparametric test (Mann–Whitney test).
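
For readers unfamiliar with the Mann–Whitney test, the following self-contained Python sketch computes the U statistic with average ranks for ties and a two-sided p-value from a tie-corrected normal approximation (no continuity correction). It is an illustration under those stated choices, not the statistical software used in the study, and the two samples are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using a tie-corrected normal approximation."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)

    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return min(u1, u2), 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return min(u1, u2), p_value


# Hypothetical per-patient sensitivities in two groups.
controls = [0.0, 0.0, 0.1, 0.2, 0.1]
tests = [0.3, 0.5, 0.4, 0.2, 0.6]
u, p = mann_whitney_u(controls, tests)
print(u, p)
```
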

3.3. Reference standard 

We created the “gold” standard for our 80 targeted problems using an electronic chart review. Two physicians reviewed each electronic document independently using a web-based review application described in another publication [3]. They were asked to detect all mentions of any of the 80 targeted problems that were asserted as present (i.e., not negated), whether currently or in the past. When the two reviewers disagreed, a third physician determined the presence or absence of the disputed problem. The documents analyzed were all clinical documents (radiology reports, consultation reports, progress notes, H&Ps, discharge summaries, etc.) stored for each patient during the hospital stay, plus a maximum of five older documents of specific types including discharge summaries and consultation reports from previous hospital stays or outpatient care episodes. The reviewers also reviewed the patients’ electronic problem list, to map problems manually entered as free-text (not coded) to a relevant coded problem, and also to map “children” of our targeted problems to the relevant “parent” problem (e.g. adhesive pericarditis to pericarditis). To reduce disagreement between reviewers, we used a medical record review technique called explicit review, which directs the reviewers’ attention to specific issues (our list of targeted problems) on which judgment is to be based [32]. The explicit review technique is associated with higher inter-rater reliability than implicit review, where reviewers use only their knowledge or beliefs to make judgments. We implemented the explicit review technique in the web-based review application by displaying the document to review beside a checklist of the 80 targeted medical problems. The resulting list of problems provided a reference standard against which other listings of problems could be compared.
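
The adjudication rule can be sketched with sets in Python. The function name and the example findings are hypothetical; the sketch assumes the third physician rules only on problems the first two reviewers disagreed about, as described above.

```python
def adjudicate(reviewer_a: set, reviewer_b: set, reviewer_c: set) -> set:
    """Keep problems both reviewers found; a third reviewer settles disputes."""
    agreed = reviewer_a & reviewer_b
    disputed = reviewer_a ^ reviewer_b  # found by exactly one of the two reviewers
    return agreed | (disputed & reviewer_c)


# Hypothetical findings for one document.
a = {"diabetes", "pneumonia"}
b = {"diabetes", "anemia"}
c = {"anemia", "pericarditis"}  # pericarditis was never disputed, so it is ignored
reference = adjudicate(a, b, c)
print(sorted(reference))
```
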

3.4. Problem list timeliness 

For all problem list entries, we measured the time elapsed between the first mention of the medical problem in a clinical document (i.e., the date/time when the document was stored) and its addition to the “official” problem list (i.e., the date/time when the status was changed from proposed to active, inactive, or resolved or when the problem was manually added). If a problem was manually added before being detected in a document, this time was equal to zero. This value is one representation of the timeliness of problem recording and was compared between the control and the intervention groups. This timeliness does not represent the real time elapsed between a problem's “appearance” (i.e., the first moment that a medical expert could recognize the problem from the patient's data) and its addition to the problem list. The timeliness does, within the limitations of the study design, allow us to calculate an interval from the first mention in clinical documentation and the addition to the problem list.
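
The timeliness measurement reduces to a clamped time difference, sketched here in Python. The function name and timestamps are hypothetical; the clamp to zero reflects the rule above for problems entered manually before any document mentioned them.

```python
from datetime import datetime, timedelta


def time_to_listing(first_mention: datetime, added: datetime) -> timedelta:
    """Delay between a problem's first mention in a stored document and its
    addition to the official list; zero if it was added before any mention."""
    return max(added - first_mention, timedelta(0))


# Hypothetical timestamps for two problems.
accepted_later = time_to_listing(datetime(2005, 1, 10, 8, 0, 0),
                                 datetime(2005, 1, 12, 4, 27, 39))
entered_first = time_to_listing(datetime(2005, 1, 10, 8, 0, 0),
                                datetime(2005, 1, 9, 17, 30, 0))
print(accepted_later, entered_first)
```
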


4. Results 

Ten different reviewers, all physicians, reviewed clinical documents to create the reference standard. Eight were board-certified physicians (most of them in internal medicine), and two were residents with at least 2 years of training. Each reviewer examined an average of 686 different documents with the web-based application described above. The time spent for a review was between 48 and 216 s per document. Reviewers’ overall agreement was very good, with a Finn's R of 0.897 when reviewing documents and 0.995 when reviewing problem lists. We used Finn's R instead of Cohen's kappa, because the agreement table was strongly skewed, with far more true negatives than true positives.
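
To illustrate why a skewed agreement table matters, the Python sketch below compares the two statistics on made-up ratings. It assumes the one-way form of Finn's r for two raters and a binary (present/absent) rating, in which the observed within-item variance is compared with the variance expected from uniformly random ratings; the exact variant used in the study is not stated here. All counts are hypothetical.

```python
def finn_r_binary(pairs):
    """One-way Finn's r for two raters and a present/absent (k = 2) rating."""
    # Variance of two 0/1 ratings of one item: 0.5 if they disagree, else 0.
    ms_within = sum(0.5 for a, b in pairs if a != b) / len(pairs)
    chance_variance = (2 ** 2 - 1) / 12  # (k^2 - 1) / 12 with k = 2 categories
    return 1 - ms_within / chance_variance


def cohen_kappa_binary(pairs):
    """Cohen's kappa for two raters and a binary rating."""
    n = len(pairs)
    p_observed = sum(a == b for a, b in pairs) / n
    p_a = sum(a for a, _ in pairs) / n  # rater A's rate of "present"
    p_b = sum(b for _, b in pairs) / n  # rater B's rate of "present"
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical skewed table: 960 joint absences, 20 joint presences,
# and 20 disagreements split evenly between the raters.
pairs = [(0, 0)] * 960 + [(1, 1)] * 20 + [(1, 0)] * 10 + [(0, 1)] * 10
finn = finn_r_binary(pairs)
kappa = cohen_kappa_binary(pairs)
print(round(finn, 3), round(kappa, 3))
```

With 98% raw agreement, kappa is pulled down by the rarity of positive findings while Finn's r stays close to the raw agreement, which is the behavior that motivates the choice described above.
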

During the study, 247 patients were enrolled: 76 during the initial control phase, and 171 during the RCT phase (Table 2). Enrollment of patients and data collection lasted 69 days (23 days for the initial control phase; 46 days for the RCT phase), between December 2004 and February 2005.

Table 2. Number and distribution of patients enrolled in the study

                   All patients   ICU patients   CVS patients
Initial controls   76             44             32
RCT: tests         88             54             34
RCT: controls      83             51             32

Total              247            149            98

A total of 2943 clinical documents of various types (Table 3) were analyzed by our system (893 during the initial control phase, 2050 during the RCT phase).

Table 3. Clinical documents analyzed during the study

Document types                          Initial control phase   RCT phase
Radiology reports                       395                     861
Cardiac catheterization/angiographies   36                      57
Progress notes                          48                      165
Consultation reports                    68                      168
Operative and procedure notes           64                      152
History and physicals                   94                      200
Discharge summaries                     56                      118
Emergency department reports            68                      132
Surgical pathology/cytology reports     24                      71
Letters and other reports               40                      126

Total                                   893                     2050

During the whole study, 1385 medical problems were added to problem lists: 205 during the initial control phase and 1180 during the RCT phase (Fig. 3). During the latter, 1128 medical problems were automatically extracted and proposed. These problems were eligible to have their proposed status changed by users of the problem list (i.e. physicians taking care of enrolled patients) to another status, either active, inactive, or resolved when adding them to the “official” problem list, or to error when rejecting them as erroneous.
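
The status changes available to a reviewing physician can be modeled as a small state transition, sketched below in Python. This is a hypothetical data model for illustration only, not the HELP2 implementation; the class, field, and method names are ours.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    INACTIVE = "inactive"
    RESOLVED = "resolved"
    ERROR = "error"


# Statuses that place a problem in the "official" problem list.
ACCEPTED = {Status.ACTIVE, Status.INACTIVE, Status.RESOLVED}


@dataclass
class ProblemEntry:
    name: str
    status: Status = Status.PROPOSED
    source_sentence: str = ""  # sentence the problem was extracted from

    def review(self, new_status: Status) -> None:
        """Accept (active/inactive/resolved) or reject (error) a proposed problem."""
        if self.status is not Status.PROPOSED:
            raise ValueError("only proposed problems can be reviewed")
        if new_status is Status.PROPOSED:
            raise ValueError("a review must accept or reject the problem")
        self.status = new_status

    @property
    def in_official_list(self) -> bool:
        return self.status in ACCEPTED


# Hypothetical entries: one accepted, one rejected as erroneous.
edema = ProblemEntry("pulmonary edema", source_sentence="Findings suggest pulmonary edema.")
edema.review(Status.ACTIVE)
anemia = ProblemEntry("anemia")
anemia.review(Status.ERROR)
print(edema.in_official_list, anemia.in_official_list)
```
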

4.1. Problem list completeness and accuracy 

Mean and 0.95 confidence intervals were computed for the sensitivity, specificity, positive predictive value, negative predictive value, and likelihood ratios (Table 4). These calculations were done using the data from all patients enrolled and also using data from subgroups of those patients.

Table 4. Measurements during the study, in all patients, in the ICU patient subgroup, and in the cardiovascular surgery patient subgroup (tests+prop. corresponds to the potential problem list). Values are means with 95% confidence intervals in parentheses.

Initial control phase (controls only)
Measurement   All patients             ICU patients             CV surgery patients
Sensitivity   0.042 (0.022–0.062)      0.062 (0.032–0.093)      0.014 (0–0.035)
Specificity   0.998 (0.996–0.999)      0.997 (0.994–0.999)      0.999 (0.997–1)
PPV           0.649 (0.462–0.836)      0.652 (0.45–0.854)       0.625 (−0.729 to 1.98)
NPV           0.85 (0.829–0.87)        0.87 (0.846–0.894)       0.821 (0.785–0.857)
LR+           2.9 (−1.15 to 6.95)      1.49 (−1.17 to 4.16)     9.23 (−108 to 126)
LR−           0.961 (0.94–0.981)       0.941 (0.91–0.972)       0.987 (0.967–1.01)

Randomized controlled trial phase, all patients
Measurement   Controls                 Tests                    Tests+prop.
Sensitivity   0.102 (0.069–0.135)      0.266 (0.192–0.34)       0.815 (0.771–0.859)
Specificity   0.998 (0.995–1)          0.993 (0.988–0.999)      0.957 (0.947–0.966)
PPV           0.886 (0.795–0.976)      0.924 (0.88–0.967)       0.784 (0.744–0.825)
NPV           0.864 (0.848–0.879)      0.88 (0.861–0.899)       0.965 (0.955–0.974)
LR+           8.103 (1.047–15.16)      38.291 (25.307–51.275)   24.319 (19.652–28.987)
LR−           0.9 (0.867–0.933)        0.737 (0.663–0.81)       0.191 (0.146–0.236)

Randomized controlled trial phase, ICU patients
Measurement   Controls                 Tests                    Tests+prop.
Sensitivity   0.089 (0.049–0.129)      0.41 (0.308–0.512)       0.774 (0.714–0.835)
Specificity   0.999 (0.998–1)          0.989 (0.98–0.998)       0.963 (0.95–0.976)
PPV           0.919 (0.81–1)           0.905 (0.852–0.958)      0.811 (0.758–0.865)
NPV           0.863 (0.842–0.884)      0.898 (0.874–0.923)      0.956 (0.942–0.97)
LR+           12.565 (0–92.514)        38.291 (25.307–51.275)   27.902 (20.374–35.43)
LR−           0.912 (0.872–0.952)      0.595 (0.492–0.697)      0.233 (0.171–0.294)

Randomized controlled trial phase, CV surgery patients
Measurement   Controls                 Tests                    Tests+prop.
Sensitivity   0.123 (0.063–0.182)      0.037 (0.013–0.063)      0.88 (0.823–0.938)
Specificity   0.995 (0.989–1)          1                        0.947 (0.933–0.96)
PPV           0.839 (0.671–1)          1                        0.743 (0.681–0.804)
NPV           0.865 (0.842–0.889)      0.851 (0.824–0.878)      0.978 (0.969–0.987)
LR+           5.872 (0–12.333)         N/A                      19.9 (15.127–24.675)
LR−           0.881 (0.822–0.94)       0.963 (0.937–0.989)      0.125 (0.066–0.183)

During the initial control phase, sensitivity was about 6% in the subgroup of patients from the ICU, which was significantly higher than in the cardiovascular surgery patient subgroup, where sensitivity was only 1.4%. These low values mean that the electronic problem list was barely used during the initial control phase.

When comparing the initial control phase and the RCT phase, and only considering ICU patients, no significant differences were observed between the control groups. When only considering cardiovascular surgery control patients, we measured a significantly higher sensitivity during the RCT phase, rising to 12% (about 1.4% in the initial control phase).

Patients in the intervention group had a significantly higher sensitivity than in both control groups (initial control phase and RCT phase). It reached 41% in the subgroup of patients from the ICU, and about 26% when considering all patients. Likelihood ratios were also very significantly different, and the negative predictive value was significantly higher in the intervention group.

During the RCT phase, when evaluating all patients, the results showed a sensitivity and a likelihood ratio for a positive test (LR+) that were significantly higher in the intervention group (Fig. 4). The likelihood ratio for a negative test (LR−) was significantly lower in the intervention group, dropping from 0.90 to 0.737. Sensitivity in the intervention group was 26.6%. When evaluating the potential problem list (i.e., with proposed problems included), the sensitivity increased even more in the intervention group, but specificity and positive predictive value were reduced.

  • Fig. 4. 

    Measurements in all patients, with means and 95% confidence intervals. Results of potential problem lists (i.e., problems that remained proposed included) are also displayed.

Analysis of the subgroup of patients from the ICU showed greater differences between control and intervention groups (Fig. 5). The sensitivity in the intervention group reached 41%, and even 77.4% when analyzing the potential problem list. The LR+ was not significantly different.

  • Fig. 5. 

    Measurements in the ICU patients, with means and 95% confidence intervals. Results of potential problem lists (i.e., problems that remained proposed included) are also displayed.

In the cardiovascular surgery patient subgroup, no significant difference between the control and the intervention groups was observed (Fig. 6). Even though physicians in this unit expressed their interest and willingness to use the electronic problem list, we were not able to find an incentive compelling enough to cause them to edit the problem list.

  • Fig. 6. 

    Measurements in the cardiovascular surgery patients, with means and 95% confidence intervals. Results of potential problem lists (i.e., problems that remained proposed included) are also displayed.

These results mean that the electronic problem list was used minimally in the cardiovascular surgery unit, but was well used in the ICU, where it grew to be more complete and more timely in the intervention group than in the control group.

4.2. Problem list timeliness 

Finally, during the RCT phase, the timeliness of the addition of problems to the problem list was significantly different between the control and the intervention groups. Here again, we used a nonparametric test (Mann–Whitney test) because the groups were not normally distributed. In the control group, the distribution was especially skewed, as seen in Fig. 7. In the intervention group, the mean time until a problem mentioned in clinical documents was confirmed by a physician was 44 h 27 min 39 s. This time was significantly longer (p=0.0413) in the control group: 144 h 28 min 39 s (after excluding a few outliers at up to 58 days).


5. Discussion 

This evaluation of our Automated Problem List system suggests that the addition of NLP to improve accuracy and timeliness of the problem list was successful. We measured a significantly increased sensitivity. Clearly, enhancing the problem list management application with NLP made the problem list more complete. We also measured a significantly improved timeliness of the problem list, with an average time difference between a medical problem's first mention in text and its addition to the problem list that was reduced from about 6 to 2 days.

5.1. Completeness, accuracy and timeliness of the problem list 

Within all patients (ICU and cardiovascular surgery patients), the sensitivity of the problem list was increased from about 10% to about 27% in the intervention group. The problem list was more complete in this group. The specificity of the problem list was high before the intervention and remained high (i.e. proposing problems for inclusion did not significantly increase the presence of false positive problems in the list). The positive predictive value was also high, and was not altered by our intervention. The likelihood ratios were improved by the intervention. The LR+ increased from about 8 to 38, meaning that this Automated Problem List could be used to “rule in” a patient's medical problem when present in the problem list. These results support using the problem list as an alerting tool for medical problems present in the patient's text documents.
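
The “rule in” interpretation of the LR+ can be made concrete with the standard odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. In the Python sketch below, the 15% pre-test probability is a hypothetical value chosen only for illustration; the two LR+ values are those reported in Table 4 for all patients.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


pre_test = 0.15  # hypothetical probability that a given problem is documented
with_control_list = post_test_probability(pre_test, 8.103)        # LR+ in controls
with_intervention_list = post_test_probability(pre_test, 38.291)  # LR+ in tests
print(round(with_control_list, 3), round(with_intervention_list, 3))
```

Under this assumed pre-test probability, a problem appearing in the list raises the probability that it is documented to roughly 59% with the control-group LR+ and roughly 87% with the intervention-group LR+.
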

Analysis of the ICU patients separately showed results similar to the whole patient group, but with accentuated differences: the sensitivity grew to almost 80%, but the specificity was slightly lower (p=0.0238), and the negative predictive value was increased in this subgroup.

The results of the cardiovascular surgery patients were very different, showing no significant difference between the control and intervention group. These results reduced the effect of our system when considering all patients. They were due to a lack of use of the problem list in this clinical environment during the study, even after multiple presentations and discussions with potential users there, and positive feedback from them. This part of the EHR was simply not used there, and our study was not a sufficient motivation to users to start using it. This issue is discussed further below.

5.2. Evaluation of the potential problem list 

When analyzing the potential problem list (i.e., including problems that remained proposed), the sensitivity increased as in the example depicted in Fig. 2 (where Diabetes remained proposed but was a true positive). When considering all patients, we measured a sensitivity of more than 80%. However, in the absence of review by users of the problem list (i.e. accepting or rejecting problems proposed by our system), a few errors were introduced, reducing the specificity of the problem list from about 100% to 96%. We could have inserted proposed problems directly into the “official” problem list, and this would have added only about one false positive medical problem in three problem lists, but the final human review was considered important in this first evaluation of such a system in a clinical setting, to eliminate the false positives introduced by the system.

5.3. Agreement between reviewers 

The excellent inter-reviewer agreement in this study allowed a high quality reference standard, therefore giving reliable results. This was made possible by the use of explicit review techniques: the list of targeted problems was always provided beside the document or problem list to review.

5.4. Comparison with other similar studies 

Our results are difficult to compare to other published results because very few similar studies have been published. A rare example is an evaluation by Szeto et al., measuring the accuracy of an outpatient problem list for nine different diagnoses [33]. A sensitivity of 49% and a specificity of 98–100% were measured. Our study targeted 80 different diagnoses, and gave very similar specificity results, but the sensitivity without intervention was much lower. The effect of our Automated Problem List system increased the sensitivity to a similar degree.

5.5. Limitations 

A first important issue that was striking in the cardiovascular patient subgroup is the use of the problem list by physicians. The application suite in which these tools were embedded originated in the outpatient setting and is in the process of moving into the hospital environment. The study described here was the first introduction for most of the physicians to any electronic means of maintaining a problem list. As mentioned in the background section of this paper, the problem list currently used in this environment is paper-based, usually incomplete, and seldom timely. In fact, it is often totally unused. There is good reason for this: very few of the therapeutic or documentation functions that are done by the physicians are tied in any way to the problem list. Maintaining a dynamic list on paper is also quite difficult. The frequent resetting of statuses, movement of problems to new positions to show relationships, subsumption or replacement of problems, etc. do not lend themselves to paper-based management.

In the electronic record that is evolving at our institution, the problem list will be integrated into the care process. Electronic order entry, documentation, and a variety of decision support tools will be tied to the problem list. However, the single function currently mediated through the problem list is the “infobutton” [28], a tool that provides problem-specific electronic information to the user. Therefore, this study is best seen as an effort to explore the possibilities offered by NLP to support the problem list. Efforts to secure the general adoption of the problem list await its further integration into the clinical workflow.

Another issue with this system is the scalability of the list of targeted medical problems and the performance of our NLP module. Our system is currently designed to extract 80 different medical problems, but more will need to be added to allow this system to be used in other settings. A very simple solution is to use the default full UMLS® data set provided with MMTx instead of a custom data subset, but this reduces the performance of the NLP module (decreased recall and slower processing).

The speed of our NLP module may be an issue. During the 46 days this study lasted, 2050 text documents were analyzed, and the average time required to analyze a document was about 50 s. This means that a maximum of about 79,000 documents could be analyzed during the same period. Each patient had an average of 12 documents analyzed. The maximum number of patients that could be analyzed by our system would therefore be about 6600, or 140 each day. This leaves room for extension to other settings, but the need to include additional problems will reduce this ability to expand.
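The capacity estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes documents are processed one at a time, around the clock, which is the assumption that appears to underlie the figures quoted in the text. The function and variable names are hypothetical and do not come from the APL system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capacity(study_days, seconds_per_document, documents_per_patient):
    """Estimate throughput for strictly sequential, continuous processing.

    Returns (max documents, max patients, patients per day).
    Assumption: one document at a time, 24 hours a day.
    """
    max_documents = study_days * SECONDS_PER_DAY / seconds_per_document
    max_patients = max_documents / documents_per_patient
    patients_per_day = max_patients / study_days
    return max_documents, max_patients, patients_per_day


# Figures reported in the text: 46 days, about 50 s per document,
# an average of 12 documents per patient.
docs, patients, per_day = capacity(
    study_days=46, seconds_per_document=50, documents_per_patient=12
)
print(f"{docs:.0f} documents, {patients:.0f} patients, {per_day:.0f} patients/day")
# prints "79488 documents, 6624 patients, 144 patients/day"
```

These unrounded values agree with the approximate figures given above (about 79,000 documents, about 6600 patients, about 140 patients each day). Any increase in the per-document processing time, for example from adding targeted problems, lowers all three numbers proportionally.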

Finally, study design issues are related to blinding and potential biases. The blinding issue has already been discussed as a part of the study design. In a study of this sort, many different biases are possible. Blinding the data collectors and reviewers eliminated the assessment bias. Recruitment and allocation biases were excluded by clearly defining the inclusion and exclusion criteria, randomizing as late as possible, and concealing the allocation until the recruitment was irreversible. Data collection biases were avoided by collecting the same data the same way in all groups. Within our study design, some biases were still possible and should be taken into consideration. These include contamination, learning effect, and a global Hawthorne effect. Contamination was of special concern and motivated the provision of an initial control phase. The goal was to be able to compare the initial control phase with the control group during the RCT phase and use the initial controls if we could not identify significant differences during the RCT phase. Biases of these sorts would tend to reduce the observed differences between groups. We have shown significant differences between the control group and the intervention group, and the true differences could therefore be even larger, since some of those biases may have been present in this study.

5.6. Potential benefits of our system 

The medical problem list figures prominently in our plans for computerized physician order entry and medical documentation in the new Electronic Health Record currently under development at IHC. A well-maintained problem list will significantly enhance this Electronic Health Record. The Automated Problem List system improved the quality of the problem list, a central component for our electronic health record. This could be beneficial for many reasons: A better problem list could potentially improve patient outcomes and reduce costs by reducing omissions and delays, improving the organization of care, and reducing adverse events. It could enhance decision-support for applications requiring knowledge of patient medical problems. A timely and accurate problem list could improve patient safety, an important and timely issue that has received substantial attention since the 1999 Institute of Medicine report [34].

6. Conclusion 

The Automated Problem List system that we developed to extract potential medical problems from free-text documents in a patient's EHR has shown satisfactory results. The system's goal of improving the problem list's quality by increasing its completeness and timeliness was met, with higher sensitivity and better timeliness in the intervention group. This was achieved only when the problem list was used. By encouraging the use of a problem list of better quality, this system could potentially improve patient outcomes and safety, improve care organization, reduce costs, and diminish adverse events.

Acknowledgments 

This work is supported by a Deseret Foundation Grant (Salt Lake City, Utah). We would like to thank Min Bowman for her help with the modified Problems module. We would also like to thank Greg Gurr for his advice and help. Scott Narus and Stan Huff also gave us helpful advice and guidance, for which we are grateful. Finally, we are especially grateful to Terry Clemmer, whose enthusiasm for the problem list made this study possible.

Summary points

What was known before this research?

The problem list gives a concise view of the patient's medical problems, eases continuity of care and prevents redundant actions, improves communication and documentation, and has many additional advantages. It is recommended by the U.S. Institute of Medicine and the Joint Commission on Accreditation of Healthcare Organizations.

To enable these potential advantages, the problem list has to be complete and timely, and should contain coded problems. In practice, the problem list is often incomplete or even unused. Its sensitivity has rarely been evaluated; a study (cited in the manuscript) measured a sensitivity of 49%.

The majority of the information used for medical care is captured as narrative text in a patient Electronic Health Record. NLP has been used to extract various features from narrative text, such as UMLS® concepts.

What did the study add to the body of knowledge?

NLP can be used to extract medical problems from narrative text, and can be used in real time in a clinical setting.

Automating the management of the problem list by using NLP:
can improve the problem list completeness,
can improve the problem list timeliness, and
does not alter the problem list accuracy.


References 

  1. Bayegan E, Tu S. The helpful patient record system: problem oriented and knowledge based. Proc. AMIA Symp. 2002;36–40
  2. Clayton PD, Narus SP, Huff SM, Pryor TA, Haug PJ, Larkin T, et al. Building a comprehensive clinical information system from components. The approach at Intermountain Health Care. Methods Inf. Med. 2003;42(1):1–7
  3. Meystre S, Haug PJ. Natural language processing to extract medical problems from electronic clinical documents: performance evaluation. J. Biomed. Inform. 2006;39(6):589–599. Epub Dec 5, 2005
  4. Meystre S, Haug PJ. Automation of a problem list using natural language processing. BMC Med. Inform. Decis. Mak. 2005;5:30
  5. Weed LL. Medical records that guide and teach. N. Engl. J. Med. 1968;278(11):593–600
  6. Weed LL. Medical records that guide and teach. N. Engl. J. Med. 1968;278(12):652–657 (concl.)
  7. Campbell JR. Strategies for problem list implementation in a complex clinical enterprise. Proc. AMIA Symp. 1998;285–289
  8. Campbell JR, Payne TH. A comparison of four schemes for codification of problem lists. Proc. Annu. Symp. Comput. Appl. Med. Care. 1994;201–205
  9. Donaldson MS, Povar GJ. Improving the master problem list: a case study in changing clinician behavior. QRB Qual. Rev. Bull. 1985;11(11):327–333
  10. Elkin PL, Mohr DN, Tuttle MS, Cole WG, Atkin GE, Keck K, et al. Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System. Proc. AMIA Annu. Fall Symp. 1997;500–504
  11. Goldberg H, Goldsmith D, Law V, Keck K, Tuttle M, Safran C. An evaluation of UMLS as a controlled terminology for the Problem List Toolkit. Medinfo. 1998;9(Pt 1):609–612
  12. Hales JW, Schoeffler KM, Kessler DP. Extracting medical knowledge for a coded problem list vocabulary from the UMLS Knowledge Sources. Proc. AMIA Symp. 1998;275–279
  13. Starmer J, Miller R, Brown S. Development of a structured problem list management system at Vanderbilt. Proc. AMIA Annu. Fall Symp. 1998;1083
  14. Simborg DW, Starfield BH, Horn SD, Yourtee SA. Information factors affecting problem follow-up in ambulatory care. Med. Care. 1976;14(10):848–856
  15. Starfield B, Steinwachs D, Morris I, Bause G, Siebert S, Westin C. Concordance between medical records and observations regarding information on coordination of care. Med. Care. 1979;17(7):758–766
  16. Institute of Medicine (U.S.), Committee on Improving the Patient Record, Dick RS, Steen EB, Detmer DE. The Computer-based Patient Record: An Essential Technology for Health Care. Rev. ed. National Academy Press, Washington, DC, 1997
  17. Joint Commission on Accreditation of Healthcare Organizations (JCAHO), Available from http://www.jcaho.org.
  18. van Ginneken AM. The computerized patient record: balancing effort and benefit. Int. J. Med. Inform. 2002;65(2):97–119
  19. Bui AA, Taira RK, El-Saden S, Dordoni A, Aberle DR. Automated medical problem list generation: towards a patient timeline. Medinfo, San Francisco, CA, 2004, pp. 587–591
  20. Scherpbier HJ, Abrams RS, Roth DH, Hail JJ. A simple approach to physician entry of patient problem list. Proc. Annu. Symp. Comput. Appl. Med. Care. 1994;206–210
  21. Wasserman H, Wang J. An applied evaluation of SNOMED CT as a clinical vocabulary for the computerized diagnosis and problem list. Proc. AMIA Symp. 2003;699–703
  22. Payne T, Martin DR. How useful is the UMLS metathesaurus in developing a controlled vocabulary for an automated problem list? Proc. Annu. Symp. Comput. Appl. Med. Care. 1993;705–709
  23. Zelingher J, Rind DM, Caraballo E, Tuttle M, Olson N, Safran C. Categorization of free-text problem lists: an effective method of capturing clinical data. Proc. Annu. Symp. Comput. Appl. Med. Care. 1995;416–420
  24. Warren JJ, Collins J, Sorrentino C, Campbell JR. Just-in-time coding of the problem list in a clinical environment. Proc. AMIA Symp. 1998;280–284
  25. Wang SJ, Bates DW, Chueh HC, Karson AS, Maviglia SM, Greim JA, et al. Automated coded ambulatory problem lists: evaluation of a vocabulary and a data entry tool. Int. J. Med. Inform. 2003;72(1–3):17–28
  26. Pratt AW. Medicine, computers, and linguistics. Adv. Biomed. Eng. 1973;3:97–140
  27. Spyns P. Natural language processing in medicine: an overview. Methods Inf. Med. 1996;35(4/5):285–301
  28. Reichert JC, Glasgow M, Narus SP, Clayton PD. Using LOINC to link an EMR to the pertinent paragraph in a structured reference knowledge base. Proc. AMIA Symp. 2002;652–656
  29. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp. 2001;17–21
  30. Aronson AR, Bodenreider O, Chang HF, Humphrey SM, Mork JG, Nelson SJ, et al. The NLM indexing initiative. Proc. AMIA Symp. 2000;17–21
  31. Chapman WW. NegEx 2. Available at: http://web.cbmi.pitt.edu/chapman/NegEx.html
  32. Ashton CM, Kuykendall DH, Johnson ML, Wray NP. An empirical assessment of the validity of explicit and implicit process-of-care criteria for quality assessment. Med. Care. 1999;37(8):798–808
  33. Szeto HC, Coleman RK, Gholami P, Hoffman BB, Goldstein MK. Accuracy of computerized outpatient diagnoses in a Veterans Affairs general medicine clinic. Am. J. Manage. Care. 2002;8(1):37–43
  34. Institute of Medicine, Committee on Quality of Health Care in America, Kohn LT, Corrigan JM, Donaldson MS. To Err is Human: Building a Safer Health System. 1999

PII: S1386-5056(07)00212-2

doi:10.1016/j.ijmedinf.2007.12.001