International Journal of Medical Informatics
Volume 78, Issue 12, Pages e19-e26, December 2009

Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial

  • Sumithra Velupillai

      Affiliations

    • Department of Computer and Systems Sciences, Stockholm University/KTH, Forum 100, 164 40 Kista, Sweden
    • Corresponding author. Tel.: +46 8 16 11 74.
  • Hercules Dalianis

      Affiliations

    • Department of Computer and Systems Sciences, Stockholm University/KTH, Forum 100, 164 40 Kista, Sweden
  • Martin Hassel

      Affiliations

    • Department of Computer and Systems Sciences, Stockholm University/KTH, Forum 100, 164 40 Kista, Sweden
  • Gunnar H. Nilsson

      Affiliations

    • Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden

Received 31 October 2008, Revised 2 March 2009, Accepted 9 April 2009.

PII: S1386-5056(09)00069-0

doi: 10.1016/j.ijmedinf.2009.04.005
