International Journal of Medical Informatics
Volume 78, Issue 12, Pages e7-e12, December 2009

Towards automated processing of clinical Finnish: Sublanguage analysis and a rule-based parser

  • Veronika Laippala

      Affiliations

    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
    • Department of French Studies, University of Turku, Henrikink. 2, 20014 Turku, Finland
    • Corresponding author at: Department of French Studies, University of Turku, Henrikink. 2, 20014 Turku, Finland. Tel.: +358 407782814.
  • Filip Ginter

      Affiliations

    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Sampo Pyysalo

      Affiliations

    • Turku Centre for Computer Science (TUCS), University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Tapio Salakoski

      Affiliations

    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
    • Turku Centre for Computer Science (TUCS), University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland

Received 31 October 2008; received in revised form 28 January 2009; accepted 10 February 2009; published online 20 March 2009.

Abstract 

Introduction

In this paper, we present steps towards more efficient automated processing of clinical Finnish, focusing on daily nursing notes from a Finnish Intensive Care Unit (ICU). First, we analyze ICU Finnish as a sublanguage and identify the specific features that facilitate, for example, the development of a specialized syntactic analyzer. These features include frequent omission of finite verbs, restrictions on the allowed syntactic structures, and domain-specific vocabulary. Second, we develop a formal grammar and a parser for ICU Finnish, thus providing better tools for developing further applications in the clinical domain.

Methods

The grammar is implemented in the LKB system in a typed feature structure formalism. The lexicon is automatically generated based on the output of the FinTWOL morphological analyzer adapted to the clinical domain. As an additional experiment, we study the effect of using Finnish constraint grammar to reduce the size of the lexicon. The parser construction thus makes efficient use of existing resources for Finnish.
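To illustrate the core operation behind typed feature structure grammars, the following is a minimal sketch of feature structure unification in Python. It is not the LKB implementation or the authors' grammar: it models feature structures as plain nested dictionaries, leaves out the type hierarchy and re-entrancy entirely, and the function name `unify` and the feature names (`CASE`, `AGR`, `NUM`, `PER`) are hypothetical choices made for this example.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if the two structures clash.
    Simplification: no type hierarchy and no re-entrancy, unlike a real
    typed feature structure formalism.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # a clash below propagates upwards
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical feature structures, for illustration only.
noun = {"CASE": "nom", "AGR": {"NUM": "sg"}}
constraint = {"AGR": {"NUM": "sg", "PER": "3"}}
plural = {"AGR": {"NUM": "pl"}}

compatible = unify(noun, constraint)
clash = unify(noun, plural)
```

Here `compatible` combines the information from both structures, while `clash` is `None` because the two values of `NUM` disagree. A grammar rule in this style applies only when all of its constraints unify with the candidate constituents.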

Results

The grammar currently covers 76.6% of ICU Finnish sentences, producing highly accurate best-parse analyses with an F-score of 91.1%. We find that building a parser for the highly specialized domain sublanguage is not only feasible, but also surprisingly efficient, given an existing morphological analyzer with broad vocabulary coverage. The resulting parser enables a deeper analysis of the text than was previously possible.
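For readers unfamiliar with the two measures reported above, the sketch below shows how coverage and F-score are commonly computed. The abstract does not state which units the F-score is calculated over, so the example assumes generic sets of analysis items (for instance labeled dependencies or brackets); the function names and the sample data are hypothetical and are not taken from the paper.

```python
def coverage(parsed_flags):
    """Fraction of sentences for which the grammar produced any parse."""
    return sum(parsed_flags) / len(parsed_flags)


def f_score(gold, predicted):
    """Harmonic mean of precision and recall over two sets of items."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0  # also avoids division by zero for empty sets
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: three of four sentences receive a parse, and the
# best parse of one sentence shares three of four items with the gold analysis.
example_coverage = coverage([True, True, False, True])
example_f = f_score({"a", "b", "c", "d"}, {"a", "b", "c", "e"})
```

Note that an F-score of this kind is typically measured only on the sentences the grammar can parse, so it should be read together with the coverage figure rather than in place of it.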

Keywords: Nursing narratives, Parsing, Typed feature structure grammars


PII: S1386-5056(09)00020-3

doi:10.1016/j.ijmedinf.2009.02.005
