International Journal of Medical Informatics
Volume 78, Issue 12, Pages e1-e6, December 2009

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application

  • Filip Ginter
      • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
      • Corresponding author. Tel.: +358 50 4138305.
  • Hanna Suominen
      • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
      • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Sampo Pyysalo
      • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Tapio Salakoski
      • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
      • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland

Received 31 October 2008; received in revised form 17 December 2008; accepted 5 February 2009; published online 27 March 2009.

Abstract 

Motivation

Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs, and they are of limited use for texts in which the segments are short.

Methods

We introduce an unsupervised method that combines hidden Markov models and latent semantic analysis. The method allows the topics of interest to be defined freely without the need for data annotation, and it can identify short segments.
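The abstract does not give the model details, so the following is only a rough sketch of how the two techniques can be combined, not the authors' implementation. It assumes that each topic of interest is defined by a short free-text query of seed words (the hypothetical `topic_queries`), that LSA is a truncated SVD of a term-by-sentence count matrix into which queries and sentences are folded, that HMM emission scores are a softmax over cosine similarities to the topic queries (the hypothetical `temperature`), and that the hidden states are topics with a single self-transition probability (the hypothetical `p_stay`). All function and parameter names are illustrative.

```python
import numpy as np


def bag_of_words(text, vocab):
    """Term-count vector over a fixed vocabulary; unknown words are ignored."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec


def lsa_space(term_doc, k):
    """Latent semantic analysis: truncated SVD of a terms x documents matrix."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(bow, U_k, s_k):
    """Standard LSA fold-in of a new text: d_hat = d^T U_k S_k^{-1}."""
    return (bow @ U_k) / s_k


def cosine(a, b, eps=1e-9):
    """Cosine similarity; (near-)zero vectors get similarity 0."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return 0.0
    return float(a @ b / (na * nb))


def viterbi(log_emit, p_stay):
    """Most likely topic sequence for an HMM whose hidden states are topics.

    Assumed model: uniform start probabilities, probability p_stay of staying
    in the same topic, remaining mass spread evenly over the other topics.
    """
    T, K = log_emit.shape
    log_trans = np.full((K, K), np.log((1.0 - p_stay) / (K - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    score = np.log(1.0 / K) + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last sentence
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def label_sentences(sentences, topic_queries, k=2, p_stay=0.8, temperature=0.1):
    """Assign one topic index per sentence; runs of equal labels form segments."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    # Toy choice: the LSA space is built from the input sentences themselves;
    # a larger background corpus would be a more typical choice.
    term_doc = np.column_stack([bag_of_words(s, vocab) for s in sentences])
    U_k, s_k = lsa_space(term_doc, k)
    topics = [fold_in(bag_of_words(q, vocab), U_k, s_k) for q in topic_queries]
    sims = np.array([[cosine(fold_in(bag_of_words(s, vocab), U_k, s_k), t) for t in topics]
                     for s in sentences])
    # Softmax over topics turns similarities into emission log-probabilities.
    logits = sims / temperature
    m = logits.max(axis=1, keepdims=True)
    log_emit = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return viterbi(log_emit, p_stay)


sentences = ["pain morphine", "patient slept", "analgesic pain",
             "ventilator oxygen", "breathing oxygen"]
topics = ["pain analgesic morphine", "breathing ventilator oxygen"]
print(label_sentences(sentences, topics))  # [0, 0, 0, 1, 1]
```

In this toy run the sentence "patient slept" has no similarity to either topic, so its label comes entirely from the HMM transition model, which keeps it in the surrounding pain segment. This illustrates why a sequence model can help with short segments: a lone sentence with weak topical evidence is labeled in context rather than in isolation.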

Results

The method is motivated by information needs in intensive care nursing and is evaluated on nursing narratives from this domain. It considerably outperforms a keyword-based heuristic baseline and achieves performance comparable to that of a related supervised method trained on 3600 manually annotated words.

Keywords: Hidden Markov models, Latent semantic analysis, Topic segmentation, Topic classification, Information retrieval, Computerized patient records, Nursing

PII: S1386-5056(09)00017-3

doi:10.1016/j.ijmedinf.2009.02.003