International Journal of Medical Informatics
Volume 78, Issue 12, Pages e1-e6, December 2009

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application

  • Filip Ginter
    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
    • Corresponding author. Tel.: +358 50 4138305.
  • Hanna Suominen
    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
    • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Sampo Pyysalo
    • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland
  • Tapio Salakoski
    • Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
    • Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland

Received 31 October 2008, Revised 17 December 2008, Accepted 5 February 2009.


PII: S1386-5056(09)00017-3

doi: 10.1016/j.ijmedinf.2009.02.003
