International Journal of Medical Informatics
Volume 79, Issue 4, Pages 284-296, April 2010

A methodology to enhance spatial understanding of disease outbreak events reported in news articles

National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, Japan

Received 15 June 2009; received in revised form 24 January 2010; accepted 24 January 2010; published online 15 February 2010.

Abstract 

Purpose

The emergence and re-emergence of disease outbreaks of international concern in recent years have raised the importance of health surveillance systems that exploit the open media for timely and precise event detection. However, a key barrier faced by current event-based health surveillance systems is identifying fine-grained terms for an outbreak's geographical location. In this article, we present a method that tackles this problem by associating each reported event with the most specific spatial information available in a news report. Such a method would be useful not only for health surveillance systems but also for other event-centered processing systems.

Methods

To develop an automated spatial attribute annotation system, we first created a gold standard corpus for training a machine learning model. Because qualitative analysis of the data suggested that event class might affect spatial attribute annotation, we also developed an event classification system so that event class information could be incorporated into the spatial attribute annotation model. To recognize the spatial attribute of events automatically, we explored several approaches, ranging from a simple heuristic technique to a more sophisticated approach based on a state-of-the-art Conditional Random Fields (CRF) model. Different feature sets were incorporated into the model and compared.
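The abstract does not say which heuristic served as the simple baseline. As one plausible illustration only, the sketch below assumes a "nearest mention" rule: each event trigger is linked to the location mention closest to it in token distance. The function name `nearest_location`, the input layout, and the example sentence are all hypothetical and are not taken from the article.

```python
from typing import List, Optional, Tuple


def nearest_location(event_index: int,
                     locations: List[Tuple[int, str]]) -> Optional[str]:
    """Return the location mention closest to the event trigger.

    ASSUMPTION: this nearest-mention rule is an illustrative baseline,
    not the heuristic evaluated in the article.

    event_index: token position of the event trigger.
    locations:   (token position, surface form) for each location mention.
    """
    if not locations:
        return None
    # min() keeps the first minimum, so ties go to the mention listed first.
    return min(locations, key=lambda loc: abs(loc[0] - event_index))[1]


# Made-up sentence, used only to show the call.
tokens = "Two cases of cholera were confirmed in Harare , Zimbabwe".split()
event_index = tokens.index("confirmed")
locations = [(tokens.index("Harare"), "Harare"),
             (tokens.index("Zimbabwe"), "Zimbabwe")]
print(nearest_location(event_index, locations))
```

A rule like this needs no training data, which is why it makes a natural point of comparison for a learned sequence model such as a CRF.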

Results

The evaluations were conducted on 100 outbreak news articles. Spatial attribute recognition performance was evaluated using three metrics: precision, recall, and the harmonic mean of precision and recall (F-score). Among the three strategies proposed in this article, the CRF model appeared to be the most promising for spatial attribute recognition, with a best performance of 85.5% F-score (86.3% precision and 84.7% recall).
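As a quick consistency check, the reported F-score can be recomputed from the reported precision and recall using the harmonic mean. This is a minimal sketch; the function name `f_score` is ours, and the balanced form of the F-score (equal weight on precision and recall) is assumed from the phrase "harmonic mean".

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the CRF model, in percent.
print(round(f_score(86.3, 84.7), 1))  # prints 85.5
```

The recomputed value, about 85.49, rounds to the 85.5% given in the text.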

Conclusion

We presented a methodology for associating each event in media outbreak reports with its spatial attribute at the finest level of granularity. Our goal has been to provide a means for enhancing the spatial understanding of outbreak-related events. Evaluation studies showed promising results for automatic spatial attribute annotation. In the future, we plan to explore more features, such as semantic correlation between words, that may be useful for the spatial attribute annotation task.

Keywords: Natural language processing, Public health surveillance, Geographical information, Information system

PII: S1386-5056(10)00027-4

doi:10.1016/j.ijmedinf.2010.01.014
