International Journal of Medical Informatics
Volume 78, Issue 12, Pages e84-e96, December 2009

Predicting the graft survival for heart–lung transplantation patients: An integrated data mining methodology

  • Asil Oztekin

      Affiliations

    • Oklahoma State University, School of Industrial Engineering & Management, Stillwater, OK 74078, USA
    • Tel.: +1 405 744 4664.
  • Dursun Delen

      Affiliations

    • William S. Spears School of Business, Oklahoma State University, North Hall, Suite 378, 700 North Greenwood Avenue, Tulsa, OK 74106, USA
    • Corresponding author. Tel.: +1 918 594 8283; fax: +1 918 594 8281.
  • Zhenyu (James) Kong

      Affiliations

    • Oklahoma State University, School of Industrial Engineering and Management, 322 Engineering North, Stillwater, OK 74078, USA
    • Tel.: +1 405 744 6055.

Received 31 October 2008; received in revised form 22 February 2009; accepted 9 April 2009; published online 4 June 2009.

Abstract 

Background

Predicting the survival of heart–lung transplant patients could play a critical role in understanding and improving the matching of recipient and graft. Although voluminous data on transplantation procedures are being collected and stored, only a small subset of the predictive factors has been used in modeling heart–lung transplantation outcomes. Previous studies have mainly applied statistical techniques to a small set of factors selected by domain experts in order to reveal simple linear relationships between the factors and survival. The collection of methods known as ‘data mining’ offers significant advantages over conventional statistical techniques in dealing with the latter's limitations, such as the assumptions of normality of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations, yet they are computationally more expensive and do not provide solutions as fast and flexible as those of data mining techniques on large datasets.

Purpose

The main objective of this study is to improve the prediction of outcomes following combined heart–lung transplantation by proposing an integrated data-mining methodology.

Methods

A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning-based predictive models and (2) extract the most important predictive factors. Then, a consolidated set of factors is generated from three variable selection methods, namely (i) variables selected by machine learning methods (decision trees, neural networks, and logistic regression), (ii) expert-defined variables drawn from the literature review, and (iii) common sense-based interaction variables. This consolidated set is used to develop Cox regression models for heart–lung graft survival.
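
For readers unfamiliar with Cox regression, the general form of a proportional hazards model with a single interaction term can be sketched as follows. The covariates x_1 and x_2 and the coefficients are placeholders chosen for illustration; the abstract does not state which variables or interactions entered the final models.

```latex
% Cox proportional hazards model with one interaction term (illustrative only).
% h_0(t) is the unspecified baseline hazard; x_1, x_2 are hypothetical covariates.
h(t \mid x) = h_0(t)\,\exp\!\big(\beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2\big)

% Hazard ratio for a one-unit increase in x_1 with x_2 held fixed.
% The baseline hazard cancels, and the effect of x_1 depends on x_2:
\frac{h(t \mid x_1 + 1,\, x_2)}{h(t \mid x_1,\, x_2)} = \exp\!\big(\beta_1 + \beta_{12}\, x_2\big)
```

The second line shows why interaction variables can matter: when \beta_{12} is nonzero, the effect of one factor on the hazard changes with the level of the other, which a model with main effects alone cannot represent.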

Results

The predictive models’ 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models predicted graft survival better, and with different variables, than the conventional approaches commonly used in the literature. This result is validated by comparing the corresponding Gains charts for the proposed methodology and the literature review-based Cox results, and by comparing the Akaike information criterion (AIC) values obtained from each.
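
The two evaluation ideas mentioned here, k-fold cross-validation accuracy and AIC-based model comparison, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the majority-class stand-in classifier, and the fold assignment scheme are all assumptions made for illustration, and only the standard library is used.

```python
import random
from collections import Counter


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds.

    Assumes n >= k so that no fold is empty.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, x)` returns a label.
    Both are caller-supplied (hypothetical interface, not from the paper).
    """
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L. Lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


# Stand-in classifier for demonstration only: always predicts the majority class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

In use, any classifier exposing the same `fit`/`predict` pair could be passed to `cross_val_accuracy`, and `aic` could be applied to the maximized (partial) log-likelihoods of two candidate Cox models fitted to the same data; the model with the lower AIC would be preferred, since the criterion penalizes each additional parameter.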

Conclusions

The data mining-based methodology proposed in this study reveals undiscovered relationships (i.e., interactions of the existing variables) among the survival-related variables, which help better predict the survival of heart–lung transplants. It also brings forward a different set of variables for domain experts to evaluate and consider prior to organ transplantation.

Keywords: Survival analysis, Combined heart–lung transplantation, Classification, Data mining, Cox proportional hazards models


PII: S1386-5056(09)00070-7

doi:10.1016/j.ijmedinf.2009.04.007
