International Journal of Medical Informatics
Volume 77, Issue 2, Pages 81-97, February 2008

Predictive data mining in clinical medicine: Current issues and guidelines

  • Riccardo Bellazzi

      Affiliations

    • Dipartimento di Informatica e Sistemistica, Università di Pavia, via Ferrata 1, 27100 Pavia, Italy
    • Corresponding author. Tel.: +39 0382 505511; fax: +39 0382 505373.
  • Blaz Zupan

      Affiliations

    • Faculty of Computer Science, University of Ljubljana, Slovenia
    • Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, TX, United States

Received 27 October 2006; accepted 17 November 2006.

    Induction of prediction models. The figure shows an example training data set with three attributes, an outcome, and 20 instances (A), a nomogram representing a naive Bayesian classifier (B), and a decision tree developed from the same data set (C). To use the nomogram for prediction, each attribute value is mapped to a number of points (the topmost scale); the points are summed, and the total is mapped to the corresponding probability (the two scales at the bottom of B).
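    The point-summation step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each attribute value's points are the logarithm of its likelihood ratio for the target class, a common construction for naive Bayesian nomograms that the caption itself does not spell out. The function name and the numeric values are hypothetical.

```python
import math

def nomogram_probability(prior, likelihood_ratios):
    """Map summed nomogram points to a class probability.

    Assumption: the points for an attribute value are log(P(value | class) /
    P(value | other class)); the prior contributes its own log odds.
    """
    points = [math.log(lr) for lr in likelihood_ratios]   # one entry per attribute
    total = math.log(prior / (1.0 - prior)) + sum(points)  # total points (log odds)
    return 1.0 / (1.0 + math.exp(-total))                  # log odds -> probability

# Hypothetical case: three attributes with likelihood ratios 2.0, 0.5 and 3.0
# and an uninformative prior of 0.5.
probability = nomogram_probability(0.5, [2.0, 0.5, 3.0])
```

    Under these assumptions the total odds are 2.0 × 0.5 × 3.0 = 3, so the predicted probability is 0.75.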

    Classification rules inferred by a CN2-like covering algorithm from the data set in Fig. 1A. The first rule covers only examples with a good outcome, whereas the class distribution of the other two rules is mixed because the coverage includes one example from the minority (good outcome) class. Rule quality was assessed with a Laplace probability estimate.
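    The Laplace probability estimate mentioned in this caption is commonly defined as (k + 1) / (n + c), where n is the number of examples a rule covers, k is the number of those that belong to the predicted class, and c is the number of classes. A minimal sketch under that standard definition follows; the function name and the counts are hypothetical and are not taken from the figure.

```python
def laplace_estimate(class_count, covered_count, num_classes=2):
    """Laplace-corrected estimate of a rule's class probability.

    class_count   -- covered examples that belong to the predicted class (k)
    covered_count -- all examples covered by the rule (n)
    num_classes   -- number of outcome classes (c)
    """
    return (class_count + 1) / (covered_count + num_classes)

# Hypothetical rules: one pure rule covering 5 examples, and one mixed rule
# covering 4 examples of which 3 belong to the predicted class.
pure_rule_quality = laplace_estimate(5, 5)
mixed_rule_quality = laplace_estimate(3, 4)
```

    The correction keeps a pure rule that covers few examples from receiving a probability of exactly 1, which is why the estimate is useful for ranking rules induced from small data sets.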

    Predictions of the naive Bayesian classifier (Fig. 1B) and the decision tree (Fig. 1C) for three different cases. The question mark for the attribute Health in the third case signifies a missing (unknown) value. Each classifier's probabilities are given for both outcomes, ‘good’ and ‘bad’ (rightmost two columns; the probabilities are separated by a colon, and the prevailing class label is also shown).

    Evaluation results for the naive Bayesian classifier and the decision tree inference algorithm on the example data set from Fig. 1A, obtained with a ‘leave-one-out’ test.
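    A leave-one-out test holds out each instance in turn, induces a model from the remaining instances, and scores the prediction on the held-out one. The sketch below illustrates the procedure only: the majority-class learner and the four-instance data set are hypothetical stand-ins, not the classifiers or data from Fig. 1A.

```python
from collections import Counter

def majority_learner(train):
    """Toy learner (assumption): always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def leave_one_out_accuracy(data, learner):
    """Fraction of instances predicted correctly when each is held out once."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # every instance except the i-th
        model = learner(train)            # induce a model without the test case
        correct += model(x) == y          # score the held-out instance
    return correct / len(data)

# Hypothetical data: (attributes, outcome) pairs.
data = [((0,), "bad"), ((1,), "bad"), ((2,), "bad"), ((3,), "good")]
accuracy = leave_one_out_accuracy(data, majority_learner)
```

    Here the three ‘bad’ instances are predicted correctly and the single ‘good’ instance is not, giving an accuracy of 0.75. Any induction algorithm that maps a training list to a prediction function can be passed in place of the toy learner.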

    Scatterplot of a two-class data set with maximum-margin hyperplanes found by a support vector machine induction algorithm with a linear kernel. The data instances along the hyperplanes that define the margin (plotted in red) are called support vectors.

    The output of the survival prediction problem in a malignant skin tumor, presented by Sierra et al. [75]. Subfigure (A) shows the Bayesian network as induced from the data, while (B) shows the naive Bayesian model. Model (A) better describes the relationships between the variables and the outcomes.

    Snapshot of decisions-at-Hand software on a PocketPC that shows the nomogram reporting on the outcome. The prediction was made on the same case as shown in Fig. 1B.

PII: S1386-5056(06)00274-7

doi: 10.1016/j.ijmedinf.2006.11.006
