Operations and Information Management, School of Business, University of Connecticut, Storrs, Connecticut 06269
Data sets with many discrete variables and relatively few cases arise in health care, e-commerce, information security, text mining, and many other domains. Learning effective and efficient prediction models from such data sets is a challenging task. In this paper, we propose a tabu search-enhanced Markov blanket (TS/MB) algorithm to learn a graphical Markov blanket model for classification of high-dimensional data sets. The TS/MB algorithm makes use of Markov blanket neighborhoods: restricted neighborhoods in a general Bayesian network based on the Markov condition. Computational results from real-world data sets drawn from several domains indicate that the TS/MB algorithm, when used as a feature selection method, is able to find a parsimonious model with substantially fewer predictor variables than are present in the full data set. The algorithm also provides good prediction performance when used as a graphical classifier compared with several machine-learning methods.
The H. John Heinz III School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Department of Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Department of Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
xue.bai{at}business.uconn.edu
rpadman{at}andrew.cmu.edu
jdramsey{at}andrew.cmu.edu
ps7z{at}andrew.cmu.edu
Key words: Markov blanket; Bayesian networks; tabu search; machine learning; text analysis; health care decision support; online marketing
History: received June 2006; revised June 2007; accepted October 2007.
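
For readers unfamiliar with the term used in the abstract, the Markov blanket of a variable in a Bayesian network consists of its parents, its children, and the other parents of its children. The sketch below illustrates only that standard definition on a toy graph; it is not the TS/MB algorithm, and the function name `markov_blanket`, the parent-set dictionary format, and the node names are hypothetical choices made for illustration.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its parents
    (an assumed input format, not taken from the paper).
    """
    pa = set(parents.get(target, ()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return pa | children | spouses


# Toy network: class variable Y has parent A, children B and C,
# and shares child C with D. E depends only on D.
toy = {
    "Y": {"A"},
    "B": {"Y"},
    "C": {"Y", "D"},
    "A": set(),
    "D": set(),
    "E": {"D"},
}
blanket = markov_blanket(toy, "Y")
```

In this toy graph the blanket of `Y` contains `A`, `B`, `C`, and `D` but not `E`, which shows how conditioning on the blanket can leave out variables that are present in the full data set.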