Abstract
In microarray-based cancer classification and prediction, gene selection is an important research problem owing to the large number of genes and the small number of experimental conditions. In this paper, we propose a Bayesian approach to gene selection and classification using the logistic regression model. The basic idea of our approach is to use a logistic regression model to relate gene expression to the class labels. We use Gibbs sampling and Markov chain Monte Carlo (MCMC) methods to discover important genes. To implement the Gibbs sampler and the MCMC search, we derive a posterior distribution of the selected genes given the observed data. After the important genes are identified, the same logistic regression model is used for cancer classification and prediction. Issues of efficient implementation of the proposed method are discussed. The proposed method is evaluated on several large microarray data sets, including hereditary breast cancer, small round blue-cell tumors, and acute leukemia. The results show that the method can effectively identify important genes consistent with known biological findings while also achieving high classification accuracy. Finally, the robustness and sensitivity properties of the proposed method are investigated.
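As a rough illustration of the classification step only, the sketch below fits a logistic regression on a chosen subset of genes and predicts class labels. It is not the paper's Bayesian procedure: the Gibbs/MCMC gene search and the derived posterior are not reproduced here, the gene subset `selected` is picked by hand, and the fit uses plain gradient ascent with a small ridge penalty as a stand-in. The data, the function names `fit_logistic` and `predict`, and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, selected, lr=0.1, n_iter=2000, l2=0.01):
    """Fit logistic regression on the gene columns listed in `selected`.

    Assumption: gradient ascent on the ridge-penalized log-likelihood,
    used here in place of the paper's Bayesian estimation.
    """
    n = len(X)
    w = [0.0] * (len(selected) + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            feats = [1.0] + [row[j] for j in selected]
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            for k, fk in enumerate(feats):
                grad[k] += (label - p) * fk
        for k in range(len(w)):
            penalty = l2 * w[k] if k > 0 else 0.0  # intercept is not penalized
            w[k] += lr * (grad[k] / n - penalty)
    return w

def predict(w, row, selected):
    # Classify as 1 when the fitted class-1 probability is at least 0.5.
    feats = [1.0] + [row[j] for j in selected]
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, feats))) >= 0.5 else 0

# Hypothetical toy data: 6 samples x 3 genes; only gene 0 differs by class.
X = [[2.1, 0.3, 1.0], [1.8, -0.2, 0.9], [2.4, 0.1, 1.1],
     [-1.9, 0.2, 1.0], [-2.2, -0.1, 0.8], [-2.0, 0.0, 1.2]]
y = [1, 1, 1, 0, 0, 0]

selected = [0]  # assumed output of a gene-selection step (chosen by hand here)
w = fit_logistic(X, y, selected)
preds = [predict(w, row, selected) for row in X]
```

In the setting the abstract describes, `selected` would instead come from the posterior search over gene subsets, and the same logistic model would then be refit on those genes for prediction.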
Original language | English (US) |
---|---|
Pages (from-to) | 249-259 |
Number of pages | 11 |
Journal | Journal of Biomedical Informatics |
Volume | 37 |
Issue number | 4 |
State | Published - Aug 2004 |
Keywords
- Bayesian gene selection
- Cancer classification
- Gene microarray
- Logistic regression
ASJC Scopus subject areas
- Computer Science Applications
- Health Informatics