TY - GEN
T1 - Superresolution MUSIC based on Marčenko-Pastur limit distribution reduces uncertainty and improves DNA gene expression-based microarray classification
AU - Peterson, Leif E.
PY - 2014
Y1 - 2014
N2 - We introduce a bootstrap root MUSIC (BRM) technique, which employs superresolution multiple signal classification (MUSIC) to reduce high-dimensional sets of genes from expression microarrays to low-dimensional sets used in supervised classification analysis. During BRM, the Marčenko-Pastur limit distribution of eigenvalues for the array-by-array gene expression covariance matrix was used to determine the eigenvalue cutoff for the noise subspace. Classifier results were compared with and without replacing gene expression values with the inverse of the distance to the class-specific noise eigenspace for each microarray. Nine gene expression datasets were used for classification, and results obtained with BRM were compared with results based on random and best ranked N genes. On average, applying BRM to randomly selected genes yielded greater classification accuracy than using the randomly selected genes directly as classifier input. In addition, when BRM was applied to best ranked N genes, the interquartile ranges of accuracy were smaller than with direct input of best ranked genes into classifiers. Overall, BRM can optimally be used with 128 or 256 best ranked markers, requiring less extensive filtering to identify smaller sets of predictors. Use of a larger set of markers with BRM can help minimize the effect of concept drift over time.
UR - http://www.scopus.com/inward/record.url?scp=84958553258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958553258&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-09042-9_14
DO - 10.1007/978-3-319-09042-9_14
M3 - Conference contribution
AN - SCOPUS:84958553258
SN - 9783319090412
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 194
EP - 209
BT - Computational Intelligence Methods for Bioinformatics and Biostatistics - 10th International Meeting, CIBB 2013, Revised Selected Papers
PB - Springer-Verlag
T2 - 10th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2013
Y2 - 20 June 2013 through 22 June 2013
ER -