Abstract
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database (HPRD). A randomization test for intermittency in the form of spikes and holes in frequency distributions of cluster-specific word frequencies was developed using scaled factorial moments. The test was based on a permutation form of a log-linear regression model to determine differences in slopes for ln(F_2) vs. ln(M) in the intermittent and null distributions. Significant intermittency (p < 0.0005) in PPI was detected for prostate and testis tissue after a Bonferroni adjustment for multiple tests. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and possibly suggests identification of novel large signal transduction pathways or networks.
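The slope comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common definition of the second-order scaled factorial moment, F_2(M) = M · Σ n_m(n_m − 1) / (N(N − 1)), where n_m is the count in bin m of an M-bin histogram and N is the total count. It also swaps in a simple paired permutation scheme: at each M, the observed and null ln(F_2) values are randomly exchanged. The function names, the bin grid, and the synthetic data are all hypothetical.

```python
import numpy as np

def f2(values, m_bins, lo=0.0, hi=1.0):
    """Second-order scaled factorial moment of an m_bins-bin histogram (assumed definition)."""
    counts, _ = np.histogram(values, bins=m_bins, range=(lo, hi))
    n = counts.sum()
    return m_bins * np.sum(counts * (counts - 1)) / (n * (n - 1))

def log_curve(values, bin_counts):
    """Return ln(M) and ln(F_2) over a grid of bin counts M."""
    ln_m = np.log(bin_counts)
    ln_f2 = np.log([f2(values, m) for m in bin_counts])
    return ln_m, ln_f2

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

def slope_difference_pvalue(x, y_obs, y_null, n_perm=2000, seed=0):
    """One-sided paired permutation test for slope(y_obs) - slope(y_null) > 0."""
    rng = np.random.default_rng(seed)
    observed = slope(x, y_obs) - slope(x, y_null)
    hits = 0
    for _ in range(n_perm):
        # Exchange the observed and null values independently at each M.
        swap = rng.random(len(x)) < 0.5
        a = np.where(swap, y_null, y_obs)
        b = np.where(swap, y_obs, y_null)
        hits += (slope(x, a) - slope(x, b)) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical synthetic data: a uniform background with one narrow spike,
# compared against a purely uniform null sample of the same size.
rng = np.random.default_rng(1)
spiky = np.concatenate([rng.random(1500), rng.uniform(0.2995, 0.3005, 500)])
null = rng.random(2000)

bin_counts = [2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64]
ln_m, ln_f2_obs = log_curve(spiky, bin_counts)
_, ln_f2_null = log_curve(null, bin_counts)
obs_diff, p_value = slope_difference_pvalue(ln_m, ln_f2_obs, ln_f2_null)
print(obs_diff, p_value)
```

Under this definition a flat histogram gives F_2 close to 1 at every M, so ln(F_2) has a slope near zero, while a spike makes F_2 grow with M. With K bin counts the paired scheme has only 2^K distinct relabelings, so the smallest attainable p-value is limited by the size of the grid.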
Original language | English (US) |
---|---|
Title of host publication | 2008 International Joint Conference on Neural Networks, IJCNN 2008 |
Pages | 3634-3640 |
Number of pages | 7 |
State | Published - Nov 24 2008 |
Event | 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong, China; Duration: Jun 1 2008 → Jun 8 2008 |
Other
Other | 2008 International Joint Conference on Neural Networks, IJCNN 2008 |
---|---|
Country/Territory | China |
City | Hong Kong |
Period | 6/1/08 → 6/8/08 |
ASJC Scopus subject areas
- Software
- Artificial Intelligence