Text-mining protein-protein interaction corpus using concept clustering to identify intermittency

Leif E. Peterson, Matthew A. Coleman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citations

Abstract

We transformed human protein-protein interaction (PPI) data into documents and text-mined them using concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of the data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database (HPRD). Using scaled factorial moments, we developed a randomization test for intermittency, which appears as spikes and holes in the distributions of cluster-specific word frequencies. The test was based on a permutation form of a log-linear regression model to determine differences in the slopes of ln(F_2) vs. ln(M) between the intermittent and null distributions. Significant intermittency (p < 0.0005) in PPI was detected for prostate and testis tissue after a Bonferroni adjustment for multiple tests. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and may point to novel large signal transduction pathways or networks.
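The slope comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common normalization of the second-order scaled factorial moment, F_2(M) = M * sum_m n_m(n_m - 1) / (N(N - 1)), where n_m is the count in bin m of M equal-width bins and N is the total count, and it replaces the paper's permutation regression with a simpler Monte Carlo comparison against a uniform null. All function names, the [0, 1) value range, and the bin counts are hypothetical.

```python
import math
import random


def scaled_factorial_moment(values, n_bins, q=2, lo=0.0, hi=1.0):
    """F_q for `values` histogrammed into n_bins equal bins on [lo, hi).

    Assumed normalization: M^(q-1) * sum_m n_m(n_m-1)...(n_m-q+1)
    divided by N(N-1)...(N-q+1).
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi
        counts[idx] += 1
    n = len(values)
    numerator = sum(math.prod(c - j for j in range(q)) for c in counts)
    denominator = math.prod(n - j for j in range(q))
    return n_bins ** (q - 1) * numerator / denominator


def loglog_slope(values, bin_counts, q=2):
    """Least-squares slope of ln(F_q) against ln(M)."""
    xs = [math.log(m) for m in bin_counts]
    ys = [math.log(scaled_factorial_moment(values, m, q)) for m in bin_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def intermittency_p_value(values, bin_counts, n_null=200, seed=0):
    """One-sided Monte Carlo p-value for the observed slope.

    Assumption: the null is a uniform sample of the same size, a
    simplification of the permutation regression in the abstract.
    """
    rng = random.Random(seed)
    observed = loglog_slope(values, bin_counts)
    exceed = 0
    for _ in range(n_null):
        null_sample = [rng.random() for _ in values]
        if loglog_slope(null_sample, bin_counts) >= observed:
            exceed += 1
    return (exceed + 1) / (n_null + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: a uniform background plus a narrow spike near 0.3.
    data = [rng.random() for _ in range(400)]
    data += [0.3 + 0.001 * rng.random() for _ in range(100)]
    bins = [2, 4, 8, 16, 32]
    print("slope:", loglog_slope(data, bins))
    print("p-value:", intermittency_p_value(data, bins))
```

Under this normalization a flat histogram gives F_2 close to 1 for every M, so the slope is near zero, while a histogram dominated by a single spike gives F_2 close to M and a slope near one. A slope well above the null slopes is what the sketch treats as evidence of intermittency.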

Original language: English (US)
Title of host publication: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Pages: 3634-3640
Number of pages: 7
State: Published - Nov 24 2008
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong, China
Duration: Jun 1 2008 to Jun 8 2008

Other

Other: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Country/Territory: China
City: Hong Kong
Period: 6/1/08 to 6/8/08

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
