Abstract
We performed a comparative analysis of the presence/absence distributions of short subsequences of different lengths ("n-mers", n = 5-20) in more than 100 microbial genomes. Our results show that, for organisms that are not close relatives, the presence or absence of different 10-20-mers in their genomes is uncorrelated. For close biological relatives, some correlation in the presence of n-mers appears, but it is not as strong as expected. Because the correlations among the n-mers present in different genomes are weak, random sets of n-mers (with appropriately chosen n) can be used to discriminate the genomes of different organisms with a low probability of error. We performed in silico experiments to demonstrate that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other.
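The idea of a presence/absence fingerprint can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`random_probes`, `fingerprint`, `hamming`), the choice to count a probe as present when it occurs on either strand, and the use of Hamming distance to compare fingerprints are all assumptions made for illustration. The demo uses short synthetic sequences and n = 8 so that it runs quickly; the abstract's setting is 1000 probes of length 12-13 against real bacterial genomes.

```python
import random

ALPHABET = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_probes(count, n, rng):
    """Draw `count` distinct random n-mers (hypothetical probe set)."""
    probes = set()
    while len(probes) < count:
        probes.add("".join(rng.choice(ALPHABET) for _ in range(n)))
    return sorted(probes)


def fingerprint(genome, probes):
    """Presence/absence vector: 1 if the probe occurs on either strand.

    Assumes all probes share the same length n.
    """
    n = len(probes[0])
    kmers = {genome[i:i + n] for i in range(len(genome) - n + 1)}
    return [
        1 if (p in kmers or reverse_complement(p) in kmers) else 0
        for p in probes
    ]


def hamming(a, b):
    """Number of probes whose presence/absence differs between two genomes."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for genomes; real input would be FASTA sequences.
    genome_a = "".join(rng.choice(ALPHABET) for _ in range(20_000))
    genome_b = "".join(rng.choice(ALPHABET) for _ in range(20_000))
    probes = random_probes(200, 8, rng)
    fp_a = fingerprint(genome_a, probes)
    fp_b = fingerprint(genome_b, probes)
    print("probes present in A:", sum(fp_a))
    print("Hamming distance A vs B:", hamming(fp_a, fp_b))
```

If probe presence is uncorrelated between two genomes, as the abstract reports for distant relatives, their fingerprints should differ at many positions, so even a modest probe set separates them. The sketch builds a set of all n-mers per genome, which is simple but memory-hungry for genomes of several megabases; a production version would more likely look up only the probe sequences.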
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04 |
Editors | F. Valafar, H. Valafar |
Pages | 363-367 |
Number of pages | 5 |
State | Published - Dec 1 2004 |
Event | Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04 - Las Vegas, NV, United States. Duration: Jun 21 2004 → Jun 24 2004 |
Other
Other | Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04 |
---|---|
Country/Territory | United States |
City | Las Vegas, NV |
Period | 6/21/04 → 6/24/04 |
Keywords
- Microarray
- Pathogen identification
ASJC Scopus subject areas
- General Engineering