Abstract
Background: Antimicrobial resistance is a major public health threat, and new agents are needed. Computational approaches have been proposed to reduce the cost and time of compound screening.

Aims: To develop a machine learning (ML) model for the in silico screening of low molecular weight molecules.

Methods: We used the results of a high-throughput Caenorhabditis elegans methicillin-resistant Staphylococcus aureus (MRSA) liquid infection assay to develop ML models for compound prioritization and quality control.

Results: The compound prioritization model achieved an AUC of 0.795 with a sensitivity of 81% and a specificity of 70%. When applied to a validation set of 22,768 compounds, the model captured 81% of the active compounds found by high-throughput screening (HTS) within only 30.6% of the compounds, a 2.67-fold increase in hit rate. When retrained on all compounds in the HTS dataset, the model also flagged 45 discordant molecules that the HTS had classified as non-hits, of which 42 (93%) have known antimicrobial activity.

Conclusion: Our ML approach can increase HTS efficiency by reducing the number of compounds that need to be physically screened and by identifying potentially missed hits, making HTS more accessible and lowering barriers to entry.
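The hit-rate figure in the Results follows from simple arithmetic: if a model-selected subset contains a fraction s of all actives while making up a fraction f of the library, the hit rate in that subset is s / f times the baseline. The sketch below illustrates this with the rounded values quoted in the abstract; the function and variable names are hypothetical and not taken from the paper. With these rounded inputs the ratio comes out near 2.65, slightly below the reported 2.67, presumably because the published percentages are themselves rounded.

```python
def hit_rate_enrichment(sensitivity: float, fraction_screened: float) -> float:
    """Fold increase in hit rate when only a model-selected subset is screened.

    sensitivity: fraction of all actives that fall inside the subset.
    fraction_screened: fraction of the whole library that the subset covers.
    """
    if not 0 < fraction_screened <= 1:
        raise ValueError("fraction_screened must be in (0, 1]")
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be in [0, 1]")
    return sensitivity / fraction_screened


# Rounded values as quoted in the abstract (assumed inputs for illustration).
library_size = 22_768
sensitivity = 0.81
fraction_screened = 0.306

enrichment = hit_rate_enrichment(sensitivity, fraction_screened)
compounds_to_screen = round(library_size * fraction_screened)

print(f"Approximate fold increase in hit rate: {enrichment:.2f}")
print(f"Compounds left to screen physically: {compounds_to_screen}")
```

The same ratio can be used to compare prioritization thresholds: lowering the fraction screened raises the enrichment only as long as sensitivity does not fall proportionally.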
Original language | English (US) |
---|---|
Pages (from-to) | 1-10 |
Number of pages | 10 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Early online date | Jul 26 2024 |
State | E-pub ahead of print - Jul 26 2024 |
Keywords
- antimicrobial drug resistance
- Compounds
- Data models
- Grippers
- High-temperature superconductors
- high-throughput screening
- in silico screening
- Libraries
- machine learning
- Random forests
- Training
ASJC Scopus subject areas
- Biotechnology
- Genetics
- Applied Mathematics