Peer review process
Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Alan Talevi, National University of La Plata, La Plata, Argentina
- Senior Editor: Aleksandra Walczak, École Normale Supérieure - PSL, Paris, France
Reviewer #1 (Public review):
Summary:
The work provides further evidence of the importance of data quality and representation for ligand-based virtual screening approaches. The authors applied different machine learning (ML) algorithms and data representations to a new dataset of BRAF ligands. First, the authors evaluate the ML algorithms and demonstrate that, independently of the ML algorithm, predictive and robust models can be obtained on this BRAF dataset. Second, they investigate how the molecular representations affect the predictions of the ML algorithms. They found that, in this highly curated dataset, the different molecular representations are all adequate for the ML algorithms, since almost all of them achieve high accuracy values, with Estate fingerprints producing the worst-performing predictive models and ECFP6 fingerprints producing the best classification models. Third, the authors evaluate the performance of the models on subsets of the BRAF dataset of differing composition and size. They found that, given a finite number of active compounds, increasing the number of inactive compounds worsens the recall and accuracy. Finally, the authors analyze whether the use of "less active" molecules affects the models' predictive performance, using "less active" molecules taken from the ChEMBL database or decoys from DUD-E. They found that the accuracy of the models falls as the number of "less active" examples in the training dataset increases, whereas including decoys in the training set yields results as good as those of the original models, or even better in some cases. However, the use of decoys in the training set worsens predictive power on test sets that contain active and inactive molecules.
Strengths:
This is a highly relevant topic in medicinal chemistry and drug discovery. The manuscript is well-written, with a clear structure that facilitates easy reading, and it includes up-to-date references. The hypotheses are clearly presented and appropriately explored. The study provides valuable insights into the importance of deriving models from high-quality data, demonstrating that, when this condition is met, complex computational methods are not always necessary to achieve predictive models. Furthermore, the generated BRAF dataset offers a valuable resource for medicinal chemists working in ligand-based virtual screening.
Weaknesses:
While the work highlights the importance of using high-quality datasets to achieve better and more generalizable results, it does not present significant novelty, as the analysis of training data has been extensively studied in chemoinformatics and medicinal chemistry. Additionally, the inclusion of "AI" in the context of data-centric AI is somewhat unclear, given that the dataset curation is conducted manually, selecting active compounds based on IC50 values from ChEMBL and inactive compounds according to the authors' criteria.
Moreover, the conclusions are based on the analysis of only two high-quality datasets. To generalize these findings, it would be beneficial to extend the analysis to additional high-quality datasets (at least 10 datasets for a robust benchmarking exercise).
A key aspect that could be improved is the definition of an "inactive" compound, which remains unclear. In the manuscript, it is stated:
• "The inactives were carefully selected based on the fact that they have no known pharmacological activity against BRAF."
Does the lack of BRAF activity data necessarily imply that these compounds are inactive?
• "We define a compound as 'inactive' if there are no known pharmacological assays for the said compound on our target, BRAF."
However, in the authors' response, they mention:
• "We selected certain compounds that we felt could not possibly be active against BRAF, such as ligands for neurotransmitter receptors, as inactives."
Given that the definition of "inactive" is one of the most critical concepts in the study, I believe it should be clearly and consistently explained.
Lastly, while statistical comparison is not always common in machine learning, it would greatly enhance the value of this work, especially when comparing models with small differences in accuracy.
Reviewer #2 (Public review):
Summary:
The authors explored the importance of data quality and representation for ligand-based virtual screening approaches. I believe the results could be of potential benefit to the drug discovery community, especially to those scientists working in the field of machine learning applied to drug research. The in silico design is comprehensive and adequate for the proposed comparisons.
This manuscript by Chong et al. argues that it is not necessary to resort to sophisticated deep learning algorithms for virtual screening, since, based on their results, conventional ML can perform exceptionally well if fed the right data and molecular representations.
The article is interesting and well-written. The overview of the field and the warning about dataset composition are very well thought-out and should be of interest to a broad segment of the AI in drug discovery readership. This article further highlights some of the considerations that need to be taken into account when implementing data-centric AI for computer-aided drug design methods.
Strengths:
This study contributes significantly to the field of machine learning and data curation in drug discovery. The paper is, in general, well-written and structured. However, I have some suggestions regarding certain aspects of the data analyses.
Weaknesses:
The conclusions drawn in the study are based on the analysis of only two datasets. The authors chose BRAF as an example in this study and expanded the analysis with a BACE-1 dataset; however, a benchmark with several targets would be needed to evaluate the reproducibility and transferability of the method. One concern is the applicability of the method to other targets.
Reviewer #3 (Public review):
Summary:
The authors presented a data-centric ML approach for virtual ligand screening. They used BRAF as an example to demonstrate the predictive power of their approach.
Strengths:
The performance of the predictive models in this study is superior (nearly perfect) with respect to existing methods.
Comments on revisions:
In the revised manuscript, the presented approach has been robustly tested and can be very useful for ligand prediction.