Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Ilse Daehn, Icahn School of Medicine at Mount Sinai, New York, United States of America
- Senior Editor: Matthias Barton, University of Zurich, Zurich, Switzerland
Reviewer #1 (Public Review):
Summary:
Rigor in the design and application of scientific experiments is an ongoing concern in preclinical (animal) research. Because findings from these studies often inform the design of clinical (human) studies, it is critical that the results of preclinical studies be valid and replicable. However, several recent peer-reviewed papers have shown that some results in the cardiovascular research literature may not be valid because their use of key design elements is unacceptably low. The current study was designed to expand on and replicate previous work by examining preclinical cardiovascular research articles published in nine leading scientific journals, identified through a PubMed search. These articles were carefully examined for four elements that are important in the design of animal experiments: inclusion of both biological sexes, randomization of subjects to experimental groups, blinding of the experimenters, and a priori estimation of sample size for the experimental groups. The findings of the current study indicate that the use of these four design elements in the reported preclinical research is unacceptably low. The results therefore replicate previous studies and demonstrate once again that there is an ongoing problem in the experimental design of preclinical cardiovascular research.
Strengths:
This study selected four important design elements for examination. The descriptions in the text and figures of this paper clearly demonstrate that the rate of use of all four design elements in the examined research articles was unacceptably low. The current study is important because it replicates previous work and calls attention once again to serious problems in the design of preclinical studies, problems that do not appear to lessen over time.
Weaknesses:
The current study relies extensively on both descriptive and inferential statistics in describing the results. The descriptive statistics are clear and strong, demonstrating the main point of the study: that the use of these design elements is quite low, which may invalidate many of the reported studies. In addition, inferential statistical tests were used to compare the use of the four design elements against each other and to compare some of the journals. This inferential analysis appears weak because the wrong tests may have been used in some cases. However, the overall descriptive findings are very strong and make the major points of the study.
Reviewer #2 (Public Review):
Summary
This study replicates a 2017 study in which the authors reviewed papers for four key elements of rigor: inclusion of sex as a biological variable, randomization of subjects, blinding of outcomes, and pre-specified sample size estimation. Here they screened 298 published papers for the four elements. Over a 10-year period, rigor (defined as including any of the four elements) failed to improve. They could not detect any differences across the journals they surveyed, nor across models. They focused primarily on cardiovascular disease, which helps focus the research but limits generalizability to a broader range of scientific investigation. There is no reason, however, to believe rigor is any better or worse in other fields, and hence this study is a good 'snapshot' of progress in improving rigor over time.
Strengths
The authors randomly selected papers from leading journals (e.g., PNAS). Each paper was reviewed by two investigators. They pulled papers over a 10-year period, 2011 to 2021, giving a good span of time over which to look for changes. The analysis followed generally accepted guidelines for a structured review.
Weaknesses
The authors did not use exactly the same journals as in the 2017 study, which makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.
The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant: the data support neither a claim of an increase nor a decrease over time. A similar problem recurs several times in the remainder of the results presentation.
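The reviewer's point can be illustrated with a minimal sketch of the kind of trend test at issue. The yearly proportions below are hypothetical, not the paper's data; the number of comparisons (four, one per design element) and the use of a simple Bonferroni correction are assumptions for illustration only.

```python
# Illustrative sketch (hypothetical data): testing for a linear trend in a
# yearly proportion, and why a non-significant result supports no trend claim.
import numpy as np
from scipy import stats

years = np.arange(2011, 2022)  # 11 years -> regression df = (1, 9)
# Hypothetical yearly proportions of papers including both biological sexes
prop = np.array([0.25, 0.18, 0.30, 0.22, 0.28, 0.20, 0.31, 0.24, 0.27, 0.23, 0.29])

res = stats.linregress(years, prop)
r_squared = res.rvalue ** 2
# For simple linear regression, the overall F statistic equals t^2 for the slope
f_stat = (res.slope / res.stderr) ** 2

n_tests = 4  # assumed: one trend test per study design element
corrected_p = res.pvalue * n_tests  # Bonferroni-style correction (uncapped)

# A non-significant p value (raw or corrected) means the data support neither
# an increase nor a decrease; the sign of the slope alone is not evidence.
print(f"slope={res.slope:.4f}, R^2={r_squared:.3f}, F(1,9)={f_stat:.3f}, "
      f"p={res.pvalue:.3f}, corrected p={corrected_p:.3f}")
```

With data like these, the fitted slope may well be positive, but describing that as a "general increase" overstates what a non-significant regression can show.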
I think the Introduction and the Discussion are somewhat repetitive, and the wording could be tightened.
Impact and Context
Lack of reproducibility remains an enormous problem in science, plaguing both basic and translational investigations. With increased scrutiny of rigor, and requirements from NIH and other funding agencies for greater rigor and transparency, one would expect to find increasing rigor, as evidenced by authors including more of the recommended study design elements (SDEs). This review found no such change, which is quite disheartening. The data imply that journals, through their editors and reviewers, will have to increase the scrutiny and standards applied to preclinical and basic studies. This work could also serve as a call to action for investigators outside cardiovascular science to reflect on their own practices when planning future projects.