Science Forum: Consensus-based guidance for conducting and reporting multi-analyst studies

ELTE Eotvos Lorand University, Hungary
Karolinska Institutet, Sweden
Stockholm University, Sweden
Tilburg University, Netherlands
University of Groningen, Netherlands
Utrecht University, Netherlands
University Medical Center Groningen, University of Groningen, Netherlands
Friesland Mental Health Care Services, Netherlands
University of California Los Angeles, United States
National Bureau of Economic Research, United States
University of Amsterdam, Netherlands
Dartmouth College, United States
University of Münster, Germany
University of Rennes, CNRS, Inria and Inserm, France
McLean Hospital, United States
Harvard Medical School, United States
University of Missouri, United States
National University of Singapore, Singapore
University of New South Wales, Australia
Stockholm School of Economics, Sweden
University of Innsbruck, Austria
University Hospital Basel, Switzerland
Monash University, Australia
University of Wisconsin-Madison, United States
Ludwig-Maximilians-University, Germany
Maastricht University, Netherlands
Princeton University, United States
University of Victoria, Canada
Université Paris-Saclay, France
Neurospin, CEA, France
Amsterdam University, Netherlands
University of Bristol, United Kingdom
Center for Open Science, United States
University of Virginia, United States
Stanford University, United States
University of Basel, Switzerland
Tel Aviv University, Israel
ESMT Berlin, Germany
University College London, United Kingdom
University of Sussex, United Kingdom
University of Illinois, United States
University of Alberta, Canada
Lund University, United States
University of Massachusetts Amherst, United States
INSEAD, Singapore

Nov 9, 2021

Open access
Copyright information

Download
Cite
CommentOpen annotations (there are currently 0 annotations on this page).
Share

Article
Figures and data
Abstract
Introduction
Methods
Multi-analyst guidance
Conclusions
Data availability
References
Decision letter
Author response
Article and author information
Metrics

Abstract

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.

Introduction

Empirical investigations often require researchers to make a large number of decisions about how to analyze the data. However, the theories that motivate investigations rarely impose strong restrictions on how the data should be analyzed. This means that empirical results typically hinge on analytical choices made by just one or a small number of researchers, and raises the possibility that different – but equally justifiable – analytical choices could lead to different results (Figure 1).

Figure 1

Download asset Open asset

Analysis choices and alternative plausible paths.

The analysis of a large dataset can involve a sequence of analysis choices, as depicted in these schematic diagrams. The analyst first must decide between two options at the start of the analysis (top), and must make three additional decisions during the analysis: this leads to 16 possible paths for the analysis (grey lines). The left panel shows an example in which all possible paths lead to the same conclusion; the right panel shows an example in which some paths lead to conclusion A and other paths lead to conclusion B. Unless we can test alternative paths, we cannot know if the results obtained by following one particular path (thick black line) are robust, or if other plausible paths would lead to different results.

This "analytical variability" may be particularly high for datasets that were not initially collected for research purposes (such as electronic health records) because data analysts might know relatively little about how those data were collected and/or generated. However, when analyzing such datasets – and when making decisions based on the results of such analyses – it is important to be aware that the results will be subject to higher levels of analytical variability than the results obtained from analyses of data from, say, clinical trials. A recent example of the perils of analytical variability is provided by two articles in the journal Surgery that used the same dataset to investigate the same question: does the use of a retrieval bag during laparoscopic appendectomy reduce surgical site infections? Each paper used reasonable analysis, but there were notable differences between them in how they addressed inclusion and exclusion criteria, outcome measures, sample sizes, and covariates. As a result of these different analytical choices, the two articles reached opposite conclusions: one paper reported that using a retrieval bag reduced infections (Fields et al., 2019), and the other reported that it did not (Turner et al., 2019; see also Childers and Maggard-Gibbons, 2021). This and other medical examples (de Vries et al., 2010; Jivanji et al., 2020; Shah et al., 2021) illustrate how independent analysis of the same data can reach different, yet justifiable, conclusions.

The robustness of results and conclusions can be studied by evaluating multiple distinct analysis options simultaneously (e.g., vibration of effects [Patel et al., 2015] or multiverse analysis [Steegen et al., 2016]), or by employing a "multi-analyst approach" that involves engaging multiple analysts to independently analyze the same data. Rather than exhaustively evaluating all plausible analyses, the multi-analyst approach examines analytical choices that are deemed most appropriate by independent analysts. Botvinik-Nezer et al., 2020a, for example, asked 70 teams to test the same hypotheses using the same functional magnetic resonance imaging dataset. They found that no two teams followed the same data preprocessing steps or analysis strategies, which resulted in substantial variability in the teams’ conclusions. This and other work (Bastiaansen et al., 2020; van Dongen et al., 2019; Salganik et al., 2020; Silberzahn et al., 2018; Dutilh et al., 2018; Fillard et al., 2011; Starns et al., 2019; Maier-Hein et al., 2017; Poline et al., 2006) confirms how results can depend on analytic choices.

Although the multi-analyst approach will be new to many researchers, it has been in use since the 19th century. In 1857, for example, the Royal Asian Society asked four scholars to independently translate a previously unseen inscription to verify that the ancient Assyrian language had been deciphered correctly. The almost perfect overlap between the solutions indicated that “they have Truth for their basis” (Fox Talbot et al., 1861). The same approach can be used to analyze data today. With just a few co-analysts, the multi-analyst approach can be informative about the analytic robustness of results and conclusions. When the results of independent data analyses converge, more confidence in the conclusions is warranted. However, when the results diverge, confidence will be reduced, and scientists can examine the reasons for these discrepancies and identify potentially meaningful moderators of the results. With enough co-analysts, it is possible to estimate the variability among analysis strategies and attempt to identify factors explaining this variability.

The multi-analyst approach is still rarely used, but we argue that many disciplines could benefit from its broader adoption. To help researchers overcome practical challenges, we provide consensus-based guidance (including a checklist) to help researchers surmount the practical challenges of preparing, conducting, and reporting multi-analyst studies.

Methods

To develop this guidance, we recruited a panel of 50 methodology experts who followed a preregistered ‘reactive-Delphi’ expert consensus procedure (McKenna, 1994). We adopted this procedure to ensure that the resulting guidance represents the shared thinking of relevant experts and that it incorporates their topic-related insights. The applied consensus procedure and its reporting satisfy the recommendations of CREDES (Jünger et al., 2017), a guidance on conducting and reporting Delphi studies. A flowchart of the Delphi expert consensus procedure is available at https://osf.io/pzkcs/.

Preparation

Preregistering the project

Before the start of the project, on 11 November 2020, a research plan was compiled and uploaded to a time-stamped repository at https://osf.io/dgrua. During the project, we followed the preregistered plan in all respects except implementing slight changes in the wording of the survey questions to improve comprehension and not using R to analyze our results. We declared that we would share the R code and codebook of our analyses, but the project ultimately did not require us to conduct analyses in R. Instead, we shared our code in Excel and ODS format at https://osf.io/h36qy/.

Creating the initial multi-analyst guidance draft

Before the expert consensus process, the first three authors and the last author (henceforth: proposers) created an initial multi-analyst guidance draft after brainstorming and reviewing all the previously published multi-analyst-type projects they were aware of Bastiaansen et al., 2020; van Dongen et al., 2019; Salganik et al., 2020; Silberzahn et al., 2018; Botvinik-Nezer et al., 2020a; Dutilh et al., 2018; Fillard et al., 2011; Starns et al., 2019; Maier-Hein et al., 2017; Poline et al., 2006. This initial document is available here: https://osf.io/kv8jt/.

Recruiting experts

The proposers contacted 81 experts to join the project. The contacted experts included all the organizers of previous multi-analyst projects known at the time (Bastiaansen et al., 2020; van Dongen et al., 2019; Salganik et al., 2020; Silberzahn et al., 2018; Botvinik-Nezer et al., 2020a; Dutilh et al., 2018; Fillard et al., 2011; Starns et al., 2019; Maier-Hein et al., 2017; Poline et al., 2006), as well as the members of the expert panel from another methodological consensus project (Aczel et al., 2020). The previous projects were identified by conducting an unsystematic literature search and by surveying researchers in social media. Of the 81 experts, 3 declined our invitation and 50 accepted the invitation and participated in the expert consensus procedure (their names are available at https://osf.io/fwqvp/), while 28 experts did not respond to our call.

Preparatory rounds

Upon joining the project, the experts received a link to the preparatory online survey (available at https://osf.io/kv8jt/) which included the initial Multi-Analyst Guidance draft where they had the option to comment on each of the items and the overall content of the guidance.

Based on the feedback received from the preparatory online survey, the proposers updated and revised the initial Multi-Analyst Guidance. This updated document was uploaded to an online shared document and was sent out to the experts who had the option to edit and comment on the content. Again, based on feedback, the proposers revised the content of the document, and this new version was included in the expert consensus survey.

Consensus survey

The expert consensus questionnaire was sent out individually to each expert first on 8 February 2021 in the following Qualtrics survey available at https://osf.io/wrpnq/. The consensus survey approach had the advantage of minimizing potential biases in the experts’ judgments: the questions were posed in a neutral way, experts all received the same questions, and experts did not see the responses of the other experts or any reaction of the project organizers. The survey contained the ten recommended practices grouped into the following five stages:( i) recruiting co-analysts; (ii) providing the dataset, research questions, and research tasks; (iii) conducting the independent analyses; (iv) processing the results; (v) reporting the methods and results. The respondents were asked to rate each of the ten recommended practices on a nine-point Likert-type scale (‘I agree with the content and wording of this guidance section’ ranging from “1-Disagree” to “9-Agree”). Following each section, the respondents could leave comments regarding the given item.

The preregistration indicated consensus on the given item if the interquartile range of its ratings was two or smaller. It defined support for an item if the median rating was six or higher (as in Aczel et al., 2020).

Each recommended practice found support and consensus from the 48 experts who completed ratings in our first round. For each item, the median rating was eight or higher with an interquartile range of two or lower. Thus, following our preregistration, there was no need to conduct additional consensus-survey rounds; all of the items were eligible to enter the guidance with consensual support. This high level of consensus might have been due to the experts’ involvement in the preparatory round of the project. The summary table of the results is available at https://osf.io/qc7a8/.

Finalizing the manuscript

The proposers drafted the manuscript and supplements. All texts and materials were sent to the expert panel members. Each contributor was encouraged to provide feedback on the manuscript, the report, and the suggested final version of the guidance. After all discussions, minor wording changes were implemented, as documented at https://osf.io/e39j4/. No contributor objected to the content and form of the submitted materials and all approved the final item list.

Multi-analyst guidance

The final guidance includes ten recommended practices (Table 1) concerning the five main stages of multi-analyst studies. To further assist researchers in documenting multi-analyst projects, we also provide a modifiable reporting template (Supplementary file 1), as well as a reporting checklist (Supplementary file 2).

Table 1

Recommended practices for the main stages of the multi-analyst method.

Stage	Recommended practices
Recruiting co-analysts	1. Determine a minimum target number of co-analysts and outline clear eligibility criteria before recruiting co-analysts. We recommend that the final report justifies why these choices are adequate to achieve the study goals. 2. When recruiting co-analysts, inform them about (a) their tasks and responsibilities; (b) the project code of conduct (e.g., confidentiality/ non-disclosure agreements); (c) the plans for publishing the research report and presenting the data, analyses, and conclusion; (d) the conditions for an analysis to be included or excluded from the study; (e) whether their names will be publicly linked to the analyses; (f) the co-analysts’ rights to update or revise their analyses; (g) the project time schedule; and (h) the nature and criteria of compensation (e.g., authorship).
Providing datasets, research questions, and research tasks	3. Provide the datasets accompanied with a codebook that contains a comprehensive explanation of the variables and the datafile structure. 4. Ensure that co-analysts understand any restrictions on the use of the data, including issues of ethics, privacy, confidentiality, or ownership. 5. Provide the research questions (and potential theoretically derived hypotheses that should be tested) without communicating the lead team’s preferred analysis choices or expectations about the conclusions.
Conducting the independent analyses	6. To ensure independence, we recommend that co-analysts should not communicate with each other about their analyses until after all initial reports have been submitted. In general, it should be clearly explained why and at what stage co-analysts are allowed to communicate about the analyses (e.g., to detect errors or call attention to outlying data points).
Processing the results	7. Require co-analysts to share with the lead team their results, the analysis code with explanatory comments (or a detailed description of their point-and-click analyses), their conclusions, and an explanation of how their conclusions follow from their results. 8. The lead team makes the commented code, results, and conclusions of all non-withdrawn analyses publicly available before or at the same time as submitting the research report.
Reporting the methods and results	9. The lead team should report the multi-analyst process of the study, including (a) the justification for the number of co-analysts; (b) the eligibility criteria and recruitment of co-analysts; (c) how co-analysts were given the data sets and research questions; (d) how the independence of analyses was ensured; (e) the numbers of and reasons for withdrawals and omissions of analyses; (f) whether the lead team conducted an independent analysis; (g) how the results were processed; (h) the summary of the results of co-analysts; (i) and the limitations and potential biases of the study. 10. Data management should follow the FAIR principles (Wilkinson et al., 2016), and the research report should be transparent about access to the data and code for all analyses (Aczel et al., 2020).

In addition to the Multi-analyst Guidance and Checklist, we provide practical considerations that can support the organization and execution of multi-analyst projects. This section contains various clarifications, recommendations, practical tools, and optional extensions, covering the five main stages of a multi-analyst project.

Recruiting co-analysts

Choosing co-analysts

The term co-analyst refers to one researcher or team of researchers working together in a multi-analyst project. Researchers can collaborate on the analyses, but if they do, we recommend that they submit the analyses as one co-analyst team, in order to ensure the independence of the analyses across teams. Researchers from the same lab or close collaborators should be able to submit separate reports in the multi-analyst project as long as they do not discuss their analyses with each other until the project rules allow that. The lead team may conduct an analysis themselves depending on the study goals and the design of the project (e.g., to set a performance baseline for comparing submitted models). Alternatively, the lead team may choose not to conduct an analysis themselves; in any case, they are expected to be transparent about their level of involvement as well as the timing (e.g., whether they conducted their analyses with or without knowing the results of the crowd of analysts).

Researchers should carefully consider both the breadth and depth of statistical and research-area expertise required for their project and should justify their choices about the required qualifications, skills, and credentials for analysts in the project. If the aim of the study is to explore what factors influence researchers’ analytical choices, then it can be useful to seek “natural variation” (representativeness) within an expert community or to maximize diversity of the co-analysts along the dimensions where they might differ the most in their choices (e.g., experience, background, discipline, interest in the findings, intellectual allegiance to different theories, paradigmatic viewpoints).

Deciding on the number of co-analysts

To decide on the desired number of co-analysts, one has to consider which of the two main purposes of the multi-analyst method applies to the given project:

Checking the robustness of the conclusions

The aim here is solely to check whether different analysts obtain the same conclusions. Confidence in the stability of the conclusions decreases with divergent results and increases with convergent results. Many projects can achieve this aim by recruiting only one additional analyst, or a handful of further analysts. For example, the above-mentioned two analyses of the same dataset published in the journal Surgery (Fields et al., 2019; Turner et al., 2019) were sufficient to detect that the analytical space allows for opposite conclusions.

Assessing the variability of the analyses

Those who wish to estimate the variability among the different analysis strategies often need to satisfy stricter demands. For example, studies that aim to assess how much the results vary among the analysts will require a larger number of co-analysts. When determining the number of co-analysts in such cases, the same factors need to be taken into consideration as in standard sample size estimation methods. For example, Botvinik-Nezer et al. (Botvinik-Nezer et al., 2020a) presented the analyses of 70 teams to demonstrate the divergence of results when analyzing a functional magnetic resonance imaging dataset.

Recruiting co-analysts

Depending on the specific goal of the research, the recruitment of co-analysts can happen in several ways. Co-analysts can be recruited before or after obtaining the dataset. With stricter eligibility criteria, co-analysts can be invited individually from among topic experts or statistical experts. Follow-up open invitations can ask experts to suggest others to be invited. Alternatively, the lead team can open the opportunity to anyone to join the project as a co-analyst within the expert community (e.g., in professional society mailing lists and on social media), where expertise can be defined as the topic requires it.

It is important to note that whenever the co-authors’ behavior is the subject of the study then they should be regarded similarly to human participants respecting ethical and data protection regulations. Useful templates for project advertisement and analyst surveys can be found in Silberzahn et al., 2018; Schweinsberg et al., 2021.

Providing the dataset, research questions, and research tasks

Providing the dataset

The lead team can invite the co-analysts to conduct data preprocessing (in addition to the main analysis). If the lead team decides to conduct the preprocessing themselves, showing their preprocessing methods can be informative to the co-analysts, but also has the potential to influence them if the preprocessing reflects some preference of methods or expectations of outcomes.

Before providing the dataset, the lead team should ensure that data management will comply with legal (e.g., the General Data Protection Regulation (GDPR) in the European Union) and ethical regulations applying to all teams (see Lundberg et al., 2019). If the dataset contains personal information, a version should be provided where data can no longer be related to an individual. An alternative is to provide a simulated dataset and ask the co-analysts to provide code to analyze the data (Drechsler, 2011; Quintana, 2020). The lead team can then run the code on the actual data.

It is important that the co-analysts understand not just the available dataset but also any ancillary information that might affect their analyses (e.g., prior exclusion of outliers or handling of missing data in the blinded dataset). Providing a codebook that is accessible and understandable for researchers with different backgrounds is essential (Kindel et al., 2019).

Providing the research question

The provided research question(s) should motivate the analysis conducted by the co-analysts. The research questions should be conveyed without specifying preferred analysis choices or expectations about the conclusions. Depending on the purpose of the project, the research questions can be more or less specific. While more specific research questions limit the analytical freedom of the co-analysts, less specific ones better explore the ways researchers can diverge in their operationalization of their question. A research question (e.g., “Is happiness age-dependent?”) can be more specific when, for example, it is formulated as a directional hypothesis (e.g., “Are young people more happy than old ones?”) or when the constructs are better operationalized (e.g., by defining what counts as young and happy).

Providing the task

The multi-analyst approach can leave the operationalization of the research question to the co-analysts so that they can translate the theoretical question into the measurement. Taking this approach can reveal the operational variations of a question, but it can also make it difficult to compare the statistical results.

Requesting results in terms of standardized metrics (e.g., t-values, standardized beta, Cohen’s d) makes it easier to compare results between co-analysts. The requested metric can be determined from the aim of the analysis (e.g., hypothesis testing, parameter estimation). It needs to be borne in mind, however, that this request might bias the analysis strategies towards using methods that easily provide such a metric. A practical tool with instructions on reporting effect estimates can be found in Parker et al., 2020.

Co-analysts should be asked to keep a record of any code, derivatives etc. that were part of the analysis, at least until the manuscript is submitted and all relevant materials are (publicly) shared.

As an extension, the co-analysts can be asked to record considered but rejected analysis choices and the reasoning behind their choices (e.g., by commented code, log-books, or dedicated solutions such as DataExplained [Schweinsberg et al., 2021]). These logs can reflect where and why co-analysts diverge in their choices.

Robustness, or multiverse analyses (in the sense that each team is free to provide a series of outcomes instead of a single one) can also be part of the task of the co-analysts so that multiple analyses are conducted under alternative data analysis preprocessing choices.

Communication with co-analysts

In projects with many co-analysts, keeping contact via a dedicated email address and automating some of the messages (e.g., automated emails when teams finished a stage in the process) can help streamline the communication and make the process less prone to human errors. For co-analyst teams with multiple members, it can be helpful for each team to nominate one member as the representative for communications.

If further information is provided to a co-analyst following specific questions, it can be useful to make sure the same information is provided to all teams, for example via a Q&A section of the project website, hosting weekly office hours where participants could ask questions, or via periodic email with updates.

Conducting the independent analyses

Preregistering the process and statistical analyses

We can distinguish meta- and specific preregistrations. Meta-preregistrations concern the plan of the whole multi-analyst project. It is good practice for the lead team to preregister how they would process, handle, and report the results of the co-analysts in order to prevent result-driven biases. This can be done in the form of a Registered Report at journals that invite such submissions (Chambers, 2013). Any metascientific questions, such as randomization of co-analysts to different conditions with variations in instructions or data, or covariates of interest for studying associations to analytic variability, should be specified.

Specific preregistrations concern the analysis plans of the co-analysts. Requiring co-analysts to prepare a specific preregistration for each analysis can be a strategy to prevent overfitting and undisclosed flexibility. It makes sense to require it from either all or none of the teams in order to maintain equal treatment among them (unless the effect of preregistration is a focus of the study).

Requiring specific preregistrations may be misaligned with the goals of the project when the aim is to explore how the analytic choices are formed during the analyses, independent of initial plans. Under such circumstances, requiring specific preregistrations may be counterproductive. Nevertheless, the lead team can record their meta-preregistration that lays down the details of the multi-analyst project.

There are alternative solutions to prevent researchers from being biased by their data and results. For example, co-analysts could be provided with blinded datasets (Dutilh et al., 2018; Starns et al., 2019; Gøtzsche, 1996), simulated datasets (Quintana, 2020), or with a subset of the data (e.g., 11).

Processing the results

Collecting the results

To facilitate summarizing the co-analysts’ methods, results, and conclusions, the lead team can collect results through provided templates or survey forms that can structure analysts’ reports. It is practical to ask the co-analysts at this stage to acknowledge that they did not communicate or cooperate with other co-analysts regarding the analysis in the project. It can also be helpful for the lead team if the co-analysts explain how their conclusions were derived from the results. In case preregistration was employed for any analyses, the template can also collect any deviations from the preregistered plan for inclusion in an online supplement.

To collect analytic code, it may be useful to require a container image (Boettiger, 2015; Nüst et al., 2020) or a portable version of the code that handles issues like software package availability (Liu and Salganik, 2019) (for a guideline see Elmenreich et al., 2019).

Validating the results

The lead team is recommended to ensure that each analyst’s codes/procedures reproduce that analyst’s submitted results. Computational reproducibility can be ascertained by running the code or repeating the analytic process by the lead team, but independent experts or the other co-analysts can also be invited to undertake this task (Hurlin and Perignon, 2019; Pérignon et al., 2019).

The project can leverage the crowd by asking co-analysts to review others’ analyses, or the lead team can employ external statistical experts to assess analyses and detect major errors. The lead team can decide to omit analyses with major errors. In that case, the reasons for omission should be documented, and for transparency, the results of the omitted analyses should be included in an online supplement.

After all the analyses have been submitted and validated, the co-analysts could have the option in certain projects to inspect the work of the other analysts and freely withdraw their own analyses. This can be appropriate if seeing other analyses makes them aware of major mistakes or shortcomings in their analytic procedures. A potential bias in this process is that co-analysts might lose confidence in their analyses after seeing other, more senior, or more expert co-analysts’ work. One way to decrease this potential bias is to follow a multi-stage process: after the first round of analyses is submitted, co-analysts could be allowed to see each other’s analysis steps/code without knowing the identity of the co-analyst or the results of their analysis. It is the lead team’s decision whether they allow co-analysts to correct or update their analyses after an external analyst or the co-analysts themselves find issues in their analyses.

Importantly, it is a minimum expectation that from the start of the project, the co-analysts should know about the conditions for their analyses to be included in, or omitted from, the study. All withdrawals, omissions, and updates of the results should be transparent in subsequent publications, for example in the supplementary materials.

Reporting the methods and results

Recording contributorship

Using CRediT taxonomy can transparently record organizers’ and co-analysts’ contributions to the study. Practical tools (e.g., tenzing Holcombe et al., 2020) can make this task easier. Co-analysts can be invited to be co-authors and/or be compensated for their contribution in other ways (e.g., prizes, honorariums). Expectations for contribution and authorship should be communicated clearly at the outset.

Presenting the methods and results

Beyond a descriptive presentation of results in a table or graph, the reporting of the results of multi-analyst projects is not straightforward and remains an open area of research. Published reports of multi-analyst projects have adopted several effective methods for presenting results. For binary outcomes, Botvinik-Nezer et al. used a table with color coding (i.e., a binary heat map) to visualize outcomes across all teams (Botvinik-Nezer et al., 2020b). They overlaid each teams’ confidence in their findings and added additional information about analytical paths in adjacent columns (Supplementary file 1, Table 1). For a project with a relatively small number of effect sizes for continuous outcomes, Schweinsberg et al. used interval plots combined with an indication of analytical choices underlying each estimate (Schweinsberg et al., 2021; Figure 3). Olsson Collentine et al. used funnel plots (Figure 2 in Olsson-Collentine et al., 2020), and Patel et al. used volcano plots to depict numerous, diverse outcomes with an intuitive depiction of clustering, akin to a multiverse analysis (Figures 1 and 2 in Patel et al., 2015).

If the main purpose is to estimate variability of analyses, it is interesting to investigate and report factors that might influence variability in the chosen analytic approaches and in the results obtained by these analytical approaches. If, on the other hand, the main purpose is to investigate the robustness of conclusions by assessing the degree to which different analysts obtain the same results, it is advisable to focus more on methods that produce only a single answer to the research question of interest. When each analysis team can provide multiple, distinct responses to the same research question, it becomes more difficult to explore how conclusions depend on the analysis choices because the individual analyses are no longer independent of each other.

The analytical approach of each co-analyst can be divided into discrete choices concerning, for instance, data preprocessing steps and decisions in model specification. If it is possible to recombine the individual choices (which will not always be the case as certain data preprocessing steps or method choices may only make sense if the aim is to fit a certain class of models), it may be worthwhile to create a larger set of possible analytical approaches that is made up of all possible combinations. In this case, the descriptive results of the multi-analyst project can be combined with a multiverse type approach (e.g., vibration of effects [Patel et al., 2015], multiverse analysis [Steegen et al., 2016], or specification curve [Simonsohn et al., 2020]) to quantify and compare the variability in results that can be explained by the different analytical choices (Patel et al., 2015; Liu et al., 2021). Additionally, this larger set of possible combinations can be helpful to present the results in an interactive user interface in which readers can explore how the results change as a function of certain analytical choices (Liu et al., 2021; Dragicevic et al., 2019). Finally, dividing the co-analysts' analytical approaches into individual choices may ultimately help in providing a unique answer to the research question of interest while accounting for the uncertainty in the choice of the analytical approach. While there are so far no approaches that would allow the derivation of a unique result that integrates all uncertain decisions, it may be a promising area of research to extend Bayesian approaches that account for model uncertainty (Hoeting et al., 1999) and measurement error (Richardson and Gilks, 1993).

To support the reporting of Multi-Analyst projects, we provide a freely modifiable Reporting Template available from here: https://osf.io/h9mgy/.

Limitations

The present work does not cover all aspects of multi-analyst projects. For instance, the multi-analyst approach outlined here entails the independent analysis of one or more datasets, but it should be acknowledged that other crowd-sourced analysis approaches might not require such independence of the analyses. Some of our practical considerations reflect disagreement and/or uncertainty within our expert panel, so they remain underspecified. Those include how to determine the number or eligibility of co-analysts for a project, how best to assess the validity of each analysis; and how to measure robustness of conclusions. Therefore, we emphasize that this consensus-based guidance is a first step towards the broader adoption of the multi-analyst approach in empirical research, and we hope and expect that our recommendations will be developed further in response to user feedback. Users of this guidance can provide feedback and suggestions for revisions at https://forms.gle/2fVqZAD3KKHVUDKq7.

Conclusions

This guidance document aims to facilitate adoption of the multi-analyst approach in both basic and clinical research. Although the multi-analyst approach is at an incipient stage of adoption, we believe that the scientific benefits greatly outweigh the extra logistics required, especially for projects with high relevance for clinical practice and policy making. The approach should have particular relevance when it indicates that applying different analysis strategies to a given dataset may lead to conflicting results. The multi-analyst approach allows a systematic exploration of the analytical space to assess whether the reported results and conclusions are dependent on the chosen analysis strategy, ultimately improving the transparency, reliability, and credibility of research findings.

We hope that our guidance here and in guideline databases will make it easier for researchers to adopt this approach to empirical analyses. We encourage journals and funders to consider recommending or requesting independent analyses whenever it is crucial to know whether the conclusions are robust to alternative analysis strategies.

Data availability

All anonymized data as well as the survey materials are publicly shared on the Open Science Framework page of the project: https://osf.io/4zvst/. Our methodology and data-analysis plan were preregistered. The preregistration document can be accessed at: https://osf.io/dgrua.

References

1. Aczel B
2. Szaszi B
3. Sarafoglou A
4. Kekecs Z
5. Kucharský Š
6. Benjamin D
7. Chambers CD
8. Fisher A
9. Gelman A
10. Gernsbacher MA
11. Ioannidis JP
12. Johnson E
13. Jonas K
14. Kousta S
15. Lilienfeld SO
16. Lindsay DS
17. Morey CC
18. Munafò M
19. Newell BR
20. Pashler H
21. Shanks DR
22. Simons DJ
23. Wicherts JM
24. Albarracin D
25. Anderson ND
26. Antonakis J
27. Arkes HR
28. Back MD
29. Banks GC
30. Beevers C
31. Bennett AA
32. Bleidorn W
33. Boyer TW
34. Cacciari C
35. Carter AS
36. Cesario J
37. Clifton C
38. Conroy RM
39. Cortese M
40. Cosci F
41. Cowan N
42. Crawford J
43. Crone EA
44. Curtin J
45. Engle R
46. Farrell S
47. Fearon P
48. Fichman M
49. Frankenhuis W
50. Freund AM
51. Gaskell MG
52. Giner-Sorolla R
53. Green DP
54. Greene RL
55. Harlow LL
56. de la Guardia FH
57. Isaacowitz D
58. Kolodner J
59. Lieberman D
60. Logan GD
61. Mendes WB
62. Moersdorf L
63. Nyhan B
64. Pollack J
65. Sullivan C
66. Vazire S
67. Wagenmakers EJ
(2020) A consensus-based transparency checklist
Nature Human Behaviour 4:4–6.

https://doi.org/10.1038/s41562-019-0772-6
- PubMed
- Google Scholar
1. Bastiaansen JA
2. Kunkels YK
3. Blaauw FJ
4. Boker SM
5. Ceulemans E
6. Chen M
7. Chow SM
8. de Jonge P
9. Emerencia AC
10. Epskamp S
11. Fisher AJ
12. Hamaker EL
13. Kuppens P
14. Lutz W
15. Meyer MJ
16. Moulder R
17. Oravecz Z
18. Riese H
19. Rubel J
20. Ryan O
21. Servaas MN
22. Sjobeck G
23. Snippe E
24. Trull TJ
25. Tschacher W
26. van der Veen DC
27. Wichers M
28. Wood PK
29. Woods WC
30. Wright AGC
31. Albers CJ
32. Bringmann LF
(2020) Time to get personal? The impact of researchers choices on the selection of treatment targets using the experience sampling methodology
Journal of Psychosomatic Research 137:110211.

https://doi.org/10.1016/j.jpsychores.2020.110211
- PubMed
- Google Scholar
1. Boettiger C
(2015) An introduction to Docker for reproducible research
ACM SIGOPS Operating Systems Review 49:71–79.

https://doi.org/10.1145/2723872.2723882
- Google Scholar
1. Botvinik-Nezer R
2. Holzmeister F
3. Camerer CF
4. Dreber A
5. Huber J
6. Johannesson M
7. Kirchler M
8. Iwanir R
9. Mumford JA
10. Adcock RA
11. Avesani P
12. Baczkowski BM
13. Bajracharya A
14. Bakst L
15. Ball S
16. Barilari M
17. Bault N
18. Beaton D
19. Beitner J
20. Benoit RG
21. Berkers R
22. Bhanji JP
23. Biswal BB
24. Bobadilla-Suarez S
25. Bortolini T
26. Bottenhorn KL
27. Bowring A
28. Braem S
29. Brooks HR
30. Brudner EG
31. Calderon CB
32. Camilleri JA
33. Castrellon JJ
34. Cecchetti L
35. Cieslik EC
36. Cole ZJ
37. Collignon O
38. Cox RW
39. Cunningham WA
40. Czoschke S
41. Dadi K
42. Davis CP
43. Luca AD
44. Delgado MR
45. Demetriou L
46. Dennison JB
47. Di X
48. Dickie EW
49. Dobryakova E
50. Donnat CL
51. Dukart J
52. Duncan NW
53. Durnez J
54. Eed A
55. Eickhoff SB
56. Erhart A
57. Fontanesi L
58. Fricke GM
59. Fu S
60. Galván A
61. Gau R
62. Genon S
63. Glatard T
64. Glerean E
65. Goeman JJ
66. Golowin SAE
67. González-García C
68. Gorgolewski KJ
69. Grady CL
70. Green MA
71. Guassi Moreira JF
72. Guest O
73. Hakimi S
74. Hamilton JP
75. Hancock R
76. Handjaras G
77. Harry BB
78. Hawco C
79. Herholz P
80. Herman G
81. Heunis S
82. Hoffstaedter F
83. Hogeveen J
84. Holmes S
85. Hu CP
86. Huettel SA
87. Hughes ME
88. Iacovella V
89. Iordan AD
90. Isager PM
91. Isik AI
92. Jahn A
93. Johnson MR
94. Johnstone T
95. Joseph MJE
96. Juliano AC
97. Kable JW
98. Kassinopoulos M
99. Koba C
100. Kong XZ
101. Koscik TR
102. Kucukboyaci NE
103. Kuhl BA
104. Kupek S
105. Laird AR
106. Lamm C
107. Langner R
108. Lauharatanahirun N
109. Lee H
110. Lee S
111. Leemans A
112. Leo A
113. Lesage E
114. Li F
115. Li MYC
116. Lim PC
117. Lintz EN
118. Liphardt SW
119. Losecaat Vermeer AB
120. Love BC
121. Mack ML
122. Malpica N
123. Marins T
124. Maumet C
125. McDonald K
126. McGuire JT
127. Melero H
128. Méndez Leal AS
129. Meyer B
130. Meyer KN
131. Mihai G
132. Mitsis GD
133. Moll J
134. Nielson DM
135. Nilsonne G
136. Notter MP
137. Olivetti E
138. Onicas AI
139. Papale P
140. Patil KR
141. Peelle JE
142. Pérez A
143. Pischedda D
144. Poline JB
145. Prystauka Y
146. Ray S
147. Reuter-Lorenz PA
148. Reynolds RC
149. Ricciardi E
150. Rieck JR
151. Rodriguez-Thompson AM
152. Romyn A
153. Salo T
154. Samanez-Larkin GR
155. Sanz-Morales E
156. Schlichting ML
157. Schultz DH
158. Shen Q
159. Sheridan MA
160. Silvers JA
161. Skagerlund K
162. Smith A
163. Smith DV
164. Sokol-Hessner P
165. Steinkamp SR
166. Tashjian SM
167. Thirion B
168. Thorp JN
169. Tinghög G
170. Tisdall L
171. Tompson SH
172. Toro-Serey C
173. Torre Tresols JJ
174. Tozzi L
175. Truong V
176. Turella L
177. van ’t Veer AE
178. Verguts T
179. Vettel JM
180. Vijayarajah S
181. Vo K
182. Wall MB
183. Weeda WD
184. Weis S
185. White DJ
186. Wisniewski D
187. Xifra-Porxas A
188. Yearling EA
189. Yoon S
190. Yuan R
191. Yuen KSL
192. Zhang L
193. Zhang X
194. Zosky JE
195. Nichols TE
196. Poldrack RA
197. Schonberg T
(2020a) Variability in the analysis of a single neuroimaging dataset by many teams
Nature 582:84–88.

https://doi.org/10.1038/s41586-020-2314-9
- PubMed
- Google Scholar
Preprint
1. Botvinik-Nezer R
2. Holzmeister F
3. Camerer CF
4. Dreber A
5. Huber J
6. Johannesson M
7. Kirchler M
8. Iwanir R
9. Mumford JA
10. Adcock RA
11. Avesani P
12. Baczkowski BM
13. Bajracharya A
14. Bakst L
15. Ball S
16. Barilari M
17. Bault N
18. Beaton D
19. Beitner J
20. Benoit RG
21. Berkers R
22. Bhanji JP
23. Biswal BB
24. Bobadilla-Suarez S
25. Bortolini T
26. Bottenhorn KL
27. Bowring A
28. Braem S
29. Brooks HR
30. Brudner EG
31. Calderon CB
32. Camilleri JA
33. Castrellon JJ
34. Cecchetti L
35. Cieslik EC
36. Cole ZJ
37. Collignon O
38. Cox RW
39. Cunningham WA
40. Czoschke S
41. Dadi K
42. Davis CP
43. Luca AD
44. Delgado MR
45. Demetriou L
46. Dennison JB
47. Di X
48. Dickie EW
49. Dobryakova E
50. Donnat CL
51. Dukart J
52. Duncan NW
53. Durnez J
54. Eed A
55. Eickhoff SB
56. Erhart A
57. Fontanesi L
58. Fricke GM
59. Fu S
60. Galván A
61. Gau R
62. Genon S
63. Glatard T
64. Glerean E
65. Goeman JJ
66. Golowin SAE
67. González-García C
68. Gorgolewski KJ
69. Grady CL
70. Green MA
71. Guassi Moreira JF
72. Guest O
73. Hakimi S
74. Hamilton JP
75. Hancock R
76. Handjaras G
77. Harry BB
78. Hawco C
79. Herholz P
80. Herman G
81. Heunis S
82. Hoffstaedter F
83. Hogeveen J
84. Holmes S
85. Hu CP
86. Huettel SA
87. Hughes ME
88. Iacovella V
89. Iordan AD
90. Isager PM
91. Isik AI
92. Jahn A
93. Johnson MR
94. Johnstone T
95. Joseph MJE
96. Juliano AC
97. Kable JW
98. Kassinopoulos M
99. Koba C
100. Kong XZ
101. Koscik TR
102. Kucukboyaci NE
103. Kuhl BA
104. Kupek S
105. Laird AR
106. Lamm C
107. Langner R
108. Lauharatanahirun N
109. Lee H
110. Lee S
111. Leemans A
112. Leo A
113. Lesage E
114. Li F
115. Li MYC
116. Lim PC
117. Lintz EN
118. Liphardt SW
119. Losecaat Vermeer AB
120. Love BC
121. Mack ML
122. Malpica N
123. Marins T
124. Maumet C
125. McDonald K
126. McGuire JT
127. Melero H
128. Méndez Leal AS
129. Meyer B
130. Meyer KN
131. Mihai G
132. Mitsis GD
133. Moll J
134. Nielson DM
135. Nilsonne G
136. Notter MP
137. Olivetti E
138. Onicas AI
139. Papale P
140. Patil KR
141. Peelle JE
142. Pérez A
143. Pischedda D
144. Poline JB
145. Prystauka Y
146. Ray S
147. Reuter-Lorenz PA
148. Reynolds RC
149. Ricciardi E
150. Rieck JR
151. Rodriguez-Thompson AM
152. Romyn A
153. Salo T
154. Samanez-Larkin GR
155. Sanz-Morales E
156. Schlichting ML
157. Schultz DH
158. Shen Q
159. Sheridan MA
160. Silvers JA
161. Skagerlund K
162. Smith A
163. Smith DV
164. Sokol-Hessner P
165. Steinkamp SR
166. Tashjian SM
167. Thirion B
168. Thorp JN
169. Tinghög G
170. Tisdall L
171. Tompson SH
172. Toro-Serey C
173. Torre Tresols JJ
174. Tozzi L
175. Truong V
176. Turella L
177. van ‘t Veer AE
178. Verguts T
179. Vettel JM
180. Vijayarajah S
181. Vo K
182. Wall MB
183. Weeda WD
184. Weis S
185. White DJ
186. Wisniewski D
187. Xifra-Porxas A
188. Yearling EA
189. Yoon S
190. Yuan R
191. Yuen KSL
192. Zhang L
193. Zhang X
194. Zosky JE
195. Nichols TE
196. Poldrack RA
197. Schonberg T
(2020b) Variability in the Analysis of a Single Neuroimaging Dataset by Many Teams
bioRxiv.

https://doi.org/10.1101/843193
- Google Scholar
1. Chambers CD
(2013) Registered reports: A new publishing initiative at Cortex
Cortex 49:609–610.

https://doi.org/10.1016/j.cortex.2012.12.016
- PubMed
- Google Scholar
1. Childers CP
2. Maggard-Gibbons M
(2021) Same data, opposite results?: A call to improve surgical database research
JAMA Surgery 156:219–220.

https://doi.org/10.1001/jamasurg.2020.4991
- PubMed
- Google Scholar
(2010) The unconscious thought effect in clinical decision making: an example in diagnosis
Medical Decision Making 30:578–581.

https://doi.org/10.1177/0272989X09360820
- PubMed
- Google Scholar
Conference
1. Dragicevic P
2. Jansen Y
3. Sarma A
4. Kay M
5. Chevalier F
(2019) Increasing the transparency of research papers with explorable multiverse analyses
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. pp. 1–15.

https://doi.org/10.1145/3290605.3300295
- Google Scholar
Book
1. Drechsler J
(2011) Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation
Springer Science & Business Media.

https://doi.org/10.1007/978-1-4614-0326-5
- Google Scholar
1. Dutilh G
2. Annis J
3. Brown SD
4. Cassey P
5. Evans NJ
6. Grasman RPPP
7. Hawkins GE
8. Heathcote A
9. Holmes WR
10. Krypotos A-M
11. Kupitz CN
12. Leite FP
13. Lerche V
14. Lin Y-S
15. Logan GD
16. Palmeri TJ
17. Starns JJ
18. Trueblood JS
19. van Maanen L
20. van Ravenzwaaij D
21. Vandekerckhove J
22. Visser I
23. Voss A
24. White CN
25. Wiecki TV
26. Rieskamp J
27. Donkin C
(2018) The quality of response time data inference: A blinded, collaborative assessment of the validity of cognitive models
Psychonomic Bulletin & Review 26:1051–1069.

https://doi.org/10.3758/s13423-017-1417-2
- Google Scholar
(2019) Making simulation results reproducible-Survey, guidelines, and examples based on Gradle and Docker
PeerJ. Computer Science 5:e240.

https://doi.org/10.7717/peerj-cs.240
- PubMed
- Google Scholar
1. Fields AC
2. Lu P
3. Palenzuela DL
4. Bleday R
5. Goldberg JE
6. Irani J
7. Davids JS
8. Melnitchouk N
(2019) Does retrieval bag use during laparoscopic appendectomy reduce postoperative infection?
Surgery 165:953–957.

https://doi.org/10.1016/j.surg.2018.11.012
- PubMed
- Google Scholar
1. Fillard P
2. Descoteaux M
3. Goh A
4. Gouttard S
5. Jeurissen B
6. Malcolm J
7. Ramirez-Manzanares A
8. Reisert M
9. Sakaie K
10. Tensaouti F
11. Yo T
12. Mangin J-F
13. Poupon C
(2011) Quantitative evaluation of 10 tractography algorithms on a realistic diffusion MR phantom
NeuroImage 56:220–234.

https://doi.org/10.1016/j.neuroimage.2011.01.032
- Google Scholar
(1861) Comparative translations of the inscription of Tiglath Pileser I
Journal of the Royal Asiatic Society of Great Britain & Ireland 18:150–219.

https://doi.org/10.1017/S0035869X00013666
- Google Scholar
1. Gøtzsche PC
(1996) Blinding during data analysis and writing of manuscripts
Controlled Clinical Trials 17:285–290.

https://doi.org/10.1016/0197-2456(95)00263-4
- Google Scholar
(1999) Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors)
Statistical Science 14:382–417.

https://doi.org/10.1214/ss/1009212519
- Google Scholar
1. Holcombe AO
2. Kovacs M
3. Aust F
4. Aczel B
5. Sugimoto CR
(2020) Documenting contributions to scholarly articles using CRediT and tenzing
PLOS ONE 15:e0244611.

https://doi.org/10.1371/journal.pone.0244611
- Google Scholar
1. Hurlin C
2. Perignon C
(2019) Reproducibility certification in economics research
SSRN Electronic Journal 1:3418896.

https://doi.org/10.2139/ssrn.3418896
- Google Scholar
1. Jivanji D
2. Mangosing M
3. Mahoney SP
4. Castro G
5. Zevallos J
6. Lozano J
(2020) Association between marijuana use and cardiovascular disease in US Adults
Cureus 12:e11868.

https://doi.org/10.7759/cureus.11868
- PubMed
- Google Scholar
1. Jünger S
2. Payne SA
3. Brine J
4. Radbruch L
5. Brearley SG
(2017) Guidance on Conducting and REporting DElphi Studies (CREDES) in palliative care: Recommendations based on a methodological systematic review
Palliative Medicine 31:684–706.

https://doi.org/10.1177/0269216317690685
- PubMed
- Google Scholar
1. Kindel AT
2. Bansal V
3. Catena KD
4. Hartshorne TH
5. Jaeger K
6. Koffman D
7. McLanahan S
8. Phillips M
9. Rouhani S
10. Vinh R
11. Salganik MJ
(2019) Improving Metadata Infrastructure for Complex Surveys: Insights from the Fragile Families Challenge
Socius 5:237802311881737.

https://doi.org/10.1177/2378023118817378
- Google Scholar
1. Liu DM
2. Salganik MJ
(2019) Successes and struggles with computational reproducibility: Lessons from the Fragile Families Challenge
Socius 5:237802311984980.

https://doi.org/10.1177/2378023119849803
- Google Scholar
1. Liu Y
2. Kale A
3. Althoff T
4. Heer J
(2021) Boba: Authoring and visualizing multiverse analyses
IEEE Transactions on Visualization and Computer Graphics 27:1753–1763.

https://doi.org/10.1109/TVCG.2020.3028985
- Google Scholar
(2019) Privacy, ethics, and data access: A case study of the Fragile Families Challenge
Socius 5:237802311881302.

https://doi.org/10.1177/2378023118813023
- Google Scholar
1. Maier-Hein KH
2. Neher PF
3. Houde J-C
4. Côté M-A
5. Garyfallidis E
6. Zhong J
7. Chamberland M
8. Yeh F-C
9. Lin Y-C
10. Ji Q
11. Reddick WE
12. Glass JO
13. Chen DQ
14. Feng Y
15. Gao C
16. Wu Y
17. Ma J
18. He R
19. Li Q
20. Westin C-F
21. Deslauriers-Gauthier S
22. González JOO
23. Paquette M
24. St-Jean S
25. Girard G
26. Rheault F
27. Sidhu J
28. Tax CMW
29. Guo F
30. Mesri HY
31. Dávid S
32. Froeling M
33. Heemskerk AM
34. Leemans A
35. Boré A
36. Pinsard B
37. Bedetti C
38. Desrosiers M
39. Brambati S
40. Doyon J
41. Sarica A
42. Vasta R
43. Cerasa A
44. Quattrone A
45. Yeatman J
46. Khan AR
47. Hodges W
48. Alexander S
49. Romascano D
50. Barakovic M
51. Auría A
52. Esteban O
53. Lemkaddem A
54. Thiran J-P
55. Cetingul HE
56. Odry BL
57. Mailhe B
58. Nadar MS
59. Pizzagalli F
60. Prasad G
61. Villalon-Reina JE
62. Galvis J
63. Thompson PM
64. Requejo FDS
65. Laguna PL
66. Lacerda LM
67. Barrett R
68. Dell’Acqua F
69. Catani M
70. Petit L
71. Caruyer E
72. Daducci A
73. Dyrby TB
74. Holland-Letz T
75. Hilgetag CC
76. Stieltjes B
77. Descoteaux M
(2017) The challenge of mapping the human connectome based on diffusion tractography
Nature Communications 8:1349.

https://doi.org/10.1038/s41467-017-01285-x
- PubMed
- Google Scholar
1. McKenna HP
(1994) The Delphi technique: a worthwhile research approach for nursing?
Journal of Advanced Nursing 19:1221–1225.

https://doi.org/10.1111/j.1365-2648.1994.tb01207.x
- Google Scholar
1. Nüst D
2. Sochat V
3. Marwick B
4. Eglen SJ
5. Head T
6. Hirst T
7. Evans BD
8. Markel S
(2020) Ten simple rules for writing Dockerfiles for reproducible data science
PLOS Computational Biology 16:e1008316.

https://doi.org/10.1371/journal.pcbi.1008316
- Google Scholar
Preprint
(2020) Meta-analyzing the multiverse: A peek under the hood of selective reporting
PsyArXiv.

https://psyarxiv.com/43yae/
- Google Scholar
Website
1. Parker TH
2. Fraser H
3. Nakagawa S
4. Fidler F
5. Gould E
6. Gould E
(2020) Evolutionary Ecology Data
Accessed September 28, 2021.

https://osf.io/34fzc/
(2015) Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations
Journal of Clinical Epidemiology 68:1046–1058.

https://doi.org/10.1016/j.jclinepi.2015.05.029
- PubMed
- Google Scholar
(2019) Certify reproducibility with confidential data
Science 365:127–128.

https://doi.org/10.1126/science.aaw2825
- PubMed
- Google Scholar
(2006) Motivation and synthesis of the FIAC experiment: Reproducibility of fMRI results across expert analyses
Human Brain Mapping 27:351–359.

https://doi.org/10.1002/hbm.20268
- Google Scholar
1. Quintana DS
(2020) A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation
eLife 9:e53275.

https://doi.org/10.7554/eLife.53275
- PubMed
- Google Scholar
1. Richardson S
2. Gilks WR
(1993) A Bayesian approach to measurement error problems in epidemiology using conditional independence models
American Journal of Epidemiology 138:430–442.

https://doi.org/10.1093/oxfordjournals.aje.a116875
- PubMed
- Google Scholar
1. Salganik MJ
2. Lundberg I
3. Kindel AT
4. Ahearn CE
5. Al-Ghoneim K
6. Almaatouq A
7. Altschul DM
8. Brand JE
9. Carnegie NB
10. Compton RJ
11. Datta D
12. Davidson T
13. Filippova A
14. Gilroy C
15. Goode BJ
16. Jahani E
17. Kashyap R
18. Kirchner A
19. McKay S
20. Morgan AC
21. Pentland A
22. Polimis K
23. Raes L
24. Rigobon DE
25. Roberts CV
26. Stanescu DM
27. Suhara Y
28. Usmani A
29. Wang EH
30. Adem M
31. Alhajri A
32. AlShebli B
33. Amin R
34. Amos RB
35. Argyle LP
36. Baer-Bositis L
37. Büchi M
38. Chung BR
39. Eggert W
40. Faletto G
41. Fan Z
42. Freese J
43. Gadgil T
44. Gagné J
45. Gao Y
46. Halpern-Manners A
47. Hashim SP
48. Hausen S
49. He G
50. Higuera K
51. Hogan B
52. Horwitz IM
53. Hummel LM
54. Jain N
55. Jin K
56. Jurgens D
57. Kaminski P
58. Karapetyan A
59. Kim EH
60. Leizman B
61. Liu N
62. Möser M
63. Mack AE
64. Mahajan M
65. Mandell N
66. Marahrens H
67. Mercado-Garcia D
68. Mocz V
69. Mueller-Gastell K
70. Musse A
71. Niu Q
72. Nowak W
73. Omidvar H
74. Or A
75. Ouyang K
76. Pinto KM
77. Porter E
78. Porter KE
79. Qian C
80. Rauf T
81. Sargsyan A
82. Schaffner T
83. Schnabel L
84. Schonfeld B
85. Sender B
86. Tang JD
87. Tsurkov E
88. van Loon A
89. Varol O
90. Wang X
91. Wang Z
92. Wang J
93. Wang F
94. Weissman S
95. Whitaker K
96. Wolters MK
97. Woon WL
98. Wu J
99. Wu C
100. Yang K
101. Yin J
102. Zhao B
103. Zhu C
104. Brooks-Gunn J
105. Engelhardt BE
106. Hardt M
107. Knox D
108. Levy K
109. Narayanan A
110. Stewart BM
111. Watts DJ
112. McLanahan S
(2020) Measuring the predictability of life outcomes with a scientific mass collaboration
PNAS 117:8398–8403.

https://doi.org/10.1073/pnas.1915006117
- PubMed
- Google Scholar
(2021) Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis
Organizational Behavior and Human Decision Processes 165:228–249.

https://doi.org/10.1016/j.obhdp.2021.02.003
- Google Scholar
1. Shah S
2. Patel S
3. Paulraj S
4. Chaudhuri D
(2021) Association of marijuana use and cardiovascular disease: A behavioral risk factor surveillance system data analysis of 133,706 US Adults
The American Journal of Medicine 134:614–620.

https://doi.org/10.1016/j.amjmed.2020.10.019
- Google Scholar
1. Silberzahn R
2. Uhlmann EL
3. Martin DP
4. Anselmi P
5. Aust F
6. Awtrey E
7. Bahník Š
8. Bai F
9. Bannard C
10. Bonnier E
11. Carlsson R
12. Cheung F
13. Christensen G
14. Clay R
15. Craig MA
16. Dalla Rosa A
17. Dam L
18. Evans MH
19. Flores Cervantes I
20. Fong N
21. Gamez-Djokic M
22. Glenz A
23. Gordon-McKeon S
24. Heaton TJ
25. Hederos K
26. Heene M
27. Hofelich Mohr AJ
28. Högden F
29. Hui K
30. Johannesson M
31. Kalodimos J
32. Kaszubowski E
33. Kennedy DM
34. Lei R
35. Lindsay TA
36. Liverani S
37. Madan CR
38. Molden D
39. Molleman E
40. Morey RD
41. Mulder LB
42. Nijstad BR
43. Pope NG
44. Pope B
45. Prenoveau JM
46. Rink F
47. Robusto E
48. Roderique H
49. Sandberg A
50. Schlüter E
51. Schönbrodt FD
52. Sherman MF
53. Sommer SA
54. Sotak K
55. Spain S
56. Spörlein C
57. Stafford T
58. Stefanutti L
59. Tauber S
60. Ullrich J
61. Vianello M
62. Wagenmakers E-J
63. Witkowiak M
64. Yoon S
65. Nosek BA
(2018) Many analysts, one data set: Making transparent how variations in analytic choices affect results
Advances in Methods and Practices in Psychological Science 1:337–356.

https://doi.org/10.1177/2515245917747646
- Google Scholar
(2020) Specification curve analysis
Nature Human Behaviour 4:1208–1214.

https://doi.org/10.1038/s41562-020-0912-z
- PubMed
- Google Scholar
1. Starns JJ
2. Cataldo AM
3. Rotello CM
4. Annis J
5. Aschenbrenner A
6. Bröder A
7. Cox G
8. Criss A
9. Curl RA
10. Dobbins IG
11. Dunn J
12. Enam T
13. Evans NJ
14. Farrell S
15. Fraundorf SH
16. Gronlund SD
17. Heathcote A
18. Heck DW
19. Hicks JL
20. Huff MJ
21. Kellen D
22. Key KN
23. Kilic A
24. Klauer KC
25. Kraemer KR
26. Leite FP
27. Lloyd ME
28. Malejka S
29. Mason A
30. McAdoo RM
31. McDonough IM
32. Michael RB
33. Mickes L
34. Mizrak E
35. Morgan DP
36. Mueller ST
37. Osth A
38. Reynolds A
39. Seale-Carlisle TM
40. Singmann H
41. Sloane JF
42. Smith AM
43. Tillman G
44. van Ravenzwaaij D
45. Weidemann CT
46. Wells GL
47. White CN
48. Wilson J
(2019) Assessing theoretical conclusions with blinded inference to investigate a potential inference crisis
Advances in Methods and Practices in Psychological Science 2:335–349.

https://doi.org/10.1177/2515245919869583
- Google Scholar
(2016) Increasing transparency through a multiverse analysis
Perspectives on Psychological Science 11:702–712.

https://doi.org/10.1177/1745691616658637
- Google Scholar
(2019) Utilization of a specimen retrieval bag during laparoscopic appendectomy for both uncomplicated and complicated appendicitis is not associated with a decrease in postoperative surgical site infection rates
Surgery 165:1199–1202.

https://doi.org/10.1016/j.surg.2019.02.010
- PubMed
- Google Scholar
1. van Dongen NNN
2. van Doorn JB
3. Gronau QF
4. van Ravenzwaaij D
5. Hoekstra R
6. Haucke MN
7. Lakens D
8. Hennig C
9. Morey RD
10. Homer S
11. Gelman A
12. Sprenger J
13. Wagenmakers E-J
(2019) Multiple perspectives on inference for two simple statistical scenarios
The American Statistician 73:328–339.

https://doi.org/10.1080/00031305.2019.1565553
- Google Scholar
1. Wilkinson MD
2. Dumontier M
3. Aalbersberg IJJ
4. Appleton G
5. Axton M
6. Baak A
7. Blomberg N
8. Boiten J-W
9. da Silva Santos LB
10. Bourne PE
11. Bouwman J
12. Brookes AJ
13. Clark T
14. Crosas M
15. Dillo I
16. Dumon O
17. Edmunds S
18. Evelo CT
19. Finkers R
20. Gonzalez-Beltran A
21. Gray AJG
22. Groth P
23. Goble C
24. Grethe JS
25. Heringa J
26. ’t Hoen PAC
27. Hooft R
28. Kuhn T
29. Kok R
30. Kok J
31. Lusher SJ
32. Martone ME
33. Mons A
34. Packer AL
35. Persson B
36. Rocca-Serra P
37. Roos M
38. van Schaik R
39. Sansone S-A
40. Schultes E
41. Sengstag T
42. Slater T
43. Strawn G
44. Swertz MA
45. Thompson M
46. van der Lei J
47. van Mulligen E
48. Velterop J
49. Waagmeester A
50. Wittenburg P
51. Wolstencroft K
52. Zhao J
53. Mons B
(2016) The FAIR Guiding Principles for scientific data management and stewardship
Scientific Data 3:160018.

https://doi.org/10.1038/sdata.2016.18
- PubMed
- Google Scholar

Decision letter

Peter Rodgers

Senior and Reviewing Editor; eLife, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Consensus-based guidance for conducting and reporting multi-analyst studies" to eLife for consideration as a Feature Article. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by the Peter Rogers, eLife Features Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Florian Naudet; Ioana Cristea.

The reviewers and editors have discussed the reviews and we have drafted this decision letter to help you prepare a revised submission.

Summary:

The report describes a consensus-based guideline for multi-analyst studies. Overall, the manuscript provides very helpful suggestions, but I have some doubts about how much such a resource-intensive tool (beyond the rather simple version of independent reproducibility checks) will actually be used by researchers. As it stands now, the audience for such a guideline is both highly select and also I would imagine very limited (at least for now). On the other hand, as demonstrated by several papers the authors cite (such as Botvinik-Nezer et al.), the approach can have profound consequences for the practices of an entire field. However, there are a number of points that need to be addressed to make the article suitable for publication.

Essential revisions:

Section on Methods (for the expert consensus procedure)

1. Line 129: Please provide a reference for the "reactive Delphi" expert consensus procedure. Also, please comment on whether or not you followed any existing guidelines (eg CREDES) for reporting Delphi survey studies.

2. More information is needed on the process of recruiting experts. How was the list of existing many-analyst studies collected (e.g., systematic search, word of mouth)? What was the criteria for selecting "experts on research methodology"? What reasons were given by those experts who declined to get involved? Is there a minimal number of experts required for these kind of Delphi surveys? And what was the rationale for including only experts, and not a wider sample of people who might use the guideline in the future?

3. What was the rationale behind the threshold for expert consensus which was preregistered as IQR <= 2 and median >=6 (https://osf.io/dgrua)? I'm curious about why a minimum level of satisfaction wasn't sought (e.g. items 6 and 8 both have a response of "1", indicating maximum disagreement).

4. I think that more could be said in the text about the results of the consensus panel, and that consensus ratings for each item should at least be described in the main manuscript and not just by referencing a link to OSF.

5. It was a bit surprising to me that so there was so little disagreement between the expert panel on virtually all points of the guideline. This could be due to the fact that all the components of the guideline, with few exceptions, are pretty standard in terms of transparency and thus hard to disagree on (or might reflect that the relatively few multi-analyst studies conducted so far were conducted with an exceptionally high degree of rigor and transparency.) Please comment on this lack of disagreement.

Related to this, I found the section on "Practical Considerations" more nuanced and I assume there could have been some areas of disagreement in assembling this section. It would be interesting if the authors could reflect on some of these disagreements by, for instance, emphasizing points of discussion that were more controversial or where there was a more coagulated minority opinion, if any.

6. Sub-section on "Deciding on the number of co-analysts" (Lines 220-235).

My understanding is that the number of analysts needed may depend on the research question and also on the complexity of the dataset that would be analyzed. I would expect more discussion and guidance here. Currently, I don't think that this sub-section is really useful. Providing some examples would also be very helpful.

Section on Practical considerations

7. This section includes links/references for a couple of tools for any researchers who are considering embarking on their first many-analyst project (eg "instructions for reporting an effect estimate" and tenzing): it would be helpful to includes links/references for more such tools and resources, such as templates for providing the research task to many-analysts, a guide to automating emails, or a tutorial on using container images.

8. Line 271: Please provide examples of the standardized metrics that might be used for reporting results, and comment on how one might select one metric over another.

9. Please add a sub-section on "Providing the Research Questions" to the section "Providing the Dataset, Research Questions, and Research Tasks".

10. Sub-section on "Presenting the methods and results". Please give three or four examples of publications where the authors have done a good job of presenting their methods and results: please also cite the relevant figure(s) and/or table(s) for each example.

Other sections

11. Please add a paragraph to the "Conclusions" section on how you plan to disseminate these guidelines (eg, via the EQUATOR Network?)

12. There are guidelines on developing reporting guidelines for health research (eg Moher et al. 2010 Guidance for developers of health research reporting guidelines. DOI: https://doi.org/10.1371/journal.pmed.1000217). Please comment on whether or not you followed any such guidelines.

Improving the supplementary data

13. These comments are for the supplementary data file https://osf.io/qc7a8/ downloaded on 25 August 2021.

i) The excel formulas to calculate the Median and IQR are missing the rows for experts 45-49 (row 44 is blank, maybe this caused an error?). E.g. the formula in cell B53 is "=MEDIAN(B2:B44)", when it should be "=MEDIAN(B2:B50)". Fixing the formulas doesn't substantially change the results.

ii) There are no data descriptions provided in the excel file. It is not hard to work out what all the numbers and column headings mean by going back to the manuscript, but, given this is a conduct guideline and the last item is about the FAIR principles, it would be nice to lead by example and provide detailed metadata to make the document easy to re-use.

iii) Not all the data has been made publicly available. The registration page (https://osf.io/dgrua) said "All collected raw and processed anonymous data will be publicly shared on the OSF page of the project." Instead, only partial, processed data from the final consensus survey are provided (the free-text comments are not included). It would be great to see two raw files exported from Qualtrics (with names removed/anonymized): one for the preparatory survey, and one for the consensus survey.

https://doi.org/10.7554/eLife.72185.sa1

Author response

Summary:

The report describes a consensus-based guideline for multi-analyst studies. Overall, the manuscript provides very helpful suggestions, but I have some doubts about how much such a resource-intensive tool (beyond the rather simple version of independent reproducibility checks) will actually be used by researchers. As it stands now, the audience for such a guideline is both highly select and also I would imagine very limited (at least for now). On the other hand, as demonstrated by several papers the authors cite (such as Botvinik-Nezer et al.), the approach can have profound consequences for the practices of an entire field. However, there are a number of points that need to be addressed to make the article suitable for publication.

We appreciate these thoughts and believe that the scientific benefits greatly outweigh the extra logistics required, especially for projects with high relevance of important theoretical or policy making questions. The guidelines, checklist, and reporting template that we disseminate here were designed to decrease the burden of those who plan to conduct multi-analyst projects.

Essential revisions:

Section on Methods (for the expert consensus procedure)

1. Line 129: Please provide a reference for the "reactive Delphi" expert consensus procedure. Also, please comment on whether or not you followed any existing guidelines (eg CREDES) for reporting Delphi survey studies.

We reviewed the CREDES Guideline and now indicate that our approach meets those recommendations. We added a citation for this method on page 4. We also added clarification where needed (e.g., rationale behind using a Delphi-procedure, more information on the expert panel, discussion of how our approach affects potential biases).

2. More information is needed on the process of recruiting experts. What was the criteria for selecting "experts on research methodology"?

The contacted experts included all the organisers of known multi-analyst projects at the time as well as members of an expert panel of another methodological consensus project (Aczel, B., Szaszi, B., Sarafoglou, A., Kekecs, Z., Kucharský, Š., Benjamin, D.,.… and Wagenmakers, E. J. (2020). A consensus-based transparency checklist. Nature Human Behaviour, 4, 4-6.). We clarified it on page 5.

How was the list of existing many-analyst studies collected (e.g., systematic search, word of mouth)?

We conducted an unsystematic literature search and then asked those who had contributed to multi-analyst studies whether they knew of other examples in the literature. A social media call also allowed us to gather a range of multi-analyst studies.

https://mobile.twitter.com/BalazsAczel/status/1301801254348300288. We describe this approach on page 5.

What reasons were given by those experts who declined to get involved?

Three invited experts declined, noting their lack of insight in the topic. 28 invitees did not respond to our call. We now document these non-responses and declined invitations in the manuscript on page 5.

Is there a minimal number of experts required for these kind of Delphi surveys?

Murphy et al’s (1998) investigation of this question found that the reliability of group judgements increases substantially with every additional panel member up to 6 members. Beyond 12 members, the added benefits to reliability become minimal. But, a larger group can increase the diversity of perspectives and makes the result somewhat more robust. We had 50 members in our expert panel.

Murphy, M. K., Black, N. A., Lamping, D. L., McKee, C. M., Sanderson, C. F., Askham, J., and Marteau, T. (1998). Consensus development methods, and their use in clinical guideline development. Health technology assessment (Winchester, England), 2(3), i–88.

And what was the rationale for including only experts, and not a wider sample of people who might use the guideline in the future?

As the multi-analyst approach concerns specific methodological issues, we sought input primarily from researchers with relevant expertise and experience. In the manuscript, we emphasize that this guidance is a first step toward the broader adoption of the multi-analyst approach. We hope that our recommendations will be developed further based on feedback from adopters. We added a survey link to page 14 to collect such feedback and suggestions.

3. What was the rationale behind the threshold for expert consensus which was preregistered as IQR <= 2 and median >=6 (https://osf.io/dgrua)? I'm curious about why a minimum level of satisfaction wasn't sought (e.g. items 6 and 8 both have a response of "1", indicating maximum disagreement).

We used those thresholds based on this consensus paper:

https://www.nature.com/articles/s41562-019-0772-6. We judged that coherence (IQR) and summary level of support (median) are better indicators of the panel’s general thinking than are outlier values. We now cite this reference on page 5.

4. I think that more could be said in the text about the results of the consensus panel, and that consensus ratings for each item should at least be described in the main manuscript and not just by referencing a link to OSF.

We have improved the prose in that section. The paragraphs now are in page 5-6:

“The preregistration indicated consensus on the given item if the interquartile range of its ratings was 2 or smaller. […] The summary table of the results is available at https://osf.io/qc7a8/”.

As we had relatively high agreement with low variance for each item, we would prefer not to include information on each item in the paper. If the editor feels it’s essential, we could add a table (such as the one below) to the manuscript, but we felt that it would provide limited additional information beyond the description in the text.

Author response table 1

	Item1	Item2	Item3	Item4	Item5	Item6	Item7	Item8	Item9	Item10
Median ratings	8	9	9	9	9	9	8.5	9	9	9
Interquar-tile range	2	1	1	0	1.25	1.25	1.25	1	1

5. It was a bit surprising to me that so there was so little disagreement between the expert panel on virtually all points of the guideline. This could be due to the fact that all the components of the guideline, with few exceptions, are pretty standard in terms of transparency and thus hard to disagree on (or might reflect that the relatively few multi-analyst studies conducted so far were conducted with an exceptionally high degree of rigor and transparency.) Please comment on this lack of disagreement.

The high level of agreement might be due to the fact that we had a Preparatory Round before the consensus ratings. During this round, the experts could comment on each item and the overall content of the guidance. Based on their feedback, the core team updated and revised the initial draft. Then, the experts had a new option to edit and comment on the content. As a result, the item list that they rated had already incorporated many of their insights. We now discuss this procedure more fully and note that it could contribute to the high levels of agreement. (See quote in previous reply.)

Related to this, I found the section on "Practical Considerations" more nuanced and I assume there could have been some areas of disagreement in assembling this section. It would be interesting if the authors could reflect on some of these disagreements by, for instance, emphasizing points of discussion that were more controversial or where there was a more coagulated minority opinion, if any.

Thank you for this suggestion. We have added discussion of these disagreements to our Limitations section on page 13-14:

“Some of our practical considerations reflect disagreement and/or uncertainty within our expert panel, so they remain underspecified. […] Therefore, we emphasise that this consensus-based guidance is a first step towards the broader adoption of the multi-analyst approach in empirical research, and we hope and expect that our recommendations will be developed further in response to user feedback.”

6. Sub-section on "Deciding on the number of co-analysts" (Lines 220-235).

My understanding is that the number of analysts needed may depend on the research question and also on the complexity of the dataset that would be analyzed. I would expect more discussion and guidance here. Currently, I don't think that this sub-section is really useful. Providing some examples would also be very helpful.

We see value in drawing attention to the two distinct motivations for employing a multi-analyst approach – they have different requirements. We have added examples to each subsection on page 8.

Section on Practical considerations

7. This section includes links/references for a couple of tools for any researchers who are considering embarking on their first many-analyst project (eg "instructions for reporting an effect estimate" and tenzing): it would be helpful to includes links/references for more such tools and resources, such as templates for providing the research task to many-analysts, a guide to automating emails, or a tutorial on using container images.

We added references to further templates and guidelines on pages 9 and 11.

8. Line 271: Please provide examples of the standardized metrics that might be used for reporting results, and comment on how one might select one metric over another.

We added examples and explanations on page 10.

9. Please add a sub-section on "Providing the Research Questions" to the section "Providing the Dataset, Research Questions, and Research Tasks".

Done.

10. Sub-section on "Presenting the methods and results". Please give three or four examples of publications where the authors have done a good job of presenting their methods and results: please also cite the relevant figure(s) and/or table(s) for each example.

We badded the following paragraph to this subsection with examples and relevant figure numbers on pages 12-13:

“Published reports of multi-analyst projects have adopted several effective methods for presenting results For binary outcomes, Botvinik-Nezer et al. (39) used a table with colour coding (i.e., a binary heat map) to visualise outcomes across all teams. […] Olsson Collentine et al. (40) (Figure 2) used funnel plots and Patel et al. (7) (Figures 1 and 2) used volcano plots to depict numerous, diverse outcomes with an intuitive depiction of clustering (akin to a multiverse analysis).”

Other sections

11. Please add a paragraph to the "Conclusions" section on how you plan to disseminate these guidelines (eg, via the EQUATOR Network?)

Added it on page 14.

12. There are guidelines on developing reporting guidelines for health research (eg Moher et al. 2010 Guidance for developers of health research reporting guidelines. DOI: https://doi.org/10.1371/journal.pmed.1000217). Please comment on whether or not you followed any such guidelines.

We reviewed CREDES, a guideline for conducting and reporting Delphi studies, and added the following sentence to the manuscript on page 4:

“The applied consensus procedure and its reporting satisfy the recommendations of the CREDES (21) guideline on conducting and reporting Delphi studies.”

Improving the supplementary data

13. These comments are for the supplementary data file https://osf.io/qc7a8/ downloaded on 25 August 2021.

i) The excel formulas to calculate the Median and IQR are missing the rows for experts 45-49 (row 44 is blank, maybe this caused an error?). E.g. the formula in cell B53 is "=MEDIAN(B2:B44)", when it should be "=MEDIAN(B2:B50)". Fixing the formulas doesn't substantially change the results.

We really appreciate your thorough review of the accompanying code! Because of the blank row, our calculation included a mistake. We fixed the formulas and updated the supplementary materials and main text accordingly. Fixing the formulas does not substantially change the results.

ii) There are no data descriptions provided in the excel file. It is not hard to work out what all the numbers and column headings mean by going back to the manuscript, but, given this is a conduct guideline and the last item is about the FAIR principles, it would be nice to lead by example and provide detailed metadata to make the document easy to re-use.

We added detailed meta-data to the data file to make it easier to use.

iii) Not all the data has been made publicly available. The registration page (https://osf.io/dgrua) said "All collected raw and processed anonymous data will be publicly shared on the OSF page of the project." Instead, only partial, processed data from the final consensus survey are provided (the free-text comments are not included). It would be great to see two raw files exported from Qualtrics (with names removed/anonymized): one for the preparatory survey, and one for the consensus survey.

We have added the anonymized raw files to the OSF page.

https://doi.org/10.7554/eLife.72185.sa2

Article and author information

Author details

Balazs Aczel

Balazs Aczel is at ELTE Eotvos Lorand University, Budapest, Hungary

Contribution
Conceptualization, Methodology, Project administration, Writing – original draft, Writing – review and editing

Contributed equally with
Barnabas Szaszi

For correspondence
aczel.balazs@ppk.elte.hu

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-9364-4988
Barnabas Szaszi

Barnabas Szaszi is at ELTE Eotvos Lorand University, Budapest, Hungary

Contribution
Conceptualization, Methodology, Writing – original draft, Writing – review and editing

Contributed equally with
Balazs Aczel

For correspondence
szaszi.barnabas@ppk.elte.hu

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-7078-2712
Gustav Nilsonne

Gustav Nilsonne is at the Karolinska Institutet and Stockholm University, Stockholm, Sweden

Contribution
Conceptualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-5273-0150
Olmo R van den Akker

Olmo R van den Akker is at Tilburg University, Tilburg, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Casper J Albers

Casper J Albers is at the University of Groningen, Groningen, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Marcel ALM van Assen

Marcel ALM van Assen is at Tilburg University, Tilburg, and Utrecht University, Utrecht, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Jojanneke A Bastiaansen

Jojanneke A Bastiaansen is at the University Medical Center Groningen, University of Groningen, Groningen, and Friesland Mental Health Care Services, Leeuwarden, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-4831-6402
Daniel Benjamin

Daniel Benjamin is at the University of California Los Angeles, Los Angeles, and the National Bureau of Economic Research, Cambridge, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-2642-5416
Udo Boehm

Udo Boehm is at the University of Amsterdam, Amsterdam, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-8677-0721
Rotem Botvinik-Nezer

Rotem Botvinik-Nezer is at Dartmouth College, Hanover, United State

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-2669-1877
Laura F Bringmann

Laura F Bringmann is at the University of Groningen, Groningen, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-8091-9935
Niko A Busch

Niko A Busch is at the University of Münster, Münster, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-4837-0345
Emmanuel Caruyer

Emmanuel Caruyer is at the University of Rennes, CNRS, Inria and Inserm, Rennes, France

Contribution
Methodology, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-8547-7726
Andrea M Cataldo

Andrea M Cataldo is at McLean Hospital, Belmont, and Harvard Medical School, Boston, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-2787-224X
Nelson Cowan

Nelson Cowan is at the University of Missouri, Columbia, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3711-4338
Andrew Delios

Andrew Delios is at the National University of Singapore, Singapore

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-6791-227X
Noah NN van Dongen

Noah NN van Dongen is at the University of Amsterdam, Amsterdam, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-0387-7388
Chris Donkin

Chris Donkin is at the University of New South Wales, Sydney, Australia

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Johnny B van Doorn

Johnny B van Doorn is at the University of Amsterdam, Amsterdam, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-0270-096X
Anna Dreber

Anna Dreber is at the Stockholm School of Economics, Stockholm, Sweden, and the University of Innsbruck, Innsbruck, Austria

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3989-9941
Gilles Dutilh

Gilles Dutilh is at the University Hospital Basel, Basel, Switzerland

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Gary F Egan

Gary F Egan is at Monash University, Melbourne, Australia

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-3186-4026
Morton Ann Gernsbacher

Morton Ann Gernsbacher is at the University of Wisconsin-Madison Madison, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-0397-3329
Rink Hoekstra

Rink Hoekstra is at the University of Groningen, Groningen, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-1588-7527
Sabine Hoffmann

Sabine Hoffmann is at Ludwig-Maximilians-University, Munich, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-6197-8801
Felix Holzmeister

Felix Holzmeister is at the University of Innsbruck, Innsbruck, Austria

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-9606-0427
Juergen Huber

Juergen Huber is at the University of Innsbruck, Innsbruck, Austria

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-0073-0321
Magnus Johannesson

Magnus Johannesson is at the Stockholm School of Economics, Stockholm, Sweden

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-8759-6393
Kai J Jonas

Kai J Jonas is at Maastricht University, Maastricht, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Alexander T Kindel

Alexander T Kindel is at Princeton University, Princeton, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Michael Kirchler

Michael Kirchler is at the University of Innsbruck, Innsbruck, Austria

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-5416-2545
Yoram K Kunkels

Yoram K Kunkels is at University Medical Center Groningen, University of Groningen, Groningen, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
D Stephen Lindsay

D Stephen Lindsay is at the University of Victoria, Victoria, Canada

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Jean-Francois Mangin

Jean-Francois Mangin is at Université Paris-Saclay, Paris, and Neurospin CEA, Paris, France

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-1612-461X
Dora Matzke

Dora Matzke is at Amsterdam University, Amsterdam, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Marcus R Munafò

Marcus R Munafò is at the University of Bristol, Bristol, United Kingdom

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Ben R Newell

Ben R Newell is at the University of New South Wales, Sydney, Australia

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-1898-205X
Brian A Nosek

Brian A Nosek is at the Center for Open Science and the University of Virginia, Charlottesville, United States

Contribution
Writing – review and editing

Competing interests
Executive Director of the Center for Open Science, a non-profit technology and culture change organization with a mission to increase openness, integrity, and reproducibility of research
Russell A Poldrack

Russell A Poldrack is at Stanford University, Stanford, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-6755-0259
Don van Ravenzwaaij

Don van Ravenzwaaij is at the University of Groningen, Groningen, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-5030-4091
Jörg Rieskamp

Jörg Rieskamp is at the University of Basel, Basel, Switzerland

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-2632-8015
Matthew J Salganik

Matthew J Salganik is at Princeton University, Princeton, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Alexandra Sarafoglou

Alexandra Sarafoglou is at Amsterdam University, Amsterdam, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Tom Schonberg

Tom Schonberg is at Tel Aviv University, Tel Aviv, Israel

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-4485-816X
Martin Schweinsberg

Martin Schweinsberg is at ESMT Berlin, Berlin, Germany

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3529-9463
David Shanks

David Shanks is at University College London, London, United Kingdom

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-4600-6323
Raphael Silberzahn

Raphael Silberzahn is at the University of Sussex, Brighton, United Kingdom

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Daniel J Simons

Daniel J Simons is at the University of Illinois at Urbana-Champaign, Urbana-Champaign, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Barbara A Spellman

Barbara A Spellman is at the University of Virginia, Charlottesville, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Samuel St-Jean

Samuel St-Jean is at the University of Alberta, Edmonton, Canada, and Lund University, Lund, Sweden

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-8092-2974
Jeffrey J Starns

Jeffrey J Starns is at the University of Massachusetts Amherst, Amherst, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Eric Luis Uhlmann

Eric Luis Uhlmann is at INSEAD, Singapore, Singapore

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Jelte Wicherts

Jelte Wicherts is at Tilburg University, Tilburg, Netherlands

Contribution
Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-2415-2933
Eric-Jan Wagenmakers

Eric-Jan Wagenmakers is at Amsterdam University, Amsterdam, Netherlands

Contribution
Conceptualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

Funding

Netherlands Organisation for Scientific Research (406-17-568)

Alexandra Sarafoglou

Natural Sciences and Engineering Research Council of Canada (BP-546283-2020)

Samuel St-Jean

Fonds de Recherche du Québec - Nature et Technologies (290978)

Samuel St-Jean

European Research Council (726361)

Jelte Wicherts
Olmo R van den Akker

European Research Council (681466)

Yoram K Kunkels

VIDI fellowship organisation (016.Vidi.188.001)

Don van Ravenzwaaij

VENI fellowship grant (Veni 191G.037)

Laura F Bringmann

National Science Foundation (1760052)

Matthew J Salganik

Weizmann Institute of Science (Israel National Postdoctoral Award Program for Advancing Women in Science)

Rotem Botvinik-Nezer

John Templeton Foundation, Templeton World Charity Foundation, Templeton Religion Trust, and Arnold Ventures

Brian A Nosek

Institut Européen d'Administration des Affaires

Eric Luis Uhlmann

European Research Council (640638)

Noah NN van Dongen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Publication history

Received: July 14, 2021
Accepted: November 7, 2021
Accepted Manuscript published: November 9, 2021
Version of Record published: November 26, 2021

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.