
Psychology’s Replication Crisis

In 2011, social psychologist Daryl Bem published a study in the Journal of Personality and Social Psychology consisting of nine experiments and over 1,000 participants. The study used procedures and statistical methods similar to those of his contemporaries[1].

The results of these experiments prompted a massive re-examination of psychology’s research practices, because Bem’s study seemingly proved the existence of precognition, the psychic ability to perceive future events.

If an outlandish claim such as the existence of psychic powers could be published, is it possible that the very foundations of psychological research are flawed? To get to the bottom of this, collaborative studies were launched to determine whether the findings of numerous psychological studies, many of which had already been published in reputable psychology journals, could be replicated. Often referred to as a “cornerstone of science,”[2] replication is the process of recreating a study under near-identical conditions with the goal of achieving the same result as the original[2]. If a study can be replicated multiple times and still achieve significant results, then more confidence can be placed in its findings.

In 2014, the Many Labs collaborative study examined “variation in replicability of 13 classic and contemporary psychological effects across 36 samples and settings.”[3] The participating groups enlisted in the replication study through the Open Science Collaboration, whose mission is to strengthen research by testing the reproducibility of experiments. While carrying out the replications, labs were instructed to record a video of their lab procedures and to document key features of the experimentation process. Of the 13 effects that were tested, ten were successfully replicated with statistically significant results, and three replication attempts were unsuccessful, with all of those sample effect sizes falling short of the originally reported findings[3].

This collaborative study was repeated in 2018 under the name Many Labs 2. This time, only 14 out of 28 effects were successfully replicated[4], indicating an even lower replication rate.


In another broad study, the Open Science Collaboration attempted to replicate 100 studies that had already been published in various journals. Much like in the Many Labs collaborations, a replication was considered successful if it produced a statistically significant effect in the same direction as the effect of the original experiment. Out of the 100 studies under investigation, only 36% were successfully replicated with statistically significant results[5].
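To make that criterion concrete, the short sketch below (written in Python; the function name and the numerical values are illustrative assumptions, not taken from either project) shows how a single replication attempt might be scored as successful or not.

```python
# Hypothetical illustration of the replication-success criterion described
# above: the replication must be statistically significant (p < .05) and its
# effect must point in the same direction as the original effect.

def replication_successful(original_effect: float,
                           replication_effect: float,
                           replication_p: float,
                           alpha: float = 0.05) -> bool:
    """Return True if the replication is significant and its effect
    has the same sign (direction) as the original effect."""
    same_direction = (original_effect > 0) == (replication_effect > 0)
    significant = replication_p < alpha
    return same_direction and significant

# Invented example values: original effect d = 0.45; a replication with
# d = 0.12 and p = 0.21 would not count as successful, while d = 0.30
# with p = 0.01 would.
print(replication_successful(0.45, 0.12, 0.21))  # False
print(replication_successful(0.45, 0.30, 0.01))  # True
```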

With so many effects failing to replicate, many psychologists have called this upheaval of the field a “replication crisis.”[6]

There are a number of reasons why a study could fail to replicate. Most of the studies that failed to replicate originally reported a false positive, or Type I error, in which the null hypothesis (the hypothesis that there is no significant difference between the variables being tested) is incorrectly rejected and a non-existent effect is reported. Type I errors can arise from stopping data collection as soon as a significant effect has been observed, or from selectively including or excluding covariates, independent variables that are not the primary focus of an experiment. Such practices have become common in psychological research, and they typically increase the probability of obtaining statistically significant results[2].
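The effect of one such practice, stopping data collection as soon as a significant result appears, can be illustrated with a small simulation. The sketch below (in Python with NumPy and SciPy; the sample sizes, peeking schedule, and number of simulated studies are arbitrary choices for illustration, not drawn from the studies above) generates data under a true null hypothesis and shows that repeated peeking pushes the false positive rate well above the nominal 5%.

```python
# Minimal simulation of "optional stopping": even when no real effect exists,
# repeatedly testing the accumulating data and stopping at the first p < .05
# inflates the Type I error rate compared with a single planned test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N_SIMULATIONS = 2000   # number of simulated studies
MAX_N = 100            # maximum sample size per study
PEEK_EVERY = 10        # run a test after every 10 participants
ALPHA = 0.05

false_positives_fixed = 0     # significant when tested once at n = MAX_N
false_positives_peeking = 0   # significant at any peek before or at MAX_N

for _ in range(N_SIMULATIONS):
    # The null hypothesis is true: data come from a normal distribution with mean 0.
    data = rng.normal(loc=0.0, scale=1.0, size=MAX_N)

    # Fixed-sample analysis: one test on the full sample.
    if stats.ttest_1samp(data, popmean=0.0).pvalue < ALPHA:
        false_positives_fixed += 1

    # Optional stopping: peek after every PEEK_EVERY participants and stop
    # collecting as soon as the result looks significant.
    for n in range(PEEK_EVERY, MAX_N + 1, PEEK_EVERY):
        if stats.ttest_1samp(data[:n], popmean=0.0).pvalue < ALPHA:
            false_positives_peeking += 1
            break

print(f"False positive rate, single test at n={MAX_N}: "
      f"{false_positives_fixed / N_SIMULATIONS:.3f}")
print(f"False positive rate with optional stopping:  "
      f"{false_positives_peeking / N_SIMULATIONS:.3f}")
# The fixed test stays near 0.05, while peeking typically yields a
# substantially higher rate of spurious "significant" findings.
```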

Although it may seem like a dire situation, there are many steps that psychologists can take to curb the replication crisis. In the past, psychology journals have placed heavy emphasis on publishing studies that produce novel findings, even if those studies show little to no evidence of replicability. Going forward, a study’s replicability will be weighed more heavily in the publication process, as journals such as those published by the Association for Psychological Science have introduced a new type of article, the Registered Replication Report[2], to encourage researchers to make replication a regular part of the experimentation process.

Additionally, the experimentation process can be made more rigorous by following rules similar to those required in the Many Labs studies. Preregistering an experiment requires the lab to document its procedures and design before a replication attempt takes place. Because the recreation then does not deviate from the original experiment, its replicability can be accurately reported.

Christopher Chartier, an associate professor of psychology at Ashland University, has proposed what he calls a “CERN for psychology.”[7] This organization would consist of a network of laboratories that would allow any researcher to propose an idea and receive the support needed to accurately collect data and attempt replication. With developments such as these, new findings in psychology can be strengthened through collaboration.

References

  1. Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524
  2. Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? American Psychologist, 70(6), 487–498. https://doi.org/10.1037/a0039400
  3. Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Jr., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178
  4. Klein, R. A., Vianello, M., Hasselman, F., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225
  5. Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
  6. Sussex Publishers. (n.d.). Replication crisis. Psychology Today. https://www.psychologytoday.com/us/basics/replication-crisis.
  7. Chartier, C. R. (2017, August 26). Building a CERN for psychological science. https://christopherchartier.com/2017/08/26/building-a-cern-for-psychological-science/