New ASU psychology professor studies how best to replicate scientific experiments
If a scientific finding cannot be replicated, can it be true? The replication of experimental findings is a crucial part of the scientific method and is first taught in elementary school science lessons. But researchers from psychology, economics, marketing and medicine have recently struggled to reproduce many findings, giving rise to the so-called “replication crisis” in science.
The reasons why findings might not replicate are numerous and complicated. Though a few instances of scientific fraud have received a lot of attention in the media, a more common reason why a finding might not replicate is the improper or ineffective use of statistics.
Most replication attempts rely on one kind of statistic to determine whether a finding reproduces. This narrow definition of replication can lead to problems, as Arizona State University’s Samantha Anderson knows. Anderson recently joined the Department of Psychology as an assistant professor, and one of the focuses of her work is how the sample size, or amount of data in an experiment, can affect whether a study replicates.
“There are several ways to assess whether a replication study is successful or if it has failed, but right now most people are deciding if a replication is successful in the same way,” said Anderson, who with her doctoral adviser recently won the 2018 Tanaka Award for the best paper from Multivariate Behavioral Research, the flagship journal of the Society of Multivariate Experimental Psychology.
“It is important to expand the definition of what it means to replicate a study.”
Anderson also works on improving how experiments are reproduced using traditional definitions of replication.
Planning appropriate sample sizes with an easy online tool
The statistic that most replication studies rely on is the “p-value,” which indicates how likely a result at least as extreme as the one observed would be if there were no real effect. When researchers attempting to reproduce a study duplicate the original sample size and then compare p-values, the replication attempt can fail even if the finding is actually reproducible, because a study of that size may be too small to detect the effect reliably.
Figuring out how much data are needed for an experiment or replication might seem straightforward, but it actually requires thinking about concepts like signal and statistical power.
“You want a sample size that is large enough to detect the signal you are looking for, among noise and even errors in the data,” Anderson said. “The probability of you finding that signal is the statistical power.”
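The relationship Anderson describes between sample size and statistical power can be sketched in a few lines of Python. This is a minimal illustration, not the BUCSS method: it assumes a simple two-group comparison, a standardized effect size, and a normal approximation to the test statistic. The function names are hypothetical and chosen only for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Assumes a standardized effect size (difference in means divided by
    the standard deviation) and a normal approximation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability the statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def approx_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    n = approx_n_per_group(0.5)
    print(n, round(approx_power(0.5, n), 3))
```

Under these assumptions, an effect of half a standard deviation needs about 63 participants per group to reach 80% power. An exact calculation based on the t distribution gives a slightly larger number, and smaller effects require far larger samples.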
Determining the necessary sample size for a study can be challenging because it requires information about the magnitude of the effect being studied, which scientists call the “effect size.” Sometimes scientists might not know much about the signal they are hoping to detect. To replicate an experiment, many researchers logically start by using the effect size reported in the original study because that is the best available information on the magnitude of an effect. But Anderson said taking sample effect sizes from published studies at face value can cause unintended problems because of publication bias. Scientists rarely, if ever, publish experiments that did not work, so published effect sizes tend to be larger than the true effects.
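A small simulation can illustrate how publication bias inflates reported effect sizes. The sketch below is a simplified illustration and not Anderson's analysis: it assumes a true effect of 0.3 standard deviations, 20 participants per group, a known standard deviation of 1, and a rule that only statistically significant studies get "published." All names and numbers are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_published_effects(true_effect=0.3, n_per_group=20,
                               n_studies=2000, alpha=0.05, seed=0):
    """Simulate many small studies and keep only the significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference in means, assuming a known SD of 1.
    se = sqrt(2 / n_per_group)
    all_effects, published = [], []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        observed = mean(treated) - mean(control)
        all_effects.append(observed)
        # Only studies that reach significance are "published."
        if abs(observed) / se > z_crit:
            published.append(observed)
    return all_effects, published


if __name__ == "__main__":
    all_effects, published = simulate_published_effects()
    print("mean effect, all studies:      ", round(mean(all_effects), 2))
    print("mean effect, published studies:", round(mean(published), 2))
```

Averaged over all simulated studies, the observed effect stays close to the true value, but the average among the "published" studies is noticeably larger. A replication planned around that inflated number would end up with too small a sample.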
Another source of unintended problems in figuring out sample sizes is uncertainty, which exists because experiments often have to measure an effect from just part of a population. For example, political approval surveys use a small subset of the electorate, usually just a few thousand people, to infer how the American population feels.
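The polling example can be made concrete with the standard margin-of-error formula for a proportion. This is a rough sketch that assumes a simple random sample and a normal approximation; real surveys use weighting and other adjustments, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(proportion, sample_size, confidence=0.95):
    """Approximate margin of error for a proportion from a simple random sample."""
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    for n in (100, 1000, 4000):
        print(n, round(100 * margin_of_error(0.5, n), 1), "percentage points")
```

With 1,000 respondents and approval near 50%, the margin of error is roughly 3 percentage points. The same kind of uncertainty surrounds an effect size estimated from a single experiment.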
To combat publication bias and unravel the best way to handle uncertainty in data, Anderson worked with Ken Kelley at the University of Notre Dame to develop an online tool to make sample size calculations easy for researchers.
The tool, called the “bias uncertainty corrected sample size” or BUCSS for short, is based on statistical methods that were published decades ago and several publications from Anderson’s doctoral research. The BUCSS tool can help researchers plan sample sizes for new experiments and replication studies.
“We developed software that accounts for biases and uncertainty that can result in incorrect sample sizes,” Anderson said. “The BUCSS tool is designed so people can use it without thinking about the equations and math behind it.”
Before coming to ASU, Anderson earned her doctorate at Notre Dame. She started graduate school pursuing her degree in clinical psychology but soon realized she loved methods research. Anderson completed her doctorate in quantitative psychology and won the 2017 Psi Chi/APA Edwin B. Newman Graduate Research Award, which acknowledges the best research paper from a psychology graduate student.