HOW SHOULD SCIENCE BE DONE?

Lately I keep running into the idea that the proper way to do science is to continually strive to disprove a hypothesis, rather than support it*. According to the writers who promote it, this is what scientists are supposed to aspire to, but I've never actually heard a scientist say so. The latest example was recently published in the Wall Street Journal (1). The idea evokes an image of the Super Scientist, one who is so skeptical that he never believes his own ideas and is constantly trying to tear them down. I'm no philosopher of science, but this idea has never sat well with me, and it's contrary to how science is actually practiced.

I could spend my entire career trying to disprove Pasteur's germ theory, and it would be a waste of time. I could spend my career trying to disprove the idea that DNA is the genetic material, and I would also be wasting my time. Why did we ever move on from testing these hypotheses? Because the evidence supporting them is overwhelming. At some level of evidence, one has to conclude that a hypothesis is sufficiently supported, stop testing it, and move on.

The scientific method is just a formalized version of common sense. If you were to try to eat five rocks, and break your teeth each time, you'd conclude that rocks aren't good food and stop trying to eat them. You wouldn't conclude that you failed to disprove the idea that rocks aren't good food, and keep trying to eat them.

A critical element in deciding whether or not a hypothesis (i.e., an idea or model) is supported by evidence is the use of a "hypothesis test". Hypothesis tests are based on probability, and the techniques used to carry them out are called statistics. These tests are fundamental to quantitative science because they are what allow you to say that your results are "statistically significant", meaning they are unlikely to have arisen by chance alone. This is an essential element of being able to claim that your hypothesis is supported rather than unsupported.

Basically, a hypothesis test is set up by pitting one hypothesis against another. Hypothesis #1 is the effect you're looking for, for example that tall people on average have bigger feet than short people. Hypothesis #2 is called the "null hypothesis", and it describes what we would expect to observe if hypothesis #1 were not correct, i.e. there is no difference in the average foot size of tall and short people.

If we take our measurements and find, using the appropriate statistical test, that there is a difference in foot size between groups, and that this difference is unlikely to have arisen by chance, then we reject the null hypothesis. Therefore, the experimental hypothesis is supported and tall people probably do have bigger feet on average.
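
To make the logic concrete, here is a minimal sketch in Python of one simple kind of hypothesis test, a permutation test. The foot lengths are made up for illustration, and the function name permutation_test and its parameters are my own assumptions, not part of any standard library. The reasoning goes like this: if the null hypothesis is true, the "tall" and "short" labels are interchangeable, so we shuffle the labels many times and ask how often chance alone produces a difference at least as large as the one we actually measured.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels carry no information,
        # so any random relabeling is as plausible as the real one.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value from being exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical foot lengths in centimeters (invented data, not real measurements).
tall = [27.1, 28.0, 26.5, 27.8, 28.4, 27.3, 26.9, 28.1]
short = [24.9, 25.6, 24.2, 25.1, 26.0, 24.7, 25.3, 24.5]

observed, p_value = permutation_test(tall, short)
print(f"observed difference: {observed:.3f} cm, estimated p-value: {p_value:.4f}")
```

A small p-value here means that a difference this large rarely shows up when the labels are assigned at random, which is what lets us reject the null hypothesis. A t-test would be the more conventional choice for data like these; I use a permutation test only because its logic is easy to see.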

This is important to understand. In this case, the hypothesis test rejects the null hypothesis, supporting the experimental hypothesis. We don't say "our results fail to reject the hypothesis that tall people have bigger feet", as we would if every experiment were designed to try to reject our idea. We say "our results support the hypothesis that tall people have bigger feet", because the null hypothesis, that foot size is the same, has been rejected. Next, we have to decide if the effect size is large enough to be important, and how it fits in with the rest of the scientific literature. Ideally, other groups will independently do the same experiment and find the same result; if they don't, we have to question our conclusions.

Experiments support hypotheses; they do not fail to reject them. This is good science. It is true that we will never be able to weed out all subjectivity from scientific research, that some scientists hold irrational beliefs in regard to their own research, and that these irrational beliefs are often due to social factors and self-serving motivations. After all, scientists are human too. But the scientific method is nevertheless the best tool we have for minimizing subjectivity in the pursuit of information, and the way we are using it currently is pretty darn effective.

* As an aside, in many cases it is literally impossible to disprove or falsify a hypothesis using conventional statistical methods. Going back to the foot size example, if we find that there is no statistically significant difference in foot size between short and tall people, technically speaking we do not reject the hypothesis that tall people have bigger feet. We have not disproven it. What we have done is fail to support it, because we couldn't reject the null hypothesis. Our test could not rule out the possibility that in the population at large (as opposed to the random sample of people in our experiment), there is a real difference in foot size that was too small to detect in our experiment.
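
A rough simulation can illustrate that last point. The sketch below rests on made-up assumptions: foot length is normally distributed with a known standard deviation of 1.5 cm, each experiment measures 20 people per group, and significance is judged with a one-sided z-test at the 0.05 level. The function name detection_rate is hypothetical. It runs many simulated experiments and reports the fraction in which the null hypothesis was rejected.

```python
import random
from statistics import NormalDist


def detection_rate(true_diff, sd=1.5, n=20, alpha=0.05,
                   n_experiments=2000, seed=1):
    """Fraction of simulated experiments that reject the null hypothesis.

    Assumes normally distributed foot lengths with a known standard
    deviation `sd` and `n` people per group (illustrative values only).
    """
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha)  # one-sided cutoff
    standard_error = sd * (2 / n) ** 0.5          # SE of a difference in means

    rejections = 0
    for _ in range(n_experiments):
        short = [rng.gauss(25.0, sd) for _ in range(n)]
        tall = [rng.gauss(25.0 + true_diff, sd) for _ in range(n)]
        z = (sum(tall) / n - sum(short) / n) / standard_error
        if z > critical_z:
            rejections += 1
    return rejections / n_experiments


# A real but small difference (0.2 cm) versus a large one (3 cm).
small_effect_rate = detection_rate(0.2)
large_effect_rate = detection_rate(3.0)
print(f"0.2 cm difference detected in {small_effect_rate:.0%} of experiments")
print(f"3.0 cm difference detected in {large_effect_rate:.0%} of experiments")
```

Under these assumptions the small difference is real in every simulated population, yet most individual experiments fail to reject the null hypothesis. A non-significant result therefore cannot tell us the effect is absent, only that this experiment did not detect it.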

The goal of most experiments is not to try to falsify or disprove a hypothesis (which in any case is often impossible); it is to test a hypothesis by pitting it against the null hypothesis. In other words, does the model accurately predict reality when it is tested? This is how it should be. The outcome of many experiments is either a) the hypothesis is supported, or b) the null hypothesis is not rejected, i.e. there is not sufficient evidence in support of the hypothesis. There is often no option c) the hypothesis is falsified.

It is often said that an idea must be falsifiable to be scientific. Given that hypotheses often cannot be falsified using our current methods, I think a better way to convey this idea is to say that an idea must be testable to be scientific. We can fudge this a little bit and say that an idea has been falsified if we test it several different ways and none of them supports it, or if it's clear that even if the effect exists, it's too small to be important.