I’ve been pondering this topic ever since I read Philip Ball’s column in the Guardian on Christmas Eve. The recent announcement of tentative evidence for the Higgs boson and the report of neutrinos travelling faster than the speed of light have created both interest and confusion. In his article, Philip Ball proposed some guidelines on how we should respond to these new “discoveries”. In his view, we should not be guided by the statistical measures associated with the experiments but rather by the prevailing beliefs of scientists in the field. Taking this approach, we could conclude that the Higgs boson might have been found but that neutrinos might not travel faster than light.
I had always thought that scientists should follow the statistics, and that following opinion might run the risk of being tainted by bias, so I wanted to write something about this apparent paradox. I am also not really sure why we need to second-guess the outcome of these experiments. It might be better to wait until more data are available and, in the case of the neutrinos, until the experiments have been repeated by other groups. But scientists are intrinsically gossipy and will always want to speculate on the science done by others, so here we go.
If you delve a little, you find that Philip Ball’s piece was triggered by an article by the Italian particle physicist Giulio D’Agostini. It is a difficult read but it makes some good points. D’Agostini is concerned about the misuse of statistics by journalists, by lay people and by scientists. It’s easier to see his argument with an example. He uses data from a study at Fermilab in the US that were interpreted by some as evidence of a “new physics”. Scientists at Fermilab had seen an unexpected “bump” in their data and wanted to evaluate whether this was a real difference (implying a “new physics”) or just part of the variation seen in any experiment (no “new physics”). Here they used a p-value, which is a statistical measure of the probability that the results they had observed would be seen even if there were no “new physics”.
The Fermilab scientists came up with a p-value of 0.00076 for the experiment, and this tells us that, with no “new physics”, the probability of seeing the “bump” would be 0.076%, or about 0.1%. Put another way, the results seen by the Fermilab scientists could have occurred under standard physics, although the probability is quite low. What worries D’Agostini is that this probability value is then misused. The misuse is to turn the p-value into a predictor of the hypothesis, in this case “new physics”, and he quotes several media reports that made this error. These reports suggested that because the p-value was very low there was a very high chance that the hypothesis was correct. As D’Agostini points out, the p-value tells us about the experimental data (i.e. the probability of observing the data in the absence of new ideas) but not about what the experiment means, i.e. the hypothesis.
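To see why the inference is invalid, a toy Bayesian calculation in the spirit of D’Agostini’s argument helps. The p-value below is the Fermilab figure, but the likelihood under “new physics” and the sceptical prior are invented numbers chosen purely for illustration:

```python
# Toy illustration (hypothetical numbers): a small p-value does not
# equal the probability that the null hypothesis is false.
# Bayes' theorem: P(H | data) = P(data | H) * P(H) / P(data)

p_data_given_null = 0.00076   # the Fermilab p-value: P(data | no new physics)
p_data_given_new = 0.5        # assumed: chance of seeing the bump if new physics is real
prior_new = 0.001             # assumed sceptical prior belief in "new physics"

p_data = (p_data_given_new * prior_new
          + p_data_given_null * (1 - prior_new))
posterior_new = p_data_given_new * prior_new / p_data

print(f"P(new physics | data) = {posterior_new:.2f}")
```

With these made-up numbers the posterior probability of “new physics” comes out at roughly 0.4, nothing like the 99.9% certainty the media reports implied; a different prior would, of course, give a different answer, which is exactly the point.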
If the statistics can’t help, then how do we understand this experiment? Here D’Agostini suggests we look at what is already known and ask what other scientists in the field believe. Do they believe the hypothesis that a “new physics” is required to explain the results, or do they not? Here we have to look at the way science works. Scientists perform experiments to test hypotheses and gradually, as the data emerge, networks of beliefs about science arise. These networks of beliefs can be called theories. These theories will be believed by many scientists in the community, although there may be dissenters. If a prevailing theory, based on extensive prior information, is challenged by new data indicating an alternative idea, then we will require substantial further experimental work to make us change our beliefs. As D’Agostini points out, this is why many people did not believe the Fermilab results. The faster-than-light neutrinos also fit this category, being a completely new observation that challenges existing ideas. If new data can be interpreted within existing theories then they are more likely to be believed. This seems to be the case for the data obtained on the Higgs boson at the Large Hadron Collider.
This scheme, proposed by D’Agostini and endorsed by Ball, does not, however, completely fit the way progress is made in other branches of science. In the life sciences, for example, p-values are a mandatory aspect of reporting the results of experiments and they are used extensively to assess outcomes. In examining the possible effect of a perturbation on a system, the experiment is typically set up with two groups, control and treated. The data are analysed and mean values calculated for the two groups. The two means may differ, and the p-value is then the probability that a difference between group means at least as large as the one observed would be seen even if there were no real effect of the treatment. A low p-value implies a low probability and, as we have seen above, this should not be taken as a measure of the probability that the treatment has had an effect. Nevertheless, we need to be able to make progress, and life scientists typically take a p-value of 0.05 (5%) as a cut-off. p-values greater than 0.05 are taken as evidence that the treatment had no effect and p-values less than 0.05 are taken as evidence that it did. This is a bit different from what D’Agostini recommends, but it seems to have allowed advances in the life sciences and the generation of theories of how systems function.
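The control-versus-treated comparison described above can be sketched with a permutation test, one simple way of obtaining such a p-value (life scientists more often use a t-test, but the logic is the same). The measurements below are invented for illustration:

```python
import random
import statistics

random.seed(42)

# Invented example data: some measurement in control vs treated subjects.
control = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
treated = [5.6, 5.9, 5.4, 5.8, 5.5, 6.0, 5.7, 5.6]

observed = statistics.mean(treated) - statistics.mean(control)

# Permutation test: if the treatment truly had no effect, the group
# labels are arbitrary, so shuffle them many times and count how often
# a difference at least as large as the observed one arises by chance.
pooled = control + treated
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = (statistics.mean(pooled[len(control):])
            - statistics.mean(pooled[:len(control)]))
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.3f}, p-value ~ {p_value:.4f}")
```

Here the two groups barely overlap, so the p-value comes out well below the conventional 0.05 cut-off; on the usual convention the treatment would be reported as having an effect, with the caveats discussed above.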
You may have now spotted another paradox. The p-value reported for the Fermilab experiments was 0.00076, and this is much smaller than the 0.05 figure typically used in the life sciences as an indicator that data are not just a result of random fluctuation. So why were the particle physicists not more accepting of the result? Apparently particle physicists use more stringent cut-offs in their work, with p-values of 4 × 10⁻⁷ being required for a “discovery”. In the end, however, these figures don’t matter greatly, as it is the relation between the new findings and prior knowledge that matters. When a new finding is made, even if the experiment is supported by a strong p-value, if it goes against prior data then most scientists in the field will shrug their shoulders and wait for replication before getting too excited.
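The stringency gap can be made concrete by converting between p-values and the “sigma” levels that particle physicists usually quote, assuming a one-tailed normal distribution (the usual convention in that field):

```python
from math import erfc, sqrt

def one_tailed_p(n_sigma):
    """One-tailed probability of a normal fluctuation beyond n_sigma."""
    return 0.5 * erfc(n_sigma / sqrt(2))

# The conventional particle-physics "discovery" threshold is five sigma,
# a p-value of roughly 3e-7.
print(f"5 sigma -> p = {one_tailed_p(5):.2e}")

# The Fermilab p-value of 0.00076 sits at only about 3.2 sigma:
for s in (3.0, 3.1, 3.2):
    print(f"{s} sigma -> p = {one_tailed_p(s):.5f}")
```

So by particle-physics standards the Fermilab “bump”, while far beyond the life-science 0.05 cut-off, still fell well short of a discovery.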
Let’s conclude by considering a potential problem in the scientific belief system. Science proceeds by experiments that test hypotheses and gradually networks of beliefs (theories) arise. As more data are found that are consistent with the theory, belief in the theory strengthens. If data are found that disagree, then the theory should, in principle, be weakened. Sometimes this weakening of belief does not work as it should because of the way people treat theories. Within the life sciences there are large research groups and smaller research groups often working in the same field. The larger groups often produce large numbers of papers and attain a preeminent status in the field. Sometimes theories produced by these groups assume dominance and are awkward to dislodge. It can even become difficult to publish data that disagree with the prevailing theory and there may be a reluctance to interpret data in terms of other frameworks. Sometimes data that disagree with the dominant theory don’t get published at all.
Strong theories can also have insidious effects on the way experiments are performed and interpreted. Sometimes when analysing data there can be a tendency for a researcher to reject data points that “don’t fit”, and it is essential to avoid such bias. It is for this reason that, where drugs are tested on humans, the randomised controlled trial is the only acceptable method. This includes randomisation of subjects between, for example, control (placebo) and treatment groups. Ideally, there should also be “blinding”, where neither subject nor observer knows which group a subject is in, and there should be enough subjects enrolled in the trial to eliminate chance effects. This design provides a way to reduce bias to a minimum so that a fair appraisal of the effects of a drug can be obtained.
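The randomisation step described above can be sketched in a few lines; the subject identifiers and group sizes here are hypothetical, and a real trial would use a concealed allocation sequence rather than a visible one:

```python
import random

random.seed(7)  # illustration only; real allocation must be concealed

# Twenty hypothetical subjects to be split between the two arms.
subjects = [f"S{i:03d}" for i in range(1, 21)]
random.shuffle(subjects)

# Split the shuffled list into equal-sized placebo and treatment arms,
# so allocation does not depend on any characteristic of the subject.
half = len(subjects) // 2
placebo, treatment = subjects[:half], subjects[half:]

print("placebo:  ", placebo)
print("treatment:", treatment)
```

Blinding is then a bookkeeping matter: the mapping from subject to arm is held by a third party so that neither the subjects nor the observers recording outcomes can see it.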
The success of the randomised controlled trial system also depends on full disclosure of data. So, for example, if a pharmaceutical company performs ten trials of a new drug in different parts of the country, all of those trial data should be made public. An editorial and associated papers in the British Medical Journal this week show that full disclosure of trial data is not occurring. The results of many trials are never disclosed, leading to deficits in knowledge. This is a form of bias and in some cases leads to misleading conclusions (positive and negative) about the effects of a drug.