The paradox of prediction
That prediction is one of the main goals of science goes without saying, but how exactly predictive success contributes to the epistemic value of a scientific theory has proven to be an awfully complicated philosophical riddle for centuries. By ‘epistemic value’ I don’t mean something for which we can offer a formal definition (that is a philosophical problem in its own right, one I have occasionally touched on here), but we can accept without much discussion that the probability that the theory is (approximately) correct constitutes an important part of that value. The question, hence, is why we are entitled to think (if we are) that a theory that makes good predictions is more probably right (or approximately right) than a theory that makes fewer good predictions, even when neither theory makes bad, i.e., unsuccessful, predictions. In particular, surprising predictions tend to make us very confident in the (at least approximate) validity of the theory that has been capable of making them, and the history of science is full of examples of this surprisingness criterion at work, pushing a scientific community to accept a theory towards which strong doubts had previously existed: one can cite Thomas Young’s wave theory of light and its prediction of bright spots in circular shadows, Mendeleev’s periodic table and its prediction of new elements, etc.
With a formal example: if theory A predicts E, theory B predicts not-E, and we observe E, then A will have made a successful prediction whereas B will have been falsified, so we know that B is false, though perhaps A is true; i.e., we can affirm something like p(B|E) = 0, whereas p(A|E) > 0. In this case, the epistemic value of A would obviously be greater than that of B… but this is not the problematic case. The philosophical problem arises when neither the predictions of A nor those of B have been falsified, i.e., the predictions of both theories are successful, but A has made more successful predictions than B: let A predict E and F, and let B predict E (while being silent about whether F or not-F); if both E and F are observed, can we affirm that p(A|E&F) > p(B|E&F)? Not necessarily: the answer will depend, for example, on the ‘prior’ probabilities of A and B, which are difficult to assess. The problem is even bigger if the predictions of B are not a subset of the predictions of A, e.g., if B predicts G but says nothing about E or F, while A says nothing about G.
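To see concretely how the comparison hinges on the priors, here is a minimal Bayesian sketch. All the numbers are purely illustrative assumptions (in particular the priors and the value p(F|B) = 0.5, which stands for B’s silence about F), not part of the argument above:

```python
def posterior_ratio(prior_A, prior_B, lik_A=1.0, lik_B=0.5):
    """Ratio p(A|E&F) / p(B|E&F) via Bayes: p(T|D) is proportional
    to p(D|T) * p(T).  A entails both E and F, so p(E&F|A) = 1;
    B entails E but is silent about F, modelled here (an assumption)
    as p(E&F|B) = 0.5."""
    return (lik_A * prior_A) / (lik_B * prior_B)

# With equal priors, A's extra successful prediction wins out:
print(posterior_ratio(0.5, 0.5))   # 2.0, so p(A|E&F) > p(B|E&F)

# But if B is a priori three times as plausible as A, the ratio
# drops below 1 and B remains the more probable theory:
print(posterior_ratio(0.25, 0.75))
```

The point of the toy calculation is simply that observing E and F does not by itself settle which posterior is larger: the verdict flips depending on priors we have no uncontroversial way to fix.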
That’s enough formalism for this entry; let’s give a more vivid example. Suppose that there is a big lab producing an array of empirical findings about a Very Important Scientific Problem (VISP). The lab regularly publishes its findings, but not all countries can afford access to that publication; in particular, there is a Very Poor Country with some Very Smart Theoretical Scientists who work on VISP, developing theories that make empirical predictions, but who have to wait several years until, thanks to the expiration of the copyright on the pertinent publications, they can confirm whether their predictions have been successful or not. In parallel, there is a Very Wealthy Country with a flock of Very Mediocre Theoretical Scientists who also work on VISP but who, thanks to the generous funds provided by their government, have immediate access to the lab’s publication. These scientists are so dull, or have such bad luck, that almost every theory they propose to explain the data published by the lab up to moment t is falsified by the next published datum at t+1. They have to revise and modify their theories with more or less ad hoc changes to accommodate both the old and the new data… until the next piece of evidence arrives and forces them to do the same again. When the last datum arrives and forces the last change, they modify their theory once more to take it into account, and end up proposing a theory they consider ‘definitive’, which we’ll call A.
The researchers of the Very Poor Country, instead, knowing nothing about what empirical discoveries the lab is making, work hard to develop the best theory they can, on the basis of their intuitions about physical principles or the like, and propose a theory, B, which they keep in a drawer until the data arrive, years later. Not without having suspected that this would happen, they discover that B’s predictions fit exactly (within the experimental margin of error) the published data. Another amazing triumph of the wise scientists.
The question is: which theory is better, A or B? By hypothesis, both ‘predict’ exactly the same empirical facts, with the only difference that B did it before the relevant facts were known, whereas A is the result of a patient and tiresome replacement and ad hoc modification of previously falsified theories. We are not surprised by the ‘success’ of A, whereas the success of B is really amazing; but how does this surprisingness (or the lack of it) affect the epistemic value of each theory, once the data are there? Perhaps we can say that the surprisingness of B can be taken as an indirect indicator of future predictive success (i.e., if the lab starts running more experiments next year, we would bet more on the success of B’s new predictions than on the success of those of A), in the way that, for example, the Akaike Information Criterion is often interpreted [1]. But what if A and B happen to be exactly one and the same theory, formally equivalent to each other? Suppose we knew nothing about the Very Poor Country and its wise scientists, and only knew the story about the wealthy ones: in this case, their theory A (which, do not forget, is little more than a boring system of equations) would be just an uninspiring set of formulae on whose future success we wouldn’t bet too much. Then we learn that in the poor country the same theory was discovered before the data were produced and published, and suddenly our faith in the future success of A increases enormously. But exactly why is something we don’t know.
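As an aside, the reading of the Akaike Information Criterion mentioned above (a score estimating future predictive accuracy, which penalizes the kind of parameter-heavy, ad hoc fitting practiced by the wealthy scientists) can be sketched numerically. The sample size, residual sums of squares, and parameter counts below are invented for illustration only:

```python
import math

def aic(n, rss, k):
    """AIC for a least-squares fit with Gaussian errors, up to an
    additive constant: 2k penalizes the number of free parameters,
    n*ln(rss/n) rewards goodness of fit.  Lower AIC = better
    estimated predictive accuracy on future data."""
    return 2 * k + n * math.log(rss / n)

# Invented numbers: a lean theory (2 parameters) fitting slightly
# worse vs. a heavily patched theory (8 parameters) fitting a bit
# better on the same 20 data points.
lean = aic(n=20, rss=10.0, k=2)
patched = aic(n=20, rss=9.0, k=8)
print(lean < patched)  # True: the lean theory scores better
```

On this criterion the marginally better fit of the patched theory does not compensate for its extra parameters, which is one formal way of cashing out the intuition that accommodation is worth less than prediction.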
Does this mean that our estimation of the probability of A (or of B, which is the same theory) given the available empirical evidence depends in part on the temporal order in which the evidence was collected and the theory was devised? Perhaps it does, though we lack a convincing philosophical, logical or mathematical explanation of why. In any case, most of the authors who have dealt with this problem prefer not to accept this conclusion [2][3]. Many of them opt for some different explanation, such as that predictive success (as something additional to the mere post hoc accommodation of empirical data) is not an epistemic value per se, but an indication of some other epistemic value. For example, perhaps the fact that previous knowledge of the empirical data to be accommodated was not taken into account in the process of devising the theory indicates something about the simplicity, or the prior probability, or the theoretical plausibility of the theory (this would demand another hard philosophical discussion about what ‘theoretical plausibility’ is and how it differs from ‘empirical plausibility’, or from ‘plausibility’, full stop), or about some other, less obvious theoretical or pragmatic virtue theories must have. As with every intriguing problem, any ideas are welcome.
- [1] Gelman, A., J. Hwang, and A. Vehtari, 2013, “Understanding predictive information criteria for Bayesian models”, Statistics and Computing, published online, 20 August 2013.
- [2] Douglas, H., and P. D. Magnus, 2013, “State of the field: why novel predictions matter”, Studies in History and Philosophy of Science, 44, 580-589.
- [3] Mayo, D., 2014, “Some surprising facts about (the problem of) surprising facts”, Studies in History and Philosophy of Science Part A, 45, 79-86. DOI: 10.1016/j.shpsa.2013.10.005.