
As a mild-mannered statistics professor, it’s not often that I get contacted directly by the CEO of a multi-million-dollar company, much less regarding allegations of cheating and malfeasance among world champions.
But that’s precisely what happened last summer. Erik Allebest, CEO of the world’s largest online chess site, Chess.com, asked me to investigate former world chess champion Vladimir Kramnik’s concerns about the long winning streaks of top player Hikaru Nakamura.
Kramnik argued that these streaks had very low probability and were therefore very suspicious and “interesting.” He didn’t quite accuse Hikaru of cheating, but the implication was clear. Feelings were running high, with Kramnik’s supporters posting angry comments (often in Russian) about cheating, while many Chess.com players and Hikaru partisans dismissed the accusations.
Who was right? Who was wrong? Who could say?
Allebest asked me to conduct an independent, unbiased statistical analysis to see just how unlikely those chess winning streaks actually were.
Now, I am no stranger to public statistical disputes, having published a best-selling book about everyday probabilities and conducted the statistical analysis for the high-profile lottery retailer scandal. But could statistical analysis really help to clarify this simmering controversy on the world’s biggest chess stage?
Calculating probabilities
To sort this out, I first had to calculate the probability of each player winning or tying each game. Different players can have very different abilities, and more advanced players have a greater chance of defeating less experienced opponents. But just how great?
Chess.com assigns a chess rating to each player after each game, and these ratings were shared with me. My analysis suggested that a certain logistic — or s-shaped — curve function provided an accurate estimate of each game’s probabilities.
Furthermore, deviations from this probability in successive game results were approximately independent, so the influence of one game on the next could be safely ignored. This gave me a clear probability of each player winning each game.
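To give a feel for what a logistic rating curve looks like, here is a minimal sketch. It uses the standard Elo expected-score formula with a 400-point scale, which is an assumption made for illustration; the curve actually fitted to the Chess.com ratings may have a different shape and scale. The function name `expected_score` is hypothetical, and an expected score counts a draw as half a win, so it is not quite the same thing as a pure win probability.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style logistic expected score for player A against player B.

    Assumption: the 400-point scale is the conventional Elo value, not a
    parameter taken from the analysis described in this article.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Illustrative rating gaps (made-up numbers, not real player ratings).
for gap in (0, 200, 400, 800):
    print(gap, round(expected_score(2000 + gap, 2000), 3))
```

The s-shape means that the expected score climbs quickly for moderate rating gaps and then flattens out near 1, which is why a top player facing much weaker opponents is expected to win nearly every game.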
I could then analyze those winning streaks that had provoked so much ire. It turned out that Hikaru, unlike most other top players, had played lots of games against much weaker players. This gave him a very high probability of winning each game. But even so, should he have such long winning streaks, sometimes more than 100 games in a row?
Testing randomness
To check this, I conducted some Monte Carlo simulations, which repeat a test with random variations.
I wrote computer programs to randomly assign wins and losses and draws to each of Hikaru’s games, according to the probabilities from my model. I had the computer measure the most surprising winning streaks each time. This allowed me to measure how Hikaru’s actual streaks stacked up against what we should expect.
I found that in many of the Monte Carlo simulations, the simulated results included streaks just as unlikely as the actual ones. This demonstrated that Hikaru’s chess results were just about what might be expected. He had such a high probability of winning each game, and had played so many games on Chess.com, that such long winning streaks were likely to emerge according to the rules of probability alone.
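The idea behind such a simulation can be sketched in a few lines. This is a simplified illustration, not the programs used in the analysis: it treats each game as a win or a non-win (lumping draws in with losses), tracks the longest streak rather than the most surprising one, and uses a made-up list of per-game win probabilities. All function names and numbers here are hypothetical.

```python
import random


def longest_win_streak(results):
    """Return the length of the longest run of consecutive wins."""
    best = current = 0
    for won in results:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def simulate_longest_streaks(win_probs, n_sims=1000, seed=0):
    """Replay the whole game history n_sims times, each game decided
    independently at random, and record the longest streak in each replay."""
    rng = random.Random(seed)
    return [
        longest_win_streak(rng.random() < p for p in win_probs)
        for _ in range(n_sims)
    ]


def share_at_least(simulated, observed):
    """Fraction of simulated histories with a streak at least as long as observed."""
    return sum(s >= observed for s in simulated) / len(simulated)


# Hypothetical input: 3,000 games, each won with probability 0.97.
win_probs = [0.97] * 3000
simulated = simulate_longest_streaks(win_probs, n_sims=500)
print(share_at_least(simulated, observed=100))
```

If a large share of the simulated histories contain a streak at least as long as the one actually observed, then the observed streak is not surprising under the model. In a real analysis each game would have its own probability, taken from the rating curve, rather than a single fixed value.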
Responses to findings
I wrote up a brief report of my findings, and sent it to Chess.com.
It ran a news item on its site, which elicited many comments, mostly supportive.
Hikaru then posted his own video commentary, also supporting my analysis. But meanwhile, Kramnik posted a 29-minute video criticizing my research.
Kramnik did include some substantive points, so I wrote an addendum to my report to address his concerns and show that they would not affect the conclusion. I also converted my report into a formal paper, which I submitted to a research journal.
I then got busy with my teaching duties and put the chess controversies out of my mind until I received a response in December. It consisted of three referee reports and editor comments, with detailed comments totalling six single-spaced pages.
I also discovered that Kramnik had posted a second video, this one 59 minutes long, critiquing my addendum and raising additional points.
I addressed Kramnik’s and the referees’ additional points while revising my article for publication. My paper was finally published in the Harvard Data Science Review.
I was glad to have my findings published in a prestigious statistics journal, thus giving them a formal stamp of approval. And perhaps, at long last, to settle this particular champion-level chess controversy.
This article is republished from The Conversation under a Creative Commons license.