The formation of the new Presidential Advisory Commission on Election Integrity raises an interesting question. If there were cheating in the US presidential election, or any election, what would it look like? It is easy to visualize – if you can get the right data – but can be complicated to interpret.
This blogger does not have an opinion about whether the 2016 US presidential election outcome was influenced by cheating of any kind. Yet, the issue has already been raised by members of both traditional parties, so it makes sense to think about what cheating might look like. Doing so might help us have a more informed debate, if there has to be one at all.
There are all sorts of ways to cheat – before and during the election – but the end result has to be the same if the cheating is to be effective: the cheaters must ensure that their preferred candidate A gets more votes than the competition, B. This could be done simply by adding votes for A (ballot-stuffing), but adding too many will make voter turnout look suspiciously high. Alternatively, subtracting votes only from B (suppressing B’s voters and/or dumping ballots cast for B) would make turnout look too low. Thus, it might be better simply to move votes from B to A without changing the turnout rate.[1] If there is a strong third candidate C, one could also shift votes from B to C and from C to A.
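To make the turnout argument concrete, here is a small Python sketch using made-up numbers for one hypothetical precinct of 1,000 eligible voters in which A and B each honestly receive 300 votes. Each manipulation hands A a 100-vote lead, but only shifting votes from B to A leaves turnout where it was. The `summarize` helper and all the figures are illustrative assumptions, not data from any real election.

```python
def summarize(eligible, votes_a, votes_b):
    """Return (turnout %, A's share %, A's margin in votes) for one precinct."""
    total = votes_a + votes_b
    return 100 * total / eligible, 100 * votes_a / total, votes_a - votes_b

# Hypothetical precinct: 1,000 eligible voters, 300 honest votes each for A and B.
ELIGIBLE, A, B = 1000, 300, 300

scenarios = {
    "honest": summarize(ELIGIBLE, A, B),
    "stuffing": summarize(ELIGIBLE, A + 100, B),      # add 100 fake ballots for A
    "suppression": summarize(ELIGIBLE, A, B - 100),   # remove 100 ballots cast for B
    "shifting": summarize(ELIGIBLE, A + 50, B - 50),  # move 50 ballots from B to A
}

for name, (turnout, share_a, margin) in scenarios.items():
    print(f"{name:<12} turnout {turnout:5.1f}%   A share {share_a:5.1f}%   A margin {margin:+d}")
```

Turnout jumps from 60 to 70 percent with stuffing and drops to 50 percent with suppression, while shifting keeps it at 60 percent and still produces the same 100-vote margin.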
These sorts of manipulations are easy to visualize – but only if you have data at the level of each voting precinct. I’ll show you two ways.
- The first option is to create histograms: one for voter turnout, expressed as a share of the voting age population, and another for candidate vote shares, expressed as percentages of total votes. Ideally, each should show a symmetrical clustering around one specific average value. Under most circumstances, the turnout data will cluster around some average. In recent years, that average has been somewhere between 50 and 60 percent of the total voting age population of any given US state. When viewed at the precinct level, and after subtracting those who are ineligible to vote (an important topic in itself), the range is wider, between 50 and 90 percent. In most cases, the vote shares also cluster around some average. There is no rule for what that average should be; it depends on the interaction between the voters and the candidates. The main thing is that there is clustering around a single average.
- The second option is to create a scatter graph with voter turnout (as a share of the voting age population) on one axis and candidate vote shares (as percentages of total votes) on the other. If the histograms for each are close to symmetrical, then almost all of the data points could fit within a circle.
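With precinct-level data in hand, both views take only a few lines to build. The Python sketch below uses only the standard library; the precinct tuples are invented for illustration, the 5-point and 10-point bin widths are arbitrary choices, and the coarse grid of counts is a text stand-in for a real scatter graph (a plotting library such as matplotlib would normally draw the pictures).

```python
from collections import Counter

def precinct_rates(eligible, votes_a, votes_b):
    """Turnout (% of eligible voters) and A's vote share (% of votes cast)."""
    total = votes_a + votes_b
    return 100 * total / eligible, 100 * votes_a / total

def histogram(values, bin_width=5):
    """Number of precincts in each bin; bins are labeled by their lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def scatter_grid(turnouts, shares, bin_width=10):
    """Coarse stand-in for a scatter graph: precincts per (turnout, share) cell."""
    cells = Counter(
        (int(t // bin_width) * bin_width, int(s // bin_width) * bin_width)
        for t, s in zip(turnouts, shares)
    )
    return dict(sorted(cells.items()))

# Hypothetical precincts: (eligible voters, votes for A, votes for B).
precincts = [(1000, 310, 290), (800, 260, 240), (1200, 330, 350), (900, 280, 250)]
turnouts, shares = zip(*(precinct_rates(*p) for p in precincts))

for low, n in histogram(turnouts).items():
    print(f"turnout {low:3d}-{low + 5:<3d} {'#' * n}")
for low, n in histogram(shares).items():
    print(f"A share {low:3d}-{low + 5:<3d} {'#' * n}")
print(scatter_grid(turnouts, shares))
```

With thousands of real precincts instead of four, an honest election should give each histogram a single hump and concentrate the grid counts in one compact region.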
The resulting graphs for an honest election might look something like those in Figure 1:
Systematic cheating on turnout would show up as lots of precincts with unusually high turnout rates, unusually low turnout rates, or both. The graphs in Figure 2 below show a simulated example. Precincts with low turnout rates are suppressed, while those with high turnout rates are amplified. This shifts the turnout histogram and the scatter graph to the right.
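A simulation along these lines can be sketched in Python. This is not the procedure behind Figure 2; it models only one hypothetical form of turnout cheating (inflating turnout in a random 20 percent of precincts by 15 points), and the normal distribution with a 65 percent mean and an 8-point spread is likewise an assumption chosen for illustration.

```python
import random
import statistics

random.seed(2016)  # fixed seed so the sketch is reproducible
N = 2000

# Assumed honest baseline: turnout ~ Normal(65%, sd 8 points), clipped to 0-100%.
honest = [min(100.0, max(0.0, random.gauss(65, 8))) for _ in range(N)]

# Hypothetical cheating: in a random 20% of precincts, turnout is inflated
# by 15 points (capped at 100%), as ballot-stuffing would do.
manipulated = list(honest)
for i in random.sample(range(N), k=N // 5):
    manipulated[i] = min(100.0, manipulated[i] + 15)

def share_above(values, threshold):
    """Fraction of precincts whose turnout exceeds threshold percent."""
    return sum(v > threshold for v in values) / len(values)

for label, data in (("honest", honest), ("manipulated", manipulated)):
    print(f"{label:<12} mean turnout {statistics.mean(data):5.1f}%   "
          f"precincts above 85%: {100 * share_above(data, 85):4.1f}%")
```

The mean moves right by a few points, but the more telling sign is the growing tail of precincts with implausibly high turnout.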
If there were systematic shifting of votes from candidate B to candidate A, the vote share histograms might look like those in Figure 3, with the curve for B shifted to the left and the curve for A shifted to the right. A’s vote shares also rise in the scatter graph.
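The next Python sketch shows why this kind of cheating hides from the turnout histogram but not from the vote share histogram. Everything in it is assumed for illustration: the distributions of precinct size, turnout, and A’s share, and the rule of moving 10 percent of B’s votes to A in every third precinct.

```python
import random
import statistics

random.seed(7)  # reproducible

def clip(x, low=0.05, high=0.95):
    """Keep simulated rates inside a plausible range."""
    return min(high, max(low, x))

# Hypothetical honest precincts: (eligible voters, votes for A, votes for B).
precincts = []
for _ in range(1000):
    eligible = random.randint(500, 1500)
    cast = round(eligible * clip(random.gauss(0.60, 0.07)))
    votes_a = round(cast * clip(random.gauss(0.45, 0.06)))
    precincts.append((eligible, votes_a, cast - votes_a))

def shift_votes(precinct, fraction):
    """Move `fraction` of B's votes to A; the number of ballots is unchanged."""
    eligible, a, b = precinct
    moved = round(b * fraction)
    return eligible, a + moved, b - moved

# Assumed manipulation: 10% of B's votes go to A in every third precinct.
shifted = [shift_votes(p, 0.10) if i % 3 == 0 else p for i, p in enumerate(precincts)]

def turnout(p):
    eligible, a, b = p
    return 100 * (a + b) / eligible

def share_a(p):
    _, a, b = p
    return 100 * a / (a + b)

for label, data in (("honest", precincts), ("shifted", shifted)):
    print(f"{label:<8} mean turnout {statistics.mean([turnout(p) for p in data]):5.1f}%   "
          f"mean A share {statistics.mean([share_a(p) for p in data]):5.1f}%")
```

Turnout is identical precinct by precinct, so only the vote share histograms and the vertical position of the scatter points reveal the change.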
Here is a real-life example of possible vote fraud from the Russian parliamentary election of 2011, as alleged by Samarcand Analytics (now defunct?), Ruben Enikolopov and colleagues, Oleg Kapustenko, Peter Klimek and colleagues, and several others. Note how the dots for the United Russia party show the vote share rising with turnout, while the vote share for the Communist Party does the opposite. This suggests there might have been some manipulation of turnouts and vote shares. Note also the thickening density of vote shares in the upper right corner. Under normal circumstances, it is very unusual to see a cluster of polling stations with 100 percent voter turnout and 100 percent of the vote going to one candidate or party. This can be seen even more clearly in the histograms for vote shares in the right-hand panel. It could signal a problem, especially if that pattern has not been a regular feature in previous election outcomes.
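The two warning signs described above, vote share rising with turnout and a pile-up of polling stations near 100 percent on both axes, can be put into numbers. In this hedged Python sketch the station data are invented, the 95 percent threshold is an arbitrary choice, and a plain Pearson correlation is a simple stand-in, not the method used by Klimek and colleagues or the other analysts cited.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def corner_fraction(turnouts, shares, threshold=95.0):
    """Fraction of polling stations with BOTH turnout and party share above threshold."""
    hits = sum(t > threshold and s > threshold for t, s in zip(turnouts, shares))
    return hits / len(turnouts)

# Hypothetical polling stations: (turnout %, party share %), invented for illustration.
stations = [(55, 40), (60, 45), (62, 48), (70, 55), (85, 75), (98, 99), (99, 97), (100, 100)]
turnouts = [t for t, _ in stations]
shares = [s for _, s in stations]

print(f"correlation of turnout with party share: {pearson(turnouts, shares):.2f}")
print(f"stations above 95% on both: {corner_fraction(turnouts, shares):.0%}")
```

A strongly positive correlation for one party, a negative one for its rivals, and a non-trivial corner fraction together match the Russian pattern, but each is a reason to look closer rather than proof of fraud.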
In theory, any US citizen can get access to comparable data at the voting precinct level. The job is not easy, because the US Constitution gives the states responsibility for administering elections, and not all states have organized databases of election results at the precinct level. The Harvard Data Archives have a lot, but not in easy-to-use formats. OpenElections.Net is attempting to provide the data in a more accessible form, and it is making good progress.
The trick is gathering it all together in a consistent format suitable for analysis. And, because of the Electoral College, that analysis needs to be done state by state rather than all in one go.
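A practical first step is to pull every precinct record into one structure keyed by state, so that each state can be analyzed on its own. In this Python sketch the field layout, state codes, and precinct IDs are hypothetical; real files from state election offices or OpenElections will need their own cleaning before they fit a shape like this.

```python
from collections import defaultdict

# Hypothetical records gathered from several sources:
# (state, precinct_id, eligible voters, votes for A, votes for B).
records = [
    ("PA", "001", 900, 300, 240),
    ("PA", "002", 1100, 410, 300),
    ("OH", "001", 800, 250, 270),
]

def by_state(rows):
    """Group precincts by state; precinct IDs are assumed unique only within a state."""
    groups = defaultdict(dict)
    for state, precinct_id, eligible, a, b in rows:
        if precinct_id in groups[state]:
            raise ValueError(f"duplicate precinct {precinct_id} in {state}")
        # Store (turnout %, A's share %) for later histograms and scatter graphs.
        groups[state][precinct_id] = (100 * (a + b) / eligible, 100 * a / (a + b))
    return dict(groups)

for state, precincts in by_state(records).items():
    print(state, len(precincts), "precincts")
```

Keying on state first keeps identical precinct labels in different states from colliding and matches the state-by-state analysis the Electoral College requires.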
If you can get your hands on such data, don’t get too suspicious. Strange patterns in vote outcomes can have innocent explanations. For example, some parts of a country or state might have different voting tendencies, as in Canada between the English-speaking and French-speaking regions. This is definitely the case for the recent, highly partisan US voting patterns of urban black voters and rural white voters. In some urban voting precincts (“divisions”) there are no (or very few) registered Republicans, so elections in those places end up with zero (or very few) votes for Republican candidates.[2] Similarly, there are rural precincts where no one voted for the Democratic candidate.[3]
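A quick Python simulation shows how the mix of places alone can break the single-average pattern. It assumes, purely for illustration, one state with 400 urban precincts where A averages 85 percent and 600 rural precincts where A averages 30 percent, with no cheating anywhere.

```python
import random
from collections import Counter

random.seed(3)  # reproducible

# Assumption: an honest election in a state with 400 urban precincts where
# A averages 85% of the vote and 600 rural precincts where A averages 30%.
urban = [random.gauss(85, 5) for _ in range(400)]
rural = [random.gauss(30, 5) for _ in range(600)]
shares = [min(100.0, max(0.0, s)) for s in urban + rural]

# 10-point bins; a 100% share is folded into the top (90-100) bin.
counts = Counter(min(int(s // 10) * 10, 90) for s in shares)
for low in range(0, 100, 10):
    print(f"A share {low:3d}-{low + 10:<3d} {'#' * (counts[low] // 10)}")
```

The vote share histogram comes out with two humps and an almost empty middle. Splitting the data by region, or comparing with earlier elections in the same precincts, is how such a pattern would be explained rather than flagged.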
Independent evidence can provide needed validation. Sometimes a single test will give a misleading result; a second or third test might get you closer to the truth. The alternatives include expert election observers, comparisons with previous elections to check for changes in patterns, Walter Mebane’s statistical tests, expert surveys, and performance indices.
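Of these alternatives, a statistical test is the easiest to sketch in code. One test associated with Walter Mebane’s work compares the second digits of precinct vote counts with the frequencies implied by Benford’s Law (often called the second-digit or “2BL” test). The Python below is a simplified, hypothetical illustration, not Mebane’s own implementation: the vote counts are invented, a plain Pearson chi-square stands in for his procedures, and with only a dozen precincts the chi-square approximation is unreliable.

```python
import math
from collections import Counter

def benford_second_digit():
    """Benford's Law probability that the second significant digit equals d (0-9)."""
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)}

def second_digit_chi2(vote_counts):
    """Pearson chi-square of observed second digits against Benford (9 degrees of freedom).

    Counts below 10 have no second digit and are skipped.
    """
    digits = [int(str(c)[1]) for c in vote_counts if c >= 10]
    n = len(digits)
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p)
               for d, p in benford_second_digit().items())

# Hypothetical precinct vote counts for one candidate.
counts = [312, 458, 127, 980, 264, 73, 541, 1190, 385, 206, 649, 88]
chi2 = second_digit_chi2(counts)
# 16.919 is the 5% critical value of chi-square with 9 degrees of freedom.
print(f"chi-square = {chi2:.2f} ({'flag' if chi2 > 16.919 else 'no flag'} at the 5% level)")
```

Like any single test, a flag here is a prompt for the other checks listed above, not a verdict.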
It’s also possible that there are all sorts of administrative or technical issues. Professor Pippa Norris lists quite a few possibilities in her book “Why American Elections Are Flawed.” This book is short, easy to read, and available electronically or in paperback: I highly recommend it. Her analysis is followed by some practical suggestions on how to fix the system. Her new book, “Strengthening Electoral Integrity,” expected out by the end of August, provides still more practical solutions. It’s tempting to write a follow-up blog post on her diagnosis and proposals sometime soon.
Want to Learn More?
Alvarez, R.M., Hall, T.E. and Hyde, S.D. eds., 2009. Election Fraud: Detecting and Deterring Electoral Manipulation. Brookings Institution Press.
Enikolopov, R., Korovkin, V., Petrova, M., Sonin, K. and Zakharov, A., 2013. Field experiment estimate of electoral fraud in Russian parliamentary elections. Proceedings of the National Academy of Sciences, 110(2), pp. 448-452.
Kapustenko, O., 2011. Theory and practice of falsified elections. Statistical Institute for Democracy.
Klimek, P., Yegorov, Y., Hanel, R. and Thurner, S., 2012. Statistical detection of systematic election irregularities. Proceedings of the National Academy of Sciences, 109(41), pp. 16469-16473.
Norris, P., 2016. Why American elections are flawed (and how to fix them). Cornell University Press.
End Notes:
[1] This would have to be done after all votes are counted but before the final results are officially reported.
[2] See cases in Cleveland and Philadelphia.
[3] Richard Pildes, 2016. Obama Got Zero Votes in At Least 38 Precincts in 2012. Election Law Blog.