I remember reading about how poorly a Windows 95 trial group was doing with their early copy. The team could not figure out why the computers were crashing so often and why users were complaining so much. It turned out to be a gap between how the engineers assumed the product would be used and how it was actually being used.
The engineers who designed the product assumed that, because they always shut down the PC properly, all users would too. Users, on the other hand, were used to simply switching things off, the way they turned off their TV.
In this case, the assumptions were incorrect, and the results were wildly different from what was expected. In a way, this is the same thing that led forecasters to so badly miss a Trump victory. All big data calculations have assumptions built into them, and when the underlying assumptions are wrong, things can turn out very differently than forecasted.
One of the things that most people who like numbers (like me) find appealing is their relative predictability. The numbers are always the same, as are the formulas. However, two people may look at the same thing, say an investment, and come up with totally different predictions from the same numbers. The underlying math does not change. What does change is the assumptions: will the market be hotter than expected, might there be a change in the economy, stuff like that. When you take those things into account, it makes sense that there is a huge possibility for differences in opinion and a wide variance in potential outcomes.
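To make that concrete, here is a minimal sketch of two analysts running the exact same compound-growth formula on the same investment. The starting amount, the growth rates, and the ten-year horizon are all numbers I made up for illustration, not real forecasts.

```python
def future_value(principal, annual_growth, years):
    """Compound growth. The formula is identical for every analyst."""
    return principal * (1 + annual_growth) ** years


# Hypothetical inputs: same investment, same horizon, same math.
principal = 10_000
years = 10

# The only difference is the assumed growth rate (both values are invented).
cautious = future_value(principal, 0.03, years)    # assumes a cool market
optimistic = future_value(principal, 0.08, years)  # assumes a hot market

print(f"cautious analyst:   {cautious:,.0f}")
print(f"optimistic analyst: {optimistic:,.0f}")
print(f"gap between them:   {optimistic - cautious:,.0f}")
```

Neither analyst made a math error. The gap comes entirely from the one input that nobody can know in advance.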
Take a traffic system. The traffic patterns of a city are usually fairly consistent. At 8 am each weekday morning, traffic picks up, so the lights stay green a bit longer in one direction to better handle the flow, and at 9 am they go back to normal. Under regular conditions, this pattern works well. But what happens when something unexpected comes along?
What if a water main bursts, forcing traffic in another direction? What if a school changes its start time, pushing hundreds of cars into an earlier time slot? What if construction changes the pattern by slowing down drivers up the road? Some of these changes are short term, but others may have a longer impact.
When you add these things in, your first assumptions may be wildly off, or they may not make a noticeable difference. This is the problem with Big Data: we can be perfectly precise when the data has no unexpected variables, but little in life is like that.
This is the case with Trump’s win. Models were based on traditional levels of turnout by certain demographics, but in this case Trump’s supporters turned out much more than expected and Clinton’s turned out slightly less. With those numbers factored in, the models would likely have called the race correctly, but that is only in hindsight.
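Here is a rough sketch of how much a turnout assumption can move a forecast. This is not how any real polling model works. The two groups, their sizes, their candidate preferences, and every turnout rate below are invented purely to show the mechanism.

```python
def projected_margin(groups, turnout):
    """Return candidate A's vote margin over candidate B.

    groups:  {name: (eligible_voters, share_preferring_A)}
    turnout: {name: fraction_of_eligible_voters_who_vote}
    """
    margin = 0.0
    for name, (eligible, share_a) in groups.items():
        votes = eligible * turnout[name]
        margin += votes * share_a - votes * (1 - share_a)
    return margin


# Hypothetical electorate: preferences are held fixed in both scenarios.
groups = {
    "group_x": (1_000_000, 0.40),  # leans toward candidate B
    "group_y": (1_000_000, 0.62),  # leans toward candidate A
}

# Hypothetical turnout: what the model assumed vs. what happened.
assumed = {"group_x": 0.50, "group_y": 0.60}
actual = {"group_x": 0.65, "group_y": 0.52}

print(f"forecast margin for A: {projected_margin(groups, assumed):+,.0f}")
print(f"actual margin for A:   {projected_margin(groups, actual):+,.0f}")
```

Nobody in this toy example changed their mind about who they preferred. One group showed up much more than assumed, the other a little less, and the sign of the result flipped.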
What does this all mean? It means that while Big Data has had some incredible impacts that have helped our day-to-day lives, it is not perfect. Life simply requires too many assumptions and throws up too many variables for things to be run entirely on Big Data. In the case of an election, emotion and passion cannot be properly calculated, and that variable may well have changed things altogether. With the correct variables, Clinton might have put more effort into Michigan and Wisconsin, but since she relied on the models, she fatally did not.