How do statistical processes and machine learning complement one another? Can software really learn by itself? Can the future be foreseen? Our colleague, data scientist Dr. Florian Wilhelm, discussed these questions with mathematician PD Dr. Gudrun Thäter, head academic advisor at the Karlsruhe Institute of Technology (KIT), as part of a German podcast.
PD Dr. Gudrun Thäter sees big data as an "unordered mountain of stuff; a bit useless." Predictive analytics sounds better. We also know where that is headed: forecasts that practitioners from business and industry can actually use, such as sales forecasts for materials planners, estimates of the behavior of specific customer groups, and the effectiveness of sales promotions. Her discussion partner, senior data scientist Dr. Florian Wilhelm, describes how predictive analytics works: on the one hand, there are heuristic processes such as neural networks; on the other hand, statistical models. Blue Yonder combines both approaches to attain optimal forecasts.
Statistical models can be used when one knows a lot about the subject to be analyzed. Materials planners, for example, know the relevant relationships from their own experience: in warm weather, people buy barbecue meat and charcoal, and often a salad or two ends up in the grocery cart as well. Taken together, these experiences can be packaged into models. Data scientists estimate actual sales, and thus demand, by defining input variables, i.e. explanatory variables, for a model or a heuristic process. Determining the input variables for a model is always "trial and error" combined with practical experience, says Dr. Wilhelm.
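The idea of explanatory variables can be sketched with a deliberately simple statistical model: a one-variable linear regression that relates a hypothetical temperature feature to barbecue sales. The feature and all the numbers below are purely illustrative, not Blue Yonder's actual model.

```python
# Minimal sketch: demand as a function of a single explanatory variable.
# A real model would use many variables (weather, weekday, promotions, ...).

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical past observations: daily high (degrees C) vs. units sold.
temperature = [12, 18, 22, 27, 30]
units_sold = [40, 70, 95, 120, 140]

intercept, slope = fit_line(temperature, units_sold)

# Forecast demand for a warm day of 25 degrees C.
forecast = intercept + slope * 25
print(round(forecast))
```

The warmer-weather-means-more-sales experience of the planner shows up directly as a positive slope, which is exactly what makes such a model easy to explain.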
Enormous data volumes in the "black box"
The connection between cause (the input variables) and effect is always obvious in statistical processes. In machine learning, which is often based on neural networks, that connection is much harder to see. The software acts as a "black box": it is fed enormous data volumes, processes them in a self-learning way, and makes ever more accurate forecasts. The forecasts are excellent, but "you really don't know what is happening inside," says Dr. Wilhelm. To get user groups such as materials planners to trust the solution, it is important to explain the hidden relationships to them.
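One common way to shed light on such a black box is permutation importance: perturb one input variable at a time and measure how much the forecast quality degrades. The sketch below is our illustration of the general technique, not necessarily Blue Yonder's approach; the model function and features are hypothetical stand-ins.

```python
# Sketch of permutation importance for a black-box predictor.
# `black_box` stands in for a trained neural network whose internals
# we cannot inspect; features and numbers are purely illustrative.

def black_box(temperature, weekday):
    # Pretend this is an opaque, trained model.
    return 5.0 * temperature + 0.1 * weekday

rows = [(12, 1), (18, 2), (22, 3), (27, 4), (30, 5)]
targets = [black_box(t, w) for t, w in rows]  # noise-free for simplicity

def mse(predictions):
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)

baseline = mse([black_box(t, w) for t, w in rows])

importance = {}
for i, name in enumerate(["temperature", "weekday"]):
    # Rotate one column to break its link with the target
    # (a deterministic stand-in for random shuffling).
    col = [r[i] for r in rows]
    col = col[1:] + col[:1]
    perturbed = [tuple(col[j] if k == i else rows[j][k] for k in range(2))
                 for j in range(len(rows))]
    importance[name] = mse([black_box(t, w) for t, w in perturbed]) - baseline

print(importance)
```

Breaking the temperature column ruins the forecast, while breaking the weekday column barely matters, so a planner can be told which inputs actually drive the prediction even though the model itself stays opaque.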
In machine learning, if the data volume is not large enough, there is a danger that the software will overfit: it simply learns the entire input, including its coincidental deviations, "by heart" and then projects what it has memorized into the future. Data scientists therefore train the algorithms to recognize irrelevant input variables and ignore them.
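Learning the data "by heart" can be caricatured with two toy models: one memorizes every training point exactly, the other averages the noise away to estimate the underlying trend. All numbers are invented for illustration; the true relationship here is roughly y = 2x plus small coincidental deviations.

```python
# Sketch of overfitting: memorizing the training data vs. learning the trend.

train = {1: 2.3, 2: 3.8, 3: 6.1, 4: 8.2}  # noisy samples of y = 2x

def memorizer(x):
    # Overfit: reproduces every training point exactly, noise included,
    # and has no sensible answer for inputs it has never seen.
    return train.get(x, 0.0)

def trend_model(x):
    # Simple model: one slope estimated from all points, noise averaged out.
    slope = sum(y / x_ for x_, y in train.items()) / len(train)
    return slope * x

print(memorizer(3), trend_model(3))    # both do fine on seen data
print(memorizer(10), trend_model(10))  # only the trend model generalizes
```

On the training data the memorizer looks perfect, which is exactly the trap: its performance collapses on any future input, while the simpler model extrapolates sensibly.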
A total of 120 team members, mostly physicists, work together at sites in Karlsruhe and Hamburg, Germany, and in London to harmonize the various processes and to find the best solution for each individual application case. Now and then they struggle with the low quality of the delivered data, which can be insufficient for a clear forecast.
Dr. Florian Wilhelm stresses that predictive analytics merely forecasts the probability that a certain result will occur. "When the probability is high that a certain country will win the World Cup, that does not mean that the team really will raise the Cup at the end. Chance always plays a role. And we can't predict that."
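That a high probability is not a guarantee is easy to see in a tiny simulation. The 40% win probability below is a hypothetical forecast, not a real prediction for any team:

```python
import random

# A probabilistic forecast says how often an outcome occurs over many
# repetitions - it does not decide any single tournament.
random.seed(42)          # fixed seed for reproducibility
win_probability = 0.4    # hypothetical forecast for the favorite

tournaments = 10_000
wins = sum(random.random() < win_probability for _ in range(tournaments))

print(f"Favorite won {wins} of {tournaments} simulated tournaments "
      f"({wins / tournaments:.1%}) - and lost all the rest to chance.")
```

Over many simulated tournaments the favorite wins close to 40% of the time, yet in any single run of the World Cup it still loses more often than it wins, which is Dr. Wilhelm's point about chance.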