Through my work with customers and data scientists building predictive applications based on the Blue Yonder Platform and integrating our standard products, such as Forward Demand, into customer workflows, I’ve encountered a few common prediction pitfalls. While these may not undermine the scientific validity of a model, they can limit its practicality and its applicability to the business case.
Predictive applications require applied data science and cannot live outside of their application domain. Watching out for these common prediction pitfalls will help you deliver predictive applications that are relevant to the business and help your customers make (or save) money:
- Forecast what you can control. In a predictive application, forecasts serve to make automated or semi-automated decisions. For the best results, align the prediction horizon and granularity with the decision horizon and granularity as closely as possible. For instance, if a predictive application such as Forward Demand controls a replenishment workflow for slow-moving goods with weekly deliveries, daily demand forecasts create no benefit for the business. They might even reduce value by lowering the statistical significance of the model and shortening the prediction horizon. If deliveries arrive weekly, you need weekly predictions.
- Beware of false aggregations. Powerful predictive applications such as Forward Demand not only offer simple forecasts but can also predict the individual probability of each single event, taking into account a wide range of influencing factors. In fact, all predictive applications based on NeuroBayes come with this capability. A good example is predicting the failure of a machine part versus the failure of the entire machine. Suppose a dual-engine machine has an estimated 95th-percentile failure date of May 1st for one engine and May 30th for the other. The corresponding failure date of the machine is not a simple aggregation such as the midpoint (May 15th). Predicting it correctly requires either knowledge of the machine’s inner workings (does it fail when one engine fails, or only when both do?) or a prediction model that handles the entire machine.
- Absence of proof is not proof of absence. Consider a demand forecasting scenario: when analyzing individual point-of-purchase data, no purchases appear in the second half of the week. Treating the absence of purchases as proof of the absence of demand could lead to costly mistakes. A different explanation might be that stores run out of stock in the middle of the week, so any demand in the second half of the week goes unsatisfied. In this case, a model that learns zero demand for those days keeps replenishment too low, so treating the absence of proof as proof of absence leads to severe out-of-stock situations with huge opportunity costs.
- Consider the cost of uncertainty. Speaking of costs and opportunity costs, many predictive applications must make decisions in the face of hard trade-offs and great uncertainty. Let’s assume we are scoring the fraud risk of an online customer. Offering the customer payment by invoice exposes us to the risk of losing the entire value of the purchased merchandise. On the other hand, refusing payment by invoice might lead the customer to abandon the shopping cart, losing us the opportunity to win a customer with considerable lifetime value. To make decisions like these on a regular, automated basis, multiple predictions need to be combined: fraud risk, shopping cart abandonment risk and customer lifetime value. Leaving these factors and uncertainties unaccounted for can grossly misrepresent the risk, which typically costs retailers revenue because fraud risk is overestimated.
- There are many wrong ways to calculate forecast quality. As soon as a predictive model is in place, the question of its accuracy comes up. Unfortunately, many intuitive approaches to assessing forecast quality can lead business customers astray. Simply comparing the predictions for a few top sellers can overstate forecast errors (look, the model has been off by more than 1,000 units!) while understating the overall accuracy across the entire portfolio (only 2% deviation). Selecting and designing the right quality measure is one of the key considerations when building predictive applications. So much so that my colleagues Michael Feindt and Ulrich Kerzel wrote a book (in German) to help data scientists and business users navigate the maze of forecast quality metrics.
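The first pitfall, forecasting at a granularity you cannot act on, can be made concrete with a small Python sketch. It assumes, purely for illustration, that daily sales of a slow-moving item follow a Poisson distribution. Under that assumption the coefficient of variation is 1/sqrt(rate), so summing seven days into one weekly bucket cuts the relative noise by a factor of sqrt(7). The helper names `weekly_totals` and `poisson_cv` are hypothetical and are not part of Forward Demand or any Blue Yonder API.

```python
import math

def weekly_totals(daily_demand, days_per_week=7):
    """Sum daily demand into consecutive weekly buckets.

    Assumes the series starts on a delivery day; a trailing partial
    week is dropped (a hypothetical convention for this sketch).
    """
    n_weeks = len(daily_demand) // days_per_week
    return [
        sum(daily_demand[w * days_per_week:(w + 1) * days_per_week])
        for w in range(n_weeks)
    ]

def poisson_cv(rate):
    """Coefficient of variation (std / mean) of a Poisson count with mean `rate`."""
    return 1.0 / math.sqrt(rate)

daily_rate = 0.5  # hypothetical slow mover: one unit every two days on average
print(f"daily CV:  {poisson_cv(daily_rate):.2f}")      # relative noise per daily bucket
print(f"weekly CV: {poisson_cv(7 * daily_rate):.2f}")  # same item, weekly buckets
print(weekly_totals([1, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0, 3, 5]))
```

Run as is, this prints a daily CV of 1.41, a weekly CV of 0.53 and the weekly totals `[4, 4]`: at the weekly granularity that matches the delivery cycle, the signal is far less dominated by noise.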
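For the dual-engine example in the false-aggregation pitfall, the following sketch assumes (our assumption, not something stated above) that each engine’s lifetime is exponentially distributed and that the two engines fail independently. The rates are chosen so that the engines’ 95th percentiles fall on May 1st and May 30th, counting days from a hypothetical reference date of April 1st. The machine’s 95th percentile then depends entirely on its structure: a machine that fails as soon as either engine fails ("series") versus one that keeps running until both have failed ("parallel").

```python
import math

P = 0.95  # percentile quoted in the text

def exp_rate_for_quantile(q_days, p=P):
    """Rate of an exponential lifetime whose p-quantile is q_days (assumed model)."""
    return -math.log(1 - p) / q_days

def quantile(cdf, p=P, lo=0.0, hi=1e4, tol=1e-9):
    """Invert a monotone CDF by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Days counted from April 1 (hypothetical reference): May 1 -> day 30, May 30 -> day 59.
rate_a = exp_rate_for_quantile(30)
rate_b = exp_rate_for_quantile(59)

def cdf_a(t): return 1 - math.exp(-rate_a * t)
def cdf_b(t): return 1 - math.exp(-rate_b * t)

# Series: the machine fails as soon as EITHER engine fails (independence assumed).
def cdf_series(t): return 1 - (1 - cdf_a(t)) * (1 - cdf_b(t))
# Parallel: the machine fails only once BOTH engines have failed.
def cdf_parallel(t): return cdf_a(t) * cdf_b(t)

naive = (30 + 59) / 2
series = quantile(cdf_series)
parallel = quantile(cdf_parallel)
print(f"naive midpoint: day {naive:.1f}")
print(f"series:         day {series:.1f}")
print(f"parallel:       day {parallel:.1f}")
```

Under these assumptions the naive midpoint is day 44.5 (mid-May), while the series machine reaches its 95th percentile around day 19.9 (about April 21st) and the redundant machine around day 60 (about May 31st). Neither is anywhere near the average.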
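The out-of-stock scenario in the absence-of-proof pitfall is a case of censored demand: on a day the shelf runs empty, sales only bound demand from below. The sketch below assumes a hypothetical `closing_stock` field is recorded alongside sales and simply excludes censored days when estimating average demand. This is a crude illustration of the idea, not the method used by any particular product; proper censored-data models make better use of the information.

```python
from dataclasses import dataclass

@dataclass
class DayObservation:
    weekday: str
    units_sold: int
    closing_stock: int  # hypothetical field: units left on the shelf at close

def is_censored(obs: DayObservation) -> bool:
    """Sales only bound demand from below when the shelf ran empty that day."""
    return obs.closing_stock == 0

def naive_mean_demand(days):
    """Treats every zero as 'no demand', which is exactly the pitfall."""
    return sum(d.units_sold for d in days) / len(days)

def uncensored_mean_demand(days):
    """Averages fully observed days only; returns None if demand was never observed."""
    observed = [d.units_sold for d in days if not is_censored(d)]
    return sum(observed) / len(observed) if observed else None

# Hypothetical week: 10 units delivered Monday, shelf empty from Wednesday on.
week = [
    DayObservation("Mon", 4, 6),
    DayObservation("Tue", 5, 1),
    DayObservation("Wed", 1, 0),
    DayObservation("Thu", 0, 0),
    DayObservation("Fri", 0, 0),
    DayObservation("Sat", 0, 0),
]
print(naive_mean_demand(week))           # about 1.67 units/day
print(uncensored_mean_demand(week))      # 4.5 units/day
print(uncensored_mean_demand(week[3:]))  # None: second-half demand is unknown, not zero
```

Dropping censored days is itself a rough fix: it discards the information that Wednesday’s demand was at least one unit. The point is only that a zero recorded on an empty shelf must not be read as zero demand.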
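The invoice decision from the cost-of-uncertainty pitfall can be framed as an expected-value comparison that combines the three predictions named there. The sketch below uses a deliberately simplified payoff model of our own: an unpaid invoice loses the order value and the customer, an abandoned cart earns nothing, and a paid order earns its margin plus the predicted customer lifetime value. It also assumes that customers who pay another way carry no fraud risk. The `Checkout` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Checkout:
    p_fraud: float      # predicted probability the invoice is never paid
    p_abandon: float    # predicted probability the customer leaves if invoice is refused
    order_value: float  # loss if the invoice is never paid
    margin: float       # profit on this order if it is paid
    clv: float          # predicted future customer lifetime value

def value_if_offered(c: Checkout) -> float:
    # Paid: margin plus future value. Unpaid: lose the merchandise value.
    return (1 - c.p_fraud) * (c.margin + c.clv) - c.p_fraud * c.order_value

def value_if_refused(c: Checkout) -> float:
    # The customer either pays another way (assumed fraud-free) or abandons (zero).
    return (1 - c.p_abandon) * (c.margin + c.clv)

def offer_invoice(c: Checkout) -> bool:
    return value_if_offered(c) > value_if_refused(c)

low_risk = Checkout(p_fraud=0.05, p_abandon=0.3, order_value=200, margin=40, clv=300)
high_risk = Checkout(p_fraud=0.40, p_abandon=0.3, order_value=200, margin=40, clv=300)
print(offer_invoice(low_risk))   # True: expected 313 if offered vs. 238 if refused
print(offer_invoice(high_risk))  # False: expected 124 if offered vs. 238 if refused
```

A rule that looks at fraud risk alone, say refusing every order above 2% fraud probability, would refuse both customers, even though under these assumptions offering the invoice to the first one is clearly worth it.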
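To see how much the choice of metric matters for forecast quality, the sketch below scores one set of forecasts for five items with four common measures: the largest absolute error, the weighted absolute percentage error (WAPE), the mean absolute percentage error (MAPE) and the aggregate bias of the portfolio total. The item names and numbers are invented for illustration, and none of these measures is presented as the right one.

```python
# Hypothetical (actual, forecast) units per item over the same period.
portfolio = {
    "top_seller_a": (50_000, 48_800),
    "top_seller_b": (30_000, 30_600),
    "mid_item":     (200, 260),
    "slow_item":    (40, 10),
    "rare_item":    (5, 0),
}

def max_abs_error(data):
    """Largest single-item miss in units: what a top-seller spot check sees."""
    return max(abs(f - a) for a, f in data.values())

def wape(data):
    """Sum of absolute errors divided by total actual demand."""
    return sum(abs(f - a) for a, f in data.values()) / sum(a for a, _ in data.values())

def mape(data):
    """Unweighted mean of per-item percentage errors (fails if any actual is 0)."""
    return sum(abs(f - a) / a for a, f in data.values()) / len(data)

def aggregate_bias(data):
    """Relative deviation of the portfolio total; item errors can cancel out."""
    total_actual = sum(a for a, _ in data.values())
    total_forecast = sum(f for _, f in data.values())
    return (total_forecast - total_actual) / total_actual

print(f"max abs error:  {max_abs_error(portfolio)} units")  # 1200 units
print(f"WAPE:           {wape(portfolio):.1%}")              # 2.4%
print(f"MAPE:           {mape(portfolio):.1%}")              # 41.9%
print(f"aggregate bias: {aggregate_bias(portfolio):.1%}")    # -0.7%
```

The same forecasts look alarming (off by 1,200 units on one top seller), excellent (2.4% WAPE, -0.7% overall) or poor (41.9% MAPE, driven by the slow movers), depending solely on the measure chosen.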
Building and deploying predictive applications has an incredible and transformative impact on businesses and helps them accelerate their transition to becoming data-driven enterprises, but it is not without challenges. Steering clear of these five pitfalls will help you get up and running faster and deliver better results.