How Artificial Intelligence Optimizes Decisions in Uncertainty (Part 1)

In Machine Learning — 29 March 2017

Harvard Business Review recently published an article called “You Can’t Make Good Predictions Without Embracing Uncertainty” about the impact of uncertainty on making good decisions. While it focused on strategic decisions and on how to incorporate uncertainty into the various influencing factors to reach a better decision, it sparks a wider question: Why is it so difficult for us humans to understand probabilities and express ourselves in them, rather than reverting to (pseudo-)certainty?

Understanding probabilities

Ben Orlin, in his blog “Math with Bad Drawings”, effectively illustrates how difficult interpreting probabilities can be in daily life. A probability lies between 0 and 1 (or 0% and 100%): small values indicate that an event is unlikely to happen, and as the number approaches one, the event becomes increasingly likely. Our perception of probability, however, is a different matter. Take the weather forecast: up to a probability of about 50% we tend to dismiss it, and beyond that mark we jump to accepting that it is going to rain, when in reality anything in the region of 50% comes with a big “maybe”. And depending on the community or profession, the words used to describe probabilities can deviate considerably from what the numbers actually mean, as Orlin illustrates with the example of the local news anchor.

Embracing Uncertainty

But why do we find thinking in probabilities so difficult? One reason is that our experience is mainly formed by deterministic patterns, which is how we learn about the world from the day we are born: things always fall down, and our toys always have the same shape and properties, irrespective of where we play with them. We never experience a world in which our building blocks are usually yellow but sometimes turn red and other times blue. As young children, we don’t really care about the probability of rain, and it doesn’t affect how we act. Hence we don’t interpret the world as probabilistic, even though the fundamental laws of nature can only be expressed in terms of probabilities.

Probabilities in real-life scenarios

[Figure: probability distribution]

Even our daily life is ultimately governed by probabilities. Imagine that five people leave their homes in the morning, each planning to buy one or two items of a specific product. At the end of the day, we record that seven items have been sold at the particular store our fictitious customers visited. Now, as a thought experiment, we repeat the same day over and over, keeping everything the same: the same five customers with the same plan visit the same store to buy the same quantity of the same product. But we add small, seemingly random deviations in each repetition. Maybe a meeting runs long and one customer doesn’t make it to the shop, or forgets about it. Maybe there’s a promotion for this specific article and some customers spontaneously decide to buy more than the planned amount. Maybe there’s a promotion for a different but similar article and some customers decide to buy that instead. There are endless small variations that lead to a different outcome for the same scenario. So even in a seemingly simple case, we have to deal with many different numbers, and in real life the same day cannot be repeated over and over with the same starting conditions, as it can in our thought experiment.
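The thought experiment above is easy to simulate. The sketch below repeats the “same” day ten thousand times and records how often each sales total occurs; all probabilities in it (a customer missing the shop, a promotion triggering an extra purchase) are made-up illustrations, not estimates from real data:

```python
import random
from collections import Counter

random.seed(42)

def simulate_day():
    """One run of the thought experiment: five customers each plan to buy
    one or two items, but small random deviations change the day's outcome."""
    total = 0
    for _ in range(5):
        if random.random() < 0.10:   # hypothetical: customer never makes it to the store
            continue
        planned = random.choice([1, 2])
        if random.random() < 0.05:   # hypothetical: a promotion triggers one extra item
            planned += 1
        total += planned
    return total

# Repeat the "same" day many times and count how often each total occurs.
counts = Counter(simulate_day() for _ in range(10_000))
for sold in sorted(counts):
    print(f"{sold:2d} items sold on {counts[sold]} of 10,000 days")
```

Instead of a single number such as “seven items”, the repeated days produce a whole spread of outcomes.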

However, we can look at the behavior in closely related situations and find the same thing: instead of a single deterministic number, we have to deal with a distribution of numbers. Looking into the past, such as at previously recorded sales events, this distribution is called a frequency distribution. It can also be interpreted differently: the distribution captures all details about a specific sales situation, e.g. the sales of a specific product in a specific store on a specific day, and describes the probability that one or more units of this product will be sold under a specific set of circumstances. The distribution becomes a probability (density) distribution. Such a distribution can contain all information about, for example, the future sales of a specific product in a particular setting, reflecting a wide range of influencing factors such as the day of the week, the store, promotions, holidays, price and season. It also contains all information about the future behavior: the expected number of sales (the mean), the most likely number of sales (the mode) and the expected range of values (the volatility), which is determined by the width of the distribution, as well as information about tails and outliers.
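As a minimal illustration, a recorded sales history (the numbers below are invented) can be turned into a frequency distribution and summarised by exactly these quantities:

```python
import statistics
from collections import Counter

# Hypothetical recorded daily sales of one product in one store.
sales = [7, 6, 8, 7, 5, 9, 7, 6, 10, 7, 8, 6, 7, 5, 8]

freq = Counter(sales)                                # frequency distribution
prob = {k: v / len(sales) for k, v in freq.items()}  # normalised: empirical probability distribution

mean = statistics.mean(sales)    # expected number of sales
mode = statistics.mode(sales)    # most likely number of sales
width = statistics.stdev(sales)  # volatility: the width of the distribution

print(f"mean={mean:.2f}, mode={mode}, stdev={width:.2f}")
```

With a longer history, the same summary would also expose the tails: rare days with unusually high or low sales.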

Obtaining the probability distribution

Having all information about future behavior available in a single predicted probability distribution is critical to transforming a prediction into an optimized decision. But where does this predicted distribution come from? Traditional forecasting methods are usually only able to predict a single number, such as the expected number of products that will be sold on a specific day in a specific store. However, this expectation value (another term for the mean of the distribution) is just one number and, as we have just seen, cannot capture the complexity of future events. In fact, it doesn’t allow for any optimization. This is where Artificial Intelligence (AI) and modern machine learning approaches such as NeuroBayes bring their full power to bear: they are able to analyze billions of data points and extract all relevant information. This condensed “expertise” can then be used to predict a complete probability density distribution, instead of a single number, for each of tomorrow’s events.
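How NeuroBayes itself constructs such predictions is beyond the scope of this post. As the simplest possible stand-in, one can take a model’s point prediction and use it as the rate of a standard count distribution, here a Poisson, to obtain a full distribution rather than a single number (the predicted rate of 7 is hypothetical):

```python
import math

def poisson_pmf(k, lam):
    """Probability of selling exactly k units if sales follow a Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical: some model predicts an expected rate of 7 sales for tomorrow.
lam = 7.0
dist = {k: poisson_pmf(k, lam) for k in range(30)}

mode = max(dist, key=dist.get)  # most likely number of sales
print(f"most likely outcome: {mode} units")
```

A real system would of course learn the shape of the distribution from data rather than assume a textbook form, but the principle is the same: every downstream decision can now query the whole distribution.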

Using probability distribution to optimize decisions

At first glance, predicting a full probability distribution instead of, say, the expected sales of a specific product adds a layer of complexity: rather than using a single number directly in the operational layer, one first has to choose a specific number (or quantile) from the distribution. But this choice is a crucial step in optimizing business potential. Rather than being stuck with a single number, being able to choose the “best” number allows further key objectives to be taken into account: Which service level should be achieved? How does the ordering decision affect key performance indicators such as stock-out rate, capital lock-up or waste, or even medium-term raw profit? What is the impact of strategic decisions on operational issues?
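For instance, once an empirical demand distribution is available, every target service level maps directly to a quantile, i.e. to a concrete order quantity. A small sketch with invented demand numbers:

```python
# Hypothetical empirical demand distribution for one product, store and day type.
demand = sorted([4, 5, 5, 6, 6, 7, 7, 7, 8, 9, 10, 12, 15, 19, 25])

def order_quantity(service_level):
    """Smallest stock level that covers demand on at least the given
    fraction of (identically distributed) days."""
    idx = min(int(service_level * len(demand)), len(demand) - 1)
    return demand[idx]

for sl in (0.50, 0.80, 0.95):
    print(f"service level {sl:.0%} -> order {order_quantity(sl)} units")
```

The long right tail of the invented distribution is already visible here: moving the service level from 50% to 95% multiplies the required stock, which is exactly the trade-off the following discussion is about.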

How are these different options evaluated?

The figure below shows a typical probability distribution, for example, for the predicted sales behavior of a specific product on a specific day in a specific store, derived from machine learning and AI algorithms using large data sets.

[Figure: probability density distribution]

In spite of tomorrow’s uncertainty, a hard decision has to be made today as to how many articles to order in this situation. Each specific choice derived from the complete distribution is associated with a specific outcome. For example, choosing the 95% quantile would satisfy 95% of the expected demand for such a product (or, in other words, demand would exceed this number on only 5 of 100 identical days). However, as the bulk of the distribution lies well below this quantile due to its long tail, stocking this many products will likely result in high capital costs or significant waste. Being able to evaluate all the different future options, each associated with a specific probability, allows the impact of every operational decision to be explored with scientific precision, for every article in every store, every day. And the optimal choices may differ considerably, depending e.g. on shelf life and production price.
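This trade-off can be made concrete with the classic newsvendor-style calculation: given the predicted distribution and per-unit costs for waste and stock-outs, every candidate order quantity has a computable expected cost. The sketch below uses a synthetic long-tailed distribution and hypothetical unit costs in place of a real prediction:

```python
import random
import statistics

random.seed(0)
# Stand-in for the predicted distribution: demand samples with a long right tail.
demand = sorted(int(random.lognormvariate(1.8, 0.5)) for _ in range(10_000))

def quantile(q):
    """Order quantity that covers demand on a fraction q of days."""
    return demand[min(int(q * len(demand)), len(demand) - 1)]

def expected_cost(stock, waste_cost=1.0, stockout_cost=2.0):
    """Average daily cost of a fixed order quantity (hypothetical unit costs)."""
    return statistics.mean(
        waste_cost * max(stock - d, 0) + stockout_cost * max(d - stock, 0)
        for d in demand
    )

s95 = quantile(0.95)
best = min(range(max(demand) + 1), key=expected_cost)
print(f"95% quantile: {s95} units, cost-optimal order: {best} units")
```

With these assumed costs, the cost-optimal order sits well below the 95% quantile, illustrating why always targeting a very high service level can be an expensive choice for long-tailed, perishable articles.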

Our process can align all operational decisions with the overall business strategy. It also has the remarkable capability of improving two or more competing KPIs simultaneously, where global strategic decisions can improve one KPI only at the expense of another.

The next post will focus in more depth on implementing optimal decisions in the daily operations of a business.

Dr. Ulrich Kerzel

earned his PhD under Professor Dr Feindt at the Fermi National Accelerator Laboratory in the US and at that time made a considerable contribution to the core technology of NeuroBayes. He continued this work as a Research Fellow at CERN before joining Blue Yonder as a Principal Data Scientist.