Inventory optimization headache and how to approach it with data science
How is your inventory optimization doing? At best, it needs a bit of improvement. At worst, it is a real headache. Either way, you may want to know about recent achievements in the field, and that’s what our data science team shows in this overview of different approaches to inventory optimization.
Highlights:
- Approaches to inventory optimization
- How deep learning works in detail
- Why deep learning-based inventory optimization needs data scientists
Approaches to inventory optimization
Though we are data science evangelists, we don’t claim that data science is a silver bullet. It shows splendid results only if applied wisely and to the right problem. Below, we describe three very different approaches to inventory optimization, whose efficiency varies dramatically. We explain these approaches from a retailer’s perspective, but manufacturers and distributors can use them too.
1. Adding a safety buffer to the forecasted demand figures
This is the simplest, yet the least efficient approach. Say you apply the most advanced data science method to forecast demand. Your most intricate predictions for a certain SKU tell you that tomorrow you’ll sell 3,286 bottles of milk X in a particular store Y. But you keep in mind that demand is always uncertain and you may sell either more or less. So, you decide to deal with this demand uncertainty by letting your category managers add a safety buffer to the forecasted figures.
The category manager decides to add 15% to the projected quantity. As a result, they order 3,779 bottles without weighing holding costs against out-of-stock costs. Later, the amount of milk you manage to sell turns out to be close to the initial forecast, and the company ends up with overstock. What’s more, holding costs for this milk turn out to be high. This doesn’t sound like a success story in inventory management. Naturally, such safety buffers are not the way to go, since they make companies dependent on their category managers’ gut feeling.
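The arithmetic of a buffer-based order fits in a few lines, and that is exactly its weakness: no cost figure appears anywhere in it. Below is a minimal Python sketch; `order_with_buffer` is a hypothetical helper, and the actual sales figure of 3,300 bottles is an illustrative assumption standing in for "close to the forecast".

```python
import math

def order_with_buffer(forecast: int, buffer_share: float) -> int:
    """Order quantity after adding a flat safety buffer to the forecast."""
    # Round up to whole bottles; note that no cost figure enters the calculation.
    return math.ceil(forecast * (1 + buffer_share))

forecast = 3_286        # forecasted demand for milk X in store Y
order = order_with_buffer(forecast, buffer_share=0.15)
actual_sales = 3_300    # hypothetical: sales close to the forecast
overstock = max(order - actual_sales, 0)

print(order)      # 3779
print(overstock)  # 479
```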
2. Calculating optimal inventory based on the known probability distribution of demand
Unlike the previous approach, this method considers the costs of both stock holding and stock shortage. However, demand uncertainty is still a challenge, which companies try to address by estimating the probability distribution of demand.
To understand probability distribution, we need one more term: standard deviation (σ), a parameter that measures uncertainty. The picture below shows an example. With the demand probability distribution depicted there, 68% of all possible demand values for our milk fall in the range from 2,995 to 3,695 bottles, which is within one standard deviation (±350 bottles) of the mean of 3,345 bottles. At the same time, more than 95% of possible demand values fall within two standard deviations of the mean. Thus, when the standard deviation is small, we have a narrow range of demand figures with high certainty. When the standard deviation is large, the range is also large and the certainty is low.
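A quick way to check these ranges is Python’s standard `statistics.NormalDist`. This sketch assumes, for illustration, that demand is normally distributed with a mean of 3,345 bottles and a standard deviation of 350 bottles; `sigma_range` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumption for illustration: demand for milk X is normally distributed
demand = NormalDist(mu=3_345, sigma=350)

def sigma_range(dist: NormalDist, k: float):
    """Bounds of the mean ± k standard deviations and the share of values inside."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2):
    low, high, share = sigma_range(demand, k)
    print(f"±{k}σ: {low:,.0f} to {high:,.0f} bottles, {share:.1%} of demand values")
# ±1σ: 2,995 to 3,695 bottles, 68.3% of demand values
# ±2σ: 2,645 to 4,045 bottles, 95.4% of demand values
```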
Now that the theory is covered, let’s jump to the formula that tells us the optimal inventory level for our milk:
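\[ Q_{opt} = F^{-1}\left(\frac{p - c}{p}\right) = \mu + \sigma \, Z^{-1}\left(\frac{p - c}{p}\right), \]
where
- p – retail price, $4.96 per bottle.
- c – purchase price, $3.89 per bottle.
- F – cumulative probability distribution of demand; F^{-1} is its inverse.
- µ – mean demand, 3,345 bottles.
- σ – standard deviation of demand, 350 bottles.
- Z^{-1} – inverse of the standard normal cumulative distribution function, which applies when demand is normally distributed: for a given probability, it returns how many standard deviations the corresponding demand level lies from the mean.
The ratio (p − c)/p weighs shortage costs against holding costs: each bottle of unmet demand costs you the margin p − c = $1.07, while each unsold bottle costs you its purchase price c = $3.89 (assuming unsold milk can’t be sold later or returned). The optimal stock is the level that covers demand with exactly this probability. Here, (4.96 − 3.89)/4.96 ≈ 0.2157 and Z^{-1}(0.2157) ≈ −0.787, so
\[ Q_{opt} = 3{,}345 + 350 \times (-0.787) \approx 3{,}070 \text{ bottles.} \]
Because the margin on milk is small compared with the loss on an unsold bottle, the optimal stock is below mean demand.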
You can also use a demand forecast instead of mean demand. In that case, the probability distribution replaces the arbitrary safety buffer. Either way, this approach isn’t flawless. It would work well if the probability distribution were always known and stable. In reality, we can only assume that demand follows a certain distribution. That is why, despite the serious math behind them, predictions based on such assumptions can turn out rather good but still far from perfect.
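The critical-ratio formula translates directly into code. Below is a minimal Python sketch assuming normally distributed demand and no salvage value for unsold milk (both are our illustrative assumptions); `optimal_stock` is a hypothetical name. The key detail is that Z^{-1} is the inverse of the standard normal cumulative distribution function, evaluated at the critical ratio (p − c)/p; Python’s standard library provides it as `NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def optimal_stock(price: float, cost: float, mean: float, std: float) -> float:
    """Critical-ratio stock level for normally distributed demand.

    Assumes an unsold unit is a total loss (no salvage value), so a missing
    unit costs the margin (price - cost) and a leftover unit costs `cost`.
    """
    critical_ratio = (price - cost) / price
    # Z^{-1}: inverse of the standard normal CDF (a quantile, not a reciprocal)
    z = NormalDist().inv_cdf(critical_ratio)
    return mean + std * z

q = optimal_stock(price=4.96, cost=3.89, mean=3_345, std=350)
print(round(q))  # 3070
```

With these inputs, the critical ratio is about 0.216, so the sketch returns roughly 3,070 bottles, below mean demand: a bottle left unsold costs more than three times the margin earned on a bottle sold. With a critical ratio of 0.5 (equal costs), it would return mean demand exactly.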
3. Calculating optimal inventory based on deep learning
With a deep neural network (DNN) at its core, this approach is the most promising one, as it solves the inventory optimization task more effectively than the other approaches do. Here, we skip the demand forecasting stage and calculate inventory directly. This means that you won’t have to make wild guesses about the probability distribution of demand. Instead, a DNN scrutinizes your most detailed historical sales data and considers multiple diverse factors, ranging from product promotions to store locations, weather conditions, etc. Then it applies a loss function that weighs holding costs against shortage costs for the projected inventory figure to arrive at the optimal one.
How deep learning works in detail
To understand why deep learning can outperform the other two methods, let’s take a closer look at how a deep neural network works.
DNN architecture
DNNs have a complex architecture with numerous layers that consist of neurons (or ‘nodes’). The neurons of one layer are connected to the neurons of the next layer. At each layer, certain coefficients (or ‘weights’) are applied to the values produced by the neurons of the previous layer. So, to get accurate predictions, it’s critical to tune the weights right.
A single neuron can capture a simple linear or nonlinear dependency. Several neurons together can capture more complex dependencies, such as exponential growth or decline, surges and temporary drops, waves, etc. The more complex the dependencies are, the more neurons are required.
The fairly simple DNN illustrated above has 58 weights, which is not that many. A complex DNN may have hundreds of thousands of them. However, you don’t have to configure, say, all 200,000 weights manually: the DNN tunes them itself during training. That is a large part of why DNNs are considered among the most advanced machine learning methods.
To learn, a DNN needs your historical sales figures, split by SKU and by store, as inputs. The network starts with random weights and produces an output, which is the inventory level for the SKU or the category. After that, the network applies a loss function that calculates the difference between this output and the corresponding value from the training data set.
Besides, the loss function weighs holding costs against shortage costs to balance the risks of out-of-stock and overstock. In our example with milk, holding costs will be high as the product is perishable. However, shortage costs may be even higher, as your customers are likely to feel frustrated if they don’t find this milk on your shelves. As long as the loss remains large (in other words, the optimal balance between holding and shortage costs isn’t found), the network keeps adjusting the weights to minimize the error.
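To make the idea of a cost-weighted loss concrete, here is a deliberately tiny Python sketch: instead of a DNN with thousands of weights, the "network" has a single weight, the stock level itself, which starts at a random value and is adjusted step by step to reduce the average loss. The cost figures (1 per leftover unit, 3 per missing unit) and the demand history are hypothetical, and `cost_weighted_loss` and `fit_stock_level` are illustrative names, not part of any library. Real DNNs adjust their weights in the same spirit, using gradients computed by backpropagation.

```python
import random

def cost_weighted_loss(stock, demand, holding_cost, shortage_cost):
    """Charge holding_cost per leftover unit and shortage_cost per missing unit."""
    if stock >= demand:
        return holding_cost * (stock - demand)   # overstock
    return shortage_cost * (demand - stock)      # out of stock

def mean_loss(stock, demands, holding_cost, shortage_cost):
    """Average cost-weighted loss over the historical demand observations."""
    return sum(cost_weighted_loss(stock, d, holding_cost, shortage_cost)
               for d in demands) / len(demands)

def fit_stock_level(demands, holding_cost, shortage_cost,
                    lr=1.0, steps=500, seed=0):
    """Tune a single 'weight' (the stock level) by subgradient descent."""
    rng = random.Random(seed)
    stock = rng.uniform(min(demands), max(demands))  # random starting weight
    for _ in range(steps):
        # Share of days on which this stock level would have covered demand
        covered = sum(1 for d in demands if stock >= d) / len(demands)
        # Subgradient of mean_loss: leftovers push the level down,
        # shortages push it up
        grad = holding_cost * covered - shortage_cost * (1 - covered)
        stock -= lr * grad
    return stock

history = list(range(100))          # hypothetical daily demand observations
stock = fit_stock_level(history, holding_cost=1.0, shortage_cost=3.0)
print(round(stock, 1), round(mean_loss(stock, history, 1.0, 3.0), 2))
```

Because a missing unit costs three times as much as a leftover one here, the fitted level settles at about the 75th percentile of historical demand, i.e., at the share shortage_cost / (holding_cost + shortage_cost). This is the same cost-balancing logic as in the second approach, but learned from data rather than derived from an assumed distribution.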
Do you think your business task is unlikely to require 200,000 weights? Let’s take a closer look at this ‘terrifying’ figure and see how easily you can end up with this many connections.
An example to explain DNN essentials
Say you are a retailer who uses deep learning (the third approach we listed) to optimize inventory for each product category. To make the predictions as accurate as possible, you want the network to consider store types, seasonality, and the influence of promotions and holidays. With this in mind, your input data can easily look like this:
| Factors to analyze | What each factor reflects | Number of neurons for the input layer |
|---|---|---|
| 14 previous days’ sales figures | Latest trends | 14 |
| Week of the year | Seasonality | 52 (one per week of the year) |
| Day of the week | Weekly demand variations, the influence of holidays bound to a weekday (say, Thanksgiving) | 7 (one per day of the week) |
| Category | Patterns specific to a certain product category | 200 (one per product category in your portfolio) |
| Store type | Patterns specific to a certain store type | 5 (one per store type you operate) |
| Promotion | The influence of promotions | 1 (yes or no) |
| Holidays | The influence of holidays not bound to a weekday, say, Christmas, Independence Day, St. Valentine’s Day, etc. | 1 (yes or no) |
| Total number of input neurons | | 280 |
The input layer alone needs 280 neurons. But there are also several hidden layers (3 in our example) and an output layer, and each of them needs its own set of neurons too.
| Layer | Coefficient applied to the input layer size to get this layer’s number of neurons (assigned by data scientists) | Number of neurons |
|---|---|---|
| Input layer | – | 280 |
| Hidden layer 1 | 1.5 | 420 |
| Hidden layer 2 | 1 | 280 |
| Hidden layer 3 | 0.5 | 140 |
| Output layer | – | 1 |
With such inputs, our DNN has 280 × 420 + 420 × 280 + 280 × 140 + 140 × 1 = 274,540 weights.
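The weight count is easy to reproduce. This Python sketch builds the layer sizes from the coefficients in the table above and counts the connections between consecutive fully connected layers; like the figure in the text, it ignores the bias terms that real networks usually add per neuron. `layer_sizes` and `count_weights` are hypothetical helpers.

```python
def layer_sizes(n_inputs, hidden_coefficients=(1.5, 1.0, 0.5), n_outputs=1):
    """Input size, hidden sizes scaled from it (rounded to whole neurons), output size."""
    hidden = [round(n_inputs * k) for k in hidden_coefficients]
    return [n_inputs, *hidden, n_outputs]

def count_weights(sizes):
    """Connections between consecutive fully connected layers (no bias terms)."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

sizes = layer_sizes(280)
print(sizes)                 # [280, 420, 280, 140, 1]
print(count_weights(sizes))  # 274540
```

With these coefficients, n inputs give 3.5n² + 0.5n weights, so the count grows with the square of the input layer size. For n = 257 and n = 392, this expression yields 231,300 and 538,020, the record counts in the data-requirements table further below, which suggests those estimates assume roughly one training record per weight.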
Once the neural network is trained and the weights are tuned, the DNN is ready to generate forecasts. To do that, the network takes in all the input values and applies the weights to them. These weighted values then reach each neuron of the next layer, where they are summed up and an activation function is applied to the sum. The outputs of this little manipulation become the inputs for the next layer, and the data flows through all the layers until it boils down to a single output value: your optimal daily inventory level for your dairy category.
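The forward pass just described can be sketched in plain Python. The weights are hypothetical numbers for a toy network with 3 inputs, 2 hidden neurons and 1 output, and ReLU (a common choice) stands in for the unnamed activation function of the hidden layer. For simplicity, the sketch omits the bias term that real networks typically add to each neuron’s sum.

```python
def relu(x: float) -> float:
    """Rectified linear unit: passes positive sums, zeroes out negative ones."""
    return max(0.0, x)

def identity(x: float) -> float:
    """Linear activation for the output layer."""
    return x

def dense_layer(inputs, weights, activation):
    """Each neuron sums its weighted inputs, then applies the activation."""
    return [activation(sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights in weights]

def forward(inputs, layers):
    """Feed values through every layer; each layer's output feeds the next."""
    values = list(inputs)
    for weights, activation in layers:
        values = dense_layer(values, weights, activation)
    return values

# Hypothetical trained weights: one row of weights per neuron
toy_network = [
    ([[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5]], relu),      # 3 inputs -> 2 hidden neurons
    ([[1.0, 2.0]], identity),        # 2 hidden neurons -> 1 output
]

print(forward([1.0, 2.0, 3.0], toy_network))  # hidden ≈ [0.4, 0.4], output ≈ [1.2]
```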
Advantages of deep learning you can enjoy:

Predicting based on diverse data
DNNs can consume both numerical and categorical values. For example, they can successfully ‘ingest’ sales figures as well as days of the week, product categories, store types, etc.

Capturing complex nonlinear dependencies
DNNs can capture relationships in which output doesn’t change in proportion to input. This gives a more accurate picture of the real world, as dependencies in a business environment are rarely linear.

Providing an unbiased result
Unlike the first two methods we described, the deep learning-based approach relies neither on safety buffers nor on risky assumptions about the probability distribution of demand. If all the DNN settings are chosen properly, you can expect the DNN to return predictions that are accurate, objective and purely algorithm-driven.
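The first advantage above, ingesting categorical values, usually relies on an encoding step. One common scheme is one-hot encoding: each category gets its own input neuron, set to 1 when that category applies and 0 otherwise. The neuron counts in the input table earlier (52 for weeks, 7 for days, 200 for categories, 5 for store types) are consistent with this scheme, so the Python sketch below assumes it; `encode_day` and its parameters are hypothetical. In practice, numeric inputs such as sales figures are usually rescaled as well, which the sketch skips.

```python
def one_hot(index: int, size: int) -> list[float]:
    """A vector of `size` zeros with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_day(last_14_days_sales, week, weekday, category, store_type,
               promotion, holiday, n_categories=200, n_store_types=5):
    """Turn one day's mixed numeric and categorical factors into input values."""
    if len(last_14_days_sales) != 14:
        raise ValueError("expected sales figures for exactly 14 previous days")
    return (
        [float(s) for s in last_14_days_sales]   # 14 numeric inputs
        + one_hot(week - 1, 52)                    # week of the year, 1..52
        + one_hot(weekday, 7)                      # day of the week, 0 = Monday
        + one_hot(category, n_categories)          # category id, 0..199
        + one_hot(store_type, n_store_types)       # store type id, 0..4
        + [1.0 if promotion else 0.0]              # promotion: yes/no
        + [1.0 if holiday else 0.0]                # holiday: yes/no
    )

x = encode_day([120] * 14, week=48, weekday=3, category=17,
               store_type=2, promotion=True, holiday=False)
print(len(x))  # 280
```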
Limitations of deep learning you should be aware of:

Inability to take in factors on its own
If you don’t instruct your DNN to analyze some factor, the network won’t know that this factor influences the outcome. For example, a fashion retailer who forgets to include weather among the DNN’s inputs risks ending up with excessive stock of warm clothes. The reason is simple: the DNN won’t be able to see that the spike in demand for warm clothes in weeks 45–48 last year happened because that November was much colder than usual. Consequently, the DNN will treat it as a seasonal spike and include it in the forecast, even though this November is much warmer.

Dependence on the amount of data
If you don’t have enough data, a DNN doesn’t have enough material to learn from. The more factors you want to cover, the more inputs and weights you’ll have, and the more data you need. Look at how an extra factor, store location, affects the amount of required data. Quite a drastic change, right?
| Factors | Inputs in Scenario 1 | Inputs in Scenario 2 |
|---|---|---|
| Weeks | 52 | 52 |
| Store type | 5 | 5 |
| Category | 200 | 200 |
| Store location | – | 135 |
| Number of data records required to train the network | 231,300 | 538,020 |

Dependence on data quality
If your data is extremely noisy, a DNN won’t be able to turn multiple unusual or erroneous observations into precise predictions. After all, DNNs don’t do magic: they merely detect patterns that exist in historical data and predict future data based on them.
Why deep learning-based inventory optimization needs data scientists
DNNs are powerful, but that doesn’t mean they are self-contained. For DNNs to perform well, you need professional data scientists to:
- Define the factors that influence demand.
- Understand the nature of demand, for example, whether it’s seasonal, weekday-dependent, promotion-driven, trending up or down, etc.
- Differentiate the treatment of shortage and holding costs across product groups, for example, perishable and non-perishable products.
- Choose relevant activation and optimization functions that deliver the most accurate predictions without requiring weeks to train the network.
- Define the applicability scope of a model, which is a trained neural network with defined hyperparameters. For example, new products behave differently from tried-and-true goods and may require a separate model to accurately predict inventory levels.
- Find the most efficient approach to configuring neural network parameters out of millions of possible combinations.
- Test different models and choose the best one. Sometimes, they need to run several DNNs with differently tuned parameters, assess the accuracy of each and select the model that returns the most accurate results.
- Fight overfitting. An overfitted model fails to differentiate signals from noise and describes the latter rather than real dependencies among the variables. You wouldn’t be satisfied if your algorithms returned super-accurate predictions on the historical data they were trained on but failed on new incoming data. Data scientists find ways to make the network perform well on new data.
- Remove unusual or erroneous observations from your data, thus turning it into high-quality data sets.
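The text doesn’t say how unusual observations are detected. One widely used rule, shown here as a minimal Python sketch, is the modified z-score: it measures each value’s distance from the median in units of the median absolute deviation (MAD) and flags values scoring above 3.5. The function name, the threshold and the sales figures (with 3,000 as a hypothetical typo for 300) are illustrative assumptions. A data scientist would still check flagged values before dropping them, since a real demand spike, say, during a promotion, is signal rather than noise.

```python
from statistics import median

def remove_outliers(values, threshold=3.5):
    """Keep values whose modified z-score is at most `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)   # median absolute deviation
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    # 0.6745 makes the score comparable to a standard z-score for normal data
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]

daily_sales = [310, 295, 305, 290, 300, 3_000]
print(remove_outliers(daily_sales))  # [310, 295, 305, 290, 300]
```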
So, how do you relieve the inventory optimization headache most effectively?
Of the three pills we described, we recommend the last one, with the deep learning flavor. Our data scientists consider this method the most effective for solving inventory optimization tasks. It delivers the most accurate and reliable predictions, thanks to analyzing diverse data, capturing complex nonlinear dependencies and calculating inventory directly. Besides, it’s applicable to both raw materials and finished goods inventory.