Inventory Management

Machine Learning for Retail Demand Forecasting

Comparative study of Demand Forecasting Methods for a Retail Store (XGBoost Model vs. Rolling Mean)

Article originally published on Medium.

💌 New articles straight in your inbox for free: Newsletter

I. Demand Planning Optimization Problem Statement

For most retailers, demand planning systems take a fixed, rule-based approach to forecast and replenishment order management.

(1) Demand Planning Optimization Problem Statement — (Image by Author)

Such an approach works well enough for stable and predictable product categories but can show its limits regarding Inventory and Replenishment Optimization.

This potential optimization can reduce operational costs by:

Inventory Optimization: matching store inventory with actual needs to reduce storage space needed (Rental Costs)
Replenishment Optimization: optimizing replenishment quantity per order to minimize the number of replenishments between warehouse and stores (Warehousing & Transportation Costs)

Example: Retailer with 50 Stores

For this study, we’ll take a dataset from the Kaggle challenge: Store Item Demand Forecasting Challenge.

Scope

Transactions from 2013–01–01 to 2017–12–31
913,000 Sales Transactions
50 unique SKU
10 Stores

(Update) Improve the model

I have been working on an improved version of the model and I share my insights in another article with the full code shared in this repository.

🔗

Github Repository: Link

The goal is to understand the impact of adding business features (price change, sales trend, store closing, …) on the accuracy of the model.

II. XGBoost for Sales Forecasting

The initial dataset has been used for a Kaggle Challenge where teams were competing to design the best model to predict sales.

The first objective here is to design a prediction model using XGBoost; this model will be used to optimize our replenishment strategy ensuring inventory optimization and reducing the number of deliveries from your Warehouse.

1. Add Date Features

2. Daily, Monthly Average for Train

3. Add Daily, Monthly Averages to Test and Rolling Averages

4. Heat Map to check correlation

Pearson Correlation Heatmap — (Image by Author)

Let us keep the monthly average since it has the highest correlation with sales; and remove other features highly correlated to each other.

5. Clean features, Training/Test Split and Run model

6. Results Prediction Model

Prediction vs Actual Sales — (Image by Author)

Based on this prediction model, we’ll build a simulation model to improve demand planning for store replenishment.

DataFrame Features

date: Transaction date
item: SKU Number
store: Store Number
sales: Actual value of sales transaction
sales_prd: XGBoost prediction
error_forecast: sales_prd — sales
repln: boolean value for replenishment days (if the day is in [‘Monday’, Wednesday’, ‘Friday’, ‘Sunday’] return True)

III. Demand Planning: XGBoost vs. Rolling Mean

1. Demand Planning using Rolling Mean

The first method to forecast demand is the rolling mean of previous sales. At the end of Day n-1, you need to forecast demand for Day n, Day n+1, Day n+2.

Calculate the average sales quantity of the last p days: Rolling Mean (Day n-1, …, Day n-p)
Apply this mean to sales forecast of Day n, Day n+1, Day n+2
Forecast Demand = Forecast_Day_n + Forecast_Day_(n+1) + Forecast_Day_(n+2)

Demand Forecast Using Rolling Mean — (Image by Author)

2. XGBoost vs. Rolling Mean

With our XGBoost model on hand, we have now two methods for demand planning with Rolling Mean Method.

Let us try to compare the results of these two methods on forecast accuracy:

Prepare Replenishment at Day n-1
We need to forecast replenishment quantity for Day n, Day n +1, Day n+2
XGB prediction gives us a demand forecast
Demand_XGB = Forecast_Day(n) + Forecast_Day(n+1) + Forecast_Day(n+2)
Rolling Mean Method gives us demand forecast
Demand_RM = 3 x Rolling_Mean(Day(n-1), Day(n-2), .. Day(n-p))
Actual Demand
Demand_Actual = Actual_Day(n) + Actual_Day(n+1) + Actual_Day(n+2)
Forecast Error
Error_RM = (Demand_RM — Demand_Actual)
Error_XGB = (Demand_XGB— Demand_Actual)

A methodology using XGBoost and Rolling Mean — (Image by Author)

a. Parameter tuning: Rolling Mean for p days

Before comparing Rolling Mean results with XGBoost; let us try to find the best value for p to get the best performance.

Minimum Error with Rolling Mean — (Image by Author)

Results: -35% of error in forecast for (p = 8) vs. (p = 1)

Thus, based on the sales transactions profile we can get the best demand planning performance by forecasting the next day's sales by using the average of the last 8 days.

b. XGBoost vs. Rolling Mean: p = 8 days

Error XGBoost vs. Rolling Mean — (Image by Author)

Results: -32% of error in the forecast by using XGBoost vs. Rolling Mean

Forecast error by (axis-x: Store Number, axis-y: Item Number, axis-z: Error) — (Image by Author)

IV. Conclusion and next steps

1. Conclusion

Using the Rolling Mean method for demand forecasting, we could reduce forecast error by 35% and find the best parameter p days.

However, we could get even better performance by replacing the rolling mean by XGBoost forecast to predict day n, day n+1 and day n+2 demand reducing error by 32%.

2. Next steps

Inventory level: using XGBoost forecast to match inventory quantity with demand
Order Frequency: reduce order frequency using our forecasting model to match order quantity with demand

About Me

Let’s connect on Linkedin and Twitter, I am a Supply Chain Engineer that is using data analytics to improve logistics operations and reduce costs.

If you’re looking for tailored consulting solutions to optimize your supply chain and meet sustainability goals, please contact me.

References

[1] Kaggle Dataset, Store Item Demand Forecasting Challenge, Link