# Machine Learning for Retail Demand Forecasting

Comparative study of Demand Forecasting Methods for a Retail Store (XGBoost Model vs. Rolling Mean)

*Article originally published on Medium.*

## I. Demand Planning Optimization Problem Statement

For most retailers, demand planning systems take a fixed, rule-based approach to forecast and replenishment order management.

Such an approach works well enough for stable and predictable product categories but can show its limits regarding Inventory and Replenishment Optimization.

This potential optimization can reduce operational costs by:

- **Inventory Optimization:** matching store inventory with actual needs to reduce storage space needed **(Rental Costs)**
- **Replenishment Optimization:** optimizing replenishment quantity per order to minimize the number of replenishments between warehouse and stores **(Warehousing & Transportation Costs)**

**Example: Retailer with 50 Stores**

For this study, we'll take a dataset from the Kaggle challenge: Store Item Demand Forecasting Challenge.

Scope

- Transactions from **2013-01-01** to **2017-12-31**
- **913,000** Sales Transactions
- **50** unique SKU
- **10** Stores

### (Update) Improve the model

I have been working on an improved version of the model and I share my insights in another article with the full code shared in this repository.

*GitHub Repository:* **Link**

The goal is to understand the impact of adding business features (price change, sales trend, store closing, …) on the accuracy of the model.

## II. XGBoost for Sales Forecasting

The initial dataset has been used for a Kaggle Challenge where teams were competing to design the best model to predict sales.

The first objective here is to design a prediction model using XGBoost. This model will then be used to optimize our replenishment strategy, ensuring inventory optimization and reducing the number of deliveries from the warehouse.

### 1. Add Date Features
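As an illustration of this step, here is a minimal sketch of how calendar features could be derived with pandas. The column names (`date`, `store`, `item`, `sales`) follow the fields described in this article; the helper name `add_date_features`, the exact set of features, and the toy rows are assumptions, not the original code.

```python
import pandas as pd

def add_date_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with calendar features derived from the 'date' column."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    out["day"] = out["date"].dt.day
    out["dayofweek"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["dayofyear"] = out["date"].dt.dayofyear
    out["weekofyear"] = out["date"].dt.isocalendar().week.astype(int)
    return out

# Toy rows for illustration only (not taken from the real dataset)
sample = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02"],
    "store": [1, 1],
    "item": [1, 1],
    "sales": [13, 11],
})
features = add_date_features(sample)
print(features[["date", "year", "month", "dayofweek", "weekofyear"]])
```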

### 2. Daily, Monthly Average for Train
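One possible way to build these averages is a grouped mean per item and store. The sketch below assumes that "daily average" means the mean sales per day of the week and "monthly average" the mean sales per calendar month; the column names `daily_avg` and `monthly_avg` and the toy data are hypothetical.

```python
import pandas as pd

# Toy training data for illustration only: four Mondays for one item in one store
train = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-07", "2013-01-14", "2013-02-04", "2013-02-11"]),
    "store": [1, 1, 1, 1],
    "item": [1, 1, 1, 1],
    "sales": [10, 14, 20, 24],
})
train["dayofweek"] = train["date"].dt.dayofweek
train["month"] = train["date"].dt.month

keys = ["item", "store"]
# Mean sales per (item, store, day of week), broadcast back to every row
train["daily_avg"] = train.groupby(keys + ["dayofweek"])["sales"].transform("mean")
# Mean sales per (item, store, month), broadcast back to every row
train["monthly_avg"] = train.groupby(keys + ["month"])["sales"].transform("mean")
print(train)
```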

### 3. Add Daily, Monthly Averages to Test and Rolling Averages
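Since the test set has no sales column, the averages have to be looked up from the training data. The sketch below shows one way this could be done with a left merge, plus a rolling average that only uses past days. The window of 2 days, the column names and the toy data are assumptions chosen to keep the example small.

```python
import pandas as pd

# Toy data for illustration only
train = pd.DataFrame({
    "store": [1, 1, 1, 1],
    "item": [1, 1, 1, 1],
    "month": [1, 1, 2, 2],
    "sales": [10, 14, 20, 24],
})
test = pd.DataFrame({"store": [1, 1], "item": [1, 1], "month": [1, 2]})

# Lookup table of monthly averages computed on the training set
monthly_avg = (
    train.groupby(["item", "store", "month"], as_index=False)["sales"]
    .mean()
    .rename(columns={"sales": "monthly_avg"})
)
# Attach the training averages to the test rows
test = test.merge(monthly_avg, on=["item", "store", "month"], how="left")

# Rolling average of past sales per (item, store); shift(1) excludes the current day
train["rolling_mean"] = train.groupby(["item", "store"])["sales"].transform(
    lambda s: s.shift(1).rolling(window=2).mean()
)
print(test)
print(train)
```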

### 4. Heat Map to check correlation

Let us keep the monthly average, since it has the highest correlation with sales, and remove the other features that are highly correlated with each other.
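The selection rule can be sketched as follows: rank features by their absolute correlation with sales, then keep a feature only if it is not too correlated with one already kept. The data is synthetic, and the helper `select_features` and the 0.8 threshold are assumptions for illustration; the article's own selection was made by reading the heat map.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustrative only): daily_avg is built to be largely redundant with monthly_avg
rng = np.random.default_rng(0)
n = 200
monthly = rng.normal(50, 10, n)
df = pd.DataFrame({
    "sales": monthly + rng.normal(0, 2, n),
    "monthly_avg": monthly,
    "daily_avg": monthly + rng.normal(0, 5, n),
    "dayofweek": rng.integers(0, 7, n),
})

# Absolute Pearson correlation matrix (the values a heat map would display)
corr = df.corr().abs()
by_target = corr["sales"].drop("sales").sort_values(ascending=False)

def select_features(corr, ranked, threshold=0.8):
    """Greedy selection: keep a feature if it is below the threshold against every kept feature."""
    kept = []
    for feat in ranked:
        if all(corr.loc[feat, k] < threshold for k in kept):
            kept.append(feat)
    return kept

kept = select_features(corr, by_target.index)
print(by_target)
print(kept)
```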

### 5. Clean features, Training/Test Split and Run model

### 6. Results Prediction Model

Based on this prediction model, we'll build a simulation model to improve demand planning for store replenishment.

DataFrame Features

- **date:** Transaction date
- **item:** SKU Number
- **store:** Store Number
- **sales:** Actual value of sales transaction
- **sales_prd:** XGBoost prediction
- **error_forecast:** sales_prd - sales
- **repln:** boolean value for replenishment days (True if the day is in ['Monday', 'Wednesday', 'Friday', 'Sunday'])
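The two derived columns can be computed directly from the others. This is a minimal sketch with made-up values for `sales` and `sales_prd`; the constant name `REPLENISHMENT_DAYS` is hypothetical.

```python
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
    "item": [1, 1, 1],
    "store": [1, 1, 1],
    "sales": [19, 15, 10],
    "sales_prd": [17.5, 16.0, 12.0],
})

REPLENISHMENT_DAYS = ["Monday", "Wednesday", "Friday", "Sunday"]

# Forecast error: prediction minus actual sales
df["error_forecast"] = df["sales_prd"] - df["sales"]
# True when the transaction date falls on a replenishment day
df["repln"] = df["date"].dt.day_name().isin(REPLENISHMENT_DAYS)
print(df)
```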

## III. Demand Planning: XGBoost vs. Rolling Mean

### 1. Demand Planning using Rolling Mean

The first method to forecast demand is the rolling mean of previous sales. At the end of Day n-1, you need to forecast demand for Day n, Day n+1, Day n+2.

- Calculate the average sales quantity of the last p days: Rolling Mean (Day n-1, …, Day n-p)
- Apply this mean to sales forecast of Day n, Day n+1, Day n+2
- Forecast Demand = Forecast_Day_n + Forecast_Day_(n+1) + Forecast_Day_(n+2)
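The three steps above can be sketched with pandas as follows. The function name `rolling_mean_demand` and the toy series are assumptions; `shift(1)` makes sure the forecast for Day n only uses sales up to Day n-1.

```python
import pandas as pd

def rolling_mean_demand(sales: pd.Series, p: int, horizon: int = 3) -> pd.Series:
    """Demand forecast for Day n .. Day n+horizon-1, made at the end of Day n-1.

    The value at position n is horizon * mean(sales[n-p], ..., sales[n-1]).
    """
    past_mean = sales.shift(1).rolling(window=p).mean()
    return horizon * past_mean

# Toy daily sales for one item in one store
sales = pd.Series([10, 12, 14, 16, 18, 20])
demand_rm = rolling_mean_demand(sales, p=2)
print(demand_rm)
```

The first p values are missing because there are not yet p past days to average.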

### 2. XGBoost vs. Rolling Mean

With our XGBoost model on hand, we now have two methods for demand planning: the XGBoost forecast and the Rolling Mean method.

Let us try to compare the results of these two methods on forecast accuracy:

**Prepare Replenishment at Day n-1**

We need to forecast replenishment quantity for Day n, Day n+1, Day n+2.

**XGB prediction gives us a demand forecast**

Demand_XGB = Forecast_Day(n) + Forecast_Day(n+1) + Forecast_Day(n+2)

**Rolling Mean Method gives us demand forecast**

Demand_RM = 3 x Rolling_Mean(Day(n-1), Day(n-2), ..., Day(n-p))

**Actual Demand**

Demand_Actual = Actual_Day(n) + Actual_Day(n+1) + Actual_Day(n+2)

**Forecast Error**

Error_RM = (Demand_RM - Demand_Actual)

Error_XGB = (Demand_XGB - Demand_Actual)
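To make these formulas concrete, here is a small sketch that computes both errors on a toy series. The values of `sales` and `sales_prd` are made up, p = 2 is chosen only to keep the example short, and summing absolute errors is an assumption about how the two methods are scored.

```python
import pandas as pd

def forward_sum(s: pd.Series, horizon: int = 3) -> pd.Series:
    """Sum of s over Day n .. Day n+horizon-1, aligned on Day n."""
    return sum(s.shift(-k) for k in range(horizon))

# Made-up actual sales and model predictions
df = pd.DataFrame({
    "sales":     [10, 12, 14, 16, 18, 20],
    "sales_prd": [11, 13, 13, 17, 17, 21],
})
p = 2  # rolling window, chosen for illustration

df["demand_actual"] = forward_sum(df["sales"])
df["demand_xgb"] = forward_sum(df["sales_prd"])
df["demand_rm"] = 3 * df["sales"].shift(1).rolling(window=p).mean()

df["error_rm"] = df["demand_rm"] - df["demand_actual"]
df["error_xgb"] = df["demand_xgb"] - df["demand_actual"]

# Keep only days where all three demand values are defined
valid = df.dropna()
abs_error_rm = valid["error_rm"].abs().sum()
abs_error_xgb = valid["error_xgb"].abs().sum()
print(valid[["error_rm", "error_xgb"]])
print(abs_error_rm, abs_error_xgb)
```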

**a. Parameter tuning: Rolling Mean for p days**

Before comparing Rolling Mean results with XGBoost, let us try to find the value of p that gives the best performance.

**Results:** -35% forecast error for (p = 8) vs. (p = 1)

Thus, based on the sales transactions profile, we get the best demand planning performance by forecasting the next days' sales using the average of the last **8 days**.
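A search over p could look like the sketch below, which scores each candidate window by the mean absolute error of the 3-day demand forecast. The alternating toy series, the candidate range and the choice of mean absolute error are assumptions; on the real dataset the article reports p = 8 as the best value.

```python
import pandas as pd

def rolling_mean_mae(sales: pd.Series, p: int, horizon: int = 3) -> float:
    """Mean absolute error of the rolling-mean demand forecast for a window of p days."""
    demand_rm = horizon * sales.shift(1).rolling(window=p).mean()
    demand_actual = sum(sales.shift(-k) for k in range(horizon))
    # Days without a full past window or a full future horizon are NaN and ignored by mean()
    return (demand_rm - demand_actual).abs().mean()

# Toy series alternating between 10 and 20 units per day
sales = pd.Series([10, 20] * 6, dtype=float)

errors = {p: rolling_mean_mae(sales, p) for p in range(1, 4)}
best_p = min(errors, key=errors.get)
print(errors, best_p)
```

Note that each p is scored on a slightly different set of days, since a larger window needs more history before the first forecast.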

**b. XGBoost vs. Rolling Mean: p = 8 days**

**Results:** -32% forecast error by using XGBoost vs. Rolling Mean

## IV. Conclusion and next steps

### 1. Conclusion

Using the Rolling Mean method for demand forecasting, we could reduce forecast error by **35%** and find the best parameter **p days**.

However, we could get even better performance by replacing the rolling mean with the XGBoost forecast to predict day n, day n+1 and day n+2 demand, reducing error by a further **32%**.

### 2. Next steps

- **Inventory level:** using XGBoost forecast to match inventory quantity with demand
- **Order Frequency:** reduce order frequency using our forecasting model to match order quantity with demand

### About Me

Let's connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

## References

[1] Kaggle Dataset, Store Item Demand Forecasting Challenge, Link