What is Process Mining?

Process mining is the application of data analytics tools and concepts to understand, validate and improve workflows.

What is Process Mining?

Article originally published on medium.

Process mining is a type of data analytics that focuses on the discovery, monitoring, and improvement of business processes.

It involves analyzing data from various sources, such as process logs, to understand how a process is actually being executed, identify bottlenecks and inefficiencies, and suggest ways to improve the process.

Supply Chain Information Systems — (Image by Author)

In previous articles, I shared examples of python tools designed to monitor and visualize logistics performance, build an automated supply chain control tower or create smart visualizations.

In this article, we will go beyond monitoring and see how process mining with python can help you to identify bottlenecks and improve productivity.

💌 New articles straight in your inbox for free: Newsletter

What is Process Mining?

Monitor and improve processes with data

Process mining techniques can be applied to many processes, including manufacturing, supply chain management, healthcare, and customer service.

By analyzing data about the process, process mining can help organizations understand how their processes function, identify areas for improvement and make informed decisions about optimising their processes.

Process Mining for Customer Service — (Image by Author)

For example, in this article, we explore a statistical method to estimate the lead time of order processing by a customer service team.

Process Mining Approaches

There are several different approaches to process mining, including discovery, conformance, and enhancement.

  • Discovery involves using process mining techniques to uncover the structure and behaviour of a process based on data from process logs.
  • Conformance involves comparing the actual process to the desired process model to identify deviations and deviations.
  • Enhancement involves using process mining to suggest improvements to the process based on data analysis.

Example: Store Delivery Lead Time

Distribution Network for Fashion Retail

Let us take the example of the Supply Chain Network of an international clothing group with stores worldwide.

This company is selling garments, bags and accessories produced in oversea factories to replenish a network of regional warehouses.

Supply Chain Network of a Fashion Retail Company — (Image by Author)

In this article, we will focus on the store's delivery from these regional warehouses.

  • Distribution planners create replenishment orders in the ERP
  • Planners include a list of items needed (with quantity) and a requested delivery date
  • Orders are transmitted to the Warehouse Management System
  • Warehouse teams prepare the orders in pallets for shipping
  • Transportation teams organise the pick-up at the warehouse
  • Shipments are delivered and received at the stores
Information Systems — (Image by Author)

Each step of the process is managed by a specific system (ERP, WMS, TMS) that records information and time stamps that we will use for process mining.

Approach 1: Discovery

a) Time stamps definitions
From the order creation to the store reception time stamps are recorded by the different systems.

Time Stamps per Process — (Image by Author)
  • Order creation time in the ERP by the distribution planner
  • Order reception time in the Warehouse Management System (WMS)(now ready to be prepared by warehouse teams)
  • Picking time(s) that can include the starting and ending time of the picking mission that contains the order
  • Packing time refers to the end of the packing process
  • Shipping time when your orders are leaving the warehouse
  • Delivery time when your truck delivers the pallets to the store
  • Store receiving time is concluding the delivery process: the time when the store’s team records the received items in the ERP

💡 Know your processes
In your company, systems can be used differently. Therefore, be sure that processes have been mapped with detailed workflows (including data inputs and outputs).

b) Lead times definitions

As timestamps alone mean nothing, we define lead times for each process by the difference between two timestamps

Lead Times Definition — (Image by Author)

They usually refer to a specific team’s responsibility

  • End-to-end lead time is measuring the performance of the whole logistics department also defined as On Time In Full (OTIF)
  • Order Transfert lead time is under the responsibility of the IT/Infrastructure team that ensures fast transmissions of orders
  • The warehouse operational teams performance is measured by Preparation Lead time
  • Shipment lead time is a grey zone as it can be impacted by warehouse teams, the finance department (invoicing) or transportation teams (find a truck for shipment)
  • Transportation lead time is entirely under the scope of transportation teams
  • Receiving lead time is the responsibility of store teams

💡 You need the support of system(s) expert(s)
Your systems may not have a time stamp for every process. But there are always alternative solutions that can be found by your infrastructure team.

For instance, there is no timestamp to show when the packing process is ending

  • But the status of the order is changing (from packing to ready to ship).
  • A script can be developed to create a timestamp when this status is changed.
  • It can be then populated in the data lake you use as a data source.

c) Data processing with Python

Using python you can connect to different systems to extract transactional records from their respective databases.

Data Processing with Python — (Image by Author)

These systems may have different fields and metrics definitions. Use the order number to merge the tables in a single data frame and calculate the different lead times.

Example of lead times — (Image by Author)

In the graph above, you can find an example of lead times plots (in minutes) that provide an overview of performance variability of order transmission, pick and pack and warehouse-airport transfer.

For more information on how to build a logistic performance dashboard, have a look at this short video,

Approach 2: Conformance

a) Delivery Lead Time: On Time In Full

The most important key performance indicator of the lead time between order creation and delivery times.

Learn more about OTIF in this video — (Image by Author)

Distribution planners use it to manage their inventory, set the safety stock and plan new collections launches.

And it will be also used by store managers to challenge the logistics department in the different supply performance reviews.

b) End-to-end delivery process

In our example, we have a delivery lead time target of 72 hours.

Example of a Delivery Order Process — (Image by Author)

The example above shows the example of delivery that meets the target lead times.

💡 Cut-off time definition
The order reception cut-off time is 18:00:00.

That means if an order is received after this time, you’ll need to wait for an additional 24h to prepare it.

They are the main cause of delayed delivery and disruptions.

c) Visualization

After defining them, you can use your python script to compare actual lead times with these targets.

Lead Times Target Plot with Python Matplotlib — (Image by Author)

In this visual, you can see that

  • For some orders, warehouse teams are not respecting the preparation lead time target of 5.5 hours (630 minutes).
  • Order transmission or warehouse airport transfer will never be a problem as you are way below the maximum lead time.

Approach 3: Enhancement

a) Visualize delays

The objective is to ensure that you have all our orders transmitted, prepared and delivered in less than 72 hours.

It starts with visualizations that will help you to answer simple questions

  • How many orders have been delivered late?
  • Which process is impacting the most your overall lead time?
Delays Root Causes Donut Plot — (Image by Author)
Visual for Root Cause Analysis — (Image by Author)

The visual above will help you to spot the late deliveries and quickly screen the different legs to understand which one impacted your lead time

  1. Start to look at the bottom graph
  2. You spot that one order lead time is above the yellow line
  3. Screen the lead times above to find which one(s) are causing this delay

💡 Reason Code Mapping
As part of prescriptive analytics, you can start to automatically create
reason codes for late deliveries that will be used for visuals like the donut plot above. (Example: if your shipment is delayed and your loading lead time is higher than the target you can add a late delivery reason code: loading)

b) Process mining to reduce delays

Now that you know the root causes, you can focus on designing solutions to avoid delays in the future.

For instance,

  1. You have spotted that many orders are delivered late due to long shipping lead times.
  2. After further analysis, you found that the root cause is the invoicing process.
  3. This process depends on the delivery location (region, store type (duty-free, franchise, ..), product category, order type (cross-docking, replenishment, collection launches), and order creation date.
Model for Prediction of Late Invoicing— (Image by Author)

You use data to train a model that predicts the probability of having delays in the invoicing process using the features listed above

  1. You have spotted a high correlation between the boolean output (on-time invoicing) and the order creation day (Monday, .. Sunday), and also the store region (Europe, Middle East, Asia, America).
  2. The vast majority of late invoices are from orders created in the second half of the week for stores located in the Middle East.
  3. After alignment with local logistics teams in the Middle East, you discovered that the invoice process is stopped on Thursdays and Fridays as it is the weekend in this region.

You can then ask demand planners (located in Europe) to adapt their order creation processes to these local specificities.

💡 Bring insights to operational and IT teams
The example above shows that data alone cannot design a solution or solve problems alone.

They need to be included during

  • Time stamps definition to make sure that the information you retrieve from systems is matching with the operational reality.
  • Lead time definitions to match them with your company’s supply chain performance management. (Example: if store receiving is not included in the transportation team’s performance measure you need to remove it)
  • Data processing with python to ensure that the fields you are using for your calculation are the right ones.
  • Visualization creation to ensure that they can be used by operational teams to get insights.

The models and visualizations you designed will provide insights to support continuous improvement initiatives and boost strategic decision-making.


Overall, process mining can be a powerful tool for improving the efficiency and effectiveness of business processes, and is increasingly used by organizations to drive process improvement efforts.

It requires harmonized and clean data

This basic requirement can be a major obstacle in your digital transformation journey.

The worst-case scenario, which is unfortunately very common, would be an organisation with

  • several systems not communicating together like multiple ERP instances, one WMS per warehouse fully controlled by the 3PL, and several TMS not interfaced with the ERP
  • many customizations that are not documented such as additional fields created by a developer who left the company
  • absence of a single source like a data lake that groups the databases of your different systems

At this stage, you need to work on harmonizing and building a strong data architecture (with governance to maintain data integrity) before jumping into process mining.

Next Steps

After spotting operational failures and finding the root causes, you will be interested in building a Digital Twin to simulate several scenarios and test the resilience of your solutions.

What Is a Supply Chain Digital Twin?
Use python to create a model representing your supply chain network to optimize your operations and support strategic decisions.

About Me

Let’s connect on Linkedin and Twitter, I am a Supply Chain Engineer that is using data analytics to improve logistics operations and reduce costs.