# Lean Six Sigma with Python — Kruskal Wallis Test

How to replace Minitab with Python to perform a Kruskal-Wallis test evaluating the impact of training on warehouse operators’ productivity


*Article originally published on Medium.*

**Lean Six Sigma (LSS)** is a method based on a stepwise approach to process improvements.

This approach usually follows 5 steps (**Define, Measure, Analyze, Improve and Control**) for improving existing process problems with unknown causes.

In this article, we will explore how **Python** can replace **Minitab** *(software widely used by LSS experts)* in the **Analysis** step to test **hypotheses** and **understand** what could improve the **performance metrics** of a specific process.

**SUMMARY**

**I. Problem Statement**

*Can we improve the operators' productivity by giving them a training designed by R&D team?*

**II. Data Analysis**

**1. Exploratory Data Analysis**: analysis with Python of sample data from an experiment with a few operators

**2. Analysis of Variance (ANOVA)**: verify the hypothesis that training impacts productivity

ANOVA assumptions are not verified

**3. Kruskal-Wallis test**: confirm that the hypothesis can be generalized

**III. Conclusion**

## I. Problem Statement

### 1. Scenario

You are the **Continuous Improvement Manager** of a **Distribution Center (DC)** for an iconic Luxury Maison focusing on **Fashion, Fragrances and Watches**.

The warehouse receives **garments** that require **final assembling** and **value-added service (VAS)** during the inbound process.

For **each dress received**, your operators need **to print a label** in the local language and **perform label sewing**.

In this article, we will focus on the improvement of **label sewing productivity**. Labels are distributed to the operators in batches of 30 labels.

The productivity is calculated based on the time (in seconds) needed to finish a batch.


### 2. Impact of training your workforce

With the support of the R&D team, you designed training for the VAS operators to improve their productivity and reduce quality issues.

**Question**

Does the training have a positive impact on the productivity of operators?

**Hypothesis**

The training has a positive impact on the productivity of VAS operators.

**Experiment**

Randomly select operators and measure the time per batch *(time to finish a batch of 30 labels in seconds)* to build a sample of **56 records**.

*You can find the full code in this GitHub repository: Link*

## II. Data Analysis

### 1. Exploratory Data Analysis

You can download the results of this experiment in this CSV file to run the whole code on your computer (here).

- 56 records in total
- 35 records of operators without training
- 21 records of operators with training

**Box Plot**

Based on the sample data, we can see that the median and the mean of the time per batch are considerably lower for the operators who had training.
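The comparison read from the box plot can also be checked numerically. The sketch below uses only Python's standard `statistics` module and a small set of made-up times per batch (the values and the `times` dictionary are illustrative assumptions, not the 56 records of the experiment) to compute the count, mean and median per group.

```python
from statistics import mean, median

# Hypothetical times per batch in seconds (NOT the article's dataset)
times = {
    "No training": [262, 248, 275, 251, 290, 268],
    "Training": [221, 235, 214, 240, 229],
}

# Count, mean and median of the time per batch for each group
summary = {
    group: {"n": len(values), "mean": mean(values), "median": median(values)}
    for group, values in times.items()
}

for group, stats in summary.items():
    print(
        f"{group}: n={stats['n']}, "
        f"mean={stats['mean']:.1f}s, median={stats['median']:.1f}s"
    )
```

With the real CSV file, the same summary would be computed on the two groups of 35 and 21 records.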

**Hypothesis**

The training reduces the average time per batch.


### 2. Analysis of Variance (ANOVA)

In this scenario, we want to check if the **training (Variable X)** is impacting the **total time per batch (Variable Y)**.

Because **X is a categorical variable** (Training = Yes/No) and **Y is numerical**, the appropriate method is **ANOVA**.

**ANOVA is a statistical method** used to check if we can generalize the difference of means seen in the sample data to the entire population.
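To make the idea concrete, here is a minimal sketch of the F statistic of a classical one-way ANOVA, written with the standard library only. The function name `one_way_anova_f` and the sample values are assumptions made for illustration; the figures reported in this article may come from a different ANOVA variant or library, so this sketch is not expected to reproduce them.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a classical one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (here: training yes/no)
    n = len(all_values)   # total number of observations

    # Variability explained by the group means
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability left inside each group
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical times per batch in seconds (NOT the article's dataset)
no_training = [262, 248, 275, 251, 290, 268]
training = [221, 235, 214, 240, 229]

f_stat, df1, df2 = one_way_anova_f(no_training, training)
print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

A large F means the gap between group means is large compared with the spread inside each group. In practice, a library routine such as `scipy.stats.f_oneway` also returns the p-value.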

**Step 1: Calculate the p-value**

- ddof1: 1
- ddof2: 45.267
- F: 17.1066
- **p-unc: 0.000151308**
- np2: 0.173692

p-value is below 5%


**Step 2: Validate the assumptions of ANOVA**

Based on the p-value, the difference of means appears to be statistically significant rather than due to random fluctuation.

However, before jumping to a conclusion, we need to check that the ANOVA assumptions are satisfied:

- Residuals are normally distributed

**Answer:** No

- There are no outliers or irregularities

**Answer:** No
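The two checks above can be sketched in a few lines. The example below computes the residuals (each observation minus its group mean) and flags outliers with the common 1.5 × IQR rule. The helper names `residuals` and `iqr_outliers`, the threshold and the sample values are assumptions for illustration; the article does not state which rule was used. A formal normality test such as `scipy.stats.shapiro` could then be applied to the same residuals.

```python
from statistics import mean, quantiles


def residuals(groups):
    """Residual = observation minus the mean of its own group."""
    return [x - mean(g) for g in groups for x in g]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical times per batch in seconds (NOT the article's dataset);
# the value 410 plays the role of an abnormally slow batch.
no_training = [262, 248, 275, 251, 290, 268, 410]
training = [221, 235, 214, 240, 229]

res = residuals([no_training, training])
outliers = iqr_outliers(res)
print(f"{len(outliers)} outlier(s) among {len(res)} residuals: {outliers}")
```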

**Conclusion**

ANOVA requirements are not met, so we need another method to confirm that the training actually impacts operators' productivity.


### 3. Kruskal-Wallis test

If your sample data fails to meet **ANOVA** requirements, you can use the **Kruskal-Wallis Test**, a rank-based test that does not assume normally distributed residuals, to check if the difference between the groups is due to random fluctuation.

- **statistic** = 54.99
- **pvalue** = 1.205e-13

**p-value is below 5%**

### Conclusion

The p-value is below 5%, so we can conclude that the difference between the groups is statistically significant.

We can confirm that the training has a positive impact on the productivity of the operators.
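For readers who want to see what the Kruskal-Wallis test in this section computes, here is a minimal standard-library sketch of the H statistic with the usual tie correction. The function names and the sample values are assumptions for illustration, and the p-value helper only covers the two-group case (chi-square with 1 degree of freedom). In practice, `scipy.stats.kruskal` performs the same computation for any number of groups.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (needs non-constant data)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_two_groups(h):
    """Chi-square survival function with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(h / 2))


# Hypothetical times per batch in seconds (NOT the article's dataset)
no_training = [262, 248, 275, 251, 290, 268]
training = [221, 235, 214, 240, 229]

h = kruskal_wallis_h(no_training, training)
print(f"H = {h:.3f}, p-value = {p_value_two_groups(h):.4f}")
```

Because the test works on ranks rather than raw times, a single abnormally slow batch has limited influence on the result, which is why it is a reasonable fallback when residuals are not normal or contain outliers.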


## III. Conclusion

This data-driven approach gave you enough elements to convince your management to invest in workforce training.

You brought **enough insights** with a **moderate effort of experimentation** by generalising patterns from sample data using statistics.

### About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.
