Automate excel report creation with python by writing explicit excel formulas that any Excel user can read
Article originally published on: Medium.
The main issue you can face with an excel automation script written using python and pandas is the experience of users that have no programming skills.
Indeed, your tool can be seen as a black box that takes data from an excel file, processes it on the back-end, and exports it to another excel file.
Some users cannot trust a report if they can’t access and modify the formula used to populate the results.
Therefore, replacing excel with python can impact the user acceptance of your solutions.
In this article, I propose an alternative to overcome this problem with the python library xlswriter.
💌 New articles straight in your inbox for free: Newsletter
You are a data analyst in a fashion retail company in charge of sales reporting.
Reports are generated by the system that manages the point of sales in stores.
In these reports, you have
- Sales quantity in pieces
- Items code with a total of 50 items
- Date and week covering a full year
For reporting purposes, you need to process these reports and map the dataset with additional features that will be used to analyze sales trends.
These additional features are linked with the item code
- Item Family: Leather Goods, Accessories, Ready-to-wear or Others
- Scope: boolean value to inform if the item is in the scope of analysis
As you need to perform this task more than a hundred times per month, you are looking for a solution to automate the process.
Initial solution using Python Pandas
Initially, you built a solution using Python that was automating the processing using pandas.
Data is imported from several excel files using pandas_read excel, processed by your script and exported to another excel file.
This script was converted to an executable file (.exe) so your colleagues can use it without your support.`
- Design and test your python script
- Use pyinstaller to export your python script in an executable file
- Share your .exe file (with detailed instructions) with your colleagues
Issues with user acceptance
Some users were complaining about the fact that they cannot access the formula used to process the data.
Hello Samir, why can’t we see the formulas in the final report?
Are you sure that you take the right columns?
This issue raised some concerns among your colleagues that were questioning the accuracy of the tool as they can’t check it.
New solution with xlsxwriter
The idea is to use the python library xlsxwriter to perform the calculation in the Excel file.
You can write formulas, in excel cells, that can be read (and modified) by the users of the output file.
Thus, you keep the automation capabilities of python while providing more visibility to your users that are only familiar with Excel.
If you have a look at the documentation of xlsxwriter, you will find several methods to create formulas on excel cells.
You can install this library using Pippip install xlsxwriter
You can write formulas applied to a single cell,
And also apply your formula to an array,
You can then build a solution for our easy processing task
The three additional columns are generated using Excel formulas that can be read on each cell.
Follow me on medium for more insights related to Data Science for Supply Chain.
We have answered the need for visibility from the final users while keeping the automation capabilities of python.
Low processing speed
However, because of the structure of the library and the computing power needed to create formulas in an excel file, you will lose processing speed.
Beyond the limitations of excel formulas, you need also to cope with the limited functions of the xlsxwriter.
For instance, you cannot build pivot tables with it.
For advanced calculation and processing, you will need to educate your users and bring transparency using another way to get their trust.