How to Create PDF Reports with Python – The Essential Guide

Reports are everywhere, so any tech professional must know how to create them. It’s a tedious and time-consuming task, which makes it a perfect candidate for automation with Python.

You can benefit from automated report generation whether you’re a data scientist or a software developer. For example, data scientists might use reports to show the performance of machine learning models or to explain their predictions.

This article will teach you how to make data-visualization-based reports and save them as PDFs. To be more precise, you’ll learn how to combine multiple data visualizations (dummy sales data) into a single PDF file.

And the best thing is – it’s easier than you think!

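The article is structured as follows:

  • Data generation
  • Data visualization
  • Create a PDF page structure
  • Create PDF reports
  • Conclusion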

You can download the Notebook with the source code here.

Data generation

You can’t have reports without data. That’s why you’ll have to generate some first—more on that in a bit.

Let’s start with the imports. You’ll need a bunch of things – but the FPDF library is likely the only unknown. Put simply, it’s used to create PDFs, and you’ll work with it a bit later. Refer to the following snippet for the imports:

Let’s generate some fake data next. The idea is to declare a function that returns a data frame of dummy sales data for a given month. It does that by constructing a date range for the entire month and then assigning the sales amount as a random integer within a given range.

You can use the calendar library to get the last day for any year/month combination. Here’s the entire code snippet:
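A minimal sketch of such a function, using only the standard library. The article's function returns a data frame; here the rows are plain dictionaries, which you could pass to `pandas.DataFrame` to get one. The column names `Date` and `ItemsSold`, the default year, the sales range, and the `seed` parameter are all assumptions made for illustration.

```python
import calendar
import random
from datetime import date, timedelta


def generate_sales_data(month, year=2020, low=1000, high=2000, seed=None):
    """Return one row of dummy sales for every day of the given month."""
    rng = random.Random(seed)
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    first = date(year, month, 1)
    return [
        {"Date": first + timedelta(days=offset), "ItemsSold": rng.randint(low, high)}
        for offset in range(last_day)
    ]


rows = generate_sales_data(month=3, seed=42)
print(len(rows), rows[0]["Date"], rows[-1]["Date"])
```

Because `calendar.monthrange` knows about leap years, the same call with `month=2` yields 29 rows for 2020.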

A call to generate_sales_data(month=3) generates 31 data points for March of 2020. Here’s what the first few rows look like:

Image 1 - Sample of generated data (image by author)

And that’s it – you now have a function that generates dummy sales data. Let’s see how to visualize it next.

Data visualization

Your next task is to create a function that visualizes the previously created dataset as a line plot. A line plot is a natural fit here, as you’re dealing with time series data.

Here’s the function for data visualization and an example call:
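A minimal sketch of such a plotting function, assuming Matplotlib is installed. It takes plain lists of dates and values rather than a data frame, and the colors, font sizes, figure size, and axis labels are assumptions you can change freely. The `Agg` backend is selected so the chart is rendered straight to a file and never displayed.

```python
import os
import tempfile
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt


def plot(dates, values, title, filename):
    """Draw a line plot and save it to `filename` instead of showing it."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(dates, values, color="#F2A900", linewidth=3)
    ax.set_title(title, fontsize=22)
    ax.set_xlabel("Period", fontsize=13)
    ax.set_ylabel("Number of items sold", fontsize=13)
    ax.grid(color="#DDDDDD")
    fig.savefig(filename, dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure so it is not kept in memory
    return filename


# Example call with made-up values for December of 2020
days = [date(2020, 12, 1) + timedelta(days=i) for i in range(31)]
sales = [1000 + (37 * i) % 900 for i in range(31)]
out_path = os.path.join(tempfile.mkdtemp(), "december.png")
plot(days, sales, "Sales for December/2020", out_path)
```

Closing the figure after saving matters once you generate many charts in a loop, since open figures otherwise accumulate.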

In a nutshell – you’re creating a data visualization, setting the title, playing around with fonts – nothing special. The visualization isn’t shown to the user but is instead saved to disk. You’ll see later how powerful this can be.

An example call will save a data visualization for December of 2020. Here’s what it looks like:

Image 2 - Sales for December/2020 plot (image by author)

And that’s your visualization function. There’s only one step remaining before you can create PDF documents, and that is to save all the visualizations and define the report page structure.

Create a PDF page structure

The task now is to create a function that does the following:

  • Creates a folder for charts – deletes it if it exists and re-creates it
  • Saves a data visualization for every month in 2020 except for January – so you can see how to work with a different number of elements per page (feel free to include January too)
  • Creates a PDF matrix from the visualizations – a 2-dimensional matrix where a row represents a single page in the PDF report

Here’s the code snippet for the function:
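A minimal sketch of the three steps, using only the standard library. Two things are assumptions: three charts per page (`per_page`), and the `save_chart` callback, a hypothetical hook that stands in for your plotting function so the sketch runs on its own.

```python
import os
import shutil
import tempfile


def construct(plots_dir, months, save_chart, per_page=3):
    """Re-create plots_dir, save one chart per month, and group the paths by page."""
    # Start from an empty folder: delete it if present, then create it again
    shutil.rmtree(plots_dir, ignore_errors=True)
    os.makedirs(plots_dir)

    for month in months:
        save_chart(month, os.path.join(plots_dir, f"{month}.png"))

    # Sort by the integer in the file name ("3.png" -> 3), not alphabetically
    names = sorted(os.listdir(plots_dir), key=lambda name: int(name.split(".")[0]))
    paths = [os.path.join(plots_dir, name) for name in names]

    # Each row of the matrix holds the charts that go on one page
    return [paths[i:i + per_page] for i in range(0, len(paths), per_page)]


def fake_chart(month, path):
    # Hypothetical stand-in for the real plotting function: writes an empty file
    with open(path, "wb") as handle:
        handle.write(b"")


pages = construct(os.path.join(tempfile.mkdtemp(), "plots"), range(2, 13), fake_chart)
for page in pages:
    print([os.path.basename(p) for p in page])
```

With eleven months and three charts per page, the last row holds only two charts, which is exactly the uneven case the page layout has to handle.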

It’s possibly a lot to digest, so go over it line by line. The comments should help. The idea behind sorting is to extract the month’s integer representation from the file name – e.g., 3 from “3.png” – and use this value to sort the charts. You could delete this line if the order didn’t matter, but with months it does.
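To see why the integer key matters, compare it with a plain alphabetical sort. The file names below are made up for illustration.

```python
names = ["10.png", "2.png", "12.png", "3.png", "11.png"]

# Plain string sorting compares character by character, so "10" comes before "2"
alphabetical = sorted(names)

# Sorting on the integer part restores calendar order
by_month = sorted(names, key=lambda name: int(name.split(".")[0]))

print(alphabetical)  # ['10.png', '11.png', '12.png', '2.png', '3.png']
print(by_month)      # ['2.png', '3.png', '10.png', '11.png', '12.png']
```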

Here’s an example call of the construct() function:

You should see the following in your Notebook after running the above snippet:

Image 3 - Generated visualizations (image by author)

In case you’re wondering – here’s how the plots/ folder looks on my machine (after calling the construct() function): 

Image 4 - PDF report content matrix (image by author)

And that’s all you need to construct PDF reports – you’ll learn how to do that next.

Create PDF reports

This is where everything comes together. You’ll now create a custom PDF class that inherits from FPDF. This way, all of its properties and methods are available in your class, as long as you call super().__init__() in the constructor. The constructor will also hold values for page width and height (A4 paper, 210 × 297 mm).

Your PDF class will have a couple of methods:

  • header() – used to define the document header. A custom logo is placed on the left (make sure to have one or delete this code line), and hardcoded text is placed on the right
  • footer() – used to define the document footer. It will simply show the page number
  • page_body() – used to define what the page looks like. This will depend on the number of visualizations shown per page, so positions and margins are set accordingly (feel free to play around with the values)
  • print_page() – used to add a blank page and fill it with content

Here’s the entire code snippet for the class:

Now it’s time to instantiate it and to append pages from the 2-dimensional content matrix:

The above cell will take some time to execute, and will return an empty string when done. That’s expected, as your report is saved to the folder where the Notebook is stored. 

Here’s what the first page of the report should look like:

Image 5 - First page of the PDF report (image by author)

Of course, yours will look different due to the different logo and due to sales data being completely random. 

And that’s how you create data-visualization-powered PDF reports with Python. Let’s wrap things up next.

Conclusion

You’ve learned many things today – how to create dummy data for any occasion, how to visualize it, and how to embed visualizations into a single PDF report. Embedding your own visualizations will require minimal code changes – mostly for positioning and margins.

Let me know if you’d like to see a guide for automated report creation based on machine learning model interpretations (SHAP or LIME) or something else related to data science.

Thanks for reading.

Dario Radečić
Data scientist, blogger, and enthusiast. Passionate about deep learning, computer vision, and data-driven decision making.
