Best Python Packages for Excel

At some point, you’ll probably need to work with data from an Excel spreadsheet. How can you work with Excel data and files in Python? We review some of the best Python packages for Excel in this article.

An Excel spreadsheet is a very common way of storing tabular data. But Excel is not without its problems, as we discussed in the article Excel Alternative: What to Learn as a Data Analyst. For large datasets, you may need the functionality of a database. But when you’re working with smaller datasets, you may want the convenience of Excel. In this case, knowing how to work with Excel data in Python is an important skill to master.

The Python libraries we’ll discuss allow you to do everything from reading, writing, and modifying existing Excel files to creating new ones. For some background reading, check out our article How to Read Excel Files in Python. Or to broaden your skills even further, we have a Working with Files and Directories in Python course, which will give you the ability to load data more efficiently and store or share the results.

So, let’s talk about these Python packages that make working with Excel possible. But first, we need to clear one thing up: the many file formats in Excel.

Excel File Formats

Until 2007, Excel used a file format with the extension .xls. For later versions of Excel, the default file format became the Excel Workbook, which has the extension .xlsx. Other formats have appeared to support more specific functionality. These include .xlsm (a macro-enabled file format) and the binary file format .xlsb, introduced with Excel 2007. There are also template file formats, including .xltx and .xltm (a macro-enabled template format).
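
As a quick illustration of the formats listed above, here is a minimal sketch that maps a file's extension to a short description. The helper name describe_excel_file and the dictionary are hypothetical, made up for this example; they are not part of any of the packages discussed below.

```python
from pathlib import Path

# Extensions described above, mapped to short descriptions.
# This table is an illustrative summary, not an exhaustive list.
EXCEL_FORMATS = {
    ".xls": "legacy workbook (before Excel 2007)",
    ".xlsx": "Excel Workbook (default since Excel 2007)",
    ".xlsm": "macro-enabled workbook",
    ".xlsb": "binary workbook",
    ".xltx": "template",
    ".xltm": "macro-enabled template",
}


def describe_excel_file(filename):
    """Return a description of the Excel format, or None for other extensions."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    return EXCEL_FORMATS.get(suffix)


print(describe_excel_file("report.XLSX"))
print(describe_excel_file("notes.txt"))
```

A check like this can help you decide which of the packages below to reach for, since each one supports a different subset of these formats.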

5 Libraries to Make Working with Excel in Python Easier

1. openpyxl

The first Python package for Excel that we’ll discuss is openpyxl. It’s possibly the most widely used package for working with Excel files in Python. This package is designed to read and write Excel 2010 files with formats including .xlsx, .xlsm, .xltx, and .xltm. As mentioned in the official documentation, openpyxl does not by default guard against certain XML-based attacks (such as "billion laughs"), but you can protect against them by installing the defusedxml package.

If you want to create a new Excel file, start by importing the library and creating a workbook object:

>>> import openpyxl
>>> wb_obj = openpyxl.Workbook()

Now you can get the active sheet and start assigning data to the cells. Finally, use the save() method to write the file:

>>> sheet = wb_obj.active
>>> sheet['A1'] = 2
>>> sheet['A2'] = 3
>>> wb_obj.save('data.xlsx')

You can also read an existing file and modify it using this package.
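
Here is a minimal sketch of that read-and-modify workflow, assuming openpyxl is installed. It first recreates the small data.xlsx file from above so that it runs on its own; the output file name data_modified.xlsx is just an illustrative choice.

```python
import openpyxl

# Recreate the small file from the previous example so this sketch is self-contained.
wb_obj = openpyxl.Workbook()
sheet = wb_obj.active
sheet["A1"] = 2
sheet["A2"] = 3
wb_obj.save("data.xlsx")

# Load the existing workbook and read the stored values.
wb_obj = openpyxl.load_workbook("data.xlsx")
sheet = wb_obj.active
total = sheet["A1"].value + sheet["A2"].value

# Modify the sheet by writing the sum into a new cell, then save under a new name.
sheet["A3"] = total
wb_obj.save("data_modified.xlsx")

print(total)
```

Saving under a different name keeps the original file untouched, which is a cautious default when you are experimenting.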

2. XlsxWriter

The next Python package for working with Excel is XlsxWriter, which works with .xlsx files. It can create files, add data to tables in the workbook, and format the data. A particularly nice feature is the ability to use Python to add charts directly into the workbook. This package also gives you the ability to apply formulas to the workbook.

XlsxWriter cannot read or modify an existing Excel file; it can only create new ones. To use it, we import the package, create a new .xlsx file, add some data, and apply a formula:

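>>> import xlsxwriter
>>> wb_obj = xlsxwriter.Workbook('formula.xlsx')
>>> sheet = wb_obj.add_worksheet()
>>> sheet.write('A1', 2)
>>> sheet.write('A2', 3)
>>> # A plain formula is enough here; curly braces would make it an array formula
>>> sheet.write_formula('A3', '=SUM(A1:A2)')
>>> wb_obj.close()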
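
The chart feature mentioned above can be sketched in a similar way. This is a minimal example, assuming XlsxWriter is installed; the file name chart.xlsx and the sample values are made up for illustration.

```python
import xlsxwriter

wb_obj = xlsxwriter.Workbook("chart.xlsx")
sheet = wb_obj.add_worksheet()  # the first worksheet is named "Sheet1" by default

# Write a column of sample values starting at cell A1.
values = [2, 3, 5, 8]
sheet.write_column("A1", values)

# Create a column chart that plots the values just written.
chart = wb_obj.add_chart({"type": "column"})
chart.add_series({"values": "=Sheet1!$A$1:$A$4"})

# Place the chart on the worksheet with its top-left corner at cell C1.
sheet.insert_chart("C1", chart)

wb_obj.close()  # the file is only written to disk when the workbook is closed
```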

For more information and examples, see the XlsxWriter documentation.

3 and 4. pyxlsb and pyxlsb2

As the name of the next package suggests, pyxlsb specialises in parsing the binary file format .xlsb. The functionality is quite limited, but you can open a workbook, get a particular sheet, and read the rows. This can be achieved with the open_workbook() function and the get_sheet_by_index() and rows() methods.

There is also some limited support for converting Excel date values, which are read as numbers, into datetime objects. The updated version, pyxlsb2, offers some improvements over its predecessor. These include faster processing, loading of worksheets and macrosheets, and extraction of macro formulas.

5. pylightxl

The pylightxl package is a lightweight, zero-dependency library that can read and write Excel files. Having no dependencies can be a compelling feature in bigger projects: there are no third-party packages to conflict with the rest of your environment, which makes versions easier to manage. The project also aims to remain compatible with whichever version of Python you’re running, from Python 2.7.18 onwards.

After installing this library, you can import it and read in a file:

>>> import pylightxl as xl
>>> db = xl.readxl('data.xlsx')

From here, you can use many methods on this database object to access and modify the worksheets and write out a new Excel file.

How to Excel at Python

We can’t talk about working with data in Python without mentioning pandas. This is an incredibly useful library. In the article Python Libraries Every Programming Beginner Should Know, we show an example of how to read Excel files in Python with pandas. This is a fundamental library and one of our Top 15 Python Libraries for Data Science.
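
For a quick taste of pandas, here is a minimal round-trip sketch. It assumes pandas is installed along with openpyxl, which pandas uses behind the scenes for .xlsx files; the file name pandas_data.xlsx and the sample data are made up for illustration.

```python
import pandas as pd

# Write a small table to an Excel file (index=False leaves out the row labels).
df = pd.DataFrame({"item": ["a", "b"], "value": [2, 3]})
df.to_excel("pandas_data.xlsx", index=False)

# Read the file back into a DataFrame; the first row is used as column names.
loaded = pd.read_excel("pandas_data.xlsx")

print(loaded["value"].sum())
```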

Using Python to read Excel files is essential if you want to get good at working with data. To boost your motivation, here’s an explanation of the Benefits of Learning Python.

One of our 5 Tips for Learning Python From Scratch is to find a good resource. So, consider taking the How to Read and Write Excel Files in Python course. It includes 45 interactive exercises, so you get plenty of practical experience.