Back to articles list Articles
5 minutes read

How to Read Excel Files in Python

Have you worked with Excel files? I’ll show you how to read them into Python so that you can automate Excel-related tasks.

If you are a data analyst, a data engineer, or someone who otherwise writes scripts to automate simple tasks, there is a high likelihood that you have worked with or will work with Excel files. In this article, we will go over how to read Excel files into Python. You can also check out our course How to Read and Write Excel Files in Python to learn more about the complete functionality in the openpyxl library.

If you are interested in building up your skill set to become a data scientist, make sure to check out our Introduction to Python for Data Science course, where you will learn more about Python and its tools to get you started in your journey.

What Are Excel Files?

Microsoft Excel is a spreadsheet application by Microsoft for Windows, macOS, Android, and iOS. A spreadsheet is used for storing and analyzing tabular data structured in rows and columns.

Excel is one of the most widely used formats for exchanging data. It owes its popularity to the fact that the Windows operating system is so commonly used.

Excel is also popular with non-technical people. Some reasons for this are:

  1. It is easy to use.
  2. It has powerful functionalities that make visualization and analysis easier.
  3. Most databases and data wrangling tools have interfaces to work with Excel.

Let’s take a look at an example of data that can be found in an Excel file to better understand the structure.

OrderDateItemUnitsUnit Cost

This is one of the most common data structures we find in an Excel file. As a data scientist, you may be tasked with analyzing company sales using data structured like this. The data can come from another department after they work with it or directly from the database. Reading the data directly into Python makes the analysis task much simpler, especially if the data is quite large.

Another common task for a data scientist is to visualize or make the data more presentable. Data scientists often take the data generated by the company and relay the results of the analysis to managers and other non-technical employees. While Excel offers similar functionalities, the data science libraries of Python have evolved rapidly, and people are gravitating more and more toward Python. If you are not convinced, I invite you to take a look at the following blog post which gives you more reasons why Python is worth learning in addition to Excel.

As an example, I worked with tons of Excel files when I helped my girlfriend automate some tasks at her work. They were manually reading data from multiple Excel and CSV files to create reports, sharing those reports with their customers. The entire set of tasks used to take 2 days out of each month for nearly 4 coworkers. With the xlrd library and some Googling, I was able to automate all those tasks with minimum user input and create all the reports for each customer.

As was in this case, it is quite common to work with multiple file formats. The CSV is another very common file format; take a look at this article where we show how to read CSV files into Python.

How to Work With Excel Files Using Python

Many Python libraries can be used to read an Excel file easily. Some options are openpyxl, pandas, and xlrd. We will be focusing on how to read Excel files to Python using the openpyxl library.

We first import the load_workbook() function from the openpyxl library to read in the Excel file. Then we pass the name of the file we want to read as an argument to this function.

from openpyxl import load_workbook
name = 'name_of_the_file.xlsx'
wb = load_workbook(name)

With this code snippet, we can read our Excel file into Python. An Excel file may have many worksheets, but there is a handy way to check how many worksheets there are. We can simply use wb.sheetnames to get the name of each worksheet as a list. For example, they could be sheet1, sheet2, and sheet3. We can select which one to work with, similar to how we index dictionaries in Python.

We can now do a few more things with openpyxl to iterate over the rows of this file and print each row separated by a space. Keep in mind that this will print the column names as the first row.

for row in wb["SalesOrders"].iter_rows():
for cell in row:
print(cell.value, end=" ")

In the code snippet above, we use the iter_rows() function on the sheet wb["SalesOrders"].This returns an iterable object with all the rows in the worksheet, which we can use in the for loop. Then we iterate over all the cells in the row in the second for loop. We access the value in each of these cells simply using cell.value.

Now You Can Read Excel Files Into Python

Reading from Excel files lets you do a lot with the data at hand, like visualization, analysis, machine learning, or even creating structured presentations. There are many other functionalities provided by the openpyxl library. If you would like to learn more about them and get hands-on experience at the same time, take a look at Python For Data Science. This track includes working with many different file formats, including our interactive course How to Read and Write Excel Files in Python, and teaches you all the necessary skills for a data scientist.

Hands-on experience is one of the most crucial things a data scientist can have under his or her belt. For most of the positions, you will be required to demonstrate your knowledge through either technical interviews or mock projects. Take a look at our list of common Python interview questions for Data Scientists, if you want to review your knowledge before an interview.