23rd Jan 2023 8 minutes read

A Guide to the Python csv Module

csv
python

What are CSV files, and how can you use Python alongside them? Learn how to use the csv module to read and work with CSV files in Python.

You have probably seen tabular data before: it’s simply rows and columns containing some data. (Think of a table in an article or an Excel spreadsheet.) CSV files are one of the most common types of tables used by data scientists, but they can be daunting if you don’t know exactly how they work or how to use them alongside Python.

In this article, we will learn about CSV files and how to use Python to work with them. We’ll start by learning what CSV files actually are. Then, we will learn to use Python’s built-in csv module to read and write CSV files quickly and effectively. After reading this article, you’ll want to check out our course on writing and reading CSV files in Python to solidify your knowledge. But for now, let’s get started!

The Structure of a CSV File

In a nutshell, a CSV file is a plain-text file that represents a table. The data is stored as rows and columns. The name CSV stands for comma-separated values, meaning that the table’s columns are delimited by commas. On the other hand, rows are simply delimited by lines in the file. The very first line is often the table’s header, which contains a description of the data in each column.

Let’s work with an example CSV file named people.csv. Here’s what it looks like if we open it with a text editor:

Plain-text contents of people.csv. We will use this file in the examples to come.

As you can see, the columns are defined by commas. The first column (under the header label name and just before the first comma in each line) stores the name of each person. After the first comma comes the column id, then age, and so on. Quotation symbols may be used to encapsulate text, as seen in the id column.
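If you want to follow along, the file can be recreated from the rows printed later in this article. The snippet below is a minimal sketch under that assumption: the contents are reconstructed from those outputs, and the name PEOPLE_CSV is only used here for illustration.

```python
from pathlib import Path

# Contents reconstructed from the rows shown later in this article.
# Only the id column is wrapped in quotation marks.
PEOPLE_CSV = (
    'name,id,age,department,likes_pizza\n'
    'James,"001",22,sales,True\n'
    'John,"002",28,engineering,False\n'
    'Jessica,"003",25,engineering,True\n'
    'Monica,"004",29,accounting,False\n'
)

# Write the file into the current working directory.
Path('people.csv').write_text(PEOPLE_CSV, encoding='utf-8')
```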

Since the commas do not necessarily align, it is a bit difficult to visualize each column in plain text. The data inside the CSV file is easier to understand if we open it in a spreadsheet like Excel, Google Sheets or LibreOffice Calc:

The data in people.csv, as displayed in a spreadsheet.

That’s much better! Also, note how the values in the id column were interpreted as text. If they weren’t enclosed in quotation symbols, they would be considered numbers and their leading zeros would be discarded upon loading the data.

Opening CSV files as spreadsheets may lead to unexpected results – for example, the software might think that the commas are numerical separators instead of column delimiters. Some spreadsheet programs (like Excel) include a CSV import functionality that allows you to specify column delimiters and data types, among other parameters. Keep an eye out to make sure that you don’t inadvertently modify your data!

Using Python to Read a CSV File

Okay, we have a basic idea of what a CSV file is. But how do you open it with Python?

You may be tempted to use the open() function to read the contents of the file, split the line at each column separator, and finally put it into a data structure like a list or dictionary. (Side note: If you haven’t heard about the open() function, we recommend you read our article on how to write to a file in Python.) This is certainly a valid approach – we encourage you to try it out as a coding challenge – but we can do even better.
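To see why the manual approach falls short, here is a rough sketch of it. An in-memory io.StringIO object stands in for the opened file (an assumption made so the snippet runs on its own), and only the first two lines of people.csv are included.

```python
import io

# Stand-in for an open people.csv file (first two lines only).
csv_file = io.StringIO(
    'name,id,age,department,likes_pizza\n'
    'James,"001",22,sales,True\n'
)

rows = []
for line in csv_file:
    # Drop the trailing newline, then split at every comma.
    rows.append(line.rstrip('\n').split(','))

print(rows[1])
# ['James', '"001"', '22', 'sales', 'True']
```

Note that the quotation marks stay attached to the id value, and a comma inside a quoted value would be split into two columns. Handling such cases by hand quickly gets tedious.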

CSV files are so ubiquitous that the Python development team went ahead and created the csv module for us. We simply need to import this module and we’ll have access to tons of functionality related to loading data from CSV files and writing it back into them.

The most basic function for reading a CSV file is csv.reader. We can use it by opening the CSV file (with the open() function), then passing the file handle created by the open() function to the csv.reader function. You can see how the code looks below. (We assume that you are executing Python in the same folder where the people.csv file is.)

import csv

with open('people.csv') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        print(row)

# output:
# ['name', 'id', 'age', 'department', 'likes_pizza']
# ['James', '001', '22', 'sales', 'True']
# ['John', '002', '28', 'engineering', 'False']
# ['Jessica', '003', '25', 'engineering', 'True']
# ['Monica', '004', '29', 'accounting', 'False']

As you can see, the reader automatically organizes the data into rows. When we iterate over the reader, we end up iterating over the rows in the CSV file. These rows are stored as lists, without us having to ever create a single list ourselves.
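One detail worth noticing in the output above is that every value arrives as a string, including numbers like the age. The following sketch shows one possible way to convert values after reading; the in-memory io.StringIO data is a shortened stand-in for people.csv, and the variable names are illustrative.

```python
import csv
import io

# Shortened stand-in for an open people.csv file.
csv_file = io.StringIO(
    'name,id,age,department,likes_pizza\n'
    'James,"001",22,sales,True\n'
    'John,"002",28,engineering,False\n'
)

rows = list(csv.reader(csv_file))
data_rows = rows[1:]  # everything after the first (header) row

for row in data_rows:
    person_id = row[1]              # kept as text so '001' keeps its zeros
    age = int(row[2])               # '22' -> 22
    likes_pizza = row[4] == 'True'  # 'True' -> True, anything else -> False
    print(person_id, age, likes_pizza)

# output:
# 001 22 True
# 002 28 False
```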

The csv.reader function accepts many optional arguments for parsing different CSV formatting conventions. For example, despite having “comma-separated” in their name, some CSV files use other characters for delimiting columns or even for quoting text. If you come across such a case, you can specify these characters with the delimiter and quotechar keywords, respectively:

import csv

with open('people.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=';', quotechar='!')
    for row in reader:
        print(row)

# output:
# ['name,id,age,department,likes_pizza']
# ['James,"001",22,sales,True']
# ['John,"002",28,engineering,False']
# ['Jessica,"003",25,engineering,True']
# ['Monica,"004",29,accounting,False']

In this case, since we used an incorrect delimiter and quotechar for the formatting in our people.csv file, we ended up with poorly split rows in our reader. We should have just used the defaults!

At any rate, knowing how to change these parameters is useful if you ever come across a CSV file with different formatting. For example, you’ll often see the semicolon ( ; ) used as the column delimiter in CSV files that originate in countries where the comma ( , ) is used as the decimal separator.
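Here is a small sketch of that situation. The data is hypothetical (the height_m column is not part of people.csv) and is kept in memory with io.StringIO so the snippet runs on its own.

```python
import csv
import io

# Hypothetical semicolon-delimited data that uses decimal commas.
csv_file = io.StringIO(
    'name;height_m\n'
    'James;1,82\n'
    'Monica;1,68\n'
)

reader = csv.reader(csv_file, delimiter=';')
rows = list(reader)
print(rows)

# The reader leaves the decimal comma untouched, so convert it ourselves.
height = float(rows[1][1].replace(',', '.'))
print(height)

# output:
# [['name', 'height_m'], ['James', '1,82'], ['Monica', '1,68']]
# 1.82
```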

Read a CSV File as a Dictionary Using Python

Our examples in the last section had a glaring issue: the header row appears alongside the rows that contain values! Ideally, since header and values have different meanings in a table, we want to treat them differently as well.

Luckily for us, there’s the csv.DictReader object at our service. It acts like the regular csv.reader from before (you can even pass in the same arguments, like delimiter and quotechar), but it yields each row as a dictionary instead of a list. What’s more, the keys of each dictionary come from the CSV file’s header!

Here’s how it looks in action:

import csv

with open('people.csv') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        print(row)

# output:
# {'name': 'James', 'id': '001', 'age': '22', 'department': 'sales', 'likes_pizza': 'True'}
# {'name': 'John', 'id': '002', 'age': '28', 'department': 'engineering', 'likes_pizza': 'False'}
# {'name': 'Jessica', 'id': '003', 'age': '25', 'department': 'engineering', 'likes_pizza': 'True'}
# {'name': 'Monica', 'id': '004', 'age': '29', 'department': 'accounting', 'likes_pizza': 'False'}

See how the header labels became the keys to each row’s dictionary? The csv.DictReader assumes that the first row is the header – a very safe assumption – and uses its labels to create the dictionaries. Naturally, this means that the header row does not appear when iterating over the csv.DictReader.
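If a file has no header row, csv.DictReader also accepts a fieldnames argument, in which case the first line is treated as data. The sketch below assumes a headerless variant of people.csv, held in memory with io.StringIO for illustration.

```python
import csv
import io

# Hypothetical headerless variant of people.csv.
csv_file = io.StringIO(
    'James,"001",22,sales,True\n'
    'John,"002",28,engineering,False\n'
)

# Supply the column labels ourselves instead of reading them from the file.
fieldnames = ['name', 'id', 'age', 'department', 'likes_pizza']
reader = csv.DictReader(csv_file, fieldnames=fieldnames)
rows = list(reader)

print(rows[0]['name'], rows[0]['id'])
print(len(rows))

# output:
# James 001
# 2
```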

At first, csv.DictReader might not seem like a big step from csv.reader. However, using header-labeled dictionaries allows us to work with values much more cleanly. The following code lists the name and age of every person – hopefully, you will agree with me on how clear and understandable the code is:

import csv

with open('people.csv') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        name = row['name']
        age = row['age']
        print(name, 'is', age, 'years old.')

# output:
# James is 22 years old.
# John is 28 years old.
# Jessica is 25 years old.
# Monica is 29 years old.
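Header-labeled rows also make small calculations easy to read. As a rough illustration, the sketch below groups ages by department and prints the average for each; the in-memory io.StringIO data mirrors people.csv, and the name ages_by_department is just an illustrative choice.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for an open people.csv file.
csv_file = io.StringIO(
    'name,id,age,department,likes_pizza\n'
    'James,"001",22,sales,True\n'
    'John,"002",28,engineering,False\n'
    'Jessica,"003",25,engineering,True\n'
    'Monica,"004",29,accounting,False\n'
)

# Collect the ages of everyone in each department.
ages_by_department = defaultdict(list)
for row in csv.DictReader(csv_file):
    ages_by_department[row['department']].append(int(row['age']))

for department, ages in ages_by_department.items():
    print(department, sum(ages) / len(ages))

# output:
# sales 22.0
# engineering 26.5
# accounting 29.0
```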

Using Python to Write a CSV File

How about writing data to a CSV file? Well, the csv module does include a csv.writer function for that purpose; it is the writing counterpart of csv.reader and works with lists. However, let’s skip ahead and take a look at its more powerful relative, csv.DictWriter. Let’s use it to create a brand new file called names.csv:

import csv

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('names.csv', 'w', newline='') as csv_file:
    header = ['first', 'last']
    writer = csv.DictWriter(csv_file, fieldnames=header)
    
    writer.writeheader()
    writer.writerow({'first': 'Jack', 'last': 'Hill'})
    writer.writerow({'first': 'James', 'last': 'Mitch'})

There’s a lot going on here, so let’s take it one step at a time:

We start out by opening a file in write mode ( 'w' ), which will create a brand-new CSV file in our filesystem (or overwrite an existing file with the same name).
We then define the header columns that our CSV file should have and pass them to the csv.DictWriter object using the fieldnames keyword argument.
After creating our csv.DictWriter and storing it in the writer variable, we call writer.writeheader(). This will automatically write the header into our CSV file.
Afterwards, we call writer.writerow() a few times. With each call, we provide a dictionary representing the row to be written into the CSV file. Naturally, the keys of the dictionary should match the header that we previously defined.
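For comparison, the same rows could also be written with the plain csv.writer mentioned at the start of this section, which takes lists rather than dictionaries. This is only a sketch: it writes to an in-memory io.StringIO object instead of a real file so that the result can be printed directly.

```python
import csv
import io

# In-memory stand-in for a file opened in write mode.
output = io.StringIO()
writer = csv.writer(output)

# With csv.writer, the header is just another row.
writer.writerow(['first', 'last'])
# writerows() writes several rows in one call.
writer.writerows([['Jack', 'Hill'], ['James', 'Mitch']])

print(output.getvalue())

# output:
# first,last
# Jack,Hill
# James,Mitch
```

By default, the writer ends each row with '\r\n', which is why the csv documentation recommends opening real files with newline=''.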

If you take a look in the filesystem, you’ll see the newly-created names.csv file. Inspecting it as a spreadsheet reveals that the data is indeed there!

Content of the newly-created names.csv file.
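The same check can be done without leaving Python by reading the file back with csv.DictReader. The sketch below repeats the writing steps in a temporary folder (an assumption made so the snippet leaves no files behind) and then loads the rows again.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'names.csv'

    # Write the file; newline='' lets the csv module control line endings.
    with open(path, 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['first', 'last'])
        writer.writeheader()
        writer.writerow({'first': 'Jack', 'last': 'Hill'})
        writer.writerow({'first': 'James', 'last': 'Mitch'})

    # Read it back as a list of plain dictionaries.
    with open(path, newline='') as csv_file:
        rows = [dict(row) for row in csv.DictReader(csv_file)]

print(rows)

# output:
# [{'first': 'Jack', 'last': 'Hill'}, {'first': 'James', 'last': 'Mitch'}]
```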

Go Beyond Python’s csv Module with LearnPython.com

This covers the basics of working with CSV files in Python, but you can go well beyond the csv module. Data scientists often use a package called pandas to work with CSV files. The pandas library is far more complex than Python’s built-in csv module, but it’s also much more powerful.

If you want a taste of how pandas works, we recommend you check out our Python for Data Science track. You’ll learn about pandas, matplotlib, and much more – and that’s just in the first course! After all, every data scientist should know Python.

And if you want to go beyond CSV files and learn about other types of files used in Python, we recommend you move onwards to our Data Processing with Python track. This track teaches you how to use Python with JSON and Excel files as well as CSV files. Happy learning!
