How to Read a CSV File Into a List in Python

Read and process CSV files in Python.

Comma-separated values files, or CSV files, are one of the most popular file types for storing tabular data. Why would you want to read CSV files in Python? Perhaps your programming journey has brought you to a point where you need to work with files. Or maybe you want to perform calculations on data gathered from an IoT sensor.

The easiest way to work with CSV files in Python is to use the pandas module. From there, you can go further with your data and visualize it.

But that’s not the only way. If you have reasons to rely on pure Python and its standard library, here's how!

Read a CSV File Into a List of Lists

Imagine you work with data from class exams. You have names and grades, and you want to calculate the class average. For simplicity, let's assume grades range from 1 (worst) to 6 (best). We have the data in the format below, in a file called data.csv:

id,student,grade
1,John,4
2,Emily,5
3,Emma,3
...

Which can be represented as a table:

id    student    grade
1     John       4
2     Emily      5
3     Emma       3
...

As you see, it uses the comma as the separator, and we have a header row. Knowing all of that, let's write the code!

>>> import csv
>>> 
>>> file = open("data.csv", "r")
>>> data = list(csv.reader(file, delimiter=","))
>>> file.close()
>>> 
>>> print(data)
[['id', 'student', 'grade'], ['1', 'John', '4'], ['2', 'Emily', '5'], ['3', 'Emma', '3'], ['4', 'Patricia', '5'], ['5', 'James', '2'], ['6', 'Michael', '4'], ['7', 'David', '3'], ['8', 'Linda', '5'], ['9', 'Andrew', '5'], ['10', 'Mary', '6'], ['11', 'Kevin', '6'], ['12', 'Barbara', '1'], ['13', 'George', '1'], ['14', 'Peter', '3'], ['15', 'Zach', '4'], ['16', 'Susan', '4'], ['17', 'Lisa', '4'], ['18', 'Tim', '4.5']]

Basically, that's it! Let's go through the script line by line.

In the first line, we import the csv module. Then we open the file in the read mode and assign the file handle to the file variable.

Next, we work on the opened file using csv.reader(). We pass the file handle as the first argument and specify the comma as the delimiter (the comma is also the default, so this argument is optional). The first argument can be any iterable that yields strings, so you can also pass a list of CSV rows (as text). When we pass a file handle, csv.reader() treats it as an iterable and reads the file line by line.
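Because csv.reader() accepts any iterable of strings, a quick way to experiment is to pass a list of lines instead of a file. This is a minimal sketch: the list called lines is only a stand-in for the first rows of data.csv, and the variable names are illustrative.

```python
import csv

# Stand-in for the file: each string is one CSV line (sample rows only).
lines = ["id,student,grade", "1,John,4", "2,Emily,5", "3,Emma,3"]

# The comma is the default delimiter, so delimiter="," could be omitted.
rows = list(csv.reader(lines, delimiter=","))

print(rows)
```

Every value comes back as a string, exactly as it does when reading from a file.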

csv.reader() returns an iterator, a kind of iterable. Think of it as a chain of data, accessed one by one and only once. To understand the bigger picture and become more fluent with iterables, take a look at the article “An Introduction to Combinatoric Iterators in Python.”
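The "only once" part is easy to see in a small experiment. The sketch below again uses a short list of sample lines in place of the file; the names header, first_pass, and second_pass are just illustrative.

```python
import csv

# Sample lines standing in for data.csv.
lines = ["id,student,grade", "1,John,4", "2,Emily,5"]
reader = csv.reader(lines)

header = next(reader)        # consumes the first row
first_pass = list(reader)    # consumes all remaining rows
second_pass = list(reader)   # nothing is left: the reader is exhausted

print(header)       # ['id', 'student', 'grade']
print(first_pass)   # [['1', 'John', '4'], ['2', 'Emily', '5']]
print(second_pass)  # []
```

If you need to go through the rows more than once, either store them in a list or create a new reader.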

To turn an iterable into a list, we wrap the whole expression with list(). Don't do this with enormous amounts of data: the list holds every row in memory at once, so you may run out of RAM. To become a pro at handling huge CSV files, check out our How to Read and Write CSV Files in Python course. In the course, you also learn how to create and write your own CSV files.
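When a file is too large for list(), one option is to loop over the reader and keep only running totals. The following is a sketch under assumptions: io.StringIO stands in for a big file so the example runs on its own, and the variable names are made up for illustration.

```python
import csv
import io

# Stand-in for a large file; in practice this would be an opened data.csv.
file = io.StringIO("id,student,grade\n1,John,4\n2,Emily,5\n3,Emma,3\n")

total = 0.0
count = 0

reader = csv.reader(file)
next(reader)  # skip the header row
for row in reader:
    # Only one row is held in memory at a time.
    total += float(row[2])
    count += 1

average = total / count
print("Average:", average)  # Average: 4.0
```

Memory use stays roughly constant no matter how many rows the file has, at the cost of not being able to revisit earlier rows.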

Finally, once we have read the whole file and no longer need it, we can safely close it with file.close(). Note that you get a ValueError (I/O operation on closed file) if you decide to stick with iterators and try to use them after closing.
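A common alternative to calling file.close() yourself is a with block, which closes the file automatically. The sketch below writes a tiny throwaway data.csv into a temporary folder so it can run anywhere; the sample rows and variable names are assumptions for illustration. It also passes newline="" to open(), which the csv module documentation recommends for file objects.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a small sample file so the example is self-contained.
    path = os.path.join(folder, "data.csv")
    with open(path, "w", newline="") as out:
        out.write("id,student,grade\n1,John,4\n2,Emily,5\n")

    # The file is closed automatically when this with block ends.
    with open(path, "r", newline="") as file:
        reader = csv.reader(file)
        header = next(reader)  # read while the file is still open

    try:
        next(reader)  # the file is closed now, so this fails
    except ValueError as error:
        message = str(error)

print(header)
print(message)
```

Reading the rows you need inside the with block, for example with list(reader), avoids the error entirely.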

Calculating the Average

To calculate the average manually, we need two values: the total sum of all grades and the count of grades. Luckily, Python comes with functions for both of these tasks.

Let's start by extracting grades from the data.

>>> [row[2] for row in data]
['grade', '4', '5', '3', '5', '2', '4', '3', '5', '5', '6', '6', '1', '1', '3', '4', '4', '4', '4.5']

We've used a construct called list comprehension here. If you're not familiar with this syntax, Marija wrote an article about it. Check it out!

But our grades don't look right. We have two issues to solve. First, we've left the header in the data. Second, we can’t calculate an average of strings, so we need to cast them to floats.

To solve the first issue, we use index slicing and skip the first row. This means we write data[1:] instead of simply data. Then, we wrap row[2] with the float() function to get numbers we can work with.

We can also use some maps and filters. Sounds mysterious? If we've managed to intrigue you, then check out Xavier's article, “Map, Filter, and Reduce – Working on Streams in Python.”
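For comparison, here is one way the same extraction might look with filter() and map(). It is only a sketch: the short data list is a stand-in for the full list of rows, and dropping the header by comparing against the string "grade" is an assumption that works for this particular file.

```python
# Stand-in for the list of rows read from data.csv.
data = [
    ["id", "student", "grade"],
    ["1", "John", "4"],
    ["2", "Emily", "5"],
    ["3", "Emma", "3"],
]

# filter() drops the header row; map() converts each grade to a float.
rows = filter(lambda row: row[2] != "grade", data)
grades = list(map(lambda row: float(row[2]), rows))

print(grades)  # [4.0, 5.0, 3.0]
```

Both filter() and map() are lazy, so nothing is computed until list() consumes them.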

Side note: writing row[2] is not the prettiest solution. It tells us that the value comes from the third column, but not what that column holds. We'll get back to this bit later in the article and use column names instead.

>>> [float(row[2]) for row in data[1:]]
[4.0, 5.0, 3.0, 5.0, 2.0, 4.0, 3.0, 5.0, 5.0, 6.0, 6.0, 1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.5]

Ahhh yes! That looks right. Let's assign that expression to the grades variable and calculate the two values we talked about at the start of this section.

>>> grades = [float(row[2]) for row in data[1:]]
>>> sum_grades = sum(grades)
>>> count_grades = len(grades)
>>> grades_avg = sum_grades / count_grades
>>> print("Average:", grades_avg)
Average: 3.861111111111111

Done! Pretty self-explanatory, isn’t it? After storing the list in grades, sum() adds up the whole list, len() returns the length of the list, that is, the number of elements, and a basic division gets us the average.
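If you would rather not do the division yourself, the standard library's statistics module offers mean(). A small sketch, with the grades list copied from the output above so that it runs on its own:

```python
import statistics

# The grades extracted earlier, repeated here to keep the example standalone.
grades = [4.0, 5.0, 3.0, 5.0, 2.0, 4.0, 3.0, 5.0, 5.0,
          6.0, 6.0, 1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.5]

grades_avg = statistics.mean(grades)
manual_avg = sum(grades) / len(grades)

print("Average:", grades_avg)
print("Manual: ", manual_avg)
```

Both approaches should agree for this data; statistics.mean() also raises an error on an empty list instead of dividing by zero.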

Read a CSV File Into a List of Dictionaries

As promised earlier, we now see how to use column names from the header to access the data. To do so, we use csv.DictReader().

As the name suggests, it parses each row as a dictionary, using the header row to determine column names. If you don't have a header row, you may specify the fieldnames argument. The rest is pretty much the same. For more details, see the official documentation or our CSV course.
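To illustrate the fieldnames argument, the sketch below parses rows that have no header. The list of lines stands in for a header-less file, and the column names are supplied by hand; both are assumptions made for the example.

```python
import csv

# Sample rows without a header line (stand-in for a header-less file).
lines = ["1,John,4", "2,Emily,5", "3,Emma,3"]

# Without a header, we tell DictReader what the columns are called.
reader = csv.DictReader(lines, fieldnames=["id", "student", "grade"])
data = list(reader)

print(data[0])
print([float(row["grade"]) for row in data])  # [4.0, 5.0, 3.0]
```

When fieldnames is given, the first line is treated as data rather than as a header, so no row is skipped.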

Let's rewrite the code:

>>> import csv
>>> 
>>> file = open("data.csv", "r")
>>> data = list(csv.DictReader(file, delimiter=","))
>>> file.close()
>>> 
>>> print(data)
[{'id': '1', 'student': 'John', 'grade': '4'}, {'id': '2', 'student': 'Emily', 'grade': '5'}, {'id': '3', 'student': 'Emma', 'grade': '3'}, {'id': '4', 'student': 'Patricia', 'grade': '5'}, {'id': '5', 'student': 'James', 'grade': '2'}, {'id': '6', 'student': 'Michael', 'grade': '4'}, {'id': '7', 'student': 'David', 'grade': '3'}, {'id': '8', 'student': 'Linda', 'grade': '5'}, {'id': '9', 'student': 'Andrew', 'grade': '5'}, {'id': '10', 'student': 'Mary', 'grade': '6'}, {'id': '11', 'student': 'Kevin', 'grade': '6'}, {'id': '12', 'student': 'Barbara', 'grade': '1'}, {'id': '13', 'student': 'George', 'grade': '1'}, {'id': '14', 'student': 'Peter', 'grade': '3'}, {'id': '15', 'student': 'Zach', 'grade': '4'}, {'id': '16', 'student': 'Susan', 'grade': '4'}, {'id': '17', 'student': 'Lisa', 'grade': '4'}, {'id': '18', 'student': 'Tim', 'grade': '4.5'}]

As simple as that! Now, we can make extracting the grades prettier:

>>> [float(row["grade"]) for row in data]
[4.0, 5.0, 3.0, 5.0, 2.0, 4.0, 3.0, 5.0, 5.0, 6.0, 6.0, 1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.5]

The remaining code – calculating the average – is the same.

Read CSV Files in Python and Do More With Data!

Great! Python comes with a plethora of modules and functions that make such tasks significantly easier. In this article, we have reviewed parsing CSV files in Python. To display your data as a table, see Luke's article on how to pretty-print it, either with pure Python or with additional modules.

Also, take a look at our Python for Data Science track, which teaches through practical step-by-step exercises. Start doing more with the data you already have!