How to Read CSV Files in Python

November 12, 2020

Yiğit Aras Tunalı

Yiğit is currently working as a data scientist while pursuing a master’s degree from the Technical University of Munich. He spends most of his free time building software projects with his friends and learning more about data science.

Have you encountered CSV files? In this article, I’ll show you what CSV files are and how easy it is to work with them in Python.

If you are working as a back-end developer or data scientist, chances are that you’ve already dealt with CSV files. CSV is one of the most widely used formats for working with and transferring data. Many Python libraries can handle CSVs, but in this article, we’ll focus on Python’s csv module.

What Are CSV Files?

A CSV file, also known as a comma-separated values file, is a text file that contains data records. Each line represents a different record and includes one or more fields. These fields represent different data values.

Let’s look at some CSV examples. Below we have a snippet of a CSV file containing student data:

firstname,lastname,class
Benjamin,Berman,2020
Sophie,Case,2018

The first line is the header, which is essentially a list of column names. Each line will have the same number of fields as the first line has column names. We’re using commas as delimiters (i.e. to separate fields in a line).

Let’s look at a second example:

firstname|lastname|class
Benjamin|Berman|2020
Sophie|Case|2018

This snippet has the same structure as the first one. The difference is the delimiter: we’re using a vertical bar. As long as we know the general structure of the CSV file, we can deal with it.

Why Are CSV Files So Common?

In essence, CSV files are plain-text files, meaning they are as simple as it gets. This simplicity makes it easy to create, modify, and transfer them, regardless of the platform. Thus, tabular data (i.e.
data structured as rows, where each row describes one item) can be moved between programs or systems that otherwise might be incompatible.

Another benefit of this simplicity is that it’s very easy to import this data into spreadsheets and databases. For spreadsheets, just opening the CSV file often automatically imports the data into the spreadsheet program.

One of the most common uses of CSV files is when part of a database’s data needs to be extracted for use by a non-technical coworker. Most modern database systems allow users to export their data into CSV files. Instead of making non-technical people struggle through the database system, we can easily give them a CSV file with the data they need. We could also easily extract a CSV file from a spreadsheet and insert that into our database. This makes interfacing between non-technical personnel and databases a lot easier.

At times, we might work on actual CSV files, e.g. when one team scrapes data and delivers it to the team that is supposed to work with it. The most common way to deliver the data would be in a CSV file. Or perhaps we need to get some data from a legacy system that we can’t interface with. The easiest solution is to acquire this data in CSV format, since textual data is easier to move from system to system.

Reading CSV files is so common that questions about it frequently appear in Python technical interviews. You can learn more about the questions you might face in a Python-focused data science job interview in this article. Even if you’re not interested in a data science role, check it out; you might run across some of these questions in other Python jobs.

Using Python’s csv Module

There are many Python modules that can read a CSV file, but there might be cases where we aren’t able to use those libraries, e.g. due to platform or development environment limitations. For that reason, we’ll focus on Python’s built-in csv module.
Below we have a CSV file containing two students’ grades:

Name,Class,Lecture,Grade
Benjamin,A,Mathematics,90
Benjamin,A,Chemistry,54
Benjamin,A,Physics,77
Sophie,B,Mathematics,90
Sophie,B,Chemistry,90
Sophie,B,Physics,90

This file includes six records. Each record contains a name, a class, a lecture, and a grade. Fields are separated by commas. To work with this file, we’ll use the csv.reader() function, which accepts an iterable object. In this case, we will be providing it with a file object.

Here is the code to print all rows of the Report.csv file:

    import csv

    # newline="" lets the csv module handle line endings itself
    with open("Report.csv", "r", newline="") as handler:
        reader = csv.reader(handler, delimiter=',')
        for row in reader:
            print(row)  # each row is a list of strings

Let’s analyze this code line by line. First, we import the csv module that comes with the regular Python installation. Then we open the CSV file and create a file handler called handler. Since this file handler is an iterable object that returns a string whenever the __next__ method is called on it, we can give it as an argument to the reader() function and get a CSV reader that we call reader. And now we can iterate over reader; each element of it will be a list of the fields in one line of our original CSV file.

Keep in mind that the CSV file can include field names as its first line. If we know that this is the case, we can use csv.DictReader() to create the reader instead. Rather than returning a list for each row, it returns a dictionary for each line. The keys of each dictionary are the names in the first line of the CSV file.

CSV Dialects and How to Deal With Them

Even though CSV stands for “comma separated values”, there is no single enforced standard for these files. Thus, csv allows us to specify the CSV dialect. The csv.list_dialects() function lists the csv module’s built-in dialects. For me, these are excel, excel-tab, and unix.

The excel dialect is the default setting for CSV files exported directly from Microsoft Excel; its delimiter is a comma.
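The built-in dialects can be inspected directly. The following is a minimal sketch that lists the registered dialects and prints a few of their settings. It relies on csv.get_dialect(), which the article does not mention but which is part of the standard csv module; the choice of which attributes to print is just for illustration.

```python
import csv

# Names of every dialect currently registered with the csv module.
dialect_names = csv.list_dialects()
print(dialect_names)

# Look up each dialect by name and show the settings that distinguish it.
for name in dialect_names:
    dialect = csv.get_dialect(name)
    print(
        name,
        "delimiter:", repr(dialect.delimiter),
        "lineterminator:", repr(dialect.lineterminator),
        "quoting:", dialect.quoting,
    )
```

The repr() calls make invisible characters such as tabs and line endings readable in the output.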
The excel-tab dialect is a variant of excel in which the delimiter is a tab. More info on these dialects can be seen on the Python GitHub page.

If your company or team is using a custom-styled CSV, you can create your own CSV dialect and put it into the system using the register_dialect() function. See the Python GitHub page for more details. An example would look as follows:

    import csv

    csv.register_dialect('myDialect', delimiter='|', skipinitialspace=True, quoting=csv.QUOTE_ALL)

Here we state that we are creating a dialect called “myDialect”. This dialect will use the vertical bar ( | ) as the delimiter. It also indicates that we want to skip any whitespace immediately after a delimiter. The quoting=csv.QUOTE_ALL setting instructs writer objects to put every value inside quotes. There are a few more parameters that can be set; see the links above for details.

You could then use the new myDialect to read a CSV file that follows this format:

    import csv

    with open("Report.csv", "r", newline="") as handler:
        reader = csv.reader(handler, dialect="myDialect")
        for row in reader:
            print(row)

This works much like our previous example, but instead of supplying an argument for the delimiter, we simply give our new dialect as the argument.

What If We Don’t Know the CSV Dialect?

Sometimes we won’t know what dialect the CSV file has. For times like this, we can use the csv.Sniffer class. I’ve found the two methods below very useful. Both take a sample of the file’s text (a string), not a reader object:

    # read the first part of the file as a sample for the sniffer
    with open("Report.csv", "r", newline="") as handler:
        sample = handler.read(1024)
    header_exists = csv.Sniffer().has_header(sample)
    sniffed_dialect = csv.Sniffer().sniff(sample)

The first method returns a Boolean value indicating if there appears to be a header. The second method returns the dialect as found by csv.Sniffer(). Both rely on heuristics, so treat the results as a best guess. It is always beneficial to use these methods when we don’t know the structure of the CSV file.

Now That You Know About CSV Files and Python ...

… you need to practice! The CSV file format is one of the oldest and most common data transfer methods out there. We simply cannot hope to avoid it when working as a data scientist or machine learning engineer.
Even back-end developers deal with CSV files, either when receiving data or when writing it back to the system for some other component to use. As the csv module is already installed in Python, it’ll probably be your go-to tool for dealing with CSV files.

For some hands-on practice in working with CSVs in Python, take a look at our interactive course How to Read and Write CSV Files in Python.
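As a starting point for practice, here is a minimal sketch that combines csv.Sniffer and csv.DictReader on the grades data from this article. To keep it runnable without a Report.csv file on disk, it reads from an in-memory string; the SAMPLE constant, the restricted list of candidate delimiters, and the per-student average are all assumptions added for illustration, not part of the original article.

```python
import csv
import io

# Hypothetical in-memory copy of the grades data, so the sketch runs
# without needing Report.csv on disk.
SAMPLE = (
    "Name,Class,Lecture,Grade\n"
    "Benjamin,A,Mathematics,90\n"
    "Benjamin,A,Chemistry,54\n"
    "Benjamin,A,Physics,77\n"
    "Sophie,B,Mathematics,90\n"
    "Sophie,B,Chemistry,90\n"
    "Sophie,B,Physics,90\n"
)

sniffer = csv.Sniffer()
# Restrict the candidate delimiters to a few common ones (an assumption).
dialect = sniffer.sniff(SAMPLE, delimiters=",|;\t")
has_header = sniffer.has_header(SAMPLE)

# DictReader takes its keys from the first line when no fieldnames are given.
reader = csv.DictReader(io.StringIO(SAMPLE), dialect=dialect)
rows = list(reader)

# Collect the grades of each student; every field is read as a string,
# so the grade has to be converted explicitly.
grades = {}
for row in rows:
    grades.setdefault(row["Name"], []).append(int(row["Grade"]))

averages = {name: sum(values) / len(values) for name, values in grades.items()}

print("delimiter:", repr(dialect.delimiter), "header:", has_header)
for name, average in averages.items():
    print(f"{name}: {average:.1f}")
```

With a real file, the same steps apply: read a sample with handler.read(), call handler.seek(0) to rewind, and pass the file object to csv.DictReader in place of the io.StringIO wrapper.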