19th Oct 2022 8 minutes read

Top Python Libraries for SQL

Did you know that Python and SQL can work together? Find out all about Python’s SQL libraries in this article.

Along with Python, SQL is a very common and useful tool for working with data. SQL stands for Structured Query Language; it’s a language used to manage data in a relational database. Although SQL is a different language to Python and has its own syntax, Python has libraries which can be used to connect to a database and write SQL queries to retrieve data – all from within your Python script. Then you can have your data in a familiar format (such as a NumPy array or a pandas DataFrame) and start your analysis.

We recently discussed the Python Libraries Every Programming Beginner Should Know. But in this article, we’ll show you two of the best Python libraries for SQL. If you haven’t encountered SQL before, don’t worry. It has a clear syntax that’s easily read and understood, and the basics can be picked up very quickly.

This article is aimed at data analysts who have a little more background in Python. If you want some relevant learning material, check out our Introduction to Python for Data Science course. For more comprehensive training, our Python for Data Science track bundles 5 useful courses. But before we start discussing the top Python libraries for SQL, we need to cover some background on databases.

Relational Databases and SQL

Databases can have different structures; these determine how the data is stored. A hierarchical model, for example, organises data into a tree-like structure. A network model can be used to represent objects and their relationships in a network structure, which can be more flexible than a hierarchical structure.

A relational model is the most common type of database. This is the model which SQL is designed to handle. Relational models organise data into tables; each table contains rows and columns. The columns represent attributes and the rows are instances. If you’ve worked with spreadsheets like Excel or with pandas DataFrames, this should be conceptually familiar to you.

SQL can be used to write queries to store, manage, and retrieve these data tables. A SQL query contains keywords and references to specific tables and variables. For example, we could have a database with two tables, which looks like this:

We can write a query to retrieve the name and age variables from the employee_data table like this:

>>> select name, age from employee_data

SQL is not case sensitive, so you’ll often see the keywords like select and from written in capitals. This can help with readability when your queries get long and complicated. We could also link the employee_data and job_data tables together to find out how many hours each employee works per week.

There are a few tools that connect to a database and write the queries, but we won’t go into them here. Instead, we’ll discuss how to do all this from within your Python scripts.

Two Ways to Use SQL Inside Python

1. MySQL

MySQL is an open-source relational database management system that’s widely used by many companies and web applications. It is written in C and C++ and typically runs on an external server dedicated to storing the data. This means you need to connect to the server to access the data from your local machine by writing SQL queries. This can all be done from within Python.

Firstly you’ll need to install the MySQL connector for Python, which can be done with pip:

>>> pip install mysql-connector-python

Now you can use Python code to connect to the external server and access the data. You’ll need to have a username and password already set up and know the IP address of the server.

>>> import mysql.connector
>>> db = mysql.connector.connect(user='julia',
				   password='supertopsecretpassword',
				   host='111.1.1.1',
				   database='employees')

To access a table in the database, you need to create a cursor object to help navigate through the database:

>>> cur = db.cursor()

You can now execute SQL queries using this cursor object. For example, you can use the same query we saw in the previous section:

>>> cur.execute('select name, age from employee_data')

Use the execute method to run any SQL command here. You can also use it to create new tables and manipulate them. To view the data in the cursor object, you can use a loop to print out the rows:

>>> for data in cur:
... 	print(data)
('andrea', 32)
('phillip', 25)

This will print all rows in the table, which may not be a good idea if you’re working with large tables. Sometimes, just summarising the data is enough, like we discuss in How to Summarize Data in Python. Or, instead of simply printing the data, you could insert it into a pandas DataFrame. Try it for yourself. Then you can jump straight into producing visualisations of your data and doing your analysis. Here’s an example of How to Plot a Running Average in Python Using matplotlib.

We mentioned it’s common to use MySQL to access data on a remote server. This doesn’t have to be the only way to use MySQL; if you don’t have access to a database on a remote server, you can set up a relational database on your local machine for practice. First, you’ll need to download the MySQL Installer. Then – with the connector we installed above using pip – you can practice creating and manipulating tables within Python.

2. SQLite

The next tool we’ll discuss is SQLite. It is very useful for smaller projects where you want to take advantage of database functionality without having to create a heavy-duty database. With SQLite, you can create a serverless database on your local machine. Like its name suggests, this is a very lightweight library, coming in at around 250 KB. (MySQL requires about 600 MB).

Conveniently, SQLite is part of the Python standard library; you don’t need to worry about installing it separately. You can either create an in-memory database or write your data to a file. We’ll do the latter.

Let’s jump in and import the library into Python and create a database:

>>> import sqlite3
>>> db = sqlite3.connect("database.db")

This will create a new file in your working directory named 'database.db'. If you already have this database file, the connect() method will load the existing file to the db variable.

As in the previous section, you’ll need to create a cursor. Then you can use the execute() method to write SQL queries. First, create a new table in the database file:

>>> cur = db.cursor()
>>> cur.execute("CREATE TABLE employee_data (
name text,
age integer,
job text
)")
>>> db.commit()

The SQL command here creates a new table with the column names and data types specified. (Don’t forget to commit your changes.) To add data to the table, simply do the following:

>>> cur.execute("INSERT INTO employee_data VALUES(
'andrea',
32,
'programmer'
)")
>>> db.commit()
>>> db.close()

You can run the execute() step multiple times to add more rows to the table. It’s always a good practice to close the connection when you’re finished.

Now that you’ve created the local relational database, you’ll want to access the data and start doing an analysis. In the first example, you used the cursor object to get the data; here, you’ll use pandas to read the data directly into a DataFrame:

>>> import pandas as pd
>>> db = sqlite3.connect("database.db")
>>> df = pd.read_sql_query("SELECT * FROM employee_data WHERE age=32", db)
>>> db.close()
>>> print(df)
   name	age	job
0  andrea   32	programmer

This code loaded the database and retrieved all columns where age equals 32. The result is stored directly in a DataFrame. You can also retrieve the data and store it in a NumPy array for efficient processing of numerical data. For more information on NumPy, here’s An Introduction to NumPy in Python.

Using Object-Relational Mappers to Handle Data

An object-relational mapper (ORM) is a library that allows you to transfer data stored in relational databases into a Python object. Using an ORM library lets you create, read, and modify data in the database by writing only Python code – no need for SQL queries. Indeed, this is one of the main advantages of using ORMs, which makes it a compelling choice if you have no background in SQL and don’t want to invest the time to learn it.

ORMs also make it possible to switch between different relational databases. If you developed a prototype application using SQLite, you could switch to MySQL to scale things up quicker. However, switching may not always be a good idea.

There are several Python ORM libraries to choose from. A good place to start is SQLAlchemy, which is a popular library due to its simplicity, speed, and feature set. Check out the SQLAlchemy documentation to see if it’s a good fit for your use case.

Python SQL Libraries: Where Should You Go From Here?

Becoming a successful data analyst requires skills and knowledge in many different analysis techniques and many different tools. Python is an essential tool for data scientists; knowing how to write SQL queries is a big advantage. Combining this with the ability to analyse data in Python will give you a competitive edge in the job market.

Tags: