Object Serialization in Python

Serialization is a useful technique for saving complex objects.

In this article, we give you an introduction to object serialization in Python and explain why it is important. Serialization is the process of converting an object into a byte stream that can be stored, for example in memory or in a file. That byte stream can later be deserialized to reconstruct the original object, which can then be reused in different programs or even different environments.

Below, we show you some examples of how to use some of the most popular tools for serialization in Python. If you are new to Python and want to build your skills in processing data, check out this track. It teaches you all you need to know about processing various file formats with Python.

A Basic Class

Remember that Python is an object-oriented language; almost everything is considered an object. This means you can serialize data stored in a list, a dictionary, or even instances of classes.

Let’s get into it and create a class for driving a car. The focus of this article isn’t on classes, so if you want more details on how classes work, take a look at this article. We also have some material that builds on top of that and shows you how to write a custom module in Python.

An object in Python has attributes (like its properties) and methods (things it can do). The important attributes for our car class are the current fuel level and the efficiency (how many kilometers it travels per liter of fuel). These are defined in __init__(). Cars can also have several actions associated with them, such as driving a certain distance. These are the methods of the class, which are written as functions within the class body.

The following class allows you to drive a car a certain distance given its fuel level and efficiency:

class Car:
    def __init__(self, efficiency, fuel_level):
        self.efficiency = efficiency
        self.fuel_level = fuel_level
    
    def drive(self, distance):
        max_distance = self.fuel_level * self.efficiency
        
        if distance > max_distance:
            print('Traveled %s km, out of fuel'%(max_distance))
            self.fuel_level = 0
        else:
            self.fuel_level -= distance / self.efficiency
            print('Arrived safely!')

We can create a car object with an efficiency of 5 km/L and a fuel level of 12 L as follows:

fast_car1 = Car(5, 12)

Let’s take it for a drive for 8 km, then check the fuel level:

>>> fast_car1.drive(8)
Arrived safely!
>>> fast_car1.fuel_level
10.4
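To see the other branch of drive(), here is a small self-contained sketch that repeats the Car class and asks for a trip longer than the tank allows. The name thirsty_car and the 100 km distance are made up for illustration.

```python
class Car:
    def __init__(self, efficiency, fuel_level):
        self.efficiency = efficiency      # kilometers per liter
        self.fuel_level = fuel_level      # liters

    def drive(self, distance):
        max_distance = self.fuel_level * self.efficiency

        if distance > max_distance:
            print('Traveled %s km, out of fuel' % max_distance)
            self.fuel_level = 0
        else:
            self.fuel_level -= distance / self.efficiency
            print('Arrived safely!')


# Hypothetical car: 5 km/L with 12 L in the tank gives a 60 km range
thirsty_car = Car(5, 12)
thirsty_car.drive(100)         # farther than the range, so the tank runs dry
print(thirsty_car.fuel_level)  # the fuel level is reset to 0
```

Because 100 km exceeds the 60 km range, the method reports the distance actually covered and sets the fuel level to zero instead of letting it go negative.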

So far so good. Now we would like to serialize the fast_car1 object so we can come back and use it later without having to instantiate it again. Enter pickle.

pickle

The Python pickle module is an easy-to-use module for serializing (pickling) and deserializing (unpickling) objects in Python. A large number of objects can be pickled, including Booleans, integers, floats, and strings, as well as data structures such as lists, dictionaries, sets, etc. Functions and classes defined at the top level of a module can be serialized too (pickle stores a reference to them by name rather than their code), and as we will see below, so can instances of classes.

The Python pickle module stores the data in a binary form, so it isn’t human-readable. It provides four main functions. The two we’ll use for our case are dump() and load(), which save an object to a pickle file and load it back, respectively. The other two are dumps() and loads(), which serialize to and deserialize from a bytes object in memory without saving anything to file.

We’ll take a closer look at the first two below. But before we do, a word of caution: as stated in the official documentation, the Python pickle module is not secure against maliciously constructed data that can execute foreign code. Therefore, never unpickle data received from an untrusted source.
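To make the warning concrete, here is a deliberately harmless sketch of why unpickling untrusted data is risky. The class name Surprise is hypothetical. Its __reduce__() method tells pickle which callable to invoke when the data is loaded; here it is the built-in sorted(), but a malicious payload could name any importable function.

```python
import pickle


class Surprise:
    # __reduce__ returns (callable, args). pickle records both, and
    # pickle.loads() calls callable(*args) to rebuild the "object".
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Surprise())

# Loading the payload runs sorted([3, 1, 2]) instead of creating a Surprise
result = pickle.loads(payload)
print(result)  # [1, 2, 3]
```

The point is that the data, not your program, decided which function ran during loading.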

With the safety briefing over, let’s continue with an example of pickling and unpickling in Python:

import pickle

with open('fast_car_object.pkl', 'wb') as out_file:
    pickle.dump(fast_car1, out_file)

Executing this code produces the file fast_car_object.pkl in your current working directory. To unpickle this file, simply do the following:

with open('fast_car_object.pkl', 'rb') as in_file:
    fast_car2 = pickle.load(in_file)

Note the different modes we use for serializing ('wb') and deserializing ('rb'). fast_car1 and fast_car2 are two distinct objects with different locations in memory; however, they have the same attribute values and the same methods. By the way, if you’re not familiar with using the with statement to open a file, here’s some material about writing to file in Python.
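The following sketch puts the whole round trip in one place and checks that the restored car is a separate object with equal attributes. It assumes the code runs as an ordinary script, uses a trimmed-down Car class, and writes to a temporary directory (our choice, to keep the working directory clean).

```python
import os
import pickle
import tempfile


class Car:
    def __init__(self, efficiency, fuel_level):
        self.efficiency = efficiency
        self.fuel_level = fuel_level


fast_car1 = Car(5, 10.4)

# The temporary directory is removed automatically when the block ends
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'fast_car_object.pkl')

    with open(path, 'wb') as out_file:
        pickle.dump(fast_car1, out_file)

    with open(path, 'rb') as in_file:
        fast_car2 = pickle.load(in_file)

print(fast_car2 is fast_car1)                    # False: two separate objects
print(fast_car2.__dict__ == fast_car1.__dict__)  # True: same attribute values
```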

We mentioned almost everything in Python is an object. A list is an object, which has attributes and methods. For example, list.append() is an easy way of appending arbitrary data to a list, and list.reverse() reverses the elements. There are many more you should be familiar with. Now, try serializing a list with pickle as we did above. Or better yet, try it with a dictionary. The nested data structure would be a little cumbersome to save to a CSV or text file, but it’s a two-liner with the pickle module.
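As a starting point for that exercise, here is a minimal sketch that pickles a nested dictionary in memory with dumps() and loads(). The garage data is invented for illustration.

```python
import pickle

# Hypothetical nested data that would be awkward to flatten into a CSV
garage = {
    'owner': 'Alice',
    'cars': [
        {'efficiency': 5, 'fuel_level': 10.4},
        {'efficiency': 8, 'fuel_level': 30.0},
    ],
    'open': True,
}

garage_bytes = pickle.dumps(garage)       # serialize to a bytes object
garage_copy = pickle.loads(garage_bytes)  # rebuild an equal dictionary

print(type(garage_bytes))     # <class 'bytes'>
print(garage_copy == garage)  # True
```

Swapping dumps() and loads() for dump() and load() with an open file gives the file-based version shown earlier.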

JSON Serialization in Python

JSON stands for JavaScript Object Notation and is a lightweight format for storing data. Data stored in this format has a similar structure to a Python dictionary, so it shouldn’t look too foreign. If you’re not familiar with working with JSON files, take a look at this course. It contains interactive exercises and teaches you all you need to know.

Python has a module, json, which is useful if you’re looking to encode or decode data in this format. Reasons to choose this method over the pickle module include that it’s standardized and language-independent. It is also human-readable and much safer to load, because parsing JSON does not execute code.

The json module can be used to serialize objects in Python. It provides functions with the same four names we have seen above: dump(), dumps(), load(), and loads(). Check out the documentation for more information and many more examples.

Let’s start with a simple example of serializing a list with the json module. Here, we use the dumps() method, which doesn’t save the data to file but rather serializes it to a string:

>>> import json
>>> lst = [1, 2, 3, 'a', 'b', 'c']
>>> lst_dump = json.dumps(lst)
>>> lst_dump
'[1, 2, 3, "a", "b", "c"]'
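Going the other way works just as you would expect. The sketch below restores the list from a string with loads(), then repeats the round trip through a file with dump() and load(). The file name lst.json and the temporary directory are assumptions made for this example.

```python
import json
import os
import tempfile

lst = [1, 2, 3, 'a', 'b', 'c']

# String round trip: dumps() produces text, loads() parses it back
lst_restored = json.loads(json.dumps(lst))

# File round trip: dump() and load() work on an open text file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lst.json')  # hypothetical file name

    with open(path, 'w') as out_file:
        json.dump(lst, out_file)

    with open(path, 'r') as in_file:
        lst_from_file = json.load(in_file)

print(lst_restored == lst)   # True
print(lst_from_file == lst)  # True
```

Unlike pickle, the file is opened in text mode ('w' and 'r'), because JSON is plain text rather than binary data.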

Now, if we try the same to serialize our fast_car1 object we instantiated above, we run into a problem:

>>> car_dump = json.dumps(fast_car1)
TypeError: Object of type Car is not JSON serializable

The JSON encoder behind the dump() and dumps() methods can serialize only a few basic object types: dictionaries, lists, tuples, strings, integers, floats, Booleans, and None. Complex objects like fast_car1 need to be converted to these types first, which we can do by building a custom encoder in Python.
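The sketch below shows how several of these basic types look once encoded. The record is invented for illustration. It also includes a tuple, which the encoder accepts and writes as a JSON array, so it comes back as a list.

```python
import json

# Hypothetical record mixing several of the supported types
record = {'name': 'fast_car', 'seats': 4, 'tags': ('red', 'coupe'),
          'sold': False, 'owner': None}

encoded = json.dumps(record)
print(encoded)
# {"name": "fast_car", "seats": 4, "tags": ["red", "coupe"], "sold": false, "owner": null}

decoded = json.loads(encoded)
print(decoded['tags'])  # ['red', 'coupe']: the tuple came back as a list
```

Note that False becomes false and None becomes null in the JSON text, and both are converted back when the string is decoded.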

Writing a Custom Encoder

The way forward essentially boils down to representing the data in a dictionary json can serialize. You create a custom encoder class that extends the JSONEncoder class in the json module, then use the normal dump() or dumps() methods.

Let’s take a closer look at an example. Here, the Car class is the same as above, and there’s now a new class, CarJSONEncoder:

import json
from json import JSONEncoder

class Car:
    def __init__(self, efficiency, fuel_level):
        self.efficiency = efficiency
        self.fuel_level = fuel_level

    def drive(self, distance):
        max_distance = self.fuel_level * self.efficiency
        
        if distance > max_distance:
            print('Traveled %s km, out of fuel'%(max_distance))
            self.fuel_level = 0
        else:
            self.fuel_level -= distance / self.efficiency
            print('Arrived safely!')

class CarJSONEncoder(JSONEncoder):
    def default(self, obj):
        return obj.__dict__

Then, to serialize our object to JSON, we do the following:

>>> fast_car1_json = json.dumps(fast_car1, cls=CarJSONEncoder)
>>> fast_car1_json
'{"efficiency": 5, "fuel_level": 10.4}'

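Our custom encoder uses __dict__, a built-in attribute that stores an instance’s attributes as key/value pairs in a dictionary. The json module calls default() whenever it meets an object it cannot serialize on its own, so returning obj.__dict__ hands it a plain dictionary instead. We then pass our custom class to dumps() with the cls keyword argument. The output shows our fast_car1 object has indeed been JSON serialized.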
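Decoding is not automatic: json.loads() returns a plain dictionary, not a Car. One possible way to rebuild the object is sketched below. The helper car_from_json() is hypothetical, and it assumes the JSON keys match the parameter names of Car.__init__(). The Car class is trimmed to its attributes to keep the example short.

```python
import json
from json import JSONEncoder


class Car:
    def __init__(self, efficiency, fuel_level):
        self.efficiency = efficiency
        self.fuel_level = fuel_level


class CarJSONEncoder(JSONEncoder):
    def default(self, obj):
        return obj.__dict__


def car_from_json(text):
    # Hypothetical helper: parse the JSON object into a dict, then pass
    # its keys to Car() as keyword arguments.
    data = json.loads(text)
    return Car(**data)


fast_car1 = Car(5, 10.4)
fast_car1_json = json.dumps(fast_car1, cls=CarJSONEncoder)

fast_car3 = car_from_json(fast_car1_json)
print(fast_car3.efficiency, fast_car3.fuel_level)  # 5 10.4
```

This works here because every attribute of Car is itself a basic JSON type. Objects that hold other custom objects would need more care on both the encoding and decoding side.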

The Best of Both Worlds

So far, we’ve explored two methods to serialize objects in Python: first with the pickle module, and second by serializing to JSON with the json module and a custom encoder class. pickle is very user-friendly but not human-readable and not secure. json is the opposite: human-readable and safer to load, but it needs extra work before it can handle custom objects.

The hard work of combining the best of these two methods has already been done for us, and is available in the cleverly named jsonpickle module. This module provides a set of tools to serialize complex Python objects into JSON and also handles the deserialization. jsonpickle builds on top of the basic object types that are JSON serializable and allows more complex objects to be serialized.

The easiest way to get your hands on this module is with a quick pip install jsonpickle command. This module comes with the same warning label as pickle: don’t use it to deserialize data from an untrusted source.

Its functionality is similar to what we’ve already seen in this article, so we won’t go into too much detail here. It provides the encode() method to serialize and decode() to deserialize. It’s also highly customizable. Check out the documentation if you want more details and some quick examples.

Master Object Serialization in Python

We hope we have demystified what object serialization in Python is. We have shown you some useful tools to help you better manage your programs and your data. Make sure you get your hands dirty and play around with the examples shown here. Then, you will have mastered another aspect of Python in no time!