9th Nov 2021 12 minutes read

Map, Filter, Reduce – Working on Streams in Python

Do you know how to work with Python streams like Java streams?

A stream is a sequence of elements. With map(), filter(), and reduce() – the three cornerstone functions of functional programming – you can operate over a sequence of elements. In this article, we will learn how to work with streams in Python like we work with them in Java.

But first, let’s say a word about functional programming.

What Is Functional Programming?

Functional programming is a programming paradigm that breaks down a problem into individual functions. Every function, if possible, takes a set of input arguments and produces an output. In this paradigm, we avoid mutable data types and state changes as much as possible.

It also emphasizes recursion rather than loops, focusing on lists, pure functions, and higher-order functions.
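To make the terms "pure function" and "higher-order function" concrete, here is a minimal sketch. The names add_shipping and apply_to_all are made up for this illustration and are not part of Python.

```python
# A pure function: the result depends only on the argument,
# and nothing outside the function is modified.
def add_shipping(price):
    return price + 5

# A higher-order function: it takes another function as an argument.
def apply_to_all(func, items):
    return [func(item) for item in items]

prices = [10, 25, 40]
with_shipping = apply_to_all(add_shipping, prices)

print(with_shipping)  # [15, 30, 45]
print(prices)         # [10, 25, 40] (the input list is unchanged)
```

map(), filter(), and reduce() are higher-order functions of this kind: each one receives a function and applies it to the items of an iterable.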

In this article, we will explore map(), filter(), and reduce() in Python. These are the Python functions used to perform the mapping, filtering, and reducing operations that are fundamental in functional programming.

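First, let’s note that in CPython, map(), filter(), and reduce() are implemented in C, and that map() and filter() return lazy iterators that produce items only on demand. This often makes them fast and memory-efficient compared with building lists in a regular Python for loop, although the actual gain depends on the function you pass in.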

As a prerequisite, it is essential to have some knowledge of functions in Python. If you need a refresher, refer to the article How to Define a Function in Python.

Working on Streams in Python: map()

map() takes a function and one or more iterables as arguments. The output is an iterator that returns the transformed items.

Here is the syntax:

map(function, iterable[, iterable1, iterable2,..., iterableN])

The first argument to map() is a transformation function, which turns each original item into a new one. It can be any Python callable.

Suppose you need to take a list of numeric values and transform it into a list containing the cube of every number in the original list. You can use a for loop and code something like this:

>>> # Define numbers to transform and an empty cube list
>>> num = [2, 3, 6, 9, 10]
>>> cube = []

>>> # Define for loop to transform the numbers
>>> for n in num:
...     cube.append(n ** 3)

>>> # Display the cubed values
>>> cube
[8, 27, 216, 729, 1000]

The for loop iterates over num, applies the cube transformation to each value, and stores the results in cube, which ends up holding the list of cube values.

map() can achieve the same result without a for loop:

>>> # Define the transformation function
>>> def cube(num):
...   return num ** 3

>>> # List of numbers to transform
>>> num = [2, 3, 6, 9, 10]

>>> # Call map function to apply cube on each number
>>> cubed = map(cube, num)

>>> # Create a list containing the cubed values
>>> list(cubed)
[8, 27, 216, 729, 1000]

The above example illustrates how to transform a list of values with map() and a user-defined function.
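Because map() returns an iterator rather than a list, the values are computed on demand and can be consumed only once. A small sketch of this behavior, reusing the cube() function from above:

```python
def cube(n):
    return n ** 3

cubed = map(cube, [2, 3, 6])

first = next(cubed)     # computes only the first value
rest = list(cubed)      # consumes the remaining values
leftover = list(cubed)  # the iterator is now exhausted

print(first)     # 8
print(rest)      # [27, 216]
print(leftover)  # []
```

If you need to go through the results more than once, store them in a list first.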

Any kind of Python callable works with map() such as classes, instance methods, class methods, static methods, and functions.
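As a quick illustration, here is a sketch that passes a built-in function, a method accessed on a class, and a class to map(). The sample lists are made up for the example.

```python
words = ["map", "filter", "reduce"]
digits = ["1", "2", "3"]

lengths = list(map(len, words))      # built-in function
upper = list(map(str.upper, words))  # method accessed on the str class
numbers = list(map(int, digits))     # class used as a converter

print(lengths)  # [3, 6, 6]
print(upper)    # ['MAP', 'FILTER', 'REDUCE']
print(numbers)  # [1, 2, 3]
```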

A typical pattern when using map() is to pass a Python lambda function as the first argument. Lambda functions are a handy way to pass an expression-based function to map(). To illustrate this, we can rewrite the cube example using a lambda function:

>>> # List of input numbers to transform
>>> num = [2, 3, 6, 9, 10]

>>> # Apply a lambda function to each value of num
>>> cubed = map(lambda n: n ** 3, num)

>>> # Create a list containing the cubed values
>>> list(cubed)
[8, 27, 216, 729, 1000]

If you pass multiple iterables to map(), then the transformation function must take as many arguments as the number of iterables you pass in. Each iteration will pass one value from each iterable as an argument to the function.

When multiple iterables are passed, map() groups elements by position: the first call receives the first element of every iterable, the second call receives the second elements, and so on. Iteration stops as soon as the shortest iterable is exhausted.

This technique is useful for combining two or more iterables of numeric values element by element. Here are some examples that use Python lambda functions to compute math operations on several input iterables:

>>> list(map(lambda x, y: x / y, [6, 3, 5], [2, 4, 6]))
[3.0, 0.75, 0.8333333333333334]

>>> list(map(lambda x, y, z: x * y + z, [6, 2], [7, 3], [8, 10]))
[50, 16]

In the first example, we use a divide operation to merge two iterables of three items each. In the second example, we multiply and add together the values of three iterables as 6 x 7 + 8 = 50 and 2 x 3 + 10 = 16.
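The iterables in the examples above have the same length. When the lengths differ, map() stops as soon as the shortest iterable is exhausted. The sketch below shows this, and also uses operator.add from the standard library in place of a lambda; the input lists are made up for the example.

```python
from operator import add

a = [1, 2, 3, 4]
b = [10, 20]

# Only two pairs can be formed, so the result has two items.
sums = list(map(add, a, b))

print(sums)  # [11, 22]
```

The extra items in the longer list are silently ignored, so it is worth checking the lengths beforehand if every item matters.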

More generally, map() is well suited to processing iterables of numeric values, since many math-related transformations can be written as a function applied to each item.

We should also mention starmap(), which is very similar to map(). According to the Python documentation, starmap() is used instead of map() when the argument parameters are already grouped in tuples from a single iterable, meaning that the data has been “pre-zipped”.

To call starmap(), we need to import itertools. Let’s run a quick example of this:

>>> import itertools

>>> # Define a list of tuples
>>> num = [(2, 3), (6, 9), (10, 12)]

>>> # Apply a lambda function to each tuple in the list
>>> multiply = itertools.starmap(lambda x, y: x * y, num)

>>> # Create a list containing the multiplied values
>>> list(multiply)
[6, 54, 120]
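To see how starmap() relates to map(), the sketch below computes the same products twice: once from "pre-zipped" tuples with starmap(), and once from two separate sequences with map(). It uses operator.mul instead of a lambda, which is a choice made for this example only.

```python
from itertools import starmap
from operator import mul

pairs = [(2, 3), (6, 9), (10, 12)]

# starmap() unpacks each tuple into the arguments of mul.
from_pairs = list(starmap(mul, pairs))

# map() expects the arguments in separate iterables instead.
xs, ys = zip(*pairs)
from_columns = list(map(mul, xs, ys))

print(from_pairs)    # [6, 54, 120]
print(from_columns)  # [6, 54, 120]
```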

Working on Streams in Python: filter()

A filtering operation processes an iterable and extracts the items that satisfy a given condition. It can be performed with Python’s filter() built-in function.

The basic syntax is:

filter(function, iterable)

Filtering functions can filter out unwanted values and keep the desired values in the output. The function argument must be a single-argument function. It’s typically a boolean-valued function that returns either True or False.

The iterable argument can be any Python iterable, such as a list, a tuple, or a set. It can also be a generator or an iterator object. Note that filter() accepts only one iterable.

filter() is often used with a Python lambda function as an alternative to defining a named function. Let's run an example in which we want to get only the even numbers from a list:

>>> # List of numbers
>>> num = [12, 37, 34, 26, 9, 250, 451, 3, 10]

>>> # Use a lambda function to keep only the even numbers
>>> even = list(filter(lambda x: x % 2 == 0, num))

>>> # Print the even numbers
>>> print(even)
[12, 34, 26, 250, 10]

The above example uses filter() to check whether numbers are even. If this condition is met and returns True, the even number "goes through the filter".
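Two related tools are worth a quick sketch here. itertools.filterfalse() keeps the items for which the function returns False, and passing None as the function makes filter() keep only the truthy items. The is_even() helper and the mixed list are made up for this example.

```python
from itertools import filterfalse

def is_even(n):
    return n % 2 == 0

num = [12, 37, 34, 26, 9, 250, 451, 3, 10]

evens = list(filter(is_even, num))      # items where is_even is True
odds = list(filterfalse(is_even, num))  # items where is_even is False

# With None as the function, filter() drops falsy values.
truthy = list(filter(None, [0, 1, "", "a", None, [], [2]]))

print(evens)   # [12, 34, 26, 250, 10]
print(odds)    # [37, 9, 451, 3]
print(truthy)  # [1, 'a', [2]]
```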

Note that it is possible to replace filter() with a list comprehension:

# Generate a list with filter()
list(filter(function, iterable))

# Generate a list with a list comprehension
[i for i in iterable if function(i)]

In both cases, the result is a list object.

When manipulating lists in Python, the list comprehension approach spells out the filtering condition inline, which many readers find more explicit than filter(). However, a list comprehension builds the whole list immediately, whereas filter() is evaluated lazily. On the other hand, the name filter() tells us at a glance that a filtering operation is taking place; in this sense, list comprehensions are less explicit.
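If you like the comprehension syntax but still want lazy evaluation, a generator expression is one option. The sketch below compares the three forms on the list of numbers used earlier.

```python
num = [12, 37, 34, 26, 9, 250, 451, 3, 10]

# List comprehension: the full list is built right away.
eager = [n for n in num if n % 2 == 0]

# filter() and a generator expression: items are produced on demand.
lazy_filter = filter(lambda n: n % 2 == 0, num)
lazy_genexp = (n for n in num if n % 2 == 0)

first_from_filter = next(lazy_filter)
first_from_genexp = next(lazy_genexp)
remaining = list(lazy_genexp)

print(eager)              # [12, 34, 26, 250, 10]
print(first_from_filter)  # 12
print(first_from_genexp)  # 12
print(remaining)          # [34, 26, 250, 10]
```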

Using groupby() and sort() in Python

In this part, we will discuss other tools for working on streams in Python: sort() and groupby().

The sort() method is a helpful tool to manipulate lists in Python. For example, if you need to sort a list in ascending or reverse order, you can use the following:

>>> num = [24, 4, 13, 35, 28]

>>> # sort the list in ascending order
>>> num.sort()
>>> print(num)
[4, 13, 24, 28, 35]

And in descending order:

>>> # sort the list in descending order
>>> num.sort(reverse=True)
>>> print(num)
[35, 28, 24, 13, 4]

It is important to note that the sort() method mutates the original list, so the original order of the items is lost unless you kept a copy of the list beforehand.

Next, itertools.groupby() takes an iterable and a key function, and groups consecutive items that share the same key. The key function specifies which value to compute for each item in order to decide its group. groupby() returns an iterator of (key, group) pairs, which is easy to turn into a dictionary of the {key: value} form. Because only consecutive items are grouped together, it is very important to first sort the items with the same key function as the one used for grouping. Otherwise, the same key can show up in several separate groups.
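To see why sorting matters, here is a small sketch that runs groupby() on the same made-up list of month names, first unsorted and then sorted. No key function is given, so the items themselves are used as keys.

```python
from itertools import groupby

months = ["January", "March", "January", "March"]

# Without sorting, each run of equal items becomes its own group.
unsorted_keys = [key for key, _ in groupby(months)]

# After sorting, equal items are adjacent and end up in one group.
sorted_keys = [key for key, _ in groupby(sorted(months))]

print(unsorted_keys)  # ['January', 'March', 'January', 'March']
print(sorted_keys)    # ['January', 'March']
```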

Let’s run an example in which we have some monthly expenses stored as a list of tuples.

We want to group those expenses by the month and finally calculate the monthly total expenses.

>>> from itertools import groupby

>>> # Create a list of monthly spendings as a list of tuples
>>> spendings = [("January", 25), ("February", 47), ("March", 38), ("March", 54), ("April", 67),
...              ("January", 56), ("February", 32), ("May", 78), ("January", 54), ("April", 45)]

>>> # Create an empty dictionary to store the data
>>> spendings_dic = {}

>>> # Define a func variable to specify the grouping key
>>> func = lambda x: x[0]

>>> # Group monthly spendings by month in a dictionary 
>>> for key, group in groupby(sorted(spendings, key=func), func):
...     spendings_dic[key] = list(group) 

>>> spendings_dic
{'April': [('April', 67), ('April', 45)],
 'February': [('February', 47), ('February', 32)],
 'January': [('January', 25), ('January', 56), ('January', 54)],
 'March': [('March', 38), ('March', 54)],
 'May': [('May', 78)]}

In the above snippet, we used sorted() instead of sort(). This is because sorted() returns the sorted list, so its result can be passed directly to groupby(), whereas sort() sorts the list in place and returns None.

Unlike sort(), sorted() returns a new sorted list and leaves the original untouched, so the original order remains available. Because sorted() has to build this new list, it uses more memory and is slightly slower than sort(). If you want to learn more about sorting in Python, I wrote an article that explains different ways of defining your own sorting criteria.
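The difference between the two can be checked with a short sketch: sorted() hands back a new list, while sort() works in place and returns None.

```python
num = [24, 4, 13, 35, 28]

ascending = sorted(num)  # new list; num keeps its original order
snapshot = list(num)     # copy taken before sorting in place

returned = num.sort()    # sorts num itself and returns None

print(ascending)  # [4, 13, 24, 28, 35]
print(snapshot)   # [24, 4, 13, 35, 28]
print(returned)   # None
print(num)        # [4, 13, 24, 28, 35]
```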

Finally, we can use map() from the previous section to sum the monthly expenses:

>>> # Apply map() to sum the monthly spendings
>>> monthly_spendings = {key: sum(map(lambda x: x[1], value)) for key, value in spendings_dic.items()}
>>> monthly_spendings
{'April': 112, 'February': 79, 'January': 135, 'March': 92, 'May': 78}
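The grouping and the summing can also be combined in a single dictionary comprehension, without the intermediate spendings_dic. This is only a sketch of one possible variant; it uses operator.itemgetter in place of the lambda, which is a choice made for this example.

```python
from itertools import groupby
from operator import itemgetter

spendings = [("January", 25), ("February", 47), ("March", 38), ("March", 54),
             ("April", 67), ("January", 56), ("February", 32), ("May", 78),
             ("January", 54), ("April", 45)]

month = itemgetter(0)  # same role as lambda x: x[0]

# Sort by month, group by month, then sum the amounts of each group.
monthly_spendings = {
    key: sum(amount for _, amount in group)
    for key, group in groupby(sorted(spendings, key=month), key=month)
}

print(monthly_spendings)
# {'April': 112, 'February': 79, 'January': 135, 'March': 92, 'May': 78}
```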

To learn about applying Python lambda expressions, filtering rows, and selecting columns in a Python data frame with Pandas, see Yigit Aras’ excellent article on filtering rows and selecting columns in a data frame.

Working on Streams in Python: reduce()

The reduce() function implements a technique called folding or reduction. It takes an existing function, applies it cumulatively to all the items in an iterable, and returns a single final value.
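To make "cumulatively" more tangible, the sketch below uses itertools.accumulate(), which applies the same function in the same order but keeps every intermediate result, next to reduce(), which returns only the last one. The list of values is made up for the example.

```python
from functools import reduce
from itertools import accumulate
from operator import add

values = [1, 2, 3, 4]

# reduce() computes ((1 + 2) + 3) + 4 and returns the final value.
total = reduce(add, values)

# accumulate() yields each intermediate value along the way.
steps = list(accumulate(values, add))

print(total)  # 10
print(steps)  # [1, 3, 6, 10]
```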

reduce() was originally a built-in function and was supposed to be removed. It was moved to functools.reduce() in Python 3.0 because of some possible performance and readability issues.

Unless you cannot find any solution other than reduce(), you should avoid using it. When it is combined with a lambda or a user-defined function, reduce() makes one Python-level function call per item, which can make your code noticeably slower than a dedicated built-in.

Whenever possible, work with a dedicated function to solve these use cases. Functions such as sum(), any(), all(), min(), max(), len(), and math.prod() are faster, more readable, and more Pythonic. Those functions are also highly optimized and implemented in C, making them fast and efficient.
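As a short comparison, the sketch below computes a sum, a product, and a maximum with reduce() and then with the dedicated functions. The list of values is made up, and math.prod() requires Python 3.8 or later.

```python
import math
from functools import reduce

values = [3, 8, 2, 5]

# reduce() versions
total_reduce = reduce(lambda x, y: x + y, values)
product_reduce = reduce(lambda x, y: x * y, values)
largest_reduce = reduce(lambda x, y: x if x > y else y, values)

# Dedicated functions doing the same job
total = sum(values)
product = math.prod(values)
largest = max(values)

print(total_reduce, total)      # 18 18
print(product_reduce, product)  # 240 240
print(largest_reduce, largest)  # 8 8
```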

reduce() can also compromise the readability of your code when you use it with complex user-defined functions or lambda functions. reduce() may perform better than an equivalent Python for loop in some cases, but as Python creator Guido van Rossum explained, a Python loop is often easier to understand than reduce(). He recommends that the applicability of reduce() be limited to associative operators.

For the sake of completeness in covering the three main functions used in functional programming, I will briefly explain reduce() along with some use cases.

reduce() has the following syntax:

functools.reduce(function, iterable[, initializer])

The Python documentation refers to the first argument of reduce() as “a function of two arguments”. However, we can pass any Python callable as long as it accepts two arguments. Callable objects include classes, instance methods, class methods, static methods, and functions.

The second required argument, iterable, can be any Python iterable. The official Python glossary defines an iterable as “an object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.”

The initializer argument of reduce() is optional. If you supply a value for the initializer, reduce() passes it as the first argument in the first call to function, together with the first item of the iterable. Otherwise, the first call receives the first two items of the iterable.

If you want to use reduce() to process iterables that may be empty, it is good practice to provide an initializer. This value is returned when the iterable is empty. If the iterable is empty and you don’t provide an initializer, reduce() raises a TypeError.
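The three situations described above can be checked with a short sketch. The variable names are made up for the example.

```python
from functools import reduce
from operator import add

# Empty iterable with an initializer: the initializer is returned.
empty_with_default = reduce(add, [], 0)

# Non-empty iterable with an initializer: the fold starts from 100.
total_with_start = reduce(add, [1, 2, 3], 100)

# Empty iterable without an initializer: reduce() raises TypeError.
try:
    reduce(add, [])
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(empty_with_default)  # 0
print(total_with_start)    # 106
print(raised_type_error)   # True
```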

Let’s run some examples. Continuing with the data from the previous section, we can use reduce() to calculate the yearly expenses:

>>> from functools import reduce
>>> yearly_spendings = reduce(lambda x, y: x + y, monthly_spendings.values())
>>> print(yearly_spendings)
496

The examples below are more difficult, but they are useful reduce() use cases. Feel free to play with the code a bit to familiarize yourself with the concepts.

We want to turn a list of [[1, 3, 5], [7, 9], [11, 13, 15]] into [1, 3, 5, 7, 9, 11, 13, 15].

We can do it as follows:

>>> from functools import reduce
>>> reduce(list.__add__, [[1, 3, 5], [7, 9], [11, 13, 15]], [])
[1, 3, 5, 7, 9, 11, 13, 15]
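Note that list.__add__ creates a new list at every step, which can become costly for long inputs. As an alternative to reduce() here (a different technique, shown only for comparison), itertools.chain.from_iterable() walks through the sublists one after another:

```python
from itertools import chain

nested = [[1, 3, 5], [7, 9], [11, 13, 15]]

# chain.from_iterable() yields the items of each sublist in turn.
flat = list(chain.from_iterable(nested))

print(flat)  # [1, 3, 5, 7, 9, 11, 13, 15]
```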

We can also use reduce() to find the intersection of any number of lists. For example:

>>> from functools import reduce

>>> num = [[5, 7, 8, 10, 3], [5, 12, 45, 8, 9], [8, 39, 90, 5, 12]]

>>> res = reduce(set.intersection, map(set, num))
>>> print(res)
{8, 5}

The output is a set. You can find more information about sets in Python here.
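Since set.intersection() accepts several sets at once, the same result can also be obtained without reduce() by unpacking the sets as arguments. A minimal sketch with the same data:

```python
num = [[5, 7, 8, 10, 3], [5, 12, 45, 8, 9], [8, 39, 90, 5, 12]]

# Convert each list to a set, then intersect them all in one call.
common = set.intersection(*map(set, num))

print(sorted(common))  # [5, 8]
```

The result is sorted before printing because sets have no guaranteed display order.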

Despite the examples mentioned above, practical use cases for reduce() are few, which helps explain why it was removed from the built-in functions in Python 3. Most of the time, you’ll be better off using another approach to manipulate lists in Python.

Closing Thoughts on Python Streams

In this article, you learned about functional programming in Python and its three main functions: map(), filter(), and reduce(). You can use them to manipulate lists in Python. We also discussed how to use groupby() and sort().

All these methods make it easier to work on streams in Python. I encourage you to play with them, explore what they do, and compare the results. You can also discover more resources on LearnPython.com to learn more about Python in general.
