
Most Popular Python Packages in 2021

Python packages provide a beginner-friendly and efficient way to solve complex problems in scientific computing, data visualization, data modeling, and many other fields. Let's review 2021's most popular Python packages for data analysts and developers.

With the rise of data science and artificial intelligence, Python has become one of the most popular programming languages. It's used by top organizations, including Netflix, Uber, IBM, AstraZeneca, NASA, and the CIA. And Python isn't limited to data science and AI; it's used in many industries, including blockchain, physics, astronomy, medicine, game development, and entertainment.

Python has several key features that make it so popular: it is beginner-friendly, supports many career paths, and has a welcoming community. However, one of the key reasons to learn Python is the language's rich and varied ecosystem. Think of any random task and there's a good chance that Python has a module or package that can make your work much more efficient.

What Is a Python Package?

Complex tasks are better solved step by step, one subtask at a time. That's why programmers create and use modules, or sets of related code saved in separate files and aimed at solving specific tasks.

When you have many different modules, you'll definitely want to group and organize them. A Python package is a directory that contains a collection of modules. Just as you organize your computer files into folders and sub-folders, you can organize modules into packages and sub-packages.

Each regular package should contain a file named __init__.py. This file may be empty, but it usually includes the initialization code for the corresponding package.

Here's an example of the my_model package with three sub-packages: training, submission, and metrics.

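[Figure: directory tree of the my_model package, with the training, submission, and metrics sub-packages; metrics contains precision.py]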

To access code from a Python package, you can either import the entire package or its specific modules and sub-packages.

For example, to get access to the code defined in precision.py, you can:

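  • Import the whole package with import my_model (precision is then reachable as my_model.metrics.precision only if the package's __init__.py files import it);
  • Import the metrics sub-package with import my_model.metrics (with the same caveat);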
  • Import the precision.py module with either of these code snippets:
    import my_model.metrics.precision
    # or
    from my_model.metrics import precision
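
To try these imports without creating files by hand, here is a minimal sketch that builds a throwaway copy of the package in a temporary directory. The exact file layout and the contents of precision.py are assumptions made for illustration; only the package and sub-package names come from the example above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the package in a temporary directory.
# The layout below is assumed for illustration.
root = Path(tempfile.mkdtemp())
package_dirs = ("my_model", "my_model/training", "my_model/submission", "my_model/metrics")
for package_dir in package_dirs:
    (root / package_dir).mkdir(parents=True, exist_ok=True)
    # An empty __init__.py marks each directory as a regular package.
    (root / package_dir / "__init__.py").write_text("")

# A hypothetical implementation of precision.py.
(root / "my_model" / "metrics" / "precision.py").write_text(
    "def precision(true_positives, false_positives):\n"
    "    return true_positives / (true_positives + false_positives)\n"
)

# Make the temporary directory importable, then import the module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from my_model.metrics import precision

print(precision.precision(8, 2))  # 0.8
```

Because the __init__.py files here are empty, a bare import my_model would not load the metrics sub-package on its own; the explicit from ... import form does.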
    

You don't necessarily need to create your own Python packages to enjoy the benefits of this tool. There are many built-in and third-party packages that you can use in your work. Let's review the most popular Python packages for 2021.

Top 10 Python Packages in 2021

Python packages streamline many significant processes, like analyzing and visualizing data, building machine learning models, capturing unstructured data from the web, and processing image and text information efficiently. Here are some of 2021's most important Python packages:

1. NumPy

NumPy is the primary tool for scientific computing in Python. It combines the flexibility and simplicity of Python with the speed of languages like C and Fortran.


NumPy is used for:

  • Advanced array operations (e.g. add, multiply, slice, reshape, index).
  • Comprehensive mathematical functions.
  • Random number generation.
  • Linear algebra routines.
  • Fourier transforms, etc.
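
The operations listed above can be sketched in a few lines. This is only an illustrative sample with made-up values, not a tour of the whole library:

```python
import numpy as np

# Array creation, reshaping, element-wise arithmetic, and slicing.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
doubled = a * 2                  # multiplies every element by 2
middle_column = a[:, 1]          # [1, 4]

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
sample = rng.normal(size=3)

# Linear algebra: invert a small diagonal matrix.
m = np.array([[2.0, 0.0], [0.0, 4.0]])
m_inverse = np.linalg.inv(m)     # [[0.5, 0.0], [0.0, 0.25]]

# Fourier transform of a constant signal: all energy is in the first bin.
spectrum = np.fft.fft(np.ones(4))

print(doubled)
print(middle_column)
print(m_inverse)
print(spectrum)
```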

With NumPy, you are getting the computational power of compiled code, while using accessible Python syntax. No wonder that there is a huge ecosystem of Python packages and libraries drawing on the power of NumPy. These include such popular packages as pandas, Seaborn, SciPy, OpenCV, and others.

2. pandas

If you work with tabular, time series, or matrix data, pandas is your go-to Python package. It is known as a fast, efficient, and easy-to-use tool for data analysis and manipulation. It works with data frame objects; a data frame is a dedicated structure for two-dimensional data. Data frames have rows and columns just like database tables or Excel spreadsheets.

Among other things, pandas can be used for:

  • Reading/writing data from/to CSV and Excel files and SQL databases.
  • Reshaping and pivoting datasets.
  • Slicing, indexing, and subsetting datasets.
  • Aggregating and transforming data.
  • Merging and joining datasets.
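
As a rough illustration of the tasks above, the following sketch reads a small CSV (held in memory so it runs without any files), then filters, aggregates, pivots, and merges it. The city and sales figures are invented for the example.

```python
import io

import pandas as pd

# A tiny CSV kept in memory; pd.read_csv accepts a file path in the same way.
csv_text = (
    "city,year,sales\n"
    "Oslo,2020,10\n"
    "Oslo,2021,15\n"
    "Rome,2020,7\n"
    "Rome,2021,9\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Subsetting: keep only the 2021 rows.
recent = df[df["year"] == 2021]

# Aggregating: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Pivoting: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# Merging: attach a country to every row via the shared "city" column.
countries = pd.DataFrame({"city": ["Oslo", "Rome"], "country": ["Norway", "Italy"]})
merged = df.merge(countries, on="city")

print(totals)
print(wide)
print(merged)
```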

If you want to learn how to use data frames in pandas and how to calculate descriptive statistics using its basic statistics functions, consider taking this interactive Python for Data Science track.

3. Matplotlib

Matplotlib is the most common data exploration and visualization library. You can use it to create basic graphs like line plots, histograms, scatter plots, bar charts, and pie charts. You can also create animated and interactive visualizations with this library. Many other visualization libraries, including Seaborn, are built on top of Matplotlib.

The library offers a great deal of flexibility with regards to formatting and styling plots. You can freely choose how to display labels, grids, legends, etc. However, to create complex and visually appealing plots, you'll need to write quite a lot of code.

For example, let's say we want to draw two line plots: y = 2x and z = x², where x takes integer values from 0 to 99.

We'll first calculate these variables using NumPy.

import numpy as np
x = np.arange(0,100)
y = x*2
z = x**2

Then, we use Matplotlib to create two subplots for two functions and customize their formatting and style:

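import matplotlib.pyplot as plt
# In a Jupyter notebook, run "%matplotlib inline" first to render plots in the notebook.

# Create one figure with two side-by-side subplots.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,2))

# Left subplot: y = 2x as a thick solid green line.
axes[0].plot(x, y, color="green", lw=3)
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')

# Right subplot: z = x**2 as a dashed blue line.
axes[1].plot(x, z, color="blue", lw=2, ls='--')
axes[1].set_xlabel('x')
axes[1].set_ylabel('z')

# Display the figure only after everything has been drawn.
plt.show()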

As you can see, Matplotlib syntax allows you to have several subplots in one plot, set any labels, choose line color, width, style, etc. However, every action requires additional code, and creating a visually appealing plot might turn into a very tedious and time-consuming task. Depending on your task, you may find it more effective to use a different visualization package.

Learn the basics of data visualization in Python with the Introduction to Python for Data Science course. You'll learn how to create simple data visualizations with Matplotlib.

4. Seaborn

Seaborn is a high-level interface for drawing attractive statistical graphics with just a few lines of code. Let's see it in action.

We'll use the famous iris flower dataset in our example. For those not familiar with it, this dataset includes four features – the length and the width of the sepals and petals – for three species of iris (Iris setosa, Iris virginica, and Iris versicolor). We want to see how these four features relate to one another depending on the iris species.

Here's how Seaborn's pairplot function solves this task. Notice that you can create a complex and visually appealing plot with just three lines of code:

import seaborn as sns
iris = sns.load_dataset('iris')
sns.pairplot(iris, hue='species', palette='pastel')

Note how all labels, styles, and a legend have been set automatically. Similarly, you can easily create complex heatmaps, violin plots, joint plots, multi-plot grids, and many other types of plots with this library.

5. scikit-learn

Do you want to run a regression? Or maybe you have a data classification problem? scikit-learn is an efficient and beginner-friendly tool for predictive data analysis. Among other things, you can use scikit-learn to:

  • Identify which category an object is likely to belong to (used in fraud detection, image recognition, cancer detection, etc.).
  • Predict a continuous variable based on available features (used in predicting house prices and inflation).
  • Group similar objects into clusters (used in customer segmentation, social network analysis, etc.).

scikit-learn makes machine learning with Python accessible to people with minimal programming experience. With just a few lines of code, you can model your data using algorithms like random forest, support vector machines (SVM), k-means, spectral clustering, and more.
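
To give a sense of what "a few lines of code" looks like, here is a minimal sketch that trains a random forest classifier and runs k-means clustering on the iris dataset bundled with scikit-learn. The split size, number of trees, and random seeds are arbitrary choices for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the four iris features and the species labels.
X, y = load_iris(return_X_y=True)

# Classification: hold out a quarter of the data to measure accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Clustering: group the flowers into three clusters without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print(cluster_labels[:10])
```

Most scikit-learn estimators follow the same fit and predict (or score) pattern, so swapping in a different algorithm usually means changing only the line that creates the model.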

6. Requests

This library is designed to make HTTP requests with Python simpler and more user friendly. Its params argument lets you pass query parameters as a dictionary instead of manually adding query strings to URLs, and the json() method of a response decodes JSON content for you. With Requests, you can:

  • Customize, inspect, authorize, and configure HTTP requests.
  • Add parameters, headers, and multi-part files.
  • Decompress data automatically.
  • Upload multiple files at the same time.
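
The sketch below shows how Requests builds a URL from parameters and headers. It prepares the request without sending it, so it runs without network access; the URL is a placeholder and the parameter names are made up for the example.

```python
import requests

# Describe the request; nothing is sent over the network here.
request = requests.Request(
    "GET",
    "https://example.com/search",           # placeholder URL
    params={"q": "python", "page": 2},      # becomes the query string
    headers={"Accept": "application/json"},
)
prepared = request.prepare()

# The query string was added for us.
print(prepared.url)                # https://example.com/search?q=python&page=2
print(prepared.headers["Accept"])  # application/json
```

In everyday use you would typically call requests.get(url, params=...) directly and, for a JSON response, read the decoded body with response.json().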

This package is a real blessing for beginners and advanced users, making it one of the most downloaded Python packages.

7. urllib3

urllib3 is another user-friendly HTTP client for Python. At the time of writing, it is the most downloaded PyPI package, and it powers Requests and some other popular Python packages. urllib3 provides many critical features missing from the standard library:

  • Thread safety.
  • Connection pooling.
  • Retrying requests.
  • Dealing with HTTP redirects.
  • Full test coverage.
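
Connection pooling and retries are configured when you create the client. The following sketch sets up a pool manager with a retry policy; the specific numbers are arbitrary, and the actual request is left commented out so the example runs offline.

```python
import urllib3
from urllib3.util import Retry

# Retry up to 3 times, wait progressively longer between attempts,
# and follow at most 5 redirects.
retry_policy = Retry(total=3, backoff_factor=0.5, redirect=5)

# The pool manager reuses connections and is safe to share between threads.
http = urllib3.PoolManager(num_pools=10, retries=retry_policy)

# Sending a request would look like this (placeholder URL, not executed here):
# response = http.request("GET", "https://example.com/")
# print(response.status)
```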

8. NLTK

Natural Language Toolkit (NLTK) is one of the leading Python platforms for processing language data. It is a set of language processing libraries and programs that provide a toolkit for:

  • Classification.
  • Tokenization.
  • Stemming.
  • Tagging.
  • Parsing.
  • Semantic reasoning.
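
Two of these steps, tokenization and stemming, can be tried with the short sketch below. It deliberately uses wordpunct_tokenize and PorterStemmer because they work without downloading extra data; other NLTK tools, such as word_tokenize and the taggers, need resources fetched with nltk.download() first. The sample sentence is made up.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The runners were running quickly."

# Tokenization: split the sentence into words and punctuation marks.
tokens = wordpunct_tokenize(text)
print(tokens)  # ['The', 'runners', 'were', 'running', 'quickly', '.']

# Stemming: reduce each token to a crude root form.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print(stems)
```

Stems are not always dictionary words, since the Porter algorithm strips suffixes by rule rather than by looking words up.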

NLTK is a go-to tool for computational linguistics in Python. It's highly valued by linguists, engineers, researchers, and industry users.

If you are new to natural language processing, you may benefit from the Working with Strings in Python course, which is part of our interactive Python for Data Science track.

9. Pillow

If you work with image data, make sure to check out the Pillow package. It is a fork of PIL (Python Imaging Library) that developed into an easy-to-use and efficient tool for image manipulation in Python.

With Pillow, you can:

  • Open and save images of different file types (JPEG, PNG, GIF, etc.), and export images to PDF.
  • Create thumbnails for images.
  • Use a collection of image filters (e.g. SMOOTH, BLUR, SHARPEN).
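
Here is a small sketch of these three tasks. To stay self-contained it generates a plain-colored image in memory instead of opening a file; with a real picture you would start from Image.open with a path of your own.

```python
import io

from PIL import Image, ImageFilter

# Create a 400x200 test image filled with a single color.
image = Image.new("RGB", (400, 200), color=(30, 144, 255))

# Thumbnails are made in place and keep the aspect ratio, so work on a copy.
thumb = image.copy()
thumb.thumbnail((100, 100))
print(thumb.size)  # (100, 50)

# Apply one of the built-in filters; the result is a new image.
blurred = image.filter(ImageFilter.BLUR)

# Save the thumbnail as PNG into a buffer, then open it again.
buffer = io.BytesIO()
thumb.save(buffer, format="PNG")
reopened = Image.open(io.BytesIO(buffer.getvalue()))
print(reopened.format, reopened.size)  # PNG (100, 50)
```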

This is a great image manipulation tool for beginners, and it has fairly powerful image processing capabilities.

10. pytest

This package provides a variety of modules for testing new code, including small unit tests and complex functional tests for applications and libraries.

Simple syntax and an extensive feature set make pytest one of the most-loved Python packages among programmers. This test automation framework provides:

  • Built-in support for test discovery.
  • Modular fixtures for test setup (e.g. setting up the database connection, URL, input data).
  • Rich plugin architecture (315+ external plugins).
  • Built-in support for running tests written with the standard unittest module.
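
A test file using these features might look like the sketch below. The apply_discount function and its tests are hypothetical, written only to show a fixture, a parametrized test, and an exception check; saving the code as a file named test_discount.py and running pytest in that folder would let test discovery find the tests automatically.

```python
import pytest


def apply_discount(price, percent):
    """Hypothetical function under test: reduce price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


@pytest.fixture
def base_price():
    # Fixtures provide reusable setup; pytest injects them by argument name.
    return 200.0


def test_half_price(base_price):
    assert apply_discount(base_price, 50) == 100.0


# One test function, run once per (percent, expected) pair.
@pytest.mark.parametrize("percent, expected", [(0, 200.0), (25, 150.0), (100, 0.0)])
def test_discount_table(base_price, percent, expected):
    assert apply_discount(base_price, percent) == expected


def test_rejects_invalid_percent():
    # The test passes only if the expected exception is raised.
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```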

pytest is a great tool for improving your programs. And well-tested programs are good programs!

It's 2021 – Time to Learn Python Packages!

If you are considering learning Python packages, you should start by learning the language itself. This will give you a significant competitive advantage in the job market. Programmers, data analysts, marketers, office workers, scientists, doctors, and even artists can improve their daily work with Python.

To get a comprehensive understanding of Python basics as well as experience with real-world use cases, I recommend taking the interactive studying tracks offered by LearnPython.com:

  • Python Basics is a mini track for those who want to get started with programming. It includes three courses with a total of 229 coding challenges. These cover variables, if statements, loops, functions, basic data structures, and more.
  • Learn Programming with Python is an extended version of the Python Basics track. With five interactive courses and 419 coding challenges, you'll go beyond the basics and get practical experience with Python data structures and built-in algorithms.
  • Python for Data Science is for those interested in data analytics and data science. It includes five courses and 329 coding challenges that cover the pandas and Matplotlib packages, working with strings in Python, and processing CSV, Excel, and JSON files.

Bonus. Here are some ideas for your first data science projects. Have fun!