How to Download a File in Python
Xavier Rigoulet, 23rd Dec 2021

Did you know you can download a file programmatically in Python? I will show you how to fetch and save a file in Python. Fetching and saving files this way is a common part of web scraping, which is an essential step in many data-related projects.

Web scraping is the process of collecting data from a website. While it can be done manually by a user, it usually refers to an automated method of data collection with the help of a web crawler. You can do all of this programmatically in Python.

By the end of this article, you will know how to download any kind of file in Python, including PDFs, images, videos, and web pages. The process is similar for different types of files.

To get the most out of this article, it is good to have a basic understanding of programming in Python. Also, to save time and accelerate your learning, I encourage you to check out our Python programming track.

To download a file in Python, we need to fetch it and save it. We can fetch it by calling an API or simply from a regular web URL, such as one pointing to a GIF you like.

Before going further, let's understand REST APIs. A REST API is a service that allows you to access and manipulate resources on a server, such as text files, images, services, and collections of other resources, using standard HTTP methods such as GET, POST, PUT, and DELETE. An API improves the portability of client apps and makes it easier for the different components of a product to evolve independently. These APIs usually return resources as UTF-8 encoded JSON.

There are two fundamental steps to making a request when working with REST APIs. First, the client contacts a specific endpoint (URL) of the REST API and states the HTTP method to be executed. This is known as a request. Second, the server executes the method and returns data to the client. This is known as a response.

Authentication is a critical component of internet security.
Any REST API that lets clients access or modify sensitive or critical data must have an authentication system in place. Even if the API is free, the owner may introduce authentication to limit the number of requests per user.

For this tutorial, we will fetch and save files in Python from place.dog and randomfox.ca. Neither requires authentication, so you can reuse the code snippets to download a file in Python. You can find a list of public APIs here.

First, we will download a file in Python over HTTP. Later, we will download a file in Python from an API. Let's get right to it!

Download a File in Python Over HTTP

In our first example, we will fetch and save a picture of a dog. place.dog offers random pictures of dogs you can use as placeholders for your next project. If you refresh the page, it generates another dog picture.

We will use the requests library, which makes HTTP requests simpler than the built-in urllib library does. You may have to install the requests library with the following command:

pip install requests

Then, we import requests, set the url variable to our target URL, send a GET request, and check its status code. The following are the classes of response status codes you may encounter when sending a GET request:

1xx Informational. The request has been received and processing is continuing; for example, 100 Continue tells the client to go ahead and send the rest of the request.
2xx Successful. The request was received, understood, and accepted. Checking for a 2xx code helps you verify the data exists before working on it.
3xx Redirection. The client must take additional action to complete the request, such as following a redirect to a different endpoint.
4xx Client Error. There is a problem with the client's request, for example, a disallowed method, an authorization issue, forbidden access, or an attempt to access a resource that does not exist.
5xx Server Error. There is a problem with the server providing the API.
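To make the five classes above concrete, here is a minimal sketch of a hypothetical helper, status_category(), which is not part of requests or any other library. It relies only on the fact that the first digit of a three-digit HTTP status code identifies its class.

```python
def status_category(status_code: int) -> str:
    """Map an HTTP status code to its class name (hypothetical helper)."""
    categories = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    # The first digit of a three-digit status code identifies its class.
    first_digit = status_code // 100
    if first_digit not in categories:
        raise ValueError(f"Not a valid HTTP status code: {status_code}")
    return categories[first_digit]


for code in (100, 200, 301, 404, 503):
    print(code, status_category(code))
```

In practice, you rarely need such a helper with requests: response.ok is False for 4xx and 5xx codes, and response.raise_for_status() raises requests.HTTPError for those responses.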
Let's write a request to fetch a file in Python.

>>> import requests
>>> url = 'https://place.dog/300/200'
>>> # fetch file
>>> response = requests.get(url, allow_redirects=True)
>>> # get response status
>>> response.status_code
200

The 200 status code indicates the request was successful and the data exists. (allow_redirects=True is already the default for requests.get(), so the argument only makes the redirect behavior explicit.) From there, we continue to the next step and save the file with the help of the write() method: we open a file named dog1.jpg in binary write mode ('wb') and write response.content, the raw bytes of the response body, to it.

Now, the file has been saved as dog1.jpg and contains a picture of a dog. For a good refresher on the write() method to save a file in Python, check my article on how to write to file in Python here.

Download a File in Python From an API

Now, let's explore how to fetch and save a file in Python by calling an API and parsing its JSON response. In contrast to what we did previously, we will save the file with pathlib.

Much of the data exchanged online, especially by web APIs, is in the form of JSON (JavaScript Object Notation). It is also used to store information in databases, and it is the most common data format you'll find when working with modern REST APIs.

JSON data structures may be unordered name-value pairs, such as dictionaries, hash tables, objects, or keyed lists depending on the programming language, or ordered lists of values, such as arrays, lists, and vectors.

Raw JSON text is cumbersome to work with directly in code. To solve this problem, Python has different libraries to help us read JSON data fetched from the web. Among them is the built-in json module, which converts JSON components into native Python objects. The following table shows the conversion mapping between JSON and Python:

JSON      Python
object    dict
array     list (Python tuples are also encoded as JSON arrays)
string    str
number    int or float
true      True
false     False
null      None

You have to deal with JSON data often when working with REST APIs.
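To see the conversion table in action, here is a minimal sketch using Python's built-in json module. The sample document is made up for illustration; response.json() in requests applies the same kind of decoding to a response body.

```python
import json

# A made-up JSON document that uses every type from the conversion table.
raw = (
    '{"name": "fox", "tags": ["red", "wild"], "age": 3, '
    '"weight": 6.5, "tame": false, "owner": null}'
)

# json.loads() converts JSON text into native Python objects.
data = json.loads(raw)

print(type(data).__name__)            # dict  (JSON object)
print(type(data["tags"]).__name__)    # list  (JSON array)
print(type(data["name"]).__name__)    # str   (JSON string)
print(type(data["age"]).__name__)     # int   (JSON number without a fraction)
print(type(data["weight"]).__name__)  # float (JSON number with a fraction)
print(data["tame"], data["owner"])    # False None

# Going the other way, json.dumps() encodes a tuple as a JSON array
# and True as the JSON literal true.
print(json.dumps({"point": (1, 2), "valid": True}))  # {"point": [1, 2], "valid": true}
```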
You can find more information about JSON in our course on How to Read and Write JSON Files in Python.

The requests library has many features, but we only need the GET request and the json() method for the following example. As we did previously, the first step is to import the requests library. Then, we send a GET request to the API endpoint we want to access. requests.get() returns a Response object whose body contains the JSON data. We are only interested in the JSON data, which we parse into Python objects with the response's json() method.

>>> import requests
>>> url = "https://randomfox.ca/floof"
>>> # fetch the JSON metadata
>>> response = requests.get(url, allow_redirects=True)
>>> # parse the JSON body (named data to avoid shadowing the json module)
>>> data = response.json()
>>> print(data)
{'image': 'https://randomfox.ca/images/2.jpg', 'link': 'https://randomfox.ca/?i=2'}

Since response.json() returns a Python dictionary, we extract the URL of the image by its key:

>>> img = data['image']
>>> print(img)
https://randomfox.ca/images/2.jpg

Next, we want to save the image. As mentioned previously, we use pathlib, an object-oriented module for handling filesystem paths. One of its advantages is better portability between operating systems. You can find more information about pathlib in my article on how to rename files.

To save the picture of our fox, we will use the Path.write_bytes() method, which opens the path in binary mode, writes the given bytes to it, and closes the file.

>>> # import Path class from pathlib
>>> from pathlib import Path
>>> # define filename
>>> filename = Path('fox.jpg')
>>> # fetch the image file itself
>>> response = requests.get(img)
>>> # save file
>>> filename.write_bytes(response.content)

Our file has now been saved as fox.jpg. We just saw how to extract the image URL from the API response by inspecting the JSON data.

Closing Thoughts on How to Download a File in Python

We have now learned how to download a file in Python over HTTP and from an API. I encourage you to play with the code and fetch files from different APIs.
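One way to start experimenting is to wrap the fetch and save steps into a reusable function. The sketch below is a hypothetical helper, download_file(), that is not part of any library; it uses only the standard library (urllib, the built-in alternative to requests mentioned earlier, plus shutil and pathlib) and streams the file to disk in chunks, which keeps memory use low for large files such as videos.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str, chunk_size: int = 64 * 1024) -> Path:
    """Stream url to destination in chunks (hypothetical helper)."""
    path = Path(destination)
    # Create any missing parent folders for the destination file.
    path.parent.mkdir(parents=True, exist_ok=True)
    # urlopen() returns a file-like response; copyfileobj() copies it
    # to the output file chunk_size bytes at a time.
    with urllib.request.urlopen(url) as response, path.open("wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return path
```

For example, download_file("https://place.dog/300/200", "downloads/dog2.jpg") would save another dog picture under a downloads folder (the file name here is only illustrative). With requests, the equivalent approach is to pass stream=True to requests.get() and write the pieces returned by response.iter_content(chunk_size=...) to the file.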
There is a lot more to learn about JSON, which is a widespread and handy format to store data. You can find more about it and Python programming with our Python programming track.

Last but not least, it is always a good idea to reflect on your Python programming skills. To help you with this process, check out my article on Things That Can Help You Write Better Python Code and browse our content on LearnPython.com. Keep learning every day!