# Chapter 1: The Simplest Way - The Functional API

Welcome to the world of `Requests`! If you need to get information from a website or interact with a web service using Python, `Requests` is your friendly helper.

Imagine you just want to quickly grab the content of a webpage, maybe check the latest news headlines from a site, or send a simple piece of data to an online service. How do you do that without getting bogged down in complex details?

That's where the **Functional API** of `Requests` comes in. It's the most straightforward way to start making web requests.

## What's the Functional API?

Think of the Functional API as a set of handy, ready-to-use tools right at the top level of the `requests` library. You don't need to set anything up; you just call a function like `requests.get()` to fetch data or `requests.post()` to send data.

**Analogy:** Ordering Takeout 🍕

Using the Functional API is like using a generic food delivery app (like DoorDash or Uber Eats) to order a pizza from a place you've never ordered from before.

1. You open the app (`import requests`).
2. You find the pizza place and tap "Order" (`requests.get('pizza_place_url')`).
3. The app handles finding a driver, sending them to the restaurant, picking up the pizza, and delivering it to you (`Requests` does all the connection and fetching work).
4. You get your pizza (a `Response` object).

It's super convenient for a one-time order!

## Making Your First Request: `requests.get()`

The most common type of request is a `GET` request. It's what your web browser does every time you type a website address and hit Enter. It means "Please *get* me the content of this page."

Let's try it! First, make sure you have `requests` installed (`pip install requests`). Then, in your Python script or interactive session:

```python
import requests  # Import the library

# The URL we want to get data from
url = 'https://httpbin.org/get'  # A handy website for testing requests

# Use the functional API 'get' function
print(f"Fetching data from: {url}")
response = requests.get(url)

# Check if the request was successful (Status Code 200 means OK)
print(f"Status Code: {response.status_code}")

# Print the first 200 characters of the content we received
print("Response Content (first 200 chars):")
print(response.text[:200])
```

**What happened here?**

1. `import requests`: We told Python we want to use the `requests` library.
2. `response = requests.get(url)`: This is the core magic! We called the `get` function directly from the `requests` module, passing the URL we want to visit.
3. `requests` did all the work: connected to the server, sent the `GET` request, and received the server's reply.
4. The reply is stored in the `response` variable. This isn't just the text of the page; it's a special `Response` object containing lots of useful information. We'll explore this more in [Request & Response Models](02_request___response_models.md).
5. `response.status_code`: We checked the status code. `200` is the standard code for "Everything went okay!". Other codes might indicate errors (like `404 Not Found`).
6. `response.text`: We accessed the main content (usually HTML or JSON) returned by the server as a string.

## Sending Data: `requests.post()`

Sometimes, instead of just getting data, you need to *send* data to a website. This is often done when submitting a form, logging in, or telling an API to perform an action. The `POST` method is commonly used for this.

The Functional API provides `requests.post()` for this purpose.

```python
import requests

# The URL we want to send data to
url = 'https://httpbin.org/post'

# The data we want to send (like form fields)
# We'll use a Python dictionary
payload = {'username': 'tutorial_user', 'action': 'learn_requests'}

print(f"Sending data to: {url}")
# Use the functional API 'post' function, passing the data
response = requests.post(url, data=payload)

# Check the status code
print(f"Status Code: {response.status_code}")

# The response often echoes back the data we sent
print("Response Content:")
print(response.text)
```

**What's new?**

1. `payload = {...}`: We created a Python dictionary to hold the data we want to send.
2. `response = requests.post(url, data=payload)`: We called `requests.post()`. Notice the second argument, `data=payload`. This tells `requests` to send our dictionary as form data in the body of the `POST` request.
3. The `response.text` from `httpbin.org/post` conveniently shows us the data it received, confirming our `payload` was sent correctly.
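
If you're curious what `data=payload` actually puts in the request body, you can build the same request locally with `requests.Request(...).prepare()`, which encodes everything but sends nothing (these objects are covered in [Request & Response Models](02_request___response_models.md)). This sketch also prepares a `json=payload` version for comparison; the URL is only used for encoding, and no server is contacted.

```python
import json

import requests

url = "https://httpbin.org/post"
payload = {"username": "tutorial_user", "action": "learn_requests"}

# Build the requests locally; .prepare() encodes them but sends nothing.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

# data=dict -> URL-encoded form body
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # username=tutorial_user&action=learn_requests

# json=dict -> JSON body (parsed back here so we can compare it to the dict)
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'username': 'tutorial_user', 'action': 'learn_requests'}
```

In short, `data=` with a dictionary produces a URL-encoded form body, while `json=` serializes the same dictionary to JSON and sets a JSON `Content-Type`.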

`Requests` also offers functions for other HTTP methods like `put`, `delete`, `head`, `patch`, and `options`, all working similarly: `requests.put(...)`, `requests.delete(...)`, etc.

## How It Works Under the Hood

You might wonder: if it's so simple, how does `requests.get()` actually connect to the internet and manage the request?

Every time you call one of these functional API methods (like `requests.get` or `requests.post`), `Requests` performs a few steps behind the scenes:

1. **Creates a temporary `Session` object:** Think of a `Session` as a more advanced way to manage requests, especially when you need to talk to the same website multiple times. We'll learn all about these in the [Session](03_session.md) chapter. For a functional API call, `requests` creates a *brand new, temporary* `Session` just for this single request.
2. **Uses the `Session`:** This temporary `Session` is then used to actually prepare and send your request (e.g., the `GET` to `https://httpbin.org/get`).
3. **Gets the `Response`:** The `Session` receives the reply from the server.
4. **Returns the `Response` to you:** The function gives you back the `Response` object.
5. **Closes and discards the `Session`:** The temporary `Session` is closed, which releases its network connections, and then thrown away. Nothing from it (cookies, open connections) carries over to your next call.

**Analogy Revisited:** The generic delivery app (Functional API) contacts *a* driver (creates a temporary `Session`), tells them the restaurant and your order (sends the request), the driver delivers the food (returns the `Response`), and then the app forgets about that specific driver (discards the `Session`). If you order again 5 minutes later, it starts the whole process over with potentially a different driver.

Here's a simplified diagram of what happens when you call `requests.get()`:

```mermaid
sequenceDiagram
    participant User as Your Code
    participant FuncAPI as requests.get()
    participant TempSession as Temporary Session
    participant Server as Web Server

    User->>FuncAPI: Call requests.get('url')
    FuncAPI->>TempSession: Create new Session()
    activate TempSession
    TempSession->>Server: Make HTTP GET request to 'url'
    activate Server
    Server-->>TempSession: Send HTTP Response back
    deactivate Server
    TempSession-->>FuncAPI: Return Response object
    FuncAPI-->>User: Return Response object
    deactivate TempSession
    Note right of FuncAPI: Temporary Session is discarded
```

You can see a glimpse of this in the `requests/api.py` code:

```python
# File: requests/api.py (Simplified view)

from . import sessions  # Where the Session logic lives


def request(method, url, **kwargs):
    """Shared function behind every functional API call (also public as requests.request())."""

    # Creates a temporary Session just for this one call.
    # The 'with' statement ensures it's properly closed afterwards.
    with sessions.Session() as session:
        # The temporary session makes the actual request.
        return session.request(method=method, url=url, **kwargs)


def get(url, params=None, **kwargs):
    """Sends a GET request (functional API)."""
    # This is just a convenient shortcut that calls the main 'request' function.
    return request("get", url, params=params, **kwargs)


def post(url, data=None, json=None, **kwargs):
    """Sends a POST request (functional API)."""
    # Another shortcut calling the main 'request' function.
    return request("post", url, data=data, json=json, **kwargs)


# ... similar functions for put, delete, head, patch, options ...
```

Each function like `get`, `post`, etc., is just a simple wrapper that calls the main `request` function, which in turn creates and uses that temporary `Session`.
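
To watch this create-use-close cycle without any network access, here is a sketch that copies the same pattern. Both names in it are hypothetical, made up for this illustration: `functional_get()` mimics `requests.get()`, and `CannedAdapter` is a stand-in [Transport Adapter](07_transport_adapters.md) that answers every request with an empty `200 OK` instead of connecting anywhere, while counting how often it is used and closed. It also sets the `Response`'s private `_content` attribute directly, which is fine for a demo but not something real code should do.

```python
import requests
from requests.adapters import BaseAdapter
from requests.models import Response


class CannedAdapter(BaseAdapter):
    """Hypothetical offline adapter: answers every request with an empty 200 OK."""

    def __init__(self):
        super().__init__()
        self.sent = 0    # number of requests handled
        self.closed = 0  # number of times a Session closed this adapter

    def send(self, request, **kwargs):
        self.sent += 1
        resp = Response()
        resp.status_code = 200
        resp.reason = "OK"
        resp._content = b""  # private attribute, set directly for this sketch only
        resp.url = request.url
        resp.request = request
        return resp

    def close(self):
        self.closed += 1


def functional_get(url, adapter):
    """Hypothetical stand-in for requests.get(): one throwaway Session per call."""
    # Step 1: create a brand-new, temporary Session. Leaving the with-block
    # (step 5) calls session.close(), which also closes its mounted adapters.
    with requests.Session() as session:
        session.mount("https://", adapter)  # route https:// URLs to the fake adapter
        # Steps 2-4: prepare and send the request, then hand back the Response.
        return session.get(url)


adapter = CannedAdapter()
first = functional_get("https://httpbin.org/get", adapter)
second = functional_get("https://httpbin.org/get", adapter)
print(first.status_code, second.status_code)  # 200 200
print(adapter.sent, adapter.closed)           # 2 2: one Session opened and closed per call
```

Because every call builds and closes its own `Session`, nothing learned during one call (cookies, open connections) is available to the next, which is exactly the trade-off discussed below.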

## When Is It Good? When Is It Not?

**Good For:**

* Simple, one-off requests.
* Quick scripts where performance isn't critical.
* Learning `Requests` - it's the easiest starting point!

**Not Ideal For:**

* **Multiple requests to the same website:** Creating and tearing down a connection and a `Session` for *every single request* is inefficient. It's like sending a separate delivery driver for each item you forgot from the grocery store.
* **Needing persistence:** If the website gives you a cookie (like after logging in) and you want to use it on your *next* request to that same site, the functional API won't remember it because the temporary `Session` (which holds cookies) is discarded after each call.
* **Fine-grained control:** If you need custom configurations, specific connection pooling, or advanced features, using a `Session` object directly offers more power.

## Conclusion

You've learned about the `Requests` Functional API – the simplest way to make web requests using functions like `requests.get()` and `requests.post()`. It's perfect for quick tasks and getting started. You saw how it works by creating temporary `Session` objects behind the scenes.

While convenient for single shots, remember its limitations for performance and state persistence when dealing with multiple requests to the same site.

Now that you know how to *send* a basic request, what exactly do you get *back*? Let's explore the structure of the requests we send and the powerful `Response` object we receive.

**Next:** [Chapter 2: Request & Response Models](02_request___response_models.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 2: What Happens When You Order? Request & Response Models

In [Chapter 1: The Simplest Way - The Functional API](01_functional_api.md), we saw how easy it is to fetch a webpage or send data using simple functions like `requests.get()` and `requests.post()`. We also noticed that these functions return something called a `Response` object.

But what exactly *is* that `Response` object? And what happens behind the scenes when `requests` sends your request? Just like ordering food involves more than just shouting your order and getting a meal, web requests have structured steps and data carriers. Understanding these helps you use `requests` more effectively.

## Why Models? The Need for Structure

Imagine ordering takeout again. You don't just tell the restaurant "food!"; you give them specific details: "One large pepperoni pizza, delivery to 123 Main St." The restaurant then prepares exactly that and delivers it back to you with a receipt.

Web requests work similarly. You need to tell the server:

* *What* you want (the URL, like `/get` or `/post`).
* *How* you want to interact (the method, like `GET` or `POST`).
* *Any extra details* (like headers or data you're sending).

The server then replies with:

* *If it worked* (a status code, like `200 OK` or `404 Not Found`).
* *Information about the reply* (headers, like the content type).
* *The actual stuff* you asked for (the content, like HTML or JSON).

`Requests` uses special Python objects to hold all this information in an organized way. These are the **Request and Response Models**.

## The Main Characters: Request, PreparedRequest, and Response

Think of the process like ordering at a restaurant:

1. **`Request` Object (Your Order Slip):** This is your initial intention. It holds the basic details of the request you *want* to make: the URL, the method (`GET`, `POST`, etc.), any headers you want to add, and any data you want to send. When you use the simple functional API you don't create this object yourself; `requests` creates it for you internally.
    * *Analogy:* You write down "Large Pizza, Pepperoni, Extra Cheese" on an order slip.

2. **`PreparedRequest` Object (The Prepared Tray):** This is the finalized, ready-to-go version of your request. `Requests` takes the initial `Request` object, processes it (encodes data, applies cookies, adds default headers like `User-Agent`), and gets it ready to be sent over the network. It holds the final method, URL, headers, and encoded body exactly as they will be sent. This is mostly an internal step.
    * *Analogy:* The kitchen takes your slip, makes the pizza, puts it in a box, adds napkins and maybe a drink, and puts it all on a tray ready for the delivery driver.

3. **`Response` Object (The Delivered Meal):** This object represents the server's reply *after* the `PreparedRequest` has been sent and the server has responded. It contains everything the server sent back: the status code (Did the order succeed?), the response headers (What kind of food is this? How was it packaged?), and the actual content (The pizza itself!). This is the object you usually work with directly.
    * *Analogy:* The delivery driver hands you the tray with the pizza and receipt. You check the receipt (`status_code`, `headers`) and eat the pizza (`content`).

Most of the time, you'll interact primarily with the `Response` object. But knowing about `Request` and `PreparedRequest` helps you understand what `requests` is doing for you.
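
You can try the first two stages yourself without sending anything. In this sketch the URL, parameters, form data, and the `X-Order-Note` header are arbitrary example values. The `Request` still holds your raw `params` dictionary, while the `PreparedRequest` produced by `.prepare()` has folded it into the final URL and encoded the body. (A `Response` needs a real server reply, so that stage is left out here.)

```python
import requests

# 1. The "order slip": just your intentions, nothing encoded yet.
req = requests.Request(
    "POST",
    "https://httpbin.org/post",
    params={"size": "large"},
    data={"topping": "pepperoni"},
    headers={"X-Order-Note": "extra cheese"},
)
print(req.params)  # {'size': 'large'}

# 2. The "prepared tray": final method, URL, headers, and encoded body.
prepped = req.prepare()
print(prepped.method)                     # POST
print(prepped.url)                        # https://httpbin.org/post?size=large
print(prepped.body)                       # topping=pepperoni
print(prepped.headers["X-Order-Note"])    # extra cheese
print(prepped.headers["Content-Type"])    # application/x-www-form-urlencoded
print("User-Agent" in prepped.headers)    # False
```

Notice that there's no `User-Agent` header yet: `Request.prepare()` only uses what you gave it. Default headers like `User-Agent` are merged in when a [Session](03_session.md) prepares the request.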

## Working with the `Response` Object

Let's revisit our `requests.get()` example from Chapter 1 and see what useful things are inside the `response` object we get back.

```python
import requests

url = 'https://httpbin.org/get'
print(f"Fetching data from: {url}")
response = requests.get(url)

# --- Exploring the Response Object ---

# 1. Status Code: Was it successful?
print(f"\nStatus Code: {response.status_code}")  # A number like 200 (OK) or 404 (Not Found)
print(f"Was it successful (status < 400)? {response.ok}")  # A boolean True/False

# 2. Response Headers: Information *about* the response
print(f"\nResponse Headers (Content-Type): {response.headers['Content-Type']}")
# Headers are like a dictionary (Case-Insensitive)
print("All Headers:")
for key, value in response.headers.items():
    print(f"  {key}: {value}")

# 3. Response Content (Body): The actual data!
# - As text (decoded using the declared or guessed encoding):
print("\nResponse Text (first 100 chars):")
print(response.text[:100])

# - As raw bytes (useful for non-text like images):
print("\nResponse Content (bytes, first 20):")
print(response.content[:20])

# 4. JSON Helper: If the content is JSON
json_url = 'https://httpbin.org/json'
print(f"\nFetching JSON from: {json_url}")
json_response = requests.get(json_url)
if json_response.ok and 'application/json' in json_response.headers.get('Content-Type', ''):
    try:
        data = json_response.json()  # Decodes JSON into a Python dict/list
        print("Decoded JSON data:")
        print(data)
        print(f"Value of 'title': {data['slideshow']['title']}")
    except requests.exceptions.JSONDecodeError:
        print("Response was not valid JSON.")
```

**What we learned from the `Response`:**

1. **`response.status_code`**: A standard HTTP status code number. `200` means "OK". `404` means "Not Found". Many others exist.
2. **`response.ok`**: A quick boolean check. `True` if the status code is less than 400 (meaning success or redirect), `False` for errors (4xx or 5xx codes).
3. **`response.headers`**: A dictionary-like object holding the response headers sent by the server (like `Content-Type`, `Date`, `Server`). It's case-insensitive, so `response.headers['content-type']` works too.
4. **`response.text`**: The response body decoded into a string. `Requests` uses the text encoding declared in the response headers, or falls back to a guess based on the content itself. Good for HTML, plain text, etc.
5. **`response.content`**: The response body as raw bytes, exactly as received from the server. Use this for images, downloads, or when you need precise control over decoding.
6. **`response.json()`**: A convenient method that parses the body as JSON and returns the resulting Python object (usually a dictionary or list). It raises `requests.exceptions.JSONDecodeError` if the content isn't valid JSON.
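
To experiment with these attributes without a network connection, you can fill in a `Response` object by hand. Treat this strictly as a teaching device: the status, headers, body, and URL are made-up values, and the sketch sets the private `_content` attribute directly, which real code should never do. It also shows `raise_for_status()`, which turns a 4xx or 5xx status into a `requests.exceptions.HTTPError`.

```python
import requests
from requests.models import Response

resp = Response()
resp.status_code = 404
resp.reason = "NOT FOUND"
resp.headers["Content-Type"] = "application/json"  # headers is a CaseInsensitiveDict
resp._content = b'{"error": "no such pizza"}'      # private attribute: sketch only
resp.encoding = "utf-8"
resp.url = "https://example.com/menu/42"           # made-up URL

print(resp.ok)                       # False (404 is >= 400)
print(resp.headers["content-type"])  # application/json (case-insensitive lookup)
print(resp.content)                  # b'{"error": "no such pizza"}' (raw bytes)
print(resp.text)                     # {"error": "no such pizza"} (decoded str)
print(resp.json()["error"])          # no such pizza

try:
    resp.raise_for_status()          # raises HTTPError for 4xx/5xx status codes
except requests.exceptions.HTTPError as err:
    print("HTTPError:", err)
```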

The `Response` object neatly packages all the server's reply information for you to use.

## How It Works Internally: From Request to Response

When you call `requests.get(url)`, the following happens under the hood (simplified):

1. **Create `Request`:** `Requests` creates a `Request` object containing the method (`'GET'`), the `url`, and any other arguments you provided (like `headers` or `params`). (See `requests/sessions.py` `request` method which creates a `models.Request`)
2. **Prepare `Request`:** This `Request` object is then passed to a preparation step, where it becomes a `PreparedRequest` (see `requests/sessions.py` `prepare_request` method which calls `PreparedRequest.prepare` in `requests/models.py`). This involves:
    * Merging session-level settings (like default headers or cookies from a [Session](03_session.md), which the functional API uses temporarily).
    * Encoding parameters (`params`).
    * Encoding the body (`data` or `json`).
    * Handling authentication (`auth`).
    * Adding standard headers (like `User-Agent`, `Accept-Encoding`).
    * Resolving the final URL.
3. **Send `PreparedRequest`:** The `PreparedRequest`, now containing the exact bytes and headers, is handed off to a **Transport Adapter** (we'll cover these in [Transport Adapters](07_transport_adapters.md)). The adapter handles the actual network communication (opening connections, sending bytes, dealing with HTTP/HTTPS specifics). (See `requests/sessions.py` `send` method which calls `adapter.send` in `requests/adapters.py`)
4. **Receive Reply:** The Transport Adapter waits for the server's reply (status line, headers, body).
5. **Build `Response`:** The adapter takes the raw reply data and uses it to build the `Response` object you receive. It parses the status code, headers, and makes the raw content available. (See `requests/adapters.py` `build_response` method which creates a `models.Response`)
6. **Return `Response`:** The `send` method returns the fully formed `Response` object back to your code.
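
These steps can also be driven by hand through a `Session`, which makes each intermediate object visible. The pipeline itself uses only public methods (`Session.prepare_request()` for step 2 and `Session.send()` for steps 3 to 6). The adapter is the one assumption: `CannedAdapter` is a hypothetical stand-in for a real [Transport Adapter](07_transport_adapters.md) that builds a made-up reply instead of opening a connection (and, as a shortcut, sets the private `_content` attribute).

```python
import requests
from requests.adapters import BaseAdapter
from requests.models import Response


class CannedAdapter(BaseAdapter):
    """Hypothetical stand-in for a Transport Adapter: no network, fixed reply."""

    def send(self, request, **kwargs):
        # Steps 4-5: "receive" a reply and build a Response object from it.
        resp = Response()
        resp.status_code = 200
        resp.reason = "OK"
        resp.headers["Content-Type"] = "text/plain"
        resp._content = b"hello from the fake server"  # private attribute: sketch only
        resp.encoding = "utf-8"
        resp.url = request.url
        resp.request = request  # link back to the PreparedRequest that was "sent"
        return resp

    def close(self):
        pass  # nothing to release: no real connections were opened


session = requests.Session()
session.mount("https://", CannedAdapter())

# Step 1: create the Request (your intent).
req = requests.Request("GET", "https://httpbin.org/get", params={"q": "pizza"})

# Step 2: prepare it; session defaults such as User-Agent are merged in here.
prepped = session.prepare_request(req)
print(prepped.url)                      # https://httpbin.org/get?q=pizza
print("User-Agent" in prepped.headers)  # True (copied from session.headers)

# Steps 3-6: send it through the mounted adapter and get a Response back.
resp = session.send(prepped)
print(resp.status_code, resp.text)      # 200 hello from the fake server
print(resp.request is prepped)          # True
session.close()
```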

Here's a diagram showing the journey:

```mermaid
sequenceDiagram
    participant UserCode as Your Code (e.g., requests.get)
    participant Session as requests Session (Temporary or Explicit)
    participant PrepReq as PreparedRequest
    participant Adapter as Transport Adapter
    participant Server as Web Server
    participant Resp as Response

    UserCode->>Session: Call get(url) / post(url, data=...)
    Session->>Session: Create models.Request object
    Session->>PrepReq: prepare_request(request) -> PreparedRequest
    Note over PrepReq: Encodes data, adds headers, cookies etc.
    Session->>Adapter: send(prepared_request)
    Adapter->>Server: Send HTTP Request bytes
    Server-->>Adapter: Send HTTP Response bytes
    Adapter->>Resp: build_response(raw_reply) -> Response
    Resp-->>Adapter: Return Response
    Adapter-->>Session: Return Response
    Session-->>UserCode: Return Response
```

You can see the definitions for these objects in `requests/models.py`:

```python
# File: requests/models.py (Highly Simplified)

import datetime

from .compat import json as complexjson
from .cookies import cookiejar_from_dict
from .structures import CaseInsensitiveDict


class Request:
    """A user-created Request object. Used to prepare a PreparedRequest."""

    def __init__(self, method=None, url=None, headers=None, files=None,
                 data=None, params=None, auth=None, cookies=None, hooks=None, json=None):
        self.method = method
        self.url = url
        # ... other attributes ...

    def prepare(self):
        """Constructs a PreparedRequest for transmission."""
        p = PreparedRequest()
        p.prepare(
            method=self.method,
            url=self.url,
            # ... pass other attributes ...
        )
        return p


class PreparedRequest:
    """The fully mutable PreparedRequest object, containing the exact bytes
    that will be sent to the server."""

    def __init__(self):
        self.method = None
        self.url = None
        self.headers = None
        self.body = None
        # ... other attributes ...

    def prepare(self, method=None, url=None, headers=None, files=None, data=None,
                params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request."""
        # Each helper fills in one part of the final request:
        self.prepare_method(method)           # e.g. 'get' -> 'GET'
        self.prepare_url(url, params)         # final URL, with params encoded into it
        self.prepare_headers(headers)         # final headers (a CaseInsensitiveDict)
        self.prepare_cookies(cookies)         # adds the Cookie header
        self.prepare_body(data, files, json)  # encoded body bytes or stream
        self.prepare_auth(auth, url)          # e.g. adds an Authorization header
        self.prepare_hooks(hooks)             # response hooks, prepared last


class Response:
    """Contains a server's response to an HTTP request."""

    def __init__(self):
        self._content = False  # Content hasn't been read yet
        self.status_code = None
        self.headers = CaseInsensitiveDict()  # Special dictionary for headers
        self.raw = None  # The raw stream from the network connection
        self.url = None
        self.encoding = None
        self.history = []  # List of redirects
        self.reason = None  # Text reason, e.g., "OK"
        self.cookies = cookiejar_from_dict({})
        self.elapsed = datetime.timedelta(0)  # Time taken
        self.request = None  # The PreparedRequest that led to this response

    @property
    def content(self):
        """Content of the response, in bytes."""
        # ... logic to read from self.raw if not already read ...
        return self._content

    @property
    def text(self):
        """Content of the response, in unicode."""
        # Use the declared encoding, or fall back to a guess
        # (apparent_encoding, not shown, guesses from the raw bytes).
        encoding = self.encoding or self.apparent_encoding
        return str(self.content, encoding, errors="replace")

    def json(self, **kwargs):
        """Returns the json-encoded content of a response, if any."""
        # ... the real method also detects UTF-16/32 bodies and re-raises
        # decode errors as requests' own JSONDecodeError ...
        return complexjson.loads(self.text, **kwargs)

    # ... other properties like .ok, .is_redirect, and methods like .raise_for_status() ...
```

Understanding these models gives you a clearer picture of how `requests` turns your simple function call into a network operation and packages the result neatly for you.

## Conclusion

You've learned about the core data carriers in `Requests`:

* `Request`: Your initial intent.
* `PreparedRequest`: The finalized request ready for sending.
* `Response`: The server's reply, containing status, headers, and content.

While you mostly interact with the `Response` object after making a request, knowing about the `Request` and `PreparedRequest` helps demystify the process. You saw how to access useful attributes of the `Response` like `status_code`, `headers`, `text`, `content`, and the handy `json()` method.

In Chapter 1, we noted that the functional API creates a temporary setup for each request. This is simple but inefficient if you need to talk to the same website multiple times, perhaps needing to maintain login status or custom settings. How can we do that better?

**Next:** [Chapter 3: Remembering Things - The Session Object](03_session.md)

---

# Chapter 3: Remembering Things - The Session Object

In [Chapter 1](01_functional_api.md), we learned the easiest way to make web requests using functions like `requests.get()`. In [Chapter 2](02_request___response_models.md), we looked at the `Request` and `Response` objects that structure our communication with web servers.

We also saw that the simple functional API methods like `requests.get()` are great for single, one-off requests. But what if you need to talk to the *same website* multiple times? For example, maybe you need to:

1. Log in to a website (which gives you a "session cookie" to prove you're logged in).
2. Make several requests to access different pages that *require* you to be logged in (using that cookie).

If you use `requests.get()` for each step, you'll have a problem. Remember how `requests.get()` creates a *temporary* setup for each call and then throws it away? This means it forgets the login cookie immediately after the login request! Your next request will be like visiting the site as a brand new, logged-out user.

How can we make `Requests` remember things between requests, just like your web browser does when you navigate around a logged-in site?

## Meet the `Session` Object: Your Persistent Browser Tab

This is where the `requests.Session` object comes in!

Think of a `Session` object as a dedicated browser tab you've opened just for interacting with a specific website or web service. What does a browser tab do?

* **Remembers Cookies:** If you log in on a website in one tab, that tab remembers your login cookie. When you click a link *within that same tab*, the browser automatically sends the cookie back, keeping you logged in.
* **Keeps Connections Warm:** Your browser often keeps the underlying network connection (TCP connection) to the website open for a little while. This makes clicking links and loading subsequent pages much faster because it doesn't have to establish a new connection every single time. This is called **connection pooling**.
* **Applies Consistent Settings:** You might have browser extensions that add specific headers to your requests, or your browser sends a consistent "User-Agent" string identifying itself.

A `requests.Session` object does all of these things for your Python script:

1. **Cookie Persistence:** It automatically stores cookies sent by the server and sends them back on subsequent requests to the same domain.
2. **Connection Pooling:** It reuses the underlying TCP connections for requests to the same host, significantly speeding up multiple requests. This is managed by components called [Transport Adapters](07_transport_adapters.md).
3. **Default Data:** You can set default headers, authentication details, query parameters, or proxy settings directly on the `Session` object, and they will be applied to all requests made through that session.
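
You can inspect the last two features without sending anything. In this sketch, `get_adapter()` shows that different `https://` URLs are routed to the same adapter object, which is what holds the reusable connections; `HTTPAdapter(pool_connections=..., pool_maxsize=...)` is one way to size that pool; and `prepare_request()` shows session defaults being merged into a request. The pool sizes, header, parameters, and `('user', 'passwd')` credentials are arbitrary example values.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Connection pooling: every https:// URL is routed to the same adapter object,
# and that adapter owns the pool of reusable connections.
a1 = s.get_adapter("https://httpbin.org/get")
a2 = s.get_adapter("https://httpbin.org/headers")
print(a1 is a2)  # True

# Optional: mount an adapter with an explicitly sized pool (example numbers).
s.mount("https://", HTTPAdapter(pool_connections=4, pool_maxsize=10))

# Default data: set once on the session...
s.headers.update({"X-My-Custom-Header": "HelloSession"})
s.params.update({"session_param": "persistent"})
s.auth = ("user", "passwd")  # placeholder credentials

# ...and it is merged into every request the session prepares (nothing is sent here).
req = requests.Request("GET", "https://httpbin.org/get", params={"page": "2"})
prepped = s.prepare_request(req)
print(prepped.url)                            # https://httpbin.org/get?session_param=persistent&page=2
print(prepped.headers["X-My-Custom-Header"])  # HelloSession
print(prepped.headers["Authorization"])       # Basic dXNlcjpwYXNzd2Q=
```

When a request and its session set the same key, the per-request value wins, and passing `None` for a key drops the session default for that one request.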

## Using a `Session`

Using a `Session` is almost as easy as using the functional API. Instead of calling `requests.get()`, you first create a `Session` object, and then call methods like `get()` or `post()` on *that object*.

```python
import requests

# 1. Create a Session object
s = requests.Session()

# Let's try accessing a page that requires a login (we're not logged in yet)
login_required_url = 'https://httpbin.org/cookies'  # This page shows cookies sent to it
print("Trying to access protected page without login...")
response1 = s.get(login_required_url)
print("Cookies sent (should be none):", response1.json())  # httpbin returns JSON

# Now, let's simulate 'logging in' by visiting a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/sessioncookie/123456789'
print("\nSimulating login by getting a cookie...")
response2 = s.get(cookie_setter_url)
# The session automatically stored the cookie! Check the session's cookie jar:
print("Session cookies after setting:", s.cookies.get_dict())

# Now, try accessing the 'protected' page again using the SAME session
print("\nTrying to access protected page AGAIN with the session...")
response3 = s.get(login_required_url)
print("Cookies sent (should have sessioncookie):", response3.json())

# Compare with using the functional API (which forgets cookies)
print("\nTrying the same with functional API (will fail)...")
response4 = requests.get(cookie_setter_url)  # Gets cookie, but immediately forgets
response5 = requests.get(login_required_url)
print("Cookies sent via functional API (should be none):", response5.json())
```

**What happened here?**

1. `s = requests.Session()`: We created our "persistent browser tab".
2. `response1 = s.get(login_required_url)`: Our first request sent no cookies, as expected.
3. `response2 = s.get(cookie_setter_url)`: We visited a URL designed to send back a `Set-Cookie` header. The `Session` object automatically noticed this and stored the `sessioncookie` in its internal [Cookie Jar](04_cookie_jar.md).
4. `s.cookies.get_dict()`: We peeked inside the session's cookie storage and saw the cookie was indeed saved.
5. `response3 = s.get(login_required_url)`: We made *another* request using the *same* session `s`. This time, the session automatically included the `sessioncookie` in the request headers. The server received it!
6. The last part shows that if we used `requests.get()` instead, the cookie from `response4` would be lost, and `response5` would fail to send it. The `Session` was crucial for remembering the cookie.
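
The same behavior can be checked without contacting httpbin.org. In this sketch a cookie is placed in one session's cookie jar by hand (standing in for the `Set-Cookie` reply from the "login" step), and then each of two sessions *prepares*, but does not send, a request. Only the session holding the cookie adds a `Cookie` header; the brand-new session, like the temporary one behind `requests.get()`, has nothing to send.

```python
import requests

url = "https://httpbin.org/cookies"

# A session that "logged in": store the cookie by hand, as if the server had
# replied with  Set-Cookie: sessioncookie=123456789
logged_in = requests.Session()
logged_in.cookies.set("sessioncookie", "123456789")

# A brand-new session, like the temporary one behind requests.get()
fresh = requests.Session()

# prepare_request() builds the final headers without sending anything.
with_cookie = logged_in.prepare_request(requests.Request("GET", url))
without_cookie = fresh.prepare_request(requests.Request("GET", url))

print(with_cookie.headers.get("Cookie"))     # sessioncookie=123456789
print(without_cookie.headers.get("Cookie"))  # None
print(logged_in.cookies.get_dict())          # {'sessioncookie': '123456789'}
```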

## Persistent Settings: Headers, Auth, etc.

Besides cookies, you can set other things on the `Session` that will apply to all its requests.

```python
import os  # To read credentials from environment variables

import requests

s = requests.Session()

# Set a default header for all requests made by this session
s.headers.update({'X-My-Custom-Header': 'HelloSession'})

# Set default Basic authentication for all requests made by this session.
# The credentials come from environment variables, with placeholder fallbacks.
# (To test real auth, httpbin's /basic-auth/<user>/<passwd> endpoint accepts
# whatever user/passwd you put in its URL.)
httpbin_user = os.environ.get("HTTPBIN_USER", "testuser")  # Placeholder if not set
httpbin_pass = os.environ.get("HTTPBIN_PASS", "testpass")  # Placeholder if not set
s.auth = (httpbin_user, httpbin_pass)

# Set default query parameters
s.params.update({'session_param': 'persistent'})

# Now make a request (httpbin.org/get echoes back the headers and query parameters it receives)
url = 'https://httpbin.org/get'
print(f"Making request with persistent session settings to: {url}")
response = s.get(url)

print(f"\nStatus Code: {response.status_code}")
response_data = response.json()
print("\nHeaders sent (look for X-My-Custom-Header and Authorization):")
print(response_data['headers'])
print("\nQuery parameters sent (look for session_param):")
print(response_data['args'])

# Make another request to a different endpoint using the same session
headers_url = 'https://httpbin.org/headers'
print(f"\nMaking request to {headers_url}...")
response_headers = s.get(headers_url)
print("Headers received by second request (still has custom header):")
print(response_headers.json()['headers'])
```
**What we see:**

* The `X-My-Custom-Header` we set on `s.headers` was automatically added to both requests.
* The `session_param` we added to `s.params` was included in the query string (httpbin's `/get` echoes it back under `args`).
* The `s.auth` credentials were also applied automatically. Each request carried an `Authorization: Basic ...` header, visible in the echoed headers. An endpoint that actually checks credentials, such as httpbin's `/basic-auth/...`, would accept or reject the request based on it.
* We didn't have to repeat any of these details on each `s.get()` call. The `Session` handled it.

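
The merging can also be inspected offline. The sketch below assumes `requests` is installed, and the header value, params, and credentials are illustrative. It sets the same kinds of session defaults, passes per-request values as well, and calls `Session.prepare_request` to see the final URL and headers without sending anything. Per-request values are layered on top of the session's values: a header with the same name wins, and extra params are added alongside the session's.

```python
import base64

import requests

s = requests.Session()
s.headers.update({"X-My-Custom-Header": "HelloSession"})
s.params.update({"session_param": "persistent"})
s.auth = ("testuser", "testpass")  # illustrative credentials

req = requests.Request(
    "GET",
    "https://httpbin.org/get",
    headers={"X-My-Custom-Header": "PerRequest"},  # overrides the session's value
    params={"q": "1"},                              # added alongside session_param
)
prepared = s.prepare_request(req)  # builds the final request; nothing is sent

print(prepared.url)
print(prepared.headers["X-My-Custom-Header"])
scheme, token = prepared.headers["Authorization"].split(" ", 1)
credentials = base64.b64decode(token).decode()
print(scheme, credentials)
```
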
## Using Sessions with `with` (Context Manager)

Sessions manage resources like network connections. It's good practice to explicitly close them when you're done. The easiest way to ensure this happens is to use the `Session` as a context manager with the `with` statement.

```python
import requests

url = 'https://httpbin.org/cookies'

# Use the Session as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/contextcookie/abc')
    response = s.get(url)
    print("Cookies sent within 'with' block:", response.json())

# After the 'with' block, the session 's' has been closed automatically.
# Reusing a closed session is not recommended: its adapters have already
# released their pooled connections.
# print("\nTrying to use session after 'with' block (might not work as expected)...")
# try:
#     response_after = s.get(url)
#     print(response_after.text)
# except Exception as e:
#     print(f"Error using session after close: {e}")

print("\nSession automatically closed after 'with' block.")
```

The `with` statement ensures that `s.close()` is called automatically at the end of the block, even if errors occur. This cleans up the underlying connections managed by the [Transport Adapters](07_transport_adapters.md).

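
To see the cleanup happen, you can mount an adapter that only records whether it was closed. `RecordingAdapter` and the `https://example.test/` prefix below are hypothetical and exist only for this sketch. `BaseAdapter` and `Session.mount` are real `requests` APIs. Leaving the `with` block calls `Session.close()`, which in turn calls `close()` on every mounted adapter.

```python
import requests
from requests.adapters import BaseAdapter


class RecordingAdapter(BaseAdapter):
    """Hypothetical adapter that only remembers whether close() was called."""

    def __init__(self):
        super().__init__()
        self.closed = False

    def send(self, request, **kwargs):
        raise NotImplementedError("not used in this sketch")

    def close(self):
        self.closed = True


adapter = RecordingAdapter()
with requests.Session() as s:
    s.mount("https://example.test/", adapter)
    print("inside block, closed =", adapter.closed)  # False

print("after block, closed =", adapter.closed)  # True
```
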
## How It Works Internally

So, how does the `Session` actually achieve this persistence and efficiency?

1. **State Storage:** The `Session` object itself holds onto configuration like `headers`, `cookies` (in a [Cookie Jar](04_cookie_jar.md)), `auth`, `params`, etc.
2. **Request Preparation:** When you call a method like `s.get(url, headers=...)`, the `Session` merges your request details with its own stored settings and uses the result to create the `PreparedRequest` object we saw in [Chapter 2](02_request___response_models.md). Session cookies and headers are added automatically during this step (`Session.prepare_request`).
3. **Transport Adapters & Pooling:** The `Session` doesn't handle network sockets directly. Instead, it hands the `PreparedRequest` to a suitable **Transport Adapter** (usually `HTTPAdapter` for HTTP/HTTPS), and each `Session` keeps its own adapter instances. The *adapter* manages the pool of underlying network connections (`urllib3`'s connection pool). When you make a request to `https://example.com`, the adapter checks whether its pool already has an open, reusable connection to that host. If so, it reuses it (much faster!). If not, it opens a new one, which returns to the pool afterwards for future reuse.
4. **Response Processing:** When the adapter receives the response, it builds the `Response` object and hands it back to the `Session`. Crucially, the `Session` then inspects the response headers (like `Set-Cookie`) and updates its own state (e.g., adds new cookies to its `Cookie Jar`).

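
The division of labor between the `Session` and its adapter can be shown with a stand-in adapter that never touches the network. This sketch assumes `requests` is installed. `CannedAdapter` and the `https://api.example.test/` URLs are hypothetical, and setting `Response._content` directly is a shortcut for illustration only (it is a private attribute). The `Session` still merges its headers, picks the adapter by URL prefix, and returns whatever `Response` the adapter builds.

```python
import requests
from requests.adapters import BaseAdapter
from requests.models import Response


class CannedAdapter(BaseAdapter):
    """Hypothetical offline adapter: returns a fixed 200 response."""

    def __init__(self):
        super().__init__()
        self.seen = []  # PreparedRequests that reached this adapter

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        self.seen.append(request)
        resp = Response()
        resp.status_code = 200
        resp._content = b'{"ok": true}'  # private attribute, set only for this sketch
        resp.url = request.url
        resp.request = request
        return resp

    def close(self):
        pass


s = requests.Session()
s.headers["X-Session"] = "yes"
adapter = CannedAdapter()
s.mount("https://api.example.test/", adapter)  # longer prefix, checked before "https://"

r = s.get("https://api.example.test/items", headers={"X-Call": "1"})
print(r.status_code, r.json())
print(adapter.seen[0].headers["X-Session"], adapter.seen[0].headers["X-Call"])
print(s.get_adapter("https://api.example.test/x") is adapter)
print(type(s.get_adapter("https://other.example/")).__name__)
```

Mounting on a URL prefix is also how you would swap in a differently configured `HTTPAdapter` (for example, with retries) for one host only.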
Here's a simplified diagram showing two requests using a `Session`:

```mermaid
sequenceDiagram
    participant User as Your Code
    participant Sess as Session Object
    participant PrepReq as PreparedRequest
    participant Adapter as Transport Adapter (holds connection pool)
    participant Server as Web Server

    User->>Sess: Create Session()
    User->>Sess: s.get(url1, headers={'User-Header': 'A'})
    Sess->>Sess: Merge s.headers, s.cookies, s.auth... with User's headers/data
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get connection from pool (or create new)
    Adapter->>Server: Send HTTP Request 1 (with session+user headers, session cookies)
    Server-->>Adapter: Send HTTP Response 1 (sets cookie 'C')
    Adapter->>Sess: Return Response 1
    Sess->>Sess: Extract cookie 'C' into s.cookies
    Sess-->>User: Return Response 1

    User->>Sess: s.get(url2)
    Sess->>Sess: Merge s.headers, s.cookies ('C'), s.auth...
    Sess->>PrepReq: prepare_request(merged_settings)
    Sess->>Adapter: send(prepared_request)
    Adapter->>Adapter: Get REUSED connection from pool
    Adapter->>Server: Send HTTP Request 2 (with session headers, cookie 'C')
    Server-->>Adapter: Send HTTP Response 2
    Adapter->>Sess: Return Response 2
    Sess-->>User: Return Response 2
```

You can see the core logic in `requests/sessions.py`. The `Session.request` method orchestrates the process:

```python
# File: requests/sessions.py (Simplified View)

# [...] imports and helper functions

class Session(SessionRedirectMixin):
    def __init__(self):
        # Stores persistent headers, cookies, auth, etc.
        self.headers = default_headers()
        self.cookies = cookiejar_from_dict({})
        self.auth = None
        self.params = {}
        # [...] other defaults like verify, proxies, max_redirects
        self.adapters = OrderedDict()  # Holds Transport Adapters
        self.mount('https://', HTTPAdapter())  # Default adapter for HTTPS
        self.mount('http://', HTTPAdapter())  # Default adapter for HTTP

    def prepare_request(self, request):
        """Prepares a Request object with Session settings."""
        p = PreparedRequest()

        # MERGE session settings with request settings
        merged_cookies = merge_cookies(RequestsCookieJar(), self.cookies)
        if request.cookies:
            # merge_cookies accepts either a dict or a CookieJar
            merged_cookies = merge_cookies(merged_cookies, request.cookies)

        merged_headers = merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict)
        merged_params = merge_setting(request.params, self.params)
        merged_auth = merge_setting(request.auth, self.auth)
        # [...] merge other settings like hooks

        p.prepare(
            method=request.method.upper(),
            url=request.url,
            headers=merged_headers,
            files=request.files,
            data=request.data,
            json=request.json,
            params=merged_params,
            auth=merged_auth,
            cookies=merged_cookies,  # Pass merged cookies to PreparedRequest
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p

    def request(self, method, url, **kwargs):
        """Constructs a Request, prepares it, sends it."""
        # Create the initial Request object from user args
        # (Simplified: the real method spells out each keyword argument)
        req = Request(
            method=method.upper(), url=url,
            headers=kwargs.get('headers'), params=kwargs.get('params'),
            data=kwargs.get('data'), json=kwargs.get('json'),
            files=kwargs.get('files'), cookies=kwargs.get('cookies'),
            auth=kwargs.get('auth'), hooks=kwargs.get('hooks'),
        )

        # Prepare the request, merging session state
        prep = self.prepare_request(req)

        # Get environment settings (proxies, verify, cert) merged with session settings
        proxies = kwargs.get('proxies') or {}
        settings = self.merge_environment_settings(
            prep.url, proxies,
            kwargs.get('stream'), kwargs.get('verify'), kwargs.get('cert'),
        )
        send_kwargs = {
            'timeout': kwargs.get('timeout'),
            'allow_redirects': kwargs.get('allow_redirects', True),
        }
        send_kwargs.update(settings)

        # Send the prepared request using the appropriate adapter
        resp = self.send(prep, **send_kwargs)
        return resp

    def send(self, request, **kwargs):
        """Sends a PreparedRequest object."""
        # [...] set default kwargs if needed
        allow_redirects = kwargs.pop('allow_redirects', True)  # not an adapter argument

        # Get the right adapter (e.g., HTTPAdapter) based on URL
        adapter = self.get_adapter(url=request.url)

        # The adapter sends the request (using connection pooling)
        r = adapter.send(request, **kwargs)

        # [...] response hook processing

        # IMPORTANT: Extract cookies from the response and store them in the session's cookie jar
        extract_cookies_to_jar(self.cookies, request, r.raw)

        # [...] redirect handling when allow_redirects is true (which also extracts cookies)

        return r

    def get_adapter(self, url):
        """Finds the Transport Adapter for the URL (e.g., HTTPAdapter)."""
        # Prefixes are tried in order; mount() keeps longer, more specific prefixes first
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise InvalidSchema(f"No connection adapters were found for {url!r}")

    def mount(self, prefix, adapter):
        """Attaches a Transport Adapter to handle URLs starting with 'prefix'."""
        self.adapters[prefix] = adapter
        # [...] re-order adapters so longer prefixes are checked first

    def close(self):
        """Closes the session and all its adapters (and connections)."""
        for adapter in self.adapters.values():
            adapter.close()

    # [...] other methods like get(), post(), put(), delete() which call self.request()
    # [...] redirect handling logic in SessionRedirectMixin
```

The key takeaways are:

* The `Session` object holds the state (`headers`, `cookies`, `auth`).
* `prepare_request` merges this state with the details of the specific request you're making.
* `send` uses a `Transport Adapter` (like `HTTPAdapter`) which handles the actual network communication and connection pooling.
* After a response is received, `send` (and the redirection logic) updates the `Session`'s cookies.

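
The merge rule used by `prepare_request` can be tried directly. This sketch calls `merge_setting` from `requests.sessions`, an internal helper rather than documented public API, so its behavior may change between versions. The header names and values are illustrative. For dict-like settings, the request's keys override the session's, a `None` value removes a session key, and `CaseInsensitiveDict` makes header names case-insensitive. For non-dict settings such as `auth`, the request's value simply replaces the session's when both are given.

```python
from requests.sessions import merge_setting
from requests.structures import CaseInsensitiveDict

session_headers = {"User-Agent": "my-app/1.0", "X-Trace": "on"}
request_headers = {"user-agent": "one-off-script", "X-Trace": None}

merged = merge_setting(request_headers, session_headers, dict_class=CaseInsensitiveDict)
print(merged["User-Agent"])  # one-off-script (request wins; lookup ignores case)
print("X-Trace" in merged)   # False (None removed the session's header)

# Non-dict settings (like auth tuples) are not merged key by key.
auth_both = merge_setting(("req-user", "pw"), ("sess-user", "pw"))
auth_session_only = merge_setting(None, ("sess-user", "pw"))
print(auth_both, auth_session_only)
```
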
## Conclusion

You've learned about the `requests.Session` object, a powerful tool for making multiple requests to the same host efficiently. You saw how it automatically handles **cookie persistence** and provides significant performance benefits through **connection pooling** (via [Transport Adapters](07_transport_adapters.md)). You also learned how to set persistent `headers`, `auth`, and other settings on a session. Using a `Session` is the recommended approach when your script needs to interact with a website more than once.

We mentioned that the `Session` stores cookies in a "Cookie Jar". What exactly is that, and can we interact with it more directly? Let's find out.

**Next:** [Chapter 4: The Cookie Jar](04_cookie_jar.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

365
docs/Requests/04_cookie_jar.md
Normal file
@@ -0,0 +1,365 @@

# Chapter 4: The Cookie Jar - Remembering Website Visits

In [Chapter 3: Remembering Things - The Session Object](03_session.md), we saw how `Session` objects are super useful for making multiple requests to the same website. A big reason they work so well is that they automatically remember **cookies** sent by the server, just like your web browser does.

But *how* does a `Session` remember these cookies? Where does it keep them? Welcome to the **Cookie Jar**!

## What's the Problem? Staying Logged In

Imagine you log in to a website. The website usually sends back a special piece of information called a **cookie**. This cookie is like a temporary ID card. When you visit other pages on that *same* website, your browser automatically shows this ID card (sends the cookie back) so the website knows you're still logged in.

If you used the simple `requests.get()` function from [Chapter 1](01_functional_api.md) for each step, it would forget the ID card immediately after logging in. Your next request would be treated as if you were a stranger.

`Session` objects solve this by using a **Cookie Jar** to hold onto those ID cards (cookies) for you.

## What are Cookies (Briefly)?

Think of cookies as little notes or name tags that websites give to your browser (or your `requests` script).

* **Website:** "Hi, you just logged in. Here's a name tag that says 'User123'." (Sends a `Set-Cookie` header)
* **Your Browser / Session:** "Okay, I'll keep this 'User123' tag." (Stores the cookie)
* **You:** (Click on another page on the same website)
* **Your Browser / Session:** "Hi website, I'd like this page. By the way, here's my name tag: 'User123'." (Sends a `Cookie` header)
* **Website:** "Ah, User123, I remember you. Here's the page you asked for."

Cookies are used to remember login status, user preferences, items in a shopping cart, etc., between different page visits.

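
The two headers in this exchange can be illustrated with the standard library alone. This is a minimal sketch using `http.cookies.SimpleCookie`. Note that `requests` itself is built on the related `http.cookiejar` module, not on `SimpleCookie`. The cookie name and value mirror the analogy above. The sketch parses a `Set-Cookie` value the way a client reads the website's reply, then builds the `Cookie` value the client would send back.

```python
from http.cookies import SimpleCookie

# 1. Website -> you: parse the value of a Set-Cookie response header.
received = SimpleCookie()
received.load("user=User123; Path=/; HttpOnly")
print(received["user"].value)    # User123
print(received["user"]["path"])  # /

# 2. You -> website: build the Cookie request header from what was stored.
# Attributes like Path and HttpOnly stay on the client; only name=value goes back.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in received.items())
print("Cookie:", cookie_header)  # Cookie: user=User123
```
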
## The Cookie Jar Analogy 🍪

`Requests` uses an object called a `RequestsCookieJar` to store and manage cookies. It's very much like the cookie jar you might have in your kitchen:

1. **Collects Cookies:** When a website sends you a cookie (like after you log in), the `Session` automatically puts it into its `Cookie Jar`.
2. **Stores Them Safely:** The jar keeps all the cookies collected from different websites (domains).
3. **Sends the Right Ones Back:** When you make *another* request to a website using the *same* `Session`, the `Session` looks into the `Cookie Jar`, finds any cookies that belong to that website's domain, and automatically sends them back.

This happens seamlessly when you use a `Session` object.

## Meet `RequestsCookieJar`

The specific object `requests` uses is `requests.cookies.RequestsCookieJar`. It subclasses Python's standard `http.cookiejar.CookieJar`, so it follows the same cookie-matching rules, and adds some convenient features, most notably a dictionary-like interface.

Every `Session` object has its own `Cookie Jar` accessible via the `s.cookies` attribute.

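
A quick check, assuming `requests` is installed, confirms both halves of that description for a session's jar. It is a standard-library `CookieJar`, and it is also a `MutableMapping`, which is where the dictionary-style methods come from.

```python
import http.cookiejar
from collections.abc import MutableMapping

import requests

s = requests.Session()
print(type(s.cookies).__name__)                         # RequestsCookieJar
print(isinstance(s.cookies, http.cookiejar.CookieJar))  # True: standard cookie machinery
print(isinstance(s.cookies, MutableMapping))            # True: dictionary-style interface
```
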
Let's see it in action, revisiting the example from Chapter 3:

```python
import requests

# Create a Session object (which has its own empty Cookie Jar)
s = requests.Session()
print(f"Initial session cookies: {s.cookies.get_dict()}")

# Visit a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/fruit/apple'
print(f"\nVisiting {cookie_setter_url}...")
response1 = s.get(cookie_setter_url)

# Check the Session's Cookie Jar - it should have the cookie now!
print(f"Session cookies after setting: {s.cookies.get_dict()}")

# Visit another page on the same domain (httpbin.org)
cookie_viewer_url = 'https://httpbin.org/cookies'
print(f"\nVisiting {cookie_viewer_url}...")
response2 = s.get(cookie_viewer_url)

# This page shows the cookies it received. Let's see if our 'fruit' cookie was sent.
print("Cookies received by the server:")
print(response2.text)  # httpbin.org/cookies returns JSON showing received cookies
```

**Output:**

```
Initial session cookies: {}

Visiting https://httpbin.org/cookies/set/fruit/apple...
Session cookies after setting: {'fruit': 'apple'}

Visiting https://httpbin.org/cookies...
Cookies received by the server:
{
  "cookies": {
    "fruit": "apple"
  }
}
```

**Explanation:**

1. We started with an empty `Session` and an empty cookie jar (`{}`).
2. We visited `/cookies/set/fruit/apple`. The server replied with a redirect carrying a `Set-Cookie: fruit=apple; Path=/` header, and `requests` followed the redirect automatically.
3. The `Session` object `s` automatically saw this header and stored the `fruit=apple` cookie in its jar (`s.cookies`). We confirmed this by printing `s.cookies.get_dict()`.
4. We then visited `/cookies` using the *same session* `s`.
5. The `Session` automatically looked in `s.cookies`, found the `fruit` cookie (since it's for the `httpbin.org` domain), and added a `Cookie: fruit=apple` header to the request.
6. The server at `/cookies` received this header and echoed it back, confirming our cookie was sent!

The `Session` and its `Cookie Jar` handled the persistence automatically.

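
The domain check in step 5 can be seen without contacting any server. This sketch assumes `requests` is installed. `cookie_header_for` is a hypothetical helper written for this example, and `example.com` is just a second domain for contrast. It uses `Session.prepare_request` to ask which `Cookie` header the session would attach to a given URL.

```python
import requests

s = requests.Session()
s.cookies.set("fruit", "apple", domain="httpbin.org", path="/")


def cookie_header_for(url):
    """Hypothetical helper: the Cookie header the session would send to `url`."""
    prepared = s.prepare_request(requests.Request("GET", url))  # nothing is sent
    return prepared.headers.get("Cookie")


print(cookie_header_for("https://httpbin.org/cookies"))  # fruit=apple
print(cookie_header_for("https://example.com/"))         # None: different domain
```
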
## Cookies in the Response

While the `Session` cookie jar (`s.cookies`) holds *all* cookies collected during the session's lifetime, the [Request & Response Models](02_request___response_models.md) also have a `cookies` attribute.

The `response.cookies` attribute (also a `RequestsCookieJar`) contains *only* the cookies that were set or updated by *that specific response*. It doesn't know about cookies from previous responses in the session. If the request was redirected, `response.cookies` belongs to the final response; cookies set along the way are on the responses in `response.history`.

```python
import requests

s = requests.Session()

url_set_a = 'https://httpbin.org/cookies/set/cookieA/valueA'
url_set_b = 'https://httpbin.org/cookies/set/cookieB/valueB'

# These httpbin URLs answer with a redirect that carries the Set-Cookie header.
# allow_redirects=False keeps that redirect as the returned response, so its
# .cookies shows what it set. If the redirect were followed, we would get the
# final /cookies page back, and that page sets no cookies of its own.
print(f"Visiting {url_set_a}")
response_a = s.get(url_set_a, allow_redirects=False)
print(f"Cookies SET by response A: {response_a.cookies.get_dict()}")
print(f"ALL session cookies after A: {s.cookies.get_dict()}")

print(f"\nVisiting {url_set_b}")
response_b = s.get(url_set_b, allow_redirects=False)
print(f"Cookies SET by response B: {response_b.cookies.get_dict()}")
print(f"ALL session cookies after B: {s.cookies.get_dict()}")
```

**Output:**

```
Visiting https://httpbin.org/cookies/set/cookieA/valueA
Cookies SET by response A: {'cookieA': 'valueA'}
ALL session cookies after A: {'cookieA': 'valueA'}

Visiting https://httpbin.org/cookies/set/cookieB/valueB
Cookies SET by response B: {'cookieB': 'valueB'}
ALL session cookies after B: {'cookieA': 'valueA', 'cookieB': 'valueB'}
```

**Explanation:**

* `response_a.cookies` only contains `cookieA`, because that's the cookie set by *that specific response*.
* `s.cookies` contains `cookieA` after the first request.
* `response_b.cookies` only contains `cookieB`.
* `s.cookies` contains *both* `cookieA` and `cookieB` after the second request, because the `Session` accumulates cookies.

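
The difference comes down to *which jar* each `Set-Cookie` header is extracted into. This offline sketch calls the real `extract_cookies_to_jar` helper from `requests.cookies` twice per response. The first call targets a fresh jar, playing the role of `response.cookies`, as `HTTPAdapter.build_response` does. The second call targets one long-lived jar, playing the role of `s.cookies`, as `Session.send` does. `fake_raw` is a hypothetical stand-in for `response.raw`: it only provides the `_original_response.msg` headers that this helper reads, which is an internal detail and may differ across versions.

```python
import http.client
import io
from types import SimpleNamespace

import requests
from requests.cookies import RequestsCookieJar, extract_cookies_to_jar


def fake_raw(*set_cookie_values):
    """Hypothetical stand-in for response.raw carrying Set-Cookie headers."""
    lines = "".join(f"Set-Cookie: {v}\r\n" for v in set_cookie_values) + "\r\n"
    msg = http.client.parse_headers(io.BytesIO(lines.encode("latin-1")))
    return SimpleNamespace(_original_response=SimpleNamespace(msg=msg))


session_jar = RequestsCookieJar()  # plays the role of s.cookies
for name, value in [("cookieA", "valueA"), ("cookieB", "valueB")]:
    req = requests.Request("GET", f"https://httpbin.org/cookies/set/{name}/{value}").prepare()
    raw = fake_raw(f"{name}={value}; Path=/")

    response_jar = RequestsCookieJar()              # plays the role of response.cookies
    extract_cookies_to_jar(response_jar, req, raw)  # only this response's cookies
    extract_cookies_to_jar(session_jar, req, raw)   # accumulates across responses
    print(name, response_jar.get_dict(), session_jar.get_dict())
```
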
## Using the Cookie Jar Like a Dictionary

The `RequestsCookieJar` is extra friendly because you can treat it much like a Python dictionary to access or modify cookies directly.

```python
import requests

jar = requests.cookies.RequestsCookieJar()

# Set cookies using dictionary-like assignment or set()
jar.set('username', 'Nate', domain='httpbin.org', path='/')
jar['session_id'] = 'abcdef123'  # No domain given: stored with domain '' and path '/'

print(f"Jar contents: {jar.get_dict()}")

# Get cookies using dictionary-like access or get()
print(f"Username: {jar['username']}")
print(f"Session ID: {jar.get('session_id')}")
print(f"API Key (missing, so the default is returned): {jar.get('api_key', default='NoKey')}")

# Iterate over cookies
print("\nIterating:")
for name, value in jar.items():
    print(f" - {name}: {value}")

# Delete a cookie
del jar['session_id']
print(f"\nJar after deleting session_id: {jar.get_dict()}")
```

**Output:**

```
Jar contents: {'session_id': 'abcdef123', 'username': 'Nate'}
Username: Nate
Session ID: abcdef123
API Key (missing, so the default is returned): NoKey

Iterating:
 - session_id: abcdef123
 - username: Nate

Jar after deleting session_id: {'username': 'Nate'}
```

This makes it easy to manually inspect, add, or modify cookies if needed, although the `Session` usually handles the common cases automatically.

**Important Note:** Cookies often have specific `domain` and `path` attributes. If the jar holds several cookies with the *same name* but different domains or paths (e.g., `user=A` for `site1.com` and `user=B` for `site2.com`), plain dictionary access such as `jar['user']` is ambiguous and raises a `CookieConflictError`. In such cases, pass `domain` (and, if needed, `path`) to `get()` or `set()` for more precision:

```python
# Continues with the `jar` from the previous example
jar.set('pref', 'dark', domain='example.com', path='/')
jar.set('pref', 'compact', domain='test.com', path='/')

# Get the specific cookie for example.com
pref_example = jar.get('pref', domain='example.com', path='/')
print(f"Pref for example.com: {pref_example}")

# Plain access is ambiguous here because two cookies are named 'pref'
# print(jar['pref'])  # Raises CookieConflictError
```

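
A self-contained version of the conflict, assuming `requests` is installed, shows the error and how narrowing by domain resolves it. `CookieConflictError` is defined in `requests.cookies`. The domains and values match the snippet above.

```python
from requests.cookies import CookieConflictError, RequestsCookieJar

jar = RequestsCookieJar()
jar.set("pref", "dark", domain="example.com", path="/")
jar.set("pref", "compact", domain="test.com", path="/")

conflict = False
try:
    jar["pref"]  # two cookies match the name, so there is no single answer
except CookieConflictError as exc:
    conflict = True
    print("Conflict:", exc)

print(jar.get("pref", domain="test.com"))  # compact
```
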
## How It Works Internally

How does the `Session` manage this cookie magic?

1. **Sending Request:** When you call `s.get(...)` or `s.post(...)`, the `Session.prepare_request` method is called.
   * It creates a `PreparedRequest` object.
   * It merges cookies from your request (`cookies=...`) with the session's own jar (`self.cookies`).
   * `PreparedRequest.prepare_cookies` then calls `get_cookie_header(merged_cookies, prepared_request)` (from `requests.cookies`). This function checks the cookie jar for cookies that match the request's domain and path.
   * It generates the `Cookie` header string (e.g., `Cookie: fruit=apple; username=Nate`) and adds it to `PreparedRequest.headers`.
   * The request (with the `Cookie` header) is then sent via a [Transport Adapter](07_transport_adapters.md).

2. **Receiving Response:** When the [Transport Adapter](07_transport_adapters.md) receives the raw HTTP response from the server:
   * It builds the `Response` object.
   * The `Session.send` method (or redirection logic) gets this `Response`.
   * It calls `extract_cookies_to_jar(self.cookies, request, response.raw)` (from `requests.cookies`). This function looks for `Set-Cookie` headers in the raw response.
   * It parses any `Set-Cookie` headers and adds or updates the corresponding cookies in the `Session`'s cookie jar (`self.cookies`).
   * The final `Response` object is returned to you.

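
Step 1's merge can be observed by passing a one-off cookie on a single request. This sketch assumes `requests` is installed, and the cookie names and values are illustrative. The prepared request carries both the session's cookie and the per-request cookie, but the per-request cookie is not written back into `s.cookies`.

```python
import requests

s = requests.Session()
s.cookies.set("fruit", "apple", domain="httpbin.org", path="/")

prepared = s.prepare_request(
    requests.Request("GET", "https://httpbin.org/cookies", cookies={"extra": "1"})
)
sent = set(prepared.headers["Cookie"].split("; "))
print(sorted(sent))            # both cookies go out with this request
print(s.cookies.get_dict())    # {'fruit': 'apple'}: the one-off cookie is not persisted
```
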
Here's a simplified diagram focusing on the cookie flow:

```mermaid
sequenceDiagram
    participant User as Your Code
    participant Sess as Session Object
    participant Jar as Cookie Jar (s.cookies)
    participant Adapter as Transport Adapter
    participant Server as Web Server

    User->>Sess: s.get(url)
    Sess->>Jar: get_cookie_header(url)
    Jar-->>Sess: Return matching cookie header string (e.g., "fruit=apple")
    Sess->>Adapter: send(request with 'Cookie' header)
    Adapter->>Server: Send HTTP Request (with Cookie: fruit=apple)
    Server-->>Adapter: Send HTTP Response (e.g., with Set-Cookie: new=cookie)
    Adapter->>Sess: Return raw response
    Sess->>Jar: extract_cookies_to_jar(raw response)
    Jar->>Jar: Add/Update 'new=cookie'
    Sess->>User: Return Response object
```

You can see parts of this logic in `requests/sessions.py` and `requests/cookies.py`:

```python
# File: requests/sessions.py (Simplified View)

from .cookies import extract_cookies_to_jar, merge_cookies, RequestsCookieJar, cookiejar_from_dict
from .models import PreparedRequest
from .utils import to_key_val_list
from .structures import CaseInsensitiveDict

class Session:
    def __init__(self):
        # ... other attributes ...
        self.cookies = cookiejar_from_dict({})  # The Session's main Cookie Jar

    def prepare_request(self, request):
        # ... merge headers, params, auth ...

        # Merge session cookies with request-specific cookies
        merged_cookies = merge_cookies(
            merge_cookies(RequestsCookieJar(), self.cookies),
            cookiejar_from_dict(request.cookies or {}),
        )

        p = PreparedRequest()
        p.prepare(
            # ... other args ...
            cookies=merged_cookies,  # Pass merged jar to PreparedRequest
        )
        return p

    def send(self, request, **kwargs):
        # ... prepare sending ...
        adapter = self.get_adapter(url=request.url)
        response = adapter.send(request, **kwargs)  # Adapter gets raw response

        # ... hooks ...

        # EXTRACT cookies from the response and put them in the session jar!
        extract_cookies_to_jar(self.cookies, request, response.raw)

        # ... redirect handling (also extracts cookies) ...

        return response

# --- File: requests/models.py (Simplified View) ---
from .compat import cookielib  # cookielib is http.cookiejar on Python 3
from .cookies import get_cookie_header, _copy_cookie_jar, cookiejar_from_dict

class PreparedRequest:
    def prepare_cookies(self, cookies):
        # Store the jar potentially passed from Session.prepare_request
        if isinstance(cookies, cookielib.CookieJar):
            self._cookies = cookies
        else:
            self._cookies = cookiejar_from_dict(cookies)

        # Generate the Cookie header string
        cookie_header = get_cookie_header(self._cookies, self)
        if cookie_header is not None:
            self.headers['Cookie'] = cookie_header

class Response:
    def __init__(self):
        # ... other attributes ...
        # This jar holds cookies SET by *this* response only
        self.cookies = cookiejar_from_dict({})

# --- File: requests/cookies.py (Simplified View) ---
from .compat import cookielib, MutableMapping  # cookielib is http.cookiejar on Python 3

class MockRequest:  # Helper to adapt a requests request for cookielib
    ...  # implementation omitted

class MockResponse:  # Helper to adapt response headers for cookielib
    ...  # implementation omitted

def extract_cookies_to_jar(jar, request, response):
    """Extract Set-Cookie headers from response into jar."""
    if not hasattr(response, '_original_response') or not response._original_response:
        return  # Need the underlying httplib response

    req = MockRequest(request)  # Adapt request for cookielib
    res = MockResponse(response._original_response.msg)  # Adapt headers for cookielib
    jar.extract_cookies(res, req)  # Use cookielib's extraction logic

def get_cookie_header(jar, request):
    """Generate the Cookie header string for the request."""
    r = MockRequest(request)
    jar.add_cookie_header(r)  # Use cookielib to add the header to the mock request
    return r.get_new_headers().get('Cookie')  # Retrieve the generated header

class RequestsCookieJar(cookielib.CookieJar, MutableMapping):
    # Dictionary-like methods (get, set, __getitem__, etc.)
    def get(self, name, default=None, domain=None, path=None):
        # ... find cookie, handle conflicts ...
        pass

    def set(self, name, value, **kwargs):
        # ... create or update cookie ...
        pass

    # ... other dict methods ...
```

The key is that `Session.send` calls `extract_cookies_to_jar` after receiving a response, and `PreparedRequest.prepare_cookies` (called via `Session.prepare_request`) calls `get_cookie_header` before sending the next one.

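
`get_cookie_header` can also be called on its own to see the domain and path matching that `http.cookiejar` performs. This sketch assumes `requests` is installed. The `admin` cookie and the `/admin` path are illustrative. A cookie restricted to `/admin` is only sent for paths under `/admin`, and when several cookies match, the one with the longer (more specific) path comes first in the header.

```python
import requests
from requests.cookies import RequestsCookieJar, get_cookie_header

jar = RequestsCookieJar()
jar.set("fruit", "apple", domain="httpbin.org", path="/")
jar.set("admin", "1", domain="httpbin.org", path="/admin")  # path-restricted cookie

# .prepare() builds a PreparedRequest without sending it.
plain = requests.Request("GET", "https://httpbin.org/cookies").prepare()
admin = requests.Request("GET", "https://httpbin.org/admin/panel").prepare()

print(get_cookie_header(jar, plain))  # fruit=apple
print(get_cookie_header(jar, admin))  # admin=1; fruit=apple
```
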
## Conclusion

You've learned about the **Cookie Jar** (`RequestsCookieJar`), the mechanism `requests` (especially `Session` objects) uses to store and manage cookies. You saw:

* How `Session` objects automatically use their cookie jar (`s.cookies`) to persist cookies across requests.
* How `response.cookies` contains cookies set by a specific response.
* How to interact with a `RequestsCookieJar` using its dictionary-like interface.
* A glimpse into how `requests` extracts cookies from `Set-Cookie` headers and adds them back via the `Cookie` header.

Understanding the cookie jar helps explain how sessions maintain state and interact with websites that require logins or remember preferences.

Speaking of logging in, while cookies are often involved, sometimes websites require more explicit forms of identification, like usernames and passwords sent directly with the request. How does `requests` handle those?

**Next:** [Chapter 5: Authentication Handlers](05_authentication_handlers.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

421
docs/Requests/05_authentication_handlers.md
Normal file
@@ -0,0 +1,421 @@

# Chapter 5: Authentication Handlers - Showing Your ID Card

In [Chapter 4: The Cookie Jar](04_cookie_jar.md), we learned how `requests` uses `Session` objects and cookie jars to automatically remember things like login cookies. This is great for websites that use cookies to manage sessions after you log in.

But what about websites or APIs that require you to prove who you are *every time* you make a request, or that rely on something other than cookies? For example, some services need a username and password sent directly with the request, not just a cookie.

## The Problem: Accessing Protected Resources

Imagine a website has a special members-only area. To access pages in this area, the server needs to know you're a valid member *right when you ask for the page*. It won't just let anyone in. It needs some form of identification, like a username and password.

How do we tell `requests` to include this identification with our request?

This is where **Authentication Handlers** come in.

## What are Authentication Handlers?

Think of authentication handlers as different types of **ID badges** you can attach to your web requests. Just like you might need a specific badge to get into different parts of a building, different web services might require different types of authentication.

`Requests` has built-in support for common types (schemes) of HTTP authentication, and you can even create your own custom badges.

**Common ID Badges (Authentication Schemes):**

1. **HTTP Basic Auth:** This is the simplest type. It's like a badge with your username and password written directly on it. The credentials are only Base64-encoded, which anyone can reverse; they are not encrypted. Anyone who can read the traffic can recover the password, so use Basic Auth only over HTTPS.
    * `Requests` provides: A simple `(username, password)` tuple or the `HTTPBasicAuth` class.
2. **HTTP Digest Auth:** This is a bit more secure than Basic. Instead of sending your password, it uses a challenge-response process: the server first sends a one-time value (a *nonce*), and your request answers with a hash computed from your username, password, the nonce, and details of the request. The server can check that answer without the password ever crossing the network. It's more complex, but it avoids sending the password openly.
    * `Requests` provides: The `HTTPDigestAuth` class.
3. **Custom Auth:** Some services use their own authentication methods (like OAuth1, OAuth2, or custom API keys).
    * `Requests` allows you to create your own auth handlers by subclassing `AuthBase`. Many third-party libraries provide ready-made handlers for common schemes like OAuth.

When you provide authentication details to `requests`, it automatically figures out how to create and attach the correct `Authorization` header (or `Proxy-Authorization` for proxies) to your request. It's like pinning the right ID badge onto your request before sending it off.

## Using Authentication Handlers

The easiest way to add authentication is by using the `auth` parameter when making a request, either with the functional API or with a [Session](03_session.md) object.

### HTTP Basic Auth (The Easiest Way)

For Basic Auth, you can simply pass a tuple `(username, password)` to the `auth` argument.

Let's try accessing a test endpoint from `httpbin.org` that's protected with Basic Auth. The username is `testuser` and the password is `testpass`.

```python
import requests

# This URL requires Basic Auth with user='testuser', pass='testpass'
url = 'https://httpbin.org/basic-auth/testuser/testpass'

# Try without authentication first (should fail with 401 Unauthorized)
print("Attempting without authentication...")
response_fail = requests.get(url)
print(f"Status Code (fail): {response_fail.status_code}")  # Expect 401

# Now, provide the username and password tuple to the 'auth' parameter
print("\nAttempting with Basic Auth tuple...")
try:
    response_ok = requests.get(url, auth=('testuser', 'testpass'))
    print(f"Status Code (ok): {response_ok.status_code}")  # Expect 200
    # Check the response content (httpbin echoes auth info)
    print("Response JSON:")
    print(response_ok.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

**Output:**

```
Attempting without authentication...
Status Code (fail): 401

Attempting with Basic Auth tuple...
Status Code (ok): 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
```

**Explanation:**

1. The first request failed with `401 Unauthorized` because we didn't provide credentials.
2. In the second request, we added `auth=('testuser', 'testpass')`.
3. `Requests` automatically recognized this tuple, created the necessary `Authorization: Basic dGVzdHVzZXI6dGVzdHBhc3M=` header (where `dGVzdHVzZXI6dGVzdHBhc3M=` is the Base64 encoding of `testuser:testpass`), and added it to the request.
4. The server validated the credentials and granted access, returning a `200 OK` status. The response body confirms we were authenticated as `testuser`.

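Because Basic Auth only Base64-encodes the credentials, the header from step 3 can be rebuilt, and reversed, with nothing but the standard library. This is an illustrative sketch: `make_basic_auth_header` and `decode_basic_auth_header` are hypothetical helpers, not part of `requests`, and they assume UTF-8 credentials (`requests` itself encodes `str` credentials as latin-1, which produces the same bytes for plain ASCII names like these).

```python
import base64
from typing import Tuple


def make_basic_auth_header(username: str, password: str) -> str:
    """Build a Basic Authorization header value (hypothetical helper)."""
    # Basic Auth is just "username:password", Base64-encoded
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic header (hypothetical helper)."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first ':' only; passwords may themselves contain ':'
    username, _, password = decoded.partition(":")
    return username, password


header = make_basic_auth_header("testuser", "testpass")
print(header)                            # Basic dGVzdHVzZXI6dGVzdHBhc3M=
print(decode_basic_auth_header(header))  # ('testuser', 'testpass')
```

Decoding takes a single function call, which is why Basic Auth should only travel over HTTPS.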
### Using the `HTTPBasicAuth` Class

Passing a tuple is a shortcut specifically for Basic Auth. For clarity, or if you want to reuse the authentication details, you can use the `HTTPBasicAuth` class explicitly. It does exactly the same thing internally.

```python
import requests
from requests.auth import HTTPBasicAuth  # Import the class

url = 'https://httpbin.org/basic-auth/testuser/testpass'

# Create an HTTPBasicAuth object
basic_auth = HTTPBasicAuth('testuser', 'testpass')

# Pass the auth object to the 'auth' parameter
print("Attempting with HTTPBasicAuth object...")
try:
    response = requests.get(url, auth=basic_auth)
    print(f"Status Code: {response.status_code}")  # Expect 200
    print("Response JSON:")
    print(response.json())
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

**Output:**

```
Attempting with HTTPBasicAuth object...
Status Code: 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
```

This achieves the same result as the tuple, but `HTTPBasicAuth(user, pass)` is more explicit about the type of authentication being used.

### HTTP Digest Auth

Digest Auth is more complex, involving a challenge from the server. `Requests` handles this complexity for you with the `HTTPDigestAuth` class. You use it similarly to `HTTPBasicAuth`.

```python
import requests
from requests.auth import HTTPDigestAuth  # Import the class

# httpbin has a digest auth endpoint
# user='testuser', pass='testpass'
url = 'https://httpbin.org/digest-auth/auth/testuser/testpass'

# Create an HTTPDigestAuth object
digest_auth = HTTPDigestAuth('testuser', 'testpass')

# Pass the auth object to the 'auth' parameter
print("Attempting with HTTPDigestAuth object...")
try:
    response = requests.get(url, auth=digest_auth)
    print(f"Status Code: {response.status_code}")  # Expect 200
    print("Response JSON:")
    print(response.json())
    # Note: It might take two requests internally for Digest Auth
    print(f"Request History (if any): {response.history}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

**Output:**

```
Attempting with HTTPDigestAuth object...
Status Code: 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
Request History (if any): [<Response [401]>]
```

**Explanation:**

1. We used `HTTPDigestAuth` this time.
2. When `requests` first tries to access the URL, the server challenges it with a `401 Unauthorized` response whose `WWW-Authenticate` header carries the details needed for Digest Auth (such as a `nonce` and a `realm`). You can see this `401` response in `response.history`.
3. `HTTPDigestAuth` registered a response hook on the request, so it sees this `401`, combines the challenge information with your username and password to calculate the correct answer, and automatically sends a *second* request with the proper `Authorization: Digest ...` header.
4. This second request succeeds, and you get the final `200 OK` response.

`Requests` handles the two-step process automatically when you use `HTTPDigestAuth`.

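The "calculate the correct answer" part of step 3 can be illustrated with the MD5 / `qop=auth` variant of Digest described in RFC 2617. This is a standalone sketch, not `requests`' own code, and `md5_hex` and `digest_response` are hypothetical names; `HTTPDigestAuth` does the equivalent work in its `build_digest_header` method and also handles cases this sketch ignores, such as other hash algorithms and the `opaque` field.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the RFC 2617 'response' value for MD5 with qop=auth."""
    # HA1 depends on the password, but the password itself is never sent
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    # HA2 ties the answer to this particular method and URI
    ha2 = md5_hex(f"{method}:{uri}")
    # nonce (from the server), nc (request counter) and cnonce (from the client)
    # make each answer specific to one challenge
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Inputs taken from the worked example in RFC 2617, section 3.5
response = digest_response(
    username="Mufasa",
    password="Circle Of Life",
    realm="testrealm@host.com",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

The server, which knows (or stores a hash of) the password, repeats the same calculation and compares the two values. A different nonce yields a completely different answer.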
### Persistent Authentication with Sessions

If you need to make multiple requests to the same server with the same credentials, set the authentication once on a [Session](03_session.md) object. The session then applies it automatically to *all* requests made through it, so you don't have to repeat `auth=...` on every call.

```python
import json

import requests
from requests.auth import HTTPBasicAuth

basic_auth_url = 'https://httpbin.org/basic-auth/testuser/testpass'
headers_url = 'https://httpbin.org/headers'  # Just to see headers sent

# Create a session
with requests.Session() as s:
    # Set the authentication ONCE on the session
    s.auth = HTTPBasicAuth('testuser', 'testpass')
    # Or: s.auth = ('testuser', 'testpass')

    # Make the first request (auth will be added automatically)
    print("Making first request using session auth...")
    response1 = s.get(basic_auth_url)
    print(f"Status Code 1: {response1.status_code}")

    # Make a second request to a different endpoint (auth will also be added)
    # We use /headers to see the Authorization header being sent
    print("\nMaking second request using session auth...")
    response2 = s.get(headers_url)
    print(f"Status Code 2: {response2.status_code}")
    print("Headers sent in second request:")
    # Look for the 'Authorization' header in the output
    print(json.dumps(response2.json()['headers'], indent=2))
```

**Output:**

```
Making first request using session auth...
Status Code 1: 200

Making second request using session auth...
Status Code 2: 200
Headers sent in second request:
{
  "Accept": "*/*",
  "Accept-Encoding": "gzip, deflate",
  "Authorization": "Basic dGVzdHVzZXI6dGVzdHBhc3M=",
  "Host": "httpbin.org",
  "User-Agent": "python-requests/2.x.y",
  "X-Amzn-Trace-Id": "Root=..."
}
```

By setting `s.auth = ...`, we ensured that *both* requests sent the `Authorization` header (you can see it in the headers echoed back by `/headers`) without needing to specify it in each `s.get()` call.

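You can also confirm that the session attaches the header without sending anything over the network. `Session.prepare_request()` merges the session's settings, including `s.auth`, into a `PreparedRequest` the same way `s.get()` does before sending. This sketch assumes `requests` is installed; the URL is never contacted, and the `other`/`secret` credentials are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

with requests.Session() as s:
    s.auth = HTTPBasicAuth("testuser", "testpass")

    # Build (but do not send) a request, as s.get() would prepare it
    prepared = s.prepare_request(requests.Request("GET", "https://httpbin.org/headers"))
    print(prepared.headers["Authorization"])  # Basic dGVzdHVzZXI6dGVzdHBhc3M=

    # A per-request auth argument takes precedence over the session-level auth
    override = s.prepare_request(
        requests.Request("GET", "https://httpbin.org/headers", auth=("other", "secret"))
    )
    print(override.headers["Authorization"])
```

Because the per-request value wins, you can keep a default on the session and still override it for the occasional call that needs different credentials.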
### Custom Authentication

What if a service uses a completely different way to authenticate? `Requests` allows you to create your own authentication handler by writing a class that inherits from `requests.auth.AuthBase` and implements the `__call__` method. This method receives the `PreparedRequest` object, modifies it as needed (usually by adding headers), and must return it.

```python
from requests.auth import AuthBase


class MyCustomApiKeyAuth(AuthBase):
    """Attaches a custom API Key header to the request."""

    def __init__(self, api_key):
        self.api_key = api_key

    def __call__(self, r):
        # 'r' is the PreparedRequest object
        # Modify the request 'r' here. We'll add a header.
        r.headers['X-API-Key'] = self.api_key
        # We MUST return the modified request object
        return r


# Usage:
# api_key = "YOUR_SECRET_API_KEY"
# response = requests.get(some_url, auth=MyCustomApiKeyAuth(api_key))
```

This is more advanced, but it shows the flexibility of the `requests` auth system. Many third-party libraries use this pattern to provide auth helpers for specific services (like OAuth).

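As a further usage example, here is a second custom handler: a hypothetical `BearerTokenAuth` that sends a bearer token. The class name, the token, and the `api.example.com` URL are made up for illustration; real services document their own header names and token formats. Calling `.prepare()` on a `Request` runs the handler without any network traffic, so you can inspect what it attached. Assumes `requests` is installed.

```python
import requests
from requests.auth import AuthBase


class BearerTokenAuth(AuthBase):
    """Hypothetical handler that sends 'Authorization: Bearer <token>'."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest; add the header and hand it back
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r


# Preparing the request runs the auth handler; nothing is sent
req = requests.Request("GET", "https://api.example.com/items", auth=BearerTokenAuth("abc123"))
prepared = req.prepare()
print(prepared.headers["Authorization"])  # Bearer abc123
```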
## How It Works Internally

How does `requests` take the `auth` parameter and turn it into the correct `Authorization` header?

1. **Preparation Step:** When you make a request (e.g., `requests.get(url, auth=...)` or `s.request(...)`), the `Request` object is turned into a `PreparedRequest`, as we saw in [Chapter 2: Request & Response Models](02_request___response_models.md). When a `Session` prepares the request, it first merges any per-request `auth` with the session's `auth` (the per-request value wins). Part of this preparation is the `prepare_auth` method.
2. **Check Auth Type:** Inside `prepare_auth`, `requests` checks the `auth` parameter.
    * If `auth` is a tuple `(user, pass)`, it automatically wraps it in an `HTTPBasicAuth(user, pass)` object.
    * If `auth` is already an object (like `HTTPBasicAuth`, `HTTPDigestAuth`, or a custom one inheriting from `AuthBase`), it uses that object directly.
3. **Call the Auth Object:** All authentication handler objects (including the built-in ones) are **callable**, meaning they have a `__call__` method. `prepare_auth` *calls* the auth object, passing it the `PreparedRequest` object (`p`): `auth(p)`.
4. **Modify the Request:** The `__call__` method of the auth object does the actual work.
    * For `HTTPBasicAuth`, `__call__` calculates the `Basic base64(user:pass)` string and sets `p.headers['Authorization'] = ...`.
    * For `HTTPDigestAuth`, `__call__` always registers a response hook (`handle_401`) to answer the server's `401` challenge. If it already holds a `nonce` from an earlier challenge, it also calculates the `Digest ...` header and sets `p.headers['Authorization']` right away.
    * For a custom auth object, its `__call__` method performs whatever modifications are needed (e.g., adding an `X-API-Key` header).
5. **Return Modified Request:** The `__call__` method *must* return the modified `PreparedRequest` object.
6. **Send Request:** The `PreparedRequest`, now potentially including an `Authorization` header, is sent to the server.

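The steps above can be mimicked in a few lines of plain Python. This is a simplified, standalone sketch rather than `requests`' code: `BasicAuthSketch` and `PreparedRequestSketch` are hypothetical stand-ins that keep only what the auth flow touches. It also shows that, as in `requests`, any callable that takes and returns the request (even a plain function such as the hypothetical `add_api_key`) can serve as an auth handler.

```python
import base64


class BasicAuthSketch:
    """Hypothetical stand-in for HTTPBasicAuth: a callable that edits the request."""

    def __init__(self, username, password):
        self.username = username
        self.password = password

    def __call__(self, request):
        # Step 4: compute "Basic base64(user:pass)" and set the header
        raw = f"{self.username}:{self.password}".encode("latin1")
        request.headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
        # Step 5: hand the (modified) request back
        return request


class PreparedRequestSketch:
    """Hypothetical stand-in for PreparedRequest, holding only url and headers."""

    def __init__(self, url):
        self.url = url
        self.headers = {}

    def prepare_auth(self, auth):
        if not auth:
            return
        # Step 2: a (user, pass) tuple is shorthand for Basic Auth
        if isinstance(auth, tuple) and len(auth) == 2:
            auth = BasicAuthSketch(*auth)
        # Step 3: call the handler with this request object
        r = auth(self)
        # Adopt whatever changes the handler made
        self.__dict__.update(r.__dict__)


def add_api_key(request):
    """Any callable works as a handler, not only AuthBase subclasses."""
    request.headers["X-API-Key"] = "secret"
    return request


p = PreparedRequestSketch("https://httpbin.org/basic-auth/testuser/testpass")
p.prepare_auth(("testuser", "testpass"))
print(p.headers)  # {'Authorization': 'Basic dGVzdHVzZXI6dGVzdHBhc3M='}

q = PreparedRequestSketch("https://api.example.com/items")
q.prepare_auth(add_api_key)
print(q.headers)  # {'X-API-Key': 'secret'}
```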
Here's a simplified sequence diagram for Basic Auth:

```mermaid
sequenceDiagram
    participant UserCode as Your Code
    participant ReqFunc as requests.get / Session.request
    participant PrepReq as PreparedRequest
    participant AuthObj as HTTPBasicAuth Instance
    participant Server

    UserCode->>ReqFunc: Call get(url, auth=('user', 'pass'))
    ReqFunc->>PrepReq: Create PreparedRequest (p)
    ReqFunc->>PrepReq: Call p.prepare_auth(auth=...)
    Note over PrepReq: Detects tuple, creates HTTPBasicAuth('user', 'pass')
    PrepReq->>AuthObj: Call auth_obj(p)
    activate AuthObj
    AuthObj->>AuthObj: Calculate 'Basic ...' string
    AuthObj->>PrepReq: Set p.headers['Authorization'] = 'Basic ...'
    AuthObj-->>PrepReq: Return modified p
    deactivate AuthObj
    PrepReq-->>ReqFunc: Return prepared request p
    ReqFunc->>Server: Send HTTP Request (with Authorization header)
    Server-->>ReqFunc: Send HTTP Response
    ReqFunc-->>UserCode: Return Response
```

Let's look at the simplified code in `requests/auth.py` for `HTTPBasicAuth`:

```python
# File: requests/auth.py (Simplified)

import threading
from base64 import b64encode

from ._internal_utils import to_native_string


def _basic_auth_str(username, password):
    """Returns a Basic Auth string."""
    # Credentials given as str are encoded to bytes (requests uses latin1)
    if isinstance(username, str):
        username = username.encode("latin1")
    if isinstance(password, str):
        password = password.encode("latin1")

    auth_b64 = b64encode(b":".join((username, password))).strip()
    # Return native string (str in Py3) e.g., "Basic dXNlcjpwYXNz"
    return "Basic " + to_native_string(auth_b64)


class AuthBase:
    """Base class that all auth implementations derive from"""

    def __call__(self, r):
        # This method MUST be overridden by subclasses
        raise NotImplementedError("Auth hooks must be callable.")


class HTTPBasicAuth(AuthBase):
    """Attaches HTTP Basic Authentication to the given Request object."""

    def __init__(self, username, password):
        self.username = username
        self.password = password

    def __call__(self, r):
        # 'r' is the PreparedRequest object passed in by requests
        # Calculate the Basic auth string
        auth_header_value = _basic_auth_str(self.username, self.password)
        # Modify the request's headers
        r.headers['Authorization'] = auth_header_value
        # Return the modified request
        return r


class HTTPProxyAuth(HTTPBasicAuth):
    """Attaches HTTP Proxy Authentication to a given Request object."""

    def __call__(self, r):
        # Same as Basic Auth, but sets the Proxy-Authorization header
        r.headers['Proxy-Authorization'] = _basic_auth_str(self.username, self.password)
        return r


# HTTPDigestAuth is more complex, involving state and hooks for the 401 challenge
class HTTPDigestAuth(AuthBase):
    def __init__(self, username, password):
        self.username = username
        self.password = password
        # Challenge state (last_nonce, nonce_count, chal, ...) is kept per thread
        self._thread_local = threading.local()

    def init_per_thread_state(self):
        # Create this thread's challenge state on first use
        if not hasattr(self._thread_local, "init"):
            self._thread_local.init = True
            self._thread_local.last_nonce = ""
            self._thread_local.nonce_count = 0
            self._thread_local.chal = {}

    def build_digest_header(self, method, url):
        # ... complex calculation based on nonce, realm, qop, etc. ...
        return "Digest ..."  # Calculated digest header

    def handle_401(self, r, **kwargs):
        # Hook called when a response is received
        # 1. Parse challenge ('WWW-Authenticate' header)
        # 2. Store nonce, realm etc.
        # 3. Prepare a *new* request with the calculated digest header
        # 4. Send the new request
        # 5. Return the response to the *new* request
        return r  # Simplified: returning r keeps the original response

    def __call__(self, r):
        # 'r' is the PreparedRequest
        self.init_per_thread_state()
        # If we already have a nonce, add the Authorization header directly
        if self._thread_local.last_nonce:
            r.headers['Authorization'] = self.build_digest_header(r.method, r.url)
        # Register the handle_401 hook to handle the server challenge if needed
        r.register_hook('response', self.handle_401)
        return r
```

And in `requests/models.py`, the `PreparedRequest` calls the auth object:

```python
# File: requests/models.py (Simplified View)

from .auth import HTTPBasicAuth
from .utils import get_auth_from_url


class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
    # ... (other prepare methods like prepare_url, prepare_headers) ...

    def prepare_auth(self, auth, url=""):
        """Prepares the given HTTP auth data."""

        # If no Auth provided, maybe get it from the URL (e.g., http://user:pass@host)
        if auth is None:
            url_auth = get_auth_from_url(self.url)
            auth = url_auth if any(url_auth) else None

        if auth:
            # If auth is a ('user', 'pass') tuple, wrap it in HTTPBasicAuth
            if isinstance(auth, tuple) and len(auth) == 2:
                auth = HTTPBasicAuth(*auth)

            # --- The Core Step ---
            # Call the auth object (which must be callable, like AuthBase subclasses)
            # Pass 'self' (the PreparedRequest instance) to the auth object's __call__
            r = auth(self)

            # Update self to reflect any changes made by the auth object
            # (Auth objects typically just modify headers, but could do more)
            self.__dict__.update(r.__dict__)

            # Recompute Content-Length in case auth modified the body (unlikely for Basic/Digest)
            self.prepare_content_length(self.body)

    # ... (rest of PreparedRequest) ...
```

The key is the `r = auth(self)` line, where the `PreparedRequest` delegates the task of adding authentication details to the specific authentication handler object provided.

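The `get_auth_from_url` fallback at the top of `prepare_auth` can be observed without sending anything, because `Request.prepare()` runs `prepare_auth` while building the `PreparedRequest`. This sketch assumes `requests` is installed; `user:pass` are placeholder credentials and the URL is never contacted.

```python
import requests

# No auth= argument: the credentials are embedded in the URL instead
from_url = requests.Request("GET", "https://user:pass@httpbin.org/get").prepare()
print(from_url.headers["Authorization"])  # Basic dXNlcjpwYXNz

# An explicit auth argument is not None, so the URL credentials are not consulted
explicit = requests.Request(
    "GET", "https://user:pass@httpbin.org/get", auth=("testuser", "testpass")
).prepare()
print(explicit.headers["Authorization"])  # Basic dGVzdHVzZXI6dGVzdHBhc3M=
```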
## Conclusion

You've learned how `requests` handles HTTP authentication using **Authentication Handlers**.

* You saw that authentication is like providing an **ID badge** with your request.
* You learned about common schemes like **Basic Auth** (using a simple `(user, pass)` tuple or `HTTPBasicAuth`) and **Digest Auth** (`HTTPDigestAuth`).
* You know how to apply authentication to a single request via the `auth` parameter, or to every request made through a [Session](03_session.md) via `Session.auth`.
* You understand that internally, `requests` calls the provided auth object, which modifies the `PreparedRequest` (usually by adding an `Authorization` header) before sending it.
* You got a glimpse of how custom authentication can be built using `AuthBase`.

Authentication is crucial for accessing protected resources. But what happens when things go wrong? A server might be down, a URL might be invalid, or authentication might fail. How does `requests` tell you about these problems?

**Next:** [Chapter 6: Exception Hierarchy](06_exception_hierarchy.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

372
docs/Requests/06_exception_hierarchy.md
Normal file
@@ -0,0 +1,372 @@

# Chapter 6: When Things Go Wrong - The Exception Hierarchy

In [Chapter 5: Authentication Handlers](05_authentication_handlers.md), we learned how to prove our identity to websites that require login or API keys. We assumed our requests would work if we provided the correct credentials.

But what happens when things *don't* go as planned? The internet isn't always reliable. Websites go down, networks have hiccups, URLs might be typed incorrectly, or servers might just be having a bad day. How does `requests` tell us about these problems, and how can we handle them gracefully in our code?

## The Problem: Dealing with Request Failures

Imagine you're building a script to check the weather using an online weather API. You use `requests.get()` to fetch the weather data. What could go wrong?

* Your internet connection might be down.
* The weather API website might be temporarily offline.
* You might have mistyped the URL.
* The website might take too long to respond (a timeout).
* The website might respond, but with an error message (like "404 Not Found" or "500 Server Error").

Most of these make `requests` raise an error; a bad status code only does so when you ask it to (see `HTTPError` below). If you don't prepare for these errors, your script might crash! We need a way to:

1. **Detect** that an error occurred.
2. **Understand** *what kind* of error it was (network issue? timeout? bad URL?).
3. **React** appropriately (e.g., print a helpful message, try again later, use a default value).

## The Solution: A Family Tree of Errors

`Requests` helps us by using a system of specific error types called **exceptions**. When something goes wrong, `requests` doesn't just give up silently; it **raises an exception**.

Think of it like a doctor diagnosing an illness. A doctor doesn't just say "You're sick." They give a specific diagnosis: "You have the flu," or "You have a broken arm," or "You have allergies." Each diagnosis tells you something specific about the problem and how to treat it.

`Requests` does something similar with its exceptions. It has a main, general exception called `requests.exceptions.RequestException`. All other specific `requests` errors are "children" or "descendants" of this main one, forming an **Exception Hierarchy** (like a family tree).

**Analogy:** The "Sickness" Family Tree 🌳

* **`RequestException` (The Grandparent):** This is the most general category, like saying "Sickness." If you catch this, you catch *any* problem related to `requests`.
* **`ConnectionError`, `Timeout`, `HTTPError`, `URLRequired` (The Parents):** These are more specific categories under `RequestException`.
    * `ConnectionError` is like saying "Infection."
    * `Timeout` is like saying "Fatigue."
    * `HTTPError` is like saying "External Injury."
    * `URLRequired` is like saying "Genetic Condition" (problem with the input itself).
* **`ConnectTimeout`, `ReadTimeout` (The Children):** These are even *more* specific.
    * `ConnectTimeout` (child of `Timeout`) is like "Trouble Falling Asleep." Unlike in the analogy, a real exception class can have two parents: `ConnectTimeout` is also a child of `ConnectionError`, since failing to connect in time is a connection problem too.
    * `ReadTimeout` (child of `Timeout`) is like "Waking Up Too Early." Both are types of "Fatigue" (`Timeout`).

This hierarchy allows you to decide how specific you want to be when handling errors.

## Key Members of the Exception Family

All `requests` exceptions live inside the `requests.exceptions` module. You usually import the main `requests` library and access them like `requests.exceptions.ConnectionError`.

Here are some of the most common ones you'll encounter:

* **`requests.exceptions.RequestException`**: The base exception. Catching this catches *all* exceptions listed below.
* **`requests.exceptions.ConnectionError`**: Problems connecting to the server. This could be due to:
    * DNS failure (can't find the server's address).
    * Refused connection (server is there but not accepting connections).
    * Network is unreachable.
* **`requests.exceptions.Timeout`**: The request took too long. This is a parent category for:
    * **`requests.exceptions.ConnectTimeout`**: Timeout occurred *while trying to establish the connection*. It is also a subclass of `ConnectionError`, so an `except ConnectionError` block catches it too.
    * **`requests.exceptions.ReadTimeout`**: Timeout occurred *after connecting*, while waiting for the server to send data.
* **`requests.exceptions.HTTPError`**: Raised when the server returns a "bad" status code (4xx for client errors like "404 Not Found", or 5xx for server errors like "500 Internal Server Error"). **Important:** `requests` does *not* automatically raise this just because the status code is bad. You typically need to call the `response.raise_for_status()` method to trigger it.
* **`requests.exceptions.TooManyRedirects`**: The request exceeded the maximum number of allowed redirects (30 by default, controlled by `Session.max_redirects`).
* **`requests.exceptions.URLRequired`**: You tried to make a request without providing a URL.
* **`requests.exceptions.MissingSchema`**: The URL was missing the scheme (like `http://` or `https://`).
* **`requests.exceptions.InvalidURL`**: The URL was malformed in some way.
* **`requests.exceptions.InvalidSchema`**: The URL scheme is not supported because no transport adapter is mounted for it (e.g., `ftp://` with a default session).

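You don't have to take the family tree on faith: Python can show it to you. This short sketch assumes `requests` is installed and only inspects the exception classes, so it makes no network calls. The parent lists it prints come from whatever version of `requests` is installed; for example, current releases also make `MissingSchema` a subclass of Python's built-in `ValueError`.

```python
import requests.exceptions as rexc

# Print each exception's direct parent classes
for exc in (rexc.ConnectTimeout, rexc.ReadTimeout, rexc.HTTPError, rexc.MissingSchema):
    parents = ", ".join(base.__name__ for base in exc.__bases__)
    print(f"{exc.__name__} <- {parents}")

# issubclass() answers the question "would `except Parent` catch this?"
print(issubclass(rexc.ConnectTimeout, rexc.ConnectionError))  # True
print(issubclass(rexc.ReadTimeout, rexc.ConnectionError))     # False
```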
## Handling Exceptions: The `try...except` Block

How do we use this hierarchy in our code? We use Python's `try...except` block.

1. Put the code that *might* cause an error (like `requests.get()`) inside the `try:` block.
2. Follow it with one or more `except:` blocks. Each `except:` block specifies the type of exception it's designed to catch.

**Example 1: Catching Any `requests` Error**

Let's try fetching a URL whose domain doesn't exist and catch the most general exception.

```python
import requests

# A URL that might cause a connection error (e.g., non-existent domain)
bad_url = 'https://this-domain-probably-does-not-exist-asdfghjkl.com'
good_url = 'https://httpbin.org/get'

url_to_try = bad_url  # Change to good_url to see success case

print(f"Trying to fetch: {url_to_try}")

try:
    response = requests.get(url_to_try, timeout=5)  # Add timeout
    response.raise_for_status()  # Check for 4xx/5xx errors
    print("Success! Status Code:", response.status_code)
    # Process the response... (e.g., print response.text)

except requests.exceptions.RequestException as e:
    # This will catch ANY error originating from requests
    print("\nOh no! A requests-related error occurred:")
    print(f"Error Type: {type(e).__name__}")
    print(f"Error Details: {e}")

print("\nScript continues after handling the error.")
```

**Possible Output (if `url_to_try = bad_url`):**

```
Trying to fetch: https://this-domain-probably-does-not-exist-asdfghjkl.com

Oh no! A requests-related error occurred:
Error Type: ConnectionError
Error Details: HTTPSConnectionPool(host='this-domain-probably-does-not-exist-asdfghjkl.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x...>: Failed to resolve 'this-domain-probably-does-not-exist-asdfghjkl.com' ([Errno ...)"))

Script continues after handling the error.
```

**Explanation:**

* We put `requests.get()` and `response.raise_for_status()` inside the `try` block.
* If `requests.get()` fails (e.g., due to `ConnectionError` or `Timeout`), or if `raise_for_status()` detects a 4xx/5xx code (`HTTPError`), an exception is raised.
* The `except requests.exceptions.RequestException as e:` block catches it because `ConnectionError`, `Timeout`, and `HTTPError` are all descendants of `RequestException`.
* We print a helpful message and the details of the error (`e`). Crucially, the script *doesn't crash*.

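To see that `HTTPError` comes from `raise_for_status()` rather than from `requests.get()` itself, you can exercise it offline. Building a `Response` by hand, as the hypothetical `check` helper below does, is only a testing trick: it sets the attributes that a real request would normally fill in. Assumes `requests` is installed.

```python
import requests


def check(status_code, reason, url="https://httpbin.org/status/404"):
    """Build a Response by hand (no network) and run raise_for_status() on it."""
    resp = requests.Response()
    resp.status_code = status_code
    resp.reason = reason
    resp.url = url
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # e.response refers back to the failing Response object
        return str(e)
    return "no exception"


print(check(404, "NOT FOUND"))  # 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
print(check(200, "OK"))         # no exception
```

A 2xx or 3xx status passes silently, which is why the examples call `raise_for_status()` explicitly.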
**Example 2: Catching Specific Errors**

Sometimes, you want to react differently based on the *type* of error. Was it a temporary network glitch, or did the server permanently remove the page?

```python
import requests

# URL that gives a 404 error
not_found_url = 'https://httpbin.org/status/404'
# URL that is slow and might time out
timeout_url = 'https://httpbin.org/delay/5'  # Delays response by 5 seconds

url_to_try = timeout_url  # Change to not_found_url to see HTTPError

print(f"Trying to fetch: {url_to_try}")

try:
    # Set a short timeout to demonstrate Timeout exception
    response = requests.get(url_to_try, timeout=2)
    response.raise_for_status()  # Check for 4xx/5xx status codes
    print("Success! Status Code:", response.status_code)
    # Process response...

except requests.exceptions.ConnectTimeout as e:
    print("\nError: Could not connect to the server in time.")
    print(f"Details: {e}")
    # Maybe retry later?

except requests.exceptions.ReadTimeout as e:
    print("\nError: Server took too long to send data.")
    print(f"Details: {e}")
    # Maybe the server is slow, could try again?

except requests.exceptions.ConnectionError as e:
    print("\nError: Network problem (e.g., DNS error, refused connection).")
    print(f"Details: {e}")
    # Check internet connection?

except requests.exceptions.HTTPError as e:
    print("\nError: Bad HTTP status code received from server.")
    print(f"Status Code: {e.response.status_code}")
    print(f"Details: {e}")
    # Was it a 404 Not Found? 500 Server Error?

except requests.exceptions.RequestException as e:
    # Catch any other requests error that wasn't specifically handled above
    print("\nAn unexpected requests error occurred:")
    print(f"Error Type: {type(e).__name__}")
    print(f"Details: {e}")

print("\nScript continues...")
```

**Possible Output (if `url_to_try = timeout_url`):**

```
Trying to fetch: https://httpbin.org/delay/5

Error: Server took too long to send data.
Details: HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=2)

Script continues...
```

**Possible Output (if `url_to_try = not_found_url`):**

```
Trying to fetch: https://httpbin.org/status/404

Error: Bad HTTP status code received from server.
Status Code: 404
Details: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

Script continues...
```

**Explanation:**
|
||||
|
||||
* We have multiple `except` blocks, ordered from most specific (`ConnectTimeout`, `ReadTimeout`) to more general (`ConnectionError`, `HTTPError`) and finally the catch-all `RequestException`.
|
||||
* Python tries the `except` blocks in order. When an exception occurs, the *first* matching block is executed.
|
||||
* If a `ReadTimeout` occurs, the `except requests.exceptions.ReadTimeout` block handles it. It won't fall through to the `except requests.exceptions.ConnectionError` or `except requests.exceptions.RequestException` blocks, even though `ReadTimeout` *is* a type of `RequestException`.
|
||||
* This allows us to provide specific feedback or recovery logic for different error scenarios.
|
||||
|
||||
**Inheritance Benefit:** If you write `except requests.exceptions.Timeout as e:`, this block will catch *both* `ConnectTimeout` and `ReadTimeout` because they inherit from `Timeout`.
|
||||
|
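You can check the effect of clause ordering without touching the network. The sketch below raises `requests` exception instances directly instead of making real requests. The helper names `connection_first` and `timeout_first` are made up for illustration.

```python
import requests


def connection_first(exc):
    """Raise exc and report which clause catches it when ConnectionError is listed first."""
    try:
        raise exc
    except requests.exceptions.ConnectionError:
        return "ConnectionError clause"
    except requests.exceptions.Timeout:
        return "Timeout clause"


def timeout_first(exc):
    """Same as above, but with the Timeout clause listed first."""
    try:
        raise exc
    except requests.exceptions.Timeout:
        return "Timeout clause"
    except requests.exceptions.ConnectionError:
        return "ConnectionError clause"


for exc_type in (requests.exceptions.ConnectTimeout, requests.exceptions.ReadTimeout):
    exc = exc_type("simulated")
    print(f"{exc_type.__name__}: {connection_first(exc)} | {timeout_first(exc)}")
```

`ConnectTimeout` lands in whichever of the two clauses comes first, because it is both a `ConnectionError` and a `Timeout`. `ReadTimeout` always lands in the `Timeout` clause. If you want a single "the server was too slow" branch, list `Timeout` before `ConnectionError`.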
## How It Works Internally: Wrapping Lower-Level Errors

`Requests` doesn't handle network connections directly. It uses a lower-level library called `urllib3` under the hood (managed via [Transport Adapters](07_transport_adapters.md)). When `urllib3` encounters a network problem (like a connection error or timeout), it raises its *own* specific exceptions (e.g., `urllib3.exceptions.MaxRetryError`, `urllib3.exceptions.NewConnectionError`, `urllib3.exceptions.ReadTimeoutError`).

`Requests` catches these `urllib3` exceptions inside its [Transport Adapters](07_transport_adapters.md) (specifically, the `HTTPAdapter.send` method) and then **raises its own corresponding exception** from the `requests.exceptions` hierarchy. This simplifies things for you: you only need to worry about catching `requests` exceptions, not the underlying `urllib3` ones.

```mermaid
sequenceDiagram
    participant UserCode as Your Code
    participant ReqAPI as requests.get()
    participant Adapter as HTTPAdapter
    participant Urllib3 as urllib3 library
    participant Network

    UserCode->>ReqAPI: requests.get(bad_url, timeout=1)
    ReqAPI->>Adapter: send(prepared_request)
    Adapter->>Urllib3: urlopen(method, url, ..., timeout=1)
    Urllib3->>Network: Attempt connection...
    Network-->>Urllib3: Fails (e.g., DNS lookup fails)
    Urllib3->>Urllib3: NewConnectionError, no retries left
    Urllib3-->>Adapter: Raise MaxRetryError (reason=NewConnectionError)
    Adapter->>Adapter: Catch MaxRetryError
    Adapter->>Adapter: Raise requests.exceptions.ConnectionError(e)
    Adapter-->>ReqAPI: Propagate ConnectionError
    ReqAPI-->>UserCode: Propagate ConnectionError
    UserCode->>UserCode: Catch requests.exceptions.ConnectionError
```

Let's look at the definitions in `requests/exceptions.py`. You can see the inheritance structure clearly:

```python
# File: requests/exceptions.py (Simplified View)

from urllib3.exceptions import HTTPError as BaseHTTPError

# json's (or simplejson's) JSONDecodeError, itself a subclass of ValueError
from .compat import JSONDecodeError as CompatJSONDecodeError


# The base class for all requests exceptions
class RequestException(IOError):
    """There was an ambiguous exception that occurred while handling your request."""
    # ... (stores request/response objects) ...


# Specific exceptions inheriting from RequestException or other requests exceptions
class HTTPError(RequestException):
    """An HTTP error occurred."""  # Typically raised by response.raise_for_status()


class ConnectionError(RequestException):
    """A Connection error occurred."""


class ProxyError(ConnectionError):  # Inherits from ConnectionError
    """A proxy error occurred."""


class SSLError(ConnectionError):  # Inherits from ConnectionError
    """An SSL error occurred."""


class Timeout(RequestException):  # Inherits directly from RequestException
    """The request timed out."""


class ConnectTimeout(ConnectionError, Timeout):  # Inherits from BOTH ConnectionError and Timeout!
    """The request timed out while trying to connect to the remote server."""


class ReadTimeout(Timeout):  # Inherits from Timeout
    """The server did not send any data in the allotted amount of time."""


class URLRequired(RequestException):
    """A valid URL is required to make a request."""


class TooManyRedirects(RequestException):
    """Too many redirects."""


# ... other specific errors like MissingSchema, InvalidURL, etc. ...


# Some exceptions also inherit from standard Python errors
class InvalidJSONError(RequestException):
    """A JSON error occurred."""


class JSONDecodeError(InvalidJSONError, CompatJSONDecodeError):
    """Couldn't decode the text into json"""
    # Because CompatJSONDecodeError subclasses ValueError, `except ValueError` also catches this
```

And here's a simplified view of how `requests/adapters.py` (`HTTPAdapter.send`) catches `urllib3` errors and raises `requests` errors:

```python
# File: requests/adapters.py (Simplified View in HTTPAdapter.send method)

from urllib3.exceptions import (
    MaxRetryError, ConnectTimeoutError, NewConnectionError, ResponseError,
    ProxyError as _ProxyError, SSLError as _SSLError, ReadTimeoutError,
    ProtocolError, ClosedPoolError, InvalidHeader as _InvalidHeader,
    HTTPError as _HTTPError,
)
from .exceptions import (
    ConnectionError, ConnectTimeout, ReadTimeout, SSLError, ProxyError,
    RetryError, InvalidHeader,  # And others
)


class HTTPAdapter(BaseAdapter):
    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        # ... (prepare connection using self.get_connection_with_tls_context) ...
        conn = self.get_connection_with_tls_context(...)
        # ... (verify certs, add headers) ...
        url = self.request_url(request, proxies)

        try:
            # === Make the actual request using urllib3 ===
            resp = conn.urlopen(
                method=request.method,
                url=url,
                # ... other args like body, headers ...
                retries=self.max_retries,
                timeout=timeout,
            )

        # === Catch specific urllib3 errors and raise corresponding requests errors ===

        except (ProtocolError, OSError) as err:  # General network/protocol errors
            raise ConnectionError(err, request=request)

        except MaxRetryError as e:  # urllib3 gave up after exhausting its retries
            if isinstance(e.reason, ConnectTimeoutError):
                # NewConnectionError subclasses ConnectTimeoutError, so exclude it here
                if not isinstance(e.reason, NewConnectionError):
                    raise ConnectTimeout(e, request=request)
            if isinstance(e.reason, ResponseError):  # Retries exhausted on status codes
                raise RetryError(e, request=request)
            if isinstance(e.reason, _ProxyError):
                raise ProxyError(e, request=request)
            if isinstance(e.reason, _SSLError):
                raise SSLError(e, request=request)
            # Fallback for other retry errors (e.g., NewConnectionError)
            raise ConnectionError(e, request=request)

        except ClosedPoolError as e:  # Connection pool was closed
            raise ConnectionError(e, request=request)

        except _ProxyError as e:  # Direct proxy error
            raise ProxyError(e)

        except (_SSLError, _HTTPError) as e:  # Other urllib3 errors
            if isinstance(e, _SSLError):
                raise SSLError(e, request=request)
            elif isinstance(e, ReadTimeoutError):
                raise ReadTimeout(e, request=request)
            elif isinstance(e, _InvalidHeader):
                raise InvalidHeader(e, request=request)
            else:
                raise  # Anything unexpected is re-raised unchanged

        # ... (build and return the Response object if successful) ...
        return self.build_response(request, resp)
```

This wrapping makes your life easier by providing a consistent set of exceptions (`requests.exceptions`) to handle, regardless of the underlying `urllib3` details.

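The translation step follows a simple pattern: catch the low-level library's exception and re-raise one from your own hierarchy, keeping the original attached. The sketch below reproduces that pattern with stand-in classes so it runs without `urllib3` or `requests`. `LowLevelReadTimeoutError`, `low_level_urlopen` and `adapter_send` are hypothetical names, and the mini-hierarchy only mirrors the shape of the real one.

```python
class LowLevelReadTimeoutError(Exception):
    """Stand-in for a transport library's own timeout exception (hypothetical)."""


class RequestException(OSError):
    """Root of our simplified hierarchy, shaped like requests' RequestException(IOError)."""


class Timeout(RequestException):
    pass


class ReadTimeout(Timeout):
    pass


def low_level_urlopen(url):
    # Pretend the server never answered in time.
    raise LowLevelReadTimeoutError(f"Read timed out for {url}")


def adapter_send(url):
    try:
        return low_level_urlopen(url)
    except LowLevelReadTimeoutError as err:
        # Translate into our own hierarchy. Like requests, pass the original error
        # as the first argument; `from err` also records it as __cause__.
        raise ReadTimeout(err) from err


try:
    adapter_send("https://example.invalid/slow")
except Timeout as e:  # callers only need to know about *our* exception types
    print(type(e).__name__, "wraps", type(e.__cause__).__name__)
```

The caller's `except Timeout` never mentions the low-level library, yet the original error stays reachable for debugging.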
## Conclusion

You've learned about the `requests` **Exception Hierarchy**: a family tree of error types that `requests` raises when things go wrong.

* You saw that all `requests` exceptions inherit from the base `requests.exceptions.RequestException`.
* You learned about key specific exceptions like `ConnectionError`, `Timeout` (and its children `ConnectTimeout`, `ReadTimeout`), and `HTTPError` (raised by `response.raise_for_status()`).
* You practiced using `try...except` blocks to catch both general (`RequestException`) and specific exceptions, allowing for tailored error handling.
* You saw that `requests` wraps lower-level errors (from `urllib3`) into its own exception types, simplifying error handling for you.

Understanding this hierarchy is crucial for writing robust Python code that can gracefully handle the inevitable problems that occur when dealing with networks and web services.

So far, we've mostly relied on the default way `requests` handles connections. But what if we need more control over how connections are made, for example to configure retries differently or use different SSL settings? That's where Transport Adapters come in.

**Next:** [Chapter 7: Transport Adapters](07_transport_adapters.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 7: Transport Adapters - Custom Delivery Routes

In the previous chapter, [Chapter 6: Exception Hierarchy](06_exception_hierarchy.md), we learned how `requests` signals problems like network errors or bad responses. Most of the time, we rely on the default way `requests` handles sending our requests and managing connections.

But what if the default way isn't quite right for a specific website or service? What if you need to tell `requests` *exactly* how to handle connections or retries for URLs starting with `http://` or `https://`, or maybe even for a completely custom scheme like `myprotocol://`?

## The Problem: Needing Special Handling

Imagine you're interacting with an API that's known to be a bit unreliable. Sometimes requests to it fail temporarily, but succeed if you just try again a second later. By default, `requests` doesn't retry failed requests at all, and even when retries are enabled you may want to retry only on specific error codes.

Or perhaps you need to connect to a server using very specific security settings (SSL/TLS versions or ciphers) that aren't the default.

How can you customize *how* `requests` sends requests and manages connections for specific types of URLs?

## Meet Transport Adapters: The Delivery Services

This is where **Transport Adapters** come in!

Think of a `requests` [Session](03_session.md) object like a customer ordering packages online. The customer (Session) wants to send a package (a web request) to a specific address (a URL).

**Transport Adapters** are like the different **delivery services** (like FedEx, UPS, USPS, or maybe a specialized local courier) that the customer can choose from.

* Each delivery service specializes in certain types of addresses or delivery methods.
* When the customer has a package for a specific address (e.g., starting with `https://`), they pick the appropriate delivery service registered for that address type.
* That delivery service then handles all the details of picking up, transporting, and delivering the package (sending the request, managing connections, handling retries, etc.).

In `requests`, a Transport Adapter defines *how* requests are actually sent and connections are managed for URLs that start with a given prefix, most commonly a **URL scheme** like `http://` or `https://`.

## The Default Delivery Service: `HTTPAdapter`

By default, when you create a `Session` object, it automatically sets up the standard "delivery services" for web addresses:

* For URLs starting with `https://`, it uses the built-in `requests.adapters.HTTPAdapter`.
* For URLs starting with `http://`, it also uses `requests.adapters.HTTPAdapter` (a separate instance).

This `HTTPAdapter` is the workhorse. It doesn't handle the network sockets directly; instead, it uses another powerful library called `urllib3` under the hood.

The `HTTPAdapter` (via `urllib3`) is responsible for:

1. **Connection Pooling:** Reusing existing network connections to the same host for better performance (like the delivery service keeping its trucks warm and ready for the next delivery to the same neighborhood). We saw the benefits of this in [Chapter 3: Session](03_session.md).
2. **HTTP/HTTPS Details:** Handling the specifics of the HTTP and HTTPS protocols.
3. **SSL Verification:** Making sure the website's security certificate is valid for HTTPS connections.
4. **Retries:** Passing a retry policy down to `urllib3`. The default policy (`max_retries=0`) does not retry at all, so you configure retries yourself when you need them.

So, when you use a `Session` and make a `GET` request to `https://example.com`, the Session looks up the adapter for `https://`, finds the default `HTTPAdapter`, and hands the request off to it for delivery.

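You can confirm these defaults on a fresh `Session` without sending anything over the network. The sketch below only inspects objects that `requests` creates locally; `example.com` is just a placeholder URL for the lookup.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
https_adapter = s.get_adapter("https://example.com/")
http_adapter = s.get_adapter("http://example.com/")

print(isinstance(https_adapter, HTTPAdapter))  # True
print(https_adapter is http_adapter)           # False: one instance per prefix
# The default retry policy is a urllib3 Retry object that allows zero retries.
print(type(https_adapter.max_retries).__name__, https_adapter.max_retries.total)  # Retry 0
```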
## Mounting Adapters: Choosing Your Delivery Service

How does a `Session` know which adapter to use for which URL prefix? It uses a mechanism called **mounting**.

Think of it like telling your `Session` customer: "For any address starting with `https://`, use this specific delivery service (adapter)."

A `Session` object has an `adapters` attribute, which is an ordered dictionary. You use the `session.mount(prefix, adapter)` method to register an adapter for a given URL prefix.

```python
import requests
from requests.adapters import HTTPAdapter

# Create a session
s = requests.Session()

# See the default adapters that are already mounted
print("Default Adapters:")
print(s.adapters)

# Create a *new* instance of the default HTTPAdapter
# (Maybe we'll configure it later)
custom_adapter = HTTPAdapter()

# Mount this adapter for a specific website
# Now, any request to this specific host via HTTPS will use our custom_adapter
print("\nMounting custom adapter for https://httpbin.org")
s.mount('https://httpbin.org', custom_adapter)

# Let's mount another one for all HTTP traffic
plain_http_adapter = HTTPAdapter()
print("Mounting another adapter for all http://")
s.mount('http://', plain_http_adapter)

# Check the adapters again (they are ordered by prefix length, longest first)
print("\nAdapters after mounting:")
print(s.adapters)

# When we make a request, the session finds the best matching prefix
print(f"\nAdapter for 'https://httpbin.org/get': {s.get_adapter('https://httpbin.org/get')}")
print(f"Adapter for 'http://example.com': {s.get_adapter('http://example.com')}")
print(f"Adapter for 'https://google.com': {s.get_adapter('https://google.com')}")  # Uses default https://
```

**Output:**

```
Default Adapters:
OrderedDict([('https://', <requests.adapters.HTTPAdapter object at 0x...>), ('http://', <requests.adapters.HTTPAdapter object at 0x...>)])

Mounting custom adapter for https://httpbin.org
Mounting another adapter for all http://

Adapters after mounting:
OrderedDict([('https://httpbin.org', <requests.adapters.HTTPAdapter object at 0x...>), ('https://', <requests.adapters.HTTPAdapter object at 0x...>), ('http://', <requests.adapters.HTTPAdapter object at 0x...>)])

Adapter for 'https://httpbin.org/get': <requests.adapters.HTTPAdapter object at 0x...>
Adapter for 'http://example.com': <requests.adapters.HTTPAdapter object at 0x...>
Adapter for 'https://google.com': <requests.adapters.HTTPAdapter object at 0x...>
```

**Explanation:**

1. Initially, the session has default `HTTPAdapter` instances mounted for `https://` and `http://`.
2. We created new `HTTPAdapter` instances.
3. We used `s.mount('https://httpbin.org', custom_adapter)`. Now, requests to `https://httpbin.org/anything` will use `custom_adapter`.
4. We used `s.mount('http://', plain_http_adapter)`. This *replaced* the original default adapter for `http://`.
5. Requests to other HTTPS sites like `https://google.com` will still use the original default adapter mounted for the shorter `https://` prefix.
6. The `s.get_adapter(url)` method shows how the session selects the adapter based on the longest matching prefix.

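The longest-prefix rule is easy to model in plain Python. This is a simplified sketch of the reordering that `Session.mount` performs, with strings standing in for adapter objects. `PrefixRouter` is a hypothetical name, not part of `requests`, and it raises a plain `ValueError` where `requests` raises `InvalidSchema`.

```python
from collections import OrderedDict


class PrefixRouter:
    """Toy model of Session.mount() / Session.get_adapter() (hypothetical helper)."""

    def __init__(self):
        self.adapters = OrderedDict()

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Move every shorter prefix behind the new one, so longer and more
        # specific prefixes are always tried first.
        for key in [k for k in self.adapters if len(k) < len(prefix)]:
            self.adapters[key] = self.adapters.pop(key)

    def get_adapter(self, url):
        # First match wins; thanks to mount()'s reordering that is the longest prefix.
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError(f"No connection adapters were found for {url!r}")


router = PrefixRouter()
router.mount("https://", "default-https")
router.mount("http://", "default-http")
router.mount("https://httpbin.org", "custom")

print(list(router.adapters))  # ['https://httpbin.org', 'https://', 'http://']
print(router.get_adapter("https://httpbin.org/get"))  # custom
print(router.get_adapter("https://google.com"))  # default-https
```

Because matching is a plain `startswith`, the prefix `https://httpbin.org` would also match `https://httpbin.org.example.net/`. Ending a host prefix with `/` avoids that surprise.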
## Use Case: Customizing Retries

Let's go back to the unreliable API example. We want to configure `requests` to automatically retry requests to `https://flaky-api.example.com` up to 5 times if certain errors occur (like temporary server errors or connection issues).

The `HTTPAdapter`'s retry logic is controlled by a `Retry` object from the underlying `urllib3` library. We can create our own `Retry` object with custom settings and pass it to a *new* `HTTPAdapter` instance.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # Import the Retry class

# 1. Configure the retry strategy
# - total=5: allow up to 5 retries (so up to 6 attempts in total)
# - backoff_factor=0.5: exponential backoff; urllib3 sleeps about 0s, 1s, 2s, 4s, ... between retries
# - status_forcelist=[500, 502, 503, 504]: also retry when the server answers with one of these codes
# - allowed_methods: by default, status-code and read-error retries only happen for idempotent
#   methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE), not POST. Pass None to allow any method.
retry_strategy = Retry(
    total=5,
    backoff_factor=0.5,
    status_forcelist=[500, 502, 503, 504],
    # allowed_methods=None,  # Uncomment to also retry non-idempotent methods such as POST
)

# 2. Create an HTTPAdapter with this retry strategy
# The 'max_retries' argument accepts a Retry object
adapter_with_retries = HTTPAdapter(max_retries=retry_strategy)

# 3. Create a Session
session = requests.Session()

# 4. Mount the adapter for the specific API prefix
api_base_url = 'https://flaky-api.example.com/'  # Use the base URL prefix
session.mount(api_base_url, adapter_with_retries)

# 5. Now, use the session to make requests to the flaky API
api_endpoint = f"{api_base_url}data"
print(f"Making request to {api_endpoint} with custom retries...")

try:
    # Imagine this API sometimes returns 503 Service Unavailable
    response = session.get(api_endpoint)
    response.raise_for_status()  # Check for HTTP errors
    print("Success!")
    # print(response.json())  # Process the successful response
except requests.exceptions.RequestException as e:
    print(f"Request failed after retries: {e}")

# Requests to other domains will use the default adapter (no retries)
print("\nMaking request to a different site (default retries)...")
try:
    response_other = session.get('https://httpbin.org/get')
    print(f"Status for httpbin: {response_other.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Httpbin request failed: {e}")
```

**Explanation:**

1. We defined our desired retry behavior using `urllib3.util.retry.Retry`.
2. We created a *new* `HTTPAdapter`, passing our `retry_strategy` to its `max_retries` parameter during initialization.
3. We created a `Session`.
4. Crucially, we `mount`ed our `adapter_with_retries` specifically to the base URL of the flaky API (`https://flaky-api.example.com/`).
5. When `session.get(api_endpoint)` is called, the Session sees that the URL starts with the mounted prefix, so it uses our `adapter_with_retries`. If the server returns a `503` error, this adapter (using the `Retry` object) will automatically wait and try again, up to 5 times. If every attempt fails, `requests` raises an exception such as `RetryError`, which the `RequestException` handler catches.
6. Requests to `https://httpbin.org` don't match the specific prefix, so they fall back to the default adapter mounted for `https://`, which does not retry at all by default.

This allows fine-grained control over connection handling for different destinations.

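To see roughly how long the session waits between those attempts, here is a small sketch of the backoff formula that urllib3 documents for `Retry`. The delay is `backoff_factor * 2 ** (n - 1)`, with no sleep before the first retry and a cap of `backoff_max` (120 seconds by default). The sketch ignores jitter and any `Retry-After` header the server sends (urllib3 honors it for 503 by default), and either can change the real delays. `backoff_schedule` is a hypothetical helper, not a urllib3 function.

```python
def backoff_schedule(backoff_factor, retries, backoff_max=120.0):
    """Approximate sleep (in seconds) before each retry, following urllib3's documented formula."""
    delays = []
    for consecutive_errors in range(1, retries + 1):
        if consecutive_errors <= 1:
            delays.append(0.0)  # the first retry happens immediately
        else:
            delays.append(min(backoff_max, backoff_factor * 2 ** (consecutive_errors - 1)))
    return delays


print(backoff_schedule(0.5, 5))  # [0.0, 1.0, 2.0, 4.0, 8.0]
```

With `total=5` and `backoff_factor=0.5`, a request that keeps failing spends about 15 seconds sleeping before the session gives up.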
## How It Works Internally: The Session-Adapter Dance

Let's trace the steps when you call `session.get(url)`:

1. **`Session.request`:** Your `session.get(url, ...)` call ends up in the main `Session.request` method.
2. **Prepare Request:** `Session.request` creates a `Request` object and calls `self.prepare_request(req)` to turn it into a `PreparedRequest`, merging session-level settings like headers and cookies (as seen in [Chapter 3: Session](03_session.md)).
3. **Merge Environment Settings:** `Session.request` calls `self.merge_environment_settings(...)` to figure out final settings for proxies, SSL verification (`verify`), etc.
4. **`Session.send`:** The prepared request (`prep`) and final settings (`send_kwargs`) are passed to `self.send(prep, **send_kwargs)`.
5. **`get_adapter`:** Inside `Session.send`, the first crucial step is `adapter = self.get_adapter(url=request.url)`. This method looks through the `self.adapters` dictionary (which is ordered from longest prefix to shortest) and returns the *first* adapter whose mounted prefix matches the beginning of the request's URL.
6. **`adapter.send`:** The `Session` then calls the `send` method *on the chosen adapter*: `r = adapter.send(request, **kwargs)`. **This is the handover!** The Session delegates the actual sending to the Transport Adapter.
7. **Adapter Does the Work:** The adapter (e.g., `HTTPAdapter`) takes over.
    * It interacts with its `urllib3.PoolManager` to get a connection from the pool (or create one).
    * It configures the SSL/TLS context based on the `verify` and `cert` parameters.
    * It uses `urllib3` to send the actual HTTP request bytes over the network.
    * It passes its `Retry` object to `urllib3`, which applies the retry policy when it sees certain connection errors or status codes.
    * It receives the raw HTTP response from `urllib3`.
8. **`adapter.build_response`:** The adapter turns the raw `urllib3` response into a `requests.Response` object using its `build_response(request, raw_urllib3_response)` method. This copies over the status code, headers, and reason, and keeps the `urllib3` response as `response.raw` so the body can be read from it.
9. **Return Response:** The `adapter.send` method returns the fully formed `Response` object back to the `Session.send` method.
10. **Post-Processing:** `Session.send` does some final steps, like extracting cookies from the response into the session's [Cookie Jar](04_cookie_jar.md) and handling redirects (which might involve calling `send` again).
11. **Final Return:** The final `Response` object is returned to your original `session.get(url)` call.

Here's a simplified diagram:

```mermaid
sequenceDiagram
    participant UserCode as Your Code
    participant Session as Session Object
    participant Adapter as Transport Adapter
    participant Urllib3 as urllib3 Library
    participant Server

    UserCode->>Session: session.get(url)
    Session->>Session: prepare_request(req) -> PreparedRequest (prep)
    Session->>Session: merge_environment_settings() -> send_kwargs
    Session->>Session: get_adapter(url) -> adapter_instance
    Session->>Adapter: adapter_instance.send(prep, **send_kwargs)
    activate Adapter
    Adapter->>Urllib3: Get connection from PoolManager
    Adapter->>Urllib3: urlopen(prep.method, url, ..., retries=..., timeout=...)
    activate Urllib3
    Urllib3->>Server: Send HTTP Request Bytes
    Server-->>Urllib3: Receive HTTP Response Bytes
    Urllib3-->>Adapter: Return raw urllib3 response
    deactivate Urllib3
    Adapter->>Adapter: build_response(prep, raw_response) -> Response (r)
    Adapter-->>Session: Return Response (r)
    deactivate Adapter
    Session->>Session: Extract cookies, handle redirects...
    Session-->>UserCode: Return final Response
```

Let's peek at the relevant code snippets:

```python
# File: requests/sessions.py (Simplified View)

import time
from collections import OrderedDict
from datetime import timedelta

from .adapters import HTTPAdapter
from .exceptions import InvalidSchema

preferred_clock = time.perf_counter  # Simplified: requests picks the clock per platform


class Session:
    def __init__(self):
        # ... other defaults ...
        self.adapters = OrderedDict()  # The mounted adapters
        self.mount('https://', HTTPAdapter())  # Mount default HTTPS adapter
        self.mount('http://', HTTPAdapter())  # Mount default HTTP adapter

    def get_adapter(self, url):
        """Returns the appropriate connection adapter for the given URL."""
        for prefix, adapter in self.adapters.items():
            # Adapters are kept longest-prefix-first, so the first match wins
            if url.lower().startswith(prefix.lower()):
                return adapter
        # No match found
        raise InvalidSchema(f"No connection adapters were found for {url!r}")

    def mount(self, prefix, adapter):
        """Registers a connection adapter to a prefix."""
        self.adapters[prefix] = adapter
        # Keep adapters sorted longest prefix first: move every shorter
        # prefix to the end of the OrderedDict
        keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
        for key in keys_to_move:
            self.adapters[key] = self.adapters.pop(key)

    def send(self, request, **kwargs):
        # ... setup kwargs (stream, verify, cert, proxies) ...

        # === GET THE ADAPTER ===
        adapter = self.get_adapter(url=request.url)

        # === DELEGATE TO THE ADAPTER ===
        start = preferred_clock()  # Start timer
        r = adapter.send(request, **kwargs)  # Call the adapter's send method
        elapsed = preferred_clock() - start  # Stop timer
        r.elapsed = timedelta(seconds=elapsed)

        # ... dispatch response hooks ...
        # ... persist cookies (extract_cookies_to_jar) ...
        # ... handle redirects (resolve_redirects, might call send again) ...

        # ... maybe read content if stream=False ...
        return r


# File: requests/adapters.py (Simplified View)

from urllib3.exceptions import MaxRetryError, ProtocolError
from urllib3.poolmanager import PoolManager  # Used internally by HTTPAdapter
from urllib3.util import Timeout as TimeoutSauce
from urllib3.util.retry import Retry

from .exceptions import ConnectionError
from .models import Response
from .structures import CaseInsensitiveDict


class BaseAdapter:
    """The Base Transport Adapter"""

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        raise NotImplementedError

    def close(self):
        raise NotImplementedError


class HTTPAdapter(BaseAdapter):
    def __init__(self, pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False):
        # === STORE RETRY CONFIGURATION ===
        if max_retries == 0:
            # Default: no retries, and read errors are never retried
            self.max_retries = Retry(0, read=False)
        else:
            # Accepts an int, or a ready-made Retry object (returned unchanged)
            self.max_retries = Retry.from_int(max_retries)

        # ... configure pooling options ...

        # === INITIALIZE URLLIB3 POOL MANAGER ===
        # This object manages connections using urllib3
        # (the real code does this inside self.init_poolmanager(...))
        self.poolmanager = PoolManager(num_pools=pool_connections, maxsize=pool_maxsize, block=pool_block)
        self.proxy_manager = {}  # For handling proxies

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        """Sends PreparedRequest object using urllib3."""
        # ... determine connection pool (conn) based on URL, proxies, SSL context ...
        conn = self.get_connection_with_tls_context(request, verify, proxies=proxies, cert=cert)
        # ... determine URL to use (might be different for proxies) ...
        url = self.request_url(request, proxies)

        # Build a urllib3 Timeout from a single number or a (connect, read) tuple
        if isinstance(timeout, tuple):
            connect, read = timeout
            timeout_obj = TimeoutSauce(connect=connect, read=read)
        else:
            timeout_obj = TimeoutSauce(connect=timeout, read=timeout)

        # Use chunked transfer encoding if there's a body but no Content-Length
        chunked = not (request.body is None or "Content-Length" in request.headers)

        try:
            # === CALL URLLIB3 ===
            # This is the core network call
            resp = conn.urlopen(
                method=request.method,
                url=url,
                body=request.body,
                headers=request.headers,
                redirect=False,  # Requests handles redirects
                assert_same_host=False,
                preload_content=False,  # Requests streams content
                decode_content=False,  # Requests handles decoding
                retries=self.max_retries,  # Pass configured retries
                timeout=timeout_obj,  # Pass configured timeout
                chunked=chunked,
            )

        except (ProtocolError, OSError, MaxRetryError) as err:
            # === WRAP URLLIB3 EXCEPTIONS ===
            # Simplified: the real code maps each urllib3 error to a specific
            # requests exception (ConnectTimeout, ReadTimeout, SSLError, etc.).
            # See Chapter 6 for details.
            raise ConnectionError(err, request=request)

        # === BUILD RESPONSE OBJECT ===
        # Convert the raw urllib3 response into a requests.Response
        return self.build_response(request, resp)

    def build_response(self, req, resp):
        """Builds a requests.Response from a urllib3 response."""
        response = Response()
        response.status_code = getattr(resp, 'status', None)
        response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))
        response.raw = resp  # The raw urllib3 response object
        response.reason = response.raw.reason
        response.url = req.url
        # ... extract cookies, set encoding ...
        response.request = req
        response.connection = self  # Link back to this adapter
        return response

    def close(self):
        """Close the underlying PoolManager."""
        self.poolmanager.clear()
        # ... close proxy managers ...

    # ... other helper methods (cert_verify, proxy_manager_for, request_url) ...
```

The key idea is that the `Session` finds the right `Adapter` using `mount` prefixes, and then the `Adapter` uses `urllib3` to handle the low-level details of connection pooling, retries, and HTTP communication.

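`Session.send` simply hands the `PreparedRequest` to whichever adapter `get_adapter` returns, so you can watch that handover with a tiny custom adapter. The sketch below subclasses `requests.adapters.BaseAdapter`, records what the session passes in, and returns a canned `Response` instead of touching the network. Several pieces are illustrative assumptions: the `mock://` scheme and the `RecordingAdapter` name are made up, and putting an in-memory buffer in `Response.raw` is a testing shortcut, not how `HTTPAdapter` builds responses.

```python
import io

import requests
from requests.adapters import BaseAdapter


class RecordingAdapter(BaseAdapter):
    """Hypothetical adapter: records each call and returns a canned response."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        # Whatever Session.send hands over arrives here as a PreparedRequest plus kwargs.
        self.calls.append({"method": request.method, "url": request.url, "timeout": timeout})
        response = requests.Response()
        response.status_code = 200
        response.headers["Content-Type"] = "text/plain; charset=utf-8"
        response.encoding = "utf-8"
        response.raw = io.BytesIO(b"hello from the adapter")  # body is read from .raw
        response.url = request.url
        response.request = request
        response.connection = self
        return response

    def close(self):
        pass  # nothing to clean up: no real connections were opened


session = requests.Session()
recorder = RecordingAdapter()
session.mount("mock://", recorder)

resp = session.get("mock://service/items", timeout=3)
print(resp.status_code, resp.text)  # 200 hello from the adapter
print(recorder.calls)
```

The same idea underlies the "testing" use case listed below: mount a fake adapter on a prefix, and everything sent to that prefix stays off the network.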
## Other Use Cases

Besides custom retries, you might use Transport Adapters for:

* **Custom SSL/TLS Contexts:** Subclass `HTTPAdapter` and override its `init_poolmanager()` method so the underlying `PoolManager` is created with a custom `ssl.SSLContext`, giving you fine-grained control over TLS versions, ciphers, or certificate verification logic.
* **SOCKS Proxies:** `requests` has no SOCKS support out of the box, but with the optional extra installed (`pip install requests[socks]`, which pulls in PySocks), the built-in `HTTPAdapter` can route traffic through `socks5://` or `socks4://` proxy URLs passed via `proxies`, using urllib3's SOCKS support under the hood.
* **Testing:** You could create a custom adapter that doesn't actually make network requests but returns predefined responses, useful for testing your application without hitting real servers.
* **Custom Protocols:** If you needed to interact with a non-HTTP protocol, you could theoretically write a custom `BaseAdapter` subclass to handle it.

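For the custom TLS case, a common pattern is to override `HTTPAdapter.init_poolmanager()` so urllib3's `PoolManager` is created with your own `ssl.SSLContext`. The sketch below requires TLS 1.3, which only makes sense if every server you talk to supports it. `TLS13Adapter` is a hypothetical name. How `requests` combines a pool-level `ssl_context` with the per-request `verify` setting has changed between releases, so check the behavior against the `requests` and `urllib3` versions you actually run.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter


class TLS13Adapter(HTTPAdapter):
    """Hypothetical adapter whose connections refuse anything older than TLS 1.3."""

    def init_poolmanager(self, *args, **kwargs):
        context = ssl.create_default_context()  # certificate and hostname checks stay on
        context.minimum_version = ssl.TLSVersion.TLSv1_3
        kwargs["ssl_context"] = context  # forwarded to urllib3's PoolManager
        return super().init_poolmanager(*args, **kwargs)


session = requests.Session()
adapter = TLS13Adapter()
session.mount("https://", adapter)

# Nothing has been sent yet; inspect the context the pool manager will hand to new connections.
ctx = adapter.poolmanager.connection_pool_kw["ssl_context"]
print(ctx.minimum_version.name)  # TLSv1_3
```

Overriding `init_poolmanager()` rather than replacing `self.poolmanager` after construction keeps the custom context in place whenever `requests` rebuilds the pool manager.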
||||
## Conclusion
|
||||
|
||||
You've learned about **Transport Adapters**, the pluggable backends that `requests` uses to handle the actual sending of requests and management of connections for different URL schemes (`http://`, `https://`, etc.).
|
||||
|
||||
* You saw the default adapter is `HTTPAdapter`, which uses `urllib3` for connection pooling, retries, and SSL.
|
||||
* You learned how `Session` objects `mount` adapters to specific URL prefixes.
|
||||
* You practiced customizing retry behavior by creating a new `HTTPAdapter` with a `urllib3.util.retry.Retry` object and mounting it to a session.
|
||||
* You traced how a `Session` finds and delegates work to the appropriate adapter via `adapter.send`.
|
||||
|
||||
Transport Adapters give you powerful, low-level control over how `requests` interacts with the network, allowing you to tailor its behavior for specific needs.
|
||||
|
||||
Adapters let you customize *how* requests are sent. What if you want to simply *react* to a request being sent or a response being received, perhaps to log it or modify it slightly on the fly? `Requests` has another mechanism for that.
|
||||
|
||||
**Next:** [Chapter 8: The Hook System](08_hook_system.md)
# Chapter 8: The Hook System - Setting Up Checkpoints

In [Chapter 7: Transport Adapters](07_transport_adapters.md), we saw how to customize the low-level details of *how* requests are sent and connections are managed, like setting custom retry strategies. Transport Adapters give you control over the delivery mechanism itself.

But what if you don't need to change *how* the request is sent, and instead simply want to **react** when something happens during the process? For example, you might want to log every response your application receives. (You might also want to, say, stamp a header onto every request just before it goes out. `requests` has no built-in hook for that moment, though: its only hook fires on responses.)
## The Problem: Reacting to Events

Imagine you're building an application that interacts with several different web services. For debugging or monitoring, you want to keep a record of every response you get back, specifically the URL you requested and the status code the server returned.

You could manually add a `print()` call after every `requests.get()`, `s.post()`, and similar call throughout your code:

```python
# Manual logging (repetitive!)
import requests

s = requests.Session()

response1 = s.get('https://api.service1.com/data')
print(f"LOG: Got {response1.status_code} for {response1.url}")
# ... process response1 ...

response2 = s.post('https://api.service2.com/action', data={'key': 'value'})
print(f"LOG: Got {response2.status_code} for {response2.url}")
# ... process response2 ...

response3 = s.get('https://api.service1.com/status')
print(f"LOG: Got {response3.status_code} for {response3.url}")
# ... process response3 ...
```

This quickly becomes tedious and error-prone. If you forget to add the logging line, you miss that record. If you want to change the log format, you have to change it everywhere. Isn't there a way to tell `requests` to run your logging code automatically *every time* it gets a response?
## Meet the Hook System: Your Automated Checkpoints

Yes, there is! `Requests` provides a **Hook System** for exactly this.

Think of hooks as **checkpoints** in the process of making a request and getting a response. When the process reaches a checkpoint, `requests` calls each custom function you've registered for it, one after another, before carrying on.

**Analogy: Package Delivery Checkpoints** 📦

Imagine a package delivery process:

1. Package picked up.
2. Package arrives at sorting facility. -> **Checkpoint!** (Maybe run a function to scan the barcode.)
3. Package loaded onto delivery truck.
4. Package delivered to recipient. -> **Checkpoint!** (Maybe run a function to get a signature.)

The Hook System in `requests` works similarly. You attach your own Python functions (called "hooks") to specific events (checkpoints).

Currently, the only event available is the **`response`** hook.

* **`response` Hook:** This hook runs *after* a response has been received from the server and the `Response` object has been built, but *before* that `Response` object is returned to the code that called `requests.get()` or `s.post()`. If redirects are followed, it runs once for each response in the redirect chain.
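You can confirm this against an installed copy of `requests`. The following is a small sketch, assuming the module layout of recent 2.x releases, where the event list lives in `requests.hooks.HOOKS`. Registering a hook for any other event name raises a `ValueError`. The `"request"` event name below is deliberately invalid.

```python
import requests
from requests.hooks import HOOKS, default_hooks

print(HOOKS)                     # ['response']
print(default_hooks())           # {'response': []}
print(requests.Session().hooks)  # {'response': []}

# Trying to register a hook for any other event name is rejected.
req = requests.Request("GET", "https://example.com/")
rejected = False
try:
    req.register_hook("request", lambda r, *args, **kwargs: r)
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```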
## Using the `response` Hook

Let's solve our logging problem using the `response` hook.

**Step 1: Define the Hook Function**

First, we write a Python function that performs the logging. It receives the `Response` object as its first argument. It must also accept arbitrary keyword arguments (`**kwargs`). `requests` calls every hook with the keyword arguments that were passed to `Session.send()` (such as `timeout`, `verify`, `proxies`, `stream`, and `cert`), so a hook defined without `**kwargs` fails with a `TypeError`.
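The `**kwargs` requirement is easy to check in isolation. This sketch calls `requests.hooks.dispatch_hook`, the internal helper that `Session.send()` uses, directly. A plain string stands in for the `Response`, and the `send_kwargs` values are made up. Only their names mirror what `Session.send()` forwards.

```python
from requests.hooks import dispatch_hook

received = []

def flexible_hook(response, *args, **kwargs):
    received.append(sorted(kwargs))  # record which keyword arguments arrived
    return None                      # keep the original data

def strict_hook(response):           # no **kwargs, so it cannot accept them
    return None

# Hypothetical stand-ins for the keyword arguments Session.send() forwards.
send_kwargs = {"stream": False, "timeout": 5, "verify": True, "cert": None, "proxies": {}}

result = dispatch_hook("response", {"response": [flexible_hook]}, "fake-response", **send_kwargs)
print(result, received)  # fake-response [['cert', 'proxies', 'stream', 'timeout', 'verify']]

strict_failed = False
try:
    dispatch_hook("response", {"response": [strict_hook]}, "fake-response", **send_kwargs)
except TypeError:
    strict_failed = True
print("strict_hook raised TypeError:", strict_failed)  # True
```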
```python
# Our custom hook function for logging
def log_response_details(response, *args, **kwargs):
    """
    This function will be called after each response.
    It logs the request method, URL, and response status code.
    """
    # 'response' is the Response object just received
    request_method = response.request.method  # Method of the request that produced it
    url = response.url  # Final URL
    status_code = response.status_code  # Status code

    print(f"HOOK LOG: Received {status_code} for {request_method} request to {url}")

    # IMPORTANT: Hooks usually shouldn't return anything (or should return None).
    # If a hook returns a value other than None, it REPLACES the data being processed.
    # For the 'response' hook, returning a value would replace the Response object!
    # Since we just want to log, we don't return anything.
```
**Explanation:**

* The function `log_response_details` takes `response` as its first argument. This will be the `requests.Response` object.
* It also accepts `**kwargs`. This is required, because `requests` passes the `send()` keyword arguments to every hook. `*args` is not strictly needed, but it does no harm.
* Inside the function, we read attributes of the `response` object (like `status_code` and `url`) and of the request that produced it (`response.request.method`) to build the log message.
* Crucially, this function *doesn't return anything*. If it returned a value other than `None`, that value would replace the original `response` object for any later hooks and for the final return value of `s.get()`.

**Step 2: Register the Hook**

Now we need to tell `requests` to actually *use* our `log_response_details` function. There are two main ways to register hooks:

1. **On a `Session` object:** A hook registered on a [Session](03_session.md) object runs for *every request* made through that session. This is perfect for our logging use case.
2. **On a single request:** You can pass `hooks={'response': my_hook}` to a call such as `requests.get()` or `s.get()`, or to a `Request` object before it is prepared. This is useful when you only want a hook to run for one specific request.

Let's register our hook on a `Session`:

```python
import requests

# The log_response_details hook from Step 1 (docstring and comments trimmed)
def log_response_details(response, *args, **kwargs):
    request_method = response.request.method
    url = response.url
    status_code = response.status_code
    print(f"HOOK LOG: Received {status_code} for {request_method} request to {url}")

# Create a Session
s = requests.Session()

# Register the hook on the session.
# Hooks are stored in a dictionary: session.hooks = {'event_name': [list_of_functions]}
# We add our function to the list for the 'response' event.
s.hooks['response'].append(log_response_details)

# Now, make some requests using the session
print("Making requests...")
response1 = s.get('https://httpbin.org/get')
print(f" -> Main code received response 1 with status: {response1.status_code}")

response2 = s.post('https://httpbin.org/post', data={'id': '123'})
print(f" -> Main code received response 2 with status: {response2.status_code}")

response3 = s.get('https://httpbin.org/status/404')  # This will get a 404
print(f" -> Main code received response 3 with status: {response3.status_code}")
```
**Expected Output:**

```
Making requests...
HOOK LOG: Received 200 for GET request to https://httpbin.org/get
 -> Main code received response 1 with status: 200
HOOK LOG: Received 200 for POST request to https://httpbin.org/post
 -> Main code received response 2 with status: 200
HOOK LOG: Received 404 for GET request to https://httpbin.org/status/404
 -> Main code received response 3 with status: 404
```

**Explanation:**

1. `s = requests.Session()`: We created a session.
2. `s.hooks['response'].append(log_response_details)`: This is the key step. `s.hooks` is a dictionary where keys are event names (like `'response'`) and values are lists of functions to call for that event. We appended our logging function to the list for the `'response'` event.
3. When we called `s.get(...)` or `s.post(...)`, the following happened internally:
   * The request was sent.
   * The response was received.
   * *Before* returning the response to our main code (`response1 = ...`), `requests` looked up the functions registered for the `'response'` event.
   * It found our `log_response_details` function and called it, passing the received `Response` object.
   * Our hook function printed the log message.
   * Since the hook returned `None`, the original `Response` object was then returned to our main code.
4. Notice how the "HOOK LOG" lines appear *before* the "Main code received response" lines. This shows that the hook runs after the response is received but before the calling code gets it.

**Modifying the Response (Advanced)**

While our logging hook didn't return anything, a hook *can* modify the `Response` object it receives, or even return a completely different `Response` object.

```python
import requests


def add_custom_header_hook(response, *args, **kwargs):
    """Adds a custom header to the received response."""
    print("HOOK: Adding X-Hook-Processed header...")
    response.headers['X-Hook-Processed'] = 'True'
    # We modified the response in place, so we return None
    # to let requests keep using the (now modified) response.
    return None


# Or, a hook that returns a *new* response (less common)
def replace_response_hook(response, *args, **kwargs):
    if response.status_code == 404:
        print("HOOK: Replacing 404 response with a custom one!")
        new_response = requests.Response()
        new_response.status_code = 200
        new_response.reason = "Found via Hook"
        new_response._content = b"Content generated by hook!"  # private attribute, set directly
        new_response.url = response.url
        new_response.request = response.request  # Keep original request link
        return new_response  # Return the NEW response
    return None  # Otherwise, keep the original response
```

**Caution:** Modifying or replacing responses within hooks is powerful, but it can be confusing if not done carefully. For beginners, hooks that only have side effects, such as logging or collecting metrics, and leave the response unchanged are the safest starting point.
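When several hooks are registered, in-place changes and replacements interact in a specific order. This sketch makes that order visible by calling the internal `requests.hooks.dispatch_hook` helper directly on hand-built `Response` objects, so no network is involved. The hook names and the `X-Seen-By` header are made up for illustration.

```python
import requests
from requests.hooks import dispatch_hook

seen = []

def mark_hook(response, *args, **kwargs):
    response.headers["X-Seen-By"] = "mark_hook"  # modify in place...
    return None                                  # ...and keep the same object

def swap_404_hook(response, *args, **kwargs):
    if response.status_code == 404:
        replacement = requests.Response()
        replacement.status_code = 200
        replacement._content = b"generated by hook"  # private attribute, set directly
        return replacement  # later hooks (and the caller) get this object instead
    return None

def record_hook(response, *args, **kwargs):
    seen.append(response.status_code)  # observes whatever the earlier hooks left behind

original = requests.Response()
original.status_code = 404

hooks = {"response": [mark_hook, swap_404_hook, record_hook]}
final = dispatch_hook("response", hooks, original, timeout=5)

print(final is original)              # False
print(original.headers["X-Seen-By"])  # mark_hook
print(final.status_code, seen)        # 200 [200]
print(final.content)                  # b'generated by hook'
```

The first hook's header change stays on the original object. Once `swap_404_hook` returns a replacement, every later hook and the final result see only the new object.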
## How It Works Internally

Where exactly does `requests` call these hooks? The `response` hook is triggered inside `Session.send()`. It fires right after the underlying [Transport Adapter](07_transport_adapters.md) has returned a response, and before the session extracts cookies from that response or follows any redirects.

1. **`Session.send()` Called:** Your code calls `s.get()` or `s.post()`. This builds and prepares the request (merging session-level and request-level hooks along the way) and then calls `Session.send()`.
2. **Adapter Sends Request:** The session selects the appropriate [Transport Adapter](07_transport_adapters.md) (e.g., `HTTPAdapter`). The adapter sends the request and returns a `Response` object (`r = adapter.send(...)`).
3. **Dispatch Hook:** Right after the adapter returns `r`, `Session.send()` calls `dispatch_hook("response", hooks, r, **kwargs)`. Here `hooks` is the prepared request's hook dictionary, which already holds the merged `Request` and `Session` hooks.
4. **`dispatch_hook()` Executes:** This helper function (from `requests.hooks`) looks up the list of functions registered for the `"response"` event. It then calls each one in order (like our `log_response_details`), passing the `Response` object `r` plus the `send()` keyword arguments.
5. **Hook Modifies/Replaces (Optional):** If a hook returns a value other than `None`, `dispatch_hook` replaces `r` with that value. Hooks later in the list, and ultimately your code, then see the replacement.
6. **Further Processing:** After `dispatch_hook` returns the (potentially replaced) `Response` object `r`, `Session.send()` continues with other tasks. These include extracting cookies from `r` into the session's jar and handling redirects. Following a redirect sends another request, which dispatches the hook again.
7. **Return Response:** Finally, the `Response` object is returned to your original calling code.

Here's a simplified sequence diagram:

```mermaid
sequenceDiagram
    participant UserCode as Your Code
    participant Session as Session Object
    participant Adapter as Transport Adapter
    participant Hooks as dispatch_hook()

    UserCode->>Session: s.get(url) / s.post(url)
    Session->>Session: prepare_request() merges Request and Session hooks
    Session->>Session: Gets adapter based on URL
    Session->>Adapter: adapter.send(request)
    activate Adapter
    Note over Adapter: Sends request, gets raw response
    Adapter->>Adapter: build_response() -> Response 'r'
    Adapter-->>Session: Return Response 'r'
    deactivate Adapter

    Session->>Hooks: dispatch_hook('response', hooks, r)
    activate Hooks
    Note over Hooks: Iterates through registered hook functions
    Hooks->>Hooks: Call each hook_function(r)
    Note over Hooks: Hook might modify 'r' or return a new one
    Hooks-->>Session: Return (potentially modified) Response 'r'
    deactivate Hooks

    Note over Session: Persist cookies from 'r', handle redirects...
    Session-->>UserCode: Return final Response 'r'
```

Let's look at the key code pieces:

```python
# File: requests/hooks.py (Simplified)

HOOKS = ["response"]  # 'response' is currently the only supported event


def default_hooks():
    # Creates the initial empty structure for hooks
    return {event: [] for event in HOOKS}


def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Dispatches hooks for a given key event."""
    hooks = hooks or {}  # Ensure hooks is a dict
    hooks = hooks.get(key)  # Get the list of functions for this event key

    if hooks:
        # Allow a single callable or a list
        if hasattr(hooks, "__call__"):
            hooks = [hooks]
        # Call each registered hook function
        for hook in hooks:
            _hook_data = hook(hook_data, **kwargs)  # Call the user's function
            if _hook_data is not None:
                # If the hook returned something, use it from now on
                hook_data = _hook_data
    return hook_data  # Return the (potentially replaced) data


# File: requests/sessions.py (Simplified view of Session.send)

from .hooks import dispatch_hook  # Import the dispatcher


class Session:
    # ... (other methods: __init__, request, prepare_request, get_adapter) ...

    def send(self, request, **kwargs):
        # ... (setup: default kwargs such as stream, verify, cert, proxies) ...
        allow_redirects = kwargs.pop("allow_redirects", True)  # not passed to the adapter
        hooks = request.hooks  # already merged from Request and Session by prepare_request()
        adapter = self.get_adapter(url=request.url)

        # === ADAPTER SENDS THE REQUEST ===
        r = adapter.send(request, **kwargs)  # Gets the Response object 'r'

        # ... (calculate elapsed time) ...

        # === DISPATCH THE 'RESPONSE' HOOK ===
        r = dispatch_hook("response", hooks, r, **kwargs)

        # === CONTINUE PROCESSING ===
        # Persist cookies from the (potentially replaced) response 'r'
        extract_cookies_to_jar(self.cookies, request, r.raw)

        # Handle redirects if allowed (using the potentially replaced 'r')
        if allow_redirects:
            # ... redirect logic using self.resolve_redirects ...
            # This might replace 'r' if redirects occur
            pass
        else:
            # ... store potential next request for non-redirected responses ...
            pass

        # ... (maybe consume content if stream=False) ...

        return r  # Return the final Response object


# File: requests/models.py (Simplified view of Request and PreparedRequest)
# Shows where hooks are stored initially

from .hooks import default_hooks


class RequestHooksMixin:
    # Mixin used by Request and PreparedRequest
    def register_hook(self, event, hook):
        # ... logic to add hook functions to self.hooks[event] list ...
        pass


class Request(RequestHooksMixin):
    def __init__(self, hooks=None):  # ... other parameters omitted ...
        # ...
        self.hooks = default_hooks()  # Initialize hooks dict
        if hooks:
            for k, v in list(hooks.items()):
                self.register_hook(event=k, hook=v)  # Register hooks passed in
        # ...


class PreparedRequest(RequestHooksMixin):  # ... other base classes omitted ...
    def __init__(self):
        # ...
        self.hooks = default_hooks()  # Hooks are also on PreparedRequest
        # ...

    def prepare_hooks(self, hooks):
        # Called during prepare() with the hooks merged by Session.prepare_request
        hooks = hooks or []
        for event in hooks:
            self.register_hook(event, hooks[event])

# Note: Session.prepare_request merges Request hooks and Session hooks
# (via merge_hooks) and passes the result to PreparedRequest.prepare().
```

The `dispatch_hook` function is the core mechanism that lets `requests` call your custom functions at the `"response"` checkpoint inside `Session.send`.
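You can inspect the merged hooks without sending anything. `Session.prepare_request()` returns the `PreparedRequest`, and its `hooks` attribute is what `dispatch_hook` will later receive. One detail here is worth checking against your installed version. In the `requests` releases this tutorial is based on, `merge_hooks` combines the two dictionaries key by key. As a result, a non-empty per-request `response` list *replaces* the session's list instead of being appended to it. The two hook functions below are placeholders.

```python
import requests

def session_hook(response, *args, **kwargs):
    pass  # placeholder hook registered on the session

def one_off_hook(response, *args, **kwargs):
    pass  # placeholder hook passed for a single request

s = requests.Session()
s.hooks["response"].append(session_hook)

# No per-call hooks: the session's hooks end up on the PreparedRequest.
plain = s.prepare_request(requests.Request("GET", "https://example.com/"))
print(plain.hooks["response"] == [session_hook])   # True

# Per-call hooks, as s.get(url, hooks=...) would build them.
custom = s.prepare_request(
    requests.Request("GET", "https://example.com/", hooks={"response": one_off_hook})
)
print(custom.hooks["response"] == [one_off_hook])  # True: the session hook is not included
print(s.hooks["response"] == [session_hook])       # True: the session itself is unchanged

# To run both for that call, list them explicitly.
both = s.prepare_request(
    requests.Request("GET", "https://example.com/",
                     hooks={"response": [session_hook, one_off_hook]})
)
print(both.hooks["response"] == [session_hook, one_off_hook])  # True
```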
## Conclusion

You've learned about the **Hook System** in `requests`, a way to register custom callback functions that run at specific points in the request-response lifecycle.

* You understood the motivation: automating actions like logging without cluttering your main code.
* You focused on the only built-in hook, **`response`**, which runs after a response is received but before it's returned to the caller.
* You saw how to define a hook function (accepting the `response` object plus `**kwargs`). You registered it on a `Session` (via `session.hooks`) so it applies to every request made through that session, and saw that you can also pass it for a single request.
* You implemented a practical example: logging response details automatically.
* You got a glimpse of how hooks *can* modify or replace responses (use with care!).
* You learned that internally, `Session.send` calls the `dispatch_hook` function to execute your registered hooks.

The Hook System provides a clean way to plug into the `requests` workflow and add custom behavior or monitoring without modifying the library itself.

This concludes our journey through the core abstractions of the `requests` library! We started with the simple [Functional API](01_functional_api.md) and the [Request & Response Models](02_request___response_models.md), then moved on to the powerful [Session](03_session.md) object. Along the way, we covered managing [Cookies](04_cookie_jar.md), handling [Authentication](05_authentication_handlers.md), dealing with [Exceptions](06_exception_hierarchy.md), customizing connections with [Transport Adapters](07_transport_adapters.md), and reacting to events with the Hook System. You now have a solid foundation for using `requests` effectively in your Python projects. Happy requesting!
# Tutorial: Requests

Requests is a Python library that makes sending *HTTP requests* incredibly simple.
Instead of dealing with complex details, you can use straightforward functions (like `requests.get()`) or **Session objects** to interact with web services.
It automatically handles things like *cookies*, *redirects*, *authentication*, and connection pooling, returning easy-to-use **Response objects** with all the server's data.

**Source Repository:** [https://github.com/psf/requests/tree/0e322af87745eff34caffe4df68456ebc20d9068/src/requests](https://github.com/psf/requests/tree/0e322af87745eff34caffe4df68456ebc20d9068/src/requests)

```mermaid
flowchart TD
    A0["Request & Response Models"]
    A1["Session"]
    A2["Transport Adapters"]
    A3["Functional API"]
    A4["Authentication Handlers"]
    A5["Cookie Jar"]
    A6["Exception Hierarchy"]
    A7["Hook System"]
    A3 -- "Uses temporary" --> A1
    A1 -- "Prepares/Receives" --> A0
    A1 -- "Manages & Uses" --> A2
    A1 -- "Manages" --> A5
    A1 -- "Manages" --> A4
    A1 -- "Manages" --> A7
    A2 -- "Sends/Builds" --> A0
    A4 -- "Modifies (adds headers)" --> A0
    A5 -- "Populates/Reads" --> A0
    A7 -- "Operates on" --> A0
    A0 -- "Can Raise (raise_for_status)" --> A6
    A2 -- "Raises Connection Errors" --> A6
```

## Chapters

1. [Functional API](01_functional_api.md)
2. [Request & Response Models](02_request___response_models.md)
3. [Session](03_session.md)
4. [Cookie Jar](04_cookie_jar.md)
5. [Authentication Handlers](05_authentication_handlers.md)
6. [Exception Hierarchy](06_exception_hierarchy.md)
7. [Transport Adapters](07_transport_adapters.md)
8. [Hook System](08_hook_system.md)