init push

This commit is contained in:
zachary62
2025-04-04 13:03:54 -04:00
parent e62ee2cb13
commit 2ebad5e5f2
160 changed files with 2 additions and 0 deletions

# Chapter 1: The Simplest Way - The Functional API
Welcome to the world of `Requests`! If you need to get information from a website or interact with a web service using Python, `Requests` is your friendly helper.
Imagine you just want to quickly grab the content of a webpage, maybe check the latest news headlines from a site, or send a simple piece of data to an online service. How do you do that without getting bogged down in complex details?
That's where the **Functional API** of `Requests` comes in. It's the most straightforward way to start making web requests.
## What's the Functional API?
Think of the Functional API as a set of handy, ready-to-use tools right at the top level of the `requests` library. You don't need to set anything up; you just call a function like `requests.get()` to fetch data or `requests.post()` to send data.
**Analogy:** Ordering Takeout 🍕
Using the Functional API is like using a generic food delivery app (like DoorDash or Uber Eats) to order a pizza from a place you've never ordered from before.
1. You open the app ( `import requests`).
2. You find the pizza place and tap "Order" (`requests.get('pizza_place_url')`).
3. The app handles finding a driver, sending them to the restaurant, picking up the pizza, and delivering it to you (Requests does all the connection and fetching work).
4. You get your pizza (`Response` object).
It's super convenient for a one-time order!
## Making Your First Request: `requests.get()`
The most common type of request is a `GET` request. It's what your web browser does every time you type a website address and hit Enter. It means "Please *get* me the content of this page."
Let's try it! First, make sure you have `requests` installed (`pip install requests`). Then, in your Python script or interactive session:
```python
import requests # Import the library
# The URL we want to get data from
url = 'https://httpbin.org/get' # A handy website for testing requests
# Use the functional API 'get' function
print(f"Fetching data from: {url}")
response = requests.get(url)
# Check if the request was successful (Status Code 200 means OK)
print(f"Status Code: {response.status_code}")
# Print the first 200 characters of the content we received
print("Response Content (first 200 chars):")
print(response.text[:200])
```
**What happened here?**
1. `import requests`: We told Python we want to use the `requests` library.
2. `response = requests.get(url)`: This is the core magic! We called the `get` function directly from the `requests` module, passing the URL we want to visit.
3. `requests` did all the work: connected to the server, sent the `GET` request, and received the server's reply.
4. The reply is stored in the `response` variable. This isn't just the text of the page; it's a special `Response` object containing lots of useful information. We'll explore this more in [Request & Response Models](02_request___response_models.md).
5. `response.status_code`: We checked the status code. `200` is the standard code for "Everything went okay!". Other codes might indicate errors (like `404 Not Found`).
6. `response.text`: We accessed the main content (usually HTML or JSON) returned by the server as a string.
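The functional API also accepts a `params` dictionary, which `requests` encodes into the URL's query string for you. Here's a small sketch (assuming `httpbin.org` is reachable; the parameter names are just examples):

```python
import requests

# Query parameters can be passed as a dict; requests URL-encodes them for you.
# httpbin.org/get echoes back the parameters it received under 'args'.
params = {'topic': 'news', 'page': '1'}
try:
    response = requests.get('https://httpbin.org/get', params=params, timeout=10)
    print(f"Final URL: {response.url}")   # e.g. ...?topic=news&page=1
    print(response.json()['args'])        # the server saw our parameters
except requests.exceptions.RequestException as exc:
    print(f"Request failed: {exc}")
```

This is much safer than gluing query strings together by hand, because `requests` handles the escaping of special characters for you.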
## Sending Data: `requests.post()`
Sometimes, instead of just getting data, you need to *send* data to a website. This is often done when submitting a form, logging in, or telling an API to perform an action. The `POST` method is commonly used for this.
The Functional API provides `requests.post()` for this purpose.
```python
import requests
# The URL we want to send data to
url = 'https://httpbin.org/post'
# The data we want to send (like form fields)
# We'll use a Python dictionary
payload = {'username': 'tutorial_user', 'action': 'learn_requests'}
print(f"Sending data to: {url}")
# Use the functional API 'post' function, passing the data
response = requests.post(url, data=payload)
# Check the status code
print(f"Status Code: {response.status_code}")
# The response often echoes back the data we sent
print("Response Content:")
print(response.text)
```
**What's new?**
1. `payload = {...}`: We created a Python dictionary to hold the data we want to send.
2. `response = requests.post(url, data=payload)`: We called `requests.post()`. Notice the second argument, `data=payload`. This tells `requests` to send our dictionary as form data in the body of the `POST` request.
3. The `response.text` from `httpbin.org/post` conveniently shows us the data it received, confirming our `payload` was sent correctly.
`Requests` also offers functions for other HTTP methods like `put`, `delete`, `head`, `patch`, and `options`, all working similarly: `requests.put(...)`, `requests.delete(...)`, etc.
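A quick sketch of those sibling helpers in action, again leaning on `httpbin.org` (the `/anything` echo endpoint and the timeout value are just choices for this demo):

```python
import requests

url = 'https://httpbin.org/anything'  # echoes back whatever method it receives
try:
    for method_func in (requests.put, requests.patch, requests.delete):
        response = method_func(url, timeout=10)
        # httpbin's /anything endpoint reports the HTTP method it saw
        print(method_func.__name__, '->', response.json()['method'])

    # HEAD asks for headers only, so the body comes back empty
    head_response = requests.head('https://httpbin.org/get', timeout=10)
    print('head ->', head_response.status_code, repr(head_response.text))
except requests.exceptions.RequestException as exc:
    print(f"Request failed: {exc}")
```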
## How It Works Under the Hood
You might wonder: if it's so simple, how does `requests.get()` actually connect to the internet and manage the request?
Every time you call one of these functional API methods (like `requests.get` or `requests.post`), `Requests` performs a few steps behind the scenes:
1. **Creates a temporary `Session` object:** Think of a `Session` as a more advanced way to manage requests, especially when you need to talk to the same website multiple times. We'll learn all about these in the [Session](03_session.md) chapter. For a functional API call, `requests` creates a *brand new, temporary* `Session` just for this single request.
2. **Uses the `Session`:** This temporary `Session` is then used to actually prepare and send your request (e.g., the `GET` to `https://httpbin.org/get`).
3. **Gets the `Response`:** The `Session` receives the reply from the server.
4. **Returns the `Response` to you:** The function gives you back the `Response` object.
5. **Discards the `Session`:** The temporary `Session` is immediately thrown away. It's gone.
**Analogy Revisited:** The generic delivery app (Functional API) contacts *a* driver (creates a temporary `Session`), tells them the restaurant and your order (sends the request), the driver delivers the food (returns the `Response`), and then the app forgets about that specific driver (discards the `Session`). If you order again 5 minutes later, it starts the whole process over with potentially a different driver.
Here's a simplified diagram of what happens when you call `requests.get()`:
```mermaid
sequenceDiagram
participant User as Your Code
participant FuncAPI as requests.get()
participant TempSession as Temporary Session
participant Server as Web Server
User->>FuncAPI: Call requests.get('url')
FuncAPI->>TempSession: Create new Session()
activate TempSession
TempSession->>Server: Make HTTP GET request to 'url'
activate Server
Server-->>TempSession: Send HTTP Response back
deactivate Server
TempSession-->>FuncAPI: Return Response object
FuncAPI-->>User: Return Response object
deactivate TempSession
Note right of FuncAPI: Temporary Session is discarded
```
You can see a glimpse of this in the `requests/api.py` code:
```python
# File: requests/api.py (Simplified view)

from . import sessions  # Where the Session logic lives

def request(method, url, **kwargs):
    """Internal function that handles all functional API calls."""
    # Creates a temporary Session just for this one call.
    # The 'with' statement ensures it's properly closed afterwards.
    with sessions.Session() as session:
        # The temporary session makes the actual request.
        return session.request(method=method, url=url, **kwargs)

def get(url, params=None, **kwargs):
    """Sends a GET request (functional API)."""
    # This is just a convenient shortcut that calls the main 'request' function.
    return request("get", url, params=params, **kwargs)

def post(url, data=None, json=None, **kwargs):
    """Sends a POST request (functional API)."""
    # Another shortcut calling the main 'request' function.
    return request("post", url, data=data, json=json, **kwargs)

# ... similar functions for put, delete, head, patch, options ...
```
Each function like `get`, `post`, etc., is just a simple wrapper that calls the main `request` function, which in turn creates and uses that temporary `Session`.
## When Is It Good? When Is It Not?
**Good For:**
* Simple, one-off requests.
* Quick scripts where performance isn't critical.
* Learning `Requests` - it's the easiest starting point!
**Not Ideal For:**
* **Multiple requests to the same website:** Creating and tearing down a connection and a `Session` for *every single request* is inefficient. It's like sending a separate delivery driver for each item you forgot from the grocery store.
* **Needing persistence:** If the website gives you a cookie (like after logging in) and you want to use it on your *next* request to that same site, the functional API won't remember it because the temporary `Session` (which holds cookies) is discarded after each call.
* **Fine-grained control:** If you need custom configurations, specific connection pooling, or advanced features, using a `Session` object directly offers more power.
## Conclusion
You've learned about the `Requests` Functional API: the simplest way to make web requests, using functions like `requests.get()` and `requests.post()`. It's perfect for quick tasks and getting started. You saw how it works by creating temporary `Session` objects behind the scenes.
While convenient for single shots, remember its limitations for performance and state persistence when dealing with multiple requests to the same site.
Now that you know how to *send* a basic request, what exactly do you get *back*? Let's explore the structure of the requests we send and the powerful `Response` object we receive.
**Next:** [Chapter 2: Request & Response Models](02_request___response_models.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 2: What Happens When You Order? Request & Response Models
In [Chapter 1: The Simplest Way - The Functional API](01_functional_api.md), we saw how easy it is to fetch a webpage or send data using simple functions like `requests.get()` and `requests.post()`. We also noticed that these functions return something called a `Response` object.
But what exactly *is* that `Response` object? And what happens behind the scenes when `requests` sends your request? Just like ordering food involves more than just shouting your order and getting a meal, web requests have structured steps and data carriers. Understanding these helps you use `requests` more effectively.
## Why Models? The Need for Structure
Imagine ordering takeout again. You don't just tell the restaurant "food!"; you give them specific details: "One large pepperoni pizza, delivery to 123 Main St." The restaurant then prepares exactly that and delivers it back to you with a receipt.
Web requests work similarly. You need to tell the server:
* *What* you want (the URL, like `/get` or `/post`).
* *How* you want to interact (the method, like `GET` or `POST`).
* *Any extra details* (like headers or data you're sending).
The server then replies with:
* *If it worked* (a status code, like `200 OK` or `404 Not Found`).
* *Information about the reply* (headers, like the content type).
* *The actual stuff* you asked for (the content, like HTML or JSON).
`Requests` uses special Python objects to hold all this information in an organized way. These are the **Request and Response Models**.
## The Main Characters: Request, PreparedRequest, and Response
Think of the process like ordering at a restaurant:
1. **`Request` Object (Your Order Slip):** This is your initial intention. It holds the basic details of the request you *want* to make: the URL, the method (`GET`, `POST`, etc.), any headers you want to add, and any data you want to send. You usually don't create this object directly when using the simple functional API, but `requests` does it for you internally.
* *Analogy:* You write down "Large Pizza, Pepperoni, Extra Cheese" on an order slip.
2. **`PreparedRequest` Object (The Prepared Tray):** This is the finalized, ready-to-go version of your request. `Requests` takes the initial `Request` object, processes it (encodes data, applies cookies, adds default headers like `User-Agent`), and gets it ready to be sent over the network. It contains the *exact* bytes and final details. This is mostly an internal step.
* *Analogy:* The kitchen takes your slip, makes the pizza, puts it in a box, adds napkins and maybe a drink, and puts it all on a tray ready for the delivery driver.
3. **`Response` Object (The Delivered Meal):** This object represents the server's reply *after* the `PreparedRequest` has been sent and the server has responded. It contains everything the server sent back: the status code (Did the order succeed?), the response headers (What kind of food is this? How was it packaged?), and the actual content (The pizza itself!). This is the object you usually work with directly.
* *Analogy:* The delivery driver hands you the tray with the pizza and receipt. You check the receipt (`status_code`, `headers`) and eat the pizza (`content`).
Most of the time, you'll interact primarily with the `Response` object. But knowing about `Request` and `PreparedRequest` helps understand what `requests` is doing for you.
## Working with the `Response` Object
Let's revisit our `requests.get()` example from Chapter 1 and see what useful things are inside the `response` object we get back.
```python
import requests

url = 'https://httpbin.org/get'
print(f"Fetching data from: {url}")
response = requests.get(url)

# --- Exploring the Response Object ---

# 1. Status Code: Was it successful?
print(f"\nStatus Code: {response.status_code}")  # A number like 200 (OK) or 404 (Not Found)
print(f"Was it successful (status < 400)? {response.ok}")  # A boolean True/False

# 2. Response Headers: Information *about* the response
print(f"\nResponse Headers (Content-Type): {response.headers['Content-Type']}")
# Headers are like a dictionary (case-insensitive)
print("All Headers:")
for key, value in response.headers.items():
    print(f"  {key}: {value}")

# 3. Response Content (Body): The actual data!
# - As text (decoded using guessed encoding):
print("\nResponse Text (first 100 chars):")
print(response.text[:100])
# - As raw bytes (useful for non-text like images):
print("\nResponse Content (bytes, first 20):")
print(response.content[:20])

# 4. JSON Helper: If the content is JSON
json_url = 'https://httpbin.org/json'
print(f"\nFetching JSON from: {json_url}")
json_response = requests.get(json_url)
if json_response.ok and 'application/json' in json_response.headers.get('Content-Type', ''):
    try:
        data = json_response.json()  # Decodes JSON into a Python dict/list
        print("Decoded JSON data:")
        print(data)
        print(f"Value of 'title': {data['slideshow']['title']}")
    except requests.exceptions.JSONDecodeError:
        print("Response was not valid JSON.")
```
**What we learned from the `Response`:**
1. **`response.status_code`**: A standard HTTP status code number. `200` means "OK". `404` means "Not Found". Many others exist.
2. **`response.ok`**: A quick boolean check. `True` if the status code is less than 400 (meaning success or redirect), `False` for errors (4xx or 5xx codes).
3. **`response.headers`**: A dictionary-like object holding the response headers sent by the server (like `Content-Type`, `Date`, `Server`). It's case-insensitive, so `response.headers['content-type']` works too.
4. **`response.text`**: The response body decoded into a string. `Requests` tries to guess the correct text encoding based on headers, or falls back to a guess based on the content itself. Good for HTML, plain text, etc.
5. **`response.content`**: The response body as raw bytes, exactly as received from the server. Use this for images, downloads, or when you need precise control over decoding.
6. **`response.json()`**: A convenient method that tries to parse the `response.text` as JSON and returns a Python dictionary or list. It raises an error if the content isn't valid JSON.
The `Response` object neatly packages all the server's reply information for you to use.
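One more helper worth knowing is `response.raise_for_status()`, which turns 4xx/5xx status codes into exceptions. As an offline illustration, we can construct a bare `Response` by hand (something `requests` normally does for you; the URL here is made up) and watch `.ok` and `raise_for_status()` react:

```python
import requests

# For illustration only: build a bare Response ourselves so we can see
# how .ok and .raise_for_status() behave without hitting the network.
response = requests.Response()
response.status_code = 404
response.reason = 'NOT FOUND'
response.url = 'https://example.com/missing'

print(response.ok)  # False: 404 is an error code

try:
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx codes
except requests.exceptions.HTTPError as exc:
    print(f"Caught HTTPError: {exc}")

response.status_code = 200
response.reason = 'OK'
print(response.ok)           # True
response.raise_for_status()  # No exception for 2xx
```

In real code you'd call `response.raise_for_status()` right after a request when you want failures to be loud instead of silently carrying an error page around.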
## How It Works Internally: From Request to Response
When you call `requests.get(url)`, the following happens under the hood (simplified):
1. **Create `Request`:** `Requests` creates a `Request` object containing the method (`'GET'`), the `url`, and any other arguments you provided (like `headers` or `params`). (See `requests/sessions.py` `request` method which creates a `models.Request`)
2. **Prepare `Request`:** This `Request` object is then passed to a preparation step. Here, it becomes a `PreparedRequest`. This involves:
* Merging session-level settings (like default headers or cookies from a [Session](03_session.md), which the functional API uses temporarily).
* Encoding parameters (`params`).
* Encoding the body (`data` or `json`).
* Handling authentication (`auth`).
* Adding standard headers (like `User-Agent`, `Accept-Encoding`).
* Resolving the final URL.
(See `requests/sessions.py` `prepare_request` method which calls `PreparedRequest.prepare` in `requests/models.py`)
3. **Send `PreparedRequest`:** The `PreparedRequest`, now containing the exact bytes and headers, is handed off to a **Transport Adapter** (we'll cover these in [Transport Adapters](07_transport_adapters.md)). The adapter handles the actual network communication (opening connections, sending bytes, dealing with HTTP/HTTPS specifics). (See `requests/sessions.py` `send` method which calls `adapter.send` in `requests/adapters.py`)
4. **Receive Reply:** The Transport Adapter waits for the server's reply (status line, headers, body).
5. **Build `Response`:** The adapter takes the raw reply data and uses it to build the `Response` object you receive. It parses the status code, headers, and makes the raw content available. (See `requests/adapters.py` `build_response` method which creates a `models.Response`)
6. **Return `Response`:** The `send` method returns the fully formed `Response` object back to your code.
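You can retrace steps 1 and 2 yourself, because building and preparing a request is purely local; only `Session.send()` would touch the network. A small sketch (the URL and the `X-Demo` header are just examples for this demo):

```python
import requests

# Step 1: a user-level Request object -- just our stated intent.
req = requests.Request(
    method='POST',
    url='https://httpbin.org/post',
    data={'username': 'tutorial_user'},
    headers={'X-Demo': 'yes'},
)

with requests.Session() as session:
    # Step 2: the Session turns it into a PreparedRequest --
    # encoding the body, finalizing headers, merging session state.
    prepared = session.prepare_request(req)

    print(prepared.method)                   # 'POST'
    print(prepared.url)                      # the final URL
    print(prepared.body)                     # 'username=tutorial_user'
    print(prepared.headers['X-Demo'])        # our header survived
    print(prepared.headers['Content-Type'])  # added during preparation

    # Steps 3-6 would be: response = session.send(prepared)
```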
Here's a diagram showing the journey:
```mermaid
sequenceDiagram
participant UserCode as Your Code (e.g., requests.get)
participant Session as requests Session (Temporary or Explicit)
participant PrepReq as PreparedRequest
participant Adapter as Transport Adapter
participant Server as Web Server
participant Resp as Response
UserCode->>Session: Call get(url) / post(url, data=...)
Session->>Session: Create models.Request object
Session->>PrepReq: prepare_request(request) -> PreparedRequest
Note over PrepReq: Encodes data, adds headers, cookies etc.
Session->>Adapter: send(prepared_request)
Adapter->>Server: Send HTTP Request bytes
Server-->>Adapter: Send HTTP Response bytes
Adapter->>Resp: build_response(raw_reply) -> Response
Resp-->>Adapter: Return Response
Adapter-->>Session: Return Response
Session-->>UserCode: Return Response
```
You can see the definitions for these objects in `requests/models.py`:
```python
# File: requests/models.py (Highly Simplified)

class Request:
    """A user-created Request object. Used to prepare a PreparedRequest."""

    def __init__(self, method=None, url=None, headers=None, files=None,
                 data=None, params=None, auth=None, cookies=None, hooks=None, json=None):
        self.method = method
        self.url = url
        # ... other attributes ...

    def prepare(self):
        """Constructs a PreparedRequest for transmission."""
        p = PreparedRequest()
        p.prepare(
            method=self.method,
            url=self.url,
            # ... pass other attributes ...
        )
        return p

class PreparedRequest:
    """The fully mutable PreparedRequest object, containing the exact bytes
    that will be sent to the server."""

    def __init__(self):
        self.method = None
        self.url = None
        self.headers = None
        self.body = None
        # ... other attributes ...

    def prepare(self, method=None, url=None, headers=None, files=None, data=None,
                params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request."""
        # ... Logic to encode data, set headers, handle auth, etc. ...
        self.method = method
        self.url = ...      # the processed URL
        self.headers = ...  # the final headers
        self.body = ...     # the encoded body bytes or stream
        # ...

class Response:
    """Contains a server's response to an HTTP request."""

    def __init__(self):
        self._content = False  # Content hasn't been read yet
        self.status_code = None
        self.headers = CaseInsensitiveDict()  # Special dictionary for headers
        self.raw = None  # The raw stream from the network connection
        self.url = None
        self.encoding = None
        self.history = []  # List of redirects
        self.reason = None  # Text reason, e.g., "OK"
        self.cookies = cookiejar_from_dict({})
        self.elapsed = datetime.timedelta(0)  # Time taken
        self.request = None  # The PreparedRequest that led to this response

    @property
    def content(self):
        """Content of the response, in bytes."""
        # ... logic to read from self.raw if not already read ...
        return self._content

    @property
    def text(self):
        """Content of the response, in unicode."""
        # ... logic to decode self.content using self.encoding or a guessed encoding ...
        return decoded_string

    def json(self, **kwargs):
        """Returns the json-encoded content of a response, if any."""
        # ... logic to parse self.text as JSON ...
        return python_object

    # ... other properties like .ok, .is_redirect, and methods like .raise_for_status() ...
```
Understanding these models gives you a clearer picture of how `requests` turns your simple function call into a network operation and packages the result neatly for you.
## Conclusion
You've learned about the core data carriers in `Requests`:
* `Request`: Your initial intent.
* `PreparedRequest`: The finalized request ready for sending.
* `Response`: The server's reply, containing status, headers, and content.
While you mostly interact with the `Response` object after making a request, knowing about the `Request` and `PreparedRequest` helps demystify the process. You saw how to access useful attributes of the `Response` like `status_code`, `headers`, `text`, `content`, and the handy `json()` method.
In Chapter 1, we noted that the functional API creates a temporary setup for each request. This is simple but inefficient if you need to talk to the same website multiple times, perhaps needing to maintain login status or custom settings. How can we do that better?
**Next:** [Chapter 3: Remembering Things - The Session Object](03_session.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 3: Remembering Things - The Session Object
In [Chapter 1](01_functional_api.md), we learned the easiest way to make web requests using functions like `requests.get()`. In [Chapter 2](02_request___response_models.md), we looked at the `Request` and `Response` objects that structure our communication with web servers.
We also saw that the simple functional API methods like `requests.get()` are great for single, one-off requests. But what if you need to talk to the *same website* multiple times? For example, maybe you need to:
1. Log in to a website (which gives you a "session cookie" to prove you're logged in).
2. Make several requests to access different pages that *require* you to be logged in (using that cookie).
If you use `requests.get()` for each step, you'll have a problem. Remember how `requests.get()` creates a *temporary* setup for each call and then throws it away? This means it forgets the login cookie immediately after the login request! Your next request will be like visiting the site as a brand new, logged-out user.
How can we make `Requests` remember things between requests, just like your web browser does when you navigate around a logged-in site?
## Meet the `Session` Object: Your Persistent Browser Tab
This is where the `requests.Session` object comes in!
Think of a `Session` object as a dedicated browser tab you've opened just for interacting with a specific website or web service. What does a browser tab do?
* **Remembers Cookies:** If you log in on a website in one tab, that tab remembers your login cookie. When you click a link *within that same tab*, the browser automatically sends the cookie back, keeping you logged in.
* **Keeps Connections Warm:** Your browser often keeps the underlying network connection (TCP connection) to the website open for a little while. This makes clicking links and loading subsequent pages much faster because it doesn't have to establish a new connection every single time. This is called **connection pooling**.
* **Applies Consistent Settings:** You might have browser extensions that add specific headers to your requests, or your browser sends a consistent "User-Agent" string identifying itself.
A `requests.Session` object does all of these things for your Python script:
1. **Cookie Persistence:** It automatically stores cookies sent by the server and sends them back on subsequent requests to the same domain.
2. **Connection Pooling:** It reuses the underlying TCP connections for requests to the same host, significantly speeding up multiple requests. This is managed by components called [Transport Adapters](07_transport_adapters.md).
3. **Default Data:** You can set default headers, authentication details, query parameters, or proxy settings directly on the `Session` object, and they will be applied to all requests made through that session.
## Using a `Session`
Using a `Session` is almost as easy as using the functional API. Instead of calling `requests.get()`, you first create a `Session` object, and then call methods like `get()` or `post()` on *that object*.
```python
import requests
# 1. Create a Session object
s = requests.Session()
# Let's try accessing a page that requires a login (we're not logged in yet)
login_required_url = 'https://httpbin.org/cookies' # This page shows cookies sent to it
print("Trying to access protected page without login...")
response1 = s.get(login_required_url)
print("Cookies sent (should be none):", response1.json()) # httpbin returns JSON
# Now, let's simulate 'logging in' by visiting a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/sessioncookie/123456789'
print("\nSimulating login by getting a cookie...")
response2 = s.get(cookie_setter_url)
# The session automatically stored the cookie! Check the session's cookie jar:
print("Session cookies after setting:", s.cookies.get_dict())
# Now, try accessing the 'protected' page again using the SAME session
print("\nTrying to access protected page AGAIN with the session...")
response3 = s.get(login_required_url)
print("Cookies sent (should have sessioncookie):", response3.json())
# Compare with using the functional API (which forgets cookies)
print("\nTrying the same with functional API (will fail)...")
response4 = requests.get(cookie_setter_url) # Gets cookie, but immediately forgets
response5 = requests.get(login_required_url)
print("Cookies sent via functional API (should be none):", response5.json())
```
**What happened here?**
1. `s = requests.Session()`: We created our "persistent browser tab".
2. `response1 = s.get(login_required_url)`: Our first request sent no cookies, as expected.
3. `response2 = s.get(cookie_setter_url)`: We visited a URL designed to send back a `Set-Cookie` header. The `Session` object automatically noticed this and stored the `sessioncookie` in its internal [Cookie Jar](04_cookie_jar.md).
4. `s.cookies.get_dict()`: We peeked inside the session's cookie storage and saw the cookie was indeed saved.
5. `response3 = s.get(login_required_url)`: We made *another* request using the *same* session `s`. This time, the session automatically included the `sessioncookie` in the request headers. The server received it!
6. The last part shows that if we used `requests.get()` instead, the cookie from `response4` would be lost, and `response5` would fail to send it. The `Session` was crucial for remembering the cookie.
## Persistent Settings: Headers, Auth, etc.
Besides cookies, you can set other things on the `Session` that will apply to all its requests.
```python
import requests
import os # To get environment variables for auth example
s = requests.Session()
# Set a default header for all requests made by this session
s.headers.update({'X-My-Custom-Header': 'HelloSession'})
# Set default authentication (using basic auth from environment variables for example)
# NOTE: Replace with actual username/password or use httpbin's basic-auth endpoint
# For httpbin, the user/pass is 'user'/'pass'
# s.auth = ('user', 'passwd') # Set directly if needed
httpbin_user = os.environ.get("HTTPBIN_USER", "testuser") # Fake user if not set
httpbin_pass = os.environ.get("HTTPBIN_PASS", "testpass") # Fake pass if not set
s.auth = (httpbin_user, httpbin_pass)
# Set default query parameters
s.params.update({'session_param': 'persistent'})
# Now make a request
url = 'https://httpbin.org/get' # Changed endpoint to see params
print(f"Making request with persistent session settings to: {url}")
response = s.get(url)
print(f"\nStatus Code: {response.status_code}")
# Check the response (httpbin.org/get echoes back request details)
response_data = response.json()
print("\nHeaders sent (look for X-My-Custom-Header):")
print(response_data['headers'])
# print("\nAuth info sent (if using httpbin basic-auth):")
# print(response_data.get('authenticated'), response_data.get('user')) # Won't show here for /get
print("\nQuery parameters sent (look for session_param):")
print(response_data['args'])
# Make another request to a different endpoint using the same session
headers_url = 'https://httpbin.org/headers'
print(f"\nMaking request to {headers_url}...")
response_headers = s.get(headers_url)
print("Headers received by second request (still has custom header):")
print(response_headers.json()['headers'])
```
**What we see:**
* The `X-My-Custom-Header` we set on `s.headers` was automatically added to both requests.
* The `session_param` we added to `s.params` was included in the query string of the first request.
* If we had used a real authentication endpoint, the `s.auth` details would have been used automatically.
* We didn't have to specify these details on each `s.get()` call! The `Session` handled it.
## Using Sessions with `with` (Context Manager)
Sessions manage resources like network connections. It's good practice to explicitly close them when you're done. The easiest way to ensure this happens is to use the `Session` as a context manager with the `with` statement.
```python
import requests

url = 'https://httpbin.org/cookies'

# Use the Session as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/contextcookie/abc')
    response = s.get(url)
    print("Cookies sent within 'with' block:", response.json())

# After the 'with' block, the session 's' is automatically closed.
# Making more requests on 's' afterwards is not recommended: its
# underlying connection pool has been cleaned up.
print("\nSession automatically closed after 'with' block.")
```
The `with` statement ensures that `s.close()` is called automatically at the end of the block, even if errors occur. This cleans up the underlying connections managed by the [Transport Adapters](07_transport_adapters.md).
## How It Works Internally
So, how does the `Session` actually achieve this persistence and efficiency?
1. **State Storage:** The `Session` object itself holds onto configuration like `headers`, `cookies` (in a [Cookie Jar](04_cookie_jar.md)), `auth`, `params`, etc.
2. **Request Preparation:** When you call a method like `s.get(url, headers=...)`, the `Session` takes your request details *and* its own stored settings and merges them together. It uses these merged settings to create the `PreparedRequest` object we saw in [Chapter 2](02_request___response_models.md). Session cookies and headers get added automatically during this step (`Session.prepare_request`).
3. **Transport Adapters & Pooling:** The `Session` doesn't directly handle network sockets. It delegates the sending of the `PreparedRequest` to a suitable **Transport Adapter** (usually `HTTPAdapter` for HTTP/HTTPS). Each `Session` typically keeps instances of these adapters. The *adapter* is responsible for managing the pool of underlying network connections (`urllib3`'s connection pool). When you make a request to `https://example.com`, the adapter checks if it already has an open, reusable connection to that host in its pool. If yes, it uses it (much faster!). If not, it creates a new one and potentially adds it to the pool for future reuse.
4. **Response Processing:** When the adapter receives the response, it builds the `Response` object. The `Session` then gets the `Response` back from the adapter. Crucially, it inspects the response headers (like `Set-Cookie`) and updates its own state (e.g., adds new cookies to its `Cookie Jar`).
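The merging in step 2 follows a simple rule (implemented by the `merge_setting` helper in `requests/sessions.py`): per-request values override session values, and a per-request value of `None` removes the session's key entirely. A simplified sketch of that rule:

```python
# Simplified sketch of the merge rule from requests/sessions.py (merge_setting):
# per-request values override session values; a value of None removes the key.
def merge_setting(request_setting, session_setting):
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    merged = dict(session_setting)      # session values form the base
    for key, value in dict(request_setting).items():
        if value is None:
            merged.pop(key, None)       # None means "drop this session value"
        else:
            merged[key] = value         # per-request value wins
    return merged

session_headers = {'User-Agent': 'my-app/1.0', 'Accept': '*/*'}
request_headers = {'Accept': 'application/json', 'User-Agent': None}
print(merge_setting(request_headers, session_headers))
# {'Accept': 'application/json'}
```

(The real helper also normalizes both sides into a `CaseInsensitiveDict` for headers; that detail is omitted here.)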
Here's a simplified diagram showing two requests using a `Session`:
```mermaid
sequenceDiagram
participant User as Your Code
participant Sess as Session Object
participant PrepReq as PreparedRequest
participant Adapter as Transport Adapter (holds connection pool)
participant Server as Web Server
User->>Sess: Create Session()
User->>Sess: s.get(url1, headers={'User-Header': 'A'})
Sess->>Sess: Merge s.headers, s.cookies, s.auth... with User's headers/data
Sess->>PrepReq: prepare_request(merged_settings)
Sess->>Adapter: send(prepared_request)
Adapter->>Adapter: Get connection from pool (or create new)
Adapter->>Server: Send HTTP Request 1 (with session+user headers, session cookies)
Server-->>Adapter: Send HTTP Response 1 (sets cookie 'C')
Adapter->>Sess: Return Response 1
Sess->>Sess: Extract cookie 'C' into s.cookies
Sess-->>User: Return Response 1
User->>Sess: s.get(url2)
Sess->>Sess: Merge s.headers, s.cookies ('C'), s.auth...
Sess->>PrepReq: prepare_request(merged_settings)
Sess->>Adapter: send(prepared_request)
Adapter->>Adapter: Get REUSED connection from pool
Adapter->>Server: Send HTTP Request 2 (with session headers, cookie 'C')
Server-->>Adapter: Send HTTP Response 2
Adapter->>Sess: Return Response 2
Sess-->>User: Return Response 2
```
You can see the core logic in `requests/sessions.py`. The `Session.request` method orchestrates the process:
```python
# File: requests/sessions.py (Simplified View)
# [...] imports and helper functions
class Session(SessionRedirectMixin):
def __init__(self):
# Stores persistent headers, cookies, auth, etc.
self.headers = default_headers()
self.cookies = cookiejar_from_dict({})
self.auth = None
self.params = {}
# [...] other defaults like verify, proxies, max_redirects
self.adapters = OrderedDict() # Holds Transport Adapters
self.mount('https://', HTTPAdapter()) # Default adapter for HTTPS
self.mount('http://', HTTPAdapter()) # Default adapter for HTTP
def prepare_request(self, request):
"""Prepares a Request object with Session settings."""
p = PreparedRequest()
# MERGE session settings with request settings
merged_cookies = merge_cookies(RequestsCookieJar(), self.cookies)
if request.cookies:
merged_cookies = merge_cookies(merged_cookies, cookiejar_from_dict(request.cookies))
merged_headers = merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict)
merged_params = merge_setting(request.params, self.params)
merged_auth = merge_setting(request.auth, self.auth)
# [...] merge other settings like hooks
p.prepare(
method=request.method.upper(),
url=request.url,
headers=merged_headers,
files=request.files,
data=request.data,
json=request.json,
params=merged_params,
auth=merged_auth,
cookies=merged_cookies, # Pass merged cookies to PreparedRequest
hooks=merge_hooks(request.hooks, self.hooks),
)
return p
def request(self, method, url, **kwargs):
"""Constructs a Request, prepares it, sends it."""
# Create the initial Request object from user args
req = Request(method=method.upper(), url=url, **kwargs) # Simplified
# Prepare the request, merging session state
prep = self.prepare_request(req)
# Get environment settings (proxies, verify, cert) merged with session settings
proxies = kwargs.get('proxies') or {}
settings = self.merge_environment_settings(prep.url, proxies,
kwargs.get('stream'),
kwargs.get('verify'),
kwargs.get('cert'))
send_kwargs = {'timeout': kwargs.get('timeout'),
'allow_redirects': kwargs.get('allow_redirects', True)}
send_kwargs.update(settings)
# Send the prepared request using the appropriate adapter
resp = self.send(prep, **send_kwargs)
return resp
def send(self, request, **kwargs):
"""Sends a PreparedRequest object."""
# [...] set default kwargs if needed
# Get the right adapter (e.g., HTTPAdapter) based on URL
adapter = self.get_adapter(url=request.url)
# The adapter sends the request (using connection pooling)
r = adapter.send(request, **kwargs)
# [...] response hook processing
# IMPORTANT: Extract cookies from the response and store them in the session's cookie jar
extract_cookies_to_jar(self.cookies, request, r.raw)
# [...] redirect handling (which also extracts cookies)
return r
def get_adapter(self, url):
"""Finds the Transport Adapter for the URL (e.g., HTTPAdapter)."""
# ... loops through self.adapters ...
# Simplified: return self.adapters['http://'] or self.adapters['https://']
for prefix, adapter in self.adapters.items():
if url.lower().startswith(prefix.lower()):
return adapter
raise InvalidSchema(f"No connection adapters were found for {url!r}")
def mount(self, prefix, adapter):
"""Attaches a Transport Adapter to handle URLs starting with 'prefix'."""
self.adapters[prefix] = adapter
# [...] sort adapters by prefix length
def close(self):
"""Closes the session and all its adapters (and connections)."""
for adapter in self.adapters.values():
adapter.close()
# [...] other methods like get(), post(), put(), delete() which call self.request()
# [...] redirect handling logic in SessionRedirectMixin
```
The key takeaways are:
* The `Session` object holds the state (`headers`, `cookies`, `auth`).
* `prepare_request` merges this state with the details of the specific request you're making.
* `send` uses a `Transport Adapter` (like `HTTPAdapter`) which handles the actual network communication and connection pooling.
* After a response is received, `send` (and the redirection logic) updates the `Session`'s cookies.
## Conclusion
You've learned about the `requests.Session` object, a powerful tool for making multiple requests to the same host efficiently. You saw how it automatically handles **cookie persistence** and provides significant performance benefits through **connection pooling** (via [Transport Adapters](07_transport_adapters.md)). You also learned how to set persistent `headers`, `auth`, and other settings on a session. Using a `Session` is the recommended approach when your script needs to interact with a website more than once.
We mentioned that the `Session` stores cookies in a "Cookie Jar". What exactly is that, and can we interact with it more directly? Let's find out.
**Next:** [Chapter 4: The Cookie Jar](04_cookie_jar.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
# Chapter 4: The Cookie Jar - Remembering Website Visits
In [Chapter 3: Remembering Things - The Session Object](03_session.md), we saw how `Session` objects are super useful for making multiple requests to the same website. A big reason they work so well is that they automatically remember **cookies** sent by the server, just like your web browser does.
But *how* does a `Session` remember these cookies? Where does it keep them? Welcome to the **Cookie Jar**!
## What's the Problem? Staying Logged In
Imagine you log in to a website. The website usually sends back a special piece of information called a **cookie**. This cookie is like a temporary ID card. When you visit other pages on that *same* website, your browser automatically shows this ID card (sends the cookie back) so the website knows you're still logged in.
If you used the simple `requests.get()` function from [Chapter 1](01_functional_api.md) for each step, it would forget the ID card immediately after logging in. Your next request would be treated as if you were a stranger.
`Session` objects solve this by using a **Cookie Jar** to hold onto those ID cards (cookies) for you.
## What are Cookies (Briefly)?
Think of cookies as little notes or name tags that websites give to your browser (or your `requests` script).
* **Website:** "Hi, you just logged in. Here's a name tag that says 'User123'." (Sends a `Set-Cookie` header)
* **Your Browser / Session:** "Okay, I'll keep this 'User123' tag." (Stores the cookie)
* **You:** (Click on another page on the same website)
* **Your Browser / Session:** "Hi website, I'd like this page. By the way, here's my name tag: 'User123'." (Sends a `Cookie` header)
* **Website:** "Ah, User123, I remember you. Here's the page you asked for."
Cookies are used to remember login status, user preferences, items in a shopping cart, etc., between different page visits.
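The `Set-Cookie` / `Cookie` exchange described above can be sketched with Python's standard-library `http.cookies` module. (This module only parses and serializes cookie headers — it is *not* what `requests` uses internally, which is `http.cookiejar` — but it makes the header round-trip concrete:)

```python
from http.cookies import SimpleCookie

# "Website" side: build the Set-Cookie header for the name tag
server = SimpleCookie()
server['user'] = 'User123'
server['user']['path'] = '/'
print(server.output())        # the Set-Cookie header the server would send

# "Browser/Session" side: parse the header and remember the tag
client = SimpleCookie()
client.load('user=User123; Path=/')
print(client['user'].value)   # User123 -- sent back on later requests
```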
## The Cookie Jar Analogy 🍪
`Requests` uses an object called a `RequestsCookieJar` to store and manage cookies. It's very much like the cookie jar you might have in your kitchen:
1. **Collects Cookies:** When a website sends you a cookie (like after you log in), the `Session` automatically puts it into its `Cookie Jar`.
2. **Stores Them Safely:** The jar keeps all the cookies collected from different websites (domains).
3. **Sends the Right Ones Back:** When you make *another* request to a website using the *same* `Session`, the `Session` looks into the `Cookie Jar`, finds any cookies that belong to that website's domain, and automatically sends them back.
This happens seamlessly when you use a `Session` object.
## Meet `RequestsCookieJar`
The specific object `requests` uses is `requests.cookies.RequestsCookieJar`. It's designed to work just like Python's standard `http.cookiejar.CookieJar` but adds some convenient features, like acting like a dictionary.
Every `Session` object has its own `Cookie Jar` accessible via the `s.cookies` attribute.
Let's see it in action, revisiting the example from Chapter 3:
```python
import requests
# Create a Session object (which has its own empty Cookie Jar)
s = requests.Session()
print(f"Initial session cookies: {s.cookies.get_dict()}")
# Visit a page that sets a cookie
cookie_setter_url = 'https://httpbin.org/cookies/set/fruit/apple'
print(f"\nVisiting {cookie_setter_url}...")
response1 = s.get(cookie_setter_url)
# Check the Session's Cookie Jar - it should have the cookie now!
print(f"Session cookies after setting: {s.cookies.get_dict()}")
# Visit another page on the same domain (httpbin.org)
cookie_viewer_url = 'https://httpbin.org/cookies'
print(f"\nVisiting {cookie_viewer_url}...")
response2 = s.get(cookie_viewer_url)
# This page shows the cookies it received. Let's see if our 'fruit' cookie was sent.
print("Cookies received by the server:")
print(response2.text) # httpbin.org/cookies returns JSON showing received cookies
```
**Output:**
```
Initial session cookies: {}
Visiting https://httpbin.org/cookies/set/fruit/apple...
Session cookies after setting: {'fruit': 'apple'}
Visiting https://httpbin.org/cookies...
Cookies received by the server:
{
"cookies": {
"fruit": "apple"
}
}
```
**Explanation:**
1. We started with an empty `Session` and an empty cookie jar (`{}`).
2. We visited `/cookies/set/fruit/apple`. The server sent back a `Set-Cookie: fruit=apple; Path=/` header.
3. The `Session` object `s` automatically saw this header and stored the `fruit=apple` cookie in its jar (`s.cookies`). We confirmed this by printing `s.cookies.get_dict()`.
4. We then visited `/cookies` using the *same session* `s`.
5. The `Session` automatically looked in `s.cookies`, found the `fruit` cookie (since it's for the `httpbin.org` domain), and added a `Cookie: fruit=apple` header to the request.
6. The server at `/cookies` received this header and echoed it back, confirming our cookie was sent!
The `Session` and its `Cookie Jar` handled the persistence automatically.
## Cookies in the Response
While the `Session` cookie jar (`s.cookies`) holds *all* cookies collected during the session's lifetime, the [Request & Response Models](02_request___response_models.md) also have a `cookies` attribute.
The `response.cookies` attribute (also a `RequestsCookieJar`) contains *only* the cookies that were set or updated by *that specific response*. It doesn't know about cookies from previous responses in the session.
```python
import requests
s = requests.Session()
url_set_a = 'https://httpbin.org/cookies/set/cookieA/valueA'
url_set_b = 'https://httpbin.org/cookies/set/cookieB/valueB'
print(f"Visiting {url_set_a}")
response_a = s.get(url_set_a)
print(f"Cookies SET by response A: {response_a.cookies.get_dict()}")
print(f"ALL session cookies after A: {s.cookies.get_dict()}")
print(f"\nVisiting {url_set_b}")
response_b = s.get(url_set_b)
print(f"Cookies SET by response B: {response_b.cookies.get_dict()}")
print(f"ALL session cookies after B: {s.cookies.get_dict()}")
```
**Output:**
```
Visiting https://httpbin.org/cookies/set/cookieA/valueA
Cookies SET by response A: {'cookieA': 'valueA'}
ALL session cookies after A: {'cookieA': 'valueA'}
Visiting https://httpbin.org/cookies/set/cookieB/valueB
Cookies SET by response B: {'cookieB': 'valueB'}
ALL session cookies after B: {'cookieA': 'valueA', 'cookieB': 'valueB'}
```
**Explanation:**
* `response_a.cookies` only contains `cookieA`, because that's the cookie set by *that specific response*.
* `s.cookies` contains `cookieA` after the first request.
* `response_b.cookies` only contains `cookieB`.
* `s.cookies` contains *both* `cookieA` and `cookieB` after the second request, because the `Session` accumulates cookies.
## Using the Cookie Jar Like a Dictionary
The `RequestsCookieJar` is extra friendly because you can treat it much like a Python dictionary to access or modify cookies directly.
```python
import requests
jar = requests.cookies.RequestsCookieJar()
# Set cookies using dictionary-like assignment or set()
jar.set('username', 'Nate', domain='httpbin.org', path='/')
jar['session_id'] = 'abcdef123' # Sets for default domain/path ('')
print(f"Jar contents: {jar.get_dict()}")
# Get cookies using dictionary-like access or get()
print(f"Username: {jar['username']}")
print(f"Session ID: {jar.get('session_id')}")
print(f"API Key (with default): {jar.get('api_key', default='NoKey')}")
# Iterate over cookies
print("\nIterating:")
for name, value in jar.items():
    print(f" - {name}: {value}")
# Delete a cookie
del jar['session_id']
print(f"\nJar after deleting session_id: {jar.get_dict()}")
```
**Output:**
```
Jar contents: {'session_id': 'abcdef123', 'username': 'Nate'}
Username: Nate
Session ID: abcdef123
API Key (with default): NoKey
Iterating:
- session_id: abcdef123
- username: Nate
Jar after deleting session_id: {'username': 'Nate'}
```
This makes it easy to manually inspect, add, or modify cookies if needed, although the `Session` usually handles the common cases automatically.
**Important Note:** Cookies often have specific `domain` and `path` attributes. If you have multiple cookies with the *same name* but for different domains or paths (e.g., `user=A` for `site1.com` and `user=B` for `site2.com`), using the simple dictionary access `jar['user']` might be ambiguous or raise an error. In such cases, use the `get()` or `set()` methods with the `domain` and `path` arguments for more precision:
```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('pref', 'dark', domain='example.com', path='/')
jar.set('pref', 'compact', domain='test.com', path='/')

# Get the specific cookie for example.com
pref_example = jar.get('pref', domain='example.com', path='/')
print(f"Pref for example.com: {pref_example}")

# Simple access is ambiguous here, since two 'pref' cookies match
# print(jar['pref'])  # Raises CookieConflictError
```
## How It Works Internally
How does the `Session` manage this cookie magic?
1. **Sending Request:** When you call `s.get(...)` or `s.post(...)`, the `Session.prepare_request` method is called.
* It creates a `PreparedRequest` object.
* It merges cookies from your request (`cookies=...`), the session (`self.cookies`), and potentially environment settings.
* It calls `get_cookie_header(merged_cookies, prepared_request)` (from `requests.cookies`). This function checks the cookie jar for cookies that match the request's domain and path.
* It generates the `Cookie` header string (e.g., `Cookie: fruit=apple; username=Nate`) and adds it to the `PreparedRequest.headers`.
* The request (with the `Cookie` header) is then sent via a [Transport Adapter](07_transport_adapters.md).
2. **Receiving Response:** When the [Transport Adapter](07_transport_adapters.md) receives the raw HTTP response from the server:
* It builds the `Response` object.
* The `Session.send` method (or redirection logic) gets this `Response`.
* It calls `extract_cookies_to_jar(self.cookies, request, response.raw)` (from `requests.cookies`). This function looks for `Set-Cookie` headers in the raw response.
* It parses any `Set-Cookie` headers and adds/updates the corresponding cookies in the `Session`'s cookie jar (`self.cookies`).
* The final `Response` object is returned to you.
Here's a simplified diagram focusing on the cookie flow:
```mermaid
sequenceDiagram
participant User as Your Code
participant Sess as Session Object
participant Jar as Cookie Jar (s.cookies)
participant Adapter as Transport Adapter
participant Server as Web Server
User->>Sess: s.get(url)
Sess->>Jar: get_cookie_header(url)
Jar-->>Sess: Return matching cookie header string (e.g., "fruit=apple")
Sess->>Adapter: send(request with 'Cookie' header)
Adapter->>Server: Send HTTP Request (with Cookie: fruit=apple)
Server-->>Adapter: Send HTTP Response (e.g., with Set-Cookie: new=cookie)
Adapter->>Sess: Return raw response
Sess->>Jar: extract_cookies_to_jar(raw response)
Jar->>Jar: Add/Update 'new=cookie'
Sess->>User: Return Response object
```
You can see parts of this logic in `requests/sessions.py` and `requests/cookies.py`:
```python
# File: requests/sessions.py (Simplified View)
from .cookies import extract_cookies_to_jar, merge_cookies, RequestsCookieJar, cookiejar_from_dict
from .models import PreparedRequest
from .utils import to_key_val_list
from .structures import CaseInsensitiveDict
class Session:
def __init__(self):
# ... other attributes ...
self.cookies = cookiejar_from_dict({}) # The Session's main Cookie Jar
def prepare_request(self, request):
# ... merge headers, params, auth ...
# Merge session cookies with request-specific cookies
merged_cookies = merge_cookies(
merge_cookies(RequestsCookieJar(), self.cookies),
cookiejar_from_dict(request.cookies or {})
)
p = PreparedRequest()
p.prepare(
# ... other args ...
cookies=merged_cookies, # Pass merged jar to PreparedRequest
)
return p
def send(self, request, **kwargs):
# ... prepare sending ...
adapter = self.get_adapter(url=request.url)
response = adapter.send(request, **kwargs) # Adapter gets raw response
# ... hooks ...
# EXTRACT cookies from the response and put them in the session jar!
extract_cookies_to_jar(self.cookies, request, response.raw)
# ... redirect handling (also extracts cookies) ...
return response
# --- File: requests/models.py (Simplified View) ---
from .cookies import get_cookie_header, _copy_cookie_jar, cookiejar_from_dict
class PreparedRequest:
def prepare_cookies(self, cookies):
# Store the jar potentially passed from Session.prepare_request
if isinstance(cookies, cookielib.CookieJar):
self._cookies = cookies
else:
self._cookies = cookiejar_from_dict(cookies)
# Generate the Cookie header string
cookie_header = get_cookie_header(self._cookies, self)
if cookie_header is not None:
self.headers['Cookie'] = cookie_header
class Response:
def __init__(self):
# ... other attributes ...
# This jar holds cookies SET by *this* response only
self.cookies = cookiejar_from_dict({})
# --- File: requests/cookies.py (Simplified View) ---
from http import cookiejar as cookielib  # Python 3 home of the old cookielib
from collections.abc import MutableMapping
class MockRequest: # Helper to adapt requests.Request for cookielib
# ... implementation ...
class MockResponse: # Helper to adapt response headers for cookielib
# ... implementation ...
def extract_cookies_to_jar(jar, request, response):
"""Extract Set-Cookie headers from response into jar."""
if not hasattr(response, '_original_response') or not response._original_response:
return # Need the underlying httplib response
req = MockRequest(request) # Adapt request for cookielib
res = MockResponse(response._original_response.msg) # Adapt headers for cookielib
jar.extract_cookies(res, req) # Use cookielib's extraction logic
def get_cookie_header(jar, request):
"""Generate the Cookie header string for the request."""
r = MockRequest(request)
jar.add_cookie_header(r) # Use cookielib to add the header to the mock request
return r.get_new_headers().get('Cookie') # Retrieve the generated header
class RequestsCookieJar(cookielib.CookieJar, MutableMapping):
# Dictionary-like methods (get, set, __getitem__, etc.)
def get(self, name, default=None, domain=None, path=None):
# ... find cookie, handle conflicts ...
pass
def set(self, name, value, **kwargs):
# ... create or update cookie ...
pass
# ... other dict methods ...
```
The key is that `Session.send` calls `extract_cookies_to_jar` after receiving a response, and `PreparedRequest.prepare_cookies` (called via `Session.prepare_request`) calls `get_cookie_header` before sending the next one.
## Conclusion
You've learned about the **Cookie Jar** (`RequestsCookieJar`), the mechanism `requests` (especially `Session` objects) uses to store and manage cookies. You saw:
* How `Session` objects automatically use their cookie jar (`s.cookies`) to persist cookies across requests.
* How `response.cookies` contains cookies set by a specific response.
* How to interact with a `RequestsCookieJar` using its dictionary-like interface.
* A glimpse into how `requests` extracts cookies from `Set-Cookie` headers and adds them back via the `Cookie` header.
Understanding the cookie jar helps explain how sessions maintain state and interact with websites that require logins or remember preferences.
Speaking of logging in, while cookies are often involved, sometimes websites require more explicit forms of identification, like usernames and passwords sent directly with the request. How does `requests` handle those?
**Next:** [Chapter 5: Authentication Handlers](05_authentication_handlers.md)
# Chapter 5: Authentication Handlers - Showing Your ID Card
In [Chapter 4: The Cookie Jar](04_cookie_jar.md), we learned how `requests` uses `Session` objects and cookie jars to automatically remember things like login cookies. This is great for websites that use cookies to manage sessions after you log in.
But what about websites or APIs that require you to prove who you are *every time* you make a request, or use different methods than cookies? For example, some services need a username and password sent directly with the request, not just a cookie.
## The Problem: Accessing Protected Resources
Imagine a website has a special members-only area. To access pages in this area, the server needs to know you're a valid member *right when you ask for the page*. It won't just let anyone in. It needs some form of identification, like a username and password.
How do we tell `requests` to include this identification with our request?
This is where **Authentication Handlers** come in.
## What are Authentication Handlers?
Think of authentication handlers as different types of **ID badges** you can attach to your web requests. Just like you might need a specific badge to get into different parts of a building, different web services might require different types of authentication.
`Requests` has built-in support for common types (schemes) of HTTP authentication, and you can even create your own custom badges.
**Common ID Badges (Authentication Schemes):**
1. **HTTP Basic Auth:** This is the simplest type. It's like a badge with your username and password written directly on it (encoded, but easily decoded). It's common but not very secure over plain HTTP (HTTPS makes it safer).
* `Requests` provides: A simple `(username, password)` tuple or the `HTTPBasicAuth` class.
2. **HTTP Digest Auth:** This is a bit more secure than Basic. Instead of sending your password directly, it involves a challenge-response process, like the server asking a secret question based on your password, and your request providing the answer. It's more complex but avoids sending the password openly.
* `Requests` provides: The `HTTPDigestAuth` class.
3. **Custom Auth:** Some services use unique authentication methods (like OAuth1, OAuth2, custom API keys).
* `Requests` allows you to create your own auth handlers by subclassing `AuthBase`. Many other libraries provide handlers for common schemes like OAuth.
When you provide authentication details to `requests`, it automatically figures out how to create and attach the correct `Authorization` header (or sometimes `Proxy-Authorization` for proxies) to your request. It's like pinning the right ID badge onto your request before sending it off.
## Using Authentication Handlers
The easiest way to add authentication is by using the `auth` parameter when making a request, either with the functional API or with a [Session](03_session.md) object.
### HTTP Basic Auth (The Easiest Way)
For Basic Auth, you can simply pass a tuple `(username, password)` to the `auth` argument.
Let's try accessing a test endpoint from `httpbin.org` that's protected with Basic Auth. The username is `testuser` and the password is `testpass`.
```python
import requests
# This URL requires Basic Auth with user='testuser', pass='testpass'
url = 'https://httpbin.org/basic-auth/testuser/testpass'
# Try without authentication first (should fail with 401 Unauthorized)
print("Attempting without authentication...")
response_fail = requests.get(url)
print(f"Status Code (fail): {response_fail.status_code}") # Expect 401
# Now, provide the username and password tuple to the 'auth' parameter
print("\nAttempting with Basic Auth tuple...")
try:
response_ok = requests.get(url, auth=('testuser', 'testpass'))
print(f"Status Code (ok): {response_ok.status_code}") # Expect 200
# Check the response content (httpbin echoes auth info)
print("Response JSON:")
print(response_ok.json())
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
```
**Output:**
```
Attempting without authentication...
Status Code (fail): 401
Attempting with Basic Auth tuple...
Status Code (ok): 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
```
**Explanation:**
1. The first request failed with `401 Unauthorized` because we didn't provide credentials.
2. In the second request, we added `auth=('testuser', 'testpass')`.
3. `Requests` automatically recognized this tuple, created the necessary `Authorization: Basic dGVzdHVzZXI6dGVzdHBhc3M=` header (where `dGVzdHVzZXI6dGVzdHBhc3M=` is the Base64 encoding of `testuser:testpass`), and added it to the request.
4. The server validated the credentials and granted access, returning a `200 OK` status. The response body confirms we were authenticated as `testuser`.
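You can reproduce that header value yourself with the standard-library `base64` module — this is essentially what `requests.auth.HTTPBasicAuth` does when it builds the `Authorization` header:

```python
import base64

username, password = 'testuser', 'testpass'

# Base64-encode "username:password" -- note this is encoding, NOT encryption,
# which is why Basic Auth should only be used over HTTPS.
token = base64.b64encode(f'{username}:{password}'.encode('ascii')).decode('ascii')
auth_header = f'Basic {token}'
print(auth_header)  # Basic dGVzdHVzZXI6dGVzdHBhc3M=
```

Anyone who intercepts this header can trivially decode it with `base64.b64decode`, which is why HTTPS matters for Basic Auth.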
### Using the `HTTPBasicAuth` Class
Passing a tuple is a shortcut specifically for Basic Auth. For clarity, or if you want to reuse the authentication details, you can use the `HTTPBasicAuth` class explicitly. It does exactly the same thing internally.
```python
import requests
from requests.auth import HTTPBasicAuth # Import the class
url = 'https://httpbin.org/basic-auth/testuser/testpass'
# Create an HTTPBasicAuth object
basic_auth = HTTPBasicAuth('testuser', 'testpass')
# Pass the auth object to the 'auth' parameter
print("Attempting with HTTPBasicAuth object...")
try:
response = requests.get(url, auth=basic_auth)
print(f"Status Code: {response.status_code}") # Expect 200
print("Response JSON:")
print(response.json())
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
```
**Output:**
```
Attempting with HTTPBasicAuth object...
Status Code: 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
```
This achieves the same result as the tuple, but `HTTPBasicAuth(user, pass)` is more explicit about the type of authentication being used.
### HTTP Digest Auth
Digest Auth is more complex, involving a challenge from the server. `Requests` handles this complexity for you with the `HTTPDigestAuth` class. You use it similarly to `HTTPBasicAuth`.
```python
import requests
from requests.auth import HTTPDigestAuth # Import the class
# httpbin has a digest auth endpoint
# user='testuser', pass='testpass'
url = 'https://httpbin.org/digest-auth/auth/testuser/testpass'
# Create an HTTPDigestAuth object
digest_auth = HTTPDigestAuth('testuser', 'testpass')
# Pass the auth object to the 'auth' parameter
print("Attempting with HTTPDigestAuth object...")
try:
response = requests.get(url, auth=digest_auth)
print(f"Status Code: {response.status_code}") # Expect 200
print("Response JSON:")
print(response.json())
# Note: It might take two requests internally for Digest Auth
print(f"Request History (if any): {response.history}")
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
```
**Output:**
```
Attempting with HTTPDigestAuth object...
Status Code: 200
Response JSON:
{'authenticated': True, 'user': 'testuser'}
Request History (if any): [<Response [401]>]
```
**Explanation:**
1. We used `HTTPDigestAuth` this time.
2. When `requests` first tries to access the URL, the server challenges it with a `401 Unauthorized` response containing details needed for Digest Auth (like a `nonce` and `realm`). You can see this `401` response in `response.history`.
3. The `HTTPDigestAuth` handler catches this `401`, uses the challenge information and your password to calculate the correct response, and automatically sends a *second* request with the proper `Authorization: Digest ...` header.
4. This second request succeeds, and you get the final `200 OK` response.
`Requests` handles the two-step process automatically when you use `HTTPDigestAuth`.
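The "answer" that `HTTPDigestAuth` computes follows RFC 2617. Here is a simplified sketch of the original construction (without the `qop` extension that modern servers usually require, and with a made-up `realm` and `nonce` standing in for values the real 401 challenge would supply):

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode('utf-8')).hexdigest()

# Values the server's 401 challenge would supply (made up for illustration)
realm, nonce = 'example-realm', 'abc123'
username, password = 'testuser', 'testpass'
method, uri = 'GET', '/digest-auth/auth/testuser/testpass'

ha1 = md5_hex(f'{username}:{realm}:{password}')  # hash of the identity
ha2 = md5_hex(f'{method}:{uri}')                 # hash of the request
digest_response = md5_hex(f'{ha1}:{nonce}:{ha2}')  # sent as response=... in the header
print(digest_response)
```

The password itself never travels over the wire — only this hash does, which is the point of the challenge-response design.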
### Persistent Authentication with Sessions
If you need to make multiple requests to the same server using the same authentication, it's much more efficient to set the authentication on a [Session](03_session.md) object. The session will then automatically apply the authentication to *all* requests made through it.
```python
import requests
from requests.auth import HTTPBasicAuth
basic_auth_url = 'https://httpbin.org/basic-auth/testuser/testpass'
headers_url = 'https://httpbin.org/headers' # Just to see headers sent
# Create a session
with requests.Session() as s:
# Set the authentication ONCE on the session
s.auth = HTTPBasicAuth('testuser', 'testpass')
# Or: s.auth = ('testuser', 'testpass')
# Make the first request (auth will be added automatically)
print("Making first request using session auth...")
response1 = s.get(basic_auth_url)
print(f"Status Code 1: {response1.status_code}")
# Make a second request to a different endpoint (auth will also be added)
# We use /headers to see the Authorization header being sent
print("\nMaking second request using session auth...")
response2 = s.get(headers_url)
print(f"Status Code 2: {response2.status_code}")
print("Headers sent in second request:")
# Look for the 'Authorization' header in the output
print(response2.json()['headers'])
```
**Output:**
```
Making first request using session auth...
Status Code 1: 200
Making second request using session auth...
Status Code 2: 200
Headers sent in second request:
{
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Authorization": "Basic dGVzdHVzZXI6dGVzdHBhc3M=", // <-- Auth header added automatically!
"Host": "httpbin.org",
"User-Agent": "python-requests/2.x.y",
"X-Amzn-Trace-Id": "Root=..."
}
```
By setting `s.auth = ...`, we ensured that *both* requests sent the `Authorization` header without needing to specify it in each `s.get()` call.
### Custom Authentication
What if a service uses a completely different way to authenticate? `Requests` allows you to create your own authentication handler by writing a class that inherits from `requests.auth.AuthBase` and implements the `__call__` method. This method receives the `PreparedRequest` object and should modify it (usually by adding headers) as needed.
```python
from requests.auth import AuthBase
class MyCustomApiKeyAuth(AuthBase):
"""Attaches a custom API Key header to the request."""
def __init__(self, api_key):
self.api_key = api_key
def __call__(self, r):
# 'r' is the PreparedRequest object
# Modify the request 'r' here. We'll add a header.
r.headers['X-API-Key'] = self.api_key
# We MUST return the modified request object
return r
# Usage:
# api_key = "YOUR_SECRET_API_KEY"
# response = requests.get(some_url, auth=MyCustomApiKeyAuth(api_key))
```
This is more advanced, but it shows the flexibility of the `requests` auth system. Many third-party libraries use this pattern to provide auth helpers for specific services (like OAuth).
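Because a custom handler only modifies the `PreparedRequest`, you can verify it locally without making any network call. Here's a quick sketch (the URL and API key below are placeholders, not a real service):

```python
import requests
from requests.auth import AuthBase

class MyCustomApiKeyAuth(AuthBase):
    """Attaches a hypothetical X-API-Key header to the request."""
    def __init__(self, api_key):
        self.api_key = api_key

    def __call__(self, r):
        r.headers['X-API-Key'] = self.api_key
        return r

# Preparing a request applies the auth handler; nothing is sent over the network.
req = requests.Request(
    'GET', 'https://example.com/data',
    auth=MyCustomApiKeyAuth('secret-123'),
).prepare()

print(req.headers['X-API-Key'])  # secret-123
```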
## How It Works Internally
How does `requests` take the `auth` parameter and turn it into the correct `Authorization` header?
1. **Preparation Step:** When you make a request (e.g., `requests.get(url, auth=...)` or `s.request(...)`), the `Request` object is turned into a `PreparedRequest` as we saw in [Chapter 2: Request & Response Models](02_request___response_models.md). Part of this preparation involves the `prepare_auth` method.
2. **Check Auth Type:** Inside `prepare_auth`, `requests` checks the `auth` parameter.
* If `auth` is a tuple `(user, pass)`, it automatically wraps it in an `HTTPBasicAuth(user, pass)` object.
* If `auth` is already an object (like `HTTPBasicAuth`, `HTTPDigestAuth`, or a custom one inheriting from `AuthBase`), it uses that object directly.
3. **Call the Auth Object:** All authentication handler objects (including the built-in ones) are **callable**. This means they have a `__call__` method. The `prepare_auth` step *calls* the auth object, passing the `PreparedRequest` object (`p`) to it: `auth(p)`.
4. **Modify the Request:** The `__call__` method of the auth object does the actual work.
* For `HTTPBasicAuth`, the `__call__` method calculates the `Basic base64(user:pass)` string and sets `p.headers['Authorization'] = ...`.
* For `HTTPDigestAuth`, the `__call__` method might initially set up hooks to handle the `401` challenge, or if it already has the necessary info (like a `nonce`), it calculates the `Digest ...` header and sets `p.headers['Authorization']`.
* For a custom auth object, its `__call__` method performs whatever modifications are needed (e.g., adding an `X-API-Key` header).
5. **Return Modified Request:** The `__call__` method *must* return the modified `PreparedRequest` object.
6. **Send Request:** The `PreparedRequest`, now potentially including an `Authorization` header, is sent to the server.
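Step 2 (the automatic tuple wrapping) is easy to confirm locally: preparing the same request with a plain tuple or an explicit `HTTPBasicAuth` produces an identical header. Placeholder credentials, no network involved:

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'https://example.com/'  # placeholder; nothing is actually sent

r1 = requests.Request('GET', url, auth=('user', 'pass')).prepare()
r2 = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'pass')).prepare()

# The tuple was wrapped in HTTPBasicAuth behind the scenes,
# so both forms produce the same Authorization header.
print(r1.headers['Authorization'])  # Basic dXNlcjpwYXNz
assert r1.headers['Authorization'] == r2.headers['Authorization']
```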
Here's a simplified sequence diagram for Basic Auth:
```mermaid
sequenceDiagram
participant UserCode as Your Code
participant ReqFunc as requests.get / Session.request
participant PrepReq as PreparedRequest
participant AuthObj as HTTPBasicAuth Instance
participant Server
UserCode->>ReqFunc: Call get(url, auth=('user', 'pass'))
ReqFunc->>PrepReq: Create PreparedRequest (p)
ReqFunc->>PrepReq: Call p.prepare_auth(auth=...)
Note over PrepReq: Detects tuple, creates HTTPBasicAuth('user', 'pass')
PrepReq->>AuthObj: Call auth_obj(p)
activate AuthObj
AuthObj->>AuthObj: Calculate 'Basic ...' string
AuthObj->>PrepReq: Set p.headers['Authorization'] = 'Basic ...'
AuthObj-->>PrepReq: Return modified p
deactivate AuthObj
PrepReq-->>ReqFunc: Return prepared request p
ReqFunc->>Server: Send HTTP Request (with Authorization header)
Server-->>ReqFunc: Send HTTP Response
ReqFunc-->>UserCode: Return Response
```
Let's look at the simplified code in `requests/auth.py` for `HTTPBasicAuth`:
```python
# File: requests/auth.py (Simplified)
from base64 import b64encode
from ._internal_utils import to_native_string
def _basic_auth_str(username, password):
"""Returns a Basic Auth string."""
    # Encode str inputs to bytes (HTTP headers use latin-1)
    if isinstance(username, str):
        username = username.encode("latin1")
    if isinstance(password, str):
        password = password.encode("latin1")
    auth_bytes = b":".join((username, password))
auth_b64 = b64encode(auth_bytes).strip()
# Return native string (str in Py3) e.g., "Basic dXNlcjpwYXNz"
return "Basic " + to_native_string(auth_b64)
class AuthBase:
"""Base class that all auth implementations derive from"""
def __call__(self, r):
# This method MUST be overridden by subclasses
raise NotImplementedError("Auth hooks must be callable.")
class HTTPBasicAuth(AuthBase):
"""Attaches HTTP Basic Authentication to the given Request object."""
def __init__(self, username, password):
self.username = username
self.password = password
def __call__(self, r):
# 'r' is the PreparedRequest object passed in by requests
# Calculate the Basic auth string
auth_header_value = _basic_auth_str(self.username, self.password)
# Modify the request's headers
r.headers['Authorization'] = auth_header_value
# Return the modified request
return r
class HTTPProxyAuth(HTTPBasicAuth):
"""Attaches HTTP Proxy Authentication to a given Request object."""
def __call__(self, r):
# Same as Basic Auth, but sets the Proxy-Authorization header
r.headers['Proxy-Authorization'] = _basic_auth_str(self.username, self.password)
return r
# HTTPDigestAuth is more complex, involving state and hooks for the 401 challenge
class HTTPDigestAuth(AuthBase):
def __init__(self, username, password):
# ... store username/password ...
# ... initialize state (nonce, etc.) ...
pass
def build_digest_header(self, method, url):
# ... complex calculation based on nonce, realm, qop, etc. ...
return "Digest ..." # Calculated digest header
def handle_401(self, r, **kwargs):
# Hook called when a 401 response is received
# 1. Parse challenge ('WWW-Authenticate' header)
# 2. Store nonce, realm etc.
# 3. Prepare a *new* request with the calculated digest header
# 4. Send the new request
# 5. Return the response to the *new* request
pass # Simplified
def __call__(self, r):
# 'r' is the PreparedRequest
# If we already have a nonce, add the Authorization header directly
if self.has_nonce():
r.headers['Authorization'] = self.build_digest_header(r.method, r.url)
# Register the handle_401 hook to handle the server challenge if needed
r.register_hook('response', self.handle_401)
return r
```
And in `requests/models.py`, the `PreparedRequest` calls the auth object:
```python
# File: requests/models.py (Simplified View)
from .auth import HTTPBasicAuth
from .utils import get_auth_from_url
class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
# ... (other prepare methods like prepare_url, prepare_headers) ...
def prepare_auth(self, auth, url=""):
"""Prepares the given HTTP auth data."""
# If no Auth provided, maybe get it from the URL (e.g., http://user:pass@host)
if auth is None:
url_auth = get_auth_from_url(self.url)
auth = url_auth if any(url_auth) else None
if auth:
# If auth is a ('user', 'pass') tuple, wrap it in HTTPBasicAuth
if isinstance(auth, tuple) and len(auth) == 2:
auth = HTTPBasicAuth(*auth)
# --- The Core Step ---
# Call the auth object (which must be callable, like AuthBase subclasses)
# Pass 'self' (the PreparedRequest instance) to the auth object's __call__
r = auth(self)
# Update self to reflect any changes made by the auth object
# (Auth objects typically just modify headers, but could do more)
self.__dict__.update(r.__dict__)
# Recompute Content-Length in case auth modified the body (unlikely for Basic/Digest)
self.prepare_content_length(self.body)
# ... (rest of PreparedRequest) ...
```
The key is the `r = auth(self)` line, where the `PreparedRequest` delegates the task of adding authentication details to the specific authentication handler object provided.
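Since every auth handler is callable, you can reproduce that delegation yourself by calling one directly on a prepared request (placeholder credentials, no network involved):

```python
import requests
from requests.auth import HTTPBasicAuth

p = requests.Request('GET', 'https://example.com/').prepare()

# This mirrors what prepare_auth does internally: r = auth(self)
p = HTTPBasicAuth('user', 'pass')(p)

print(p.headers['Authorization'])  # Basic dXNlcjpwYXNz
```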
## Conclusion
You've learned how `requests` handles HTTP authentication using **Authentication Handlers**.
* You saw that authentication is like providing an **ID badge** with your request.
* You learned about common schemes like **Basic Auth** (using a simple `(user, pass)` tuple or `HTTPBasicAuth`) and **Digest Auth** (`HTTPDigestAuth`).
* You know how to apply authentication to single requests or persistently using a [Session](03_session.md) object via the `auth` parameter.
* You understand that internally, `requests` calls the provided auth object, which modifies the `PreparedRequest` (usually by adding an `Authorization` header) before sending it.
* You got a glimpse of how custom authentication can be built using `AuthBase`.
Authentication is crucial for accessing protected resources. But what happens when things go wrong? A server might be down, a URL might be invalid, or authentication might fail. How does `requests` tell you about these problems?
**Next:** [Chapter 6: Exception Hierarchy](06_exception_hierarchy.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 6: When Things Go Wrong - The Exception Hierarchy
In [Chapter 5: Authentication Handlers](05_authentication_handlers.md), we learned how to prove our identity to websites that require login or API keys. We assumed our requests would work if we provided the correct credentials.
But what happens when things *don't* go as planned? The internet isn't always reliable. Websites go down, networks have hiccups, URLs might be typed incorrectly, or servers might just be having a bad day. How does `requests` tell us about these problems, and how can we handle them gracefully in our code?
## The Problem: Dealing with Request Failures
Imagine you're building a script to check the weather using an online weather API. You use `requests.get()` to fetch the weather data. What could go wrong?
* Your internet connection might be down.
* The weather API website might be temporarily offline.
* You might have mistyped the URL.
* The website might take too long to respond (a timeout).
* The website might respond, but with an error message (like "404 Not Found" or "500 Server Error").
If any of these happen, `requests` will encounter an error. If you don't prepare for these errors, your script might crash! We need a way to:
1. **Detect** that an error occurred.
2. **Understand** *what kind* of error it was (network issue? timeout? bad URL?).
3. **React** appropriately (e.g., print a helpful message, try again later, use a default value).
## The Solution: A Family Tree of Errors
`Requests` helps us by using a system of specific error messages called **exceptions**. When something goes wrong, `requests` doesn't just give up silently; it **raises an exception**.
Think of it like a doctor diagnosing an illness. A doctor doesn't just say "You're sick." They give a specific diagnosis: "You have the flu," or "You have a broken arm," or "You have allergies." Each diagnosis tells you something specific about the problem and how to treat it.
`Requests` does something similar with its exceptions. It has a main, general exception called `requests.exceptions.RequestException`. All other specific `requests` errors are "children" or "descendants" of this main one, forming an **Exception Hierarchy** (like a family tree).
**Analogy:** The "Sickness" Family Tree 🌳
* **`RequestException` (The Grandparent):** This is the most general category, like saying "Sickness." If you catch this, you catch *any* problem related to `requests`.
* **`ConnectionError`, `Timeout`, `HTTPError`, `URLRequired` (The Parents):** These are more specific categories under `RequestException`.
* `ConnectionError` is like saying "Infection."
* `Timeout` is like saying "Fatigue."
* `HTTPError` is like saying "External Injury."
* `URLRequired` is like saying "Genetic Condition" (problem with the input itself).
* **`ConnectTimeout`, `ReadTimeout` (The Children):** These are even *more* specific.
* `ConnectTimeout` (child of `Timeout`) is like "Trouble Falling Asleep."
* `ReadTimeout` (child of `Timeout`) is like "Waking Up Too Early." Both are types of "Fatigue" (`Timeout`).
This hierarchy allows you to decide how specific you want to be when handling errors.
## Key Members of the Exception Family
All `requests` exceptions live inside the `requests.exceptions` module. You usually import the main `requests` library and access them like `requests.exceptions.ConnectionError`.
Here are some of the most common ones you'll encounter:
* **`requests.exceptions.RequestException`**: The base exception. Catching this catches *all* exceptions listed below.
* **`requests.exceptions.ConnectionError`**: Problems connecting to the server. This could be due to:
* DNS failure (can't find the server's address).
* Refused connection (server is there but not accepting connections).
* Network is unreachable.
* **`requests.exceptions.Timeout`**: The request took too long. This is a parent category for:
* **`requests.exceptions.ConnectTimeout`**: Timeout occurred *while trying to establish the connection*.
* **`requests.exceptions.ReadTimeout`**: Timeout occurred *after connecting*, while waiting for the server to send data.
* **`requests.exceptions.HTTPError`**: Raised when the server returns a "bad" status code (4xx for client errors like "404 Not Found", or 5xx for server errors like "500 Internal Server Error"). **Important:** `requests` does *not* automatically raise this just because the status code is bad. You typically need to call the `response.raise_for_status()` method to trigger it.
* **`requests.exceptions.TooManyRedirects`**: The request exceeded the maximum number of allowed redirects (usually 30).
* **`requests.exceptions.URLRequired`**: You tried to make a request without providing a URL.
* **`requests.exceptions.MissingSchema`**: The URL was missing the scheme (like `http://` or `https://`).
* **`requests.exceptions.InvalidURL`**: The URL was malformed in some way.
* **`requests.exceptions.InvalidSchema`**: The URL scheme was not recognized (e.g., `ftp://` might not be supported by default).
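You can inspect this family tree directly with `issubclass`, and demonstrate `raise_for_status()` by building a bare `Response` locally (no network needed; the URL is a placeholder):

```python
from requests import exceptions as exc
from requests.models import Response

# Every specific error descends from RequestException
assert issubclass(exc.ConnectionError, exc.RequestException)
assert issubclass(exc.HTTPError, exc.RequestException)

# ConnectTimeout inherits from BOTH ConnectionError and Timeout
assert issubclass(exc.ConnectTimeout, exc.ConnectionError)
assert issubclass(exc.ConnectTimeout, exc.Timeout)

# raise_for_status() is what turns a 4xx/5xx status into an HTTPError.
resp = Response()
resp.status_code = 404
resp.url = 'https://example.com/missing'  # placeholder URL

caught = None
try:
    resp.raise_for_status()
except exc.HTTPError as e:
    caught = type(e).__name__

print(caught)  # HTTPError
```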
## Handling Exceptions: The `try...except` Block
How do we use this hierarchy in our code? We use Python's `try...except` block.
1. Put the code that *might* cause an error (like `requests.get()`) inside the `try:` block.
2. Follow it with one or more `except:` blocks. Each `except:` block specifies the type of exception it's designed to catch.
**Example 1: Catching Any `requests` Error**
Let's try fetching a URL that doesn't exist and catch the most general exception.
```python
import requests
# A URL that might cause a connection error (e.g., non-existent domain)
bad_url = 'https://this-domain-probably-does-not-exist-asdfghjkl.com'
good_url = 'https://httpbin.org/get'
url_to_try = bad_url # Change to good_url to see success case
print(f"Trying to fetch: {url_to_try}")
try:
response = requests.get(url_to_try, timeout=5) # Add timeout
response.raise_for_status() # Check for 4xx/5xx errors
print("Success! Status Code:", response.status_code)
# Process the response... (e.g., print response.text)
except requests.exceptions.RequestException as e:
# This will catch ANY error originating from requests
print(f"\nOh no! A requests-related error occurred:")
print(f"Error Type: {type(e).__name__}")
print(f"Error Details: {e}")
print("\nScript continues after handling the error.")
```
**Possible Output (if `url_to_try = bad_url`):**
```
Trying to fetch: https://this-domain-probably-does-not-exist-asdfghjkl.com
Oh no! A requests-related error occurred:
Error Type: ConnectionError
Error Details: HTTPSConnectionPool(host='this-domain-probably-does-not-exist-asdfghjkl.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x...>: Failed to resolve 'this-domain-probably-does-not-exist-asdfghjkl.com' ([Errno ...)"))
Script continues after handling the error.
```
**Explanation:**
* We put `requests.get()` and `response.raise_for_status()` inside the `try` block.
* If `requests.get()` fails (e.g., due to `ConnectionError` or `Timeout`), or if `raise_for_status()` detects a 4xx/5xx code (`HTTPError`), an exception is raised.
* The `except requests.exceptions.RequestException as e:` block catches it because `ConnectionError`, `Timeout`, and `HTTPError` are all descendants of `RequestException`.
* We print a helpful message and the details of the error (`e`). Crucially, the script *doesn't crash*.
**Example 2: Catching Specific Errors**
Sometimes, you want to react differently based on the *type* of error. Was it a temporary network glitch, or did the server permanently remove the page?
```python
import requests
# URL that gives a 404 error
not_found_url = 'https://httpbin.org/status/404'
# URL that is slow and might time out
timeout_url = 'https://httpbin.org/delay/5' # Delays response by 5 seconds
url_to_try = timeout_url # Change to not_found_url to see HTTPError
print(f"Trying to fetch: {url_to_try}")
try:
# Set a short timeout to demonstrate Timeout exception
response = requests.get(url_to_try, timeout=2)
response.raise_for_status() # Check for 4xx/5xx status codes
print("Success! Status Code:", response.status_code)
# Process response...
except requests.exceptions.ConnectTimeout as e:
print(f"\nError: Could not connect to the server in time.")
print(f"Details: {e}")
# Maybe retry later?
except requests.exceptions.ReadTimeout as e:
print(f"\nError: Server took too long to send data.")
print(f"Details: {e}")
# Maybe the server is slow, could try again?
except requests.exceptions.ConnectionError as e:
print(f"\nError: Network problem (e.g., DNS error, refused connection).")
print(f"Details: {e}")
# Check internet connection?
except requests.exceptions.HTTPError as e:
print(f"\nError: Bad HTTP status code received from server.")
print(f"Status Code: {e.response.status_code}")
print(f"Details: {e}")
# Was it a 404 Not Found? 500 Server Error?
except requests.exceptions.RequestException as e:
# Catch any other requests error that wasn't specifically handled above
print(f"\nAn unexpected requests error occurred:")
print(f"Error Type: {type(e).__name__}")
print(f"Details: {e}")
print("\nScript continues...")
```
**Possible Output (if `url_to_try = timeout_url`):**
```
Trying to fetch: https://httpbin.org/delay/5
Error: Server took too long to send data.
Details: HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=2)
Script continues...
```
**Possible Output (if `url_to_try = not_found_url`):**
```
Trying to fetch: https://httpbin.org/status/404
Error: Bad HTTP status code received from server.
Status Code: 404
Details: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
Script continues...
```
**Explanation:**
* We have multiple `except` blocks, ordered from most specific (`ConnectTimeout`, `ReadTimeout`) to more general (`ConnectionError`, `HTTPError`) and finally the catch-all `RequestException`.
* Python tries the `except` blocks in order. When an exception occurs, the *first* matching block is executed.
* If a `ReadTimeout` occurs, the `except requests.exceptions.ReadTimeout` block handles it. It won't fall through to the `except requests.exceptions.ConnectionError` or `except requests.exceptions.RequestException` blocks, even though `ReadTimeout` *is* a type of `RequestException`.
* This allows us to provide specific feedback or recovery logic for different error scenarios.
**Inheritance Benefit:** If you write `except requests.exceptions.Timeout as e:`, this block will catch *both* `ConnectTimeout` and `ReadTimeout` because they inherit from `Timeout`.
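Here's a local sketch of that benefit, raising a simulated `ReadTimeout` instead of making a real (slow) network call:

```python
from requests import exceptions as exc

def fetch_flaky():
    # Simulate what requests raises when the server stalls mid-response
    raise exc.ReadTimeout("simulated: server sent no data in time")

try:
    fetch_flaky()
except exc.Timeout as e:
    # One handler covers ConnectTimeout AND ReadTimeout,
    # because both inherit from Timeout.
    caught = type(e).__name__

print(f"Timed out: {caught}")  # Timed out: ReadTimeout
```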
## How It Works Internally: Wrapping Lower-Level Errors
`Requests` doesn't handle network connections directly. It uses a lower-level library called `urllib3` under the hood (managed via [Transport Adapters](07_transport_adapters.md)). When `urllib3` encounters a network problem (like a connection error or timeout), it raises its *own* specific exceptions (e.g., `urllib3.exceptions.MaxRetryError`, `urllib3.exceptions.NewConnectionError`, `urllib3.exceptions.ReadTimeoutError`).
`Requests` catches these `urllib3` exceptions inside its [Transport Adapters](07_transport_adapters.md) (specifically, the `HTTPAdapter.send` method) and then **raises its own corresponding exception** from the `requests.exceptions` hierarchy. This simplifies things for you: you only need to worry about catching `requests` exceptions, not the underlying `urllib3` ones.
```mermaid
sequenceDiagram
participant UserCode as Your Code
participant ReqAPI as requests.get()
participant Adapter as HTTPAdapter
participant Urllib3 as urllib3 library
participant Network
UserCode->>ReqAPI: requests.get(bad_url, timeout=1)
ReqAPI->>Adapter: send(prepared_request)
Adapter->>Urllib3: urlopen(method, url, ..., timeout=1)
Urllib3->>Network: Attempt connection...
Network-->>Urllib3: Fails (e.g., DNS lookup fails)
Urllib3->>Urllib3: Raise urllib3.exceptions.NewConnectionError
Urllib3-->>Adapter: Propagate NewConnectionError
Adapter->>Adapter: Catch NewConnectionError
Adapter->>Adapter: Raise requests.exceptions.ConnectionError(original_error)
Adapter-->>ReqAPI: Propagate ConnectionError
ReqAPI-->>UserCode: Propagate ConnectionError
UserCode->>UserCode: Catch requests.exceptions.ConnectionError
```
Let's look at the definitions in `requests/exceptions.py`. You can see the inheritance structure clearly:
```python
# File: requests/exceptions.py (Simplified View)
from urllib3.exceptions import HTTPError as BaseHTTPError
# The base class for all requests exceptions
class RequestException(IOError):
"""There was an ambiguous exception that occurred while handling your request."""
# ... (stores request/response objects) ...
# Specific exceptions inheriting from RequestException or other requests exceptions
class HTTPError(RequestException):
"""An HTTP error occurred.""" # Typically raised by response.raise_for_status()
class ConnectionError(RequestException):
"""A Connection error occurred."""
class ProxyError(ConnectionError): # Inherits from ConnectionError
"""A proxy error occurred."""
class SSLError(ConnectionError): # Inherits from ConnectionError
"""An SSL error occurred."""
class Timeout(RequestException): # Inherits directly from RequestException
"""The request timed out."""
class ConnectTimeout(ConnectionError, Timeout): # Inherits from BOTH ConnectionError and Timeout!
"""The request timed out while trying to connect to the remote server."""
class ReadTimeout(Timeout): # Inherits from Timeout
"""The server did not send any data in the allotted amount of time."""
class URLRequired(RequestException):
"""A valid URL is required to make a request."""
class TooManyRedirects(RequestException):
"""Too many redirects."""
# ... other specific errors like MissingSchema, InvalidURL, etc. ...
# Some exceptions might also inherit from standard Python errors
class JSONDecodeError(RequestException, ValueError): # Inherits from RequestException and ValueError
"""Couldn't decode the text into json"""
# Uses Python's built-in JSONDecodeError capabilities
```
And here's a simplified view of how `requests/adapters.py` (`HTTPAdapter.send`) catches `urllib3` errors and raises `requests` errors:
```python
# File: requests/adapters.py (Simplified View in HTTPAdapter.send method)
from urllib3.exceptions import (
MaxRetryError, ConnectTimeoutError, NewConnectionError, ResponseError,
ProxyError as _ProxyError, SSLError as _SSLError, ReadTimeoutError,
ProtocolError, ClosedPoolError, InvalidHeader as _InvalidHeader
)
from .exceptions import (
ConnectionError, ConnectTimeout, ReadTimeout, SSLError, ProxyError,
RetryError, InvalidHeader, RequestException # And others
)
class HTTPAdapter(BaseAdapter):
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
# ... (prepare connection using self.get_connection_with_tls_context) ...
conn = self.get_connection_with_tls_context(...)
# ... (verify certs, prepare URL, add headers) ...
try:
# === Make the actual request using urllib3 ===
resp = conn.urlopen(
method=request.method,
url=url,
# ... other args like body, headers ...
retries=self.max_retries,
timeout=timeout,
)
# === Catch specific urllib3 errors and raise corresponding requests errors ===
except (ProtocolError, OSError) as err: # General network/protocol errors
raise ConnectionError(err, request=request)
except MaxRetryError as e: # urllib3 retried but failed
if isinstance(e.reason, ConnectTimeoutError):
raise ConnectTimeout(e, request=request)
if isinstance(e.reason, ResponseError): # Errors related to retry logic
raise RetryError(e, request=request)
if isinstance(e.reason, _ProxyError):
raise ProxyError(e, request=request)
if isinstance(e.reason, _SSLError):
raise SSLError(e, request=request)
# Fallback for other retry errors
raise ConnectionError(e, request=request)
except ClosedPoolError as e: # Connection pool was closed
raise ConnectionError(e, request=request)
except _ProxyError as e: # Direct proxy error
raise ProxyError(e)
except (_SSLError, ReadTimeoutError, _InvalidHeader) as e: # Other specific errors
if isinstance(e, _SSLError):
raise SSLError(e, request=request)
elif isinstance(e, ReadTimeoutError):
raise ReadTimeout(e, request=request)
elif isinstance(e, _InvalidHeader):
raise InvalidHeader(e, request=request)
else:
# Should not happen, but raise generic RequestException if needed
raise RequestException(e, request=request)
# ... (build and return the Response object if successful) ...
return self.build_response(request, resp)
```
This wrapping makes your life easier by providing a consistent set of exceptions (`requests.exceptions`) to handle, regardless of the underlying `urllib3` details.
## Conclusion
You've learned about the `requests` **Exception Hierarchy**: a family tree of error types that `requests` raises when things go wrong.
* You saw that all `requests` exceptions inherit from the base `requests.exceptions.RequestException`.
* You learned about key specific exceptions like `ConnectionError`, `Timeout` (and its children `ConnectTimeout`, `ReadTimeout`), and `HTTPError` (raised by `response.raise_for_status()`).
* You practiced using `try...except` blocks to catch both general (`RequestException`) and specific exceptions, allowing for tailored error handling.
* You understood that `requests` wraps lower-level errors (from `urllib3`) into its own exception types, simplifying error handling for you.
Understanding this hierarchy is crucial for writing robust Python code that can gracefully handle the inevitable problems that occur when dealing with networks and web services.
So far, we've mostly used the default way `requests` handles connections. But what if we need more control over how connections are made, maybe to configure retries differently, or use different SSL settings? That's where Transport Adapters come in.
**Next:** [Chapter 7: Transport Adapters](07_transport_adapters.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 7: Transport Adapters - Custom Delivery Routes
In the previous chapter, [Chapter 6: Exception Hierarchy](06_exception_hierarchy.md), we learned how `requests` signals problems like network errors or bad responses. Most of the time, we rely on the default way `requests` handles sending our requests and managing connections.
But what if the default way isn't quite right for a specific website or service? What if you need to tell `requests` *exactly* how to handle connections or retries for URLs starting with `http://` or `https://`, or maybe even for a completely custom scheme like `myprotocol://`?
## The Problem: Needing Special Handling
Imagine you're interacting with an API that's known to be a bit unreliable. Sometimes requests to it fail temporarily, but succeed if you just try again a second later. The default `requests` behavior might not retry enough times, or maybe you want to retry only on specific error codes.
Or perhaps you need to connect to a server using very specific security settings (SSL/TLS versions or ciphers) that aren't the default.
How can you customize *how* `requests` sends requests and manages connections for specific types of URLs?
## Meet Transport Adapters: The Delivery Services
This is where **Transport Adapters** come in!
Think of a `requests` [Session](03_session.md) object like a customer ordering packages online. The customer (Session) wants to send a package (a web request) to a specific address (a URL).
**Transport Adapters** are like the different **delivery services** (like FedEx, UPS, USPS, or maybe a specialized local courier) that the customer can choose from.
* Each delivery service specializes in certain types of addresses or delivery methods.
* When the customer has a package for a specific address (e.g., starting with `https://`), they pick the appropriate delivery service registered for that address type.
* That delivery service then handles all the details of picking up, transporting, and delivering the package (sending the request, managing connections, handling retries, etc.).
In `requests`, a Transport Adapter defines *how* requests are actually sent and connections are managed for specific **URL schemes** (like `http://` or `https://`).
## The Default Delivery Service: `HTTPAdapter`
By default, when you create a `Session` object, it automatically sets up the standard "delivery services" for web addresses:
* For URLs starting with `https://`, it uses the built-in `requests.adapters.HTTPAdapter`.
* For URLs starting with `http://`, it also uses the `requests.adapters.HTTPAdapter`.
This `HTTPAdapter` is the workhorse. It doesn't handle the network sockets directly; instead, it uses another powerful library called `urllib3` under the hood.
The `HTTPAdapter` (via `urllib3`) is responsible for:
1. **Connection Pooling:** Reusing existing network connections to the same host for better performance (like the delivery service keeping its trucks warm and ready for the next delivery to the same neighborhood). We saw the benefits of this in [Chapter 3: Session](03_session.md).
2. **HTTP/HTTPS Details:** Handling the specifics of the HTTP and HTTPS protocols.
3. **SSL Verification:** Making sure the website's security certificate is valid for HTTPS connections.
4. **Basic Retries:** Handling some low-level connection retries (though often you might want more control).
So, when you use a `Session` and make a `GET` request to `https://example.com`, the Session looks up the adapter for `https://`, finds the default `HTTPAdapter`, and hands the request off to it for delivery.
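Several of those responsibilities can be tuned directly on the `HTTPAdapter` constructor. A quick sketch with illustrative values (these are real constructor parameters, but the numbers are examples, not recommendations):

```python
from requests.adapters import HTTPAdapter

adapter = HTTPAdapter(
    pool_connections=10,  # how many host pools to cache
    pool_maxsize=20,      # max connections kept open per pool
    max_retries=3,        # low-level connection retries (a urllib3 Retry)
)

# requests converts the integer into a urllib3 Retry object
print(adapter.max_retries.total)  # 3
```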
## Mounting Adapters: Choosing Your Delivery Service
How does a `Session` know which adapter to use for which URL prefix? It uses a mechanism called **mounting**.
Think of it like telling your `Session` customer: "For any address starting with `https://`, use this specific delivery service (adapter)."
A `Session` object has an `adapters` attribute, which is an ordered dictionary. You use the `session.mount(prefix, adapter)` method to register an adapter for a given URL prefix.
```python
import requests
from requests.adapters import HTTPAdapter
# Create a session
s = requests.Session()
# See the default adapters that are already mounted
print("Default Adapters:")
print(s.adapters)
# Create a *new* instance of the default HTTPAdapter
# (Maybe we'll configure it later)
custom_adapter = HTTPAdapter()
# Mount this adapter for a specific website
# Now, any request to this specific host via HTTPS will use our custom_adapter
print("\nMounting custom adapter for https://httpbin.org")
s.mount('https://httpbin.org', custom_adapter)
# Let's mount another one for all HTTP traffic
plain_http_adapter = HTTPAdapter()
print("Mounting another adapter for all http://")
s.mount('http://', plain_http_adapter)
# Check the adapters again (they are ordered by prefix length, longest first)
print("\nAdapters after mounting:")
print(s.adapters)
# When we make a request, the session finds the best matching prefix
print(f"\nAdapter for 'https://httpbin.org/get': {s.get_adapter('https://httpbin.org/get')}")
print(f"Adapter for 'http://example.com': {s.get_adapter('http://example.com')}")
print(f"Adapter for 'https://google.com': {s.get_adapter('https://google.com')}") # Uses default https://
```
**Output:**
```
Default Adapters:
OrderedDict([('https://', <requests.adapters.HTTPAdapter object at 0x...>), ('http://', <requests.adapters.HTTPAdapter object at 0x...>)])
Mounting custom adapter for https://httpbin.org
Mounting another adapter for all http://
Adapters after mounting:
OrderedDict([('https://httpbin.org', <requests.adapters.HTTPAdapter object at 0x...>), ('https://', <requests.adapters.HTTPAdapter object at 0x...>), ('http://', <requests.adapters.HTTPAdapter object at 0x...>)])
Adapter for 'https://httpbin.org/get': <requests.adapters.HTTPAdapter object at 0x...>
Adapter for 'http://example.com': <requests.adapters.HTTPAdapter object at 0x...>
Adapter for 'https://google.com': <requests.adapters.HTTPAdapter object at 0x...>
```
**Explanation:**
1. Initially, the session has default `HTTPAdapter` instances mounted for `https://` and `http://`.
2. We created new `HTTPAdapter` instances.
3. We used `s.mount('https://httpbin.org', custom_adapter)`. Now, requests to `https://httpbin.org/anything` will use `custom_adapter`.
4. We used `s.mount('http://', plain_http_adapter)`. This *replaced* the original default adapter for `http://`.
5. Requests to other HTTPS sites like `https://google.com` will still use the original default adapter mounted for the shorter `https://` prefix.
6. The `s.get_adapter(url)` method shows how the session selects the adapter based on the longest matching prefix.
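One consequence of prefix matching: if no mounted prefix matches the URL at all, the session can't deliver the request. A quick sketch of what happens for an unmounted scheme like `ftp://`:

```python
import requests

s = requests.Session()
try:
    # Only http:// and https:// adapters are mounted by default,
    # so an ftp:// URL matches no prefix.
    s.get_adapter('ftp://example.com/file.txt')
except requests.exceptions.InvalidSchema as e:
    print(f"No adapter found: {e}")
```

This is the same `InvalidSchema` exception you met in [Chapter 6: Exception Hierarchy](06_exception_hierarchy.md).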
## Use Case: Customizing Retries
Let's go back to the unreliable API example. We want to configure `requests` to automatically retry requests to `https://flaky-api.example.com` up to 5 times if certain errors occur (like temporary server errors or connection issues).
The `HTTPAdapter`'s retry logic is controlled by a `Retry` object from the underlying `urllib3` library. We can create our own `Retry` object with custom settings and pass it to a *new* `HTTPAdapter` instance.
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry # Import the Retry class
# 1. Configure the retry strategy
# - total=5: Try up to 5 times in total
# - backoff_factor=0.5: Wait 0.5s, 1s, 2s, 4s between retries
# - status_forcelist=[500, 502, 503, 504]: Only retry on these HTTP status codes
# - allowed_methods=None would retry on *all* methods; the urllib3 default only retries idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE), not POST
retry_strategy = Retry(
total=5,
backoff_factor=0.5,
status_forcelist=[500, 502, 503, 504],
    # allowed_methods=None,  # uncomment to retry on every method, including POST
)
# 2. Create an HTTPAdapter with this retry strategy
# The 'max_retries' argument accepts a Retry object
adapter_with_retries = HTTPAdapter(max_retries=retry_strategy)
# 3. Create a Session
session = requests.Session()
# 4. Mount the adapter for the specific API prefix
api_base_url = 'https://flaky-api.example.com/' # Use the base URL prefix
session.mount(api_base_url, adapter_with_retries)
# 5. Now, use the session to make requests to the flaky API
api_endpoint = f"{api_base_url}data"
print(f"Making request to {api_endpoint} with custom retries...")
try:
# Imagine this API sometimes returns 503 Service Unavailable
response = session.get(api_endpoint)
response.raise_for_status() # Check for HTTP errors
print("Success!")
# print(response.json()) # Process the successful response
except requests.exceptions.RequestException as e:
print(f"Request failed after retries: {e}")
# Requests to other domains will use the default adapter/retries
print("\nMaking request to a different site (default retries)...")
try:
response_other = session.get('https://httpbin.org/get')
print(f"Status for httpbin: {response_other.status_code}")
except requests.exceptions.RequestException as e:
print(f"Httpbin request failed: {e}")
```
**Explanation:**
1. We defined our desired retry behavior using `urllib3.util.retry.Retry`.
2. We created a *new* `HTTPAdapter`, passing our `retry_strategy` to its `max_retries` parameter during initialization.
3. We created a `Session`.
4. Crucially, we `mount`ed our `adapter_with_retries` specifically to the base URL of the flaky API (`https://flaky-api.example.com/`).
5. When `session.get(api_endpoint)` is called, the Session sees that the URL starts with the mounted prefix, so it uses our `adapter_with_retries`. If the server returns a `503` error, this adapter (using the `Retry` object) will automatically wait and try again, up to 5 times.
6. Requests to `https://httpbin.org` don't match the specific prefix, so they fall back to the default adapter mounted for `https://`, which has default retry behavior.
This allows fine-grained control over connection handling for different destinations.
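As a side note, `max_retries` also accepts a plain integer if you don't need a full `Retry` object. A sketch (note the integer form only retries failed *connections*; it never retries based on HTTP status codes):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# An integer is shorthand for a basic Retry object: up to 3 retries
# on connection errors, no status-code-based retries.
s.mount('https://', HTTPAdapter(max_retries=3))
print(s.get_adapter('https://example.com').max_retries)
```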
## How It Works Internally: The Session-Adapter Dance
Let's trace the steps when you call `session.get(url)`:
1. **`Session.request`:** Your `session.get(url, ...)` call ends up in the main `Session.request` method.
2. **Prepare Request:** `Session.request` creates a `Request` object and calls `self.prepare_request(req)` to turn it into a `PreparedRequest`, merging session-level settings like headers and cookies (as seen in [Chapter 3: Session](03_session.md)).
3. **Merge Environment Settings:** `Session.request` calls `self.merge_environment_settings(...)` to figure out final settings for proxies, SSL verification (`verify`), etc.
4. **`Session.send`:** The prepared request (`prep`) and final settings (`send_kwargs`) are passed to `self.send(prep, **send_kwargs)`.
5. **`get_adapter`:** Inside `Session.send`, the first crucial step is `adapter = self.get_adapter(url=request.url)`. This method looks through the `self.adapters` dictionary (which is ordered from longest prefix to shortest) and returns the *first* adapter whose mounted prefix matches the beginning of the request's URL.
6. **`adapter.send`:** The `Session` then calls the `send` method *on the chosen adapter*: `r = adapter.send(request, **kwargs)`. **This is the handover!** The Session delegates the actual sending to the Transport Adapter.
7. **Adapter Does the Work:** The adapter (e.g., `HTTPAdapter`) takes over.
* It interacts with its `urllib3.PoolManager` to get a connection from the pool (or create one).
* It configures SSL/TLS context based on `verify` and `cert` parameters.
* It uses `urllib3` to send the actual HTTP request bytes over the network.
* It applies retry logic (using the `Retry` object if configured) if `urllib3` reports certain connection errors or status codes.
* It receives the raw HTTP response bytes from `urllib3`.
8. **`adapter.build_response`:** The adapter takes the raw response data from `urllib3` and constructs a `requests.Response` object using its `build_response(request, raw_urllib3_response)` method. This involves parsing status codes, headers, and making the response body available.
9. **Return Response:** The `adapter.send` method returns the fully formed `Response` object back to the `Session.send` method.
10. **Post-Processing:** `Session.send` does some final steps, like extracting cookies from the response into the session's [Cookie Jar](04_cookie_jar.md) and handling redirects (which might involve calling `send` again).
11. **Final Return:** The final `Response` object is returned to your original `session.get(url)` call.
Here's a simplified diagram:
```mermaid
sequenceDiagram
participant UserCode as Your Code
participant Session as Session Object
participant Adapter as Transport Adapter
participant Urllib3 as urllib3 Library
participant Server
UserCode->>Session: session.get(url)
Session->>Session: prepare_request(req) -> PreparedRequest (prep)
Session->>Session: merge_environment_settings() -> send_kwargs
Session->>Session: get_adapter(url) -> adapter_instance
Session->>Adapter: adapter_instance.send(prep, **send_kwargs)
activate Adapter
Adapter->>Urllib3: Get connection from PoolManager
Adapter->>Urllib3: urlopen(prep.method, url, ..., retries=..., timeout=...)
activate Urllib3
Urllib3->>Server: Send HTTP Request Bytes
Server-->>Urllib3: Receive HTTP Response Bytes
Urllib3-->>Adapter: Return raw urllib3 response
deactivate Urllib3
Adapter->>Adapter: build_response(prep, raw_response) -> Response (r)
Adapter-->>Session: Return Response (r)
deactivate Adapter
Session->>Session: Extract cookies, handle redirects...
Session-->>UserCode: Return final Response
```
Let's peek at the relevant code snippets:
```python
# File: requests/sessions.py (Simplified View)
class Session:
def __init__(self):
# ... other defaults ...
self.adapters = OrderedDict() # The mounted adapters
self.mount('https://', HTTPAdapter()) # Mount default HTTPS adapter
self.mount('http://', HTTPAdapter()) # Mount default HTTP adapter
def get_adapter(self, url):
"""Returns the appropriate connection adapter for the given URL."""
for prefix, adapter in self.adapters.items():
# Find the longest prefix that matches the URL
if url.lower().startswith(prefix.lower()):
return adapter
# No match found
raise InvalidSchema(f"No connection adapters were found for {url!r}")
def mount(self, prefix, adapter):
"""Registers a connection adapter to a prefix."""
self.adapters[prefix] = adapter
# Sort adapters by prefix length, descending (longest first)
        # Re-insert any shorter prefixes so lookups always hit the longest match first
keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
for key in keys_to_move:
self.adapters[key] = self.adapters.pop(key)
def send(self, request, **kwargs):
# ... setup kwargs (stream, verify, cert, proxies) ...
# === GET THE ADAPTER ===
adapter = self.get_adapter(url=request.url)
# === DELEGATE TO THE ADAPTER ===
# Start timer
start = preferred_clock()
# Call the adapter's send method
r = adapter.send(request, **kwargs)
# Stop timer
elapsed = preferred_clock() - start
r.elapsed = timedelta(seconds=elapsed)
# ... dispatch response hooks ...
# ... persist cookies (extract_cookies_to_jar) ...
# ... handle redirects (resolve_redirects, might call send again) ...
# ... maybe read content if stream=False ...
return r
# File: requests/adapters.py (Simplified View)
from urllib3.util.retry import Retry
from urllib3.poolmanager import PoolManager # Used internally by HTTPAdapter
class BaseAdapter:
"""The Base Transport Adapter"""
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
raise NotImplementedError
def close(self):
raise NotImplementedError
class HTTPAdapter(BaseAdapter):
def __init__(self, pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False):
# === STORE RETRY CONFIGURATION ===
if isinstance(max_retries, Retry):
self.max_retries = max_retries
else:
# Convert integer retries to a basic Retry object
self.max_retries = Retry(total=max_retries, read=False, connect=max_retries)
# ... configure pooling options ...
# === INITIALIZE URLIB3 POOL MANAGER ===
# This object manages connections using urllib3
self.poolmanager = PoolManager(num_pools=pool_connections, maxsize=pool_maxsize, block=pool_block)
self.proxy_manager = {} # For handling proxies
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
"""Sends PreparedRequest object using urllib3."""
# ... determine connection pool (conn) based on URL, proxies, SSL context ...
conn = self.get_connection_with_tls_context(request, verify, proxies=proxies, cert=cert)
# ... determine URL to use (might be different for proxies) ...
url = self.request_url(request, proxies)
# ... configure timeout object for urllib3 ...
timeout_obj = self._build_timeout(timeout)
try:
# === CALL URLIB3 ===
# This is the core network call
resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False, # Requests handles redirects
assert_same_host=False,
preload_content=False, # Requests streams content
decode_content=False, # Requests handles decoding
retries=self.max_retries, # Pass configured retries
timeout=timeout_obj, # Pass configured timeout
chunked=... # Determine if chunked encoding is needed
)
except (urllib3_exceptions...) as err:
# === WRAP URLIB3 EXCEPTIONS ===
# Catch exceptions from urllib3 and raise corresponding
# requests.exceptions (ConnectionError, Timeout, SSLError, etc.)
# See Chapter 6 for details.
raise MappedRequestsException(err, request=request)
# === BUILD RESPONSE OBJECT ===
# Convert the raw urllib3 response into a requests.Response
response = self.build_response(request, resp)
return response
def build_response(self, req, resp):
"""Builds a requests.Response from a urllib3 response."""
response = Response()
response.status_code = getattr(resp, 'status', None)
response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))
response.raw = resp # The raw urllib3 response object
response.reason = response.raw.reason
response.url = req.url
# ... extract cookies, set encoding, link request ...
response.request = req
response.connection = self # Link back to this adapter
return response
def close(self):
"""Close the underlying PoolManager."""
self.poolmanager.clear()
# ... close proxy managers ...
# ... other helper methods (cert_verify, proxy_manager_for, request_url) ...
```
The key idea is that the `Session` finds the right `Adapter` using `mount` prefixes, and then the `Adapter` uses `urllib3` to handle the low-level details of connection pooling, retries, and HTTP communication.
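The `HTTPAdapter` constructor is also where the connection pooling from step 7 gets configured. A minimal sketch (the numbers here are arbitrary; both parameters default to 10):

```python
import requests
from requests.adapters import HTTPAdapter

# pool_connections: how many per-host connection pools to cache
# pool_maxsize: how many connections each host pool may keep open
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=50)

s = requests.Session()
s.mount('https://', adapter)  # all HTTPS traffic now shares this pool config
```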
## Other Use Cases
Besides custom retries, you might use Transport Adapters for:
* **Custom SSL/TLS Contexts:** Create an `HTTPAdapter` and initialize its `PoolManager` with a custom `ssl.SSLContext` for fine-grained control over TLS versions, ciphers, or certificate verification logic.
* **SOCKS Proxies:** `requests` can talk to SOCKS proxies once you install the optional extra (`pip install requests[socks]`) and pass a `socks5://...` proxy URL; under the hood this swaps in a SOCKS-aware `urllib3` proxy manager rather than the plain HTTP one.
* **Testing:** You could create a custom adapter that doesn't actually make network requests but returns predefined responses, useful for testing your application without hitting real servers.
* **Custom Protocols:** If you needed to interact with a non-HTTP protocol, you could theoretically write a custom `BaseAdapter` subclass to handle it.
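The testing idea above can be sketched as a tiny `BaseAdapter` subclass. The host name is made up, and this is a toy illustration rather than a replacement for a proper mocking library:

```python
import requests
from requests.adapters import BaseAdapter

class MockAdapter(BaseAdapter):
    """A toy adapter that returns a canned response instead of using the network."""

    def send(self, request, **kwargs):
        response = requests.Response()
        response.status_code = 200
        response._content = b'{"mocked": true}'  # the body the caller will see
        response.url = request.url
        response.request = request
        return response

    def close(self):
        pass  # no pools or sockets to clean up

s = requests.Session()
s.mount('https://api.example.com', MockAdapter())

# No network traffic happens here; the mock adapter answers instead.
r = s.get('https://api.example.com/users')
print(r.status_code, r.json())
```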
## Conclusion
You've learned about **Transport Adapters**, the pluggable backends that `requests` uses to handle the actual sending of requests and management of connections for different URL schemes (`http://`, `https://`, etc.).
* You saw the default adapter is `HTTPAdapter`, which uses `urllib3` for connection pooling, retries, and SSL.
* You learned how `Session` objects `mount` adapters to specific URL prefixes.
* You practiced customizing retry behavior by creating a new `HTTPAdapter` with a `urllib3.util.retry.Retry` object and mounting it to a session.
* You traced how a `Session` finds and delegates work to the appropriate adapter via `adapter.send`.
Transport Adapters give you powerful, low-level control over how `requests` interacts with the network, allowing you to tailor its behavior for specific needs.
Adapters let you customize *how* requests are sent. What if you want to simply *react* to a request being sent or a response being received, perhaps to log it or modify it slightly on the fly? `Requests` has another mechanism for that.
**Next:** [Chapter 8: The Hook System](08_hook_system.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Chapter 8: The Hook System - Setting Up Checkpoints
In [Chapter 7: Transport Adapters](07_transport_adapters.md), we saw how to customize the low-level details of *how* requests are sent and connections are managed, like setting custom retry strategies. Transport Adapters give you control over the delivery mechanism itself.
But what if you don't need to change *how* the request is sent, but instead want to simply **react** when something happens during the process? For example, maybe you want to log every single response your application receives. (You might also imagine modifying a request just before it goes out, say to add a timestamp header, but `requests` currently only ships a `response` hook, so that isn't possible with hooks alone.)
## The Problem: Reacting to Events
Imagine you're building an application that interacts with several different web services. For debugging or monitoring purposes, you want to keep a record of every response you get back: specifically, the URL you requested and the status code the server returned.
You could manually add `print()` statements after every single `requests.get()`, `s.post()`, etc., call throughout your code:
```python
# Manual logging (Repetitive!)
import requests
s = requests.Session()
response1 = s.get('https://api.service1.com/data')
print(f"LOG: Got {response1.status_code} for {response1.url}")
# ... process response1 ...
response2 = s.post('https://api.service2.com/action', data={'key': 'value'})
print(f"LOG: Got {response2.status_code} for {response2.url}")
# ... process response2 ...
response3 = s.get('https://api.service1.com/status')
print(f"LOG: Got {response3.status_code} for {response3.url}")
# ... process response3 ...
```
This quickly becomes tedious and error-prone. If you forget to add the logging line, you miss that record. If you want to change the log format, you have to change it everywhere. Isn't there a way to tell `requests` to automatically run your logging code *every time* it gets a response?
## Meet the Hook System: Your Automated Checkpoints
Yes, there is! `Requests` provides a **Hook System** that lets you do just that.
Think of hooks like setting up **checkpoints** in the process of making a request and getting a response. When the process reaches a specific checkpoint, `requests` pauses briefly and calls any custom functions you've registered for that checkpoint.
**Analogy: Package Delivery Checkpoints** 📦
Imagine a package delivery process:
1. Package picked up.
2. Package arrives at sorting facility. -> **Checkpoint!** (Maybe run a function to scan the barcode).
3. Package loaded onto delivery truck.
4. Package delivered to recipient. -> **Checkpoint!** (Maybe run a function to get a signature).
The Hook System in `requests` works similarly. You can attach your own Python functions (called "hooks") to specific events (checkpoints).
Currently, the main event available is the **`response`** hook.
* **`response` Hook:** This hook runs *after* a response has been received from the server and the basic `Response` object has been built, but *before* that `Response` object is returned to your code that called `requests.get()` or `s.post()`.
## Using the `response` Hook
Let's solve our logging problem using the `response` hook.
**Step 1: Define the Hook Function**
First, we need to write a Python function that will perform our logging action. This function needs to accept the `Response` object as its first argument. It can also accept optional keyword arguments (`**kwargs`), which `requests` might pass in (though for the `response` hook, the `Response` object is the main thing).
```python
# Our custom hook function for logging
def log_response_details(response, *args, **kwargs):
"""
This function will be called after each response.
It logs the request method, URL, and response status code.
"""
# 'response' is the Response object just received
request_method = response.request.method # Get the method from the original request
url = response.url # Get the final URL
status_code = response.status_code # Get the status code
print(f"HOOK LOG: Received {status_code} for {request_method} request to {url}")
# IMPORTANT: Hooks usually shouldn't return anything (or return None).
# If a hook returns a value, it REPLACES the data being processed.
# For the 'response' hook, returning a value would replace the Response object!
# Since we just want to log, we don't return anything.
```
**Explanation:**
* The function `log_response_details` takes `response` as its first argument. This will be the `requests.Response` object.
* It also accepts `*args` and `**kwargs` to be flexible, even though we don't use them here.
* Inside the function, we access attributes of the `response` object (like `status_code`, `url`) and its associated request (`response.request.method`) to print our log message.
* Crucially, this function *doesn't return anything*. If it did return a value, that value would replace the original `response` object for any further processing or for the final return value of `s.get()`.
**Step 2: Register the Hook**
Now we need to tell `requests` to actually *use* our `log_response_details` function. We can register hooks in two main ways:
1. **On a `Session` Object:** If you register a hook on a [Session](03_session.md) object, it will be called for *every request* made using that session. This is perfect for our logging use case.
2. **On a Single `Request`:** You can also attach hooks to an individual `Request` object before preparing it. This is less common but useful if you only want a hook to run for one specific request.
Let's register our hook on a `Session`:
```python
import requests
# (Paste the log_response_details function definition from above here)
def log_response_details(response, *args, **kwargs):
request_method = response.request.method
url = response.url
status_code = response.status_code
print(f"HOOK LOG: Received {status_code} for {request_method} request to {url}")
# Create a Session
s = requests.Session()
# Register the hook on the session
# Hooks are stored in a dictionary: session.hooks = {'event_name': [list_of_functions]}
# We add our function to the list for the 'response' event.
s.hooks['response'].append(log_response_details)
# Now, make some requests using the session
print("Making requests...")
response1 = s.get('https://httpbin.org/get')
print(f" -> Main code received response 1 with status: {response1.status_code}")
response2 = s.post('https://httpbin.org/post', data={'id': '123'})
print(f" -> Main code received response 2 with status: {response2.status_code}")
response3 = s.get('https://httpbin.org/status/404') # This will get a 404
print(f" -> Main code received response 3 with status: {response3.status_code}")
```
**Expected Output:**
```
Making requests...
HOOK LOG: Received 200 for GET request to https://httpbin.org/get
-> Main code received response 1 with status: 200
HOOK LOG: Received 200 for POST request to https://httpbin.org/post
-> Main code received response 2 with status: 200
HOOK LOG: Received 404 for GET request to https://httpbin.org/status/404
-> Main code received response 3 with status: 404
```
**Explanation:**
1. `s = requests.Session()`: We created a session.
2. `s.hooks['response'].append(log_response_details)`: This is the key step. `s.hooks` is a dictionary where keys are event names (like `'response'`) and values are lists of functions to call for that event. We appended our logging function to the list for the `'response'` event.
3. When we called `s.get(...)` or `s.post(...)`, the following happened internally:
* The request was sent.
* The response was received.
* *Before* returning the response to our main code (`response1 = ...`), the `requests` Session checked its `hooks` dictionary for the `'response'` event.
* It found our `log_response_details` function and called it, passing the received `Response` object.
* Our hook function printed the log message.
* Since the hook returned `None`, the original `Response` object was then returned to our main code.
4. Notice how the "HOOK LOG" lines appear *before* the "Main code received response" lines, demonstrating that the hook runs after receiving the response but before the calling code gets it.
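Registration option 2 from Step 2, a hook for just one request, looks like this: you pass a `hooks` dictionary directly to the call (wrapped in a try/except in case the network is unavailable):

```python
import requests

def print_status(response, *args, **kwargs):
    print(f"HOOK: {response.status_code} for {response.url}")

# The 'hooks' argument applies only to this one request,
# unlike session.hooks, which fires for every request on the session.
try:
    r = requests.get('https://httpbin.org/get', hooks={'response': print_status})
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```

A single callable works here; a list of callables is accepted too.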
**Modifying the Response (Advanced)**
While our logging hook didn't return anything, a hook *can* modify the `Response` object it receives, or even return a completely different `Response` object.
```python
def add_custom_header_hook(response, *args, **kwargs):
"""Adds a custom header to the received response."""
print("HOOK: Adding X-Hook-Processed header...")
response.headers['X-Hook-Processed'] = 'True'
# We modified the response in-place, so we return None
# to let requests continue using the modified response.
return None
# Or, a hook that returns a *new* response (less common)
# def replace_response_hook(response, *args, **kwargs):
# if response.status_code == 404:
# print("HOOK: Replacing 404 response with a custom one!")
# new_response = requests.Response()
# new_response.status_code = 200
# new_response.reason = "Found via Hook"
# new_response._content = b"Content generated by hook!"
# new_response.request = response.request # Keep original request link
# return new_response # Return the NEW response
# return None # Otherwise, keep the original response
```
**Caution:** Modifying or replacing responses within hooks can be powerful but also confusing if not done carefully. For beginners, using hooks for actions like logging or metrics that don't change the response is often the safest starting point.
## How It Works Internally
Where exactly does `requests` call these hooks? The `response` hook is triggered within the `Session.send()` method, after the underlying [Transport Adapter](07_transport_adapters.md) has returned a response, but before things like cookie persistence and redirect handling are fully completed for that specific response.
1. **`Session.send()` Called:** Your code calls `s.get()` or `s.post()`, which eventually calls `Session.send()`.
2. **Adapter Sends Request:** The session selects the appropriate [Transport Adapter](07_transport_adapters.md) (e.g., `HTTPAdapter`). The adapter sends the request and receives the raw response (`r = adapter.send(...)`).
3. **Dispatch Hook:** Right after the adapter returns the `Response` object `r`, `Session.send()` calls `dispatch_hook("response", hooks, r, **kwargs)`. `hooks` here refers to the merged hooks from the `Request` and the `Session`.
4. **`dispatch_hook()` Executes:** This helper function (from `requests.hooks`) looks up the list of functions registered for the `"response"` event. It iterates through this list, calling each hook function (like our `log_response_details`) one by one, passing the `Response` object (`r`) to it.
5. **Hook Modifies/Replaces (Optional):** If a hook function returns a value, `dispatch_hook` updates `r` to be that new value. This allows hooks later in the list (or the main code) to see the modified response.
6. **Further Processing:** After `dispatch_hook` returns the (potentially modified) `Response` object `r`, `Session.send()` continues with other tasks like extracting cookies from `r` into the session's jar and handling redirects (which might involve sending another request).
7. **Return Response:** Finally, the `Response` object is returned to your original calling code.
Here's a simplified sequence diagram:
```mermaid
sequenceDiagram
participant UserCode as Your Code
participant Session as Session Object
participant Adapter as Transport Adapter
participant Hooks as dispatch_hook()
UserCode->>Session: s.get(url) / s.post(url)
Session->>Session: Calls prepare_request()
Session->>Session: Gets adapter based on URL
Session->>Adapter: adapter.send(request)
activate Adapter
Note over Adapter: Sends request, gets raw response
Adapter->>Adapter: build_response() -> Response 'r'
Adapter-->>Session: Return Response 'r'
deactivate Adapter
Note over Session: Merges request and session hooks
Session->>Hooks: dispatch_hook('response', merged_hooks, r)
activate Hooks
Note over Hooks: Iterates through registered hook functions
Hooks->>Hooks: Call each hook_function(r)
Note over Hooks: Hook might modify 'r' or return a new one
Hooks-->>Session: Return (potentially modified) Response 'r'
deactivate Hooks
Note over Session: Persist cookies from 'r', handle redirects...
Session-->>UserCode: Return final Response 'r'
```
Let's look at the key code pieces:
```python
# File: requests/hooks.py (Simplified)
HOOKS = ["response"] # Currently, only 'response' is actively used
def default_hooks():
# Creates the initial empty structure for hooks
return {event: [] for event in HOOKS}
def dispatch_hook(key, hooks, hook_data, **kwargs):
"""Dispatches hooks for a given key event."""
hooks = hooks or {} # Ensure hooks is a dict
hooks = hooks.get(key) # Get the list of functions for this event key
if hooks:
# Allow a single callable or a list
if hasattr(hooks, "__call__"):
hooks = [hooks]
# Call each registered hook function
for hook in hooks:
_hook_data = hook(hook_data, **kwargs) # Call the user's function
if _hook_data is not None:
# If the hook returned something, update the data
hook_data = _hook_data
return hook_data # Return the (potentially modified) data
# File: requests/sessions.py (Simplified view of Session.send)
from .hooks import dispatch_hook # Import the dispatcher
class Session:
# ... (other methods: __init__, request, prepare_request, get_adapter) ...
def send(self, request, **kwargs):
# ... (setup: kwargs, get adapter) ...
adapter = self.get_adapter(url=request.url)
# === ADAPTER SENDS THE REQUEST ===
r = adapter.send(request, **kwargs) # Gets the Response object 'r'
# ... (calculate elapsed time) ...
# === DISPATCH THE 'RESPONSE' HOOK ===
# request.hooks contains merged hooks from Request and Session
r = dispatch_hook("response", request.hooks, r, **kwargs)
# === CONTINUE PROCESSING ===
# Persist cookies from the (potentially modified) response 'r'
extract_cookies_to_jar(self.cookies, request, r.raw)
# Handle redirects if allowed (using the potentially modified 'r')
if kwargs.get('allow_redirects', True):
# ... redirect logic using self.resolve_redirects ...
# This might modify 'r' further if redirects occur
pass
else:
# ... store potential next request for non-redirected responses ...
pass
# ... (maybe consume content if stream=False) ...
return r # Return the final Response object
# File: requests/models.py (Simplified view of PreparedRequest)
# Shows where hooks are stored initially
class RequestHooksMixin:
# Mixin used by Request and PreparedRequest
def register_hook(self, event, hook):
# ... logic to add hook functions to self.hooks[event] list ...
pass
class Request(RequestHooksMixin):
def __init__(self, ..., hooks=None):
# ...
self.hooks = default_hooks() # Initialize hooks dict
if hooks:
for k, v in list(hooks.items()):
self.register_hook(event=k, hook=v) # Register hooks passed in
# ...
class PreparedRequest(..., RequestHooksMixin):
def __init__(self):
# ...
self.hooks = default_hooks() # Hooks are also on PreparedRequest
# ...
def prepare_hooks(self, hooks):
# Called during prepare() to merge hooks from the original Request
hooks = hooks or []
for event in hooks:
self.register_hook(event, hooks[event])
# Note: Session.prepare_request merges Request hooks and Session hooks
# into the PreparedRequest.hooks dictionary.
```
The `dispatch_hook` function is the core mechanism that allows `requests` to call your custom functions at the designated `"response"` checkpoint within `Session.send`.
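You can watch the replace-or-keep behavior of `dispatch_hook` directly, with no network involved (this pokes at an internal helper, so treat it purely as illustration):

```python
from requests.hooks import dispatch_hook

def shout(data, **kwargs):
    return data.upper()  # returning a value REPLACES the data

def announce(data, **kwargs):
    print(f"announce saw: {data}")  # returning None keeps the data unchanged

result = dispatch_hook('response', {'response': [shout, announce]}, 'hello')
print(result)  # → HELLO (shout's return value replaced the original string)
```

Note that `announce` sees the already upper-cased value: each hook receives whatever the previous one left behind.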
## Conclusion
You've learned about the **Hook System** in `requests`, a way to register custom callback functions that run at specific points in the request-response lifecycle.
* You understood the motivation: automating actions like logging without cluttering your main code.
* You focused on the primary hook: **`response`**, which runs after a response is received but before it's returned to the caller.
* You saw how to define a hook function (accepting the `response` object) and register it on a `Session` (using `session.hooks`) to apply it globally, or potentially on a single `Request`.
* You implemented a practical example: logging response details automatically.
* You got a glimpse into how hooks *can* modify responses (use with care!).
* You learned that internally, the `dispatch_hook` function is called by `Session.send` to execute your registered hook functions.
The Hook System provides a clean way to plug into the `requests` workflow and add custom behavior or monitoring without modifying the library itself.
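Pulling the chapter together, the whole pattern fits in a few lines. This is a sketch: the logging format is my own choice, and the request at the end is left commented out so the snippet runs without network access.

```python
import requests

def log_response(response, *args, **kwargs):
    # Runs for every response this session receives,
    # after it arrives but before it is returned to the caller
    print(f"[hook] {response.request.method} {response.url} -> {response.status_code}")

session = requests.Session()
# Register the hook on the session so it applies to every request made through it
session.hooks["response"].append(log_response)

# Every call through this session now triggers log_response automatically:
# session.get("https://example.com")
```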
This concludes our journey through the core abstractions of the `requests` library! From the simple [Functional API](01_functional_api.md) to the powerful [Session](03_session.md) object, managing [Cookies](04_cookie_jar.md), handling [Authentication](05_authentication_handlers.md), dealing with [Exceptions](06_exception_hierarchy.md), customizing connections with [Transport Adapters](07_transport_adapters.md), and reacting to events with the Hook System, you now have a solid foundation for using `requests` effectively in your Python projects. Happy requesting!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)

# Tutorial: Requests
Requests is a Python library that makes sending *HTTP requests* incredibly simple.
Instead of dealing with complex details, you can use straightforward functions (like `requests.get()`) or **Session objects** to interact with web services.
It automatically handles things like *cookies*, *redirects*, *authentication*, and connection pooling, returning easy-to-use **Response objects** with all the server's data.
**Source Repository:** [https://github.com/psf/requests/tree/0e322af87745eff34caffe4df68456ebc20d9068/src/requests](https://github.com/psf/requests/tree/0e322af87745eff34caffe4df68456ebc20d9068/src/requests)
```mermaid
flowchart TD
A0["Request & Response Models"]
A1["Session"]
A2["Transport Adapters"]
A3["Functional API"]
A4["Authentication Handlers"]
A5["Cookie Jar"]
A6["Exception Hierarchy"]
A7["Hook System"]
A3 -- "Uses temporary" --> A1
A1 -- "Prepares/Receives" --> A0
A1 -- "Manages & Uses" --> A2
A1 -- "Manages" --> A5
A1 -- "Manages" --> A4
A1 -- "Manages" --> A7
A2 -- "Sends/Builds" --> A0
A4 -- "Modifies (adds headers)" --> A0
A5 -- "Populates/Reads" --> A0
A7 -- "Operates on" --> A0
A0 -- "Can Raise (raise_for_status)" --> A6
A2 -- "Raises Connection Errors" --> A6
```
## Chapters
1. [Functional API](01_functional_api.md)
2. [Request & Response Models](02_request___response_models.md)
3. [Session](03_session.md)
4. [Cookie Jar](04_cookie_jar.md)
5. [Authentication Handlers](05_authentication_handlers.md)
6. [Exception Hierarchy](06_exception_hierarchy.md)
7. [Transport Adapters](07_transport_adapters.md)
8. [Hook System](08_hook_system.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)