mirror of
https://github.com/aljazceru/Tutorial-Codebase-Knowledge.git
synced 2025-12-19 23:44:19 +01:00
372 lines
19 KiB
Markdown
372 lines
19 KiB
Markdown
# Chapter 6: When Things Go Wrong - The Exception Hierarchy
|
||
|
||
In [Chapter 5: Authentication Handlers](05_authentication_handlers.md), we learned how to prove our identity to websites that require login or API keys. We assumed our requests would work if we provided the correct credentials.
|
||
|
||
But what happens when things *don't* go as planned? The internet isn't always reliable. Websites go down, networks have hiccups, URLs might be typed incorrectly, or servers might just be having a bad day. How does `requests` tell us about these problems, and how can we handle them gracefully in our code?
|
||
|
||
## The Problem: Dealing with Request Failures
|
||
|
||
Imagine you're building a script to check the weather using an online weather API. You use `requests.get()` to fetch the weather data. What could go wrong?
|
||
|
||
* Your internet connection might be down.
|
||
* The weather API website might be temporarily offline.
|
||
* You might have mistyped the URL.
|
||
* The website might take too long to respond (a timeout).
|
||
* The website might respond, but with an error message (like "404 Not Found" or "500 Server Error").
|
||
|
||
If any of these happen, `requests` will encounter an error. If you don't prepare for these errors, your script might crash! We need a way to:
|
||
|
||
1. **Detect** that an error occurred.
|
||
2. **Understand** *what kind* of error it was (network issue? timeout? bad URL?).
|
||
3. **React** appropriately (e.g., print a helpful message, try again later, use a default value).
|
||
|
||
## The Solution: A Family Tree of Errors
|
||
|
||
`Requests` helps us by using a system of specific error messages called **exceptions**. When something goes wrong, `requests` doesn't just give up silently; it **raises an exception**.
|
||
|
||
Think of it like a doctor diagnosing an illness. A doctor doesn't just say "You're sick." They give a specific diagnosis: "You have the flu," or "You have a broken arm," or "You have allergies." Each diagnosis tells you something specific about the problem and how to treat it.
|
||
|
||
`Requests` does something similar with its exceptions. It has a main, general exception called `requests.exceptions.RequestException`. All other specific `requests` errors are "children" or "descendants" of this main one, forming an **Exception Hierarchy** (like a family tree).
|
||
|
||
**Analogy:** The "Sickness" Family Tree 🌳
|
||
|
||
* **`RequestException` (The Grandparent):** This is the most general category, like saying "Sickness." If you catch this, you catch *any* problem related to `requests`.
|
||
* **`ConnectionError`, `Timeout`, `HTTPError`, `URLRequired` (The Parents):** These are more specific categories under `RequestException`.
|
||
* `ConnectionError` is like saying "Infection."
|
||
* `Timeout` is like saying "Fatigue."
|
||
* `HTTPError` is like saying "External Injury."
|
||
* `URLRequired` is like saying "Genetic Condition" (problem with the input itself).
|
||
* **`ConnectTimeout`, `ReadTimeout` (The Children):** These are even *more* specific.
|
||
* `ConnectTimeout` (child of `Timeout`) is like "Trouble Falling Asleep."
|
||
* `ReadTimeout` (child of `Timeout`) is like "Waking Up Too Early." Both are types of "Fatigue" (`Timeout`).
|
||
|
||
This hierarchy allows you to decide how specific you want to be when handling errors.
|
||
|
||
## Key Members of the Exception Family
|
||
|
||
All `requests` exceptions live inside the `requests.exceptions` module. You usually import the main `requests` library and access them like `requests.exceptions.ConnectionError`.
|
||
|
||
Here are some of the most common ones you'll encounter:
|
||
|
||
* **`requests.exceptions.RequestException`**: The base exception. Catching this catches *all* exceptions listed below.
|
||
* **`requests.exceptions.ConnectionError`**: Problems connecting to the server. This could be due to:
|
||
* DNS failure (can't find the server's address).
|
||
* Refused connection (server is there but not accepting connections).
|
||
* Network is unreachable.
|
||
* **`requests.exceptions.Timeout`**: The request took too long. This is a parent category for:
|
||
* **`requests.exceptions.ConnectTimeout`**: Timeout occurred *while trying to establish the connection*.
|
||
* **`requests.exceptions.ReadTimeout`**: Timeout occurred *after connecting*, while waiting for the server to send data.
|
||
* **`requests.exceptions.HTTPError`**: Raised when the server returns a "bad" status code (4xx for client errors like "404 Not Found", or 5xx for server errors like "500 Internal Server Error"). **Important:** `requests` does *not* automatically raise this just because the status code is bad. You typically need to call the `response.raise_for_status()` method to trigger it.
|
||
* **`requests.exceptions.TooManyRedirects`**: The request exceeded the maximum number of allowed redirects (usually 30).
|
||
* **`requests.exceptions.URLRequired`**: You tried to make a request without providing a URL.
|
||
* **`requests.exceptions.MissingSchema`**: The URL was missing the scheme (like `http://` or `https://`).
|
||
* **`requests.exceptions.InvalidURL`**: The URL was malformed in some way.
|
||
* **`requests.exceptions.InvalidSchema`**: The URL scheme was not recognized (e.g., `ftp://` might not be supported by default).
|
||
|
||
## Handling Exceptions: The `try...except` Block
|
||
|
||
How do we use this hierarchy in our code? We use Python's `try...except` block.
|
||
|
||
1. Put the code that *might* cause an error (like `requests.get()`) inside the `try:` block.
|
||
2. Follow it with one or more `except:` blocks. Each `except:` block specifies the type of exception it's designed to catch.
|
||
|
||
**Example 1: Catching Any `requests` Error**
|
||
|
||
Let's try fetching a URL that doesn't exist and catch the most general exception.
|
||
|
||
```python
|
||
import requests
|
||
|
||
# A URL that might cause a connection error (e.g., non-existent domain)
|
||
bad_url = 'https://this-domain-probably-does-not-exist-asdfghjkl.com'
|
||
good_url = 'https://httpbin.org/get'
|
||
|
||
url_to_try = bad_url # Change to good_url to see success case
|
||
|
||
print(f"Trying to fetch: {url_to_try}")
|
||
|
||
try:
|
||
response = requests.get(url_to_try, timeout=5) # Add timeout
|
||
response.raise_for_status() # Check for 4xx/5xx errors
|
||
print("Success! Status Code:", response.status_code)
|
||
# Process the response... (e.g., print response.text)
|
||
|
||
except requests.exceptions.RequestException as e:
|
||
# This will catch ANY error originating from requests
|
||
print(f"\nOh no! A requests-related error occurred:")
|
||
print(f"Error Type: {type(e).__name__}")
|
||
print(f"Error Details: {e}")
|
||
|
||
print("\nScript continues after handling the error.")
|
||
```
|
||
|
||
**Possible Output (if `url_to_try = bad_url`):**
|
||
|
||
```
|
||
Trying to fetch: https://this-domain-probably-does-not-exist-asdfghjkl.com
|
||
|
||
Oh no! A requests-related error occurred:
|
||
Error Type: ConnectionError
|
||
Error Details: HTTPSConnectionPool(host='this-domain-probably-does-not-exist-asdfghjkl.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x...>: Failed to resolve 'this-domain-probably-does-not-exist-asdfghjkl.com' ([Errno ...)"))
|
||
|
||
Script continues after handling the error.
|
||
```
|
||
|
||
**Explanation:**
|
||
|
||
* We put `requests.get()` and `response.raise_for_status()` inside the `try` block.
|
||
* If `requests.get()` fails (e.g., due to `ConnectionError` or `Timeout`), or if `raise_for_status()` detects a 4xx/5xx code (`HTTPError`), an exception is raised.
|
||
* The `except requests.exceptions.RequestException as e:` block catches it because `ConnectionError`, `Timeout`, and `HTTPError` are all descendants of `RequestException`.
|
||
* We print a helpful message and the details of the error (`e`). Crucially, the script *doesn't crash*.
|
||
|
||
**Example 2: Catching Specific Errors**
|
||
|
||
Sometimes, you want to react differently based on the *type* of error. Was it a temporary network glitch, or did the server permanently remove the page?
|
||
|
||
```python
|
||
import requests
|
||
|
||
# URL that gives a 404 error
|
||
not_found_url = 'https://httpbin.org/status/404'
|
||
# URL that is slow and might time out
|
||
timeout_url = 'https://httpbin.org/delay/5' # Delays response by 5 seconds
|
||
|
||
url_to_try = timeout_url # Change to not_found_url to see HTTPError
|
||
|
||
print(f"Trying to fetch: {url_to_try}")
|
||
|
||
try:
|
||
# Set a short timeout to demonstrate Timeout exception
|
||
response = requests.get(url_to_try, timeout=2)
|
||
response.raise_for_status() # Check for 4xx/5xx status codes
|
||
print("Success! Status Code:", response.status_code)
|
||
# Process response...
|
||
|
||
except requests.exceptions.ConnectTimeout as e:
|
||
print(f"\nError: Could not connect to the server in time.")
|
||
print(f"Details: {e}")
|
||
# Maybe retry later?
|
||
|
||
except requests.exceptions.ReadTimeout as e:
|
||
print(f"\nError: Server took too long to send data.")
|
||
print(f"Details: {e}")
|
||
# Maybe the server is slow, could try again?
|
||
|
||
except requests.exceptions.ConnectionError as e:
|
||
print(f"\nError: Network problem (e.g., DNS error, refused connection).")
|
||
print(f"Details: {e}")
|
||
# Check internet connection?
|
||
|
||
except requests.exceptions.HTTPError as e:
|
||
print(f"\nError: Bad HTTP status code received from server.")
|
||
print(f"Status Code: {e.response.status_code}")
|
||
print(f"Details: {e}")
|
||
# Was it a 404 Not Found? 500 Server Error?
|
||
|
||
except requests.exceptions.RequestException as e:
|
||
# Catch any other requests error that wasn't specifically handled above
|
||
print(f"\nAn unexpected requests error occurred:")
|
||
print(f"Error Type: {type(e).__name__}")
|
||
print(f"Details: {e}")
|
||
|
||
print("\nScript continues...")
|
||
```
|
||
|
||
**Possible Output (if `url_to_try = timeout_url`):**
|
||
|
||
```
|
||
Trying to fetch: https://httpbin.org/delay/5
|
||
|
||
Error: Server took too long to send data.
|
||
Details: HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=2)
|
||
|
||
Script continues...
|
||
```
|
||
|
||
**Possible Output (if `url_to_try = not_found_url`):**
|
||
|
||
```
|
||
Trying to fetch: https://httpbin.org/status/404
|
||
|
||
Error: Bad HTTP status code received from server.
|
||
Status Code: 404
|
||
Details: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
|
||
|
||
Script continues...
|
||
```
|
||
|
||
**Explanation:**
|
||
|
||
* We have multiple `except` blocks, ordered from most specific (`ConnectTimeout`, `ReadTimeout`) to more general (`ConnectionError`, `HTTPError`) and finally the catch-all `RequestException`.
|
||
* Python tries the `except` blocks in order. When an exception occurs, the *first* matching block is executed.
|
||
* If a `ReadTimeout` occurs, the `except requests.exceptions.ReadTimeout` block handles it. It won't fall through to the `except requests.exceptions.ConnectionError` or `except requests.exceptions.RequestException` blocks, even though `ReadTimeout` *is* a type of `RequestException`.
|
||
* This allows us to provide specific feedback or recovery logic for different error scenarios.
|
||
|
||
**Inheritance Benefit:** If you write `except requests.exceptions.Timeout as e:`, this block will catch *both* `ConnectTimeout` and `ReadTimeout` because they inherit from `Timeout`.
|
||
|
||
## How It Works Internally: Wrapping Lower-Level Errors
|
||
|
||
`Requests` doesn't handle network connections directly. It uses a lower-level library called `urllib3` under the hood (managed via [Transport Adapters](07_transport_adapters.md)). When `urllib3` encounters a network problem (like a connection error or timeout), it raises its *own* specific exceptions (e.g., `urllib3.exceptions.MaxRetryError`, `urllib3.exceptions.NewConnectionError`, `urllib3.exceptions.ReadTimeoutError`).
|
||
|
||
`Requests` catches these `urllib3` exceptions inside its [Transport Adapters](07_transport_adapters.md) (specifically, the `HTTPAdapter.send` method) and then **raises its own corresponding exception** from the `requests.exceptions` hierarchy. This simplifies things for you – you only need to worry about catching `requests` exceptions, not the underlying `urllib3` ones.
|
||
|
||
```mermaid
|
||
sequenceDiagram
|
||
participant UserCode as Your Code
|
||
participant ReqAPI as requests.get()
|
||
participant Adapter as HTTPAdapter
|
||
participant Urllib3 as urllib3 library
|
||
participant Network
|
||
|
||
UserCode->>ReqAPI: requests.get(bad_url, timeout=1)
|
||
ReqAPI->>Adapter: send(prepared_request)
|
||
Adapter->>Urllib3: urlopen(method, url, ..., timeout=1)
|
||
Urllib3->>Network: Attempt connection...
|
||
Network-->>Urllib3: Fails (e.g., DNS lookup fails)
|
||
Urllib3->>Urllib3: Raise urllib3.exceptions.NewConnectionError
|
||
Urllib3-->>Adapter: Propagate NewConnectionError
|
||
Adapter->>Adapter: Catch NewConnectionError
|
||
Adapter->>Adapter: Raise requests.exceptions.ConnectionError(original_error)
|
||
Adapter-->>ReqAPI: Propagate ConnectionError
|
||
ReqAPI-->>UserCode: Propagate ConnectionError
|
||
UserCode->>UserCode: Catch requests.exceptions.ConnectionError
|
||
```
|
||
|
||
Let's look at the definitions in `requests/exceptions.py`. You can see the inheritance structure clearly:
|
||
|
||
```python
|
||
# File: requests/exceptions.py (Simplified View)
|
||
|
||
from urllib3.exceptions import HTTPError as BaseHTTPError
|
||
|
||
# The base class for all requests exceptions
|
||
class RequestException(IOError):
|
||
"""There was an ambiguous exception that occurred while handling your request."""
|
||
# ... (stores request/response objects) ...
|
||
|
||
# Specific exceptions inheriting from RequestException or other requests exceptions
|
||
class HTTPError(RequestException):
|
||
"""An HTTP error occurred.""" # Typically raised by response.raise_for_status()
|
||
|
||
class ConnectionError(RequestException):
|
||
"""A Connection error occurred."""
|
||
|
||
class ProxyError(ConnectionError): # Inherits from ConnectionError
|
||
"""A proxy error occurred."""
|
||
|
||
class SSLError(ConnectionError): # Inherits from ConnectionError
|
||
"""An SSL error occurred."""
|
||
|
||
class Timeout(RequestException): # Inherits directly from RequestException
|
||
"""The request timed out."""
|
||
|
||
class ConnectTimeout(ConnectionError, Timeout): # Inherits from BOTH ConnectionError and Timeout!
|
||
"""The request timed out while trying to connect to the remote server."""
|
||
|
||
class ReadTimeout(Timeout): # Inherits from Timeout
|
||
"""The server did not send any data in the allotted amount of time."""
|
||
|
||
class URLRequired(RequestException):
|
||
"""A valid URL is required to make a request."""
|
||
|
||
class TooManyRedirects(RequestException):
|
||
"""Too many redirects."""
|
||
|
||
# ... other specific errors like MissingSchema, InvalidURL, etc. ...
|
||
|
||
# Some exceptions might also inherit from standard Python errors
|
||
class JSONDecodeError(RequestException, ValueError): # Inherits from RequestException and ValueError
|
||
"""Couldn't decode the text into json"""
|
||
# Uses Python's built-in JSONDecodeError capabilities
|
||
|
||
```
|
||
|
||
And here's a simplified view of how `requests/adapters.py` (`HTTPAdapter.send`) catches `urllib3` errors and raises `requests` errors:
|
||
|
||
```python
|
||
# File: requests/adapters.py (Simplified View in HTTPAdapter.send method)
|
||
|
||
from urllib3.exceptions import (
|
||
MaxRetryError, ConnectTimeoutError, NewConnectionError, ResponseError,
|
||
ProxyError as _ProxyError, SSLError as _SSLError, ReadTimeoutError,
|
||
ProtocolError, ClosedPoolError, InvalidHeader as _InvalidHeader
|
||
)
|
||
from ..exceptions import (
|
||
ConnectionError, ConnectTimeout, ReadTimeout, SSLError, ProxyError,
|
||
RetryError, InvalidHeader, RequestException # And others
|
||
)
|
||
|
||
class HTTPAdapter(BaseAdapter):
|
||
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
|
||
# ... (prepare connection using self.get_connection_with_tls_context) ...
|
||
conn = self.get_connection_with_tls_context(...)
|
||
# ... (verify certs, prepare URL, add headers) ...
|
||
|
||
try:
|
||
# === Make the actual request using urllib3 ===
|
||
resp = conn.urlopen(
|
||
method=request.method,
|
||
url=url,
|
||
# ... other args like body, headers ...
|
||
retries=self.max_retries,
|
||
timeout=timeout,
|
||
)
|
||
|
||
# === Catch specific urllib3 errors and raise corresponding requests errors ===
|
||
|
||
except (ProtocolError, OSError) as err: # General network/protocol errors
|
||
raise ConnectionError(err, request=request)
|
||
|
||
except MaxRetryError as e: # urllib3 retried but failed
|
||
if isinstance(e.reason, ConnectTimeoutError):
|
||
raise ConnectTimeout(e, request=request)
|
||
if isinstance(e.reason, ResponseError): # Errors related to retry logic
|
||
raise RetryError(e, request=request)
|
||
if isinstance(e.reason, _ProxyError):
|
||
raise ProxyError(e, request=request)
|
||
if isinstance(e.reason, _SSLError):
|
||
raise SSLError(e, request=request)
|
||
# Fallback for other retry errors
|
||
raise ConnectionError(e, request=request)
|
||
|
||
except ClosedPoolError as e: # Connection pool was closed
|
||
raise ConnectionError(e, request=request)
|
||
|
||
except _ProxyError as e: # Direct proxy error
|
||
raise ProxyError(e)
|
||
|
||
except (_SSLError, ReadTimeoutError, _InvalidHeader) as e: # Other specific errors
|
||
if isinstance(e, _SSLError):
|
||
raise SSLError(e, request=request)
|
||
elif isinstance(e, ReadTimeoutError):
|
||
raise ReadTimeout(e, request=request)
|
||
elif isinstance(e, _InvalidHeader):
|
||
raise InvalidHeader(e, request=request)
|
||
else:
|
||
# Should not happen, but raise generic RequestException if needed
|
||
raise RequestException(e, request=request)
|
||
|
||
# ... (build and return the Response object if successful) ...
|
||
return self.build_response(request, resp)
|
||
```
|
||
|
||
This wrapping makes your life easier by providing a consistent set of exceptions (`requests.exceptions`) to handle, regardless of the underlying `urllib3` details.
|
||
|
||
## Conclusion
|
||
|
||
You've learned about the `requests` **Exception Hierarchy** – a family tree of error types that `requests` raises when things go wrong.
|
||
|
||
* You saw that all `requests` exceptions inherit from the base `requests.exceptions.RequestException`.
|
||
* You learned about key specific exceptions like `ConnectionError`, `Timeout` (and its children `ConnectTimeout`, `ReadTimeout`), and `HTTPError` (raised by `response.raise_for_status()`).
|
||
* You practiced using `try...except` blocks to catch both general (`RequestException`) and specific exceptions, allowing for tailored error handling.
|
||
* You understood that `requests` wraps lower-level errors (from `urllib3`) into its own exception types, simplifying error handling for you.
|
||
|
||
Understanding this hierarchy is crucial for writing robust Python code that can gracefully handle the inevitable problems that occur when dealing with networks and web services.
|
||
|
||
So far, we've mostly used the default way `requests` handles connections. But what if we need more control over how connections are made, maybe to configure retries differently, or use different SSL settings? That's where Transport Adapters come in.
|
||
|
||
**Next:** [Chapter 7: Transport Adapters](07_transport_adapters.md)
|
||
|
||
---
|
||
|
||
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) |