# Tutorial: Crawl4AI

`Crawl4AI` is a flexible Python library for *asynchronously crawling websites* and *extracting structured content*, specifically designed for **AI use cases**.

You primarily interact with the `AsyncWebCrawler`, which acts as the main coordinator. You provide it with URLs and a `CrawlerRunConfig` detailing *how* to crawl (e.g., using specific strategies for fetching, scraping, filtering, and extraction).
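The coordinator-plus-config arrangement described above can be pictured with a toy model. The sketch below does not use Crawl4AI itself: `ToyCrawler`, `ToyRunConfig`, `ToyResult`, `fake_fetch`, and `word_count_extractor` are hypothetical names, and the canned "fetcher" stands in for a real browser or HTTP strategy. It only illustrates how a crawler might delegate fetching and extraction to pluggable strategies chosen per run.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict

# Hypothetical stand-ins for illustration; these are NOT Crawl4AI's classes.
FetchStrategy = Callable[[str], Awaitable[str]]
Extractor = Callable[[str], Dict[str, int]]


async def fake_fetch(url: str) -> str:
    """Pretend to download a page and return canned HTML."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"<html><body>hello from {url}</body></html>"


def word_count_extractor(html: str) -> Dict[str, int]:
    """Trivial 'extraction': count whitespace-separated tokens."""
    return {"words": len(html.split())}


@dataclass
class ToyRunConfig:
    """Describes *how* to crawl by naming the strategies to use."""
    fetch: FetchStrategy = fake_fetch
    extract: Extractor = word_count_extractor


@dataclass
class ToyResult:
    url: str
    html: str
    extracted: Dict[str, int]
    success: bool = True


class ToyCrawler:
    """Coordinator: runs the configured strategies in order."""

    async def arun(self, url: str, config: ToyRunConfig) -> ToyResult:
        html = await config.fetch(url)
        return ToyResult(url=url, html=html, extracted=config.extract(html))


def demo() -> ToyResult:
    return asyncio.run(ToyCrawler().arun("https://example.com", ToyRunConfig()))
```

Swapping the `fetch` or `extract` field on the config changes the crawl's behavior without touching the coordinator, which is the design idea the paragraph above describes.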

It can handle a single page, or multiple URLs concurrently using a `BaseDispatcher`. It can optionally crawl deeper by following links via a `DeepCrawlStrategy`, control caching through `CacheMode`, and apply a `RelevantContentFilter` before finally returning a `CrawlResult` containing all the gathered data.
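To make the concurrency and caching ideas above concrete, here is a minimal sketch using only the standard library. `ToyDispatcher` and `ToyCacheMode` are hypothetical names invented for this example, and the semaphore-based limit is an assumption about how a dispatcher could bound concurrency, not a description of Crawl4AI's internals.

```python
import asyncio
from enum import Enum
from typing import Dict, List


class ToyCacheMode(Enum):
    """Hypothetical cache modes, loosely modelled on the idea of a cache setting."""
    ENABLED = "enabled"  # read from and write to the cache
    BYPASS = "bypass"    # always fetch, never touch the cache


class ToyDispatcher:
    """Runs many single-URL crawls with a cap on how many run at once."""

    def __init__(self, max_concurrency: int = 2) -> None:
        self.max_concurrency = max_concurrency
        self.cache: Dict[str, str] = {}
        self.fetch_log: List[str] = []  # URLs that were actually fetched

    async def _fetch(self, url: str) -> str:
        self.fetch_log.append(url)
        await asyncio.sleep(0)  # placeholder for real network I/O
        return f"content of {url}"

    async def crawl_one(self, url: str, mode: ToyCacheMode) -> str:
        if mode is ToyCacheMode.ENABLED and url in self.cache:
            return self.cache[url]
        body = await self._fetch(url)
        if mode is ToyCacheMode.ENABLED:
            self.cache[url] = body
        return body

    async def crawl_many(self, urls: List[str], mode: ToyCacheMode) -> List[str]:
        sem = asyncio.Semaphore(self.max_concurrency)

        async def guarded(url: str) -> str:
            async with sem:  # at most max_concurrency crawls in flight
                return await self.crawl_one(url, mode)

        # gather returns results in the same order as the input URLs
        return await asyncio.gather(*(guarded(u) for u in urls))
```

Calling `crawl_many` twice with `ToyCacheMode.ENABLED` fetches each URL only once; with `ToyCacheMode.BYPASS` every call goes back to the (simulated) network.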

**Source Repository:** [https://github.com/unclecode/crawl4ai/tree/9c58e4ce2ee025debd3f36bf213330bd72b90e46/crawl4ai](https://github.com/unclecode/crawl4ai/tree/9c58e4ce2ee025debd3f36bf213330bd72b90e46/crawl4ai)

```mermaid
flowchart TD
    A0["AsyncWebCrawler"]
    A1["CrawlerRunConfig"]
    A2["AsyncCrawlerStrategy"]
    A3["ContentScrapingStrategy"]
    A4["ExtractionStrategy"]
    A5["CrawlResult"]
    A6["BaseDispatcher"]
    A7["DeepCrawlStrategy"]
    A8["CacheContext / CacheMode"]
    A9["RelevantContentFilter"]
    A0 -- "Configured by" --> A1
    A0 -- "Uses Fetching Strategy" --> A2
    A0 -- "Uses Scraping Strategy" --> A3
    A0 -- "Uses Extraction Strategy" --> A4
    A0 -- "Produces" --> A5
    A0 -- "Uses Dispatcher for `arun_m..." --> A6
    A0 -- "Uses Caching Logic" --> A8
    A6 -- "Calls Crawler's `arun`" --> A0
    A1 -- "Specifies Deep Crawl Strategy" --> A7
    A7 -- "Processes Links from" --> A5
    A3 -- "Provides Cleaned HTML to" --> A9
    A1 -- "Specifies Content Filter" --> A9
```
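The diagram's "Processes Links from" edge, where a deep-crawl strategy follows links found in earlier results, can be sketched as a depth-limited traversal. Everything here is illustrative: `TOY_SITE` is a made-up link graph, `deep_crawl` is a hypothetical helper, and breadth-first order is just one plausible choice rather than a claim about how the library walks a site.

```python
from collections import deque
from typing import Dict, List

# Hypothetical site: each URL maps to the links "found" on that page.
TOY_SITE: Dict[str, List[str]] = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
    "/docs/api": [],
    "/blog/post-1": ["/docs"],
}


def deep_crawl(start: str, max_depth: int,
               site: Dict[str, List[str]] = TOY_SITE) -> List[str]:
    """Return URLs in breadth-first discovery order, up to max_depth hops."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in site.get(url, []):
            if link not in seen:  # skip pages already discovered
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited
```

With `max_depth=1` only the start page and its direct links are returned; raising the limit lets the traversal reach pages that are further away, while the `seen` set prevents loops such as `/docs` linking back to `/`.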

## Chapters

1. [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md)
2. [AsyncWebCrawler](02_asyncwebcrawler.md)
3. [CrawlerRunConfig](03_crawlerrunconfig.md)
4. [ContentScrapingStrategy](04_contentscrapingstrategy.md)
5. [RelevantContentFilter](05_relevantcontentfilter.md)
6. [ExtractionStrategy](06_extractionstrategy.md)
7. [CrawlResult](07_crawlresult.md)
8. [DeepCrawlStrategy](08_deepcrawlstrategy.md)
9. [CacheContext / CacheMode](09_cachecontext___cachemode.md)
10. [BaseDispatcher](10_basedispatcher.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)