---
layout: default
title: "CrawlResult"
parent: "Crawl4AI"
nav_order: 7
---

# Chapter 7: Understanding the Results - CrawlResult

In the previous chapter, [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md), we learned how to teach Crawl4AI to act like an analyst, extracting specific, structured data points from a webpage using an `ExtractionStrategy`.

We've seen how Crawl4AI can fetch pages, clean them, filter them, and even extract precise information. But after all that work, where does all the gathered information go? When you ask the `AsyncWebCrawler` to crawl a URL using `arun()`, what do you actually get back?

## What Problem Does `CrawlResult` Solve?

Imagine you sent a research assistant to the library (a website) with a set of instructions: "Find this book (URL), make a clean copy of the relevant chapter (clean HTML/Markdown), list all the cited references (links), take photos of the illustrations (media), find the author and publication date (metadata), and maybe extract specific quotes (structured data)."

When the assistant returns, they wouldn't just hand you a single piece of paper. They'd likely give you a folder containing everything you asked for: the clean copy, the list of references, the photos, the metadata notes, and the extracted quotes, all neatly organized. They might also include a note if they encountered any problems (errors).

`CrawlResult` is exactly this **final report folder** or **delivery package**. It's a single object that neatly contains *all* the information Crawl4AI gathered and processed for a specific URL during a crawl operation. Instead of getting lots of separate pieces of data back, you get one convenient container.

## What is `CrawlResult`?

`CrawlResult` is a Python object (specifically, a Pydantic model, which is like a super-powered dictionary) that acts as a data container. It holds the results of a single crawl task performed by `AsyncWebCrawler.arun()` or one of the results from `arun_many()`.

Think of it as a toolbox filled with different tools and information related to the crawled page.

**Key Information Stored in `CrawlResult`:**

* **`url` (string):** The original URL that was requested.
* **`success` (boolean):** Did the crawl complete without critical errors? `True` if successful, `False` otherwise. **Always check this first!**
* **`html` (string):** The raw, original HTML source code fetched from the page.
* **`cleaned_html` (string):** The HTML after initial cleaning by the [ContentScrapingStrategy](04_contentscrapingstrategy.md) (e.g., scripts, styles removed).
* **`markdown` (object):** An object containing different Markdown representations of the content.
  * `markdown.raw_markdown`: Basic Markdown generated from `cleaned_html`.
  * `markdown.fit_markdown`: Markdown generated *only* from content deemed relevant by a [RelevantContentFilter](05_relevantcontentfilter.md) (if one was used). Might be empty if no filter was applied.
  * *(Other fields like `markdown_with_citations` might exist.)*
* **`extracted_content` (string):** If you used an [ExtractionStrategy](06_extractionstrategy.md), this holds the extracted structured data, usually formatted as a JSON string. `None` if no extraction was performed or nothing was found.
* **`metadata` (dictionary):** Information extracted from the page's metadata tags, like the page title (`metadata['title']`), description, keywords, etc.
* **`links` (object):** Contains lists of links found on the page.
  * `links.internal`: List of links pointing to the same website.
  * `links.external`: List of links pointing to other websites.
* **`media` (object):** Contains lists of media items found.
  * `media.images`: List of images (`<img>` tags).
  * `media.videos`: List of videos (`<video>` tags).
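
To make this concrete, here is a minimal sketch of crawling a page and reading fields off the returned `CrawlResult`. The URL is a placeholder, and the attribute access follows the field descriptions above; exact shapes (for example, whether `markdown` is an object with `raw_markdown` or a plain string) can vary between Crawl4AI versions.

```python
import asyncio

from crawl4ai import AsyncWebCrawler


async def main():
    async with AsyncWebCrawler() as crawler:
        # arun() returns a single CrawlResult for the requested URL.
        result = await crawler.arun(url="https://example.com")

        # Always check success before touching the other fields.
        if not result.success:
            print(f"Crawl failed for {result.url}")
            return

        # Page title from the metadata dictionary (guarded in case it's empty).
        print("Title:", (result.metadata or {}).get("title"))

        # Markdown representation generated from the cleaned HTML.
        # In recent versions this lives on a markdown object; older
        # versions exposed a plain string instead.
        print("Markdown length:", len(result.markdown.raw_markdown))

        # Structured data from an ExtractionStrategy, as a JSON string
        # (None if no extraction ran or nothing was found).
        if result.extracted_content:
            print("Extracted:", result.extracted_content[:200])


asyncio.run(main())
```

The pattern to take away is the `if not result.success` guard at the top: because everything comes back in one container, a single check tells you whether the rest of the folder is worth opening.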