update nav

This commit is contained in:
zachary62
2025-04-04 14:03:22 -04:00
parent 2fa60fe7d5
commit 0426110e66
24 changed files with 261 additions and 32 deletions

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "AsyncCrawlerStrategy"
parent: "Crawl4AI"
nav_order: 1
---
# Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy
Welcome to the Crawl4AI tutorial series! Our goal is to build intelligent agents that can understand and extract information from the web. The very first step in this process is actually *getting* the content from a webpage. This chapter explains how Crawl4AI handles that fundamental task.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "AsyncWebCrawler"
parent: "Crawl4AI"
nav_order: 2
---
# Chapter 2: Meet the General Manager - AsyncWebCrawler
In [Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy](01_asynccrawlerstrategy.md), we learned about the different ways Crawl4AI can fetch the raw content of a webpage, like choosing between a fast drone (`AsyncHTTPCrawlerStrategy`) or a versatile delivery truck (`AsyncPlaywrightCrawlerStrategy`).

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "CrawlerRunConfig"
parent: "Crawl4AI"
nav_order: 3
---
# Chapter 3: Giving Instructions - CrawlerRunConfig
In [Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md), we met the `AsyncWebCrawler`, the central coordinator for our web crawling tasks. We saw how to tell it *what* URL to crawl using the `arun` method.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "ContentScrapingStrategy"
parent: "Crawl4AI"
nav_order: 4
---
# Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy
In [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md), we learned how to give specific instructions to our `AsyncWebCrawler` using `CrawlerRunConfig`. This included telling it *how* to fetch the page and potentially take screenshots or PDFs.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "RelevantContentFilter"
parent: "Crawl4AI"
nav_order: 5
---
# Chapter 5: Focusing on What Matters - RelevantContentFilter
In [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md), we learned how Crawl4AI takes the raw, messy HTML from a webpage and cleans it up using a `ContentScrapingStrategy`. This gives us a tidier version of the HTML (`cleaned_html`) and extracts basic elements like links and images.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "ExtractionStrategy"
parent: "Crawl4AI"
nav_order: 6
---
# Chapter 6: Getting Specific Data - ExtractionStrategy
In the previous chapter, [Chapter 5: Focusing on What Matters - RelevantContentFilter](05_relevantcontentfilter.md), we learned how to sift through the cleaned webpage content to keep only the parts relevant to our query or goal, producing a focused `fit_markdown`. This is great for tasks like summarization or getting the main gist of an article.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "CrawlResult"
parent: "Crawl4AI"
nav_order: 7
---
# Chapter 7: Understanding the Results - CrawlResult
In the previous chapter, [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md), we learned how to teach Crawl4AI to act like an analyst, extracting specific, structured data points from a webpage using an `ExtractionStrategy`. We've seen how Crawl4AI can fetch pages, clean them, filter them, and even extract precise information.
@@ -247,7 +254,7 @@ if __name__ == "__main__":
You don't interact with the `CrawlResult` constructor directly. The `AsyncWebCrawler` creates it for you at the very end of the `arun` process, typically inside its internal `aprocess_html` method (or just before returning if fetching from cache).
Here's a simplified sequence:
1. **Fetch:** `AsyncWebCrawler` calls the [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) to get the raw `html`, `status_code`, `response_headers`, etc.
2. **Scrape:** It passes the `html` to the [ContentScrapingStrategy](04_contentscrapingstrategy.md) to get `cleaned_html`, `links`, `media`, `metadata`.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "DeepCrawlStrategy"
parent: "Crawl4AI"
nav_order: 8
---
# Chapter 8: Exploring Websites - DeepCrawlStrategy
In [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md), we saw the final report (`CrawlResult`) that Crawl4AI gives us after processing a single URL. This report contains cleaned content, links, metadata, and maybe even extracted data.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "CacheContext & CacheMode"
parent: "Crawl4AI"
nav_order: 9
---
# Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode
In the previous chapter, [Chapter 8: Exploring Websites - DeepCrawlStrategy](08_deepcrawlstrategy.md), we saw how Crawl4AI can explore websites by following links, potentially visiting many pages. During such explorations, or even when you run the same crawl multiple times, the crawler might try to fetch the exact same webpage again and again. This can be slow and might unnecessarily put a load on the website you're crawling. Wouldn't it be smarter to remember the result from the first time and just reuse it?

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "BaseDispatcher"
parent: "Crawl4AI"
nav_order: 10
---
# Chapter 10: Orchestrating the Crawl - BaseDispatcher
In [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md), we learned how Crawl4AI uses caching to cleverly avoid re-fetching the same webpage multiple times, which is especially helpful when crawling many URLs. We've also seen how methods like `arun_many()` ([Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md)) or strategies like [DeepCrawlStrategy](08_deepcrawlstrategy.md) can lead to potentially hundreds or thousands of individual URLs needing to be crawled.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "LLM"
parent: "CrewAI"
nav_order: 6
---
# Chapter 6: LLM - The Agent's Brain
In the [previous chapter](05_process.md), we explored the `Process` - how the `Crew` organizes the workflow for its `Agent`s, deciding whether they work sequentially or are managed hierarchically. We now have specialized agents ([Agent](02_agent.md)), defined work ([Task](03_task.md)), useful abilities ([Tool](04_tool.md)), and a workflow strategy ([Process](05_process.md)).

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Memory"
parent: "CrewAI"
nav_order: 7
---
# Chapter 7: Memory - Giving Your Crew Recall
In the [previous chapter](06_llm.md), we looked at the Large Language Model ([LLM](06_llm.md)), the "brain" that allows each [Agent](02_agent.md) to understand, reason, and generate text. Now we have agents that can think, perform [Task](03_task.md)s using [Tool](04_tool.md)s, and follow a [Process](05_process.md).

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Knowledge"
parent: "CrewAI"
nav_order: 8
---
# Chapter 8: Knowledge - Providing External Information
In [Chapter 7: Memory](07_memory.md), we learned how to give our [Crew](01_crew.md) the ability to remember past interactions and details using `Memory`. This helps them maintain context within a single run and potentially across runs.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Module & Program"
parent: "DSPy"
nav_order: 1
---
# Chapter 1: Modules and Programs: Building Blocks of DSPy
Welcome to the first chapter of our journey into DSPy! We're excited to have you here.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Signature"
parent: "DSPy"
nav_order: 2
---
# Chapter 2: Signatures - Defining the Task
In [Chapter 1: Modules and Programs](01_module___program.md), we learned that `Module`s are like Lego bricks that perform specific tasks, often using Language Models ([LM](05_lm__language_model_client_.md)). We saw how `Program`s combine these modules.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Example"
parent: "DSPy"
nav_order: 3
---
# Chapter 3: Example - Your Data Points
In [Chapter 2: Signature](02_signature.md), we learned how to define the *task* for a DSPy module using `Signatures`, specifying the inputs, outputs, and instructions. It's like writing a recipe card.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Predict"
parent: "DSPy"
nav_order: 4
---
# Chapter 4: Predict - The Basic LM Caller
In [Chapter 3: Example](03_example.md), we learned how to create `dspy.Example` objects to represent our data points, like flashcards holding an input and its corresponding desired output. We also saw in [Chapter 2: Signature](02_signature.md) how to define the *task* itself using `dspy.Signature`.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "LM (Language Model Client)"
parent: "DSPy"
nav_order: 5
---
# Chapter 5: LM (Language Model Client) - The Engine Room
In [Chapter 4: Predict](04_predict.md), we saw how `dspy.Predict` takes a [Signature](02_signature.md) and input data to magically generate an output. We used our `translator` example:

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "RM (Retrieval Model Client)"
parent: "DSPy"
nav_order: 6
---
# Chapter 6: RM (Retrieval Model Client) - Your Program's Librarian
In [Chapter 5: LM (Language Model Client)](05_lm__language_model_client_.md), we learned how to connect our DSPy programs to the powerful "brain" of a Language Model (LM) using the LM Client. The LM is great at generating creative text, answering questions based on its vast training data, and reasoning.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Evaluate"
parent: "DSPy"
nav_order: 7
---
# Chapter 7: Evaluate - Grading Your Program
In the previous chapter, [Chapter 6: RM (Retrieval Model Client)](06_rm__retrieval_model_client_.md), we learned how to connect our DSPy program to external knowledge sources using Retrieval Models (RMs). We saw how combining RMs with Language Models (LMs) allows us to build sophisticated programs like Retrieval-Augmented Generation (RAG) systems.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Teleprompter & Optimizer"
parent: "DSPy"
nav_order: 8
---
# Chapter 8: Teleprompter / Optimizer - Your Program's Coach
Welcome to Chapter 8! In [Chapter 7: Evaluate](07_evaluate.md), we learned how to grade our DSPy programs using metrics and datasets to see how well they perform. That's great for knowing our score, but what if the score isn't high enough?

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Adapter"
parent: "DSPy"
nav_order: 9
---
# Chapter 9: Adapter - The Universal Translator
Welcome to Chapter 9! In [Chapter 8: Teleprompter / Optimizer](08_teleprompter___optimizer.md), we saw how DSPy can automatically optimize our programs by finding better prompts or few-shot examples. We ended up with a `compiled_program` that should perform better.

View File

@@ -1,3 +1,10 @@
---
layout: default
title: "Settings"
parent: "DSPy"
nav_order: 10
---
# Chapter 10: Settings - Your Program's Control Panel
Welcome to the final chapter of our introductory DSPy tutorial! In [Chapter 9: Adapter](09_adapter.md), we saw how Adapters act as translators, allowing our DSPy programs to communicate seamlessly with different types of Language Models (LMs).

View File

@@ -1,10 +1,10 @@
---
layout: default
title: "System Design"
nav_order: 2
---
# System Design: Codebase Knowledge Builder
> Please DON'T remove notes for AI
@@ -13,6 +13,20 @@ nav_order: 2
> Notes for AI: Keep it simple and clear.
> If the requirements are abstract, write concrete user stories
**User Story:** As a developer onboarding to a new codebase, I want a tutorial automatically generated from its GitHub repository. This tutorial should explain the core abstractions, their relationships (visualized), and how they work together, using beginner-friendly language, analogies, and multi-line descriptions where needed, so I can understand the project structure and key concepts quickly without manually digging through all the code.
**Input:**
- A publicly accessible GitHub repository URL.
- A project name (optional, will be derived from the URL if not provided).
**Output:**
- A directory named after the project containing:
- An `index.md` file with:
- A high-level project summary.
- A Mermaid flowchart diagram visualizing relationships between abstractions.
- Textual descriptions of the relationships.
- An ordered list of links to chapter files.
- Individual Markdown files for each chapter (`01_chapter_one.md`, `02_chapter_two.md`, etc.) detailing core abstractions in a logical order.
## Flow Design
@@ -22,37 +36,43 @@ nav_order: 2
### Applicable Design Pattern:
This project primarily uses a **Workflow** pattern to decompose the tutorial generation process into sequential steps. The chapter writing step utilizes a **BatchNode** (a form of MapReduce) to process each abstraction individually.
1. **Workflow:** The overall process follows a defined sequence: fetch code -> identify abstractions -> analyze relationships -> determine order -> write chapters -> combine tutorial into files.
2. **Batch Processing:** The `WriteChapters` node processes each identified abstraction independently (map) before the final tutorial files are structured (reduce).
### Flow high-level Design:
1. **`FetchRepo`**: Crawls the specified GitHub repository path using the `crawl_github_files` utility, retrieving relevant source code file contents.
2. **`IdentifyAbstractions`**: Analyzes the codebase using an LLM to identify up to 10 core abstractions, generate beginner-friendly descriptions (allowing multi-line), and list the *indices* of files related to each abstraction.
3. **`AnalyzeRelationships`**: Uses an LLM to analyze the identified abstractions (referenced by index) and their related code to generate a high-level project summary and describe the relationships/interactions between these abstractions, specifying *source* and *target* abstraction indices and a concise label for each interaction.
4. **`OrderChapters`**: Determines the most logical order (as indices) to present the abstractions in the tutorial, likely based on importance or dependencies identified in the previous step.
5. **`WriteChapters` (BatchNode)**: Iterates through the ordered list of abstraction indices. For each abstraction, it calls an LLM to write a detailed, beginner-friendly chapter, using the relevant code files (accessed via indices) and summaries of previously generated chapters as context.
6. **`CombineTutorial`**: Creates an output directory, generates a Mermaid diagram from the relationship data, and writes the project summary, relationship diagram/details (in `index.md`), and individually generated chapters (as separate `.md` files, named and ordered according to `chapter_order`) into it.
```mermaid
flowchart TD
    A[FetchRepo] --> B[IdentifyAbstractions];
    B --> C[AnalyzeRelationships];
    C --> D[OrderChapters];
    D --> E[Batch WriteChapters];
    E --> F[CombineTutorial];
```
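To make the Workflow pattern concrete, here is a minimal, self-contained sketch of sequential nodes sharing a common store through `prep`/`exec`/`post` steps. The `Node` base class, the `run_flow` helper, and the placeholder return values are illustrative assumptions, not the project's actual framework or code.

```python
# Minimal sketch of the sequential Workflow pattern; the Node base class and
# run_flow helper are illustrative assumptions, not the real framework.
class Node:
    def prep(self, shared):
        """Read whatever this step needs from the shared store."""
        return None

    def exec(self, prep_res):
        """Do the actual work (crawl, LLM call, file writing, ...)."""
        return None

    def post(self, shared, prep_res, exec_res):
        """Write results back into the shared store."""
        pass


class FetchRepo(Node):
    def prep(self, shared):
        return shared["repo_url"]

    def exec(self, repo_url):
        # Placeholder for crawl_github_files(repo_url, ...); returns (path, content) tuples.
        return [("README.md", "# demo content")]

    def post(self, shared, prep_res, exec_res):
        shared["files"] = exec_res


def run_flow(nodes, shared):
    for node in nodes:
        prep_res = node.prep(shared)
        exec_res = node.exec(prep_res)
        node.post(shared, prep_res, exec_res)


shared = {"repo_url": "https://github.com/example/repo", "files": []}
run_flow([FetchRepo()], shared)  # extend the list with the other five nodes in order
print(shared["files"])
```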
## Utility Functions
> Notes for AI:
> 1. Understand the utility function definition thoroughly by reviewing the doc.
> 2. Include only the necessary utility functions, based on nodes in the flow.
1. **`crawl_github_files`** (`utils/crawl_github_files.py`) - *External Dependency: requests*
* *Input*: `repo_url` (str), `token` (str, optional), `max_file_size` (int, optional), `use_relative_paths` (bool, optional), `include_patterns` (set, optional), `exclude_patterns` (set, optional)
* *Output*: `dict` containing `files` (dict[str, str]) and `stats`.
* *Necessity*: Required by `FetchRepo` to download and read the source code from GitHub. Handles cloning logic implicitly via API calls, filtering, and file reading.
2. **`call_llm`** (`utils/call_llm.py`) - *External Dependency: LLM Provider API (e.g., OpenAI, Anthropic)*
* *Input*: `prompt` (str)
* *Output*: `response` (str)
* *Necessity*: Used by `IdentifyAbstractions`, `AnalyzeRelationships`, `OrderChapters`, and `WriteChapters` for code analysis and content generation. Needs careful prompt engineering and YAML validation (implicit via `yaml.safe_load`, which raises errors).
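As a rough idea of what `call_llm` might look like, here is a hedged sketch using the OpenAI Python client; the provider, model name, and `OPENAI_API_KEY` environment variable are assumptions for illustration, and the real `utils/call_llm.py` may differ.

```python
# Hedged sketch of call_llm; the OpenAI provider, model choice, and environment
# variable are assumptions, not taken from this repository.
import os

from openai import OpenAI


def call_llm(prompt: str) -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; swap for whichever provider/model is configured
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example usage (requires a valid API key):
# print(call_llm("Summarize this repository in one sentence."))
```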
## Node Design
@@ -64,22 +84,70 @@ The shared memory structure is organized as follows:
```python
shared = {
    "repo_url": None,         # Input: Provided by the user/main script
    "project_name": None,     # Input: Optional, derived from repo_url if not provided
    "github_token": None,     # Input: Optional, from environment or config
    "files": [],              # Output of FetchRepo: List of tuples (file_path: str, file_content: str)
    "abstractions": [],       # Output of IdentifyAbstractions: List of {"name": str, "description": str (can be multi-line), "files": [int]} (indices into shared["files"])
    "relationships": {        # Output of AnalyzeRelationships
        "summary": None,      # Overall project summary (can be multi-line)
        "details": []         # List of {"from": int, "to": int, "label": str} describing relationships between abstraction indices with a concise label.
    },
    "chapter_order": [],      # Output of OrderChapters: List of indices into shared["abstractions"], determining tutorial order
    "chapters": [],           # Output of WriteChapters: List of chapter content strings (Markdown), ordered according to chapter_order
    "output_dir": "output",   # Input/Default: Base directory for output
    "final_output_dir": None  # Output of CombineTutorial: Path to the final generated tutorial directory (e.g., "output/my_project")
}
```
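For illustration only, here is a hypothetical snapshot of a few of these fields after the analysis nodes have run; every name and value below is invented, but it shows how the index-based references line up.

```python
# Purely illustrative, invented values; not data from a real run.
shared_example = {
    "files": [("flow.py", "..."), ("nodes.py", "..."), ("utils/call_llm.py", "...")],
    "abstractions": [
        {"name": "Node", "description": "A single step in the pipeline.", "files": [1]},
        {"name": "Flow", "description": "Chains nodes together.", "files": [0, 1]},
    ],
    "relationships": {
        "summary": "A small workflow engine built from chained nodes.",
        "details": [{"from": 1, "to": 0, "label": "Runs"}],  # Flow -> Node
    },
    "chapter_order": [1, 0],  # teach Flow first, then Node
}
```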
### Node Steps
> Notes for AI: Carefully decide whether to use Batch/Async Node/Flow. Removed explicit try/except in exec, relying on Node's built-in fault tolerance.
1. **`FetchRepo`**
* *Purpose*: Download the repository code and load relevant files into memory using the crawler utility.
* *Type*: Regular
* *Steps*:
* `prep`: Read `repo_url`, optional `github_token`, `output_dir` from shared store. Define `include_patterns` (e.g., `{"*.py", "*.js", "*.md"}`) and `exclude_patterns` (e.g., `{"*test*", "docs/*"}`). Set `max_file_size` and `use_relative_paths` flags. Determine `project_name` from `repo_url` if not present in shared.
* `exec`: Call `crawl_github_files(shared["repo_url"], token=shared["github_token"], include_patterns=..., exclude_patterns=..., max_file_size=..., use_relative_paths=True)`. Convert the resulting `files` dictionary into a list of `(path, content)` tuples.
* `post`: Write the list of `files` tuples and the derived `project_name` (if applicable) to the shared store.
2. **`IdentifyAbstractions`**
* *Purpose*: Analyze the code to identify key concepts/abstractions using indices.
* *Type*: Regular
* *Steps*:
* `prep`: Read `files` (list of tuples) from shared store. Create context using `create_llm_context` helper which adds file indices. Format the list of `index # path` for the prompt.
* `exec`: Construct a prompt for `call_llm` asking it to identify ~5-10 core abstractions, provide a simple description (allowing multi-line YAML string) for each, and list the relevant *file indices* (e.g., `- 0 # path/to/file.py`). Request YAML list output. Parse and validate the YAML, ensuring indices are within bounds and converting entries like `0 # path...` to just the integer `0`.
* `post`: Write the validated list of `abstractions` (e.g., `[{"name": "Node", "description": "...", "files": [0, 3, 5]}, ...]`) containing file *indices* to the shared store.
3. **`AnalyzeRelationships`**
* *Purpose*: Generate a project summary and describe how the identified abstractions interact using indices and concise labels.
* *Type*: Regular
* *Steps*:
* `prep`: Read `abstractions` and `files` from shared store. Format context for the LLM, including abstraction names *and indices*, descriptions, and content snippets from related files (referenced by `index # path` using `get_content_for_indices` helper). Prepare the list of `index # AbstractionName` for the prompt.
* `exec`: Construct a prompt for `call_llm` asking for (1) a high-level summary (allowing multi-line YAML string) and (2) a list of relationships, each specifying `from_abstraction` (e.g., `0 # Abstraction1`), `to_abstraction` (e.g., `1 # Abstraction2`), and a concise `label` (string, just a few words). Request structured YAML output. Parse and validate, converting referenced abstractions to indices (`from: 0, to: 1`).
* `post`: Parse the LLM response and write the `relationships` dictionary (`{"summary": "...", "details": [{"from": 0, "to": 1, "label": "..."}, ...]}`) with indices to the shared store.
4. **`OrderChapters`**
* *Purpose*: Determine the sequence (as indices) in which abstractions should be presented.
* *Type*: Regular
* *Steps*:
* `prep`: Read `abstractions` and `relationships` from the shared store. Prepare context including the list of `index # AbstractionName` and textual descriptions of relationships referencing indices and using the concise `label`.
* `exec`: Construct a prompt for `call_llm` asking it to order the abstractions based on importance, foundational concepts, or dependencies. Request output as an ordered YAML list of `index # AbstractionName`. Parse and validate, extracting only the indices and ensuring all are present exactly once.
* `post`: Write the validated ordered list of indices (`chapter_order`) to the shared store.
5. **`WriteChapters`**
* *Purpose*: Generate the detailed content for each chapter of the tutorial.
* *Type*: **BatchNode**
* *Steps*:
* `prep`: Read `chapter_order` (list of indices), `abstractions`, and `files` from shared store. Initialize an empty instance variable `self.chapters_written_so_far`. Return an iterable list where each item corresponds to an *abstraction index* from `chapter_order`. Each item should contain chapter number, abstraction details, and a map of related file content (`{ "idx # path": content }` obtained via `get_content_for_indices`).
* `exec(item)`: Construct a prompt for `call_llm`. Ask it to write a beginner-friendly Markdown chapter about the current abstraction. Provide its description. Include a summary of previously written chapters (from `self.chapters_written_so_far`). Provide relevant code snippets (referenced by `index # path`). Add the generated chapter content to `self.chapters_written_so_far` for the next iteration's context. Return the chapter content.
* `post(shared, prep_res, exec_res_list)`: `exec_res_list` contains the generated chapter Markdown content strings, ordered correctly. Assign this list directly to `shared["chapters"]`. Clean up `self.chapters_written_so_far`.
6. **`CombineTutorial`**
* *Purpose*: Assemble the final tutorial files, including a Mermaid diagram using concise labels.
* *Type*: Regular
* *Steps*:
* `prep`: Read `project_name`, `relationships`, `chapter_order` (indices), `abstractions`, and `chapters` (list of content) from shared store. Generate a Mermaid `flowchart TD` string based on `relationships["details"]`, using indices to identify nodes and the concise `label` for edges. Construct the content for `index.md` (including summary, Mermaid diagram, textual relationship details using the `label`, and ordered links to chapters derived using `chapter_order` and `abstractions`). Define the output directory path (e.g., `./output_dir/project_name`). Prepare a list of `{ "filename": "01_...", "content": "..." }` for chapters.
* `exec`: Create the output directory. Write the generated `index.md` content. Iterate through the prepared chapter file list and write each chapter's content to its corresponding `.md` file in the output directory.
* `post`: Write the final `output_dir` path to `shared["final_output_dir"]`. Log completion.
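Several of the steps above hinge on parsing YAML from the LLM and converting `index # name` style references back into plain integers with bounds checks. Below is a minimal sketch of that validation for the `IdentifyAbstractions` output; the helper name and exact YAML field names are assumptions, not the project's actual code.

```python
# Illustrative sketch of the index validation described for IdentifyAbstractions;
# the helper name and exact YAML field names are assumptions.
import yaml


def parse_abstractions(llm_response: str, num_files: int) -> list:
    entries = yaml.safe_load(llm_response)  # raises on malformed YAML, per the notes above
    abstractions = []
    for entry in entries:
        indices = []
        for ref in entry["files"]:
            # Accept either a bare integer or a "0 # path/to/file.py" style string.
            idx = int(str(ref).split("#")[0].strip())
            if not 0 <= idx < num_files:
                raise ValueError(f"File index {idx} out of range (0..{num_files - 1})")
            indices.append(idx)
        abstractions.append(
            {"name": entry["name"], "description": entry["description"], "files": indices}
        )
    return abstractions


# Tiny fabricated example:
sample = """
- name: Node
  description: |
    A single processing step.
  files:
    - 0 # nodes.py
"""
print(parse_abstractions(sample, num_files=3))
```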