Merge pull request #14 from sublimegame/feature/multi-language-support

Feature/multi language support
This commit is contained in:
Zachary Huang
2025-04-16 13:51:11 -04:00
committed by GitHub
5 changed files with 265 additions and 178 deletions

.gitignore

@@ -96,4 +96,7 @@ coverage/
.node_repl_history
# LLM cache
llm_cache.json
# Output files
output/


@@ -5,7 +5,7 @@
> *Ever stared at a new codebase written by others feeling completely lost? This tutorial shows you how to build an AI agent that analyzes GitHub repositories and creates beginner-friendly tutorials explaining exactly how the code works.*
<p align="center">
<img
src="./assets/banner.png" width="800"
/>
</p>
@@ -19,7 +19,7 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket
## ⭐ Example Results for Popular GitHub Repositories!
<p align="center">
<img
src="./assets/example.png" width="600"
/>
</p>
@@ -67,7 +67,7 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket
1. Clone this repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
@@ -79,20 +79,24 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket
api_key=os.getenv("GEMINI_API_KEY", "your-api_key"),
)
```
You can use your own models. We highly recommend the latest models with thinking capabilities (e.g., Claude 3.7 with thinking, o1). You can verify that it is correctly set up by running:
```bash
python utils/call_llm.py
```
4. Generate a complete codebase tutorial by running the main script:
```bash
# Analyze a GitHub repository
python main.py --repo https://github.com/username/repo --include "*.py" "*.js" --exclude "tests/*" --max-size 50000
# Or, analyze a local directory
python main.py --dir /path/to/your/codebase --include "*.py" --exclude "*test*"
# Or, generate a tutorial in Chinese
python main.py --repo https://github.com/username/repo --language "Chinese"
```
- `--repo` or `--dir` - Specify either a GitHub repo URL or a local directory path (required, mutually exclusive)
- `-n, --name` - Project name (optional, derived from URL/directory if omitted)
- `-t, --token` - GitHub token (or set GITHUB_TOKEN environment variable)
@@ -100,17 +104,17 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket
- `-i, --include` - Files to include (e.g., "*.py" "*.js")
- `-e, --exclude` - Files to exclude (e.g., "tests/*" "docs/*")
- `-s, --max-size` - Maximum file size in bytes (default: 100KB)
- `--language` - Language for the generated tutorial (default: "english")
The application will crawl the repository, analyze the codebase structure, generate tutorial content in the specified language, and save the output in the specified directory (default: ./output).
## 💡 Development Tutorial
- I built this using [**Agentic Coding**](https://zacharyhuang.substack.com/p/agentic-coding-the-most-fun-way-to), the fastest development paradigm, where humans simply [design](docs/design.md) and agents [code](flow.py).
- The secret weapon is [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework that lets agents (e.g., Cursor AI) build for you.
- Check out the Step-by-step YouTube development tutorial:
<br>
<div align="center">


@@ -13,20 +13,20 @@ nav_order: 2
> Notes for AI: Keep it simple and clear.
> If the requirements are abstract, write concrete user stories
**User Story:** As a developer onboarding to a new codebase, I want a tutorial automatically generated from its GitHub repository or local directory, optionally in a specific language. This tutorial should explain the core abstractions, their relationships (visualized), and how they work together, using beginner-friendly language, analogies, and multi-line descriptions where needed, so I can understand the project structure and key concepts quickly without manually digging through all the code.
**Input:**
- A publicly accessible GitHub repository URL or a local directory path.
- A project name (optional, will be derived from the URL/directory if not provided).
- Desired language for the tutorial (optional, defaults to English).
**Output:**
- A directory named after the project containing:
- An `index.md` file with:
- A high-level project summary (potentially translated).
- A Mermaid flowchart diagram visualizing relationships between abstractions (using potentially translated names/labels).
- An ordered list of links to chapter files (using potentially translated names).
- Individual Markdown files for each chapter (`01_chapter_one.md`, `02_chapter_two.md`, etc.) detailing core abstractions in a logical order (potentially translated content).
## Flow Design
@@ -43,12 +43,12 @@ This project primarily uses a **Workflow** pattern to decompose the tutorial gen
### Flow high-level Design:
1. **`FetchRepo`**: Crawls the specified GitHub repository URL or local directory using the appropriate utility (`crawl_github_files` or `crawl_local_files`), retrieving relevant source code file contents.
2. **`IdentifyAbstractions`**: Analyzes the codebase using an LLM to identify up to 10 core abstractions, generate beginner-friendly descriptions (potentially translated if language != English), and list the *indices* of files related to each abstraction.
3. **`AnalyzeRelationships`**: Uses an LLM to analyze the identified abstractions (referenced by index) and their related code to generate a high-level project summary and describe the relationships/interactions between these abstractions (summary and labels potentially translated if language != English), specifying *source* and *target* abstraction indices and a concise label for each interaction.
4. **`OrderChapters`**: Determines the most logical order (as indices) to present the abstractions in the tutorial, considering input context which might be translated. The output order itself is language-independent.
5. **`WriteChapters` (BatchNode)**: Iterates through the ordered list of abstraction indices. For each abstraction, it calls an LLM to write a detailed, beginner-friendly chapter (content potentially fully translated if language != English), using the relevant code files (accessed via indices) and summaries of previously generated chapters (potentially translated) as context.
6. **`CombineTutorial`**: Creates an output directory, generates a Mermaid diagram from the relationship data (using potentially translated names/labels), and writes the project summary (potentially translated), relationship diagram, chapter links (using potentially translated names), and individually generated chapter files (potentially translated content) into it. Fixed text like "Chapters", "Source Repository", and the attribution footer remain in English.
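The six steps above can be sketched, framework-free, as functions that read from and write to a shared dictionary (the real project wires `Node` subclasses together with Pocket Flow; the function names and toy data here are hypothetical stand-ins):

```python
# Framework-free sketch of the workflow above: each "node" is a function
# operating on a shared dict, mirroring the shared-store design below.
# All names and the toy data are hypothetical stand-ins, not project code.

def fetch_repo(shared):
    # Stand-in for crawling GitHub or a local directory.
    shared["files"] = [("main.py", "def main(): ..."), ("flow.py", "# flow wiring")]

def identify_abstractions(shared):
    # Stand-in for the LLM call; files are referenced by index.
    shared["abstractions"] = [
        {"name": "Flow", "description": "Wires nodes together.", "files": [1]},
        {"name": "Main", "description": "CLI entry point.", "files": [0]},
    ]

def order_chapters(shared):
    # Stand-in ordering: foundational abstraction first.
    shared["chapter_order"] = [0, 1]

def run_tutorial_flow(shared):
    for step in (fetch_repo, identify_abstractions, order_chapters):
        step(shared)
    return shared

shared = run_tutorial_flow({"repo_url": "https://github.com/username/repo"})
```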
```mermaid
flowchart TD
@@ -65,16 +65,16 @@ flowchart TD
> 1. Understand the utility function definition thoroughly by reviewing the doc.
> 2. Include only the necessary utility functions, based on nodes in the flow.
1. **`crawl_github_files`** (`utils/crawl_github_files.py`) - *External Dependency: requests, gitpython (optional for SSH)*
* *Input*: `repo_url` (str), `token` (str, optional), `max_file_size` (int, optional), `use_relative_paths` (bool, optional), `include_patterns` (set, optional), `exclude_patterns` (set, optional)
* *Output*: `dict` containing `files` (dict[str, str]) and `stats`.
* *Necessity*: Required by `FetchRepo` to download and read source code from GitHub if a `repo_url` is provided. Handles API calls or SSH cloning, filtering, and file reading.
2. **`crawl_local_files`** (`utils/crawl_local_files.py`) - *External Dependency: None*
* *Input*: `directory` (str), `max_file_size` (int, optional), `use_relative_paths` (bool, optional), `include_patterns` (set, optional), `exclude_patterns` (set, optional)
* *Output*: `dict` containing `files` (dict[str, str]).
* *Necessity*: Required by `FetchRepo` to read source code from a local directory if a `local_dir` path is provided. Handles directory walking, filtering, and file reading.
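A minimal sketch of what `crawl_local_files` might look like, assuming `os.walk` plus `fnmatch`-style pattern filtering (the function body is illustrative, not the project's actual implementation):

```python
import fnmatch
import os

def crawl_local_files_sketch(directory, include_patterns=None,
                             exclude_patterns=None, max_file_size=100000):
    """Hypothetical sketch of crawl_local_files: walk a directory and return
    {"files": {relative_path: content}} for files passing the filters."""
    files = {}
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, directory)
            # A file must match an include pattern (if any are given)...
            if include_patterns and not any(
                    fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p)
                    for p in include_patterns):
                continue
            # ...and must not match any exclude pattern.
            if exclude_patterns and any(
                    fnmatch.fnmatch(rel, p) for p in exclude_patterns):
                continue
            if os.path.getsize(path) > max_file_size:
                continue
            with open(path, encoding="utf-8", errors="ignore") as f:
                files[rel] = f.read()
    return {"files": files}
```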
3. **`call_llm`** (`utils/call_llm.py`) - *External Dependency: LLM Provider API (e.g., Google GenAI)*
* *Input*: `prompt` (str), `use_cache` (bool, optional)
* *Output*: `response` (str)
* *Necessity*: Used by `IdentifyAbstractions`, `AnalyzeRelationships`, `OrderChapters`, and `WriteChapters` for code analysis and content generation. Needs careful prompt engineering and YAML validation (implicit via `yaml.safe_load` which raises errors).
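Given the `use_cache` parameter and the `llm_cache.json` entry added to `.gitignore`, `call_llm` plausibly wraps the provider call with a JSON file cache. A hedged sketch (the `provider_call` parameter stands in for the real Google GenAI SDK call, which is not reproduced here):

```python
import json
import os

def call_llm_sketch(prompt, use_cache=True, cache_file="llm_cache.json",
                    provider_call=None):
    """Hypothetical sketch of call_llm with a JSON file cache keyed by prompt.
    `provider_call` is a stand-in for the actual LLM provider SDK call."""
    cache = {}
    if use_cache and os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            cache = json.load(f)
        if prompt in cache:
            return cache[prompt]  # cache hit: skip the provider entirely
    response = provider_call(prompt)
    if use_cache:
        cache[prompt] = response
        with open(cache_file, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return response
```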
@@ -88,18 +88,26 @@ The shared Store structure is organized as follows:
```python
shared = {
"repo_url": None, # Input: Provided by the user/main script
"project_name": None, # Input: Optional, derived from repo_url if not provided
"github_token": None, # Input: Optional, from environment or config
# --- Inputs ---
"repo_url": None, # Provided by the user/main script if using GitHub
"local_dir": None, # Provided by the user/main script if using local directory
"project_name": None, # Optional, derived from repo_url/local_dir if not provided
"github_token": None, # Optional, from argument or environment variable
"output_dir": "output", # Default or user-specified base directory for output
"include_patterns": set(), # File patterns to include
"exclude_patterns": set(), # File patterns to exclude
"max_file_size": 100000, # Default or user-specified max file size
"language": "english", # Default or user-specified language for the tutorial
# --- Intermediate/Output Data ---
"files": [], # Output of FetchRepo: List of tuples (file_path: str, file_content: str)
"abstractions": [], # Output of IdentifyAbstractions: List of {"name": str, "description": str (can be multi-line), "files": [int]} (indices into shared["files"])
"abstractions": [], # Output of IdentifyAbstractions: List of {"name": str (potentially translated), "description": str (potentially translated), "files": [int]} (indices into shared["files"])
"relationships": { # Output of AnalyzeRelationships
"summary": None, # Overall project summary (can be multi-line)
"details": [] # List of {"from": int, "to": int, "label": str} describing relationships between abstraction indices with a concise label.
"summary": None, # Overall project summary (potentially translated)
"details": [] # List of {"from": int, "to": int, "label": str (potentially translated)} describing relationships between abstraction indices.
},
"chapter_order": [], # Output of OrderChapters: List of indices into shared["abstractions"], determining tutorial order
"chapters": [], # Output of WriteChapters: List of chapter content strings (Markdown), ordered according to chapter_order
"output_dir": "output", # Input/Default: Base directory for output
"chapters": [], # Output of WriteChapters: List of chapter content strings (Markdown, potentially translated), ordered according to chapter_order
"final_output_dir": None # Output of CombineTutorial: Path to the final generated tutorial directory (e.g., "output/my_project")
}
```
@@ -112,46 +120,46 @@ shared = {
* *Purpose*: Download the repository code (from GitHub) or read from a local directory, loading relevant files into memory using the appropriate crawler utility.
* *Type*: Regular
* *Steps*:
* `prep`: Read `repo_url`, `local_dir`, `project_name`, `github_token`, `output_dir`, `include_patterns`, `exclude_patterns`, `max_file_size` from shared store. Determine `project_name` from `repo_url` or `local_dir` if not present in shared. Set `use_relative_paths` flag.
* `exec`: If `repo_url` is present, call `crawl_github_files(...)`. Otherwise, call `crawl_local_files(...)`. Convert the resulting `files` dictionary into a list of `(path, content)` tuples.
* `post`: Write the list of `files` tuples and the derived `project_name` (if applicable) to the shared store.
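Two small pieces of `FetchRepo` can be sketched concretely: deriving `project_name` from the URL or directory, and `exec`'s dict-to-tuple-list conversion. The exact derivation logic is an assumption based on the description above:

```python
import os

def derive_project_name(repo_url=None, local_dir=None):
    """Sketch of FetchRepo.prep's name derivation (assumed behavior):
    last URL segment minus a .git suffix, or the directory's base name."""
    if repo_url:
        name = repo_url.rstrip("/").rsplit("/", 1)[-1]
        return name[:-4] if name.endswith(".git") else name
    return os.path.basename(os.path.abspath(local_dir))

def files_dict_to_list(result):
    """FetchRepo.exec converts the crawler's {path: content} dict
    into the list of (path, content) tuples stored in shared["files"]."""
    return list(result.get("files", {}).items())
```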
2. **`IdentifyAbstractions`**
* *Purpose*: Analyze the code to identify key concepts/abstractions using indices. Generates potentially translated names and descriptions if language is not English.
* *Type*: Regular
* *Steps*:
* `prep`: Read `files` (list of tuples), `project_name`, and `language` from shared store. Create context using `create_llm_context` helper which adds file indices. Format the list of `index # path` for the prompt.
* `exec`: Construct a prompt for `call_llm`. If language is not English, add instructions to generate `name` and `description` in the target language. Ask LLM to identify ~5-10 core abstractions, provide a simple description for each, and list the relevant *file indices* (e.g., `- 0 # path/to/file.py`). Request YAML list output. Parse and validate the YAML, ensuring indices are within bounds and converting entries like `0 # path...` to just the integer `0`.
* `post`: Write the validated list of `abstractions` (e.g., `[{"name": "Node", "description": "...", "files": [0, 3, 5]}, ...]`) containing file *indices* and potentially translated `name`/`description` to the shared store.
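The index validation described in `exec` (accepting both bare integers and `idx # path` hints, bounds-checking against the file count) might look like the following sketch; the helper name is hypothetical:

```python
def validate_file_indices(raw_indices, file_count):
    """Sketch of IdentifyAbstractions' index validation: accept ints or
    'idx # path' strings from the LLM's YAML, check bounds, and return
    sorted unique integer indices."""
    validated = []
    for entry in raw_indices:
        if isinstance(entry, int):
            idx = entry
        else:
            # Entries like "0 # path/to/file.py" reduce to the leading int.
            idx = int(str(entry).split("#")[0].strip())
        if not 0 <= idx < file_count:
            raise ValueError(f"File index {idx} out of range (0..{file_count - 1})")
        validated.append(idx)
    return sorted(set(validated))
```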
3. **`AnalyzeRelationships`**
* *Purpose*: Generate a project summary and describe how the identified abstractions interact using indices and concise labels. Generates potentially translated summary and labels if language is not English.
* *Type*: Regular
* *Steps*:
* `prep`: Read `abstractions`, `files`, `project_name`, and `language` from shared store. Format context for the LLM, including potentially translated abstraction names *and indices*, potentially translated descriptions, and content snippets from related files (referenced by `index # path` using `get_content_for_indices` helper). Prepare the list of `index # AbstractionName` (potentially translated) for the prompt.
* `exec`: Construct a prompt for `call_llm`. If language is not English, add instructions to generate `summary` and `label` in the target language, and note that input names might be translated. Ask for (1) a high-level summary and (2) a list of relationships, each specifying `from_abstraction` (e.g., `0 # Abstraction1`), `to_abstraction` (e.g., `1 # Abstraction2`), and a concise `label`. Request structured YAML output. Parse and validate, converting referenced abstractions to indices (`from: 0, to: 1`).
* `post`: Parse the LLM response and write the `relationships` dictionary (`{"summary": "...", "details": [{"from": 0, "to": 1, "label": "..."}, ...]}`) with indices and potentially translated `summary`/`label` to the shared store.
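Converting one LLM relationship entry from the `from_abstraction`/`to_abstraction` form into the index-based record stored in `relationships["details"]` could be sketched as follows (function name and exact error handling are assumptions):

```python
def parse_relationship(rel, num_abstractions):
    """Sketch of converting an LLM entry such as
    {"from_abstraction": "0 # Flow", "to_abstraction": "1 # Node",
     "label": "manages"} into {"from": 0, "to": 1, "label": "manages"}."""
    def to_index(ref):
        # References like "0 # Flow" reduce to the leading integer.
        idx = int(str(ref).split("#")[0].strip())
        if not 0 <= idx < num_abstractions:
            raise ValueError(f"Abstraction index {idx} out of range")
        return idx
    return {"from": to_index(rel["from_abstraction"]),
            "to": to_index(rel["to_abstraction"]),
            "label": str(rel["label"])}
```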
4. **`OrderChapters`**
* *Purpose*: Determine the sequence (as indices) in which abstractions should be presented. Considers potentially translated input context.
* *Type*: Regular
* *Steps*:
* `prep`: Read `abstractions`, `relationships`, `project_name`, and `language` from the shared store. Prepare context including the list of `index # AbstractionName` (potentially translated) and textual descriptions of relationships referencing indices and using the potentially translated `label`. Note in context if summary/names might be translated.
* `exec`: Construct a prompt for `call_llm` asking it to order the abstractions based on importance, foundational concepts, or dependencies. Request output as an ordered YAML list of `index # AbstractionName`. Parse and validate, extracting only the indices and ensuring all are present exactly once.
* `post`: Write the validated ordered list of indices (`chapter_order`) to the shared store.
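The "all present exactly once" check in `exec` amounts to verifying the LLM returned a permutation of the abstraction indices. A sketch, with an assumed helper name:

```python
def validate_chapter_order(raw_order, num_abstractions):
    """Sketch of OrderChapters' validation: every abstraction index must
    appear exactly once in the LLM's ordered YAML list."""
    # Entries like "2 # SomeName" reduce to the leading integer.
    order = [int(str(e).split("#")[0].strip()) for e in raw_order]
    if sorted(order) != list(range(num_abstractions)):
        raise ValueError(
            f"Order {order} is not a permutation of 0..{num_abstractions - 1}")
    return order
```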
5. **`WriteChapters`**
* *Purpose*: Generate the detailed content for each chapter of the tutorial. Generates potentially fully translated chapter content if language is not English.
* *Type*: **BatchNode**
* *Steps*:
* `prep`: Read `chapter_order` (indices), `abstractions`, `files`, `project_name`, and `language` from shared store. Initialize an empty instance variable `self.chapters_written_so_far`. Return an iterable list where each item corresponds to an *abstraction index* from `chapter_order`. Each item should contain chapter number, potentially translated abstraction details, a map of related file content (`{ "idx # path": content }`), full chapter listing (potentially translated names), chapter filename map, previous/next chapter info (potentially translated names), and language.
* `exec(item)`: Construct a prompt for `call_llm`. If language is not English, add detailed instructions to write the *entire* chapter in the target language, translating explanations, examples, etc., while noting which input context might already be translated. Ask LLM to write a beginner-friendly Markdown chapter. Provide potentially translated concept details. Include a summary of previously written chapters (potentially translated). Provide relevant code snippets. Add the generated (potentially translated) chapter content to `self.chapters_written_so_far` for the next iteration's context. Return the chapter content.
* `post(shared, prep_res, exec_res_list)`: `exec_res_list` contains the generated chapter Markdown content strings (potentially translated), ordered correctly. Assign this list directly to `shared["chapters"]`. Clean up `self.chapters_written_so_far`.
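The batch loop's key property is that each chapter's prompt includes what was written before it. Stripped of the BatchNode machinery, the accumulation could be sketched like this (the `llm` parameter stands in for `call_llm`; prompt wording is illustrative):

```python
def write_chapters_sketch(chapter_order, abstractions, llm):
    """Sketch of the WriteChapters loop: each chapter's prompt carries the
    chapters already written, so later chapters can build on earlier ones."""
    chapters_written_so_far = []
    chapters = []
    for number, idx in enumerate(chapter_order, start=1):
        abstraction = abstractions[idx]
        previous = "\n---\n".join(chapters_written_so_far)
        prompt = (f"Write chapter {number} about {abstraction['name']}.\n"
                  f"Previously written chapters:\n{previous}")
        content = llm(prompt)
        # Accumulate for the next iteration's context.
        chapters_written_so_far.append(content)
        chapters.append(content)
    return chapters
```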
6. **`CombineTutorial`**
* *Purpose*: Assemble the final tutorial files, including a Mermaid diagram using potentially translated labels/names. Fixed text remains English.
* *Type*: Regular
* *Steps*:
* `prep`: Read `project_name`, `relationships` (potentially translated summary/labels), `chapter_order` (indices), `abstractions` (potentially translated name/desc), `chapters` (list of potentially translated content), `repo_url`, and `output_dir` from shared store. Generate a Mermaid `flowchart TD` string based on `relationships["details"]`, using indices to identify nodes (potentially translated names) and the concise `label` (potentially translated) for edges. Construct the content for `index.md` (including potentially translated summary, Mermaid diagram, and ordered links to chapters using potentially translated names derived using `chapter_order` and `abstractions`). Define the output directory path (e.g., `./output_dir/project_name`). Prepare a list of `{ "filename": "01_...", "content": "..." }` for chapters, adding the English attribution footer to each chapter's content. Add the English attribution footer to the index content.
* `exec`: Create the output directory. Write the generated `index.md` content. Iterate through the prepared chapter file list and write each chapter's content to its corresponding `.md` file in the output directory.
* `post`: Write the final `output_path` to `shared["final_output_dir"]`. Log completion.
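Generating the `flowchart TD` string from `relationships["details"]` is mostly string assembly: one node per abstraction index, one labeled edge per relationship. A minimal sketch (node-ID scheme `A0`, `A1`, ... is an assumption):

```python
def build_mermaid(abstractions, details):
    """Sketch of the Mermaid diagram built in CombineTutorial.prep:
    one node per abstraction (A0, A1, ...) and one labeled edge per
    entry in relationships["details"]."""
    lines = ["flowchart TD"]
    for i, a in enumerate(abstractions):
        name = a["name"]
        lines.append(f'    A{i}["{name}"]')
    for d in details:
        label = d["label"]
        lines.append(f'    A{d["from"]} -- "{label}" --> A{d["to"]}')
    return "\n".join(lines)
```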

main.py

@@ -8,32 +8,34 @@ dotenv.load_dotenv()
# Default file patterns
DEFAULT_INCLUDE_PATTERNS = {
"*.py", "*.js", "*.jsx", "*.ts", "*.tsx", "*.go", "*.java", "*.pyi", "*.pyx",
"*.c", "*.cc", "*.cpp", "*.h", "*.md", "*.rst", "Dockerfile",
"*.py", "*.js", "*.jsx", "*.ts", "*.tsx", "*.go", "*.java", "*.pyi", "*.pyx",
"*.c", "*.cc", "*.cpp", "*.h", "*.md", "*.rst", "Dockerfile",
"Makefile", "*.yaml", "*.yml",
}
DEFAULT_EXCLUDE_PATTERNS = {
"*test*", "tests/*", "docs/*", "examples/*", "v1/*",
"dist/*", "build/*", "experimental/*", "deprecated/*",
"*test*", "tests/*", "docs/*", "examples/*", "v1/*",
"dist/*", "build/*", "experimental/*", "deprecated/*",
"legacy/*", ".git/*", ".github/*", ".next/*", ".vscode/*", "obj/*", "bin/*", "node_modules/*", "*.log"
}
# --- Main Function ---
def main():
parser = argparse.ArgumentParser(description="Generate a tutorial for a GitHub codebase or local directory.")
# Create mutually exclusive group for source
source_group = parser.add_mutually_exclusive_group(required=True)
source_group.add_argument("--repo", help="URL of the public GitHub repository.")
source_group.add_argument("--dir", help="Path to local directory.")
parser.add_argument("-n", "--name", help="Project name (optional, derived from repo/directory if omitted).")
parser.add_argument("-t", "--token", help="GitHub personal access token (optional, reads from GITHUB_TOKEN env var if not provided).")
parser.add_argument("-o", "--output", default="output", help="Base directory for output (default: ./output).")
parser.add_argument("-i", "--include", nargs="+", help="Include file patterns (e.g. '*.py' '*.js'). Defaults to common code files if not specified.")
parser.add_argument("-e", "--exclude", nargs="+", help="Exclude file patterns (e.g. 'tests/*' 'docs/*'). Defaults to test/build directories if not specified.")
parser.add_argument("-s", "--max-size", type=int, default=100000, help="Maximum file size in bytes (default: 100000, about 100KB).")
# Add language parameter for multi-language support
parser.add_argument("--language", default="english", help="Language for the generated tutorial (default: english)")
args = parser.parse_args()
@@ -57,6 +59,9 @@ def main():
"exclude_patterns": set(args.exclude) if args.exclude else DEFAULT_EXCLUDE_PATTERNS,
"max_file_size": args.max_size,
# Add language for multi-language support
"language": args.language,
# Outputs will be populated by the nodes
"files": [],
"abstractions": [],
@@ -66,13 +71,14 @@ def main():
"final_output_dir": None
}
print(f"Starting tutorial generation for: {args.repo or args.dir}")
# Display starting message with repository/directory and language
print(f"Starting tutorial generation for: {args.repo or args.dir} in {args.language.capitalize()} language")
# Create the flow instance
tutorial_flow = create_tutorial_flow()
# Run the flow
tutorial_flow.run(shared)
if __name__ == "__main__":
main()

nodes.py

@@ -2,7 +2,7 @@ import os
import yaml
from pocketflow import Node, BatchNode
from utils.crawl_github_files import crawl_github_files
from utils.call_llm import call_llm
from utils.crawl_local_files import crawl_local_files
# Helper to get content for specific file indices
@@ -19,7 +19,7 @@ class FetchRepo(Node):
repo_url = shared.get("repo_url")
local_dir = shared.get("local_dir")
project_name = shared.get("project_name")
if not project_name:
# Basic name derivation from URL or directory
if repo_url:
@@ -63,7 +63,7 @@ class FetchRepo(Node):
max_file_size=prep_res["max_file_size"],
use_relative_paths=prep_res["use_relative_paths"]
)
# Convert dict to list of tuples: [(path, content), ...]
files_list = list(result.get("files", {}).items())
print(f"Fetched {len(files_list)} files.")
@@ -76,7 +76,8 @@ class IdentifyAbstractions(Node):
def prep(self, shared):
files_data = shared["files"]
project_name = shared["project_name"] # Get project name
language = shared.get("language", "english") # Get language
# Helper to create context from files, respecting limits (basic example)
def create_llm_context(files_data):
context = ""
@@ -91,23 +92,34 @@ class IdentifyAbstractions(Node):
context, file_info = create_llm_context(files_data)
# Format file info for the prompt (comment is just a hint for LLM)
file_listing_for_prompt = "\n".join([f"- {idx} # {path}" for idx, path in file_info])
return context, file_listing_for_prompt, len(files_data), project_name, language # Return language
def exec(self, prep_res):
context, file_listing_for_prompt, file_count, project_name, language = prep_res # Unpack project name and language
print(f"Identifying abstractions using LLM...")
# Add language instruction and hints only if not English
language_instruction = ""
name_lang_hint = ""
desc_lang_hint = ""
if language.lower() != "english":
language_instruction = f"IMPORTANT: Generate the `name` and `description` for each abstraction in **{language.capitalize()}** language. Do NOT use English for these fields.\n\n"
# Keep specific hints here as name/description are primary targets
name_lang_hint = f" (value in {language.capitalize()})"
desc_lang_hint = f" (value in {language.capitalize()})"
prompt = f"""
For the project `{project_name}`:
Codebase Context:
{context}
Analyze the codebase context.
{language_instruction}Analyze the codebase context.
Identify the top 5-10 core most important abstractions to help those new to the codebase.
For each abstraction, provide:
1. A concise `name`{name_lang_hint}.
2. A beginner-friendly `description` explaining what it is with a simple analogy, in around 100 words{desc_lang_hint}.
3. A list of relevant `file_indices` (integers) using the format `idx # path/comment`.
List of file indices and paths present in the context:
@@ -116,16 +128,16 @@ List of file indices and paths present in the context:
Format the output as a YAML list of dictionaries:
```yaml
- name: Query Processing
description: |
- name: Query Processing{name_lang_hint}
description: |
Explains what the abstraction does.
It's like a central dispatcher routing requests.
It's like a central dispatcher routing requests.{desc_lang_hint}
file_indices:
- 0 # path/to/file1.py
- 3 # path/to/related.py
- name: Query Optimization
- name: Query Optimization{name_lang_hint}
description: |
Another core concept, similar to a blueprint for objects.
Another core concept, similar to a blueprint for objects.{desc_lang_hint}
file_indices:
- 5 # path/to/another.js
# ... up to 10 abstractions
@@ -143,8 +155,10 @@ Format the output as a YAML list of dictionaries:
for item in abstractions:
if not isinstance(item, dict) or not all(k in item for k in ["name", "description", "file_indices"]):
raise ValueError(f"Missing keys in abstraction item: {item}")
if not isinstance(item["name"], str):
raise ValueError(f"Name is not a string in item: {item}")
if not isinstance(item["description"], str):
raise ValueError(f"description is not a string in item: {item}")
raise ValueError(f"Description is not a string in item: {item}")
if not isinstance(item["file_indices"], list):
raise ValueError(f"file_indices is not a list in item: {item}")
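The key-and-type guards above can be factored into one function. This is a standalone sketch (the function name is hypothetical, not from the diff) of validating a single LLM-produced abstraction item before it is accepted:

```python
def validate_abstraction_item(item):
    """Raise ValueError unless item has the expected keys and value types."""
    required = ("name", "description", "file_indices")
    if not isinstance(item, dict) or not all(k in item for k in required):
        raise ValueError(f"Missing keys in abstraction item: {item}")
    if not isinstance(item["name"], str):
        raise ValueError(f"Name is not a string in item: {item}")
    if not isinstance(item["description"], str):
        raise ValueError(f"Description is not a string in item: {item}")
    if not isinstance(item["file_indices"], list):
        raise ValueError(f"file_indices is not a list in item: {item}")
    return item
```

Raising `ValueError` (rather than silently skipping) matters here because the Node's built-in retry/fallback, mentioned later in the diff, uses the exception to trigger a re-prompt.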
@@ -168,8 +182,8 @@ Format the output as a YAML list of dictionaries:
item["files"] = sorted(list(set(validated_indices)))
# Store only the required fields
validated_abstractions.append({
"name": item["name"],
"description": item["description"],
"name": item["name"], # Potentially translated name
"description": item["description"], # Potentially translated description
"files": item["files"]
})
@@ -181,9 +195,10 @@ Format the output as a YAML list of dictionaries:
class AnalyzeRelationships(Node):
def prep(self, shared):
abstractions = shared["abstractions"] # Now contains 'files' list of indices
abstractions = shared["abstractions"] # Now contains 'files' list of indices, name/description potentially translated
files_data = shared["files"]
project_name = shared["project_name"] # Get project name
language = shared.get("language", "english") # Get language
# Create context with abstraction names, indices, descriptions, and relevant file snippets
context = "Identified Abstractions:\n"
@@ -192,9 +207,10 @@ class AnalyzeRelationships(Node):
for i, abstr in enumerate(abstractions):
# Use 'files' which contains indices directly
file_indices_str = ", ".join(map(str, abstr['files']))
# Abstraction name and description might be translated already
info_line = f"- Index {i}: {abstr['name']} (Relevant file indices: [{file_indices_str}])\n Description: {abstr['description']}"
context += info_line + "\n"
abstraction_info_for_prompt.append(f"{i} # {abstr['name']}")
abstraction_info_for_prompt.append(f"{i} # {abstr['name']}") # Use potentially translated name here too
all_relevant_indices.update(abstr['files'])
context += "\nRelevant File Snippets (Referenced by Index and Path):\n"
@@ -210,26 +226,36 @@ class AnalyzeRelationships(Node):
)
context += file_context_str
return context, "\n".join(abstraction_info_for_prompt), project_name # Return project name
return context, "\n".join(abstraction_info_for_prompt), project_name, language # Return language
def exec(self, prep_res):
context, abstraction_listing, project_name = prep_res # Unpack project name
print("Analyzing relationships using LLM...")
context, abstraction_listing, project_name, language = prep_res # Unpack project name and language
print(f"Analyzing relationships using LLM...")
# Add language instruction and hints only if not English
language_instruction = ""
lang_hint = ""
list_lang_note = ""
if language.lower() != "english":
language_instruction = f"IMPORTANT: Generate the `summary` and relationship `label` fields in **{language.capitalize()}** language. Do NOT use English for these fields.\n\n"
lang_hint = f" (in {language.capitalize()})"
list_lang_note = f" (Names might be in {language.capitalize()})" # Note for the input list
prompt = f"""
Based on the following abstractions and relevant code snippets from the project `{project_name}`:
List of Abstraction Indices and Names:
List of Abstraction Indices and Names{list_lang_note}:
{abstraction_listing}
Context (Abstractions, Descriptions, Code):
{context}
Please provide:
1. A high-level `summary` of the project's main purpose and functionality in a few beginner-friendly sentences. Use markdown formatting with **bold** and *italic* text to highlight important concepts.
{language_instruction}Please provide:
1. A high-level `summary` of the project's main purpose and functionality in a few beginner-friendly sentences{lang_hint}. Use markdown formatting with **bold** and *italic* text to highlight important concepts.
2. A list (`relationships`) describing the key interactions between these abstractions. For each relationship, specify:
- `from_abstraction`: Index of the source abstraction (e.g., `0 # AbstractionName1`)
- `to_abstraction`: Index of the target abstraction (e.g., `1 # AbstractionName2`)
- `label`: A brief label for the interaction **in just a few words** (e.g., "Manages", "Inherits", "Uses").
- `label`: A brief label for the interaction **in just a few words**{lang_hint} (e.g., "Manages", "Inherits", "Uses").
Ideally the relationship should be backed by one abstraction calling or passing parameters to another.
Simplify the relationship and exclude those non-important ones.
@@ -239,15 +265,15 @@ Format the output as YAML:
```yaml
summary: |
A brief, simple explanation of the project.
A brief, simple explanation of the project{lang_hint}.
Can span multiple lines with **bold** and *italic* for emphasis.
relationships:
- from_abstraction: 0 # AbstractionName1
to_abstraction: 1 # AbstractionName2
label: "Manages"
label: "Manages"{lang_hint}
- from_abstraction: 2 # AbstractionName3
to_abstraction: 0 # AbstractionName1
label: "Provides config"
label: "Provides config"{lang_hint}
# ... other relationships
```
@@ -286,51 +312,64 @@ Now, provide the YAML output:
validated_relationships.append({
"from": from_idx,
"to": to_idx,
"label": rel["label"]
"label": rel["label"] # Potentially translated label
})
except (ValueError, TypeError):
raise ValueError(f"Could not parse indices from relationship: {rel}")
print("Generated project summary and relationship details.")
return {
"summary": relationships_data["summary"],
"details": validated_relationships # Store validated, index-based relationships
"summary": relationships_data["summary"], # Potentially translated summary
"details": validated_relationships # Store validated, index-based relationships with potentially translated labels
}
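The `from`/`to` indices stored above come back from the LLM either as bare integers or as strings like `0 # AbstractionName1` (the format the prompt requests), and the surrounding `try/except (ValueError, TypeError)` catches anything that fails to parse. A hypothetical standalone parser consistent with that behavior:

```python
def parse_index(raw):
    """Accept either an int or a string like '0 # AbstractionName'."""
    if isinstance(raw, int):
        return raw
    # Drop the trailing '# Name' comment, then convert; a malformed value
    # raises ValueError/TypeError for the caller's retry logic to handle.
    return int(str(raw).split("#")[0].strip())
```

This keeps the prompt human-readable (the `# Name` suffix helps the LLM stay consistent) while the stored relationships remain plain integers.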
def post(self, shared, prep_res, exec_res):
# Structure is now {"summary": str, "details": [{"from": int, "to": int, "label": str}]}
# Summary and label might be translated
shared["relationships"] = exec_res
class OrderChapters(Node):
def prep(self, shared):
abstractions = shared["abstractions"]
relationships = shared["relationships"]
abstractions = shared["abstractions"] # Name/description might be translated
relationships = shared["relationships"] # Summary/label might be translated
project_name = shared["project_name"] # Get project name
language = shared.get("language", "english") # Get language
# Prepare context for the LLM
abstraction_info_for_prompt = []
for i, a in enumerate(abstractions):
abstraction_info_for_prompt.append(f"- {i} # {a['name']}")
abstraction_info_for_prompt.append(f"- {i} # {a['name']}") # Use potentially translated name
abstraction_listing = "\n".join(abstraction_info_for_prompt)
context = f"Project Summary:\n{relationships['summary']}\n\n"
# Use potentially translated summary and labels
summary_note = ""
if language.lower() != "english":
summary_note = f" (Note: Project Summary might be in {language.capitalize()})"
context = f"Project Summary{summary_note}:\n{relationships['summary']}\n\n"
context += "Relationships (Indices refer to abstractions above):\n"
for rel in relationships['details']:
from_name = abstractions[rel['from']]['name']
to_name = abstractions[rel['to']]['name']
# Use 'label' instead of 'desc'
context += f"- From {rel['from']} ({from_name}) to {rel['to']} ({to_name}): {rel['label']}\n"
# Use potentially translated 'label'
context += f"- From {rel['from']} ({from_name}) to {rel['to']} ({to_name}): {rel['label']}\n" # Label might be translated
return abstraction_listing, context, len(abstractions), project_name
list_lang_note = ""
if language.lower() != "english":
list_lang_note = f" (Names might be in {language.capitalize()})"
return abstraction_listing, context, len(abstractions), project_name, list_lang_note
def exec(self, prep_res):
abstraction_listing, context, num_abstractions, project_name = prep_res
abstraction_listing, context, num_abstractions, project_name, list_lang_note = prep_res
print("Determining chapter order using LLM...")
# No language variation needed here in prompt instructions, just ordering based on structure
# The input names might be translated, hence the note.
prompt = f"""
Given the following project abstractions and their relationships for the project ```` {project_name} ````:
Abstractions (Index # Name):
Abstractions (Index # Name){list_lang_note}:
{abstraction_listing}
Context about relationships and project summary:
@@ -353,7 +392,6 @@ Now, provide the YAML output:
response = call_llm(prompt)
# --- Validation ---
# Rely on Node's built-in retry/fallback
yaml_str = response.strip().split("```yaml")[1].split("```")[0].strip()
ordered_indices_raw = yaml.safe_load(yaml_str)
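The two chained `split` calls above implement a simple fenced-block extractor; isolated into a hypothetical helper, the behavior is easier to see (the node then passes the result to `yaml.safe_load`, as in the line above):

```python
def extract_fenced_yaml(response: str) -> str:
    """Return the raw text inside the first ```yaml fenced block of an LLM reply."""
    # Everything after the opening fence, then everything before the next fence.
    return response.strip().split("```yaml")[1].split("```")[0].strip()
```

Note the assumption baked in: the reply contains exactly one ` ```yaml ` fence. An `IndexError` on a fence-less reply is acceptable here because it, too, triggers the Node's retry.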
@@ -395,8 +433,10 @@ Now, provide the YAML output:
class WriteChapters(BatchNode):
def prep(self, shared):
chapter_order = shared["chapter_order"] # List of indices
abstractions = shared["abstractions"] # List of dicts, now using 'files' with indices
abstractions = shared["abstractions"] # List of dicts, name/desc potentially translated
files_data = shared["files"]
language = shared.get("language", "english") # Get language
# Get already written chapters to provide context
# We store them temporarily during the batch run, not in shared memory yet
# The 'previous_chapters_summary' will be built progressively in the exec context
@@ -408,34 +448,34 @@ class WriteChapters(BatchNode):
for i, abstraction_index in enumerate(chapter_order):
if 0 <= abstraction_index < len(abstractions):
chapter_num = i + 1
chapter_name = abstractions[abstraction_index]["name"]
# Create safe filename
chapter_name = abstractions[abstraction_index]["name"] # Potentially translated name
# Create safe filename (from potentially translated name)
safe_name = "".join(c if c.isalnum() else '_' for c in chapter_name).lower()
filename = f"{i+1:02d}_{safe_name}.md"
# Format with link
# Format with link (using potentially translated name)
all_chapters.append(f"{chapter_num}. [{chapter_name}]({filename})")
# Store mapping of chapter index to filename for linking
chapter_filenames[abstraction_index] = {"num": chapter_num, "name": chapter_name, "filename": filename}
# Create a formatted string with all chapters
full_chapter_listing = "\n".join(all_chapters)
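The filename sanitization used above is worth a quick standalone sketch (function name hypothetical), since it must cope with translated, possibly non-ASCII chapter names — `str.isalnum()` accepts Unicode letters, so e.g. Chinese titles pass through intact:

```python
def chapter_filename(chapter_num: int, name: str) -> str:
    """Sanitize a (possibly translated) chapter name into an ordered .md filename."""
    # Non-alphanumeric characters become underscores; Unicode letters survive.
    safe_name = "".join(c if c.isalnum() else "_" for c in name).lower()
    return f"{chapter_num:02d}_{safe_name}.md"
```

The zero-padded `{chapter_num:02d}` prefix keeps files lexicographically sorted in the output directory regardless of the name's language.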
items_to_process = []
for i, abstraction_index in enumerate(chapter_order):
if 0 <= abstraction_index < len(abstractions):
abstraction_details = abstractions[abstraction_index]
abstraction_details = abstractions[abstraction_index] # Contains potentially translated name/desc
# Use 'files' (list of indices) directly
related_file_indices = abstraction_details.get("files", [])
# Get content using helper, passing indices
related_files_content_map = get_content_for_indices(files_data, related_file_indices)
# Get previous chapter info for transitions
# Get previous chapter info for transitions (uses potentially translated name)
prev_chapter = None
if i > 0:
prev_idx = chapter_order[i-1]
prev_chapter = chapter_filenames[prev_idx]
# Get next chapter info for transitions
# Get next chapter info for transitions (uses potentially translated name)
next_chapter = None
if i < len(chapter_order) - 1:
next_idx = chapter_order[i+1]
@@ -444,13 +484,14 @@ class WriteChapters(BatchNode):
items_to_process.append({
"chapter_num": i + 1,
"abstraction_index": abstraction_index,
"abstraction_details": abstraction_details,
"abstraction_details": abstraction_details, # Has potentially translated name/desc
"related_files_content_map": related_files_content_map,
"project_name": shared["project_name"], # Add project name
"full_chapter_listing": full_chapter_listing, # Add the full chapter listing
"chapter_filenames": chapter_filenames, # Add chapter filenames mapping
"prev_chapter": prev_chapter, # Add previous chapter info
"next_chapter": next_chapter, # Add next chapter info
"full_chapter_listing": full_chapter_listing, # Add the full chapter listing (uses potentially translated names)
"chapter_filenames": chapter_filenames, # Add chapter filenames mapping (uses potentially translated names)
"prev_chapter": prev_chapter, # Add previous chapter info (uses potentially translated name)
"next_chapter": next_chapter, # Add next chapter info (uses potentially translated name)
"language": language, # Add language for multi-language support
# previous_chapters_summary will be added dynamically in exec
})
else:
@@ -461,9 +502,11 @@ class WriteChapters(BatchNode):
def exec(self, item):
# This runs for each item prepared above
abstraction_name = item["abstraction_details"]["name"]
abstraction_name = item["abstraction_details"]["name"] # Potentially translated name
abstraction_description = item["abstraction_details"]["description"] # Potentially translated description
chapter_num = item["chapter_num"]
project_name = item.get("project_name") # Get from item
project_name = item.get("project_name")
language = item.get("language", "english")
print(f"Writing chapter {chapter_num} for: {abstraction_name} using LLM...")
# Prepare file context string from the map
@@ -476,50 +519,72 @@ class WriteChapters(BatchNode):
# Use the temporary instance variable
previous_chapters_summary = "\n---\n".join(self.chapters_written_so_far)
# Add language instruction and context notes only if not English
language_instruction = ""
concept_details_note = ""
structure_note = ""
prev_summary_note = ""
instruction_lang_note = ""
mermaid_lang_note = ""
code_comment_note = ""
link_lang_note = ""
tone_note = ""
if language.lower() != "english":
lang_cap = language.capitalize()
language_instruction = f"IMPORTANT: Write this ENTIRE tutorial chapter in **{lang_cap}**. Some input context (like concept name, description, chapter list, previous summary) might already be in {lang_cap}, but you MUST translate ALL other generated content including explanations, examples, technical terms, and potentially code comments into {lang_cap}. DO NOT use English anywhere except in code syntax, required proper nouns, or when specified. The entire output MUST be in {lang_cap}.\n\n"
concept_details_note = f" (Note: Provided in {lang_cap})"
structure_note = f" (Note: Chapter names might be in {lang_cap})"
prev_summary_note = f" (Note: This summary might be in {lang_cap})"
instruction_lang_note = f" (in {lang_cap})"
mermaid_lang_note = f" (Use {lang_cap} for labels/text if appropriate)"
code_comment_note = f" (Translate to {lang_cap} if possible, otherwise keep minimal English for clarity)"
link_lang_note = f" (Use the {lang_cap} chapter title from the structure above)"
tone_note = f" (appropriate for {lang_cap} readers)"
prompt = f"""
Write a very beginner-friendly tutorial chapter (in Markdown format) for the project `{project_name}` about the concept: "{abstraction_name}". This is Chapter {chapter_num}.
{language_instruction}Write a very beginner-friendly tutorial chapter (in Markdown format) for the project `{project_name}` about the concept: "{abstraction_name}". This is Chapter {chapter_num}.
Concept Details:
Concept Details{concept_details_note}:
- Name: {abstraction_name}
- Description:
{item["abstraction_details"]["description"]}
{abstraction_description}
Complete Tutorial Structure:
Complete Tutorial Structure{structure_note}:
{item["full_chapter_listing"]}
Context from previous chapters (summary):
Context from previous chapters{prev_summary_note}:
{previous_chapters_summary if previous_chapters_summary else "This is the first chapter."}
Relevant Code Snippets:
Relevant Code Snippets (Code itself remains unchanged):
{file_context_str if file_context_str else "No specific code snippets provided for this abstraction."}
Instructions for the chapter:
- Start with a clear heading (e.g., `# Chapter {chapter_num}: {abstraction_name}`).
Instructions for the chapter (Generate content in {language.capitalize()} unless specified otherwise):
- Start with a clear heading (e.g., `# Chapter {chapter_num}: {abstraction_name}`). Use the provided concept name.
- If this is not the first chapter, begin with a brief transition from the previous chapter, referencing it with a proper Markdown link.
- If this is not the first chapter, begin with a brief transition from the previous chapter{instruction_lang_note}, referencing it with a proper Markdown link using its name{link_lang_note}.
- Begin with a high-level motivation explaining what problem this abstraction solves. Start with a central use case as a concrete example. The whole chapter should guide the reader to understand how to solve this use case. Make it very minimal and friendly to beginners.
- Begin with a high-level motivation explaining what problem this abstraction solves{instruction_lang_note}. Start with a central use case as a concrete example. The whole chapter should guide the reader to understand how to solve this use case. Make it very minimal and friendly to beginners.
- If the abstraction is complex, break it down into key concepts. Explain each concept one-by-one in a very beginner-friendly way.
- If the abstraction is complex, break it down into key concepts. Explain each concept one-by-one in a very beginner-friendly way{instruction_lang_note}.
- Explain how to use this abstraction to solve the use case. Give example inputs and outputs for code snippets (if the output isn't values, describe at a high level what will happen).
- Explain how to use this abstraction to solve the use case{instruction_lang_note}. Give example inputs and outputs for code snippets (if the output isn't values, describe at a high level what will happen{instruction_lang_note}).
- Each code block should be BELOW 20 lines! If longer code blocks are needed, break them down into smaller pieces and walk through them one-by-one. Aggressively simplify the code to make it minimal. Use comments to skip non-important implementation details. Each code block should have a beginner-friendly explanation right after it.
- Each code block should be BELOW 20 lines! If longer code blocks are needed, break them down into smaller pieces and walk through them one-by-one. Aggressively simplify the code to make it minimal. Use comments{code_comment_note} to skip non-important implementation details. Each code block should have a beginner-friendly explanation right after it{instruction_lang_note}.
- Describe the internal implementation to help understand what's under the hood. First provide a non-code or code-light walkthrough on what happens step-by-step when the abstraction is called. It's recommended to use a simple sequenceDiagram with a dummy example - keep it minimal with at most 5 participants to ensure clarity. If participant name has space, use:
`participant QP as Query Processing`
- Describe the internal implementation to help understand what's under the hood{instruction_lang_note}. First provide a non-code or code-light walkthrough on what happens step-by-step when the abstraction is called{instruction_lang_note}. It's recommended to use a simple sequenceDiagram with a dummy example - keep it minimal with at most 5 participants to ensure clarity. If participant name has space, use: `participant QP as Query Processing`. {mermaid_lang_note}.
- Then dive deeper into code for the internal implementation with references to files. Provide example code blocks, but make them similarly simple and beginner-friendly.
- Then dive deeper into code for the internal implementation with references to files. Provide example code blocks, but make them similarly simple and beginner-friendly. Explain{instruction_lang_note}.
- IMPORTANT: When you need to refer to other core abstractions covered in other chapters, ALWAYS use proper Markdown links like this: [Chapter Title](filename.md). Use the Complete Tutorial Structure above to find the correct filename. Example: "we will talk about [Query Processing](03_query_processing.md) in Chapter 3".
- IMPORTANT: When you need to refer to other core abstractions covered in other chapters, ALWAYS use proper Markdown links like this: [Chapter Title](filename.md). Use the Complete Tutorial Structure above to find the correct filename and the chapter title{link_lang_note}. Translate the surrounding text.
- Use mermaid diagrams to illustrate complex concepts (```mermaid``` format).
- Use mermaid diagrams to illustrate complex concepts (```mermaid``` format). {mermaid_lang_note}.
- Heavily use analogies and examples throughout to help beginners understand.
- Heavily use analogies and examples throughout{instruction_lang_note} to help beginners understand.
- End the chapter with a brief conclusion that summarizes what was learned and provides a transition to the next chapter. If there is a next chapter, use a proper Markdown link: [Next Chapter Title](next_chapter_filename).
- End the chapter with a brief conclusion that summarizes what was learned{instruction_lang_note} and provides a transition to the next chapter{instruction_lang_note}. If there is a next chapter, use a proper Markdown link: [Next Chapter Title](next_chapter_filename){link_lang_note}.
- Ensure the tone is welcoming and easy for a newcomer to understand.
- Ensure the tone is welcoming and easy for a newcomer to understand{tone_note}.
- Output *only* the Markdown content for this chapter.
@@ -527,7 +592,7 @@ Now, directly provide a super beginner-friendly Markdown output (DON'T need ```m
"""
chapter_content = call_llm(prompt)
# Basic validation/cleanup
actual_heading = f"# Chapter {chapter_num}: {abstraction_name}"
actual_heading = f"# Chapter {chapter_num}: {abstraction_name}" # Use potentially translated name
if not chapter_content.strip().startswith(f"# Chapter {chapter_num}"):
# Add heading if missing or incorrect, trying to preserve content
lines = chapter_content.strip().split('\n')
@@ -540,7 +605,7 @@ Now, directly provide a super beginner-friendly Markdown output (DON'T need ```m
# Add the generated content to our temporary list for the next iteration's context
self.chapters_written_so_far.append(chapter_content)
return chapter_content # Return the Markdown string
return chapter_content # Return the Markdown string (potentially translated)
def post(self, shared, prep_res, exec_res_list):
# exec_res_list contains the generated Markdown for each chapter, in order
@@ -554,76 +619,77 @@ class CombineTutorial(Node):
project_name = shared["project_name"]
output_base_dir = shared.get("output_dir", "output") # Default output dir
output_path = os.path.join(output_base_dir, project_name)
repo_url = shared["repo_url"] # Get the repository URL
repo_url = shared.get("repo_url") # Get the repository URL
# language = shared.get("language", "english") # No longer needed for fixed strings
# Use 'label' from relationships_data['details']
relationships_data = shared["relationships"] # {"summary": str, "details": [{"from": int, "to": int, "label": str}]}
# Get potentially translated data
relationships_data = shared["relationships"] # {"summary": str, "details": [{"from": int, "to": int, "label": str}]} -> summary/label potentially translated
chapter_order = shared["chapter_order"] # indices
abstractions = shared["abstractions"] # list of dicts
chapters_content = shared["chapters"] # list of strings
abstractions = shared["abstractions"] # list of dicts -> name/description potentially translated
chapters_content = shared["chapters"] # list of strings -> content potentially translated
# --- Generate Mermaid Diagram ---
mermaid_lines = ["flowchart TD"]
# Add nodes for each abstraction
# Add nodes for each abstraction using potentially translated names
for i, abstr in enumerate(abstractions):
# Sanitize name for Mermaid ID and label
node_id = f"A{i}"
# Use potentially translated name, sanitize for Mermaid ID and label
sanitized_name = abstr['name'].replace('"', '')
node_label = sanitized_name # Using sanitized name only, no index
mermaid_lines.append(f' {node_id}["{node_label}"]')
# Add edges for relationships using 'label'
node_label = sanitized_name # Using sanitized name only
mermaid_lines.append(f' {node_id}["{node_label}"]') # Node label uses potentially translated name
# Add edges for relationships using potentially translated labels
for rel in relationships_data['details']:
from_node_id = f"A{rel['from']}"
to_node_id = f"A{rel['to']}"
# Sanitize 'label' for edge label
# Use potentially translated label, sanitize
edge_label = rel['label'].replace('"', '').replace('\n', ' ') # Basic sanitization
# Limit edge label length for readability (optional, but good for diagrams)
max_label_len = 30 # Make it shorter for labels
max_label_len = 30
if len(edge_label) > max_label_len:
edge_label = edge_label[:max_label_len-3] + "..."
mermaid_lines.append(f' {from_node_id} -- "{edge_label}" --> {to_node_id}')
mermaid_lines.append(f' {from_node_id} -- "{edge_label}" --> {to_node_id}') # Edge label uses potentially translated label
mermaid_diagram = "\n".join(mermaid_lines)
# --- End Mermaid ---
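The Mermaid generation above reduces to a small pure function; a hypothetical sketch (names are illustrative) showing the node/edge rendering with the same quote-stripping and label truncation:

```python
def build_mermaid(abstractions, relationships, max_label_len=30):
    """Render a minimal flowchart TD; names/labels may already be translated."""
    lines = ["flowchart TD"]
    # One node per abstraction; strip quotes so names can't break the syntax.
    for i, abstr in enumerate(abstractions):
        name = abstr["name"].replace('"', "")
        lines.append(f'    A{i}["{name}"]')
    # One labeled edge per relationship, truncated for diagram readability.
    for rel in relationships:
        label = rel["label"].replace('"', "").replace("\n", " ")
        if len(label) > max_label_len:
            label = label[: max_label_len - 3] + "..."
        lines.append(f'    A{rel["from"]} -- "{label}" --> A{rel["to"]}')
    return "\n".join(lines)
```

Because translated names go straight into the node labels, only the double-quote character needs escaping; Mermaid renders Unicode label text as-is.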
# Prepare index.md content
# --- Prepare index.md content ---
index_content = f"# Tutorial: {project_name}\n\n"
index_content += f"{relationships_data['summary']}\n\n"
index_content += f"{relationships_data['summary']}\n\n" # Use the potentially translated summary directly
# Keep fixed strings in English
index_content += f"**Source Repository:** [{repo_url}]({repo_url})\n\n"
# Add Mermaid diagram for relationships
# Add Mermaid diagram for relationships (diagram itself uses potentially translated names/labels)
index_content += "```mermaid\n"
index_content += mermaid_diagram + "\n"
index_content += "```\n\n"
index_content += "## Chapters\n\n"
# Keep fixed strings in English
index_content += f"## Chapters\n\n"
chapter_files = []
# Generate chapter links based on the determined order
# Generate chapter links based on the determined order, using potentially translated names
for i, abstraction_index in enumerate(chapter_order):
# Ensure index is valid and we have content for it
if 0 <= abstraction_index < len(abstractions) and i < len(chapters_content):
abstraction_name = abstractions[abstraction_index]["name"]
# Sanitize name for filename
abstraction_name = abstractions[abstraction_index]["name"] # Potentially translated name
# Sanitize potentially translated name for filename
safe_name = "".join(c if c.isalnum() else '_' for c in abstraction_name).lower()
# Use chapter number (i+1) for ordering filename
filename = f"{i+1:02d}_{safe_name}.md"
index_content += f"{i+1}. [{abstraction_name}]({filename})\n"
# Add attribution to chapter content
chapter_content = chapters_content[i]
index_content += f"{i+1}. [{abstraction_name}]({filename})\n" # Use potentially translated name in link text
# Add attribution to chapter content (using English fixed string)
chapter_content = chapters_content[i] # Potentially translated content
if not chapter_content.endswith("\n\n"):
chapter_content += "\n\n"
chapter_content += "---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"
# Keep fixed strings in English
chapter_content += f"---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"
# Store filename and corresponding content
chapter_files.append({"filename": filename, "content": chapter_content})
else:
print(f"Warning: Mismatch between chapter order, abstractions, or content at index {i} (abstraction index {abstraction_index}). Skipping file generation for this entry.")
# Add attribution to index content
index_content += "\n\n---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"
# Add attribution to index content (using English fixed string)
index_content += f"\n\n---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"
return {
"output_path": output_path,
@@ -658,4 +724,4 @@ class CombineTutorial(Node):
def post(self, shared, prep_res, exec_res):
shared["final_output_dir"] = exec_res # Store the output path
print(f"\nTutorial generation complete! Files are in: {exec_res}")
print(f"\nTutorial generation complete! Files are in: {exec_res}")