mirror of
https://github.com/aljazceru/Tutorial-Codebase-Knowledge.git
synced 2025-12-18 06:54:24 +01:00
update nodes
5
.gitignore
vendored
@@ -96,4 +96,7 @@ coverage/
.node_repl_history

# LLM cache
llm_cache.json
llm_cache.json

# Output files
output/
12
README.md
@@ -92,7 +92,11 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket

# Or, analyze a local directory
python main.py --dir /path/to/your/codebase --include "*.py" --exclude "*test*"

# Or, generate a tutorial in Chinese
python main.py --repo https://github.com/username/repo --language "Chinese"
```

- `--repo` or `--dir` - Specify either a GitHub repo URL or a local directory path (required, mutually exclusive)
- `-n, --name` - Project name (optional, derived from URL/directory if omitted)
- `-t, --token` - GitHub token (or set GITHUB_TOKEN environment variable)
@@ -102,16 +106,8 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket
- `-s, --max-size` - Maximum file size in bytes (default: 100KB)
- `--language` - Language for the generated tutorial (default: "english")

To generate tutorials in languages other than English, add the `--language` parameter:

```bash
# Generate a tutorial in Spanish
python main.py --repo https://github.com/username/repo --language "Spanish"
```

The application will crawl the repository, analyze the codebase structure, generate tutorial content in the specified language, and save the output in the specified directory (default: ./output).
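
The flag list above can be approximated with `argparse`. The following is only a sketch under stated assumptions, not the project's `main.py`: the function names `build_parser` and `parse_cli` are hypothetical, and the byte default of 100000 is taken from the `max_file_size` entry in the design document's shared store.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Generate a tutorial for a codebase.")

    # --repo and --dir are documented as required and mutually exclusive.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo", help="GitHub repository URL")
    source.add_argument("--dir", help="Local directory path")

    parser.add_argument("-n", "--name", help="Project name (derived from the source if omitted)")
    parser.add_argument("-t", "--token", help="GitHub token (falls back to GITHUB_TOKEN)")
    parser.add_argument("-s", "--max-size", type=int, default=100_000,
                        help="Maximum file size in bytes")
    parser.add_argument("--language", default="english",
                        help="Language for the generated tutorial")
    return parser


def parse_cli(argv):
    """Parse argv and resolve the token from the flag or the environment."""
    args = build_parser().parse_args(argv)
    token = args.token or os.environ.get("GITHUB_TOKEN")
    return args, token


args, token = parse_cli(["--dir", "/path/to/your/codebase", "--language", "Chinese"])
print(args.dir, args.language, args.max_size)
```

Passing both `--repo` and `--dir` makes `argparse` exit with a usage error, which is one simple way to enforce the "mutually exclusive" rule without extra validation code.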

## 💡 Development Tutorial

- I built using [**Agentic Coding**](https://zacharyhuang.substack.com/p/agentic-coding-the-most-fun-way-to), the fastest development paradigm, where humans simply [design](docs/design.md) and agents [code](flow.py).

@@ -13,20 +13,20 @@ nav_order: 2
> Notes for AI: Keep it simple and clear.
> If the requirements are abstract, write concrete user stories

**User Story:** As a developer onboarding to a new codebase, I want a tutorial automatically generated from its GitHub repository. This tutorial should explain the core abstractions, their relationships (visualized), and how they work together, using beginner-friendly language, analogies, and multi-line descriptions where needed, so I can understand the project structure and key concepts quickly without manually digging through all the code.
**User Story:** As a developer onboarding to a new codebase, I want a tutorial automatically generated from its GitHub repository or local directory, optionally in a specific language. This tutorial should explain the core abstractions, their relationships (visualized), and how they work together, using beginner-friendly language, analogies, and multi-line descriptions where needed, so I can understand the project structure and key concepts quickly without manually digging through all the code.

**Input:**
- A publicly accessible GitHub repository URL.
- A project name (optional, will be derived from the URL if not provided).
- A publicly accessible GitHub repository URL or a local directory path.
- A project name (optional, will be derived from the URL/directory if not provided).
- Desired language for the tutorial (optional, defaults to English).

**Output:**
- A directory named after the project containing:
    - An `index.md` file with:
        - A high-level project summary.
        - A Mermaid flowchart diagram visualizing relationships between abstractions.
        - Textual descriptions of the relationships.
        - An ordered list of links to chapter files.
    - Individual Markdown files for each chapter (`01_chapter_one.md`, `02_chapter_two.md`, etc.) detailing core abstractions in a logical order.
        - A high-level project summary (potentially translated).
        - A Mermaid flowchart diagram visualizing relationships between abstractions (using potentially translated names/labels).
        - An ordered list of links to chapter files (using potentially translated names).
    - Individual Markdown files for each chapter (`01_chapter_one.md`, `02_chapter_two.md`, etc.) detailing core abstractions in a logical order (potentially translated content).
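
The chapter file names in the output list (`01_chapter_one.md`, `02_chapter_two.md`) suggest a zero-padded number followed by a slug of the chapter name. A minimal sketch of one way to derive them follows; the helper `chapter_filename` and its slug rule are assumptions for illustration, not taken from the project.

```python
import re


def chapter_filename(number: int, name: str) -> str:
    """Hypothetical helper: build names such as 01_chapter_one.md."""
    # \W matches anything that is not a letter, digit, or underscore.
    # In Python 3 this is Unicode-aware, so translated names keep their letters.
    slug = re.sub(r"\W+", "_", name.lower()).strip("_")
    if not slug:
        slug = "chapter"  # fallback when the name has no usable characters
    return f"{number:02d}_{slug}.md"


print(chapter_filename(1, "Chapter One"))
print(chapter_filename(2, "Batch Node!"))
```

Keeping the slug Unicode-aware matters here because abstraction names may be generated in the target language rather than in English.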

## Flow Design

@@ -43,12 +43,12 @@ This project primarily uses a **Workflow** pattern to decompose the tutorial gen

### Flow high-level Design:

1. **`FetchRepo`**: Crawls the specified GitHub repository path using `crawl_github_files` utility, retrieving relevant source code file contents.
2. **`IdentifyAbstractions`**: Analyzes the codebase using an LLM to identify up to 10 core abstractions, generate beginner-friendly descriptions (allowing multi-line), and list the *indices* of files related to each abstraction.
3. **`AnalyzeRelationships`**: Uses an LLM to analyze the identified abstractions (referenced by index) and their related code to generate a high-level project summary and describe the relationships/interactions between these abstractions, specifying *source* and *target* abstraction indices and a concise label for each interaction.
4. **`OrderChapters`**: Determines the most logical order (as indices) to present the abstractions in the tutorial, likely based on importance or dependencies identified in the previous step.
5. **`WriteChapters` (BatchNode)**: Iterates through the ordered list of abstraction indices. For each abstraction, it calls an LLM to write a detailed, beginner-friendly chapter, using the relevant code files (accessed via indices) and summaries of previously generated chapters as context.
6. **`CombineTutorial`**: Creates an output directory, generates a Mermaid diagram from the relationship data, and writes the project summary, relationship diagram/details (in `index.md`), and individually generated chapters (as separate `.md` files, named and ordered according to `chapter_order`) into it.
1. **`FetchRepo`**: Crawls the specified GitHub repository URL or local directory using appropriate utility (`crawl_github_files` or `crawl_local_files`), retrieving relevant source code file contents.
2. **`IdentifyAbstractions`**: Analyzes the codebase using an LLM to identify up to 10 core abstractions, generate beginner-friendly descriptions (potentially translated if language != English), and list the *indices* of files related to each abstraction.
3. **`AnalyzeRelationships`**: Uses an LLM to analyze the identified abstractions (referenced by index) and their related code to generate a high-level project summary and describe the relationships/interactions between these abstractions (summary and labels potentially translated if language != English), specifying *source* and *target* abstraction indices and a concise label for each interaction.
4. **`OrderChapters`**: Determines the most logical order (as indices) to present the abstractions in the tutorial, considering input context which might be translated. The output order itself is language-independent.
5. **`WriteChapters` (BatchNode)**: Iterates through the ordered list of abstraction indices. For each abstraction, it calls an LLM to write a detailed, beginner-friendly chapter (content potentially fully translated if language != English), using the relevant code files (accessed via indices) and summaries of previously generated chapters (potentially translated) as context.
6. **`CombineTutorial`**: Creates an output directory, generates a Mermaid diagram from the relationship data (using potentially translated names/labels), and writes the project summary (potentially translated), relationship diagram, chapter links (using potentially translated names), and individually generated chapter files (potentially translated content) into it. Fixed text like "Chapters", "Source Repository", and the attribution footer remain in English.
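
The six steps above form a linear workflow in which each step reads from and writes to a shared dictionary. The sketch below illustrates only that data flow in plain Python. Every function here is a hypothetical stand-in: the real nodes are Pocket Flow `Node` classes that call crawlers and an LLM, and nothing below touches the network or the file system.

```python
def fetch_repo(shared):
    # Stand-in for crawl_github_files / crawl_local_files: pick by available input.
    source = "github" if shared.get("repo_url") else "local"
    shared["files"] = [(f"{source}/example.py", "print('hi')")]


def identify_abstractions(shared):
    # Stand-in for the LLM call; abstractions refer to files by index.
    shared["abstractions"] = [{"name": "Example", "description": "Stub", "files": [0]}]


def analyze_relationships(shared):
    shared["relationships"] = {"summary": "Stub summary", "details": []}


def order_chapters(shared):
    # The order is a list of indices into shared["abstractions"].
    shared["chapter_order"] = list(range(len(shared["abstractions"])))


def write_chapters(shared):
    shared["chapters"] = [
        f"# Chapter {number}: {shared['abstractions'][index]['name']}"
        for number, index in enumerate(shared["chapter_order"], start=1)
    ]


def combine_tutorial(shared):
    # Only records the target path; a real implementation would write files.
    shared["final_output_dir"] = f"{shared['output_dir']}/{shared['project_name']}"


PIPELINE = [
    fetch_repo,
    identify_abstractions,
    analyze_relationships,
    order_chapters,
    write_chapters,
    combine_tutorial,
]


def run(shared):
    """Run every step in order against the same shared store."""
    for step in PIPELINE:
        step(shared)
    return shared


shared = run({
    "repo_url": None,
    "local_dir": "/path/to/your/codebase",
    "project_name": "demo",
    "output_dir": "output",
    "language": "english",
})
print(shared["final_output_dir"])
```

Because later steps refer to files and abstractions by index rather than by name, translated names never affect the wiring between steps.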

```mermaid
flowchart TD
@@ -65,16 +65,16 @@ flowchart TD
> 1. Understand the utility function definition thoroughly by reviewing the doc.
> 2. Include only the necessary utility functions, based on nodes in the flow.

1. **`crawl_github_files`** (`utils/crawl_github_files.py`) - *External Dependency: requests*
1. **`crawl_github_files`** (`utils/crawl_github_files.py`) - *External Dependency: requests, gitpython (optional for SSH)*
    * *Input*: `repo_url` (str), `token` (str, optional), `max_file_size` (int, optional), `use_relative_paths` (bool, optional), `include_patterns` (set, optional), `exclude_patterns` (set, optional)
    * *Output*: `dict` containing `files` (dict[str, str]) and `stats`.
    * *Necessity*: Required by `FetchRepo` to download and read source code from GitHub if a `repo_url` is provided. Handles cloning logic implicitly via API calls, filtering, and file reading.
    * *Necessity*: Required by `FetchRepo` to download and read source code from GitHub if a `repo_url` is provided. Handles API calls or SSH cloning, filtering, and file reading.
2. **`crawl_local_files`** (`utils/crawl_local_files.py`) - *External Dependency: None*
    * *Input*: `directory` (str), `max_file_size` (int, optional), `use_relative_paths` (bool, optional), `include_patterns` (set, optional), `exclude_patterns` (set, optional)
    * *Output*: `dict` containing `files` (dict[str, str]).
    * *Necessity*: Required by `FetchRepo` to read source code from a local directory if a `local_dir` path is provided. Handles directory walking, filtering, and file reading.
3. **`call_llm`** (`utils/call_llm.py`) - *External Dependency: LLM Provider API (e.g., OpenAI, Anthropic)*
    * *Input*: `prompt` (str)
3. **`call_llm`** (`utils/call_llm.py`) - *External Dependency: LLM Provider API (e.g., Google GenAI)*
    * *Input*: `prompt` (str), `use_cache` (bool, optional)
    * *Output*: `response` (str)
    * *Necessity*: Used by `IdentifyAbstractions`, `AnalyzeRelationships`, `OrderChapters`, and `WriteChapters` for code analysis and content generation. Needs careful prompt engineering and YAML validation (implicit via `yaml.safe_load` which raises errors).

@@ -88,18 +88,26 @@ The shared Store structure is organized as follows:

```python
shared = {
    "repo_url": None, # Input: Provided by the user/main script
    "project_name": None, # Input: Optional, derived from repo_url if not provided
    "github_token": None, # Input: Optional, from environment or config
    # --- Inputs ---
    "repo_url": None, # Provided by the user/main script if using GitHub
    "local_dir": None, # Provided by the user/main script if using local directory
    "project_name": None, # Optional, derived from repo_url/local_dir if not provided
    "github_token": None, # Optional, from argument or environment variable
    "output_dir": "output", # Default or user-specified base directory for output
    "include_patterns": set(), # File patterns to include
    "exclude_patterns": set(), # File patterns to exclude
    "max_file_size": 100000, # Default or user-specified max file size
    "language": "english", # Default or user-specified language for the tutorial

    # --- Intermediate/Output Data ---
    "files": [], # Output of FetchRepo: List of tuples (file_path: str, file_content: str)
    "abstractions": [], # Output of IdentifyAbstractions: List of {"name": str, "description": str (can be multi-line), "files": [int]} (indices into shared["files"])
    "abstractions": [], # Output of IdentifyAbstractions: List of {"name": str (potentially translated), "description": str (potentially translated), "files": [int]} (indices into shared["files"])
    "relationships": { # Output of AnalyzeRelationships
        "summary": None, # Overall project summary (can be multi-line)
        "details": [] # List of {"from": int, "to": int, "label": str} describing relationships between abstraction indices with a concise label.
        "summary": None, # Overall project summary (potentially translated)
        "details": [] # List of {"from": int, "to": int, "label": str (potentially translated)} describing relationships between abstraction indices.
    },
    "chapter_order": [], # Output of OrderChapters: List of indices into shared["abstractions"], determining tutorial order
    "chapters": [], # Output of WriteChapters: List of chapter content strings (Markdown), ordered according to chapter_order
    "output_dir": "output", # Input/Default: Base directory for output
    "chapters": [], # Output of WriteChapters: List of chapter content strings (Markdown, potentially translated), ordered according to chapter_order
    "final_output_dir": None # Output of CombineTutorial: Path to the final generated tutorial directory (e.g., "output/my_project")
}
```
@@ -112,46 +120,46 @@ shared = {
    * *Purpose*: Download the repository code (from GitHub) or read from a local directory, loading relevant files into memory using the appropriate crawler utility.
    * *Type*: Regular
    * *Steps*:
        * `prep`: Read `repo_url` (if provided), `local_dir` (if provided), optional `github_token`, `output_dir` from shared store. Define `include_patterns` (e.g., `{"*.py", "*.js", "*.md"}`) and `exclude_patterns` (e.g., `{"*test*", "docs/*"}`). Set `max_file_size` and `use_relative_paths` flags. Determine `project_name` from `repo_url` or `local_dir` if not present in shared.
        * `prep`: Read `repo_url`, `local_dir`, `project_name`, `github_token`, `output_dir`, `include_patterns`, `exclude_patterns`, `max_file_size` from shared store. Determine `project_name` from `repo_url` or `local_dir` if not present in shared. Set `use_relative_paths` flag.
        * `exec`: If `repo_url` is present, call `crawl_github_files(...)`. Otherwise, call `crawl_local_files(...)`. Convert the resulting `files` dictionary into a list of `(path, content)` tuples.
        * `post`: Write the list of `files` tuples and the derived `project_name` (if applicable) to the shared store.

2. **`IdentifyAbstractions`**
    * *Purpose*: Analyze the code to identify key concepts/abstractions using indices.
    * *Purpose*: Analyze the code to identify key concepts/abstractions using indices. Generates potentially translated names and descriptions if language is not English.
    * *Type*: Regular
    * *Steps*:
        * `prep`: Read `files` (list of tuples) from shared store. Create context using `create_llm_context` helper which adds file indices. Format the list of `index # path` for the prompt.
        * `exec`: Construct a prompt for `call_llm` asking it to identify ~5-10 core abstractions, provide a simple description (allowing multi-line YAML string) for each, and list the relevant *file indices* (e.g., `- 0 # path/to/file.py`). Request YAML list output. Parse and validate the YAML, ensuring indices are within bounds and converting entries like `0 # path...` to just the integer `0`.
        * `post`: Write the validated list of `abstractions` (e.g., `[{"name": "Node", "description": "...", "files": [0, 3, 5]}, ...]`) containing file *indices* to the shared store.
        * `prep`: Read `files` (list of tuples), `project_name`, and `language` from shared store. Create context using `create_llm_context` helper which adds file indices. Format the list of `index # path` for the prompt.
        * `exec`: Construct a prompt for `call_llm`. If language is not English, add instructions to generate `name` and `description` in the target language. Ask LLM to identify ~5-10 core abstractions, provide a simple description for each, and list the relevant *file indices* (e.g., `- 0 # path/to/file.py`). Request YAML list output. Parse and validate the YAML, ensuring indices are within bounds and converting entries like `0 # path...` to just the integer `0`.
        * `post`: Write the validated list of `abstractions` (e.g., `[{"name": "Node", "description": "...", "files": [0, 3, 5]}, ...]`) containing file *indices* and potentially translated `name`/`description` to the shared store.

3. **`AnalyzeRelationships`**
    * *Purpose*: Generate a project summary and describe how the identified abstractions interact using indices and concise labels.
    * *Purpose*: Generate a project summary and describe how the identified abstractions interact using indices and concise labels. Generates potentially translated summary and labels if language is not English.
    * *Type*: Regular
    * *Steps*:
        * `prep`: Read `abstractions` and `files` from shared store. Format context for the LLM, including abstraction names *and indices*, descriptions, and content snippets from related files (referenced by `index # path` using `get_content_for_indices` helper). Prepare the list of `index # AbstractionName` for the prompt.
        * `exec`: Construct a prompt for `call_llm` asking for (1) a high-level summary (allowing multi-line YAML string) and (2) a list of relationships, each specifying `from_abstraction` (e.g., `0 # Abstraction1`), `to_abstraction` (e.g., `1 # Abstraction2`), and a concise `label` (string, just a few words). Request structured YAML output. Parse and validate, converting referenced abstractions to indices (`from: 0, to: 1`).
        * `post`: Parse the LLM response and write the `relationships` dictionary (`{"summary": "...", "details": [{"from": 0, "to": 1, "label": "..."}, ...]}`) with indices to the shared store.
        * `prep`: Read `abstractions`, `files`, `project_name`, and `language` from shared store. Format context for the LLM, including potentially translated abstraction names *and indices*, potentially translated descriptions, and content snippets from related files (referenced by `index # path` using `get_content_for_indices` helper). Prepare the list of `index # AbstractionName` (potentially translated) for the prompt.
        * `exec`: Construct a prompt for `call_llm`. If language is not English, add instructions to generate `summary` and `label` in the target language, and note that input names might be translated. Ask for (1) a high-level summary and (2) a list of relationships, each specifying `from_abstraction` (e.g., `0 # Abstraction1`), `to_abstraction` (e.g., `1 # Abstraction2`), and a concise `label`. Request structured YAML output. Parse and validate, converting referenced abstractions to indices (`from: 0, to: 1`).
        * `post`: Parse the LLM response and write the `relationships` dictionary (`{"summary": "...", "details": [{"from": 0, "to": 1, "label": "..."}, ...]}`) with indices and potentially translated `summary`/`label` to the shared store.

4. **`OrderChapters`**
    * *Purpose*: Determine the sequence (as indices) in which abstractions should be presented.
    * *Purpose*: Determine the sequence (as indices) in which abstractions should be presented. Considers potentially translated input context.
    * *Type*: Regular
    * *Steps*:
        * `prep`: Read `abstractions` and `relationships` from the shared store. Prepare context including the list of `index # AbstractionName` and textual descriptions of relationships referencing indices and using the concise `label`.
        * `prep`: Read `abstractions`, `relationships`, `project_name`, and `language` from the shared store. Prepare context including the list of `index # AbstractionName` (potentially translated) and textual descriptions of relationships referencing indices and using the potentially translated `label`. Note in context if summary/names might be translated.
        * `exec`: Construct a prompt for `call_llm` asking it to order the abstractions based on importance, foundational concepts, or dependencies. Request output as an ordered YAML list of `index # AbstractionName`. Parse and validate, extracting only the indices and ensuring all are present exactly once.
        * `post`: Write the validated ordered list of indices (`chapter_order`) to the shared store.

5. **`WriteChapters`**
    * *Purpose*: Generate the detailed content for each chapter of the tutorial.
    * *Purpose*: Generate the detailed content for each chapter of the tutorial. Generates potentially fully translated chapter content if language is not English.
    * *Type*: **BatchNode**
    * *Steps*:
        * `prep`: Read `chapter_order` (list of indices), `abstractions`, and `files` from shared store. Initialize an empty instance variable `self.chapters_written_so_far`. Return an iterable list where each item corresponds to an *abstraction index* from `chapter_order`. Each item should contain chapter number, abstraction details, and a map of related file content (`{ "idx # path": content }` obtained via `get_content_for_indices`).
        * `exec(item)`: Construct a prompt for `call_llm`. Ask it to write a beginner-friendly Markdown chapter about the current abstraction. Provide its description. Include a summary of previously written chapters (from `self.chapters_written_so_far`). Provide relevant code snippets (referenced by `index # path`). Add the generated chapter content to `self.chapters_written_so_far` for the next iteration's context. Return the chapter content.
        * `post(shared, prep_res, exec_res_list)`: `exec_res_list` contains the generated chapter Markdown content strings, ordered correctly. Assign this list directly to `shared["chapters"]`. Clean up `self.chapters_written_so_far`.
        * `prep`: Read `chapter_order` (indices), `abstractions`, `files`, `project_name`, and `language` from shared store. Initialize an empty instance variable `self.chapters_written_so_far`. Return an iterable list where each item corresponds to an *abstraction index* from `chapter_order`. Each item should contain chapter number, potentially translated abstraction details, a map of related file content (`{ "idx # path": content }`), full chapter listing (potentially translated names), chapter filename map, previous/next chapter info (potentially translated names), and language.
        * `exec(item)`: Construct a prompt for `call_llm`. If language is not English, add detailed instructions to write the *entire* chapter in the target language, translating explanations, examples, etc., while noting which input context might already be translated. Ask LLM to write a beginner-friendly Markdown chapter. Provide potentially translated concept details. Include a summary of previously written chapters (potentially translated). Provide relevant code snippets. Add the generated (potentially translated) chapter content to `self.chapters_written_so_far` for the next iteration's context. Return the chapter content.
        * `post(shared, prep_res, exec_res_list)`: `exec_res_list` contains the generated chapter Markdown content strings (potentially translated), ordered correctly. Assign this list directly to `shared["chapters"]`. Clean up `self.chapters_written_so_far`.

6. **`CombineTutorial`**
    * *Purpose*: Assemble the final tutorial files, including a Mermaid diagram using concise labels.
    * *Purpose*: Assemble the final tutorial files, including a Mermaid diagram using potentially translated labels/names. Fixed text remains English.
    * *Type*: Regular
    * *Steps*:
        * `prep`: Read `project_name`, `relationships`, `chapter_order` (indices), `abstractions`, and `chapters` (list of content) from shared store. Generate a Mermaid `flowchart TD` string based on `relationships["details"]`, using indices to identify nodes and the concise `label` for edges. Construct the content for `index.md` (including summary, Mermaid diagram, textual relationship details using the `label`, and ordered links to chapters derived using `chapter_order` and `abstractions`). Define the output directory path (e.g., `./output_dir/project_name`). Prepare a list of `{ "filename": "01_...", "content": "..." }` for chapters.
        * `prep`: Read `project_name`, `relationships` (potentially translated summary/labels), `chapter_order` (indices), `abstractions` (potentially translated name/desc), `chapters` (list of potentially translated content), `repo_url`, and `output_dir` from shared store. Generate a Mermaid `flowchart TD` string based on `relationships["details"]`, using indices to identify nodes (potentially translated names) and the concise `label` (potentially translated) for edges. Construct the content for `index.md` (including potentially translated summary, Mermaid diagram, and ordered links to chapters using potentially translated names derived using `chapter_order` and `abstractions`). Define the output directory path (e.g., `./output_dir/project_name`). Prepare a list of `{ "filename": "01_...", "content": "..." }` for chapters, adding the English attribution footer to each chapter's content. Add the English attribution footer to the index content.
        * `exec`: Create the output directory. Write the generated `index.md` content. Iterate through the prepared chapter file list and write each chapter's content to its corresponding `.md` file in the output directory.
        * `post`: Write the final `output_dir` path to `shared["final_output_dir"]`. Log completion.
        * `post`: Write the final `output_path` to `shared["final_output_dir"]`. Log completion.
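
The `CombineTutorial` step builds a Mermaid `flowchart TD` string from `relationships["details"]`, using abstraction indices as node ids and the `label` as edge text. The following is a rough sketch of that idea, assuming the data shapes from the shared store above. The function name `build_mermaid`, the `A<index>` node ids, and the quote stripping are illustrative assumptions, not the project's actual implementation.

```python
def build_mermaid(abstractions, relationships):
    """Hypothetical sketch: render abstractions and relationships as Mermaid text."""
    lines = ["flowchart TD"]

    # One node per abstraction; the index makes a stable, language-independent id.
    for index, abstraction in enumerate(abstractions):
        name = abstraction["name"].replace('"', "")  # quotes would break the label
        lines.append(f'    A{index}["{name}"]')

    # One labelled edge per relationship, referencing nodes by index.
    for rel in relationships["details"]:
        label = rel["label"].replace('"', "").replace("\n", " ")
        lines.append(f'    A{rel["from"]} -- "{label}" --> A{rel["to"]}')

    return "\n".join(lines)


abstractions = [{"name": "Node"}, {"name": "Flow"}]
relationships = {
    "summary": "Stub summary",
    "details": [{"from": 1, "to": 0, "label": "Manages"}],
}
diagram = build_mermaid(abstractions, relationships)
print(diagram)
```

Using indices as node ids keeps the diagram valid even when names and labels are translated, since only the quoted display text changes with the language.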
140
nodes.py
@@ -96,16 +96,17 @@ class IdentifyAbstractions(Node):

    def exec(self, prep_res):
        context, file_listing_for_prompt, file_count, project_name, language = prep_res # Unpack project name and language
        print(f"Identifying abstractions in {language.capitalize()} using LLM...")
        print(f"Identifying abstractions using LLM...")

        # Add language instruction and hints if not English
        # Add language instruction and hints only if not English
        language_instruction = ""
        name_lang_hint = ""
        desc_lang_hint = ""
        if language.lower() != "english":
            language_instruction = f"IMPORTANT: Generate the `name` and `description` for each abstraction in **{language.capitalize()}** language. Do NOT use English for these fields.\n\n"
            name_lang_hint = f" # (value in {language.capitalize()})"
            desc_lang_hint = f" # (value in {language.capitalize()})"
            # Keep specific hints here as name/description are primary targets
            name_lang_hint = f" (value in {language.capitalize()})"
            desc_lang_hint = f" (value in {language.capitalize()})"

        prompt = f"""
For the project `{project_name}`:
@@ -186,7 +187,7 @@ Format the output as a YAML list of dictionaries:
                "files": item["files"]
            })

        print(f"Identified {len(validated_abstractions)} abstractions (in {language.capitalize()}).")
        print(f"Identified {len(validated_abstractions)} abstractions.")
        return validated_abstractions

    def post(self, shared, prep_res, exec_res):
@@ -229,33 +230,32 @@ class AnalyzeRelationships(Node):

    def exec(self, prep_res):
        context, abstraction_listing, project_name, language = prep_res # Unpack project name and language
        print(f"Analyzing relationships in {language.capitalize()} using LLM...")
        print(f"Analyzing relationships using LLM...")

        # Add language instruction and hints if not English
        # Add language instruction and hints only if not English
        language_instruction = ""
        summary_lang_hint = ""
        label_lang_hint = ""
        lang_hint = ""
        list_lang_note = ""
        if language.lower() != "english":
            language_instruction = f"IMPORTANT: Generate the `summary` and relationship `label` fields in **{language.capitalize()}** language. Do NOT use English for these fields.\n\n"
            summary_lang_hint = f" (in {language.capitalize()})"
            label_lang_hint = f" # (value in {language.capitalize()})"

            lang_hint = f" (in {language.capitalize()})"
            list_lang_note = f" (Names might be in {language.capitalize()})" # Note for the input list

        prompt = f"""
Based on the following abstractions and relevant code snippets from the project `{project_name}`:

List of Abstraction Indices and Names (Names might be in {language.capitalize()}):
List of Abstraction Indices and Names{list_lang_note}:
{abstraction_listing}

Context (Abstractions, Descriptions, Code):
{context}

{language_instruction}Please provide:
1. A high-level `summary` of the project's main purpose and functionality in a few beginner-friendly sentences{summary_lang_hint}. Use markdown formatting with **bold** and *italic* text to highlight important concepts.
1. A high-level `summary` of the project's main purpose and functionality in a few beginner-friendly sentences{lang_hint}. Use markdown formatting with **bold** and *italic* text to highlight important concepts.
2. A list (`relationships`) describing the key interactions between these abstractions. For each relationship, specify:
    - `from_abstraction`: Index of the source abstraction (e.g., `0 # AbstractionName1`)
    - `to_abstraction`: Index of the target abstraction (e.g., `1 # AbstractionName2`)
    - `label`: A brief label for the interaction **in just a few words**{label_lang_hint} (e.g., "Manages", "Inherits", "Uses").
    - `label`: A brief label for the interaction **in just a few words**{lang_hint} (e.g., "Manages", "Inherits", "Uses").
    Ideally the relationship should be backed by one abstraction calling or passing parameters to another.
    Simplify the relationship and exclude those non-important ones.

@@ -265,15 +265,15 @@ Format the output as YAML:

```yaml
summary: |
  A brief, simple explanation of the project{summary_lang_hint}.
  A brief, simple explanation of the project{lang_hint}.
  Can span multiple lines with **bold** and *italic* for emphasis.
relationships:
  - from_abstraction: 0 # AbstractionName1
    to_abstraction: 1 # AbstractionName2
    label: "Manages"{label_lang_hint}
    label: "Manages"{lang_hint}
  - from_abstraction: 2 # AbstractionName3
    to_abstraction: 0 # AbstractionName1
    label: "Provides config"{label_lang_hint}
    label: "Provides config"{lang_hint}
  # ... other relationships
```

@@ -317,7 +317,7 @@ Now, provide the YAML output:
            except (ValueError, TypeError):
                raise ValueError(f"Could not parse indices from relationship: {rel}")

        print(f"Generated project summary and relationship details (in {language.capitalize()}).")
        print("Generated project summary and relationship details.")
        return {
            "summary": relationships_data["summary"], # Potentially translated summary
            "details": validated_relationships # Store validated, index-based relationships with potentially translated labels
@@ -334,6 +334,7 @@ class OrderChapters(Node):
        abstractions = shared["abstractions"] # Name/description might be translated
        relationships = shared["relationships"] # Summary/label might be translated
        project_name = shared["project_name"] # Get project name
        language = shared.get("language", "english") # Get language

        # Prepare context for the LLM
        abstraction_info_for_prompt = []
@@ -342,24 +343,33 @@ class OrderChapters(Node):
         abstraction_listing = "\n".join(abstraction_info_for_prompt)

         # Use potentially translated summary and labels
-        context = f"Project Summary:\n{relationships['summary']}\n\n"
+        summary_note = ""
+        if language.lower() != "english":
+            summary_note = f" (Note: Project Summary might be in {language.capitalize()})"
+
+        context = f"Project Summary{summary_note}:\n{relationships['summary']}\n\n"
         context += "Relationships (Indices refer to abstractions above):\n"
         for rel in relationships['details']:
             from_name = abstractions[rel['from']]['name']
             to_name = abstractions[rel['to']]['name']
             # Use potentially translated 'label'
-            context += f"- From {rel['from']} ({from_name}) to {rel['to']} ({to_name}): {rel['label']}\n"
+            context += f"- From {rel['from']} ({from_name}) to {rel['to']} ({to_name}): {rel['label']}\n" # Label might be translated

-        return abstraction_listing, context, len(abstractions), project_name
+        list_lang_note = ""
+        if language.lower() != "english":
+            list_lang_note = f" (Names might be in {language.capitalize()})"
+
+        return abstraction_listing, context, len(abstractions), project_name, list_lang_note

     def exec(self, prep_res):
-        abstraction_listing, context, num_abstractions, project_name = prep_res
+        abstraction_listing, context, num_abstractions, project_name, list_lang_note = prep_res
         print("Determining chapter order using LLM...")
-        # No language variation needed here, just ordering based on structure
+        # No language variation needed here in prompt instructions, just ordering based on structure
+        # The input names might be translated, hence the note.
         prompt = f"""
 Given the following project abstractions and their relationships for the project ```` {project_name} ````:

-Abstractions (Index # Name):
+Abstractions (Index # Name){list_lang_note}:
 {abstraction_listing}

 Context about relationships and project summary:
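The added lines in this hunk attach a language note to the context only when the tutorial language is not English. The sketch below restates that logic as a standalone function so the resulting strings can be inspected. `build_context` is a hypothetical name, and the input shapes (a list of dictionaries with a `name` key, and a dictionary with `summary` and `details`) are assumptions inferred from the hunk.

```python
def build_context(abstractions, relationships, language="english"):
    """Build the relationship context plus the optional note for the abstraction list."""
    summary_note = ""
    list_lang_note = ""
    if language.lower() != "english":
        lang_cap = language.capitalize()
        summary_note = f" (Note: Project Summary might be in {lang_cap})"
        list_lang_note = f" (Names might be in {lang_cap})"

    context = f"Project Summary{summary_note}:\n{relationships['summary']}\n\n"
    context += "Relationships (Indices refer to abstractions above):\n"
    for rel in relationships["details"]:
        from_name = abstractions[rel["from"]]["name"]
        to_name = abstractions[rel["to"]]["name"]
        # The label is used as-is, so it may already be translated.
        context += f"- From {rel['from']} ({from_name}) to {rel['to']} ({to_name}): {rel['label']}\n"
    return context, list_lang_note
```

For English the notes are empty strings, so the prompt text is identical to the version before this change.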
@@ -487,7 +497,7 @@ class WriteChapters(BatchNode):
             else:
                 print(f"Warning: Invalid abstraction index {abstraction_index} in chapter_order. Skipping.")

-        print(f"Preparing to write {len(items_to_process)} chapters in {language.capitalize()}...")
+        print(f"Preparing to write {len(items_to_process)} chapters...")
         return items_to_process # Iterable for BatchNode

     def exec(self, item):
@@ -497,7 +507,7 @@ class WriteChapters(BatchNode):
         chapter_num = item["chapter_num"]
         project_name = item.get("project_name")
         language = item.get("language", "english")
-        print(f"Writing chapter {chapter_num} for: {abstraction_name} (in {language.capitalize()}) using LLM...")
+        print(f"Writing chapter {chapter_num} for: {abstraction_name} using LLM...")

         # Prepare file context string from the map
         file_context_str = "\n\n".join(
@@ -509,54 +519,72 @@ class WriteChapters(BatchNode):
         # Use the temporary instance variable
         previous_chapters_summary = "\n---\n".join(self.chapters_written_so_far)

-        # Add language instruction if not English - the chapter content itself needs translation
+        # Add language instruction and context notes only if not English
         language_instruction = ""
+        concept_details_note = ""
+        structure_note = ""
+        prev_summary_note = ""
+        instruction_lang_note = ""
+        mermaid_lang_note = ""
+        code_comment_note = ""
+        link_lang_note = ""
+        tone_note = ""
         if language.lower() != "english":
-            language_instruction = f"IMPORTANT: Write this ENTIRE tutorial chapter in **{language.capitalize()}** language. The concept name '{abstraction_name}' and its description are already provided in {language.capitalize()}. You MUST translate ALL other content including explanations, examples, code comments (unless essential for syntax), and technical terms into {language.capitalize()}. DO NOT use English anywhere except in code syntax, required proper nouns or where specified. The entire output MUST be in {language.capitalize()} only.\n\n"
+            lang_cap = language.capitalize()
+            language_instruction = f"IMPORTANT: Write this ENTIRE tutorial chapter in **{lang_cap}**. Some input context (like concept name, description, chapter list, previous summary) might already be in {lang_cap}, but you MUST translate ALL other generated content including explanations, examples, technical terms, and potentially code comments into {lang_cap}. DO NOT use English anywhere except in code syntax, required proper nouns, or when specified. The entire output MUST be in {lang_cap}.\n\n"
+            concept_details_note = f" (Note: Provided in {lang_cap})"
+            structure_note = f" (Note: Chapter names might be in {lang_cap})"
+            prev_summary_note = f" (Note: This summary might be in {lang_cap})"
+            instruction_lang_note = f" (in {lang_cap})"
+            mermaid_lang_note = f" (Use {lang_cap} for labels/text if appropriate)"
+            code_comment_note = f" (Translate to {lang_cap} if possible, otherwise keep minimal English for clarity)"
+            link_lang_note = f" (Use the {lang_cap} chapter title from the structure above)"
+            tone_note = f" (appropriate for {lang_cap} readers)"
+

         prompt = f"""
 {language_instruction}Write a very beginner-friendly tutorial chapter (in Markdown format) for the project `{project_name}` about the concept: "{abstraction_name}". This is Chapter {chapter_num}.

-Concept Details (already in {language.capitalize()}):
+Concept Details{concept_details_note}:
 - Name: {abstraction_name}
 - Description:
 {abstraction_description}

-Complete Tutorial Structure (Chapter names might be in {language.capitalize()}):
+Complete Tutorial Structure{structure_note}:
 {item["full_chapter_listing"]}

-Context from previous chapters (summary, also in {language.capitalize()}):
+Context from previous chapters{prev_summary_note}:
 {previous_chapters_summary if previous_chapters_summary else "This is the first chapter."}

 Relevant Code Snippets (Code itself remains unchanged):
 {file_context_str if file_context_str else "No specific code snippets provided for this abstraction."}

-Instructions for the chapter (Translate explanations into {language.capitalize()}):
-- Start with a clear heading (e.g., `# Chapter {chapter_num}: {abstraction_name}`). Use the provided {language.capitalize()} name.
+Instructions for the chapter (Generate content in {language.capitalize()} unless specified otherwise):
+- Start with a clear heading (e.g., `# Chapter {chapter_num}: {abstraction_name}`). Use the provided concept name.

-- If this is not the first chapter, begin with a brief transition from the previous chapter (in {language.capitalize()}), referencing it with a proper Markdown link using its {language.capitalize()} name.
+- If this is not the first chapter, begin with a brief transition from the previous chapter{instruction_lang_note}, referencing it with a proper Markdown link using its name{link_lang_note}.

-- Begin with a high-level motivation explaining what problem this abstraction solves (in {language.capitalize()}). Start with a central use case as a concrete example. The whole chapter should guide the reader to understand how to solve this use case. Make it very minimal and friendly to beginners.
+- Begin with a high-level motivation explaining what problem this abstraction solves{instruction_lang_note}. Start with a central use case as a concrete example. The whole chapter should guide the reader to understand how to solve this use case. Make it very minimal and friendly to beginners.

-- If the abstraction is complex, break it down into key concepts. Explain each concept one-by-one in a very beginner-friendly way (in {language.capitalize()}).
+- If the abstraction is complex, break it down into key concepts. Explain each concept one-by-one in a very beginner-friendly way{instruction_lang_note}.

-- Explain how to use this abstraction to solve the use case (in {language.capitalize()}). Give example inputs and outputs for code snippets (if the output isn't values, describe at a high level what will happen in {language.capitalize()}).
+- Explain how to use this abstraction to solve the use case{instruction_lang_note}. Give example inputs and outputs for code snippets (if the output isn't values, describe at a high level what will happen{instruction_lang_note}).

-- Each code block should be BELOW 20 lines! If longer code blocks are needed, break them down into smaller pieces and walk through them one-by-one. Aggresively simplify the code to make it minimal. Use comments (translate to {language.capitalize()} if possible, otherwise keep minimal English for clarity) to skip non-important implementation details. Each code block should have a beginner friendly explanation right after it (in {language.capitalize()}).
+- Each code block should be BELOW 20 lines! If longer code blocks are needed, break them down into smaller pieces and walk through them one-by-one. Aggresively simplify the code to make it minimal. Use comments{code_comment_note} to skip non-important implementation details. Each code block should have a beginner friendly explanation right after it{instruction_lang_note}.

-- Describe the internal implementation to help understand what's under the hood (in {language.capitalize()}). First provide a non-code or code-light walkthrough on what happens step-by-step when the abstraction is called (in {language.capitalize()}). It's recommended to use a simple sequenceDiagram with a dummy example - keep it minimal with at most 5 participants to ensure clarity. If participant name has space, use: `participant QP as Query Processing` (Use the {language.capitalize()} name if appropriate for participant labels).
+- Describe the internal implementation to help understand what's under the hood{instruction_lang_note}. First provide a non-code or code-light walkthrough on what happens step-by-step when the abstraction is called{instruction_lang_note}. It's recommended to use a simple sequenceDiagram with a dummy example - keep it minimal with at most 5 participants to ensure clarity. If participant name has space, use: `participant QP as Query Processing`. {mermaid_lang_note}.

-- Then dive deeper into code for the internal implementation with references to files. Provide example code blocks, but make them similarly simple and beginner-friendly. Explain in {language.capitalize()}.
+- Then dive deeper into code for the internal implementation with references to files. Provide example code blocks, but make them similarly simple and beginner-friendly. Explain{instruction_lang_note}.

-- IMPORTANT: When you need to refer to other core abstractions covered in other chapters, ALWAYS use proper Markdown links like this: [Chapter Title](filename.md). Use the Complete Tutorial Structure above to find the correct filename and the (potentially {language.capitalize()}) chapter title. Example: "we will talk about [Query Processing](03_query_processing.md) in Chapter 3". Translate the surrounding text.
+- IMPORTANT: When you need to refer to other core abstractions covered in other chapters, ALWAYS use proper Markdown links like this: [Chapter Title](filename.md). Use the Complete Tutorial Structure above to find the correct filename and the chapter title{link_lang_note}. Translate the surrounding text.

-- Use mermaid diagrams to illustrate complex concepts (```mermaid``` format). Translate labels/text within diagrams where appropriate.
+- Use mermaid diagrams to illustrate complex concepts (```mermaid``` format). {mermaid_lang_note}.

-- Heavily use analogies and examples throughout (in {language.capitalize()}) to help beginners understand.
+- Heavily use analogies and examples throughout{instruction_lang_note} to help beginners understand.

-- End the chapter with a brief conclusion that summarizes what was learned (in {language.capitalize()}) and provides a transition to the next chapter (in {language.capitalize()}). If there is a next chapter, use a proper Markdown link: [Next Chapter Title](next_chapter_filename). Use the {language.capitalize()} title.
+- End the chapter with a brief conclusion that summarizes what was learned{instruction_lang_note} and provides a transition to the next chapter{instruction_lang_note}. If there is a next chapter, use a proper Markdown link: [Next Chapter Title](next_chapter_filename){link_lang_note}.

-- Ensure the tone is welcoming and easy for a newcomer to understand (appropriate for {language.capitalize()} readers).
+- Ensure the tone is welcoming and easy for a newcomer to understand{tone_note}.

 - Output *only* the Markdown content for this chapter.

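This hunk replaces inline `{language.capitalize()}` mentions with note variables that stay empty for English. One way to read the pattern is as a lookup of note templates, sketched below. `NOTE_TEMPLATES` and `build_language_notes` are hypothetical names for illustration, since the commit itself uses separate local variables. The note texts are copied from the added lines.

```python
# Note texts taken from the added lines; the dictionary layout is an assumption.
NOTE_TEMPLATES = {
    "concept_details_note": " (Note: Provided in {lang_cap})",
    "structure_note": " (Note: Chapter names might be in {lang_cap})",
    "prev_summary_note": " (Note: This summary might be in {lang_cap})",
    "instruction_lang_note": " (in {lang_cap})",
    "mermaid_lang_note": " (Use {lang_cap} for labels/text if appropriate)",
    "code_comment_note": " (Translate to {lang_cap} if possible, otherwise keep minimal English for clarity)",
    "link_lang_note": " (Use the {lang_cap} chapter title from the structure above)",
    "tone_note": " (appropriate for {lang_cap} readers)",
}


def build_language_notes(language="english"):
    """Return every prompt note, empty for English and filled in otherwise."""
    if language.lower() == "english":
        return {key: "" for key in NOTE_TEMPLATES}
    lang_cap = language.capitalize()
    return {key: text.format(lang_cap=lang_cap) for key, text in NOTE_TEMPLATES.items()}


notes = build_language_notes("chinese")
heading = f"Concept Details{notes['concept_details_note']}:"
```

Because each note begins with a space and is appended directly to the preceding word, an English prompt contains no leftover parentheses or doubled spaces.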
@@ -608,7 +636,7 @@ class CombineTutorial(Node):
             # Use potentially translated name, sanitize for Mermaid ID and label
             sanitized_name = abstr['name'].replace('"', '')
             node_label = sanitized_name # Using sanitized name only
-            mermaid_lines.append(f' {node_id}["{node_label}"]')
+            mermaid_lines.append(f' {node_id}["{node_label}"]') # Node label uses potentially translated name
         # Add edges for relationships using potentially translated labels
         for rel in relationships_data['details']:
             from_node_id = f"A{rel['from']}"
@@ -618,22 +646,24 @@ class CombineTutorial(Node):
             max_label_len = 30
             if len(edge_label) > max_label_len:
                 edge_label = edge_label[:max_label_len-3] + "..."
-            mermaid_lines.append(f' {from_node_id} -- "{edge_label}" --> {to_node_id}')
+            mermaid_lines.append(f' {from_node_id} -- "{edge_label}" --> {to_node_id}') # Edge label uses potentially translated label

         mermaid_diagram = "\n".join(mermaid_lines)
         # --- End Mermaid ---

         # --- Prepare index.md content ---
-        index_content = f"# Tutorial: {project_name}\n\n"
+        index_content = f"# Tutorial: {project_name}\n\n"
         index_content += f"{relationships_data['summary']}\n\n" # Use the potentially translated summary directly
-        index_content += f"**Source Repository:** [{repo_url}]({repo_url})\n\n" # English "Source Repository"
+        # Keep fixed strings in English
+        index_content += f"**Source Repository:** [{repo_url}]({repo_url})\n\n"

         # Add Mermaid diagram for relationships (diagram itself uses potentially translated names/labels)
         index_content += "```mermaid\n"
         index_content += mermaid_diagram + "\n"
         index_content += "```\n\n"

-        index_content += f"## Chapters\n\n" # English "Chapters"
+        # Keep fixed strings in English
+        index_content += f"## Chapters\n\n"

         chapter_files = []
         # Generate chapter links based on the determined order, using potentially translated names
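The Mermaid hunks above strip double quotes from names and cap edge labels at 30 characters. The sketch below combines both steps in one function so the output can be checked on sample data. `build_mermaid` is a hypothetical name, the `flowchart TD` header is an assumption (the diagram header is not visible in this diff), and stripping quotes from edge labels is likewise assumed by analogy with node names.

```python
def build_mermaid(abstractions, relationships, max_label_len=30):
    """Render abstractions as nodes and relationships as labeled edges."""
    lines = ["flowchart TD"]  # Assumed header; not shown in the diff.
    for i, abstr in enumerate(abstractions):
        # Double quotes would terminate the Mermaid label early, so drop them.
        node_label = abstr["name"].replace('"', "")
        lines.append(f'    A{i}["{node_label}"]')
    for rel in relationships:
        edge_label = rel["label"].replace('"', "")  # Assumed sanitization.
        if len(edge_label) > max_label_len:
            # Keep the label within max_label_len, ellipsis included.
            edge_label = edge_label[: max_label_len - 3] + "..."
        lines.append(f'    A{rel["from"]} -- "{edge_label}" --> A{rel["to"]}')
    return "\n".join(lines)
```

Truncation counts characters rather than display width, so labels in scripts with wide glyphs may still render longer than English labels of the same length.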
@@ -650,8 +680,8 @@ class CombineTutorial(Node):
                 chapter_content = chapters_content[i] # Potentially translated content
                 if not chapter_content.endswith("\n\n"):
                     chapter_content += "\n\n"
-
-                chapter_content += f"---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)" # English "Generated by"
+                # Keep fixed strings in English
+                chapter_content += f"---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"

                 # Store filename and corresponding content
                 chapter_files.append({"filename": filename, "content": chapter_content})
@@ -659,7 +689,7 @@ class CombineTutorial(Node):
                 print(f"Warning: Mismatch between chapter order, abstractions, or content at index {i} (abstraction index {abstraction_index}). Skipping file generation for this entry.")

         # Add attribution to index content (using English fixed string)
-        index_content += f"\n\n---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)" # English "Generated by"
+        index_content += f"\n\n---\n\nGenerated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)"

         return {
             "output_path": output_path,
@@ -694,4 +724,4 @@ class CombineTutorial(Node):

     def post(self, shared, prep_res, exec_res):
         shared["final_output_dir"] = exec_res # Store the output path
-        print(f"\nTutorial generation complete! Files are in: {exec_res}")
+        print(f"\nTutorial generation complete! Files are in: {exec_res}")