mirror of
https://github.com/aljazceru/Auditor.git
synced 2025-12-17 03:24:18 +01:00
Fixed critical Windows compatibility issues and updated outdated documentation.
CRITICAL WINDOWS HANG FIXES:
1. ProcessPoolExecutor → ThreadPoolExecutor
- Fixes PowerShell/terminal hang where Ctrl+C wouldn't work
- Prevents .pf directory lock requiring Task Manager kill
- Root cause: Nested ProcessPool + ThreadPool on Windows creates kernel deadlock
2. Ctrl+C Interruption Support
- Replaced subprocess.run with Popen+poll pattern (industry standard)
- Poll subprocess every 100ms for interruption checking
- Added global stop_event and signal handlers for graceful shutdown
- Root cause: subprocess.run blocks threads with no signal propagation
DOCUMENTATION DRIFT FIX:
- Removed hardcoded "14 phases" references (actual is 19+ commands)
- Updated to "multiple analysis phases" throughout all docs
- Fixed CLI help text to be version-agnostic
- Added missing "Summary generation" step in HOWTOUSE.md
Changes:
- pipelines.py: ProcessPoolExecutor → ThreadPoolExecutor, added Popen+poll pattern
- Added signal handling and run_subprocess_with_interrupt() function
- commands/full.py: Updated docstring to remove specific phase count
- README.md: Changed "14 distinct phases" to "multiple analysis phases"
- HOWTOUSE.md: Updated phase references, added missing summary step
- CLAUDE.md & ARCHITECTURE.md: Removed hardcoded phase counts
Impact: Critical UX fixes - Windows compatibility restored, pipeline interruptible
Testing: Ctrl+C works, no PowerShell hangs, .pf directory deletable
606 lines
19 KiB
Markdown
606 lines
19 KiB
Markdown
# TheAuditor Architecture
|
|
|
|
This document provides a comprehensive technical overview of TheAuditor's architecture, design patterns, and implementation details.
|
|
|
|
## System Overview
|
|
|
|
TheAuditor is an offline-first, AI-centric SAST (Static Application Security Testing) and code intelligence platform. It orchestrates industry-standard tools to provide ground truth about code quality and security, producing AI-consumable reports optimized for LLM context windows.
|
|
|
|
### Core Design Principles
|
|
|
|
1. **Offline-First Operation** - All analysis runs without network access, ensuring data privacy and reproducible results
|
|
2. **Dual-Mode Architecture** - Courier Mode preserves raw external tool outputs; Expert Mode applies security expertise objectively
|
|
3. **AI-Centric Workflow** - Produces chunks optimized for LLM context windows (65KB by default)
|
|
4. **Sandboxed Execution** - Isolated analysis environment prevents cross-contamination
|
|
5. **No Fix Generation** - Reports findings without prescribing solutions
|
|
|
|
## Truth Courier vs Insights: Separation of Concerns
|
|
|
|
TheAuditor maintains a strict architectural separation between **factual observation** and **optional interpretation**:
|
|
|
|
### Truth Courier Modules (Core)
|
|
These modules are the foundation - they gather and report verifiable facts without judgment:
|
|
|
|
- **Indexer**: Reports "Function X exists at line Y with Z parameters"
|
|
- **Taint Analyzer**: Reports "Data flows from pattern A to pattern B through path C"
|
|
- **Impact Analyzer**: Reports "Changing function X affects Y files through Z call chains"
|
|
- **Graph Analyzer**: Reports "Module A imports B, B imports C, C imports A (cycle detected)"
|
|
- **Pattern Detector**: Reports "Line X matches pattern Y from rule Z"
|
|
- **Linters**: Reports "Tool ESLint flagged line X with rule Y"
|
|
|
|
These modules form the immutable ground truth. They report **what exists**, not what it means.
|
|
|
|
### Insights Modules (Optional Interpretation Layer)
|
|
These are **optional packages** that consume Truth Courier data to add scoring and classification. All insights modules have been consolidated into a single package for better organization:
|
|
|
|
```
|
|
theauditor/insights/
|
|
├── __init__.py # Package exports
|
|
├── ml.py # Machine learning predictions (requires pip install -e ".[ml]")
|
|
├── graph.py # Graph health scoring and recommendations
|
|
└── taint.py # Vulnerability severity classification
|
|
```
|
|
|
|
- **insights/taint.py**: Adds "This flow is XSS with HIGH severity"
|
|
- **insights/graph.py**: Adds "Health score: 70/100, Grade: B"
|
|
- **insights/ml.py** (requires `pip install -e ".[ml]"`): Adds "80% probability of bugs based on historical patterns"
|
|
|
|
**Important**: Insights modules are:
|
|
- Not installed by default (ML requires explicit opt-in)
|
|
- Completely decoupled from core analysis
|
|
- Still based on technical patterns, not business logic interpretation
|
|
- Designed for teams that want actionable scores alongside raw facts
|
|
- All consolidated in `/insights` package for consistency
|
|
|
|
### The FCE: Factual Correlation Engine
|
|
The FCE correlates facts from multiple tools without interpreting them:
|
|
- Reports: "Tool A and Tool B both flagged line 100"
|
|
- Reports: "Pattern X and Pattern Y co-occur in file Z"
|
|
- Never says: "This is bad" or "Fix this way"
|
|
|
|
## Core Components
|
|
|
|
### Indexer Package (`theauditor/indexer/`)
|
|
The indexer has been refactored from a monolithic 2000+ line file into a modular package structure:
|
|
|
|
```
|
|
theauditor/indexer/
|
|
├── __init__.py # Package initialization and backward compatibility
|
|
├── config.py # Constants, patterns, and configuration
|
|
├── database.py # DatabaseManager class for all DB operations
|
|
├── core.py # FileWalker and ASTCache classes
|
|
├── orchestrator.py # IndexOrchestrator - main coordination logic
|
|
└── extractors/
|
|
├── __init__.py # BaseExtractor abstract class and registry
|
|
├── python.py # Python-specific extraction logic
|
|
├── javascript.py # JavaScript/TypeScript extraction
|
|
├── docker.py # Docker/docker-compose extraction
|
|
├── sql.py # SQL extraction
|
|
└── nginx.py # Nginx configuration extraction
|
|
```
|
|
|
|
Key features:
|
|
- **Dynamic extractor registry** for automatic language detection
|
|
- **Batched database operations** (200 records per batch by default)
|
|
- **AST caching** for performance optimization
|
|
- **Monorepo detection** and intelligent path filtering
|
|
- **Parallel JavaScript processing** when semantic parser available
|
|
|
|
### Pipeline System (`theauditor/pipelines.py`)
|
|
Orchestrates comprehensive analysis pipeline in **parallel stages**:
|
|
|
|
**Stage 1 - Foundation (Sequential):**
|
|
1. Repository indexing - Build manifest and symbol database
|
|
2. Framework detection - Identify technologies in use
|
|
|
|
**Stage 2 - Concurrent Analysis (3 Parallel Tracks):**
|
|
- **Track A (Network I/O):**
|
|
- Dependency checking
|
|
- Documentation fetching
|
|
- Documentation summarization
|
|
- **Track B (Code Analysis):**
|
|
- Workset creation
|
|
- Linting
|
|
- Pattern detection
|
|
- **Track C (Graph Build):**
|
|
- Graph building
|
|
|
|
**Stage 3 - Final Aggregation (Sequential):**
|
|
- Graph analysis
|
|
- Taint analysis
|
|
- Factual correlation engine
|
|
- Report generation
|
|
|
|
### Pattern Detection Engine
|
|
- 100+ YAML-defined security patterns in `theauditor/patterns/`
|
|
- AST-based matching for Python and JavaScript
|
|
- Supports semantic analysis via TypeScript compiler
|
|
|
|
### Factual Correlation Engine (FCE) (`theauditor/fce.py`)
|
|
- **29 advanced correlation rules** in `theauditor/correlations/rules/`
|
|
- Detects complex vulnerability patterns across multiple tools
|
|
- Categories: Authentication, Injection, Data Exposure, Infrastructure, Code Quality, Framework-Specific
|
|
|
|
### Taint Analysis Package (`theauditor/taint_analyzer.py`)
|
|
A comprehensive taint analysis module that tracks data flow from sources to sinks:
|
|
|
|
- Tracks data flow from user inputs to dangerous outputs
|
|
- Detects SQL injection, XSS, command injection vulnerabilities
|
|
- Database-aware analysis using `repo_index.db`
|
|
- Supports both assignment-based and direct-use patterns
|
|
- Merges findings from multiple detection methods
|
|
|
|
**Note**: The optional severity scoring for taint analysis is provided by `theauditor/insights/taint.py` (Insights module)
|
|
|
|
### Graph Analysis (`theauditor/graph/`)
|
|
- **builder.py**: Constructs dependency graph from codebase
|
|
- **analyzer.py**: Detects cycles, measures complexity, identifies hotspots
|
|
- Uses NetworkX for graph algorithms
|
|
|
|
**Note**: The optional health scoring and recommendations are provided by `theauditor/insights/graph.py` (Insights module)
|
|
|
|
### Framework Detection (`theauditor/framework_detector.py`)
|
|
- Auto-detects Django, Flask, React, Vue, Angular, etc.
|
|
- Applies framework-specific rules
|
|
- Influences pattern selection and analysis behavior
|
|
|
|
### Configuration Parsers (`theauditor/parsers/`)
|
|
Specialized parsers for configuration file analysis:
|
|
- **webpack_config_parser.py**: Webpack configuration analysis
|
|
- **compose_parser.py**: Docker Compose file parsing
|
|
- **nginx_parser.py**: Nginx configuration parsing
|
|
- **dockerfile_parser.py**: Dockerfile security analysis
|
|
- **prisma_schema_parser.py**: Prisma ORM schema parsing
|
|
|
|
These parsers are used by extractors during indexing to extract security-relevant configuration data.
|
|
|
|
### Refactoring Detection (`theauditor/commands/refactor.py`)
|
|
Detects incomplete refactorings and cross-stack inconsistencies:
|
|
- Analyzes database migrations to detect schema changes
|
|
- Uses impact analysis to trace affected files
|
|
- Applies correlation rules from `/correlations/rules/refactoring.yaml`
|
|
- Detects API contract mismatches, field migrations, foreign key changes
|
|
- Supports auto-detection from migration files or specific change analysis
|
|
|
|
## System Architecture Diagrams
|
|
|
|
### High-Level Data Flow
|
|
|
|
```mermaid
|
|
graph TB
|
|
subgraph "Input Layer"
|
|
CLI[CLI Commands]
|
|
Files[Project Files]
|
|
end
|
|
|
|
subgraph "Core Pipeline"
|
|
Index[Indexer]
|
|
Framework[Framework Detector]
|
|
Deps[Dependency Checker]
|
|
Patterns[Pattern Detection]
|
|
Taint[Taint Analysis]
|
|
Graph[Graph Builder]
|
|
FCE[Factual Correlation Engine]
|
|
end
|
|
|
|
subgraph "Storage"
|
|
DB[(SQLite DB)]
|
|
Raw[Raw Output]
|
|
Chunks[65KB Chunks]
|
|
end
|
|
|
|
CLI --> Index
|
|
Files --> Index
|
|
Index --> DB
|
|
Index --> Framework
|
|
Framework --> Deps
|
|
|
|
Deps --> Patterns
|
|
Patterns --> Graph
|
|
Graph --> Taint
|
|
Taint --> FCE
|
|
|
|
FCE --> Raw
|
|
Raw --> Chunks
|
|
```
|
|
|
|
### Parallel Pipeline Execution
|
|
|
|
```mermaid
|
|
graph LR
|
|
subgraph "Stage 1 - Sequential"
|
|
S1[Index] --> S2[Framework Detection]
|
|
end
|
|
|
|
subgraph "Stage 2 - Parallel"
|
|
direction TB
|
|
subgraph "Track A - Network I/O"
|
|
A1[Deps Check]
|
|
A2[Doc Fetch]
|
|
A3[Doc Summary]
|
|
A1 --> A2 --> A3
|
|
end
|
|
|
|
subgraph "Track B - Code Analysis"
|
|
B1[Workset]
|
|
B2[Linting]
|
|
B3[Patterns]
|
|
B1 --> B2 --> B3
|
|
end
|
|
|
|
subgraph "Track C - Graph"
|
|
C1[Graph Build]
|
|
end
|
|
end
|
|
|
|
subgraph "Stage 3 - Sequential"
|
|
E1[Graph Analysis] --> E2[Taint] --> E3[FCE] --> E4[Report]
|
|
end
|
|
|
|
S2 --> A1
|
|
S2 --> B1
|
|
S2 --> C1
|
|
|
|
A3 --> E1
|
|
B3 --> E1
|
|
C1 --> E1
|
|
```
|
|
|
|
### Data Chunking System
|
|
|
|
The extraction system (`theauditor/extraction.py`) implements pure courier model chunking:
|
|
|
|
```mermaid
|
|
graph TD
|
|
subgraph "Analysis Results"
|
|
P[Patterns.json]
|
|
T[Taint.json<br/>Multiple lists merged]
|
|
L[Lint.json]
|
|
F[FCE.json]
|
|
end
|
|
|
|
subgraph "Extraction Process"
|
|
E[Extraction Engine<br/>Budget: 1.5MB]
|
|
M[Merge Logic<br/>For taint_paths +<br/>rule_findings]
|
|
C1[Chunk 1<br/>0-65KB]
|
|
C2[Chunk 2<br/>65-130KB]
|
|
C3[Chunk 3<br/>130-195KB]
|
|
TR[Truncation<br/>Flag]
|
|
end
|
|
|
|
subgraph "Output"
|
|
R1[patterns_chunk01.json]
|
|
R2[patterns_chunk02.json]
|
|
R3[patterns_chunk03.json]
|
|
end
|
|
|
|
P --> E
|
|
T --> M --> E
|
|
L --> E
|
|
F --> E
|
|
|
|
E --> C1 --> R1
|
|
E --> C2 --> R2
|
|
E --> C3 --> R3
|
|
E -.->|If >195KB| TR
|
|
TR -.-> R3
|
|
```
|
|
|
|
Key features:
|
|
- **Budget system**: 1.5MB total budget for all chunks
|
|
- **Smart merging**: Taint analysis merges multiple finding lists (taint_paths, rule_findings, infrastructure)
|
|
- **Preservation**: All findings preserved, no filtering or sampling
|
|
- **Chunking**: Only chunks files >65KB, copies smaller files as-is
|
|
|
|
### Dual Environment Architecture
|
|
|
|
```mermaid
|
|
graph TB
|
|
subgraph "Development Environment"
|
|
V1[.venv/]
|
|
PY[Python 3.11+]
|
|
AU[TheAuditor Code]
|
|
V1 --> PY --> AU
|
|
end
|
|
|
|
subgraph "Sandboxed Analysis Environment"
|
|
V2[.auditor_venv/.theauditor_tools/]
|
|
NODE[Bundled Node.js v20.11.1]
|
|
TS[TypeScript Compiler]
|
|
ES[ESLint]
|
|
PR[Prettier]
|
|
NM[node_modules/]
|
|
V2 --> NODE
|
|
NODE --> TS
|
|
NODE --> ES
|
|
NODE --> PR
|
|
NODE --> NM
|
|
end
|
|
|
|
AU -->|Analyzes using| V2
|
|
AU -.->|Never uses| V1
|
|
```
|
|
|
|
TheAuditor maintains strict separation between:
|
|
1. **Primary Environment** (`.venv/`): TheAuditor's Python code and dependencies
|
|
2. **Sandboxed Environment** (`.auditor_venv/.theauditor_tools/`): Isolated JS/TS analysis tools
|
|
|
|
This ensures reproducibility and prevents TheAuditor from analyzing its own analysis tools.
|
|
|
|
## Database Schema
|
|
|
|
```mermaid
|
|
erDiagram
|
|
files ||--o{ symbols : contains
|
|
files ||--o{ refs : contains
|
|
files ||--o{ api_endpoints : contains
|
|
files ||--o{ sql_queries : contains
|
|
files ||--o{ docker_images : contains
|
|
|
|
files {
|
|
string path PK
|
|
string language
|
|
int size
|
|
string hash
|
|
json metadata
|
|
}
|
|
|
|
symbols {
|
|
string path FK
|
|
string name
|
|
string type
|
|
int line
|
|
json metadata
|
|
}
|
|
|
|
refs {
|
|
string src FK
|
|
string value
|
|
string kind
|
|
int line
|
|
}
|
|
|
|
api_endpoints {
|
|
string file FK
|
|
string method
|
|
string path
|
|
int line
|
|
}
|
|
|
|
sql_queries {
|
|
string file_path FK
|
|
string command
|
|
string query
|
|
int line_number
|
|
}
|
|
|
|
docker_images {
|
|
string file_path FK
|
|
string base_image
|
|
json env_vars
|
|
json build_args
|
|
}
|
|
```
|
|
|
|
## Command Flow Sequence
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant User
|
|
participant CLI
|
|
participant Pipeline
|
|
participant Analyzers
|
|
participant Database
|
|
participant Output
|
|
|
|
User->>CLI: aud full
|
|
CLI->>Pipeline: Execute pipeline
|
|
Pipeline->>Database: Initialize schema
|
|
|
|
Pipeline->>Analyzers: Index files
|
|
Analyzers->>Database: Store file metadata
|
|
|
|
par Parallel Execution
|
|
Pipeline->>Analyzers: Dependency check
|
|
and
|
|
Pipeline->>Analyzers: Pattern detection
|
|
and
|
|
Pipeline->>Analyzers: Graph building
|
|
end
|
|
|
|
Pipeline->>Analyzers: Taint analysis
|
|
Analyzers->>Database: Query symbols & refs
|
|
|
|
Pipeline->>Analyzers: FCE correlation
|
|
Analyzers->>Output: Generate reports
|
|
|
|
Pipeline->>Output: Create chunks
|
|
Output->>User: .pf/readthis/
|
|
```
|
|
|
|
## Output Structure
|
|
|
|
All results are organized in the `.pf/` directory:
|
|
|
|
```
|
|
.pf/
|
|
├── raw/ # Immutable tool outputs (ground truth)
|
|
│ ├── eslint.json
|
|
│ ├── ruff.json
|
|
│ └── ...
|
|
├── readthis/ # AI-optimized chunks (<65KB each, max 3 chunks per file)
|
|
│ ├── manifest.md # Repository overview
|
|
│ ├── patterns_*.md # Security findings
|
|
│ ├── taint_*.md # Data-flow issues
|
|
│ └── tickets_*.md # Actionable tasks
|
|
├── repo_index.db # SQLite database of code symbols
|
|
├── pipeline.log # Execution trace
|
|
└── findings.json # Consolidated results
|
|
```
|
|
|
|
### Key Output Files
|
|
|
|
- **manifest.md**: Complete file inventory with SHA-256 hashes
|
|
- **patterns_*.md**: Chunked security findings from 100+ detection rules
|
|
- **tickets_*.md**: Prioritized, actionable issues with evidence
|
|
- **repo_index.db**: Queryable database of all code symbols and relationships
|
|
|
|
## Operating Modes
|
|
|
|
TheAuditor operates in two distinct modes:
|
|
|
|
### Courier Mode (External Tools)
|
|
- Preserves exact outputs from ESLint, Ruff, MyPy, etc.
|
|
- No interpretation or filtering
|
|
- Complete audit trail from source to finding
|
|
|
|
### Expert Mode (Internal Engines)
|
|
- **Taint Analysis**: Tracks untrusted data through the application
|
|
- **Pattern Detection**: YAML-based rules with AST matching
|
|
- **Graph Analysis**: Architectural insights and dependency tracking
|
|
- **Secret Detection**: Identifies hardcoded credentials and API keys
|
|
|
|
## CLI Entry Points
|
|
|
|
- **Main CLI**: `theauditor/cli.py` - Central command router
|
|
- **Command modules**: `theauditor/commands/` - One module per command
|
|
- **Utilities**: `theauditor/utils/` - Shared functionality
|
|
- **Configuration**: `theauditor/config_runtime.py` - Runtime configuration
|
|
|
|
Each command module follows a standardized structure with:
|
|
- `@click.command()` decorator
|
|
- `@handle_exceptions` decorator for error handling
|
|
- Consistent logging and output formatting
|
|
|
|
## Performance Optimizations
|
|
|
|
- **Batched database operations**: 200 records per batch (configurable)
|
|
- **Parallel rule execution**: ThreadPoolExecutor with 4 workers
|
|
- **AST caching**: Persistent cache for parsed AST trees
|
|
- **Incremental analysis**: Workset-based analysis for changed files only
|
|
- **Lazy loading**: Patterns and rules loaded on-demand
|
|
- **Memory-efficient chunking**: Stream large files instead of loading entirely
|
|
|
|
## Configuration System
|
|
|
|
TheAuditor supports runtime configuration via multiple sources (priority order):
|
|
|
|
1. **Environment variables** (`THEAUDITOR_*` prefix)
|
|
2. **`.pf/config.json`** file (project-specific)
|
|
3. **Built-in defaults** in `config_runtime.py`
|
|
|
|
Example configuration:
|
|
```bash
|
|
export THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE=5 # Default: 3
|
|
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000 # Default: 65000
|
|
export THEAUDITOR_LIMITS_MAX_FILE_SIZE=5242880 # Default: 2097152
|
|
export THEAUDITOR_TIMEOUTS_LINT_TIMEOUT=600 # Default: 300
|
|
```
|
|
|
|
## Advanced Features
|
|
|
|
### Database-Aware Rules
|
|
Specialized analyzers query `repo_index.db` to detect:
|
|
- ORM anti-patterns (N+1 queries, missing transactions)
|
|
- Docker security misconfigurations
|
|
- Nginx configuration issues
|
|
- Multi-file correlation patterns
|
|
|
|
### Holistic Analysis
|
|
Project-level analyzers that operate across the entire codebase:
|
|
- **Bundle Analyzer**: Correlates package.json, lock files, and imports
|
|
- **Source Map Detector**: Scans build directories for exposed maps
|
|
- **Framework Detectors**: Identify technology stack automatically
|
|
|
|
### Incremental Analysis
|
|
Workset-based analysis for efficient processing:
|
|
- Git diff integration for changed file detection
|
|
- Dependency tracking for impact analysis
|
|
- Cached results for unchanged files
|
|
|
|
## Contributing to TheAuditor
|
|
|
|
### Adding Language Support
|
|
|
|
TheAuditor's modular architecture makes it straightforward to add new language support:
|
|
|
|
#### 1. Create an Extractor
|
|
Create a new extractor in `theauditor/indexer/extractors/{language}.py`:
|
|
|
|
```python
|
|
from . import BaseExtractor
|
|
|
|
class {Language}Extractor(BaseExtractor):
|
|
def supported_extensions(self) -> List[str]:
|
|
return ['.ext', '.ext2']
|
|
|
|
def extract(self, file_info, content, tree=None):
|
|
# Extract symbols, imports, routes, etc.
|
|
return {
|
|
'imports': [],
|
|
'routes': [],
|
|
'symbols': [],
|
|
# ... other extracted data
|
|
}
|
|
```
|
|
|
|
The extractor will be automatically registered via the `BaseExtractor` inheritance.
|
|
|
|
#### 2. Create Configuration Parser (Optional)
|
|
For configuration files, create a parser in `theauditor/parsers/{language}_parser.py`:
|
|
|
|
```python
|
|
class {Language}Parser:
|
|
def parse_file(self, file_path: Path) -> Dict[str, Any]:
|
|
# Parse configuration file
|
|
return parsed_data
|
|
```
|
|
|
|
#### 3. Add Security Patterns
|
|
Create YAML patterns in `theauditor/patterns/{language}.yml`:
|
|
|
|
```yaml
|
|
- name: hardcoded-secret-{language}
|
|
pattern: 'api_key\s*=\s*["\'][^"\']+["\']'
|
|
severity: critical
|
|
category: security
|
|
languages: ["{language}"]
|
|
description: "Hardcoded API key in {Language} code"
|
|
```
|
|
|
|
#### 4. Add Framework Detection
|
|
Update `theauditor/framework_detector.py` to detect {Language} frameworks.
|
|
|
|
### Adding New Analyzers
|
|
|
|
#### Database-Aware Rules
|
|
Create analyzers that query `repo_index.db` in `theauditor/rules/{category}/`:
|
|
|
|
```python
|
|
def find_{issue}_patterns(db_path: str) -> List[Dict[str, Any]]:
|
|
conn = sqlite3.connect(db_path)
|
|
# Query and analyze
|
|
return findings
|
|
```
|
|
|
|
#### AST-Based Rules
|
|
For semantic analysis, create rules in `theauditor/rules/{framework}/`:
|
|
|
|
```python
|
|
def find_{framework}_issues(tree, file_path) -> List[Dict[str, Any]]:
|
|
# Traverse AST and detect issues
|
|
return findings
|
|
```
|
|
|
|
#### Pattern-Based Rules
|
|
Add YAML patterns to `theauditor/patterns/` for regex-based detection.
|
|
|
|
### Architecture Guidelines
|
|
|
|
1. **Maintain Truth Courier vs Insights separation** - Core modules report facts, insights add interpretation
|
|
2. **Use the extractor registry** - Inherit from `BaseExtractor` for automatic registration
|
|
3. **Follow existing patterns** - Look at `python.py` or `javascript.py` extractors as examples
|
|
4. **Write comprehensive tests** - Test extractors, parsers, and patterns
|
|
5. **Document your additions** - Update this file and CONTRIBUTING.md
|
|
|
|
For detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md). |