# How to Use TheAuditor

This comprehensive guide covers everything you need to know about setting up, configuring, and using **TheAuditor** for code analysis and security auditing. Whether you're performing a one-time security audit or integrating continuous analysis into your development workflow, this guide will walk you through every step.

---

## Prerequisites

Before installing **TheAuditor**, ensure you have:

- **Python 3.11 or higher** (3.12+ recommended)
- **Git** (for repository operations)
- **Operating System**: Linux, macOS, or Windows with WSL

---

## Installation & Setup

### Understanding the Architecture

TheAuditor uses a **dual-environment** design:

1. **TheAuditor Installation** - The tool itself (installed once, used everywhere)
2. **Project Sandbox** - Created per-project for isolated analysis

### Step 1: Install TheAuditor Tool

**IMPORTANT**: Do NOT create a virtual environment. Use your system Python.

```bash
# Choose a permanent location for TheAuditor (NOT inside your projects)
cd ~/tools  # or C:\tools on Windows, or wherever you keep dev tools

# Clone the repository
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor  # git names the clone directory after the repository

# Install TheAuditor to your system
pip install -e .

# Verify the installation worked
aud --version

# Optional: Install with ML capabilities
# pip install -e ".[ml]"

# For development with all optional dependencies:
# pip install -e ".[all]"  # Includes Insights module
```

**Common Mistakes to Avoid:**

- ❌ Don't create a venv before installing TheAuditor
- ❌ Don't install TheAuditor inside your project directory
- ❌ Don't run `pip install` from your project directory
- ✅ Install TheAuditor ONCE in a tools directory
- ✅ Use TheAuditor to analyze MANY projects

### Step 2: Setup Project for Analysis (MANDATORY)

**Navigate to YOUR PROJECT directory first:**

```bash
# Go to the project you want to analyze (NOT TheAuditor directory!)
cd ~/my-project-to-audit

# Create the sandboxed environment for THIS project
aud setup-claude --target .
```

This command:

- Creates **`.auditor_venv/.theauditor_tools/`** sandbox directory
- Installs **TypeScript compiler** (`tsc`) in isolation
- Installs **ESLint** and related tools
- Updates all tools to latest versions
- Configures the sandbox for TheAuditor's exclusive use

**Why is this required?**

- TheAuditor **NEVER** uses your global or project-installed tools
- Ensures reproducible results across different environments
- Prevents contamination between analysis tools and project dependencies
- **Required for TheAuditor to function at all** - not just for JavaScript/TypeScript analysis

**Expected output:**

```
Step 1: Setting up Python virtual environment...
[OK] Venv already exists: C:\Users\user\Desktop\TheAuditor\.auditor_venv
[OK] TheAuditor already installed in C:\Users\user\Desktop\TheAuditor\.auditor_venv
Upgrading to ensure latest version...
Installing TheAuditor from C:\Users\user\Desktop\TheAuditor...
[OK] Installed TheAuditor (editable) from C:\Users\user\Desktop\TheAuditor
[OK] Executable available: C:\Users\user\Desktop\TheAuditor\.auditor_venv\Scripts\aud.exe
Installing Python linting tools...
Checking for latest linter versions...
[OK] Updated to latest package versions
Installing linters from pyproject.toml...
[OK] Python linters installed (ruff, mypy, black, bandit, pylint)
Setting up JavaScript/TypeScript tools in sandboxed environment...
Creating sandboxed tools directory: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools
[OK] ESLint v9 flat config copied to sandbox
[Track A] Checking for latest tool versions...
[Track B] Setting up portable Node.js runtime...
[OK] Node.js runtime already installed at C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools\node-runtime
[OK] Updated @typescript-eslint/parser: 8.41.0 → ^8.42.0
[OK] Updated @typescript-eslint/eslint-plugin: 8.41.0 → ^8.42.0
Updated 2 packages to latest versions
Installing JS/TS linters using bundled Node.js...
[OK] JavaScript/TypeScript tools installed in sandbox
[OK] Tools isolated from project: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools
[OK] Using bundled Node.js - no system dependency!
[OK] ESLint verified at: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools\node_modules\.bin\eslint.cmd
```

---

## Core Commands & Workflow

### Complete Audit Pipeline

On a medium 20k LOC node/react/vite stack, expect the analysis to take around 30 minutes. Progress bars for tracks B/C may display inconsistently on PowerShell.

Run a comprehensive audit with multiple analysis phases organized in parallel stages:

```bash
aud full

# Skip network operations (deps, docs) for faster execution
aud full --offline
```

This executes in **parallel stages** for optimal performance:

**Stage 1 - Foundation (Sequential):**

1. **Repository indexing** - Build manifest and symbol database
2. **Framework detection** - Identify technologies in use

**Stage 2 - Concurrent Analysis (3 Parallel Tracks):**

- **Track A (Network I/O):** *(skipped with --offline)*
  3. **Dependency checking** - Scan for vulnerabilities
  4. **Documentation fetching** - Gather project docs
  5. **Documentation summarization** - Create AI-friendly summaries
- **Track B (Code Analysis):**
  6. **Workset creation** - Define analysis scope
  7. **Linting** - Run code quality checks
  8. **Pattern detection** - Apply security rules
- **Track C (Graph Build):**
  9. **Graph building** - Construct dependency graph

**Stage 3 - Final Aggregation (Sequential):**

10. **Graph analysis** - Find architectural issues
11. **Taint analysis** - Track data flow
12. **Factual correlation engine** - Correlate findings across tools with 29 advanced rules
13. **Report generation** - Produce final output
14. **Summary generation** - Create executive summary

**Output**: Complete results in **`.pf/readthis/`** directory

### Offline Mode

When working on the same codebase repeatedly or when network access is limited, use offline mode to skip network operations (dependency checking and documentation fetching):

```bash
# Run full audit without network operations
aud full --offline

# Combine with other flags
aud full --offline --quiet
aud full --offline --exclude-self  # Only meant for dogfooding; in 9/10 projects, --exclude-self will correctly exclude the entire project, producing empty results
```

**Benefits:**

- **Faster execution** - Skips slow network operations
- **Air-gapped operation** - Works without internet access
- **Iterative development** - Perfect for repeated runs during development

**What gets skipped:**

- Dependency vulnerability scanning
- Documentation fetching and summarization
- Latest version checks

**What still runs:**

- All code analysis (indexing, linting, patterns)
- Graph building and analysis
- Taint analysis and FCE
- Report generation

### Incremental Analysis (Workset-based)

Analyze only changed files based on git diff:

```bash
# Create workset from uncommitted changes
aud workset

# Create workset from specific commit range
aud workset --diff "HEAD~3..HEAD"

# Create workset for all files
aud workset --all
```

Then run targeted analysis:

```bash
aud lint --workset
aud detect-patterns --workset
```

### Linting with Auto-fix

Run comprehensive linting across all supported languages:

```bash
# Run linting on workset
aud lint --workset

# Auto-fix issues where possible
aud lint --fix

# Run on all files
aud lint --all
```

Supports:

- **Python**: **Ruff**, **MyPy**, **Black**, **Bandit**, **Pylint**
- **JavaScript/TypeScript**: **ESLint** with TypeScript parser
- **General**: **Prettier** for formatting

### Security Analysis

#### Taint Analysis

Track data flow from **sources** (user input) to **sinks** (database, output):

```bash
aud taint-analyze
```

Detects:

- **SQL injection** vulnerabilities
- **XSS** (Cross-site scripting)
- **Command injection**
- **Path traversal**
- Other injection attacks

#### Pattern Detection

Run pattern-based vulnerability scanning:

```bash
aud detect-patterns
```

Uses **100+ YAML-defined patterns** across multiple categories:

**Security Patterns:**

- Hardcoded secrets and API keys
- Insecure randomness (**Math.random** for security)
- Weak cryptographic algorithms
- Authentication bypasses
- Missing authentication decorators

**Resource Management:**

- Socket, stream, and worker leaks
- File handles not closed properly
- Database connections left open
- Event listeners not removed

**Concurrency Issues:**

- **Race conditions** (check-then-act)
- **Deadlocks** (nested locks, lock ordering)
- Shared state without synchronization
- Unsafe parallel writes

**ORM & Database:**

- **Sequelize** death queries and N+1 patterns
- **Prisma** connection pool exhaustion
- **TypeORM** missing transactions
- Missing database indexes

**Deployment & Infrastructure:**

- **Docker** security misconfigurations
- **nginx** exposed paths and weak SSL
- **docker-compose** privileged containers
- **webpack** source map exposure in production

**Framework-Specific:**

- **Django**, **Flask**, **FastAPI** vulnerabilities
- **React** hooks dependency issues
- **Vue** reactivity problems
- **Angular**, **Next.js**, **Express.js** patterns
- Multi-tenant security violations

### Docker Security Analysis

Analyze Docker images for security misconfigurations and vulnerabilities:

```bash
# Analyze all indexed Docker images
aud docker-analyze

# Filter by severity level
aud docker-analyze --severity critical

# Save results to JSON file
aud docker-analyze --output docker-security.json
```

Detects:

- **Containers running as root** - CIS Docker Benchmark violation
- **Exposed secrets in ENV/ARG** - Hardcoded passwords, API keys, tokens
- **High entropy values** - Potential secrets using Shannon entropy
- **Known secret patterns** - GitHub tokens, AWS keys, Slack tokens

The command requires Docker images to be indexed first (`aud index`). It queries the `repo_index.db` for Docker metadata and performs security analysis.

### Project Structure Report

Generate comprehensive project structure and intelligence reports:

```bash
# Generate default structure report
aud structure

# Specify output location
aud structure --output PROJECT_OVERVIEW.md

# Adjust directory tree depth
aud structure --max-depth 6

# Analyze different root directory
aud structure --root ./src
```

The report includes:

- **Directory tree visualization** - Smart file grouping and critical file (size/LOC) highlighting
- **Project statistics** - Total files, LOC, estimated tokens
- **Language distribution** - Percentage breakdown by file type
- **Top 10 largest files** - By token count with percentage of codebase
- **Top 15 critical files** - Identified by naming conventions (auth.py, config.js, etc.)
- **AI context optimization** - Recommendations for reading order and token budget
- **Symbol counts** - Functions, classes, imports from database

Useful for:

- Getting quick project overview
- Understanding codebase structure
- Planning AI assistant interactions
- Identifying critical components
- Token budget management for LLMs

### Impact Analysis

Assess the blast radius of a specific code change:

```bash
# Analyze impact of changes to a specific function
aud impact --file "src/auth/login.py" --line 42

# Analyze impact with depth limit
aud impact --file "src/database.py" --line 100 --depth 3

# Trace frontend to backend dependencies
aud impact --file "frontend/api.ts" --line 50 --trace-to-backend
```

Shows:

- Dependent functions and modules
- Call chain analysis
- Affected test files
- Risk assessment
- Cross-stack impact (frontend → backend tracing)

### Refactoring Analysis

Detect and analyze refactoring issues such as data model changes, API contract mismatches, and incomplete migrations:

```bash
# Analyze impact from a specific model change
aud refactor --file "models/Product.ts" --line 42

# Auto-detect refactoring from database migrations
aud refactor --auto-detect --migration-dir backend/migrations

# Analyze current workset for refactoring issues
aud refactor --workset

# Generate detailed report
aud refactor --auto-detect --output refactor_report.json
```

Detects:

- **Data Model Changes**: Fields moved between tables (e.g., `product.price` → `variant.price`)
- **Foreign Key Changes**: References updated (e.g., `product_id` → `product_variant_id`)
- **API Contract Mismatches**: Frontend expects old structure, backend provides new
- **Missing Updates**: Code still using old field/table names
- **Cross-Stack Inconsistencies**: TypeScript interfaces not matching backend models

The refactor command uses:

- Impact analysis to trace affected files
- Migration file analysis to detect schema changes
- Pattern detection with refactoring-specific rules
- FCE correlation to find related issues
- Risk assessment based on blast radius

### Insights Analysis (Optional)

Run optional interpretive analysis on top of factual audit data:

```bash
# Run all insights modules
aud insights --mode all

# ML-powered insights (requires pip install -e ".[ml]")
aud insights --mode ml --ml-train

# Graph health metrics and recommendations
aud insights --mode graph

# Taint vulnerability scoring
aud insights --mode taint

# Impact analysis insights
aud insights --mode impact

# Generate comprehensive report
aud insights --output insights_report.json

# Train ML model on your codebase patterns
aud insights --mode ml --ml-train --training-data .pf/raw/

# Get ML-powered suggestions
aud insights --mode ml --ml-suggest
```

Modes:

- **ml**: Machine learning predictions and pattern recognition
- **graph**: Health scores, architectural recommendations
- **taint**: Vulnerability severity scoring and classification
- **impact**: Change impact assessment and risk scoring
- **all**: Run all available insights modules

The insights command:

- Reads existing audit data from `.pf/raw/`
- Applies interpretive scoring and classification
- Generates actionable recommendations
- Outputs to `.pf/insights/` for separation from facts
- Provides technical scoring without crossing into semantic interpretation

### Graph Visualization

Generate rich visual intelligence from dependency graphs:

```bash
# Build dependency graphs first
aud graph build

# Basic visualization
aud graph viz

# Show only dependency cycles
aud graph viz --view cycles --include-analysis

# Top 10 hotspots (most connected nodes)
aud graph viz --view hotspots --top-hotspots 10

# Architectural layers visualization
aud graph viz --view layers --format svg

# Impact analysis visualization
aud graph viz --view impact --impact-target "src/auth/login.py"

# Call graph instead of import graph
aud graph viz --graph-type call --view full

# Generate SVG for AI analysis
aud graph viz --format svg --include-analysis --title "System Architecture"

# Custom output location
aud graph viz --out-dir ./architecture/ --format png
```

View Modes:

- **full**: Complete graph with all nodes and edges
- **cycles**: Only nodes/edges involved in dependency cycles (red highlighting)
- **hotspots**: Top N most connected nodes with gradient coloring
- **layers**: Architectural layers as subgraphs with clear hierarchy
- **impact**: Highlight impact radius with color-coded upstream/downstream

Visual Encoding:

- **Node Color**: Programming language (Python=blue, JavaScript=yellow, TypeScript=blue)
- **Node Size**: Importance/connectivity (larger = more dependencies)
- **Edge Color**: Red for cycles, gray for normal dependencies
- **Border Width**: Code churn (thicker = more changes)
- **Node Shape**: Module=box, Function=ellipse, Class=diamond

The graph viz command:

- Generates Graphviz DOT format files
- Optionally creates SVG/PNG images (requires Graphviz installation)
- Supports filtered views for focusing on specific concerns
- Includes analysis data for cycle and hotspot highlighting
- Produces AI-readable SVG output for LLM analysis

### Dependency Management

Check for outdated or vulnerable dependencies:

```bash
# Check for latest versions
aud deps --check-latest

# Scan for known vulnerabilities
aud deps --vuln-scan

# Update all dependencies to latest
aud deps --upgrade-all
```

---

## Architecture: Truth Courier vs Insights

### Understanding the Separation of Concerns

TheAuditor implements a strict architectural separation between **factual observation** (Truth Courier modules) and **optional interpretation** (Insights modules). This design ensures the tool remains an objective source of ground truth while offering actionable intelligence when needed.

### The Core Philosophy

TheAuditor doesn't try to understand your business logic or make your AI "smarter." Instead, it solves the real problem: **LLMs lose context and make inconsistent changes across large codebases.**

The workflow:

1. **You tell AI**: "Add JWT auth with CSRF tokens and password complexity"
2. **AI writes code**: Probably inconsistent due to context limits
3. **You run**: `aud full`
4. **TheAuditor reports**: All the inconsistencies and security holes
5. **AI reads the report**: Now sees the complete picture across all files
6. **AI fixes issues**: With full visibility of what's broken
7. **Repeat until clean**

### Truth Courier Modules (Core)

These modules report verifiable facts without judgment:

```python
# What Truth Couriers Report - Just Facts
{
    "taint_analyzer": "Data from req.body flows to res.send at line 45",
    "pattern_detector": "Line 45 matches pattern 'unsanitized-output'",
    "impact_analyzer": "Changing handleRequest() affects 12 downstream functions",
    "graph_analyzer": "Module A imports B, B imports C, C imports A"
}
```

**Key Truth Couriers:**

- **Indexer**: Maps all code symbols and their locations
- **Taint Analyzer**: Traces data flow through the application
- **Impact Analyzer**: Maps dependency chains and change blast radius
- **Graph Analyzer**: Detects cycles and architectural patterns
- **Pattern Detector**: Matches code against security patterns

### Insights Modules (Optional Scoring)

These optional modules add technical scoring and classification:

```python
# What Insights Add - Technical Classifications
{
    "taint/insights": {
        "vulnerability_type": "Cross-Site Scripting",
        "severity": "HIGH"
    },
    "graph/insights": {
        "health_score": 70,
        "recommendation": "Reduce coupling"
    }
}
```

**Installation:**

```bash
# Base installation (Truth Couriers only)
pip install -e .

# With ML insights (optional)
pip install -e ".[ml]"

# Development with all dependencies (not for general users)
# pip install -e ".[all]"
```

### Correlation Rules: Detecting YOUR Patterns

Correlation rules detect when multiple facts indicate an inconsistency in YOUR codebase:

```yaml
# Example: Detecting incomplete refactoring
- name: "PRODUCT_VARIANT_REFACTOR"
  co_occurring_facts:
    - tool: "grep"
      pattern: "ProductVariant.*retail_price"  # Backend changed
    - tool: "grep"
      pattern: "product\\.unit_price"  # Frontend didn't
```

This isn't "understanding" that products have prices. It's detecting that you moved a field from one model to another and some code wasn't updated. Pure consistency checking.

The correlation engine loads rules from `/correlations/rules/`. We provide common patterns, but many are project-specific. You write rules that detect YOUR patterns, YOUR refactorings, YOUR inconsistencies.

### Why This Works

**What doesn't work:**

- Making AI "understand" your business domain
- Adding semantic layers to guess what you mean
- Complex context management systems

**What does work:**

- Accept that AI will make inconsistent changes
- Detect those inconsistencies after the fact
- Give AI the full picture so it can fix them

TheAuditor doesn't try to prevent mistakes. It finds them so they can be fixed.

### Practical Example

```bash
# You ask AI to implement authentication
Human: "Add JWT auth with CSRF protection"

# AI writes code (probably with issues due to context limits)
AI: *implements auth across 15 files*

# You audit it
$ aud full

# TheAuditor finds issues
- "JWT secret hardcoded at auth.js:47"
- "CSRF token generated but never validated"
- "Auth middleware missing on /api/admin/*"

# You can also check impact of changes
$ aud impact --file "auth.js" --line 47
# Shows: "Changing this affects 23 files, 47 functions"

# AI reads the audit and can now see ALL issues
AI: *reads .pf/readthis/*
AI: "I see 5 security issues across auth flow. Fixing..."

# AI fixes with complete visibility
AI: *fixes all issues because it can see the full picture*
```

### Key Points

1. **No Business Logic Understanding**: TheAuditor doesn't need to know what your app does
2. **Just Consistency Checking**: It finds where your code doesn't match itself
3. **Facts, Not Opinions**: Reports what IS, not what SHOULD BE
4. **Complete Dependency Tracing**: Impact analyzer shows exactly what's affected by changes
5. **AI + Audit Loop**: Write → Audit → Fix → Repeat until clean

This is why TheAuditor works where semantic understanding fails - it's not trying to read your mind, just verify your code's consistency.

---

## Understanding the Output

### Directory Structure

After running analyses, results are organized in **`.pf/`**:

```
.pf/
├── raw/                          # Raw, unmodified tool outputs (Truth Couriers)
│   ├── linting.json              # Raw linter results
│   ├── patterns.json             # Pattern detection findings
│   ├── taint_analysis.json       # Taint analysis results
│   ├── graph.json                # Dependency graph data
│   └── graph_analysis.json       # Graph analysis (cycles, hotspots)
│
├── insights/                     # Optional interpretive analysis (Insights modules)
│   ├── ml_suggestions.json       # ML predictions and patterns
│   ├── taint_insights.json       # Vulnerability severity scoring
│   └── graph_insights.json       # Health scores and recommendations
│
├── readthis/                     # AI-consumable chunks
│   ├── manifest.md               # Repository overview
│   ├── patterns_001.md           # Chunked findings (65KB max)
│   ├── patterns_002.md
│   ├── taint_001.md              # Chunked taint results
│   ├── tickets_001.md            # Actionable issue tickets
│   └── summary.md                # Executive summary
│
├── graphs/                       # Graph visualizations
│   ├── import_graph.dot          # Dependency graph DOT file
│   ├── import_graph_cycles.dot   # Cycles-only view
│   └── import_graph.svg          # SVG visualization (if generated)
│
├── pipeline.log                  # Complete execution log
├── error.log                     # Error details (if failures occur)
├── findings.json                 # Consolidated findings
├── risk_scores.json              # Risk analysis results
└── report.md                     # Human-readable report
```

### Key Output Files

#### `.pf/raw/`

Contains **unmodified outputs** from each tool. These files preserve the exact format and data from linters, scanners, and analyzers. **Never modified** after creation. This is the source of ground truth.

#### `.pf/insights/`

Contains **optional interpretive analysis** from Insights modules. These files add technical scoring and classification on top of raw data. Only created when insights commands are run.

#### `.pf/graphs/`

Contains **graph visualizations** in DOT and image formats. Generated by `aud graph viz` command with various view modes for focusing on specific concerns.

#### `.pf/readthis/`

Contains processed, **chunked data optimized for AI consumption**:

- Each file is under **65KB** by default (configurable via `THEAUDITOR_LIMITS_MAX_CHUNK_SIZE`)
- Maximum 3 chunks per file by default (configurable via `THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE`)
- Structured with clear headers and sections
- Includes context, evidence, and suggested fixes
- Ready for direct consumption by **Claude**, **GPT-4**, etc.

#### `.pf/pipeline.log`

Complete execution log showing:

- Each phase's **execution time**
- **Success/failure** status
- Key statistics and findings
- Error messages if any

#### `.pf/error.log`

Created only when errors occur.
Contains:

- Full **stack traces**
- Detailed error messages
- Phase-specific failure information
- Debugging information

---

## Advanced Usage

### Custom Pattern Rules

Create custom detection patterns in **`.pf/patterns/`**:

```yaml
# .pf/patterns/custom_auth.yaml
name: weak_password_check
severity: high
category: security
# In a single-quoted YAML scalar, a literal ' is written as '' (a backslash does not escape it)
pattern: 'password\s*==\s*["'']'
description: "Hardcoded password comparison"
test_template: |
  def test_weak_password():
      assert password != "admin"
```

### ML-Powered Suggestions

Train models on your codebase patterns:

```bash
# Initial training
aud learn

# Get improvement suggestions
aud suggest

# Provide feedback for continuous learning
aud learn-feedback --accept
```

### Development-Specific Flags

#### Excluding TheAuditor's Own Files

When testing or developing within TheAuditor's repository (e.g., analyzing `fakeproj/project_anarchy/`), use the `--exclude-self` flag to prevent false positives from TheAuditor's own files:

```bash
# Exclude all TheAuditor files from analysis
aud index --exclude-self
aud full --exclude-self
```

This flag excludes:

- All TheAuditor source code directories (`theauditor/`, `tests/`, etc.)
- Root configuration files (`pyproject.toml`, `package-template.json`, `Dockerfile`)
- Documentation and build files

**Use case:** Testing vulnerable projects within TheAuditor's repository without framework detection picking up TheAuditor's own configuration files.

### CI/CD Integration

#### GitHub Actions Example

```yaml
name: Security Audit

on: [push, pull_request]

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install TheAuditor
        run: |
          pip install -e ".[all]"
          aud setup-claude --target .

      - name: Run Audit
        run: aud full

      - name: Upload Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: audit-results
          path: .pf/
          include-hidden-files: true  # .pf/ is a hidden directory, which v4 skips by default
```

### Running TheAuditor on Its Own Codebase (Dogfooding)

When developing TheAuditor or testing it on itself, you need a special dual-environment setup:

#### Understanding the Dual-Environment Architecture

TheAuditor maintains strict separation between:

1. **Primary Environment** (`.venv/`) - Where TheAuditor runs from
2. **Sandboxed Environment** (`.auditor_venv/.theauditor_tools/`) - Tools TheAuditor uses for analysis

This ensures reproducibility and prevents TheAuditor from analyzing its own analysis tools.

#### Setup Procedure for Dogfooding

```bash
# 1. Clone and set up development environment
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor  # git names the clone directory after the repository
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

# 2. CRITICAL: Create the sandboxed analysis environment
aud setup-claude --target .

# 3. Verify setup
aud full --quick-test

# 4. Run full analysis on TheAuditor itself
aud full
```

#### Analyzing Test Projects Within TheAuditor

When analyzing test projects like `fakeproj/` from within TheAuditor's repository:

```bash
cd fakeproj/project_anarchy
aud full --exclude-self  # Excludes TheAuditor's own files
```

The `--exclude-self` flag prevents:

- Framework detection from identifying TheAuditor's `pyproject.toml`
- False positives from TheAuditor's configuration files
- Contamination from TheAuditor's source code

---

## Refactoring Detection

TheAuditor includes sophisticated capabilities for detecting incomplete refactorings, data model changes, and cross-stack inconsistencies.

### Understanding Refactoring Issues

Common refactoring problems TheAuditor detects:

1. **Data Model Evolution** - Fields moved between models (e.g., `product.price` → `variant.price`)
2. **Foreign Key Changes** - References updated in database but not in code
3. **API Contract Mismatches** - Frontend expects old structure, backend provides new
4. **Cross-Stack Inconsistencies** - TypeScript interfaces not matching backend models
5. **Incomplete Migrations** - Some code still using old field/table names

### How Refactoring Detection Works

TheAuditor uses multiple techniques:

#### Migration Analysis

Analyzes database migration files to understand schema changes:

```javascript
// Migration detected: Field moved from products to product_variants
removeColumn('products', 'unit_price');
addColumn('product_variants', 'retail_price', DataTypes.DECIMAL);
```

#### Impact Analysis

Traces dependencies to find all affected code:

```bash
aud impact --file "models/Product.ts" --line 42
# Shows: 47 files need updating
```

#### Pattern Detection

Over 30 refactoring-specific patterns detect common issues:

```yaml
- name: "PRODUCT_PRICE_FIELD_REMOVED"
  description: "Code accessing price on Product after migration to ProductVariant"
```

#### Cross-Stack Tracing

Matches frontend API calls to backend endpoints to detect contract mismatches.

### Using Refactoring Detection

#### Quick Detection

```bash
# Auto-detect from migrations
aud refactor --auto-detect

# Analyze specific change
aud refactor --file "models/Product.ts" --line 42

# Use with workset
aud refactor --workset

# Generate detailed report
aud refactor --auto-detect --output refactor_report.json
```

#### Best Practices for Refactoring

**Before Refactoring:**

1. Run impact analysis: `aud impact --file "model.ts" --line 42`
2. Create workset: `aud workset --from-impact`
3. Baseline analysis: `aud refactor --workset`

**During Refactoring:**

- Run incremental checks: `aud refactor --workset`
- Validate cross-stack: `aud impact --trace-to-backend`

**After Refactoring:**

- Full validation: `aud unified --mode refactor`
- Generate report: `aud report --format refactoring`

### Real-World Example

A product variant refactoring might be detected as:

```
PRODUCT_PRICE_FIELD_REMOVED
- Frontend: 23 files accessing product.unit_price
- Backend: Field moved to ProductVariant.retail_price
- Impact: POS system cannot display prices

ORDER_ITEMS_WRONG_REFERENCE
- Database: order_items.product_variant_id (new)
- Code: Still using order_items.product_id (old)
- Impact: Orders cannot be created
```

### Custom Refactoring Rules

TheAuditor uses YAML-based correlation rules to detect refactoring issues. These rules are YOUR business logic - you define what patterns indicate problems in YOUR codebase.

#### How It Works

1. **Rules Location**: `/theauditor/correlations/rules/refactoring.yaml`
2. **Rule Structure**: Each rule defines co-occurring facts that must ALL match
3. **Detection**: When all facts match, TheAuditor reports the issue
4. **No Code Changes**: Just edit YAML to define new patterns

#### Creating Your Own Rules

Edit `/theauditor/correlations/rules/refactoring.yaml` or create new YAML files:

```yaml
rules:
  - name: "MY_FIELD_MIGRATION"
    description: "Detect when price field moved but old code remains"
    co_occurring_facts:
      - tool: "grep"
        pattern: "removeColumn.*price"  # Migration removed field
      - tool: "grep"
        pattern: "product\\.price"  # Code still uses old field
    confidence: 0.92

  - name: "API_VERSION_MISMATCH"
    description: "Frontend calling v1 API but backend is v2"
    co_occurring_facts:
      - tool: "grep"
        pattern: "/api/v1/"  # Frontend uses v1
      - tool: "grep"
        pattern: "router.*'/v2/'"  # Backend only has v2
    confidence: 0.95
```

#### Available Tools for Facts

- **grep**: Pattern matching in files
- **patterns**: Matches from pattern detection
- **taint_analyzer**: Taint flow findings
- **lint**: Linter findings

#### Real Example from Production

```yaml
- name: "PRODUCT_VARIANT_REFACTOR"
  description: "Product fields moved to ProductVariant but frontend still uses old structure"
  co_occurring_facts:
    - tool: "grep"
      pattern: "ProductVariant.*retail_price.*Sequelize"  # Backend changed
    - tool: "grep"
      pattern: "product\\.unit_price|product\\.retail_price"  # Frontend didn't
  confidence: 0.92
```

This detects when you moved price fields from the Product model to the ProductVariant model but the frontend still expects the old structure.

---

## Troubleshooting

### Common Issues

#### "TypeScript compiler not available in TheAuditor sandbox"

**Solution**: Run **`aud setup-claude --target .`** to set up the sandbox.

#### "Coverage < 90% - run `aud capsules` first"

**Solution**: Generate code capsules for better analysis coverage:

```bash
aud index
aud workset --all
```

#### Linting produces no results

**Solution**: Ensure linters are installed:

```bash
# For Python
pip install -e ".[linters]"

# For JavaScript/TypeScript
aud setup-claude --target .
```

#### Pipeline fails at specific phase

**Solution**: Check **`.pf/error.log`** for details:

```bash
cat .pf/error.log

# Or check phase-specific error log
cat .pf/error_phase_08.log
```

### Performance Optimization

For large repositories:

```bash
# Limit analysis scope
aud workset --paths "src/critical/**/*.py"

# Run specific commands only
aud index && aud lint && aud detect-patterns

# Adjust chunking for larger context windows
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000   # 100KB chunks
export THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE=5   # Allow up to 5 chunks
```

### Runtime Configuration

TheAuditor supports environment variable overrides for runtime configuration:

```bash
# Chunking configuration
export THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE=5    # Default: 3
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000    # Default: 65000 (bytes)

# File size limits
export THEAUDITOR_LIMITS_MAX_FILE_SIZE=5242880    # Default: 2097152 (2MB)

# Timeout configuration
export THEAUDITOR_TIMEOUTS_LINT_TIMEOUT=600       # Default: 300 (seconds)
export THEAUDITOR_TIMEOUTS_FCE_TIMEOUT=1200       # Default: 600 (seconds)

# Batch processing
export THEAUDITOR_LIMITS_DEFAULT_BATCH_SIZE=500   # Default: 200
```

Configuration can also be set via `.pf/config.json` for project-specific overrides.

---

## Best Practices

1. **Always run `aud init` first** in a new project
2. **Set up the sandbox** in every project using **`aud setup-claude --target .`** (it is required for all analysis, not only JavaScript/TypeScript)
3. **Use worksets** for incremental analysis during development
4. **Run `aud full`** before releases for comprehensive analysis
5. **Review `.pf/readthis/`** for AI-friendly issue summaries
6. **Check exit codes** in CI/CD for automated pass/fail decisions
7. **Archive results** with timestamps for audit trails

---

## Exit Codes for Automation

**TheAuditor** uses specific exit codes for CI/CD integration:

- **`0`** - Success, no critical/high issues
- **`1`** - High severity findings
- **`2`** - Critical severity findings
- **`3`** - Pipeline/task incomplete

Use these in scripts:

```bash
aud full
if [ $? -eq 2 ]; then
    echo "Critical vulnerabilities found - blocking deployment"
    exit 1
fi
```

---

## Getting Help

- Run **`aud --help`** for command overview
- Run **`aud <command> --help`** for specific command help
- Check **`.pf/pipeline.log`** for execution details
- Review **`.pf/error.log`** for troubleshooting
- Refer to **`teamsop.md`** for development workflow

---

## Next Steps

1. Initialize your first project with **`aud init`**
2. Run **`aud full`** to see TheAuditor in action
3. Explore the results in **`.pf/readthis/`**
4. Integrate into your CI/CD pipeline
5. Customize patterns for your specific needs

---

**Remember**: TheAuditor is designed to work **offline**, maintain **data integrity**, and produce **AI-ready outputs**. All analysis is **deterministic** and **reproducible**.
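
---

## Example: Acting on Exit Codes from Python

The exit codes listed under "Exit Codes for Automation" can also be consumed from a Python-based CI script. The following is a minimal sketch under stated assumptions, not part of TheAuditor: the helper names (`describe`, `should_block`, `run_audit`, `main`) and the blocking policy are hypothetical, and only the four exit codes and the `aud full` command come from this guide.

```python
import subprocess

# Exit codes as listed under "Exit Codes for Automation".
EXIT_CODE_MEANINGS = {
    0: "success, no critical/high issues",
    1: "high severity findings",
    2: "critical severity findings",
    3: "pipeline/task incomplete",
}


def describe(exit_code: int) -> str:
    """Return the documented meaning of an exit code, or flag it as unknown."""
    return EXIT_CODE_MEANINGS.get(exit_code, f"undocumented exit code {exit_code}")


def should_block(exit_code: int, block_on_high: bool = False) -> bool:
    """Hypothetical policy: block on critical or incomplete runs, optionally on high."""
    if exit_code == 0:
        return False
    if exit_code == 1:
        return block_on_high
    # 2 (critical), 3 (incomplete), and any undocumented code block by default.
    return True


def run_audit(command=("aud", "full")) -> int:
    """Run the audit command and return its exit code (assumes `aud` is on PATH)."""
    return subprocess.run(list(command), check=False).returncode


def main() -> int:
    """Run the audit, print the outcome, and return 1 to block or 0 to proceed."""
    code = run_audit()
    print(f"aud full exited with {code}: {describe(code)}")
    return 1 if should_block(code) else 0
```

A CI step could call `sys.exit(main())` as its entry point. Treating exit code `3` and any undocumented code as blocking is a deliberately conservative choice in this sketch; a team that also wants high severity findings to fail the build would pass `block_on_high=True`.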