How to Use TheAuditor
This comprehensive guide covers everything you need to know about setting up, configuring, and using TheAuditor for code analysis and security auditing. Whether you're performing a one-time security audit or integrating continuous analysis into your development workflow, this guide will walk you through every step.
Prerequisites
Before installing TheAuditor, ensure you have:
- Python 3.11 or higher (3.12+ recommended)
- Git (for repository operations)
- Operating System: Linux, macOS, or Windows with WSL
Installation & Setup
Step 1: Install TheAuditor
# Clone the repository
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor
# Install TheAuditor
pip install -e .
# Optional: Install with ML capabilities
# pip install -e ".[ml]"
# For development with all optional dependencies:
# pip install -e ".[all]"  # includes the optional Insights module packages
Step 2: Sandboxed Toolchain Setup (MANDATORY)
# Run inside your project directory
aud setup-claude --target .
This command:
- Creates the .auditor_venv/.theauditor_tools/ sandbox directory
- Installs the TypeScript compiler (tsc) in isolation
- Installs ESLint and related tools
- Updates all tools to latest versions
- Configures the sandbox for TheAuditor's exclusive use
Why is this required?
- TheAuditor NEVER uses your global or project-installed tools
- Ensures reproducible results across different environments
- Prevents contamination between analysis tools and project dependencies
- Required for TheAuditor to function at all - not just for JavaScript/TypeScript analysis
Expected output:
Step 1: Setting up Python virtual environment...
[OK] Venv already exists: C:\Users\user\Desktop\TheAuditor\.auditor_venv
[OK] TheAuditor already installed in C:\Users\user\Desktop\TheAuditor\.auditor_venv
Upgrading to ensure latest version...
Installing TheAuditor from C:\Users\user\Desktop\TheAuditor...
[OK] Installed TheAuditor (editable) from C:\Users\user\Desktop\TheAuditor
[OK] Executable available: C:\Users\user\Desktop\TheAuditor\.auditor_venv\Scripts\aud.exe
Installing Python linting tools...
Checking for latest linter versions...
[OK] Updated to latest package versions
Installing linters from pyproject.toml...
[OK] Python linters installed (ruff, mypy, black, bandit, pylint)
Setting up JavaScript/TypeScript tools in sandboxed environment...
Creating sandboxed tools directory: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools
[OK] ESLint v9 flat config copied to sandbox
[Track A] Checking for latest tool versions...
[Track B] Setting up portable Node.js runtime...
[OK] Node.js runtime already installed at C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools\node-runtime
[OK] Updated @typescript-eslint/parser: 8.41.0 → ^8.42.0
[OK] Updated @typescript-eslint/eslint-plugin: 8.41.0 → ^8.42.0
Updated 2 packages to latest versions
Installing JS/TS linters using bundled Node.js...
[OK] JavaScript/TypeScript tools installed in sandbox
[OK] Tools isolated from project: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools
[OK] Using bundled Node.js - no system dependency!
[OK] ESLint verified at: C:\Users\user\Desktop\TheAuditor\.auditor_venv\.theauditor_tools\node_modules\.bin\eslint.cmd
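To sanity-check the setup, you can inspect the sandbox directly. A quick check, assuming the default layout shown in the output above (on Windows the ESLint shim is eslint.cmd):
# Verify the sandboxed toolchain exists
ls .auditor_venv/.theauditor_tools
.auditor_venv/.theauditor_tools/node_modules/.bin/eslint --version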
Core Commands & Workflow
Complete Audit Pipeline
On a medium-sized 20k LOC Node/React/Vite stack, expect the full analysis to take around 30 minutes. Progress bars for tracks B and C may display inconsistently on PowerShell.
Run a comprehensive audit with all 14 analysis phases:
aud full
# Skip network operations (deps, docs) for faster execution
aud full --offline
This executes in parallel stages for optimal performance:
Stage 1 - Foundation (Sequential):
- Repository indexing - Build manifest and symbol database
- Framework detection - Identify technologies in use
Stage 2 - Concurrent Analysis (3 Parallel Tracks):
- Track A (Network I/O, skipped with --offline):
  3. Dependency checking - Scan for vulnerabilities
  4. Documentation fetching - Gather project docs
  5. Documentation summarization - Create AI-friendly summaries
- Track B (Code Analysis):
  6. Workset creation - Define analysis scope
  7. Linting - Run code quality checks
  8. Pattern detection - Apply security rules
- Track C (Graph Build):
  9. Graph building - Construct dependency graph
Stage 3 - Final Aggregation (Sequential):
10. Graph analysis - Find architectural issues
11. Taint analysis - Track data flow
12. Factual correlation engine - Correlate findings across tools with 29 advanced rules
13. Report generation - Produce final output
Output: Complete results in .pf/readthis/ directory
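Once the pipeline finishes, a quick way to skim what it found is to read the executive summary and browse the AI-ready chunks (default locations; see Understanding the Output below):
# Skim the results in the default output directory
ls .pf/readthis/
cat .pf/readthis/summary.md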
Offline Mode
When working on the same codebase repeatedly or when network access is limited, use offline mode to skip dependency checking and documentation phases:
# Run full audit without network operations
aud full --offline
# Combine with other flags
aud full --offline --quiet
aud full --offline --exclude-self  # Dogfooding only: in most projects this excludes the entire codebase and produces empty results
Benefits:
- Faster execution - Skips slow network operations
- Air-gapped operation - Works without internet access
- Iterative development - Perfect for repeated runs during development
What gets skipped:
- Dependency vulnerability scanning
- Documentation fetching and summarization
- Latest version checks
What still runs:
- All code analysis (indexing, linting, patterns)
- Graph building and analysis
- Taint analysis and FCE
- Report generation
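A minimal sketch of the iterative loop this enables: re-run the offline audit after each batch of changes and re-read the summary (the summary path assumes default output locations):
# Edit code, then re-audit without network round-trips
aud full --offline --quiet
cat .pf/readthis/summary.md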
Incremental Analysis (Workset-based)
Analyze only changed files based on git diff:
# Create workset from uncommitted changes
aud workset
# Create workset from specific commit range
aud workset --diff "HEAD~3..HEAD"
# Create workset for all files
aud workset --all
Then run targeted analysis:
aud lint --workset
aud detect-patterns --workset
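As a sketch, a pre-push check could scope the audit to the branch diff (the branch range here is illustrative):
# Audit only what changed relative to the main branch
aud workset --diff "origin/main..HEAD"
aud lint --workset
aud detect-patterns --workset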
Linting with Auto-fix
Run comprehensive linting across all supported languages:
# Run linting on workset
aud lint --workset
# Auto-fix issues where possible
aud lint --fix
# Run on all files
aud lint --all
Supports:
- Python: Ruff, MyPy, Black, Bandit, Pylint
- JavaScript/TypeScript: ESLint with TypeScript parser
- General: Prettier for formatting
Security Analysis
Taint Analysis
Track data flow from sources (user input) to sinks (database, output):
aud taint-analyze
Detects:
- SQL injection vulnerabilities
- XSS (Cross-site scripting)
- Command injection
- Path traversal
- Other injection attacks
Pattern Detection
Run pattern-based vulnerability scanning:
aud detect-patterns
Uses 100+ YAML-defined patterns across multiple categories:
Security Patterns:
- Hardcoded secrets and API keys
- Insecure randomness (Math.random for security)
- Weak cryptographic algorithms
- Authentication bypasses
- Missing authentication decorators
Resource Management:
- Socket, stream, and worker leaks
- File handles not closed properly
- Database connections left open
- Event listeners not removed
Concurrency Issues:
- Race conditions (check-then-act)
- Deadlocks (nested locks, lock ordering)
- Shared state without synchronization
- Unsafe parallel writes
ORM & Database:
- Sequelize death queries and N+1 patterns
- Prisma connection pool exhaustion
- TypeORM missing transactions
- Missing database indexes
Deployment & Infrastructure:
- Docker security misconfigurations
- nginx exposed paths and weak SSL
- docker-compose privileged containers
- webpack source map exposure in production
Framework-Specific:
- Django, Flask, FastAPI vulnerabilities
- React hooks dependency issues
- Vue reactivity problems
- Angular, Next.js, Express.js patterns
- Multi-tenant security violations
Docker Security Analysis
Analyze Docker images for security misconfigurations and vulnerabilities:
# Analyze all indexed Docker images
aud docker-analyze
# Filter by severity level
aud docker-analyze --severity critical
# Save results to JSON file
aud docker-analyze --output docker-security.json
Detects:
- Containers running as root - CIS Docker Benchmark violation
- Exposed secrets in ENV/ARG - Hardcoded passwords, API keys, tokens
- High entropy values - Potential secrets using Shannon entropy
- Known secret patterns - GitHub tokens, AWS keys, Slack tokens
The command requires Docker images to be indexed first (aud index). It queries the repo_index.db for Docker metadata and performs security analysis.
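Putting the pieces together, a Docker-focused pass might look like this (the output filename is arbitrary):
# Index the repository, then analyze Docker metadata and keep a JSON record
aud index
aud docker-analyze --severity critical --output docker-security.json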
Project Structure Report
Generate comprehensive project structure and intelligence reports:
# Generate default structure report
aud structure
# Specify output location
aud structure --output PROJECT_OVERVIEW.md
# Adjust directory tree depth
aud structure --max-depth 6
# Analyze different root directory
aud structure --root ./src
The report includes:
- Directory tree visualization - Smart file grouping and critical file (size/LOC) highlighting
- Project statistics - Total files, LOC, estimated tokens
- Language distribution - Percentage breakdown by file type
- Top 10 largest files - By token count with percentage of codebase
- Top 15 critical files - Identified by naming conventions (auth.py, config.js, etc.)
- AI context optimization - Recommendations for reading order and token budget
- Symbol counts - Functions, classes, imports from database
Useful for:
- Getting quick project overview
- Understanding codebase structure
- Planning AI assistant interactions
- Identifying critical components
- Token budget management for LLMs
Impact Analysis
Assess the blast radius of a specific code change:
# Analyze impact of changes to a specific function
aud impact --file "src/auth/login.py" --line 42
# Analyze impact with depth limit
aud impact --file "src/database.py" --line 100 --depth 3
# Trace frontend to backend dependencies
aud impact --file "frontend/api.ts" --line 50 --trace-to-backend
Shows:
- Dependent functions and modules
- Call chain analysis
- Affected test files
- Risk assessment
- Cross-stack impact (frontend → backend tracing)
Refactoring Analysis
Detect and analyze refactoring issues such as data model changes, API contract mismatches, and incomplete migrations:
# Analyze impact from a specific model change
aud refactor --file "models/Product.ts" --line 42
# Auto-detect refactoring from database migrations
aud refactor --auto-detect --migration-dir backend/migrations
# Analyze current workset for refactoring issues
aud refactor --workset
# Generate detailed report
aud refactor --auto-detect --output refactor_report.json
Detects:
- Data Model Changes: Fields moved between tables (e.g., product.price → variant.price)
- Foreign Key Changes: References updated (e.g., product_id → product_variant_id)
- API Contract Mismatches: Frontend expects old structure, backend provides new
- Missing Updates: Code still using old field/table names
- Cross-Stack Inconsistencies: TypeScript interfaces not matching backend models
The refactor command uses:
- Impact analysis to trace affected files
- Migration file analysis to detect schema changes
- Pattern detection with refactoring-specific rules
- FCE correlation to find related issues
- Risk assessment based on blast radius
Insights Analysis (Optional)
Run optional interpretive analysis on top of factual audit data:
# Run all insights modules
aud insights --mode all
# ML-powered insights (requires pip install -e ".[ml]")
aud insights --mode ml --ml-train
# Graph health metrics and recommendations
aud insights --mode graph
# Taint vulnerability scoring
aud insights --mode taint
# Impact analysis insights
aud insights --mode impact
# Generate comprehensive report
aud insights --output insights_report.json
# Train ML model on your codebase patterns
aud insights --mode ml --ml-train --training-data .pf/raw/
# Get ML-powered suggestions
aud insights --mode ml --ml-suggest
Modes:
- ml: Machine learning predictions and pattern recognition
- graph: Health scores, architectural recommendations
- taint: Vulnerability severity scoring and classification
- impact: Change impact assessment and risk scoring
- all: Run all available insights modules
The insights command:
- Reads existing audit data from .pf/raw/
- Applies interpretive scoring and classification
- Generates actionable recommendations
- Outputs to .pf/insights/ for separation from facts
- Provides technical scoring without crossing into semantic interpretation
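A typical sequence is to layer insights on top of an existing audit and then read the generated files (filenames match the output layout described below):
# Run the audit first, then add interpretive scoring on top of the raw facts
aud full --offline
aud insights --mode all
ls .pf/insights/   # e.g., taint_insights.json, graph_insights.json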
Graph Visualization
Generate rich visual intelligence from dependency graphs:
# Build dependency graphs first
aud graph build
# Basic visualization
aud graph viz
# Show only dependency cycles
aud graph viz --view cycles --include-analysis
# Top 10 hotspots (most connected nodes)
aud graph viz --view hotspots --top-hotspots 10
# Architectural layers visualization
aud graph viz --view layers --format svg
# Impact analysis visualization
aud graph viz --view impact --impact-target "src/auth/login.py"
# Call graph instead of import graph
aud graph viz --graph-type call --view full
# Generate SVG for AI analysis
aud graph viz --format svg --include-analysis --title "System Architecture"
# Custom output location
aud graph viz --out-dir ./architecture/ --format png
View Modes:
- full: Complete graph with all nodes and edges
- cycles: Only nodes/edges involved in dependency cycles (red highlighting)
- hotspots: Top N most connected nodes with gradient coloring
- layers: Architectural layers as subgraphs with clear hierarchy
- impact: Highlight impact radius with color-coded upstream/downstream
Visual Encoding:
- Node Color: Programming language (Python=blue, JavaScript=yellow, TypeScript=blue)
- Node Size: Importance/connectivity (larger = more dependencies)
- Edge Color: Red for cycles, gray for normal dependencies
- Border Width: Code churn (thicker = more changes)
- Node Shape: Module=box, Function=ellipse, Class=diamond
The graph viz command:
- Generates Graphviz DOT format files
- Optionally creates SVG/PNG images (requires Graphviz installation)
- Supports filtered views for focusing on specific concerns
- Includes analysis data for cycle and hotspot highlighting
- Produces AI-readable SVG output for LLM analysis
Dependency Management
Check for outdated or vulnerable dependencies:
# Check for latest versions
aud deps --check-latest
# Scan for known vulnerabilities
aud deps --vuln-scan
# Update all dependencies to latest
aud deps --upgrade-all
Architecture: Truth Courier vs Insights
Understanding the Separation of Concerns
TheAuditor implements a strict architectural separation between factual observation (Truth Courier modules) and optional interpretation (Insights modules). This design ensures the tool remains an objective source of ground truth while offering actionable intelligence when needed.
The Core Philosophy
TheAuditor doesn't try to understand your business logic or make your AI "smarter." Instead, it solves the real problem: LLMs lose context and make inconsistent changes across large codebases.
The workflow:
- You tell AI: "Add JWT auth with CSRF tokens and password complexity"
- AI writes code: Probably inconsistent due to context limits
- You run: aud full
- TheAuditor reports: All the inconsistencies and security holes
- AI reads the report: Now sees the complete picture across all files
- AI fixes issues: With full visibility of what's broken
- Repeat until clean
Truth Courier Modules (Core)
These modules report verifiable facts without judgment:
# What Truth Couriers Report - Just Facts
{
"taint_analyzer": "Data from req.body flows to res.send at line 45",
"pattern_detector": "Line 45 matches pattern 'unsanitized-output'",
"impact_analyzer": "Changing handleRequest() affects 12 downstream functions",
"graph_analyzer": "Module A imports B, B imports C, C imports A"
}
Key Truth Couriers:
- Indexer: Maps all code symbols and their locations
- Taint Analyzer: Traces data flow through the application
- Impact Analyzer: Maps dependency chains and change blast radius
- Graph Analyzer: Detects cycles and architectural patterns
- Pattern Detector: Matches code against security patterns
Insights Modules (Optional Scoring)
These optional modules add technical scoring and classification:
# What Insights Add - Technical Classifications
{
"taint/insights": {
"vulnerability_type": "Cross-Site Scripting",
"severity": "HIGH"
},
"graph/insights": {
"health_score": 70,
"recommendation": "Reduce coupling"
}
}
Installation:
# Base installation (Truth Couriers only)
pip install -e .
# With ML insights (optional)
pip install -e ".[ml]"
# Development with all dependencies (not for general users)
# pip install -e ".[all]"
Correlation Rules: Detecting YOUR Patterns
Correlation rules detect when multiple facts indicate an inconsistency in YOUR codebase:
# Example: Detecting incomplete refactoring
- name: "PRODUCT_VARIANT_REFACTOR"
co_occurring_facts:
- tool: "grep"
pattern: "ProductVariant.*retail_price" # Backend changed
- tool: "grep"
pattern: "product\\.unit_price" # Frontend didn't
This isn't "understanding" that products have prices. It's detecting that you moved a field from one model to another and some code wasn't updated. Pure consistency checking.
The correlation engine loads rules from /correlations/rules/. We provide common patterns, but many are project-specific. You write rules that detect YOUR patterns, YOUR refactorings, YOUR inconsistencies.
Why This Works
What doesn't work:
- Making AI "understand" your business domain
- Adding semantic layers to guess what you mean
- Complex context management systems
What does work:
- Accept that AI will make inconsistent changes
- Detect those inconsistencies after the fact
- Give AI the full picture so it can fix them
TheAuditor doesn't try to prevent mistakes. It finds them so they can be fixed.
Practical Example
# You ask AI to implement authentication
Human: "Add JWT auth with CSRF protection"
# AI writes code (probably with issues due to context limits)
AI: *implements auth across 15 files*
# You audit it
$ aud full
# TheAuditor finds issues
- "JWT secret hardcoded at auth.js:47"
- "CSRF token generated but never validated"
- "Auth middleware missing on /api/admin/*"
# You can also check impact of changes
$ aud impact --file "auth.js" --line 47
# Shows: "Changing this affects 23 files, 47 functions"
# AI reads the audit and can now see ALL issues
AI: *reads .pf/readthis/*
AI: "I see 5 security issues across auth flow. Fixing..."
# AI fixes with complete visibility
AI: *fixes all issues because it can see the full picture*
Key Points
- No Business Logic Understanding: TheAuditor doesn't need to know what your app does
- Just Consistency Checking: It finds where your code doesn't match itself
- Facts, Not Opinions: Reports what IS, not what SHOULD BE
- Complete Dependency Tracing: Impact analyzer shows exactly what's affected by changes
- AI + Audit Loop: Write → Audit → Fix → Repeat until clean
This is why TheAuditor works where semantic understanding fails - it's not trying to read your mind, just verify your code's consistency.
Understanding the Output
Directory Structure
After running analyses, results are organized in .pf/:
.pf/
├── raw/ # Raw, unmodified tool outputs (Truth Couriers)
│ ├── linting.json # Raw linter results
│ ├── patterns.json # Pattern detection findings
│ ├── taint_analysis.json # Taint analysis results
│ ├── graph.json # Dependency graph data
│ └── graph_analysis.json # Graph analysis (cycles, hotspots)
│
├── insights/ # Optional interpretive analysis (Insights modules)
│ ├── ml_suggestions.json # ML predictions and patterns
│ ├── taint_insights.json # Vulnerability severity scoring
│ └── graph_insights.json # Health scores and recommendations
│
├── readthis/ # AI-consumable chunks
│ ├── manifest.md # Repository overview
│ ├── patterns_001.md # Chunked findings (65KB max)
│ ├── patterns_002.md
│ ├── taint_001.md # Chunked taint results
│ ├── tickets_001.md # Actionable issue tickets
│ └── summary.md # Executive summary
│
├── graphs/ # Graph visualizations
│ ├── import_graph.dot # Dependency graph DOT file
│ ├── import_graph_cycles.dot # Cycles-only view
│ └── import_graph.svg # SVG visualization (if generated)
│
├── pipeline.log # Complete execution log
├── error.log # Error details (if failures occur)
├── findings.json # Consolidated findings
├── risk_scores.json # Risk analysis results
└── report.md # Human-readable report
Key Output Files
.pf/raw/
Contains unmodified outputs from each tool. These files preserve the exact format and data from linters, scanners, and analyzers. Never modified after creation. This is the source of ground truth.
.pf/insights/
Contains optional interpretive analysis from Insights modules. These files add technical scoring and classification on top of raw data. Only created when insights commands are run.
.pf/graphs/
Contains graph visualizations in DOT and image formats. Generated by aud graph viz command with various view modes for focusing on specific concerns.
.pf/readthis/
Contains processed, chunked data optimized for AI consumption:
- Each file is under 65KB by default (configurable via THEAUDITOR_LIMITS_MAX_CHUNK_SIZE)
- Maximum 3 chunks per file by default (configurable via THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE)
- Structured with clear headers and sections
- Includes context, evidence, and suggested fixes
- Ready for direct consumption by Claude, GPT-4, etc.
.pf/pipeline.log
Complete execution log showing:
- Each phase's execution time
- Success/failure status
- Key statistics and findings
- Error messages if any
.pf/error.log
Created only when errors occur. Contains:
- Full stack traces
- Detailed error messages
- Phase-specific failure information
- Debugging information
Advanced Usage
Custom Pattern Rules
Create custom detection patterns in .pf/patterns/:
# .pf/patterns/custom_auth.yaml
name: weak_password_check
severity: high
category: security
pattern: 'password\s*==\s*["\']'
description: "Hardcoded password comparison"
test_template: |
def test_weak_password():
assert password != "admin"
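After saving a rule, re-run pattern detection so it is picked up (this sketch assumes custom patterns in .pf/patterns/ are loaded automatically, as described above):
# Apply the new custom rule to the current workset
aud detect-patterns --workset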
ML-Powered Suggestions
Train models on your codebase patterns:
# Initial training
aud learn
# Get improvement suggestions
aud suggest
# Provide feedback for continuous learning
aud learn-feedback --accept
Development-Specific Flags
Excluding TheAuditor's Own Files
When testing or developing within TheAuditor's repository (e.g., analyzing fakeproj/project_anarchy/), use the --exclude-self flag to prevent false positives from TheAuditor's own files:
# Exclude all TheAuditor files from analysis
aud index --exclude-self
aud full --exclude-self
This flag excludes:
- All TheAuditor source code directories (theauditor/, tests/, etc.)
- Root configuration files (pyproject.toml, package-template.json, Dockerfile)
- Documentation and build files
Use case: Testing vulnerable projects within TheAuditor's repository without framework detection picking up TheAuditor's own configuration files.
CI/CD Integration
GitHub Actions Example
name: Security Audit
on: [push, pull_request]
jobs:
audit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.12'
- name: Set up Node.js
uses: actions/setup-node@v2
with:
node-version: '18'
- name: Install TheAuditor
run: |
pip install -e ".[all]"
aud setup-claude --target .
- name: Run Audit
run: aud full
- name: Upload Results
if: always()
uses: actions/upload-artifact@v2
with:
name: audit-results
path: .pf/
Running TheAuditor on Its Own Codebase (Dogfooding)
When developing TheAuditor or testing it on itself, you need a special dual-environment setup:
Understanding the Dual-Environment Architecture
TheAuditor maintains strict separation between:
- Primary Environment (.venv/) - Where TheAuditor runs from
- Sandboxed Environment (.auditor_venv/.theauditor_tools/) - Tools TheAuditor uses for analysis
This ensures reproducibility and prevents TheAuditor from analyzing its own analysis tools.
Setup Procedure for Dogfooding
# 1. Clone and set up development environment
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
# 2. CRITICAL: Create the sandboxed analysis environment
aud setup-claude --target .
# 3. Verify setup
aud full --quick-test
# 4. Run full analysis on TheAuditor itself
aud full
Analyzing Test Projects Within TheAuditor
When analyzing test projects like fakeproj/ from within TheAuditor's repository:
cd fakeproj/project_anarchy
aud full --exclude-self # Excludes TheAuditor's own files
The --exclude-self flag prevents:
- Framework detection from identifying TheAuditor's pyproject.toml
- False positives from TheAuditor's configuration files
- Contamination from TheAuditor's source code
Refactoring Detection
TheAuditor includes sophisticated capabilities for detecting incomplete refactorings, data model changes, and cross-stack inconsistencies.
Understanding Refactoring Issues
Common refactoring problems TheAuditor detects:
- Data Model Evolution - Fields moved between models (e.g., product.price → variant.price)
- Foreign Key Changes - References updated in database but not in code
- API Contract Mismatches - Frontend expects old structure, backend provides new
- Cross-Stack Inconsistencies - TypeScript interfaces not matching backend models
- Incomplete Migrations - Some code still using old field/table names
How Refactoring Detection Works
TheAuditor uses multiple techniques:
Migration Analysis
Analyzes database migration files to understand schema changes:
// Migration detected: Field moved from products to product_variants
removeColumn('products', 'unit_price');
addColumn('product_variants', 'retail_price', DataTypes.DECIMAL);
Impact Analysis
Traces dependencies to find all affected code:
aud impact --file "models/Product.ts" --line 42
# Shows: 47 files need updating
Pattern Detection
Over 30 refactoring-specific patterns detect common issues:
- name: "PRODUCT_PRICE_FIELD_REMOVED"
description: "Code accessing price on Product after migration to ProductVariant"
Cross-Stack Tracing
Matches frontend API calls to backend endpoints to detect contract mismatches.
Using Refactoring Detection
Quick Detection
# Auto-detect from migrations
aud refactor --auto-detect
# Analyze specific change
aud refactor --file "models/Product.ts" --line 42
# Use with workset
aud refactor --workset
# Generate detailed report
aud refactor --auto-detect --output refactor_report.json
Best Practices for Refactoring
Before Refactoring:
- Run impact analysis: aud impact --file "model.ts" --line 42
- Create workset: aud workset --from-impact
- Baseline analysis: aud refactor --workset (the three steps are combined in the sketch below)
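Combined, that pre-refactoring baseline might look like the following (file and line number are placeholders for your own model change):
# Pre-refactoring baseline: measure blast radius, scope the workset, snapshot issues
aud impact --file "models/Product.ts" --line 42
aud workset --from-impact
aud refactor --workset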
During Refactoring:
- Run incremental checks: aud refactor --workset
- Validate cross-stack: aud impact --trace-to-backend
After Refactoring:
- Full validation: aud unified --mode refactor
- Generate report: aud report --format refactoring
Real-World Example
A product variant refactoring might be detected as:
PRODUCT_PRICE_FIELD_REMOVED
- Frontend: 23 files accessing product.unit_price
- Backend: Field moved to ProductVariant.retail_price
- Impact: POS system cannot display prices
ORDER_ITEMS_WRONG_REFERENCE
- Database: order_items.product_variant_id (new)
- Code: Still using order_items.product_id (old)
- Impact: Orders cannot be created
Custom Refactoring Rules
TheAuditor uses YAML-based correlation rules to detect refactoring issues. These rules are YOUR business logic - you define what patterns indicate problems in YOUR codebase.
How It Works
- Rules Location: /theauditor/correlations/rules/refactoring.yaml
- Rule Structure: Each rule defines co-occurring facts that must ALL match
- Detection: When all facts match, TheAuditor reports the issue
- No Code Changes: Just edit YAML to define new patterns
Creating Your Own Rules
Edit /theauditor/correlations/rules/refactoring.yaml or create new YAML files:
rules:
- name: "MY_FIELD_MIGRATION"
description: "Detect when price field moved but old code remains"
co_occurring_facts:
- tool: "grep"
pattern: "removeColumn.*price" # Migration removed field
- tool: "grep"
pattern: "product\\.price" # Code still uses old field
confidence: 0.92
- name: "API_VERSION_MISMATCH"
description: "Frontend calling v1 API but backend is v2"
co_occurring_facts:
- tool: "grep"
pattern: "/api/v1/" # Frontend uses v1
- tool: "grep"
pattern: "router.*'/v2/'" # Backend only has v2
confidence: 0.95
Available Tools for Facts
- grep: Pattern matching in files
- patterns: Matches from pattern detection
- taint_analyzer: Taint flow findings
- lint: Linter findings
Real Example from Production
- name: "PRODUCT_VARIANT_REFACTOR"
description: "Product fields moved to ProductVariant but frontend still uses old structure"
co_occurring_facts:
- tool: "grep"
pattern: "ProductVariant.*retail_price.*Sequelize" # Backend changed
- tool: "grep"
pattern: "product\\.unit_price|product\\.retail_price" # Frontend didn't
confidence: 0.92
This detects when you moved price fields from Product to ProductVariant model but frontend still expects the old structure.
Troubleshooting
Common Issues
"TypeScript compiler not available in TheAuditor sandbox"
Solution: Run aud setup-claude --target . to set up the sandbox.
"Coverage < 90% - run aud capsules first"
Solution: Generate code capsules for better analysis coverage:
aud index
aud workset --all
Linting produces no results
Solution: Ensure linters are installed:
# For Python
pip install -e ".[linters]"
# For JavaScript/TypeScript
aud setup-claude --target .
Pipeline fails at specific phase
Solution: Check .pf/error.log for details:
cat .pf/error.log
# Or check phase-specific error log
cat .pf/error_phase_08.log
Performance Optimization
For large repositories:
# Limit analysis scope
aud workset --paths "src/critical/**/*.py"
# Skip documentation phases
aud full --skip-docs
# Run specific phases only
aud index && aud lint && aud detect-patterns
# Adjust chunking for larger context windows
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000 # 100KB chunks
export THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE=5 # Allow up to 5 chunks
Runtime Configuration
TheAuditor supports environment variable overrides for runtime configuration:
# Chunking configuration
export THEAUDITOR_LIMITS_MAX_CHUNKS_PER_FILE=5 # Default: 3
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000 # Default: 65000 (bytes)
# File size limits
export THEAUDITOR_LIMITS_MAX_FILE_SIZE=5242880 # Default: 2097152 (2MB)
# Timeout configuration
export THEAUDITOR_TIMEOUTS_LINT_TIMEOUT=600 # Default: 300 (seconds)
export THEAUDITOR_TIMEOUTS_FCE_TIMEOUT=1200 # Default: 600 (seconds)
# Batch processing
export THEAUDITOR_LIMITS_DEFAULT_BATCH_SIZE=500 # Default: 200
Configuration can also be set via .pf/config.json for project-specific overrides.
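For repeatable local runs, one option is a small wrapper script that applies the overrides before invoking the pipeline (a sketch, not a built-in mechanism):
#!/bin/sh
# audit.sh - apply runtime overrides, then run the offline audit
export THEAUDITOR_LIMITS_MAX_CHUNK_SIZE=100000
export THEAUDITOR_TIMEOUTS_LINT_TIMEOUT=600
exec aud full --offline "$@"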
Best Practices
- Always run aud init first in a new project
- Set up the sandbox with aud setup-claude --target . (mandatory for all analysis, not just JavaScript/TypeScript)
- Use worksets for incremental analysis during development
- Run aud full before releases for comprehensive analysis
- Review .pf/readthis/ for AI-friendly issue summaries
- Check exit codes in CI/CD for automated pass/fail decisions
- Archive results with timestamps for audit trails (see the one-liner below)
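For the archival practice above, a one-liner is usually enough (the naming scheme is up to you):
# Archive the audit output with a timestamp for the audit trail
tar czf "audit-$(date +%Y%m%d-%H%M%S).tar.gz" .pf/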
Exit Codes for Automation
TheAuditor uses specific exit codes for CI/CD integration:
- 0 - Success, no critical/high issues
- 1 - High severity findings
- 2 - Critical severity findings
- 3 - Pipeline/task incomplete
Use these in scripts:
aud full
if [ $? -eq 2 ]; then
echo "Critical vulnerabilities found - blocking deployment"
exit 1
fi
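A fuller gate that distinguishes all four exit codes might look like this sketch:
# Gate a deployment on the audit result (codes as documented above)
aud full
case $? in
  0) echo "Audit clean" ;;
  1) echo "High severity findings - review before release" ;;
  2) echo "Critical vulnerabilities found - blocking deployment"; exit 1 ;;
  3) echo "Pipeline incomplete - check .pf/error.log"; exit 1 ;;
esac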
Getting Help
- Run aud --help for a command overview
- Run aud <command> --help for help on a specific command
- Check .pf/pipeline.log for execution details
- Review .pf/error.log for troubleshooting
- Refer to teamsop.md for the development workflow
Next Steps
- Initialize your first project with aud init
- Run aud full to see TheAuditor in action
- Explore the results in .pf/readthis/
- Integrate into your CI/CD pipeline
- Customize patterns for your specific needs
Remember: TheAuditor is designed to work offline, maintain data integrity, and produce AI-ready outputs. All analysis is deterministic and reproducible.