mirror of
https://github.com/aljazceru/Auditor.git
synced 2025-12-17 03:24:18 +01:00
Fixed critical Windows compatibility issues and updated outdated documentation.
CRITICAL WINDOWS HANG FIXES:
1. ProcessPoolExecutor → ThreadPoolExecutor
- Fixes PowerShell/terminal hang where Ctrl+C wouldn't work
- Prevents .pf directory lock requiring Task Manager kill
- Root cause: Nested ProcessPool + ThreadPool on Windows creates kernel deadlock
2. Ctrl+C Interruption Support
- Replaced subprocess.run with Popen+poll pattern (industry standard)
- Poll subprocess every 100ms for interruption checking
- Added global stop_event and signal handlers for graceful shutdown
- Root cause: subprocess.run blocks threads with no signal propagation
DOCUMENTATION DRIFT FIX:
- Removed hardcoded "14 phases" references (actual is 19+ commands)
- Updated to "multiple analysis phases" throughout all docs
- Fixed CLI help text to be version-agnostic
- Added missing "Summary generation" step in HOWTOUSE.md
Changes:
- pipelines.py: ProcessPoolExecutor → ThreadPoolExecutor, added Popen+poll pattern
- Added signal handling and run_subprocess_with_interrupt() function
- commands/full.py: Updated docstring to remove specific phase count
- README.md: Changed "14 distinct phases" to "multiple analysis phases"
- HOWTOUSE.md: Updated phase references, added missing summary step
- CLAUDE.md & ARCHITECTURE.md: Removed hardcoded phase counts
Impact: Critical UX fixes - Windows compatibility restored, pipeline interruptible
Testing: Ctrl+C works, no PowerShell hangs, .pf directory deletable
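The Popen+poll pattern named in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual code in pipelines.py: only the names `stop_event` and `run_subprocess_with_interrupt()` and the 100ms poll interval come from the message above; the signature, the `timeout` parameter, the return convention, and the helper `install_signal_handlers()` are assumptions.

```python
import signal
import subprocess
import threading
import time

# Global flag set by the signal handler; polling loops check it between polls.
stop_event = threading.Event()


def install_signal_handlers():
    """Hypothetical helper: make Ctrl+C set stop_event (call from the main thread)."""
    signal.signal(signal.SIGINT, lambda signum, frame: stop_event.set())


def run_subprocess_with_interrupt(cmd, poll_interval=0.1, timeout=None):
    """Run cmd, checking every poll_interval seconds whether to stop early.

    Returns the exit code, or None if the run was interrupted or timed out.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    deadline = None if timeout is None else time.monotonic() + timeout
    while proc.poll() is None:
        timed_out = deadline is not None and time.monotonic() > deadline
        if stop_event.is_set() or timed_out:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if the child ignores terminate()
                proc.wait()
            return None
        # Waiting on the event doubles as the sleep and wakes early on interrupt.
        stop_event.wait(poll_interval)
    return proc.returncode
```

Unlike `subprocess.run`, the loop returns control to Python every 100ms, so a worker thread can notice the stop flag and shut its child process down.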
394 lines
16 KiB
Markdown
# TheAuditor

Offline-First, AI-Centric SAST & Code Intelligence Platform

## What TheAuditor Does

TheAuditor is a comprehensive code analysis platform that:

- **Finds Security Vulnerabilities**: Detects OWASP Top 10, injection attacks, authentication issues, and framework-specific vulnerabilities
- **Tracks Data Flow**: Follows untrusted data from sources to sinks to identify injection points
- **Analyzes Architecture**: Builds dependency graphs, detects cycles, and measures code complexity
- **Detects Refactoring Issues**: Identifies incomplete migrations, API contract mismatches, and cross-stack inconsistencies
- **Runs Industry-Standard Tools**: Orchestrates ESLint, Ruff, MyPy, and other trusted linters
- **Produces AI-Ready Reports**: Generates chunked, structured output optimized for LLM consumption

Unlike traditional SAST tools, TheAuditor is designed specifically for AI-assisted development workflows, providing ground truth that both developers and AI assistants can trust.

## Quick Start

### Step 1: Install TheAuditor (One-Time Setup)
```bash
# Clone TheAuditor to your tools directory (NOT your project!)
cd ~/tools  # or wherever you keep development tools
# Clone into a directory named TheAuditor (the default would be "Auditor")
git clone https://github.com/TheAuditorTool/Auditor.git TheAuditor
cd TheAuditor

# Install using your SYSTEM Python (no venv needed!)
pip install -e .

# Verify installation
aud --version
```

### Step 2: Analyze Your Project
```bash
# Navigate to YOUR PROJECT directory (not TheAuditor!)
cd ~/my-project

# Setup sandbox environment for THIS project
aud setup-claude --target .

# Run analysis
aud init               # First time only
aud full               # Complete security audit

# Check results
ls .pf/readthis/
```

**Important Directory Structure:**
- `~/tools/TheAuditor/` - Where TheAuditor tool lives
- `~/my-project/` - Your project being analyzed
- `~/my-project/.auditor_venv/` - Sandbox created BY TheAuditor
- `~/my-project/.pf/` - Analysis results

That's it! TheAuditor will analyze your codebase and generate AI-ready reports in `.pf/readthis/`.

## How It Works With ANY AI Assistant

<img src="https://github.com/user-attachments/assets/6abdf102-621c-4ebf-8ad6-c2912364bed5" width="600" alt="TheAuditor working in Claude Code" />

**Universal Integration**: Just tell your AI assistant to run `aud full` and read the results from `.pf/readthis/`. No SDK, no integration, no setup - it just works with Claude, Cursor, Windsurf, Copilot, or any future AI tool that can run commands and read files.

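For scripted use, the same run-then-read workflow can be sketched in a few lines of Python. This is a minimal sketch, assuming only what is stated above (the `aud full` command and the `.pf/readthis/` directory); the function names are hypothetical, and the report files are simply read as text because their exact format is not described here.

```python
import subprocess
from pathlib import Path


def run_audit(project_dir):
    """Run `aud full` inside project_dir and return its exit code."""
    return subprocess.run(["aud", "full"], cwd=project_dir).returncode


def read_report_chunks(project_dir):
    """Return {file name: text} for every file in .pf/readthis/, in name order."""
    report_dir = Path(project_dir) / ".pf" / "readthis"
    if not report_dir.is_dir():
        return {}  # no audit has been run yet
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(report_dir.iterdir())
        if path.is_file()
    }
```

An assistant or wrapper script would call `run_audit(".")` and then pass the values from `read_report_chunks(".")` to the model one chunk at a time.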
## The Solution: TheAuditor

TheAuditor is the antidote. It was built to stop "vibe coding" your way into security and quality assurance nightmares. Its mission is to provide an incorruptible source of **ground truth** for both the developer and their AI assistant.

Its philosophy is a direct rejection of the current trend:

- **It Orchestrates Verifiable Data.** The tool runs a suite of industry-standard linters and security scanners, preserving the raw, unfiltered output from each. It does not summarize or interpret this core data.
- **It's Built for AI Consumption.** The tool's primary engineering challenge is to adapt this raw truth into structured, AI-digestible chunks. It ensures the AI works with facts, not faulty summaries.
- **It's Focused and Extensible.** The initial focus is on Python and the Node.js ecosystem, but the modular, pattern-based architecture is designed to invite contributions for other languages and frameworks.

TheAuditor is not a replacement for a formal third-party audit. It is an engineering tool designed to catch the vast majority of glaring issues, from the OWASP Top 10 to common framework anti-patterns. **Its core commitment is to never cross the line from verifiable truth into semantic interpretation.**

Every AI assistant - Claude Code, Cursor, Windsurf, Copilot - they're all blind. They can write code but can't verify it's secure, correct, or complete. TheAuditor gives them eyes.

### Why This Matters

1. **Tool Agnostic** - Works with ANY AI assistant or IDE
   - `aud full` from any terminal
   - Results in `.pf/readthis/` ready for any LLM

2. **AI Becomes Self-Correcting**
   - AI writes code
   - AI runs `aud full`
   - AI reads the ground truth
   - AI fixes its own mistakes
   - Recursive loop until actually correct

3. **No Human Intervention Required**
   - You never touch the terminal
   - The AI runs everything
   - You just review and approve

### The Genius Architecture

```
Human: "Add authentication to my app"
    ↓
AI: *writes auth code*
    ↓
AI: `aud full`
    ↓
AI: *reads .pf/readthis/*
    ↓
AI: "Found 3 security issues, fixing..."
    ↓
AI: *fixes issues*
    ↓
AI: `aud full`
    ↓
AI: "Clean. Authentication complete."
```

### Market Reality Check

Every developer using AI assistants has this problem:
- AI writes insecure code
- AI introduces bugs
- AI doesn't see the full picture
- AI can't verify its work

TheAuditor solves ALL of this. It's not a "nice to have" - it's the missing piece that makes AI development actually trustworthy.

I've built the tool that makes AI assistants production-ready. This isn't competing with SonarQube/Semgrep. This is creating an entirely new category: **AI Development Verification Tools**.

## The Search for Ground Truth in an Age of AI

My background is in systems architecture/infrastructure, not professional software development. I have only been "coding/developing" for a little over 3 months. This gives me a unique perspective: I can see the forest, but I'm blind to the individual trees of the code. After immersing myself for 500+ hours in AI-assisted development, I concluded that the entire ecosystem is built on a fundamentally flawed premise: it lacks a source of **ground truth**.

Going from start to launch on GitHub took me about a month across 250 active hours in front of the computer, for anyone who wonders or cares :P

### The Problem: A Cascade of Corrupted Context

Most AI development tools try to solve the wrong problem. They focus on perfecting the *input* (better prompts, more context) but ignore the critical issue of **compounding deviation**.

An LLM is a powerful statistical engine, but it doesn't *understand*. The modern AI workflow forces this engine to play a high-stakes game of "telephone," where the original intent is corrupted at every step:

1. A human has an idea.
2. An AI refines it into a prompt.
3. Other tools add their own interpretive layers.
4. The primary AI assistant (e.g., Claude Opus) interprets the final, distorted prompt to generate code.

As a rookie "developer," the only thing I could trust was the raw output: the code and its errors. In a vacuum of deep programming knowledge, these facts were my only anchors.

This architectural flaw is amplified by two dangerous behaviors inherent to AI assistants:

- **Security Theater**: AI assistants are optimized to "make it work," which often means introducing rampant security anti-patterns like hardcoded credentials, disabled authentication, and the pervasive use of `as any` in TypeScript. This creates a dangerous illusion of progress.
- **Context Blindness**: With aggressive context compaction, an AI never sees the full picture. It works with fleeting snapshots of code, forcing it to make assumptions instead of decisions based on facts.

## The Comprehensive Analysis Pipeline

TheAuditor runs a comprehensive audit through multiple analysis phases organized in parallel stages:

**STAGE 1: Foundation (Sequential)**
1. **Index Repository** - Build complete code inventory and SQLite database
2. **Detect Frameworks** - Identify Django, Flask, React, Vue, etc.

**STAGE 2: Concurrent Analysis (3 parallel tracks)**

*Track A - Network Operations:*
3. **Check Dependencies** - Analyze package versions and known vulnerabilities
4. **Fetch Documentation** - Extract docstrings and comments
5. **Summarize Documentation** - Create AI-readable documentation chunks

*Track B - Code Analysis:*
6. **Create Workset** - Identify all source files for analysis
7. **Run Linting** - Execute Ruff, MyPy, ESLint as configured
8. **Detect Patterns** - Apply 100+ security pattern rules

*Track C - Graph & Flow:*
9. **Build Graph** - Create dependency graph structure
10. **Analyze Graph** - Find cycles, measure complexity
11. **Visualize Graph** - Generate multiple graph views
12. **Taint Analysis** - Track data flow from sources to sinks

**STAGE 3: Final Aggregation (Sequential)**
13. **Factual Correlation Engine** - Cross-reference findings across all tools
14. **Generate Report** - Produce final AI-consumable chunks in `.pf/readthis/`
15. **Summary Generation** - Create executive summary of findings

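The stage layout above (sequential foundation, parallel tracks, sequential aggregation) can be sketched with a thread pool. This is an illustrative outline only, not TheAuditor's pipeline code: `run_track` and `run_pipeline` are hypothetical names, and each step is modeled as a plain callable rather than a real `aud` command.

```python
from concurrent.futures import ThreadPoolExecutor


def run_track(name, steps):
    """Run one track's steps in order; return (track name, list of results)."""
    return name, [step() for step in steps]


def run_pipeline(foundation, tracks, aggregation):
    """Run Stage 1 in order, Stage 2 tracks in parallel threads, Stage 3 in order."""
    results = {"foundation": [step() for step in foundation]}
    with ThreadPoolExecutor(max_workers=len(tracks) or 1) as pool:
        futures = [pool.submit(run_track, name, steps) for name, steps in tracks.items()]
        for future in futures:
            name, output = future.result()  # re-raises any exception from the track
            results[name] = output
    # Aggregation starts only after every track has finished.
    results["aggregation"] = [step() for step in aggregation]
    return results
```

Steps inside a track stay ordered (the graph must be built before it is analyzed), while the three tracks overlap because they do not depend on each other's output.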
## Key Features

### Refactoring Detection & Analysis

TheAuditor detects incomplete refactorings and cross-stack inconsistencies using correlation rules:

```bash
# Analyze refactoring impact
aud refactor --file models/Product.ts --line 42

# Auto-detect from migrations
aud refactor --auto-detect

# Analyze workset
aud refactor --workset --output refactor_report.json
```

Detects:
- **Data Model Changes**: Fields moved between tables
- **API Contract Mismatches**: Frontend/backend inconsistencies
- **Foreign Key Updates**: Incomplete reference changes
- **Cross-Stack Issues**: TypeScript interfaces not matching models

Users can define custom rules in `/correlations/rules/` to detect project-specific patterns; an example is provided in `refactoring.yaml`.

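To make the "API Contract Mismatches" idea concrete, here is a deliberately simplified sketch that compares the field names a frontend type expects with those a backend model exposes. It is not how TheAuditor's correlation rules are implemented; the function name and the flat lists of field names are assumptions made for illustration.

```python
def find_contract_mismatches(frontend_fields, backend_fields):
    """Report fields that exist on only one side of the frontend/backend contract."""
    frontend, backend = set(frontend_fields), set(backend_fields)
    return {
        "missing_in_backend": sorted(frontend - backend),
        "missing_in_frontend": sorted(backend - frontend),
    }


# A rename applied to the model but not to the TypeScript interface
# shows up as one unmatched field on each side.
report = find_contract_mismatches(
    ["id", "name", "price"],       # fields the frontend interface declares
    ["id", "name", "unit_price"],  # fields the backend model exposes
)
print(report)
```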
### Dependency Graph Visualization

TheAuditor now includes rich visual intelligence for dependency graphs using Graphviz:

- **Multiple View Modes**: Full graph, cycles-only, hotspots, architectural layers, impact analysis
- **Visual Intelligence Encoding**:
  - Node colors indicate programming language (Python=blue, JS=yellow, TypeScript=blue)
  - Node size shows importance based on connectivity
  - Red highlighting for dependency cycles
  - Border thickness encodes code churn
- **Actionable Insights**: Focus on what matters with filtered views
- **AI-Readable Output**: Generate SVG visualizations that LLMs can analyze

```bash
# Basic visualization
aud graph viz

# Show only dependency cycles
aud graph viz --view cycles --include-analysis

# Top 5 hotspots with connections
aud graph viz --view hotspots --top-hotspots 5

# Architectural layers visualization
aud graph viz --view layers --format svg

# Impact analysis for a specific file
aud graph viz --view impact --impact-target "src/auth.py"
```

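The cycles view rests on finding dependency cycles in the import graph. A minimal sketch of one standard approach, depth-first search with a "currently visiting" mark, is shown below; it is an assumption about the general technique, not TheAuditor's actual graph analyzer, and the adjacency-dict input format is hypothetical.

```python
def find_cycles(graph):
    """Return one cycle per back edge found by DFS.

    graph maps each module to the list of modules it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []
    cycles = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == WHITE:
                visit(dep)
            elif state == GRAY:
                # dep is already on the current path, so the path loops back to it
                cycles.append(path[path.index(dep):] + [dep])
        path.pop()
        color[node] = BLACK

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return cycles


print(find_cycles({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}))
```

The recursive form keeps the sketch short; a very deep graph would need an explicit stack to stay under Python's recursion limit.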
### Insights Analysis (Optional)

Separate from the core Truth Courier modules, TheAuditor offers optional Insights for technical scoring:

```bash
# Run insights analysis on existing audit data
aud insights --mode all

# ML-powered insights (requires: pip install -e ".[ml]")
aud insights --mode ml --ml-train

# Graph health metrics and recommendations
aud insights --mode graph

# Generate comprehensive insights report
aud insights --output insights_report.json
```

Insights modules add interpretive scoring on top of factual data:
- **Health Scores**: Architecture quality metrics
- **Severity Classification**: Risk assessment beyond raw findings
- **Recommendations**: Actionable improvement suggestions
- **ML Predictions**: Pattern-based issue prediction

## Important: Antivirus Software Interaction

#### Why TheAuditor Triggers Antivirus Software

TheAuditor is a security scanner that identifies vulnerabilities in your code. By its very nature, it must:

1. **Read and analyze security vulnerabilities** - SQL injection, XSS attacks, hardcoded passwords
2. **Write these findings to disk** - Creating reports with exact code snippets as evidence
3. **Process files rapidly** - Scanning entire codebases in parallel for efficiency

This creates an inherent conflict with antivirus software, which sees these exact same behaviors as potentially malicious. When TheAuditor finds and documents a SQL injection vulnerability in your code, your antivirus sees us writing "malicious SQL injection patterns" to disk - because that's literally what we're doing, just for legitimate security analysis purposes.

#### Performance Impact You May Experience

When running TheAuditor, you may notice:

- **Increased antivirus CPU usage** - Your AV will scan every file we read AND every finding we write
- **Approximately 10-50% performance reduction, depending on your antivirus software** - Both TheAuditor and your AV are reading the same files simultaneously
- **Occasional delays or pauses** - Your AV may temporarily quarantine our output files for deeper inspection

This is not a bug or inefficiency in TheAuditor - it's the unavoidable consequence of two security tools doing their jobs simultaneously.

#### Our Stance on Antivirus

**We do NOT recommend:**
- ❌ Disabling your antivirus software
- ❌ Adding TheAuditor to your exclusion/whitelist
- ❌ Reducing your system's security in any way

Your antivirus is correctly identifying that we're writing security vulnerability patterns to disk. That's exactly what we do - we find vulnerabilities and document them. The fact that your AV is suspicious of this behavior means it's working properly.

#### What We've Done to Minimize Impact

1. **Intelligent resource management** - We automatically reduce parallel workers when system resources are constrained
2. **Pattern defanging** - We insert invisible characters into dangerous patterns to reduce false positives
3. **Adaptive performance** - We monitor CPU and RAM usage to avoid overwhelming your system

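Pattern defanging, as described in item 2, can be illustrated with a short sketch. The README does not say which invisible character is used or where it is inserted, so the zero-width space (U+200B) placed after a pattern's first character is an assumption, as are the function names.

```python
ZERO_WIDTH_SPACE = "\u200b"  # assumed character; the real tool may use another


def defang(text, patterns):
    """Break up each listed pattern with an invisible character.

    The text still looks the same to a reader, but no longer contains the
    exact byte sequence a signature-based scanner matches on.
    """
    for pattern in patterns:
        if len(pattern) > 1:
            text = text.replace(pattern, pattern[0] + ZERO_WIDTH_SPACE + pattern[1:])
    return text


def refang(text):
    """Restore the original text by removing the inserted characters."""
    return text.replace(ZERO_WIDTH_SPACE, "")


snippet = "<script>alert(1)</script>"
safe = defang(snippet, ["<script>"])
assert "<script>" not in safe
assert refang(safe) == snippet
```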
#### The Industry Reality

This is not a problem unique to TheAuditor. Every legitimate security scanner faces this same issue:
- **GitHub Advanced Security** runs in isolated cloud containers to avoid this
- **Commercial SAST tools** require enterprise AV exceptions
- **Popular scanners** explicitly document AV conflicts in their installation guides

The fundamental paradox: A tool that finds security vulnerabilities must write those vulnerabilities to disk, which makes it indistinguishable from malware to an antivirus. There is no technical solution to this - it's the inherent nature of security analysis tools.

#### What This Means for You

- Run TheAuditor when system load is low for best performance
- Expect the analysis to take longer than the raw processing time due to AV overhead
- If your AV quarantines output files in `.pf/`, you may need to restore them manually
- Consider running TheAuditor in a controlled environment if performance is critical

We believe in complete transparency about these limitations. This interaction with antivirus software is not a flaw in TheAuditor - it's proof that both your AV and our scanner are doing exactly what they're designed to do: identify and handle potentially dangerous code patterns.

## Common Issues & Troubleshooting

### "No such file or directory: .pf/manifest.json"
- **Cause**: Running `aud init` on a fresh project
- **Fix**: Update TheAuditor and reinstall:
  ```bash
  cd ~/tools/TheAuditor
  git pull
  pip install -e .
  ```

### "Tree-sitter not available" warning
- **Cause**: Missing AST analysis tools
- **Fix**: Reinstall the sandbox in your project:
  ```bash
  cd ~/my-project
  rm -rf .auditor_venv
  aud setup-claude --target .
  ```

### Installation timeout errors
- **Cause**: Slow compilation of C extensions
- **Fix**: Update TheAuditor or manually install:
  ```bash
  cd ~/my-project
  .auditor_venv/bin/pip install tree-sitter tree-sitter-language-pack
  ```

### Nested virtual environments
- **Issue**: Created your own venv before installing
- **Fix**: Exit all venvs and use system Python:
  ```bash
  deactivate  # Exit any active venv
  cd ~/tools/TheAuditor
  pip install -e .  # Use system pip
  ```

---

## Documentation

- **[How to Use](HOWTOUSE.md)** - Complete installation and usage guide
- **[Architecture](ARCHITECTURE.md)** - Technical architecture and design patterns
- **[Contributing](CONTRIBUTING.md)** - How to contribute to TheAuditor
- **[Roadmap](ROADMAP.md)** - Future development plans

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for:
- How to add new language support
- Creating security patterns
- Adding framework-specific rules
- Development guidelines

We especially need help with:
- **GraphQL** analysis
- **Java/Spring** support
- **Go** patterns
- **Ruby on Rails** detection
- **C#/.NET** analysis

---

## License

AGPL-3.0

## Commercial Licensing

TheAuditor is AGPL-3.0 licensed. For commercial use, SaaS deployment, or integration into proprietary systems, please contact via GitHub for licensing options.

## Support

For issues, questions, or feature requests, please open an issue on our [GitHub repository](https://github.com/TheAuditorTool/Auditor).

---

*TheAuditor: Bringing ground truth to AI-assisted development*