mirror of
https://github.com/aljazceru/Auditor.git
synced 2025-12-17 03:24:18 +01:00
Fix: Critical Windows ProcessPoolExecutor hang and documentation drift
Fixed critical Windows compatibility issues and updated outdated documentation.
CRITICAL WINDOWS HANG FIXES:
1. ProcessPoolExecutor → ThreadPoolExecutor
- Fixes the PowerShell/terminal hang where Ctrl+C wouldn't work
- Prevents the .pf directory lock that previously required a Task Manager kill
- Root cause: nested ProcessPool + ThreadPool on Windows creates a kernel deadlock (see the orchestration sketch below)
2. Ctrl+C Interruption Support
- Replaced subprocess.run with a Popen+poll pattern (industry standard; sketched below)
- Polls each subprocess every 100ms to check for interruption
- Added a global stop_event and signal handlers for graceful shutdown
- Root cause: subprocess.run blocks threads with no signal propagation
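In condensed form, the Popen+poll pattern works like this. This is a minimal sketch; `run_interruptible` is an illustrative stand-in for the real `run_subprocess_with_interrupt` shown in the pipelines.py diff below:

```python
import signal
import subprocess
import sys
import threading
import time

# Global stop event, set by the SIGINT handler when Ctrl+C arrives
stop_event = threading.Event()

def signal_handler(signum, frame):
    print("\n[INFO] Interrupt received, stopping...", file=sys.stderr)
    stop_event.set()

signal.signal(signal.SIGINT, signal_handler)

def run_interruptible(cmd, timeout=300):
    """Popen+poll: check completion, interrupt, and timeout every 100ms."""
    process = subprocess.Popen(cmd)
    start = time.time()
    while process.poll() is None:          # subprocess still running
        if stop_event.is_set():            # Ctrl+C landed on the main thread
            process.terminate()            # full version escalates to kill() after 5s
            raise KeyboardInterrupt("Pipeline interrupted by user")
        if time.time() - start > timeout:  # hard per-command time limit
            process.kill()
            raise subprocess.TimeoutExpired(cmd, timeout)
        time.sleep(0.1)                    # 100ms poll interval avoids busy-waiting
    return process.returncode
```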
DOCUMENTATION DRIFT FIX:
- Removed hardcoded "14 phases" references (the actual count is 19+ commands)
- Updated to "multiple analysis phases" throughout all docs
- Fixed CLI help text to be version-agnostic
- Added missing "Summary generation" step in HOWTOUSE.md
Changes:
- pipelines.py: ProcessPoolExecutor → ThreadPoolExecutor, added Popen+poll pattern
- Added signal handling and run_subprocess_with_interrupt() function
- commands/full.py: Updated docstring to remove specific phase count
- README.md: Changed "14 distinct phases" to "multiple analysis phases"
- HOWTOUSE.md: Updated phase references, added missing summary step
- CLAUDE.md & ARCHITECTURE.md: Removed hardcoded phase counts
Impact: Critical UX fixes - Windows compatibility restored, pipeline interruptible
Testing: Ctrl+C works, no PowerShell hangs, .pf directory deletable
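For context, a minimal sketch of the thread-based orchestration, assuming illustrative track names (the real submission logic lives in `run_full_pipeline`, shown in the pipelines.py diff below):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_track(name):
    # Each track just shells out to `aud` subcommands, so the workers are
    # I/O-bound: threads suffice, and no child Python interpreters are
    # spawned (the nested-pool setup that hung on Windows).
    return f"{name} done"

# Threads orchestrate; subprocesses do the actual work.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(run_track, t): t for t in ("A", "B", "C")}
    for future in as_completed(futures):
        print(futures[future], "->", future.result())
```

Because each track's work is delegated to `aud` subprocesses, threads give the same wall-clock parallelism as processes, without the Windows pool-nesting hazard.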
@@ -87,7 +87,7 @@ Key features:
 - **Parallel JavaScript processing** when semantic parser available
 
 ### Pipeline System (`theauditor/pipelines.py`)
-Orchestrates **14-phase** analysis pipeline in **parallel stages**:
+Orchestrates comprehensive analysis pipeline in **parallel stages**:
 
 **Stage 1 - Foundation (Sequential):**
 1. Repository indexing - Build manifest and symbol database
@@ -25,7 +25,7 @@ mypy theauditor --strict  # Type checking
 
 # Running TheAuditor
 aud init                  # Initialize project
-aud full                  # Complete analysis (14 phases)
+aud full                  # Complete analysis (multiple phases)
 aud full --offline        # Skip network operations (deps, docs)
 aud index --exclude-self  # When analyzing TheAuditor itself
 
@@ -150,7 +150,7 @@ The indexer has been refactored from a monolithic 2000+ line file into a modular
 The package uses a dynamic extractor registry for automatic language detection and processing.
 
 #### Pipeline System (`theauditor/pipelines.py`)
-- Orchestrates **14-phase** analysis pipeline in **parallel stages**:
+- Orchestrates comprehensive analysis pipeline in **parallel stages**:
 - **Stage 1**: Foundation (index with batched DB operations, framework detection)
 - **Stage 2**: 3 concurrent tracks (Network I/O, Code Analysis, Graph Build)
 - **Stage 3**: Final aggregation (graph analysis, taint, FCE, report)
@@ -324,7 +324,7 @@ if chunk_info.get('truncated', False):
 ## Critical Working Knowledge
 
 ### Pipeline Execution Order
-The `aud full` command runs 14 phases in 3 stages:
+The `aud full` command runs multiple analysis phases in 3 stages:
 1. **Sequential**: index → framework_detect
 2. **Parallel**: (deps, docs) || (workset, lint, patterns) || (graph_build)
 3. **Sequential**: graph_analyze → taint → fce → report
HOWTOUSE.md
@@ -120,7 +120,7 @@ Setting up JavaScript/TypeScript tools in sandboxed environment...
 On a medium 20k LOC node/react/vite stack, expect the analysis to take around 30 minutes.
 Progress bars for tracks B/C may display inconsistently on PowerShell.
 
-Run a comprehensive audit with all **14 analysis phases**:
+Run a comprehensive audit with multiple analysis phases organized in parallel stages:
 
 ```bash
 aud full
@@ -152,12 +152,13 @@ This executes in **parallel stages** for optimal performance:
 11. **Taint analysis** - Track data flow
 12. **Factual correlation engine** - Correlate findings across tools with 29 advanced rules
 13. **Report generation** - Produce final output
+14. **Summary generation** - Create executive summary
 
 **Output**: Complete results in **`.pf/readthis/`** directory
 
 ### Offline Mode
 
-When working on the same codebase repeatedly or when network access is limited, use offline mode to skip dependency checking and documentation phases:
+When working on the same codebase repeatedly or when network access is limited, use offline mode to skip network operations (dependency checking and documentation fetching):
 
 ```bash
 # Run full audit without network operations
@@ -1069,10 +1070,7 @@ For large repositories:
 # Limit analysis scope
 aud workset --paths "src/critical/**/*.py"
 
-# Skip documentation phases
-aud full --skip-docs
-
-# Run specific phases only
+# Run specific commands only
 aud index && aud lint && aud detect-patterns
 
 # Adjust chunking for larger context windows
@@ -149,15 +149,15 @@ This architectural flaw is amplified by two dangerous behaviours inherent to AI
 - **Security Theater**: AI assistants are optimized to "make it work," which often means introducing rampant security anti-patterns like hardcoded credentials, disabled authentication, and the pervasive use of `as any` in TypeScript. This creates a dangerous illusion of progress.
 - **Context Blindness**: With aggressive context compaction, an AI never sees the full picture. It works with fleeting snapshots of code, forcing it to make assumptions instead of decisions based on facts.
 
-## The 14-Phase Analysis Pipeline
+## The Comprehensive Analysis Pipeline
 
-TheAuditor runs a comprehensive audit through 14 distinct phases organized in 4 stages:
+TheAuditor runs a comprehensive audit through multiple analysis phases organized in parallel stages:
 
 **STAGE 1: Foundation (Sequential)**
 1. **Index Repository** - Build complete code inventory and SQLite database
 2. **Detect Frameworks** - Identify Django, Flask, React, Vue, etc.
 
-**STAGE 2: Parallel Analysis (3 concurrent tracks)**
+**STAGE 2: Concurrent Analysis (3 parallel tracks)**
 
 *Track A - Network Operations:*
 3. **Check Dependencies** - Analyze package versions and known vulnerabilities
@@ -175,9 +175,10 @@ TheAuditor runs a comprehensive audit through 14 distinct phases organized in 4
 11. **Visualize Graph** - Generate multiple graph views
 12. **Taint Analysis** - Track data flow from sources to sinks
 
-**STAGE 3: Aggregation (Sequential)**
+**STAGE 3: Final Aggregation (Sequential)**
 13. **Factual Correlation Engine** - Cross-reference findings across all tools
 14. **Generate Report** - Produce final AI-consumable chunks in `.pf/readthis/`
+15. **Summary Generation** - Create executive summary of findings
 
 ## Key Features
 
@@ -13,7 +13,7 @@ from theauditor.utils.exit_codes import ExitCodes
 @click.option("--exclude-self", is_flag=True, help="Exclude TheAuditor's own files (for self-testing)")
 @click.option("--offline", is_flag=True, help="Skip network operations (deps, docs)")
 def full(root, quiet, exclude_self, offline):
-    """Run complete audit pipeline in exact order specified in teamsop.md."""
+    """Run complete audit pipeline with multiple analysis phases organized in parallel stages."""
     from theauditor.pipelines import run_full_pipeline
 
     # Define log callback for console output
@@ -4,11 +4,13 @@ import json
 import os
 import platform
 import shutil
+import signal
 import subprocess
 import sys
 import tempfile
+import threading
 import time
-from concurrent.futures import ProcessPoolExecutor, as_completed, wait
+from concurrent.futures import ThreadPoolExecutor, as_completed, wait
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Callable, List, Tuple
@@ -23,6 +25,79 @@ except ImportError:
 # Windows compatibility
 IS_WINDOWS = platform.system() == "Windows"
 
+# Global stop event for interrupt handling
+stop_event = threading.Event()
+
+def signal_handler(signum, frame):
+    """Handle Ctrl+C by setting stop event."""
+    print("\n[INFO] Interrupt received, stopping pipeline gracefully...", file=sys.stderr)
+    stop_event.set()
+
+# Register signal handler
+signal.signal(signal.SIGINT, signal_handler)
+if not IS_WINDOWS:
+    signal.signal(signal.SIGTERM, signal_handler)
+
+def run_subprocess_with_interrupt(cmd, stdout_fp, stderr_fp, cwd, shell=False, timeout=300):
+    """
+    Run subprocess with interrupt checking every 100ms.
+
+    Args:
+        cmd: Command to execute
+        stdout_fp: File handle for stdout
+        stderr_fp: File handle for stderr
+        cwd: Working directory
+        shell: Whether to use shell execution
+        timeout: Maximum time to wait (seconds)
+
+    Returns:
+        subprocess.CompletedProcess-like object with returncode, stdout, stderr
+    """
+    process = subprocess.Popen(
+        cmd,
+        stdout=stdout_fp,
+        stderr=stderr_fp,
+        text=True,
+        cwd=cwd,
+        shell=shell
+    )
+
+    # Poll process every 100ms to check for completion or interruption
+    start_time = time.time()
+    while process.poll() is None:
+        if stop_event.is_set():
+            # User interrupted - terminate subprocess
+            process.terminate()
+            try:
+                process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                process.kill()
+                process.wait()
+            raise KeyboardInterrupt("Pipeline interrupted by user")
+
+        # Check timeout
+        if time.time() - start_time > timeout:
+            process.terminate()
+            try:
+                process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                process.kill()
+                process.wait()
+            raise subprocess.TimeoutExpired(cmd, timeout)
+
+        # Sleep briefly to avoid busy-waiting
+        time.sleep(0.1)
+
+    # Create result object similar to subprocess.run
+    class Result:
+        def __init__(self, returncode):
+            self.returncode = returncode
+            self.stdout = None
+            self.stderr = None
+
+    result = Result(process.returncode)
+    return result
+
+
 def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_name: str) -> dict:
     """
@@ -101,13 +176,13 @@ def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_na
         with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
              open(stderr_file, 'w+', encoding='utf-8') as err_fp:
 
-            result = subprocess.run(
+            result = run_subprocess_with_interrupt(
                 cmd,
-                stdout=out_fp,
-                stderr=err_fp,
-                text=True,
+                stdout_fp=out_fp,
+                stderr_fp=err_fp,
                 cwd=root,
-                shell=IS_WINDOWS  # Windows compatibility fix
+                shell=IS_WINDOWS,  # Windows compatibility fix
+                timeout=300  # 5 minutes per command in parallel tracks
             )
 
             # Read outputs
@@ -157,6 +232,12 @@ def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_na
                 chain_errors.append(f"Error in {description}: {stderr}")
                 break  # Stop chain on failure
 
+        except KeyboardInterrupt:
+            # User interrupted - clean up and exit
+            failed = True
+            write_status(f"INTERRUPTED: {description}", completed_count, len(commands))
+            chain_output.append(f"[INTERRUPTED] Pipeline stopped by user")
+            raise  # Re-raise to propagate up
         except Exception as e:
             failed = True
             write_status(f"ERROR: {description}", completed_count, len(commands))
@@ -475,13 +556,13 @@ def run_full_pipeline(
         with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
              open(stderr_file, 'w+', encoding='utf-8') as err_fp:
 
-            result = subprocess.run(
+            result = run_subprocess_with_interrupt(
                 cmd,
-                stdout=out_fp,
-                stderr=err_fp,
-                text=True,
+                stdout_fp=out_fp,
+                stderr_fp=err_fp,
                 cwd=root,
-                shell=IS_WINDOWS  # Windows compatibility fix
+                shell=IS_WINDOWS,  # Windows compatibility fix
+                timeout=300  # 5 minutes per command in parallel tracks
            )
 
             # Read outputs
@@ -590,9 +671,9 @@ def run_full_pipeline(
     log_output("  Track B: Code Analysis (workset, lint, patterns)")
     log_output("  Track C: Graph & Taint Analysis")
 
-    # Execute parallel tracks using ProcessPoolExecutor
+    # Execute parallel tracks using ThreadPoolExecutor (Windows-safe)
     parallel_results = []
-    with ProcessPoolExecutor(max_workers=3) as executor:
+    with ThreadPoolExecutor(max_workers=3) as executor:
         futures = []
 
         # Submit Track A if it has commands
@@ -673,6 +754,12 @@ def run_full_pipeline(
             else:
                 log_output(f"[FAILED] {result['name']} failed", is_error=True)
                 failed_phases += 1
+        except KeyboardInterrupt:
+            log_output(f"[INTERRUPTED] Pipeline stopped by user", is_error=True)
+            # Cancel remaining futures
+            for f in pending_futures:
+                f.cancel()
+            raise  # Re-raise to exit
         except Exception as e:
             log_output(f"[ERROR] Parallel track failed with exception: {e}", is_error=True)
             failed_phases += 1
@@ -715,13 +802,13 @@ def run_full_pipeline(
     with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
          open(stderr_file, 'w+', encoding='utf-8') as err_fp:
 
-        result = subprocess.run(
+        result = run_subprocess_with_interrupt(
            cmd,
-            stdout=out_fp,
-            stderr=err_fp,
-            text=True,
+            stdout_fp=out_fp,
+            stderr_fp=err_fp,
            cwd=root,
-            shell=IS_WINDOWS  # Windows compatibility fix
+            shell=IS_WINDOWS,  # Windows compatibility fix
+            timeout=600  # 10 minutes for final aggregation
        )
 
        # Read outputs