mirror of
https://github.com/aljazceru/Auditor.git
synced 2025-12-17 03:24:18 +01:00
Fix: Critical Windows ProcessPoolExecutor hang and documentation drift
Fixed critical Windows compatibility issues and updated outdated documentation.
CRITICAL WINDOWS HANG FIXES:
1. ProcessPoolExecutor → ThreadPoolExecutor
- Fixes the PowerShell/terminal hang where Ctrl+C had no effect
- Prevents the .pf directory lock that required a Task Manager kill
- Root cause: nesting ProcessPoolExecutor and ThreadPoolExecutor on Windows can deadlock
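Since both executors implement the same `concurrent.futures` interface, the swap is a one-line change at each call site. A minimal sketch of the pattern (track names are hypothetical, not TheAuditor's actual API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_track(name: str) -> str:
    # In the real pipeline each track shells out to subprocesses,
    # so threads suffice: the heavy work runs in child processes anyway.
    return f"{name}: ok"

# ThreadPoolExecutor is a drop-in replacement for ProcessPoolExecutor here,
# avoiding the Windows hang caused by nesting process pools under threads.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(run_track, t) for t in ("A", "B", "C")]
    results = sorted(f.result() for f in as_completed(futures))

print(results)  # ['A: ok', 'B: ok', 'C: ok']
```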
2. Ctrl+C Interruption Support
- Replaced subprocess.run with the Popen+poll pattern (an industry-standard approach)
- Poll the subprocess every 100ms to check for interruption
- Added a global stop_event and signal handlers for graceful shutdown
- Root cause: subprocess.run blocks the calling thread and does not propagate signals
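The replacement pattern can be sketched as follows; this is a simplified illustration of the technique, not the pipeline's exact `run_subprocess_with_interrupt` implementation:

```python
import subprocess
import sys
import threading
import time

stop_event = threading.Event()  # set by a SIGINT handler in the real pipeline

def run_with_interrupt(cmd, timeout=300):
    """Poll the child every 100ms so an interrupt can terminate it promptly."""
    process = subprocess.Popen(cmd)
    start = time.time()
    while process.poll() is None:  # None means the child is still running
        if stop_event.is_set():
            process.terminate()  # give the child a chance to exit cleanly
            process.wait()
            raise KeyboardInterrupt("interrupted")
        if time.time() - start > timeout:
            process.kill()
            process.wait()
            raise subprocess.TimeoutExpired(cmd, timeout)
        time.sleep(0.1)  # avoid busy-waiting
    return process.returncode

# Example: a short child process that exits normally
rc = run_with_interrupt([sys.executable, "-c", "print('done')"])
```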
DOCUMENTATION DRIFT FIX:
- Removed hardcoded "14 phases" references (the actual count is 19+ commands)
- Updated to "multiple analysis phases" throughout all docs
- Fixed CLI help text to be version-agnostic
- Added missing "Summary generation" step in HOWTOUSE.md
Changes:
- pipelines.py: ProcessPoolExecutor → ThreadPoolExecutor, added Popen+poll pattern
- Added signal handling and run_subprocess_with_interrupt() function
- commands/full.py: Updated docstring to remove specific phase count
- README.md: Changed "14 distinct phases" to "multiple analysis phases"
- HOWTOUSE.md: Updated phase references, added missing summary step
- CLAUDE.md & ARCHITECTURE.md: Removed hardcoded phase counts
Impact: Critical UX fixes - Windows compatibility restored, pipeline interruptible
Testing: Ctrl+C works, no PowerShell hangs, .pf directory deletable
@@ -87,7 +87,7 @@ Key features:
 - **Parallel JavaScript processing** when semantic parser available

 ### Pipeline System (`theauditor/pipelines.py`)

-Orchestrates **14-phase** analysis pipeline in **parallel stages**:
+Orchestrates comprehensive analysis pipeline in **parallel stages**:

 **Stage 1 - Foundation (Sequential):**
 1. Repository indexing - Build manifest and symbol database
@@ -25,7 +25,7 @@ mypy theauditor --strict # Type checking

 # Running TheAuditor
 aud init # Initialize project
-aud full # Complete analysis (14 phases)
+aud full # Complete analysis (multiple phases)
 aud full --offline # Skip network operations (deps, docs)
 aud index --exclude-self # When analyzing TheAuditor itself
@@ -150,7 +150,7 @@ The indexer has been refactored from a monolithic 2000+ line file into a modular
 The package uses a dynamic extractor registry for automatic language detection and processing.

 #### Pipeline System (`theauditor/pipelines.py`)
-- Orchestrates **14-phase** analysis pipeline in **parallel stages**:
+- Orchestrates comprehensive analysis pipeline in **parallel stages**:
 - **Stage 1**: Foundation (index with batched DB operations, framework detection)
 - **Stage 2**: 3 concurrent tracks (Network I/O, Code Analysis, Graph Build)
 - **Stage 3**: Final aggregation (graph analysis, taint, FCE, report)
||||
@@ -324,7 +324,7 @@ if chunk_info.get('truncated', False):
 ## Critical Working Knowledge

 ### Pipeline Execution Order
-The `aud full` command runs 14 phases in 3 stages:
+The `aud full` command runs multiple analysis phases in 3 stages:
 1. **Sequential**: index → framework_detect
 2. **Parallel**: (deps, docs) || (workset, lint, patterns) || (graph_build)
 3. **Sequential**: graph_analyze → taint → fce → report
HOWTOUSE.md
@@ -120,7 +120,7 @@ Setting up JavaScript/TypeScript tools in sandboxed environment...
 On a medium 20k LOC node/react/vite stack, expect the analysis to take around 30 minutes.
 Progress bars for tracks B/C may display inconsistently on PowerShell.

-Run a comprehensive audit with all **14 analysis phases**:
+Run a comprehensive audit with multiple analysis phases organized in parallel stages:

 ```bash
 aud full
@@ -152,12 +152,13 @@ This executes in **parallel stages** for optimal performance:
 11. **Taint analysis** - Track data flow
 12. **Factual correlation engine** - Correlate findings across tools with 29 advanced rules
 13. **Report generation** - Produce final output
+14. **Summary generation** - Create executive summary

 **Output**: Complete results in **`.pf/readthis/`** directory

 ### Offline Mode

-When working on the same codebase repeatedly or when network access is limited, use offline mode to skip dependency checking and documentation phases:
+When working on the same codebase repeatedly or when network access is limited, use offline mode to skip network operations (dependency checking and documentation fetching):

 ```bash
 # Run full audit without network operations
@@ -1069,10 +1070,7 @@ For large repositories:
 # Limit analysis scope
 aud workset --paths "src/critical/**/*.py"

-# Skip documentation phases
-aud full --skip-docs
-
-# Run specific phases only
+# Run specific commands only
 aud index && aud lint && aud detect-patterns

 # Adjust chunking for larger context windows
@@ -149,15 +149,15 @@ This architectural flaw is amplified by two dangerous behaviours inherent to AI
 - **Security Theater**: AI assistants are optimized to "make it work," which often means introducing rampant security anti-patterns like hardcoded credentials, disabled authentication, and the pervasive use of `as any` in TypeScript. This creates a dangerous illusion of progress.
 - **Context Blindness**: With aggressive context compaction, an AI never sees the full picture. It works with fleeting snapshots of code, forcing it to make assumptions instead of decisions based on facts.

-## The 14-Phase Analysis Pipeline
+## The Comprehensive Analysis Pipeline

-TheAuditor runs a comprehensive audit through 14 distinct phases organized in 4 stages:
+TheAuditor runs a comprehensive audit through multiple analysis phases organized in parallel stages:

 **STAGE 1: Foundation (Sequential)**
 1. **Index Repository** - Build complete code inventory and SQLite database
 2. **Detect Frameworks** - Identify Django, Flask, React, Vue, etc.

-**STAGE 2: Parallel Analysis (3 concurrent tracks)**
+**STAGE 2: Concurrent Analysis (3 parallel tracks)**

 *Track A - Network Operations:*
 3. **Check Dependencies** - Analyze package versions and known vulnerabilities
@@ -175,9 +175,10 @@ TheAuditor runs a comprehensive audit through 14 distinct phases organized in 4
 11. **Visualize Graph** - Generate multiple graph views
 12. **Taint Analysis** - Track data flow from sources to sinks

-**STAGE 3: Aggregation (Sequential)**
+**STAGE 3: Final Aggregation (Sequential)**
 13. **Factual Correlation Engine** - Cross-reference findings across all tools
 14. **Generate Report** - Produce final AI-consumable chunks in `.pf/readthis/`
+15. **Summary Generation** - Create executive summary of findings

 ## Key Features
@@ -13,7 +13,7 @@ from theauditor.utils.exit_codes import ExitCodes
 @click.option("--exclude-self", is_flag=True, help="Exclude TheAuditor's own files (for self-testing)")
 @click.option("--offline", is_flag=True, help="Skip network operations (deps, docs)")
 def full(root, quiet, exclude_self, offline):
-    """Run complete audit pipeline in exact order specified in teamsop.md."""
+    """Run complete audit pipeline with multiple analysis phases organized in parallel stages."""
     from theauditor.pipelines import run_full_pipeline

     # Define log callback for console output
@@ -4,11 +4,13 @@ import json
 import os
 import platform
 import shutil
+import signal
 import subprocess
 import sys
 import tempfile
+import threading
 import time
-from concurrent.futures import ProcessPoolExecutor, as_completed, wait
+from concurrent.futures import ThreadPoolExecutor, as_completed, wait
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Callable, List, Tuple
@@ -23,6 +25,79 @@ except ImportError:
 # Windows compatibility
 IS_WINDOWS = platform.system() == "Windows"

+# Global stop event for interrupt handling
+stop_event = threading.Event()
+
+
+def signal_handler(signum, frame):
+    """Handle Ctrl+C by setting stop event."""
+    print("\n[INFO] Interrupt received, stopping pipeline gracefully...", file=sys.stderr)
+    stop_event.set()
+
+
+# Register signal handler
+signal.signal(signal.SIGINT, signal_handler)
+if not IS_WINDOWS:
+    signal.signal(signal.SIGTERM, signal_handler)
+
+
+def run_subprocess_with_interrupt(cmd, stdout_fp, stderr_fp, cwd, shell=False, timeout=300):
+    """
+    Run subprocess with interrupt checking every 100ms.
+
+    Args:
+        cmd: Command to execute
+        stdout_fp: File handle for stdout
+        stderr_fp: File handle for stderr
+        cwd: Working directory
+        shell: Whether to use shell execution
+        timeout: Maximum time to wait (seconds)
+
+    Returns:
+        subprocess.CompletedProcess-like object with returncode, stdout, stderr
+    """
+    process = subprocess.Popen(
+        cmd,
+        stdout=stdout_fp,
+        stderr=stderr_fp,
+        text=True,
+        cwd=cwd,
+        shell=shell
+    )
+
+    # Poll process every 100ms to check for completion or interruption
+    start_time = time.time()
+    while process.poll() is None:
+        if stop_event.is_set():
+            # User interrupted - terminate subprocess
+            process.terminate()
+            try:
+                process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                process.kill()
+                process.wait()
+            raise KeyboardInterrupt("Pipeline interrupted by user")
+
+        # Check timeout
+        if time.time() - start_time > timeout:
+            process.terminate()
+            try:
+                process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                process.kill()
+                process.wait()
+            raise subprocess.TimeoutExpired(cmd, timeout)
+
+        # Sleep briefly to avoid busy-waiting
+        time.sleep(0.1)
+
+    # Create result object similar to subprocess.run
+    class Result:
+        def __init__(self, returncode):
+            self.returncode = returncode
+            self.stdout = None
+            self.stderr = None
+
+    result = Result(process.returncode)
+    return result
+
+
 def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_name: str) -> dict:
     """
@@ -101,13 +176,13 @@ def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_na
         with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
              open(stderr_file, 'w+', encoding='utf-8') as err_fp:

-            result = subprocess.run(
+            result = run_subprocess_with_interrupt(
                 cmd,
-                stdout=out_fp,
-                stderr=err_fp,
-                text=True,
+                stdout_fp=out_fp,
+                stderr_fp=err_fp,
                 cwd=root,
-                shell=IS_WINDOWS  # Windows compatibility fix
+                shell=IS_WINDOWS,  # Windows compatibility fix
+                timeout=300  # 5 minutes per command in parallel tracks
             )

             # Read outputs
@@ -157,6 +232,12 @@ def run_command_chain(commands: List[Tuple[str, List[str]]], root: str, chain_na
                 chain_errors.append(f"Error in {description}: {stderr}")
                 break  # Stop chain on failure

+        except KeyboardInterrupt:
+            # User interrupted - clean up and exit
+            failed = True
+            write_status(f"INTERRUPTED: {description}", completed_count, len(commands))
+            chain_output.append(f"[INTERRUPTED] Pipeline stopped by user")
+            raise  # Re-raise to propagate up
         except Exception as e:
             failed = True
             write_status(f"ERROR: {description}", completed_count, len(commands))
@@ -475,13 +556,13 @@ def run_full_pipeline(
         with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
              open(stderr_file, 'w+', encoding='utf-8') as err_fp:

-            result = subprocess.run(
+            result = run_subprocess_with_interrupt(
                 cmd,
-                stdout=out_fp,
-                stderr=err_fp,
-                text=True,
+                stdout_fp=out_fp,
+                stderr_fp=err_fp,
                 cwd=root,
-                shell=IS_WINDOWS  # Windows compatibility fix
+                shell=IS_WINDOWS,  # Windows compatibility fix
+                timeout=300  # 5 minutes per command in parallel tracks
             )

             # Read outputs
@@ -590,9 +671,9 @@ def run_full_pipeline(
     log_output("  Track B: Code Analysis (workset, lint, patterns)")
     log_output("  Track C: Graph & Taint Analysis")

-    # Execute parallel tracks using ProcessPoolExecutor
+    # Execute parallel tracks using ThreadPoolExecutor (Windows-safe)
     parallel_results = []
-    with ProcessPoolExecutor(max_workers=3) as executor:
+    with ThreadPoolExecutor(max_workers=3) as executor:
         futures = []

         # Submit Track A if it has commands
@@ -673,6 +754,12 @@ def run_full_pipeline(
             else:
                 log_output(f"[FAILED] {result['name']} failed", is_error=True)
                 failed_phases += 1
+        except KeyboardInterrupt:
+            log_output(f"[INTERRUPTED] Pipeline stopped by user", is_error=True)
+            # Cancel remaining futures
+            for f in pending_futures:
+                f.cancel()
+            raise  # Re-raise to exit
         except Exception as e:
             log_output(f"[ERROR] Parallel track failed with exception: {e}", is_error=True)
             failed_phases += 1
@@ -715,13 +802,13 @@ def run_full_pipeline(
         with open(stdout_file, 'w+', encoding='utf-8') as out_fp, \
              open(stderr_file, 'w+', encoding='utf-8') as err_fp:

-            result = subprocess.run(
+            result = run_subprocess_with_interrupt(
                 cmd,
-                stdout=out_fp,
-                stderr=err_fp,
-                text=True,
+                stdout_fp=out_fp,
+                stderr_fp=err_fp,
                 cwd=root,
-                shell=IS_WINDOWS  # Windows compatibility fix
+                shell=IS_WINDOWS,  # Windows compatibility fix
+                timeout=600  # 10 minutes for final aggregation
            )

             # Read outputs