Contributing to TheAuditor
Thank you for your interest in contributing to TheAuditor! We're excited to have you join our mission to bring ground truth to AI-assisted development. This guide will help you get started with contributing to the project.
How to Get Involved
Reporting Bugs
Found a bug? Please help us fix it!
- Check existing GitHub Issues to see if it's already reported
- If not, create a new issue with:
  - Clear description of the bug
  - Steps to reproduce
  - Expected vs actual behavior
  - Your environment details (OS, Python version, Node.js version)
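If it helps, the environment details requested above can be gathered with a short script. This is only a convenience sketch, not part of TheAuditor: the function name `collect_environment` is hypothetical, and it assumes Node.js, if installed, is reachable as `node` on your PATH.

```python
import platform
import shutil
import subprocess
from typing import Dict


def collect_environment() -> Dict[str, str]:
    """Gather the OS, Python, and Node.js versions for a bug report."""
    node_version = "not found"
    node_path = shutil.which("node")  # None when node is not on PATH
    if node_path:
        try:
            result = subprocess.run(
                [node_path, "--version"],
                capture_output=True,
                text=True,
                timeout=10,
                check=False,
            )
            node_version = result.stdout.strip() or "unknown"
        except (OSError, subprocess.SubprocessError):
            node_version = "unknown"
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "node": node_version,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue alongside the reproduction steps.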
Suggesting Enhancements
Have an idea for improving TheAuditor?
- Review our ROADMAP.md to see if it aligns with our vision
- Check GitHub Issues for similar suggestions
- Create a new issue describing:
  - The problem you're trying to solve
  - Your proposed solution
  - Why this would benefit TheAuditor users
Setting Up Your Development Environment
Follow these steps to get TheAuditor running locally for development:
# Clone the repository
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor
# Create a Python virtual environment
python -m venv .venv
# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install TheAuditor in development mode
pip install -e .
# Optional: Install with ML capabilities
# pip install -e ".[ml]"
# For development with all optional dependencies:
# pip install -e ".[all]"
# MANDATORY: Set up the sandboxed environment
# This is required for TheAuditor to function at all
aud setup-claude --target .
The aud setup-claude --target . command creates an isolated environment at .auditor_venv/.theauditor_tools/ with all necessary JavaScript and TypeScript analysis tools. This ensures consistent, reproducible results across all development environments.
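A quick way to confirm the setup step ran is to check that the sandbox directory exists. The sketch below only tests for the directory named in the paragraph above; it does not inspect the installed tools, and the helper name `sandbox_exists` is hypothetical.

```python
from pathlib import Path

# Relative path taken from the paragraph above; its contents are not inspected here.
SANDBOX_DIR = Path(".auditor_venv") / ".theauditor_tools"


def sandbox_exists(project_root: Path) -> bool:
    """Return True if the sandbox directory is present under project_root."""
    return (project_root / SANDBOX_DIR).is_dir()


if __name__ == "__main__":
    root = Path.cwd()
    if sandbox_exists(root):
        print(f"Sandbox found at {root / SANDBOX_DIR}")
    else:
        print("Sandbox missing; run: aud setup-claude --target .")
```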
Making Changes & Submitting a Pull Request
Development Workflow
- Fork the repository on GitHub
- Create a feature branch from main:
  git checkout -b feature/your-feature-name
- Make your changes following our code standards (see below)
- Write/update tests if applicable
- Commit your changes with clear, descriptive messages:
  git commit -m "Add GraphQL schema analyzer for type validation"
- Push to your fork:
  git push origin feature/your-feature-name
- Create a Pull Request on GitHub with:
  - Clear description of changes
  - Link to any related issues
  - Test results or examples
Code Standards
We use ruff for both linting and formatting Python code. Before submitting any code, you MUST run:
# Fix any auto-fixable issues and check for remaining problems
ruff check . --fix
# Format all Python code
ruff format .
Your pull request will not be merged if it fails these checks.
Additional Quality Checks
For comprehensive code quality, you can also run:
# Type checking (optional but recommended)
mypy theauditor --strict
# Run tests
pytest tests/
# Full linting suite
make lint
Code Style Guidelines
- Follow PEP 8 for Python code
- Use descriptive variable and function names
- Add docstrings to all public functions and classes
- Keep functions focused and small (under 50 lines preferred)
- Write self-documenting code; minimize comments
- Never commit secrets, API keys, or credentials
Adding Support for New Languages
TheAuditor's modular architecture makes it straightforward to add support for new programming languages. This section provides comprehensive guidance for contributors looking to expand our language coverage.
Overview
Adding a new language to TheAuditor involves:
- Creating a parser for the language
- Adding framework detection patterns
- Creating security pattern rules
- Writing comprehensive tests
- Updating documentation
Prerequisites
Before starting, ensure you have:
- Deep knowledge of the target language and its ecosystem
- Understanding of common security vulnerabilities in that language
- Familiarity with AST (Abstract Syntax Tree) concepts
- Python development experience
Step-by-Step Guide
Step 1: Create the Language Extractor
Create a new extractor in theauditor/indexer/extractors/{language}.py that inherits from BaseExtractor:
from typing import Any, Dict, List, Optional

from . import BaseExtractor


class {Language}Extractor(BaseExtractor):
    def supported_extensions(self) -> List[str]:
        """Return list of file extensions this extractor supports."""
        return ['.ext', '.ext2']

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': self.extract_imports(content, file_info['ext']),
            'routes': self.extract_routes(content),
            'symbols': [],  # Add symbol extraction logic
            'assignments': [],  # For taint analysis
            'function_calls': [],  # For call graph
            'returns': []  # For data flow
        }
The extractor will be automatically registered through the BaseExtractor inheritance pattern.
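To make the template concrete, here is a minimal sketch of an extractor for a made-up language whose imports look like `use module.name;`. Everything specific here is an assumption for illustration: the `BaseExtractor` class is a local stand-in so the snippet runs on its own, `ToyLangExtractor` and the `.toy` extension are hypothetical, and the shape of each import record (`module`, `line`) is not taken from TheAuditor's schema.

```python
import re
from typing import Any, Dict, List, Optional


class BaseExtractor:
    """Local stand-in for theauditor.indexer.extractors.BaseExtractor (assumption)."""


class ToyLangExtractor(BaseExtractor):
    """Extractor for a hypothetical language with `use <module>;` imports."""

    # Match `use some.module;` at the start of a line, allowing indentation.
    IMPORT_RE = re.compile(r"^[ \t]*use\s+([\w.]+)\s*;", re.MULTILINE)

    def supported_extensions(self) -> List[str]:
        return [".toy"]

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        imports = [
            {
                "module": match.group(1),
                # 1-based line number: newlines before the match, plus one.
                "line": content.count("\n", 0, match.start()) + 1,
            }
            for match in self.IMPORT_RE.finditer(content)
        ]
        return {
            "imports": imports,
            "routes": [],
            "symbols": [],
            "assignments": [],
            "function_calls": [],
            "returns": [],
        }
```

In a real contribution you would import the project's `BaseExtractor` instead of the stand-in and fill in the remaining keys as your language support grows.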
Step 2: Create Configuration Parser (Optional)
If your language has configuration files that need parsing, create a parser in theauditor/parsers/{language}_parser.py:
from pathlib import Path
from typing import Any, Dict

class {Language}Parser:
    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Parse configuration file and extract security-relevant data."""
        parsed_data: Dict[str, Any] = {}  # Populate with structured data
        return parsed_data
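As an illustration of what such a parser might look like, the sketch below reads a hypothetical JSON manifest and returns a small dictionary. The class name `ToyLangParser`, the manifest keys (`dependencies`, `debug`), and the returned structure are all assumptions chosen for the example; the source does not specify what "structured data" should contain.

```python
import json
from pathlib import Path
from typing import Any, Dict


class ToyLangParser:
    """Parser for a hypothetical JSON manifest (illustrative only)."""

    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Return dependencies and a debug flag; report errors instead of raising."""
        try:
            raw = json.loads(file_path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers both invalid JSON and undecodable bytes.
            return {"file": str(file_path), "error": str(exc)}
        if not isinstance(raw, dict):
            return {"file": str(file_path), "error": "manifest is not a JSON object"}
        return {
            "file": str(file_path),
            "dependencies": raw.get("dependencies", {}),
            "debug_enabled": bool(raw.get("debug", False)),
        }
```

Returning an `error` entry rather than raising keeps one malformed file from aborting a whole run; whether TheAuditor expects that behavior is not stated in this guide, so treat it as a design suggestion.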
Step 3: Add Framework Detection
Add your language's frameworks to theauditor/framework_registry.py:
# Add to FRAMEWORK_REGISTRY dictionary
"{framework_name}": {
    "language": "{language}",
    "detection_sources": {
        # Package manifest files
        "package.{ext}": [
            ["dependencies"],
            ["devDependencies"],
        ],
        # Or for line-based search
        "requirements.txt": "line_search",
        # Or for content search
        "build.file": "content_search",
    },
    "package_pattern": "{framework_package_name}",
    "import_patterns": ["import {framework}", "from {framework}"],
    "file_markers": ["config.{ext}", "app.{ext}"],
}
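The key-path lists under `detection_sources` read as paths into a parsed manifest. The sketch below shows one plausible interpretation: walk each path and look for the package name at the end. This is an assumption about the semantics, not TheAuditor's actual detection code; the entry values (`express`, `package.json`) and the helper `find_in_manifest` are made up for the example.

```python
from typing import Any, Dict, List, Optional

# Illustrative entry in the shape shown above; the values are examples only.
REGISTRY_ENTRY: Dict[str, Any] = {
    "language": "javascript",
    "detection_sources": {
        "package.json": [["dependencies"], ["devDependencies"]],
    },
    "package_pattern": "express",
}


def find_in_manifest(manifest: Dict[str, Any], key_paths: List[List[str]],
                     package: str) -> Optional[str]:
    """Return the declared version if package appears under any key path."""
    for path in key_paths:
        node: Any = manifest
        for key in path:
            if not isinstance(node, dict):
                node = None
                break
            node = node.get(key)
        if isinstance(node, dict) and package in node:
            return str(node[package])
    return None


manifest = {
    "dependencies": {"left-pad": "1.0.0"},
    "devDependencies": {"express": "^4.18.0"},
}
paths = REGISTRY_ENTRY["detection_sources"]["package.json"]
version = find_in_manifest(manifest, paths, REGISTRY_ENTRY["package_pattern"])
print(version)
```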
Step 4: Create Language-Specific Patterns
Create security patterns for your language in theauditor/patterns/{language}.yml:
Example pattern structure:
- name: hardcoded-secret-{language}
  # In a single-quoted YAML string, a literal ' is written as ''
  pattern: '(api[_-]?key|secret|token|password)\s*=\s*["''][^"'']+["'']'
  severity: critical
  category: security
  languages: ["{language}"]
  description: "Hardcoded secret detected in {Language} code"
  cwe: CWE-798
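Before committing a pattern, it is worth checking the regex against lines that should and should not match. The sketch below applies the hardcoded-secret regex with Python's `re.search`, one line at a time and without flags; how TheAuditor's pattern engine actually applies patterns (flags, multiline handling) is not described here, so treat that as an assumption. The sample lines are invented.

```python
import re

# Same regex as the pattern above, written as a Python raw string.
SECRET_RE = re.compile(
    r"""(api[_-]?key|secret|token|password)\s*=\s*["'][^"']+["']"""
)

should_match = [
    'api_key = "sk-12345"',
    "password='hunter2'",
]
should_not_match = [
    'token = os.environ["TOKEN"]',  # value comes from the environment
    'password = ""',                # empty literal, nothing to leak
]

for line in should_match:
    assert SECRET_RE.search(line), f"expected a match: {line}"
for line in should_not_match:
    assert not SECRET_RE.search(line), f"unexpected match: {line}"
print("pattern behaves as expected on the sample lines")
```

Keeping a negative list like `should_not_match` next to the positive cases is a cheap guard against over-matching.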
Step 5: Create AST-Based Rules (Optional but Recommended)
For complex security patterns, create AST-based rules in theauditor/rules/{language}/:
"""Security rules for {Language} using AST analysis."""
from typing import Any, Dict, List


def find_{vulnerability}_issues(ast_tree: Any, file_path: str) -> List[Dict[str, Any]]:
    """Find {vulnerability} issues in {Language} code.

    Args:
        ast_tree: Parsed AST from {language}_parser
        file_path: Path to the source file

    Returns:
        List of findings with standard format
    """
    findings = []
    # Implement AST traversal and pattern detection.
    # walk_ast, is_vulnerable_pattern, and extract_snippet are placeholders
    # for helpers you write for your language's AST.
    for node in walk_ast(ast_tree):
        if is_vulnerable_pattern(node):
            findings.append({
                'pattern_name': '{VULNERABILITY}_ISSUE',
                'message': 'Detailed description of the issue',
                'file': file_path,
                'line': node.line,
                'column': node.column,
                'severity': 'high',
                'snippet': extract_snippet(node),
                'category': 'security',
                'match_type': 'ast'
            })
    return findings
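For a concrete instance of the template, the sketch below uses Python's standard `ast` module to flag direct calls to `eval()`. It is an illustration of the findings format shown above, not a rule shipped with TheAuditor; the rule name `EVAL_ISSUE` and the message text are made up, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast
from typing import Any, Dict, List


def find_eval_issues(ast_tree: ast.AST, file_path: str) -> List[Dict[str, Any]]:
    """Find direct eval() calls in a parsed Python module."""
    findings: List[Dict[str, Any]] = []
    for node in ast.walk(ast_tree):
        # Only bare `eval(...)` calls; `obj.eval(...)` is an ast.Attribute, not a Name.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            findings.append({
                "pattern_name": "EVAL_ISSUE",
                "message": "Direct call to eval()",
                "file": file_path,
                "line": node.lineno,
                "column": node.col_offset,
                "severity": "high",
                "snippet": ast.unparse(node),
                "category": "security",
                "match_type": "ast",
            })
    return findings


source = "x = 1\ny = eval(user_input)\nz = obj.eval(1)\n"
for finding in find_eval_issues(ast.parse(source), "example.py"):
    print(finding["file"], finding["line"], finding["snippet"])
```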
Extractor Interface Specification
All language extractors MUST inherit from BaseExtractor and implement:
from typing import Any, Dict, List, Optional

from theauditor.indexer.extractors import BaseExtractor


class LanguageExtractor(BaseExtractor):
    """Extractor for {Language} files."""

    def supported_extensions(self) -> List[str]:
        """Return list of supported file extensions."""
        return ['.ext']

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': [],
            'routes': [],
            'symbols': [],
            'assignments': [],
            'function_calls': [],
            'returns': []
        }
Testing Requirements
Required Test Coverage
- Extractor Tests (tests/test_{language}_extractor.py):
  - Test extracting from valid files
  - Test handling of syntax errors
  - Test symbol extraction
  - Test import extraction
  - Test file extension detection
- Pattern Tests (tests/patterns/test_{language}_patterns.py):
  - Test security pattern detection
  - Ensure patterns don't over-match (false positives)
- Integration Tests (tests/integration/test_{language}_integration.py):
  - Test language in complete analysis pipeline
Test Data
Create test fixtures in tests/fixtures/{language}/:
- valid_code.{ext}: Valid code samples
- vulnerable_code.{ext}: Code with known vulnerabilities
- edge_cases.{ext}: Edge cases and corner scenarios
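The extractor test requirements above might translate into pytest-style functions along these lines. This is a sketch under assumptions: `StubExtractor` is a throwaway stand-in so the tests run on their own, and the `use` import syntax is invented. In a real test file you would import your extractor and read inputs from the fixtures directory instead.

```python
from typing import Any, Dict, List, Optional

EXPECTED_KEYS = {"imports", "routes", "symbols",
                 "assignments", "function_calls", "returns"}


class StubExtractor:
    """Stand-in extractor for a made-up language with `use <module>;` imports."""

    def supported_extensions(self) -> List[str]:
        return [".toy"]

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        imports = []
        for line in content.splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[0] == "use":
                imports.append(parts[1].rstrip(";"))
        result: Dict[str, Any] = {key: [] for key in EXPECTED_KEYS}
        result["imports"] = imports
        return result


def test_file_extension_detection() -> None:
    assert ".toy" in StubExtractor().supported_extensions()


def test_import_extraction_from_valid_file() -> None:
    out = StubExtractor().extract({"ext": ".toy"}, "use net.http;\nlet x = 1;\n")
    assert out["imports"] == ["net.http"]


def test_syntax_errors_do_not_raise() -> None:
    # Malformed input should still yield the full result shape.
    out = StubExtractor().extract({"ext": ".toy"}, "use\n{{{ unterminated")
    assert set(out) == EXPECTED_KEYS
    assert out["imports"] == []
```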
Submission Checklist
Before submitting your PR, ensure:
- Extractor inherits from BaseExtractor and implements required methods
- Extractor placed in theauditor/indexer/extractors/{language}.py
- Framework detection added to framework_detector.py (if applicable)
- At least 10 security patterns created in patterns/{language}.yml
- AST-based rules for complex patterns (if applicable)
- All tests passing with >80% coverage
- Documentation updated (extractor docstrings, pattern descriptions)
- Example vulnerable code provided in test fixtures
- No external dependencies without approval
- Code follows project style (run ruff format)
Adding New Analyzers
The Three-Tier Detection Architecture
TheAuditor uses a hybrid approach to detection, prioritizing accuracy and context. When contributing a new rule, please adhere to the following "AST First, Regex as Fallback" philosophy:
- Tier 1: Multi-Language AST Rules (Preferred)
  For complex code patterns in source code (Python, JS/TS, etc.), extend or create a polymorphic AST-based rule in the /rules directory. These are the most powerful and accurate and should be the default choice for source code analysis.
- Tier 2: Language-Specific AST Rules
  If a multi-language backend is not feasible, a language-specific AST rule is the next best option. The corresponding regex pattern should then be scoped to exclude the language covered by the AST rule (see db_issues.yml for an example).
- Tier 3: Regex Patterns (YAML)
  Regex patterns in /patterns should be reserved for:
  - Simple patterns where an AST is overkill.
  - Configuration files where no AST parser exists (e.g., .yml, .conf).
  - Providing baseline coverage for languages not yet supported by an AST rule.
TheAuditor uses a modular architecture. To add new analysis capabilities, use one of the following rule types.
Database-Aware Rules
For rules that query across multiple files:
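# theauditor/rules/category/new_analyzer.py
import sqlite3
from typing import Any, Dict, List

def find_new_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    # Query the repo_index.db and append findings in standard format
    conn.close()
    return findings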
Example ORM analyzer:
# theauditor/rules/orm/sequelize_detector.py
import sqlite3
from typing import Any, Dict, List

def find_sequelize_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT file, line, query_type, includes FROM orm_queries"
    )
    # Analyze cursor.fetchall() for N+1 queries, death queries, etc.
    conn.close()
    return findings
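To show the full round trip, here is a runnable sketch of a database-aware rule against the `orm_queries` columns used above. The heuristic is purely illustrative and an assumption of this example: it flags `findAll` rows whose `includes` column is empty, on the theory that related rows fetched later in a loop can lead to N+1 queries. The function name, the stored `query_type` values, and the finding fields beyond those shown earlier in this guide are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List


def find_findall_without_includes(db_path: str) -> List[Dict[str, Any]]:
    """Flag findAll queries with no includes (illustrative heuristic only)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT file, line, query_type, includes FROM orm_queries "
            "WHERE query_type = ? AND (includes IS NULL OR includes = '')",
            ("findAll",),
        ).fetchall()
    finally:
        conn.close()  # Always release the database handle.
    return [
        {
            "pattern_name": "FINDALL_WITHOUT_INCLUDES",
            "message": "findAll without includes may lead to N+1 queries",
            "file": file,
            "line": line,
            "severity": "medium",
            "category": "performance",
            "match_type": "database",
        }
        for file, line, _query_type, _includes in rows
    ]
```

Filtering in SQL with a bound parameter keeps the Python side small and avoids building query strings by hand.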
AST-Based Rules
For semantic code analysis:
# theauditor/rules/framework/new_detector.py
from typing import Any, Dict, List

def find_framework_issues(tree: Any, file_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    # Traverse semantic AST and append findings in standard format
    return findings
Pattern-Based Rules
Add YAML patterns to theauditor/patterns/:
name: insecure_api_key
severity: critical
category: security
pattern: 'api[_-]?key\s*=\s*["''][^"'']+["'']'
description: "Hardcoded API key detected"
Testing
Write tests for any new functionality:
# Run all tests
pytest
# Run specific test file
pytest tests/test_your_feature.py
# Run with coverage
pytest --cov=theauditor
Documentation
- Update relevant documentation when making changes
- Add docstrings to new functions and classes
- Update README.md if adding new commands or features
- Consider updating howtouse.md for user-facing changes
Getting Help
- Check our TeamSOP for our development workflow
- Review CLAUDE.md for AI-assisted development guidelines
- Ask questions in GitHub Issues or Discussions
- Join our community chat (if available)
License
By contributing to TheAuditor, you agree that your contributions will be licensed under the same license as the project.
We're excited to see your contributions! Whether you're fixing bugs, adding features, or improving documentation, every contribution helps make TheAuditor better for everyone.