# Contributing to TheAuditor

Thank you for your interest in contributing to TheAuditor! We're excited to have you join our mission to bring ground truth to AI-assisted development. This guide will help you get started with contributing to the project.

## How to Get Involved

### Reporting Bugs

Found a bug? Please help us fix it!

1. Check existing [GitHub Issues](https://github.com/TheAuditorTool/Auditor/issues) to see if it's already reported
2. If not, create a new issue with:
   - Clear description of the bug
   - Steps to reproduce
   - Expected vs actual behavior
   - Your environment details (OS, Python version, Node.js version)

### Suggesting Enhancements

Have an idea for improving TheAuditor?

1. Review our [ROADMAP.md](ROADMAP.md) to see if it aligns with our vision
2. Check [GitHub Issues](https://github.com/TheAuditorTool/Auditor/issues) for similar suggestions
3. Create a new issue describing:
   - The problem you're trying to solve
   - Your proposed solution
   - Why this would benefit TheAuditor users

## Setting Up Your Development Environment

Follow these steps to get TheAuditor running locally for development:

```bash
# Clone the repository (git creates a directory named after the repo)
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor

# Create a Python virtual environment
python -m venv .venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows, run this instead:
# .venv\Scripts\activate

# Install TheAuditor in development mode
pip install -e .

# Optional: Install with ML capabilities
# pip install -e ".[ml]"

# For development with all optional dependencies:
# pip install -e ".[all]"

# MANDATORY: Set up the sandboxed environment
# This is required for TheAuditor to function at all
aud setup-claude --target .
```

The `aud setup-claude --target .` command creates an isolated environment at `.auditor_venv/.theauditor_tools/` with all necessary JavaScript and TypeScript analysis tools. This ensures consistent, reproducible results across all development environments.
## Making Changes & Submitting a Pull Request

### Development Workflow

1. **Fork the repository** on GitHub
2. **Create a feature branch** from `main`:
   ```bash
   git checkout -b feature/your-feature-name
   ```
3. **Make your changes** following our code standards (see below)
4. **Write/update tests** if applicable
5. **Commit your changes** with clear, descriptive messages:
   ```bash
   git commit -m "Add GraphQL schema analyzer for type validation"
   ```
6. **Push to your fork**:
   ```bash
   git push origin feature/your-feature-name
   ```
7. **Create a Pull Request** on GitHub with:
   - Clear description of changes
   - Link to any related issues
   - Test results or examples

## Code Standards

We use **ruff** for both linting and formatting Python code. Before submitting any code, you MUST run:

```bash
# Fix any auto-fixable issues and check for remaining problems
ruff check . --fix

# Format all Python code
ruff format .
```

Your pull request will not be merged if it fails these checks.

### Additional Quality Checks

For comprehensive code quality, you can also run:

```bash
# Type checking (optional but recommended)
mypy theauditor --strict

# Run tests
pytest tests/

# Full linting suite
make lint
```

### Code Style Guidelines

- Follow PEP 8 for Python code
- Use descriptive variable and function names
- Add docstrings to all public functions and classes
- Keep functions focused and small (under 50 lines preferred)
- Write self-documenting code; minimize comments
- Never commit secrets, API keys, or credentials

## Adding Support for New Languages

TheAuditor's modular architecture makes it straightforward to add support for new programming languages. This section provides comprehensive guidance for contributors looking to expand our language coverage.
### Overview

Adding a new language to TheAuditor involves:

- Creating a parser for the language
- Adding framework detection patterns
- Creating security pattern rules
- Writing comprehensive tests
- Updating documentation

### Prerequisites

Before starting, ensure you have:

- Deep knowledge of the target language and its ecosystem
- Understanding of common security vulnerabilities in that language
- Familiarity with AST (Abstract Syntax Tree) concepts
- Python development experience

### Step-by-Step Guide

#### Step 1: Create the Language Extractor

Create a new extractor in `theauditor/indexer/extractors/{language}.py` that inherits from `BaseExtractor`:

```python
# Template: replace {Language} and the extensions with your language's values.
from typing import Any, Dict, List, Optional

from . import BaseExtractor


class {Language}Extractor(BaseExtractor):
    def supported_extensions(self) -> List[str]:
        """Return list of file extensions this extractor supports."""
        return ['.ext', '.ext2']

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': self.extract_imports(content, file_info['ext']),
            'routes': self.extract_routes(content),
            'symbols': [],         # Add symbol extraction logic
            'assignments': [],     # For taint analysis
            'function_calls': [],  # For call graph
            'returns': []          # For data flow
        }
```

The extractor will be automatically registered through the `BaseExtractor` inheritance pattern.
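To make the extractor template concrete, here is a minimal sketch of what a filled-in extractor might look like for Ruby. Several things here are assumptions made only for illustration: the local `BaseExtractor` stand-in (the real class lives in `theauditor.indexer.extractors` and may provide more helpers), the `RubyExtractor` name, the regular expression, and the `(kind, module)` tuple shape used for imports. Check the existing extractors for the shapes TheAuditor actually expects.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class BaseExtractor(ABC):
    """Local stand-in for TheAuditor's BaseExtractor (assumption: the real
    class may have a different constructor and extra helper methods)."""

    @abstractmethod
    def supported_extensions(self) -> List[str]: ...

    @abstractmethod
    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]: ...


# Hypothetical pattern: matches `require 'x'` and `require_relative "x"`
# at the start of a line (leading whitespace allowed).
_REQUIRE_RE = re.compile(
    r"""^\s*(require|require_relative)\s+['"]([^'"]+)['"]""",
    re.MULTILINE,
)


class RubyExtractor(BaseExtractor):
    """Illustrative regex-only extractor; it does not use the AST."""

    def supported_extensions(self) -> List[str]:
        return ['.rb']

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        # findall returns one (kind, module) tuple per matched statement.
        imports: List[Tuple[str, str]] = _REQUIRE_RE.findall(content)
        return {
            'imports': imports,
            'routes': [],
            'symbols': [],
            'assignments': [],
            'function_calls': [],
            'returns': [],
        }


sample = "require 'json'\nrequire_relative \"lib/helper\"\nputs 'hi'\n"
result = RubyExtractor().extract({'path': 'app.rb', 'ext': '.rb'}, sample)
print(result['imports'])
# → [('require', 'json'), ('require_relative', 'lib/helper')]
```

A regex pass like this is only a starting point; once a parser for the language is available, the `tree` argument is the better source for symbols, assignments, and calls.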
#### Step 2: Create Configuration Parser (Optional)

If your language has configuration files that need parsing, create a parser in `theauditor/parsers/{language}_parser.py`:

```python
from pathlib import Path
from typing import Any, Dict


class {Language}Parser:
    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Parse configuration file and extract security-relevant data."""
        parsed_data: Dict[str, Any] = {}
        # Parse the file and fill parsed_data with structured data
        return parsed_data
```

#### Step 3: Add Framework Detection

Add your language's frameworks to `theauditor/framework_registry.py`:

```python
# Add this entry inside the FRAMEWORK_REGISTRY dictionary
"{framework_name}": {
    "language": "{language}",
    "detection_sources": {
        # Package manifest files
        "package.{ext}": [
            ["dependencies"],
            ["devDependencies"],
        ],
        # Or for line-based search
        "requirements.txt": "line_search",
        # Or for content search
        "build.file": "content_search",
    },
    "package_pattern": "{framework_package_name}",
    "import_patterns": ["import {framework}", "from {framework}"],
    "file_markers": ["config.{ext}", "app.{ext}"],
}
```

#### Step 4: Create Language-Specific Patterns

Create security patterns for your language in `theauditor/patterns/{language}.yml`.

Example pattern structure:

```yaml
# In a single-quoted YAML scalar, a literal ' is written as '' (backslash is not an escape).
- name: hardcoded-secret-{language}
  pattern: '(api[_-]?key|secret|token|password)\s*=\s*["''][^"'']+["'']'
  severity: critical
  category: security
  languages: ["{language}"]
  description: "Hardcoded secret detected in {Language} code"
  cwe: CWE-798
```

#### Step 5: Create AST-Based Rules (Optional but Recommended)

For complex security patterns, create AST-based rules in `theauditor/rules/{language}/`:

```python
"""Security rules for {Language} using AST analysis."""

from typing import Any, Dict, List


def find_{vulnerability}_issues(ast_tree: Any, file_path: str) -> List[Dict[str, Any]]:
    """Find {vulnerability} issues in {Language} code.

    Args:
        ast_tree: Parsed AST from {language}_parser
        file_path: Path to the source file

    Returns:
        List of findings with standard format
    """
    findings = []

    # Implement AST traversal and pattern detection.
    # walk_ast, is_vulnerable_pattern, and extract_snippet are placeholders
    # for helpers you provide for your language.
    for node in walk_ast(ast_tree):
        if is_vulnerable_pattern(node):
            findings.append({
                'pattern_name': '{VULNERABILITY}_ISSUE',
                'message': 'Detailed description of the issue',
                'file': file_path,
                'line': node.line,
                'column': node.column,
                'severity': 'high',
                'snippet': extract_snippet(node),
                'category': 'security',
                'match_type': 'ast'
            })

    return findings
```

### Extractor Interface Specification

All language extractors MUST inherit from `BaseExtractor` and implement:

```python
from typing import Any, Dict, List, Optional

from theauditor.indexer.extractors import BaseExtractor


class LanguageExtractor(BaseExtractor):
    """Extractor for {Language} files."""

    def supported_extensions(self) -> List[str]:
        """Return list of supported file extensions."""
        return ['.ext']

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': [],
            'routes': [],
            'symbols': [],
            'assignments': [],
            'function_calls': [],
            'returns': []
        }
```

### Testing Requirements

#### Required Test Coverage

1. **Extractor Tests** (`tests/test_{language}_extractor.py`):
   - Test extracting from valid files
   - Test handling of syntax errors
   - Test symbol extraction
   - Test import extraction
   - Test file extension detection
2. **Pattern Tests** (`tests/patterns/test_{language}_patterns.py`):
   - Test security pattern detection
   - Ensure patterns don't over-match (false positives)
3. **Integration Tests** (`tests/integration/test_{language}_integration.py`):
   - Test language in complete analysis pipeline

#### Test Data

Create test fixtures in `tests/fixtures/{language}/`:

- `valid_code.{ext}` - Valid code samples
- `vulnerable_code.{ext}` - Code with known vulnerabilities
- `edge_cases.{ext}` - Edge cases and corner scenarios

### Submission Checklist

Before submitting your PR, ensure:

- [ ] Extractor inherits from `BaseExtractor` and implements required methods
- [ ] Extractor placed in `theauditor/indexer/extractors/{language}.py`
- [ ] Framework detection added to `theauditor/framework_registry.py` (if applicable)
- [ ] At least 10 security patterns created in `patterns/{language}.yml`
- [ ] AST-based rules for complex patterns (if applicable)
- [ ] All tests passing with >80% coverage
- [ ] Documentation updated (extractor docstrings, pattern descriptions)
- [ ] Example vulnerable code provided in test fixtures
- [ ] No external dependencies without approval
- [ ] Code follows project style (run `ruff format`)

## Adding New Analyzers

### The Three-Tier Detection Architecture

TheAuditor uses a hybrid approach to detection, prioritizing accuracy and context. When contributing a new rule, please adhere to the following "AST First, Regex as Fallback" philosophy:

- **Tier 1: Multi-Language AST Rules (Preferred)**
  For complex code patterns in source code (Python, JS/TS, etc.), extend or create a polymorphic AST-based rule in the `/rules` directory. These are the most powerful and accurate and should be the default choice for source code analysis.
- **Tier 2: Language-Specific AST Rules**
  If a multi-language backend is not feasible, a language-specific AST rule is the next best option. The corresponding regex pattern should then be scoped to exclude the language covered by the AST rule (see `db_issues.yml` for an example).
- **Tier 3: Regex Patterns (YAML)**
  Regex patterns in `/patterns` should be reserved for:
  1. Simple patterns where an AST is overkill.
  2. Configuration files where no AST parser exists (e.g., `.yml`, `.conf`).
  3. Providing baseline coverage for languages not yet supported by an AST rule.

TheAuditor uses a modular architecture. To add new analysis capabilities:

### Database-Aware Rules

For rules that query across multiple files:

```python
# theauditor/rules/category/new_analyzer.py
import sqlite3
from typing import Any, Dict, List

def find_new_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    # Query the repo_index.db and append findings in standard format
    conn.close()
    return findings
```

Example ORM analyzer:

```python
# theauditor/rules/orm/sequelize_detector.py
import sqlite3
from typing import Any, Dict, List

def find_sequelize_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT file, line, query_type, includes FROM orm_queries"
    )
    # Analyze the rows for N+1 queries, death queries, etc.
    conn.close()
    return findings
```

### AST-Based Rules

For semantic code analysis:

```python
# theauditor/rules/framework/new_detector.py
from typing import Any, Dict, List

def find_framework_issues(tree: Any, file_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    # Traverse the semantic AST and append findings in standard format
    return findings
```

### Pattern-Based Rules

Add YAML patterns to `theauditor/patterns/`:

```yaml
# In a single-quoted YAML scalar, a literal ' is written as ''.
name: insecure_api_key
severity: critical
category: security
pattern: 'api[_-]?key\s*=\s*["''][^"'']+["'']'
description: "Hardcoded API key detected"
```

## Testing

Write tests for any new functionality:

```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_your_feature.py

# Run with coverage
pytest --cov=theauditor
```

## Documentation

- Update relevant documentation when making changes
- Add docstrings to new functions and classes
- Update `README.md` if adding new commands or features
- Consider updating `howtouse.md` for user-facing changes

## Getting Help

- Check our [TeamSOP](teamsop.md) for our development workflow
- Review [CLAUDE.md](CLAUDE.md) for AI-assisted development guidelines
- Ask questions in GitHub Issues or Discussions
- Join our community chat (if available)

## License

By contributing to TheAuditor, you agree that your contributions will be licensed under the same license as the project.

---

We're excited to see your contributions! Whether you're fixing bugs, adding features, or improving documentation, every contribution helps make TheAuditor better for everyone.