Files
Auditor/CONTRIBUTING.md

13 KiB

Contributing to TheAuditor

Thank you for your interest in contributing to TheAuditor! We're excited to have you join our mission to bring ground truth to AI-assisted development. This guide will help you get started with contributing to the project.

How to Get Involved

Reporting Bugs

Found a bug? Please help us fix it!

  1. Check existing GitHub Issues to see if it's already reported
  2. If not, create a new issue with:
    • Clear description of the bug
    • Steps to reproduce
    • Expected vs actual behavior
    • Your environment details (OS, Python version, Node.js version)

Suggesting Enhancements

Have an idea for improving TheAuditor?

  1. Review our ROADMAP.md to see if it aligns with our vision
  2. Check GitHub Issues for similar suggestions
  3. Create a new issue describing:
    • The problem you're trying to solve
    • Your proposed solution
    • Why this would benefit TheAuditor users

Setting Up Your Development Environment

Follow these steps to get TheAuditor running locally for development:

# Clone the repository
git clone https://github.com/TheAuditorTool/Auditor.git
cd theauditor

# Create a Python virtual environment
python -m venv .venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install TheAuditor in development mode
pip install -e .

# Optional: Install with ML capabilities
# pip install -e ".[ml]"

# For development with all optional dependencies:
# pip install -e ".[all]"

# MANDATORY: Set up the sandboxed environment
# This is required for TheAuditor to function at all
aud setup-claude --target .

The aud setup-claude --target . command creates an isolated environment at .auditor_venv/.theauditor_tools/ with all necessary JavaScript and TypeScript analysis tools. This ensures consistent, reproducible results across all development environments.

Making Changes & Submitting a Pull Request

Development Workflow

  1. Fork the repository on GitHub
  2. Create a feature branch from main:
    git checkout -b feature/your-feature-name
    
  3. Make your changes following our code standards (see below)
  4. Write/update tests if applicable
  5. Commit your changes with clear, descriptive messages:
    git commit -m "Add GraphQL schema analyzer for type validation"
    
  6. Push to your fork:
    git push origin feature/your-feature-name
    
  7. Create a Pull Request on GitHub with:
    • Clear description of changes
    • Link to any related issues
    • Test results or examples

Code Standards

We use ruff for both linting and formatting Python code. Before submitting any code, you MUST run:

# Fix any auto-fixable issues and check for remaining problems
ruff check . --fix

# Format all Python code
ruff format .

Your pull request will not be merged if it fails these checks.

Additional Quality Checks

For comprehensive code quality, you can also run:

# Type checking (optional but recommended)
mypy theauditor --strict

# Run tests
pytest tests/

# Full linting suite
make lint

Code Style Guidelines

  • Follow PEP 8 for Python code
  • Use descriptive variable and function names
  • Add docstrings to all public functions and classes
  • Keep functions focused and small (under 50 lines preferred)
  • Write self-documenting code; minimize comments
  • Never commit secrets, API keys, or credentials

Adding Support for New Languages

TheAuditor's modular architecture makes it straightforward to add support for new programming languages. This section provides comprehensive guidance for contributors looking to expand our language coverage.

Overview

Adding a new language to TheAuditor involves:

  • Creating a parser for the language
  • Adding framework detection patterns
  • Creating security pattern rules
  • Writing comprehensive tests
  • Updating documentation

Prerequisites

Before starting, ensure you have:

  • Deep knowledge of the target language and its ecosystem
  • Understanding of common security vulnerabilities in that language
  • Familiarity with AST (Abstract Syntax Tree) concepts
  • Python development experience

Step-by-Step Guide

Step 1: Create the Language Extractor

Create a new extractor in theauditor/indexer/extractors/{language}.py that inherits from BaseExtractor:

from . import BaseExtractor

class {Language}Extractor(BaseExtractor):
    def supported_extensions(self) -> List[str]:
        """Return list of file extensions this extractor supports."""
        return ['.ext', '.ext2']
    
    def extract(self, file_info: Dict[str, Any], content: str, 
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': self.extract_imports(content, file_info['ext']),
            'routes': self.extract_routes(content),
            'symbols': [],  # Add symbol extraction logic
            'assignments': [],  # For taint analysis
            'function_calls': [],  # For call graph
            'returns': []  # For data flow
        }

The extractor will be automatically registered through the BaseExtractor inheritance pattern.

Step 2: Create Configuration Parser (Optional)

If your language has configuration files that need parsing, create a parser in theauditor/parsers/{language}_parser.py:

class {Language}Parser:
    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Parse configuration file and extract security-relevant data."""
        # Parse and return structured data
        return parsed_data

Step 3: Add Framework Detection

Add your language's frameworks to theauditor/framework_registry.py:

# Add to FRAMEWORK_REGISTRY dictionary
"{framework_name}": {
    "language": "{language}",
    "detection_sources": {
        # Package manifest files
        "package.{ext}": [
            ["dependencies"],
            ["devDependencies"],
        ],
        # Or for line-based search
        "requirements.txt": "line_search",
        # Or for content search
        "build.file": "content_search",
    },
    "package_pattern": "{framework_package_name}",
    "import_patterns": ["import {framework}", "from {framework}"],
    "file_markers": ["config.{ext}", "app.{ext}"],
}

Step 4: Create Language-Specific Patterns

Create security patterns for your language in theauditor/patterns/{language}.yml:

Example pattern structure:

- name: hardcoded-secret-{language}
  pattern: '(api[_-]?key|secret|token|password)\s*=\s*["\'][^"\']+["\']'
  severity: critical
  category: security
  languages: ["{language}"]
  description: "Hardcoded secret detected in {Language} code"
  cwe: CWE-798

For complex security patterns, create AST-based rules in theauditor/rules/{language}/:

"""Security rules for {Language} using AST analysis."""

from typing import Any, Dict, List

def find_{vulnerability}_issues(ast_tree: Any, file_path: str) -> List[Dict[str, Any]]:
    """Find {vulnerability} issues in {Language} code.
    
    Args:
        ast_tree: Parsed AST from {language}_parser
        file_path: Path to the source file
        
    Returns:
        List of findings with standard format
    """
    findings = []
    
    # Implement AST traversal and pattern detection
    for node in walk_ast(ast_tree):
        if is_vulnerable_pattern(node):
            findings.append({
                'pattern_name': '{VULNERABILITY}_ISSUE',
                'message': 'Detailed description of the issue',
                'file': file_path,
                'line': node.line,
                'column': node.column,
                'severity': 'high',
                'snippet': extract_snippet(node),
                'category': 'security',
                'match_type': 'ast'
            })
    
    return findings

Extractor Interface Specification

All language extractors MUST inherit from BaseExtractor and implement:

from theauditor.indexer.extractors import BaseExtractor

class LanguageExtractor(BaseExtractor):
    """Extractor for {Language} files."""
    
    def supported_extensions(self) -> List[str]:
        """Return list of supported file extensions."""
        return ['.ext']
    
    def extract(self, file_info: Dict[str, Any], content: str, 
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': [],
            'routes': [],
            'symbols': [],
            'assignments': [],
            'function_calls': [],
            'returns': []
        }

Testing Requirements

Required Test Coverage

  1. Extractor Tests (tests/test_{language}_extractor.py):

    • Test extracting from valid files
    • Test handling of syntax errors
    • Test symbol extraction
    • Test import extraction
    • Test file extension detection
  2. Pattern Tests (tests/patterns/test_{language}_patterns.py):

    • Test security pattern detection
    • Ensure patterns don't over-match (false positives)
  3. Integration Tests (tests/integration/test_{language}_integration.py):

    • Test language in complete analysis pipeline

Test Data

Create test fixtures in tests/fixtures/{language}/:

  • valid_code.{ext} - Valid code samples
  • vulnerable_code.{ext} - Code with known vulnerabilities
  • edge_cases.{ext} - Edge cases and corner scenarios

Submission Checklist

Before submitting your PR, ensure:

  • Extractor inherits from BaseExtractor and implements required methods
  • Extractor placed in theauditor/indexer/extractors/{language}.py
  • Framework detection added to framework_detector.py (if applicable)
  • At least 10 security patterns created in patterns/{language}.yml
  • AST-based rules for complex patterns (if applicable)
  • All tests passing with >80% coverage
  • Documentation updated (extractor docstrings, pattern descriptions)
  • Example vulnerable code provided in test fixtures
  • No external dependencies without approval
  • Code follows project style (run ruff format)

Adding New Analyzers

The Three-Tier Detection Architecture

TheAuditor uses a hybrid approach to detection, prioritizing accuracy and context. When contributing a new rule, please adhere to the following "AST First, Regex as Fallback" philosophy:

  • Tier 1: Multi-Language AST Rules (Preferred) For complex code patterns in source code (Python, JS/TS, etc.), extend or create a polymorphic AST-based rule in the /rules directory. These are the most powerful and accurate and should be the default choice for source code analysis.

  • Tier 2: Language-Specific AST Rules If a multi-language backend is not feasible, a language-specific AST rule is the next best option. The corresponding regex pattern should then be scoped to exclude the language covered by the AST rule (see db_issues.yml for an example).

  • Tier 3: Regex Patterns (YAML) Regex patterns in /patterns should be reserved for:

    1. Simple patterns where an AST is overkill.
    2. Configuration files where no AST parser exists (e.g., .yml, .conf).
    3. Providing baseline coverage for languages not yet supported by an AST rule.

TheAuditor uses a modular architecture. To add new analysis capabilities:

Database-Aware Rules

For rules that query across multiple files:

# theauditor/rules/category/new_analyzer.py
def find_new_issues(db_path: str) -> List[Dict[str, Any]]:
    conn = sqlite3.connect(db_path)
    # Query the repo_index.db
    # Return findings in standard format

Example ORM analyzer:

# theauditor/rules/orm/sequelize_detector.py
def find_sequelize_issues(db_path: str) -> List[Dict[str, Any]]:
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT file, line, query_type, includes FROM orm_queries"
    )
    # Analyze for N+1 queries, death queries, etc.

AST-Based Rules

For semantic code analysis:

# theauditor/rules/framework/new_detector.py
def find_framework_issues(tree: Any, file_path: str) -> List[Dict[str, Any]]:
    # Traverse semantic AST
    # Return findings in standard format

Pattern-Based Rules

Add YAML patterns to theauditor/patterns/:

name: insecure_api_key
severity: critical
category: security
pattern: 'api[_-]?key\s*=\s*["\'][^"\']+["\']'
description: "Hardcoded API key detected"

Testing

Write tests for any new functionality:

# Run all tests
pytest

# Run specific test file
pytest tests/test_your_feature.py

# Run with coverage
pytest --cov=theauditor

Documentation

  • Update relevant documentation when making changes
  • Add docstrings to new functions and classes
  • Update README.md if adding new commands or features
  • Consider updating howtouse.md for user-facing changes

Getting Help

  • Check our TeamSOP for our development workflow
  • Review CLAUDE.md for AI-assisted development guidelines
  • Ask questions in GitHub Issues or Discussions
  • Join our community chat (if available)

License

By contributing to TheAuditor, you agree that your contributions will be licensed under the same license as the project.


We're excited to see your contributions! Whether you're fixing bugs, adding features, or improving documentation, every contribution helps make TheAuditor better for everyone.