mirror of https://github.com/aljazceru/Auditor.git synced 2025-12-17 03:24:18 +01:00

TheAuditorTool ba5c287b02 Initial commit: TheAuditor v1.0.1 - AI-centric SAST and Code Intelligence Platform

2025-09-07 20:39:47 +07:00

13 KiB


Contributing to TheAuditor

Thank you for your interest in contributing to TheAuditor! We're excited to have you join our mission to bring ground truth to AI-assisted development. This guide will help you get started with contributing to the project.

How to Get Involved

Reporting Bugs

Found a bug? Please help us fix it!

Check existing GitHub Issues to see if it's already reported
If not, create a new issue with:
- Clear description of the bug
- Steps to reproduce
- Expected vs actual behavior
- Your environment details (OS, Python version, Node.js version)

Suggesting Enhancements

Have an idea for improving TheAuditor?

Review our ROADMAP.md to see if it aligns with our vision
Check GitHub Issues for similar suggestions
Create a new issue describing:
- The problem you're trying to solve
- Your proposed solution
- Why this would benefit TheAuditor users

Setting Up Your Development Environment

Follow these steps to get TheAuditor running locally for development:

# Clone the repository
git clone https://github.com/TheAuditorTool/Auditor.git
cd Auditor

# Create a Python virtual environment
python -m venv .venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install TheAuditor in development mode
pip install -e .

# Optional: Install with ML capabilities
# pip install -e ".[ml]"

# For development with all optional dependencies:
# pip install -e ".[all]"

# MANDATORY: Set up the sandboxed environment
# This is required for TheAuditor to function at all
aud setup-claude --target .

The aud setup-claude --target . command creates an isolated environment at .auditor_venv/.theauditor_tools/ with all necessary JavaScript and TypeScript analysis tools. This ensures consistent, reproducible results across all development environments.

Making Changes & Submitting a Pull Request

Development Workflow

Fork the repository on GitHub

Create a feature branch from main:

git checkout -b feature/your-feature-name

Make your changes following our code standards (see below)
Write/update tests if applicable

Commit your changes with clear, descriptive messages:

git commit -m "Add GraphQL schema analyzer for type validation"

Push to your fork:

git push origin feature/your-feature-name

Create a Pull Request on GitHub with:
- Clear description of changes
- Link to any related issues
- Test results or examples

Code Standards

We use ruff for both linting and formatting Python code. Before submitting any code, you MUST run:

# Fix any auto-fixable issues and check for remaining problems
ruff check . --fix

# Format all Python code
ruff format .

Your pull request will not be merged if it fails these checks.

Additional Quality Checks

For comprehensive code quality, you can also run:

# Type checking (optional but recommended)
mypy theauditor --strict

# Run tests
pytest tests/

# Full linting suite
make lint

Code Style Guidelines

Follow PEP 8 for Python code
Use descriptive variable and function names
Add docstrings to all public functions and classes
Keep functions focused and small (under 50 lines preferred)
Write self-documenting code; minimize comments
Never commit secrets, API keys, or credentials

Adding Support for New Languages

TheAuditor's modular architecture makes it straightforward to add support for new programming languages. This section provides comprehensive guidance for contributors looking to expand our language coverage.

Overview

Adding a new language to TheAuditor involves:

Creating a parser for the language
Adding framework detection patterns
Creating security pattern rules
Writing comprehensive tests
Updating documentation

Prerequisites

Before starting, ensure you have:

Deep knowledge of the target language and its ecosystem
Understanding of common security vulnerabilities in that language
Familiarity with AST (Abstract Syntax Tree) concepts
Python development experience

Step-by-Step Guide

Step 1: Create the Language Extractor

Create a new extractor in theauditor/indexer/extractors/{language}.py that inherits from BaseExtractor:

from typing import Any, Dict, List, Optional

from . import BaseExtractor

class {Language}Extractor(BaseExtractor):
    def supported_extensions(self) -> List[str]:
        """Return list of file extensions this extractor supports."""
        return ['.ext', '.ext2']
    
    def extract(self, file_info: Dict[str, Any], content: str, 
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': self.extract_imports(content, file_info['ext']),
            'routes': self.extract_routes(content),
            'symbols': [],  # Add symbol extraction logic
            'assignments': [],  # For taint analysis
            'function_calls': [],  # For call graph
            'returns': []  # For data flow
        }

The extractor will be automatically registered through the BaseExtractor inheritance pattern.
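
As a rough illustration of the template above, the following sketch fills it in for a made-up language. Everything specific here is an assumption: the `.toy` extension, the `use pkg.module` import syntax, the `ToyExtractor` name, and the shape of each import record. `BaseExtractor` is replaced by a minimal stand-in so the snippet runs on its own; the real base class may provide more (the template above calls `extract_imports` and `extract_routes` on it).

```python
import re
from typing import Any, Dict, List, Optional


class BaseExtractor:
    """Stand-in for TheAuditor's BaseExtractor (assumption, for a runnable demo)."""

    def supported_extensions(self) -> List[str]:
        raise NotImplementedError

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        raise NotImplementedError


class ToyExtractor(BaseExtractor):
    """Extractor for a made-up language whose imports read `use pkg.module`."""

    # Hypothetical import syntax: a line starting with `use`, then a dotted name.
    IMPORT_RE = re.compile(r"^\s*use\s+([A-Za-z_][\w.]*)", re.MULTILINE)

    def supported_extensions(self) -> List[str]:
        return [".toy"]

    def extract(self, file_info: Dict[str, Any], content: str,
                tree: Optional[Any] = None) -> Dict[str, Any]:
        imports = []
        for match in self.IMPORT_RE.finditer(content):
            # Line numbers are 1-based: count newlines before the module name.
            line = content.count("\n", 0, match.start(1)) + 1
            imports.append({"module": match.group(1), "line": line})
        return {
            "imports": imports,
            "routes": [],
            "symbols": [],
            "assignments": [],
            "function_calls": [],
            "returns": [],
        }


sample = "use net.http\n\nuse json\n"
result = ToyExtractor().extract({"path": "app.toy", "ext": ".toy"}, sample)
print(result["imports"])
```

Returning every key, even when a list is empty, keeps the result shape predictable for whatever consumes it.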

Step 2: Create Configuration Parser (Optional)

If your language has configuration files that need parsing, create a parser in theauditor/parsers/{language}_parser.py:

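from pathlib import Path
from typing import Any, Dict

class {Language}Parser:
    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Parse configuration file and extract security-relevant data."""
        parsed_data: Dict[str, Any] = {}
        # Parse the file and populate parsed_data with structured data
        return parsed_data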
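
To make the parser template more concrete, here is a minimal sketch for a JSON manifest. The class name `ExampleJsonConfigParser`, the returned keys, and the choice to return an error record instead of raising are all assumptions made for illustration; they are not taken from TheAuditor's existing parsers.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class ExampleJsonConfigParser:
    """Hypothetical parser for a JSON manifest; not part of TheAuditor."""

    def parse_file(self, file_path: Path) -> Dict[str, Any]:
        """Return dependency and script data, or an error record on bad input."""
        try:
            raw = json.loads(file_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            # Report the failure instead of raising so one bad file
            # does not abort a whole analysis run.
            return {"path": str(file_path), "error": str(exc)}
        if not isinstance(raw, dict):
            return {"path": str(file_path), "error": "top-level value is not an object"}
        return {
            "path": str(file_path),
            "dependencies": raw.get("dependencies", {}),
            "scripts": raw.get("scripts", {}),
        }


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "manifest.json"
    manifest.write_text('{"dependencies": {"left-pad": "1.3.0"}}', encoding="utf-8")
    parsed = ExampleJsonConfigParser().parse_file(manifest)

    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    failed = ExampleJsonConfigParser().parse_file(broken)

print(parsed["dependencies"])
print("error" in failed)
```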

Step 3: Add Framework Detection

Add your language's frameworks to theauditor/framework_registry.py:

# Add to FRAMEWORK_REGISTRY dictionary
"{framework_name}": {
    "language": "{language}",
    "detection_sources": {
        # Package manifest files
        "package.{ext}": [
            ["dependencies"],
            ["devDependencies"],
        ],
        # Or for line-based search
        "requirements.txt": "line_search",
        # Or for content search
        "build.file": "content_search",
    },
    "package_pattern": "{framework_package_name}",
    "import_patterns": ["import {framework}", "from {framework}"],
    "file_markers": ["config.{ext}", "app.{ext}"],
}
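
The sketch below shows one way an entry of this shape could be checked against a parsed manifest. It is only a guess at the matching logic, written to clarify what the key-path lists mean; the helper names `lookup` and `manifest_declares` and the `exampleweb` package are hypothetical, and TheAuditor's own detector may work differently.

```python
from typing import Any, Dict, List

# A trimmed entry shaped like the template above; "exampleweb" is made up.
ENTRY: Dict[str, Any] = {
    "language": "javascript",
    "detection_sources": {
        "package.json": [["dependencies"], ["devDependencies"]],
    },
    "package_pattern": "exampleweb",
}


def lookup(data: Any, key_path: List[str]) -> Any:
    """Follow key_path through nested dicts; return None if any key is missing."""
    for key in key_path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def manifest_declares(entry: Dict[str, Any], manifest_name: str,
                      manifest: Dict[str, Any]) -> bool:
    """True if the entry's package appears under any listed key path."""
    source = entry["detection_sources"].get(manifest_name)
    if not isinstance(source, list):
        # "line_search" / "content_search" sources are out of scope here.
        return False
    for key_path in source:
        section = lookup(manifest, key_path)
        if isinstance(section, dict) and entry["package_pattern"] in section:
            return True
    return False


manifest = {"devDependencies": {"exampleweb": "^2.0.0"}}
print(manifest_declares(ENTRY, "package.json", manifest))
print(manifest_declares(ENTRY, "package.json", {"dependencies": {}}))
```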

Step 4: Create Language-Specific Patterns

Create security patterns for your language in theauditor/patterns/{language}.yml:

Example pattern structure:

- name: hardcoded-secret-{language}
  pattern: '(api[_-]?key|secret|token|password)\s*=\s*["''][^"'']+["'']'
  severity: critical
  category: security
  languages: ["{language}"]
  description: "Hardcoded secret detected in {Language} code"
  cwe: CWE-798
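
Before committing a pattern, it can help to try the expression against a few lines that should and should not match. The sketch below does this with Python's `re` module for the example expression above; the sample lines are invented, and the way TheAuditor compiles patterns (flags, multiline handling) is not assumed here.

```python
import re

# Same expression as the example pattern, written as a Python raw string.
SECRET_RE = re.compile(
    r"""(api[_-]?key|secret|token|password)\s*=\s*["'][^"']+["']"""
)

should_match = [
    'api_key = "sk-123456"',
    "password='hunter2'",
]
should_not_match = [
    'password = os.environ["APP_PASSWORD"]',  # value read at runtime
    'token = ""',                             # empty literal
    "api_key = load_key()",                   # function call
]

for line in should_match:
    assert SECRET_RE.search(line), f"missed: {line}"
for line in should_not_match:
    assert not SECRET_RE.search(line), f"false positive: {line}"

hits = [line for line in should_match + should_not_match if SECRET_RE.search(line)]
print(len(hits))
```

Note that the expression is case-sensitive as written, so an uppercase name such as `API_KEY` would not match unless the pattern is compiled with a case-insensitive flag.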

Step 5: Create AST-Based Rules (Optional but Recommended)

For complex security patterns, create AST-based rules in theauditor/rules/{language}/:

"""Security rules for {Language} using AST analysis."""

from typing import Any, Dict, List

def find_{vulnerability}_issues(ast_tree: Any, file_path: str) -> List[Dict[str, Any]]:
    """Find {vulnerability} issues in {Language} code.
    
    Args:
        ast_tree: Parsed AST from {language}_parser
        file_path: Path to the source file
        
    Returns:
        List of findings with standard format
    """
    findings = []
    
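    # Implement AST traversal and pattern detection.
    # walk_ast, is_vulnerable_pattern, and extract_snippet are placeholders
    # for helpers you write for your language's AST.
    for node in walk_ast(ast_tree):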
        if is_vulnerable_pattern(node):
            findings.append({
                'pattern_name': '{VULNERABILITY}_ISSUE',
                'message': 'Detailed description of the issue',
                'file': file_path,
                'line': node.line,
                'column': node.column,
                'severity': 'high',
                'snippet': extract_snippet(node),
                'category': 'security',
                'match_type': 'ast'
            })
    
    return findings
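
For a runnable instance of this template, the sketch below uses Python's built-in `ast` module as a stand-in for a language-specific parser and flags direct `eval()` calls. The rule name `find_eval_issues`, the `EVAL_ISSUE` pattern name, and the extra `source` parameter (needed here to recover the snippet) are assumptions for the example, not existing TheAuditor rules.

```python
import ast
from typing import Any, Dict, List


def find_eval_issues(ast_tree: ast.AST, file_path: str, source: str) -> List[Dict[str, Any]]:
    """Flag direct calls to eval() and report them in the finding format above."""
    findings: List[Dict[str, Any]] = []
    for node in ast.walk(ast_tree):
        is_eval_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        )
        if is_eval_call:
            findings.append({
                "pattern_name": "EVAL_ISSUE",
                "message": "Call to eval() can execute untrusted input",
                "file": file_path,
                "line": node.lineno,          # 1-based line of the call
                "column": node.col_offset,    # 0-based column of the call
                "severity": "high",
                "snippet": ast.get_source_segment(source, node) or "",
                "category": "security",
                "match_type": "ast",
            })
    return findings


code = "x = 1\nresult = eval(user_input)\n"
for finding in find_eval_issues(ast.parse(code), "demo.py", code):
    print(finding["line"], finding["column"], finding["snippet"])
```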

Extractor Interface Specification

All language extractors MUST inherit from BaseExtractor and implement:

from typing import Any, Dict, List, Optional

from theauditor.indexer.extractors import BaseExtractor

class LanguageExtractor(BaseExtractor):
    """Extractor for {Language} files."""
    
    def supported_extensions(self) -> List[str]:
        """Return list of supported file extensions."""
        return ['.ext']
    
    def extract(self, file_info: Dict[str, Any], content: str, 
                tree: Optional[Any] = None) -> Dict[str, Any]:
        """Extract all relevant information from a file."""
        return {
            'imports': [],
            'routes': [],
            'symbols': [],
            'assignments': [],
            'function_calls': [],
            'returns': []
        }
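
A small helper like the one sketched below can be used in tests to confirm that an extractor's return value has the documented shape. The key names come from the specification above; the function name `contract_problems` and the assumption that every value must be a list are ours.

```python
from typing import Any, Dict, List

# Keys taken from the interface specification above.
REQUIRED_KEYS = ("imports", "routes", "symbols",
                 "assignments", "function_calls", "returns")


def contract_problems(result: Dict[str, Any]) -> List[str]:
    """Describe every way `result` departs from the documented return shape."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in result:
            problems.append(f"missing key: {key}")
        elif not isinstance(result[key], list):
            problems.append(f"{key} should be a list, got {type(result[key]).__name__}")
    return problems


good = {key: [] for key in REQUIRED_KEYS}
bad = {"imports": [], "routes": None}
print(contract_problems(good))
print(contract_problems(bad))
```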

Testing Requirements

Required Test Coverage

Extractor Tests (tests/test_{language}_extractor.py):
- Test extracting from valid files
- Test handling of syntax errors
- Test symbol extraction
- Test import extraction
- Test file extension detection

Pattern Tests (tests/patterns/test_{language}_patterns.py):
- Test security pattern detection
- Ensure patterns don't over-match (false positives)

Integration Tests (tests/integration/test_{language}_integration.py):
- Test language in complete analysis pipeline

Test Data

Create test fixtures in tests/fixtures/{language}/:

valid_code.{ext} - Valid code samples
vulnerable_code.{ext} - Code with known vulnerabilities
edge_cases.{ext} - Edge cases and corner scenarios

Submission Checklist

Before submitting your PR, ensure:

Extractor inherits from BaseExtractor and implements required methods
Extractor placed in theauditor/indexer/extractors/{language}.py
Framework detection added to framework_registry.py (if applicable)
At least 10 security patterns created in patterns/{language}.yml
AST-based rules for complex patterns (if applicable)
All tests passing with >80% coverage
Documentation updated (extractor docstrings, pattern descriptions)
Example vulnerable code provided in test fixtures
No external dependencies without approval
Code follows project style (run ruff format)

Adding New Analyzers

The Three-Tier Detection Architecture

TheAuditor uses a hybrid approach to detection, prioritizing accuracy and context. When contributing a new rule, please adhere to the following "AST First, Regex as Fallback" philosophy:

Tier 1: Multi-Language AST Rules (Preferred). For complex code patterns in source code (Python, JS/TS, etc.), extend or create a polymorphic AST-based rule in the /rules directory. These are the most powerful and accurate and should be the default choice for source code analysis.
Tier 2: Language-Specific AST Rules. If a multi-language backend is not feasible, a language-specific AST rule is the next best option. The corresponding regex pattern should then be scoped to exclude the language covered by the AST rule (see db_issues.yml for an example).
Tier 3: Regex Patterns (YAML). Regex patterns in /patterns should be reserved for:
1. Simple patterns where an AST is overkill.
2. Configuration files where no AST parser exists (e.g., .yml, .conf).
3. Providing baseline coverage for languages not yet supported by an AST rule.
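
The difference between the tiers is easiest to see on a small input. In the sketch below (an illustration using Python's `ast` and `re` modules, not TheAuditor code), a regex flags every line mentioning `eval(`, including a comment and a string literal, while the AST walk reports only the real call.

```python
import ast
import re

SOURCE = '''\
# eval(x) used to live here
note = "never call eval(data)"
value = eval(payload)
'''

# Tier 3 style: a regex sees text only, so comments and strings also match.
regex_lines = [
    number
    for number, line in enumerate(SOURCE.splitlines(), start=1)
    if re.search(r"\beval\(", line)
]

# Tier 1/2 style: the AST contains only real calls.
ast_lines = [
    node.lineno
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "eval"
]

print(regex_lines)
print(ast_lines)
```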

TheAuditor uses a modular architecture. To add new analysis capabilities, pick the rule type below that fits the tier you are targeting:

Database-Aware Rules

For rules that query across multiple files:

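# theauditor/rules/category/new_analyzer.py
import sqlite3
from typing import Any, Dict, List

def find_new_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    # Query the repo_index.db and append findings in standard format
    conn.close()
    return findings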

Example ORM analyzer:

# theauditor/rules/orm/sequelize_detector.py
import sqlite3
from typing import Any, Dict, List

def find_sequelize_issues(db_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT file, line, query_type, includes FROM orm_queries"
    )
    # Analyze cursor.fetchall() for N+1 queries, death queries, etc.
    conn.close()
    return findings
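
To show the whole round trip, the sketch below builds a throwaway database and runs a rule against it. The column names come from the query above, but the column types, the `findAll`-without-includes heuristic, the `ORM_UNSCOPED_FINDALL` name, and both function names are assumptions for illustration; the real `repo_index.db` schema may differ.

```python
import os
import sqlite3
import tempfile
from typing import Any, Dict, List


def find_unscoped_queries(db_path: str) -> List[Dict[str, Any]]:
    """Flag findAll queries that record no includes (an illustrative heuristic)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT file, line, query_type, includes FROM orm_queries "
            "WHERE query_type = ? AND (includes IS NULL OR includes = '')",
            ("findAll",),
        ).fetchall()
    finally:
        conn.close()
    return [
        {
            "pattern_name": "ORM_UNSCOPED_FINDALL",
            "message": f"{query_type} without includes may trigger follow-up queries",
            "file": file,
            "line": line,
            "severity": "medium",
            "category": "orm",
        }
        for file, line, query_type, _includes in rows
    ]


def build_demo_db(db_path: str) -> None:
    """Create a tiny orm_queries table; the column types are assumptions."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE orm_queries (file TEXT, line INTEGER, "
            "query_type TEXT, includes TEXT)"
        )
        conn.executemany(
            "INSERT INTO orm_queries VALUES (?, ?, ?, ?)",
            [("user.js", 10, "findAll", ""), ("post.js", 7, "findAll", "Author")],
        )
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "repo_index.db")
    build_demo_db(demo_path)
    demo_findings = find_unscoped_queries(demo_path)

for finding in demo_findings:
    print(finding["file"], finding["line"], finding["pattern_name"])
```

Filtering in SQL with a bound parameter keeps the Python side small and avoids building query strings by hand.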

AST-Based Rules

For semantic code analysis:

# theauditor/rules/framework/new_detector.py
from typing import Any, Dict, List

def find_framework_issues(tree: Any, file_path: str) -> List[Dict[str, Any]]:
    findings: List[Dict[str, Any]] = []
    # Traverse semantic AST and append findings in standard format
    return findings

Pattern-Based Rules

Add YAML patterns to theauditor/patterns/:

name: insecure_api_key
severity: critical
category: security
pattern: 'api[_-]?key\s*=\s*["''][^"'']+["'']'
description: "Hardcoded API key detected"

Testing

Write tests for any new functionality:

# Run all tests
pytest

# Run specific test file
pytest tests/test_your_feature.py

# Run with coverage
pytest --cov=theauditor
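
If you are unsure what a test file should look like, the sketch below shows a minimal layout that pytest will collect: plain functions whose names start with `test_`, each asserting one behavior. The function under test, `find_todo_markers`, is a hypothetical stand-in for your own feature and is defined inline only so the example runs by itself.

```python
from typing import Dict, List


def find_todo_markers(source: str) -> List[Dict[str, object]]:
    """Hypothetical feature under test: report lines containing TODO."""
    return [
        {"line": number, "message": "TODO marker found"}
        for number, line in enumerate(source.splitlines(), start=1)
        if "TODO" in line
    ]


def test_reports_line_number_of_marker():
    findings = find_todo_markers("x = 1\n# TODO: remove\n")
    assert findings == [{"line": 2, "message": "TODO marker found"}]


def test_clean_source_yields_no_findings():
    assert find_todo_markers("x = 1\n") == []


def test_empty_source_is_handled():
    assert find_todo_markers("") == []
```

Covering a positive case, a negative case, and an edge case is usually enough to start; add more as bugs are reported.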

Documentation

Update relevant documentation when making changes
Add docstrings to new functions and classes
Update README.md if adding new commands or features
Consider updating howtouse.md for user-facing changes

Getting Help

Check our TeamSOP for our development workflow
Review CLAUDE.md for AI-assisted development guidelines
Ask questions in GitHub Issues or Discussions
Join our community chat (if available)

License

By contributing to TheAuditor, you agree that your contributions will be licensed under the same license as the project.

We're excited to see your contributions! Whether you're fixing bugs, adding features, or improving documentation, every contribution helps make TheAuditor better for everyone.
