mirror of
https://github.com/aljazceru/Auditor.git
synced 2025-12-17 03:24:18 +01:00
429 lines
13 KiB
Markdown
429 lines
13 KiB
Markdown
# Contributing to TheAuditor
|
|
|
|
Thank you for your interest in contributing to TheAuditor! We're excited to have you join our mission to bring ground truth to AI-assisted development. This guide will help you get started with contributing to the project.
|
|
|
|
## How to Get Involved
|
|
|
|
### Reporting Bugs
|
|
|
|
Found a bug? Please help us fix it!
|
|
|
|
1. Check existing [GitHub Issues](https://github.com/TheAuditorTool/Auditor/issues) to see if it's already reported
|
|
2. If not, create a new issue with:
|
|
- Clear description of the bug
|
|
- Steps to reproduce
|
|
- Expected vs actual behavior
|
|
- Your environment details (OS, Python version, Node.js version)
|
|
|
|
### Suggesting Enhancements
|
|
|
|
Have an idea for improving TheAuditor?
|
|
|
|
1. Review our [ROADMAP.md](ROADMAP.md) to see if it aligns with our vision
|
|
2. Check [GitHub Issues](https://github.com/TheAuditorTool/Auditor/issues) for similar suggestions
|
|
3. Create a new issue describing:
|
|
- The problem you're trying to solve
|
|
- Your proposed solution
|
|
- Why this would benefit TheAuditor users
|
|
|
|
## Setting Up Your Development Environment
|
|
|
|
Follow these steps to get TheAuditor running locally for development:
|
|
|
|
```bash
|
|
# Clone the repository
|
|
git clone https://github.com/TheAuditorTool/Auditor.git
|
|
cd theauditor
|
|
|
|
# Create a Python virtual environment
|
|
python -m venv .venv
|
|
|
|
# Activate the virtual environment
|
|
# On Linux/macOS:
|
|
source .venv/bin/activate
|
|
# On Windows:
|
|
.venv\Scripts\activate
|
|
|
|
# Install TheAuditor in development mode
|
|
pip install -e .
|
|
|
|
# Optional: Install with ML capabilities
|
|
# pip install -e ".[ml]"
|
|
|
|
# For development with all optional dependencies:
|
|
# pip install -e ".[all]"
|
|
|
|
# MANDATORY: Set up the sandboxed environment
|
|
# This is required for TheAuditor to function at all
|
|
aud setup-claude --target .
|
|
```
|
|
|
|
The `aud setup-claude --target .` command creates an isolated environment at `.auditor_venv/.theauditor_tools/` with all necessary JavaScript and TypeScript analysis tools. This ensures consistent, reproducible results across all development environments.
|
|
|
|
## Making Changes & Submitting a Pull Request
|
|
|
|
### Development Workflow
|
|
|
|
1. **Fork the repository** on GitHub
|
|
2. **Create a feature branch** from `main`:
|
|
```bash
|
|
git checkout -b feature/your-feature-name
|
|
```
|
|
3. **Make your changes** following our code standards (see below)
|
|
4. **Write/update tests** if applicable
|
|
5. **Commit your changes** with clear, descriptive messages:
|
|
```bash
|
|
git commit -m "Add GraphQL schema analyzer for type validation"
|
|
```
|
|
6. **Push to your fork**:
|
|
```bash
|
|
git push origin feature/your-feature-name
|
|
```
|
|
7. **Create a Pull Request** on GitHub with:
|
|
- Clear description of changes
|
|
- Link to any related issues
|
|
- Test results or examples
|
|
|
|
## Code Standards
|
|
|
|
We use **ruff** for both linting and formatting Python code. Before submitting any code, you MUST run:
|
|
|
|
```bash
|
|
# Fix any auto-fixable issues and check for remaining problems
|
|
ruff check . --fix
|
|
|
|
# Format all Python code
|
|
ruff format .
|
|
```
|
|
|
|
Your pull request will not be merged if it fails these checks.
|
|
|
|
### Additional Quality Checks
|
|
|
|
For comprehensive code quality, you can also run:
|
|
|
|
```bash
|
|
# Type checking (optional but recommended)
|
|
mypy theauditor --strict
|
|
|
|
# Run tests
|
|
pytest tests/
|
|
|
|
# Full linting suite
|
|
make lint
|
|
```
|
|
|
|
### Code Style Guidelines
|
|
|
|
- Follow PEP 8 for Python code
|
|
- Use descriptive variable and function names
|
|
- Add docstrings to all public functions and classes
|
|
- Keep functions focused and small (under 50 lines preferred)
|
|
- Write self-documenting code; minimize comments
|
|
- Never commit secrets, API keys, or credentials
|
|
|
|
## Adding Support for New Languages
|
|
|
|
TheAuditor's modular architecture makes it straightforward to add support for new programming languages. This section provides comprehensive guidance for contributors looking to expand our language coverage.
|
|
|
|
### Overview
|
|
|
|
Adding a new language to TheAuditor involves:
|
|
- Creating a parser for the language
|
|
- Adding framework detection patterns
|
|
- Creating security pattern rules
|
|
- Writing comprehensive tests
|
|
- Updating documentation
|
|
|
|
### Prerequisites
|
|
|
|
Before starting, ensure you have:
|
|
- Deep knowledge of the target language and its ecosystem
|
|
- Understanding of common security vulnerabilities in that language
|
|
- Familiarity with AST (Abstract Syntax Tree) concepts
|
|
- Python development experience
|
|
|
|
### Step-by-Step Guide
|
|
|
|
#### Step 1: Create the Language Extractor
|
|
|
|
Create a new extractor in `theauditor/indexer/extractors/{language}.py` that inherits from `BaseExtractor`:
|
|
|
|
```python
|
|
from . import BaseExtractor
|
|
|
|
class {Language}Extractor(BaseExtractor):
|
|
def supported_extensions(self) -> List[str]:
|
|
"""Return list of file extensions this extractor supports."""
|
|
return ['.ext', '.ext2']
|
|
|
|
def extract(self, file_info: Dict[str, Any], content: str,
|
|
tree: Optional[Any] = None) -> Dict[str, Any]:
|
|
"""Extract all relevant information from a file."""
|
|
return {
|
|
'imports': self.extract_imports(content, file_info['ext']),
|
|
'routes': self.extract_routes(content),
|
|
'symbols': [], # Add symbol extraction logic
|
|
'assignments': [], # For taint analysis
|
|
'function_calls': [], # For call graph
|
|
'returns': [] # For data flow
|
|
}
|
|
```
|
|
|
|
The extractor will be automatically registered through the `BaseExtractor` inheritance pattern.
|
|
|
|
#### Step 2: Create Configuration Parser (Optional)
|
|
|
|
If your language has configuration files that need parsing, create a parser in `theauditor/parsers/{language}_parser.py`:
|
|
|
|
```python
|
|
class {Language}Parser:
|
|
def parse_file(self, file_path: Path) -> Dict[str, Any]:
|
|
"""Parse configuration file and extract security-relevant data."""
|
|
# Parse and return structured data
|
|
return parsed_data
|
|
```
|
|
|
|
#### Step 3: Add Framework Detection
|
|
|
|
Add your language's frameworks to `theauditor/framework_registry.py`:
|
|
|
|
```python
|
|
# Add to FRAMEWORK_REGISTRY dictionary
|
|
"{framework_name}": {
|
|
"language": "{language}",
|
|
"detection_sources": {
|
|
# Package manifest files
|
|
"package.{ext}": [
|
|
["dependencies"],
|
|
["devDependencies"],
|
|
],
|
|
# Or for line-based search
|
|
"requirements.txt": "line_search",
|
|
# Or for content search
|
|
"build.file": "content_search",
|
|
},
|
|
"package_pattern": "{framework_package_name}",
|
|
"import_patterns": ["import {framework}", "from {framework}"],
|
|
"file_markers": ["config.{ext}", "app.{ext}"],
|
|
}
|
|
```
|
|
|
|
#### Step 4: Create Language-Specific Patterns
|
|
|
|
Create security patterns for your language in `theauditor/patterns/{language}.yml`:
|
|
|
|
Example pattern structure:
|
|
```yaml
|
|
- name: hardcoded-secret-{language}
|
|
pattern: '(api[_-]?key|secret|token|password)\s*=\s*["\'][^"\']+["\']'
|
|
severity: critical
|
|
category: security
|
|
languages: ["{language}"]
|
|
description: "Hardcoded secret detected in {Language} code"
|
|
cwe: CWE-798
|
|
```
|
|
|
|
#### Step 5: Create AST-Based Rules (Optional but Recommended)
|
|
|
|
For complex security patterns, create AST-based rules in `theauditor/rules/{language}/`:
|
|
|
|
```python
|
|
"""Security rules for {Language} using AST analysis."""
|
|
|
|
from typing import Any, Dict, List
|
|
|
|
def find_{vulnerability}_issues(ast_tree: Any, file_path: str) -> List[Dict[str, Any]]:
|
|
"""Find {vulnerability} issues in {Language} code.
|
|
|
|
Args:
|
|
ast_tree: Parsed AST from {language}_parser
|
|
file_path: Path to the source file
|
|
|
|
Returns:
|
|
List of findings with standard format
|
|
"""
|
|
findings = []
|
|
|
|
# Implement AST traversal and pattern detection
|
|
for node in walk_ast(ast_tree):
|
|
if is_vulnerable_pattern(node):
|
|
findings.append({
|
|
'pattern_name': '{VULNERABILITY}_ISSUE',
|
|
'message': 'Detailed description of the issue',
|
|
'file': file_path,
|
|
'line': node.line,
|
|
'column': node.column,
|
|
'severity': 'high',
|
|
'snippet': extract_snippet(node),
|
|
'category': 'security',
|
|
'match_type': 'ast'
|
|
})
|
|
|
|
return findings
|
|
```
|
|
|
|
### Extractor Interface Specification
|
|
|
|
All language extractors MUST inherit from `BaseExtractor` and implement:
|
|
|
|
```python
|
|
from theauditor.indexer.extractors import BaseExtractor
|
|
|
|
class LanguageExtractor(BaseExtractor):
|
|
"""Extractor for {Language} files."""
|
|
|
|
def supported_extensions(self) -> List[str]:
|
|
"""Return list of supported file extensions."""
|
|
return ['.ext']
|
|
|
|
def extract(self, file_info: Dict[str, Any], content: str,
|
|
tree: Optional[Any] = None) -> Dict[str, Any]:
|
|
"""Extract all relevant information from a file."""
|
|
return {
|
|
'imports': [],
|
|
'routes': [],
|
|
'symbols': [],
|
|
'assignments': [],
|
|
'function_calls': [],
|
|
'returns': []
|
|
}
|
|
```
|
|
|
|
### Testing Requirements
|
|
|
|
#### Required Test Coverage
|
|
|
|
1. **Extractor Tests** (`tests/test_{language}_extractor.py`):
|
|
- Test extracting from valid files
|
|
- Test handling of syntax errors
|
|
- Test symbol extraction
|
|
- Test import extraction
|
|
- Test file extension detection
|
|
|
|
2. **Pattern Tests** (`tests/patterns/test_{language}_patterns.py`):
|
|
- Test security pattern detection
|
|
- Ensure patterns don't over-match (false positives)
|
|
|
|
3. **Integration Tests** (`tests/integration/test_{language}_integration.py`):
|
|
- Test language in complete analysis pipeline
|
|
|
|
#### Test Data
|
|
|
|
Create test fixtures in `tests/fixtures/{language}/`:
|
|
- `valid_code.{ext}` - Valid code samples
|
|
- `vulnerable_code.{ext}` - Code with known vulnerabilities
|
|
- `edge_cases.{ext}` - Edge cases and corner scenarios
|
|
|
|
### Submission Checklist
|
|
|
|
Before submitting your PR, ensure:
|
|
|
|
- [ ] Extractor inherits from `BaseExtractor` and implements required methods
|
|
- [ ] Extractor placed in `theauditor/indexer/extractors/{language}.py`
|
|
- [ ] Framework detection added to `framework_detector.py` (if applicable)
|
|
- [ ] At least 10 security patterns created in `patterns/{language}.yml`
|
|
- [ ] AST-based rules for complex patterns (if applicable)
|
|
- [ ] All tests passing with >80% coverage
|
|
- [ ] Documentation updated (extractor docstrings, pattern descriptions)
|
|
- [ ] Example vulnerable code provided in test fixtures
|
|
- [ ] No external dependencies without approval
|
|
- [ ] Code follows project style (run `ruff format`)
|
|
|
|
## Adding New Analyzers
|
|
|
|
### The Three-Tier Detection Architecture
|
|
|
|
TheAuditor uses a hybrid approach to detection, prioritizing accuracy and context. When contributing a new rule, please adhere to the following "AST First, Regex as Fallback" philosophy:
|
|
|
|
- **Tier 1: Multi-Language AST Rules (Preferred)**
|
|
For complex code patterns in source code (Python, JS/TS, etc.), extend or create a polymorphic AST-based rule in the `/rules` directory. These are the most powerful and accurate and should be the default choice for source code analysis.
|
|
|
|
- **Tier 2: Language-Specific AST Rules**
|
|
If a multi-language backend is not feasible, a language-specific AST rule is the next best option. The corresponding regex pattern should then be scoped to exclude the language covered by the AST rule (see `db_issues.yml` for an example).
|
|
|
|
- **Tier 3: Regex Patterns (YAML)**
|
|
Regex patterns in `/patterns` should be reserved for:
|
|
1. Simple patterns where an AST is overkill.
|
|
2. Configuration files where no AST parser exists (e.g., `.yml`, `.conf`).
|
|
3. Providing baseline coverage for languages not yet supported by an AST rule.
|
|
|
|
TheAuditor uses a modular architecture. To add new analysis capabilities:
|
|
|
|
### Database-Aware Rules
|
|
For rules that query across multiple files:
|
|
```python
|
|
# theauditor/rules/category/new_analyzer.py
|
|
def find_new_issues(db_path: str) -> List[Dict[str, Any]]:
|
|
conn = sqlite3.connect(db_path)
|
|
# Query the repo_index.db
|
|
# Return findings in standard format
|
|
```
|
|
|
|
Example ORM analyzer:
|
|
```python
|
|
# theauditor/rules/orm/sequelize_detector.py
|
|
def find_sequelize_issues(db_path: str) -> List[Dict[str, Any]]:
|
|
conn = sqlite3.connect(db_path)
|
|
cursor = conn.cursor()
|
|
cursor.execute(
|
|
"SELECT file, line, query_type, includes FROM orm_queries"
|
|
)
|
|
# Analyze for N+1 queries, death queries, etc.
|
|
```
|
|
|
|
### AST-Based Rules
|
|
For semantic code analysis:
|
|
```python
|
|
# theauditor/rules/framework/new_detector.py
|
|
def find_framework_issues(tree: Any, file_path: str) -> List[Dict[str, Any]]:
|
|
# Traverse semantic AST
|
|
# Return findings in standard format
|
|
```
|
|
|
|
### Pattern-Based Rules
|
|
Add YAML patterns to `theauditor/patterns/`:
|
|
```yaml
|
|
name: insecure_api_key
|
|
severity: critical
|
|
category: security
|
|
pattern: 'api[_-]?key\s*=\s*["\'][^"\']+["\']'
|
|
description: "Hardcoded API key detected"
|
|
```
|
|
|
|
## Testing
|
|
|
|
Write tests for any new functionality:
|
|
|
|
```bash
|
|
# Run all tests
|
|
pytest
|
|
|
|
# Run specific test file
|
|
pytest tests/test_your_feature.py
|
|
|
|
# Run with coverage
|
|
pytest --cov=theauditor
|
|
```
|
|
|
|
## Documentation
|
|
|
|
- Update relevant documentation when making changes
|
|
- Add docstrings to new functions and classes
|
|
- Update `README.md` if adding new commands or features
|
|
- Consider updating `howtouse.md` for user-facing changes
|
|
|
|
## Getting Help
|
|
|
|
- Check our [TeamSOP](teamsop.md) for our development workflow
|
|
- Review [CLAUDE.md](CLAUDE.md) for AI-assisted development guidelines
|
|
- Ask questions in GitHub Issues or Discussions
|
|
- Join our community chat (if available)
|
|
|
|
## License
|
|
|
|
By contributing to TheAuditor, you agree that your contributions will be licensed under the same license as the project.
|
|
|
|
---
|
|
|
|
We're excited to see your contributions! Whether you're fixing bugs, adding features, or improving documentation, every contribution helps make TheAuditor better for everyone. |