How It Works - Pinata

Architecture Overview

Pinata is a static analysis tool that scans source code for security vulnerabilities and test coverage gaps. It uses pattern matching against a curated database of detection rules, then scores results and optionally generates tests.

Source Code → Scanner → Pattern Matcher → Gaps → Scorer → Pinata Score

Gaps → AI Service → Tests + Explanations

Scanning Pipeline

When you run pinata analyze, the following steps execute:

File Discovery

Recursively walks the directory, filtering by language (TypeScript, Python, JavaScript) and respecting .pinataignore patterns.
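As a rough illustration of this step, the sketch below walks a directory, keeps files whose extension maps to a supported language, and drops paths matched by .pinataignore. The extension table, the glob-style ignore syntax, and the function names are assumptions made for the example, not Pinata's actual implementation.

```python
import fnmatch
import os
from pathlib import Path

# Assumed extension-to-language mapping; the real tool may recognise more.
LANGUAGE_BY_EXTENSION = {
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".jsx": "javascript",
    ".py": "python",
}


def load_ignore_patterns(root: Path) -> list[str]:
    """Read glob patterns from .pinataignore, skipping blanks and comments."""
    ignore_file = root / ".pinataignore"
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified matching: test the whole relative path and each path segment."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, p) or any(fnmatch.fnmatch(part, p) for part in parts)
        for p in patterns
    )


def discover_files(root: Path) -> list[tuple[str, str]]:
    """Return sorted (relative_path, language) pairs for scannable files."""
    patterns = load_ignore_patterns(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            language = LANGUAGE_BY_EXTENSION.get(Path(name).suffix)
            if language is None:
                continue  # filter by extension before touching file contents
            rel = Path(dirpath, name).relative_to(root).as_posix()
            if not is_ignored(rel, patterns):
                found.append((rel, language))
    return sorted(found)
```

With a .pinataignore containing the single line node_modules, a tree holding src/app.ts and node_modules/lib.js would yield only the src/app.ts entry.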

Category Loading

Loads 45 detection categories from YAML definitions. Each category contains patterns, severity levels, and test templates.

Pattern Matching

For each file, runs all applicable patterns (filtered by language). Patterns are regex-based with negative patterns to reduce false positives.
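The matching step might look roughly like the sketch below. It assumes patterns are applied line by line and that a hit is discarded when the negative pattern matches the same line; the documentation does not state either detail, and the Pattern and Gap types are hypothetical. The regexes are the ones from the sql-injection example in the Pattern Definition Format section.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    id: str
    language: str
    regex: re.Pattern
    negative: Optional[re.Pattern] = None


@dataclass(frozen=True)
class Gap:
    pattern_id: str
    line_number: int
    text: str


def find_gaps(source: str, language: str, patterns: list[Pattern]) -> list[Gap]:
    """Run every pattern for this language against each line of the source."""
    applicable = [p for p in patterns if p.language == language]
    gaps = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for pattern in applicable:
            if not pattern.regex.search(line):
                continue
            # Assumed semantics: a negative match on the same line suppresses the hit.
            if pattern.negative and pattern.negative.search(line):
                continue
            gaps.append(Gap(pattern.id, line_number, line.strip()))
    return gaps


TS_TEMPLATE_QUERY = Pattern(
    id="ts-template-literal-query",
    language="typescript",
    regex=re.compile(r"(query|execute).*`.*\$\{"),
    negative=re.compile(r"parameterized|prepared"),
)
```

Under these assumptions, a line such as db.query(`SELECT * FROM users WHERE id = ${id}`) is reported, while a line that also contains the word prepared is skipped.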

Scoring

Calculates the Pinata Score based on gap count, severity weights, and domain coverage.

Pattern Definition Format

Each detection category is defined in YAML with the following structure:

id: sql-injection
version: 1
name: SQL Injection
description: |
  Detects SQL queries built with string concatenation...
domain: security
priority: P0
severity: critical
applicableLanguages:
  - python
  - typescript

detectionPatterns:
  - id: ts-template-literal-query
    type: regex
    language: typescript
    pattern: "(query|execute).*`.*\\$\\{"
    confidence: high
    description: Detects template literals in SQL queries
    negativePattern: "parameterized|prepared"

testTemplates:
  - id: jest-sql-injection
    language: typescript
    framework: jest
    template: |
      describe('SQL Injection', () => {
        it('uses parameterized queries', () => {
          // Test code...
        });
      });

Confidence Levels

Each pattern has a confidence level that affects filtering and scoring:

High - Very likely a real issue. Few false positives.
Medium - Likely an issue but may need manual review.
Low - Possible issue. Higher false positive rate.

By default, Pinata reports only high-confidence findings. Use --confidence medium or --confidence low to lower the threshold and see more.
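One way to picture the filter is as a minimum threshold over an ordered scale. The sketch below assumes that choosing medium also keeps high findings (consistent with "see more", but not spelled out in the text); the finding dictionaries are a hypothetical shape.

```python
# Ordered from least to most certain.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def filter_by_confidence(findings: list[dict], minimum: str = "high") -> list[dict]:
    """Keep findings whose confidence is at or above the requested minimum."""
    threshold = CONFIDENCE_RANK[minimum]
    return [f for f in findings if CONFIDENCE_RANK[f["confidence"]] >= threshold]


findings = [
    {"id": "a", "confidence": "high"},
    {"id": "b", "confidence": "medium"},
    {"id": "c", "confidence": "low"},
]
default_view = filter_by_confidence(findings)            # only "a"
medium_view = filter_by_confidence(findings, "medium")   # "a" and "b"
```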

Scoring Algorithm

The Pinata Score (0-100) represents your codebase's security health. Higher is better.

Score = 100 - Σ(gap_weight × severity_multiplier × confidence_factor)

Severity Multipliers

Critical - 10 points per gap
High - 5 points per gap
Medium - 2 points per gap
Low - 1 point per gap
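To make the formula concrete, here is a small worked sketch. The severity multipliers come from the list above; the confidence factors (1.0, 0.5, 0.25) and the default gap weight of 1.0 are placeholders chosen for illustration, since this page does not publish those values. The result is clamped at 0 to stay within the stated 0-100 range.

```python
SEVERITY_MULTIPLIER = {"critical": 10, "high": 5, "medium": 2, "low": 1}

# Hypothetical values: the documentation does not list the real factors.
CONFIDENCE_FACTOR = {"high": 1.0, "medium": 0.5, "low": 0.25}


def pinata_score(gaps: list[dict], gap_weight: float = 1.0) -> float:
    """Score = 100 - sum(gap_weight * severity_multiplier * confidence_factor)."""
    penalty = sum(
        gap_weight * SEVERITY_MULTIPLIER[g["severity"]] * CONFIDENCE_FACTOR[g["confidence"]]
        for g in gaps
    )
    return max(0.0, 100.0 - penalty)


example = [
    {"severity": "critical", "confidence": "high"},   # 10 points
    {"severity": "high", "confidence": "high"},       # 5 points
    {"severity": "high", "confidence": "high"},       # 5 points
    {"severity": "medium", "confidence": "medium"},   # 2 * 0.5 = 1 point
]
score = pinata_score(example)  # 100 - 21 = 79.0
```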

Domain Coverage

The score also considers which risk domains have been scanned. If your codebase has no database code, the Data domain won't penalize you for missing data validation patterns.

Diminishing Returns

After 10 gaps of the same category, additional gaps have reduced impact. This prevents a single repeated issue from dominating the score.
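The rule above could be modelled as a per-gap weight that drops once a category has been seen more than 10 times. The threshold of 10 is from the text; the reduced weight of 0.5 is a made-up value for the sketch, as the actual reduction is not documented here.

```python
from collections import Counter

FULL_WEIGHT_LIMIT = 10   # from the text: the first 10 gaps per category count fully
REDUCED_WEIGHT = 0.5     # hypothetical: the real reduction is not documented


def gap_weights(category_ids: list[str]) -> list[float]:
    """Return one weight per gap, reduced after the 10th gap of a category."""
    seen = Counter()
    weights = []
    for category_id in category_ids:
        seen[category_id] += 1
        full = seen[category_id] <= FULL_WEIGHT_LIMIT
        weights.append(1.0 if full else REDUCED_WEIGHT)
    return weights


# Twelve gaps of one category plus one of another.
weights = gap_weights(["sql-injection"] * 12 + ["xss"])
total_weight = sum(weights)  # 10 * 1.0 + 2 * 0.5 + 1.0 = 12.0
```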

AI Features

Pinata integrates with LLMs (Anthropic Claude, OpenAI GPT) for enhanced analysis:

Natural Language Explanations - Understand vulnerabilities in plain English with remediation guidance.
Test Generation - AI fills template variables intelligently based on your actual code context.
Pattern Suggestions - Submit vulnerable code samples and get new detection patterns.
Risk Assessment - AI evaluates real-world exploitability of findings.

AI Service Architecture

┌─────────────────────────────────────────────────┐
│                  AI Service                      │
├─────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐               │
│  │  Explainer  │  │  Generator  │               │
│  └──────┬──────┘  └──────┬──────┘               │
│         │                │                       │
│         ▼                ▼                       │
│  ┌─────────────────────────────────────┐        │
│  │         Provider Abstraction         │        │
│  │    (Anthropic / OpenAI / Mock)       │        │
│  └─────────────────────────────────────┘        │
└─────────────────────────────────────────────────┘

The AI service abstracts away provider differences. You can switch between Anthropic and OpenAI by changing your API key configuration.
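A provider abstraction of this kind is often expressed as a small interface that the Explainer and Generator depend on. The sketch below is illustrative only: the Provider protocol, its complete method, and the prompt wording are assumptions, and no real Anthropic or OpenAI SDK calls are shown.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical interface each backend (Anthropic, OpenAI, Mock) would satisfy."""

    def complete(self, prompt: str) -> str: ...


class MockProvider:
    """Offline stand-in that echoes the start of the prompt."""

    def complete(self, prompt: str) -> str:
        return f"[mock] {prompt[:40]}"


class Explainer:
    """Depends only on the Provider interface, never on a specific vendor."""

    def __init__(self, provider: Provider) -> None:
        self._provider = provider

    def explain(self, category_name: str, snippet: str) -> str:
        prompt = f"Explain this {category_name} finding and how to fix it:\n{snippet}"
        return self._provider.complete(prompt)


explainer = Explainer(MockProvider())
reply = explainer.explain("SQL Injection", "db.query(`SELECT ${id}`)")
```

Because Explainer only sees the interface, swapping the mock for a real backend would not require changes to the calling code.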

Performance

Pinata is designed for speed:

Parallel scanning - Multiple files processed concurrently
Lazy loading - Categories loaded on-demand
Early filtering - Skip files by extension before reading
Pattern compilation - Regex patterns compiled once and reused
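Three of these ideas (early filtering, one-time compilation, concurrent scanning) can be sketched together as follows. This is an illustration in Python rather than a description of Pinata's internals; the worker count, extension set, and function names are arbitrary choices for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Compiled once at import time and shared by every worker.
COMPILED = {"ts-template-literal-query": re.compile(r"(query|execute).*`.*\$\{")}

# Assumed set of scannable extensions.
SCANNABLE = {".ts", ".js", ".py"}


def scan_file(path: Path) -> tuple[str, int]:
    """Count pattern hits in one file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    hits = sum(
        1
        for line in text.splitlines()
        for regex in COMPILED.values()
        if regex.search(line)
    )
    return (str(path), hits)


def scan_all(paths: list[Path], workers: int = 8) -> dict[str, int]:
    """Filter by extension before reading, then scan files concurrently."""
    candidates = [p for p in paths if p.suffix in SCANNABLE]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scan_file, candidates))
```

In Python, threads mainly overlap file I/O; a tool written in another runtime would make its own concurrency choices.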

Benchmarks

On a typical codebase (10,000 files, 500K LOC), Pinata completes in under 10 seconds. AI features add latency based on provider response times.

Extensibility

Pinata is designed for customization:

Custom Categories

Add your own detection categories by creating YAML files in your project:

# .pinata/categories/my-company-auth.yml
id: my-company-auth
name: MyCompany Auth Standards
domain: security
severity: high
detectionPatterns:
  - id: legacy-auth-function
    pattern: "legacyAuthenticate\\("
    confidence: high
    description: Legacy auth function deprecated

Output Formats

Integrate with any system using standard output formats:

SARIF - GitHub Advanced Security, VS Code
JUnit XML - Jenkins, GitLab CI, CircleCI
JSON - Custom scripting and dashboards
Markdown - PR comments and documentation
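For a sense of what the SARIF output involves, the sketch below builds a minimal SARIF 2.1.0 log from a list of findings. The field layout follows the SARIF specification; the shape of the input findings and the severity-to-level mapping are assumptions, and Pinata's real output may carry more detail.

```python
import json

# Assumed mapping from Pinata severities to SARIF result levels.
SARIF_LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log with one run and one result per finding."""
    results = [
        {
            "ruleId": f["category"],
            "level": SARIF_LEVEL[f["severity"]],
            "message": {"text": f["message"]},
            "locations": [
                {
                    "physicalLocation": {
                        "artifactLocation": {"uri": f["file"]},
                        "region": {"startLine": f["line"]},
                    }
                }
            ],
        }
        for f in findings
    ]
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "Pinata"}}, "results": results}],
    }


log = to_sarif(
    [
        {
            "category": "sql-injection",
            "severity": "critical",
            "message": "Template literal used in SQL query",
            "file": "src/db.ts",
            "line": 42,
        }
    ]
)
sarif_text = json.dumps(log, indent=2)
```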

Security Model

Pinata is designed with security in mind:

Local-first - All scanning happens locally. No code leaves your machine unless you enable AI features (see AI Privacy).
Optional AI - AI features are opt-in and require explicit API key configuration.
Minimal dependencies - Small dependency tree reduces supply chain risk.
No telemetry - We don't collect usage data, and Pinata never phones home.

AI Privacy

When using AI features, code snippets are sent to your configured AI provider (Anthropic or OpenAI). Use the --no-ai flag to disable all AI features in sensitive environments.
