Threat model - CodeGate

AI coding tools execute instructions from repository-controlled files. This creates a class of threats where anyone who can write or influence project files can alter agent behavior, sometimes without the user being aware that a configuration surface exists. CodeGate’s threat model is organized around the attack surfaces that repository files expose.

Attack surfaces

MCP server definitions and remote endpoints

MCP server configuration files define which external processes or remote endpoints an agent can communicate with. A malicious project can add servers that exfiltrate data, execute arbitrary commands, or proxy instructions through a trusted-looking interface.

Remote MCP server endpoints are a high-risk surface. A server at a legitimate-looking domain can serve tool descriptions that contain hidden instructions, manipulate agent reasoning, or redirect tool calls to attacker-controlled infrastructure.

Relevant CVEs:

CVE-2025-59536 (Check Point research): MCP consent bypass. Demonstrates that consent dialogs can be bypassed or silently auto-approved, allowing MCP servers to execute without explicit user acknowledgment.
CVE-2026-21852 (Check Point research): API key exfiltration path. A repository-controlled MCP server configuration can establish a channel through which the agent sends credentials to an attacker endpoint.

Detection layers:

Layer 1: Discovers MCP configuration files and server definitions across known config paths.
Layer 2: NEW_SERVER, CONFIG_CHANGE, COMMAND_EXEC, ENV_OVERRIDE, and CONSENT_BYPASS findings flag new or changed servers, command-executing server definitions, environment overrides, and auto-approval patterns.
Layer 3 (opt-in): Fetches remote tool descriptions and runs tool-description analysis; reports TOXIC_FLOW findings for manipulation patterns.
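
As a rough illustration of the kind of static check Layer 2 describes, the sketch below walks one MCP configuration document and emits findings for command-executing servers and suspicious environment overrides. It is not CodeGate’s implementation. The function name, the shell list, and the environment-variable hints are hypothetical, and the config keys (mcpServers, command, env) assume the layout commonly used by MCP clients.

```python
import json

# Hypothetical heuristics for illustration only; not CodeGate's rule set.
SHELLS = {"sh", "bash", "zsh", "cmd", "cmd.exe", "powershell", "pwsh"}
ENV_HINTS = ("BASE_URL", "API_KEY", "TOKEN", "PROXY")


def scan_mcp_config(text):
    """Return (category, server_name, detail) tuples for one config document.

    Assumes the common client layout: {"mcpServers": {name: {"command": ...,
    "args": [...], "env": {...}}}}.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Malformed config is reported rather than silently skipped.
        return [("PARSE_ERROR", "", str(exc))]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return []

    findings = []
    for name, server in servers.items():
        if not isinstance(server, dict):
            continue
        command = server.get("command")
        if isinstance(command, str) and command:
            # Compare only the executable's base name against known shells.
            executable = command.replace("\\", "/").rsplit("/", 1)[-1].lower()
            kind = "shell interpreter" if executable in SHELLS else "local process"
            findings.append(("COMMAND_EXEC", name, f"{kind}: {command}"))
        env = server.get("env")
        if isinstance(env, dict):
            for key in env:
                if any(hint in str(key).upper() for hint in ENV_HINTS):
                    findings.append(("ENV_OVERRIDE", name, str(key)))
    return findings
```

A real scanner would also need to cover remote url entries and client-specific auto-approval settings, which vary by tool and are left out here.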

Rug-pull detection tracks MCP config hashes between scans. When a server definition changes after the user has previously scanned the project, CodeGate reports a CONFIG_CHANGE finding.

Hooks, workflows, and command templates

Git hooks, CI workflow files, task runner configurations, and agent command templates can all trigger shell execution. A project file that installs or modifies a hook can establish persistent execution on the developer’s machine.

Relevant CVE:

CVE-2025-61260 (Check Point research, CVSS 9.8): Codex CLI command injection. A repository-controlled file caused Codex CLI to execute attacker-supplied shell commands. The 9.8 score reflects that exploitation requires no user interaction.

GIT_HOOK and COMMAND_EXEC findings in hooks or workflow files are high-risk indicators. A hook that runs at pre-commit, post-checkout, or post-merge executes automatically on common developer operations.

Detection layers:

Layer 2: GIT_HOOK and COMMAND_EXEC findings for hook files and command template execution patterns.
Workflow audit pack (--workflow-audits): Checks GitHub Actions files for unpinned action references, high-risk triggers (pull_request_target, workflow_run), overly broad permissions, and template expression injection in run steps.
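
The four workflow checks named above can be approximated with a line-based heuristic. The sketch below is only an illustration of the idea and is not how the workflow audit pack is implemented. The function name and finding labels are hypothetical, and a production auditor would parse the YAML properly rather than match lines.

```python
import re

USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"#]+)")
RUN_RE = re.compile(r"^(\s*(?:-\s*)?)run:")
FULL_SHA_RE = re.compile(r"@[0-9a-f]{40}$")
RISKY_TRIGGER_RE = re.compile(r"\b(pull_request_target|workflow_run)\b")
BROAD_PERMISSIONS_RE = re.compile(r"^\s*permissions:\s*write-all\b")
UNTRUSTED_EXPR_RE = re.compile(r"\$\{\{[^}]*github\.(?:event\.|head_ref)")


def audit_workflow(text):
    """Return (line_number, label) pairs for one GitHub Actions workflow file."""
    findings = []
    run_indent = None  # column of the current `run:` key while inside a run step
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        # A line indented no deeper than the `run:` key ends that run step.
        if run_indent is not None and indent <= run_indent:
            run_indent = None
        run_match = RUN_RE.match(line)
        if run_match and run_indent is None:
            run_indent = len(run_match.group(1))
        if run_indent is not None:
            # Event data expanded directly into a shell script is injectable.
            if UNTRUSTED_EXPR_RE.search(line):
                findings.append((lineno, "expression-injection"))
            continue
        uses_match = USES_RE.match(line)
        if uses_match:
            ref = uses_match.group(1)
            is_local = ref.startswith(("./", "docker://"))
            if not is_local and not FULL_SHA_RE.search(ref):
                findings.append((lineno, "unpinned-action"))
        if RISKY_TRIGGER_RE.search(line):
            findings.append((lineno, "risky-trigger"))
        if BROAD_PERMISSIONS_RE.match(line):
            findings.append((lineno, "broad-permissions"))
    return findings
```

Here "pinned" means a reference to a full 40-character commit SHA. Tags and branch names are treated as unpinned because they can be moved after review.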

Rule and skill markdown with hidden or coercive instructions

Instruction files such as AGENTS.md, CODEX.md, .cursorrules, skill markdown, and rule packs are read directly by agents as behavioral instructions. These files can contain hidden Unicode characters, coercive directives, or prompt injection payloads embedded in normal-looking text.

Instruction files are a silent injection surface. Content that looks like documentation to a human reader can function as an adversarial prompt when consumed by an agent. This is documented by Snyk’s ToxicSkills research.

Detection layers:

Layer 2: RULE_INJECTION findings for suspicious pattern heuristics in rule/skill markdown. Unicode analysis detects hidden characters (bidirectional override, invisible separators, zero-width joiners) used for visual spoofing.
Layer 3 (opt-in): Text-only local instruction-file analysis through a supported meta-agent (currently Claude Code). CodeGate passes file content as inert text without executing referenced URLs.
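
The Unicode analysis mentioned for Layer 2 can be sketched with the standard library alone. The example below is a minimal illustration, not CodeGate’s detector. The function name and the kind labels are hypothetical. It reports bidirectional controls, other invisible format characters (Unicode category Cf, which includes zero-width joiners), and line or paragraph separators.

```python
import unicodedata

# Explicit bidirectional embedding, override, isolate, and mark characters.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200e", "\u200f",
}


def find_hidden_characters(text):
    """Return (line, column, code_point, kind) for each hidden character."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on U+2028/U+2029
    # and hide exactly the separators this check is looking for.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            if ch in BIDI_CONTROLS:
                kind = "bidi-control"
            elif category == "Cf":
                kind = "invisible-format"
            elif category in ("Zl", "Zp"):
                kind = "invisible-separator"
            else:
                continue
            hits.append((lineno, col, f"U+{ord(ch):04X}", kind))
    return hits
```

Some of these characters have legitimate uses, such as zero-width joiners in emoji sequences, so a hit is better treated as a prompt for review than as proof of an attack.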

Workspace settings and extension manifests

IDE workspace settings (settings.json, .vscode/, .cursor/) and extension manifests can reconfigure tool behavior, override security settings, or install additional extensions. A project-level settings file can lower security bars without the user noticing.

Detection layers:

Layer 1: Extension manifest discovery across known IDE config paths.
Layer 2: IDE_SETTINGS findings for risky workspace setting patterns. Controlled by check_ide_settings in config.

Files that change after the user trusted a project (rug-pull)

A project that appeared safe at initial review can change its configuration later. This is the “rug-pull” pattern: the user scans, trusts, and continues using a project, while the repository quietly modifies MCP servers, hooks, or rule files between scans.

Detection layers:

Scan-state baseline: CodeGate persists MCP config hashes in ~/.codegate/scan-state.json (configurable via scan_state_path).
NEW_SERVER findings when a previously unseen MCP server identifier appears.
CONFIG_CHANGE findings when a tracked server’s config hash changes.
codegate run post-scan recheck: immediately before launching a tool, CodeGate checks whether any config surface file has changed since the scan. If files changed, launch is blocked and a rescan is required.

Use --reset-state to intentionally clear the baseline when you want to re-establish trust from the current state.
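
The baseline comparison described in this section can be sketched as follows. This is an assumption-laden illustration rather than CodeGate’s code: the function names are hypothetical, and the baseline is modeled as a plain mapping from server identifier to SHA-256 hash, which may not match the actual layout of scan-state.json.

```python
import hashlib
import json


def config_hash(server_config):
    """Hash a server definition in a key-order-independent canonical form."""
    canonical = json.dumps(server_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_to_baseline(baseline, servers):
    """Compare current server definitions against a stored baseline.

    baseline: {server_id: hash} from the previous scan (assumed shape).
    servers:  {server_id: definition dict} discovered in the current scan.
    Returns (findings, new_baseline).
    """
    findings = []
    new_baseline = {}
    for name in sorted(servers):
        digest = config_hash(servers[name])
        new_baseline[name] = digest
        if name not in baseline:
            findings.append(("NEW_SERVER", name))
        elif baseline[name] != digest:
            findings.append(("CONFIG_CHANGE", name))
    return findings, new_baseline
```

Canonicalizing before hashing means that reordering keys does not register as a change, while any edit to the command, arguments, or environment does. Starting from an empty baseline models the effect of clearing state: every server is reported as new on the next scan.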

Finding categories as threat indicators

Each finding category maps to one or more threat types:

Finding category	Threat type
`COMMAND_EXEC`	Hooks, command templates, MCP command arrays
`ENV_OVERRIDE`	Credential exfiltration setup, environment manipulation
`CONSENT_BYPASS`	MCP auto-approval, prompt suppression
`RULE_INJECTION`	Instruction-file prompt injection
`IDE_SETTINGS`	Workspace security setting manipulation
`SYMLINK_ESCAPE`	Path traversal via symlinks in project files
`GIT_HOOK`	Persistent hook-based execution
`NEW_SERVER`	Previously unseen MCP server (rug-pull indicator)
`CONFIG_CHANGE`	Changed MCP server config since last scan (rug-pull indicator)
`TOXIC_FLOW`	Cross-tool manipulation via tool descriptions (Layer 3)
`PARSE_ERROR`	Malformed config that may evade static analysis

See Finding categories for the full reference including severity levels and remediation guidance.
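
To make one of these categories concrete, the sketch below shows one way a SYMLINK_ESCAPE condition could be detected: a symlink inside the project whose resolved target lies outside the project root. The function name is hypothetical, and this is not necessarily how CodeGate performs the check. It requires Python 3.9 or later for Path.is_relative_to.

```python
import os
from pathlib import Path


def find_symlink_escapes(project_root):
    """Return (link_path, resolved_target) for symlinks that leave the project."""
    root = Path(project_root).resolve()
    escapes = []
    # followlinks=False: symlinked directories are listed but never entered,
    # so the walk itself cannot be led outside the project.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in sorted(dirnames + filenames):
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            try:
                target = path.resolve()
            except (OSError, RuntimeError):
                # Symlink loops or unreadable links are skipped in this sketch.
                continue
            if not target.is_relative_to(root):
                escapes.append((str(path), str(target)))
    return escapes
```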

Layers of detection

Layer	Mechanism	Network	Opt-in
Layer 1	Discovery of known AI config files and installed tools	Offline	No
Layer 2	Static risk detection across discovered surfaces	Offline	No
Layer 3	Remote resource and meta-agent analysis	Online	Yes (`--deep`)
Layer 4	Remediation with backup and undo	Offline	Yes (`--remediate`)

Layers 1 and 2 run on every scan. Layer 3 requires explicit --deep and per-resource consent, and Layer 4 runs only with --remediate. This keeps the default scan surface minimal and auditable.
