code-agent-eval

TypeScript library for evaluating prompts against coding agents (Claude Code, Cursor, etc.) with multi-iteration testing and scoring

Version: 0.0.1-alpha.7

Version published: 4 days ago

Weekly downloads: 134

Maintainers: 1

Created: 5 months ago

Evaluate coding agent prompts (Claude Code, Cursor, etc.) by running them multiple times and scoring outputs. Test reliability, capture changes, measure success rates.

Key Principle: Your codebase stays untouched. All modifications happen in isolated temp directories.
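The isolation principle can be pictured as "copy, run, discard". The sketch below is illustrative only and is not taken from the package: `withIsolatedCopy` and `demo` are hypothetical names, and the library itself may copy files more selectively than this naive full copy does.

```typescript
import { cpSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: copy `projectDir` into a fresh temp directory,
// run `work` against the copy, then delete the copy.
async function withIsolatedCopy<T>(
  projectDir: string,
  work: (workDir: string) => Promise<T>,
): Promise<T> {
  // mkdtempSync appends random characters to the prefix to get a unique path.
  const workDir = mkdtempSync(join(tmpdir(), "eval-iteration-"));
  try {
    cpSync(projectDir, workDir, { recursive: true });
    return await work(workDir);
  } finally {
    // Always clean up, even when `work` throws.
    rmSync(workDir, { recursive: true, force: true });
  }
}

// Usage: the callback edits the copy; the original file is left as it was.
async function demo(): Promise<{ original: string; copy: string }> {
  const projectDir = mkdtempSync(join(tmpdir(), "demo-project-"));
  writeFileSync(join(projectDir, "app.txt"), "original");
  try {
    const copy = await withIsolatedCopy(projectDir, async (workDir) => {
      writeFileSync(join(workDir, "app.txt"), "modified by agent");
      return readFileSync(join(workDir, "app.txt"), "utf8");
    });
    const original = readFileSync(join(projectDir, "app.txt"), "utf8");
    return { original, copy };
  } finally {
    rmSync(projectDir, { recursive: true, force: true });
  }
}
```

Because every iteration gets its own directory, iterations cannot see each other's edits, which is what makes parallel runs safe to compare.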

Features

🔄 Multi-iteration runs with aggregate metrics (pass rate, mean/min/max, std dev)
⚡ Sequential, parallel, or rate-limited execution
🔒 Isolated temp directories per iteration
✅ Built-in scorers (build/test/lint), skillPickedUp for Skill invocations, plus custom scorers
📊 Git diff capture; with resultsDir, exports results.md, per-iteration logs, and results.json
🔧 Environment variable injection (static/dynamic)
🖥️ CLI (code-agent-eval) to run evals from a config file (--eval-file)
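To make the aggregate metrics in the first bullet concrete, here is a minimal sketch of how pass rate, mean, min, max, and standard deviation could be derived from per-iteration scores. Only the `passRate` field name appears in this README; the `AggregateScore` shape, the `aggregate` function, the pass threshold, and the choice of population (rather than sample) standard deviation are assumptions for illustration.

```typescript
// Hypothetical result shape; only `passRate` is a name used by the README.
interface AggregateScore {
  passRate: number;
  mean: number;
  min: number;
  max: number;
  stdDev: number;
}

// Assumption: each iteration yields a numeric score, and an iteration
// counts as a pass when its score reaches `passThreshold`.
function aggregate(scores: number[], passThreshold = 1): AggregateScore {
  if (scores.length === 0) {
    throw new Error("aggregate() needs at least one score");
  }
  const n = scores.length;
  const mean = scores.reduce((sum, s) => sum + s, 0) / n;
  // Population variance: divide by n, not n - 1.
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / n;
  return {
    passRate: scores.filter((s) => s >= passThreshold).length / n,
    mean,
    min: Math.min(...scores),
    max: Math.max(...scores),
    stdDev: Math.sqrt(variance),
  };
}
```

For four iterations scoring `[1, 1, 0, 1]`, three of four pass, so this sketch reports a pass rate and mean of 0.75.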

Installation

npm install code-agent-eval
# or
pnpm add code-agent-eval
# or
yarn add code-agent-eval
# or
bun add code-agent-eval

Quick Start

import { runClaudeCodeEval, scorers } from 'code-agent-eval';

const result = await runClaudeCodeEval({
  name: 'add-feature',
  prompts: [{ id: 'v1', prompt: 'Add a health check endpoint' }],
  projectDir: './my-app',
  iterations: 10,
  execution: { mode: 'parallel' }, // or 'sequential' (default), 'parallel-limit'
  scorers: [scorers.buildSuccess(), scorers.testSuccess()],
});

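// passRate is a fraction; format it to avoid floating-point noise such as 70.00000000000001%
console.log(`Pass rate: ${(result.aggregateScores._overall.passRate * 100).toFixed(1)}%`);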
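The three execution modes differ only in how many iterations are in flight at once. As a rough mental model (not the package's implementation), a concurrency-limited runner could look like the sketch below; `runWithLimit` is a hypothetical name and the README does not show how the limit for 'parallel-limit' is configured.

```typescript
// Run `tasks` with at most `limit` in flight; results keep input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

In this model a limit of 1 behaves like sequential mode, and a limit equal to the number of iterations behaves like fully parallel mode.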

CLI

Run an eval from a file whose default export (or a named export called config) is an EvalConfig:

npx code-agent-eval --eval-file ./examples/cli-test.ts

After a global install (npm install -g code-agent-eval), invoke code-agent-eval directly instead of going through npx. Run code-agent-eval --help to list every flag.

Eval files loaded via --eval-file may use import { scorers, … } from 'code-agent-eval'. The CLI resolves that specifier to the same package as the running binary, so npx works without installing code-agent-eval in the project (no local node_modules entry required for those imports).

Useful options: --json (results on stdout), --dry-run (validate config and print plan), --show-skill (print eval/skill guide), --iterations, --verbose, --results-dir. Env vars CODE_AGENT_EVAL_ITERATIONS, CODE_AGENT_EVAL_VERBOSE, CODE_AGENT_EVAL_RESULTS_DIR override config when set.
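The override rule for the three environment variables can be sketched as a small merge step. This is an assumption-laden illustration, not the CLI's code: `RunOptions` and `applyEnvOverrides` are hypothetical names, the accepted spellings for the verbose flag ("1" or "true") are a guess, and how command-line flags rank against these variables is not covered here because the README does not say.

```typescript
// Hypothetical subset of the config that the env vars can override.
interface RunOptions {
  iterations: number;
  verbose: boolean;
  resultsDir?: string;
}

function applyEnvOverrides(
  config: RunOptions,
  env: Record<string, string | undefined>,
): RunOptions {
  const merged: RunOptions = { ...config };

  const iterations = env.CODE_AGENT_EVAL_ITERATIONS;
  if (iterations !== undefined && iterations !== "") {
    const parsed = Number(iterations);
    if (!Number.isInteger(parsed) || parsed < 1) {
      throw new Error(`CODE_AGENT_EVAL_ITERATIONS must be a positive integer, got "${iterations}"`);
    }
    merged.iterations = parsed;
  }

  const verbose = env.CODE_AGENT_EVAL_VERBOSE;
  if (verbose !== undefined && verbose !== "") {
    // Assumed truthy spellings; the real CLI may accept others.
    merged.verbose = verbose === "1" || verbose.toLowerCase() === "true";
  }

  const resultsDir = env.CODE_AGENT_EVAL_RESULTS_DIR;
  if (resultsDir !== undefined && resultsDir !== "") {
    merged.resultsDir = resultsDir;
  }

  return merged;
}
```

Calling `applyEnvOverrides(config, process.env)` leaves the config untouched when none of the variables are set, which matches the "override config when set" wording above.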

When the process runs inside an agentic environment, JSON-style stdout may be selected automatically; use --no-agent-detect or CODE_AGENT_EVAL_AGENT_DETECT=0 to disable.

Development

npm install              # Install dependencies
npm run typecheck        # TypeScript check
npm run build            # Build library
npm run test             # Run tests

# Examples
npx tsx examples/phase1-single-run.ts
npx tsx examples/phase2-multi-iteration.ts
npx tsx examples/parallel-execution.ts
npx tsx examples/multi-prompt-parallel.ts
npx tsx examples/results-export.ts
npx tsx examples/plugin-execution.ts
npx code-agent-eval --eval-file ./examples/cli-test.ts

Documentation

See CLAUDE.md for agent context; expanded architecture, config, and scorer examples are in docs/claude/.

Requirements

Node.js 18+
ANTHROPIC_API_KEY for the Claude Agent SDK
Claude Code installed on the host, with its CLI authentication and environment already set up for agent runs

License

MIT

Package last updated on 28 Mar 2026
