code-agent-eval

TypeScript library for evaluating prompts against coding agents (Claude Code, Cursor, etc.) with multi-iteration testing and scoring

Version: 0.0.1-alpha.7 (latest)
Source: npm
Weekly downloads: 134 (-6.29%)
Maintainers: 1

code-agent-eval

Evaluate coding agent prompts (Claude Code, Cursor, etc.) by running them multiple times and scoring outputs. Test reliability, capture changes, measure success rates.

Key Principle: Your codebase stays untouched. All modifications happen in isolated temp directories.
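
The isolation idea can be pictured with a small sketch using only Node.js built-ins. This is an illustration of the principle, not the library's actual implementation; `createIsolatedCopy` and `removeIsolatedCopy` are hypothetical helper names that do not appear in the package.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: copy a project into a fresh temp directory so an
// agent can modify the copy while the original stays untouched.
function createIsolatedCopy(projectDir: string, label: string): string {
  // mkdtempSync appends random characters, so every call yields a unique directory.
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), `${label}-`));
  // Copy the project contents into the new directory.
  fs.cpSync(projectDir, workDir, { recursive: true });
  return workDir;
}

// Hypothetical helper: delete the working copy once the iteration is scored.
function removeIsolatedCopy(workDir: string): void {
  fs.rmSync(workDir, { recursive: true, force: true });
}
```

Calling `createIsolatedCopy` once per iteration gives each run its own directory, so concurrent iterations cannot see each other's edits.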

Features

  • 🔄 Multi-iteration runs with aggregate metrics (pass rate, mean/min/max, std dev)
  • ⚡ Sequential, parallel, or rate-limited execution
  • 🔒 Isolated temp directories per iteration
  • ✅ Built-in scorers (build/test/lint), skillPickedUp for Skill invocations, plus custom scorers
  • 📊 Git diff capture; with resultsDir, exports results.md, per-iteration logs, and results.json
  • 🔧 Environment variable injection (static/dynamic)
  • 🖥️ CLI (code-agent-eval) to run evals from a config file (--eval-file)
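
To make the aggregate metrics in the first bullet concrete, here is a minimal sketch of how a pass rate, mean, min, max, and standard deviation could be derived from per-iteration scores. The `AggregateStats` shape, the `aggregate` function, and the pass threshold are assumptions for illustration; the library's own result types may differ.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface AggregateStats {
  passRate: number;
  mean: number;
  min: number;
  max: number;
  stdDev: number;
}

// Summarize one numeric score per iteration.
// Assumption: a score at or above passThreshold counts as a pass.
function aggregate(scores: number[], passThreshold = 1): AggregateStats {
  if (scores.length === 0) {
    throw new Error('at least one score is required');
  }
  const n = scores.length;
  const mean = scores.reduce((sum, x) => sum + x, 0) / n;
  // Population variance (divides by n); a sample variance would divide by n - 1.
  const variance =
    scores.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / n;
  return {
    passRate: scores.filter((x) => x >= passThreshold).length / n,
    mean,
    min: Math.min(...scores),
    max: Math.max(...scores),
    stdDev: Math.sqrt(variance),
  };
}
```

For example, `aggregate([1, 1, 0, 1])` reports a pass rate and mean of 0.75, with a min of 0 and a max of 1.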

Installation

npm install code-agent-eval
# or
pnpm add code-agent-eval
# or
yarn add code-agent-eval
# or
bun add code-agent-eval

Quick Start

import { runClaudeCodeEval, scorers } from 'code-agent-eval';

const result = await runClaudeCodeEval({
  name: 'add-feature',
  prompts: [{ id: 'v1', prompt: 'Add a health check endpoint' }],
  projectDir: './my-app',
  iterations: 10,
  execution: { mode: 'parallel' }, // or 'sequential' (default), 'parallel-limit'
  scorers: [scorers.buildSuccess(), scorers.testSuccess()],
});

// toFixed avoids floating-point noise such as 70.00000000000001%
console.log(`Pass rate: ${(result.aggregateScores._overall.passRate * 100).toFixed(1)}%`);
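
The three execution modes named in the comment above differ mainly in how many iterations are in flight at once. The sketch below shows one common way to cap concurrency with a small worker pool. It assumes 'parallel-limit' means a fixed maximum number of concurrent iterations; `runWithLimit` is a hypothetical helper, not part of the library's API.

```typescript
// Run async tasks with at most `limit` in flight; results keep task order.
// limit = 1 behaves like sequential mode, limit = tasks.length like parallel mode.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (limit < 1) {
    throw new Error('limit must be at least 1');
  }
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been claimed yet

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because JavaScript runs the claim step (`next++`) synchronously, two workers never pick up the same task, so no locking is needed.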

CLI

Run an eval from a file that exports an EvalConfig, either as its default export or as a named export called config:

npx code-agent-eval --eval-file ./examples/cli-test.ts

After a global install (npm install -g code-agent-eval), run code-agent-eval directly instead of npx code-agent-eval. See code-agent-eval --help for every flag.

Eval files loaded via --eval-file may use import { scorers, … } from 'code-agent-eval'. The CLI resolves that specifier to the same package as the running binary, so npx works without installing code-agent-eval in the project (no local node_modules entry required for those imports).

Useful options: --json (results on stdout), --dry-run (validate config and print plan), --show-skill (print eval/skill guide), --iterations, --verbose, --results-dir. The environment variables CODE_AGENT_EVAL_ITERATIONS, CODE_AGENT_EVAL_VERBOSE, and CODE_AGENT_EVAL_RESULTS_DIR override the corresponding config values when set.
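
The override rule for environment variables can be sketched as a small pure function. Only the three variable names come from this README; the `RunOptions` shape, the `applyEnvOverrides` name, and the parsing rules (what counts as "set", how the verbose value is read) are assumptions for illustration and may not match the CLI's real behavior.

```typescript
// Hypothetical subset of run options; field names are illustrative.
interface RunOptions {
  iterations: number;
  verbose: boolean;
  resultsDir?: string;
}

// Return a copy of `config` with values replaced by any env vars that are set.
// Assumption: an unset or empty variable leaves the config value unchanged.
function applyEnvOverrides(
  config: RunOptions,
  env: Record<string, string | undefined>,
): RunOptions {
  const resolved: RunOptions = { ...config };

  const iterations = env['CODE_AGENT_EVAL_ITERATIONS'];
  if (iterations !== undefined && iterations !== '') {
    const parsed = Number.parseInt(iterations, 10);
    if (!Number.isInteger(parsed) || parsed < 1) {
      throw new Error(`invalid CODE_AGENT_EVAL_ITERATIONS: ${iterations}`);
    }
    resolved.iterations = parsed;
  }

  const verbose = env['CODE_AGENT_EVAL_VERBOSE'];
  if (verbose !== undefined && verbose !== '') {
    // Assumption: "0" and "false" switch verbose off; any other value switches it on.
    resolved.verbose = !(verbose === '0' || verbose.toLowerCase() === 'false');
  }

  const resultsDir = env['CODE_AGENT_EVAL_RESULTS_DIR'];
  if (resultsDir !== undefined && resultsDir !== '') {
    resolved.resultsDir = resultsDir;
  }

  return resolved;
}
```

Passing `process.env` as the second argument applies the overrides from the current shell; the input config object is left unmodified.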

When the process runs inside an agentic environment, JSON-style stdout may be selected automatically; use --no-agent-detect or CODE_AGENT_EVAL_AGENT_DETECT=0 to disable.

Development

npm install              # Install dependencies
npm run typecheck        # TypeScript check
npm run build            # Build library
npm run test             # Run tests

# Examples
npx tsx examples/phase1-single-run.ts
npx tsx examples/phase2-multi-iteration.ts
npx tsx examples/parallel-execution.ts
npx tsx examples/multi-prompt-parallel.ts
npx tsx examples/results-export.ts
npx tsx examples/plugin-execution.ts
npx code-agent-eval --eval-file ./examples/cli-test.ts

Documentation

See CLAUDE.md for agent context; expanded architecture, config, and scorer examples are in docs/claude/.

Requirements

  • Node.js 18+
  • ANTHROPIC_API_KEY for the Claude Agent SDK
  • Claude Code available on the host; agent runs expect CLI authentication or the corresponding environment to be set up

License

MIT

Keywords

code-agent

Package last updated on 28 Mar 2026
