
code-agent-eval
TypeScript library for evaluating prompts against coding agents (Claude Code, Cursor, etc.) with multi-iteration testing and scoring
Evaluate coding agent prompts (Claude Code, Cursor, etc.) by running them multiple times and scoring outputs. Test reliability, capture changes, measure success rates.
Key Principle: Your codebase stays untouched. All modifications happen in isolated temp directories.
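The isolation principle can be illustrated with a minimal sketch. This is not the library's implementation: withIsolatedCopy is a hypothetical helper, and it is synchronous for brevity, whereas a real agent run would be asynchronous. It only shows the general pattern of copying a project into a temp directory, working on the copy, and discarding it afterwards.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of code-agent-eval): copy a project into a
// fresh temp directory, run a callback against the copy, then delete the copy.
function withIsolatedCopy<T>(projectDir: string, run: (workDir: string) => T): T {
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "eval-"));
  try {
    // All changes made by `run` land in workDir, never in projectDir.
    fs.cpSync(projectDir, workDir, { recursive: true });
    return run(workDir);
  } finally {
    // Clean up the copy whether the callback succeeded or threw.
    fs.rmSync(workDir, { recursive: true, force: true });
  }
}
```

Whatever the callback writes inside workDir is removed in the finally block, so the original project directory is only ever read.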
… skillPickedUp for Skill invocations, plus custom scorers
… resultsDir, exports results.md, per-iteration logs, and results.json
… code-agent-eval) to run evals from a config file (--eval-file)
npm install code-agent-eval
# or
pnpm add code-agent-eval
# or
yarn add code-agent-eval
# or
bun add code-agent-eval
import { runClaudeCodeEval, scorers } from 'code-agent-eval';

const result = await runClaudeCodeEval({
  name: 'add-feature',
  prompts: [{ id: 'v1', prompt: 'Add a health check endpoint' }],
  projectDir: './my-app',
  iterations: 10,
  execution: { mode: 'parallel' }, // or 'sequential' (default), 'parallel-limit'
  scorers: [scorers.buildSuccess(), scorers.testSuccess()],
});

console.log(`Pass rate: ${result.aggregateScores._overall.passRate * 100}%`);
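How the library derives passRate is not spelled out here, so the following is only a sketch of one plausible definition: an iteration counts as passed when every scorer reaches a threshold, and the pass rate is the fraction of passed iterations. The IterationScores shape, the passRate function, and the threshold parameter are all assumptions made for illustration.

```typescript
// Hypothetical result shape: one entry per iteration, mapping a scorer name
// to a score between 0 and 1. Not the library's actual result type.
type IterationScores = Record<string, number>;

// Assumed rule: an iteration passes only if every scorer meets the threshold.
function passRate(iterations: IterationScores[], threshold = 1): number {
  if (iterations.length === 0) return 0;
  const passed = iterations.filter((scores) =>
    Object.values(scores).every((score) => score >= threshold)
  ).length;
  return passed / iterations.length;
}

const runs: IterationScores[] = [
  { build: 1, test: 1 },
  { build: 1, test: 0 },
  { build: 1, test: 1 },
  { build: 0, test: 0 },
];

console.log(`Pass rate: ${passRate(runs) * 100}%`); // prints "Pass rate: 50%"
```

Running the same prompt several times matters because agent output varies between runs; a single success says little about reliability.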
Run an eval from a file that exports an EvalConfig, either as the default export or as a named export called config:
npx code-agent-eval --eval-file ./examples/cli-test.ts
After npm install -g code-agent-eval, run code-agent-eval directly instead of npx code-agent-eval. See code-agent-eval --help for every flag.
Eval files loaded via --eval-file may use import { scorers, … } from 'code-agent-eval'. The CLI resolves that specifier to the same package as the running binary, so npx works without installing code-agent-eval in the project (no local node_modules entry required for those imports).
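A minimal eval file might look like the sketch below. The file name my-eval.ts is hypothetical, and the fields are copied from the quick-start call to runClaudeCodeEval on the assumption that an eval file accepts the same configuration shape.

```typescript
// my-eval.ts (hypothetical file name)
// Assumes the config fields match those passed to runClaudeCodeEval.
import { scorers } from 'code-agent-eval';

export default {
  name: 'add-feature',
  prompts: [{ id: 'v1', prompt: 'Add a health check endpoint' }],
  projectDir: './my-app',
  iterations: 3,
  scorers: [scorers.buildSuccess(), scorers.testSuccess()],
};
```

Such a file could then be checked with npx code-agent-eval --eval-file ./my-eval.ts --dry-run before a full run.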
Useful options: --json (results on stdout), --dry-run (validate config and print plan), --show-skill (print eval/skill guide), --iterations, --verbose, --results-dir. Env vars CODE_AGENT_EVAL_ITERATIONS, CODE_AGENT_EVAL_VERBOSE, CODE_AGENT_EVAL_RESULTS_DIR override config when set.
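The override behavior of the environment variables can be sketched as follows. RunOptions and applyEnvOverrides are hypothetical names, and the parsing rules (positive integers for iterations, "1" or "true" for verbose) are assumptions, since the accepted values are not documented here.

```typescript
// Hypothetical option shape; only the three documented settings are modeled.
interface RunOptions {
  iterations: number;
  verbose: boolean;
  resultsDir: string;
}

type Env = Record<string, string | undefined>;

// A set, non-empty environment variable wins over the config value.
function applyEnvOverrides(config: RunOptions, env: Env): RunOptions {
  const resolved: RunOptions = { ...config };

  const iterations = env.CODE_AGENT_EVAL_ITERATIONS;
  if (iterations !== undefined && iterations !== "") {
    const parsed = Number.parseInt(iterations, 10);
    // Assumption: ignore values that are not positive integers.
    if (Number.isInteger(parsed) && parsed > 0) resolved.iterations = parsed;
  }

  const verbose = env.CODE_AGENT_EVAL_VERBOSE;
  if (verbose !== undefined && verbose !== "") {
    // Assumption: "1" or "true" enable verbose output; anything else disables it.
    resolved.verbose = verbose === "1" || verbose.toLowerCase() === "true";
  }

  const resultsDir = env.CODE_AGENT_EVAL_RESULTS_DIR;
  if (resultsDir !== undefined && resultsDir !== "") resolved.resultsDir = resultsDir;

  return resolved;
}

const config: RunOptions = { iterations: 10, verbose: false, resultsDir: "./results" };
const resolved = applyEnvOverrides(config, { CODE_AGENT_EVAL_ITERATIONS: "3" });
console.log(resolved.iterations); // prints 3
```

In a real process the second argument would be process.env; passing it explicitly keeps the sketch easy to test.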
When the process runs inside an agentic environment, JSON-style stdout may be selected automatically; use --no-agent-detect or CODE_AGENT_EVAL_AGENT_DETECT=0 to disable.
npm install # Install dependencies
npm run typecheck # TypeScript check
npm run build # Build library
npm run test # Run tests
# Examples
npx tsx examples/phase1-single-run.ts
npx tsx examples/phase2-multi-iteration.ts
npx tsx examples/parallel-execution.ts
npx tsx examples/multi-prompt-parallel.ts
npx tsx examples/results-export.ts
npx tsx examples/plugin-execution.ts
npx code-agent-eval --eval-file ./examples/cli-test.ts
See CLAUDE.md for agent context; expanded architecture, config, and scorer examples are in docs/claude/.
ANTHROPIC_API_KEY for the Claude Agent SDK
License: MIT