
Security News
Axios Maintainer Confirms Social Engineering Attack Behind npm Compromise
Axios compromise traced to social engineering, showing how attacks on maintainers can bypass controls and expose the broader software supply chain.
@llm-dev-ops/latency-lens
Advanced tools
High-precision LLM latency profiler - WebAssembly bindings for measuring token throughput, TTFT, and cost metrics
High-precision LLM latency profiler powered by WebAssembly. Measure token throughput, Time to First Token (TTFT), inter-token latency, and cost metrics for OpenAI, Anthropic, and other LLM providers.
npm install @llm-dev-ops/latency-lens
npm install -g @llm-dev-ops/latency-lens
After installing globally, you can use the CLI:
# Show help
latency-lens help
# Show version
latency-lens version
# Run a test to see metrics in action
latency-lens test
latency-lens version - Display version information
latency-lens test - Run a simulated metrics collection test
latency-lens help - Show usage information
import { LatencyCollector } from '@llm-dev-ops/latency-lens';
// Create collector with 60-second window
const collector = new LatencyCollector(60000);
// Start tracking a request
const requestId = collector.start_request('openai', 'gpt-4-turbo');
// Record first token received
collector.record_first_token(requestId);
// Record each subsequent token
collector.record_token(requestId);
collector.record_token(requestId);
// ... more tokens
// Complete the request
collector.complete_request(
requestId,
150, // input tokens
800, // output tokens
null, // thinking tokens (optional)
0.05 // cost in USD
);
// Get aggregated metrics
const metrics = collector.get_metrics();
console.log('TTFT P95:', metrics.ttft_distribution.p95_ms, 'ms');
console.log('Throughput:', metrics.throughput.tokens_per_second, 'tokens/sec');
import { LatencyCollector } from '@llm-dev-ops/latency-lens';
const collector = new LatencyCollector(30000);
async function trackOpenAIRequest(prompt) {
const reqId = collector.start_request('openai', 'gpt-4-turbo');
const stream = await openai.chat.completions.create({
model: 'gpt-4-turbo',
messages: [{ role: 'user', content: prompt }],
stream: true
});
let firstToken = true;
for await (const chunk of stream) {
if (firstToken) {
collector.record_first_token(reqId);
firstToken = false;
} else {
collector.record_token(reqId);
}
}
collector.complete_request(reqId, 100, 500, null, 0.025);
}
// Track multiple requests
await Promise.all([
trackOpenAIRequest('What is AI?'),
trackOpenAIRequest('Explain quantum computing'),
trackOpenAIRequest('Write a poem')
]);
// Analyze performance
const metrics = collector.get_metrics();
console.log('Performance Report:');
console.log('===================');
console.log(`Total requests: ${metrics.total_requests}`);
console.log(`Success rate: ${(metrics.success_rate * 100).toFixed(2)}%`);
console.log(`TTFT P50: ${metrics.ttft_distribution.p50_ms.toFixed(2)}ms`);
console.log(`TTFT P95: ${metrics.ttft_distribution.p95_ms.toFixed(2)}ms`);
console.log(`Total cost: $${metrics.total_cost_usd.toFixed(4)}`);
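The streaming example above passes a hard-coded `0.025` as the `cost_usd` argument of `complete_request`. In practice you would compute that value from the token counts and your provider's pricing. A minimal sketch of such a helper, where the per-million-token rates are placeholders (not real prices; check your provider's current pricing):

```javascript
// Hypothetical per-million-token rates; substitute your provider's
// actual pricing, which this sketch does not attempt to track.
const RATES = {
  'gpt-4-turbo': { inputPerMTok: 10.0, outputPerMTok: 30.0 }
};

// Estimate request cost in USD from token counts and a rate table.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const r = RATES[model];
  if (!r) throw new Error(`No rate entry for model: ${model}`);
  return (inputTokens * r.inputPerMTok + outputTokens * r.outputPerMTok) / 1_000_000;
}

// The result can be passed as the cost_usd argument of complete_request.
const cost = estimateCostUsd('gpt-4-turbo', 100, 500);
```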
Main class for collecting metrics.
new LatencyCollector(window_ms: number)
window_ms - Time window in milliseconds for metrics aggregation
start_request(provider: string, model: string): string
Start tracking a new request. Returns a unique request ID.
record_first_token(request_id: string): void
Record when the first token is received (measures TTFT).
record_token(request_id: string): void
Record each subsequent token received.
complete_request(request_id: string, input_tokens: number, output_tokens: number, thinking_tokens: number | null, cost_usd: number): void
Mark the request as complete and record final metrics.
record_failure(request_id: string, error: string): void
Mark the request as failed.
get_metrics(): Metrics
Get aggregated metrics for all requests.
reset(): void
Clear all collected metrics.
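The examples above only exercise the success path; record_failure is not demonstrated. A minimal sketch of the failure path, assuming the method signatures listed here. A stub object stands in for LatencyCollector so the snippet runs on its own; with the real package you would call the same methods on a LatencyCollector instance:

```javascript
// Stub standing in for LatencyCollector so this sketch is self-contained;
// with the real package, import LatencyCollector and use it directly.
const collector = {
  start_request: (provider, model) => `${provider}:${model}:${Date.now()}`,
  record_first_token: (id) => {},
  record_token: (id) => {},
  complete_request: (id, inTok, outTok, thinkTok, costUsd) => {},
  record_failure: (id, error) => { collector.lastError = error; }
};

async function trackWithFailureHandling(makeRequest) {
  const reqId = collector.start_request('openai', 'gpt-4-turbo');
  try {
    const { inputTokens, outputTokens, costUsd } = await makeRequest();
    collector.complete_request(reqId, inputTokens, outputTokens, null, costUsd);
  } catch (err) {
    // record_failure marks the request as failed, which is reflected
    // in the failed_requests and success_rate fields of get_metrics().
    collector.record_failure(reqId, String(err));
    throw err;
  }
}
```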
{
session_id: string,
start_time: string,
end_time: string,
total_requests: number,
successful_requests: number,
failed_requests: number,
success_rate: number,
ttft_distribution: {
min_ms: number,
max_ms: number,
mean_ms: number,
p50_ms: number,
p90_ms: number,
p95_ms: number,
p99_ms: number,
p99_9_ms: number,
stddev_ms: number
},
inter_token_distribution: { /* same as ttft_distribution */ },
total_latency_distribution: { /* same as ttft_distribution */ },
throughput: {
tokens_per_second: number,
requests_per_second: number
},
total_input_tokens: number,
total_output_tokens: number,
total_thinking_tokens: number | null,
total_cost_usd: number | null,
avg_cost_per_request: number | null,
provider_breakdown: [string, number][],
model_breakdown: [string, number][]
}
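The distribution blocks above report fixed percentiles over raw latency samples. As a rough illustration of how such fields can be derived, here is a nearest-rank percentile over an array of TTFT samples; the library's internal estimator is not documented here and may differ (e.g. it may interpolate):

```javascript
// Nearest-rank percentile over raw latency samples (illustrative only;
// the package's internal estimator may use a different definition).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const ttftSamplesMs = [120, 95, 180, 150, 110, 240, 130, 100, 160, 90];
const summary = {
  p50_ms: percentile(ttftSamplesMs, 50),
  p95_ms: percentile(ttftSamplesMs, 95),
  p99_ms: percentile(ttftSamplesMs, 99)
};
```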
Built with Rust and WebAssembly for maximum performance.
Requires a modern browser with WebAssembly support.
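Before loading a WebAssembly-backed module like this one, you can feature-detect support at runtime. A small sketch of one common check (works in both browsers and Node.js):

```javascript
// Feature-detect WebAssembly before importing the wasm-backed module.
function wasmSupported() {
  return typeof WebAssembly === 'object' &&
         typeof WebAssembly.instantiate === 'function';
}

if (!wasmSupported()) {
  console.warn('WebAssembly unavailable; latency-lens will not load.');
}
```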
Apache-2.0