agent-evaluation

Name: agent-evaluation
Author: abeltennyson

by abeltennyson · ai-agents

"Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent."
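
As a rough illustration of the behavioral testing and reliability metrics named in the description, the sketch below runs each test case several times and reports the fraction of runs that pass. It is not taken from the skill itself: `pass_rate`, `evaluate`, the `(prompt, check)` case format, and the stub agents are hypothetical names chosen for this example, and the agent is assumed to be a plain callable from prompt string to output string.

```python
from itertools import count
from typing import Callable, Dict, Tuple

Agent = Callable[[str], str]    # assumption: an agent maps a prompt to text
Check = Callable[[str], bool]   # behavioral check applied to the agent output


def pass_rate(agent: Agent, prompt: str, check: Check, trials: int = 5) -> float:
    """Fraction of repeated runs whose output satisfies the check."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(1 for _ in range(trials) if check(agent(prompt)))
    return passed / trials


def evaluate(agent: Agent,
             cases: Dict[str, Tuple[str, Check]],
             trials: int = 5) -> Dict[str, float]:
    """Per-case pass rate, keyed by case name."""
    return {
        name: pass_rate(agent, prompt, check, trials)
        for name, (prompt, check) in cases.items()
    }


# Hypothetical stand-ins for a real agent, used only to exercise the harness.
def upper_agent(prompt: str) -> str:
    return prompt.upper()


def make_flaky_agent() -> Agent:
    """Agent that answers correctly only on every other call."""
    calls = count()

    def agent(prompt: str) -> str:
        return prompt.upper() if next(calls) % 2 == 0 else "error"

    return agent


cases = {"uppercases input": ("hi", lambda out: out == "HI")}
stable_report = evaluate(upper_agent, cases, trials=4)
flaky_report = evaluate(make_flaky_agent(), cases, trials=4)
```

Repeating each case matters because agent output can vary from run to run: a single passing run says little about reliability, while a pass rate over several trials separates a stable agent from a flaky one.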

openclaw skills install abel-agent-evaluation

Or ask OpenClaw: "Install the agent-evaluation skill"

