Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.
Automated metrics provide fast, repeatable, scalable evaluation using computed scores.
Text Generation:
Classification:
Retrieval (RAG):
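As a rough illustration of what "computed scores" can look like for the three categories above, here is a minimal pure-Python sketch. The specific metrics (token-level F1, binary precision/recall/F1, recall@k, reciprocal rank) and all function names are assumptions chosen for illustration; the listing does not say which metrics the skill itself uses.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Text generation: F1 over whitespace tokens shared with the reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def binary_prf(y_true, y_pred, positive=1):
    """Classification: precision, recall, and F1 for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def recall_at_k(retrieved, relevant, k):
    """Retrieval: fraction of relevant documents found in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """Retrieval: 1 / rank of the first relevant document, or 0 if none."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical inputs, for demonstration only.
    print(token_f1("the cat", "the cat sat"))
    print(binary_prf([1, 0, 1, 1], [1, 1, 0, 1]))
    print(recall_at_k(["d3", "d1", "d7"], {"d1", "d2"}, k=2))
    print(reciprocal_rank(["d3", "d1", "d7"], {"d1", "d2"}))
```

Scores like these are cheap to recompute on every change, which is what makes them repeatable; averaging them over a fixed test set gives a single number to track between runs.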
npx skills add wshobson/agents --skill llm-evaluation
How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.
Mostly clear, but there are still a few confusing or poorly structured parts.
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.
Partially actionable with several concrete steps, but still missing important details.