Evaluate skills: trigger testing, A/B benchmarks, structure validation, head-to-head bake-offs.
Measure and improve skill quality through empirical testing — because structure doesn't guarantee behavior, and measurement beats assumption. Also covers head-to-head bake-offs of two peer implementations of the same artifact (Mode F).
| Signal | Load These Files | Why |
|---|---|---|
| tasks related to this reference | schemas.md | Loads detailed guidance from schemas.md. |
| tasks related to this reference | self-improve-loop.md | Loads detailed guidance from self-improve-loop.md. |
| "bake-off", "head-to-head", "compare implementations", "grade two versions", "which Feynman skill is better" | bake-off-methodology.md | Loads the bake-off rubric, anti-rationalization gate, fold-filter, and worked Feynman example. |
Step 1: Identify the skill
# Validate skill structure first
python3 -m scripts.skill_eval.quick_validate <path/to/skill>
This checks: SKILL.md exists, valid frontmatter, required fields (name, description), kebab-case naming, description under 1024 chars, no angle brackets.
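The checks listed above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual `quick_validate` implementation: the names `parse_frontmatter` and `check_skill` are hypothetical, the frontmatter parser handles only single-line `key: value` pairs (no multi-line YAML values), and the angle-bracket check is assumed to apply to the description field.

```python
import re
from pathlib import Path

# Lowercase alphanumeric words joined by single hyphens, e.g. "skill-eval".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
MAX_DESCRIPTION_CHARS = 1024  # "description under 1024 chars"


def parse_frontmatter(text):
    """Return top-level `key: value` pairs from a leading --- block, or None.

    Hypothetical helper: only single-line values are supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return None  # the closing --- was never found


def check_skill(skill_dir):
    """Return a list of problems; an empty list means every check passed."""
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md not found"]
    fields = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
    if fields is None:
        return ["missing or unterminated frontmatter"]
    problems = []
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing required field: {required}")
    name = fields.get("name", "")
    if name and not KEBAB_CASE.match(name):
        problems.append(f"name is not kebab-case: {name}")
    description = fields.get("description", "")
    if len(description) >= MAX_DESCRIPTION_CHARS:
        problems.append("description is 1024 characters or longer")
    # Assumption: the angle-bracket rule targets the description field.
    if "<" in description or ">" in description:
        problems.append("description contains angle brackets")
    return problems
```

Calling `check_skill("path/to/skill")` returns an empty list when every check passes. Returning a list of problems instead of raising on the first failure lets a caller report all structural issues in one pass.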
npx skills add notque/vexjoy-agent --skill skill-eval
How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.
Mostly clear, but there are still a few confusing or poorly structured parts.
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.
Mostly actionable with clear steps; only a few small gaps remain.