SkillsBench task authoring — walk a contributor from idea to submission-ready task following CONTRIBUTING.md and the task-implementation rubric. Use when the user wants to create a new SkillsBench task, scaffold a task from an existing workflow (notebook, Excel workbook, document, dataset), convert a prompt or a benchmark item into a SkillsBench task, write skills for a task, or prepare a SkillsBench PR. Pairs with `task-review` (run that as a self-check before submitting).
Build a task that scores well on the task principles. When you're done, you should have two artifacts: a directory under `tasks/<task-id>/` that `bench tasks check` accepts, and a PR description that maps cleanly to the PR template.
1. propose → one-paragraph proposal, gut-check against the proposal rubric
2. scaffold → `bench tasks init`, plus the layout described below
3. instruction → formal, outcome-focused, equivalent to source (NOT verbatim)
4. environment → Dockerfile + bundled inputs; do NOT bake skills
5. tests → 4–10 test functions, parametrize for bulk; check formulas AND values
6. solution → human-written oracle that derives answers, not hardcodes them
7. skills → 2–3 generalizable skills (or reuse existing ones from `/tasks/*/environment/skills/`)
8. validate → `bench tasks check` + `bench eval create -a oracle` (must reach 1.0)
9. self-review → invoke the `task-review` skill on the local path
10. agent runs → Opus 4.7 / latest Codex with and without skills
11. submit → PR with the table the template asks for
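Step 5's test shape (a handful of test functions, parametrized case tables for bulk checks, verifying both the derivation and the final value) can be sketched roughly as below. This is a hypothetical example, not from the source: the task id, the CSV output format, and the column names are all illustrative assumptions, and a real task would read the file the instruction tells the agent to produce.

```python
import csv
import io

# Hypothetical agent output; a real test would read the file produced
# by the agent inside the task environment.
AGENT_OUTPUT = "region,units,price,total\nnorth,10,2.5,25.0\nsouth,4,3.0,12.0\n"

def load_rows(text):
    """Parse the agent's CSV output into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

# Case table: one entry per row to check, so adding bulk cases stays
# cheap (mirrors how pytest.mark.parametrize would be used).
CASES = [
    ("north", 25.0),
    ("south", 12.0),
]

def check_value(rows, region, expected_total):
    """Check the reported value matches the expected total."""
    row = next(r for r in rows if r["region"] == region)
    assert float(row["total"]) == expected_total

def check_formula(rows, region):
    """Check the value is derivable (units * price), not merely hardcoded."""
    row = next(r for r in rows if r["region"] == region)
    assert float(row["total"]) == float(row["units"]) * float(row["price"])

rows = load_rows(AGENT_OUTPUT)
for region, expected in CASES:
    check_value(rows, region, expected)
    check_formula(rows, region)
```

Checking the formula as well as the value is what keeps a hardcoding solution from passing; the same pattern works whether the inputs are a CSV, a notebook output, or an Excel workbook.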
`npx skills add benchflow-ai/skillsbench --skill task-creator`