launching-evals

Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.

Best for LLM evaluation pipelines · Works with GitHub · Low risk

#nemo #evaluator #launcher #llm #evaluation #slurm #hpc

⌘source

author: @NVIDIA
repo: NVIDIA/skills
language: Python

✦overview.md

Key Features

·Run LLM evaluations via CLI
·Check evaluation status and live progress
·Debug failed runs with logs and artifacts
·Export artifacts and logs
·Analyze evaluation results
·Supports Slurm job monitoring

Use Cases

→Run a single evaluation task with a custom config
→Monitor progress of multiple evaluation jobs on a cluster
→Debug a failed evaluation by inspecting logs and artifacts
→Analyze evaluation results to compare model performance
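For the debugging use case, once logs are available locally (the quick reference copies them with `info <invocation_id> --copy-logs ./evaluation-results/`), a first pass often amounts to grepping for error lines. The following is a minimal sketch under stated assumptions: the directory layout and log file names are not specified on this page, so the code simply walks every file under the directory, and the error patterns are illustrative guesses rather than strings the launcher is known to emit. The helper name `scan_logs` is hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Illustrative patterns only (assumption); adjust to what the real logs contain.
ERROR_PATTERN = re.compile(r"error|traceback|exception|out of memory", re.IGNORECASE)


def scan_logs(
    root: str, pattern: re.Pattern = ERROR_PATTERN
) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line_number, line) for every line matching `pattern`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Directory taken from the --copy-logs example in the quick reference.
    for path, lineno, line in scan_logs("./evaluation-results/"):
        print(f"{path}:{lineno}: {line}")
```

Walking every file keeps the sketch independent of how client and server logs are actually named; narrowing the glob is a reasonable refinement once the real layout is known.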

Best for

✓LLM evaluation pipelines
✓HPC clusters with Slurm
✓ML teams running periodic evals

Not ideal for

!Creating or modifying evaluation configs
!Evaluations outside the NeMo ecosystem

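FAQs

Does this skill create or modify evaluation configs?
No. It covers running, monitoring, debugging, exporting, and analyzing evaluations; creating or modifying configs is out of scope.

Can I use it with any HPC scheduler?
Slurm is the only scheduler this page mentions.

How do I preview a run without executing it?
Add --dry-run to the run command; it previews the resolved config and the sbatch script without running the evaluation.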

skills/Model-Optimizer/launching-evals/SKILL.md

name

launching-evals

description

license

Apache-2.0

NeMo Evaluator Skill

Quick Reference

nemo-evaluator-launcher CLI

# Run evaluation
uv run nemo-evaluator-launcher run --config <path.yaml>
uv run nemo-evaluator-launcher run --config <path.yaml> -t <a_single_task_to_be_run_by_name>
uv run nemo-evaluator-launcher run --config <path.yaml> -t <task_name_1> -t <task_name_2> ...
uv run nemo-evaluator-launcher run --config <path.yaml> -o evaluation.nemo_evaluator_config.config.params.limit_samples=10 ...

# Preview the resolved config and the sbatch script without running the evaluation
uv run nemo-evaluator-launcher run --config <path.yaml> --dry-run

# Check status (--json for machine-readable output)
uv run nemo-evaluator-launcher status <invocation_id> --json

# Get evaluation run info (output paths, slurm job IDs, cluster hostname, etc.)
uv run nemo-evaluator-launcher info <invocation_id>

# Copy just the logs (quick — good for debugging)
uv run nemo-evaluator-launcher info <invocation_id> --copy-logs ./evaluation-results/

# For artifacts: use `nel info` to discover paths. If remote, SSH to explore and rsync what you need.
# If local, just read directly from the paths shown by `nel info`.
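The invocations above can also be assembled programmatically, for example from a wrapper script. The following is a minimal sketch that uses only the flags shown in this quick reference (`--config`, `-t`, `-o`, `--dry-run`, `--json`, `--copy-logs`). The helper names `build_run_cmd`, `build_status_cmd`, and `build_copy_logs_cmd` are hypothetical and not part of nemo-evaluator-launcher, and passing one `-o` per override is an assumption, since the reference shows only a single override.

```python
import shlex
from typing import Dict, Iterable, List, Optional

# Command prefix used throughout the quick reference.
LAUNCHER = ["uv", "run", "nemo-evaluator-launcher"]


def build_run_cmd(
    config: str,
    tasks: Iterable[str] = (),
    overrides: Optional[Dict[str, object]] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a `run` invocation as an argv list (hypothetical helper)."""
    cmd = LAUNCHER + ["run", "--config", config]
    for task in tasks:
        cmd += ["-t", task]  # one -t per task name, as in the reference
    for key, value in (overrides or {}).items():
        cmd += ["-o", f"{key}={value}"]  # assumption: one -o per override
    if dry_run:
        cmd.append("--dry-run")  # preview only, nothing is run
    return cmd


def build_status_cmd(invocation_id: str, as_json: bool = True) -> List[str]:
    """Assemble a `status` invocation; --json requests machine-readable output."""
    cmd = LAUNCHER + ["status", invocation_id]
    if as_json:
        cmd.append("--json")
    return cmd


def build_copy_logs_cmd(invocation_id: str, dest: str) -> List[str]:
    """Assemble an `info --copy-logs` invocation that copies logs to `dest`."""
    return LAUNCHER + ["info", invocation_id, "--copy-logs", dest]


if __name__ == "__main__":
    # Print the command instead of executing it; "eval.yaml" and "task_a"
    # are placeholder values.
    print(shlex.join(build_run_cmd("eval.yaml", tasks=["task_a"], dry_run=True)))
```

Returning argv lists rather than shell strings means the result can be handed directly to `subprocess.run(cmd, check=True)` without quoting concerns. The structure of the `status --json` output is not described here, so parsing it is left out.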

...

$install

1-click copy

npx skills add NVIDIA/skills --skill launching-evals

Safety assessment

★

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

4/5

very good

Clear and well structured, with only minor parts that might need a second read.

◎

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

4/5

high

Mostly actionable with clear steps; only a few small gaps remain.

~community cookbook

April 29, 2026

