verl-async-dapo

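Verl single-async DAPO training configuration generator. Triggers: (1) launching single-async DAPO training, (2) generating training scripts, (3) configuring feature parameters, (4) pre-training checks.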

Best for: Ascend hardware training · Works with: GitHub · High risk

#verl #dapo #training #ascend

source

author: @ascend-ai-coding
repo: ascend-ai-coding/awesome-ascend-skills
language: Python

overview.md

Key Features

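·Interactive launch flow
·Performance features enabled by default
·Memory features enabled automatically on OOM
·Container selection supported
·SwanLab monitoring integration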

Use Cases

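→Launching single-async DAPO training
→Generating training scripts
→Configuring training feature parameters
→Checking the environment before training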

Best for

✓Training on Ascend hardware
✓DAPO algorithm experiments
✓Experiments that need monitoring

Not ideal for

!Non-async DAPO scenarios
!Environments without containers

FAQs

external/gitcode-ascend/verl-async-dapo/SKILL.md

name

external-gitcode-ascend-verl-async-dapo

description

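Verl single-async DAPO training configuration generator. Triggers: (1) launching single-async DAPO training, (2) generating training scripts, (3) configuring feature parameters, (4) pre-training checks. **Feature policy**: when the user does not specify otherwise, performance features (flash_attn/dynamic_batch/remove_padding/gradient_checkpointing) are enabled by default and memory features (offload/recompute) are disabled by default. On OOM, memory features are added automatically and the run is retried. **Training monitoring**: after launch, the SwanLab link is printed so the user can check progress on their own; the user is notified only when an error occurs. **Dependent skill**: SwanLab configuration is provided by the swanlab-setup skill.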

license

UNKNOWN

original-name:verl-async-dapo

synced-from:https://gitcode.com/Ascend/agent-skills

synced-date:2026-04-29

synced-commit:8faee0275e457955c8f50989aef8972c0838db31

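Verl Single-Async DAPO Training

Interactive Launch Flow

Before launching the training service, the Agent must ask the user the following, in this order:

1. Container selection

Detected the following containers:
  [1] jins (running)
  [2] Create a new container

Select a container (1/2):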
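The container prompt above can be generated from the list of running containers. This is a minimal sketch, assuming containers are managed with the Docker CLI; `print_container_menu` and `list_running_containers` are hypothetical helper names, not part of the skill's scripts.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helpers): build the container-selection prompt.

# Print a numbered menu for the given container names, plus a "create new" entry.
print_container_menu() {
  local i=1 name
  echo "Detected the following containers:"
  for name in "$@"; do
    echo "  [$i] $name (running)"
    i=$((i + 1))
  done
  echo "  [$i] Create a new container"
}

# List running container names, one per line (assumes the docker CLI is available).
list_running_containers() {
  docker ps --format '{{.Names}}'
}
```

Usage: `mapfile -t names < <(list_running_containers); print_container_menu "${names[@]}"`. With a single running container named `jins`, this prints the same three lines as the prompt above.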

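2. Proxy configuration (when creating a new container or when there are network issues)

Please provide your proxy settings (reply "none" if not needed):
  http_proxy: ?
  https_proxy: ?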
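A minimal sketch of how the proxy answers could be passed into the selected container, assuming they are injected with `docker exec -e`. `build_proxy_args` and the `PROXY_ARGS` array are hypothetical; the skill may set proxies in a different way.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): turn the proxy answers into `docker exec -e` arguments.
# A reply of "none" (or an empty answer) leaves that proxy variable unset.
build_proxy_args() {
  local http_in="${1:-}" https_in="${2:-}"
  PROXY_ARGS=()
  if [ -n "$http_in" ] && [ "$http_in" != "none" ]; then
    PROXY_ARGS+=(-e "http_proxy=$http_in")
  fi
  if [ -n "$https_in" ] && [ "$https_in" != "none" ]; then
    PROXY_ARGS+=(-e "https_proxy=$https_in")
  fi
}
```

Usage: `build_proxy_args "http://proxy:8080" "http://proxy:8080"; docker exec "${PROXY_ARGS[@]}" -it jins bash` opens a shell in `jins` with both proxy variables set. Building an array instead of a string keeps URLs intact even if they contain special characters.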

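3. Feature selection

Any custom feature requirements? (reply "default" to use the default configuration)

Performance features (all enabled by default):
  - flash_attn: Flash Attention acceleration
  - dynamic_batch: dynamic batch size
  - remove_padding: remove-padding optimization
  - gradient_checkpointing: gradient checkpointing

Memory features (disabled by default, enabled automatically on OOM):
  - offload: parameter/optimizer offloading
  - recompute: recomputation

Optional features:
  - prefix_cache: Prefix Cache
  - chunked_prefill: Chunked Prefill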
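The default rule above (performance features on, memory and optional features off) can be sketched as follows. This assumes the reply is either "default" (or empty) or an explicit space-separated feature list; `resolve_features` is a hypothetical helper, and how `scripts/run_dapo.sh` actually receives features is not shown in this document.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): resolve the feature-selection reply.
PERF_FEATURES="flash_attn dynamic_batch remove_padding gradient_checkpointing"
MEM_FEATURES="offload recompute"   # left off here; added only after an OOM

resolve_features() {
  local reply="${1:-}"
  if [ -z "$reply" ] || [ "$reply" = "default" ]; then
    # No custom request: enable every performance feature, nothing else.
    echo "$PERF_FEATURES"
  else
    # Assumption: a custom reply is taken as the explicit list of features to enable.
    echo "$reply"
  fi
}
```

Usage: `resolve_features default` prints the four performance features; `resolve_features "flash_attn prefix_cache"` returns that list unchanged.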

4. SwanLab configuration (if not already configured)

SwanLab monitoring configuration:
  Host: ?
  API Key: ?
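A sketch of the "if not already configured" check, reusing the `SWANLAB_HOST` and `SWANLAB_API_KEY` variables from the quick-start command. Treating two non-empty variables as "configured" is an assumption; the swanlab-setup skill may check more than this. `swanlab_configured` and `ensure_swanlab` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helpers): ask for SwanLab settings only when missing.

# Succeeds when both SwanLab variables are non-empty.
swanlab_configured() {
  [ -n "${SWANLAB_HOST:-}" ] && [ -n "${SWANLAB_API_KEY:-}" ]
}

# Prompt for the missing settings; the API key is read without echoing it.
ensure_swanlab() {
  if ! swanlab_configured; then
    read -r -p "SwanLab Host: " SWANLAB_HOST
    read -rs -p "SwanLab API Key: " SWANLAB_API_KEY
    echo
    export SWANLAB_HOST SWANLAB_API_KEY
  fi
}
```

Usage: call `ensure_swanlab` before the quick-start script; if both variables are already exported, no prompt is shown.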

Quick Start

# Option 1: quick-start script (recommended)
CONTAINER=jins TRAIN_STEPS=100 \
SWANLAB_HOST=http://10.143.2.129:8000 \
SWANLAB_API_KEY=your-key \
bash scripts/quick_start.sh

# Option 2: step by step
# 2.1 Pre-training checks
bash scripts/preflight_check.sh

# 2.2 Launch single-async DAPO training (with the default features)
export TRAIN_STEPS=4
bash scripts/run_dapo.sh

# 2.3 Launch One-Step-Off-Policy training
export TRAIN_STEPS=4
bash scripts/run_one_step_off_policy.sh
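The OOM fallback from the feature policy (memory features appended automatically, then a retry) could wrap the launch scripts roughly like this. Hypothetical assumptions: features reach the trainer through a `FEATURES` environment variable, and an OOM is detected by searching the run log for "out of memory". `run_with_oom_retry` is not part of the skill's scripts.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical assumptions:
#   - the trainer reads its enabled features from the FEATURES env var
#   - an OOM appears in the run log as "out of memory" (e.g. "NPU out of memory")
MEM_FEATURES="offload recompute"

# Usage: run_with_oom_retry LOG_FILE COMMAND [ARGS...]
run_with_oom_retry() {
  local log="$1" rc=0
  shift
  # First attempt with the currently selected features.
  FEATURES="${FEATURES:-}" "$@" >"$log" 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    return 0
  fi
  # Only an OOM triggers the memory-feature fallback; other failures are returned as-is.
  if grep -qi "out of memory" "$log"; then
    echo "OOM detected; retrying with memory features: $MEM_FEATURES" >&2
    rc=0
    FEATURES="${FEATURES:-} $MEM_FEATURES" "$@" >"$log" 2>&1 || rc=$?
  fi
  return "$rc"
}
```

Usage: `FEATURES="flash_attn dynamic_batch remove_padding gradient_checkpointing" run_with_oom_retry dapo.log bash scripts/run_dapo.sh`. Retrying only once keeps a persistent failure from looping, and non-OOM errors are surfaced to the user unchanged.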

One-Step-Off-Policy Training

One-Step-Off-Policy is a resource-isolated single-async training mode: the Trainer and the Rollout run on separate GPU groups.

# Basic usage
export TRAIN_STEPS=100
export TRAINER_GPUS=4

...
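To illustrate the resource isolation, the sketch below splits one node's devices into disjoint Trainer and Rollout ranges. It assumes an 8-device node and that Rollout receives every device not assigned to the Trainer; `split_gpus` is a hypothetical helper, not the logic of `scripts/run_one_step_off_policy.sh`.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper). Assumption: Rollout gets every device
# that is not assigned to the Trainer, so the two groups never overlap.
split_gpus() {
  local total="$1" trainer="$2"
  # Both groups need at least one device.
  if [ "$trainer" -le 0 ] || [ "$trainer" -ge "$total" ]; then
    echo "TRAINER_GPUS must be between 1 and $((total - 1))" >&2
    return 1
  fi
  echo "trainer=0-$((trainer - 1)) rollout=${trainer}-$((total - 1))"
}
```

Usage: with `TRAINER_GPUS=4`, `split_gpus 8 "$TRAINER_GPUS"` prints `trainer=0-3 rollout=4-7`.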

install

npx skills add ascend-ai-coding/awesome-ascend-skills --skill verl-async-dapo

Safety assessment

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

3/5

good

Mostly clear, but there are still a few confusing or poorly structured parts.

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

4/5

high

Mostly actionable with clear steps; only a few small gaps remain.

community cookbook

April 30, 2026

verl-async-dapo

Key Features

Use Cases

Best for

Not ideal for

FAQs

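How do I start training quickly?

What is the default feature configuration behavior?

How do I view training monitoring?

Verl Single-Async DAPO Training

Interactive Launch Flow

1. Container selection

2. Proxy configuration (when creating a new container or when there are network issues)

3. Feature selection

4. SwanLab configuration (if not already configured)

Quick Start

One-Step-Off-Policy Training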

Safety assessment

Clarity score

Actionability score

community cookbook

you might also like

sn-image-doctor

setting-up-compose-hotswan

resume-session

enforcing-stability-in-ci

ping

aspireify

AI Skill Finder
