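Run a comparative analysis of two sets of categorical data: compute the count differences and proportions, and generate visualization charts.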
This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.
Step1 Read the file and count the total rows across all sheets to assess whether large-file optimization is needed.
import pandas as pd

# Count the rows in every sheet to decide on a processing strategy
file_path = "input_data.xlsx"
sheet_names = pd.ExcelFile(file_path).sheet_names
total_rows = 0
for sheet in sheet_names:
    # Read only the first column so the count stays fast
    df_tmp = pd.read_excel(file_path, sheet_name=sheet, usecols=[0])
    total_rows += len(df_tmp)
print(f"Total rows across all sheets: {total_rows}")
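The decision that follows the row count can be sketched as a simple threshold check. This is a minimal sketch, not the parent workflow's actual rule: the function name `choose_strategy`, the labels `"direct"` and `"optimized"`, and the 100,000-row cutoff are all hypothetical and should be replaced with whatever the parent SKILL.md specifies.

```python
# Hypothetical cutoff; the parent workflow may define a different one.
LARGE_FILE_ROW_THRESHOLD = 100_000


def choose_strategy(total_rows: int, threshold: int = LARGE_FILE_ROW_THRESHOLD) -> str:
    """Return "optimized" when the row count exceeds the threshold, else "direct"."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    return "optimized" if total_rows > threshold else "direct"


# Example with an illustrative row count (not taken from a real file)
strategy = choose_strategy(250_000)
print(f"Processing strategy: {strategy}")
```

Keeping the cutoff as a named constant makes it easy to tune once the real file sizes are known.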
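Step2 Extract the category values for each comparison dimension and clean the data: drop null values, forward-fill merged cells, and exclude non-data rows.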
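# Load the sheet to analyze (assumption: the comparison columns are on the first sheet)
df = pd.read_excel(file_path, sheet_name=0)
# Define the target column names
target_col_a = "category_a_column"
target_col_b = "category_b_column"
# Forward-fill (ffill) to restore values left blank by merged cells
df[target_col_a] = df[target_col_a].ffill()
df[target_col_b] = df[target_col_b].ffill()
# Exclude header-row placeholders (e.g. '代码' "code", '名称' "name") and null values
exclude_vals = ["代码", "名称"]
data_a = df[target_col_a].dropna()
data_a = data_a[~data_a.isin(exclude_vals)]
data_b = df[target_col_b].dropna()
data_b = data_b[~data_b.isin(exclude_vals)]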
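Step3 Count the entries in each category, compute the difference and each category's share, and build a multi-dimensional comparison table.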
count_a = len(data_a)
count_b = len(data_b)
total_count = count_a + count_b
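The difference and share calculations that Step3 describes are not shown above, so here is one way they might look. This is a sketch under assumptions: the helper name `summarize_comparison`, the output keys, and the example counts 120 and 80 are illustrative only and do not come from the original skill.

```python
def summarize_comparison(count_a: int, count_b: int) -> dict:
    """Build one summary row: counts, difference, shares, and A-to-B ratio."""
    total = count_a + count_b
    # Guard against division by zero when a group (or both) is empty
    share_a = count_a / total if total else 0.0
    share_b = count_b / total if total else 0.0
    ratio_a_to_b = count_a / count_b if count_b else None
    return {
        "count_a": count_a,
        "count_b": count_b,
        "total": total,
        "difference": count_a - count_b,
        "share_a": share_a,
        "share_b": share_b,
        "ratio_a_to_b": ratio_a_to_b,
    }


# Illustrative counts only
summary = summarize_comparison(120, 80)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If a tabular result is preferred, the dictionary can be wrapped as `pd.DataFrame([summary])` to get a one-row comparison table.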
npx skills add OpenSenseNova/SenseNova-Skills --skill comparison-analysis