comparison-analysis

Compares two sets of categorical data, computing count differences and ratios and generating visual charts.

Best for Excel data analysis tasks · Works with GitHub · Low risk

#excel #data-analysis #comparison #visualization

⌘source

author: @OpenSenseNova
repo: OpenSenseNova/SenseNova-Skills
language: Python

✦overview.md

Key Features

·Row count estimation across sheets
·Data cleaning with ffill
·Category count and ratio computation
·Statistical summary table generation

Use Cases

→Comparing categorical data across two columns in Excel
→Analyzing distribution differences between two categories

Best for

✓Excel data analysis tasks
✓Categorical data comparison

Not ideal for

!Large datasets without Parquet optimization

FAQs

skills/sn-da-excel-workflow/capability/excel-data-analysis/comparison-analysis/SKILL.md

name: categorical-comparison-analysis
description: Compares two sets of categorical data, computing count differences and ratios and generating visual charts.

categorical-comparison-analysis

This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.

Step1 Read the file and count the total rows across all sheets to assess whether large-file optimization is needed.

import pandas as pd

# Count the rows in every sheet to decide on a processing strategy
file_path = "input_data.xlsx"
total_rows = 0
with pd.ExcelFile(file_path) as xls:
    for sheet in xls.sheet_names:
        # Read only the first column so the count stays fast
        df_tmp = pd.read_excel(xls, sheet_name=sheet, usecols=[0])
        total_rows += len(df_tmp)

print(f"Total rows across all sheets: {total_rows}")
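The row count above is meant to drive the choice between a direct read and the large-file path described in the parent workflow. The source does not state the cutoff, so the following is only a minimal sketch: the threshold value and the `choose_read_strategy` helper are hypothetical names introduced for illustration.

```python
# Hypothetical cutoff: the source does not say at which row count
# large-file optimization should start, so this value is an assumption.
LARGE_FILE_ROW_THRESHOLD = 100_000


def choose_read_strategy(total_rows: int, threshold: int = LARGE_FILE_ROW_THRESHOLD) -> str:
    """Return "parquet" for large workbooks, otherwise "direct"."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    return "parquet" if total_rows >= threshold else "direct"


print(choose_read_strategy(1_200))
print(choose_read_strategy(250_000))
```

Keeping the decision in one small function makes the cutoff easy to adjust once the parent workflow's actual limit is known.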

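Step2 Extract the categorical values for the two comparison dimensions and clean the data: drop empty values, fill merged cells, and exclude non-data rows.

# Target column names
target_col_a = "category_a_column"
target_col_b = "category_b_column"

# Assumption: both columns live on the first sheet of file_path (from Step1)
df = pd.read_excel(file_path)

# Fill merged cells (ffill) before cleaning
df[target_col_a] = df[target_col_a].ffill()
df[target_col_b] = df[target_col_b].ffill()

# Exclude header-row placeholders (e.g. '代码', '名称') and empty values
exclude_vals = ["代码", "名称"]
data_a = df[target_col_a].dropna()
data_a = data_a[~data_a.isin(exclude_vals)]

data_b = df[target_col_b].dropna()
data_b = data_b[~data_b.isin(exclude_vals)]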
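To see what the cleaning step does to a sheet with merged cells, here is a small self-contained sketch on toy data. The `clean_category_column` helper and the sample values are assumptions made for illustration; only the ffill, dropna, and placeholder-exclusion sequence comes from the step above.

```python
import pandas as pd


def clean_category_column(df: pd.DataFrame, column: str, exclude_values: list[str]) -> pd.Series:
    """Forward-fill merged cells, then drop nulls and header placeholders."""
    filled = df[column].ffill()
    filled = filled.dropna()
    return filled[~filled.isin(exclude_values)]


# Toy frame imitating a sheet where merged cells are read as missing values
# and a header placeholder ('代码') appears among the data rows.
sample = pd.DataFrame(
    {
        "category_a_column": ["代码", "A01", None, "A02", None],
        "category_b_column": [None, "代码", "B01", None, "B02"],
    }
)

placeholders = ["代码", "名称"]
data_a = clean_category_column(sample, "category_a_column", placeholders)
data_b = clean_category_column(sample, "category_b_column", placeholders)
print(data_a.tolist())
print(data_b.tolist())
```

Because the fill runs before the exclusion, a blank cell directly under a placeholder row inherits the placeholder and is dropped with it; whether that is desirable depends on the sheet layout.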

Step3 Count each category, compute the difference and the share of each, and build a multi-dimensional comparison summary table.

count_a = len(data_a)
count_b = len(data_b)
total_count = count_a + count_b

...
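The rest of the original snippet is elided. As one possible continuation, and not the skill's actual code, the counts can be turned into a summary table with shares and differences. The `build_comparison_table` helper and its column names are hypothetical.

```python
import pandas as pd


def build_comparison_table(count_a: int, count_b: int,
                           label_a: str = "A", label_b: str = "B") -> pd.DataFrame:
    """Summarize two category counts with their shares and mutual difference."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("both categories are empty, shares are undefined")
    return pd.DataFrame(
        {
            "category": [label_a, label_b],
            "count": [count_a, count_b],
            # Share of the combined total, as a percentage
            "share_pct": [round(100 * count_a / total, 2),
                          round(100 * count_b / total, 2)],
            # Signed difference against the other category
            "diff_vs_other": [count_a - count_b, count_b - count_a],
        }
    )


table = build_comparison_table(30, 10)
print(table)
```

Raising on an empty total avoids a silent division by zero when both columns are cleaned down to nothing.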

$install

npx skills add OpenSenseNova/SenseNova-Skills --skill comparison-analysis

Safety assessment

★

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

3/5

good

Mostly clear, but there are still a few confusing or poorly structured parts.

◎

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

2/5

low

Some hints are present, but an agent still has to guess many steps.

~community cookbook

~you might also like

duplicate-value-coloring

★70

testing · #excel

[✓] from @OpenSenseNova

Compares specific coefficients across multiple Excel sheets and color-marks outliers.

April 30, 2026

numeric-format-normalization

★70

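testing · #excel

[✓] from @OpenSenseNova

Normalizes and cleans numeric formats in Excel data, supports a Parquet conversion workflow for large-scale data, and reconciles the totals of key metrics before exporting the result file.

April 30, 2026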

bar-chart-visualization

★70

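testing · #excel

[✓] from @OpenSenseNova

Reads multi-sheet Excel files, handles merged cells and data cleaning automatically, runs cross-group statistics to produce a result table with a totals row, and draws a polished bar chart with Chinese and English font support. Suited to multi-dimensional data summarization and visual analysis.

April 30, 2026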

percentage-calculation

★70

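testing · #excel

[✓] from @OpenSenseNova

Switches to a large-file strategy (Parquet conversion) based on the file's row count, extracts key metrics by row-by-row scanning or column matching, computes statistics such as shares and means, and outputs a structured Excel report with charts.

April 30, 2026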

kpi-metric-analysis

★70

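testing · #analysis

[✓] from @OpenSenseNova

Selects a read strategy automatically based on data volume (large files are converted to Parquet), extracts key metrics for unit-consistency validation and sorting analysis, and outputs a downloadable result table.

April 30, 2026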

category-statistics

★70

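documentation · #python

[✓] from @OpenSenseNova

Extracts a specified category column, computes the count and share of each category, and generates a high-resolution combined report with bar charts, pie charts, and similar visuals. Suited to analyzing the distribution of categorical data.

April 30, 2026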
