category-coloring

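When an Excel file has more than 10,000 rows in total, convert it to Parquet for faster reads, extract the target metric and compute its maximum, then write the result to Excel and highlight specific rows.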

Best for: large-volume Excel processing · Works with: GitHub · Low risk

#parquet #excel #data-analysis #large-file #highlight

Source

author: @OpenSenseNova
repo: OpenSenseNova/SenseNova-Skills
language: Python

overview.md

Key Features

·Count the total rows in an Excel file
·Switch to Parquet reading automatically
·Extract the maximum of a target metric
·Highlight specific rows
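The row highlighting listed above is cut off from the Skill Steps on this page. A minimal sketch with openpyxl, assuming the result table is written by `DataFrame.to_excel` without its index and that the rows to highlight are given as DataFrame positions; the helper name `write_with_highlight` and the yellow `FFFF00` fill are illustrative choices, not the skill's own.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill


def write_with_highlight(df: pd.DataFrame, out_path: str, highlight_positions,
                         color: str = "FFFF00") -> None:
    """Hypothetical helper: write df to Excel, then fill the chosen rows with a solid color."""
    # Without the index, the header is Excel row 1, so df row i lands on Excel row i + 2
    df.to_excel(out_path, index=False, engine="openpyxl")

    wb = load_workbook(out_path)
    ws = wb.active
    fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
    for pos in highlight_positions:
        # ws[n] returns the cells of row n up to the last used column
        for cell in ws[pos + 2]:
            cell.fill = fill
    wb.save(out_path)
```

For example, passing the position of the row that holds the maximum marks that row in the output workbook.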

Use Cases

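→Analyze Excel reports with more than 10,000 rows
→Quickly find the category with the highest metric value
→Troubleshoot performance bottlenecks and extract data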

Best for

✓Large-volume Excel processing
✓Metric analysis reports

Not ideal for

!Small datasets
!Real-time queries

FAQs

Which Excel formats are supported?

How do I specify the target metric?

skills/sn-da-excel-workflow/capability/excel-cell-coloring/category-coloring/SKILL.md

name

large-file-parquet-analysis-and-highlight

description

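When an Excel file has more than 10,000 rows in total, convert it to Parquet for faster reads, extract the target metric and compute its maximum, then write the result to Excel and highlight specific rows.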

Skill Steps

Step 1 Read the file, count the rows in every sheet, and print the combined total, which determines whether large-file processing is needed.

import pandas as pd

file_path = "input_data.xlsx"

# Open the workbook once and count the rows of every sheet
with pd.ExcelFile(file_path) as xls:
    sheet_names = xls.sheet_names
    print(f"Sheets: {sheet_names}")

    total_rows = 0
    for sheet in sheet_names:
        # Parse only the first column to keep the count cheap;
        # header=None means header rows are included in the count
        df_temp = xls.parse(sheet_name=sheet, usecols=[0], header=None)
        rows = len(df_temp)
        total_rows += rows
        print(f"Sheet '{sheet}': {rows} rows")

print(f"\nTotal rows = {total_rows}")
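Step 1 prints the total but leaves the switch to Parquet implicit, and the conversion itself is delegated to a separate large-file skill that this page does not include. A minimal sketch of both, assuming the 10,000-row threshold from Step 2; `needs_parquet` and `convert_sheet_to_parquet` are hypothetical names, and the conversion is an illustration rather than that skill's implementation. Writing Parquet requires pyarrow (or fastparquet), string column names, and a single type per column, which is why the sketch renames the columns and stores mixed text/number columns as strings.

```python
import pandas as pd

# Assumed threshold, taken from Step 2 ("total rows >= 10,000")
LARGE_FILE_THRESHOLD = 10_000


def needs_parquet(total_rows: int, threshold: int = LARGE_FILE_THRESHOLD) -> bool:
    """Return True when the workbook is large enough to read via Parquet."""
    return total_rows >= threshold


def convert_sheet_to_parquet(file_path: str, sheet: str, parquet_path: str) -> pd.DataFrame:
    """Hypothetical helper: save one sheet as Parquet (needs pyarrow or fastparquet)."""
    # header=None keeps every Excel row as data, so Excel row 2 stays at index 1
    df = pd.read_excel(file_path, sheet_name=sheet, header=None)
    # Parquet requires string column names; header=None produces 0, 1, 2, ...
    df.columns = [str(c) for c in df.columns]
    # Columns mixing text and numbers cannot be stored as one Parquet type; keep them as text
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype("string")
    df.to_parquet(parquet_path, index=False)
    return df
```

With Step 1's `total_rows`, `needs_parquet(total_rows)` decides whether to convert the workbook and then read `converted_data.parquet` in Step 2.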

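Step 2 When the total is 10,000 rows or more, read the data file that has already been converted to Parquet, extract the target metric's data by matching rows and columns, and find the maximum value and its category.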

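import pandas as pd

# Assumes the Excel file was already converted to Parquet by a large-file processing skill
parquet_path = "converted_data.parquet"
df = pd.read_parquet(parquet_path)

# Assumes row 2 (index 1) holds the category headers (e.g. holding type, region)
header_row = df.iloc[1].tolist()
print("Category headers:", header_row)

# Find the row of the target metric (placeholder example: 'Target Metric Name')
target_metric = 'Target Metric Name'
# Select the first column by position: Parquet stores column names as strings, so df[0] fails
target_rows = df[df.iloc[:, 0].eq(target_metric).fillna(False)]

if not target_rows.empty:
    # Extract the values (every column after the metric name)
    values = target_rows.iloc[0, 1:].tolist()

    # Clean the data, then find the maximum and its category
    numeric_values = []
    for val in values:
        try:
            numeric_values.append(float(val))
        except (TypeError, ValueError):
            # Treat non-numeric cells as 0 (this can hide a maximum that is negative)
            numeric_values.append(0)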

...
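Step 2 is cut off before the maximum is computed. A minimal sketch of the remaining logic, assuming the layout described in Step 2 (category labels in row index 1, metric names in the first column, values in the columns after it); `find_max_category` is a hypothetical name. It swaps the loop's 0-fill for `pd.to_numeric(errors="coerce")`, so unparseable cells become NaN and are skipped by `idxmax` instead of competing as zeros, which matters when every real value is negative.

```python
import pandas as pd


def find_max_category(df: pd.DataFrame, target_metric: str, header_row_idx: int = 1):
    """Hypothetical helper: return (category, value) for the largest value of target_metric.

    Assumes metric names sit in the first column, category labels in row
    header_row_idx, and numeric values in the remaining columns.
    Returns None when the metric is missing or has no numeric values.
    """
    categories = df.iloc[header_row_idx, 1:]
    # Match by position; fillna guards against missing names in nullable string columns
    mask = df.iloc[:, 0].eq(target_metric).fillna(False).astype(bool)
    if not mask.any():
        return None
    # Unparseable cells become NaN and are ignored by idxmax
    values = pd.to_numeric(df.loc[mask].iloc[0, 1:], errors="coerce")
    if values.isna().all():
        return None
    best_col = values.idxmax()  # column label of the largest value
    return categories.loc[best_col], float(values.loc[best_col])
```

For example, `find_max_category(df, target_metric)` returns the category label and value to report, or `None` if the metric row is missing.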

Install

npx skills add OpenSenseNova/SenseNova-Skills --skill category-coloring

Safety assessment

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

3/5 (good)

Mostly clear, but there are still a few confusing or poorly structured parts.

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

3/5 (medium)

Partially actionable, with several concrete steps, but important details are still missing.

You might also like

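line-chart-visualization

★70 · performance · #data visualization · from @OpenSenseNova · April 30, 2026

Extracts structured data, cleans features, and runs cluster analysis, then generates multi-dimensional composite charts covering trend comparison, distribution characteristics, and parameter sensitivity. Suited to trend forecasting and multi-dimensional comparison scenarios.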

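trend-analysis

★70 · performance · #data trends · from @OpenSenseNova · April 30, 2026

Performs tiered evaluation and trend forecasting on multi-dimensional data, computes forecast values using differentiated growth rates, and generates comparison charts. Suited to performance evaluation and target setting.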

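kpi-metric-analysis

★70 · testing · #analysis · from @OpenSenseNova · April 30, 2026

Chooses a reading strategy automatically based on data volume (large files are converted to Parquet), extracts key metrics for unit-consistency checks and ranking analysis, and outputs a downloadable result table.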

outlier-coloring

★70 · testing · #excel · from @OpenSenseNova · April 30, 2026

Identifies out-of-range values and error cells in Excel and highlights them.

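time-series-analysis

★70 · performance · #time series · from @OpenSenseNova · April 30, 2026

Performs multi-dimensional trend analysis, percentage cleaning, performance-tier modeling, and forecasting on time-series or categorical data, and produces high-resolution composite visualization reports. Suited to business metric monitoring and forecasting.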

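histogram-visualization

★70 · testing · #statistical-analysis · from @OpenSenseNova · April 30, 2026

Performs distribution analysis and outlier detection on numeric data, supports extracting error terms from text with regular expressions, and produces high-resolution box plot and histogram reports.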
