threshold-filtering

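Automatically chooses a processing strategy based on the size of the Excel data, cleans numeric columns, applies conditional filtering, and uses openpyxl to style the matching cells and export the result.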

Best for Data wrangling tasks · Works with GitHub · Low risk

#excel #threshold #data cleaning #styling #pandas #openpyxl

⌘source

author: @OpenSenseNova
repo: OpenSenseNova/SenseNova-Skills
language: Python

✦overview.md

Key Features

·Row counting across all sheets
·Numeric coercion and cleaning
·Conditional filtering by threshold
·Cell styling with openpyxl

Use Cases

→Analyze large Excel files for data quality before processing
→Filter and highlight rows exceeding a numeric threshold
→Clean misformatted numeric columns for downstream analysis

Best for

✓Data wrangling tasks
✓Excel report automation

Not ideal for

!Non-numeric analysis
!Real-time processing

FAQs

skills/sn-da-excel-workflow/capability/excel-data-filtering/threshold-filtering/SKILL.md

name

excel-threshold-analysis-and-styling

description

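Automatically chooses a processing strategy based on the size of the Excel data, cleans numeric columns, applies conditional filtering, and uses openpyxl to style the matching cells and export the result.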

Excel Threshold Analysis and Styling

Note: This sub-skill covers one step of the Excel analysis workflow. For the full pipeline (file reading, row counting, large-file optimization, export), see the parent workflow SKILL.md.

Step 1 Read the row count of every worksheet in the Excel file and sum them to gauge the size of the data.

import pandas as pd

file_path = 'input_file.xlsx'

# Read all sheet names and count the total number of rows
xls = pd.ExcelFile(file_path)
sheet_names = xls.sheet_names
total_rows = 0

for sheet in sheet_names:
    # header=None counts every row, including the header row;
    # passing xls reuses the already opened workbook
    df_tmp = pd.read_excel(xls, sheet_name=sheet, header=None)
    rows = len(df_tmp)
    total_rows += rows
    print(f"Sheet '{sheet}': {rows} rows")

print(f"\nTotal rows: {total_rows}")
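The description says the processing strategy is chosen from the data size, but this sub-skill does not state the cutoff or name the strategies (large-file handling is deferred to the parent workflow). The sketch below only illustrates the idea: the function name `choose_strategy`, the 100,000-row cutoff, and the labels `"in_memory"` and `"large_file"` are all assumptions, not values from the skill.

```python
def choose_strategy(total_rows, large_file_rows=100_000):
    """Pick a processing strategy from the summed row count.

    large_file_rows is a hypothetical cutoff; tune it to the
    memory available and to the parent workflow's guidance.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    # Hypothetical labels: small files are loaded whole, large ones
    # are handed to whatever large-file path the parent workflow defines.
    return "large_file" if total_rows > large_file_rows else "in_memory"


print(choose_strategy(2_500))
print(choose_strategy(750_000))
```

In practice the value passed in would be the `total_rows` computed in Step 1.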

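Step 2 Clean the target sheet: convert non-numeric values in the specified column to missing values and drop those rows, so the column ends up with a numeric dtype.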

target_sheet = 'Sheet1'
target_col = '数量'  # name of the column to process (here, "quantity")
header_idx = 1      # row index of the header (0-based)

df = pd.read_excel(file_path, sheet_name=target_sheet, header=header_idx)

# Coerce to numeric; values that cannot be converted become NaN and are dropped
df[target_col] = pd.to_numeric(df[target_col], errors='coerce')
df_cleaned = df.dropna(subset=[target_col])

print(f"Cleaning done, valid rows: {len(df_cleaned)}")
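To make the effect of `errors='coerce'` concrete, here is a small in-memory sketch of the same cleaning step. The helper `clean_numeric_column` and the sample data are made up for illustration. It also reports how many rows were dropped, which can help when auditing data quality.

```python
import pandas as pd


def clean_numeric_column(df, col):
    """Return (cleaned_df, dropped_count) after coercing `col` to numeric."""
    out = df.copy()  # leave the caller's frame untouched
    out[col] = pd.to_numeric(out[col], errors="coerce")
    cleaned = out.dropna(subset=[col])
    return cleaned, len(out) - len(cleaned)


# Made-up sample: two parseable values and two that are not
sample = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "qty": ["12", "N/A", "7.5", "-"],
})

cleaned, dropped = clean_numeric_column(sample, "qty")
print(cleaned)
print(f"dropped {dropped} rows")
```

Note that cells that were already empty in the sheet are dropped together with the unparseable ones, so the dropped count covers both cases.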

Step 3 Filter the records that meet a given numeric condition and compute summary statistics.

filter_threshold = 10

...
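The remainder of this step is not shown in this listing. As a rough illustration of the filtering and openpyxl styling that the description mentions, and not the skill's own code, the sketch below writes a DataFrame to a workbook and fills the cells in one column whose value exceeds the threshold. The function name `export_highlighted`, the yellow fill, the strict `>` comparison, and the sample data are all assumptions.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import PatternFill


def export_highlighted(df, col, threshold, out_path):
    """Write df to out_path, filling cells in `col` whose value exceeds threshold.

    Returns the number of highlighted cells.
    """
    # Assumed style: solid yellow background
    fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")
    col_idx = df.columns.get_loc(col) + 1  # openpyxl columns are 1-based

    wb = Workbook()
    ws = wb.active
    ws.append([str(c) for c in df.columns])  # header goes in row 1

    hits = 0
    # Data rows start at row 2; tolist() yields plain Python values
    for row_idx, values in enumerate(df.to_numpy().tolist(), start=2):
        ws.append(values)
        if values[col_idx - 1] > threshold:
            ws.cell(row=row_idx, column=col_idx).fill = fill
            hits += 1

    wb.save(out_path)
    return hits


# Made-up data standing in for the cleaned frame from Step 2
demo = pd.DataFrame({"item": ["a", "b", "c"], "qty": [5.0, 12.0, 30.0]})
print(export_highlighted(demo, "qty", 10, "highlighted_demo.xlsx"))
```

With the names from the steps above, the call would look like `export_highlighted(df_cleaned, target_col, filter_threshold, 'output.xlsx')`, where the output file name is again a placeholder.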

$install

npx skills add OpenSenseNova/SenseNova-Skills --skill threshold-filtering

Safety assessment

★

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

4/5

very good

Clear and well structured, with only minor parts that might need a second read.

◎

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

3/5

medium

Partially actionable with several concrete steps, but still missing important details.
