Count the total number of rows across a multi-sheet Excel workbook and choose a processing strategy based on data size, extract values along a specific dimension and deduplicate them for counting, then generate summary and detail reports.
This sub-skill covers one capability of the Excel workflow. For reading/counting/Parquet optimization, see the parent workflow SKILL.md.
Step 1: Load the target sheet and run an initial data preview and structure check.
```python
import pandas as pd

file_path = 'input_file.xlsx'
target_sheet = 'Sheet1'  # set to the actual sheet name

# header=None handles files with no header row or a non-standard header
df = pd.read_excel(file_path, sheet_name=target_sheet, header=None)
print(f"Shape: {df.shape}")
print("First 5 rows:")
print(df.head())
```
Step 2: Iterate over the rows, extract target values by keyword, and clean the data (strip whitespace, filter out empty values).
```python
import pandas as pd

# Target column index and filter keywords
target_col_idx = 1
keywords = ["关键词A", "关键词B"]  # e.g. "综合楼", "控制中心"

extracted_data = []
for idx, row in df.iterrows():
    cell_val = str(row[target_col_idx]) if pd.notna(row[target_col_idx]) else ""
    # Clean: strip surrounding whitespace, then match against keywords
    clean_val = cell_val.strip()
    if any(k in clean_val for k in keywords):
        if clean_val and clean_val.lower() not in ["nan", "null", ""]:
            extracted_data.append(clean_val)

print(f"Extracted {len(extracted_data)} matching records")
```
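For large sheets, the row-by-row `iterrows` loop above can be slow. The same extraction can be done with vectorized string operations; a sketch assuming the same column index and keywords (the sample frame below is a hypothetical stand-in for the `df` loaded in Step 1):

```python
import pandas as pd

# Hypothetical stand-in for the df loaded in Step 1
df = pd.DataFrame({0: ["r1", "r2", "r3"], 1: ["综合楼A", None, "控制中心B"]})
keywords = ["综合楼", "控制中心"]

# Cast to string, strip whitespace, keep rows matching any keyword
col = df[1].dropna().astype(str).str.strip()
pattern = "|".join(keywords)  # assumes keywords contain no regex metacharacters
extracted_data = col[col.str.contains(pattern)].tolist()
print(extracted_data)
```

Dropping NaN cells before the cast also removes the need for the `"nan"`/`"null"` string filter used in the loop version.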
Step 3: Classify and deduplicate the extracted values, and count the unique items in each category.
```python
# Deduplicate efficiently with sets, one per category
category_a_items = set()
category_b_items = set()
for item in extracted_data:
    # Classify by keyword; adjust the conditions to your categories
    if "关键词A" in item:
        category_a_items.add(item)
    else:
        category_b_items.add(item)

print(f"Category A unique items: {len(category_a_items)}")
print(f"Category B unique items: {len(category_b_items)}")
```

Install this skill with: `npx skills add OpenSenseNova/SenseNova-Skills --skill duplicate-removal`
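The skill description also calls for summary and detail reports. A minimal sketch of that last step using `pandas.ExcelWriter` (the output file name, sheet names, and category labels are assumptions, and the item sets stand in for the Step 3 results):

```python
import pandas as pd

# Hypothetical deduplicated results from Step 3
category_a_items = {"综合楼A", "综合楼B"}
category_b_items = {"控制中心1"}

# Summary: one row per category with its unique count
summary = pd.DataFrame({
    "category": ["A", "B"],
    "unique_count": [len(category_a_items), len(category_b_items)],
})

# Detail: one row per unique item, tagged with its category
detail = pd.DataFrame(
    [("A", v) for v in sorted(category_a_items)]
    + [("B", v) for v in sorted(category_b_items)],
    columns=["category", "item"],
)

# Write both sheets into a single workbook
with pd.ExcelWriter("report.xlsx") as writer:
    summary.to_excel(writer, sheet_name="Summary", index=False)
    detail.to_excel(writer, sheet_name="Detail", index=False)
```

Writing both sheets through one `ExcelWriter` context keeps the summary and detail views in a single workbook for review.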