Ask me what skills you need
What are you building?
Tell me what you're working on and I'll find the best agent skills for you.
利用交叉表与热力图对分类数据进行多维度占比分析,适用于奖项分布、绩效评估或市场占有率等结构化数据的清洗与可视化。
Step1 对原始数据进行清洗与重构,处理 Excel 合并单元格导致的缺失值,并筛选核心分析列。
import pandas as pd
def preprocess_pivot_data(file_path, target_cols=['奖项', '项目名称', '成员', '单位']):
"""
清理并重构数据列,处理合并单元格填充。
"""
df = pd.read_excel(file_path)
# 映射通用列名
df.columns = target_cols
# 关键技巧:处理合并单元格。ffill 前需确保数据按原始分类顺序排列
# 假设第一列为分类标签(如奖项名称)
df[target_cols[0]] = df[target_cols[0]].fillna(method='ffill')
# 删除关键信息(如成员或单位)缺失的无效行
df = df.dropna(subset=[target_cols[2], target_cols[3]])
# 清洗字符串空格
for col in df.select_dtypes(['object']).columns:
df[col] = df[col].str.strip()
return df
Step2 构建交叉分析表(Crosstab),计算不同维度下的频数分布及百分比占比。
def create_cross_analysis(df, index_col='单位', columns_col='奖项'):
"""
构建交叉表并计算各分类维度的获奖/分布比例。
"""
# 生成频数统计交叉表
cross_table = pd.crosstab(df[index_col], df[columns_col])
# 计算占比:各列(奖项)下各行(单位)的分布比例
# div(axis=1) 表示按列求和后进行除法
award_proportions = cross_table.div(cross_table.sum(axis=0), axis=1) * 100
# 技巧:生成带有总计行和占比的汇总表
summary = cross_table.copy()
summary['总计'] = summary.sum(axis=1)
summary.loc['合计'] = summary.sum()
npx skills add OpenSenseNova/SenseNova-Skills --skill pivot-table-analysisHow clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.
Clear and well structured, with only minor parts that might need a second read.
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.
Mostly actionable with clear steps; only a few small gaps remain.