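Performs comprehensive outlier detection and data-quality assessment: identifies outliers with the IQR method and characterizes the data distribution through skewness and kurtosis. Suited to the preprocessing stage for non-normally distributed data.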
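The IQR rule named in the description can be illustrated on a tiny sample before it is applied to a whole DataFrame. The following is a minimal, dependency-free sketch rather than the skill's own code: `iqr_fences` and `sample` are hypothetical names, and it assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as the pandas `quantile` default (linear interpolation).

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the sample min and max as the 0th and 100th percentiles.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

sample = [1, 2, 3, 4, 100]
lower, upper = iqr_fences(sample)
# Anything strictly outside the fences is flagged as an outlier.
flagged = [x for x in sample if x < lower or x > upper]

print(lower, upper)  # -1.0 7.0
print(flagged)       # [100]
```

Here Q1 = 2 and Q3 = 4, so the IQR is 2 and the fences sit at 2 - 3 = -1 and 4 + 3 = 7; only 100 falls outside them. Because the fences depend on quartiles rather than on the mean and standard deviation, the rule does not assume a normal distribution.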
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Set fonts that can render both Chinese and English text in plots (SimHei or WenQuanYi)
plt.rcParams['font.sans-serif'] = ['SimHei', 'WenQuanYi Zen Hei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
# Load the data
file_path = 'data.xlsx'  # replace with the actual file path
df = pd.read_excel(file_path)
# Basic sanity checks
print(f"Data shape: {df.shape}")
print(f"Data types:\n{df.dtypes}")
print(df.head())
# Automatically select the numeric columns for analysis
target_cols = df.select_dtypes(include=[np.number]).columns.tolist()
outlier_summary = []
for col in target_cols:
    data = df[col].dropna()
    if data.empty:
        continue
    # Interquartile range (IQR)
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Flag values outside the fences as outliers
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    outlier_summary.append({
        'target_col': col,
        'outlier_count': len(outliers),
        'outlier_ratio': f"{(len(outliers)/len(data)*100):.2f}%",
        'lower_limit': lower_bound,
        'upper_limit': upper_bound,
    })
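The description also calls for skewness and kurtosis alongside the IQR counts. As a hedged, standard-library sketch of what those two statistics measure, the functions below use the sample-adjusted formulas (adjusted Fisher-Pearson skewness and excess kurtosis). These are, to my understanding, the definitions behind the pandas `Series.skew()` and `Series.kurt()` methods, which would be the natural calls inside the loop. The function names are hypothetical, and returning NaN for constant or too-short data is a choice made for this sketch.

```python
import math
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness; needs at least 3 points."""
    n = len(values)
    if n < 3:
        return math.nan
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return math.nan  # constant data: treated as undefined in this sketch
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total

def sample_excess_kurtosis(values):
    """Sample excess kurtosis (0 for a normal distribution); needs at least 4 points."""
    n = len(values)
    if n < 4:
        return math.nan
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    if s == 0:
        return math.nan
    total = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction

symmetric = [1, 2, 3, 4, 5]
long_tail = [1, 2, 3, 4, 100]
print(sample_skewness(symmetric))         # close to 0 for symmetric data
print(sample_excess_kurtosis(symmetric))  # about -1.2 (flatter than normal)
print(sample_skewness(long_tail))         # positive: long right tail
```

A clearly positive or negative skewness, or a large excess kurtosis, suggests the column is far from normal, which is the situation where the IQR fences are preferable to mean-and-standard-deviation rules.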
npx skills add OpenSenseNova/SenseNova-Skills --skill outlier-detection
How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.
Very clear and well structured, with almost no room for misunderstanding.
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.
Highly actionable with clear, concrete steps that an agent can follow directly.