Smoothed Z-score Algorithm: Detecting Anomalous Points in Data

Introduction

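A time-series anomaly detection algorithm that detects peak signals in real-time time-series data and smooths the data at the same time. In plain terms, it finds the points where the data changes abruptly while also smoothing the curve. It runs in linear time, O(n), in the number of data points for a fixed window length.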

Principle

The algorithm uses the mean and the standard deviation (std) of the data to decide whether a data point is an anomaly.
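As a minimal sketch of that test, the snippet below checks a single value against the mean and standard deviation of a reference window. The helper name is_outlier, the default threshold of 3.0, and the sample numbers are assumptions made for illustration only.

```python
import numpy as np


def is_outlier(window, value, threshold=3.0):
    """Return True if `value` lies more than `threshold` standard
    deviations away from the mean of `window`."""
    mean = np.mean(window)
    std = np.std(window)  # population standard deviation (ddof=0)
    return abs(value - mean) > threshold * std


# Illustrative reference window: mean is 10.0, std is about 0.63.
window = [10.0, 11.0, 9.0, 10.0, 10.0]
print(is_outlier(window, 10.5))  # 0.5 away from the mean: inside the band
print(is_outlier(window, 15.0))  # 5.0 away from the mean: outside the band
```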

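Concretely: slide a window of length lag over the data and compute the mean filter_avg and the standard deviation filter_std of the values inside it. A new data point is flagged as an anomaly when its distance from filter_avg exceeds threshold times filter_std. Finally, influence controls the smoothing: a flagged point enters the smoothed series as a blend of its raw value and the previous smoothed value. The larger influence is, the more weight the raw data carries and the less smooth the curve becomes.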
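To make the role of influence concrete, here is a small sketch of the blending step applied to one flagged point. The helper name damp and the numbers are hypothetical; only the blending formula itself comes from the implementation in this post.

```python
def damp(value, previous_smoothed, influence):
    """Blend a flagged reading with the previous smoothed value."""
    return influence * value + (1 - influence) * previous_smoothed


previous_smoothed = 10.0  # last value of the smoothed series (illustrative)
spike = 50.0              # raw reading that was flagged as an anomaly

for influence in (0.0, 0.5, 1.0):
    print(influence, damp(spike, previous_smoothed, influence))
# influence = 0.0 ignores the spike entirely (10.0),
# influence = 0.5 lands halfway (30.0),
# influence = 1.0 keeps the raw value unchanged (50.0).
```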

Implementation

# Python3
# Created by Santiego
import numpy as np


def smooth_data_and_find_peak(data_raw, lag, threshold, influence=0.5):
    # Smoothed Z-Score Algorithm
    # Returns (indices of upward peaks, smoothed copy of the data).
    data_raw = np.asarray(data_raw, dtype=float)
    n = len(data_raw)
    res_peak = []
    # Work on a copy so the caller's data is not modified.
    res_data_smoothed = data_raw.copy()
    # Running mean and standard deviation, one entry per sample.
    filter_avg = np.zeros(n)
    filter_std = np.zeros(n)
    # Seed the statistics from the first `lag` samples.
    filter_avg[lag - 1] = np.mean(data_raw[0:lag])
    filter_std[lag - 1] = np.std(data_raw[0:lag])
    for i in range(lag, n):
        if abs(data_raw[i] - filter_avg[i - 1]) > threshold * filter_std[i - 1]:
            # Only upward deviations are recorded as peaks.
            if data_raw[i] > filter_avg[i - 1]:
                res_peak.append(i)
            # Damp the outlier so it has limited effect on later statistics.
            res_data_smoothed[i] = influence * data_raw[i] + (1 - influence) * res_data_smoothed[i - 1]
        # Otherwise res_data_smoothed[i] already equals data_raw[i].
        # Update the statistics over the last `lag` smoothed values, including index i.
        filter_avg[i] = np.mean(res_data_smoothed[i - lag + 1:i + 1])
        filter_std[i] = np.std(res_data_smoothed[i - lag + 1:i + 1])

    return res_peak, res_data_smoothed
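The function above records only upward deviations. As a hedged illustration, the sketch below is a variant that reports downward dips as well, run on synthetic data. The name zscore_signals, the +1/-1/0 signal encoding, the choice to update the statistics over the trailing window ending at the current point, and the noise and spike values are all assumptions made for this example, not part of the original code.

```python
import numpy as np


def zscore_signals(data, lag, threshold, influence=0.5):
    """Hypothetical variant: return (signals, smoothed), where
    signals[i] is +1 for an upward anomaly, -1 for a downward one, else 0."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    signals = np.zeros(n, dtype=int)
    smoothed = data.copy()
    # Seed the statistics from the first `lag` samples.
    avg = np.mean(smoothed[:lag])
    std = np.std(smoothed[:lag])
    for i in range(lag, n):
        if abs(data[i] - avg) > threshold * std:
            signals[i] = 1 if data[i] > avg else -1
            # Damp the flagged point before it enters the statistics.
            smoothed[i] = influence * data[i] + (1 - influence) * smoothed[i - 1]
        # Statistics over the trailing `lag` smoothed values ending at i.
        avg = np.mean(smoothed[i - lag + 1:i + 1])
        std = np.std(smoothed[i - lag + 1:i + 1])
    return signals, smoothed


# Synthetic series: noise around 10 with one spike and one dip injected.
rng = np.random.default_rng(0)
data = 10.0 + rng.normal(0.0, 0.5, size=200)
data[120] += 8.0  # upward spike
data[160] -= 8.0  # downward dip

signals, smoothed = zscore_signals(data, lag=30, threshold=4.0, influence=0.1)
print("upward anomalies at:", np.flatnonzero(signals == 1))
print("downward anomalies at:", np.flatnonzero(signals == -1))
```

A low influence such as 0.1 keeps the injected spike from inflating the window's standard deviation, so later points are still judged against statistics close to the undisturbed noise level.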

Notes

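The parameters need tuning. Also, the algorithm is only meant to find abrupt anomalous changes in the data; it cannot pick up broad trends, so it does not do a good job of locating the peaks of the data (this holds both in theory and in practical tests).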
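One way to see why broad trends go unnoticed: on a steady, noiseless ramp the next point always sits at roughly the same modest distance from the window mean, measured in window standard deviations, no matter how steep the ramp is. The sketch below computes that distance; the lag and slope values are arbitrary choices for illustration.

```python
import numpy as np

lag = 30
slope = 5.0                      # any non-zero slope gives the same z-score
window = slope * np.arange(lag)  # the last `lag` points of a noiseless ramp
next_value = slope * lag         # the next point on the same ramp

# Distance of the next point from the window mean, in standard deviations.
z = abs(next_value - np.mean(window)) / np.std(window)
print(round(z, 2))
```

For lag = 30 this comes out a little under 1.8, well below a threshold of 3 or more, so a smooth rise toward a maximum is never flagged; only changes that are abrupt relative to the recent window are.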

posted @ 2022-01-24 15:56  Santiego