Spell log clustering algorithm - notes

References

Notes on the Drain3 and Spell papers: https://www.ctyun.cn/developer/article/435428960813125
Spell, an online log parsing method based on the longest common subsequence: https://blog.csdn.net/qq_41773806/article/details/124499955
Original paper: https://users.cs.utah.edu/~lifeifei/papers/spell.pdf

Core code

Code location: Spell implementation

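# Runs once per log line. logmessageL (the line already split into tokens), logID,
# rootNode (prefix-tree root) and logCluL (list of clusters) are defined earlier in the parse loop (not shown).
constLogMessL = [w for w in logmessageL if w != "<*>"]  # constant tokens only: drop the <*> wildcards

# Find an existing matched log cluster
# Stage 1: prefix-tree match on the constant tokens, full match, <*> skipped
matchCluster = self.PrefixTreeMatch(rootNode, constLogMessL, 0)

if matchCluster is None:
    # Stage 2: simple loop over all clusters. A template matches when
    #   - its token list is at least half as long as the log's token list, and
    #   - every template token occurs in the log (tokens deduplicated into a set, order ignored;
    #     <*> matches any token):
    #     all(token in token_set or token == "<*>" for token in logClust.logTemplate)
    matchCluster = self.SimpleLoopMatch(logCluL, constLogMessL)

    if matchCluster is None:
        # Stage 3: LCS match. Deduplicate the tokens and skip a cluster when the
        # order-insensitive overlap is too small: len(set_seq & set_template) < 0.5 * size_seq;
        # otherwise compute the longest common subsequence of the log and the template.
        matchCluster = self.LCSMatch(logCluL, logmessageL)

        # Match no existing log cluster
        if matchCluster is None:
            newCluster = LCSObject(logTemplate=logmessageL, logIDL=[logID])
            logCluL.append(newCluster)
            self.addSeqToPrefixTree(rootNode, newCluster)
        # Add the new log message to the existing cluster
        else:
            newTemplate = self.getTemplate(
                self.LCS(logmessageL, matchCluster.logTemplate),
                matchCluster.logTemplate,
            )  # merge the template with the log via their LCS
            if " ".join(newTemplate) != " ".join(matchCluster.logTemplate):  # template changed
                # re-index the cluster in the prefix tree under its new template
                self.removeSeqFromPrefixTree(rootNode, matchCluster)
                matchCluster.logTemplate = newTemplate
                self.addSeqToPrefixTree(rootNode, matchCluster)
if matchCluster:
    # An existing cluster matched; a brand-new cluster already holds logID in logIDL
    matchCluster.logIDL.append(logID)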
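Stages 2 and 3 only call `self.SimpleLoopMatch` and `self.LCSMatch`, whose bodies are not reproduced in these notes. Below is a minimal, self-contained sketch of those two checks, following the rules in the comments above plus one stated assumption: an LCS match is accepted only when the LCS length reaches `TAU * len(tokens)`, with `TAU = 0.5` picked for illustration. `Cluster`, `lcs_len`, `simple_loop_match`, `lcs_match` and `add_log` are hypothetical names, not the linked implementation's API; the prefix-tree stage and the template update are deliberately left out.

```python
from typing import List, Optional

WILDCARD = "<*>"
TAU = 0.5  # assumption: accept an LCS match only if its length >= TAU * number of log tokens


class Cluster:
    """Hypothetical stand-in for LCSObject: a token template plus the IDs of its log lines."""

    def __init__(self, template: List[str], log_id: int) -> None:
        self.template = template
        self.log_ids = [log_id]


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (one DP row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b):
            cur[j + 1] = prev[j] + 1 if x == y else max(prev[j + 1], cur[j])
        prev = cur
    return prev[-1]


def simple_loop_match(clusters: List[Cluster], const_tokens: List[str]) -> Optional[Cluster]:
    """Stage 2: template at least half as long as the log, and all its tokens present in the log."""
    token_set = set(const_tokens)
    for c in clusters:
        if len(c.template) < 0.5 * len(const_tokens):
            continue
        if all(t in token_set or t == WILDCARD for t in c.template):
            return c
    return None


def lcs_match(clusters: List[Cluster], tokens: List[str]) -> Optional[Cluster]:
    """Stage 3: the cluster with the longest LCS, after a cheap set-overlap filter."""
    token_set = set(tokens)
    best: Optional[Cluster] = None
    best_len = -1
    for c in clusters:
        if len(token_set & set(c.template)) < 0.5 * len(tokens):
            continue  # too little order-insensitive overlap: skip the O(n*m) LCS
        n = lcs_len(tokens, c.template)
        # On a tie, prefer the shorter template (an assumption of this sketch)
        if n > best_len or (n == best_len and best is not None and len(c.template) < len(best.template)):
            best, best_len = c, n
    if best is not None and best_len >= TAU * len(tokens):
        return best
    return None


def add_log(clusters: List[Cluster], tokens: List[str], log_id: int) -> Cluster:
    """Run stage 2, then stage 3; start a new cluster if neither matches (no template merge here)."""
    const_tokens = [t for t in tokens if t != WILDCARD]
    found = simple_loop_match(clusters, const_tokens) or lcs_match(clusters, tokens)
    if found is None:
        found = Cluster(tokens, log_id)
        clusters.append(found)
    else:
        found.log_ids.append(log_id)
    return found


clusters: List[Cluster] = []
first = add_log(clusters, "Connection from 10.0.0.1 closed".split(), 1)
second = add_log(clusters, "Connection from 10.0.0.7 closed".split(), 2)
third = add_log(clusters, "Disk quota exceeded".split(), 3)
print(len(clusters), first.log_ids)  # prints: 2 [1, 2]
```

Only the LCS length is needed to choose a cluster, so `lcs_len` keeps a single DP row instead of the full table, and the cheap set-overlap test runs first so the quadratic LCS is computed only for plausible candidates.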

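The template update in the `else` branch depends on `self.LCS` and `self.getTemplate`, which are not shown either. A minimal sketch of the idea, assuming a classic dynamic-programming LCS over token lists and a merge that keeps the template tokens belonging to the LCS and replaces every other position with `<*>`. `lcs` and `merge_template` are hypothetical names, and the real `getTemplate` may handle trailing tokens differently.

```python
from typing import List

WILDCARD = "<*>"


def lcs(seq1: List[str], seq2: List[str]) -> List[str]:
    """One longest common subsequence of two token lists (classic O(n*m) dynamic programming)."""
    n, m = len(seq1), len(seq2)
    # dp[i][j] = LCS length of seq1[:i] and seq2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if seq1[i] == seq2[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back from dp[n][m] to recover the tokens themselves
    result: List[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


def merge_template(common: List[str], template: List[str]) -> List[str]:
    """Keep template tokens that belong to `common` (in order); every other position becomes <*>."""
    merged: List[str] = []
    k = 0  # index of the next LCS token still to be placed
    for token in template:
        if k < len(common) and token == common[k]:
            merged.append(token)
            k += 1
        else:
            merged.append(WILDCARD)
    return merged


old_template = "Connection from 10.0.0.1 closed".split()
new_log = "Connection from 10.0.0.7 closed".split()
merged = merge_template(lcs(new_log, old_template), old_template)
print(" ".join(merged))  # prints: Connection from <*> closed
print(" ".join(merged) != " ".join(old_template))  # prints: True (the template changed)
```

Merging the resulting template with another log of the same shape returns it unchanged, which is why the snippet re-indexes the prefix tree only when the joined template string actually differs.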
posted @ 2024-10-09 18:32 漫漫长夜何时休
