Spell log clustering algorithm: notes
References
Study notes on the Drain3 and Spell papers: https://www.ctyun.cn/developer/article/435428960813125
Spell, an online log parsing method based on the longest common subsequence: https://blog.csdn.net/qq_41773806/article/details/124499955
Original paper: https://users.cs.utah.edu/~lifeifei/papers/spell.pdf
Core code
Code location: Spell implementation
constLogMessL = [w for w in logmessageL if w != "<*>"]  # logmessageL is the tokenized log; drop the wildcard <*>
# Find an existing matched log cluster
# Note: prefix tree, exact match, skipping <*>
matchCluster = self.PrefixTreeMatch(rootNode, constLogMessL, 0)
if matchCluster is None:
    # Note: the template must be at least half as long as the current log token list.
    # The log tokens are deduplicated into a set (token_set), and the check is
    #   token in token_set or token == "<*>" for token in logClust.logTemplate
    # i.e. every template token must appear in the current log token list (<*> matches any token)
    matchCluster = self.SimpleLoopMatch(logCluL, constLogMessL)
    if matchCluster is None:
        # Note: both token lists are deduplicated into sets, and a template is skipped when
        #   len(set_seq & set_template) < 0.5 * size_seq
        # i.e. when the order-insensitive token overlap is too small.
        # The remaining templates are matched against the log by longest common subsequence (LCS)
        matchCluster = self.LCSMatch(logCluL, logmessageL)
        # Match no existing log cluster
        if matchCluster is None:
            newCluster = LCSObject(logTemplate=logmessageL, logIDL=[logID])
            logCluL.append(newCluster)
            self.addSeqToPrefixTree(rootNode, newCluster)
        # Add the new log message to the existing cluster
        else:
            newTemplate = self.getTemplate(
                self.LCS(logmessageL, matchCluster.logTemplate),
                matchCluster.logTemplate,
            )  # merge the LCS into the template
            if " ".join(newTemplate) != " ".join(matchCluster.logTemplate):  # the template changed
                self.removeSeqFromPrefixTree(rootNode, matchCluster)
                matchCluster.logTemplate = newTemplate
                self.addSeqToPrefixTree(rootNode, matchCluster)
if matchCluster:
    matchCluster.logIDL.append(logID)
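The last stage above (set-overlap pre-filter, LCS matching, then merging the template) can be illustrated with a small standalone sketch. This is not the code of the Spell implementation referenced above: the names `lcs`, `merge_template`, `lcs_match`, and `WILDCARD`, the default `tau=0.5`, and the choice to collapse consecutive wildcards are all assumptions made for illustration.

```python
from typing import List, Optional

WILDCARD = "<*>"  # assumed wildcard marker, same spelling as in the notes above


def lcs(seq1: List[str], seq2: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists (classic DP)."""
    n, m = len(seq1), len(seq2)
    # lengths[i][j] = LCS length of seq1[:i] and seq2[:j]
    lengths = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if seq1[i] == seq2[j]:
                lengths[i + 1][j + 1] = lengths[i][j] + 1
            else:
                lengths[i + 1][j + 1] = max(lengths[i][j + 1], lengths[i + 1][j])
    # Backtrack from the bottom-right corner to recover the tokens.
    result: List[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])
            i -= 1
            j -= 1
        elif lengths[i - 1][j] >= lengths[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


def merge_template(common: List[str], template: List[str]) -> List[str]:
    """Keep template tokens that belong to the LCS; replace the rest with <*>."""
    merged: List[str] = []
    k = 0  # index of the next LCS token to consume
    for token in template:
        if k < len(common) and token == common[k]:
            merged.append(token)
            k += 1
        elif not merged or merged[-1] != WILDCARD:
            # Assumption of this sketch: consecutive wildcards collapse into one.
            merged.append(WILDCARD)
    return merged


def lcs_match(templates: List[List[str]], seq: List[str], tau: float = 0.5) -> Optional[int]:
    """Return the index of the best-matching template, or None if none qualifies."""
    set_seq = set(seq)
    best_idx, best_len = None, -1
    for idx, template in enumerate(templates):
        # Cheap order-insensitive pre-filter: skip templates sharing too few tokens.
        if len(set_seq & set(template)) < 0.5 * len(seq):
            continue
        length = len(lcs(seq, template))
        if length > best_len:
            best_idx, best_len = idx, length
    # Accept the best candidate only if its LCS covers at least tau of the log.
    if best_idx is not None and best_len >= tau * len(seq):
        return best_idx
    return None


log1 = "Temperature (41C) exceeds warning threshold".split()
log2 = "Temperature (43C) exceeds warning threshold".split()
common = lcs(log2, log1)
print(merge_template(common, log1))
```

Here the two logs differ only in the temperature value, so the LCS drops that token and the merged template puts `<*>` in its place. The set-based pre-filter is only a cheap way to avoid running the quadratic LCS against templates that cannot possibly pass the threshold.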
This article is from cnblogs (博客园), author: 漫漫长夜何时休. When reposting, please credit the original link: https://www.cnblogs.com/ag-chen/p/18454868