[Features] The Story of Graphs

19-Malware-Attributed CFG

　　To strike a balance between generality and performance, the paper combines:

control flow graphs (CFGs)
deep graph convolutional neural network (DGCNN)

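　　The paper extracts CFGs from the assembly code of PC malware and classifies them with a graph convolutional network (CFG + GCNN). The key step is vectorizing the CFG nodes: the code attributes of each node are counted and summarized as numbers, and the resulting numeric vector represents the node, turning the CFG into an attributed CFG (ACFG).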

“Although other low-level representations such as hexadecimal byte sequences have similar properties, a CFG explicitly expresses the execution logic of a program using a graph data structure. Hence, the semantics of a malware program is embodied by not only the characteristics of the code in individual basic blocks but also their structural dependencies defined by the edges connecting these basic blocks.” (Yan et al., 2019, p. 53)

Idea

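Convert the assembly code into a control flow graph (CFG), in which each node is one basic block of assembly code.
Convert the CFG into an attributed CFG (ACFG), i.e. vectorize the nodes.
Process the ACFG with a GCNN.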

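　　The key point is how nodes are vectorized: the code attributes in each basic block are counted and summarized as numbers that represent the node. As Table 1 shows, the counts include the number of numeric constants, transfer instructions, call instructions, mov instructions and so on; together they form a vector that represents the node.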
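　　The node vectorization described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the attribute list, the `TRANSFER` mnemonic set, the `NUMERIC` pattern and the function name `block_features` are all assumptions made for the example, and the real attribute set is the one given in the paper's table.

```python
import re

# Hypothetical subset of transfer (jump) mnemonics; not the paper's exact list.
TRANSFER = {"jmp", "je", "jne", "jz", "jnz", "jg", "jl", "ja", "jb"}
# Assumed notion of a numeric constant: 0x-prefixed hex, h-suffixed hex, or decimal.
NUMERIC = re.compile(r"\b(?:0x[0-9a-fA-F]+|[0-9][0-9a-fA-F]*h|\d+)\b")


def block_features(instructions):
    """Map one basic block (a list of assembly lines) to a numeric vector.

    Returned order: [numeric constants, transfer, call, mov, total instructions].
    """
    numeric = transfer = call = mov = total = 0
    for line in instructions:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip empty lines
        mnemonic = parts[0].lower()
        operands = parts[1] if len(parts) > 1 else ""
        total += 1
        numeric += len(NUMERIC.findall(operands))
        if mnemonic in TRANSFER:
            transfer += 1
        elif mnemonic == "call":
            call += 1
        elif mnemonic.startswith("mov"):
            mov += 1
    return [numeric, transfer, call, mov, total]


# A made-up basic block, only for illustration.
block = [
    "mov eax, 0x10",
    "add eax, 5",
    "cmp eax, [ebp+8]",
    "jne loc_401000",
    "call sub_401200",
    "movzx ecx, al",
]
print(block_features(block))  # [3, 1, 1, 2, 6]
```

　　An ACFG is then simply the CFG's edge list together with one such vector per node, which is the form of input a graph convolutional network consumes.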


Discussion

Reference

　　Yan, Jiaqi, Guanhua Yan, and Dong Jin. "Classifying malware represented as control flow graphs using deep graph convolutional neural network." 2019 49th annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, 2019. (CCF-B)

22-Android Malware-Contrastive Learning

Idea

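　　Contrastive learning is used in Android malware detection to counter obfuscation.

First obtain the function call graph, then apply centrality analysis to convert the call graph into an image.
Contrastive learning maximizes the similarity between positive samples and minimizes the similarity between negative samples. This shrinks the differences introduced by code obfuscation and widens the differences between malware classes, so that obfuscated programs are still classified correctly.
An obfuscated sample is treated as one of the positive samples of its original sample.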

　　Visualization and interpretability after contrastive learning:

Heat maps are obtained with “Gradient-weighted Class Activation Mapping++ (Grad-CAM++)”.

“GradCAM++ is a class-discriminative localization technique that generates visual explanations for any CNN-based network without changing the architecture or retraining.” (Wu et al., 2022, p. 2)

Framework

Static analysis of the Android malware yields a function call graph, which is passed to the Image Generation stage.

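　　Image Generation:

The figure below shows the 426 sensitive APIs (prior knowledge), obtained by merging three API sets.
Borrowing from social network analysis: for each API call, four centrality measures are computed to quantify the importance of the node in the network (the function call graph). Each API call (one row) thus gets four columns (four centralities), giving a 426×4 matrix. If an API does not appear in the program, its row is filled with 0.
60 zeros are appended to the end of the 426×4 matrix (1704 values), i.e. 15 rows of padding, and the result is reshaped to 42×42 (1764 values).
Why 42×42: 1764 is a multiple of 4, and 42 is the smallest side length whose square holds the 1704 values (41×41 = 1681 is too small), so padding is minimal.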
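　　The image generation steps above can be sketched as follows. This is a simplified illustration under stated assumptions: only degree centrality is implemented (on an undirected view of the call graph), the other three centralities are replaced by empty stand-ins, and the API names, the toy graph and the helper names `degree_centrality`, `centrality_rows` and `to_image` are made up for the example.

```python
def degree_centrality(edges):
    """Degree centrality on an undirected view of the graph: degree / (n - 1)."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def centrality_rows(sensitive_apis, measures):
    """One row per sensitive API, one column per centrality; absent APIs get 0."""
    return [[m.get(api, 0.0) for m in measures] for api in sensitive_apis]


def to_image(rows, side=42):
    """Flatten the rows, append zeros at the end, and reshape to side x side."""
    flat = [value for row in rows for value in row]
    padding = side * side - len(flat)
    if padding < 0:
        raise ValueError("too many values for the requested image size")
    flat.extend([0.0] * padding)
    return [flat[i * side:(i + 1) * side] for i in range(side)]


# Toy call graph; node and API names are hypothetical.
edges = [("main", "api_0"), ("main", "helper"), ("helper", "api_1")]
sensitive_apis = [f"api_{i}" for i in range(426)]

degree = degree_centrality(edges)
# Stand-ins for the other three centralities (empty dicts, so every score is 0).
measures = [degree, {}, {}, {}]

rows = centrality_rows(sensitive_apis, measures)   # 426 rows x 4 columns
image = to_image(rows)                             # 1704 values + 60 zeros
print(len(image), len(image[0]))  # 42 42
```

　　Because the zeros are appended at the end, the last row of the image and the tail of the row before it are pure padding, while each API keeps a fixed position in the image across all samples.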

　　Contrastive Learning for Classification:

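　　Supervised contrastive learning for classification:

　　Compared with common classification models, supervised contrastive learning makes two main changes:

Data augmentation
Loss function

　　Here positive pairs are not the result of data augmentation; samples belonging to the same class are positives. Within one batch there are many positive samples and many negative samples.

　　In essence, it just changes the loss function...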
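　　To make "changing the loss function" concrete, here is a minimal pure-Python sketch of a supervised contrastive loss in the commonly used SupCon form, where every other sample in the batch with the same label counts as a positive. Whether the paper uses exactly this variant is an assumption; the function name `supcon_loss`, the temperature value and the toy batch are made up for illustration.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors with at least one positive."""
    # L2-normalise so that the dot product is the cosine similarity.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    losses = []
    for i, anchor in enumerate(normed):
        # Scaled similarity between the anchor and every other sample in the batch.
        sims = {
            j: sum(a * b for a, b in zip(anchor, other)) / temperature
            for j, other in enumerate(normed)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        # Mean negative log-probability of picking a positive among all others.
        losses.append(-sum(sims[p] - log_denominator for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two tight clusters in 2-D.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matching = supcon_loss(batch, [0, 0, 1, 1])    # labels agree with the clusters
mismatched = supcon_loss(batch, [0, 1, 0, 1])  # labels cut across the clusters
print(matching < mismatched)  # True
```

　　The loss is small when same-class samples sit close together and far from the other classes, which is the behaviour wanted for obfuscated variants of the same malware class.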


Discussion

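Using centrality analysis from social network analysis is a nice idea.
Contrastive learning roughly amounts to changing the loss function; it is a way to gain some accuracy and is less sophisticated than it sounds.
Scalability is mediocre: the figure of 426 sensitive APIs is prior knowledge, so adding new APIs would require substantial changes.
Minus: not open source.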

Reference

　　Wu, Yueming, et al. "Contrastive Learning for Robust Android Malware Familial Classification." IEEE Transactions on Dependable and Secure Computing (2022).

22-Code Vulnerability-VulCNN

Idea

　　Input:

source code

　　Graph Extraction:

Extract the program dependency graph (PDG).

　　Sentence Embedding:

Use the Sentence2vec method to embed every node, producing a node vector.

　　Image Generation:

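Compute each node's degree and degree centrality.
Each centrality measure yields one feature map; three centrality measures give an RGB image.
Then apply a CNN.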

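　　Points to consider:

The number of nodes, i.e. the number of lines of code in each function: functions above a threshold must be truncated, and functions below it must be padded.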
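　　A rough sketch of the image generation and the truncate-or-pad step follows. It assumes that each channel is formed by scaling every node's embedding by that node's centrality score, which is my reading of the approach rather than something stated in these notes; the function name `build_image`, the toy embeddings and the centrality values are hypothetical.

```python
def build_image(node_vectors, centralities, max_nodes, dim):
    """Build one channel per centrality measure, each of size max_nodes x dim.

    node_vectors: one embedding (list of `dim` floats) per node.
    centralities: one list of per-node scores per centrality measure.
    """
    channels = []
    for scores in centralities:
        # Assumed combination rule: scale each node embedding by its centrality.
        rows = [[score * x for x in vec] for vec, score in zip(node_vectors, scores)]
        rows = rows[:max_nodes]  # truncate functions with too many nodes
        # Zero-pad functions with too few nodes.
        rows += [[0.0] * dim for _ in range(max_nodes - len(rows))]
        channels.append(rows)
    return channels


# Toy input: two nodes with 2-D embeddings and three made-up centrality measures.
vectors = [[1.0, 2.0], [3.0, 4.0]]
scores = [[0.5, 1.0], [1.0, 1.0], [0.0, 2.0]]

padded = build_image(vectors, scores, max_nodes=3, dim=2)
truncated = build_image(vectors, scores, max_nodes=1, dim=2)
print(len(padded), len(padded[0]), len(padded[0][0]))  # 3 3 2
```

　　With three centrality measures the result has three channels of identical size, which is what allows it to be fed to a CNN as an RGB-like image.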


Discussion

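A CNN needs fixed-size input images, so the number of nodes in the graph still has to be limited.
One could set the number of nodes equal to the sentence embedding dimension to get a square image; combined with SPPnet, multi-scale processing becomes possible.
Plus: open source, https://github.com/CGCL-codes/VulCNN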
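　　The SPPnet suggestion in the second point can be illustrated with a small pure-Python sketch of spatial pyramid max pooling. In SPPnet the pooling is applied to CNN feature maps; here it is applied directly to a plain 2-D array only to show that inputs of different sizes yield a vector of the same length. The function name `spp_max` and the pyramid levels are assumptions chosen for the example.

```python
def spp_max(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D map into a fixed-length vector regardless of its size.

    For each level n the map is divided into an n x n grid of bins and the
    maximum of every bin is kept, so the output length is sum(n * n).
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n  # integer ceiling
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    )
                )
    return pooled


# Two inputs of different sizes, e.g. functions with different node counts.
tall = [[r * 5 + c for c in range(5)] for r in range(7)]   # 7 x 5
wide = [[1.0] * 9 for _ in range(4)]                        # 4 x 9
print(len(spp_max(tall)), len(spp_max(wide)))  # 21 21
```

　　With such a layer in front of the fully connected part, the node count would no longer need to be fixed by truncation or padding, at the cost of some spatial detail.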


Reference

　　Y. Wu, D. Zou, S. Dou, W. Yang, D. Xu and H. Jin, "VulCNN: An Image-inspired Scalable Vulnerability Detection System," 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), Pittsburgh, PA, USA, 2022, pp. 2365-2376, doi: 10.1145/3510003.3510229.

posted @ 2023-03-22 21:53 巴啦啦胖魔仙
