What is reverse engineering
1. Definition of reverse engineering
Reverse engineering is the “reverse progression” of forward engineering. To understand the
activities of reverse engineering, we should first understand forward engineering. Figure 1
shows the phases of the “waterfall” model for forward engineering, a typical process used in
the construction of legacy systems. Generally, in forward engineering a program is built
through requirements analysis, architecture design, system design, and system implementation.
During system development, design information that is not fully supported by a specific
programming language is lost. The primary difficulty in maintaining and evolving legacy
systems is recovering that lost information [53].
Reverse engineering (RE) is an essential part of the maintenance and evolution process of complex software systems.
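Put plainly, reverse engineering means finding the latent patterns in code or data that has already been processed and optimized, and from them recovering the intent of the software's author, so that we can understand the software better.
Some other terms: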
refined
Forward engineering: The traditional process of moving from high-level abstractions and logical, implementation-independent designs to the physical implementation of a system.
Redocumentation: The creation or revision of a semantically equivalent representation within the same relative abstraction level. The resulting forms of representation are usually considered alternative views (for example, data flow, data structure, and control flow) that are intended for a technical audience.
Design recovery: A subset of reverse engineering in which domain knowledge, external information, and deduction or fuzzy reasoning are added to the observation of the subject system to identify useful high-level abstractions beyond those obtained directly by examining the system.
Re-engineering: Also known as both renovation and reclamation, re-engineering refers to the examination and alteration of a system to reconstitute it in a new form, and the subsequent implementation of the new form. It generally includes a form of reverse engineering (to achieve a more abstract description), followed by a form of forward engineering or restructuring.
Program understanding: The task of recapturing the abstract design of a system, in part or in full, from its source code.
cluster based technology architecture extraction
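2. The essence of reverse engineering
Structural redocumentation is reverse engineering applied to the software architecture. Its very first step is design pattern detection (Design Patterns Detection).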
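To make design pattern detection a little more concrete, here is a minimal sketch in Python using the standard `ast` module. It is only a heuristic and not how any particular reverse engineering tool works: it flags a class as "singleton-like" when a classmethod both assigns to and returns the same `cls.<attribute>`. The sample classes `Config` and `Parser`, the helper names, and the assumption that the first parameter is called `cls` are all made up for illustration.

```python
import ast

# Hypothetical input: one singleton-like class and one ordinary class.
SOURCE = '''
class Config:
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

class Parser:
    def parse(self, text):
        return text.split()
'''


def is_cls_attribute(node):
    """True for an expression of the form cls.<attr>."""
    return (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "cls")


def looks_like_singleton(class_node):
    """Heuristic: some classmethod stores into cls.<attr> and returns it."""
    for item in class_node.body:
        if not isinstance(item, ast.FunctionDef):
            continue
        if not any(isinstance(d, ast.Name) and d.id == "classmethod"
                   for d in item.decorator_list):
            continue
        stored, returned = set(), set()
        for sub in ast.walk(item):
            if is_cls_attribute(sub) and isinstance(sub.ctx, ast.Store):
                stored.add(sub.attr)
            if isinstance(sub, ast.Return) and is_cls_attribute(sub.value):
                returned.add(sub.value.attr)
        if stored & returned:
            return True
    return False


def detect_singletons(source):
    """Return the names of top-level classes matching the heuristic."""
    tree = ast.parse(source)
    return [node.name for node in tree.body
            if isinstance(node, ast.ClassDef) and looks_like_singleton(node)]


if __name__ == "__main__":
    print(detect_singletons(SOURCE))
```

A real detector would need many such structural rules, one per pattern, and would still miss variants written in a different style, which is why this is best read as an illustration of the idea.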
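3. How reverse engineering benefits us
3.1 In a large development team, say more than 100 people, the combined output is enormous. With everyone modifying code in the same project at the same time, you can imagine the result: after a month the code base may have grown by several thousand files, or the earlier logic may have changed beyond recognition. Once a bug appears, tracking it down takes a great deal of manpower and resources. For the other members of the project, if the code added by member A takes member B 2 hours to read, then across a 100-person project that is 100*2=200 hours lost, which is a frightening figure. The end result is that a very large share of the project's time goes into reading code.
3.2 It lets you check whether your logical architecture is sound. A code structure that a reverse engineering tool recognizes easily is necessarily an architecture that has already been validated and proven reliable. If the tool cannot recognize your code, or recognizes it as a mess, there are two possibilities: either your architecture has not been added to the tool's recognition patterns, or your code is poorly written.
3.3 It is a real boon for members who are new to the project. People often leave a project and new members join, and every such change costs a great deal of time spent reading code, time that is largely unnecessary.
3.4 It raises the team's software productivity. Productivity improvements are to be accomplished through design recovery techniques, enhanced product query technologies, and software (source) translation.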
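4. Methodology of reverse engineering
4.1 Extracting an architecture from source code follows two broad directions: a bottom-up approach (cluster-based techniques) and a top-down approach (pattern-based techniques).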
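As a rough illustration of the bottom-up, cluster-based direction, the sketch below groups modules whose dependency sets overlap. It is a minimal example under stated assumptions: the module names and their dependencies in `DEPS` are invented, Jaccard similarity is just one possible similarity measure, and the `threshold` of 0.5 is arbitrary. Real cluster-based tools use richer features and more careful merge criteria.

```python
from itertools import combinations

# Hypothetical dependency data: module name -> set of things it uses.
DEPS = {
    "order_view": {"widgets", "forms"},
    "cart_view": {"widgets", "forms"},
    "order_store": {"sql", "cache"},
    "cart_store": {"sql"},
}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_modules(deps, threshold=0.5):
    """Greedy agglomerative clustering.

    Repeatedly merge the two most similar clusters until the best
    remaining pair falls below the threshold.
    """
    # Each cluster is (set of module names, union of their dependencies).
    clusters = [({name}, set(uses)) for name, uses in deps.items()]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(names) for names, _ in clusters)


if __name__ == "__main__":
    for group in cluster_modules(DEPS):
        print(group)
```

With this sample data the two view modules end up in one group and the two store modules in another, which is the kind of candidate subsystem a bottom-up analysis hands to a human for naming and review.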
4.2 The details of extracting an architecture from source code include the following steps (using cluster-based techniques here):
slicing -> clustering -> database or user interface migration -> objectification -> architecture recovery -> metrics gathering -> business rule extraction
Below I explain each step in turn.
4.2.1 slicing
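The main content of slicing is design recovery and algorithm extraction.
Design recovery only uses AST techniques to pull the scattered objects and flows out of the code; the real high-level abstraction is done by pattern abstraction.
Design recovery mainly includes:
Syntax analysis and semantic analysis build the AST (abstract syntax tree), from which we recover
-> classes, objects, and data structures: class diagram
-> control flow: sequence diagram
-> the sub-flows inside each class: activity diagram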
- pattern extraction
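The AST-based part of design recovery can be sketched with Python's standard `ast` module. This is a minimal illustration, not a full design recovery tool: it parses a small made-up `Account` class and collects, per class, the method names and `self` attributes (raw material for a class diagram) and the `self.method()` calls made from each method (raw material for a sequence diagram). The function name `recover_design` and the shape of the returned dictionary are assumptions made for this example.

```python
import ast

# Hypothetical source code to analyze.
SOURCE = '''
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        self.log("deposit")

    def log(self, message):
        print(message)
'''


def is_self_attribute(node):
    """True for an expression of the form self.<attr>."""
    return (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self")


def recover_design(source):
    """Map each class name to its methods, attributes, and internal calls."""
    tree = ast.parse(source)
    design = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        methods, attributes, calls = [], set(), []
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            methods.append(item.name)
            for sub in ast.walk(item):
                # self.x = ... or self.x += ... declares an attribute.
                if is_self_attribute(sub) and isinstance(sub.ctx, ast.Store):
                    attributes.add(sub.attr)
                # self.m(...) is a call edge from this method to m.
                if isinstance(sub, ast.Call) and is_self_attribute(sub.func):
                    calls.append((item.name, sub.func.attr))
        design[node.name] = {
            "methods": methods,
            "attributes": sorted(attributes),
            "calls": calls,
        }
    return design


if __name__ == "__main__":
    for name, info in recover_design(SOURCE).items():
        print(name, info)
```

The output is still just scattered facts about classes and calls; turning them into meaningful high-level abstractions is the job of the pattern step that follows.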