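[hyperscan][pkg-config] Hyperscan from 0 to 1: a roadmap
After a round of study and building up background knowledge, I can finally start digging into Hyperscan.
[knowledge][pattern matching] String matching / pattern matching, regular expressions, automata
[knowledge][perl][pcre][sed] sed / PCRE syntax / regular expressions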
[development][PCRE] PCRE
----------------------------------------------------------------------
Now, let's hyperscan! (lol)
Introduction (in Chinese): https://www.sdnlab.com/18773.html
Official site: https://01.org/zh/hyperscan
Source repository: https://github.com/intel/hyperscan
Developer reference: http://intel.github.io/hyperscan/dev-reference/preface.html
Official example programs: in the examples/ subdirectory of the source tree.
One somewhat unusual dependency is used here: Ragel http://www.colm.net/open-source/ragel/
Understanding the formal relationship between regular expressions and deterministic finite automata is key to using Ragel effectively.
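To make the regular expression / DFA relationship concrete, here is a minimal hand-written sketch of a DFA for the regular expression ab*c. The regex, the state names, and the function dfa_accepts are all illustrative assumptions; this is not code generated by Ragel or taken from Hyperscan, only the kind of state machine such tools build for you.

```c
#include <stddef.h>

/* States of a hand-built DFA for the regular expression ab*c. */
enum state { S_START, S_SEEN_A, S_ACCEPT, S_DEAD };

/* Transition function: exactly one state change per input byte. */
static enum state step(enum state s, char c)
{
    switch (s) {
    case S_START:
        return c == 'a' ? S_SEEN_A : S_DEAD;
    case S_SEEN_A:
        if (c == 'b')
            return S_SEEN_A;          /* b* loops on the same state */
        return c == 'c' ? S_ACCEPT : S_DEAD;
    default:
        return S_DEAD;                /* nothing may follow the final 'c' */
    }
}

/* Returns 1 if the whole string matches ab*c, 0 otherwise. */
int dfa_accepts(const char *s)
{
    enum state st = S_START;
    for (size_t i = 0; s[i] != '\0'; i++)
        st = step(st, s[i]);
    return st == S_ACCEPT;
}
```

Each regex operator maps onto the graph: concatenation becomes a chain of states, and the star becomes a self-loop. The input is read once, left to right, with no backtracking.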
As an aside, a build system worth mentioning: Ninja https://ninja-build.org/
Let's build it:
┬─[tong@T7:~/Src/thirdparty/github]─[01:49:27 PM]
╰─>$ git clone git@github.com:intel/hyperscan.git
┬─[tong@T7:~/Src/thirdparty/github]─[01:49:45 PM]
╰─>$ cd hyperscan/
┬─[tong@T7:~/Src/thirdparty/github/hyperscan]─[01:49:48 PM]
╰─>$ mkdir BUILD
[root@dpdk hyperscan]# cd BUILD/
Before building, install Boost or download its source. Only the Boost headers are needed at compile time; there is no need to build Boost or to install its runtime libraries.
┬─[tong@T7:~/Src/thirdparty]─[02:10:39 PM]
╰─>$ wget https://dl.bintray.com/boostorg/release/1.66.0/source/boost_1_66_0.tar.bz2
┬─[tong@T7:~/Src/thirdparty]─[02:13:40 PM]
╰─>$ tar jxf boost_1_66_0.tar.bz2
ragel-devel also needs to be installed:
[root@dpdk BUILD]# yum install ragel-devel
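Start the build (cmake --build . already drives the generated Makefiles, so the make that follows it has nothing left to do):
[root@dpdk BUILD]# cmake -DBOOST_ROOT=~/src/thirdparty/boost_1_66_0/ ../
[root@dpdk BUILD]# cmake --build .
[root@dpdk BUILD]# make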
Check that the build succeeded:
[root@dpdk BUILD]# ./bin/unit-hyperscan
Read the developer reference: http://intel.github.io/hyperscan/dev-reference/
Notes:
On fuzzy (approximate) matching:
Levenshtein distance: https://zh.wikipedia.org/wiki/萊文斯坦距離
Hamming distance: https://zh.wikipedia.org/wiki/%E6%B1%89%E6%98%8E%E8%B7%9D%E7%A6%BB
API: http://intel.github.io/hyperscan/dev-reference/api_files.html
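As a quick illustration of the two distances named in the notes above, here is a minimal, self-contained sketch. The function names hamming and levenshtein are my own and are not part of the Hyperscan API; the code only shows what the two metrics measure, not how Hyperscan implements approximate matching.

```c
#include <stdlib.h>
#include <string.h>

/* Hamming distance: number of positions at which two equal-length
 * strings differ. Returns -1 if the lengths differ (undefined). */
int hamming(const char *a, const char *b)
{
    size_t n = strlen(a);
    if (strlen(b) != n)
        return -1;
    int d = 0;
    for (size_t i = 0; i < n; i++)
        d += (a[i] != b[i]);
    return d;
}

/* Levenshtein distance: minimum number of single-character insertions,
 * deletions and substitutions that turn a into b. Uses two rows of the
 * usual dynamic-programming table. Returns -1 on allocation failure. */
int levenshtein(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    int *prev = malloc((lb + 1) * sizeof *prev);
    int *cur = malloc((lb + 1) * sizeof *cur);
    if (!prev || !cur) {
        free(prev);
        free(cur);
        return -1;
    }
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int)j;             /* "" -> b[0..j): j insertions */
    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i;              /* a[0..i) -> "": i deletions */
        for (size_t j = 1; j <= lb; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        int *tmp = prev;              /* current row becomes previous */
        prev = cur;
        cur = tmp;
    }
    int result = prev[lb];
    free(prev);
    free(cur);
    return result;
}
```

For example, levenshtein("kitten", "sitting") is 3 and hamming("karolin", "kathrin") is 3. Hamming distance only counts substitutions, so it is defined only for strings of equal length, while Levenshtein distance also allows insertions and deletions.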
Specify an install prefix:
[root@dpdk BUILD]# cmake -DCMAKE_INSTALL_PREFIX=~/src/thirdparty/github/hyperscan/BUILD/DIST/ ../
[root@dpdk BUILD]# make
[root@dpdk BUILD]# make install
What gets installed:
[root@dpdk DIST]# tree
.
├── include
│   └── hs
│       ├── hs_common.h
│       ├── hs_compile.h
│       ├── hs.h
│       └── hs_runtime.h
├── lib64
│   ├── libhs.a
│   ├── libhs_runtime.a
│   └── pkgconfig
│       └── libhs.pc
└── share
    └── doc
        └── hyperscan
            └── examples
                ├── patbench.cc
                ├── pcapscan.cc
                ├── README.md
                └── simplegrep.c

8 directories, 11 files
Example: share/doc/hyperscan/examples/simplegrep.c
Build:
gcc -o simplegrep simplegrep.c $(pkg-config --cflags --libs libhs)
Related topic: pkg-config
Introduction: https://zh.wikipedia.org/wiki/Pkg-config
Home page: https://www.freedesktop.org/wiki/Software/pkg-config/
On most systems, pkg-config looks in /usr/lib/pkgconfig, /usr/share/pkgconfig, /usr/local/lib/pkgconfig and /usr/local/share/pkgconfig for these files. It will additionally look in the colon-separated (on Windows, semicolon-separated) list of directories specified by the PKG_CONFIG_PATH environment variable.
In other words, there are four default directories plus whatever PKG_CONFIG_PATH lists. Per the pkg-config man page, the PKG_CONFIG_PATH directories are searched before the defaults.
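The lookup described above can be sketched as a small function that lists the candidate .pc files in the order they would be tried. This is only a model of the documented behaviour, not pkg-config's actual code. The function name candidate_pc_path is hypothetical, and the default directory list is the one quoted above, which in reality is fixed when pkg-config is built and varies between systems.

```c
#include <stdio.h>
#include <string.h>

/* Default directories as quoted from the pkg-config documentation;
 * assumption: the real list is distribution-specific. */
static const char *const default_dirs[] = {
    "/usr/lib/pkgconfig",
    "/usr/share/pkgconfig",
    "/usr/local/lib/pkgconfig",
    "/usr/local/share/pkgconfig",
};

/* Writes the idx-th candidate path for "<module>.pc" into out.
 * Directories from env_path (colon-separated, may be NULL) come first,
 * then the defaults. Returns 1 on success, 0 if idx is past the end. */
int candidate_pc_path(const char *env_path, const char *module,
                      size_t idx, char *out, size_t outlen)
{
    const char *p = env_path ? env_path : "";
    while (*p) {
        size_t len = strcspn(p, ":");   /* length of this directory */
        if (len > 0) {
            if (idx == 0) {
                snprintf(out, outlen, "%.*s/%s.pc", (int)len, p, module);
                return 1;
            }
            idx--;
        }
        p += len;
        if (*p == ':')
            p++;                        /* skip the separator */
    }
    size_t ndef = sizeof default_dirs / sizeof default_dirs[0];
    if (idx >= ndef)
        return 0;
    snprintf(out, outlen, "%s/%s.pc", default_dirs[idx], module);
    return 1;
}
```

With env_path set to a single non-standard directory and module set to "libhs", index 0 yields that directory's libhs.pc and indices 1 to 4 fall through to the four defaults. This is why exporting PKG_CONFIG_PATH is enough to make a privately installed library visible.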
Because my Hyperscan is not installed under a standard path:
[tong@T7 pkgconfig]$ export PKG_CONFIG_PATH=~/Src/thirdparty/github/hyperscan/BUILD/DIST/lib64/pkgconfig
[tong@T7 pkgconfig]$ pkg-config --cflags --libs libhs
-I/root/src/thirdparty/github/hyperscan/BUILD/DIST/include/hs -L/root/src/thirdparty/github/hyperscan/BUILD/DIST/lib -lhs
[tong@T7 pkgconfig]$
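The -L path printed above points at DIST/lib, but the libraries were installed into DIST/lib64, so add a symlink: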
[root@dpdk DIST]# ln -rfs lib64/ lib
[root@dpdk DIST]# ll
total 4
drwxr-xr-x 1 root root  4 Jan 29 14:33 include
lrwxrwxrwx 1 root root  5 Jan 30 16:08 lib -> lib64
drwxr-xr-x 1 root root 62 Jan 29 14:32 lib64
drwxr-xr-x 1 root root  6 Jan 29 14:32 share
[root@dpdk DIST]#
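Build, this time with g++: Hyperscan itself is written in C++, so linking against the static libhs.a needs the C++ standard library, which g++ adds automatically.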
[root@dpdk examples]# cat build.sh
#! /usr/bin/bash
export PKG_CONFIG_PATH=/root/src/thirdparty/github/hyperscan/BUILD/DIST/lib64/pkgconfig
g++ -o simplegrep simplegrep.c $(pkg-config --cflags --libs libhs)
Test run:
[root@dpdk examples]# ./simplegrep init simplegrep.c
Scanning 8040 bytes with Hyperscan
[root@dpdk examples]#
An example of block-mode pattern matching:
https://github.com/tony-caotong/knickknack/tree/master/examples/hyperscan
It behaves as follows (the reported position is the offset just past the last byte of the match):
[root@dpdk hyperscan]# ./test
Usage: ./test <pattern> <string>
[root@dpdk hyperscan]# ./test 1234 dafkj1234dlkfajf
id: 0, matched position: 9
[root@dpdk hyperscan]# ./test 1234 dafkj1234dl1234kfajf
id: 0, matched position: 9
id: 0, matched position: 15
[root@dpdk hyperscan]#
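After reading the example source, it can be modified and rebuilt to do multi-pattern matching.
The two default patterns are 1234 and 5678.
It behaves as follows: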
[root@dpdk hyperscan]# ./test "12345678"
id: 1, matched position: 4
id: 2, matched position: 8
[root@dpdk hyperscan]# ./test "dfasfdsaf1234gadgdgdahg5678"
id: 1, matched position: 13
id: 2, matched position: 27
[root@dpdk hyperscan]# ./test "dfasfdsaf1234gadgdgdahg5678dfsag"
id: 1, matched position: 13
id: 2, matched position: 27
[root@dpdk hyperscan]#
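To show what the (id, position) pairs in this output mean, here is a naive stand-in for multi-pattern block scanning. It is not Hyperscan and makes no claim about how the linked example is written: scan_block and struct match are hypothetical names, and only literal patterns are handled. With the real library this job is done by compiling all patterns into one database with hs_compile_multi() and scanning with hs_scan(), which reports each match through a callback.

```c
#include <stddef.h>
#include <string.h>

/* One reported match: which pattern matched, and the offset just past
 * its last byte (the "matched position" in the transcript above). */
struct match {
    unsigned id;
    size_t to;
};

/* Scans data once, left to right, and records every occurrence of every
 * literal pattern. ids[i] is the id reported for pats[i]. Returns the
 * total number of matches; only the first cap are stored in out. */
size_t scan_block(const char *const *pats, const unsigned *ids,
                  size_t npats, const char *data, size_t len,
                  struct match *out, size_t cap)
{
    size_t found = 0;
    for (size_t end = 1; end <= len; end++) {
        for (size_t p = 0; p < npats; p++) {
            size_t plen = strlen(pats[p]);
            if (plen == 0 || plen > end)
                continue;             /* pattern cannot end here */
            if (memcmp(data + end - plen, pats[p], plen) == 0) {
                if (found < cap) {
                    out[found].id = ids[p];
                    out[found].to = end;
                }
                found++;
            }
        }
    }
    return found;
}
```

Scanning "12345678" with patterns 1234 (id 1) and 5678 (id 2) yields (1, 4) and (2, 8), the same pairs as the first run above. The point of a multi-pattern engine is that all patterns are checked in a single pass over the data. This sketch does it the slow way, which is acceptable for illustration only.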
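Changing a macro in the example enables streaming mode.
The two default patterns are 1234 and 5678. The two command-line arguments are fed in one after the other, forming a single stream.
It behaves as follows: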
[root@dpdk hyperscan]# ./test "dfasfdsaf1234gad" "gdgdahg5678dfsag"
id: 1, matched position: 13
id: 2, matched position: 27
[root@dpdk hyperscan]# ./test "dfasfdsaf123" "4gadgdgdahg5678dfsag"
id: 1, matched position: 13
id: 2, matched position: 27
[root@dpdk hyperscan]#
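The second run above is the interesting one: 1234 is split across the two chunks and is still reported at offset 13. The sketch below shows the idea behind that behaviour using a plain KMP matcher whose state survives between calls. It is a stand-in under my own assumptions, not Hyperscan's implementation: every name in it (stream_matcher, sm_open, sm_feed, log_match) is hypothetical. In the real library the corresponding calls are hs_open_stream(), hs_scan_stream() and hs_close_stream() on a database compiled with HS_MODE_STREAM.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAT 16                 /* illustrative limit on pattern length */

/* Matcher for one literal pattern; its state persists across chunks. */
struct stream_matcher {
    const char *pat;
    size_t len;                    /* strlen(pat), in 1..MAX_PAT */
    size_t fail[MAX_PAT];          /* KMP failure table */
    size_t state;                  /* pattern bytes matched so far */
    unsigned long long offset;     /* bytes consumed since sm_open() */
};

/* Callback: pattern id and end offset counted from the stream start. */
typedef void (*match_cb)(unsigned id, unsigned long long to, void *ctx);

/* Prepares the matcher. Returns 0, or -1 if pat is empty or too long. */
int sm_open(struct stream_matcher *m, const char *pat)
{
    size_t n = strlen(pat);
    if (n == 0 || n > MAX_PAT)
        return -1;
    m->pat = pat;
    m->len = n;
    m->state = 0;
    m->offset = 0;
    m->fail[0] = 0;
    for (size_t i = 1, k = 0; i < n; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = m->fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        m->fail[i] = k;
    }
    return 0;
}

/* Feeds one chunk. Because state and offset live in the matcher,
 * a match that straddles two chunks is found like any other. */
void sm_feed(struct stream_matcher *m, unsigned id,
             const char *data, size_t n, match_cb cb, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        while (m->state > 0 && data[i] != m->pat[m->state])
            m->state = m->fail[m->state - 1];
        if (data[i] == m->pat[m->state])
            m->state++;
        m->offset++;
        if (m->state == m->len) {
            cb(id, m->offset, ctx);
            m->state = m->fail[m->state - 1];
        }
    }
}

/* Example callback: counts matches and remembers the latest one. */
struct match_log {
    int count;
    unsigned id;
    unsigned long long to;
};

void log_match(unsigned id, unsigned long long to, void *ctx)
{
    struct match_log *log = ctx;
    log->count++;
    log->id = id;
    log->to = to;
}
```

Opening a matcher on 1234 and feeding "dfasfdsaf123" reports nothing. Feeding "4gadgdgdahg5678dfsag" next reports a match ending at offset 13, counted from the start of the stream rather than from the start of the second chunk. A second matcher opened on 5678 and fed the same two chunks reports 27.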
To be continued:
[hyperscan] Hyperscan from 1 to 1.5 --!!