openGauss Source Code Analysis (159)

openGauss Source Code Analysis: AI Technology (6)

3. Benchmark Module Analysis

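The benchmark driver scripts are stored in the benchmark subdirectory of X-Tuner. X-Tuner ships with driver scripts for common benchmarks such as TPC-C and TPC-H. To load a driver script and obtain a benchmark driver instance, X-Tuner calls the get_benchmark_instance function in benchmark/__init__.py. Table 8-3 describes the format of a benchmark driver script.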

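Table 8-3 Format of a benchmark driver script

Script format	Description
Driver script file name	The name of the benchmark. The name uniquely identifies the driver script; the benchmark_script option in the X-Tuner configuration file selects which benchmark driver script to load.
Driver script content (three elements)	The path variable, the cmd variable, and the run function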

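The template.py file in the benchmark directory is the template for benchmark driver scripts. The same directory contains ready-made examples such as TPC-C and TPC-H, all of which are built on this template. The template defines the basic structure of a driver script. To the tuning program, each driver script is a black box: only its input and output formats need to be well defined. The driver script format defined in template.py is as follows.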

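# Note: you need to import the data into the database first.
# The tuning program calls the run function below automatically; its return value is the benchmark result.

# The path variable is the directory where the actual benchmark is located.
path = ''

# The cmd variable defines the shell command that starts the benchmark.
cmd = ''

# The run function receives the remote and local command-line interfaces;
# shell commands are executed through their exec_command_sync() method.
def run(remote_server, local_host) -> float:
    return 0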
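
As a minimal sketch of how a driver script might fill in this template, the example below wraps a hypothetical benchmark whose launch command prints a line such as `tps = 1234.5`. The `FakeServer` class, the `/opt/mybench` path, the `./run.sh` command, and the output format are all assumptions made for illustration. The only interface taken from the template is `exec_command_sync()`, assumed here to accept a list of shell commands and return a `(stdout, stderr)` pair.

```python
import re

# Hypothetical location and launch command of the benchmark.
path = '/opt/mybench'
cmd = './run.sh'


def run(remote_server, local_host) -> float:
    """Run the benchmark on the database host and return its score."""
    stdout, stderr = remote_server.exec_command_sync(['cd %s' % path, cmd])
    if len(stderr) > 0:
        raise RuntimeError(stderr)
    # Assumed output format: a line containing "tps = <number>".
    match = re.search(r'tps\s*=\s*([0-9.]+)', stdout)
    if match is None:
        raise ValueError('no tps value found in benchmark output')
    return float(match.group(1))


class FakeServer:
    """Stand-in for the shell interface, used only to exercise run()."""

    def exec_command_sync(self, commands):
        return 'warmup done\ntps = 1234.5\n', ''


score = run(FakeServer(), None)
print(score)
```

Because the tuner treats a larger return value as better performance, a throughput figure can be returned as is, while a latency figure would need to be negated.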

Several concrete examples follow. The TPC-C test script given in tpcc.py is as follows:

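from tuner.exceptions import ExecutionError

# Note: you need to download the benchmark-sql test tool yourself and replace the JDBC driver file
# in its PostgreSQL directory with the openGauss JDBC driver. You also need to prepare the TPC-C
# test configuration yourself.
# The test program is run with the command below.

path = '/path/to/benchmarksql/run'  # Directory of the TPC-C test tool benchmark-sql
cmd = "./runBenchmark.sh props.gs"  # props.gs is a user-defined benchmark-sql configuration file


def run(remote_server, local_host):
    # Switch to the TPC-C script directory, remove the old error log, then run the test command.
    # It is best to wait a few seconds here: benchmark-sql generates its final report through a
    # shell script, so there is some delay.
    # To make sure the final tpmC report is available, wait 3 seconds.
    stdout, stderr = remote_server.exec_command_sync(['cd %s' % path, 'rm -rf benchmarksql-error.log', cmd, 'sleep 3'])

    # If the standard error stream contains data, raise an exception and exit.
    if len(stderr) > 0:
        raise ExecutionError(stderr)

    # Look for the final tpmC result.
    tpmC = None
    split_string = stdout.split()  # Split the standard output into tokens
    for i, st in enumerate(split_string):
        # In benchmark-sql 5.0, the final tpmC value is the second token after the
        # '(NewOrders)' keyword; normally it can be taken as soon as the keyword is found.
        if "(NewOrders)" in st:
            tpmC = split_string[i + 2]
            break

    stdout, stderr = remote_server.exec_command_sync(
        "cat %s/benchmarksql-error.log" % path)
    nb_err = stdout.count("ERROR:")  # Count the errors reported during the whole benchmark run
    # The error count is a penalty term with a factor of 10; a higher factor puts more weight on errors.
    return float(tpmC) - 10 * nb_err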
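
The parsing and penalty steps above can be tried in isolation. The sketch below is not the X-Tuner code: `parse_tpmc` and `score` are hypothetical helper names, and the sample output and error log are made up to resemble a benchmark-sql report. Unlike the listing, it raises a `ValueError` when the keyword is missing, whereas the listing would fail inside `float(None)` with a `TypeError`.

```python
def parse_tpmc(stdout):
    """Return the token two positions after '(NewOrders)' as a float."""
    tokens = stdout.split()
    for i, token in enumerate(tokens):
        if "(NewOrders)" in token:
            if i + 2 >= len(tokens):
                break
            return float(tokens[i + 2])
    raise ValueError("tpmC value not found in benchmark output")


def score(stdout, error_log, penalty=10):
    """Subtract a fixed penalty per logged error from the tpmC value."""
    return parse_tpmc(stdout) - penalty * error_log.count("ERROR:")


# Illustrative output only; real benchmark-sql logs contain more fields.
sample_stdout = (
    "INFO jTPCC : Term-00, Measured tpmC (NewOrders) = 4500.5\n"
    "INFO jTPCC : Term-00, Measured tpmTOTAL = 10000.0\n"
)
sample_errors = "ERROR: deadlock detected\nERROR: deadlock detected\n"

print(score(sample_stdout, sample_errors))
```

With two logged errors and the default factor of 10, the sample score is the measured 4500.5 minus 20.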

The key contents of the TPC-C configuration file props.gs are as follows:

db=opengauss
driver=org.postgresql.Driver
// Connection settings
conn=jdbc:postgresql://192.168.1.100:5678/tpcc
…
// Data volume
warehouses=1
loadWorkers=4
// Concurrency
terminals=100
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=10
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=0
//Number of total transactions per minute
limitTxnsPerMin=300
…
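
The comments in this file say that `runTxnsPerTerminal` and `runMins` exclude each other. The sketch below checks a configuration fragment for that rule before a long tuning run. The helper names are hypothetical, the parser handles only simple `key=value` lines, and reading the comments as "exactly one of the two must be non-zero" is an interpretation, not something benchmark-sql is known to enforce in this way.

```python
def parse_props(text):
    """Parse key=value lines, skipping blanks and // comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def run_length_is_consistent(props):
    """Exactly one of runMins and runTxnsPerTerminal may be non-zero."""
    by_time = int(props.get("runMins", 0)) != 0
    by_count = int(props.get("runTxnsPerTerminal", 0)) != 0
    return by_time != by_count


sample = """
terminals=100
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=10
runMins=0
"""
props = parse_props(sample)
print(run_length_is_consistent(props))
```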

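Many public tutorials and materials cover the use of the TPC-C test tool benchmark-sql, so the details are not repeated here. The openGauss JDBC driver can be downloaded from the official website: https://opengauss.org/zh/download.html.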

Next, consider the TPC-H example.

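import time

from tuner.exceptions import ExecutionError

# Note: you need to import the data yourself first and then prepare the SQL test files.
# The program below collects the overall execution latency automatically.

path = '/path/to/tpch/queries'  # Directory containing the SQL scripts used for the TPC-H test
# Command that runs a TPC-H test script; usually 'gsql -f <script file>'
cmd = "gsql -U {user} -W {password} -d {db} -p {port} -f {file}"

# Because gsql may be used to connect to the database, a user name, password, and other information
# may be required. Use placeholders such as {user} and {password}; X-Tuner renders them automatically.


def run(remote_server, local_host):
    …
    # The cost is the total execution time of all test cases.
    cost = time.time() - time_start
    # Negate the cost to match the contract of run: a larger return value means better performance.
    return -cost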
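
The body of `run` is elided in the listing above. The sketch below shows one way the "time everything, then negate" pattern could be written; it is not the X-Tuner implementation. The function name, the `sql_files` argument, the handling of the `{file}` placeholder, and the `RecordingServer` stub are all assumptions made for illustration. It uses `time.perf_counter()`, a monotonic clock, where the listing uses `time.time()`.

```python
import time


def total_latency_score(server, query_dir, cmd_template, sql_files):
    """Run every SQL file once and return the negated total elapsed time."""
    time_start = time.perf_counter()
    for name in sql_files:
        # Assumed convention: {file} is replaced by the full path of one script.
        command = cmd_template.replace('{file}', '%s/%s' % (query_dir, name))
        stdout, stderr = server.exec_command_sync(command)
        if len(stderr) > 0:
            raise RuntimeError(stderr)
    cost = time.perf_counter() - time_start
    # Negate so that a shorter total run time gives a larger score.
    return -cost


class RecordingServer:
    """Stand-in shell interface that records commands instead of running them."""

    def __init__(self):
        self.commands = []

    def exec_command_sync(self, command):
        self.commands.append(command)
        return '', ''


server = RecordingServer()
score = total_latency_score(
    server, '/path/to/tpch/queries', 'gsql -d tpch -f {file}', ['q1.sql', 'q2.sql'])
print(score <= 0, server.commands)
```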

The global variable cmd in the TPC-H script contains placeholders such as {user} and {password}, which X-Tuner renders. The relevant code is in benchmark/__init__.py, as shown below:

def get_benchmark_instance(script, path, cmd, db_info):
    …
    # Validate the benchmark script. If path, cmd, or run is missing or empty, an exception is raised.
    if (not getattr(bm, 'path', False)) or (not getattr(bm, 'cmd', False)) or (not getattr(bm, 'run', False)):
        raise ConfigureError('The benchmark script %s is invalid. '
                             'For details, see the example template and description document.' % script)

    # Check that run is a function taking exactly two arguments: the remote and local shell interfaces.
    check_run_assertion = isinstance(bm.run, types.FunctionType) and bm.run.__code__.co_argcount == 2
    if not check_run_assertion:
        raise ConfigureError('The run function in the benchmark instance is not correctly defined. '
                             'Redefine the function by referring to the examples.')

    # For cmd and path, the values from the configuration file take precedence;
    # if the corresponding option is not set, the value in the script is used.
    if path.strip() != '':
        bm.path = path
    if cmd.strip() != '':
        bm.cmd = cmd

    # Render the placeholders in the cmd command.
    bm.cmd = bm.cmd.replace('{host}', db_info['host']) \
        .replace('{port}', str(db_info['port'])) \
        .replace('{user}', db_info['db_user']) \
        .replace('{password}', db_info['db_user_pwd']) \
        .replace('{db}', db_info['db_name'])

    # Wrap the call so that only the database host's shell interface needs to be passed in.
    def wrapper(server_ssh):
        return bm.run(server_ssh, local_ssh)

    return wrapper
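
The placeholder rendering and the signature check can be tried on their own. The sketch below repeats the same `str.replace` chain and `co_argcount` test as the listing above; `render_cmd` and `has_valid_run` are hypothetical helper names, and the connection settings are made up.

```python
import types


def render_cmd(cmd, db_info):
    """Substitute connection placeholders in a command template."""
    return (cmd.replace('{host}', db_info['host'])
               .replace('{port}', str(db_info['port']))
               .replace('{user}', db_info['db_user'])
               .replace('{password}', db_info['db_user_pwd'])
               .replace('{db}', db_info['db_name']))


def has_valid_run(func):
    """True if func is a plain function taking exactly two positional arguments."""
    return isinstance(func, types.FunctionType) and func.__code__.co_argcount == 2


# Made-up connection settings, for illustration only.
db_info = {'host': '127.0.0.1', 'port': 5432, 'db_user': 'tuner',
           'db_user_pwd': 'secret', 'db_name': 'tpch'}
template = "gsql -U {user} -W {password} -d {db} -p {port} -f {file}"

print(render_cmd(template, db_info))
print(has_valid_run(lambda remote, local: 0.0), has_valid_run(len))
```

Because the rendering uses plain `str.replace` rather than `str.format`, a placeholder that is not in the list, such as `{file}`, passes through unchanged and does not raise a `KeyError`, so a driver script can fill it in later.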

posted @ 2024-04-30 11:18 openGauss-bot
