ClickHouse UDF support | Creating user-defined functions with SQL and with configuration files

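I. Creating functions with SQL

CREATE FUNCTION creates a user-defined function from a lambda expression. The expression must consist of function parameters, constants, operators, or other function calls.

Syntax: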

CREATE FUNCTION name AS (parameter0, ...) -> expression
-- Drop a function
DROP FUNCTION [IF EXISTS] function_name

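A function can have an arbitrary number of parameters.

There are a few restrictions:

1) The name of a function must be unique among user-defined and system functions.

2) Recursive functions are not allowed.

3) All variables used by a function must be specified in its parameter list.

If any restriction is violated, an exception is raised.

Example: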

CREATE FUNCTION linear_equation AS (x, k, b) -> k*x + b;
SELECT number, linear_equation(number, 2, 1) FROM numbers(3);

Result:

┌─number─┬─plus(multiply(2, number), 1)─┐
│      0 │                            1 │
│      1 │                            3 │
│      2 │                            5 │
└────────┴──────────────────────────────┘

In the following query, a conditional function is called inside a user-defined function:

CREATE FUNCTION parity_str AS (n) -> if(n % 2, 'odd', 'even');
SELECT number, parity_str(number) FROM numbers(3);

Result:

┌─number─┬─if(modulo(number, 2), 'odd', 'even')─┐
│      0 │ even                                 │
│      1 │ odd                                  │
│      2 │ even                                 │
└────────┴──────────────────────────────────────┘
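The two result tables above can be cross-checked outside ClickHouse. The following is a minimal sketch, assuming plain Python lambdas are an adequate stand-in for the SQL lambdas; the Python names only mirror the SQL ones, and the `numbers` helper is a hypothetical imitation of the `numbers(n)` table function.

```python
# Python stand-ins for the two SQL UDFs (illustration only, not ClickHouse code).
linear_equation = lambda x, k, b: k * x + b
parity_str = lambda n: 'odd' if n % 2 else 'even'


def numbers(n):
    """Imitate the numbers(n) table function: yields 0 .. n-1."""
    return range(n)


# Build (number, result) pairs, one per row of the SQL result.
linear_rows = [(n, linear_equation(n, 2, 1)) for n in numbers(3)]
parity_rows = [(n, parity_str(n)) for n in numbers(3)]

print(linear_rows)  # [(0, 1), (1, 3), (2, 5)]
print(parity_rows)  # [(0, 'even'), (1, 'odd'), (2, 'even')]
```

As in the SQL version, every name used in a lambda body is one of its parameters.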

Examples

CREATE FUNCTION linear_equation AS (x, k, b) -> k*x + b;
SELECT number, linear_equation(number, 2, 1) FROM numbers(3);

SELECT
    number,
    linear_equation(number, 2, 1)
FROM numbers(3)

Query id: 9a4a2978-b186-4bc2-ac0c-86daf0328212

┌─number─┬─plus(multiply(2, number), 1)─┐
│      0 │                            1 │
│      1 │                            3 │
│      2 │                            5 │
└────────┴──────────────────────────────┘

3 rows in set. Elapsed: 0.002 sec. 

CREATE FUNCTION parity_str AS (n) -> if(n % 2, 'odd', 'even');
SELECT number, parity_str(number) FROM numbers(3);

SELECT
    number,
    parity_str(number)
FROM numbers(3)

Query id: 59a97a32-15c4-4417-8444-51cb00a01ac0

┌─number─┬─if(modulo(number, 2), 'odd', 'even')─┐
│      0 │ even                                 │
│      1 │ odd                                  │
│      2 │ even                                 │
└────────┴──────────────────────────────────────┘

3 rows in set. Elapsed: 0.002 sec.

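II. Defining functions with configuration files

ClickHouse can call any external executable program or script to process data. Describe such functions in a configuration file and add the path to that file to the main configuration in the user_defined_executable_functions_config setting. If the path contains the wildcard *, all files matching the pattern are loaded. Example: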

<user_defined_executable_functions_config>*_function.xml</user_defined_executable_functions_config>

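User-defined function configurations are searched for relative to the path specified in the user_files_path setting.

A function configuration contains the following settings:

1) name - the function name.

2) command - the name of the script to execute, or a command if execute_direct is false.

3) argument - the description of an argument, with its type and an optional name. Each argument is described in a separate setting. The name must be specified if argument names are part of the serialization of the user-defined function format (for example Native or JSONEachRow). The default argument name is c + argument_number.

4) format - the format in which arguments are passed to the command.

5) return_type - the type of the returned value.

6) return_name - the name of the returned value. It must be specified if the return name is part of the serialization of the user-defined function format (for example Native or JSONEachRow). Optional. The default value is result.

7) type - the executable type. If type is set to executable, a single command is started. If it is set to executable_pool, a pool of commands is created.

8) max_command_execution_time - the maximum execution time, in seconds, for processing a block of data. This setting is valid only for executable_pool commands. Optional. The default value is 10.

9) command_termination_timeout - the time, in seconds, within which the command should finish after its pipe is closed. After that time, SIGTERM is sent to the process executing the command. Optional. The default value is 10.

10) command_read_timeout - the timeout, in milliseconds, for reading data from the command's stdout. Optional. The default value is 10000.

11) command_write_timeout - the timeout, in milliseconds, for writing data to the command's stdin. Optional. The default value is 10000.

12) pool_size - the size of the command pool. Optional. The default value is 16.

13) send_chunk_header - controls whether the row count is sent before each chunk of data to process. Optional. The default value is false.

14) execute_direct - if execute_direct = 1, command is searched for inside the user_scripts folder. Additional script arguments can be specified, separated by whitespace, for example: script_name arg1 arg2. If execute_direct = 0, command is passed as the argument to /bin/sh -c. Optional. The default value is 1.

15) lifetime - the reload interval of the function, in seconds. If it is set to 0, the function is not reloaded. Optional. The default value is 0.

The command must read its arguments from STDIN and write the result to STDOUT. It must process arguments iteratively: after processing one chunk of arguments, it must wait for the next chunk.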
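The iterative contract, combined with send_chunk_header, can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: it assumes that with send_chunk_header enabled each chunk starts with one line holding the row count, followed by that many TabSeparated rows. The function name process_chunks and the "Value " prefix are illustrative, and the demo feeds the function from in-memory streams instead of real pipes.

```python
import io


def process_chunks(inp, out):
    """Process chunks that are each prefixed by a row-count line.

    Assumption: send_chunk_header is enabled, so every chunk looks like
    "<row count>\n" followed by that many rows, one row per line.
    """
    while True:
        header = inp.readline()
        if not header:  # empty read means the input was closed
            break
        rows_in_chunk = int(header)
        for _ in range(rows_in_chunk):
            row = inp.readline()  # row still ends with '\n'
            out.write("Value " + row)
        out.flush()  # hand the whole chunk back, then wait for the next header


# Demo with two chunks: the first has 2 rows, the second has 1 row.
# In a real UDF script the call would be process_chunks(sys.stdin, sys.stdout).
demo_in = io.StringIO("2\n1\n2\n1\n3\n")
demo_out = io.StringIO()
process_chunks(demo_in, demo_out)
print(demo_out.getvalue(), end='')
```

Flushing once per chunk matters here: without it the results can sit in the script's output buffer while the script is already blocked waiting for the next chunk.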

Example: creating test_function_python using an XML configuration. File test_function.xml (with the default execute_direct = 1).

<functions>
    <function>
        <type>executable</type>
        <name>test_function_python</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt64</type>
            <name>value</name>
        </argument>
        <format>TabSeparated</format>
        <command>test_function.py</command>
    </function>
</functions>

Script file test_function.py inside the user_scripts folder.

#!/usr/bin/python3

import sys

if __name__ == '__main__':
    for line in sys.stdin:
        print("Value " + line, end='')
        sys.stdout.flush()

Query:

SELECT test_function_python(toUInt64(2));

Result:

┌─test_function_python(2)─┐
│ Value 2                 │
└─────────────────────────┘
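When many functions have to be registered, the XML for test_function_python can also be generated rather than written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper build_function_config and its parameters are hypothetical, and only element names from the settings list in this section are emitted.

```python
import xml.etree.ElementTree as ET


def build_function_config(name, return_type, arguments, fmt, command,
                          func_type="executable"):
    """Return a <functions> XML string describing one executable UDF.

    arguments is a list of (type, name) pairs, one per function argument.
    """
    functions = ET.Element("functions")
    function = ET.SubElement(functions, "function")
    ET.SubElement(function, "type").text = func_type
    ET.SubElement(function, "name").text = name
    ET.SubElement(function, "return_type").text = return_type
    for arg_type, arg_name in arguments:
        argument = ET.SubElement(function, "argument")
        ET.SubElement(argument, "type").text = arg_type
        ET.SubElement(argument, "name").text = arg_name
    ET.SubElement(function, "format").text = fmt
    ET.SubElement(function, "command").text = command
    return ET.tostring(functions, encoding="unicode")


# Same values as in the test_function.xml example.
config_xml = build_function_config(
    name="test_function_python",
    return_type="String",
    arguments=[("UInt64", "value")],
    fmt="TabSeparated",
    command="test_function.py",
)
print(config_xml)
```

The generated string is unindented; it would still have to be written to a file whose name matches the pattern in user_defined_executable_functions_config.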

Creating test_function_sum using an XML configuration, with execute_direct explicitly set to 0. File test_function.xml.

<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum</name>
        <return_type>UInt64</return_type>
        <argument>
            <type>UInt64</type>
            <name>lhs</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>rhs</name>
        </argument>
        <format>TabSeparated</format>
        <command>cd /; clickhouse-local --input-format TabSeparated --output-format TabSeparated --structure 'x UInt64, y UInt64' --query "SELECT x + y FROM table"</command>
        <execute_direct>0</execute_direct>
    </function>
</functions>

Query:

SELECT test_function_sum(2, 2);

Result:

┌─test_function_sum(2, 2)─┐
│                       4 │
└─────────────────────────┘
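For comparison, the same sum could be computed by a script run with the default execute_direct = 1 instead of a shell command. The following is a minimal sketch, assuming the TabSeparated format delivers one row per line with the two arguments separated by a tab; the function name sum_tab_separated is hypothetical and is not part of the configuration above.

```python
import io


def sum_tab_separated(inp, out):
    """Read 'lhs<TAB>rhs' rows and write lhs + rhs, one result per line."""
    for line in inp:
        lhs, rhs = line.rstrip('\n').split('\t')
        out.write(str(int(lhs) + int(rhs)) + '\n')
        out.flush()  # return each result promptly, then wait for more input


# Demo with two rows; a real script would call
# sum_tab_separated(sys.stdin, sys.stdout) instead.
demo_out = io.StringIO()
sum_tab_separated(io.StringIO("2\t2\n10\t5\n"), demo_out)
print(demo_out.getvalue(), end='')
```

Saved in the user_scripts folder, such a script would be referenced from the command setting by its file name, as in the test_function.py example.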

Creating test_function_sum_json with named arguments and the JSONEachRow format using an XML configuration. File test_function.xml.

<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum_json</name>
        <return_type>UInt64</return_type>
        <return_name>result_name</return_name>
        <argument>
            <type>UInt64</type>
            <name>argument_1</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>argument_2</name>
        </argument>
        <format>JSONEachRow</format>
        <command>test_function_sum_json.py</command>
    </function>
</functions>

Script file test_function_sum_json.py inside the user_scripts folder.

#!/usr/bin/python3

import sys
import json

if __name__ == '__main__':
    for line in sys.stdin:
        value = json.loads(line)
        first_arg = int(value['argument_1'])
        second_arg = int(value['argument_2'])
        result = {'result_name': first_arg + second_arg}
        print(json.dumps(result), end='\n')
        sys.stdout.flush()

Query:

SELECT test_function_sum_json(2, 2);

Result:

┌─test_function_sum_json(2, 2)─┐
│                            4 │
└──────────────────────────────┘
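To see what travels over the pipes in the JSONEachRow case, the script's logic can be exercised with hand-written rows. The following is a minimal sketch, assuming each input row is one JSON object keyed by the configured argument names and each output row is one JSON object keyed by return_name. The function name sum_json_rows is hypothetical. The second demo row uses quoted numbers, which is how 64-bit integers may arrive depending on the output_format_json_quote_64bit_integers setting; the int() casts cover both cases.

```python
import io
import json


def sum_json_rows(inp, out):
    """Read one JSON object per line and write {'result_name': sum} per line."""
    for line in inp:
        row = json.loads(line)
        total = int(row['argument_1']) + int(row['argument_2'])
        out.write(json.dumps({'result_name': total}) + '\n')
        out.flush()


# Demo input: one row with plain numbers, one with quoted numbers.
demo_in = io.StringIO(
    '{"argument_1":2,"argument_2":2}\n'
    '{"argument_1":"3","argument_2":"4"}\n'
)
demo_out = io.StringIO()
sum_json_rows(demo_in, demo_out)
print(demo_out.getvalue(), end='')
```

The keys argument_1, argument_2 and result_name must match the name and return_name settings in the XML configuration, which is why those settings are required for formats that serialize names.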

Posted 2022-02-28 14:21 by 渐逝的星光
