java mysql自定义函数UDF之调用c函数

  正如sqlite可以定义自定义函数,它是通过API定义c函数的,不像其他,如这里的mysql。sqlite提供原生接口就可以方便的调用其他语言的方法,同样的mysql也支持调用其它语言的方法。

     google "mysql call c function"发现一片文章 MySQL User Defined Functions 

This tutorial explains what an User Defined Function (UDF) is, what it does and why/when they are useful.

1. What is an User Defined Function?

Basically an User Defined Function (UDF) is a piece code that extends the functionality of a MySQL server by adding a new function that behaves just like a native (built-in) MySQL function like abs() or concat(). UDFs are written in C (or C++ if you really need to)! ... well maybe there is a way to write them in BASIC, .NET or whatever but I don't see why anybody would want to do that.

2. Why/When are UDFs useful?

As implied by the name UDFs are useful when you need to extend the functionality of your MySQL server. This little table should make it clear which method is best for a given situation:

Method

Speed

Language

Development

Stored Procedures

slow

SQL

~minutes (for small functions)

UDF

fast

C

~hour

Native Function

fast

C

major pain in the ***

And by "slow" I mean: "slower than the others"! Stored Procedures are still much faster then normal SQL statements!

A little explanation on native functions: The code you have to write here is essentially the same as the one for an UDF. BUT you have to write it in the MySQL source code and recompile the whole thing. This will (believe me) be a lot of work because you have to do it again and again with every new version of MySQL.

3. How to use UDFs?

This part is really easy. When you have your UDF finished you just use it like every other native function. For example : "SELECT MyFunction(data1, data2) FROM table"

4. Writing the UDF

Now let's get started on writing our first UDF in steps:

  1. Create a new shared-library project (in the example I used VC++ 6.0 with a standard DLL)

  2. First we need some headers. These headers are either standard library headers or from the MySQL Server's include directory

  3. Now we have to decide what kind of function we want. There are essentially two choices to be made:

    • is the function an aggregate function or not? (we will learn more about aggregate functions later)

    • which type of return value should the function return? Here we have 4 options:

      Type

      Description

      STRING

      A string literal. Translates to char* in C

      INTEGER

      A normal integer. Translates to a 64 bit integer in C

      REAL

      A floating point number. Translates to double in C

      DECIAML

      This one isn't really finished at this time. MySQL treats it as STRING

  4. Let's talk about non-aggregate functions first. Now we have to declare and implement some functions the MySQL server needs to use our UDF. But first some structs we'll need for that:

    • UDF_INIT:

      Type

      Name

      Description

      <code>

      my_bool

      maybe_null

      1 if function can return NULL

      unsigned int

      decimals

      for REAL functions

      unsigned long

      max_length

      For string functions

      char *

      ptr

      free pointer for function data

      my_bool

      const_item

      0 if result is independent of argument

    • UDF_ARGS:

      Type

      Name

      Description

      unsigned int

      arg_count

      Number of argument

      enum Item_result *

      arg_type

      Array containing the types of the arguments

      char **

      args

      Array of pointer to the arguments

      unsigned long *

      lengths

      Array of the argument's lengths (only needed for strings)

      char *

      maybe_null

      Array of "maybe_null" flags (1 if argument maybe null)

      char **

      attributes

      Array of pointers to the arguments' attributes (see chapter x for details)

      unsigned long *

      attribute_lengths

      Array of attributes lengths

    now let's take a look at the functions:

    De-/Initialization:

    extern "C" my_bool MyTest_init(UDF_INIT *initid, UDF_ARGS *args, 
                                   char *message)
    {
            // The most important thing to do here is setting up the memory
            // you need...
            // Lets say we need a lonlong type variable to keep a checksum
            // Although we do not need one in this case
            longlong* i = new longlong; // create the variable
            *i = 0;                     // set it to a value
            
            // store it as a char pointer in the pointer variable
            // Make sure that you don`t run in typecasting troubles later!!
            initid->ptr = (char*)i;
            
            // check the arguments format
            if (args->arg_count != 1)
            {
                strcpy(message,"MyTest() requires one arguments");
                return 1;
            }
    
            if (args->arg_type[0] != INT_RESULT)
            {
                strcpy(message,"MyTest() requires an integer");
                return 1;
            }       
            return 0;            
    }
    
    extern "C" void MyTest_deinit(UDF_INIT *initid)
    {
            // Here you have to free the memory you allocated in the 
            // initialization function
            delete (longlong*)initid->ptr;
    }

     

    The actual function:

    extern "C" longlong MyTest(UDF_INIT *initid, UDF_ARGS *args,
                               char *is_null, char *error)
    {
            /* So finally this is the part were we do the real work. This 
            function is called for every record and the current value(s) 
            or better pointers to the current values are stroed in the 
            UDF_ARGS variable. We have to get the values, do our calculation 
            and return the result. NOTE: You can access the memory 
            allocated in MyTest_init through the UDF_INIT variable.
            In this example we will just add 5 to every value...*/
            return *((longlong*)args->args[0])+5;
    }

     

  5. All done! Now we have to compile the library and copy it to a directory where our OS can find it. On Windows that would be anywhere the PATH System variable says. Personally I use the MySQL servers bin directory. You have to make sure that the library is in one of those directories otherwise MySQL can't use it! And also make sure to export all the functions MySQL needs!

  6. And at last we have to tell MySQL about it. This is really straightforward: Just execute the following SQL command:

    CREATE [AGGREGATE] FUNCTION MyTest 
    RETURNS [INTEGER|STRING|REAL|DECIMAL] SONAME the_libraries_exact_name
    Now you can use it like any other function.

     

5. Aggregate functions

Now some words to aggregate functions. When your UDF is an aggregate function you have to add some more functions and some functions are used in a different way. The calling sequence is:

  1. Call MyTest_init to allocate memory (just like a normal UDF)
  2. MySQL sorts the table according to the GROUP BY statement
  3. Call MyTest_clear for the first row in each group
  4. Call MyTest_add for each row that belongs to the same group
  5. Call MyTest to get the result when the group changes or the last row has been processed
  6. Repeat 3 to 5 until all rows have been processed
  7. Call MyTest_deinit to free any used memory

Now let's look at the new functions needed for the aggregate function. In this example we'll simply add up all the values. (like the native SUM function)

void MyTest_clear(UDF_INIT *initid, char *is_null, char *error)
{
        /* The clear function resets the sum to 0 for each new group
        Of course you have to allocate a longlong variable in the init 
        function and assign it to the pointer as seen above */
        *((longlong*)initid->ptr) = 0;
}

void MyTest_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)
{
        // For each row the current value is added to the sum
        *((longlong*)initid->ptr) = *((longlong*)initid->ptr) + 
                                    *((longlong*)args->args[0]);
}

longlong MyTest(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)
{
        // And in the end the sum is returned
        return *((longlong*)initid->ptr);
}

 

6. Advanced topics

Here are some things you should know when you write more complex UDFs:

  • A string function should return a pointer to the result and set *result and *length to the contents and length of the return value. For example:

    memcpy(result, "result string", 13);
    *length = 13;

    The result buffer that is passed to the MyTest function is 255 bytes long. If your result fits in this, you don't have to worry about memory allocation for results.

    If your string function needs to return a string longer than 255 bytes, you must allocate the space for it with malloc or new in your MyTest_init function or your MyTest function and free it in yourMyTest_deinit function. You can store the allocated memory in the ptr slot in the UDF_INIT structure for reuse by future MyTest calls.

  • To indicate an error return in the main function, set *error to 1: If MyTest() sets *error to 1 for any row, the function value is NULL for the current row and for any subsequent rows processed by the statement in which MyTest() was invoked.

  • For more information see the MySQL Online Manual

7. Some guidelines

Here are some guidelines that you should follow if you want to make sure your UDF runs smoothly Wink | ;-)

  • Do not call any other applications or processes inside an UDF!

  • Do not store any information locally! (This goes for shared libraries in general)

  • Do not allocate any global or static variables!

  • Always check the type of your arguments. As you can see MySQL converts everything to char pointers. This can lead to major troubles if you convert a string literal to a integer pointer and so on.

  • Be extra careful with the memory allocation! If you have memory leaks you can bring down the sever in no time.

8. Debugging UDFs

Debugging an UDF can be pretty nerve wracking because every time your UDF crashes it takes down the whole MySQL server along with it. So I wrote a little command line tool to work around that problem. Just execute it after compilation and it does the rest. Meaning, it emulates a call to the function by calling an "SELECT" command and then passing the results to the library and printing out the result on the command line . So when the UDF produces some serious errors only the little helper goes down and not the whole server. It is still in "beta" so don't expect to much....

9. Sources

MySQL Online Manual

 摘自:研究如何编写和使用MySQL UDF 2008/06/30

1. 简介

MySQL的UDF(User Defined Function)类似于一种API, 用户根据一定的规范用C/C++(或采用C调用规范的语言)编写一组函数(UDF),然后编译成动态链接库,通过CREATE FUNCTION和DROP FUNCTION语句来加载和卸载UDF。UDF被加载后可以像调用MySQL的内置函数一样来调用它,并且服务器在启动时会自动加载原来存在的UDF。

2. 特性

  • UDF的参数和返回值的类型可以为字符串,整数或实型。
  • UDF分为简单UDF和聚合UDF。
  • 服务器会向UDF提供相应的信息以便UDF检查实参的数量、类型和名称。
  • UDF也可以强制服务器在调用UDF前检查实参类型。
  • UDF可以指明返回值为NULL或发生了错误。

3. CREATE FUNCTION的语法

CREATE [AGGREGATE] FUNCTION function_name RETURNS {STRING|INTEGER|REAL|DECIMAL} SONAME shared_library_name

function_name 为函数名 ,RETURNS子句表明UDF的返回值类型,目前DECIMAL和STRING类型是等价的 。

shared_library_name是UDF所在的动态链接库的名称(不含路径),该文件需放在可被系统搜索到的目录中。

在MySQL 5.1中,动态链接库需要放在由plugin_dir变量所指定的插件目录中。

4. DROP FUNCTION的语法

DROP FUNCTION function_name

5. 编写UDF

对于每个UDF,需要编写若干相关C/C++函数,下面用XXX()代表在SQL语句中调用UDF,xxx()代表相应的C/C++函数调用,实际使用时,两者应相同。

  • xxx()  ——  必须,主函数,该函数的返回值即为UDF的返回值。返回值类型与SQL数据类型的对应关系如下:
SQL类型C/C++类型
STRING/DECIMAL  char *
INTEGER  long long
REAL double

 

  •  xxx_init()  ——  可选,xxx()的初始化函数,它可被用来
    • 检查XXX()的实参个数和类型,或强制由服务器来检查。
    • 为主函数分配内存。
    • 指明UDF返回值的最大长度或精度,以及返回值是否可能为NULL.
  •  xxx_deinit() ——  可选,xxx()的清理函数,用来释放在xxx_init()中分配的内存

对于聚合UDF, 还需要另外两个函数。

  • xxx_clear()  ——  必须(MySQL5.1),重置当前聚合值。
  • xxx_add() ——  必须,将参数加到当前聚合值中去。
  • xxx_reset() —— 必须(MySQL 5.1以下版本),重置当前聚合值,并将参数作为初始聚合值。

所以函数必须是线程安全的。可以参考《MySQL参考手册》第30章第3节以获得更详细的说明。此外,在MySQL源代码包中有编写UDF的示例,位置为sql/udf_example.c.

6. 简单UDF的调用顺序

当SQL语句调用XXX()时,服务器首先调用xxx_init()进行参数检查或内存分配,若xxx_init()出错,则SQL语句被终止,主函数和清理函数将不会被调用。然后,服务器对每一行记录调用一次主函数,最后清理函数xxx_deinit()被调用。

7. 聚合UDF的调用顺序

  1. 调用xxx_init()进行初始化。
  2. 根据GROUP BY表达式对表进行排序。
  3. 对每个新组调用xxx_clear()重置聚合值。
  4. 对组中每一行记录调用xxx_add().
  5. 当该组数据处理完后,调用xxx()计算聚合值.
  6. 重复步骤3-5直到所有数据处理完毕。
  7. 调用xxx_deinit()进行清理。

8. 编译和安装UDF

用类似的命令来编译UDF:

shell> gcc -shared -o udf_example.so udf_example.c

将经过编译得到的动态链接库(.so文件)放到系统可以找到并加载该文件的目录中。然后通过CREATE FUNCTION语句让服务器加载动态链接库中的函数。

9. 示例

udf_test.c

#include <mysql.h>
#include <time.h>
#include <string.h>
my_bool datetime_now_init( UDF_INIT *initid, UDF_ARGS *args, char *message )
{
    if ( args->arg_count )
    {
        strcpy( message, "datetime_now() does not need any argument." );
        return(1);
    }
    return(0);
}


char *datetime_now( UDF_INIT *initid __attribute__( (unused) ), UDF_ARGS *args __attribute__( (unused) ),
            char *result, unsigned long *length, char *is_null, char *error __attribute__( (unused) ) )
{
    const time_t now_time = time( NULL );
    strftime( result, 255, "%Y/%m/%d %I:%M:%S %p %Z %z", localtime( &now_time ) );
    *length = strlen( result );
    return(result);
}

编译 

shell> gcc -fPIC -Wall -O3 -I/usr/local/mysql/include -shared -o udf_test.so udf_test.c

 

安装

mysql> CREATE FUNCTION datetime_now RETURNS STRING SONAME 'udf_test.so';
Query OK, 0 rows affected (0.00 sec)

使用

mysql> SELECT datetime_now();
+----------------------------------+
| datetime_now()                   |
+----------------------------------+
| 2008/06/27 05:27:03 PM CST +0800 |
+----------------------------------+
1 row in set (0.00 sec)

10. 与存储过程函数的区别

存储过程中的函数是一组SQL语句的集合,其能力受限于SQL的表达能力,它还要受限于MySQL自身的一些限制。它主要用于较高层次的数据处理。

UDF的限制相对来说要小很多,与编写服务器代码时的限制相同,只要符合相应的约定即可。它的定位偏向于底层,主要用于提供一些基础的数据处理功能。另外要注意的是,UDF一旦出错,会导致服务器崩溃!

11. 参考资料

 

红黑联盟上文章

MySQL:如何编写UDF

1. 什么是UDF

UDF顾名思义,就是User defined Function,用户定义函数。我们知道,MySQL本身支持很多内建的函数,此外还可以通过创建存储方法来定义函数。UDF为用户提供了一种更高效的方式来创建函数。

UDF与普通函数类似,有参数,也有输出。分为两种类型:单次调用型和聚集函数。前者能够针对每一行数据进行处理,后者则用于处理Group By这样的情况。

2. 为什么用UDF

既然MySQL本身提供了大量的函数,并且也支持定义函数,为什么我们还需要UDF呢?这主要基于以下几点:

  1. UDF的兼容性很好,这得益于MySQL的UDF基本上没有变动
  2. 比存储方法具有更高的执行效率,并支持聚集函数
  3. 相比修改代码增加函数,更加方便简单

当然UDF也是有缺点的,这是因为UDF也处于mysqld的内存空间中,不谨慎的内存使用很容易导致mysqld crash掉。

 

3. 如何编写UDF

UDF的API包括

  • name_init():在执行SQL之前会被调用,主要做一些初始化的工作,比如分配后续用到的内存、初始化变量、检查参数是否合法等。
  • name_deinit():在执行完SQL后调用,大多用于内存清理等工作。init和deinit这两个函数都是可选的
  • name(): UDF的主要处理函数,当为单次调用型时,可以处理每一行的数据;当为聚集函数时,则返回Group by后的聚集结果。
  •  name_add(): 在每个分组中每行调用
  • name_clear():在每个分组之后调用

为了便于理解,这里给出两种UDF类型的API调用图:

 

 \

下面将就上述几个API进行详细的讲解:

1). name_init

原型:my_boolname_init(UDF_INIT *initid, UDF_ARGS *args, char *message)

UDF_INIT结构体如下:

 

字段

类型

描述

maybe_null

my_bool

如果为1表示该UDF可以返回NULL

decimals

unsigned int

返回值为实数时,表示精度,范围0~30

max_length

unsigned long

对于返回值为INTEGER类型值为21,对于REAL类型值为17,对于字符串类型,存储函数最长参数的长度

ptr

char*

额外的指针,我们可以在这里分配内存。通过initd传递给其他API

const_item

my_bool

为1表示函数总是返回相同的值

extension

void*

用于扩展?

 

UDF_ARGS结构体如下:

字段

类型

描述

arg_count

unsigned int

参数个数

arg_type

enum Item_result*

参数类型数组,记录每一个参数的类型,可以是STRING_RESULT、REAL_RESULT、INT_RESULT以及DECIMAL_RESULT

args

char **

同样是一个数组,用于存储实际数据。

STRING_RESULT与DECIMAL_RESULT类型为char*,INT_RESULT类型为long long*,REAL_RESULT类型为double*,或者一个NULL指针

lengths

unsigned long*

数组,用于存储每一个参数的长度

maybe_null

char *

该数组用于表明每个参数是否可以为NULL,例如

attributes

char **

每个参数的名字

attribute_lengths

unsigned long*

每个参数名字的长度

extension

void*

用于扩展?

 

Message:用于打印错误信息,该指针本身提供长度为MYSQL_ERRMSG_SIZE,来存储信息;

2).name_deinit

原型:void name_deinit(UDF_INIT*initid)

该函数会进行一些内存释放和清理的工作,在之前我们提到initid->ptr,我们可以在该区域·进行内存的动态分配,这里就可以直接进行内存释放。

 

3).name()

原型:针对不同的返回值类型,有不同的函数原型:

返回值类型

函数原型

STRING or DECIMAL

char *name(UDF_INIT *initid, UDF_ARGS *args, char *result, unsigned long *length, char *is_null, char *error)

INTEGER

long long name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)

REAL

double name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error);

 

当返回值为STRING类型时,参数result开辟一个buffer来存储返回值,但不超过766字节,在length参数中存储了字符串的长度。

每个函数原型还包括了is_null和error参数,当*is_null被设置为1时,返回值为NULL,设置*error为1,表明发生了错误。

 

4).name_add()和name_clear()

原型:

void name_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null,char *error)

void name_clear(UDF_INIT *initid, char *is_null, char *error)

对于聚合类型的UDF,name_addd和name_clear会被反复调用。

4. 两个例子

下面将举两个简单的例子,一个单次调用型函数,一个聚集类型函数,来描述写一个UDF的过程。

 

1)接受一个参数,并返回该参数的值

/* 初始化 */
my_booludf_int_init( UDF_INIT * initid, UDF_ARGS * args, char *message )
{
    if ( args->arg_count != 1 )             /* 检查参数个数 */
    {
        strcpy( message,
            "udf_intexample() can onlyaccept one argument" );
        return(1);
    }
    if ( args->arg_type[0] != INT_RESULT )  /* 检查参数类型 */
    {
        strcpy( message,
            "udf_intexample() argumenthas to be an integer" );
        return(1);
    }
    return(0);
}

/* 清理操作 */
voidudf_int_deinit( UDF_INIT * initid )
{
}

/* 主函数 */
long long udf_int( UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error )
{
    long long num = (*(long long *) args->args[0]); /* 获取第一个参数值 */
    return(num);
}

 

2)接受一个浮点数类型的参数,并对每个分组进行求和

/* 初始化 */
my_booludf_floatsum_init( UDF_INIT * initid, UDF_ARGS * args, char *message )
{
    double *total = (double *) malloc( sizeof(double) );
    if ( total == NULL )             /* 内存分配失败 */
    {
        strcpy( message, "udf_floatsum:alloc mem failed!" );
        return(1);
    }
    *total = 0;
    initid->ptr = (char *) total;
    if ( args->arg_count != 1 )       /* 检查参数个数 */
    {
        strcpy( message, "too moreargs,only one!" );
        return(1);
    }
    if ( args->arg_type[0] != REAL_RESULT ) /* 检查参数类型 */
    {
        strcpy( message, "wrongtype" );
        return(1);
    }
    initid->decimals = 3;     /* 设置返回值精度 */
    return(0);
}
/* 清理、释放在init函数中分配的内存 */
voidudf_floatsum_deinit( UDF_INIT * initid )
{
    free( initid->ptr );
}
/* 每一行都会调用到该函数 */
voidudf_floatsum_add( UDF_INIT * initid, UDF_ARGS * args, char *is_null, char *error )
{
    double* float_total;
    float_total = (double *) initid->ptr;
    if ( args->args[0] )
        *float_total += *(double *) args->args[0];
}
/* 每个分组完成后,返回结果 */
doubleudf_floatsum( UDF_INIT * initid, UDF_ARGS * args, char *is_null, char *error )
{
    double* float_total;
    float_total = (double *) initid->ptr;
    return(*float_total);
}
/* 在进行下一个分组前调用,设置initid->ptr指向的值为0,以便下一次分组统计 */
voidudf_floatsum_clear( UDF_INIT * initid, char *is_null, char *error )
{
    double *float_total;
    float_total = (double *) initid->ptr;
    *float_total = 0;
}

 

3) Mysql-udf-http是一个开源的UDF,可以利用HTTP协议进行REST操作。什么是REST操作呢?REST是一种web service架构风格,其实现基于HTTP协议的四种方法:POST、GET、PUT以及DELETE操作,在mysql-udf-http里分别对应的函数是http_post、http_get()、http_put()、http_delete()。

 

源码下载:http://curl.haxx.se/download/curl-7.21.1.tar.gz

 

./configure–prefix={mysql安装目录} –with-mysql=/usr/local/webserver/mysql/bin/mysql_config

Make&& make install

 

该UDF的实现原理比较简单,主要使用libcurl库函数来实现http协议通信,总共三百多行代码。这里有使用和介绍http://blog.s135.com/mysql-udf-http/

有些比较有趣的功能:

例如,我们可以通过GET方法获取微博中的个人信息,其中1821798401为用户ID

selecthttp_get('http://api.t.sina.com.cn/statuses/user_timeline/1821798401.json?count=1&source=1561596835')

 

UDF具有非常高的自由度,你可以编写你任何想要实现的功能函数,甚至可以引用MySQL内核的代码和变量。

当然,UDF也有着局限性,如下:

a)        在mysql库下必须有func表,并且在‑‑skip‑grant‑tables开启的情况下,UDF会被禁止;

b)        当UDF挂掉时,有可能会导致mysqld crash掉;

c)        所有的UDF的函数必须是线程安全的,如果非要用全局变量,需要加互斥,尽量在name_init中分配资源,并在name_deinit中释放

d)        需要有insert权限

 
其他:

 

posted @ 2015-08-12 22:29  山岚的一缺  阅读(1788)  评论(0编辑  收藏  举报
喜欢
评论
收藏
顶部