PHP通过Thrift操作Hbase
一 、HBase访问接口
1. Native Java API,最常规和高效的访问方式,适合Hadoop MapReduce Job并行批处理HBase表数据
2. HBase Shell,HBase的命令行工具,最简单的接口,适合HBase管理使用
3. Thrift Gateway,利用Thrift序列化技术,支持C++,PHP,Python等多种语言,适合其他异构系统在线访问HBase表数据
4. REST Gateway,支持REST 风格的Http API访问HBase, 解除了语言限制
5. Pig,可以使用Pig Latin流式编程语言来操作HBase中的数据,和Hive类似,本质最终也是编译成MapReduce Job来处理HBase表数据,适合做数据统计
6. Hive,当前Hive的Release版本尚没有加入对HBase的支持,但在下一个版本Hive 0.7.0中将会支持HBase,可以使用类似SQL语言来访问HBase
如果使用PHP操作Hbase,推荐使用Facebook开源出来的thrift,官网是:http://thrift.apache.org/ ,它是一个类似ice的中间件,用于不同系统语言间信息交换。
二、安装Thrift
在Hadoop和Hbase都已经安装好的集群上安装Thrift,Thrift安装在Hmaster机器上
1. 下载thrift
wget http://mirror.bjtu.edu.cn/apache/thrift/0.9.0/thrift-0.9.0.tar.gz
2. 解压
tar -xzf thrift-0.9.0.tar.gz
3 .编译安装:
如果是源码编译的,首先要使用./boostrap.sh创建文件./configure
./configure --prefix=/usr/local/thrift --with-php-config=/usr/local/php/bin/php-config
make
make install
4. 启动:
/usr/local/hbase/bin/hbase-daemon.sh start thrift --port=27090
Thrift默认监听的端口是9090
5. 编译:thrift_protocol.so
cd /usr/local/src/thrift-0.9.0/lib/php/src/ext/thrift_protocol /usr/local/php/bin/phpize ./configure --enable-thrift_protocol --with-php-config=/usr/local/php/bin/php-config make make install
然后把生成的thrift_protocol.so文件配置到php.ini并重启apache服务
6. 使用thrift生成接口文件
/usr/local/thrift/bin/thrift --gen php /usr/local/hbase/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
然后把接口文件copy到packages目录下
三、测试:
1 .php脚本库操作Hbase
<?php ini_set("display_errors", E_ALL); $GLOBALS["THRIFT_ROOT"] = "/html/lib/thrift"; require_once( $GLOBALS["THRIFT_ROOT"] . "/Thrift.php" ); require_once( $GLOBALS["THRIFT_ROOT"] . "/Transport/TSocket.php" ); require_once( $GLOBALS["THRIFT_ROOT"] . "/Transport/TBufferedTransport.php" ); require_once( $GLOBALS["THRIFT_ROOT"] . "/Protocol/TBinaryProtocol.php" ); require_once( $GLOBALS["THRIFT_ROOT"] . "/Packages/Hbase.php" ); $socket = new TSocket("10.64.60.83", "27090"); $socket->setSendTimeout(10000); // Ten seconds $socket->setRecvTimeout(20000); // Twenty seconds $transport = new TBufferedTransport($socket); $protocol = new TBinaryProtocol($transport); $client = new HbaseClient($protocol); $transport->open(); //获取表列表 $tables = $client->getTableNames(); sort($tables); foreach ($tables as $name) { echo( " found: {$name}\n" ); } //创建新表student $columns = array( new ColumnDescriptor(array( "name" => "id:", "maxVersions" => 10 )), new ColumnDescriptor(array( "name" => "name:" )), new ColumnDescriptor(array( "name" => "score:" )), ); $tableName = "student"; try { $client->createTable($tableName, $columns); } catch (AlreadyExists $ae) { echo( "WARN: {$ae->message}\n" ); } //获取表的描述 $descriptors = $client->getColumnDescriptors($tableName); asort($descriptors); foreach ($descriptors as $col) { echo( " column: {$col->name}, maxVer: {$col->maxVersions}\n" ); } //修改表列的数据 $row = "2"; $valid = "foobar-\xE7\x94\x9F\xE3\x83\x93"; $mutations = array( new Mutation(array( "column" => "score", "value" => $valid )), ); $client->mutateRow($tableName, $row, $mutations); //获取表列的数据 $row_name = "2"; $fam_col_name = "score"; $arr = $client->get($tableName, $row_name, $fam_col_name); // $arr = array foreach ($arr as $k => $v) { // $k = TCell echo ("value = {$v->value} , <br> "); echo ("timestamp = {$v->timestamp} <br>"); } $arr = $client->getRow($tableName, $row_name); // $client->getRow return a array foreach ($arr as $k => $TRowResult) { // $k = 0 ; non-use // $TRowResult = TRowResult var_dump($TRowResult); } $transport->close(); ?>