[C#] Working with HBase through Thrift

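Aside: several ways to call Java from C#

1. Publish the Java-side interface as a web service, which C# can call easily.

2. Call Java from C++ through JNI, then call the C++ interface from C#.

3. Use an open-source library to call Java directly from C# (the original post linked to the details).

4. Use IKVM to call Java from C#; see http://www.ikvm.net/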

 

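I mention these because my client needs to call the HBase API, which is implemented in Java. At first I made the calls through a web service, which is simple and portable. Some time later I found the third approach and got it working, but the author of that library does not seem to maintain it: it has several obvious bugs and throws memory errors when called in a loop. Since I am not very familiar with JNI, I gave up on it. If you are interested in this approach, you are welcome to improve the library. One more caveat: the library depends on jvm.dll and only works with a 32-bit Java JDK. I did not look into the second and fourth approaches in any depth, so I will not cover them here.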

 

In the end I used none of the above and went with Thrift instead. It is somewhat slower than the Java API, but within an acceptable range. Now to the main topic:

 

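Preparation:

1. Download the Thrift source package: http://thrift.apache.org/

2. Download the Thrift compiler for Windows: http://www.apache.org/dyn/closer.cgi?path=/thrift/0.9.0/thrift-0.9.0.exe

Generating the Thrift interface classes:

1. Get the Hbase.thrift file from the HBase package (it is in ..\hbase-0.94.6.1\src\main\resources\org\apache\hadoop\hbase\thrift).

2. Put thrift-0.9.0.exe and Hbase.thrift in the same directory (they do not have to be in the same directory, of course).

3. At the command line, run thrift-0.9.0.exe -gen csharp Hbase.thrift; this creates a folder named gen-csharp in that directory.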

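Building the solution

With the code in place, create a new VS project and add the Thrift source project and the interface classes just generated.

Starting the Thrift service on the cluster

hbase-daemon.sh start thrift   (the default port is 9090)

Writing the test code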

 

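The first part already covered the basics of accessing HBase through Thrift; this part covers querying HBase through Thrift. Among the interface classes generated from Hbase.thrift, Hbase.cs is the core class: all the client-side interfaces for operating on HBase are defined and implemented in it. For the Thrift server-side code, refer to the HBase source.

The query interfaces defined in the class are: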

 

List<TRowResult> getRow(byte[] tableName, byte[] row, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowWithColumns(byte[] tableName, byte[] row, List<byte[]> columns, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowTs(byte[] tableName, byte[] row, long timestamp, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowWithColumnsTs(byte[] tableName, byte[] row, List<byte[]> columns, long timestamp, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRows(byte[] tableName, List<byte[]> rows, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowsWithColumns(byte[] tableName, List<byte[]> rows, List<byte[]> columns, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowsTs(byte[] tableName, List<byte[]> rows, long timestamp, Dictionary<byte[], byte[]> attributes);
List<TRowResult> getRowsWithColumnsTs(byte[] tableName, List<byte[]> rows, List<byte[]> columns, long timestamp, Dictionary<byte[], byte[]> attributes);
int scannerOpenWithScan(byte[] tableName, TScan scan, Dictionary<byte[], byte[]> attributes);
int scannerOpen(byte[] tableName, byte[] startRow, List<byte[]> columns, Dictionary<byte[], byte[]> attributes);
int scannerOpenWithStop(byte[] tableName, byte[] startRow, byte[] stopRow, List<byte[]> columns, Dictionary<byte[], byte[]> attributes);
int scannerOpenWithPrefix(byte[] tableName, byte[] startAndPrefix, List<byte[]> columns, Dictionary<byte[], byte[]> attributes);
int scannerOpenTs(byte[] tableName, byte[] startRow, List<byte[]> columns, long timestamp, Dictionary<byte[], byte[]> attributes);
int scannerOpenWithStopTs(byte[] tableName, byte[] startRow, byte[] stopRow, List<byte[]> columns, long timestamp, Dictionary<byte[], byte[]> attributes);
List<TRowResult> scannerGet(int id);
List<TRowResult> scannerGetList(int id, int nbRows);
void scannerClose(int id);

 


 

 

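Below are a few of the more commonly used interfaces, based on how my project uses them (anyone who has used the HBase API will find the interfaces above easy):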

1. getRow (the simplest kind of query: it fetches data by row key; most of the interface's parameters are byte arrays)
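Because nearly every parameter and every result key is a byte array, a few small conversion helpers save a lot of repetition. The sketch below is only an illustration: the class name HBaseBytes and its methods are hypothetical and are not part of the generated Thrift code. It also shows a pitfall worth knowing: a plain Dictionary compares byte[] keys by reference, so looking up a column in a result's Columns dictionary with a freshly encoded key will not find it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class; not part of the generated Hbase.cs.
public static class HBaseBytes
{
    // Encodes a table name, row key or "family:qualifier" column as UTF-8 bytes.
    public static byte[] ToBytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Encodes a list of column names for the List<byte[]> parameters.
    public static List<byte[]> ColumnsToBytes(IEnumerable<string> columns)
    {
        var list = new List<byte[]>();
        foreach (string c in columns) list.Add(ToBytes(c));
        return list;
    }

    // Finds a value by column name. A plain Dictionary compares byte[] keys by
    // reference, so ContainsKey(ToBytes("i:Data")) would miss; decode each key
    // and compare the strings instead.
    public static bool TryGetColumn<T>(Dictionary<byte[], T> columns, string name, out T value)
    {
        foreach (KeyValuePair<byte[], T> pair in columns)
        {
            if (Encoding.UTF8.GetString(pair.Key) == name)
            {
                value = pair.Value;
                return true;
            }
        }
        value = default(T);
        return false;
    }
}
```

With the generated types, a call would look roughly like client.getRow(HBaseBytes.ToBytes("HStudy"), HBaseBytes.ToBytes("001"), null), followed by TryGetColumn on each returned row's Columns.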


2. scannerOpenWithStop (fetches data by row key range)

 

        /// <summary>
        /// Fetches data by row key range.
        /// </summary>
        /// <param name="tablename">Table name</param>
        /// <param name="stRowkey">Start row key (inclusive)</param>
        /// <param name="endRowkey">End row key (exclusive)</param>
        /// <remarks>The result set includes the start row and excludes the end row.</remarks>
        static void GetDataFromHBaseThroughRowKeyRange(string tablename,
            string stRowkey,string endRowkey)
        {
            // transport and client are static fields, created as in the Helper class.
            transport.Open();
            int ScannerID = client.scannerOpenWithStop(Encoding.UTF8.GetBytes(tablename),
                Encoding.UTF8.GetBytes(stRowkey), Encoding.UTF8.GetBytes(endRowkey),
                new List<byte[]> { Encoding.UTF8.GetBytes("i:Data") }, null);

            // Fetch at most 100 rows from the scanner.
            List<TRowResult> result = client.scannerGetList(ScannerID, 100);
            foreach (var key in result)
            {
                Console.WriteLine(Encoding.UTF8.GetString(key.Row));

                foreach (var k in key.Columns)
                {
                    Console.Write(Encoding.UTF8.GetString(k.Key) + "\t");
                    Console.WriteLine(Encoding.UTF8.GetString(k.Value.Value));
                    Console.WriteLine("++++++++++++++++++++++++++++++++++++++");
                }
            }
            // Release the server-side scanner once we are done with it.
            client.scannerClose(ScannerID);
        }

        // Usage
        static void Main(string[] args)
        {
           GetDataFromHBaseThroughRowKeyRange("HStudy", "001", "006");
        }

  


3. scannerOpenWithPrefix (matches rows by row key prefix)

 

/// <summary>
/// Fetches rows whose row key starts with the given prefix.
/// </summary>
/// <param name="tablename">Table name</param>
/// <param name="Prefixrowkey">Row key prefix</param>
static void GetDataFromHBaseThroughRowKeyPrefix(string tablename, string Prefixrowkey)
{
    // transport and client are static fields, created as in the Helper class.
    transport.Open();
    int ScannerID = client.scannerOpenWithPrefix(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(Prefixrowkey), new List<byte[]> { Encoding.UTF8.GetBytes("i:Data") }, null);
    /*
    *  In the server source, scannerGet(int id) simply calls scannerGetList(int id, int nbRows) with nbRows = 1.
    */
    List<TRowResult> result = client.scannerGetList(ScannerID,100);
    foreach (var key in result)
    {
        Console.WriteLine(Encoding.UTF8.GetString(key.Row));

        foreach (var k in key.Columns)
        {
            Console.Write(Encoding.UTF8.GetString(k.Key) + "\t");
            Console.WriteLine(Encoding.UTF8.GetString(k.Value.Value));
            Console.WriteLine("++++++++++++++++++++++++++++++++++++++");
        }
    }
    // Release the server-side scanner once we are done with it.
    client.scannerClose(ScannerID);
}
// Usage
static void Main(string[] args)
{
   GetDataFromHBaseThroughRowKeyPrefix("HStudy", "00");
}
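The scanner examples in this post fetch a single batch of up to 100 rows. When a scan may return more than that, one option is to keep calling scannerGetList until it returns an empty list, and to close the scanner in a finally block. The following is a minimal sketch of that loop. To keep it independent of the generated code, the three Thrift calls are passed in as delegates; the ScannerUtil class and its ReadAll method are hypothetical names, not part of Thrift or HBase.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; not part of the generated Hbase.cs.
public static class ScannerUtil
{
    // Reads every row from a scanner in batches and always closes the scanner.
    // open, getList and close stand in for a client.scannerOpen* call,
    // client.scannerGetList and client.scannerClose.
    public static List<T> ReadAll<T>(Func<int> open, Func<int, int, List<T>> getList,
                                     Action<int> close, int batchSize)
    {
        var rows = new List<T>();
        int id = open();
        try
        {
            while (true)
            {
                List<T> batch = getList(id, batchSize);
                if (batch == null || batch.Count == 0) break; // scanner exhausted
                rows.AddRange(batch);
            }
        }
        finally
        {
            close(id); // release the server-side scanner even if a call throws
        }
        return rows;
    }
}
```

With the generated client this would be called roughly as ScannerUtil.ReadAll(() => client.scannerOpenWithPrefix(table, prefix, columns, null), client.scannerGetList, client.scannerClose, 100), where table, prefix and columns are the byte-array arguments used in the example above.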

  

 

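4. scannerOpenWithScan (queries with filters)

This is probably the most troublesome of the query interfaces, because it uses filters, i.e. the Filter classes of the HBase API. One of its parameters is of type TScan, whose basic structure is: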

 

public partial class TScan : TBase  
   {  
       private byte[] _startRow;  
       private byte[] _stopRow;  
       private long _timestamp;  
       private List<byte[]> _columns;  
       private int _caching;  
       private byte[] _filterString;  
   }  

  

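I will skip the first few fields and focus on _filterString (I will not go through the various filters of the HBase API here). Taking the common SingleColumnValueFilter as an example, a filter that matches rows whose PatientName contains 小红 is defined like this: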

           

string filterString = "SingleColumnValueFilter('s','PatientName',=,'substring:小红')";

byte[] _filterString = Encoding.UTF8.GetBytes(filterString);

  


To combine several filters, join them with 'AND'.
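Writing filter strings by hand gets error-prone once several conditions are combined. Here is a small sketch that builds them, following the parenthesized form of the commented-out two-filter example in the code further down. The FilterStrings class is a hypothetical helper, not part of Thrift or HBase, and it does not escape single quotes inside the arguments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; it only concatenates strings.
public static class FilterStrings
{
    // Builds e.g. SingleColumnValueFilter('s','PatientName',=,'substring:Jim').
    public static string SingleColumnValue(string family, string qualifier,
                                           string op, string comparator)
    {
        return string.Format("SingleColumnValueFilter('{0}','{1}',{2},'{3}')",
            family, qualifier, op, comparator);
    }

    // Wraps each filter in parentheses and joins them with AND.
    public static string And(params string[] filters)
    {
        if (filters == null || filters.Length == 0)
            throw new ArgumentException("at least one filter is required");
        var wrapped = new List<string>();
        foreach (string f in filters) wrapped.Add("(" + f + ")");
        return "(" + string.Join(" AND ", wrapped.ToArray()) + ")";
    }
}
```

The resulting string is then converted with Encoding.UTF8.GetBytes and assigned to the FilterString property of a TScan.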

  

/// <summary>
/// Scans a table using a filter.
/// </summary>
/// <param name="tablename">Table name</param>
/// <param name="filterString">Filter expression</param>
/// <param name="_cols">Columns to return</param>
static void GetDataFromHBaseThroughFilter(string tablename, string filterString,List<byte[]> _cols)
{
    TScan _scan = new TScan();
    //SingleColumnValueFilter('i', 'Data', =, '2')
    _scan.FilterString =Encoding.UTF8.GetBytes(filterString);
    _scan.Columns = _cols;
    // transport and client are static fields, created as in the Helper class.
    transport.Open();
    int ScannerID = client.scannerOpenWithScan(Encoding.UTF8.GetBytes(tablename), _scan,null);

    List<TRowResult> result = client.scannerGetList(ScannerID, 100);
    foreach (var key in result)
    {
        Console.WriteLine(Encoding.UTF8.GetString(key.Row));

        foreach (var k in key.Columns)
        {
            Console.Write(Encoding.UTF8.GetString(k.Key) + "\t");
            Console.WriteLine(Encoding.UTF8.GetString(k.Value.Value));
            Console.WriteLine("++++++++++++++++++++++++++++++++++++++");
        }
    }
    // Release the server-side scanner once we are done with it.
    client.scannerClose(ScannerID);
}


static void Main(string[] args)
{
    ////GetDataFromHBaseThroughRowKeyRange("HImages", "123.456.1", "123.456.9");
    List<byte[]> _byte = new List<byte[]>();
    _byte.Add(Encoding.UTF8.GetBytes("s:PatientName"));
    _byte.Add(Encoding.UTF8.GetBytes("s:StudyInstanceUID"));
    _byte.Add(Encoding.UTF8.GetBytes("s:PatientSex"));

    ////string filterString = "((SingleColumnValueFilter('s','PatientName',=,'substring:Jim')) AND (SingleColumnValueFilter('s','PatientSex',=,'substring:10')))";
    string filterString = "SingleColumnValueFilter('s','PatientName',=,'substring:小红')";
    GetDataFromHBaseThroughFilter("HStudy", filterString, _byte);
    Console.ReadLine();

}

  

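That is all for querying HBase through Thrift.

# Lately both work and life have been a mess. Maybe I have been too restless, drifting through the days in a daze. Anyway, enough of such gloomy talk.

This should be the third, and probably the last, post about accessing Thrift from C#. The most troublesome part of Thrift is querying, which I covered in detail in the second part. This part covers the frequently used Thrift APIs (creating a table, deleting a table, inserting, updating, and deleting data), and ends with a Helper class I wrote to make Thrift easier to use.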

 
void createTable(byte[] tableName, List<ColumnDescriptor> columnFamilies);  
void mutateRow(byte[] tableName, byte[] row, List<Mutation> mutations, Dictionary<byte[], byte[]> attributes);  
void mutateRowTs(byte[] tableName, byte[] row, List<Mutation> mutations, long timestamp, Dictionary<byte[], byte[]> attributes);  
void mutateRows(byte[] tableName, List<BatchMutation> rowBatches, Dictionary<byte[], byte[]> attributes);  
void mutateRowsTs(byte[] tableName, List<BatchMutation> rowBatches, long timestamp, Dictionary<byte[], byte[]> attributes);  
void deleteTable(byte[] tableName);  
void deleteAll(byte[] tableName, byte[] row, byte[] column, Dictionary<byte[], byte[]> attributes);  
void deleteAllTs(byte[] tableName, byte[] row, byte[] column, long timestamp, Dictionary<byte[], byte[]> attributes);  
void deleteAllRow(byte[] tableName, byte[] row, Dictionary<byte[], byte[]> attributes);  
void deleteAllRowTs(byte[] tableName, byte[] row, long timestamp, Dictionary<byte[], byte[]> attributes);  

  

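Note that Thrift uses the same functions (mutateRow and its siblings) for both inserting and updating rows; this will not surprise anyone who has used the HBase API. These APIs are all fairly simple, so I will simply paste the Helper class and a simple test class below.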

 

using System;  
using System.Collections.Generic;  
using System.Linq;  
using System.Text;  
using System.Threading.Tasks;  
using ThriftHelper;  
using IThrift;  
using Thrift.Transport;  
using Thrift.Protocol;  
namespace Test  
{  
    class Program  
    {  
        /*
         *   Table:         ITest
         *   Column family: i
         *   Qualifier:     one
         */
        static void Main(string[] args)  
        {  
            #region Test  
  
            Helper.Open();  
            Printer("CreateTable:");  
            ColumnDescriptor _cd = new ColumnDescriptor();  
            _cd.Name = Encoding.UTF8.GetBytes("i");  
  
            if (Helper.CreateTable("ITest", new List<ColumnDescriptor> { _cd }))  
                Printer("CreateTable is Success");  
            else  
                Printer("CreateTable Occurred Error");  
  
            Printer("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");  
            Printer("MutateRowHBase:");  
            Mutation _mutation = new Mutation();  
            _mutation.Column = Encoding.UTF8.GetBytes("i:one");  
            _mutation.Value = Encoding.UTF8.GetBytes("1");  
  
            if (Helper.MutateRowHBase("ITest", "001", new List<Mutation> { _mutation }))  
                Printer("MutateRowHBase is Success");  
            else  
                Printer("MutateRowHBase Occurred Error");  
  
            Printer("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");  
  
            Printer("GetDataFromHBase:");  
            List<TRowResult> _result = Helper.GetDataFromHBase("ITest", "001");  
            Printer(_result);  
  
            Printer("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");  
  
            Printer("MutateRowHBase:");  
            _mutation = new Mutation();  
            _mutation.Column = Encoding.UTF8.GetBytes("i:one");  
            _mutation.Value = Encoding.UTF8.GetBytes("-1");  
  
            if (Helper.MutateRowHBase("ITest", "001", new List<Mutation> { _mutation }))  
                Printer("MutateRowHBase is Success");  
            else  
                Printer("MutateRowHBase Occurred Error");  
  
            Printer("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");  
  
            Printer("GetDataFromHBase:");  
            _result = Helper.GetDataFromHBase("ITest", "001");  
            Printer(_result);  
  
            Printer("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");  
  
            Printer("DeleteRow:");  
            if (Helper.DeleteAllRow("ITest", "001"))  
                Printer("DeleteAllRow is Success");  
            else  
                Printer("DeleteAllRow Occurred Error");  
            Helper.Close();  
 
 
            #endregion  
  
            Console.ReadKey();  
             
        }  
  
        // Prints each row key followed by its column names and values.
        static void Printer(List<TRowResult> result)
        {
            if (result.Count == 0)
                return;
            foreach (var key in result)
            {
                Console.WriteLine(Encoding.UTF8.GetString(key.Row));

                foreach (var k in key.Columns)
                {
                    Console.Write(Encoding.UTF8.GetString(k.Key) + "\t");
                    Console.WriteLine(Encoding.UTF8.GetString(k.Value.Value));
                    Console.WriteLine("++++++++++++++++++++++++++++++++++++++");
                }
            }
        }
        // Prints a status message on its own line.
        static void Printer(string content)
        {
            Console.WriteLine(content);
        }
    }  
}  

   

The Helper class:

using System;  
using System.Collections.Generic;  
using System.Linq;  
using System.Text;  
using System.Threading.Tasks;  
using Thrift;  
using IThrift;  
using Thrift.Transport;  
using Thrift.Protocol;  
  
namespace ThriftHelper  
{  
    public static class Helper  
    {  
        static TTransport transport = new TSocket("192.168.2.200", 9090);  
        static TProtocol tProtocol = new TBinaryProtocol(transport);  
        static Hbase.Client client = new Hbase.Client(tProtocol);  
  
        public static void Open()  
        {  
            transport.Open();  
        }  
  
        public static void Close()  
        {  
            transport.Close();  
        }  
  
        /// <summary>
        /// Fetches a row by row key.
        /// </summary>
        /// <param name="tablename">Table name</param>
        /// <param name="rowkey">Row key</param>
        public static List<TRowResult> GetDataFromHBase(string tablename, string rowkey)
        {
            List<TRowResult> result = client.getRow(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(rowkey), null);
            return result;
        }
  
        /// <summary>
        /// Fetches rows whose row key starts with the given prefix.
        /// </summary>
        /// <param name="tablename">Table name</param>
        /// <param name="Prefixrowkey">Row key prefix</param>
        /// <param name="_cols">Columns to return, as "family:qualifier"</param>
        public static List<TRowResult> GetDataFromHBaseThroughRowKeyPrefix(string tablename, string Prefixrowkey,List<string> _cols)
        {
            List<byte[]> _bytes = new List<byte[]>();
            foreach (string str in _cols)
                _bytes.Add(Encoding.UTF8.GetBytes(str));


            int ScannerID = client.scannerOpenWithPrefix(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(Prefixrowkey),
                _bytes, null);
            /*
            *  In the server source, scannerGet(int id) simply calls scannerGetList(int id, int nbRows) with nbRows = 1.
            */
            List<TRowResult> result = client.scannerGetList(ScannerID, 100);
            // Release the server-side scanner once we are done with it.
            client.scannerClose(ScannerID);
            return result;
        }
  
        /// <summary>
        /// Fetches data by row key range.
        /// </summary>
        /// <param name="tablename">Table name</param>
        /// <param name="stRowkey">Start row key (inclusive)</param>
        /// <param name="endRowkey">End row key (exclusive)</param>
        /// <param name="_cols">Columns to return, as "family:qualifier"</param>
        /// <remarks>The result set includes the start row and excludes the end row.</remarks>
        public static List<TRowResult> GetDataFromHBaseThroughRowKeyRange(string tablename,
            string stRowkey, string endRowkey,List<string> _cols)
        {
            List<byte[]> _bytes = new List<byte[]>();
            foreach (string str in _cols)
                _bytes.Add(Encoding.UTF8.GetBytes(str));


            int ScannerID = client.scannerOpenWithStop(Encoding.UTF8.GetBytes(tablename),
                Encoding.UTF8.GetBytes(stRowkey), Encoding.UTF8.GetBytes(endRowkey),
                _bytes, null);

            List<TRowResult> result = client.scannerGetList(ScannerID, 100);
            // Release the server-side scanner once we are done with it.
            client.scannerClose(ScannerID);
            return result;
        }
  
        /// <summary>
        /// Scans a table using a filter.
        /// </summary>
        /// <param name="tablename">Table name</param>
        /// <param name="filterString">Filter expression</param>
        /// <param name="_cols">Columns to return</param>
        public static List<TRowResult> GetDataFromHBaseThroughFilter(string tablename, string filterString, List<byte[]> _cols)
        {
            TScan _scan = new TScan();
            //SingleColumnValueFilter('i', 'Data', =, '2')
            _scan.FilterString = Encoding.UTF8.GetBytes(filterString);
            _scan.Columns = _cols;

            int ScannerID = client.scannerOpenWithScan(Encoding.UTF8.GetBytes(tablename), _scan, null);

            List<TRowResult> result = client.scannerGetList(ScannerID, 100);
            // Release the server-side scanner once we are done with it.
            client.scannerClose(ScannerID);
            return result;
        }
  
        public static bool MutateRowHBase(string tablename, string rowkey, List<Mutation> _mutations)  
        {  
            try  
            {  
                client.mutateRow(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(rowkey), _mutations, null);  
                return true;  
            }  
            catch (Exception e)  
            {  
                return false;  
            }  
        }  
  
        public static bool MutateRowsHBase(string tablename, List<BatchMutation> _BatchMutation)  
        {  
            try  
            {  
                client.mutateRows(Encoding.UTF8.GetBytes(tablename), _BatchMutation, null);  
                return true;  
            }  
            catch (Exception e)  
            {  
  
                return false;  
            }  
  
        }  
  
        public static bool DeleteRowHBase(string tablename, string rowkey, string column)  
        {  
            try  
            {  
                client.deleteAll(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(rowkey),  
                    Encoding.UTF8.GetBytes(column), null);  
                return true;  
            }  
            catch (Exception e)  
            {  
  
                return false;  
            }  
  
        }  
  
        public static bool DeleteAllRow(string tablename, string rowkey)  
        {  
            try  
            {  
                client.deleteAllRow(Encoding.UTF8.GetBytes(tablename), Encoding.UTF8.GetBytes(rowkey), null);  
                return true;  
            }  
            catch (Exception e)  
            {  
                return false;  
            }  
             
        }  
  
        public static bool DeleteTable(string tablename)  
        {  
            try  
            {  
                client.deleteTable(Encoding.UTF8.GetBytes(tablename));  
                return true;  
            }  
            catch (Exception e)  
            {  
                return false;  
            }  
              
        }  
  
        public static bool CreateTable(string tablename, List<ColumnDescriptor> _cols)  
        {  
            try  
            {  
                client.createTable(Encoding.UTF8.GetBytes(tablename), _cols);  
                return true;  
            }  
            catch (Exception e)  
            {  
                return false;  
            }  
              
        }  
    }  
}  
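The write methods in the Helper class all repeat the same try/catch and discard the exception, so a caller only learns that something failed, not why. One possible refinement, shown here only as a sketch, is a single wrapper that returns the same bool but also hands back the exception. The Safe class and its Try method are hypothetical names and are not part of the original Helper.

```csharp
using System;

// Hypothetical helper class; not part of the original Helper.
public static class Safe
{
    // Runs the action. Returns true on success; on failure returns false
    // and stores the caught exception in error.
    public static bool Try(Action action, out Exception error)
    {
        try
        {
            action();
            error = null;
            return true;
        }
        catch (Exception e)
        {
            error = e;
            return false;
        }
    }
}
```

Inside the Helper, DeleteTable could then be reduced to a call such as Safe.Try(() => client.deleteTable(Encoding.UTF8.GetBytes(tablename)), out error), with the caller logging error.Message when the result is false.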

  

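That covers the basic Thrift operations. Thrift also supports more advanced HBase operations, which I will write about in future posts. Thanks, everyone; my knowledge is limited, so please forgive any shortcomings.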

posted @ 2014-07-28 17:05  ..空白