本文介绍用 python 远程连接 hive,此时需要 hive 启动 hiveserver2 服务

 

windows 下报如下错误

thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2

不好玩,还是 linux 吧

 

安装依赖包

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

最麻烦的是安装 sasl,出现异常参考下面的异常记录

 

python 操作 hive

官网方法-pyhs2

import pyhs2

## 连接 hve 和 数据库
with pyhs2.connect(host='192.168.10.10',
                   port=10000,
                   authMechanism="PLAIN",
                   user='postgres',
                   password='postgres',
                   database='hive1') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()

        #Execute query
        cur.execute("show tables")      # 注意没有 分号

        #Return column info from query
        print cur.getSchema()

        cur.execute('select * from hive_01 limit 3')
        #Fetch table results
        for i in cur.fetch():
            print i

 

其他方法-pyhive

from pyhive import hive


conn = hive.Connection(host='192.168.10.10', port=10000, username="postgres", password='postgres', auth='CUSTOM')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
# 打印结果
for result in cursor.fetchall():
     print(result)

auth 

 

其他方法-impyla

其他依赖

pip install six
pip install bit_array
pip install thriftpy

demo

from impala.dbapi import connect 
conn = connect(host ='****',port = ****)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print cursor.description   # 打印结果集的schema 
results = cursor.fetchall()

 

异常记录

安装 sasl 报错如下

gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

 

解决方案

Ubuntu 系统的话可能需要先装好 libsasl2-dev

apt-get install libsasl2-dev

CentOS 系统需要预先装好 python-devel 和 cyrus-sasl-devel

yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64

 

 

 

参考资料: 

https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver  官网

https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl  第三方包下载地址

https://www.jianshu.com/p/a0ff0a4495a1  Python 连接hive(Linux)

https://www.jianshu.com/p/99581dce8309  本文后面有编译python脚本

https://blog.csdn.net/clany888/article/details/82989068  pyhive  and impyla 两种方式

https://www.cnblogs.com/chushiyaoyue/p/5628882.html  pyhs2 方式,写的代码多

http://www.zhangdongshengtech.com/article-detials/267  pyhive

https://www.cnblogs.com/free-easy0000/p/9638982.html  异常记录

https://www.jianshu.com/p/9dd3a741a8ba  也总结了几种方式

https://www.360kuai.com/pc/9026a62a8775d368f?cota=4&kuai_so=1&tj_url=so_rec&sign=360_57c3bbd1&refer_scene=so_1

 

https://blog.csdn.net/bigdataf/article/details/78479331  Windows pyhs2下模块安装,没试过