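Operating a Hive data warehouse with Python
Using the PyHive library
Install: pip install Pyhive -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
(For the Hive interface, PyHive also needs its hive extra, e.g. pip install 'pyhive[hive]', which pulls in the Thrift and SASL dependencies.)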
from pyhive import hive  # HiveServer2 client (pyhive also provides presto)
conn = hive.Connection(host='****', port=****, username='****', database='****')
cursor = conn.cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchall())
# Insert rows one at a time: value i and name 'username<i>'
for i in range(****):
    sql = "INSERT INTO **** VALUES ({},'username{}')".format(i, i)
    cursor.execute(sql)
# The following is the official example code:
from pyhive import presto  # or import hive
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
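Each INSERT ... VALUES statement in the loop above typically launches its own Hive job, so row-by-row inserts are slow, and building SQL with str.format breaks as soon as a value contains a quote. Below is a minimal sketch of an alternative. It assumes a Hive version that accepts multi-row INSERT ... VALUES (Hive 0.14 and later). The idea is to build one statement with named placeholders in PyHive's pyformat parameter style and let cursor.execute(sql, params) escape the values. The helper build_batch_insert and the table name my_table are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_batch_insert(
    table: str, rows: Sequence[Tuple[int, str]]
) -> Tuple[str, Dict[str, Any]]:
    """Build one multi-row INSERT with pyformat placeholders (%(name)s).

    pyhive.hive declares paramstyle = 'pyformat', so the returned pair can be
    passed as cursor.execute(sql, params) and PyHive quotes/escapes each value.
    The table name is pasted in verbatim: it must come from trusted code.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    placeholders: List[str] = []
    params: Dict[str, Any] = {}
    for i, (value, username) in enumerate(rows):
        placeholders.append(f"(%(v{i})s, %(u{i})s)")
        params[f"v{i}"] = value
        params[f"u{i}"] = username
    sql = f"INSERT INTO {table} VALUES " + ", ".join(placeholders)
    return sql, params


if __name__ == "__main__":
    rows = [(i, f"username{i}") for i in range(3)]
    sql, params = build_batch_insert("my_table", rows)  # my_table is hypothetical
    print(sql)
    print(params)
    # With a live PyHive cursor (not run here): cursor.execute(sql, params)
```

Only the values go through PyHive's escaping. For a very large number of rows, split them into several batches so that no single statement grows unmanageably long.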
impyla
Install:
pip install impyla -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
from impala.dbapi import connect
conn = connect(host='****', port=****)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # print the result set's schema
results = cursor.fetchall()
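The snippet above never closes the cursor or the connection. Below is a minimal sketch of one way to release both deterministically: wrap any PEP 249 connection in contextlib.closing. The example needs no server, so it uses the standard library's sqlite3 module as a stand-in DB-API connection. With impyla you would instead pass a factory such as lambda: connect(host='****', port=****) from impala.dbapi. The names run_query and demo_connection are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Any, Callable, List, Sequence, Tuple


def run_query(
    make_connection: Callable[[], Any], sql: str
) -> Tuple[List[str], List[Sequence[Any]]]:
    """Run one query and return (column_names, rows).

    The cursor and the connection are closed even if execute() raises.
    """
    with closing(make_connection()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            # PEP 249: description is a sequence of 7-item sequences,
            # and item 0 of each one is the column name.
            columns = [col[0] for col in cursor.description]
            rows = cursor.fetchall()
    return columns, rows


def demo_connection() -> sqlite3.Connection:
    # Stand-in for impala.dbapi.connect(...): an in-memory SQLite database
    # holding a small hypothetical table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])
    return conn


if __name__ == "__main__":
    columns, rows = run_query(demo_connection, "SELECT * FROM mytable ORDER BY id")
    print(columns)  # ['id', 'name']
    print(rows)     # [(1, 'a'), (2, 'b')]
```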
Querying Hive into pandas (PyHive or impyla)
from pyhive import hive
import pandas as pd

def LinkHive(sql_select):
    connection = hive.Connection(host='localhost')
    cur = connection.cursor()
    cur.execute(sql_select)
    columns = [col[0] for col in cur.description]
    # Passing columns explicitly also works when the result set is empty
    Main = pd.DataFrame(cur.fetchall(), columns=columns)
    connection.close()
    return Main

sql = "select * from database_name.table_name"
df = LinkHive(sql)
Or, with impyla's as_pandas helper:
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host='10.161.20.11', port=21050)
cur = conn.cursor()
cur.execute('SHOW TABLES')
print(cur.fetchall())  # list the tables
cur.execute('SELECT * FROM businfo')
data = as_pandas(cur)
print(data)
print(type(data))
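When the server is HiveServer2 (rather than Impala), SELECT * usually returns column names qualified with the table name, for example mytable.id. This happens because the Hive setting hive.resultset.use.unique.column.names defaults to true. A DataFrame built from cursor.description then carries those prefixed labels. Below is a minimal sketch of a helper that strips the prefix. It refuses to do so when the short names would collide, as they can after a join. The name strip_table_prefix is hypothetical.

```python
from typing import List, Sequence


def strip_table_prefix(columns: Sequence[str]) -> List[str]:
    """Turn Hive-style 'table.column' names into plain 'column' names.

    Keeps only the text after the last dot. Raises ValueError if two columns
    would end up with the same name (e.g. 'a.id' and 'b.id' from a join).
    """
    stripped = [name.rsplit(".", 1)[-1] for name in columns]
    if len(set(stripped)) != len(stripped):
        raise ValueError(f"ambiguous column names after stripping: {stripped}")
    return stripped


if __name__ == "__main__":
    # Names as HiveServer2 typically reports them for SELECT * FROM mytable
    print(strip_table_prefix(["mytable.id", "mytable.name"]))  # ['id', 'name']
    # Inside LinkHive one could then write (pandas, not run here):
    #     pd.DataFrame(cur.fetchall(), columns=strip_table_prefix(columns))
```

Another option is to run cur.execute('set hive.resultset.use.unique.column.names=false') on the same connection before the query. That should make Hive return unqualified names in the first place.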
Usage
Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)  # process() stands for your own per-row handler
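Besides iteration, PEP 249 also defines fetchmany(size). Each call returns at most size rows (the default is cursor.arraysize), and it returns an empty sequence once the result set is exhausted. Below is a minimal sketch of processing a large result in fixed-size batches. The standard library's sqlite3 again stands in for an impyla cursor. The generator name iter_batches and the batch size are hypothetical choices.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence


def iter_batches(cursor: Any, size: int = 1000) -> Iterator[List[Sequence[Any]]]:
    """Yield lists of at most `size` rows until the result set is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:  # PEP 249: an empty sequence means no more rows
            break
        yield list(batch)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(7)])
    cursor = conn.cursor()
    cursor.execute("SELECT id FROM mytable ORDER BY id")
    for batch in iter_batches(cursor, size=3):
        print(len(batch), batch)  # batches of 3, 3 and 1 rows
    conn.close()
```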
Furthermore, the Cursor object returns information about the columns returned by the query. This is useful for exporting your data as a CSV file.
import csv
cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'
with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)
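The same export can be wrapped in a function that writes to any text stream, which makes it easy to test against an in-memory buffer. Below is a minimal sketch. It assumes only the PEP 249 features the code above already uses: cursor.description and iteration over the cursor. The name export_csv is hypothetical, and the standard library's sqlite3 stands in for an impyla cursor in the example.

```python
import csv
import io
import sqlite3
from typing import Any, TextIO


def export_csv(cursor: Any, out: TextIO) -> int:
    """Write a header from cursor.description, then every row; return the row count.

    Call this after cursor.execute(...). The csv module writes None (SQL NULL)
    as an empty field.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:  # relies on the DB-API cursor being iterable
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM mytable ORDER BY id")
    buf = io.StringIO()
    print(export_csv(cursor, buf))  # 2
    print(buf.getvalue(), end="")
    conn.close()
```

To write to disk instead, open the file as in the code above (open(targetfile, 'w', newline='')) and pass the file object as out.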
You can also get back a pandas DataFrame object:
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example