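Accessing HDFS from Python
There are two ways to access HDFS from a language other than Java: through the libhdfs.so C library, or through the Thrift RPC framework. This post covers libhdfs for now.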
1. Install libhdfs
# Prerequisites: JDK 6 and JRE 6 are installed, and hadoop-0.20 has been installed via cloudera.repo
sudo yum -y install libhdfs*
2. Install python-devel (Python 2.6+) and gcc
sudo yum -y install python-devel gcc
3. Download the libpyhdfs source and prepare the dependencies
svn checkout http://libpyhdfs.googlecode.com/svn/trunk/ libpyhdfs
cd libpyhdfs
cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u0.jar lib/hadoop-0.20.1-core.jar
cp /usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar lib/
cp /usr/lib/libhdfs.so.0 lib/
# The link target is resolved relative to lib/, so point it at libhdfs.so.0 directly
ln -s libhdfs.so.0 lib/libhdfs.so
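If you repeat this setup on several machines, step 3 can be scripted. Below is a minimal Python 3 sketch; the helper name prepare_lib is hypothetical, and the source paths are the CDH3u0 ones used above, so adjust them to your installation. It also illustrates why the symlink target should be libhdfs.so.0 and not lib/libhdfs.so.0: a relative target is resolved from the directory that contains the link, not from the current directory.

```python
import os
import shutil

# Source paths from step 3 (CDH3u0 package layout); adjust to your install.
COPIES = {
    "/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u0.jar": "hadoop-0.20.1-core.jar",
    "/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar": "commons-logging-1.0.4.jar",
    "/usr/lib/libhdfs.so.0": "libhdfs.so.0",
}


def prepare_lib(lib_dir, copies=COPIES):
    """Copy the dependencies into lib_dir and (re)create lib_dir/libhdfs.so."""
    os.makedirs(lib_dir, exist_ok=True)
    for src, name in copies.items():
        shutil.copy2(src, os.path.join(lib_dir, name))
    link = os.path.join(lib_dir, "libhdfs.so")
    if os.path.lexists(link):
        os.remove(link)  # replace any stale or broken link
    # A relative link target is resolved from the link's own directory,
    # so the target is "libhdfs.so.0", not "lib/libhdfs.so.0".
    os.symlink("libhdfs.so.0", link)
    return link
```

Called from the root of the libpyhdfs checkout as prepare_lib("lib"), it performs the same copies as the commands above and creates a link that actually resolves.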
4. Edit setup.py to point at your Java installation
vim setup.py
include_dirs = ['/usr/lib/jvm/java-6-sun/include/']
-> include_dirs = ['/usr/java/jdk1.6.0_24/include/']
runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/lib/jvm/java-6-sun/jre/lib/i386/server']
-> runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/java/jdk1.6.0_24/jre/lib/i386/server'],
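Both setup.py values follow a fixed layout relative to the JDK home, so they can also be derived rather than edited by hand. A sketch under stated assumptions: jvm_build_paths is a hypothetical helper, /usr/local/lib/pyhdfs is taken from the setup.py line above, and arch is i386 for a 32-bit JDK 6, whereas a 64-bit JDK 6 on Linux keeps libjvm.so under jre/lib/amd64/server.

```python
import os


def jvm_build_paths(java_home, arch="i386", pyhdfs_lib="/usr/local/lib/pyhdfs"):
    """Return (include_dirs, runtime_library_dirs) for setup.py."""
    # jni.h lives directly under <JDK>/include
    include_dirs = [os.path.join(java_home, "include")]
    runtime_library_dirs = [
        pyhdfs_lib,
        # libjvm.so lives in jre/lib/<arch>/server in a JDK 6 layout
        os.path.join(java_home, "jre", "lib", arch, "server"),
    ]
    return include_dirs, runtime_library_dirs


if __name__ == "__main__":
    # Fall back to the JDK used in this post if JAVA_HOME is not set
    home = os.environ.get("JAVA_HOME", "/usr/java/jdk1.6.0_24")
    inc, rt = jvm_build_paths(home)
    print("include_dirs =", inc)
    print("runtime_library_dirs =", rt)
```

In setup.py, the two lists would go to the include_dirs and runtime_library_dirs arguments of the extension definition.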
5. Edit jdk1.6.0_24/include/jni.h so that it can find jni_md.h, which is located in include/linux/
#include "jni_md.h"
-> #include "linux/jni_md.h"
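Step 5 can likewise be applied with a small script. This is a minimal sketch with hypothetical helper names: it rewrites only the bare jni_md.h include, so a second run changes nothing, and it saves the original as jni.h.orig. Writing under the JDK directory needs root. A gentler alternative, not what this post does, is to leave jni.h alone and add /usr/java/jdk1.6.0_24/include/linux/ to include_dirs in setup.py, since that directory is where jni_md.h lives.

```python
import shutil

OLD_INCLUDE = '#include "jni_md.h"'
NEW_INCLUDE = '#include "linux/jni_md.h"'


def patch_jni_text(text):
    """Rewrite the bare jni_md.h include; already-patched text is unchanged."""
    # '#include "linux/jni_md.h"' does not contain OLD_INCLUDE as a substring,
    # so a second pass finds nothing to replace.
    return text.replace(OLD_INCLUDE, NEW_INCLUDE)


def patch_jni_file(path):
    """Patch jni.h in place, keeping path + '.orig' as a backup.

    Returns True if the file was changed, False if there was nothing to do.
    """
    with open(path) as f:
        text = f.read()
    patched = patch_jni_text(text)
    if patched == text:
        return False
    shutil.copy2(path, path + ".orig")
    with open(path, "w") as f:
        f.write(patched)
    return True
```

For example, patch_jni_file("/usr/java/jdk1.6.0_24/include/jni.h") returns True the first time and False on later runs.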
6. Install libpyhdfs
sudo python setup.py install --prefix="/usr/local"
# Test
python pyhdfs_test.py
# Sadly, it failed with an error...
>>> import pyhdfs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libjvm.so: cannot enable executable stack as shared object requires: Permission denied
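The message says a shared object loaded during the import (here libjvm.so) asks for an executable stack, and the loader was refused permission to provide one. To check which library carries that request, the sketch below reads an ELF file's PT_GNU_STACK program header and tests its execute bit, the same flag that the execstack -q tool reports. It is a minimal Python 3 diagnostic with a hypothetical function name, handling 32- and 64-bit ELF files of either byte order.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that records stack permissions
PF_X = 0x1                 # "execute" bit in p_flags


def wants_exec_stack(data):
    """Return True/False for the PT_GNU_STACK execute bit of an ELF image.

    Returns None when there is no PT_GNU_STACK header (older toolchains);
    in that case the loader typically assumes an executable stack on x86.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    for i in range(phnum):
        off = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", data, off)
        if p_type != PT_GNU_STACK:
            continue
        # p_flags is at offset 4 in Elf64_Phdr but at offset 24 in Elf32_Phdr
        (p_flags,) = struct.unpack_from(end + "I", data, off + (4 if is64 else 24))
        return bool(p_flags & PF_X)
    return None
```

For instance, reading libjvm.so from the jre/lib/i386/server directory configured in step 4 and passing its bytes to wants_exec_stack should return True if that library is the one requesting the executable stack.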
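# It took a long time to track down, but it turned out to be SELinux. I haven't found another way around it yet, so let's just switch the thing off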
sudo vim /etc/selinux/config
# Set SELINUX=disabled (takes effect from the next boot); to apply it now without rebooting, switch to permissive mode:
sudo setenforce 0
# Check the SELinux status; it should print Permissive
sudo getenforce
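To check both settings from a script, keep in mind that the value in /etc/selinux/config only takes effect at the next boot, while getenforce reports the current mode, which should read Permissive after setenforce 0. A minimal Python 3 sketch (3.7+ for subprocess.run with capture_output) with hypothetical helper names:

```python
import subprocess


def selinux_config_mode(text):
    """Return the SELINUX= value from /etc/selinux/config contents, or None."""
    mode = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Match SELINUX exactly so that SELINUXTYPE=targeted is ignored
        if key.strip() == "SELINUX":
            mode = value.strip().strip('"').lower()  # last assignment wins
    return mode


def selinux_runtime_mode():
    """Return the current mode reported by getenforce, or None if unavailable."""
    try:
        result = subprocess.run(["getenforce"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip().lower()


if __name__ == "__main__":
    try:
        with open("/etc/selinux/config") as f:
            print("config:", selinux_config_mode(f.read()))
    except OSError:
        print("config: /etc/selinux/config not readable")
    print("runtime:", selinux_runtime_mode())
```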
Now try again!!