Accessing HDFS from Python

There are two ways to access HDFS from a non-Java language: through libhdfs.so, or through the Thrift communication framework. This post covers libhdfs for now.

1. Install libhdfs first

# Prerequisites: JDK 6 and JRE 6 are installed, and hadoop-0.20 has been installed via cloudera.repo

sudo yum -y install libhdfs*
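
To confirm that the library is visible to the dynamic linker after installation, a minimal Python sketch like the following may help. It assumes the package puts a shared object named libhdfs on the default library search path; `find_libhdfs` is a hypothetical helper name, not part of any package.

```python
import ctypes.util


def find_libhdfs():
    """Return the name of the libhdfs shared library if it can be located, else None."""
    # find_library("hdfs") searches for libhdfs.so* much as the linker would
    return ctypes.util.find_library("hdfs")


if __name__ == "__main__":
    found = find_libhdfs()
    print(found if found else "libhdfs not found on the default library path")
```

A result of None usually means the package is missing or was installed outside the linker's search path.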

 

2. Install python-devel (2.6+) and gcc

sudo yum -y install python-devel gcc

   

3. Download the libpyhdfs source and prepare the dependencies

svn checkout http://libpyhdfs.googlecode.com/svn/trunk/ libpyhdfs

cd libpyhdfs

cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u0.jar lib/hadoop-0.20.1-core.jar

cp /usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar lib/

cp /usr/lib/libhdfs.so.0 lib/

# The link target is resolved relative to lib/, so it must not repeat the lib/ prefix
ln -s libhdfs.so.0 lib/libhdfs.so
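
Before building, it can be worth checking that everything the steps above copy into lib/ is really there. The sketch below is one way to do that, assuming the file names used in this post; `missing_dependencies` is a hypothetical helper. A symlink whose target does not resolve is reported as missing.

```python
import os


def missing_dependencies(lib_dir, names):
    """Return the entries of names that are absent from lib_dir."""
    # os.path.exists follows symlinks, so a dangling link counts as missing
    return [n for n in names if not os.path.exists(os.path.join(lib_dir, n))]


if __name__ == "__main__":
    expected = [
        "hadoop-0.20.1-core.jar",
        "commons-logging-1.0.4.jar",
        "libhdfs.so.0",
        "libhdfs.so",
    ]
    # Run from the libpyhdfs checkout directory
    print(missing_dependencies("lib", expected))
```

An empty list means all four files resolve.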

   

4. Configure setup.py: change the Java paths

vim setup.py

include_dirs = ['/usr/lib/jvm/java-6-sun/include/']  

    ->  include_dirs = ['/usr/java/jdk1.6.0_24/include/']

runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/lib/jvm/java-6-sun/jre/lib/i386/server']

    ->  runtime_library_dirs = ['/usr/local/lib/pyhdfs', '/usr/java/jdk1.6.0_24/jre/lib/i386/server'],
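
Both edited values depend only on where the JDK lives, so they can be derived instead of typed by hand. The following is a sketch under assumptions, not what the libpyhdfs setup.py does: `java_build_paths` is a hypothetical helper, and the `arch` default of i386 matches the 32-bit paths above (a 64-bit JDK 6 uses amd64 instead).

```python
import os


def java_build_paths(java_home, arch="i386"):
    """Build the two JDK-dependent setup.py values from a JDK root directory."""
    include_dirs = [os.path.join(java_home, "include")]
    runtime_library_dirs = [
        "/usr/local/lib/pyhdfs",
        # Directory that holds libjvm.so
        os.path.join(java_home, "jre", "lib", arch, "server"),
    ]
    return include_dirs, runtime_library_dirs


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME", "/usr/java/jdk1.6.0_24")
    print(java_build_paths(home))
```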

   

5. Edit jdk1.6.0_24/include/jni.h

#include "jni_md.h"

    ->  #include "linux/jni_md.h"
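
An alternative that leaves the JDK header untouched, offered here as an assumption and not as what this build originally did: add the JDK's include/linux directory to include_dirs so the compiler can find jni_md.h by itself. The helper names below are hypothetical.

```python
import os


def jni_include_dirs(java_home, platform_dir="linux"):
    """Return the JDK include directories needed to compile against jni.h."""
    base = os.path.join(java_home, "include")
    # On Linux JDKs, jni_md.h sits in the platform subdirectory
    return [base, os.path.join(base, platform_dir)]


def has_jni_headers(java_home):
    """Check that jni.h and jni_md.h exist where jni_include_dirs expects them."""
    base, plat = jni_include_dirs(java_home)
    return (os.path.isfile(os.path.join(base, "jni.h"))
            and os.path.isfile(os.path.join(plat, "jni_md.h")))


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME", "/usr/java/jdk1.6.0_24")
    print(jni_include_dirs(home), has_jni_headers(home))
```

The returned list could be assigned to include_dirs in setup.py in place of the single path.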

   

6. Install libpyhdfs

sudo python setup.py install --prefix="/usr/local"

# Test

python pyhdfs_test.py

# Sadly, it failed with an error:

>>> import pyhdfs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libjvm.so: cannot enable executable stack as shared object requires: Permission denied
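
A failure to load a shared library such as libjvm.so surfaces in Python as an ImportError, so the message can be captured without a full traceback. This is only a small diagnostic sketch; `try_import` is a hypothetical helper name.

```python
import importlib


def try_import(module_name):
    """Try to import a module; return (True, None) or (False, error message)."""
    try:
        importlib.import_module(module_name)
        return True, None
    except ImportError as exc:
        return False, str(exc)


if __name__ == "__main__":
    ok, err = try_import("pyhdfs")
    print("pyhdfs imported" if ok else "import failed: %s" % err)
```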

# After a lot of digging, this turned out to be an SELinux problem. I have not found another fix yet, so just turn it off.

sudo vim /etc/selinux/config

# Set SELINUX=disabled (applies after the next reboot), then stop enforcement immediately without rebooting
sudo setenforce 0

# Check the SELinux status

sudo getenforce
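
getenforce reports the running mode, while the mode that applies after a reboot comes from the SELINUX= line in /etc/selinux/config. The sketch below reads that setting from the file's text; `configured_selinux_mode` is a hypothetical helper, and the sample text only illustrates the file's general layout.

```python
def configured_selinux_mode(config_text):
    """Return the value of the SELINUX= setting in the given config text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and lines that are not key=value pairs
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "SELINUX":
            return value.strip()
    return None


if __name__ == "__main__":
    # Illustrative sample; on a real system, pass the contents of /etc/selinux/config
    sample = "# SELinux configuration\nSELINUX=disabled\nSELINUXTYPE=targeted\n"
    print(configured_selinux_mode(sample))
```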

Now try again!

Posted on 2011-05-04 16:20 by 张淼