python

  • 1. Installing scipy as a Python package

First try installing it from the system package repository with apt-get:

$ sudo apt-get install python-scipy

Alternatively, install it with pip. To run a Python 3.x script, install scipy with:

$ pip3 install scipy

Otherwise, install it with:

$ pip install scipy

Make sure the install location is added to PYTHONPATH.
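
A quick way to confirm that the interpreter can actually see the package is to ask Python where it would import it from. The following is a minimal sketch; the helper name `locate_package` is made up for illustration, and it relies only on the standard library.

```python
import importlib.util


def locate_package(name):
    """Return the location Python would import `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    path = locate_package("scipy")
    if path is None:
        print("scipy not found; check pip/pip3 and PYTHONPATH")
    else:
        print("scipy will be imported from " + str(path))
```

If this reports that scipy is missing although the install succeeded, the package was probably installed for a different interpreter (pip vs. pip3) or into a directory that is not on PYTHONPATH.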

  • 2. How to run Python programs on YARN clusters / Spark standalone clusters

Option 1: deploy the required Python environment (for example, NumPy) on the master and on every slave.

Option 2: package the required Python dependencies into a virtual environment, then submit the job with spark-submit, specifying --archives venv.zip.

On Ubuntu, the required tools can be installed with:

  apt-get update
  apt-get install -y python-setuptools python-dev
  apt-get install -y gcc make
  apt-get install -y zip
  easy_install pip
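
Option 2 can be sketched as a small helper that assembles the spark-submit command line. This is only an illustration under assumptions that are not in the original notes: the job runs in client mode, the archive is unpacked under the alias `venv` (the `#venv` suffix), the executors find the shipped interpreter through the `PYSPARK_PYTHON` environment variable, and the names `build_submit_command`, `build_submit_env` and `job.py` are hypothetical. The zipped environment also has to remain usable after being unpacked on another node.

```python
import os
import sys


def build_submit_command(script, archive="venv.zip", alias="venv", master="yarn"):
    """Assemble a spark-submit command that ships a zipped virtual environment."""
    spark_home = os.environ.get("SPARK_HOME", "")
    if spark_home:
        spark_submit = os.path.join(spark_home, "bin", "spark-submit")
    else:
        spark_submit = "spark-submit"  # assume it is on PATH
    return [
        spark_submit,
        "--master", master,
        # archive#alias: the archive is unpacked into a directory named `alias`
        "--archives", "{}#{}".format(archive, alias),
        script,
    ]


def build_submit_env(alias="venv"):
    """Copy the current environment and point the executors at the shipped interpreter."""
    env = dict(os.environ)
    env["PYSPARK_PYTHON"] = "./{}/bin/python".format(alias)
    # Client mode assumption: the driver keeps using the local interpreter.
    env["PYSPARK_DRIVER_PYTHON"] = sys.executable
    return env


if __name__ == "__main__":
    cmd = build_submit_command("job.py")
    print(" ".join(cmd))
    # To actually submit (requires a working Spark installation):
    # import subprocess
    # subprocess.check_call(cmd, env=build_submit_env())
```

Keeping the command as a list rather than a single string avoids shell quoting problems when it is later passed to `subprocess`.
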
  • 3. Using notebooks (e.g. Jupyter notebook)

First install the required dependencies:

sudo apt install python

sudo apt install python-pip

sudo pip install numpy scipy pandas scikit-learn matplotlib seaborn wordcloud

To launch the Jupyter notebook, set the following:

export PYTHONPATH=${PYTHON_API_PATH}:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=./ --ip=* --no-browser"
${SPARK_HOME}/bin/pyspark \
  --master ${MASTER} \
  -- ...

The Jupyter dashboard URL is http://your_node:8888/
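
Once the notebook is open, a first cell can confirm that it is really attached to Spark. The sketch below assumes the pyspark shell has already defined a SparkContext named `sc` in the notebook kernel; `describe_context` is a hypothetical helper that only reads the context's `version` and `master` attributes.

```python
def describe_context(sc):
    """Summarize which Spark version and master a SparkContext is attached to."""
    return "Spark {} running against master {}".format(sc.version, sc.master)


# In a notebook cell, with `sc` provided by the pyspark shell:
# print(describe_context(sc))
```

If `sc` is not defined in the notebook, the kernel was most likely started outside of the pyspark launcher.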

 

Introduction:

Jupyter Notebook (formerly known as IPython notebook) is an interactive notebook that supports running more than 40 programming languages.

Official site: https://jupyter.readthedocs.io/en/latest/install.html

 

Some simple Python usage:

The dict type stores key-value pairs. Iterating over a dict yields its keys:

# vocab is assumed to be an existing dict, e.g. mapping each word to its count
for item in vocab:
    print(item + ":" + str(vocab[item]))
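
The same loop can also be written with `dict.items()`, which yields each key together with its value. A small sketch, where the helper name `format_vocab` and the contents of `vocab` are invented for illustration:

```python
def format_vocab(vocab):
    """Return one "key:value" string per entry, sorted by key for stable output."""
    return ["{}:{}".format(word, count) for word, count in sorted(vocab.items())]


if __name__ == "__main__":
    vocab = {"spark": 2, "python": 5, "yarn": 1}  # sample data, not from the original notes
    for line in format_vocab(vocab):
        print(line)
```

Using `format` instead of string concatenation means the values do not need an explicit `str()` conversion.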

posted @ 2017-07-18 13:59  大球和二憨