python
- 1. Installing as a Python package
scipy
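On Debian/Ubuntu, SciPy can be installed from the system package manager. python-scipy is the Python 2 package; for Python 3 the package is python3-scipy:
$ sudo apt-get install python-scipy
Alternatively, install it as a Python package with pip. For Python 3.x scripts:
$ pip3 install scipy
Otherwise: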
$ pip install scipy
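Note: if the package is installed in a directory the interpreter does not search by default, add that directory to PYTHONPATH.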
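To confirm that the install worked and that the interpreter can actually see it, you can ask the import system where it would load the module from. This is a minimal sketch using only the standard library; the helper names `module_location` and `pythonpath_entries` are illustrative and not part of any tool.

```python
import importlib.util
import os
import sys


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def pythonpath_entries(value=None):
    """Split a PYTHONPATH string (defaults to the current environment) into its non-empty entries."""
    if value is None:
        value = os.environ.get("PYTHONPATH", "")
    return [entry for entry in value.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("scipy found at:", module_location("scipy"))
    print("PYTHONPATH entries:", pythonpath_entries())
    print("first sys.path entries:", sys.path[:5])
```

If `module_location("scipy")` returns None, one of two things is likely. Either the interpreter shown by `sys.executable` is not the one pip installed into, or the install directory is missing from PYTHONPATH.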
- 2. How to run Python programs on YARN clusters / Spark standalone clusters
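Option 1: install the required Python environment (e.g. NumPy) on every slave and master node.
Option 2: package the required Python dependencies into a virtual environment, then submit the job with spark-submit, passing the packed environment with --archives venv.zip.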
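Option 2 can be sketched as a small Python helper that assembles the spark-submit call for YARN cluster mode. The sketch makes three assumptions. The environment has already been packed into `venv.zip`. YARN unpacks the archive in each container under the alias given after `#` (here `environment`). The driver, which runs in the YARN application master in cluster mode, gets `PYSPARK_PYTHON` through `spark.yarn.appMasterEnv.PYSPARK_PYTHON`. `app.py`, the alias, and the name `build_submit_cmd` are hypothetical. Other deploy modes need different settings. A plain venv also embeds absolute paths, so a zipped copy is not always relocatable as-is.

```python
import shlex


def build_submit_cmd(app, archive="venv.zip", alias="environment", master="yarn"):
    """Assemble a spark-submit command that ships a packed virtualenv with the job (YARN cluster mode)."""
    # YARN unpacks the archive into each container's working directory under the alias after '#',
    # so this relative path resolves inside every container.
    python_in_archive = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--archives", f"{archive}#{alias}",
        # In cluster mode the driver runs inside the YARN application master,
        # so its PYSPARK_PYTHON is set via spark.yarn.appMasterEnv.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_in_archive}",
        app,
    ]


if __name__ == "__main__":
    cmd = build_submit_cmd("app.py")
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it instead.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if paths contain spaces.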
On Ubuntu, the build tools and pip can be installed with:
apt-get update
apt-get install -y python-setuptools python-dev
apt-get install -y gcc make
apt-get install -y zip
easy_install pip
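Before building and zipping the environment, you can check that the tools installed above are actually on PATH. This is a minimal sketch using `shutil.which`. The tool list mirrors the commands above, and `missing_tools` is an illustrative name.

```python
import shutil

# Executables the Ubuntu steps above are expected to provide.
REQUIRED_TOOLS = ("gcc", "make", "zip", "pip")


def missing_tools(names=REQUIRED_TOOLS):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all build tools found")
```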
- 3. Using notebooks (e.g. Jupyter Notebook)
First, install the required packages:
sudo apt install python
sudo apt install python-pip
sudo pip install numpy scipy pandas scikit-learn matplotlib seaborn wordcloud
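To see which of these packages actually ended up installed, you can query their distribution metadata. This sketch assumes Python 3.8+ for `importlib.metadata`; the apt commands above may install Python 2 on older Ubuntu releases. The helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip above (not import names: scikit-learn is imported as sklearn).
PACKAGES = ("numpy", "scipy", "pandas", "scikit-learn", "matplotlib", "seaborn", "wordcloud")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if it is not installed."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```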
To launch Jupyter Notebook through PySpark, set:
export PYTHONPATH=${PYTHON_API_PATH}:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=./ --ip=* --no-browser"
${SPARK_HOME}/bin/pyspark \
  --master ${MASTER} \
-- ...
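The Jupyter dashboard is then available at http://your_node:8888/ (8888 is Jupyter's default port; replace your_node with the host running the driver).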
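If you launch pyspark from a Python script instead of a shell, the three exports above can be built as an environment dict and passed to `subprocess.run`. This is a sketch: `jupyter_pyspark_env` and the placeholder path are hypothetical, and the option string is copied from the export above. Unlike the shell version, it avoids leaving a trailing separator when PYTHONPATH was previously empty.

```python
import os


def jupyter_pyspark_env(python_api_path, base_env=None):
    """Return a copy of the environment with the three variables exported above."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Prepend the PySpark API path, keeping whatever PYTHONPATH was already set.
    env["PYTHONPATH"] = python_api_path + (os.pathsep + existing if existing else "")
    # Make pyspark start Jupyter as its driver front end instead of the plain Python REPL.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook --notebook-dir=./ --ip=* --no-browser"
    return env


if __name__ == "__main__":
    env = jupyter_pyspark_env("/path/to/pyspark/python")  # placeholder path
    for key in ("PYTHONPATH", "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
        print(f"{key}={env[key]}")
    # To launch (spark_home and master are yours to fill in):
    # subprocess.run([os.path.join(spark_home, "bin", "pyspark"), "--master", master], env=env)
```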
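About Jupyter:
Jupyter Notebook (formerly IPython Notebook) is an interactive notebook environment that supports more than 40 programming languages.
Official installation guide: https://jupyter.readthedocs.io/en/latest/install.html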
Some simple Python usage:
A dict holds key-value pairs. To print every entry of a dict vocab (word -> count):
for word, count in vocab.items():
    print(str(word) + ":" + str(count))
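A vocabulary dict like this is often built by counting words. This sketch uses `collections.Counter`, a dict subclass, to build it. It then prints the entries most frequent first and uses `dict.get` with a default for missing keys. The sample sentence and the helper names `build_vocab` and `format_vocab` are illustrative.

```python
from collections import Counter


def build_vocab(text):
    """Count how often each whitespace-separated word appears (returns a Counter: word -> count)."""
    return Counter(text.split())


def format_vocab(vocab):
    """Return 'word:count' lines, most frequent first, ties broken alphabetically."""
    ordered = sorted(vocab.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}:{count}" for word, count in ordered]


if __name__ == "__main__":
    vocab = build_vocab("spark python spark jupyter python spark")
    for line in format_vocab(vocab):
        print(line)  # spark:3, then python:2, then jupyter:1
    print(vocab.get("scala", 0))  # missing key -> default 0 instead of KeyError
```

Iterating with `.items()` yields each key and value together, which avoids a second lookup per key. Because `format_vocab` accepts any mapping, it also works on a plain dict.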