Reference:
https://blog.csdn.net/qq_42906753/article/details/105138596
1. While following "Manage Docker as a non-root user", this error appeared:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.
Cause: the Docker daemon was not running.
Fix:
service docker start
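The check-then-start step above can be sketched as a small script. This is a minimal sketch, assuming a systemd host where `systemctl start docker` is equivalent to `service docker start`; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: start the Docker daemon only if the client cannot reach it.
# Assumption: systemd host; `service docker start` is the SysV equivalent.
# Function names are illustrative, not part of Docker.

docker_is_running() {
    # `docker info` exits non-zero when the daemon cannot be reached.
    docker info >/dev/null 2>&1
}

ensure_docker_running() {
    if docker_is_running; then
        echo "docker daemon already running"
    else
        echo "docker daemon not running, starting it"
        systemctl start docker
    fi
}

# Run as root on the affected machine:
#   ensure_docker_running
```

Checking first keeps the script safe to rerun, since it does nothing when the daemon is already up.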
Deployment machine IP: 192.168.170.142/24; target machine IP: 192.168.170.145/24
Target machine:
Deployment machine:
Reference: https://blog.csdn.net/weixin_44002829/article/details/97619826
Download Python 3.6 and build it from source
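Command: wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz
(If wget is not installed, install it first: yum install wget)
Extract: tar -xzvf Python-3.6.0.tgz
(extracted in the home directory)
Enter the source directory: cd Python-3.6.0
(If you do not know where the folder is, use ls to find which directory it is in, then cd into it)
Configure: ./configure --prefix=/usr/local
If you see configure: error: no acceptable C compiler found in $PATH
Fix: yum install gcc
Then build and install: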
make altinstall
Repoint the /usr/bin/python symlink
cd /usr/bin
mv python python.backup
ln -s /usr/local/bin/python3.6 /usr/bin/python
ln -s /usr/local/bin/python3.6 /usr/bin/python3
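Change the Python dependency of the yum scripts
(yum itself runs on Python 2, so once /usr/bin/python points to Python 3.6 its scripts must call python2 explicitly or yum stops working)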
ls yum*
vi /usr/bin/yum
vi /usr/libexec/urlgrabber-ext-down
(In each file opened by the commands above, change the first line from #!/usr/bin/python to #!/usr/bin/python2)
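The same shebang change can be made without opening an editor. A minimal sketch, assuming the first line of each file is exactly `#!/usr/bin/python` (optionally with a space after `#!`); `pin_python2_shebang` is a hypothetical helper name, not a yum tool.

```shell
#!/bin/sh
# Sketch: pin a script's shebang to python2 (hypothetical helper name).
# Only line 1 is touched, and only if it is exactly "#!/usr/bin/python"
# (optionally with a space after "#!"). A .bak copy of the file is kept.

pin_python2_shebang() {
    sed -i.bak -e '1s|^#![[:space:]]*/usr/bin/python[[:space:]]*$|#!/usr/bin/python2|' "$1"
}

# Run as root:
#   pin_python2_shebang /usr/bin/yum
#   pin_python2_shebang /usr/libexec/urlgrabber-ext-down
```

Because the pattern is anchored to the end of the line, a file that already says python2 is left unchanged if the helper is run twice.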
Python 3.6 is now installed.
Download FATE
curl -OL https://github.com/FederatedAI/KubeFATE/releases/download/v1.3.0/kubefate-docker-compose.tar.gz  # download
tar -xvzf kubefate-docker-compose.tar.gz  # extract
Enter the docker-deploy directory and edit parties.conf.
While installing the virtual environment tooling (pip install virtualenvwrapper), this error appeared:
Could not fetch URL https://pypi.org/simple/virtualenvwrapper/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/virtualenvwrapper/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)) - skipping
Could not find a version that satisfies the requirement virtualenvwrapper (from versions: )
No matching distribution found for virtualenvwrapper
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
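The last line of the error says the ssl module is missing from this Python build. A common cause, and an assumption here since these notes do not confirm it, is that the OpenSSL development headers were not installed when Python 3.6 was compiled. A sketch of checking for the module, with the likely remedy as comments; `has_ssl_module` is a hypothetical helper and `openssl-devel` is the CentOS package name.

```shell
#!/bin/sh
# Sketch: test whether a Python interpreter was built with the ssl module.

has_ssl_module() {
    # $1 = path to the interpreter; exits 0 only if `import ssl` works.
    "$1" -c 'import ssl' >/dev/null 2>&1
}

# Likely remedy if the check fails (assumption: headers were missing at build time).
# Run as root, from the Python-3.6.0 source directory:
#   yum install openssl-devel
#   ./configure --prefix=/usr/local
#   make altinstall
#   has_ssl_module /usr/local/bin/python3.6 && echo "ssl available"
```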
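Using ssh
Running ssh root@192.168.170.145 on the deployment machine
connects it to the target machine.
On the deployment machine, download and extract the KubeFATE v1.3.0 package kubefate-docker-compose.tar.gz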
# curl -OL https://github.com/FederatedAI/KubeFATE/releases/download/v1.3.0/kubefate-docker-compose.tar.gz
# tar -xzf kubefate-docker-compose.tar.gz
Define the number of instances to deploy
Enter the docker-deploy directory
# cd docker-deploy/
Edit parties.conf as follows
vi parties.conf
user=root
dir=/data/projects/fate
partylist=(10000 9999)
partyiplist=(192.168.170.142 192.168.170.145)
servingiplist=(192.168.170.142 192.168.170.145)
exchangeip=
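Since partylist and partyiplist are matched by position, a quick consistency check before generating the configuration can catch a missing entry. A minimal sketch, assuming the `key=(a b c)` layout shown above; both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: check that parties.conf lists one IP per party ID.
# Helper names are hypothetical; the file format is the one shown above.

count_entries() {
    # $1 = key, $2 = file. Prints the number of items in a line "key=(a b c)".
    sed -n "s/^$1=(\(.*\))[[:space:]]*\$/\1/p" "$2" | wc -w | tr -d ' '
}

check_parties_conf() {
    # $1 = path to parties.conf
    ids=$(count_entries partylist "$1")
    ips=$(count_entries partyiplist "$1")
    if [ "$ids" -eq "$ips" ] && [ "$ids" -gt 0 ]; then
        echo "ok: $ids parties"
    else
        echo "mismatch: $ids party IDs, $ips party IPs"
        return 1
    fi
}

# In the docker-deploy directory:
#   check_parties_conf parties.conf
```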
Run the script that generates the cluster startup files
# bash generate_config.sh
Run the cluster startup script
# bash docker_deploy.sh all
After the command is entered, the root password has to be typed 4 times for each user.
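The repeated prompts presumably come from the deploy script connecting to each host over ssh. One way to avoid them, offered as a suggestion rather than something these notes tested, is key-based login. The sketch below only prints the `ssh-copy-id` commands for review; the values are copied from parties.conf above, and `print_copy_key_commands` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list the ssh-copy-id commands needed for password-less login.
# Values below are copied from parties.conf in these notes.
user=root
party_ips="192.168.170.142 192.168.170.145"

print_copy_key_commands() {
    # Prints one command per host instead of running it, so it can be reviewed first.
    for ip in $party_ips; do
        echo "ssh-copy-id ${user}@${ip}"
    done
}

# On the deployment machine:
#   ssh-keygen -t rsa          # once, if no key pair exists yet
#   print_copy_key_commands    # then run the printed commands
```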
Verify basic cluster functionality
# docker exec -it confs-10000_python_1 bash
This then failed with the error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Fix: restart Docker
# systemctl daemon-reload
# systemctl restart docker.service
Problem encountered:
Status: Downloaded newer image for federatedai/fateboard:1.3.0-release
Creating docker-deploy_proxy_1 ... done
Creating docker-deploy_redis_1 ... done
Creating docker-deploy_mysql_1 ... done
Creating docker-deploy_federation_1 ... done
Creating docker-deploy_egg_1 ... done
Creating docker-deploy_meta-service_1 ... done
Creating docker-deploy_roll_1 ... done
Creating docker-deploy_python_1 ... error
ERROR: for docker-deploy_python_1 Cannot create container for service python: failed to mount local volume: mount /path/to/host/dir/examples:/var/lib/docker/volumes/docker-deploy_shared_dir_examples/_data, flags: 0x1000: no such file or directory
ERROR: for python Cannot create container for service python: failed to mount local volume: mount /path/to/host/dir/examples:/var/lib/docker/volumes/docker-deploy_shared_dir_examples/_data, flags: 0x1000: no such file or directory
ERROR: Encountered errors while bringing up the project.
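The mount source in the error, /path/to/host/dir/examples, looks like an unedited template placeholder rather than a real directory, which would explain "no such file or directory". This is an inference from the log, not a confirmed diagnosis. A sketch for locating such placeholders in a compose file; the file name docker-compose.yml and the helper name are assumptions.

```shell
#!/bin/sh
# Sketch: list lines in a compose file that still contain the placeholder path.

find_placeholder_paths() {
    # $1 = compose file; prints "line-number:line" for each match, exits 1 if none.
    grep -n '/path/to/host/dir' "$1"
}

# In the directory where docker-compose was run:
#   find_placeholder_paths docker-compose.yml
```

Any line it reports would need a real host directory in place of the placeholder before the python container can be created.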