
aiflow Deployment Guide

1. Introduction

This document describes a complete deployment of the aiflow project.

2. Installation Environment

2.1 OS image version

CentOS image: CentOS-7-x86_64-DVD-1908.iso

cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)

2.2 Machines

Prepare at least 2 VMs: 1 master, the rest as nodes. This guide uses 3 VMs, all on Alibaba Cloud.

Hostname  Internal IP     Spec
master    172.31.121.126  1 CPU, 4 cores, 8 GB RAM
node1     172.31.121.127  1 CPU, 4 cores, 8 GB RAM
node2     172.31.121.128  1 CPU, 4 cores, 8 GB RAM

2.3 Passwordless SSH login

First, log in to every VM as root.

Disable SELinux and firewalld on all nodes

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
systemctl disable firewalld
systemctl stop firewalld

Disable swap: swap must be disabled for the kubelet to work properly.

swapoff -a
free -m  #check current swap usage; if the swap row is not all zeros, reboot
vi /etc/fstab #comment out the swap line with a leading '#', then :wq to save and quit
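The vi edit above can also be done non-interactively. A minimal sketch, run here against a scratch copy of a typical CentOS 7 fstab (device names are hypothetical); on the real VMs, point the sed command at /etc/fstab:

```shell
set -e
# Scratch copy of a typical CentOS 7 fstab (hypothetical devices):
printf '%s\n' \
  '/dev/mapper/centos-root /     xfs  defaults 0 0' \
  '/dev/mapper/centos-swap swap  swap defaults 0 0' > /tmp/fstab.demo
# Comment out every active swap entry, as the manual vi step does:
sed -ri 's@^([^#].*[[:space:]]swap[[:space:]].*)@#\1@' /tmp/fstab.demo
cat /tmp/fstab.demo
```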

Then generate a key pair on each machine

ssh-keygen -t rsa
#press Enter three times; the keys are saved under /root/.ssh/

Then record each machine's id_rsa.pub public key; each machine generates a different one

#node2's
cat /root/.ssh/id_rsa.pub 
#the long string below is a single line
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe3noZeJokxnJe2wA6gh/xINExd7+zcG2BWvs2Uu2sN0QKVtRcqp7N4NCdKNM3nYDl4FpX/bl4+ZOD5UWpPFLZsmiDZbE1PX3baGnmpgL5gzyWQwUzyLfqyLIA/MaJCSpQHszuHlSHpJF6eFxqttHietM2gKLAYzUHAYWX+UNs2XeEHPqMlnDvtA02tRVc0UT8W3uM31FwjHOTDgoDw6tWhZTQm9Lft87HJ1iYYBn root@node2

#node1's
cat /root/.ssh/id_rsa.pub 

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDBN7ZksfWKHln8/eTdOGVxOZqQOV+b0eZfp27Kq/leMQWdqD8DEiPh7LUBQZ9xsghH0FkdwVw6oRL8/JDFARMIlaW1Ml3nB72ZLQB0QUfzp6OJWl7vW2vUbPqceBA+Sm7UQ/by2HkwqNqJdNx+RLupZUu780gIBJ3aeP0fjAr19U46Iksgs+c+jV7EAnWTfM05vTODUVwgt2PoCZhHPO0hLfrGheBIHbuK+tAzQWMkWErw/xljKM2v root@node1

#master's
cat /root/.ssh/id_rsa.pub 

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxHp/nZ1h6r/9lRkvf3F1a+Ger0jVXXJivL6vLXS0bx/gV4Q1M6LLKW2FcU3EDIbyGe8eNxApNq7KW4IgYD/+vVxBpWbDNLHXKtPls9GxgSLMX5siDIV08QGfk5Wz0e6U9Fsb3JmoKozGryDuXyY9V/KwaU+LJwo7WTMb9Q4VIR5bnGLyuGu/ygeK201W5qnjVfm2o4/JnHEU82DbOnL71NYbkg+8QakvpizHZYKKUaT92ocOCGIYA+K6a6wDX root@master

On the master only: create an authorized_keys file under /root/.ssh/

vim /root/.ssh/authorized_keys 
#paste in the three public keys above; each key is one full line
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe3noZeJokxnJe2wA6gh/xINExd7+zcG2BWvs2Uu2sN0QKVtRcqp7N4NCdKNM3nYDl4FpX/bl4+ZOD5UWpPFLZsmiDZbE1PX3baGnmpgL5gzyWQwUzyLfqyLIA/MaJCSpQHszuHlSHpJF6eFxqttHietM2gKLAYzUHAYWX+UNs2XeEHPqMlnDvtA02tRVc0UT8W3uM31FwjHOTDgoDw6tWhZTQm9Lft87HJ1iYYBn root@node2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDBN7ZksfWKHln8/eTdOGVxOZqQOV+b0eZfp27Kq/leMQWdqD8DEiPh7LUBQZ9xsghH0FkdwVw6oRL8/JDFARMIlaW1Ml3nB72ZLQB0QUfzp6OJWl7vW2vUbPqceBA+Sm7UQ/by2HkwqNqJdNx+RLupZUu780gIBJ3aeP0fjAr19U46Iksgs+c+jV7EAnWTfM05vTODUVwgt2PoCZhHPO0hLfrGheBIHbuK+tAzQWMkWErw/xljKM2v root@node1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxHp/nZ1h6r/9lRkvf3F1a+Ger0jVXXJivL6vLXS0bx/gV4Q1M6LLKW2FcU3EDIbyGe8eNxApNq7KW4IgYD/+vVxBpWbDNLHXKtPls9GxgSLMX5siDIV08QGfk5Wz0e6U9Fsb3JmoKozGryDuXyY9V/KwaU+LJwo7WTMb9Q4VIR5bnGLyuGu/ygeK201W5qnjVfm2o4/JnHEU82DbOnL71NYbkg+8QakvpizHZYKKUaT92ocOCGIYA+K6a6wDX root@master

On the master only: send the file to each node. scp and the first ssh will ask for a password. When done, test with ssh

scp /root/.ssh/authorized_keys root@node1:/root/.ssh/
scp /root/.ssh/authorized_keys root@node2:/root/.ssh/

#test ssh
ssh node1
ssh node2
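The manual vim paste can be scripted: authorized_keys is simply the three public keys concatenated, one per line. A sketch under the assumption that the keys were saved to local files (master.pub, node1.pub, node2.pub are hypothetical names; stand-in key strings are used here):

```shell
set -e
cd "$(mktemp -d)"
# Stand-ins for the three collected keys (the real ones come from
# `cat /root/.ssh/id_rsa.pub` on each host):
echo 'ssh-rsa AAAA... root@master' > master.pub
echo 'ssh-rsa BBBB... root@node1'  > node1.pub
echo 'ssh-rsa CCCC... root@node2'  > node2.pub
# authorized_keys is the concatenation, one key per line:
cat master.pub node1.pub node2.pub > /tmp/authorized_keys.demo
chmod 600 /tmp/authorized_keys.demo
wc -l < /tmp/authorized_keys.demo
```

On a real cluster, running `ssh-copy-id root@node1` from each machine achieves the same result in one step, appending the local public key to the remote authorized_keys.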

2.4 Switch yum to the Aliyun mirrors

Needed on both master and nodes.

Enter the directory, back up the original repo file, fetch the new one

#enter the directory
cd /etc/yum.repos.d/

#back up the original repo file
cp ./CentOS-Base.repo ./CentOS-Base.repo.bak

#fetch the Aliyun repo file
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

#clean the cache
yum clean all

#rebuild the cache
yum makecache

# !!!! NEVER run yum update, no matter what a tutorial says; at best the system needs a reboot, at worst a full reinstall

#remove the old epel repo
rpm -e epel-release

#fetch the Aliyun epel repo
curl -o /etc/yum.repos.d/epel-7.repo http://mirrors.aliyun.com/repo/epel-7.repo

#clean the cache
yum clean all

#rebuild the cache
yum makecache

2.5 Java environment

Needed on both master and nodes.

Remove the bundled OpenJDK

cd
rpm -qa | grep java
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64

#packages prefixed with python, tzdata, or javapackages can be left alone

rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
rpm -e --nodeps java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64

Upload jdk-8u202-linux-x64.tar.gz, extract it, and note the resulting path

tar -zxvf jdk-8u202-linux-x64.tar.gz

cd jdk1.8.0_202

pwd  #note this path
/root/jdk1.8.0_202

Edit /etc/profile

vim /etc/profile
#below the line 'export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL', add the following lines
#note: JAVA_HOME= must be set to the path noted above

export JAVA_HOME=/root/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Apply the change and verify the Java installation

source /etc/profile
java -version
#output like the following means success
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
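The /etc/profile edit can likewise be appended from the shell instead of vim. A sketch written against a scratch file (on the real machines, append to /etc/profile; JAVA_HOME is the path recorded with pwd above):

```shell
set -e
JAVA_HOME_DIR=/root/jdk1.8.0_202   # the path recorded with pwd above
rm -f /tmp/profile.demo
# The same three lines the vim step adds; \$ keeps PATH/CLASSPATH
# expansion deferred until the profile is sourced:
cat >> /tmp/profile.demo <<EOF
export JAVA_HOME=$JAVA_HOME_DIR
export PATH=\$JAVA_HOME/bin:\$PATH
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar
EOF
cat /tmp/profile.demo
```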

2.6 Python environment

Needed on both master and nodes.

Install the build toolchain

cd
yum -y groupinstall "Development tools"
#the following is a single line
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

yum install libffi-devel -y

Upload Python-3.8.10.tgz and extract it

tar -zxvf Python-3.8.10.tgz 

cd Python-3.8.10/

执行配置,安装

#如果有会提示file exist,不用管
mkdir /usr/local/python3
#prefix是预安装路径,不要改
./configure --prefix=/usr/local/python3
make && make install

Create symlinks

ln -s /usr/local/python3/bin/python3 /usr/local/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/local/bin/pip3
#remove the original python2 link (caution: yum on CentOS 7 depends on python2; if yum breaks afterwards, point /usr/bin/python back to python2 or fix the shebang in /usr/bin/yum)
rm -rf /usr/bin/python
ln -s /usr/local/python3/bin/python3 /usr/bin/python

Verify the installation

python3 -V
Python 3.8.10

pip3 -V
pip 21.1.1 from /usr/local/python3/lib/python3.8/site-packages/pip (python 3.8)

Install the required packages on every node

pip3 install --upgrade pip
pip3 install kfp
pip3 install kubernetes

2.7 Prerequisites

Complete the steps in kubeflow环境搭建.md (the Kubeflow environment setup document).

3. Installing MySQL

Master only.

  1. Upload the mysql-5.6.10-linux-glibc2.5-x86_64.tar.gz package to /usr/local on the master

  2. Remove any existing MySQL installation

    yum list installed | grep mysql
    yum -y remove <package name found by the previous command>
    rpm -qa|grep mariadb # list the installed mariadb packages
    rpm -e --nodeps <package name>  # remove mariadb, using the names listed by the previous command
    
  3. Delete my.cnf under /etc

    rm /etc/my.cnf
    
  4. Extract

    tar -zxvf mysql-5.6.10-linux-glibc2.5-x86_64.tar.gz
    #rename the extracted directory to mysql
    mv <extracted directory name> mysql
    #enter the mysql directory
    cd mysql
    
  5. Create the mysql user group

    groupadd mysql
    #create a user named mysql and add it to the mysql group
    useradd -g mysql mysql
    
  6. Create the configuration file my.cnf

    #copy my-default.cnf to /etc as my.cnf
    cp /usr/local/mysql/my-default.cnf /etc/my.cnf
    #edit the config: only uncomment the port line and set it to 3307
    vim /etc/my.cnf
    [root@bdilab001 mysql]# cat /etc/my.cnf 
    # For advice on how to change settings please see
    # http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html
    # *** DO NOT EDIT THIS FILE. It's a template which will be copied to the
    # *** default location during install, and will be replaced if you
    # *** upgrade to a newer version of MySQL.
    
    [mysqld]
    
    # Remove leading # and set to the amount of RAM for the most important data
    # cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
    # innodb_buffer_pool_size = 128M
    
    # Remove leading # to turn on a very important data integrity option: logging
    # changes to the binary log between backups.
    # log_bin
    
    # These are commonly set, remove the # and set as required.
    # basedir = .....
    #datadir = /var/lib/mysql
    port = 3307    
    # server_id = .....
    # socket = .....
    
    # Remove leading # to set options mainly useful for reporting servers.
    # The server defaults are faster for transactions and fast SELECTs.
    # Adjust sizes as needed, experiment to find the optimal values.
    # join_buffer_size = 128M
    # sort_buffer_size = 2M
    # read_rnd_buffer_size = 2M 
    
    sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 
    
    
  7. Change the owner of the directory to the mysql user

    cd /usr/local/mysql
    chown -R mysql:mysql ./  
    
  8. Install

    ./scripts/mysql_install_db --user=mysql --basedir=/usr/local/mysql/ --datadir=/usr/local/mysql/data/ #initialize the database
    #if you see this error:
    FATAL ERROR: please install the following Perl modules before executing
    ./scripts/mysql_install_db:Data::Dumper
    #fix: install autoconf (if similar errors appear later, install whatever the message asks for)
    yum -y install autoconf 
    
  9. Start, restart, stop

    [root@bdilab001 mysql]# service mysql restart
    Shutting down MySQL..                                      [  OK  ]
    Starting MySQL.                                            [  OK  ]
    #service mysql stop
    #service mysql restart
    
  10. Change the password

    #add a symlink
    ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
    #enter the mysql command line
    mysql -u root -p
    #note: the original command was typed with en-dashes (–u, –p), not ASCII hyphens; if mysql prints its help/usage text like below, retype the options with plain '-'
    mysql  Ver 14.14 Distrib 5.6.10, for linux-glibc2.5 (x86_64) using  EditLine wrapper
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Usage: mysql [OPTIONS] [database]
      -?, --help          Display this help and exit.
      -I, --help          Synonym for -?
      --auto-rehash       Enable automatic rehashing. 
      ...
    #or simply enter
    [root@bdilab001 mysql]# mysql
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1
    Server version: 5.6.10 MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> use mysql
    
    #on a fresh install, skip straight to step 3; if you forgot the password, start from step 1
    #1. edit /etc/my.cnf and add skip-grant-tables under [mysqld] to enable safe mode
    vi /etc/my.cnf
    #2. restart the service
    service mysql restart
    #3. log in
    mysql -u root -p  
    #or simply
    mysql
    #4. switch to the mysql database
    mysql> use mysql
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    #5. change the password
    mysql> update user set password=PASSWORD('bdilab@1308') where user='root';
    Query OK, 4 rows affected (0.00 sec)
    Rows matched: 4  Changed: 4  Warnings: 0
    #6. reload privileges
    mysql>  flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    #7. quit the mysql client
    mysql> exit
    
  11. Grant remote access privileges

    #enter the mysql command line
    [root@bdilab001 mysql]# mysql
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1
    Server version: 5.6.10 MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql>
    #grant the privileges
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION; 
    #reload privileges
    mysql>  flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    #quit the mysql client
    mysql> exit
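As an aside on the password step: for MySQL's classic mysql_native_password plugin, the value PASSWORD() stores is a double SHA-1 of the password, prefixed with `*`. A Python sketch reproducing it (an illustration only, not part of the deployment):

```python
import hashlib

def mysql_native_password(pw: str) -> str:
    """Reproduce MySQL's PASSWORD(): '*' + uppercase hex of SHA1(SHA1(pw))."""
    inner = hashlib.sha1(pw.encode()).digest()   # first SHA-1, raw bytes
    return "*" + hashlib.sha1(inner).hexdigest().upper()

# The well-known hash MySQL stores for the password 'password':
print(mysql_native_password("password"))
```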
    

4. Installing Hadoop

node2 only; this installs single-node Hadoop and HBase.

Upload hadoop-2.10.1.tar.gz and hbase-2.3.5-bin.tar.gz, then enter the hadoop directory

tar -zxvf hadoop-2.10.1.tar.gz
tar -zxvf hbase-2.3.5-bin.tar.gz
cd hadoop-2.10.1
#the absolute path is now /root/hadoop-2.10.1

Edit /root/hadoop-2.10.1/etc/hadoop/hadoop-env.sh

#line 25: set the value to your Java path (same as in /etc/profile), with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202

Edit /root/hadoop-2.10.1/etc/hadoop/yarn-env.sh

#line 23: set the value to your Java path (same as in /etc/profile), with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202

Edit /root/hadoop-2.10.1/etc/hadoop/core-site.xml

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/home/cloud/temp/hadoop-2.10.1</value>
		<description>Abasefor other temporary directories.</description>
	</property>
</configuration>

Edit /root/hadoop-2.10.1/etc/hadoop/hdfs-site.xml

<configuration>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>localhost:9001</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/root/hadoop-2.10.1/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/root/hadoop-2.10.1/dfs/data</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
</configuration>

Edit /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml

#create the file from the template first
cp /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml.template  /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml

Then edit it

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>localhost:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>localhost:50030</value>
	</property>
	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>4096</value>
	</property>
</configuration>

Check /root/hadoop-2.10.1/etc/hadoop/slaves; if it already says localhost, leave it as is

localhost

Edit /root/hadoop-2.10.1/etc/hadoop/yarn-site.xml

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>localhost:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>localhost:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>localhost:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>localhost:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>localhost:8088</value>
	</property>
</configuration>

Format the NameNode

/root/hadoop-2.10.1/bin/hadoop namenode -format

Start Hadoop

/root/hadoop-2.10.1/sbin/start-dfs.sh
#answer yes if prompted
/root/hadoop-2.10.1/sbin/start-yarn.sh

Check the processes

jps
20200 Jps
17106 SecondaryNameNode
18483 ResourceManager
16811 DataNode
16559 NameNode
18639 NodeManager
#five processes in total (besides Jps)

Then open http://node2:50070 in a browser for details (replace node2 with its IP; note http, not https).

To stop Hadoop: note that if HBase is running, HBase must be stopped before Hadoop

/root/hadoop-2.10.1/sbin/stop-yarn.sh
/root/hadoop-2.10.1/sbin/stop-dfs.sh

5. Installing HBase

node2 only.

Go back to the parent directory, /root/ here, then enter the hbase directory

cd ..
cd hbase-2.3.5

Edit /root/hbase-2.3.5/conf/hbase-env.sh

#line 28: set the value to your Java path (same as in /etc/profile), with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202
#line 126: remove the leading #
export HBASE_MANAGES_ZK=true

Edit /root/hbase-2.3.5/conf/hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>file:/home/cloud/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

Check /root/hbase-2.3.5/conf/regionservers; if it says localhost, leave it as is

localhost

Before starting HBase, make sure Hadoop is running (check with jps and the port-50070 web UI); if it is not, start it

/root/hadoop-2.10.1/sbin/start-dfs.sh
/root/hadoop-2.10.1/sbin/start-yarn.sh

Start HBase

/root/hbase-2.3.5/bin/start-hbase.sh 

Check with jps

jps
19388 Jps
19090 HRegionServer
18772 HQuorumPeer
18940 HMaster
15138 NameNode
15390 DataNode
15759 SecondaryNameNode
16277 ResourceManager
16434 NodeManager
#five Hadoop processes plus three from HBase: HRegionServer, HQuorumPeer, HMaster

Then open http://node2:60020 in a browser for details (replace node2 with its IP; note http, not https).

Note: to stop HBase

#always stop HBase before stopping Hadoop!
/root/hbase-2.3.5/bin/stop-hbase.sh 

6. Installing Redis

Master only.

Upload redis-6.2.4.tar.gz and enter the directory

tar -zxvf redis-6.2.4.tar.gz
cd redis-6.2.4

Build and install

make
#wait for the build to finish
make install

Edit the configuration file, located here at /root/redis-6.2.4/redis.conf

#line 75: comment out the bind line
# bind 127.0.0.1 -::1

#line 94
protected-mode no

#line 257
daemonize yes

#line 901
requirepass bdilab@1308

Save, then start redis-server

#from /root/redis-6.2.4/, start with redis.conf
/root/redis-6.2.4/src/redis-server /root/redis-6.2.4/redis.conf

Start a redis-cli to test

#start the client
/root/redis-6.2.4/src/redis-cli 
#try set name 1, i.e. create a key name with value 1
127.0.0.1:6379> set name 1
(error) NOAUTH Authentication required.
#the NOAUTH Authentication required error means the password setting took effect
127.0.0.1:6379> AUTH bdilab@1308
OK
127.0.0.1:6379> set testname 1
OK
127.0.0.1:6379> exit

7. Setting up NFS

7.1 Server-side installation

  1. Install the nfs packages on every node (master and nodes)

    yum -y install nfs-utils rpcbind
    #check the installation
    [root@bdilab001 nfs]# rpm -qa nfs-utils
    nfs-utils-1.3.0-0.68.el7.x86_64
    
    #create the /nfs/aiflow directory
    mkdir /nfs
    mkdir /nfs/aiflow
    
    #change the directory owner (*important)
    chown -R nfsnobody.nfsnobody /nfs/aiflow/
    
  2. Configure the shared directory on the NFS server (master)

    The main NFS server configuration file is /etc/exports, which defines the directories shared by the server. Edit it very carefully: a mistake can leave the system unbootable.

    echo "/nfs/aiflow *(rw,async,no_root_squash)" >> /etc/exports
    #each record consists of the shared directory, the client address(es), and options:
    #[shared directory] [client address 1(opt1,opt2,...)] [client address 2(opt1,opt2,...)]
    	#shared directory: the directory exported by the NFS server
    	#client address: which NFS clients may access it; * means any client IP
    	#options: comma-separated, mostly permission flags; rw means read-write
    
    #apply the shared directory
    exportfs -r
    
  3. Start the services (master)

    [root@bdilab001 nfs]# systemctl start rpcbind
    [root@bdilab001 nfs]# systemctl enable rpcbind
    [root@bdilab001 nfs]# systemctl start nfs
    [root@bdilab001 nfs]# systemctl enable nfs
    Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
    
    #check that it started
    [root@bdilab001 nfs]# showmount -e localhost
    Export list for localhost:
    /nfs/aiflow *
    #open ports 111 and 20048 for both TCP and UDP, plus 2049 (TCP only)
    

7.2 Client-side installation

  1. Start the services (node)

    [root@bdilab003 /]# systemctl start rpcbind
    [root@bdilab003 /]# systemctl enable rpcbind
    [root@bdilab003 /]# systemctl start nfs
    [root@bdilab003 /]# systemctl enable nfs
    Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
    
  2. Mount the server's shared directory on each client (node)

    #show the server's nfs exports
    [root@bdilab003 /]# showmount -e 120.27.69.55
    Export list for 120.27.69.55:
    /nfs/aiflow *
    #if ports 111 and 20048 are not open (*both TCP and UDP are required), you get this error
    [root@bdilab002 /]# showmount -e 120.27.69.55
    clnt_create: RPC: Port mapper failure - Timed out
    #mount
    mount -t nfs 120.27.69.55:/nfs/aiflow /nfs/aiflow
    #120.27.69.55 is the server address; this mounts the server's /nfs/aiflow share onto the node's /nfs/aiflow directory.
    #the following error means the directory permissions were not set or TCP port 2049 is not open
    mount.nfs: Connection timed out
    
  3. Test

    [root@bdilab002 /]# df -h
    #you should find a line like the following
    120.27.69.55:/nfs/aiflow   99G   31G   64G  33% /nfs/aiflow
    

8. (Optional) Installing Harbor (a Docker image registry, so images are not lost)

Master only.

Docker must already be installed.

Upload docker-compose-Linux-x86_64

cp ./docker-compose-Linux-x86_64 /usr/local/bin/
chmod +x /usr/local/bin/docker-compose-Linux-x86_64 
chmod 777 /usr/local/bin/docker-compose-Linux-x86_64 
ln -s /usr/local/bin/docker-compose-Linux-x86_64 /usr/bin/docker-compose
#check the version
docker-compose --version
docker-compose version 1.29.2, build 5becea4c

Enter the directory

#back to the home directory
cd 
#create the directory
mkdir harbor
cd harbor

Upload harbor-offline-installer-v2.2.3.tgz

tar -zxvf harbor-offline-installer-v2.2.3.tgz

Install Harbor

Go into the harbor directory and edit the configuration file

#the absolute path is now /root/harbor/harbor

cp harbor.yml.tmpl harbor.yml
#edit harbor.yml
vim harbor.yml

#changes to make:
#line 5: the listen address; can be a domain name; here it is the master's internal IP
hostname: 137.27.69.33

#line 10: the http port
  port: 8081
  
#comment out lines 12-18 (the https block)
#to comment a line, put # (hash) at its very beginning
  
#line 34: the admin password
harbor_admin_password: bdilab@1308

When finished, press Esc and type :wq (with the colon) to save and quit

Run ./install.sh

./install.sh

#output like the following means the installation succeeded
[Step 5]: starting Harbor ...
Creating network "harbor_harbor" with the default driver
Creating harbor-log ... done
Creating harbor-portal ... done
Creating redis         ... done
Creating registry      ... done
Creating harbor-db     ... done
Creating registryctl   ... done
Creating harbor-core   ... done
Creating nginx             ... done
Creating harbor-jobservice ... done
✔ ----Harbor has been installed and started successfully.----

Open http://120.27.69.33:8081 in a browser; username admin, password bdilab@1308. Note it is http, and both the public IP (the one starting with 120) and the internal IP work.

Note: start and stop commands

docker-compose down #stop
docker-compose up -d #start

Edit the docker configuration on all nodes so they can reach harbor

#stop harbor first (from /root/harbor/harbor)
cd /root/harbor/harbor
docker-compose down   #stop

#edit the docker configuration file at /etc/docker/daemon.json
vim /etc/docker/daemon.json

#set the contents to
{
  "insecure-registries": ["172.31.121.126:8081"]
}

Restart docker on all nodes

systemctl daemon-reload
systemctl restart docker

Wait a few minutes, then start harbor again on the master

docker-compose up -d  #start

Log docker in to harbor

docker login 172.31.121.126:8081
#enter the username and password when prompted

Usage

#first list the local images
docker images

#tag an image; the last argument is the new tag, in the form: registry address + target project + image name:tag
docker tag SOURCE_IMAGE[:TAG] 137.27.69.33:8081/library/REPOSITORY[:TAG]

#push to the harbor registry; the image then appears in the web UI
docker push 137.27.69.33:8081/library/REPOSITORY[:TAG]

#pull from harbor back to the local docker
docker pull 137.27.69.33:8081/library/REPOSITORY[:TAG]

You can then pull images that are unreachable from the public network (gcr.io and the like) into harbor to keep them from being lost.

Create a secret so that kubernetes can pull images from the private registry

[root@bdilab001 harbor]# kubectl create secret docker-registry aiflow --docker-server=aiflow --docker-username=admin --docker-password=bdilab@1308
secret/aiflow created
#kubectl create secret docker-registry <secret name (needed later when writing yaml)> --docker-server=<Harbor registry address> --docker-username=<user> --docker-password=<password>
#note: --docker-server should normally be the Harbor registry address, e.g. 172.31.121.126:8081

#list the created secrets
[root@bdilab001 harbor]# kubectl get secrets 
NAME                  TYPE                                  DATA   AGE
aiflow                kubernetes.io/dockerconfigjson        1      3m38s
default-token-gbntq   kubernetes.io/service-account-token   3      8d
istio.default         istio.io/key-and-cert                 3      6d14h
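The secret name (aiflow) is what a pod or deployment spec later references through imagePullSecrets. A minimal hypothetical example (the pod name and image path are placeholders; the registry address follows the daemon.json entry above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aiflow-demo          # hypothetical pod name
spec:
  containers:
  - name: app
    image: 172.31.121.126:8081/library/REPOSITORY:TAG   # an image pushed to Harbor
  imagePullSecrets:
  - name: aiflow             # the secret created above
```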

9. Deploying the AIflow project

posted on 2022-07-14 17:28 by 匿名者nwnu