aiflow Deployment Guide
1. Introduction
This document walks through a complete deployment of the aiflow project.
2. Installation Environment
2.1 OS Image Version
Use the CentOS image file CentOS-7-x86_64-DVD-1908.iso:
cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
2.2 Environment Preparation
Prepare at least two VMs: one as master, the rest as nodes. This document uses three VMs, all on Alibaba Cloud.
Hostname | Private IP | Spec |
---|---|---|
master | 172.31.121.126 | 1 CPU, 4 cores, 8 GB RAM |
node1 | 172.31.121.127 | 1 CPU, 4 cores, 8 GB RAM |
node2 | 172.31.121.128 | 1 CPU, 4 cores, 8 GB RAM |
2.3 Configure Passwordless SSH
First, log in to every VM as root.
Disable SELinux and firewalld on all nodes:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
systemctl disable firewalld
systemctl stop firewalld
Disable swap. Swap must be disabled for the kubelet to work properly:
swapoff -a
free -m #check the current swap usage; if the swap values are not all 0, reboot
vi /etc/fstab #add '#' at the start of the swap line to disable it, then save and quit with :wq
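For reference, after editing, the swap entry in /etc/fstab should look something like the following (the device path is illustrative and may differ on your system):
#/dev/mapper/centos-swap swap swap defaults 0 0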
Then generate a key pair on each machine:
ssh-keygen -t rsa
#press Enter three times; the generated key pair is saved under /root/.ssh/
Then record each machine's id_rsa.pub public key; each machine generates a different one.
#node2's key
cat /root/.ssh/id_rsa.pub
#the long string below is one single line
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe3noZeJokxnJe2wA6gh/xINExd7+zcG2BWvs2Uu2sN0QKVtRcqp7N4NCdKNM3nYDl4FpX/bl4+ZOD5UWpPFLZsmiDZbE1PX3baGnmpgL5gzyWQwUzyLfqyLIA/MaJCSpQHszuHlSHpJF6eFxqttHietM2gKLAYzUHAYWX+UNs2XeEHPqMlnDvtA02tRVc0UT8W3uM31FwjHOTDgoDw6tWhZTQm9Lft87HJ1iYYBn root@node2
#node1's key
cat /root/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDBN7ZksfWKHln8/eTdOGVxOZqQOV+b0eZfp27Kq/leMQWdqD8DEiPh7LUBQZ9xsghH0FkdwVw6oRL8/JDFARMIlaW1Ml3nB72ZLQB0QUfzp6OJWl7vW2vUbPqceBA+Sm7UQ/by2HkwqNqJdNx+RLupZUu780gIBJ3aeP0fjAr19U46Iksgs+c+jV7EAnWTfM05vTODUVwgt2PoCZhHPO0hLfrGheBIHbuK+tAzQWMkWErw/xljKM2v root@node1
#master's key
cat /root/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxHp/nZ1h6r/9lRkvf3F1a+Ger0jVXXJivL6vLXS0bx/gV4Q1M6LLKW2FcU3EDIbyGe8eNxApNq7KW4IgYD/+vVxBpWbDNLHXKtPls9GxgSLMX5siDIV08QGfk5Wz0e6U9Fsb3JmoKozGryDuXyY9V/KwaU+LJwo7WTMb9Q4VIR5bnGLyuGu/ygeK201W5qnjVfm2o4/JnHEU82DbOnL71NYbkg+8QakvpizHZYKKUaT92ocOCGIYA+K6a6wDX root@master
On the master only, create a new authorized_keys file under /root/.ssh/:
vim /root/.ssh/authorized_keys
#paste in all three public keys from above; each key is one single line
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCe3noZeJokxnJe2wA6gh/xINExd7+zcG2BWvs2Uu2sN0QKVtRcqp7N4NCdKNM3nYDl4FpX/bl4+ZOD5UWpPFLZsmiDZbE1PX3baGnmpgL5gzyWQwUzyLfqyLIA/MaJCSpQHszuHlSHpJF6eFxqttHietM2gKLAYzUHAYWX+UNs2XeEHPqMlnDvtA02tRVc0UT8W3uM31FwjHOTDgoDw6tWhZTQm9Lft87HJ1iYYBn root@node2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDBN7ZksfWKHln8/eTdOGVxOZqQOV+b0eZfp27Kq/leMQWdqD8DEiPh7LUBQZ9xsghH0FkdwVw6oRL8/JDFARMIlaW1Ml3nB72ZLQB0QUfzp6OJWl7vW2vUbPqceBA+Sm7UQ/by2HkwqNqJdNx+RLupZUu780gIBJ3aeP0fjAr19U46Iksgs+c+jV7EAnWTfM05vTODUVwgt2PoCZhHPO0hLfrGheBIHbuK+tAzQWMkWErw/xljKM2v root@node1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxHp/nZ1h6r/9lRkvf3F1a+Ger0jVXXJivL6vLXS0bx/gV4Q1M6LLKW2FcU3EDIbyGe8eNxApNq7KW4IgYD/+vVxBpWbDNLHXKtPls9GxgSLMX5siDIV08QGfk5Wz0e6U9Fsb3JmoKozGryDuXyY9V/KwaU+LJwo7WTMb9Q4VIR5bnGLyuGu/ygeK201W5qnjVfm2o4/JnHEU82DbOnL71NYbkg+8QakvpizHZYKKUaT92ocOCGIYA+K6a6wDX root@master
On the master only, send the file to every node; scp and the first ssh connection will ask for a password. After sending, test with ssh:
scp /root/.ssh/authorized_keys root@node1:/root/.ssh/
scp /root/.ssh/authorized_keys root@node2:/root/.ssh/
#test ssh
ssh node1
ssh node2
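As an alternative to assembling authorized_keys by hand, ssh-copy-id (shipped with openssh-clients) appends the local public key to a remote machine's authorized_keys in one step; a sketch, run from the master:
ssh-copy-id root@node1 #prompts once for node1's root password
ssh-copy-id root@node2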
2.4 Configure Aliyun yum Mirrors
Required on both master and nodes.
Enter the repo directory, back up the original repo file, and fetch the new one:
#enter the directory
cd /etc/yum.repos.d/
#back up the original repo
cp ./CentOS-Base.repo ./CentOS-Base.repo.bak
#fetch the Aliyun repo
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
#clean the cache
yum clean all
#rebuild the cache
yum makecache
# !!! Never run 'yum update', no matter what any tutorial says; at best the system needs a reboot, at worst a full reinstall
#remove the old epel release
rpm -e epel-release
#fetch the Aliyun epel repo
curl -o /etc/yum.repos.d/epel-7.repo http://mirrors.aliyun.com/repo/epel-7.repo
#clean the cache
yum clean all
#rebuild the cache
yum makecache
2.5 Configure the Java Environment
Required on both master and nodes.
Remove the bundled OpenJDK:
cd
rpm -qa | grep java
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
#the three packages prefixed with python, tzdata, and javapackages can be left in place
rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
rpm -e --nodeps java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
Upload jdk-8u202-linux-x64.tar.gz, extract it, and note the resulting path:
tar -zxvf jdk-8u202-linux-x64.tar.gz
cd jdk1.8.0_202
pwd #note this path
/root/jdk1.8.0_202
Edit /etc/profile:
vim /etc/profile
#add the following lines below the line 'export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL'
#note: JAVA_HOME= must be set to the path noted above
export JAVA_HOME=/root/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Apply the changes and check whether Java installed successfully:
source /etc/profile
java -version
#output like the following means success
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
2.6 Install the Python Environment
Required on both master and nodes.
Install the build dependencies:
cd
yum -y groupinstall "Development tools"
#the following is one single line
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install libffi-devel -y
Upload Python-3.8.10.tgz, extract it, and enter the directory:
tar -zxvf Python-3.8.10.tgz
cd Python-3.8.10/
Run configure, then build and install:
#if the directory already exists, mkdir reports 'File exists'; ignore it
mkdir /usr/local/python3
#--prefix is the install destination; do not change it
./configure --prefix=/usr/local/python3
make && make install
Create symlinks:
ln -s /usr/local/python3/bin/python3 /usr/local/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/local/bin/pip3
#replace the /usr/bin/python symlink, which pointed to python2
rm -rf /usr/bin/python
ln -s /usr/local/python3/bin/python3 /usr/bin/python
Note: yum on CentOS 7 depends on Python 2; if yum breaks after this change, edit the shebang of /usr/bin/yum (and /usr/libexec/urlgrabber-ext-down) back to #!/usr/bin/python2.
Check that the installation succeeded:
python3 -V
Python 3.8.10
pip3 -V
pip 21.1.1 from /usr/local/python3/lib/python3.8/site-packages/pip (python 3.8)
Install the required packages; do this on every node:
pip3 install --upgrade pip
pip3 install kfp
pip3 install kubernetes
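A quick sanity check that both packages import cleanly (an optional check, not part of the original steps):
python3 -c "import kfp, kubernetes; print(kfp.__version__)"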
2.7 Prerequisites
Complete the steps in kubeflow环境搭建.md (the Kubeflow environment setup guide).
3. Install MySQL
Master only.
- Upload mysql-5.6.10-linux-glibc2.5-x86_64.tar.gz to the /usr/local directory on the master
- Remove any existing MySQL and MariaDB
yum list installed | grep mysql
yum -y remove <package-name> #the package name found by the previous command
rpm -qa | grep mariadb #list the installed mariadb packages
rpm -e --nodeps <package-name> #remove mariadb; use the package name from the previous command
- Delete my.cnf under /etc
rm /etc/my.cnf
- Extract
tar -zxvf mysql-5.6.10-linux-glibc2.5-x86_64.tar.gz
#rename the extracted directory to mysql
mv <extracted-directory-name> mysql
#enter the mysql directory
cd mysql
- Create the mysql user group
groupadd mysql
#create a user named mysql and add it to the mysql group
useradd -g mysql mysql
- Create the config file my.cnf
#copy a my.cnf into /etc
cp /usr/local/mysql/my-default.cnf /etc/my.cnf
#edit the config; only uncomment the port line and change it to 3307
vim /etc/my.cnf
[root@bdilab001 mysql]# cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

# *** DO NOT EDIT THIS FILE. It's a template which will be copied to the
# *** default location during install, and will be replaced if you
# *** upgrade to a newer version of MySQL.

[mysqld]

# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M

# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin

# These are commonly set, remove the # and set as required.
# basedir = .....
#datadir = /var/lib/mysql
port = 3307
# server_id = .....
# socket = .....

# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
- Change the owner of the current directory to the mysql user
cd /usr/local/mysql
chown -R mysql:mysql ./
- Install
#initialize the database
./scripts/mysql_install_db --user=mysql --basedir=/usr/local/mysql/ --datadir=/usr/local/mysql/data/
#if you hit this error:
#FATAL ERROR: please install the following Perl modules before executing ./scripts/mysql_install_db: Data::Dumper
#fix: install autoconf (for similar errors later, install whatever the message suggests)
yum -y install autoconf
- Start, restart, stop
[root@bdilab001 mysql]# service mysql restart
Shutting down MySQL.. [ OK ]
Starting MySQL. [ OK ]
#service mysql stop
#service mysql restart
- Change the password
#add a symlink
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
#enter the mysql command line (use plain ASCII hyphens; typographic dashes make mysql print its help text)
mysql -u root -p
#if you see help output like the following instead of a prompt, the dashes were wrong
mysql Ver 14.14 Distrib 5.6.10, for linux-glibc2.5 (x86_64) using EditLine wrapper
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Usage: mysql [OPTIONS] [database]
-?, --help Display this help and exit.
-I, --help Synonym for -?
--auto-rehash Enable automatic rehashing.
...
#in that case, simply run
[root@bdilab001 mysql]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.10 MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use mysql
#for a fresh install, go straight to step 3; if you forgot the password, start from step 1
#1. edit /etc/my.cnf and add skip-grant-tables under [mysqld] to enable safe mode
vi /etc/my.cnf
#2. restart the service
service mysql restart
#3. log in
mysql -u root -p
#or
mysql
#4. switch to the mysql database
mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
#5. change the password
mysql> update user set password=PASSWORD("bdilab@1308") where user='root';
Query OK, 4 rows affected (0.00 sec)
Rows matched: 4 Changed: 4 Warnings: 0
#6. flush privileges
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
#7. exit the MySQL prompt; if you added skip-grant-tables, remove it from /etc/my.cnf and restart the service afterwards
mysql> exit
- Set user access privileges
#enter the mysql command line
[root@bdilab001 mysql]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.10 MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
#run the grant command (replace '123456' with your own password)
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
#flush privileges
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
#exit the MySQL prompt
mysql> exit
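To confirm that remote access works, one can try logging in from another node (an optional check, not part of the original steps; assumes port 3307 is reachable and uses the password set in the grant above):
mysql -h 172.31.121.126 -P 3307 -u root -p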
4. Install Hadoop
node2 only; this installs standalone Hadoop and HBase.
Upload hadoop-2.10.1.tar.gz and hbase-2.3.5-bin.tar.gz, extract them, and enter the hadoop directory:
tar -zxvf hadoop-2.10.1.tar.gz
tar -zxvf hbase-2.3.5-bin.tar.gz
cd hadoop-2.10.1
#the absolute path is now /root/hadoop-2.10.1
Edit /root/hadoop-2.10.1/etc/hadoop/hadoop-env.sh:
#line 25: set the value to your Java path, the same as in /etc/profile, with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202
Edit /root/hadoop-2.10.1/etc/hadoop/yarn-env.sh:
#line 23: set the value to your Java path, the same as in /etc/profile, with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202
Edit /root/hadoop-2.10.1/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/cloud/temp/hadoop-2.10.1</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
Edit /root/hadoop-2.10.1/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/root/hadoop-2.10.1/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/root/hadoop-2.10.1/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Edit /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml
#create the file from the template first
cp /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml.template /root/hadoop-2.10.1/etc/hadoop/mapred-site.xml
Then edit it:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:50030</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
</configuration>
Edit /root/hadoop-2.10.1/etc/hadoop/slaves; if it already contains localhost, no change is needed:
localhost
Edit /root/hadoop-2.10.1/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8088</value>
</property>
</configuration>
Format the NameNode:
/root/hadoop-2.10.1/bin/hadoop namenode -format
Start Hadoop:
/root/hadoop-2.10.1/sbin/start-dfs.sh
#answer yes to any host-key prompts
/root/hadoop-2.10.1/sbin/start-yarn.sh
Check the processes:
jps
20200 Jps
17106 SecondaryNameNode
18483 ResourceManager
16811 DataNode
16559 NameNode
18639 NodeManager
#five daemons in total, plus Jps itself
Then open http://node2:50070 in a browser to view the details (replace node2 with its IP; note it is http, not https).
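As a quick HDFS smoke test (optional, not part of the original steps), create a directory and list the filesystem root:
/root/hadoop-2.10.1/bin/hdfs dfs -mkdir /test
/root/hadoop-2.10.1/bin/hdfs dfs -ls /
#the /test directory should appear in the listing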
Commands to stop Hadoop. Note: if HBase has been started, it must be stopped before stopping Hadoop:
/root/hadoop-2.10.1/sbin/stop-yarn.sh
/root/hadoop-2.10.1/sbin/stop-dfs.sh
5. Install HBase
node2 only.
Go back to the parent directory (/root/ in this document), then enter the hbase directory:
cd ..
cd hbase-2.3.5
Edit /root/hbase-2.3.5/conf/hbase-env.sh:
#line 28: set the value to your Java path, the same as in /etc/profile, with no leading # (hash)
export JAVA_HOME=/root/jdk1.8.0_202
#line 126: remove the leading #
export HBASE_MANAGES_ZK=true
Edit /root/hbase-2.3.5/conf/hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>file:/home/cloud/hbase/tmp</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60020</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
Edit /root/hbase-2.3.5/conf/regionservers; if it already contains localhost, no change is needed:
localhost
Before starting HBase, make sure Hadoop is running (check with jps and port 50070); if it is not, start it first.
Start Hadoop:
/root/hadoop-2.10.1/sbin/start-dfs.sh
/root/hadoop-2.10.1/sbin/start-yarn.sh
Start HBase:
/root/hbase-2.3.5/bin/start-hbase.sh
Check with jps:
jps
19388 Jps
19090 HRegionServer
18772 HQuorumPeer
18940 HMaster
15138 NameNode
15390 DataNode
15759 SecondaryNameNode
16277 ResourceManager
16434 NodeManager
#five Hadoop processes plus three HBase processes: HRegionServer, HQuorumPeer, HMaster
Then open http://node2:60020 in a browser to view the details (replace node2 with its IP; note it is http, not https).
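As a quick functional test (optional, not part of the original steps), the HBase shell can create and list a table; 't1' and 'cf' are arbitrary example names:
/root/hbase-2.3.5/bin/hbase shell
hbase(main):001:0> create 't1', 'cf'
hbase(main):002:0> list
hbase(main):003:0> exit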
Note: command to stop HBase.
#always stop HBase before stopping Hadoop!
/root/hbase-2.3.5/bin/stop-hbase.sh
6. Install Redis
Master only.
First upload redis-6.2.4.tar.gz, extract it, and enter the directory:
tar -zxvf redis-6.2.4.tar.gz
cd redis-6.2.4
Build and install:
make
#wait for the build to finish
make install
Edit the config file, located at /root/redis-6.2.4/redis.conf on this machine:
#line 75: comment out the bind directive
# bind 127.0.0.1 -::1
#line 94
protected-mode no
#line 257
daemonize yes
#line 901
requirepass bdilab@1308
After saving, start redis-server:
#from /root/redis-6.2.4/, start it with redis.conf
/root/redis-6.2.4/src/redis-server /root/redis-6.2.4/redis.conf
Start a redis-cli session to test:
#launch the client
/root/redis-6.2.4/src/redis-cli
#enter 'set name 1' to create a key named name with value 1
127.0.0.1:6379> set name 1
(error) NOAUTH Authentication required.
#the 'NOAUTH Authentication required' error means the password setting took effect
127.0.0.1:6379> AUTH bdilab@1308
OK
127.0.0.1:6379> set testname 1
OK
127.0.0.1:6379> exit
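The password can also be supplied on the command line for a one-shot check (note that newer Redis versions print a warning when -a is used this way):
/root/redis-6.2.4/src/redis-cli -a bdilab@1308 ping
#expected reply: PONG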
7. NFS Setup
7.1 Server-Side Installation
- Install the NFS service on every node (master and nodes)
yum -y install nfs-utils rpcbind
#check that the install succeeded
[root@bdilab001 nfs]# rpm -qa nfs-utils
nfs-utils-1.3.0-0.68.el7.x86_64
#create the /nfs/aiflow directory
mkdir /nfs
mkdir /nfs/aiflow
#change the directory ownership (very important!)
chown -R nfsnobody.nfsnobody /nfs/aiflow/
- Configure the shared directory on the NFS server (master)
The main NFS server configuration file is /etc/exports, which defines the directories the server shares. Edit it very carefully; a mistake here can keep the system from starting properly.
echo "/nfs/aiflow *(rw,async,no_root_squash)" >> /etc/exports
#each record consists of three parts: the shared directory, the allowed client address(es), and options, in the format
#[shared directory] [client address 1(option1,option2,option3...)] [client address 2(option1,option2,option3...)]
#shared directory: the directory exported by the NFS server
#client address: which NFS clients may access it; * allows any client IP
#options: comma-separated flags in parentheses, mostly permissions; rw means read-write
#apply the exports
exportfs -r
- Start the services (master)
[root@bdilab001 nfs]# systemctl start rpcbind
[root@bdilab001 nfs]# systemctl enable rpcbind
[root@bdilab001 nfs]# systemctl start nfs
[root@bdilab001 nfs]# systemctl enable nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
#check that it started
[root@bdilab001 nfs]# showmount -e localhost
Export list for localhost:
/nfs/aiflow *
#open ports 111 and 20048 (both TCP and UDP) and port 2049 (TCP only)
7.2 Client-Side Installation
- Start the services (node)
[root@bdilab003 /]# systemctl start rpcbind
[root@bdilab003 /]# systemctl enable rpcbind
[root@bdilab003 /]# systemctl start nfs
[root@bdilab003 /]# systemctl enable nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
- Mount the client directory onto the server's shared directory (node)
#show the server's NFS exports
[root@bdilab003 /]# showmount -e 120.27.69.55
Export list for 120.27.69.55:
/nfs/aiflow *
#if ports 111 and 20048 are not open (both TCP and UDP are required), you will see this error:
[root@bdilab002 /]# showmount -e 120.27.69.55
clnt_create: RPC: Port mapper failure - Timed out
#mount; 120.27.69.55 is the server address; this mounts the node's /nfs/aiflow onto the server's /nfs/aiflow
mount -t nfs 120.27.69.55:/nfs/aiflow /nfs/aiflow
#an error like the following means the directory permissions were not set or TCP port 2049 is not open:
mount.nfs: Connection timed out
- Test
[root@bdilab002 /]# df -h
#you should find a line like the following
120.27.69.55:/nfs/aiflow 99G 31G 64G 33% /nfs/aiflow
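A simple read/write check (optional, not part of the original steps): create a file on a node and confirm it shows up on the master; the file name is arbitrary:
#on a node
touch /nfs/aiflow/nfs-test.txt
#on the master
ls /nfs/aiflow/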
8. (Optional) Install Harbor (a Docker image registry, to guard against losing images)
Master only.
First install docker-compose.
Upload docker-compose-Linux-x86_64:
cp ./docker-compose-Linux-x86_64 /usr/local/bin/
chmod +x /usr/local/bin/docker-compose-Linux-x86_64
chmod 777 /usr/local/bin/docker-compose-Linux-x86_64
ln -s /usr/local/bin/docker-compose-Linux-x86_64 /usr/bin/docker-compose
#check the version
docker-compose --version
docker-compose version 1.29.2, build 5becea4c
Create and enter the directory:
#go back to the home directory
cd
#create the directory
mkdir harbor
cd harbor
Upload harbor-offline-installer-v2.2.3.tgz and extract it:
tar -zxvf harbor-offline-installer-v2.2.3.tgz
Install Harbor
Go into the extracted harbor directory and edit the config file
#the absolute path is now /root/harbor/harbor
cp harbor.yml.tmpl harbor.yml
#edit harbor.yml
vim harbor.yml
#the items to change are listed below
#line 5: the listen address; a domain name also works; here the master's IP is used
hostname: 137.27.69.33
#line 10: the http port
port: 8081
#comment out lines 12-18
#to comment a line, add a # (hash) at the very beginning of the line
#line 34: the admin password
harbor_admin_password: bdilab@1308
When you finish editing, press Esc and type :wq (with the leading colon) to save and quit.
Run ./install.sh:
./install.sh
#output like the following means the installation succeeded
[Step 5]: starting Harbor ...
Creating network "harbor_harbor" with the default driver
Creating harbor-log ... done
Creating harbor-portal ... done
Creating redis ... done
Creating registry ... done
Creating harbor-db ... done
Creating registryctl ... done
Creating harbor-core ... done
Creating nginx ... done
Creating harbor-jobservice ... done
✔ ----Harbor has been installed and started successfully.----
Open http://120.27.69.33:8081 in a browser; the username is admin and the password is bdilab@1308. Note it is http, and both the public IP (the one starting with 120) and the private IP work.
Note: start and stop commands
docker-compose down #stop
docker-compose up -d #start
Edit the Docker config file on all nodes so they can access Harbor
#first stop Harbor (from the /root/harbor/harbor directory)
cd /root/harbor/harbor
docker-compose down #stop
#edit the Docker daemon config at /etc/docker/daemon.json
vim /etc/docker/daemon.json
#set the content to the following
{
"insecure-registries": ["172.31.121.126:8081"]
}
Restart Docker on all nodes:
systemctl daemon-reload
systemctl restart docker
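To confirm Docker picked up the insecure-registry setting, one can check the daemon info (an optional verification, not part of the original steps):
docker info | grep -A 2 "Insecure Registries"
#the registry address 172.31.121.126:8081 should appear in the output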
After a few minutes, start Harbor again on the master:
docker-compose up -d #start
Log in with Docker:
docker login 172.31.121.126:8081
#enter the username and password when prompted
Usage
#first list the local images
docker images
#tag an image; the last argument is the new tag, formatted as registry address + target project name + image name:tag
docker tag SOURCE_IMAGE[:TAG] 137.27.69.33:8081/library/REPOSITORY[:TAG]
#push to the Harbor registry; the image can then be seen in the web UI
docker push 137.27.69.33:8081/library/REPOSITORY[:TAG]
#pull from Harbor back to the local Docker
docker pull 137.27.69.33:8081/library/REPOSITORY[:TAG]
You can then pull images that are unavailable from the public network (gcr.io images and the like) and push them into Harbor for safekeeping, so they are not lost.
Create a secret so Kubernetes can pull images from the private registry
[root@bdilab001 harbor]# kubectl create secret docker-registry aiflow --docker-server=aiflow --docker-username=admin --docker-password=bdilab@1308
secret/aiflow created
#kubectl create secret docker-registry <secret-name> (referenced later in your YAMLs) --docker-server=<Harbor registry address> --docker-username=<username> --docker-password=<password>
#list the created secrets
[root@bdilab001 harbor]# kubectl get secrets
NAME TYPE DATA AGE
aiflow kubernetes.io/dockerconfigjson 1 3m38s
default-token-gbntq kubernetes.io/service-account-token 3 8d
istio.default istio.io/key-and-cert 3 6d14h
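For pods to actually use the secret, it can either be referenced from a Pod spec via imagePullSecrets, or attached to a service account; a sketch of the latter, assuming the default service account in the default namespace:
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "aiflow"}]}'
#pods running under the default service account will now pull images using the aiflow secret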