[Lustre] Application Deployment 01: Building the IB Driver and Lustre Packages from Source
I. Build and Installation
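OS version: CentOS Linux release 7.9.2009 (Core)
Kernel version: 3.10.0-1160.el7.x86_64
NIC model: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Software version: Lustre 2.12.9 (ib build)
Note: the system was installed from the CentOS-7-x86_64-Everything-2009 ISO using the Minimal Install option, with the Debugging Tools and Development Tools package groups selected.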
1. Install e2fsprogs
Download: https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/RPMS/x86_64/
Download the e2fsprogs packages and install them:
[root@node91 01-e2fsprogs]# ls
e2fsprogs-1.46.6-wc1.el7.x86_64.rpm e2fsprogs-libs-1.46.6-wc1.el7.x86_64.rpm libcom_err-devel-1.46.6-wc1.el7.x86_64.rpm
e2fsprogs-debuginfo-1.46.6-wc1.el7.x86_64.rpm e2fsprogs-static-1.46.6-wc1.el7.x86_64.rpm libss-1.46.6-wc1.el7.x86_64.rpm
e2fsprogs-devel-1.46.6-wc1.el7.x86_64.rpm libcom_err-1.46.6-wc1.el7.x86_64.rpm libss-devel-1.46.6-wc1.el7.x86_64.rpm
[root@node91 01-e2fsprogs]# yum install *.rpm
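2. Install the Lustre-patched kernel
Download: https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/RPMS/x86_64/
Install the Lustre-patched kernel packages and reboot. After the reboot, the running kernel should be 3.10.0-1160.49.1.el7_lustre.x86_64.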
[root@node91 02-kernel-lustre]# ls
kernel-3.10.0-1160.49.1.el7_lustre.x86_64.rpm kernel-debuginfo-common-x86_64-3.10.0-1160.49.1.el7_lustre.x86_64.rpm kernel-headers-3.10.0-1160.49.1.el7_lustre.x86_64.rpm
kernel-debuginfo-3.10.0-1160.49.1.el7_lustre.x86_64.rpm kernel-devel-3.10.0-1160.49.1.el7_lustre.x86_64.rpm
[root@node91 02-kernel-lustre]# yum install *.rpm
[root@node91 02-kernel-lustre]# reboot
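After the reboot, it can help to confirm that the node really booted the Lustre-patched kernel before the driver is built against it. The following is only a sketch: the function name is_lustre_kernel is made up for illustration, and the check assumes nothing more than that the patched kernel's release string contains _lustre, as in the version quoted above.

```shell
# Return 0 if the given kernel release string looks like a Lustre-patched
# kernel. With no argument, the running kernel (uname -r) is checked.
# Assumption: patched kernels carry "_lustre" in their release string.
is_lustre_kernel() {
    case "${1:-$(uname -r)}" in
        *_lustre*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example: warn before continuing to the driver build.
if is_lustre_kernel; then
    echo "running Lustre kernel: $(uname -r)"
else
    echo "WARNING: $(uname -r) is not a Lustre-patched kernel" >&2
fi
```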
3. Build and install the IB driver
Download: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/#tabs-1
Select the following version:
Archive Version
- Version (Archive): 5.8-1.1.2.1-LTS
- OS Distribution: RHEL/CentOS/Rocky
- OS Distribution Version: RHEL/CentOS 7.9
- Architecture: x86_64
- Download: MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64.tgz
- Install the dependencies:
yum install libusbx pciutils lsof tcl fuse-libs tcsh tk python-devel createrepo
- Build and install the IB driver
tar -zxvf MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64.tgz
cd MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64
./mlnxofedinstall --all --force --without-kmod-iser --without-xpmem-modules --without-libxpmem --add-kernel-support
dracut -f
/etc/init.d/openibd restart
- Check the status of the openibd and opensmd services
[root@node91 MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64]# /etc/init.d/openibd status
HCA driver loaded
Configured IPoIB devices:
ib0 ib1
Currently active IPoIB devices:
ib0
ib1
Configured Mellanox EN devices:
Currently active Mellanox devices:
ib0
ib1
The following OFED modules are loaded:
rdma_ucm
rdma_cm
ib_ipoib
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_cm
ib_core
mlxfw
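The module list above can also be checked from a script instead of by eye. A minimal sketch, assuming lsmod-style input (module name in the first column); missing_modules is a hypothetical helper name, and the modules passed to it are simply the ones shown in the status output above.

```shell
# Print every required module that does not appear in lsmod-style input.
# stdin: text whose first column is a module name (for example, lsmod output).
# args:  the module names that must be present.
missing_modules() (
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -Fxq -- "$mod" || printf '%s\n' "$mod"
    done
)

# Usage on a live node (prints nothing when everything is loaded):
#   lsmod | missing_modules mlx5_core mlx5_ib ib_core ib_ipoib rdma_cm
```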
4. Build and install Lustre
Download: https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/SRPMS/
- Install the dependencies
yum -y install automake xmlto asciidoc elfutils-libelf-devel zlib-devel binutils-devel newt-devel python-devel libyaml-devel
yum -y install pesign numactl-devel pciutils-devel ncurses-devel libselinux-devel
yum -y install attr cifs-utils gssproxy keyutils libbasicobjects libcollection libevent libini_config libldb libnfsidmap libpath_utils libref_array libtalloc libtdb libtevent libtirpc libverto-libevent libwbclient net-tools nfs-utils psmisc quota quota-nls resource-agents rpcbind samba-client-libs samba-common samba-common-libs tcp_wrappers
- Download the source package and build the RPMs
wget https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/SRPMS/lustre-2.12.9-1.src.rpm
rpm2cpio lustre-2.12.9-1.src.rpm |cpio -div
tar -zxvf lustre-2.12.9.tar.gz
cd lustre-2.12.9
time ./configure --with-o2ib=/usr/src/ofa_kernel/default 2>&1 | tee log-configure.txt
time make -j $(nproc) rpms 2>&1 | tee log-make.txt
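One caveat when scripting these two steps: a pipeline into tee reports tee's exit status, so a failed configure or make can go unnoticed. A possible wrapper is sketched below. run_logged is a hypothetical helper, not part of Lustre, and it is written for a plain POSIX shell so it does not rely on pipefail.

```shell
# Run a command, copy its combined stdout/stderr to a log file, and return
# the command's own exit status instead of tee's.
# $1 = log file, remaining arguments = command to run.
run_logged() (
    log=$1
    shift
    rc_file=$(mktemp) || exit 1
    # Record the command's status in a temp file, since the pipe hides it.
    { "$@" 2>&1; echo "$?" > "$rc_file"; } | tee "$log"
    rc=$(cat "$rc_file")
    rm -f "$rc_file"
    exit "$rc"
)

# Usage (same commands as above, stopping at the first failure):
#   run_logged log-configure.txt ./configure --with-o2ib=/usr/src/ofa_kernel/default &&
#   run_logged log-make.txt make -j "$(nproc)" rpms
```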
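- Resolve the ksym dependency errors by building and installing the mlnx-ofa_kernel kmod package
Reference (lustre-discuss mailing list): Re: [lustre-discuss] ksym errors on kmod-lustre RPM after 2.12.0 build against MOFED 4.5-1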
rpmbuild --rebuild --define 'KMP 1' mlnx-ofa_kernel-5.8-OFED.5.8.1.1.2.1.src.rpm
rpm -ivh /root/rpmbuild/RPMS/x86_64/kmod-mlnx-ofa_kernel-5.8-OFED.5.8.1.1.2.1.x86_64.rpm
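To see whether the rebuilt kmod package actually covers the symbols that kmod-lustre asks for, the two RPM dependency lists can be compared. This is a rough sketch under the assumption that both lists contain lines of the form ksym(name) = hash, as in the errors discussed in the thread above; unmet_ksyms and the /tmp file names are illustrative only.

```shell
# Print the ksym(...) requirements that have no identical line in the
# provides list.
# $1 = file with "rpm --requires" output, $2 = file with "rpm --provides" output.
unmet_ksyms() (
    req_file=$1
    prov_file=$2
    grep '^ksym(' "$req_file" | sort -u | while IFS= read -r line; do
        grep -Fxq -- "$line" "$prov_file" || printf '%s\n' "$line"
    done
)

# Usage on the build node (empty output means nothing is missing):
#   rpm -qp --requires kmod-lustre-2.12.9-1.el7.x86_64.rpm > /tmp/req.txt
#   rpm -q --provides kmod-mlnx-ofa_kernel > /tmp/prov.txt
#   unmet_ksyms /tmp/req.txt /tmp/prov.txt
```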
- Install the Lustre packages that were just built
[root@node91 04-lustre]# ls *.rpm
kmod-lustre-2.12.9-1.el7.x86_64.rpm lustre-2.12.9-1.el7.x86_64.rpm lustre-osd-ldiskfs-mount-2.12.9-1.el7.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.12.9-1.el7.x86_64.rpm lustre-debuginfo-2.12.9-1.el7.x86_64.rpm lustre-resource-agents-2.12.9-1.el7.x86_64.rpm
kmod-lustre-tests-2.12.9-1.el7.x86_64.rpm lustre-iokit-2.12.9-1.el7.x86_64.rpm lustre-tests-2.12.9-1.el7.x86_64.rpm
[root@node91 04-lustre]# yum install *.rpm
II. Deployment
1. IB network configuration
- Confirm that two IB ports are present
[root@node91 ~]# ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:e8eb:d303:0032:056e
base lid: 0xa4
sm lid: 0x33
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 200 Gb/sec (4X HDR)
link_layer: InfiniBand
Infiniband device 'mlx5_1' port 1 status:
default gid: fe80:0000:0000:0000:e8eb:d303:0032:2d6a
base lid: 0xa5
sm lid: 0x33
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 200 Gb/sec (4X HDR)
link_layer: InfiniBand
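For a scripted check, the number of ACTIVE ports can be counted from the same output. A small sketch, assuming the ibstatus layout shown above; count_active_ports is a made-up name.

```shell
# Count ports whose logical state is ACTIVE in ibstatus-style input.
# Only lines whose first field is "state:" are considered, so the
# "phys state:" lines are ignored.
count_active_ports() {
    awk '$1 == "state:" && /ACTIVE/ { n++ } END { print n + 0 }'
}

# Usage on a live node:
#   ibstatus | count_active_ports
```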
- Edit the ib0 interface configuration and restart the network service
[root@node91 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
UUID=32420cf2-6708-4cc7-b2b6-c27b55e3480b
DEVICE=ib0
ONBOOT=yes
IPADDR=30.6.1.147
PREFIX=16
[root@node91 ~]# systemctl restart network
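Lustre refers to this interface by its LNet NID, which on the o2ib network is the IP address followed by @o2ib (30.6.1.147@o2ib is the value passed to --mgsnode later in this post). The NID can be derived from the ifcfg file rather than typed by hand. A sketch only; nid_from_ifcfg is a hypothetical helper and it assumes a single IPADDR= line as in the file above.

```shell
# Print the LNet NID for an ifcfg file: "<IPADDR>@<network>".
# $1 = path to the ifcfg file, $2 = LNet network name (default: o2ib).
nid_from_ifcfg() (
    ip=$(sed -n 's/^IPADDR=//p' "$1" | tr -d '"' | head -n 1)
    [ -n "$ip" ] || exit 1
    printf '%s@%s\n' "$ip" "${2:-o2ib}"
)

# Usage:
#   nid_from_ifcfg /etc/sysconfig/network-scripts/ifcfg-ib0
```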
2. Configure the Lustre cluster
- Set the Lustre module options and load the modules
modinfo lustre
echo "options lnet networks=o2ib(ib0)" > /etc/modprobe.d/lustre.conf
depmod -a
systemctl restart lustre
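Once the modules are loaded, lctl list_nids should report the NID of ib0. The sketch below generates the options line from parameters and checks a NID list for the expected entry; lnet_options_line and has_nid are illustrative names, and o2ib, ib0 and the NID are the values used in this post.

```shell
# Build the modprobe options line for LNet.
# $1 = LNet network name, $2 = interface name.
lnet_options_line() {
    printf 'options lnet networks=%s(%s)\n' "$1" "$2"
}

# Return 0 if the NID given as $1 appears as a whole line on stdin
# (for example, in the output of "lctl list_nids").
has_nid() {
    grep -Fxq -- "$1"
}

# Show the line that would be written to /etc/modprobe.d/lustre.conf.
lnet_options_line o2ib ib0

# Usage on a live node, after the modules are loaded:
#   lctl list_nids | has_nid 30.6.1.147@o2ib && echo "LNet is up on ib0"
```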
- Disable the firewall
systemctl disable firewalld
systemctl stop firewalld
- Create the MGT, MDT, and OST, then mount the Lustre file system
mkdir /lustre/mdt0 -p
mkdir /lustre/ost0 -p
mkfs.lustre --fsname lustre --mgs --mdt --index 0 --backfstype=ldiskfs /dev/sdb
mount -t lustre /dev/sdb /lustre/mdt0/
mkfs.lustre --fsname=lustre --ost --mgsnode=30.6.1.147@o2ib --index 0 --backfstype=ldiskfs /dev/sdc
mount -t lustre /dev/sdc /lustre/ost0/
mkdir /lustrefs
mount -t lustre 30.6.1.147@o2ib:/lustre /lustrefs/
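With all three mounts in place, the targets and the client mount should show up with file system type lustre. A quick way to list them is sketched below; lustre_mounts is a hypothetical helper that reads /proc/mounts by default, and on a client lfs df -h additionally reports per-target usage.

```shell
# Print the mount points of every entry whose file system type is "lustre".
# $1 = mounts table to read (default: /proc/mounts).
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}

# Usage on this node; with the commands above, the expected mount points
# are /lustre/mdt0, /lustre/ost0 and /lustrefs:
#   lustre_mounts
```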