Provincial bureau cloud platform (new)
10.34.1.53 windows2012 administrator ahWater321
10.34.1.15 ruiy ruiy@321
10.34.1.16
10.34.1.17
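Base OS and RAID configuration: the R740's motherboard chipset is recent, so the drivers in most older OS releases either do not support it or are incompatible with it. The OS used for this deployment is Ubuntu 18.
Disk layout and RAID
ahswjcloudcontroller1 10.34.1.15 [3 * 1.6T RAID5][128G, 2.8T, 80C]
ahswjcloudcompute01 10.34.1.16 [6 * 1.6T RAID5][128G, 7.5T, 80C]
ahswjcloudcompute02 10.34.1.17 [8 * 1.6T RAID5][256G, 11T, 104C]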
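As a quick sanity check on the disk figures above, here is a minimal sketch of the nominal RAID 5 capacity, (n - 1) x disk size. It assumes the second value in each bracket (2.8T, 7.5T, 11T) is the usable capacity as reported by the system; the reported numbers differ from the nominal ones because of formatting overhead and unit conventions.

```python
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """Nominal RAID 5 capacity: one disk's worth of space is used for parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_tb


# Disk counts and sizes copied from the host list above.
arrays = {
    "ahswjcloudcontroller1": (3, 1.6),
    "ahswjcloudcompute01": (6, 1.6),
    "ahswjcloudcompute02": (8, 1.6),
}

nominal = {
    host: round(raid5_usable_tb(count, size), 1)
    for host, (count, size) in arrays.items()
}

for host, tb in nominal.items():
    print(f"{host}: {tb} TB nominal")
```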
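Platform IP network plan
Hydrology IP subnet pool: 10.34.1.1 to 10.34.1.61
Gateway: 10.34.1.62
Subnet: 10.34.1.0/26, netmask 255.255.255.192 (the gateway is used only for traffic that crosses subnets; it is not needed within the same subnet)
IP pool (start, end): 10.34.1.19, 10.34.1.61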
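The plan above can be cross-checked with Python's standard ipaddress module. This is only a sketch that verifies the numbers already listed (subnet, gateway, pool, and the three node addresses); it does not configure anything.

```python
import ipaddress

subnet = ipaddress.ip_network("10.34.1.0/26")
gateway = ipaddress.ip_address("10.34.1.62")
pool_start = ipaddress.ip_address("10.34.1.19")
pool_end = ipaddress.ip_address("10.34.1.61")
node_ips = [
    ipaddress.ip_address(a) for a in ("10.34.1.15", "10.34.1.16", "10.34.1.17")
]

# Usable host addresses, excluding the network and broadcast addresses.
hosts = list(subnet.hosts())
first_host, last_host = hosts[0], hosts[-1]

gateway_in_subnet = gateway in subnet
pool_in_subnet = pool_start in subnet and pool_end in subnet
pool_size = int(pool_end) - int(pool_start) + 1

# The node addresses should not overlap the allocation pool.
nodes_outside_pool = all(not (pool_start <= ip <= pool_end) for ip in node_ips)

print("netmask:", subnet.netmask)
print("usable range:", first_host, "-", last_host)
print("gateway in subnet:", gateway_in_subnet)
print("pool size:", pool_size)
print("nodes outside pool:", nodes_outside_pool)
```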
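Base environment configuration for the cluster hosts
Configure SSH, remote root login, static IPs, and consistent network interface identifiers.
To make it easier for kolla-ansible to manage and bootstrap the environment, NIC names are kept identical on every cloud platform node, in the form enoN (N being the interface index).
Configure local name resolution in /etc/hosts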
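For the local name resolution step, a small sketch that renders the /etc/hosts lines from the host list in these notes. The helper name hosts_entries is hypothetical, and the output is only printed; appending it to /etc/hosts on each node is left to the operator.

```python
# Node addresses and hostnames copied from the host list in these notes.
nodes = [
    ("10.34.1.15", "ahswjcloudcontroller1"),
    ("10.34.1.16", "ahswjcloudcompute01"),
    ("10.34.1.17", "ahswjcloudcompute02"),
]


def hosts_entries(entries):
    """Render 'ip hostname' lines in /etc/hosts format."""
    return "".join(f"{ip} {name}\n" for ip, name in entries)


rendered = hosts_entries(nodes)
print(rendered, end="")
```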
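Network configuration on each Ubuntu 18 node
sshpass configuration from the deployment node to the target nodes
Several files vary from one environment to another; edit them to match yours: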
/etc/kolla/passwords.yml
/etc/kolla/multinode
/etc/kolla/globals.yml
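List of OpenStack releases
Run the kolla-ansible deployment
The main steps are:
kolla-ansible -i multinode bootstrap-servers  # bootstrap the servers with the dependencies kolla needs for deployment
kolla-ansible -i multinode prechecks  # run pre-deployment checks on the hosts
kolla-ansible -i multinode deploy  # run the deployment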
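Because the three steps must run in order and each depends on the previous one succeeding, they can be wrapped in a small driver that stops at the first failure. This is a sketch, assuming the inventory file is kolla-ansible's multinode inventory in the current directory; build_commands and run_all are hypothetical helper names, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess

INVENTORY = "multinode"  # assumption: path to the inventory file
STEPS = ["bootstrap-servers", "prechecks", "deploy"]


def build_commands(inventory=INVENTORY):
    """Return the kolla-ansible invocations in the order they must run."""
    return [["kolla-ansible", "-i", inventory, step] for step in STEPS]


def run_all(commands, dry_run=True):
    """Print each command; when dry_run is False, run it and stop on failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped.
            subprocess.run(cmd, check=True)


commands = build_commands()
run_all(commands)  # dry run: prints the three commands only
```

Call run_all(commands, dry_run=False) on the deployment node to actually execute the steps.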
Partitioning pitfall: the /var partition was set up incorrectly
https://blog.51cto.com/yuweibing/1976882
kolla/ubuntu-binary-nova-consoleauth
kolla/ubuntu-binary-nova-
Searching Docker Hub for a specific image
https://hub.docker.com/search?q=kolla%2Fubuntu-binary-nova-consoleauth&type=image
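The same check can be scripted per image and release tag. The sketch below assumes Docker Hub's public v2 endpoint of the form /v2/repositories/<namespace>/<repository>/tags/<tag>, which answers 200 for an existing tag and 404 otherwise; the helper names and the example tag "stein" are illustrative and not taken from these notes' configuration.

```python
import urllib.error
import urllib.request


def tag_url(repository, tag):
    """Build the Docker Hub v2 URL for one tag of a repository (assumed API)."""
    return f"https://hub.docker.com/v2/repositories/{repository}/tags/{tag}"


def tag_exists(repository, tag, timeout=10):
    """Return True if the tag is published, False on HTTP 404."""
    try:
        with urllib.request.urlopen(tag_url(repository, tag), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are left for the caller to handle


url = tag_url("kolla/ubuntu-binary-nova-consoleauth", "stein")
print(url)
```

Calling tag_exists("kolla/ubuntu-binary-nova-consoleauth", "stein") needs outbound network access from the machine running it.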
Because older OS releases lack drivers for the R740's motherboard chipset, the Linux that installed successfully was Ubuntu 18.
https://ubuntu.com/server/docs/network-configuration
Replaced with CentOS 7.7
Change the hostnames, configure local name resolution in /etc/hosts, and make the NIC names consistent.
Disable SELinux and stop NetworkManager.
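The logs showed a MariaDB InnoDB FCFS error. Find the configuration file that kolla-ansible deploys and comment out the InnoDB FCFS setting.
RabbitMQ would not start alongside HAProxy. HAProxy load-balances nearly all of the supporting services and core components in the OpenStack environment. Every other service container ran and listened normally; only HAProxy and RabbitMQ conflicted, on port 15672 (rabbitmq_management). The logs showed that for services such as memcached and MariaDB, HAProxy listens on the VIP at 11211 or 3306 while the service itself listens on the controller's own IP at the same port. The RabbitMQ container was the odd one out: it listened on 0.0.0.0, which looks like a bug in the kolla-ansible framework. Some components also had compatibility problems (python, cryptography, paramiko, and so on).
Container dw monitoring
For some releases, the component container images could not be pulled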
RUNNING HANDLER [common : Restart fluentd container] ****************************************************************************************************************************************
fatal: [ahswjcloudcompute02]: FAILED! => {"changed": false, "msg": "Unknown error message: error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e6/e6f7b8d6a8dfaf7c7976da365aa95291ccac24a12c0211241cf775e85f0973bd/data?verify=1596522626-XnSQQqdrXTU53RB22bcsQsnojaY%3D: dial tcp: lookup production.cloudflare.docker.com on 202.102.192.68:53: read udp 10.34.1.17:36175->202.102.192.68:53: i/o timeout"}
docker volume rm mariadb
find / ! -path '/proc/*' ! -path '/sys/*' ! -path '/run/*' -type f | xargs grep -l "Python 2 is no longer supported by the Python core team"
Find the __init__.py of the Python cryptography package inside the mariadb container on the control node and comment out the warnings in it. Restart all containers on the control node, then rerun the kolla-ansible script on the deployment node.
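The search for the file that emits the warning can also be done in Python, which avoids xargs problems with unusual file names. This is a sketch of the same idea as the find/grep pipeline: walk the tree, skip /proc, /sys and /run, and report regular files containing the message. The function names are hypothetical.

```python
import os

NEEDLE = b"Python 2 is no longer supported by the Python core team"
SKIP_DIRS = {"/proc", "/sys", "/run"}


def file_contains(path, needle, chunk_size=1 << 20):
    """Chunked substring search, so large files are never loaded whole."""
    keep = len(needle) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the end of the window in case the needle spans two chunks.
            tail = window[-keep:] if keep else b""


def files_containing(root, needle=NEEDLE, skip_dirs=SKIP_DIRS):
    """Return paths of regular files under root that contain needle."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune pseudo-filesystems in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames if os.path.join(dirpath, d) not in skip_dirs
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if file_contains(path, needle):
                    matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return matches
```

On the control node, files_containing("/") lists the candidates; inside a container, run it through docker exec with the container's own Python.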
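Brief summary:
kolla-ansible deploys the containers for OpenStack's third-party supporting services and for its own core components as one batch, in sequence; the steps run in order and depend on one another.
Main problems and solutions
1. OpenStack release selection. If you choose, for example, the Stein release, manually check that the default Docker Hub registry has container images for the core and optional components of that release. (Docker Hub is an open platform, so you can later publish the component images yourself or run an internal Docker registry.)
2. Creating haproxy mysql user CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed. This is a version-compatibility problem with Python's cryptography package. It directly breaks the creation of the database and its users, and the kolla-ansible run aborts.
The fix is applied on the control node of the OpenStack environment. Note that this is not a matter of switching cryptography between version 2.x and 3.x on the kolla-ansible deployment node: search the control node for cryptography's __init__.py and comment out the warning in it.
3. The MariaDB InnoDB FCFS problem. The blunt fix is, on the kolla-ansible deployment node, to find the configuration file used by the kolla-ansible task and comment out the InnoDB setting in it.
4. With certain OpenStack releases, HAProxy and RabbitMQ's web management plugin conflict on listening port 15672, so only one of HAProxy and RabbitMQ can run. Log tracing and debugging of the listening ports showed that RabbitMQ listens on 0.0.0.0 at startup, that is, on every address of the host. The RabbitMQ configuration file was checked: every entry in it uses the control node's management IP (10.34.1.15 in this environment), and the other related IPs are HAProxy's two VIPs, one internal and one external. The problem appears only on certain OpenStack releases. The simple fix is to run docker exec -it rabbitmq /bin/bash on the control node, find the RabbitMQ configuration, and disable the conflicting port 15672 and the web management plugin.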
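The mechanism behind item 4 can be reproduced in a few lines: once one process has bound a port on 0.0.0.0, another process cannot bind the same port on a specific address of that host. In this sketch 127.0.0.1 stands in for the HAProxy VIP and an ephemeral port stands in for 15672; both are assumptions made so the example can run anywhere without touching real services.

```python
import socket


def can_bind(address, port):
    """Try to bind a TCP socket to (address, port); report whether it worked."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


# Stand-in for the service that listens on the wildcard address.
wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wildcard.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
wildcard.listen(1)
port = wildcard.getsockname()[1]

# Stand-in for the second service trying to listen on one specific address.
conflict = not can_bind("127.0.0.1", port)
wildcard.close()

print("specific-address bind blocked by wildcard listener:", conflict)
```

This is why services that each bind their own address (the VIP for HAProxy, the management IP for the backend) can share a port number, while a wildcard listener cannot coexist with either.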
The OS inside the OpenStack containers is independent of the host's OS type.
pip install python-openstackclient python-glanceclient python-neutronclient -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install --ignore-installed
OpenStack network configuration script
TASK [mariadb : Copying over galera.cnf] ****************************************************************************************************************************************************
fatal: [ahswjcloudcontroller1]: FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_memtotal_mb'"}
find / ! -path '/proc/*' ! -path "/sys/*" -type f | xargs grep -l "Python 2 is no longer supported by the Python core team"
RUNNING HANDLER [nova : Restart nova-libvirt container] *************************************************************************************************************************************
fatal: [ahswjcloudcompute02]: FAILED! => {"msg": "The conditional check 'config_json.changed | bool or nova_libvirt_confs.changed | bool or nova_libvirt_container.changed | bool or ( ceph_conf is not none and ceph_conf.changed | bool ) or ( nova_ceph_keyring is defined and nova_ceph_keyring.changed | bool ) or ( libvirt_secrets_xml is defined and libvirt_secrets_xml.changed | bool ) or ( libvirt_secrets_key is defined and libvirt_secrets_key.changed | bool )' failed. The error was: error while evaluating conditional (config_json.changed | bool or nova_libvirt_confs.changed | bool or nova_libvirt_container.changed | bool or ( ceph_conf is not none and ceph_conf.changed | bool ) or ( nova_ceph_keyring is defined and nova_ceph_keyring.changed | bool ) or ( libvirt_secrets_xml is defined and libvirt_secrets_xml.changed | bool ) or ( libvirt_secrets_key is defined and libvirt_secrets_key.changed | bool )): 'unicode object' has no attribute 'changed'\n\nThe error appears to be in '/usr/share/kolla-ansible/ansible/roles/nova/handlers/main.yml': line 52, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Restart nova-libvirt container\n ^ here\n"}
fatal: [ahswjcloudcompute01]: FAILED! => {"msg": "The conditional check 'config_json.changed | bool or nova_libvirt_confs.changed | bool or nova_libvirt_container.changed | bool or ( ceph_conf is not none and ceph_conf.changed | bool ) or ( nova_ceph_keyring is defined and nova_ceph_keyring.changed | bool ) or ( libvirt_secrets_xml is defined and libvirt_secrets_xml.changed | bool ) or ( libvirt_secrets_key is defined and libvirt_secrets_key.changed | bool )' failed. The error was: error while evaluating conditional (config_json.changed | bool or nova_libvirt_confs.changed | bool or nova_libvirt_container.changed | bool or ( ceph_conf is not none and ceph_conf.changed | bool ) or ( nova_ceph_keyring is defined and nova_ceph_keyring.changed | bool ) or ( libvirt_secrets_xml is defined and libvirt_secrets_xml.changed | bool ) or ( libvirt_secrets_key is defined and libvirt_secrets_key.changed | bool )): 'unicode object' has no attribute 'changed'\n\nThe error appears to be in '/usr/share/kolla-ansible/ansible/roles/nova/handlers/main.yml': line 52, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Restart nova-libvirt container\n ^ here\n"}
openstackclient: the import queue problem
Change "import queue" to "import Queue as queue" in openstackcloud.py and utils.py.
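The module was renamed from Queue (Python 2) to queue (Python 3), which is why the import fails under the Python 2 interpreter used here. A version-tolerant alternative to editing the import one way is a guarded import; this is a general Python idiom, not something the openstackclient sources are known to use.

```python
try:
    import queue  # Python 3 name
except ImportError:
    import Queue as queue  # Python 2 name

# The rest of the code can use the module under one name on both versions.
jobs = queue.Queue()
jobs.put("job")
item = jobs.get()
print(item)
```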