OpenStack: virtual machine creation fails
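How the problem surfaced
A customer submitted a resource request as usual, but this time for a fairly large number of virtual machines. I first looked for physical hosts with plenty of free resources, then created the virtual machines by pinning them to one of those hosts. The creation failed.
Environment
OpenStack version: Train
Deployment method: kolla-ansible
Troubleshooting approach
1. The management dashboard first reported that no valid host could be found. But why not, when the host clearly had enough free resources?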
File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/scheduler/manager.py", line 199, in select_destinations
raise exception.NoValidHost(reason="")
NoValidHost: No valid host was found.
2. On the backend node that runs nova-conductor and nova-scheduler, search the logs for the request ID (req-id).
$ cd /var/lib/docker/volumes/kolla_logs/_data/nova/
$ grep -irn req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec *
nova-api.log:165107:2022-07-22 08:21:34.722 33 INFO nova.osapi_compute.wsgi.server [req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-ba11ea4b-821c-487d-af74-0f845c5e0859] 10.2.4.252,10.2.4.253 "POST /v2.1/servers HTTP/1.0" status: 202 len: 870 time: 1.4336519
nova-api.log:165486:2022-07-22 08:22:30.114 34 INFO nova.osapi_compute.wsgi.server [req-45795b4a-f5b5-43e0-a312-a623cadc1c62 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-6fef5294-3b25-4193-8488-b4bda4927eba] 10.2.4.252,10.2.4.253 "GET /v2.1/servers/ec836cc9-73b8-42e9-b137-8bebf5b76220/os-instance-actions/req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec HTTP/1.0" status: 200 len: 2347 time: 0.0660629
nova-conductor.log:61606:2022-07-22 08:21:35.370 31 ERROR nova.conductor.manager [req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-ba11ea4b-821c-487d-af74-0f845c5e0859] Failed to schedule instances: NoValidHost_Remote: No valid host was found.
nova-conductor.log:61645:2022-07-22 08:21:35.473 31 WARNING nova.scheduler.utils [req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-ba11ea4b-821c-487d-af74-0f845c5e0859] Failed to compute_task_build_instances: No valid host was found.
nova-conductor.log:61656:2022-07-22 08:21:35.475 31 WARNING nova.scheduler.utils [req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-ba11ea4b-821c-487d-af74-0f845c5e0859] [instance: ec836cc9-73b8-42e9-b137-8bebf5b76220] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found.
nova-scheduler.log:512:2022-07-22 08:21:35.344 28 INFO nova.scheduler.manager [req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec 60b79ed1cde041d9842d8d08033b648e 05f44ecf38e649b7b0c0be54ddad96f6 - default default] [req-ba11ea4b-821c-487d-af74-0f845c5e0859] Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up.
3. The key message comes from the scheduler: "Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up." It still appears to point to insufficient resources.
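The grep in step 2 returns every line that carries the request ID, including routine INFO entries from nova-api. As a minimal sketch, the helper below keeps only entries at WARNING level or higher. The function name `req_problems` is hypothetical, and the filter assumes the log layout seen in the excerpts above, where the level appears as a separate space-delimited word.

```shell
#!/usr/bin/env bash
# req_problems LOG_DIR REQ_ID   (hypothetical helper, not part of kolla or nova)
# Prints every log line under LOG_DIR that mentions REQ_ID and whose level is
# WARNING, ERROR or CRITICAL. Assumes the level is a space-delimited word, as
# in the log excerpts above.
req_problems() {
    local log_dir="$1" req_id="$2"
    # -r: recurse into LOG_DIR, -n: show line numbers, -F: match the ID literally
    grep -rnF -- "$req_id" "$log_dir" \
        | grep -E ' (WARNING|ERROR|CRITICAL) ' \
        || true   # finding nothing is not treated as an error
}
```

Called as `req_problems /var/lib/docker/volumes/kolla_logs/_data/nova req-63cfc2f4-d151-4a38-a651-99b4aa9a4aec`, it should reduce the output of step 2 to the nova-conductor entries, provided the log format matches the assumption.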
4. On the compute node targeted for the new virtual machines (virtl-62.vim1.local), check the nova-compute log.
$ cd /var/lib/docker/volumes/kolla_logs/_data/nova/
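$ tail -f nova-compute.log
Many resource_tracker warnings appear: after virtual machines were migrated away from this host, their resource allocations in Placement were not updated.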
2022-07-22 09:14:15.671 7 WARNING nova.compute.resource_tracker [req-d3c07302-43dd-4acc-b184-207b711d9077 - - - - -] [None] Instance 93dc1f42-50a0-4727-b740-80e594492d4b has been moved to another host virtl-32.vim1.local(virtl-32.vim1.local). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 16, u'MEMORY_MB': 65536}}.
2022-07-22 09:15:15.035 7 WARNING nova.compute.resource_tracker [req-d3c07302-43dd-4acc-b184-207b711d9077 - - - - -] [None] Instance f3dd8ca2-6db6-4c02-96ae-579d54fd5993 has been moved to another host virtl-14.vim1.local(virtl-14.vim1.local). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 16, u'MEMORY_MB': 65536}}.
2022-07-22 09:15:15.081 7 WARNING nova.compute.resource_tracker [req-d3c07302-43dd-4acc-b184-207b711d9077 - - - - -] [None] Instance 924529c3-7b15-4180-ac91-613ab811a338 has been moved to another host virtl-16.vim1.local(virtl-16.vim1.local). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 16, u'MEMORY_MB': 65536}}.
2022-07-22 09:15:15.127 7 WARNING nova.compute.resource_tracker [req-d3c07302-43dd-4acc-b184-207b711d9077 - - - - -] [None] Instance ae74fc48-5a87-4064-ade4-2425605b26b5 has been moved to another host virtl-7.vim1.local(virtl-7.vim1.local). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 16, u'MEMORY_MB': 65536}}
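Because the same warning can recur for an instance over time, it helps to reduce the log to a unique list of affected instances. The following is a sketch only: the function name `stale_moves` is hypothetical, and the pattern assumes the exact warning wording shown above.

```shell
#!/usr/bin/env bash
# stale_moves LOG_FILE   (hypothetical helper)
# Prints "<instance-uuid> <destination-host>" once for each instance that the
# resource tracker reports as moved away while allocations remain on the
# source host. Lines that do not match the warning wording are ignored.
stale_moves() {
    sed -nE 's/.*Instance ([0-9a-f-]{36}) has been moved to another host ([^ (]+)\(.*allocations remaining.*/\1 \2/p' "$1" \
        | sort -u   # collapse repeated warnings for the same instance
}
```

Running `stale_moves nova-compute.log` on the compute node would give the list of instances to verify one by one in the steps that follow.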
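Solution
The cause appears to be stale resource data in Placement. Enter the database, open the placement database, and inspect the instance allocations recorded against virtl-62.vim1.local.
1. Connect to the OpenStack database and select the placement database.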
$ use placement;
2. Using the UUID of host virtl-62.vim1.local, look up its record in the resource_providers table.
$ select * from resource_providers where uuid='89420e52-ed6c-4154-a279-4009549f52ae';
+---------------------+---------------------+-----+--------------------------------------+------------------------+------------+------------------+--------------------+
| created_at | updated_at | id | uuid | name | generation | root_provider_id | parent_provider_id |
+---------------------+---------------------+-----+--------------------------------------+------------------------+------------+------------------+--------------------+
| 2022-03-31 12:51:02 | 2022-07-22 01:17:14 | 145 | 89420e52-ed6c-4154-a279-4009549f52ae | virtlsrv-62.vim1.local | 77 | 145 | NULL |
+---------------------+---------------------+-----+--------------------------------------+------------------------+------------+------------------+--------------------+
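3. Take an instance ID from the nova-compute resource_tracker warnings in the troubleshooting section above and look it up in the allocations table. The instance's allocations are still recorded against virtl-62.vim1.local (resource_provider_id 145), which should no longer be the case. The used values of 16 and 65536 match the VCPU and MEMORY_MB figures in the warning.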
$ select * from allocations where consumer_id='93dc1f42-50a0-4727-b740-80e594492d4b';
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+-------+
| created_at | updated_at | id | resource_provider_id | consumer_id | resource_class_id | used |
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+-------+
| 2022-04-10 02:10:55 | NULL | 9408 | 145 | 93dc1f42-50a0-4727-b740-80e594492d4b | 0 | 16 |
| 2022-04-10 02:10:55 | NULL | 9411 | 145 | 93dc1f42-50a0-4727-b740-80e594492d4b | 1 | 65536 |
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+-------+
2 rows in set (0.000 sec)
4. Confirm that the instance is not running on virtl-62.vim1.local. The following command returns no matching line.
$ nova list --host virtl-62.vim1.local --all | grep 93dc1f42-50a0-4727-b740-80e594492d4b
5. Then check the instance details directly to see which host it runs on. It is on virtl-15.vim1.local, not on virtl-62.vim1.local.
$ nova show 93dc1f42-50a0-4727-b740-80e594492d4b | grep hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname | virtl-15.vim1.local
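When many instances have to be checked, reading the hypervisor name by eye becomes error-prone. A small sketch of a filter that extracts only the value from the table output is shown here. The name `hypervisor_of` is hypothetical, and it assumes the pipe-delimited table layout printed by `nova show` above.

```shell
#!/usr/bin/env bash
# hypervisor_of   (hypothetical helper)
# Reads `nova show` table output on stdin and prints only the value of the
# OS-EXT-SRV-ATTR:hypervisor_hostname row, with surrounding blanks removed.
hypervisor_of() {
    awk -F'|' '$2 ~ /OS-EXT-SRV-ATTR:hypervisor_hostname/ {
        gsub(/[ \t]/, "", $3)   # strip the table padding around the value
        print $3
    }'
}
```

For example, `nova show 93dc1f42-50a0-4727-b740-80e594492d4b | hypervisor_of` would print just the host name, which can then be compared with the host whose allocations are under suspicion.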
6. So delete from the allocations table the rows with resource_provider_id=145 (which represents host virtl-62.vim1.local) and consumer_id=93dc1f42-50a0-4727-b740-80e594492d4b.
$ delete from allocations where resource_provider_id=145 and consumer_id='93dc1f42-50a0-4727-b740-80e594492d4b';
Query OK, 2 rows affected (0.001 sec)
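Repeat this for every instance named in the nova-compute resource_tracker warnings, deleting its stale allocations on this host. Afterwards the resource_tracker warnings stopped and virtual machine creation worked again.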
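Repeating the cleanup by hand for many instances invites typos, so one option is to generate the statements and review them before running anything. The sketch below only prints SQL and never touches the database. The function name `gen_cleanup_sql` is hypothetical, and the provider ID passed to it (145 in this case) is taken from the resource_providers lookup above. Each instance should still be confirmed to run on another host first, as in steps 4 and 5.

```shell
#!/usr/bin/env bash
# gen_cleanup_sql PROVIDER_ID   (hypothetical helper; prints SQL, runs nothing)
# Reads instance UUIDs on stdin (first field of each line) and prints one
# DELETE per well-formed UUID, scoped to the given resource provider so that
# allocations held on other hosts are left alone.
gen_cleanup_sql() {
    local provider_id="$1" uuid _rest
    # Refuse anything but a plain number as the provider ID.
    case "$provider_id" in
        ''|*[!0-9]*) echo "provider id must be numeric" >&2; return 1 ;;
    esac
    while read -r uuid _rest; do
        # Skip input that is not a UUID so no stray text ends up in the SQL.
        if printf '%s\n' "$uuid" | grep -qE '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
            printf "DELETE FROM allocations WHERE resource_provider_id=%s AND consumer_id='%s';\n" \
                "$provider_id" "$uuid"
        fi
    done
}
```

Feeding it a reviewed list of instance UUIDs, one per line, yields statements that can be inspected and then pasted into the database session. Taking a backup of the allocations table beforehand is a sensible precaution.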