PVE + Ceph issue summary
Two clusters with the same name were created on the same network
Jun 24 11:56:08 cu-pve05 kyc_zabbix_ceph[2419970]: ]}
Jun 24 11:56:08 cu-pve05 corosync[3954]: error [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Invalid packet data
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Invalid packet data
Jun 24 11:56:08 cu-pve05 kyc_zabbix_ceph[2419970]: Response from "192.168.7.114:10051": "processed: 3; failed: 48; total: 51; seconds spent: 0.001189"
Jun 24 11:56:08 cu-pve05 kyc_zabbix_ceph[2419970]: sent: 51; skipped: 0; total: 51
Jun 24 11:56:08 cu-pve05 corosync[3954]: error [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Invalid packet data
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Invalid packet data
Jun 24 11:56:08 cu-pve05 corosync[3954]: error [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: alert [TOTEM ] Invalid packet data
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Digest does not match
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:08 cu-pve05 corosync[3954]: [TOTEM ] Invalid packet data
Jun 24 11:56:09 cu-pve05 corosync[3954]: error [TOTEM ] Digest does not match
Jun 24 11:56:09 cu-pve05 corosync[3954]: alert [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:09 cu-pve05 corosync[3954]: alert [TOTEM ] Invalid packet data
Jun 24 11:56:09 cu-pve05 corosync[3954]: [TOTEM ] Digest does not match
Jun 24 11:56:09 cu-pve05 corosync[3954]: [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:09 cu-pve05 corosync[3954]: [TOTEM ] Invalid packet data
Jun 24 11:56:09 cu-pve05 corosync[3954]: error [TOTEM ] Digest does not match
Jun 24 11:56:09 cu-pve05 corosync[3954]: alert [TOTEM ] Received message has invalid digest... ignoring.
Jun 24 11:56:09 cu-pve05 corosync[3954]: alert [TOTEM ] Invalid packet data
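The "Digest does not match / invalid digest" flood means the node is receiving totem traffic signed with a different corosync authkey, i.e. packets from the other, identically named cluster on the same segment; corosync drops them, but the log keeps filling. A minimal diagnostic sketch (standard PVE file paths; run on each node and compare the output between nodes):

```shell
# Which cluster does this node think it belongs to?
grep cluster_name /etc/pve/corosync.conf

# The totem authkey must be identical on every member of ONE cluster.
# Nodes of the foreign cluster will show a different hash here; their
# packets are what corosync reports as "invalid digest":
sha256sum /etc/corosync/authkey
```

The lasting fix is to give the second cluster a different name and, ideally, its own VLAN/subnet, since two corosync rings sharing one broadcast domain will keep cross-talking like this.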
After deleting one of the clusters, the errors below appeared (seen under the node view's syslog):
Jul 11 18:48:01 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on write
Jul 11 18:48:04 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:48:07 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:48:35 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on read
Jul 11 18:48:39 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on read
Jul 11 18:48:42 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:48:45 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:48:51 cu-pve03 pvedaemon[4111390]: worker exit
Jul 11 18:48:51 cu-pve03 pvedaemon[4692]: worker 4111390 finished
Jul 11 18:48:51 cu-pve03 pvedaemon[4692]: starting 1 worker(s)
Jul 11 18:48:51 cu-pve03 pvedaemon[4692]: worker 4148787 started
Jul 11 18:49:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:49:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:49:06 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:49:10 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:49:13 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:49:16 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:49:23 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:49:30 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:49:36 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on write
Jul 11 18:49:39 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:49:42 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on read
Jul 11 18:49:50 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on write
Jul 11 18:50:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:50:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:50:00 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:50:07 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on write
Jul 11 18:50:14 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:50:26 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:50:31 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on write
Jul 11 18:50:34 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:50:37 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on write
Jul 11 18:50:40 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on write
Jul 11 18:50:55 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:51:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:51:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:51:02 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:51:09 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on read
Jul 11 18:51:16 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:51:19 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on read
Jul 11 18:51:33 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:51:43 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on read
Jul 11 18:51:48 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on read
Jul 11 18:51:51 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on write
Jul 11 18:52:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:52:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:52:03 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:52:14 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on write
Jul 11 18:52:26 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on write
Jul 11 18:52:34 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:52:37 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on write
Jul 11 18:52:40 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:52:43 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:52:50 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:53:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:53:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:53:01 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:53:05 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on write
Jul 11 18:53:08 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on write
Jul 11 18:53:11 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on write
Jul 11 18:53:14 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:53:28 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket closed (con state CONNECTING)
Jul 11 18:53:51 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:53:55 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket error on read
Jul 11 18:53:58 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:54:00 cu-pve03 systemd[1]: Starting Proxmox VE replication runner...
Jul 11 18:54:00 cu-pve03 systemd[1]: Started Proxmox VE replication runner.
Jul 11 18:54:01 cu-pve03 kernel: libceph: mon0 192.168.7.4:6789 socket closed (con state CONNECTING)
Jul 11 18:54:06 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket error on read
Jul 11 18:54:09 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:54:17 cu-pve03 kernel: libceph: mon2 192.168.7.6:6789 socket closed (con state CONNECTING)
Jul 11 18:54:18 cu-pve03 kernel: libceph: mds0 192.168.7.5:6800 socket closed (con state CONNECTING)
Jul 11 18:54:32 cu-pve03 pveproxy[4118327]: worker exit
Jul 11 18:54:32 cu-pve03 pveproxy[7729]: worker 4118327 finished
Jul 11 18:54:32 cu-pve03 pveproxy[7729]: starting 1 worker(s)
Jul 11 18:54:32 cu-pve03 pveproxy[7729]: worker 4150738 started
Jul 11 18:54:37 cu-pve03 kernel: libceph: mon1 192.168.7.5:6789 socket error on read
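These libceph lines mean the kernel Ceph client on cu-pve03 is still retrying monitors (mon0-mon2 at 192.168.7.4-6:6789) that no longer answer, typically because a storage entry or mount from the removed cluster was left behind. A sketch of where to look (standard PVE paths; the kycfs/kycrbd storage IDs are the ones from this setup):

```shell
# Storage definitions that still point at the old monitors:
grep -B1 -A4 -E 'kycfs|kycrbd' /etc/pve/storage.cfg

# Stale kernel CephFS/RBD mounts that keep reconnecting to :6789:
mount | grep -i ceph

# If the CephFS mount is hung, lazy-unmount it, then disable (or fix)
# the storage entry so pvestatd stops remounting it:
umount -l /mnt/pve/kycfs
pvesm set kycfs --disable 1   # adjust the storage ID to your own config
```

Once no storage definition or mount references the dead monitors, the socket-error flood stops on its own.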
Open issues:
1. Backups in the te environment run at only 40 MB/s.
2. Database VM backups: consider the mount-point question, /ceph/fileserver/...
3. Copying files on kycfs.
vzdump 202 --compress lzo --storage kycfs --mode snapshot --node cu-pve05 --remove 0
vzdump 151 --mode stop --remove 0 --storage kycfs --compress lzo --node cu-pve02 --bwlimit 200000
--------------------------------------------------------
INFO: starting new backup job: vzdump 151 --mode stop --remove 0 --storage kycfs --compress lzo --node cu-pve02
INFO: Starting Backup of VM 151 (qemu)
INFO: Backup started at 2019-07-10 16:26:44
INFO: status = stopped
INFO: update VM 151: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: cu-dbs-151
INFO: include disk 'scsi0' 'kycrbd:vm-151-disk-0' 100G
INFO: include disk 'scsi1' 'kycrbd:vm-151-disk-1' 300G
INFO: snapshots found (not included into backup)
INFO: creating archive '/mnt/pve/kycfs/dump/vzdump-qemu-151-2019_07_10-16_26_44.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task 'a29c0ebc-52ee-4823-a5e6-56e7443c2cae'
INFO: status: 0% (499122176/429496729600), sparse 0% (423092224), duration 3, read/write 166/25 MB/s
INFO: status: 1% (4353687552/429496729600), sparse 0% (4277657600), duration 22, read/write 202/0 MB/s
---------------------------------------------------------
INFO: starting new backup job: vzdump 192 --compress lzo --bwlimit --storage kycfs --mode snapshot --node cu-pve06 --remove 0
INFO: Starting Backup of VM 192 (qemu)
INFO: Backup started at 2019-07-10 16:28:53
INFO: status = stopped
INFO: update VM 192: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: cu-tpl-192
INFO: include disk 'ide0' 'kycrbd:vm-192-disk-0' 100G
INFO: creating archive '/mnt/pve/kycfs/dump/vzdump-qemu-192-2019_07_10-16_28_53.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task '58adf55a-971c-49aa-b42d-595f8e3a0cf3'
INFO: status: 0% (197656576/107374182400), sparse 0% (114630656), duration 3, read/write 65/27 MB/s
INFO: status: 1% (1090519040/107374182400), sparse 0% (556826624), duration 15, read/write 74/37 MB/s
INFO: status: 2% (2181038080/107374182400), sparse 0% (563113984), duration 42, read/write 40/40 MB/s
INFO: status: 3% (3257532416/107374182400), sparse 0% (581787648), duration 69, read/write 39/39 MB/s
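One detail worth ruling out for the 40 MB/s question: vzdump's --bwlimit is specified in KiB/s, so the 200000 used above allows roughly 195 MiB/s and cannot be what caps the second job at ~40 MB/s. (Note also that the second job's log shows --bwlimit with no value at all, which is malformed.) A quick sanity check on the unit conversion:

```shell
# vzdump --bwlimit takes KiB/s; convert the value used above to MiB/s.
limit_kib=200000
echo "$((limit_kib / 1024)) MiB/s"   # -> 195 MiB/s, far above the observed 40 MB/s
```

So the bottleneck for the 40 MB/s job is more likely the target storage (kycfs) or the lzo compression path than the bandwidth limit.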