Description
problem
On KVM, the libvirt discoverer creates and uses a guid based on the host's IP address. When the IP address of a KVM host is changed or swapped with another host's, this can cause side effects such as the new host trying to re-add/configure itself against an old DB entry in the cloud.host table. It can cause further issues if the host has local storage, and can put the host into the Alert state. When the host is then force-removed, it can cause deletion of VMs, volumes and the local storage pool, resulting in data loss.
versions
Reproduced with ACS 4.22 with mixed-arch KVM hosts, in an advanced zone.
The steps to reproduce the bug
In my env, I swapped the IP addresses of two KVM hosts, and removed one of the hosts to add it to another zone while keeping the same mgmt network IP. The zones have two physical networks: cloudbr0 carries guest and public traffic, and cloudbr1 carries mgmt/storage traffic.
First I observed the host being flaky as it tried to occupy the same host VO/row, so I force-removed it, which destroyed all VMs and volumes on that KVM host. Luckily I had volume snapshots, so I could recover them later on, but I hit an NPE issue that I fixed in a recent PR.
Next I had to debug the code and errors to understand that the culprit was the guid logic, which tries to create a UUID based on the IP address string. This is arguably a design bug, and a long-term fix may need to be discussed.
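To illustrate the failure mode, here is a minimal sketch of an IP-derived guid. It is not the actual discoverer code: the class `GuidSketch`, the helper `guidFor`, and the IP addresses are hypothetical, and I am assuming a name-based UUID over the IP string (`java.util.UUID.nameUUIDFromBytes`). The point is only that such a guid is a pure function of the IP, so whichever host holds an IP also holds that IP's guid.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class GuidSketch {
    // Hypothetical helper: derives a host guid from nothing but the IP string.
    // Assumption: a name-based (type 3) UUID over the IP bytes.
    static String guidFor(String ip) {
        return UUID.nameUUIDFromBytes(ip.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        // Before the swap: host A and host B each have their own IP.
        String hostA = guidFor("10.0.0.11");
        String hostB = guidFor("10.0.0.12");

        // After the swap: physical host B now carries A's old IP.
        String hostBAfterSwap = guidFor("10.0.0.11");

        // B now presents A's guid, so it would match A's existing DB row.
        System.out.println(hostA.equals(hostBAfterSwap)); // prints true
        // B no longer matches the guid it was originally registered with.
        System.out.println(hostB.equals(hostBAfterSwap)); // prints false
    }
}
```

With a guid like this, a lookup by guid cannot tell a re-addressed host apart from the host that previously owned the IP, which matches the re-add behaviour described above.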
What to do about it?
No response