ETCD Backup and Restore
This document demonstrates how to restore a cluster from an etcd backup.
/var/lib/rancher/rke2 is the default data directory for rke2.
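If you need the data directory somewhere else, it can be overridden in the RKE2 config file. The following is a minimal sketch assuming the default config location; /data/rke2 is a hypothetical example path:
# /etc/rancher/rke2/config.yaml
# Override the default data directory; /data/rke2 is a hypothetical example.
data-dir: /data/rke2
The snapshot paths in the remainder of this document assume the default location.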
Snapshots
Snapshots are enabled by default.
The snapshot directory defaults to /var/lib/rancher/rke2/server/db/snapshots.
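Automatic snapshots run on a cron schedule and are pruned to a retention count; both can be tuned in the RKE2 config file. The keys below mirror the rke2 server snapshot flags, and the values shown are illustrative:
# /etc/rancher/rke2/config.yaml -- illustrative snapshot settings
etcd-snapshot-schedule-cron: "0 */6 * * *"  # take a snapshot every 6 hours
etcd-snapshot-retention: 10                 # keep the 10 most recent snapshots
Recent releases also provide an on-demand snapshot subcommand, which writes a file named roughly etcd-snapshot-<node-name>-<timestamp> into the directory above:
rke2 etcd-snapshot save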
Single-node
1. You must stop the RKE2 service if it is enabled via systemd:
systemctl stop rke2-server
2. Select the snapshot:
ls /var/lib/rancher/rke2/server/db/snapshots/
3. Initiate the restore from the snapshot:
rke2 server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<SNAPSHOT>
4. Once the restore process is complete, start the rke2-server service:
systemctl start rke2-server
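To confirm the restore, you can check that the node returns to Ready using the kubectl binary and kubeconfig that RKE2 ships at its default paths:
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes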
Multi-node Cluster
Restoring a Snapshot to Existing Nodes
When RKE2 is restored from backup, it moves the old data directory to /var/lib/rancher/rke2/server/db/etcd-old-%date%/ and sets up a new single-member etcd cluster.
1. You must stop the RKE2 service on all server nodes if it is enabled via systemd:
systemctl stop rke2-server
2. Select the snapshot:
ls /var/lib/rancher/rke2/server/db/snapshots/
3. Initiate the restore from the snapshot on the first server node with the following command:
rke2 server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<SNAPSHOT>
4. Once the restore process is complete, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
5. Remove the rke2 db directory on the other server nodes as follows:
rm -rf /var/lib/rancher/rke2/server/db
6. Start the rke2-server service on the other server nodes with the following command:
systemctl start rke2-server
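Once all server nodes are back up, you can confirm the restore. On the first server node, the pre-restore data should have been moved aside as described above, and every node should eventually return to Ready; the exact etcd-old-<date> name below is illustrative:
ls /var/lib/rancher/rke2/server/db/  # expect an etcd-old-<date> directory alongside the new etcd data
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes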
When rke2 resets the cluster, it creates an empty file at /var/lib/rancher/rke2/server/db/reset-flag. This file is harmless to leave in place, but it must be removed in order to perform subsequent resets or restores. The file is deleted when rke2 starts normally.
rm -rf /var/lib/rancher/rke2/server/db/reset-flag
Restoring a Snapshot to New Nodes
For rke2 v1.20.9 and earlier, back up and restore certificates first due to a known issue with bootstrap data not saving on restore. See the note below for additional version-specific restore details.
1. Back up the following (an example archive command follows this list):
/var/lib/rancher/rke2/server/cred
/var/lib/rancher/rke2/server/tls
/var/lib/rancher/rke2/server/token
/etc/rancher
/var/openebs
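A single archive keeps these paths together; the following tar sketch is one way to capture them, with a hypothetical archive name:
tar czvf rke2-backup.tar.gz \
  /var/lib/rancher/rke2/server/cred \
  /var/lib/rancher/rke2/server/tls \
  /var/lib/rancher/rke2/server/token \
  /etc/rancher \
  /var/openebs
Unpacking it from / on the new first server node (tar xzvf rke2-backup.tar.gz -C /) restores the same paths for Step 2.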
2. Restore the certs from Step 1 above to the first new server node.
3. Install rke2 on the first new server node by running the following command:
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
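If the new node must match the Kubernetes version of the cluster that produced the snapshot, the installer can be pinned with the INSTALL_RKE2_VERSION environment variable; the version string below is only an example:
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server INSTALL_RKE2_VERSION="v1.20.9+rke2r1" sh -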
4. Stop the RKE2 service on all server nodes if it is enabled, and initiate the restore from the snapshot on the first server node with the following commands:
systemctl stop rke2-server
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
5. Once the restore process is complete, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
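Startup after a cluster reset can take a few minutes. You can follow progress in the systemd journal and then confirm the node is Ready:
journalctl -u rke2-server -f
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes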