Upgrading a cluster created with hetzner-k3s v1.x to v2.x
The v1 version of hetzner-k3s is quite old and hasn’t been supported for some time. I understand that many haven’t upgraded to v2 because, until now, there wasn’t a simple process to do this.
The good news is that the migration is now possible and straightforward, as long as you follow these instructions very carefully and take your time. This upgrade also allows you to replace deprecated instance types with newer ones. Note that this migration requires hetzner-k3s v2.2.4 or higher. Using the very latest version is recommended.
Prerequisites
- I suggest installing the `hcloud` utility; a brief setup sketch follows. It will make it easier and faster to delete the old nodes.
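A minimal setup sketch (Homebrew shown for the install; the context name is an example):

```bash
# Install the hcloud CLI (see Hetzner's docs for other platforms)
brew install hcloud

# Create a CLI context for your project; you will be prompted for an API token
hcloud context create my-cluster

# Sanity check: list the servers in the project
hcloud server list
```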
Upgrading configuration and first steps
- Back up apps and data. As with any migration, there's some risk involved, so it's better to be prepared in case things don't go as planned.
- Back up the kubeconfig and the old config file.
- Uninstall the System Upgrade Controller (a sample removal command follows this list).
- Create a resolv file on the existing nodes. You can do this manually or automate it with the `hcloud` CLI (see the resolv sketch after this list).
- Convert the config file to the new format. You can find guidance here for the initial v2.0.0 release. Make sure you read all the following release notes to apply any other changes to the config format introduced by newer versions of the tool, up to the very latest release.
- Remove or comment out empty node pools from the config file.
- Set `embedded_registry_mirror.enabled` to `false` if necessary, depending on the current version of k3s (refer to this documentation).
- Add `legacy_instance_type` to ALL node pools, both masters and workers. Set it to the current instance type of each node pool, even if that type is now deprecated. This step is critical for the migration; an illustrative config fragment follows this list.
- Set `instance_type` for all the node pools to supported instance types, if your current ones have been deprecated by Hetzner. This is required so that the existing nodes can be replaced with new ones based on supported instance types.
- Run the `create` command using the latest version of hetzner-k3s and the new config file (example after this list).
- Wait for all the CSI pods in `kube-system` to restart, and make sure everything is running correctly.
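If the System Upgrade Controller was installed the standard way, from the upstream manifests into the `system-upgrade` namespace, removal can look like this (adjust if your installation differs):

```bash
# Delete any upgrade plans first, then the controller and its namespace
kubectl -n system-upgrade delete plans --all
kubectl delete namespace system-upgrade
```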
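The resolv file step can be automated along these lines. Treat this strictly as a sketch: the file path and nameserver shown here are assumptions, so check the v2 release notes for the exact values hetzner-k3s expects (185.12.64.1 is one of Hetzner's public recursive resolvers):

```bash
# Write a resolv file on every server in the project.
# Path and nameserver are assumptions; verify against the v2 release notes.
for server in $(hcloud server list -o noheader -o columns=name); do
  hcloud server ssh "$server" "echo 'nameserver 185.12.64.1' > /etc/k8s-resolv.conf"
done
```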
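For orientation, this is roughly what the relevant pool settings look like with `legacy_instance_type` in place. The pool names and instance types below are examples only (cx21/cx31 standing in for deprecated types, cx22/cx32 for their supported successors); verify the field layout against your converted config:

```bash
# Print an illustrative YAML fragment; adapt it to your own pools
cat <<'EOF'
masters_pool:
  instance_type: cx32           # new, supported type
  legacy_instance_type: cx31    # type the existing masters were created with
  instance_count: 3

worker_node_pools:
  - name: small
    instance_type: cx22         # new, supported type
    legacy_instance_type: cx21  # type the existing workers were created with
    instance_count: 3
EOF
```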
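Running the upgrade and checking on the CSI pods can then look like this (the config file name is an example):

```bash
# Apply the new config with the latest hetzner-k3s
hetzner-k3s create --config cluster_config.yaml

# Check that the CSI pods in kube-system have restarted and are running
kubectl -n kube-system get pods | grep -i csi
```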
Rotating control plane instances with the new instance type
Replace one master at a time. Unless your cluster has a load balancer for the Kubernetes API, switch your kubectl context to another master before replacing master1:
- Drain the master first, then delete it from the cluster using kubectl and remove the actual instance from the Hetzner console (or with the `hcloud` CLI). A sketch follows the etcd check below.
- Rerun the `create` command to recreate the master with the new instance type. Wait for it to join the control plane and reach the "Ready" status.
- SSH into each master and verify that the etcd members are updated and in sync:
```bash
# Install the etcd client (the masters run Debian/Ubuntu-based images)
sudo apt-get update
sudo apt-get install etcd-client

# Point etcdctl at the local etcd instance embedded in k3s, using k3s's certs
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key

# All masters should be listed as started members
etcdctl member list
```
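For the drain-and-delete step in the first bullet above, a minimal sketch (the node and server names are placeholders; with hetzner-k3s the Kubernetes node name normally matches the Hetzner server name):

```bash
# Drain the master, remove it from the cluster, then delete the instance
kubectl drain master1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node master1
hcloud server delete master1
```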
Repeat the steps above carefully for each master. Once all three masters have been replaced:
- Rerun the `create` command once or twice more to ensure the configuration is stable and the masters no longer restart.
- Debug DNS resolution. If there are issues, restart the k3s agents and then restart CoreDNS (see the sketch after this list).
- Address any issues with your workloads before proceeding to rotate the worker nodes.
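A sketch for the DNS step, assuming the k3s agent runs as the `k3s-agent` systemd unit on the workers (standard for k3s installs) and that CoreDNS is the usual `coredns` deployment in `kube-system`:

```bash
# On each worker, restart the k3s agent
hcloud server ssh worker1 "systemctl restart k3s-agent"

# Then restart CoreDNS so it picks up the corrected resolver configuration
kubectl -n kube-system rollout restart deployment coredns
```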
Rotating a worker node pool
- Increase the node count for the pool by 1.
- Run the `create` command to create the extra node needed during the pool rotation.
Replace one worker node at a time (skipping the extra node you just added):
- Drain a node.
- Delete the drained node using both kubectl and the Hetzner console (or the `hcloud` CLI); a per-node sketch follows this list.
- Rerun the `create` command to recreate the deleted node.
- Verify everything is working as expected before moving on to the next node in the pool.
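A per-node sketch for the rotation (names are placeholders; the config file name matches the earlier example):

```bash
# Drain the worker, remove it from the cluster, delete the instance
kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker1
hcloud server delete worker1

# Recreate the node with the new instance type
hetzner-k3s create --config cluster_config.yaml
```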
Once all the existing nodes in the pool have been rotated:
- Drain the very last node in the pool (the one you added earlier).
- Verify everything is functioning correctly.
- Delete the last node using both kubectl and the Hetzner console (or the `hcloud` CLI).
- Reduce the `instance_count` for the node pool by 1.
- Proceed with the next node pool.
Finalizing
- Remove the `legacy_instance_type` setting from both master and worker node pools.
- Rerun the `create` command once more to double-check everything.
- Optionally, convert the currently zonal cluster to a regional one with masters in different locations (see this guide).