Troubleshooting

Common Issues and Solutions

SSH Connection Problems

If hetzner-k3s hangs or times out after creating instances, the cause is often your SSH key. This can happen when the key is protected by a passphrase, or when it is an older key whose algorithm newer operating systems no longer accept.

Solutions:

  1. Enable SSH Agent: Set networking.ssh.use_agent to true in your configuration file. This lets the SSH agent manage the key.

For macOS:

eval "$(ssh-agent -s)"
ssh-add --apple-use-keychain ~/.ssh/<private key>

For Linux:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/<private key>

  2. Test SSH Manually: Verify that you can connect to the instances over SSH yourself:

    ssh -i ~/.ssh/your_private_key root@<server_ip>
    

  3. Check Key Permissions: Ensure your private key is readable and writable only by you:

    chmod 600 ~/.ssh/your_private_key
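
The permission check can also be scripted before running the tool. This is a minimal sketch: check_key_mode is a hypothetical helper name (not part of hetzner-k3s), and it assumes either GNU or BSD stat is installed.

```shell
# Hypothetical helper: succeed only if the file's mode is 600 or 400.
check_key_mode() {
  key_file="$1"
  # GNU stat (Linux) takes -c, BSD stat (macOS) takes -f; try both.
  mode="$(stat -c '%a' "$key_file" 2>/dev/null || stat -f '%Lp' "$key_file" 2>/dev/null)"
  case "$mode" in
    600|400) return 0 ;;
    *)
      echo "unexpected mode '${mode}' on ${key_file}; run: chmod 600 ${key_file}" >&2
      return 1
      ;;
  esac
}

# Usage (commented out; the path is a placeholder):
# check_key_mode ~/.ssh/your_private_key && echo "permissions look fine"
```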
    

Enable Debug Mode

Run hetzner-k3s with the DEBUG environment variable set to true to enable debug mode:

DEBUG=true hetzner-k3s create --config cluster_config.yaml

The additional output can help you identify the root cause of the problem.
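
Debug output is easier to review or share when it is saved to a file. The sketch below is one possible way to do that; run_logged and the debug.log file name are hypothetical and not part of hetzner-k3s.

```shell
# Hypothetical helper: run a command and copy its stdout and stderr to a log file
# while still printing them to the terminal.
# Note: the return status is that of tee, not of the wrapped command.
run_logged() {
  log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
}

# Usage (commented out; debug.log is an arbitrary file name):
# run_logged debug.log env DEBUG=true hetzner-k3s create --config cluster_config.yaml
```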

Cluster Creation Fails after Node Creation

Symptoms: Instances are created but cluster setup fails.

Possible Causes:

  - Network connectivity issues between nodes
  - Firewall blocking communication
  - Hetzner API rate limits

Solutions:

  1. Check Network Connectivity: Verify that nodes can communicate with each other
  2. Review Firewall Rules: Ensure the necessary ports are open
  3. Wait and Retry: If it's a rate limit issue, wait a few minutes and retry
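
The wait-and-retry step can be sketched as a small wrapper that spaces out attempts. retry_with_backoff is a hypothetical helper, the delays are arbitrary, and the sketch assumes that re-running the same create command is safe for your cluster.

```shell
# Hypothetical helper: run a command up to max_attempts times.
# After each failure it sleeps for delay seconds, then doubles the delay.
retry_with_backoff() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage (commented out; waits 60s, then 120s between the three attempts):
# retry_with_backoff 3 60 hetzner-k3s create --config cluster_config.yaml
```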

Load Balancer Issues

Symptoms: Load balancer stuck in "pending" state

Solutions:

  1. Check Annotations: Ensure the proper annotations are set on your services
  2. Verify Location: Make sure the load balancer location matches your node locations
  3. Check DNS Configuration: If you use the hostname annotation, ensure DNS is properly configured

Node Not Ready

Symptoms: Nodes show a NotReady status

Solutions:

  1. Check Node Status:

kubectl describe node <node-name>
kubectl get nodes -o wide

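  2. Check Kubelet: The kubelet runs inside the K3s service, so inspect that service on the node:

    ssh -i ~/.ssh/your_private_key root@<node-ip>
    systemctl status k3s-agent  # for workers
    systemctl status k3s        # for masters (the k3s installer names the server unit "k3s" by default)
    journalctl -u k3s-agent -f  # follow the logs; use -u k3s on masters
    

  3. Restart K3s:

    ssh -i ~/.ssh/your_private_key root@<node-ip>
    systemctl restart k3s-agent  # or k3s on masters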
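
On larger clusters it can help to list only the affected nodes before logging in to them. The following is a minimal sketch; not_ready_nodes is a hypothetical helper that filters the standard column output of kubectl get nodes.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS does not start with "Ready".
# A cordoned but healthy node ("Ready,SchedulingDisabled") is not reported.
not_ready_nodes() {
  awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage (commented out; requires access to the cluster):
# kubectl get nodes --no-headers | not_ready_nodes
```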
    

Pod Stuck in Pending State

Symptoms: Pods remain in Pending state indefinitely

Solutions:

  1. Check Resource Availability:

kubectl describe pod <pod-name> -n <namespace>

Look for events indicating insufficient resources, such as FailedScheduling with "Insufficient cpu" or "Insufficient memory".

  2. Add More Nodes: If nodes are at capacity, either scale up existing node pools or add new nodes

  3. Check Taints and Tolerations: Ensure pods have tolerations for any node taints

Storage Issues

Symptoms: PVCs stuck in Pending state, pods can't mount volumes

Solutions:

  1. Check Storage Classes:

kubectl get sc

  2. Describe PVC:

    kubectl describe pvc <pvc-name> -n <namespace>
    

  3. Check CSI Driver:

    kubectl get pods -n kube-system | grep csi
    

Network Plugin Issues

Symptoms: Pods can't communicate with each other, DNS resolution fails

Solutions:

  1. Check CNI Pods:

kubectl get pods -n kube-system | grep -E '(flannel|cilium)'

  2. Restart CNI: Restart the relevant CNI pods, for example with kubectl -n kube-system rollout restart daemonset <cni-daemonset-name>

Upgrade Issues

Symptoms: Cluster upgrade process gets stuck

Solutions:

  1. Clean up Upgrade Resources:

kubectl -n system-upgrade delete job --all
kubectl -n system-upgrade delete plan --all

  2. Remove Labels:

    kubectl label node --all plan.upgrade.cattle.io/k3s-server- plan.upgrade.cattle.io/k3s-agent-
    

  3. Restart Upgrade Controller:

    kubectl -n system-upgrade rollout restart deployment system-upgrade-controller
    

Getting Help

If you're still experiencing issues after trying these solutions:

  1. Check GitHub Issues: Search existing issues at github.com/vitobotta/hetzner-k3s/issues
  2. Create New Issue: If your issue hasn't been reported, create a new issue with:
     - Your configuration file (redacted)
     - Full debug output (DEBUG=true hetzner-k3s ...)
     - Operating system and hetzner-k3s version
     - Steps to reproduce the issue
  3. GitHub Discussions: For general questions and discussions, use GitHub Discussions
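
Redacting the configuration file before posting it can be partly automated. This sketch assumes the API token is stored on a single line under a hetzner_token key; redact_config and the output file name are hypothetical, and any other secrets in the file still need to be removed by hand.

```shell
# Hypothetical helper: print a config file with the value of the
# hetzner_token key replaced by a placeholder.
# Assumes the key and its value sit on one "key: value" line.
redact_config() {
  sed -E 's/^([[:space:]]*hetzner_token:[[:space:]]*).*/\1<REDACTED>/' "$1"
}

# Usage (commented out; file names are placeholders):
# redact_config cluster_config.yaml > cluster_config.redacted.yaml
```

Review the redacted copy before sharing it, since the pattern only covers that one key.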

Useful Commands for Troubleshooting

# Check cluster status
kubectl cluster-info
kubectl get nodes
kubectl get pods -A

# Check resource usage (requires the metrics API, e.g. metrics-server)
kubectl top nodes
kubectl top pods -A

# Check events
kubectl get events -A --sort-by='.metadata.creationTimestamp'

# Check specific pod details
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>

# Check node details
kubectl describe node <node-name>

# Check network connectivity
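kubectl run test-pod --image=busybox -- sleep 3600
kubectl wait --for=condition=Ready pod/test-pod --timeout=60s  # wait until the pod accepts exec
kubectl exec -it test-pod -- nslookup kubernetes.default
kubectl exec -it test-pod -- ping -c 3 <other-pod-ip>
kubectl delete pod test-pod  # clean up when done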