Almost 3000 lines of code for automating draining nodes and rebooting them. And it requires that another component has already queued up an update that requires a reboot.
Looking at the issues, people try to shoehorn a thousand unique behaviours into a general-purpose tool, just to avoid a bit of old-school sysadmin-ing. There's a guy who wants to change the TZ of a running cluster, and wants "Kured" to support that use case so it's only updated during the night - in an ever-changing TZ.
Insert the "No god no" meme here - you really shouldn't be updating nodes in place, and thus shouldn't need to be restarting nodes.
I'm aware bare metal exists and it's not always practical to just provision more servers, yet I think for most workloads you're not getting the benefit of Kubernetes if you have, say, 3 servers and lose 1/3 of your capacity to do software updates.
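For context, the manual "old school sysadmin-ing" that Kured automates boils down to a handful of commands. A dry-run sketch (it only prints the plan rather than executing it; `node1` and the reboot step are placeholders, and the `--delete-emptydir-data` flag assumes you're fine losing emptyDir contents):

```shell
# Print, rather than execute, the drain/reboot cycle kured automates.
drain_and_reboot() {
  node="$1"
  echo "kubectl cordon $node"
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  echo "ssh $node sudo reboot   # e.g. when /var/run/reboot-required exists"
  echo "kubectl uncordon $node  # once the node reports Ready again"
}

drain_and_reboot node1
```

Swap the `echo`s for the real commands once you've reviewed the plan against your own cluster.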
I’ve never understood the gatekeeping people wrap around kubernetes.
Even with a small 3-node cluster of Raspberry Pis, you can run anything you can run in plain Docker, and have it survive outages/reboots/etc.
At home, I have a few Raspberry Pis, an Orange Pi RV (a RISC-V node), and my main nodes are large high-core, high-RAM VMs running on Proxmox.
Each one has different capabilities. Some have lots of fast storage attached for Longhorn, some have 10Gb/25Gb networking, etc.
And the great part is, if I wanted to collapse down to just the SBCs? I'd just need to scale down some replicas of the high-mem or high-CPU stuff I'm testing.
Of course at work, I just pick the node shape and capabilities I need and don't think about it.
Yeah, I’m probably the exception for running kubernetes at home, but I would argue if you are running more than a handful of docker containers, you should probably be using kubernetes anyway.
Especially if you care about things being up, or want to be able to seamlessly shuffle stuff around for maintenance. Not to mention my entire infrastructure is repeatable with just a small git repo of FluxCD stuff.
I'm not personally trying to gatekeep kubernetes, everyone should do what works for them. However, if I'm putting my professional credibility and/or my sleep schedule on the line, I would not advise anyone to do this.
Even at home, I run stuff that needs to be highly available enough that I wouldn't go this route when there are better options.
I'd love to hear about your HA solution for things like this.
We have 2 main servers and a 3rd "side/batch" node.
When we restart one node, PostgreSQL fails over automatically; the fe/be is webscale anyway.
It works very well.
It just takes time to design your hardware/software stack to be able to survive reboots and recover back to ideal states. I guess nobody really enjoys rebooting machines, but at the same time, I don't think people should be afraid of doing it.
What’s the use case where you’re okay cordoning a node but not okay with just terminating it and starting a new one?
Physical nodes that you have to reclaim and that don’t run any virtualisation?
I like it. K8s should be more opinionated about this.
What's also missing is rebalancing of pods - a rescheduler.
A rescheduler is impractical because scheduling is environment-specific. You might, for example, have a database that needs three nodes while you only have three servers; there's nowhere to reschedule those pods to in that case.
In the cloud, however, you can use Cluster Autoscaler or Karpenter to automatically handle the unhomed pods.
What happens is that when one node goes down, the pod gets evicted and moves to another node. The node comes back up, but k8s doesn't rebalance that pod despite it having an anti-affinity.
For me, the fact that this feature doesn't exist in core k8s is bad. It should be able to do this - controllably, for sure.
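Since inter-pod anti-affinity is only evaluated at scheduling time, one manual workaround is to force the pods back through the scheduler once the node has rejoined. A dry-run sketch (`myapp` is a placeholder deployment name; the function only prints the command):

```shell
# Print the command that re-schedules a deployment's pods, letting the
# scheduler re-evaluate anti-affinity against the recovered node set.
rebalance() {
  deploy="$1"
  # A rolling restart recreates pods one at a time, so each replacement
  # goes through the scheduler and can land on the returned node.
  echo "kubectl rollout restart deployment/$deploy"
}

rebalance myapp
```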