Almost 3000 lines of code for automating draining nodes and rebooting them. And it requires that another component has already queued up an update that requires a reboot.
Looking at the issues, people try to shoehorn a thousand unique behaviours into a general-purpose tool, just to avoid a bit of old-school sysadmin-ing. There's a guy who wants to change the TZ of a running cluster, and wants "Kured" to support that use case so it's only updated during the night - in an ever-changing TZ.
Insert the "No god no" meme here - you really shouldn't be updating nodes in place, and thus shouldn't need to be restarting nodes.
I'm aware bare metal exists and it's not always practical to just provision more servers, yet I think for most workloads you're not getting the benefit of Kubernetes if you have, say, 3 servers and lose 1/3 of your capacity to do software updates.
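For context, the manual "old school sysadmin-ing" that Kured automates boils down to a handful of commands. A dry-run sketch (it only prints the plan rather than executing it; `node1` and the reboot step are placeholders, and the `--delete-emptydir-data` flag assumes you're fine losing emptyDir contents):

```shell
# Print, rather than execute, the drain/reboot cycle kured automates.
drain_and_reboot() {
  node="$1"
  echo "kubectl cordon $node"
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  echo "ssh $node sudo reboot   # e.g. when /var/run/reboot-required exists"
  echo "kubectl uncordon $node  # once the node reports Ready again"
}

drain_and_reboot node1
```

Swap the `echo`s for the real commands once you've reviewed the plan against your own cluster.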
I’ve never understood the gatekeeping people wrap around kubernetes.
Even with a small 3-node cluster of Raspberry Pis, you can run anything you can run in plain Docker, and have it survive outages/reboots/etc.
At home, I have a few Raspberry Pis, an Orange Pi RV (a RISC-V node), and my main nodes are large high-core, high-RAM VMs running on Proxmox.
Each one has different capabilities. Some have lots of fast storage attached for Longhorn, some have 10Gb/25Gb networking, etc.
And the great part is, if I wanted to collapse down to just the SBCs? I'd just need to scale down some replicas of the high-mem or high-CPU stuff I'm testing.
Of course at work, I just pick the node shape and capabilities I need and don't think about it.
Yeah, I’m probably the exception for running kubernetes at home, but I would argue if you are running more than a handful of docker containers, you should probably be using kubernetes anyway.
Especially if you care about things being up, or want to be able to seamlessly shuffle stuff around for maintenance. Not to mention my entire infrastructure is repeatable with just a small git repo of FluxCD stuff.
I'm not personally trying to gatekeep kubernetes, everyone should do what works for them. However, if I'm putting my professional credibility and/or my sleep schedule on the line, I would not advise anyone to do this.
Even at home, I run stuff that needs to be highly available enough that I wouldn't go this route when there are better options.
I'd love to hear about your HA solution for things like this.
We have 2 main servers and a 3rd "side/batch" node.
When we restart one node, PostgreSQL fails over automatically; the fe/be is webscale anyway.
It works very well.
It just takes time to design your hardware/software stack to be able to survive reboots and recover back to ideal states. I guess nobody really enjoys rebooting machines, but at the same time, I don't think people should be afraid of doing it.
What’s the use case where you’re okay cordoning a node but not okay with just terminating it and starting a new one?
Physical nodes that you have to reclaim and that don’t run any virtualisation?
I like it. K8s should be more opinionated about this.
What's also missing is rebalancing of pods - a rescheduler.
A rescheduler is impractical because scheduling is environment-specific. You might, for example, have a database that needs three nodes while you only have three servers; there's nowhere to reschedule those pods to in that case.
In the cloud, however, you can use Cluster Autoscaler or Karpenter to automatically handle the unhomed pods.
What happens is that when one node goes down, the pod gets evicted and moves to another node. The node comes back up, but k8s doesn't rebalance that pod despite it having an anti-affinity.
For me, the fact that this feature doesn't exist in core k8s is bad. It should be able to do this - controllably, for sure.
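Since inter-pod anti-affinity is only evaluated at scheduling time, one manual workaround is to force the pods back through the scheduler once the node has rejoined. A dry-run sketch (`myapp` is a placeholder deployment name; the function only prints the command):

```shell
# Print the command that re-schedules a deployment's pods, letting the
# scheduler re-evaluate anti-affinity against the recovered node set.
rebalance() {
  deploy="$1"
  # A rolling restart recreates pods one at a time, so each replacement
  # goes through the scheduler and can land on the returned node.
  echo "kubectl rollout restart deployment/$deploy"
}

rebalance myapp
```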