Previously, I talked about the structure of my homelab, particularly the K3S cluster I was running on Pine64 SoPine compute modules, with their impressive 1GHz quad-core ARM CPUs and 2GB of DDR3. Unfortunately, that part of my homelab came to a rather sudden dead end a couple of weeks ago.
I still don't understand how it's possible to have 931% load on a 4-core CPU, but hey, that's my Control Plane for ya! In fact, things were so bad that the node was barely booting. I suspect an issue with the micro-SD card everything runs from, and I'll take a closer look at it later.
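For the curious: on Linux, the load average counts tasks that are runnable *plus* tasks stuck in uninterruptible sleep (usually waiting on disk I/O), so it isn't capped at the number of cores. A struggling micro-SD card can push it far past 100% per core. A minimal sketch, assuming the 931% figure is the 1-minute load average (9.31) displayed as a percentage, that normalizes it by core count; the other fields in the sample line are hypothetical:

```python
import os


def load_per_core(loadavg_line: str, cores: int) -> float:
    """Return the 1-minute load average as a percentage of total core capacity.

    loadavg_line uses the /proc/loadavg format: "1min 5min 15min running/total lastpid".
    """
    one_minute = float(loadavg_line.split()[0])
    return one_minute / cores * 100


if __name__ == "__main__":
    try:
        # On a live Linux box, read the real value.
        with open("/proc/loadavg") as f:
            line = f.read()
    except FileNotFoundError:
        # Hypothetical sample matching the 9.31 load from the post.
        line = "9.31 8.02 7.50 3/210 1234"
    print(f"{load_per_core(line, os.cpu_count() or 1):.1f}% of total CPU capacity")
```

On the 4-core control plane, a load of 9.31 works out to roughly 233% of capacity: more than two tasks queued or blocked for every core.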
Either way, the cluster was fairly limited in terms of... well... everything. There wasn't enough storage for Longhorn to keep enough replicas of each volume, and you can see the CPU/RAM usage for yourself. In its place, I built a new cluster of Dell OptiPlex Micros with i3-8300T CPUs, 8GB of DDR4, and actual SSDs/NVMe drives.
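As a rough illustration of why storage ran out: Longhorn keeps a full copy of a volume for each replica, and by default it won't put two replicas of the same volume on the same node. So N replicas cost about N times the volume size and need N nodes with room to spare. The back-of-the-envelope check below uses hypothetical sizes and ignores Longhorn's over-provisioning and reserved-space settings; it's a greedy heuristic, not Longhorn's actual scheduler:

```python
def can_schedule(volume_sizes_gb, replicas, node_free_gb):
    """Greedy check: can every volume get `replicas` copies on distinct nodes?

    Rough model only: each replica consumes the full volume size, and no two
    replicas of the same volume share a node. Being greedy, it may reject some
    layouts that a smarter packer would accept.
    """
    free = list(node_free_gb)
    if replicas > len(free):
        return False  # not enough distinct nodes for the replica count
    # Place the biggest volumes first so they get the roomiest nodes.
    for size in sorted(volume_sizes_gb, reverse=True):
        # Pick the `replicas` nodes with the most free space left.
        targets = sorted(range(len(free)), key=lambda i: free[i], reverse=True)[:replicas]
        if any(free[i] < size for i in targets):
            return False
        for i in targets:
            free[i] -= size
    return True


# Hypothetical: three nodes with 25GB free each, volumes of 20GB and 10GB, 2 replicas.
print(can_schedule([20, 10], 2, [25, 25, 25]))  # → False
```

Note that the total demand (60GB) is below the total free space (75GB), yet the replicas still can't be placed, because each node only has 5GB left after hosting a 20GB replica.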
With 4 cores flying at 3.2GHz and RAM upgradeable to 32GB, these new nodes make for an amazing upgrade. I am contemplating adding the old nodes to the cluster as workers, but I'm concerned about mixing CPU architectures (ARM and x86). Ah well, future project.
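Mixing architectures mostly matters for container images: an image built only for amd64 won't run on the ARM SoPine modules. Besides using multi-arch images, one common workaround is to pin such workloads using the well-known `kubernetes.io/arch` node label, which the kubelet sets on every node. A minimal sketch on a plain pod-spec dict; the container and image names are hypothetical:

```python
import copy


def pin_to_arch(pod_spec: dict, arch: str = "amd64") -> dict:
    """Return a copy of a pod spec that only schedules on nodes of the given CPU architecture."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    # kubernetes.io/arch is a well-known label populated automatically on each node.
    spec.setdefault("nodeSelector", {})["kubernetes.io/arch"] = arch
    return spec


# Hypothetical workload with an amd64-only image.
website = {"containers": [{"name": "web", "image": "example/website:latest"}]}
print(pin_to_arch(website))
```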
Quite a few things stayed the same: MetalLB for static external IPs for my services, Longhorn for storage, and K3S as the backbone of the cluster.
The new nodes are in an HA setup, with all 3 acting as both Control Plane nodes and workers. This is a step up from the previous cluster, which had a single CP and thus a single point of failure. Here, any one node can die or catch fire and the cluster will stay operational!
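Why three servers, and why only one can fail: assuming the servers use K3S's embedded etcd as the datastore (the post doesn't say which datastore is used), etcd needs a majority of members, a quorum, to keep accepting writes. A cluster of n servers therefore survives n - (n // 2 + 1) failures. A quick sketch:

```python
def quorum(n: int) -> int:
    """Members that must be up for an etcd cluster of size n to accept writes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while a quorum still survives."""
    return n - quorum(n)


for n in range(1, 6):
    print(f"{n} server(s): quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Three servers tolerate one failure, and a fourth server adds no extra tolerance (quorum rises to 3), which is why odd-sized control planes are the norm.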
Another major change is moving the nodes to another subnet so that MetalLB has more free IPs to give away without me having to worry about IP conflicts.
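A minimal sketch of the idea using Python's `ipaddress` module: reserve an address range in the nodes' subnet for MetalLB (the kind of `start-end` range an `IPAddressPool`'s `addresses` list accepts) and check that no node address falls inside it. The subnet, node IPs, and pool range below are hypothetical; the post doesn't give the real ones:

```python
import ipaddress

# Hypothetical values for illustration only.
SUBNET = ipaddress.ip_network("192.168.50.0/24")
NODES = [ipaddress.ip_address(a) for a in ("192.168.50.11", "192.168.50.12", "192.168.50.13")]
POOL_START = ipaddress.ip_address("192.168.50.100")
POOL_END = ipaddress.ip_address("192.168.50.200")


def pool_addresses(start, end):
    """All addresses in an inclusive start-end range, like a MetalLB range entry."""
    return [ipaddress.ip_address(i) for i in range(int(start), int(end) + 1)]


def check_pool(subnet, nodes, start, end):
    """Return (pool size, node IPs that collide with the pool)."""
    pool = pool_addresses(start, end)
    if not all(ip in subnet for ip in pool):
        raise ValueError("pool range is not inside the subnet")
    conflicts = sorted(set(pool) & set(nodes))
    return len(pool), conflicts


size, conflicts = check_pool(SUBNET, NODES, POOL_START, POOL_END)
print(f"{size} addresses for LoadBalancer services, conflicts: {conflicts or 'none'}")
```

The same check is worth doing against any DHCP range on the subnet, since a DHCP lease landing inside the pool is exactly the kind of conflict a dedicated subnet avoids.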
And finally, though it changes nothing operationally, the nodes now run openSUSE Leap 15.6 instead of Armbian.
Because I wanted to do things right this time, knowing better than I did two years ago, I automated nearly all of the node setup with Ansible, and I use ArgoCD to easily redeploy applications from manifests stored on GitHub. I don't need to worry about losing data either, as Longhorn backs up every night to a MinIO instance I have running.
https://github.com/xelab04/kates
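As a hedged sketch of the GitOps and backup pieces, the script below builds an Argo CD `Application` pointing at the repo above and a Longhorn `RecurringJob` for a nightly backup, then prints them as a JSON `List`, which `kubectl apply -f` accepts just like YAML. The application name, manifest path, namespaces, schedule, and retention count are all hypothetical, not what the repo actually uses, and it assumes the MinIO backup target is already configured in Longhorn:

```python
import json

REPO = "https://github.com/xelab04/kates"


def argocd_app(name: str, path: str, namespace: str) -> dict:
    """Argo CD Application that keeps `namespace` in sync with `path` in the repo."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": REPO, "targetRevision": "HEAD", "path": path},
            "destination": {"server": "https://kubernetes.default.svc", "namespace": namespace},
            # Auto-sync from Git, delete removed resources, and revert manual drift.
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


def longhorn_nightly_backup(name: str = "nightly-backup", cron: str = "0 2 * * *", retain: int = 7) -> dict:
    """Longhorn RecurringJob that backs up volumes in the `default` group on a cron schedule."""
    return {
        "apiVersion": "longhorn.io/v1beta2",
        "kind": "RecurringJob",
        "metadata": {"name": name, "namespace": "longhorn-system"},
        "spec": {
            "task": "backup",
            "cron": cron,
            "retain": retain,  # number of backups to keep per volume
            "concurrency": 1,  # volumes backed up in parallel
            "groups": ["default"],
            "labels": {},
        },
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        # Hypothetical app name and path inside the repo.
        "items": [argocd_app("website", "apps/website", "website"), longhorn_nightly_backup()],
    }
    print(json.dumps(manifests, indent=2))
```

Generating manifests from code is just one option; committing the equivalent YAML to the repo and letting Argo CD pick it up works the same way.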
So now, finally, my website is back up and running! And that's all from me this time round; I'm going to come up with more shenanigans to run on the cluster soon.