I’m hoping to get some input from the developers of large bots out there on how you handle scaling.
We are developing a chat bot (only pulling messages, not pushing them, so no throttling to worry about) for our customers that will ingest their chat messages and run various analyses on them. How are you handling autoscaling of your bot?
We are utilizing tmi.js for the main workings of the bot. When connecting our bot to channels, we need to supply the list of channels to tmi.js. The question is: how do you handle scaling when the server/container you are running your bot on starts to struggle?
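For context, the channel list reaches tmi.js up front in the client options, which is what makes rebalancing non-trivial. A minimal sketch (the identity values are placeholders, and the tmi.js calls are commented out so the focus stays on the shape of the options object):

```javascript
// Sketch of how the channel list is supplied to tmi.js.
// const tmi = require('tmi.js');
const opts = {
  identity: { username: 'our_bot', password: 'oauth:xxxxxxxx' }, // placeholders
  channels: ['#customer1', '#customer2'], // the list each instance must manage
};
// const client = new tmi.Client(opts);
// client.connect();
// client.on('message', (channel, tags, message, self) => { /* run analysis */ });
```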
Our current thought process is this: we will utilize distributed locking to manage the channels subscribed to our service. This will prevent multiple instances of the bot from joining the same chat as a peer.
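The locking idea could be sketched like this, with an in-memory `Map` standing in for a real lock store (e.g. Redis `SET` with `NX` and a TTL); all names here are hypothetical glue code, not part of tmi.js:

```javascript
// In-memory stand-in for a distributed lock store. A real deployment would
// back this with Redis/etcd so all instances see the same state.
const locks = new Map(); // channel -> { owner, expiresAt }

// Try to claim a channel for this bot instance. Returns true on success
// (or if we already hold the lock); false if a peer holds a live lock.
function tryLockChannel(channel, instanceId, ttlMs, now = Date.now()) {
  const existing = locks.get(channel);
  if (existing && existing.expiresAt > now) {
    return existing.owner === instanceId;
  }
  locks.set(channel, { owner: instanceId, expiresAt: now + ttlMs });
  return true;
}

// Release a channel so another instance can pick it up.
function unlockChannel(channel, instanceId) {
  const existing = locks.get(channel);
  if (existing && existing.owner === instanceId) locks.delete(channel);
}
```

The TTL matters: if an instance crashes without unlocking, its locks expire and the channels become adoptable again instead of being orphaned.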
The bot itself will just be a Docker image that runs on a Kubernetes cluster. We would set up autoscaling for this service based on memory and CPU usage, and Kubernetes would handle spinning resources up and down based on current load.
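That part is standard Kubernetes: a HorizontalPodAutoscaler targeting the bot's Deployment on CPU and memory utilization. A hedged example manifest (all names and thresholds are placeholders):

```yaml
# Hypothetical HPA for the bot Deployment; tune thresholds to your workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chat-bot
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chat-bot
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```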
There are two scenarios then that we will need to solve for: adding instances and removing instances. Removing instances wouldn’t be too hard: each remaining instance would check our locking DB for any available channels it can lock and, if found, subscribe to them and start listening.
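One pass of that scavenging step could look like the sketch below. `lockDb` and `client` are hypothetical interfaces, not tmi.js APIs: `lockDb.listUnlocked()` returns channels nobody holds, `lockDb.tryLock(channel, id)` claims one, and `client.join(channel)` mirrors tmi.js's own `client.join()`:

```javascript
// Adopt channels that were freed when another instance shut down.
// Returns the channels this instance actually picked up.
async function adoptFreedChannels(lockDb, client, instanceId) {
  const adopted = [];
  for (const channel of await lockDb.listUnlocked()) {
    // Another instance may race us for the same channel;
    // only join the ones we actually manage to lock.
    if (await lockDb.tryLock(channel, instanceId)) {
      await client.join(channel);
      adopted.push(channel);
    }
  }
  return adopted;
}
```

Running this on a timer (or on a "channel released" event from the lock store) is what turns scale-down into a no-op from the customer's perspective.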
Adding instances, however, seems a little more tricky. We would have a service monitoring CPU and memory, and if usage crosses a threshold, we would select a number of channels to disconnect from, which would allow a new instance to pick them up and subscribe (one should have already started if we have our thresholds set correctly).
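The shedding decision itself is simple to sketch; the policy here (drop a fixed fraction of our channels once usage crosses the threshold) is one of several reasonable choices, and all names are hypothetical:

```javascript
// Decide which channels to release so a freshly scaled-up instance
// can adopt them. usageRatio and threshold are in [0, 1].
function pickChannelsToShed(ownedChannels, usageRatio, threshold, shedFraction) {
  if (usageRatio < threshold) return []; // under threshold: shed nothing
  // Over threshold: shed a fraction of our channels, at least one.
  const count = Math.max(1, Math.floor(ownedChannels.length * shedFraction));
  return ownedChannels.slice(0, count);
}
```

A smarter policy would shed the *busiest* channels first (so the load actually moves), which just means sorting `ownedChannels` by message rate before slicing.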
What are others doing? Are we on the right path, or is there something we are missing that could make this simpler? The goal here is automation, so no hardcoded config files. We also need to utilize our resources properly, so no just making the server huge enough to handle all our customers’ peaks at the same time.