I hate using such vague terms as “seemingly random” but it’s the best I’ve got.
I run a small (~two dozen) moderation and command bot network for two teams of streamers, and over the last few months these seemingly random disconnects have been happening more frequently. The failure usually presents the same way: the bot runs normally for a while, then stops receiving messages from chat, and some time later it fails with a message containing Error 10053 (though sometimes it is Error 10054). All the information I’ve found on these errors has essentially been “it could be any number of things,” so that hasn’t been too useful.
Rate limits are not a concern: the bots only operate in chats where they have a moderator badge, and even peak rates are below 20 messages per 30 seconds. The time to failure appears to be random, ranging from less than 15 minutes after start to several days without issue. Each individual bot is affected differently, though it is not uncommon for multiple bots to be affected simultaneously.
At first, I thought perhaps I was dropping PINGs, so I started logging the incoming PINGs and outgoing PONGs. I received a PING every 4.5-5 minutes and responded to each one, right up until they just stopped coming. I took this to be the cause of the disconnect: somehow my responses were getting lost and Twitch was disconnecting me for lack of activity. So I set up an independent PONG that goes out every 5 minutes exactly, and that seemed to fix it. I never noticed the PINGs suddenly stopping after that, but the abrupt and random disconnects continued to happen.
And they continue today.
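For reference, the PING logging and the independent PONG described above look roughly like the sketch below. This is a simplified illustration rather than the bots' exact code: the function names are made up for this post, the 300-second interval matches the "every 5 minutes" above, and `PONG :tmi.twitch.tv` is assumed to be the payload the server accepts.

```python
import logging
import threading

log = logging.getLogger("irc.keepalive")


def pong_for(line):
    """Return the PONG reply for an IRC PING line, or None for any other line."""
    if line.startswith("PING"):
        # Echo back whatever followed PING, e.g. "PING :tmi.twitch.tv".
        return "PONG" + line[4:]
    return None


def handle_line(sock, line):
    """Log an incoming PING and answer it; ignore every other line."""
    reply = pong_for(line)
    if reply is not None:
        log.info("recv %s", line)
        sock.sendall((reply + "\r\n").encode("utf-8"))
        log.info("sent %s", reply)


def start_independent_pong(sock, interval=300.0, payload="PONG :tmi.twitch.tv"):
    """Send an unsolicited PONG every `interval` seconds on a daemon thread.

    Returns a threading.Event; call .set() on it to stop the thread.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, so this fires once per interval.
        while not stop.wait(interval):
            try:
                sock.sendall((payload + "\r\n").encode("utf-8"))
                log.info("sent unsolicited %s", payload)
            except OSError as exc:
                log.error("independent PONG failed: %r", exc)
                return

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

With this in place the log shows a PING/PONG pair every 4.5-5 minutes plus the unsolicited PONG, which is how I can tell that nothing looks wrong until traffic stops.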
This only seems to happen with sockets that are held open, such as the ones reading chat. Sockets that are opened only briefly (under 5 minutes) never encounter this issue.
Bots disconnect abruptly (and largely independent of each other) after a random time since launch. Sometimes 15 minutes, sometimes multiple days.
The problem presents initially as a lack of incoming messages, eventually progressing to Error 10053 or 10054. The errors appear more readily on bots that have scheduled messages to send (sending more than a handful of messages while the issue is active seems to trigger the error?), but bots that don’t send messages are affected as well; for them it presents as the initial lack of messages.
PINGs are answered appropriately, and a consistent independent PONG is sent as well, just in case.
No apparent disruption in responses until they simply stop arriving.
Bots are hosted on AWS, written in Python, and use Python’s standard socket module.
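To make the two symptoms above easier to tell apart in the logs, something like the following sketch could be used. It is only an illustration under a few assumptions: the function names are hypothetical, the 600-second idle timeout is a guess based on PINGs normally arriving every 4.5-5 minutes, and it relies on Python raising `ConnectionAbortedError` for 10053 and `ConnectionResetError` for 10054. It detects and labels the failure; it does not explain the cause.

```python
import socket


def read_lines(sock, idle_timeout=600.0):
    """Yield decoded IRC lines from sock.

    Raises socket.timeout if nothing at all arrives for idle_timeout seconds,
    which is the "stops receiving messages" stage. The generator simply ends
    if the peer closes the connection cleanly.
    """
    sock.settimeout(idle_timeout)
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return  # orderly shutdown by the other side
        buffer += chunk
        while b"\r\n" in buffer:
            raw, buffer = buffer.split(b"\r\n", 1)
            yield raw.decode("utf-8", errors="replace")


def classify_disconnect(exc):
    """Map a socket exception to a short label for logging."""
    if isinstance(exc, ConnectionAbortedError):
        return "aborted"   # reported as Error 10053 on Windows
    if isinstance(exc, ConnectionResetError):
        return "reset"     # reported as Error 10054 on Windows
    if isinstance(exc, socket.timeout):
        return "stalled"   # silence longer than the idle timeout
    return "other"
```

A read loop would wrap `for line in read_lines(sock): ...` in `try`/`except OSError as exc:` and log `classify_disconnect(exc)` together with the time since launch before reconnecting.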
Thank you for any thoughts you may have on the subject. If you need any additional info or have any suggestions, I will try to respond quickly.