Webhooks: How long should we retain events ids to prevent duplicates?

Hi,

The EventSub docs state the following:

Twitch sends event notifications at least once, but if Twitch is unsure whether you received a notification, it’ll resend the event. Under some circumstances this means you may receive a notification twice. If Twitch resends the message, the message ID will be the same.

If receiving the same message more than once is an issue for you, you’ll need to track messages that you’ve processed. The Twitch-Eventsub-Message-Id request header contains the message’s ID. If you’ve already processed the message, don’t process the message and immediately return a 2XX status code.

So in order to prevent replay attacks or duplicate tasks we need to keep track of recent event IDs via the Twitch-Eventsub-Message-Id header; that’s fine. But the documentation does not specify the maximum window within which an event with the same ID can be resent.

So, for how long should we keep those IDs? (A safe maximum)

This info would be very useful when weighing different approaches to preventing replay attacks or duplicated jobs. For example, it would help answer the questions you face when designing your architecture: should I just store the IDs in memory? (Whether that’s feasible depends on the maximum window.) Or should I store them in a database instead? How much space would storing them require? Etc.
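To illustrate the storage question, here is a back-of-envelope estimate, assuming UUID-style ~36-character message IDs and a made-up volume of 100k events/day; real numbers depend entirely on how many subscriptions you have.

```python
# Rough estimate of storage needed to keep 24 hours of message IDs.
# EVENTS_PER_DAY is a hypothetical figure for illustration only.
ID_BYTES = 36              # Twitch message IDs look UUID-like (~36 chars)
EVENTS_PER_DAY = 100_000   # assumed volume, not a Twitch figure
total_bytes = ID_BYTES * EVENTS_PER_DAY
print(f"~{total_bytes / 1_048_576:.1f} MiB for a 24h window")
```

Even at that volume it is only a few MiB, so in-memory storage looks plausible if the retention window is short.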

Also, as a suggestion - add it to the documentation.

Thanks!

The heat death of the universe!

The problem here is not just Twitch resending an event, but an attacker obtaining a valid payload and set of headers and spamming them at your server. That is the true nature of a replay attack.

This depends on the topic, the data being consumed, and what you are doing with the data.

For example, if you get a duplicate stream up/online event for a stream you thought was already up, you can do a check against the stream cache in your DB rather than a message-ID compare. Similar for the subscription data: a user can’t become a sub twice on the new-sub feed within roughly 28 days (weird math, not persisting).

So in summary: do what works best for your application, the data it consumes from Twitch and what it does with it.

You might in fact not need to dedupe by ID at all in some cases. There are a number of “what ifs” involved, depending on what you do.

Generally most of us run a rolling cache and sit on the IDs for 24 hours or so. But I don’t apply this rule to certain topics, since the nature of the data means a replay occurring is irrelevant to my application and its state.

My storage provider would be very pleased to hear that! :blush: Just kidding lol

The problem here is not just Twitch resending an event, but an attacker obtaining a valid payload and set of headers and spamming them at your server. That is the true nature of a replay attack.

Yeah, you’re right, but that’s only true if you assume that after deleting them from your storage layer you would treat them as valid events. My idea was to consider invalid every event that has already been processed or is older than this maximum time. So you wouldn’t accept expired events.

It is not much different from how JWT works: signing the JSON claims and a timestamp with HMAC (the same as the headers here) and considering older messages invalid/expired after a short time span, just in case the claims have changed.

Yeah, I have thought of other approaches. I’m just weighing my options, and I thought that if the maximum time were short, this would be the most stateless option with the fewest resources required.

Yep, I spawn expensive tasks in response to these events (stream.online/stream.offline). Instead of dealing with event deduplication, I could make my server process those tasks one at a time per streamer and ignore events with the same broadcaster_user_id. But because it is a multi-threaded paradigm, the complexity would be higher: it would become stateful, since I would need to store the state of those workers so I only accept one at a time instead of just spawning workers as events arrive, and to access that state across threads I would need mutexes and/or other synchronization tools. So I thought that maybe just deduplicating events would be easier. Also, I’m using my own library for Helix, and if I move the event-deduplication part into the Helix client I’m developing, it could be reused in future projects or even open-sourced, so that’s a plus.

Hmm, so my idea was maybe something like a Redis cache for storing the IDs of those events (stream.online and stream.offline), setting a TTL of 24h or so, and also ignoring events with a timestamp older than those 24h.
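With Redis this fits in one atomic call: `SET` with `NX` succeeds only if the key is new, and `EX` expires it automatically. A minimal sketch, assuming `redis_client` exposes the standard redis-py `set` signature (the key prefix and function name are my own):

```python
def seen_before(redis_client, message_id: str, ttl_seconds: int = 24 * 3600) -> bool:
    """Return True if this message ID was already recorded within the TTL."""
    # SET ... NX EX: returns a truthy value only when the key was newly set.
    newly_set = redis_client.set(f"eventsub:seen:{message_id}", 1,
                                 nx=True, ex=ttl_seconds)
    return not newly_set
```

Because the check and the insert happen in a single command, this also stays race-free if several webhook workers share the same Redis instance.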

The thing is, I was hoping for Twitch to specify a maximum time after which it never resends the same event, instead of blindly choosing a number like 24h. On second thought, that could depend on the event type, but the maximum across all event types would still be useful as a reference.

Twitch will retry with exponential backoff (4/5? times) before it gives up on a given message if it didn’t get a 2XX. So in practice you should NEVER get a duplicate message ID. But you need to protect yourself in case an attacker does the replay attack (not Twitch: Twitch should never resend if it managed to deliver successfully the first time, though it does practice “at least once” delivery, and a second send, if it happens, may have a different ID). Personally I can’t say I’ve seen many “legit” dupes from Twitch.

Generally speaking Twitch won’t ever resend the same event. But a hacker might.

And in the example of a stream.online, you might get a second stream.online with a different message ID. So you may want to deduplicate received events based on the expected “next” event in an event series (considering stream on/off).
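That “expected next event” idea can be sketched as a tiny per-broadcaster state machine; only an online event for an offline broadcaster (or vice versa) is processed. The state names and function are illustrative, not anything Twitch defines:

```python
# broadcaster_user_id -> "online" | "offline" (absent means never seen)
last_state: dict[str, str] = {}

def should_process(broadcaster_id: str, event_type: str) -> bool:
    """Accept only events that advance the on/off sequence."""
    currently_online = last_state.get(broadcaster_id) == "online"
    if event_type == "stream.online" and not currently_online:
        last_state[broadcaster_id] = "online"
        return True
    if event_type == "stream.offline" and currently_online:
        last_state[broadcaster_id] = "offline"
        return True
    return False  # out-of-sequence or repeated event; ignore it
```

This catches replays even when the attacker (or Twitch) sends a duplicate under a fresh message ID, which pure ID dedupe cannot.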

This would be my solution.

You wouldn’t but it might happen!

TLDR: In practice 24 hours makes sense, as you’d do a time compare on the event and ignore it even if it’s valid (again, depends on the topic and use case). But Twitch “things” that have “time caches” tend to reside on Twitch for about 5 days. (Thinking about hype train events and the like, which also don’t have their lifetime noted in the docs and can spontaneously reset their cache anyway.)

Real TLDR: 24 hours makes sense.

Ouch, for my requirements that means I need to 100% prevent my server from spawning multiple workers for the same streamer; I can’t just rely on event deduplication, or it would be as complex as (or even more complex than) having a stateful worker pool with mutexes. So I think I will just accept any event with valid headers and a recent timestamp (treating events with a timestamp older than 10 minutes as invalid, as the docs suggest, to mitigate replay attacks), keep track of my active workers instead of the events, and just make sure that I never have two workers performing the same job.
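Tracking active workers instead of event IDs can stay quite small: a set of busy broadcasters guarded by a lock, so concurrent webhook threads can’t race on the claim. A minimal sketch with illustrative names:

```python
import threading

_active: set[str] = set()
_lock = threading.Lock()

def try_claim(broadcaster_id: str) -> bool:
    """Claim the broadcaster for a worker; False if one is already running."""
    with _lock:
        if broadcaster_id in _active:
            return False
        _active.add(broadcaster_id)
        return True

def release(broadcaster_id: str) -> None:
    """Mark the broadcaster's worker as finished (call in a finally block)."""
    with _lock:
        _active.discard(broadcaster_id)
```

An event handler would spawn a worker only when `try_claim` returns `True`, and the worker would call `release` when done, so duplicate events for a busy streamer are simply dropped.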

Thanks anyway, this is useful! As a suggestion, IMO things like this:

Twitch will retry with exponential backoff like 4/5? times before it gives up on a given message if it didn’t get a 2XX. So in practice you should NEVER get a duplicate message ID. But you need to protect yourself in case a hacker does the replay attack

And in the example of a stream.online, you might get a second stream.online with a different message ID.

…should be in the EventSub docs, because they are very useful when considering your architecture; if they were there, I wouldn’t be here wasting your precious time!

Thanks! Also, this is off-topic, but I want to thank you for your persistent work, which has made our feedback heard; as a result, we finally have the vod_offset for clips back!

Thank you for everything! :blush:

It sorta is in the List of Request headers

But it probably could be elaborated/expanded. (Most webhooky-type things will retry if the first attempt failed; think PayPal IPN, etc.)

Heh, no time wasted here :smiley:

:+1:

Well, that’s just a UserVoice and people upvoting :smiley: