JTV 2 Receiving Messages


Turbo user only problem.

How come JTV 2 sends two different types of data for the same user? As you can see, the first 4 lines are the first message.

The following two lines were setting users to mod permissions

Then the next 4 lines were identical to the first 3 lines, aside from the message. They are followed by the last line, which is identical to the previous line.

My question is: why, when the first message was sent, did it not have that following 5th line like the second message does?

Sending the information (SPECIALUSER, USERCOLOR, EMOTESET) in separate messages is just the way the system works. It also sends several SPECIALUSER messages if, for example, the user is both a subscriber and a turbo user. Frankly, this is easier on the developer since there are shorter messages to parse. This information is generally repeated for every message, since it is broadcast and jtv can’t keep track of what information you may already have cached.

As to the last message in your picture, I am not sure if this is a bug - in your code or theirs - or a user sending the same message in quick succession.

Yeah, I see that they send SPECIALUSER, USERCOLOR, and EMOTESET a lot. I was sending messages in quick succession. I’ve come to realize that it may have something to do with my code. It shouldn’t, but sometimes I receive the message from Twitch with a new line following the EMOTESET, and that case I am able to detect. When they do not send the new line right after EMOTESET, I struggle to detect the message even though I’m using a regex to match it.

data = data.receive() #my receive function is below

re.match(r':[_\d\w]+![_\d\w]+@[\d\w]+\.tmi\.twitch\.tv\s[P]', data)


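def receive(self):
    try:
        data = self.sock.recv(4096)
        data = data.decode("utf-8")
        if 'PING' in data:
            # Answer the server's keepalive: a PING must be echoed back as a PONG.
            self.sock.send(data.replace('PING', 'PONG').encode("utf-8"))
            return None
        return data
    except KeyboardInterrupt:
        print("User Interrupt")
        exit(1)
    except Exception:
        # Socket or decoding failure: report "no data" to the caller.
        return None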
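
One possible cause of the symptom described above: a single recv() call is not guaranteed to return exactly one IRC line. It can return several lines at once, or stop in the middle of one. The following is only a sketch of line buffering under that assumption. LineBuffer and the sample messages are made up for illustration and are not part of the original code.

```python
class LineBuffer:
    """Accumulates raw socket bytes and hands back complete IRC lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add bytes from recv() and return the list of complete lines."""
        self._pending += chunk
        # Everything before the last "\r\n" is complete; the remainder is
        # a partial line that is kept for the next call.
        *complete, self._pending = self._pending.split(b"\r\n")
        # Decode only complete lines, so multi-byte characters are never cut.
        return [line.decode("utf-8", errors="replace") for line in complete]


buf = LineBuffer()
# Two made-up recv() results that cut the second message in the middle.
first = buf.feed(
    b":jtv!jtv@jtv.tmi.twitch.tv PRIVMSG mybot :EMOTESET alice [0]\r\n"
    b":alice!alice@alice.tmi.twi"
)
second = buf.feed(b"tch.tv PRIVMSG #chan :hello\r\n")
print(first)
print(second)
```

With this approach the regex (or any other parser) is always applied to one complete line at a time, regardless of how the data happened to arrive.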

Not quite sure what you mean. Are you saying you have trouble identifying the message if it’s not preceded by an EMOTESET message? Or that Twitch doesn’t break the line after an EMOTESET message?
I use Java and never have this problem - messages from jtv are always on a different line than other messages. If you have trouble discerning the jtv messages from other private messages I suggest putting a capture group around the username and matching it against “jtv”.
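
A rough Python sketch of the capture-group suggestion, for illustration only. PREFIX_RE, sender and is_jtv are hypothetical names, and the sample lines are assumptions about what the server sends.

```python
import re

# Hypothetical pattern: capture the nick that precedes "!" in the IRC prefix.
PREFIX_RE = re.compile(r"^:([^!\s]+)![^@\s]+@\S+\s+PRIVMSG\s")


def sender(line):
    """Return the sending nick of a PRIVMSG line, or None if it is not one."""
    match = PREFIX_RE.match(line)
    return match.group(1) if match else None


def is_jtv(line):
    """True when the line is a PRIVMSG sent by the special user "jtv"."""
    return sender(line) == "jtv"


# Made-up sample lines.
meta_line = ":jtv!jtv@jtv.tmi.twitch.tv PRIVMSG mybot :USERCOLOR alice #FF0000"
chat_line = ":alice!alice@alice.tmi.twitch.tv PRIVMSG #chan :hi"
print(is_jtv(meta_line), is_jtv(chat_line))
```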

As I’ve been shown by @BatedUrGonnaDie and @george in a recent post, you can request metadata on the same line. Examples of the format are shown in the aforementioned post.

If you want to use regex to handle these, I just wrote a quick regex that only handles PRIVMSG (since matching all the available messages would create a monster even bigger than the one below):

@color=(#[0-9A-F]+)?;emotes=([\d:\-\/,]+)?;subscriber=([01]);turbo=([01]);user_type=([a-z_]+)?\s:([_a-z0-9]+)![_a-z0-9]+@[_a-z0-9]+\.tmi\.twitch\.tv\sPRIVMSG\s#([_a-z0-9]+)\s:([^\r\n]+)

The regex has 8 capture groups:

$1: the color hex code
$2: the emotes by image_id:start_index-end_index
$3: subscriber status (0 or 1)
$4: turbo status (0 or 1)
$5: user type (staff, admin, global_mod or mod)
$6: username
$7: channel
$8: message
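
For illustration, this is roughly how the regex could be applied in Python. It is only a sketch: PRIVMSG_RE and FIELDS are names chosen here, and the sample line is the tagged message quoted later in this thread.

```python
import re

# The regex from the post, split over several lines for readability.
PRIVMSG_RE = re.compile(
    r"@color=(#[0-9A-F]+)?;emotes=([\d:\-\/,]+)?;subscriber=([01]);turbo=([01]);"
    r"user_type=([a-z_]+)?\s:([_a-z0-9]+)![_a-z0-9]+@[_a-z0-9]+\.tmi\.twitch\.tv"
    r"\sPRIVMSG\s#([_a-z0-9]+)\s:([^\r\n]+)"
)

# Names for capture groups $1..$8, in order.
FIELDS = ("color", "emotes", "subscriber", "turbo",
          "user_type", "username", "channel", "message")

line = ("@color=#1E90FF;emotes=25:0-4,8-12;subscriber=0;turbo=0;user_type= "
        ":livewhiletrue!livewhiletrue@livewhiletrue.tmi.twitch.tv "
        "PRIVMSG #livewhiletrue :Kappa 𠜎 Kappa")

match = PRIVMSG_RE.match(line)
# Optional groups that did not take part in the match come back as None.
parsed = dict(zip(FIELDS, match.groups())) if match else None
print(parsed)
```

Note that an empty optional tag such as user_type= yields None for its group rather than an empty string.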

But again, monsters like these are the reason I stay away from regex and parse the messages manually :wink:

EDIT: The above regex doesn’t work with unexpected tags, as requested below. Modify as needed.

Regex is great for searching & replacing text, but it’s not meant to be an IRC parser, no matter how fancy you get. I highly recommend building a proper tokenizer, it’ll serve you better in the long run.

This is how I’m turning any tags into a simple map of strings (in Go, but it’s just an example of simple string splits).
This is after splitting out the tag part of the message, not including @ or trailing space.

func tagsToMap(tagString string) map[string]string {
    result := make(map[string]string)
    splitTags := strings.Split(tagString, ";")
    for _, tag := range splitTags {
        splitTag := strings.SplitN(tag, "=", 2) 
        result[splitTag[0]] = splitTag[1]
    }
    return result
}

Then no matter if tags are added or removed or which order they are in, we have a simple map that we can use instead of unwieldy regex. (The example should properly catch any tags as defined by specification).

Might want to verify that splitTag has 2 elements. At one point TMI was sending only the tag name with no value (the =blah part was missing) to indicate true for a boolean property. We decided instead to be more explicit for booleans, with values 0 or 1, so that clients could more easily detect that it’s set to false instead of the tag simply not being returned.

So while your code will work with the current tags implementation, it may not work in the future if we decide to send tag keys (without values).
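
A defensive variant of the same idea, sketched in Python. The function name tags_to_dict and the choice to store None for a key without a value are assumptions made for this example, not something TMI prescribes.

```python
def tags_to_dict(tag_string):
    """Parse 'key=value;key2=;key3' into a dict.

    A key without any '=' is stored with the value None, so callers can
    tell "key only" apart from "key with an empty value".
    """
    tags = {}
    for tag in tag_string.split(";"):
        if not tag:
            continue  # ignore empty segments such as a trailing ';'
        # partition never raises: without '=' it returns (tag, '', '').
        key, sep, value = tag.partition("=")
        tags[key] = value if sep else None
    return tags


example = tags_to_dict("color=#FF6BFF;emotes=;turbo")
print(example)
```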

Go’s SplitN returns an empty string if there’s nothing after the split, so with: color=#FF6BFF;emotes=; etc
result[“color”] = “#FF6BFF”
result[“emotes”] = “”
But it might not in other languages, if their split functions are different.

@Sunspots afaik this behaviour is the same for all major languages.

Just want to add that I had a little bit of trouble with supplementary characters and the emote metadata.

The line/string:

Kappa 𠜎 Kappa

will have this emote field:

emotes=25:0-4,8-12;

This is correct in the way it is represented, but if you iterate over the string, supplementary characters like 𠜎 will (depending on the encoding) be contained in two 16-bit characters. Thus the position of each following emote will be off by one for every preceding supplementary character. This is something others may also need to take into account.

The general iterating pattern for strings with supplementary characters in Java is something like:

for (int i = 0; i < str.length();) {
    //do something with character or codepoint
    i += Character.charCount(str.codePointAt(i));
}
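
For comparison, a Python 3 sketch. Python strings are indexed by code point, so the indices from the emotes tag can be used directly, assuming (as observed above) that they count code points. parse_emotes and utf16_offset are hypothetical helpers, and the "/" separator between different emotes is an assumption inferred from the regex earlier in the thread.

```python
def parse_emotes(emotes_tag):
    """Turn '25:0-4,8-12' into {'25': [(0, 4), (8, 12)]}."""
    result = {}
    if not emotes_tag:
        return result
    # Assumption: different emote ids are separated by "/".
    for entry in emotes_tag.split("/"):
        emote_id, _, ranges = entry.partition(":")
        spans = []
        for span in ranges.split(","):
            start, _, end = span.partition("-")
            spans.append((int(start), int(end)))
        result[emote_id] = spans
    return result


def utf16_offset(text, index):
    """Convert a code-point index into a UTF-16 code-unit index."""
    return len(text[:index].encode("utf-16-le")) // 2


message = "Kappa 𠜎 Kappa"
spans = parse_emotes("25:0-4,8-12")["25"]
# End indices are inclusive, hence the +1 when slicing.
words = [message[start:end + 1] for start, end in spans]
print(words, utf16_offset(message, 8))
```

utf16_offset shows the shift that languages with 16-bit string units have to apply: the second emote starts at code point 8 but at UTF-16 unit 9.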

For me, the regular web client isn’t even able to parse or display that message at all (when inbound from server).
"TMI.js [irc] ERROR: Failed parsing IRC message "@color=#FF6BFF;emotes=25:0-4,8-12;subscriber=0;turbo=0;user_type= :sunspots!sunspots@sunspots.tmi.twitch.tv PRIVMSG #sunspots :Kappa 𠜎 Kappa

I sent it from the web client and it displays just fine to me. This is what I receive:

@color=#1E90FF;emotes=25:0-4,8-12;subscriber=0;turbo=0;user_type= :livewhiletrue!livewhiletrue@livewhiletrue.tmi.twitch.tv PRIVMSG #livewhiletrue :Kappa 𠜎 Kappa

This is as read from UTF-8 encoded console output on Eclipse Luna.

I also tested letting my bot echo whatever I typed and received the same error as you when posting “Kappa 𠜎 Kappa”

Interestingly, when only posting “𠜎”, I also receive this message:

TMI.js [irc] WARNING: Invalid emotes tag: 
TMI.js [irc] ERROR: Failed parsing IRC message @color=;emotes=;subscriber=0;turbo=0;user_type=mod :mybot!mybot@mybot.tmi.twitch.tv PRIVMSG #livewhiletrue :𠜎

On second thought I think this could be helped by changing the outgoing encoding (just new to all these things, and learning as I go)

edit: for those who want to test their applications I got the test characters from here: http://www.i18nguy.com/unicode/supplementary-test.html

That warning is posted for every message on every channel not containing an emote, so not related. This is on Chrome Version 40.0.2214.115 m

And TMI won’t recognize encodings other than UTF-8 (and ASCII by default)

Due to an exploit with emote parsing I reported a few weeks ago, Twitch currently drops all messages containing Unicode symbols that cause surrogate pairs in JavaScript in TMI.js on the website.

Could you link the report thread, if public?

So are you saying that twitchclient 2 will shoot back, say, 7 tokens (just a random number off the top of my head)? If I see 7 tokens I would execute one way, whereas if I saw 4 tokens I would execute a different way? Could you elaborate a bit more? I’m not too familiar with the word tokenizer.

In lexical analysis, tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis.

What I’m trying to say is that IRC has a standard and IRCv3 has a standard. If you parse incoming messages according to these standards, by iterating over the string and creating “tokens” out of it, you’re more likely to end up with a forwards-compatible parser than if you simply use a regex.

There are many examples of IRC tokenizers; I personally like Twisted’s tokenizer for its simplicity.
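
To make the idea concrete, here is a minimal tokenizer sketch in Python. It is not Twisted’s implementation. parse_message is a hypothetical name, and the sketch covers only the basic shape of an IRC line (optional @tags, optional :prefix, command, parameters) without tag-value unescaping.

```python
def parse_message(line):
    """Split one IRC line into (tags, prefix, command, params)."""
    tags, prefix = {}, None
    rest = line.rstrip("\r\n")
    if rest.startswith("@"):
        # IRCv3 tags: "@key=value;key2=value2 " before everything else.
        raw_tags, _, rest = rest[1:].partition(" ")
        for tag in raw_tags.split(";"):
            if tag:
                key, _, value = tag.partition("=")
                tags[key] = value
    if rest.startswith(":"):
        # Prefix: who sent the message, e.g. nick!user@host.
        prefix, _, rest = rest[1:].partition(" ")
    # Everything after " :" is one trailing parameter that may contain spaces.
    head, sep, trailing = rest.partition(" :")
    params = head.split()
    command = params.pop(0) if params else ""
    if sep:
        params.append(trailing)
    return tags, prefix, command, params


line = ("@color=#1E90FF;emotes=25:0-4,8-12;subscriber=0;turbo=0;user_type= "
        ":livewhiletrue!livewhiletrue@livewhiletrue.tmi.twitch.tv "
        "PRIVMSG #livewhiletrue :Kappa 𠜎 Kappa")
tags, prefix, command, params = parse_message(line)
print(command, params, tags["emotes"])
```

Because the tags end up in a dictionary, added, removed or reordered tags do not break the parser, which is the forwards compatibility argued for above.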

Oh yes, so I was on the right track. That’s what I was thinking in my head; it was just harder to put into words. I still need to refine the way I’m doing it, but in the grand scheme of things I’m on the right track not using regex anymore.
