The encoded stream starts with a STX marker and ends with an ETX marker. STX and ETX occurrences in the stream are escaped and internally encoded as well so the receiver side can simply check for STX ...
Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you a) whether the string is too long for a text ...