A Binary RRWeb Encoding Format
Published on Sunday, November 30, 2025
RRWeb (Record, Replay Web) is a tool for recording and replaying a user's interactions on a website. It has a protocol definition which is typically serialized to JSON. My employer sells an RRWeb derivative product and, as a backend engineer, I find myself thinking about the protocol quite a bit, particularly its size, which is significant and can cause strain when websites are large and contain lots of repetitive content or events.
For these reasons I began thinking about ways to minimize the size of the payloads.
Initial Self-Describing Implementation
I won't dwell too long on this since it's fairly common. I used tag bytes to describe the data that follows. A tag can indicate that a complex, nested object follows or that a simple, fixed-size integer follows. Integers are encoded using LEB128. Floats are encoded as 32-bit values (4 bytes). Strings are variable length and prefixed with a length value (which is itself LEB128 encoded). Arrays are tagged and have a length value. Objects are the same as arrays, except that each entry consists of two encoded values (the key and the value). Boolean, null, and undefined values are encoded into the tag byte itself and do not need a value byte to follow.
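To make the layout concrete, here is a minimal sketch of such an encoder in Python. The tag values, function names, and the choice of signed LEB128 for integers are assumptions made for illustration, not the format's real constants. Python has no `undefined`, so that tag is left out.

```python
import struct

# Hypothetical tag values; the real format's constants are not listed in this post.
TAG_NULL, TAG_TRUE, TAG_FALSE, TAG_INT, TAG_FLOAT, TAG_STRING, TAG_ARRAY, TAG_OBJECT = range(8)

def encode_uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_sleb128(n: int) -> bytes:
    """Signed LEB128: stop once the remaining bits are pure sign extension."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7  # arithmetic shift, so negative numbers converge to -1
        if (n == 0 and not byte & 0x40) or (n == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)

def encode_value(v) -> bytes:
    """Encode a JSON-like value as a tag byte followed by its payload."""
    if v is None:
        return bytes([TAG_NULL])
    if isinstance(v, bool):  # must come before int: bool is a subclass of int
        return bytes([TAG_TRUE if v else TAG_FALSE])
    if isinstance(v, int):
        return bytes([TAG_INT]) + encode_sleb128(v)
    if isinstance(v, float):
        return bytes([TAG_FLOAT]) + struct.pack("<f", v)  # 32-bit, little-endian
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return bytes([TAG_STRING]) + encode_uleb128(len(raw)) + raw
    if isinstance(v, list):
        body = b"".join(encode_value(x) for x in v)
        return bytes([TAG_ARRAY]) + encode_uleb128(len(v)) + body
    if isinstance(v, dict):
        body = b"".join(encode_value(k) + encode_value(x) for k, x in v.items())
        return bytes([TAG_OBJECT]) + encode_uleb128(len(v)) + body
    raise TypeError(f"unsupported type: {type(v).__name__}")
```

Calling `encode_value({"type": 3, "data": [1, 2.5, None]})` returns a single byte string in which every value is preceded by its tag, which is what makes the format self-describing.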
The results of this encoding format are similar to serializing with msgpack: the byte size is reduced by about 10% compared to JSON.
Specializing
We can extend our type tag to include specialized types. Particularly, we're going to intern string values and store an "INTERNED_STRING" tag type and a "STRING_POINTER" tag type. Interned strings are special because we're going to extract every string from the payload, de-duplicate it, and replace it with a pointer (an integer representing an index in an array). The serializer will move these interned string types to the front of the message so that when deserializing you can always translate a string pointer to a string (no index errors).
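The interning pass described above can be sketched as a tree walk that builds a de-duplicated string table and swaps each string for an index. This is an illustration only: `StringPointer`, `intern_strings`, and `resolve` are hypothetical names, and the sketch works on plain Python values instead of encoded bytes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StringPointer:
    """Stand-in for a STRING_POINTER entry: an index into the string table."""
    index: int

def intern_strings(value):
    """Return (table, rewritten), where every string in `value` becomes a StringPointer."""
    table = []   # de-duplicated strings, written at the front of the message
    index = {}   # string -> position in `table`

    def walk(v):
        if isinstance(v, str):
            if v not in index:
                index[v] = len(table)
                table.append(v)
            return StringPointer(index[v])
        if isinstance(v, list):
            return [walk(x) for x in v]
        if isinstance(v, dict):
            out = {}
            for k, x in v.items():
                key = walk(k)        # keys are interned too
                out[key] = walk(x)
            return out
        return v                     # numbers, booleans, None pass through

    return table, walk(value)

def resolve(table, v):
    """Inverse of intern_strings: translate pointers back into strings."""
    if isinstance(v, StringPointer):
        return table[v.index]
    if isinstance(v, list):
        return [resolve(table, x) for x in v]
    if isinstance(v, dict):
        return {resolve(table, k): resolve(table, x) for k, x in v.items()}
    return v
```

Because the table is complete before any pointer is written, a decoder that reads the table first can resolve every pointer it later encounters.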
By interning strings we can significantly reduce payload size. A website like Reddit, with lots of repetitive content, will see its total byte size reduced by roughly two-thirds (about 66%). A website with less repetitive content, like the New York Times, will see its byte size reduced by half.
We can optimize the format further by adding type tags which represent complex types (e.g. a DOM node or a click event) and type tags which encode more granular information such as "array[float]" or "map[string, string]".
Encoding as a typed array or typed map means we can drop the type information for each entry, which can save a significant amount of data if enough repetitions are encountered. For specialized types such as "DOM_NODE" our deserializer will expect integer, integer, float, and string. These heterogeneous types rely on a known ordering in order to deserialize. If the ordering is ever invalidated then the deserializer cannot function.
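As a rough illustration of the saving, the sketch below encodes the same list of floats twice: once as a generic array with a tag on every element, and once as an "array[float]" with a single tag for the whole array. The tag values and function names are hypothetical.

```python
import struct

# Hypothetical tag values chosen for this sketch.
TAG_FLOAT = 0x04
TAG_ARRAY = 0x06
TAG_FLOAT_ARRAY = 0x10

def encode_uleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_uleb128(buf: bytes, pos: int):
    """Return (value, position of the next unread byte)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_generic_float_array(values) -> bytes:
    """Self-describing form: every element carries its own tag byte."""
    out = bytearray([TAG_ARRAY]) + encode_uleb128(len(values))
    for v in values:
        out.append(TAG_FLOAT)
        out += struct.pack("<f", v)
    return bytes(out)

def encode_typed_float_array(values) -> bytes:
    """Typed form: one tag for the array, then raw 32-bit floats back to back."""
    header = bytes([TAG_FLOAT_ARRAY]) + encode_uleb128(len(values))
    return header + struct.pack(f"<{len(values)}f", *values)

def decode_typed_float_array(buf: bytes):
    """The decoder relies on the tag alone to know the element type."""
    if buf[0] != TAG_FLOAT_ARRAY:
        raise ValueError("not a typed float array")
    count, pos = decode_uleb128(buf, 1)
    return list(struct.unpack_from(f"<{count}f", buf, pos))
```

For an array of n floats the typed form saves n bytes, one tag per element. The saving is larger for element types whose payload is small relative to the tag, such as small integers.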
This moves us away from a world of self-description and into a world of specialized encoders and decoders. Versioning and backwards and forwards compatibility become very important in this domain. There's a subtlety to format design that ensures long-term happiness.
I've glossed over this but there were lots of different types which received a specialized encoding. It is not a trivial amount of work and requires some amount of "hemming and hawing" for each type to keep yourself from making mistakes.
These changes further reduced payload size. Websites such as Reddit ended up one-fifth of their original size (an 80% reduction overall), whereas websites such as the New York Times saw no further improvement.
Miscellaneous Changes
One change in particular failed, which really disappointed me. Timestamps are set for each RRWeb event. They're 64-bit, so they take up a decent amount of room. The RRWeb player can only render to the screen as frequently as the user's screen refreshes. For a 60 Hz monitor this is roughly every 16 milliseconds. For web content you can imagine this is overkill, and a 30 Hz or 15 Hz refresh rate would be sufficient for a satisfactory user experience.
Events could then be encoded without timestamps and instead bucketed into a "frame". Each frame is assigned a single timestamp, stored as a delta from the previous frame's timestamp. Smaller and fewer timestamps should mean large data savings.
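The bucketing idea can be sketched as follows. The 33 ms frame length (about 30 Hz), the `(timestamp, payload)` event shape, and the function names are assumptions made for this example. Note that the scheme is lossy: every event in a frame is restored with the frame's timestamp.

```python
def bucket_into_frames(events, frame_ms=33):
    """Group (timestamp_ms, payload) pairs, sorted by time, into frames.

    Returns a list of (delta_ms, payloads). The delta is measured from the
    previous frame's start; the first delta is measured from zero.
    """
    frames = []
    prev_start = 0
    current_start = None
    for ts, payload in events:
        start = ts - ts % frame_ms          # snap to the start of its frame
        if start != current_start:
            frames.append((start - prev_start, [payload]))
            prev_start = start
            current_start = start
        else:
            frames[-1][1].append(payload)   # same frame: no timestamp stored
    return frames

def restore_events(frames):
    """Rebuild (timestamp_ms, payload) pairs at frame resolution."""
    events = []
    ts = 0
    for delta, payloads in frames:
        ts += delta
        events.extend((ts, p) for p in payloads)
    return events
```

Small deltas fit in one or two LEB128 bytes, whereas an absolute millisecond timestamp needs several, so the saving grows with the number of events that share a frame.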
This, however, did not work out. There was basically zero difference. Events were too sparse and too few in number to meaningfully impact byte size. I unfortunately had to abandon it. What a shame.
Honorable Mentions
Because RRWeb is serialized to a string (and expects to be serialized to a string within the library itself), certain binary properties are base64 encoded. This inflates the size of these properties by ~33%. Since we're using a binary format we can write the data without encoding. This is most commonly used for images and is a source of significant size reduction for certain websites, though not every website with images will have them encoded in this way.
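The ~33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A quick check with Python's standard library, using stand-in bytes in place of a real image:

```python
import base64

def base64_overhead(raw: bytes) -> float:
    """Fractional size increase from base64-encoding `raw`."""
    return len(base64.b64encode(raw)) / len(raw) - 1

raw_image = bytes(range(256)) * 12       # 3072 bytes of stand-in binary data
as_text = base64.b64encode(raw_image)    # what a string-based format stores
print(len(raw_image), len(as_text))      # 3072 4096
print(round(base64_overhead(raw_image), 3))
```

A binary format can write the 3072 raw bytes behind a length prefix and skip the encoding step entirely.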
Sparse arrays were used to efficiently encode objects with optional fields. These took some inspiration from hash array mapped tries. I maintain a bitmap denoting if a field is populated or not and then encode the fields in order of their appearance in the bitmap. They had an impact on total byte size but it was not significant.
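A minimal sketch of the bitmap approach, assuming a hypothetical four-field schema. It works on plain Python values to keep the idea visible; in the byte format the bitmap and each value would be written with the encoders described earlier.

```python
# Hypothetical schema: the order of names fixes the meaning of each bit.
FIELDS = ("id", "x", "y", "text")

def pack_sparse(record: dict, fields=FIELDS):
    """Return (bitmap, values): bit i is set when fields[i] is populated."""
    bitmap = 0
    values = []
    for bit, name in enumerate(fields):
        if record.get(name) is not None:
            bitmap |= 1 << bit
            values.append(record[name])   # values follow in bit order
    return bitmap, values

def unpack_sparse(bitmap: int, values, fields=FIELDS) -> dict:
    """Walk the bitmap and pull one value per set bit, in order."""
    remaining = iter(values)
    record = {}
    for bit, name in enumerate(fields):
        if (bitmap >> bit) & 1:
            record[name] = next(remaining)
    return record
```

Missing fields cost one bit each instead of a key plus a null marker, and no field names are written at all.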
Floats were encoded using four bytes by default, but some properties could have their float precision reduced without meaningful impact on the viewer's experience. This change had a small impact but could have had a larger one given the right input data.
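One way to reduce float precision is to store selected properties as 16-bit half-precision floats. That specific choice is an assumption for this sketch; the post does not say which reduced representation was used. Python's `struct` module supports half precision through the `"e"` format code.

```python
import struct

def pack_f32(x: float) -> bytes:
    """Default encoding: 32-bit float, 4 bytes."""
    return struct.pack("<f", x)

def pack_f16(x: float) -> bytes:
    """Reduced precision: 16-bit half float, 2 bytes.

    Raises OverflowError for magnitudes beyond the half-float range
    (about 65504), so it only suits properties with small values.
    """
    return struct.pack("<e", x)

def roundtrip_f16(x: float) -> float:
    """Value a decoder would see after half-precision storage."""
    return struct.unpack("<e", pack_f16(x))[0]

original = 0.1
restored = roundtrip_f16(original)
print(len(pack_f32(original)), len(pack_f16(original)))
print(abs(restored - original))   # rounding error introduced by the narrower type
```

Whether the rounding error is acceptable depends on the property: an opacity or a scroll ratio tolerates it far better than a large pixel coordinate would.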
The new format is streamable. You can begin playing the stream before receiving the whole message, similar to watching a YouTube video. This is a cool feature that could improve the responsiveness of the RRWeb player, but I didn't implement a player so it remains untested.
End
And if you skipped to the end without reading, the uncompressed size was reduced by somewhere between 50% and 80% depending on the site being recorded. The compressed size (gzip) was reduced by 10-20%.
All in all, a fun project.