Decoding an Elm 0.19 History Export

阿新 • Published: 2018-12-29

As web application developers, one of our top goals is to make it as easy as possible to solve the problems our users face. One of our great challenges is understanding in enough detail exactly what those problems are.

The Elm language and web framework give us powerful tools to that end: quite literally, we can see what our users see. All data in an Elm program lives in one global model that can only be changed by one update function. If we export the state of that model and its history, we can look at what the user did and how they ended up in a mess.

A library like ElmRings takes us even further by allowing us to snag that data remotely, giving our support teams and developers the ability to debug a user’s session without having to be in the same room.

It’s not enough to just extract and record that data, though. “All data” means all data: each password keystroke, any half-composed unsent message, every API response, all show up in the history in their full plaintext glory. A successful rollout of Elm history as a support tool requires us to sanitize that data to protect our users.

To do that, we need to understand just what information we have and how it’s structured.

The Elm History Format

To keep things concrete, let’s look at a small Elm program that defines a counter with buttons to increment and decrement the count.

The program has two messages, each of which takes a value, and thus two actions.

If we run the program with the Elm debugger enabled and press these buttons a few times, we can then see and explore the history in our browser.

If we then export the history, we’ll see something like this:

$: 0

That strange-looking first line tells us that this JSON represents a (special) Elm object.

In Elm 0.19, the history export and import are handled in Elm itself, so it makes sense that the history data as a whole is a valid Elm object.

Elm objects stored as JSON follow a pattern, which we see here and will see again in the history section:

The type constructor is stored in the special key “$” (these are sometimes called tags, the term used in the metadata section)
Arguments are stored in order in the keys “a”, “b”, “c”, etc.
Typed values used as arguments are nested using the same format
Elm records are stored as plain JSON objects (which can, of course, contain Elm typed values)
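
The four rules above can be sketched as a small recursive walk. This is a minimal Python sketch, not anything Elm ships: the function name decode and the sample value are made up for illustration, and it assumes argument keys never skip a letter.

```python
from string import ascii_lowercase


def decode(value):
    """Turn exported JSON into plain Python data.

    Tagged values become ("Constructor", [args]); records stay dicts;
    primitives pass through unchanged.
    """
    if isinstance(value, dict):
        if "$" in value:
            # Arguments live under "a", "b", "c", ... in order.
            args = []
            for key in ascii_lowercase:
                if key not in value:
                    break
                args.append(decode(value[key]))
            return (value["$"], args)
        # No "$" key: an Elm record stored as a plain JSON object.
        return {field: decode(inner) for field, inner in value.items()}
    if isinstance(value, list):
        return [decode(item) for item in value]
    return value


# Hypothetical value: a constructor wrapping a record that holds a typed value.
example = {"$": "Just", "a": {"name": "Ada", "role": {"$": "Admin"}}}
print(decode(example))  # ('Just', [{'name': 'Ada', 'role': ('Admin', [])}])
```

Separating tagged values from records up front makes later steps, such as deciding what to scrub, easier to reason about.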

Since the value in the “a” field doesn’t have a “$” constructor, we can conclude that it’s a big Elm record. It contains two sections, metadata and history.

Metadata

Each history export contains metadata about the version of Elm and, more interestingly, the program’s type definitions.

The export specifies which type defines the messages Elm uses to trigger model updates. It also lists any type aliases and any union types in your program. (Remember, union types can take args, like Maybe — Maybe Int, Maybe String, etc.)
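
To make that concrete, here is a hedged Python sketch of reading the message constructors out of the metadata. The key names used here (types, message, unions, args, tags) and the Main.Msg sample are assumptions about the layout, so check them against your own export before relying on them.

```python
# Hypothetical metadata for the counter program; the layout is an assumption.
metadata = {
    "versions": {"elm": "0.19.0"},
    "types": {
        "message": "Main.Msg",
        "aliases": {},
        "unions": {
            "Main.Msg": {
                "args": [],
                "tags": {
                    "Increment": ["Basics.Int"],
                    "Decrement": ["Basics.Int"],
                },
            }
        },
    },
}


def message_constructors(metadata):
    """Map each constructor of the message type to its argument types."""
    types = metadata["types"]
    union = types["unions"].get(types["message"], {})
    return dict(union.get("tags", {}))


print(sorted(message_constructors(metadata)))  # ['Decrement', 'Increment']
```

A list like this is a handy starting point for deciding which messages could carry sensitive data.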

Including this type information serves a critical purpose. Using Elm’s debugger, you’re exporting the state of a statically typed program into an untyped data store and then importing it back into another browser session.

If the program can’t ensure that the shape of the imported data matches the shape the current program expects, things are going to go catastrophically wrong. By including the type definitions, Elm can warn us of type mismatches in its typical (read “wonderfully helpful”) fashion.

Of course, it’s still possible that your app will behave differently from the app of the user who exported the history (you can change the behavior of an app without changing any data structures), but this imposes an important limit on what can change.
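
The idea of comparing type definitions before trusting an import can be sketched in a few lines of Python. This is not how Elm itself performs the check; changed_types and metadata_for are hypothetical helpers, and the metadata layout (types, aliases, unions, tags) is an assumption.

```python
def changed_types(exported, current):
    """Names of aliases or unions whose definitions differ between two metadata blobs."""
    changed = set()
    for section in ("aliases", "unions"):
        old = exported["types"].get(section, {})
        new = current["types"].get(section, {})
        for name in old.keys() | new.keys():
            if old.get(name) != new.get(name):
                changed.add(name)
    return sorted(changed)


def metadata_for(tags):
    # Hypothetical helper: builds a metadata blob for a Main.Msg type.
    return {
        "types": {
            "message": "Main.Msg",
            "aliases": {},
            "unions": {"Main.Msg": {"args": [], "tags": tags}},
        }
    }


exported = metadata_for({"Increment": ["Basics.Int"], "Decrement": ["Basics.Int"]})
current = metadata_for(
    {"Increment": ["Basics.Int"], "Decrement": ["Basics.Int"], "Reset": []}
)
print(changed_types(exported, current))  # ['Main.Msg']
```

A check like this could run on a support server to flag exports recorded against an older build of the app.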

History

After the metadata, we have the history section, which documents everything that’s happened in a user’s session.

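Each entry in the history hash represents an Elm message object. Following the pattern described above, JSON data such as {"$": "MessageType", "a": "Arg1", "b": {"$": "AnotherTypeValue"}} represents the Elm message MessageType "Arg1" AnotherTypeValue.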

The constructor of the overall value is “MessageType”, the first argument is a plain String, and the second argument is an instance of the type “AnotherTypeValue”, which takes no arguments.
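
To see the mapping in both directions, here is a small Python sketch that renders an exported value back into Elm-like source text. The render function is hypothetical, and the sample assumes that a constructor with no arguments is stored as an object holding only the “$” key.

```python
import json
from string import ascii_lowercase


def render(value, nested=False):
    """Render an exported value as Elm-like source text."""
    if isinstance(value, dict) and "$" in value:
        args = [render(value[k], nested=True) for k in ascii_lowercase if k in value]
        text = " ".join([str(value["$"])] + args)
        # Parenthesize constructor applications that appear as arguments.
        return f"({text})" if nested and args else text
    if isinstance(value, dict):
        fields = ", ".join(f"{k} = {render(v)}" for k, v in value.items())
        return "{ " + fields + " }"
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    return str(value)


# Assumed encoding of the message described above.
message = {"$": "MessageType", "a": "Arg1", "b": {"$": "AnotherTypeValue"}}
print(render(message))  # MessageType "Arg1" AnotherTypeValue
```

Rendering entries this way makes a long history much easier to skim than raw JSON.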

As mentioned above, Elm records are stored as plain JSON objects; any Elm objects stored inside the record follow the same format as above. As an example, a simple Elm record that contains the title of a book, such as { title = "Some Title" }, would be exported as the JSON object {"title": "Some Title"}.

Reflections

Let’s take a moment to reflect on what we’ve seen. What questions and conclusions come up from the exported metadata and history?

This is everything.

That completeness is why it’s so important to store this data safely. As we’ve now seen, the history export stores every interaction the user has with the browser and your app has with your server in fine detail. This detail makes it powerful, but it also makes scrubbing the data of sensitive information critical.
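
As a rough illustration of what scrubbing might look like, the Python sketch below redacts the arguments of chosen constructors and the values of chosen record fields. Every name in it (PasswordChanged, GotProfile, password, token) is hypothetical, and it treats the history as a list of messages purely for illustration; the right rules depend entirely on your app’s own types.

```python
REDACTED = "[REDACTED]"
SENSITIVE_CONSTRUCTORS = {"PasswordChanged"}  # hypothetical message name
SENSITIVE_FIELDS = {"password", "token"}      # hypothetical record fields


def scrub(value):
    """Return a copy of an exported value with sensitive data replaced."""
    if isinstance(value, dict):
        if value.get("$") in SENSITIVE_CONSTRUCTORS:
            # Keep the constructor name, blank out every argument.
            return {k: (v if k == "$" else REDACTED) for k, v in value.items()}
        return {
            k: REDACTED if k in SENSITIVE_FIELDS else scrub(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


history = [
    {"$": "PasswordChanged", "a": "hunter2"},
    {"$": "GotProfile", "a": {"name": "Ada", "token": "abc123"}},
    {"$": "Increment", "a": 1},
]
for entry in scrub(history):
    print(entry)
```

Note that swapping a value for a placeholder string only keeps the export importable when the original value was also a string; other types would need a placeholder of the matching shape.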

What’s missing?

If I had been asked to quickly whiteboard how to design an Elm history export, I would have probably included two things that aren’t here: outgoing commands and a complete copy of the model at the time of export.

Neither of those is here, and as it turns out, neither is necessary!

Remember, in Elm the application model is constructed from the initial state and then changed only by incoming messages. When you import a history file, Elm reruns the update function for each of the incoming messages in sequence. This gives you the same final state as if those messages had come in through user interaction.
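
The replay idea is essentially a fold over the message list. Here is a minimal Python sketch for the counter, assuming hypothetical constructor names Increment and Decrement and a history stored as a list; note that this update takes the model first so it can be handed straight to reduce, the reverse of Elm’s argument order.

```python
from functools import reduce


def init():
    """Initial model of the counter."""
    return 0


def update(model, message):
    """Apply one exported message to the model."""
    amount = message["a"]
    if message["$"] == "Increment":
        return model + amount
    if message["$"] == "Decrement":
        return model - amount
    raise ValueError(f"unknown message: {message['$']}")


def replay(messages):
    """Rebuild the final model by running update over every message in order."""
    return reduce(update, messages, init())


history = [
    {"$": "Increment", "a": 1},
    {"$": "Increment", "a": 2},
    {"$": "Decrement", "a": 1},
]
print(replay(history))  # 2
```

Because the final model is fully determined by the initial state and the messages, nothing else needs to be stored to reconstruct it.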

Any outgoing commands that produce new incoming messages will be captured in the message list; any that don’t produce messages don’t actually matter from Elm’s perspective. Since the goal of the Elm debugger is to export and import the state of the Elm app, we simply don’t need to track commands or to export the complete state of the model (much as that info might sometimes be useful for human debugging).

Eagle-eyed readers may have picked up on a key phrase: from the initial state. Any flags passed into the application and used to calculate the initial state of the model are not included in a history export.

I suspect this was a deliberate choice by the Elm team to ensure that an imported session is consistent with the environment into which it’s imported. Either way, be aware: if your user is on an iPad and you’re debugging in a Chrome browser and your app behaves differently on those two platforms, you won’t see exactly what your user saw.

You could work around that limitation by having your app’s first message contain all the initialization flags for setup, instead of processing them in the model init function. This would give you consistency across platforms, but could also cause problems. (For purely documentation purposes, you could also fire a message with the data that has no impact on the model.) It’s up to you to figure out which is appropriate for your app.

Lists?!

The first time I saw an Elm list in a history export, I nearly spat out my mint julep in surprise (and did in fact lose my monocle, but that’s a different story for another time). I’d expected an object with an arbitrary number of arguments… instead, it was something much more interesting.

Let’s take an Elm list of three simple book records:

Straightforward, no? Now train your peepers on how it exports:

This is because in Elm, List is implemented as a “linked list”. Each element is an object with three properties:

Its constructor is :: (aka “cons”), which takes two arguments.
Its first argument is the value of the list item at that point.
Its second argument is the rest of the list.
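
The cons structure described above can be flattened with a short loop. In this Python sketch the book titles are placeholders, and the marker used for the end of the list ({"$": "[]"}) is an assumption; the loop simply stops at the first value that isn’t a :: cell.

```python
def from_cons(value):
    """Flatten an exported Elm list into a Python list."""
    items = []
    # "a" holds the head, "b" holds the rest of the list.
    while isinstance(value, dict) and value.get("$") == "::":
        items.append(value["a"])
        value = value["b"]
    return items


def to_cons(items):
    """Build the nested cons form, innermost cell last."""
    value = {"$": "[]"}  # assumed empty-list marker
    for item in reversed(items):
        value = {"$": "::", "a": item, "b": value}
    return value


# Placeholder records standing in for three simple book records.
books = [{"title": "Book One"}, {"title": "Book Two"}, {"title": "Book Three"}]
exported = to_cons(books)
print(exported["a"])       # {'title': 'Book One'}
print(exported["b"]["a"])  # {'title': 'Book Two'}
print(from_cons(exported) == books)  # True
```

Using a loop rather than recursion keeps the conversion safe for long lists.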

There are good technical reasons for this (it makes lists easy to alter and duplicate, etc.), none of which you actually need to know to write good Elm (or JavaScript or Ruby or, most likely, anything you’re using for work or fun). If you’re curious, though, you can read more about linked lists on Wikipedia, as well as in this interesting discussion of Arrays vs. Lists in Elm.

Why hashes?

“Why hashes?”, you might wonder. Wouldn’t it be simpler to use JSON arrays instead, something like {"MessageType": ["Arg1", {"AnotherType": []}]} or ["MessageType", "Arg1", ["AnotherType"]]?

I believe in the olden times, I did use an array because I just assumed it’d be fast. I later ran some benchmarks and was totally wrong, objects were a lot faster. So I switched everything.

Conclusions

Looking back at the data structures we saw, I’m struck by one final reflection: Elm history data is specific to your application.

Given that our goal is to use this history data to support our users, this has big implications. If the history we export has a unique shape based on our app’s Elm data structures, the way we secure and display users’ actions will have to be highly custom as well. There is no one-size-fits-all solution.

With that in mind, in the next blog post we’ll go over how we can sanitize and work with the Elm history data that we have collected and now understand.
