Embeddable is a white-label analytics platform that offers self-service reporting through our Custom Canvas feature.
While building this feature, we faced an interesting architectural decision: how to manage dashboard state concurrently across large, multi-tenant user bases. We opted for an event-based approach to state.
Read on to learn how we arrived at this solution, how it solves the problems of concurrent state, and what other advantages it brings.
Our Scenario
Custom Canvas is a self-service analytics tool in Embeddable where end users can customize their own embedded analytics dashboards. Customizable dashboards are one of our core features, so it made sense to extend that capability to end users as well.
But this posed quite a few challenges around managing dashboard state. We wanted our customers to be able to offer dashboards to any number of users, and we wanted our state solution to be flexible.
Some customers may want to offer an individual dashboard to every user, while others may want multiple users per dashboard, multiple dashboards per user, or both!
This is why our state synchronization needs to be independent of users and work with any number of clients simultaneously.
At the same time, we wanted to keep it simple. Offering a full-blown collaborative editing environment is beyond the scope of this feature. To find the sweet spot, let’s look at the simplest, naive approach first and work from there.
The ‘Last Come, First Served’ Problem
The simplest solution would be to dump all state into an object, send it to the database, and then send the newest state to all clients using WebSockets. This works well when only one user edits at a time. With some smart client-side logic behind it, it could even work while multiple users edit the same dashboard, as long as everyone has a fast internet connection.
Problem 1 - Asynchronous State
But networks are unreliable. One user could be in the middle of adding a new chart to an embedded dashboard, only to have their changes erased by someone with a slower internet connection resizing a completely different chart. Because the user with the slower connection had not yet received the newest state, and their changes arrived later, their state counts as the newest and overrides all others. That “new” state does not contain the first user’s new chart, so the chart gets deleted from their dashboard.
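To make the failure concrete, here is a minimal TypeScript sketch of the naive snapshot approach. The `SnapshotStore` class and the in-memory field standing in for the database are illustrative assumptions, not Embeddable’s implementation; the slower connection is simulated simply by the order of the `save` calls.

```typescript
// Naive approach: clients send full snapshots and the server keeps whichever
// snapshot arrives last. An in-memory field stands in for the database.
interface Chart {
  id: string;
  width: number;
}

type DashboardState = Chart[];

class SnapshotStore {
  private state: DashboardState = [];

  // No merging: the incoming snapshot replaces whatever is stored.
  save(snapshot: DashboardState): void {
    this.state = snapshot.map((chart) => ({ ...chart }));
  }

  // Hand out a copy, as if it had been sent to a client over the network.
  load(): DashboardState {
    return this.state.map((chart) => ({ ...chart }));
  }
}

const store = new SnapshotStore();
store.save([{ id: "sales", width: 4 }]);

// Both users start from the same snapshot.
const fastUser = store.load();
const slowUser = store.load();

// The fast user adds a chart; their snapshot reaches the server first.
fastUser.push({ id: "revenue", width: 6 });
store.save(fastUser);

// The slow user resizes a different chart. Their snapshot predates the new
// chart but reaches the server last, so it wins.
const sales = slowUser.find((chart) => chart.id === "sales");
if (sales) sales.width = 8;
store.save(slowUser);

console.log(store.load()); // [ { id: 'sales', width: 8 } ]
```

The resize survives, but the “revenue” chart is gone, even though the two users never touched the same chart.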
Problem 2 - Horizontal Scalability
A similar problem emerges at the server level once we try to scale our back end horizontally. With horizontal scaling, we deploy multiple servers, and if one server is slightly slower, the same issue can happen. Again, the slower server wins and overwrites the whole state with a version that may be only slightly older than the most recent one, yet vastly different from it. This can cause multiple refreshes of the embedded dashboard in quick succession.
Breaking Down What State Means
The state of a Custom Canvas dashboard can be broken down as follows:
- What charts are on a dashboard
- How those charts are configured
- Where those charts are positioned
- How those charts are sized
Separating state out like that shows us another way to represent state: as a series of actions. Let’s break our state down further and describe it by the smallest possible actions in our system:
- Adding a chart
- Removing a chart
- Changing a chart’s settings
- Reordering a chart
- Resizing a chart
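One way to model these actions is a discriminated union of event types plus a pure reducer that applies a single event to the state. This is a sketch under assumptions: the field names (`settings`, `width`, `height`, `toIndex`) are hypothetical, and a chart’s position is modelled as its index in an array rather than as grid coordinates.

```typescript
// Hypothetical chart model; field names are illustrative only.
interface Chart {
  id: string;
  settings: Record<string, unknown>;
  width: number;
  height: number;
}

// A chart's index in the array is its position on the dashboard.
type DashboardState = Chart[];

// One variant per action in the list above.
type DashboardEvent =
  | { type: "addChart"; chart: Chart }
  | { type: "removeChart"; chartId: string }
  | { type: "updateSettings"; chartId: string; settings: Record<string, unknown> }
  | { type: "moveChart"; chartId: string; toIndex: number }
  | { type: "resizeChart"; chartId: string; width: number; height: number };

// Pure reducer: returns a new state and never mutates its input.
function applyEvent(state: DashboardState, event: DashboardEvent): DashboardState {
  switch (event.type) {
    case "addChart":
      return [...state, event.chart];
    case "removeChart":
      return state.filter((chart) => chart.id !== event.chartId);
    case "updateSettings":
      return state.map((chart) =>
        chart.id === event.chartId
          ? { ...chart, settings: { ...chart.settings, ...event.settings } }
          : chart,
      );
    case "moveChart": {
      const from = state.findIndex((chart) => chart.id === event.chartId);
      if (from === -1) return state; // Unknown chart: ignore the event.
      const next = [...state];
      const [moved] = next.splice(from, 1);
      next.splice(event.toIndex, 0, moved);
      return next;
    }
    case "resizeChart":
      return state.map((chart) =>
        chart.id === event.chartId
          ? { ...chart, width: event.width, height: event.height }
          : chart,
      );
  }
}

// Replaying the recorded events from an empty dashboard rebuilds the state.
const history: DashboardEvent[] = [
  { type: "addChart", chart: { id: "a", settings: {}, width: 4, height: 3 } },
  { type: "addChart", chart: { id: "b", settings: {}, width: 4, height: 3 } },
  { type: "resizeChart", chartId: "a", width: 8, height: 3 },
  { type: "moveChart", chartId: "b", toIndex: 0 },
];

const state = history.reduce(applyEvent, [] as DashboardState);
console.log(state.map((chart) => `${chart.id}:${chart.width}`)); // [ 'b:4', 'a:8' ]
```

Keeping the reducer pure (no mutation, no I/O) is what makes it safe to replay the same events any number of times, on any server, and get the same result.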
If we record all actions a user takes, we can replay them to arrive at the same state they currently have. Even better, we can use a technique called server reconciliation to merge two users’ states together. Once all events have arrived, we order them by their timestamps, recalculate the state from them, and only then send the resulting state back to the clients.
Let’s go back to the earlier example. As soon as the event that resizes a chart comes in, it is merged with the event that just added a new chart. The resulting state contains both changes and is sent back to both users over WebSockets.
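Server reconciliation as described here can be sketched as a generic function: merge the event lists from all clients, sort them by timestamp, and replay them through a reducer. The `TimedEvent` shape, the tie-break on `id`, and the tiny chart model in the usage part are assumptions for illustration. The usage part replays the scenario from the problem section: one user adds a chart while another, slower user resizes a different one.

```typescript
// Assumption: each event carries a millisecond timestamp and a unique id;
// the id only breaks ties between equal timestamps.
interface TimedEvent<E> {
  id: string;
  timestamp: number;
  payload: E;
}

// Merge the event lists from all clients, order them by timestamp, and replay
// them from the base state. Because of the sort, the result does not depend
// on the order in which the lists reached the server.
function reconcile<S, E>(
  base: S,
  eventLists: TimedEvent<E>[][],
  apply: (state: S, event: E) => S,
): S {
  const all: TimedEvent<E>[] = [];
  for (const list of eventLists) all.push(...list);
  all.sort((a, b) => a.timestamp - b.timestamp || a.id.localeCompare(b.id));
  return all.reduce<S>((state, event) => apply(state, event.payload), base);
}

// Usage: a deliberately tiny state model, keyed by chart id.
type Charts = Record<string, { width: number }>;
type ChartEvent =
  | { type: "add"; chartId: string; width: number }
  | { type: "resize"; chartId: string; width: number };

function applyChartEvent(state: Charts, event: ChartEvent): Charts {
  // Resizing a chart that does not exist is ignored.
  if (event.type === "resize" && !(event.chartId in state)) return state;
  return { ...state, [event.chartId]: { width: event.width } };
}

const base: Charts = { sales: { width: 4 } };

// User A adds a chart.
const userA: TimedEvent<ChartEvent>[] = [
  { id: "a1", timestamp: 1000, payload: { type: "add", chartId: "revenue", width: 6 } },
];
// User B, on a slower connection, resizes a different chart.
const userB: TimedEvent<ChartEvent>[] = [
  { id: "b1", timestamp: 1005, payload: { type: "resize", chartId: "sales", width: 8 } },
];

// Arrival order does not matter: passing user B's list first gives the same result.
const merged = reconcile(base, [userB, userA], applyChartEvent);
console.log(merged); // { sales: { width: 8 }, revenue: { width: 6 } }
```

Sorting by client timestamps assumes client clocks are roughly in sync. A variant of this sketch could stamp events when they reach the server instead, trading fidelity to the users’ actual order for immunity to clock drift.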
Keyed State, not User State
To stay as flexible as possible, we reference state by a key that carries no further meaning. This way, a user can have multiple dashboard states, or multiple users can access the same dashboard, while account management is left to our customers.
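A keyed store can be as simple as a map from an opaque key to an event log. The in-memory `KeyedEventStore` below is a hypothetical sketch (a real system would persist the logs), and the key formats are made up to show that the store does not care what a key represents.

```typescript
// Hypothetical in-memory store: one event log per opaque key. The key could
// be a user id, a team id, or a dashboard id; the store attaches no meaning.
class KeyedEventStore<E> {
  private readonly logs = new Map<string, E[]>();

  // Append an event to the log for this key, creating the log if needed.
  append(key: string, event: E): void {
    const log = this.logs.get(key);
    if (log) {
      log.push(event);
    } else {
      this.logs.set(key, [event]);
    }
  }

  // Return the events for a key, or an empty list for an unknown key.
  events(key: string): readonly E[] {
    return this.logs.get(key) ?? [];
  }
}

const store = new KeyedEventStore<string>();
// One user with two dashboards...
store.append("user-42:sales", "addChart");
store.append("user-42:marketing", "addChart");
// ...and one dashboard shared by a whole team.
store.append("team-7", "addChart");
store.append("team-7", "resizeChart");

console.log(store.events("team-7")); // [ 'addChart', 'resizeChart' ]
```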
Squashing State to Reduce Events
Representing state as a series of events works fine for a while, but as the number of events grows, replaying them becomes a performance concern. Luckily, we don’t have to keep the full timeline to keep the benefits. Replaying the oldest x changes into a new base state, squashing them, solves this problem. However, keeping the event history unsquashed also has an advantage: it allows for debugging and rollback.
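Squashing can be sketched as folding the oldest `count` events into the base state with the same reducer used for replay. `Squashable`, `squash`, and `currentState` are hypothetical names, and a running total stands in for dashboard state to keep the example short.

```typescript
// Hypothetical shape: a base state plus the events recorded after it.
interface Squashable<S, E> {
  base: S; // state produced by all events that were squashed
  events: E[]; // events recorded after the base, oldest first
}

// Fold the oldest `count` events into a new base state and keep the rest.
function squash<S, E>(
  log: Squashable<S, E>,
  count: number,
  apply: (state: S, event: E) => S,
): Squashable<S, E> {
  const oldest = log.events.slice(0, count);
  const remaining = log.events.slice(count);
  let base = log.base;
  for (const event of oldest) base = apply(base, event);
  return { base, events: remaining };
}

// The current state is always the base plus the remaining events.
function currentState<S, E>(log: Squashable<S, E>, apply: (state: S, event: E) => S): S {
  let state = log.base;
  for (const event of log.events) state = apply(state, event);
  return state;
}

// Usage with a running total as a stand-in for dashboard state.
const add = (total: number, delta: number): number => total + delta;
const log: Squashable<number, number> = { base: 0, events: [1, 2, 3, 4, 5] };
const squashed = squash(log, 3, add);

console.log(squashed); // { base: 6, events: [ 4, 5 ] }
console.log(currentState(log, add), currentState(squashed, add)); // 15 15
```

The current state is unchanged by squashing; what is lost is the ability to inspect or roll back the individual events that were folded into the base.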
Debugging Timeline
A secondary benefit of storing all changes is that it makes debugging a lot easier. Instead of searching through log files, we have a timeline of state changes available should something break. Especially when releasing a brand-new feature, this can be an invaluable time-saver for tracking down edge-case bugs.
Should an invalid state emerge, we can not only track down how it happened but also fix it without resetting the whole dashboard. Because all events are kept, we can roll back to the last working state by simply deleting all events that occurred after the state broke.
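The rollback described above could look roughly like the following sketch. It assumes some validity check `isValid` exists for the dashboard state; `findBreakingEvent`, `rollBack`, and the toy width model are hypothetical names chosen for illustration.

```typescript
// Replay events one at a time and return the index of the first event after
// which `isValid` fails, or -1 if the state never becomes invalid.
function findBreakingEvent<S, E>(
  base: S,
  events: E[],
  apply: (state: S, event: E) => S,
  isValid: (state: S) => boolean,
): number {
  let state = base;
  for (let i = 0; i < events.length; i++) {
    state = apply(state, events[i]);
    if (!isValid(state)) return i;
  }
  return -1;
}

// Roll back by deleting the breaking event and everything recorded after it.
function rollBack<E>(events: E[], breakingIndex: number): E[] {
  return breakingIndex === -1 ? events : events.slice(0, breakingIndex);
}

// Toy model: chart widths keyed by chart id; a width of 0 or less is invalid.
type Widths = Record<string, number>;
type WidthEvent = { chartId: string; width: number };
const setWidth = (state: Widths, event: WidthEvent): Widths => ({ ...state, [event.chartId]: event.width });
const allPositive = (state: Widths): boolean => Object.values(state).every((width) => width > 0);

const base: Widths = {};
const timeline: WidthEvent[] = [
  { chartId: "sales", width: 4 },
  { chartId: "revenue", width: 6 },
  { chartId: "sales", width: 0 }, // a hypothetical buggy resize
  { chartId: "revenue", width: 8 },
];

const breaking = findBreakingEvent(base, timeline, setWidth, allPositive);
const restored = rollBack(timeline, breaking);
console.log(breaking, restored.length); // 2 2
```

Deleting everything after the breaking event also discards later, valid changes (the final resize here), which is the trade-off of the simple approach the text describes.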







