Pushing and Pulling: Three reactivity algorithms


Something I rarely see addressed in articles about reactivity systems is how the desired syntax/developer experience affects the algorithms.

For example, I think all the algorithms discussed in this article require knowing the graph topology up front. Even dynamic dependencies need to be known ahead of time, unless I'm misreading.

However, take Vue's reactivity for example. Here's a simple input and output:

    const input = ref(1);
    const output = computed(() => input.value + 1);
Without actually attempting to evaluate the closure that defines `output`, there's no way of knowing what it depends on.*

My working understanding of such systems, which I think are more or less similar to all the new "signals" libraries popping up in JS, is that they are... pull-push systems? First, the UI code (i.e. the HTML or JSX template functions) requests the value of an output. The reactivity system evaluates the graph (pull), recording dependencies as it goes.

Then, later, when an input changes, it can use the graph it built up to update (push) dirty states and work out which of the currently-live output nodes need to be re-evaluated.

*The only other approach would be to analyse the syntax of the source code itself to create that static dependency graph. Which I understand is what e.g. Svelte does.

Yeah, the UX/DX of turning these algorithms into something usable is really interesting, and something I didn't get to talk much about.

With the variations on the push algorithm, you do kind of need to know the graph topology ahead of time, at least to be able to traverse it efficiently and correctly (this is the topological sorting thing). But for pull (and therefore for push-pull), the dependencies can be completely dynamic - in a programming language, you can call `eval` or something, or in a spreadsheet you could use `indirect` to generate cell references dynamically. For push-pull specifically, when you evaluate a node you would generally delete all of its upstream connections (i.e. for each cell it depends on currently, remove that dependency) and then rebuild that node's connections to the graph while evaluating it.
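A tiny sketch of that "delete the upstream connections, then rebuild them while evaluating" step, with hypothetical names, might look like this in TypeScript:

```typescript
// Hypothetical types for illustration only.
type Dep = { subscribers: Set<Node> };
type Node = { deps: Set<Dep>; evaluate: () => void };

let currentNode: Node | null = null;

// Called from inside a node's body whenever it reads a dependency.
function track(dep: Dep) {
  if (currentNode) {
    dep.subscribers.add(currentNode);
    currentNode.deps.add(dep);
  }
}

function reevaluate(node: Node) {
  // 1. Disconnect the node from everything it currently depends on.
  for (const dep of node.deps) dep.subscribers.delete(node);
  node.deps.clear();
  // 2. Re-run its body; calls to track() rebuild the edges, so fully
  //    dynamic dependencies (branches, eval, INDIRECT-style cell
  //    references) come out correct for the current state.
  currentNode = node;
  try {
    node.evaluate();
  } finally {
    currentNode = null;
  }
}
```

If a node's body takes a different branch on the next run, the stale edges are gone and only the freshly tracked ones remain.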

Signals libraries are exactly where I found this concept, and they work basically like you describe. I think this is a big part of what makes signals work so well in comparison to, say, RxJS - they're basically the same concept (here's a new reactive primitive, let's model our data in terms of this primitive and then plug that into the UI, so we can separate business logic from rendering logic more cleanly), but the behaviour of a signal is often easier to understand because it's not built from different combinators but just described in "ordinary" code. In effect, if observables are monads that need to be wired together with the correct combinators, then signals are do-notation.

With your push-pull algorithm, were you considering that the graph had already been built up, e.g. by an earlier pull phase? And the push-pull bit is just for subsequent updates? If so, then I think I'm following :).

I've been working in the context of reactivity on the backend where you're often loading up the calculations "from scratch" in a request.

I agree with your monad analogy! We looked into using something like Rx in the name of not reinventing the wheel. If you build out your calculation graph in a DSL like that, then you can do more analysis of the graph structure. But as you said in the article, it can get complicated. And Rx and co are oriented towards "push" flows where you always need to process every input event. In our context, you don't necessarily care about every input if the user makes a bunch of changes at once; it's very acceptable to e.g. debounce the end result.

Svelte 5 does runtime dependency wiring afaik.

From what I understand, Svelte 4 calculated dependencies at compile time. Whereas Svelte 5 does it automatically at runtime based on which values are accessed in tracking contexts (effect / computed value callbacks). This means objects marked as reactive have to be wrapped in proxies to intercept gets / sets. And plain values have to be transformed to objects at compile time. The deps are also recalculated after every update, so accessing state in conditionals can cause funkiness.

But after working with Svelte 5 daily for a few months, I don't think I like the implicitness. For one, reactivity doesn't show up in type signatures, so I have to remember what's reactive and what's not. Another is that making plain values (strings, numbers, etc.) reactive is really a trap. The reactivity doesn't cross file boundaries, so you have to do weird things like exporting a getter / setter instead of the variable.

I've worked through the same process in SolidJS, which had the dynamic dependency tracking from the beginning.

I agree that not seeing reactivity in the type system can be irritating. In theory, you can wrap reactive elements in `Computed` objects (Angular's signals have this, I believe) so you can follow them a bit better. The problem is that you can still accidentally end up with implicitly reactive values, so it only works as a kind of opt-in "here be reactivity" signal; you can't guarantee that reactivity has been ruled out just because you don't see a `Computed`.

That said, I find I eventually built up a good intuition for where reactivity would be, usually with the logic that functions were reactive and single values weren't, kind of like thunks in other contexts. For me, at least, it feels much simpler to have this implicit tracking, because then I don't need to define dependencies explicitly, but I can generally see them in my code.

I agree with all of that. With reactive systems I prefer just making objects deeply reactive and using them as view models, while limiting the amount of derived / computed values. Both Vue and Mobx work well for this.

Oh, that's me! Feel free to ask me any questions.

There's some great discussion over on lobste.rs (https://lobste.rs/s/2zk3oe/pushing_pulling_three_reactivity), but I particularly recommend this link that someone posted there to a post that covers much of the same topic but in a lot more detail with reference to various existing libraries and tools and how they do things: https://lord.io/spreadsheets/

https://lord.io/spreadsheets/ is one of my favorite technical blog posts of all time. Highly recommend everyone check it out!

Yours might go a little less into the details, but it's really clear, and I like the diagrams and the explanation around glitch hazards. Please do follow up on your tangents if you have time.

I really enjoyed your post and was surprised to see it not posted here. I guess now I can leave the comment I wasn't able to leave on lobste.rs :)

The format made for good lunchtime reading -- the care you put into making it easily readable shows. Are the illustrations actually hand-drawn? Looking forward to the next part(s) that you hinted at!

The illustrations are hand-drawn on my tablet, and then converted to SVG and touched up via an incredibly laborious and probably fairly unnecessary process.

Not the author, but when I want to make diagrams like this, I usually use tldraw! It has a nice nearly-hand-drawn feel that's casual and approachable for these kinds of sketches.

I've been working on a reactivity system for Rust over the past couple of years, which uses a lot of these ideas! It also tries to make random concurrent modification less of a pain, with transactional memory and CRDT stuff. And gives you free undo/redo.

Still kind of WIP, but it isn't secret. People are welcome to check it out at https://gitlab.com/samsartor/hornpipe

When I first started working with dataflow computation I was fortunate to have a computer scientist point me in the direction of an introductory compiler textbook.

It's worth considering that the dataflow graph (as an abstract mathematical graph), the computation graph (the partial order of function execution required to compute the data), the traversal strategy, the runtime representation of the graph, the runtime data structure for the graph, and the runtime data structures for efficient reactive update are all separate but related aspects.

For instance, push and pull are both directed graphs. They have the same connectivity, but the direction of the arrows is reversed. You can only efficiently traverse edges in the direction that you represent. A dataflow graph has edges pointing from sources to sinks; a data dependency graph has edges pointing from sinks to sources. [Side note: if a computation can produce multiple results, the data dependency graph and the computation dependency graph are not exactly the same thing and you need to be clear on the distinction, but I am assuming here single-output nodes.]

In a dataflow graph you want to evaluate the changed nodes prior to evaluating the downstream nodes that depend on them. As TFA states, this necessitates a postorder (children first) traversal of the data dependency graph, starting at all terminal sinks and terminating at sources or already-visited nodes. You can use a sense-reversing "visited" flag on each node to avoid a reset pass. As noted in the article, this traversal need only be performed when the graph topology changes.

But for a stable traversal order, the topological sort can be cached in an array. Needless to say, arrays are much faster to iterate over than any kind of pointer chasing. [Witness the rise of Entity-Component systems over OO models.] I suspect that there is a cut-over point where it is more efficient to iterate the entire array (perhaps with memoized results, or JIT compilation) than to perform a more surgical "update only what is downstream of the changes" approach.

Another approach is to assign all nodes a contiguous integer id and maintain a dirty-node bitmask where bit indices correspond to node ids. In addition, each source has a bitmask with a 1 for every downstream dependent node. When a source changes, bitwise-or the source's downstream_dependents bitmask into the global dirty_nodes bitmask. To evaluate (not necessarily immediately), iterate in topological order, processing only the dirty nodes.
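That bitmask scheme can be sketched in a few lines of TypeScript. The node ids, masks, and the little example graph below are all invented for illustration:

```typescript
// Global dirty mask: bit i set => node i needs re-evaluation.
let dirty = 0n;

// Precomputed per-source masks of downstream dependents (in a real system
// these would be derived from the dependency graph; here they are made up:
// changing node 0 dirties nodes 1 and 2, changing node 3 dirties node 4).
const downstreamOf: Record<number, bigint> = {
  0: 0b00110n,
  3: 0b10000n,
};

function markChanged(sourceId: number) {
  // Bitwise-or the source's downstream-dependents mask into the dirty mask.
  dirty |= (1n << BigInt(sourceId)) | (downstreamOf[sourceId] ?? 0n);
}

function evaluateDirty(topoOrder: number[], evalNode: (id: number) => void) {
  // Walk the cached topological order, processing only the dirty nodes.
  for (const id of topoOrder) {
    if (dirty & (1n << BigInt(id))) {
      evalNode(id);
      dirty &= ~(1n << BigInt(id));
    }
  }
}
```

The cached topological order is just an array of ids, so the evaluation pass is a linear scan plus bit tests, with no pointer chasing.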
In any case, the point I'm trying to make is that the data structure that is best for building or manipulating the graph could very well be different from the data structure that is best for computing the desired results. There will be trade-offs to be made. For this reason alone it's best to keep the graph-theoretic properties and the implementation data structures separate in your head.

In my view the interesting requirements raised by the article are (1) lazy evaluation (e.g. of expensive or conditionally required data), which might be where control flow graphs of basic blocks enter the story; and (2) dynamic reconfiguration during node evaluation. Some questions I'd be asking about dynamic reconfiguration are: what happens if you delete a node that has yet to be evaluated? Will new subgraphs be "patched in" to the existing graph (and how, exactly), or are they always disconnected components that can be evaluated after the current graph traversal completes?