Optimizing Performance with Robin Hansen

We talk about Robin's work optimizing Elm, and the opportunities that remain for Elm performance improvements.
January 31, 2022


Hello, Jeroen.
Hello, Dillon.
And once again, we're back with Robin Hansen.
Thanks so much for coming back on, Robin.
Thanks for having me on.
What's it been, a year?
Yeah, I haven't talked to you all year long.
It's really exciting to sit down with you.
And Jeroen, I have a feeling you're going to be itching to ask a bunch of performance
related questions because today we're talking about performance with Robin, the Elm Performance
Guy and Jeroen, the guy who's trying to dethrone Robin as the one who's optimized performance
most in Elm.
Yeah, I'm looking forward to it.
I let you win one game and now you're all confident.
You will never win again.
Yeah, so yeah, this is really exciting.
I think it's kind of an exciting time for performance stuff in Elm.
I think maybe these things have been happening in back channels right now, but I think we
might be seeing some performance improvements in Elm Optimize Level 2, which we've talked
about in previous episodes.
It's just a sort of post processor that goes in and tweaks the Elm compiled output to do
some performance optimizations.
So I think we've got some exciting stuff coming.
So I'm curious, before we get into some of these details about these performance optimizations
and everything, you've got a long history of doing performance work in Elm, working
on these data structures and benchmarking things.
Why do you do it?
Like, why do you care about Elm performance?
Okay, so there are two answers to this question.
The first one is like what people want to hear.
And the second answer to this question is the truth.
I think what people want to hear is that performance is really, really important.
I think the worst thing that can happen to Elm is that someone sits down,
writes a production app, and then it's laggy.
And for a language with a relatively small following, like Elm, where people might not
know how to fix a laggy application, that would be bad for the reputation of the language
and further adoption of the language.
So performance should not be your primary concern when doing the stuff that Elm is good at.
Because most of the time, optimizing for performance is simply not going to matter for the sort
of applications that you typically do with Elm.
But if you do get a performance problem, I think that would be very bad for Elm.
And so I've been working on performance things simply because I don't want people to have
a performance problem.
Wait, now, is that the truth?
Or is that what people want to hear?
The truth.
That is a true answer.
But really what got me into this is fixing performance things or improving performance
problems is a relatively simple and fun activity.
Because if you do it correctly, no one is going to notice anything.
And so you don't have to go through a lot of API design discussions.
There's a lot less things to consider.
So it's a relatively easy thing to get into.
And it's also a relatively easy thing to measure the improvements of.
And of course, if you can improve something...
And you can probably attest to this, Jeroen.
If you make something 10 times faster or 50 times faster, it feels kind of good.
Kind of.
Slightly good.
It's a hell of a drive.
It's super exciting.
So it's fun.
But it's also important.
I think to avoid that.
To avoid people having a bad experience with Elm.
Although in most cases, people won't have one.
So Elm is a pretty high level language.
Like you were describing, if people get painted into a corner and there's a performance issue,
they might not have much they can do about it with Elm because it's pretty high level.
It doesn't give you a lot of control over expressing low level things that would affect performance, in the way that a language like Rust maybe would.
But at the same time, on the other side of the coin, because it's this high level, very
declarative and pure language, does that give you the opportunity to do more with performance
because it's more constrained?
Both yes and no.
Like, so in Elm you have...
Well, for the HTML library specifically, you have the Html.Lazy namespace, which provides functions that allow you to avoid computation in the cases where nothing has changed.
And the reason why that is a good optimization when you can apply it and the reason it works
and is very, very fast is because of Elm's purity.
So you can do the same things in React, but it requires that you have made sure that everything
is pure.
And when you do need such an optimization in React, I think you are going to have a
problem applying that optimization because things aren't pure by default.
And so there are definitely certain things which are much, much easier in Elm because
of purity.
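To make the point about purity concrete, here is a rough sketch (not Elm's actual implementation, and `lazyView` is a made-up name) of the caching trick that something like Html.Lazy relies on: cache by reference, and skip the work when the same reference comes back. In an impure language this cache could return stale results; with immutable data it cannot.

```javascript
// Sketch of the idea behind lazy rendering (hypothetical, simplified):
// remember the last argument and result, and skip recomputation when
// the same reference is passed again. With immutable data, "same
// reference" safely implies "same value", so the cache is never stale.
function lazyView(render) {
  let lastArg, lastResult, called = false;
  return function (arg) {
    if (called && arg === lastArg) {
      return lastResult; // identical reference: reuse cached output
    }
    called = true;
    lastArg = arg;
    lastResult = render(arg);
    return lastResult;
  };
}

// Usage: the expensive render runs only when the model reference changes.
let renders = 0;
const view = lazyView(function (model) {
  renders += 1;
  return "<p>" + model.name + "</p>";
});

const model = { name: "Robin" };
view(model);
view(model); // cache hit: render is not called again
```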
But on the other hand, there are things which are harder because of purity as well.
Like a dictionary maybe.
So that doesn't necessarily mean that data structures can't be faster in a pure language
compared to a language which allows you to use mutable data structures.
So one example of this is the dictionary implementation in Clojure, the HashMap implementation in
Clojure more specifically.
It turns out that when reading from a Clojure HashMap, admittedly when you have a HashMap
consisting of maybe like five or six million entries.
Kind of big.
Which you do hardly ever.
But in the case you have such a big dictionary, it turns out that Clojure can actually be
faster for reading from said dictionary simply because of the tree structure which makes
it more cache friendly than your typical mutable HashMap, which is one contiguous array.
So it can be faster by doing things in a purer way.
But you will normally struggle to make it as fast as mutable alternatives because you
have to copy a lot of stuff around.
Do you need to have immutability under the hood in an immutable language?
Because, I mean, Richard has been talking a lot about these types of optimizations in this Roc language that he's been developing.
We'll link to a talk where he goes into some details on this.
But he uses some optimizations under the hood to perform mutation when possible in a way
where the user doesn't have the ability to mutate data.
But the compiler might see, well, the user won't notice that I've mutated something as
far as they're concerned.
They have the illusion of immutability.
And that's all we need.
So like, does that trade off apply to optimizing stuff in Elm?
Or for practical reasons, is that not a good approach?
Or for philosophical reasons, is that not the desired approach?
So if you can do it, then you can definitely get a lot of performance out of that.
And Roc has, at least from what I've seen, proven that you can have almost as fast code
written in a purely functional language as long as the compiler is able to utilize these
tricks under the hood.
And it's important to say that we don't really care about things actually being pure under
the hood, as long as you have the illusion of that being the case.
But currently in Elm, I don't think we make use of such optimizations.
No, that's kind of what I'm researching at the moment.
Like, some of the optimizations that Roc does are kind of what I'm looking at at the moment.
There are some good results, but it's also limited in what you can and cannot do.
And I think that Roc has much more solid foundations to do it at the moment.
Yeah, when it's baked into the core of what the compiler is attempting to do, then the
compiler can track information around where a mutation happens and optimize for that.
But another very important aspect is that Roc doesn't have to compile to JavaScript.
And so it has a lot more control over what it can and cannot do.
For good and bad, you know, compiling to JavaScript is a lot easier.
But you lose some control along the way.
One thing that I'm thinking of which Elm does do and which most functional languages do
is tail call optimization.
Now tail call optimization isn't done first and foremost for performance.
It's done for safety.
So for those who don't know, tail call optimization is when you have a recursive function
where the result of the recursive call is the result of the calling function.
In that case, it will not actually be compiled down to a function calling itself over and over.
It will be compiled down to a while loop.
And that is to avoid adding frames to the stack and eventually causing a stack overflow.
That's the main use of it.
But because you avoid a lot of function calls, you also increase performance a lot.
So that's a case where the language only allows you to use functions and functions calling functions.
But as long as we keep the illusion that that is what is happening, we don't really care
about how it's compiled.
And so compiling it down to a while loop is perfectly fine and faster and safer.
Yeah, so while loop plus mutations as well.
Otherwise it doesn't make much sense.
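The while-loop-plus-mutation transformation can be sketched like this (illustrative function names, not actual compiler output): the recursive call becomes reassignment of the parameters plus another trip around the loop.

```javascript
// A tail-recursive sum, written the way you would in Elm: the recursive
// call is the last thing the function does, so its result is the result.
function sumRecursive(acc, list) {
  if (list.length === 0) {
    return acc;
  }
  return sumRecursive(acc + list[0], list.slice(1)); // tail call
}

// Roughly what tail call elimination turns it into: the recursive call
// becomes mutation of the parameters plus another iteration of a while
// loop, so the call stack never grows no matter how long the list is.
function sumLoop(acc, list) {
  while (true) {
    if (list.length === 0) {
      return acc;
    }
    acc = acc + list[0];
    list = list.slice(1);
  }
}
```

Both versions compute the same thing; only the loop version is safe from stack overflow on very long lists.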
Yeah, Jeroen, I think you've been trying to make more opportunities for tail call recursion
so that the Elm compiler isn't as limited in where it can apply that optimization, right?
And very promising results so far, but that's all I will say at the moment.
So that's sort of like, when I think about all this performance stuff, one of the things
that I think about is this idea of a compiler.
So like, for example, Svelte and the creator of Svelte, Rich Harris, talks a lot about
this idea of, he talks about a compiler for JavaScript and for JavaScript front end apps.
And the way he talks about it, he says, hey, we've got, instead of just writing interpreted
code, what if we had something that could be more intelligent and could understand how
to help us do what we're trying to achieve by understanding things better?
That's kind of how he talks about a compiler.
In Elm, it's almost like water to a fish.
Compiler is just such a ubiquitous concept in Elm that we almost don't think of it.
But what can the compiler do, knowing what it knows, to make our job easier?
So ideally, we shouldn't have to know this particular way of writing something is more
efficient than this other way, because the compiler can deduce that, especially with
like a pure language.
And so I find this to be like one of the really interesting things in Elm in particular is
how sophisticated can we get with the work that the compiler can take on to optimize
things intelligently for us?
That's a very good point.
And there are a bunch of things that the Elm compiler can do, knowing the semantics of
the language.
So currently, if you do a simple operation like checking two objects for equality, say, a value based equality check of two objects in JavaScript, it would be hard to get something that works fast and is safe from a stack overflow perspective, because doing that isn't baked into the language.
You'd have to write code making sure that all the contents of two objects are in fact exactly the same.
It also has to be unambiguous, like, do you check the prototype of the object?
Yes, exactly.
And so that is actually surprisingly difficult in JavaScript to get that working 100% of
every single case.
In Elm, it's very simple.
First of all, because it's baked in, but also because of not allowing mutation, the implementation
of equality checking can actually be a shallow comparison, because you know that two objects
that have the same identity are also equal.
And so you can skip a lot of the work necessary to check two objects for equality.
And so having a compiler that understands, or which places certain restrictions on, how you write code can in fact make certain things a lot easier and more performant when compiled.
So if you look at the output of the Elm compiler, the JavaScript it produces, if you look at
how equality is implemented, if you were to hand that over to a JavaScript developer and ask, does this perform a deep equality check?
They would say no, there are tons of issues with this.
But in the context of Elm, it works just fine, because it can rely on the fact that mutation
doesn't happen and these sorts of things.
Like, in JavaScript, if you have two objects with the same identity, that doesn't necessarily mean that the object hasn't changed.
But in Elm that is in fact true.
So there's a bunch of stuff you can do knowing all the restrictions that Elm places on you.
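A simplified sketch of that equality shortcut (not the actual compiler output, where the helper is part of the runtime): the reference comparison on the first line is only sound because Elm forbids mutation.

```javascript
// Simplified sketch of an Elm-style structural equality check. The key
// line is the first one: with no mutation anywhere, two values sharing
// an identity are guaranteed to be equal, so the deep walk can be
// skipped entirely. A general-purpose JavaScript deep-equal cannot
// rely on that.
function structuralEq(a, b) {
  if (a === b) {
    return true; // same reference implies same value in an immutable world
  }
  if (typeof a !== "object" || a === null || b === null) {
    return a === b; // primitives (and null) compare directly
  }
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) {
    return false;
  }
  return keysA.every(function (k) {
    return structuralEq(a[k], b[k]);
  });
}
```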
Yeah, it's really interesting.
This blog post series you wrote about successes and failures in optimizing Elm's runtime performance,
which we'll link to in the show notes, you talk a lot about essentially how there are
all these optimizations baked into V8, which really are sort of heuristics based optimizations, right?
Because JavaScript is an interpreted language.
And then you have this sort of just in time compiler, which applies heuristics, which
can then get deoptimized.
That's why they're heuristics, because it's interpreting things as it goes and saying,
oh, hey, this will probably make it perform better.
And then it assumes that the shape of an object has these fields.
And then suddenly, boom, now there's a null in there that it didn't expect, or now something
is a string that was an int elsewhere.
And now it deoptimizes.
So it's doing these heuristics.
And as somebody doing these performance tunings in Elm compiler output, you're doing this
strange work of sort of trying to understand those heuristics and trying to activate the
heuristics in a way that they can predict Elm.
But you're not predicting it.
You know it because it's statically compiled code.
But you're trying to get this like just in time optimization to kick in in those places.
So it's like a weird it's a weird dance, isn't it?
Yeah, so that really boils down to the fact that the just in time compiler understands JavaScript very, very well and has to account for all the sorts of stuff you can do in JavaScript.
And there are certain things that you can't do in Elm and certain things you can do, which
the JavaScript just in time compiler naturally has no knowledge about.
So really a lot of the stuff that I've done with this performance work is so Elm makes
it so that these things are always true.
How can I tell that to the JavaScript just in time compiler?
How can I make a JavaScript engine understand these things?
And that is sometimes very hard.
I actually have no clue how you would do that.
Is it just you write you transform the code to something that is relatively simple or
something like that?
Yeah, so one thing that the Elm compiler does today, which was originally done to reduce asset size but which has a very cool performance benefit, is that when it reads your entire Elm project and compiles it into JavaScript, it compiles all your Elm code and all the dependencies and the core library, the runtime, everything into one single namespace.
And when you call functions, and if you don't run this through Elm Optimize Level 2, if you call single-arity functions, then there are two things that come out of this.
One is that the engine can see the function in scope.
And so it knows that the function cannot be null, because it's right there.
And second of all, it knows that it's actually a function and not some crazy thing that evaluated to a function.
So by having functions in the same scope, and readily available, the JavaScript engine
can infer a surprising amount of things about that function.
It doesn't have to look it up in the window or global, for instance, that would have a
performance cost.
And so the natural way to do namespacing in JavaScript is to create an object with certain fields, and those fields point to functions, say.
But in Elm, you're just referencing a local function, a function that resides within the local scope.
So you know it's a function, you know it's not null, you don't have to look it up in an
object, which means you don't have to check is this an object, is the object referencing
actually there?
And if that property exists, is it null, right?
So there are a bunch of things that the compiler just doesn't have to deal with, because it
can see the function in the local scope.
And V8 understands that it makes it run faster than if you had to go through objects with
a lookup, for instance.
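The two styles being contrasted look roughly like this (the `$author$project$...` name mimics the flat naming scheme in Elm's compiled output; the module and function here are invented for illustration):

```javascript
// Style 1: namespacing through an object. At every call site the engine
// has to check that MyModule exists, that the property is there, that it
// isn't null, and that it is in fact callable.
var MyModule = {
  double: function (n) { return n * 2; }
};

// Style 2: the flat style the Elm compiler emits, a plain function in
// one shared top-level scope. The engine can see the function right
// there, so it knows it cannot be null and knows it is a function
// rather than some arbitrary value behind a property lookup.
function $author$project$MyModule$double(n) {
  return n * 2;
}

var viaObject = MyModule.double(21);
var viaLocal = $author$project$MyModule$double(21);
```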
So one thing that I've seen when asking the V8 engine to tell me what steps it goes through to optimize a plain regular JavaScript function into assembly, is that every time you do an object lookup, it will produce this check, which checks: is this thing that I got from this object null?
And that will always happen because in JavaScript, you can always go into a REPL and then add stuff which can change things.
And so even though the just in time compiler can be reasonably certain at some point that
this thing isn't null, that doesn't mean it cannot be null later.
So it always has to like defensively add a bunch of checks.
And that's kind of annoying because we have all those guarantees and then we still have
to re prove it again, kind of like going through paperwork for administration.
You have to send a sign up form, send it over and then do the same one again for another
service or something.
It is very bureaucratic, isn't it?
In a way.
At least it's faster.
So that makes me think about WebAssembly.
And of course, I mean, I think that WebAssembly can seem, in people's minds, like a silver bullet that solves all the performance issues, right?
And it's not as simple as that.
What is WebAssembly?
What is this WebAssembly that you're talking about?
Should we define WebAssembly?
So it is essentially, correct me if I'm wrong, but it is something that gives you lower level
control rather than this like interpreted language of JavaScript that can run natively
in the browsers.
It's something that can be executed natively in the browsers.
It actually has, it's actually typed.
So you write these sort of essentially byte code instructions, right?
And you can have it as a compile target so you can compile Rust or whatever languages
to that compile target.
And it gives you more low level control over memory management.
It doesn't come with built in garbage collection, things like that.
But it gives you more nuanced control over performance and doesn't rely as much on these
heuristics for just in time optimizations.
Is that a fair summary?
It's pretty correct.
So it's a very low level language, in the same way that Java bytecode and .NET bytecode are.
In fact, it's very similar to those sort of things, which most developers don't look at
at all.
But the big difference between WebAssembly and something like Java bytecode is that there are way fewer instructions and way fewer built in things.
Like it doesn't have a garbage collector.
That is one thing it just doesn't have.
It doesn't have strings or any sort of data structure.
All you get is this one huge contiguous array, and some instructions to read from and write to it.
And you get functions and you get four types.
Five if you include functions.
And those four types are 32 bit integer, 64 bit integer, 32 bit float, and 64 bit float.
And that's really all you have to work with.
Regarding the point that people think that WebAssembly will come in and solve all our
performance problems, that's not really true.
Like if you have a compiler that spits out JavaScript that is very easy to optimize, and you have a compiler that compiles into very performant WebAssembly, you can probably expect about the same performance.
However, the thing about WebAssembly is that, since it's not JavaScript, you don't have to do a lot of crazy stuff to get good performance; there is no guesswork involved.
The compiler doesn't have to guess how do I compile this in the most optimal way.
It simply just, okay, these byte codes can be compiled directly into this.
And so it's much faster to compile and it doesn't have to guess how this should be compiled,
which means it doesn't get a lot of stuff wrong.
And the result of that is that you can expect to a much higher degree what the performance
of compiled WebAssembly will be compared to JavaScript.
Because in JavaScript, everything depends on what happens at runtime.
So if you have a very simple program, all it does is that it takes an array of a thousand
elements and wants to call the plus operation on them.
It's a very simple thing to write in JavaScript.
It's relatively simple to write in WebAssembly.
In WebAssembly, if that array contains integers or if it contains strings, it will be pretty
much the same performance if you implement it to support both.
You will get the same performance every time.
In JavaScript, if the just in time compiler only sees arrays of integers, you will get
very good performance.
But if it sees sometimes an array of integers and sometimes an array of strings, then you
will get worse performance than if it only sees integers.
It can't specialize the code as well.
So in WebAssembly, you can write code where I expect it to have this performance profile
and it will pretty much always have that.
Whereas in JavaScript, it all depends on what the just in time compiler sees when the program
is running.
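The monomorphic-versus-polymorphic effect Robin describes can be seen with one tiny function (illustrative, engine behavior varies): the source code is identical, and only the types the JIT observes at runtime differ.

```javascript
// The same summing loop; what differs is what the JIT observes at
// runtime. If sum only ever sees arrays of numbers, the engine can
// specialize `acc + xs[i]` down to machine addition. Feed it strings
// too and the call site becomes polymorphic, so the engine has to keep
// re-checking what `+` means. Same code, different type feedback.
function sum(xs) {
  let acc = xs[0];
  for (let i = 1; i < xs.length; i++) {
    acc = acc + xs[i];
  }
  return acc;
}

const ints = sum([1, 2, 3]);        // number addition
const strs = sum(["a", "b", "c"]);  // string concatenation, same function
```

In a statically typed target like WebAssembly, those two cases would simply be two separately compiled functions with predictable performance each.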
And you also remove all those checks that we mentioned like, is this indeed an integer?
Is this indeed a string?
Those won't have to be done in WebAssembly, but they're done under the hood in JavaScript
all the time.
That's true.
But so there was this blog post and I don't remember the name of it.
I can try to find out later and maybe we can add it to show notes.
But there was this blog post where somebody wrote, I think it was the Firefox team, which
rewrote PDF reader, I think.
They rewrote it in WebAssembly and said, look, it's a hundred times faster or something because
the previous version was in JavaScript.
Well, that's promising.
A PDF viewer it was.
So it was the built in PDF viewer in Firefox.
They rewrote to WebAssembly and it was 50, 100 times faster, something along those lines.
And then there was a followup blog post to that where someone just changed the JavaScript
version and they got about the same performance.
But the thing is, so if you compile to WebAssembly, it is much easier for you to create WebAssembly,
which will give you the best performance.
Whereas in JavaScript, you have to not only know JavaScript very well, but you have to
know how the different JavaScript engines compile optimal code.
And so it's much harder to create JavaScript that compiles and optimizes as well as WebAssembly, in theory, I guess.
So one thing that is pretty tricky with compiling to JavaScript and expecting good performance
is that you need to compare it to multiple implementations of engines.
So you need to run benchmarks on Chrome, on Firefox, on Safari, and they have very different
engines and therefore have very different results on benchmarks.
So if you change some code, sometimes you will have better performance on Chrome and
worse performance on Safari, for instance.
Would that also be the case with WebAssembly?
Would each browser have their own implementation of WebAssembly?
Well yes, they will.
But at the same time, there are only so many ways of compiling a WebAssembly program because
there are very few byte codes and there are very few data structures.
And essentially, there aren't many ways that a single byte code instruction can be compiled.
And so you are likely, so if you compile WebAssembly a specific way, you are likely to get the
best possible performance for that code.
And of course, the Firefox WebAssembly compiler could be a worse compiler than the Chrome
one, but at the very least, you're not relying on how good the compiler is at guessing how
it should optimize the code.
I'm guessing that will be true for the beginning, but maybe not later.
For instance, I'm guessing V8, or actually the engines for the different browsers, were not trying to be smart at the beginning, but then they noticed, oh, we can try to be
smart to improve performance.
And then they just piled improvement over improvement and made it very complex and unintuitive.
And I'm guessing maybe that could be true for WebAssembly as well, maybe not to the
same extent.
So I mean, that's always possible, right?
You always run the risk that Safari adds another WebAssembly specialized compiler, which does
runtime profiling to improve code.
That can of course happen.
But one thing that has happened a lot in my performance work is that...
So when I was implementing the array data structure for Elm, one thing surprised me.
And in my mind, the Elm array, for those who don't know, is a tree structure that if you
have 32 elements or less, it's just a normal JavaScript array.
If you have more than 32 elements, it will become a tree where each level of the tree
has 32 elements.
And so it will grow...
So if you have 60 elements, then the Elm array will be one array with two elements.
Those elements point to arrays where the first array contains the first 32 elements and the
second array contains the next 28.
And as you add more elements, the tree grows.
That was probably not the best summary of how an Elm array works.
But the important thing for this particular story is to know that an array consists of
multiple JavaScript arrays under the hood.
So when I was implementing, the natural thing for me to do was to implement it in terms of the built in JavaScript method instead of writing a for loop and kind of reimplementing it myself.
But it turned out that using the built in method for JavaScript arrays was very fast in Chrome.
But compared to a for loop doing array.push, it was slower in Firefox.
In Firefox, writing the actual loop was way faster than using the built in.
And in WebAssembly, you wouldn't have such a difference.
If you were going to implement it, you would do it pretty much the only way you can in WebAssembly.
And even though the performance can be worse in one browser compared to another, you wouldn't get better performance by doing it in a less obvious way, I guess.
There aren't that many ways of doing the same thing.
And so you can just count on the most obvious thing also being the fastest thing.
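The episode doesn't name the exact built-in Robin benchmarked, but the comparison had roughly this shape: lean on an engine-provided array method, or write the equivalent loop by hand. Both produce the same result; which one is faster has historically differed between Chrome and Firefox.

```javascript
// Copying an array two ways. The results are identical; only the
// performance characteristics differ between engines.

// Delegate to the engine's built-in.
function copyBuiltIn(xs) {
  return xs.slice();
}

// Hand-written loop doing array.push, the style that was measurably
// faster in Firefox at the time.
function copyLoop(xs) {
  const out = [];
  for (let i = 0; i < xs.length; i++) {
    out.push(xs[i]);
  }
  return out;
}
```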
So what did you end up doing with it?
How did you make that choice?
Well, really, since Elm is supposed to be used...
If I were doing this and I only cared about Chrome, then I would do whatever is fastest
for Chrome.
But because Elm can be used in a lot of scenarios, I had to do it the way which overall gave the best result.
And if I remember correctly, the performance difference for Firefox was so big that I ended up prioritizing what was fastest for Firefox, because the difference in Chrome wasn't that big.
So you kind of have to find one solution that works best when all browsers are considered.
So are you secretly hoping for Chrome to just win the competition and be used by everyone?
No, I think this is a slight departure from performance, but I think in the browser space
we're very well served with competition.
So I think the current...
I was sad to see Microsoft just adopt Chromium for their web browser, essentially, even though
I have no fond feelings towards Microsoft.
I think it's good with some competition in the browser space.
Of course, from a performance perspective, it would be nice if everything worked the
same way.
It would make my life a lot easier.
But I think for most people, it would be better with competition in the browser space.
So it seems like it comes down to control, like WebAssembly gives you more control over
Now, if you have more control over performance, that means it's not going to do an optimization that you didn't build into it, which V8 or SpiderMonkey's JIT compilers might do.
And to bring this back to Roc, one of the reasons why Roc can perform a lot of mutations,
which are safe to do in practice without losing purity, is because they have full control
of how the code compiles.
So in JavaScript, you have a garbage collector.
No matter what you do, you are going to create a language which on some level is garbage collected.
So when you're compiling to WebAssembly or regular assembly, you don't have a garbage
collector, which gives you the freedom to implement memory management how you want to.
In Roc, one of the things they've done is that they use a reference counting sort of
garbage collection.
And while that, from a throughput perspective, is in general worse than a tracing garbage
collector, what it gives them is that when they have an object, they know exactly how many references there are to that object.
And if the person who wants to change the object is also the only person who can observe
the object, doing a mutation is perfectly fine.
And so by using reference counting, they can actually get this performance optimization,
which is difficult to get with the garbage collected language.
And so the problem with that level of control is that you have to implement everything yourself.
But the upside is that you can do a lot of things you wouldn't normally be able to do.
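The reference-counting trick can be sketched as a toy model (JavaScript has its own garbage collector, so the explicit `refCount` field here is purely illustrative of what a compiler like Roc's tracks for you): if the updater holds the only reference, mutating in place is unobservable; otherwise a copy preserves the illusion of immutability.

```javascript
// Toy sketch of refcount-based in-place mutation. Not real Roc code;
// the refCount bookkeeping would be generated by the compiler.
function setField(value, key, newVal) {
  if (value.refCount === 1) {
    value.fields[key] = newVal; // sole observer: safe to mutate in place
    return value;
  }
  value.refCount -= 1; // we stop referring to the shared original
  return { refCount: 1, fields: { ...value.fields, [key]: newVal } };
}

const unique = { refCount: 1, fields: { x: 1 } };
const updatedUnique = setField(unique, "x", 2); // mutated in place

const shared = { refCount: 2, fields: { x: 1 } };
const updatedShared = setField(shared, "x", 2); // copied; original untouched
```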
So whatever the future holds for Elm, Brian Carroll has done some really cool experiments
prototyping WebAssembly output for Elm, which it's sort of an early prototype.
We don't know if that would ever be production ready or if it's just a proof of concept,
but either way, it's very interesting work.
But whatever the future holds for Elm, I kind of wonder what, I mean, in particular, the
two of you, Robin and Jeroen, you've been digging into performance a lot.
Jeroen has been doing that as a passion project lately.
And I wonder, are we scratching the surface for performance stuff in Elm?
Or is there a lot more that we have left?
Because one of the really interesting parts of the Elm story to me is in the early days,
there was a blog post, I think, comparing performance between these different front
end frameworks.
And Elm was one of the top performers, right?
And that's very interesting when you have this very high level language, and you have
these things that, you know, I mean, if it's your cup of tea, things like immutability
are really exciting in terms of reducing the cognitive load of the developer being able
to easily trace what your code is doing.
And it seems like it would be a burden for performance, but then suddenly you're getting
better performance.
And that's one of the really fascinating things to me is how can you take these characteristics
of the Elm language and leverage them to actually be ahead of the pack with performance?
So where do you guys think we are with performance optimizations in Elm?
Because I'm seeing all these like blog posts that you're writing, Robin, and I'm seeing
Jeroen's messages about like his screenshots on Twitter with these large percentage improvements
on certain benchmarks.
So are those things going to keep happening for a while?
Or are we reaching the limit of how much we can optimize Elm's performance?
Go ahead Jeroen.
Yeah, we talked about this in private and Robin said, we probably did the easy stuff.
So what I'm doing is, like, I'm seeing a function and I see a way to improve its performance.
It's mostly just about removing unnecessary work or duplicate work, which happens a lot
more often than expected.
Like if you loop over a list two times, then it's slower than looping over it once.
So I see it a lot in a few functions, and that's just more about how you write those functions.
So it's easy to optimize those.
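The "two loops merged into one" rewrite being described looks something like this (function names are illustrative): the fused version does the same work in a single traversal and skips the intermediate list.

```javascript
// Two traversals of the same list: one pass for map, one for filter,
// plus an intermediate array allocated in between.
function twoPasses(xs) {
  return xs.map(function (n) { return n * 2; })
           .filter(function (n) { return n > 4; });
}

// One traversal doing both jobs, avoiding the intermediate list. This
// is the kind of rewrite an optimizer could do automatically.
function onePass(xs) {
  const out = [];
  for (let i = 0; i < xs.length; i++) {
    const doubled = xs[i] * 2;
    if (doubled > 4) {
      out.push(doubled);
    }
  }
  return out;
}
```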
On a more optimizer level, so a compiler or Elm optimize level two or any other tool to
make all those manual changes not necessary, that would be a lot more work.
So you could write an optimizer that says, well, here we are unnecessarily looping over the list two times, and we could merge those into one, or rewrite it using a while loop or something like that.
But that's a lot more work.
You need some knowledge that you may or may not have about what every function does.
So yeah, it's more complex, and there's also bundle size that we need to care about in
Elm, which is a trade-off.
So from my point of view, of what I've seen, I'm still touching things that feel pretty easy.
So yeah, I don't know what's remaining.
But I'm starting to see other areas of exploration, and then the scientific papers become a bit intimidating.
Let's put it that way.
That's when the postdocs start doing the optimizations.
And also, since we're using a pure functional language, I don't know if it's the most researched area.
I'm sure a lot more people have researched how to improve the performance of C code than
Haskell code, or Elm code for that matter.
So yeah.
I think there are two very interesting...
I think I'll go as far as to say that we have a lot of knowledge and a lot of ideas about
how we can make Elm code compile faster.
And there are certainly...
Compile to faster output.
Because it compiles pretty darn fast.
I mean, any more improvements are welcome.
Thank you for that.
So I think there are several people who know of a lot of easy wins, I guess we can say.
Elm Optimize Level 2 does this thing where it's able to compile a lot of stuff into direct
function calls instead of going through currying helpers.
That happens today.
And from the benchmarks I've seen, that can easily increase performance by up to 20% in
some cases.
Of the overall program.
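A simplified model of what this means, sketched in JavaScript (this is not the exact compiled output, just the shape of the idea): Elm wraps multi-argument functions so they can be partially applied, and calls go through a dispatch helper.

```javascript
// A function of two arguments is wrapped so it can also be partially applied.
function F2(fn) {
  const wrapped = a => b => fn(a, b);
  wrapped.f = fn; // the raw two-argument version is kept alongside
  return wrapped;
}

// A2 is the call helper: use the raw version when the call is saturated.
function A2(fn, a, b) {
  return fn.f ? fn.f(a, b) : fn(a)(b);
}

const add = F2((a, b) => a + b);

// A normal compiled call goes through the helper:
const viaHelper = A2(add, 1, 2);

// The optimization, conceptually: when the tool can see the call is
// saturated, it replaces the helper with a direct call, no dispatch.
const direct = add.f(1, 2);
```

Skipping the dispatch on every call site is where the measured gains come from.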
And then there are things that I've written about in the series of blog posts that I wrote
before Christmas where updating a record can be made up to eight times faster in some cases.
Which is huge.
Especially for applications that are continually looping and updating.
I mean, like games, for example, if you're updating game state and records on every frame.
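A rough JavaScript sketch of the record-update idea from Robin's blog series (this is a paraphrase, not his exact code or Elm's exact output):

```javascript
// Roughly what a record update compiles to today: a generic helper that
// copies every field of the old record, then overwrites the changed ones.
function update(oldRecord, changedFields) {
  const newRecord = {};
  for (const key in oldRecord) {
    newRecord[key] = oldRecord[key];
  }
  for (const key in changedFields) {
    newRecord[key] = changedFields[key];
  }
  return newRecord;
}

// The faster alternative: a shape-specific copy the compiler could emit
// instead, which JavaScript engines optimize far better than dynamic loops.
function updateX(old, x) {
  return { x: x, y: old.y, z: old.z };
}
```

Both return a fresh record and leave the original untouched; the shape-specific version is just much friendlier to the JIT.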
So in that way, we are just scratching the surface, I think.
We know that there are a lot of gains that can be had easily to make Elm code run faster.
I know of several ways that the way Elm is compiled to JavaScript could be changed
in order to increase the runtime performance of Elm code.
However, a lot of those optimizations would increase the JavaScript bundle size.
Sometimes by a lot.
So there are two things that kind of make performance work very difficult.
One of them is: how much of a code size increase are we willing to accept in order to get optimal performance?
And that is not going to be an easy thing to answer because that's always going to change
depending on what you do.
Like if you're writing a single-page application, then as long as the characters the user types
on their keyboard arrive in a timely manner, performance isn't a concern.
And so asset size is probably the most important thing.
But for people writing games and physics engines and, you know, WebGL stuff, they would probably
accept a pretty big code size increase in order to get the most optimal performance.
And so that is a question which is very difficult to deal with when doing performance optimizations.
And the same for tooling: Elm Review and Elm Pages both do pretty heavy lifting in
a Node.js environment, in your command line or your build step.
And if they can have big performance gains, whatever, give it a 50% larger bundle size.
For a CLI app, that's an easy win.
For something that's running in your browser, that's probably not the right trade-off.
And then, of course, another thing is the guesswork involved with the JavaScript just-in-time compiler.
So there are certain things we could do which, like in most languages, are the way to increase
performance, like function inlining.
That would most likely increase the code size of the output.
But the thing is that the JavaScript just in time compiler already has inlining enabled.
So we could go through the hassle of creating a function inlining pass, but it wouldn't
necessarily give us better performance because the JavaScript engine might already do those
exact things.
And so that's one area where WebAssembly would be an easier thing to work with.
Well, not easier overall, because you'd have to implement a lot of stuff yourself.
But you would, to a much larger degree, understand if something was worth looking into because
it's a more predictable target.
So, like, a lot of the time that I've spent looking into performance has simply been:
in theory, this should give better performance.
But in actuality, that may not be the case.
And so there are a bunch of experiments which I've done which sound reasonable, or sound
completely unreasonable.
And I've been surprised by the result on more than one occasion.
So regarding bundle size, do you have a sense, Robin, because for anyone who doesn't know,
you've been working on Stabel.
It's a, what's it called, a stack language?
It's a stack based programming language, or stack oriented.
Stack oriented.
And it's really interesting, like, I know that one of the things that you wanted to
experiment with for that project was just outputting something to WebAssembly.
And so that's what it does.
And so you have a grasp of some of these real world applications of WebAssembly.
And how does it, how is it for bundle size?
For WebAssembly output, is the bundle size larger, smaller? Could it go either way?
So it's difficult to know.
It depends on the language you want to compile to.
But I believe Brian Carroll posted some numbers on this.
Because in theory, a WebAssembly bytecode instruction takes potentially just a byte.
So doing plus one two in WebAssembly is smaller than writing one plus two in JavaScript.
Because it's compiled very efficiently.
On the other hand, you have to reimplement garbage collection, strings, currying in the
case of Elm.
So it's not necessarily a clear win.
But I believe Brian Carroll has posted numbers on this sometime in the past.
And I believe that was with the garbage collection in place, though admittedly not with all the semantics
of Elm.
But I think it had a proof-of-concept garbage collector.
And I think a Hello World app or like the counter, the button counter example in Elm.
I think that compiled to, I'm taking this from memory so I could be very wrong.
But I believe it was something in the order of 12, 13 kilobytes before GZIP.
Oh, before GZIP.
So, and of course, the larger the application becomes, the more the comparison swings in favor
of the WebAssembly implementation.
So I believe, also from my experiments with Stabel, that asset size would
be the one clear win from WebAssembly.
I didn't know it was, that the instructions were so condensed.
So WebAssembly has two formats.
There's a text format, which is meant for, well, you can handwrite it,
but usually it's for viewing, debugging, sanity checking, that sort of stuff.
But the actual WebAssembly format is binary and it is very dense.
Like one of the things it does is that all integer literals are encoded using variable
sized encoding.
So even though you are representing a 32-bit integer, if the int literal is the number
10, it only takes up eight bits, a single byte, in the WebAssembly output.
So it's a very, very dense and optimized for size format.
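The encoding being described here is LEB128, which WebAssembly does use for integer literals. A small sketch of the unsigned variant in JavaScript:

```javascript
// Unsigned LEB128: seven payload bits per byte, and the high bit marks
// whether more bytes follow. Small numbers fit in one byte regardless of
// the declared integer width.
function encodeUnsignedLeb128(n) {
  const bytes = [];
  do {
    let byte = n & 0x7f;
    n = Math.floor(n / 128);
    if (n !== 0) {
      byte |= 0x80; // more bytes follow
    }
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

// A 32-bit literal like 10 takes a single byte:
encodeUnsignedLeb128(10);     // [10]
// Larger numbers grow only as needed:
encodeUnsignedLeb128(624485); // [0xE5, 0x8E, 0x26]
```

So the literal 10 costs one byte in the binary format, while a four-byte fixed encoding would always cost four.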
That's huge.
I mean, the tiny bundle size potential is huge.
Well, or tiny, I don't know, but that could be just as interesting as any performance
gains there.
So that is super interesting.
Yeah, it's super interesting.
But of course, Brian Carroll has been working on this for years, and I don't think
he is close to, like, a production-ready compiler, which kind of goes to show that, you know,
WebAssembly has a lot of potential benefits, but working with it is very difficult.
Well, not difficult, but very time consuming.
And I think with the current state of the compiler, you would have to do a lot of work
to get anywhere close to what Brian Carroll has got running today.
And one thing that's important to keep in mind
is that the Elm compiler is not an optimizing compiler.
Even though it type checks your code, it doesn't actually retain that information to the code
generation stage.
So there are a ton of things you would have to improve, or complicate, I guess, is a better word.
You would have to add a ton of complication to the Elm compiler in order
to be able to output WebAssembly, and that is very likely to come at a cost to compiler performance.
Yes, which Evan has painstakingly optimized, I think largely by just reducing the amount
of memory that's passed around and that would be additional memory that you're passing around.
So yeah, it would have a cost for performance.
So yeah, it's not that WebAssembly isn't interesting.
It is super interesting, but it's also not easy.
And of course, JavaScript has a lot of faults, but it has a world-class garbage collector
built in, and it is pretty good at optimizing high-level code.
So you wouldn't necessarily get better performance.
You would get a lot of complications in JavaScript interop.
You would probably get smaller asset sizes, but to get there would be a huge amount of work.
So it's not a clear improvement over what we have today.
Well, one of the things that has always fascinated me is when you have a paradigm where
you just slightly change the way you're working and it has huge implications.
Like, for example, I always found it really interesting how you take Elixir and this web
framework Phoenix, and simply by having this one property of immutability...
Writing Elixir actually feels fairly similar to writing something like Ruby.
You can even rebind variables, and under the hood it's using immutable data, but it can
feel very familiar for somebody who's used to writing Ruby.
But you compare Ruby on Rails and Elixir's Phoenix, and suddenly you can get this incredible request
throughput, because the optimizations they can perform under the hood largely come for free.
You have this immutability that you can rely on, and suddenly this very challenging problem
of parallelization, which normally requires a lot of work, including by the application developer
to manage how to safely share memory,
those problems suddenly all just go away.
And I think that there's similar potential in Elm.
This is big picture, long term, who knows what will happen.
But when I look at the big picture of trends of programming languages, everything becomes
a question of parallelization rather than brute performance.
So like CPUs aren't getting any faster.
For five, 10 years they haven't gotten any faster.
The clock speed is not improving because it would start to get to the temperature of the
surface of the sun just the way that the physics of increasing clock speed works.
But what you can do by getting more transistors on a chip is you can have more parallel processing,
but you can't do it at a faster clock speed.
That's just a limit that we hit a long time ago and that's not going to change.
Can't they just improve physics?
Maybe quantum computers.
So while we're on the topic of Elixir: Joe Armstrong, who was one of the creators of the
Erlang programming language, which Elixir compiles down to, said...
So Erlang has this notion that writing parallel programs is very easy.
Part of that is immutability.
Part of that is isolated actor processes.
It's a super interesting language.
So if you haven't checked it out, do.
But he worked on a project where they had an Erlang program and then they swapped out
the hardware from like a four core CPU to a 64 core CPU.
And then the same exact same program just ran, I believe it was 34 times faster or something.
And the product manager said, well, we got 64 cores.
Shouldn't it run even faster?
And his response was, well, if you were to take a C++ program and just swap out the CPU,
it would be zero times faster.
So it's yeah.
Wait, yeah.
Zero times faster or one time faster?
I don't know math.
It would be one times the speed, and a 0% performance increase.
All right.
I mean, you could say it just crashes and then it's just zero times faster.
That's also likely, I would say.
This is to me.
I mean, Jeroen and I had this sort of episode in the new year where we talked about what's
working for Elm.
And that was like one of the points that came up was, hey, we've got this language with
some really unique characteristics.
And how can we, instead of saying, oh, performance is really hard with immutability, how can
we say, well, but these things become easier and these things we have more opportunities.
I think parallelization is one of them.
And I don't know, looking 10 years down the road, are web apps going to be leveraging
parallelization more?
I don't know.
And I believe WebAssembly has primitives for delegating things in a parallel way, if I'm
not mistaken.
So that could be an interesting space, long term, big picture.
And I think, I forget if it is in 0.19, it could be 0.18,
but I believe if you look into Elm core and look at the Process namespace, there will be
a comment there in the documentation that refers to, in the future, we might have
multiple actors or multiple mailboxes, or something along those lines, which is a clear reference
to Erlang actors.
And so this aspect has actually been thought about by Evan for multiple years.
So yeah, that might be like one aspect we tap into.
And just to underline the point even more, one of the big things when
Clojure came out...
Clojure wasn't the first functional language that I learned.
But it was the first immutable-by-default language.
And one of the big draws to Clojure was that because of immutability, concurrency is suddenly
super easy.
And so even though you have to pay the price of immutability, adding concurrency to a program
is so much easier that in a lot of cases, you actually get more correct and better performing programs.
And on the web, in a web browser, your code is single threaded.
So if you just open up an index.js, load that, and do some work, that work is blocking the main thread.
Then if a user tries to scroll, or tries to click a button, and there's an animation from
a built-in button element on the page, that's blocked, because the render thread needs the
opportunity to run.
And you're running on that same thread.
So you can use worker threads to do work; you do need to send memory back
and forth.
But this is another potential space that could be very interesting for Elm because this sort
of Elm architecture is a very natural fit for performing the main work off of the main
thread, and then sending messages back to tell the main thread to update.
Who knows if anything like that will ever happen.
But these are the things that, again...
Elm is a compiler, and what can we do to take advantage of that?
And so, from this whole conversation, I really do get the sense that whatever the future
holds, there's more opportunity.
We're not even done picking off the low-hanging fruit, and who knows, maybe
there's some big thing in the future that could even blow those out of the water.
So it'll be interesting to see what happens.
And there are multiple cases of this also.
Like we talked a little bit about Elixir.
I mentioned Clojure, right, like immutability by default enables concurrency, or easy concurrency.
There is also, like, one interesting thing: JavaScript itself.
One of the reasons Node.js took off was because it has this event loop built in.
And so even though you can't perform computationally expensive things, because you will block the
thread, the Node runtime, or the JavaScript runtime, makes it very easy to do event-based
programming.
And if you write servers that, you know, call a database and then just wait for the
results, Node was really, really good at utilizing the one thread it has, which languages like
Java and .NET, which spawn threads, weren't that good at.
Same with Ruby; Ruby's had a lot of issues with blocking IO operations.
So really, the reason why Node took off was because in practice, you managed to get servers
which could handle more load without careful engineering, right?
Just by default, you could handle tons of requests, as long as those requests weren't
doing anything computationally expensive.
And JavaScript has a lot of flaws, but even JavaScript, because of the limitations
it has, was able to outperform naive implementations in the server space, which is partly why it
took off.
So today there are better alternatives, but back in 2009, or whatever it was,
it was very interesting how you could handle a lot of requests on a single Node.js server
compared to a naive Java program.
Which is actually, I believe, why Ryan Dahl chose JavaScript as the target language.
It wasn't originally his intent.
I can't remember, maybe it was Go or something else that he had in mind, but that event-driven
architecture was just such a good fit for JavaScript that he went with that.
Yeah, I never actually understood whether it was part of JavaScript or just part of
the implementations of JavaScript, that it was limited to a single thread.
I mean, I think that's the semantics of JavaScript, basically: anything you do runs on
a single thread, but then there's this concept of being able to queue up callbacks,
the callback queue and stuff.
Like, I think the concept of a callback queue and everything is baked into the semantics
of JavaScript.
And then the specifics of the things that can be done in a non-blocking way are specific
to the Node runtime or to the web runtime.
Like setTimeout, for example: setTimeout is not part of JavaScript.
setTimeout is part of a runtime, like the browser runtime or the Node.js runtime.
It doesn't exist independent of that.
But it uses the same mechanisms that you mentioned before that are built in.
Part of the spec, I guess.
Yeah, exactly.
Those same semantics of a callback queue.
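The callback-queue semantics being discussed can be seen directly: a queued callback never interrupts the synchronous code that is currently running.

```javascript
// A setTimeout callback is put on the callback queue and only runs after
// the current synchronous code has finished.
const order = [];

order.push("sync 1");
setTimeout(() => order.push("callback"), 0);
order.push("sync 2");

// At this point the callback is queued but has NOT run yet:
// order is ["sync 1", "sync 2"].
```

Even with a delay of zero, the callback waits its turn behind the currently running code.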
So yeah, so having languages which have limits, those limits can enable certain features that
can be very well suited to certain kinds of programs.
And Elm definitely, if there's one thing that Elm has a lot of, it's limits.
And those exact limits can be utilized to some pretty interesting results.
HTML lazy, which we talked about earlier is one example of that.
Doing a similar kind of optimization in React takes a lot more planning.
Like, you need to know that you do not perform mutation in this component, or
it will be slow, or it will produce buggy behavior.
Whereas in Elm, it's very likely that you can just tap into that optimization.
I wouldn't say it is limited.
I would say it has limitations and those enable you to have no limits.
Oh, that's great.
Hey, that's another t shirt.
Love it.
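A tiny JavaScript model of why immutability makes the Html.lazy trick safe and cheap (this is a sketch of the idea, not Elm's actual implementation): a reference-equality check is enough to reuse the previous result, because immutable data that compares equal by reference really is unchanged.

```javascript
// Memoize a view function on reference equality of its argument.
// Safe only because the data is never mutated in place.
function lazy(viewFn) {
  let lastArg;
  let lastResult;
  return function (arg) {
    if (arg === lastArg) {
      return lastResult; // skip re-rendering entirely
    }
    lastArg = arg;
    lastResult = viewFn(arg);
    return lastResult;
  };
}

let renders = 0;
const view = lazy(model => {
  renders += 1;
  return "count: " + model.count;
});

const model = { count: 1 };
view(model);
view(model); // same reference: cached, viewFn is not called again
```

In a language with mutation, the `===` check alone would not be sound, which is exactly the extra planning the React comparison refers to.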
So Robin, when you're sitting down to write Elm application code, I mean, I'm sure performance
is this thing that you can't help but think about no matter what you do.
But are you typically just focused on writing the application code or do you run into places
where as an Elm application developer, you find that you need to really think about performance
and tune performance?
Does that happen very often?
You are correct in that when I write Elm code, it's very difficult for me to not think about
this is suboptimal from a performance perspective.
Fortunately, that's something I've become better at ignoring as I've grown older.
So I would say that today I don't focus too much on performance normally.
Now it's more like: if we have a performance problem, that's when I'm called in.
So, like, the recent Elm CSS improvements are a result of that.
This application is laggy.
You know Elm very well.
How can you improve the situation?
And we improved it by using HTML lazy.
And then I got home and thought about how could we have avoided that optimization in
the first place?
Like could we have changed the framework to not have needed HTML lazy in that case?
So that's how it works now.
But one thing that I have learned is that there are certain things which do improve
performance but which also, at least I think, improve the readability of the code.
In many cases it's the opposite.
Like, improving performance worsens the code.
But I've found several things that improve performance and increase readability.
And usually this involves data structures.
Most often you can recognize a pattern, realize that this would be more efficient and more
readable by using the correct data structure.
And really, in Elm we have this mantra: making impossible states impossible.
And in a lot of cases making impossible states impossible also improves performance.
Because there's less error handling and it's easier to get exactly what you want with safety
guarantees but also performance guarantees.
Less checks as well.
So one simple thing that I also use in other programming languages like Java and Kotlin
is whenever I see list.find or something similar to do like a give me the item with this key.
To me that's like this should be a dictionary.
Like why isn't this a dictionary?
Like sometimes using a dictionary would be worse overall but in many cases it just screams
associative lookup.
You have a dictionary for this.
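The same smell exists in JavaScript, which makes it easy to illustrate (sample data invented for the example):

```javascript
// Lookup by key with a linear scan — O(n) per lookup, and the intent
// ("give me the item with this id") is buried in a predicate:
const users = [
  { id: 1, name: "Robin" },
  { id: 2, name: "Jeroen" },
];
const viaFind = users.find(u => u.id === 2);

// The same data in an associative structure — O(1) lookups, and the
// intent is stated by the data structure itself:
const usersById = new Map(users.map(u => [u.id, u]));
const viaMap = usersById.get(2);
```

Both lookups return the same item; the Map version is faster for repeated lookups and says more clearly what the code is doing.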
Yeah, usually I see list.head and I'm thinking I should reach for find.
And then maybe I should reach for dicts.
But in the case that you mentioned, list.head, that's a perfect case for just using
a different data structure, which gives you both performance and improves the intent:
what are you trying to do?
So that's also a valid case.
Using dictionaries, or using sets instead of manually, or through some other means,
deduplicating your data, usually also improves performance and makes it very clear what
the intent is.
And then using zippers or nonempty lists, same thing.
Retrieving the head of a nonempty list lets you avoid a case expression, which has a
performance cost.
Now, granted, in many cases the performance improvement we're talking about is small.
But the true benefit is clearer code.
It's nice to realize that you can actually have both.
And you have to really consider the cost if you're doing performance optimization that
makes the code harder to reason about.
Also, what's to prevent someone in the future from looking at that code and saying, oh,
this is kind of ugly, and then tweaking it and breaking the performance optimization?
But if it's the most elegant way to express it, it's a lasting improvement that's good
for your code base.
That's also kind of like what motivated me to improve performance of Elm CSS.
Because where I work, a lot of the people who write Elm code are working on their first
Elm application.
They learned Elm because they were hired at Vy or onto some other Bekk project.
And then we teach them Elm in a day or two and then we throw them out into the deep waters
of an Elm application.
Now figure it out.
And so there aren't many people that I work with on a day-to-day basis who have years of
Elm experience.
And so expecting them to not mess up code that involves HTML lazy is kind of a stretch.
So not having HTML lazy, like if we didn't need HTML lazy, it is less likely that performance
will degrade at some point.
Robin, how do you go about finding your next opportunity?
Is it like you were kind of describing with this Elm CSS case, scratching your own itch
where you're driving home from work and you're like, hmm, can we avoid doing an HTML lazy here?
Is that usually where you find your next opportunities for improvements?
I would love to say yes, because that's the way it should be.
But that is only something I realized once I turned 32.
Before that, I was probably where Jeroen is now.
Like he has discovered that performance work is really fun.
And so he starts looking at, well, maybe I can make this faster.
And oh, I could.
Maybe I should make this faster.
And there's nothing there's not necessarily anything wrong with that.
I don't mean to single you out.
No, I turn 32 in like three months.
Looking forward to it.
Prepare for wisdom.
But really, I did the same thing.
So the way I got into performance work was that I reimplemented Elm arrays for 0.18, I think.
And the main reason for that was because Elm arrays were buggy.
They were written in JavaScript entirely.
And then there was like a very thin layer of Elm code to expose it to Elm.
And it did have, in certain cases, mutability, like visible mutability; it
could cause runtime exceptions.
It wasn't good.
It wasn't pretty.
So the main reason was to rewrite it in as much Elm code as possible to make it safer.
But for it to be acceptable, it had to have at least the same ballpark of performance
to what was already there.
And so that's how I got into performance work.
I was trying to make an Elm array replacement which didn't come at the cost of a huge
performance hit.
And having a benchmark and seeing those numbers go up when you make changes became
addictive, and then I just started looking around the Elm core library, seeing what
else I could make faster.
But really, the most important performance improvements are the
ones where you notice there's a problem.
Because I realized that I've spent a lot of time fixing things which aren't an issue and
which aren't necessarily likely to be an issue.
Wait, are you saying that improving the performance of string.pad is not a big deal?
I'm just saying unless you have a performance problem, fixing a performance problem isn't
necessarily going to bring value to someone.
That's not to say that making something faster just for the sake of making it faster won't
be very useful somewhere down the line.
And if you enjoy optimizations, especially optimizations which don't make code look
worse or harder to grasp, then there's no harm in it.
But if you want to be entirely certain that the work you do has meaning, then ideally
you should just come across something where you think, this should be faster, and then
fix it.
And I might add, fix it in a scientific way.
Don't just think that, oh, if I replace this List.find with Dict.get, then it will be much faster.
And while it probably is faster, do measurements and be certain that you are in fact making
something better.
And in a noticeable way.
Yes, yes.
So like a thousand times improvement is cool on paper, but if it in practice doesn't change
anything, then not saying that you should stop doing what you're doing, you're doing
awesome stuff.
I'm currently working on something that I think has users as well as improving performance.
So I'm very happy about that.
Okay, good, good.
But yeah, I remember that at some point I thought List.append is faster than
plus plus, and I started using it everywhere.
And then I ran a benchmark on just List.append versus plus plus.
Yeah, no difference.
So I did a lot of changes that were unnecessary and that didn't read much better.
So yeah, benchmark it.
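A crude JavaScript sketch of the "benchmark it" habit (the helper here is made up for illustration; real tools like elm-benchmark or Benchmark.js add warmup runs and proper statistics):

```javascript
// Time how long `iterations` calls of `fn` take, in milliseconds.
function timeIt(label, fn, iterations = 10000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(label + ": " + elapsedMs.toFixed(2) + " ms");
  return elapsedMs;
}

const xs = Array.from({ length: 100 }, (_, i) => i);

// Measure both candidates before committing to the "faster" one:
timeIt("concat", () => xs.concat(xs));
timeIt("spread", () => [...xs, ...xs]);
```

The point is not the specific numbers, which vary by engine and machine, but the habit: measure before and after, rather than assuming one form is faster.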
And ultimately those things don't last.
You know, I mean, again, somebody could refactor it because something looks
ugly or it's a hack.
Or if some code is using List.append and it's a little bit awkward, and they're like,
why doesn't this use plus plus?
They're probably going to change it.
Maybe maybe it changes which one's faster than the other.
So there's always a cost to to making code uglier, right?
It's like a make it work.
Make it right.
Make it fast.
But that should be the last resort, if you need to.
And if you benchmark it and see there's a problem.
So don't do this at home, kids.
Only do it at work.
So yeah, performance work is a hobby.
It doesn't always bear fruit, but sometimes it does.
And that's great.
So, as long as you're not hurting anyone.
Well, we do know that we've gotten a lot of amazing performance improvements from your
work, Robin.
So thank you for your work.
Thank you for being on to talk about this with us.
And yeah, thanks so much for coming back on.
Oh, my pleasure.
If anybody wants to find out more, where should they follow you?
Where can they go to read more?
Any resources to leave people with?
I think perhaps the best way is to follow me on Twitter.
That's @robheghan.
Yeah, we'll drop a link in the show notes for people to do that.
Because sometimes, when I do stuff that's related to work, I post on the Bekk blog.
And when I do stuff that's purely my own invention, I do it on my own dev.to account.
In either case, it ends up on Twitter.
So that's probably the best way to follow along.
All right.
Thanks again, Robin.
Jeroen, until next time.
Until next time.