elm radio
Tune in to the tools and techniques in the Elm ecosystem.
Optimizing Performance with Robin Hansen
We talk about Robin's work optimizing Elm, and the opportunities that remain for Elm performance improvements.
Published: January 31, 2022
Episode: #49
Robin Hansen (twitter) (github)
elm-optimize-level-2
Html.Lazy API
Outperforming Imperative with Pure Functional Languages
- talk about Roc by Richard Feldman
Tail call optimization
Successes, and failures, in optimizing Elm’s runtime performance
- Robin's blog post series on his elm-optimize-level-2 optimizations
WASM
Maybe you don't need Rust and WASM to speed up your JS
Brian Carroll's blog posts on an Elm WASM prototype
Robin's Stabel language
- a stack oriented language that compiles to WASM
What's Working for Elm - Elm Radio episode
Process.elm comment about the potential for parallelizing work
Transcript
[00:00:00]
Hello, Jeroen.
[00:00:01]
Hello, Dillon.
[00:00:02]
And once again, we're back with Robin Hansen.
[00:00:07]
Thanks so much for coming back on, Robin.
[00:00:09]
Thanks for having me on.
[00:00:10]
What's it been, a year?
[00:00:11]
Yeah, I haven't talked to you all year long.
[00:00:13]
It's really exciting to sit down with you.
[00:00:16]
And Jeroen, I have a feeling you're going to be itching to ask a bunch of performance
[00:00:21]
related questions because today we're talking about performance with Robin, the Elm Performance
[00:00:27]
Guy and Jeroen, the guy who's trying to dethrone Robin as the one who's optimized performance
[00:00:34]
most in Elm.
[00:00:35]
Yeah, I'm looking forward to it.
[00:00:37]
I let you win one game and now you're all confident.
[00:00:44]
You will never win again.
[00:00:47]
Never.
[00:00:48]
Yeah, so yeah, this is really exciting.
[00:00:53]
I think it's kind of an exciting time for performance stuff in Elm.
[00:00:57]
I think maybe these things have been happening in back channels right now, but I think we
[00:01:03]
might be seeing some performance improvements in Elm Optimize Level 2, which we've talked
[00:01:08]
about in previous episodes.
[00:01:09]
It's just a sort of post processor that goes in and tweaks the Elm compiled output to do
[00:01:15]
some performance optimizations.
[00:01:18]
So I think we've got some exciting stuff coming.
[00:01:19]
So I'm curious, before we get into some of these details about these performance optimizations
[00:01:25]
and everything, you've got a long history of doing performance work in Elm, working
[00:01:30]
on these data structures and benchmarking things.
[00:01:35]
Why do you do it?
[00:01:36]
Like, why do you care about Elm performance?
[00:01:40]
Okay, so there are two answers to this question.
[00:01:44]
The first one is like what people want to hear.
[00:01:47]
And the second answer to this question is the truth.
[00:01:50]
I think what people want to hear is that performance is really, really important.
[00:01:56]
I think the worst thing that can happen to Elm is that someone sits down,
[00:02:03]
writes a production app, and then it's laggy.
[00:02:07]
And for a language with a relatively small following, like Elm, where people might not
[00:02:14]
know how to fix a laggy application, that would be bad for the reputation of the language
[00:02:21]
and further adoption of the language.
[00:02:23]
So performance should not be your primary concern when doing the stuff that Elm is good
[00:02:30]
at.
[00:02:31]
Because most of the time, optimizing for performance is simply not going to matter for the sort
[00:02:36]
of applications that you typically do with Elm.
[00:02:40]
But if you do get a performance problem, I think that would be very bad for Elm.
[00:02:45]
And so I've been working on performance things simply because I don't want people to have
[00:02:51]
a performance problem.
[00:02:52]
Wait, now, is that the truth?
[00:02:55]
Or is that what people want to hear?
[00:02:57]
The truth.
[00:03:01]
That is a true answer.
[00:03:03]
But really what got me into this is fixing performance things or improving performance
[00:03:10]
problems is a relatively simple and fun activity.
[00:03:14]
Because if you do it correctly, no one is going to notice anything.
[00:03:19]
And so you don't have to go through a lot of API design discussions.
[00:03:24]
There's a lot less things to consider.
[00:03:26]
So it's a relatively easy thing to get into.
[00:03:29]
And it's also a relatively easy thing to measure the improvements of.
[00:03:35]
And of course, if you can improve something...
[00:03:38]
And you can probably attest to this, Jeroen.
[00:03:40]
If you make something 10 times faster or 50 times faster, it feels kind of good.
[00:03:46]
Kind of.
[00:03:47]
Slightly.
[00:03:48]
Slightly good.
[00:03:49]
It's a hell of a drive.
[00:03:50]
It's super exciting.
[00:03:54]
So it's fun.
[00:03:56]
But it's also important.
[00:03:59]
I think to avoid that.
[00:04:01]
To avoid people having a bad experience with Elm.
[00:04:04]
Although in most cases, people won't have them.
[00:04:08]
Right.
[00:04:09]
So Elm is a pretty high level language.
[00:04:11]
Like you were describing, if people get painted into a corner and there's a performance issue,
[00:04:17]
they might not have much they can do about it with Elm because it's pretty high level.
[00:04:22]
It doesn't give you a lot of control over expressing low level things that would affect performance
[00:04:27]
in a way that a language like Rust would maybe.
[00:04:30]
But at the same time, on the other side of the coin, because it's this high level, very
[00:04:35]
declarative and pure language, does that give you the opportunity to do more with performance
[00:04:42]
because it's more constrained?
[00:04:44]
Both yes and no.
[00:04:45]
Like so in Elm you have...
[00:04:49]
Well, for the HTML library specifically, you have the Html.Lazy namespace, which provides
[00:04:57]
functions that allow you to avoid computation in the cases where nothing has changed.
[00:05:03]
And the reason why that is a good optimization when you can apply it and the reason it works
[00:05:09]
and is very, very fast is because of Elm's purity.
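(To make that concrete, here is a minimal Elm sketch; viewRow and user are hypothetical names. lazy only re-runs viewRow when user has changed, which is safe to decide precisely because viewRow is guaranteed to be pure.)

    module Main exposing (view)

    import Html exposing (Html, li, text)
    import Html.Lazy exposing (lazy)

    -- A pure view function: same input, same output, no side effects.
    viewRow : String -> Html msg
    viewRow user =
        li [] [ text user ]

    view : String -> Html msg
    view user =
        -- Skips calling viewRow entirely when user is unchanged
        -- since the previous render.
        lazy viewRow user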
[00:05:15]
So you can do the same things in React, but it requires that you have made sure that everything
[00:05:22]
is pure.
[00:05:24]
And when you do need such an optimization in React, I think you are going to have a
[00:05:30]
problem applying that optimization because things aren't pure by default.
[00:05:35]
And so there are definitely certain things which are much, much easier in Elm because
[00:05:41]
of purity.
[00:05:42]
But on the other hand, there are things which are harder because of purity as well.
[00:05:49]
Like a dictionary maybe.
[00:05:51]
Yeah.
[00:05:52]
So that doesn't necessarily mean that data structures can't be faster in a pure language
[00:06:00]
compared to a language which allows you to use mutable data structures.
[00:06:05]
So one example of this is the dictionary implementation in Clojure, the HashMap implementation in
[00:06:14]
Clojure more specifically.
[00:06:16]
It turns out that when reading from a Clojure HashMap, admittedly when you have a HashMap
[00:06:23]
consisting of maybe like five or six million entries.
[00:06:28]
Kind of big.
[00:06:29]
Which you do hardly ever.
[00:06:32]
But in the case you have such a big dictionary, it turns out that Clojure can actually be
[00:06:36]
faster for reading from said dictionary simply because of the tree structure which makes
[00:06:42]
it more cache friendly than your typical mutable HashMap, which is one contiguous array.
[00:06:49]
So it can be faster by doing things in a purer way.
[00:06:55]
But you will normally struggle to make it as fast as mutable alternatives because you
[00:07:01]
have to copy a lot of stuff around.
[00:07:05]
Do you need to have immutability under the hood in an immutable language?
[00:07:12]
Because I mean, Richard has been talking a lot about these types of optimizations in
[00:07:18]
this Roc language that he's been developing.
[00:07:21]
We'll link to a talk where he goes into some details on this.
[00:07:24]
But he uses some optimizations under the hood to perform mutation when possible in a way
[00:07:31]
where the user doesn't have the ability to mutate data.
[00:07:35]
But the compiler might see, well, the user won't notice that I've mutated something as
[00:07:40]
far as they're concerned.
[00:07:41]
They have the illusion of immutability.
[00:07:44]
And that's all we need.
[00:07:45]
So like, does that trade off apply to optimizing stuff in Elm?
[00:07:51]
Or for practical reasons, is that not a good approach?
[00:07:55]
Or for philosophical reasons, is that not the desired approach?
[00:07:59]
So if you can do it, then you can definitely get a lot of performance out of that.
[00:08:05]
And Roc has, at least from what I've seen, proven that you can have code almost as fast
[00:08:13]
written in a purely functional language as long as the compiler is able to utilize these
[00:08:19]
tricks under the hood.
[00:08:20]
And it's important to say that we don't really care about things actually being pure under
[00:08:26]
the hood, as long as you have the illusion of that being the case.
[00:08:30]
But currently in Elm, I don't think we make use of such optimizations.
[00:08:37]
No, that's kind of what I'm researching at the moment.
[00:08:41]
Like some of the optimizations that Roc does are kind of what I'm looking at at the moment.
[00:08:47]
There are some good results, but it's also limited in what you can do, what you cannot
[00:08:51]
optimize.
[00:08:53]
And I think that Roc has much more solid foundations to do it at the moment.
[00:08:59]
Yeah, when it's baked into the core of what the compiler is attempting to do, then the
[00:09:03]
compiler can track information around where a mutation happens and optimize for that.
[00:09:10]
But another very important aspect is that Roc doesn't have to compile to JavaScript.
[00:09:17]
And so it has a lot more control over what it can and cannot do.
[00:09:22]
For good and bad, you know, compiling to JavaScript is a lot easier.
[00:09:28]
But you lose some control along the way.
[00:09:31]
One thing that I'm thinking of which Elm does do and which most functional languages do
[00:09:36]
is tail call optimization.
[00:09:39]
Now tail call optimization isn't done first and foremost for performance.
[00:09:44]
It's done for safety.
[00:09:46]
So for those who don't know, tail call optimization is when you have a recursive function call
[00:09:52]
where the result of the recursive call will be the result of the calling
[00:09:59]
function. When that's the case, it will not actually be compiled down to a function calling itself
[00:10:06]
over and over.
[00:10:07]
It will be compiled down to a while loop.
[00:10:10]
And that is to avoid adding elements to the stack and eventually causing a stack overflow
[00:10:15]
exception.
[00:10:16]
That's the main use of it.
[00:10:18]
But because you avoid a lot of function calls, you also increase performance a lot.
[00:10:23]
So that's a case where the language only allows you to use functions and functions calling
[00:10:29]
functions.
[00:10:30]
But as long as we keep the illusion that that is what is happening, we don't really care
[00:10:35]
about how it's compiled.
[00:10:36]
And so compiling it down to a while loop is perfectly fine and faster and safer.
[00:10:41]
Yeah, so while loop plus mutations as well.
[00:10:44]
Otherwise it doesn't make much sense.
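(A hedged sketch of what that looks like in Elm; sum here is a hypothetical helper, not the core List.sum. The recursive call is the entire result of the function, so the compiler can turn it into a loop.)

    -- Tail recursive: the recursive call is the whole result,
    -- so this compiles to a while loop that reassigns acc and
    -- remaining instead of growing the call stack.
    sum : Int -> List Int -> Int
    sum acc remaining =
        case remaining of
            [] ->
                acc

            x :: rest ->
                sum (acc + x) rest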
[00:10:47]
Yeah, Jeroen, I think you've been trying to make more opportunities for tail call recursion
[00:10:53]
so that the Elm compiler isn't as limited in where it can apply that optimization, right?
[00:10:59]
Exactly.
[00:11:00]
Yeah.
[00:11:01]
And very promising results so far, but that's all I will say at the moment.
[00:11:04]
So that's sort of like, when I think about all this performance stuff, one of the things
[00:11:08]
that I think about is this idea of a compiler.
[00:11:12]
So like, for example, Svelte and the creator of Svelte, Rich Harris, talks a lot about
[00:11:17]
this idea of, he talks about a compiler for JavaScript and for JavaScript front end apps.
[00:11:24]
And the way he talks about it, he says, hey, we've got, instead of just writing interpreted
[00:11:30]
code, what if we had something that could be more intelligent and could understand how
[00:11:35]
to help us do what we're trying to achieve by understanding things better?
[00:11:41]
That's kind of how he talks about a compiler.
[00:11:44]
In Elm, it's almost like water to a fish.
[00:11:48]
Compiler is just such a ubiquitous concept in Elm that we almost don't think of it.
[00:11:55]
But what can the compiler do, knowing what it knows, to make our job easier?
[00:12:00]
So ideally, we shouldn't have to know this particular way of writing something is more
[00:12:08]
efficient than this other way, because the compiler can deduce that, especially with
[00:12:12]
like a pure language.
[00:12:14]
And so I find this to be like one of the really interesting things in Elm in particular is
[00:12:22]
how sophisticated can we get with the work that the compiler can take on to optimize
[00:12:27]
things intelligently for us?
[00:12:29]
That's a very good point.
[00:12:30]
And there are a bunch of things that the Elm compiler can do, knowing the semantics of
[00:12:35]
the language.
[00:12:36]
So currently, if you do a simple operation like checking two objects for equality, say,
[00:12:43]
if you were to do a value based comparison, a value based equality check of two objects
[00:12:49]
in JavaScript, that would be hard, I guess, to get something that works fast and is safe
[00:12:57]
from a stack overflow perspective, because doing that isn't baked into the language.
[00:13:04]
You'd have to write code making sure that all the contents of two objects are in fact exactly the same.
[00:13:09]
It also has to be unambiguous, like, do you check the prototype of the object?
[00:13:14]
Yes, exactly.
[00:13:15]
And so that is actually surprisingly difficult in JavaScript to get that working 100% of
[00:13:20]
every single case.
[00:13:22]
In Elm, it's very simple.
[00:13:24]
First of all, because it's baked in, but also because of not allowing mutation, the implementation
[00:13:31]
of equality checking can actually be a shallow comparison, because you know that two objects
[00:13:37]
who have the same identity are also equal.
[00:13:39]
And so you can skip a lot of the work necessary to check two objects for equality.
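(A small sketch of the idea; p1 and p2 are hypothetical values. Elm's == compares by value, and because nothing can ever mutate, the compiled check may bail out early on reference equality before falling back to a deeper comparison.)

    p1 = { x = 1, y = 2 }

    p2 = { x = 1, y = 2 }

    -- True: (==) is equality by value, not by reference.
    sameValue =
        p1 == p2

    -- Also True, and cheap: if both sides are the same object,
    -- immutability guarantees they are equal, so no fields
    -- need to be inspected.
    sameRef =
        p1 == p1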
[00:13:45]
And so having a compiler that understands, or which places certain restrictions on, how
[00:13:50]
you write code can in fact make certain things a lot easier and more performant when compiled
[00:13:59]
too.
[00:14:00]
So if you look at the output of the Elm compiler, the JavaScript it produces, if you look at
[00:14:06]
how equality is implemented, if you were to hand that over to a JavaScript developer and
[00:14:11]
ask, does this perform a deep equality check?
[00:14:16]
He would say no, there are tons of issues with this.
[00:14:19]
But in the context of Elm, it works just fine, because it can rely on the fact that mutation
[00:14:25]
doesn't happen and these sorts of things.
[00:14:28]
Like having the same identity, if you have two objects with the same identity, that doesn't
[00:14:32]
necessarily mean that the object hasn't changed.
[00:14:35]
But in Elm that is in fact true.
[00:14:38]
So there's a bunch of stuff you can do knowing all the restrictions that Elm places on you.
[00:14:43]
Yeah, it's really interesting.
[00:14:45]
This blog post series you wrote about successes and failures in optimizing Elm's runtime performance,
[00:14:52]
which we'll link to in the show notes, you talk a lot about essentially how there are
[00:14:57]
all these optimizations baked into V8, which really is sort of like a heuristics based
[00:15:02]
optimization, right?
[00:15:04]
Because JavaScript is an interpreted language.
[00:15:08]
And then you have this sort of just in time compiler, which applies heuristics, which
[00:15:13]
can then get deoptimized.
[00:15:15]
That's why they're heuristics, because it's interpreting things as it goes and saying,
[00:15:19]
oh, hey, this will probably make it perform better.
[00:15:22]
And then it assumes that the shape of an object has these fields.
[00:15:27]
And then suddenly, boom, now there's a null in there that it didn't expect, or now something
[00:15:32]
is a string that was an int elsewhere.
[00:15:34]
And now it deoptimizes.
[00:15:35]
So it's doing these heuristics.
[00:15:37]
And as somebody doing these performance tunings in Elm compiler output, you're doing this
[00:15:46]
strange work of sort of trying to understand those heuristics and trying to activate the
[00:15:53]
heuristics in a way that they can predict what the Elm code will do.
[00:15:56]
But you're not predicting it.
[00:15:58]
You know it because it's statically compiled code.
[00:16:02]
But you're trying to get this like just in time optimization to kick in in those places.
[00:16:08]
So it's like a weird dance, isn't it?
[00:16:12]
Yeah, so like and that really boils down to the fact that the just in time compiler understands
[00:16:18]
JavaScript very, very well and has to account for all this sort of stuff you can do in
[00:16:23]
JavaScript.
[00:16:24]
And there are certain things that you can't do in Elm and certain things you can do, which
[00:16:28]
the JavaScript just in time compiler naturally has no knowledge about.
[00:16:32]
So really a lot of the stuff that I've done with this performance work is: Elm makes
[00:16:39]
it so that these things are always true.
[00:16:43]
How can I tell that to the JavaScript just in time compiler?
[00:16:47]
How can I make a JavaScript engine understand these things?
[00:16:51]
And that is sometimes very hard.
[00:16:57]
I actually have no clue how you would do that.
[00:16:59]
Is it just that you transform the code to something that is relatively simple or
[00:17:04]
something like that?
[00:17:05]
Yeah, so like one thing that the Elm compiler does today, which was
[00:17:13]
originally done to reduce asset size, but which has a very cool performance benefit,
[00:17:20]
is that when it reads your entire Elm project and compiles it into JavaScript, it compiles all
[00:17:26]
your Elm code and all the dependencies and the core library, the runtime, everything
[00:17:31]
into one single namespace.
[00:17:34]
And when you call functions, and if you don't run this through Elm Optimize Level 2, if
[00:17:39]
you call single-arity functions, then there are two things that come out of this.
[00:17:46]
One is that you can see the function in scope.
[00:17:49]
And so it knows that the function cannot be null because it's right there.
[00:17:54]
And second of all, it knows that it's actually a function and not some crazy evaluated thing
[00:18:00]
that evaluated to a function.
[00:18:01]
So by having functions in the same scope, and readily available, the JavaScript engine
[00:18:08]
can infer a surprising amount of things about that function.
[00:18:11]
It doesn't have to look it up in the window or global, for instance, that would have a
[00:18:17]
performance cost.
[00:18:18]
And so the natural way to do namespacing in JavaScript is to create an object with certain fields
[00:18:24]
and have those fields point to functions, say.
[00:18:28]
But in Elm, you're just referencing a local function, the function that resides within local
[00:18:32]
scope.
[00:18:33]
So you know, it's a function, you know, it's not null, you don't have to look it up in an
[00:18:38]
object, which means you don't have to check: is this an object, is the object reference
[00:18:44]
actually there?
[00:18:45]
And if that property exists, is it null, right?
[00:18:48]
So there are a bunch of things that the compiler just doesn't have to deal with, because it
[00:18:52]
can see the function in the local scope.
[00:18:55]
And V8 understands that and makes it run faster than if you had to go through objects with
[00:19:02]
a lookup, for instance.
[00:19:04]
Exactly.
[00:19:05]
So one thing that I've seen when just asking the V8 engine to tell me, what are the
[00:19:13]
steps you go through to, like, how do you optimize this plain regular JavaScript function
[00:19:18]
into assembly, then every time you do like an object lookup, it will produce this check,
[00:19:25]
which checks is this thing that I got from this object null.
[00:19:30]
And that will always happen because in JavaScript, you can always go into a REPL and then add
[00:19:36]
stuff which can change things.
[00:19:38]
And so even though the just in time compiler can be reasonably certain at some point that
[00:19:43]
this thing isn't null, that doesn't mean it cannot be null later.
[00:19:47]
So it always has to like defensively add a bunch of checks.
[00:19:50]
Yeah.
[00:19:51]
And that's kind of annoying because we have all those guarantees and then we still have
[00:19:56]
to re-prove it again, kind of like going through paperwork for administration.
[00:20:02]
You have to send a sign up form, send it over and then do the same one again for another
[00:20:09]
service or something.
[00:20:11]
Yeah.
[00:20:12]
Yeah.
[00:20:13]
It is very bureaucratic, isn't it?
[00:20:14]
In a way.
[00:20:15]
At least it's faster.
[00:20:19]
Yeah.
[00:20:20]
So that makes me think about WebAssembly.
[00:20:23]
And of course, I mean, I think that WebAssembly can become maybe a silver bullet where it
[00:20:30]
solves all the performance issues, right?
[00:20:33]
In people's minds.
[00:20:34]
And that's not necessarily, it's not as simple as that.
[00:20:39]
What is WebAssembly?
[00:20:40]
What is this WebAssembly that you're talking about?
[00:20:43]
Should we define WebAssembly?
[00:20:45]
Yeah.
[00:20:46]
So it is essentially, correct me if I'm wrong, but it is something that gives you lower level
[00:20:53]
control rather than this like interpreted language of JavaScript that can run natively
[00:20:59]
in the browsers.
[00:21:00]
It's something that can be executed natively in the browsers.
[00:21:03]
It's actually typed.
[00:21:05]
So you write these sort of essentially byte code instructions, right?
[00:21:10]
And you can have it as a compile target so you can compile Rust or whatever languages
[00:21:14]
to that compile target.
[00:21:16]
And it gives you more low level control over memory management.
[00:21:20]
It doesn't come with built in garbage collection, things like that.
[00:21:23]
But it gives you more nuanced control over performance and doesn't rely as much on these
[00:21:29]
heuristics for just in time optimizations.
[00:21:32]
Is that a fair summary?
[00:21:34]
It's pretty correct.
[00:21:37]
So it's a very low level language in the same way that Java byte code and .NET byte code are.
[00:21:47]
In fact, it's very similar to those sort of things, which most developers don't look at
[00:21:52]
at all.
[00:21:53]
But the big difference between WebAssembly and something like Java byte code
[00:21:59]
is that there are way fewer instructions and there are way fewer built in things.
[00:22:06]
Like it doesn't have a garbage collector.
[00:22:08]
That is one thing it just doesn't have.
[00:22:10]
It doesn't have strings or any sort of data structure.
[00:22:14]
All you get is this one huge contiguous array and some instructions to look into it.
[00:22:22]
And you get functions and you get four types.
[00:22:26]
Five if you include functions.
[00:22:28]
And those four types are 32 bit integer, 64 bit integer, 32 bit float, and 64 bit float.
[00:22:35]
And that's really all you have to work with.
[00:22:39]
Regarding the point that people think that WebAssembly will come in and solve all our
[00:22:42]
performance problems, that's not really true.
[00:22:46]
Like if you have a compiler that spits out very easy-to-optimize JavaScript and you
[00:22:53]
have a compiler that compiles into very performant WebAssembly, you can probably expect about
[00:23:00]
the same performance.
[00:23:02]
However, the thing about WebAssembly is that, since it's not JavaScript and
[00:23:07]
since you don't have to do a lot of crazy stuff to get good performance,
[00:23:13]
there is no guesswork involved.
[00:23:15]
The compiler doesn't have to guess how do I compile this in the most optimal way.
[00:23:20]
It simply just, okay, these byte codes can be compiled directly into this.
[00:23:24]
And so it's much faster to compile and it doesn't have to guess how this should be compiled,
[00:23:30]
which means it doesn't get a lot of stuff wrong.
[00:23:35]
And the result of that is that you can predict to a much higher degree what the performance
[00:23:41]
of compiled WebAssembly will be compared to JavaScript.
[00:23:45]
Because in JavaScript, everything depends on what happens at runtime.
[00:23:49]
So if you have a very simple program, all it does is that it takes an array of a thousand
[00:23:57]
elements and wants to call the plus operation on them.
[00:24:02]
It's a very simple thing to write in JavaScript.
[00:24:05]
It's relatively simple to write in WebAssembly.
[00:24:07]
In WebAssembly, if that array contains integers or if it contains strings, it will be pretty
[00:24:16]
much the same performance if you implement it to support both.
[00:24:19]
You will get the same performance every time.
[00:24:22]
In JavaScript, if the just in time compiler only sees arrays of integers, you will get
[00:24:28]
very good performance.
[00:24:29]
But if it sees sometimes an array of integers and sometimes an array of strings, then you
[00:24:34]
will get worse performance than if it only sees integers.
[00:24:38]
It can't specialize the code as well.
[00:24:40]
So in WebAssembly, you can write code where you expect it to have this performance profile
[00:24:45]
and it will pretty much always have that.
[00:24:47]
Whereas in JavaScript, it all depends on what the just in time compiler sees when the program
[00:24:52]
is running.
[00:24:53]
Yeah.
[00:24:54]
And you also remove all those checks that we mentioned like, is this indeed an integer?
[00:24:58]
Is this indeed a string?
[00:25:00]
Those won't have to be done in WebAssembly, but they're done under the hood in JavaScript
[00:25:04]
all the time.
[00:25:05]
That's true.
[00:25:06]
But so there was this blog post and I don't remember the name of it.
[00:25:12]
I can try to find out later and maybe we can add it to show notes.
[00:25:16]
But there was this blog post where somebody wrote, I think it was the Firefox team, which
[00:25:23]
rewrote the PDF reader, I think.
[00:25:27]
They rewrote it in WebAssembly and said, look, it's a hundred times faster or something because
[00:25:32]
the previous version was in JavaScript.
[00:25:34]
Well, that's promising.
[00:25:35]
Yeah.
[00:25:36]
A PDF viewer it was.
[00:25:38]
Yeah.
[00:25:39]
So it was the built in PDF viewer in Firefox.
[00:25:41]
They rewrote it to WebAssembly and it was 50, 100 times faster, something along those lines.
[00:25:47]
And then there was a followup blog post to that where someone just changed the JavaScript
[00:25:55]
version and they got about the same performance.
[00:26:00]
But the thing is, so if you compile to WebAssembly, it is much easier for you to create WebAssembly,
[00:26:07]
which will give you the best performance.
[00:26:10]
Whereas in JavaScript, you have to not only know JavaScript very well, but you have to
[00:26:14]
know how the different JavaScript engines compile optimal code.
[00:26:19]
And so it's much harder to create optimal JavaScript that compiles and optimizes
[00:26:25]
as well as WebAssembly, in theory, I guess.
[00:26:29]
So one thing that is pretty tricky with compiling to JavaScript and expecting good performance
[00:26:35]
is that you need to compare it to multiple implementations of engines.
[00:26:40]
So you need to run benchmarks on Chrome, on Firefox, on Safari, and they have very different
[00:26:47]
engines and therefore have very different results on benchmarks.
[00:26:51]
So if you change some code, sometimes you will have better performance on Chrome and
[00:26:56]
worse performance on Safari, for instance.
[00:27:00]
Would that also be the case with WebAssembly?
[00:27:03]
Would each browser have their own implementation of WebAssembly?
[00:27:07]
Well yes, they will.
[00:27:09]
But at the same time, there are only so many ways of compiling a WebAssembly program because
[00:27:14]
there are very few byte codes and there are very few data structures.
[00:27:18]
And essentially, there aren't many ways that a single byte code instruction can be compiled.
[00:27:25]
And so you are likely, so if you compile WebAssembly a specific way, you are likely to get the
[00:27:31]
best possible performance for that code.
[00:27:35]
And of course, the Firefox WebAssembly compiler could be a worse compiler than the Chrome
[00:27:41]
one, but at the very least, you're not relying on how good the compiler is at guessing how
[00:27:48]
it should optimize the code.
[00:27:50]
I'm guessing that will be true for the beginning, but maybe not later.
[00:27:55]
For instance, I'm guessing V8, or actually the engines for the different browsers,
[00:28:02]
they were not trying to be smart at the beginning, but then they noticed, oh, we can try to be
[00:28:07]
smart to improve performance.
[00:28:09]
And then they just piled improvement over improvement and made it very complex and unintuitive.
[00:28:16]
And I'm guessing maybe that could be true for WebAssembly as well, maybe not to the
[00:28:20]
same extent.
[00:28:21]
So I mean, that's always possible, right?
[00:28:24]
You always run the risk that Safari adds another WebAssembly specialized compiler, which does
[00:28:31]
runtime profiling to improve code.
[00:28:35]
That can of course happen.
[00:28:36]
But one thing that has happened a lot in my performance work is that...
[00:28:43]
So when I was implementing the array data structure for Elm, one thing that surprised
[00:28:49]
me was that, okay, I was going to implement array.map.
[00:28:55]
And in my mind, the Elm array, for those who don't know, is a tree structure that if you
[00:29:01]
have 32 elements or less, it's just a normal JavaScript array.
[00:29:05]
If you have more than 32 elements, it will become a tree where each level of the tree
[00:29:10]
has 32 elements.
[00:29:12]
And so it will grow...
[00:29:15]
So if you have 60 elements, then the Elm array will be one array with two elements.
[00:29:21]
Those elements point to arrays where the first array contains the first 32 elements and the
[00:29:27]
second array contains the next 28.
[00:29:32]
And as you add more elements, the tree grows.
[00:29:35]
That was probably not the best summary of how an Elm array works.
[00:29:38]
But the important thing for this particular story is to know that an array consists of
[00:29:44]
multiple JavaScript arrays under the hood.
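(As a rough sketch of the shape being described; this is simplified, not the exact elm/core definition, and JsArray is a stand-in for the flat kernel-level JavaScript array used under the hood.)

    -- Stand-in for the flat JavaScript array used internally.
    type alias JsArray a =
        List a

    -- Each level of the tree holds at most 32 entries.
    type Node a
        = SubTree (JsArray (Node a)) -- an internal level of the tree
        | Leaf (JsArray a)           -- up to 32 actual elements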
[00:29:47]
So when I was implementing array.map, the natural thing for me to do was to implement
[00:29:54]
that in terms of the built in JavaScript array.map instead of writing a for loop and kind of
[00:30:02]
like reimplementing array.map myself.
[00:30:07]
But it turned out that using the built in array.map for JavaScript arrays was very fast
[00:30:16]
in Chrome.
[00:30:17]
But compared to a for loop doing array.push, it was slower in Firefox.
[00:30:23]
In Firefox, writing the actual loop was way faster than using array.map.
[00:30:29]
And in WebAssembly, you wouldn't have such a difference.
[00:30:33]
If you were going to implement the array.map, you would do it pretty much the only way you
[00:30:38]
can in WebAssembly.
[00:30:40]
And even though the performance can be worse in one browser compared to another, there
[00:30:43]
wouldn't be...
[00:30:45]
You wouldn't do it...
[00:30:47]
You wouldn't get better performance by doing it in a less obvious way, I guess.
[00:30:52]
There aren't that many ways of doing the same thing.
[00:30:55]
And so you can just count on the most obvious thing also being the fastest thing.
[00:31:02]
So what did you end up doing with the array.map?
[00:31:04]
How did you make that choice?
[00:31:08]
Well, really, since Elm is supposed to be used...
[00:31:13]
If I were doing this and I only cared about Chrome, then I would do whatever is fastest
[00:31:17]
for Chrome.
[00:31:18]
But because Elm can be used in a lot of scenarios, I had to do it the way which overall gave
[00:31:25]
the best thing.
[00:31:27]
And if I remember correctly, the performance difference for Firefox was so big that I ended
[00:31:33]
up prioritizing what was fastest for Firefox because the difference in Chrome wasn't that
[00:31:38]
big.
[00:31:39]
So you kind of have to find one solution that works best when all browsers are considered.
[00:31:45]
Yeah.
[00:31:46]
So are you secretly hoping for Chrome to just win the competition and be used by everyone?
[00:31:54]
No, I think this is a slight departure from performance, but I think in the browser space
[00:32:02]
we're very well served with competition.
[00:32:06]
So I think the current...
[00:32:10]
I was sad to see Microsoft just adopt Chromium as their web browser, essentially, even though
[00:32:18]
I have no fond feelings towards Microsoft.
[00:32:22]
I think it's good with some competition in the browser space.
[00:32:26]
Of course, from a performance perspective, it would be nice if everything worked the
[00:32:30]
same way.
[00:32:31]
It would make my life a lot easier.
[00:32:35]
But I think for most people, it would be better with competition in the browser space.
[00:32:40]
Yeah.
[00:32:41]
So it seems like it comes down to control, like WebAssembly gives you more control over
[00:32:47]
performance.
[00:32:48]
Now, if you have more control over performance, that means it's not going to do an optimization
[00:32:54]
that you didn't build into it, the way V8 or whatever SpiderMonkey's compilers are
[00:33:01]
going to do.
[00:33:02]
And to bring this back to Roc, one of the reasons why Roc can perform a lot of mutations,
[00:33:11]
which are safe to do in practice without losing purity, is because they have full control
[00:33:19]
of how the code compiles.
[00:33:21]
So in JavaScript, you have a garbage collector.
[00:33:25]
No matter what you do, you are going to create a language which on some level is garbage
[00:33:29]
collected.
[00:33:30]
So when you're compiling to WebAssembly or regular assembly, you don't have a garbage
[00:33:36]
collector, which gives you the freedom to implement memory management how you want to.
[00:33:40]
In Roc, one of the things they've done is that they use a reference counting sort of
[00:33:45]
garbage collection.
[00:33:46]
And while that, from a throughput perspective, is in general worse than a tracing garbage
[00:33:51]
collector, what it gives them is that when they have an object, they know exactly
[00:33:56]
how many references are looking at that object.
[00:33:59]
And if the person who wants to change the object is also the only person who can observe
[00:34:05]
the object, doing a mutation is perfectly fine.
[00:34:09]
And so by using reference counting, they can actually get this performance optimization,
[00:34:13]
which is difficult to get with a garbage collected language.
[00:34:17]
And so that level of control, the problem with it is that you have to implement everything
[00:34:21]
yourself.
[00:34:22]
But the upside is that you can do a lot of things you wouldn't normally be able to do.
[00:34:28]
So whatever the future holds for Elm, Brian Carroll has done some really cool experiments
[00:34:38]
prototyping WebAssembly output for Elm, which is sort of an early prototype.
[00:34:45]
We don't know if that would ever be production ready or if it's just a proof of concept,
[00:34:49]
but either way, it's very interesting work.
[00:34:51]
But whatever the future holds for Elm, I kind of wonder what, I mean, in particular, the
[00:34:56]
two of you, Robin and Jeroen, you've been digging into performance a lot.
[00:35:02]
Jeroen has been doing that as a passion project lately.
[00:35:05]
And I wonder, are we scratching the surface for performance stuff in Elm?
[00:35:11]
Or is there a lot more that we have left?
[00:35:14]
Because one of the really interesting parts of the Elm story to me is in the early days,
[00:35:19]
there was a blog post, I think, comparing performance between these different front
[00:35:24]
end frameworks.
[00:35:25]
And Elm was one of the top performers, right?
[00:35:28]
And that's very interesting when you have this very high level language, and you have
[00:35:33]
these things that, you know, I mean, if it's your cup of tea, things like immutability
[00:35:39]
are really exciting in terms of reducing the cognitive load of the developer being able
[00:35:43]
to easily trace what your code is doing.
[00:35:45]
And it seems like it would be a burden for performance, but then suddenly you're getting
[00:35:50]
better performance.
[00:35:51]
And that's one of the really fascinating things to me is how can you take these characteristics
[00:35:57]
of the Elm language and leverage them to actually be ahead of the pack with performance?
[00:36:02]
So where do you guys think we are with performance optimizations in Elm?
[00:36:07]
Because I'm seeing all these like blog posts that you're writing, Robin, and I'm seeing
[00:36:11]
Jeroen's messages about like his screenshots on Twitter with these large percentage improvements
[00:36:19]
on certain benchmarks.
[00:36:22]
So are those things going to keep happening for a while?
[00:36:26]
Or are we reaching the limit of how much we can optimize Elm's performance?
[00:36:31]
Go ahead Jeroen.
[00:36:33]
Yeah, we talked about this in private and Robin said, we probably did the easy stuff.
[00:36:40]
So what I'm doing, like I'm seeing a function and I see a way to improve it performance
[00:36:46]
wise.
[00:36:47]
It's mostly just about removing unnecessary work or duplicate work, which happens a lot
[00:36:55]
more often than expected.
[00:36:57]
Like if you loop over a list two times, then it's slower than looping over it once.
[00:37:04]
So I'd see it a lot in a few functions and that's just more about how you write those
[00:37:09]
functions.
[00:37:10]
So it's easy to optimize those.
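(A hedged illustration of that kind of rewrite; slower and faster are hypothetical names. Both functions compute the same result, but the second walks the list once instead of twice.)

    -- Two passes: builds an intermediate list, then walks it again.
    slower : List Int -> List String
    slower numbers =
        List.map ((*) 2) numbers
            |> List.map String.fromInt

    -- One pass: composes the two steps and walks the list once.
    faster : List Int -> List String
    faster numbers =
        List.map ((*) 2 >> String.fromInt) numbers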
[00:37:12]
On a more optimizer level, so a compiler or Elm Optimize Level 2 or any other tool to
[00:37:21]
make all those manual changes not necessary, that would be a lot more work.
[00:37:27]
So you could write an optimizer that says, well, here we are unnecessarily looping over
[00:37:34]
the list two times and we could merge those into one, or rewrite it using a while loop or something
[00:37:41]
like that.
[00:37:42]
But that's a lot more work.
[00:37:43]
You need some knowledge that you may or may not have about what every function does.
[00:37:51]
So yeah, it's more complex and there's also a bundle size that we need to care about in
[00:37:57]
Elm, which is a trade off.
[00:37:59]
So from my point of view of what I've seen, I'm still touching things that feel pretty
[00:38:06]
easy.
[00:38:07]
So yeah, I don't know what's remaining.
[00:38:10]
But I'm starting to see other areas of exploration, and then the scientific papers become a bit
[00:38:18]
complex.
[00:38:19]
Let's put it that way.
[00:38:20]
Yeah.
[00:38:21]
That's when the postdocs start doing the optimizations.
[00:38:24]
Yeah.
[00:38:25]
And also since we're using a pure functional language, I don't know if it's the most researched
[00:38:31]
thing.
[00:38:32]
I'm sure a lot more people have researched how to improve the performance of C code than
[00:38:39]
Haskell code or Elm code for that matter.
[00:38:44]
So yeah.
[00:38:45]
I think there are two very interesting...
[00:38:48]
I think I'll go as far as to say that we have a lot of knowledge and a lot of ideas about
[00:38:55]
how we can make Elm code compile faster.
[00:39:00]
And there are certainly...
[00:39:03]
Compile to faster output.
[00:39:05]
Yes.
[00:39:06]
Right.
[00:39:07]
Because it compiles pretty darn fast.
[00:39:10]
I mean, any more improvements are welcome.
[00:39:13]
Yeah.
[00:39:14]
Thank you for that.
[00:39:15]
Yeah.
[00:39:16]
So I think there are several people who know a lot of easy wins, I guess we can say.
[00:39:23]
Elm Optimize Level 2 does this thing where it's able to compile a lot of stuff into direct
[00:39:28]
function calls instead of going through currying helpers.
[00:39:32]
That happens today.
[00:39:34]
And from the benchmarks I've seen, that can easily increase performance by up to 20% in
[00:39:40]
some cases.
[00:39:41]
Of the overall program.
[00:39:44]
Yeah.
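(Roughly what that means, sketched from the Elm side; add and result are hypothetical names, and the generated JavaScript is paraphrased in the comments rather than quoted from the tool.)

    add : Int -> Int -> Int
    add x y =
        x + y

    -- The standard compiler output routes this call through a generic
    -- currying helper (something like A2(add, 1, 2)); when all the
    -- arguments are supplied at once, elm-optimize-level-2 can rewrite
    -- it into a plain direct JavaScript function call instead.
    result : Int
    result =
        add 1 2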
[00:39:45]
And then there are things that I've written about in the series of blog posts that I wrote
[00:39:50]
before Christmas where updating a record can be made up to eight times faster in some cases.
[00:39:58]
Yeah.
[00:39:59]
Which is huge.
[00:40:00]
Especially for applications that are continually looping and updating.
[00:40:05]
I mean, like games, for example, if you're on every frame updating game state and records.
[00:40:11]
Yeah.
[00:40:12]
Yeah.
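(For context, this is the everyday shape of code that optimization targets; Model and step are hypothetical names from a game-style loop.)

    type alias Model =
        { x : Float
        , y : Float
        , score : Int
        }

    -- Record update syntax: if this runs on every animation frame,
    -- making the update itself faster pays off quickly.
    step : Model -> Model
    step model =
        { model | x = model.x + 1 }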
[00:40:13]
So in that way, we are scratching the surface, I think, in what we can add.
[00:40:18]
We know that there are a lot of gains that can be easily had to make Elm code run faster.
[00:40:25]
I know of several ways that the way Elm is compiled to JavaScript could be changed
[00:40:30]
in order to increase the runtime performance of Elm code.
[00:40:34]
However, a lot of those optimizations would increase the JavaScript bundle size.
[00:40:41]
Sometimes by a lot.
[00:40:44]
So there are a lot.
[00:40:46]
So one of the things that make...
[00:40:47]
There are two things that kind of make performance work very difficult.
[00:40:51]
One of them is how much of a code size increase are we willing to accept in order to get optimal
[00:40:58]
performance?
[00:40:59]
And that is not going to be an easy thing to answer because that's always going to change
[00:41:03]
depending on what you do.
[00:41:05]
Like if you're writing a single page application, then as long as the characters the user types
[00:41:14]
on his keyboard arrive in a timely manner, performance isn't a concern.
[00:41:18]
And so asset size is probably the most important thing.
[00:41:22]
But for people writing games and physics engines and, you know, WebGL stuff, they would probably
[00:41:30]
accept a pretty big code size increase in order to get the most optimal performance.
[00:41:36]
And so that is a question which is very difficult to deal with when doing performance optimizations.
[00:41:42]
Right.
[00:41:43]
And the same for tooling: Elm Review and Elm Pages both do pretty heavy lifting in
[00:41:50]
a Node.js environment in your command line or your build step.
[00:41:55]
And if they can have big performance gains for, whatever, a 50% larger bundle size.
[00:42:02]
For a CLI app, that's an easy win.
[00:42:05]
For something that's running in your browser, that's probably not the right trade off.
[00:42:11]
Exactly.
[00:42:12]
And then, of course, another thing is the guesswork involved by the JavaScript just
[00:42:18]
in time compiler.
[00:42:19]
So there are certain things we could do which, like in most languages, are the way to increase
[00:42:25]
performance, like function inlining.
[00:42:29]
That most likely would increase the code size of the output.
[00:42:35]
But the thing is that the JavaScript just in time compiler already has inlining enabled.
[00:42:41]
So we could go through the hassle of creating a function inlining pass, but it wouldn't
[00:42:48]
necessarily give us better performance because the JavaScript engine might already do those
[00:42:53]
exact things.
[00:42:55]
And so that's one area where WebAssembly would be an easier thing to work with.
[00:43:01]
It wouldn't be easier because you'd have to implement a lot of stuff yourself.
[00:43:05]
But you would, to a much larger degree, understand if something was worth looking into because
[00:43:12]
it's a more predictable target.
[00:43:14]
Yeah.
[00:43:15]
So, like, a lot of the time that I've spent looking into performance has simply been,
[00:43:21]
so in theory, this should give better performance.
[00:43:25]
But in actuality, that may not be the case.
[00:43:29]
And so there are a bunch of experiments which I've done which sound reasonable or sound
[00:43:34]
completely unreasonable.
[00:43:36]
And I've been surprised by the result on more than one occasion.
[00:43:41]
So regarding bundle size, do you have a sense, Robin, because for anyone who doesn't know,
[00:43:46]
you've been working on Stabel.
[00:43:48]
It's a, what's it called, a stack language?
[00:43:52]
It's a stack based programming language, or stack oriented.
[00:43:56]
Stack oriented.
[00:43:57]
And it's really interesting, like, I know that one of the things that you wanted to
[00:44:03]
experiment with for that project was just outputting something to WebAssembly.
[00:44:07]
And so that's what it does.
[00:44:09]
And so you have a grasp of some of these real world applications of WebAssembly.
[00:44:16]
And how does it, how is it for bundle size?
[00:44:20]
For WebAssembly output, is the bundle size larger, smaller, could it go either way?
[00:44:27]
So it's difficult to know.
[00:44:29]
It depends on the language you want to compile to.
[00:44:32]
But I believe Brian Carroll posted some numbers on this.
[00:44:37]
Because in theory, a WebAssembly bytecode instruction takes potentially just a byte.
[00:44:46]
So doing plus one two is smaller than writing one plus two in JavaScript.
[00:44:54]
Because it's compiled very efficiently.
[00:44:56]
On the other hand, you have to reimplement garbage collection, strings, currying in the
[00:45:03]
case of Elm.
[00:45:04]
So it's not necessarily a clear win.
[00:45:08]
But I believe Brian Carroll has posted numbers on this sometime in the past.
[00:45:13]
And I believe with the garbage collection and with, admittedly not with all the semantics
[00:45:19]
of Elm in place.
[00:45:20]
But I think it had a proof of concept garbage collector.
[00:45:24]
And I think a Hello World app or like the counter, the button counter example in Elm.
[00:45:31]
I think that compiled to, I'm taking this from memory so I could be very wrong.
[00:45:36]
But I believe it was something in the order of 12, 13 kilobytes before GZIP.
[00:45:42]
Oh, before GZIP.
[00:45:44]
Yeah.
[00:45:45]
So, and of course, the larger the application becomes, the more in favor of the WebAssembly
[00:45:51]
implementation it becomes.
[00:45:53]
So I believe, and also with my experiments with Stabel, I believe that asset size would
[00:46:00]
be the one clear win from WebAssembly.
[00:46:03]
Yeah.
[00:46:04]
I didn't know it was, that the instructions were so condensed.
[00:46:11]
So WebAssembly has two formats.
[00:46:13]
There's a text format, which is meant for like, it's meant for, you can handwrite it,
[00:46:20]
but usually it's for viewing, debugging, sanity checking, that sort of stuff.
[00:46:26]
But the actual WebAssembly format is binary and it is very dense.
[00:46:31]
Like one of the things it does is that all integer literals are encoded using variable
[00:46:39]
sized encoding.
[00:46:41]
So even though you are representing a 32 bit integer, if the int literal is the number
[00:46:47]
10, it only takes up eight bits in the WebAssembly output.
[00:46:51]
So it's a very, very dense and optimized for size format.
[00:46:56]
That's huge.
[00:46:57]
I mean, the tiny bundle size potential is huge.
[00:47:01]
Well, or tiny, I don't know, but it's, that could be just as interesting as any performance
[00:47:12]
gains there.
[00:47:13]
So that's, that is super interesting.
[00:47:15]
Yeah, it's super interesting.
[00:47:16]
But of course, like Brian Carroll has been working on this for years and I don't think
[00:47:21]
it's close to like a production ready compiler, which kind of goes to show that, you know,
[00:47:27]
WebAssembly has a lot of potential benefits, but working with it is very difficult.
[00:47:33]
Well not difficult, but very time consuming.
[00:47:36]
And I think with the current state of the compiler, you would have to do a lot of work
[00:47:40]
to get anywhere close to what Brian Carroll has got running today.
[00:47:44]
Absolutely.
[00:47:45]
And one thing that's easy to do, not easy, one thing that's important to keep in mind
[00:47:50]
is that the Elm compiler is not an optimizing compiler.
[00:47:54]
Even though it type checks your code, it doesn't actually retain that information through to the code
[00:48:00]
generation stage.
[00:48:02]
So there are a ton of things you would have to improve or complicate, I guess is a better
[00:48:08]
word.
[00:48:09]
You would have to add a ton of complication to the Elm compiler in order
[00:48:12]
to be able to output WebAssembly, and that is very likely to come at a cost to compiler
[00:48:19]
speed.
[00:48:20]
Yes, which Evan has painstakingly optimized, I think largely by just reducing the amount
[00:48:28]
of memory that's passed around and that would be additional memory that you're passing around.
[00:48:33]
So yeah, it would have a cost for performance.
[00:48:35]
So yeah, it's not that WebAssembly isn't interesting.
[00:48:40]
It is super interesting, but it's also, it's not easy.
[00:48:46]
And of course, JavaScript has a lot of faults, but it has a world class garbage collector
[00:48:52]
built in and it is pretty good at optimizing high level code.
[00:48:56]
So you wouldn't necessarily get better performance.
[00:49:00]
You would get a lot of complications in JavaScript interop.
[00:49:03]
You would probably get smaller asset sizes, but to get there would be a huge amount of
[00:49:09]
work.
[00:49:10]
But it's not a clear improvement over what we have today.
[00:49:16]
Yeah.
[00:49:17]
Well, one of the things that has always fascinated me is like when you have a paradigm where
[00:49:24]
you just slightly change the way you're working and it has huge implications.
[00:49:29]
Like for example, I always found it really interesting how you take Elixir and this web
[00:49:37]
framework Phoenix and simply by having this one property of immutability, which actually
[00:49:43]
it feels fairly similar to writing something like Ruby.
[00:49:47]
You can even rebind variables and under the hood it's using immutable data, but it can
[00:49:53]
feel very familiar for somebody who's used to writing Ruby.
[00:49:56]
But you take Ruby on Rails and Elixir Phoenix and suddenly you can get this incredible request
[00:50:03]
throughput because the optimizations they can perform under the hood largely with trivial
[00:50:09]
parallelization.
[00:50:12]
You have this immutability that you can rely on and suddenly this very challenging problem
[00:50:17]
of parallelization, which requires a lot of work, including by the application developer
[00:50:24]
to manage how to safely share memory.
[00:50:28]
Those problems suddenly all just go away.
[00:50:31]
And I think that there's similar potential in Elm.
[00:50:35]
This is big picture, long term, who knows what will happen.
[00:50:40]
But when I look at the big picture of trends of programming languages, everything becomes
[00:50:46]
a question of parallelization rather than brute performance.
[00:50:51]
So like CPUs aren't getting any faster.
[00:50:55]
For five, 10 years they haven't gotten any faster.
[00:50:59]
The clock speed is not improving because it would start to get to the temperature of the
[00:51:04]
surface of the sun just the way that the physics of increasing clock speed works.
[00:51:10]
But what you can do by getting more transistors on a chip is you can have more parallel processing,
[00:51:17]
but you can't do it at a faster clock speed.
[00:51:20]
That's just a limit that we hit a long time ago and that's not going to change.
[00:51:24]
Can't they just improve physics?
[00:51:27]
Maybe.
[00:51:28]
Maybe quantum computers.
[00:51:31]
So when we're on the topic of Elixir, Joe Armstrong, who is one of the creators of the
[00:51:36]
Erlang programming language, which Elixir compiles down to, said that, like, Erlang
[00:51:43]
has this notion that writing parallel programs is very easy.
[00:51:48]
Part of that is immutability.
[00:51:49]
Part of that is isolated actor processes.
[00:51:53]
It's a super interesting language.
[00:51:54]
So if you haven't checked it out, do.
[00:51:56]
But he worked on a project where they had an Erlang program and then they swapped out
[00:52:02]
the hardware from like a four core CPU to a 64 core CPU.
[00:52:08]
And then the exact same program just ran, I believe it was 34 times faster or something.
[00:52:16]
And the product manager said, well, we got 64 cores.
[00:52:21]
Shouldn't it run even faster?
[00:52:23]
And his response was, well, if you were to take a C++ program and just swap out the CPU,
[00:52:29]
it would be zero times faster.
[00:52:34]
So it's yeah.
[00:52:35]
Wait, yeah.
[00:52:36]
Zero times faster or one time faster?
[00:52:41]
I don't know math.
[00:52:42]
It would be 1x the speed and a 0% performance increase.
[00:52:47]
All right.
[00:52:49]
I mean, you could say it just crashes and then it's just zero times faster.
[00:52:56]
That's also likely, I would say.
[00:52:59]
Yeah.
[00:53:00]
Yeah.
[00:53:01]
Yeah.
[00:53:02]
This is to me.
[00:53:03]
I mean, Jeroen and I had this sort of episode in the new year where we talked about what's
[00:53:08]
working for Elm.
[00:53:10]
And that was like one of the points that came up was, hey, we've got this language with
[00:53:15]
some really unique characteristics.
[00:53:16]
And how can we, instead of saying, oh, performance is really hard with immutability, how can
[00:53:22]
we say, well, but these things become easier and these things we have more opportunities.
[00:53:26]
I think parallelization is one of them.
[00:53:29]
And I don't know, looking 10 years down the road, are web apps going to be leveraging
[00:53:33]
parallelization more?
[00:53:34]
I don't know.
[00:53:35]
Maybe.
[00:53:36]
And I believe WebAssembly has primitives for delegating things in a parallel way.
[00:53:41]
So if I'm not mistaken.
[00:53:44]
So that could be an interesting space, long term, big picture.
[00:53:49]
And I think, I forget if it is in 0.19, it could be 0.18.
[00:53:57]
But I believe if you look into elm/core and look at the Process namespace, then you will
[00:54:04]
get to that.
[00:54:05]
There will be a comment there in the documentation saying that in the future, we might
[00:54:12]
multiple actors or multiple mailboxes or something along those lines, which is a clear reference
[00:54:17]
to Erlang actors.
[00:54:19]
And so this aspect has actually been thought about by Evan for multiple years.
[00:54:30]
So yeah, that might be like one aspect we tap into.
[00:54:34]
And of course, just to underline the point even more, one of the big things when
[00:54:39]
Clojure came out, Clojure was like the first functional language that I, well, it wasn't the first
[00:54:45]
functional language that I learned.
[00:54:46]
It was the first immutable by default language.
[00:54:50]
And one of the big draws to Clojure was that because of immutability, concurrency is suddenly
[00:54:56]
super easy.
[00:54:58]
And so even though you have to pay the price of immutable code, adding concurrency to a program
[00:55:03]
is so much easier that in a lot of cases, you actually get more correct and better performing programs.
[00:55:09]
Right.
[00:55:10]
And on the web, in a web browser, your code is single threaded.
[00:55:13]
So if you are doing work on the main thread, which if you just open up an index.js and
[00:55:21]
load that and do some work, that is blocking the main thread. And if a user
[00:55:27]
tries to scroll or tries to click a button, even an animation from a built in button
[00:55:32]
element on the page is blocked, because the render thread needs the opportunity to run.
[00:55:38]
And you're running on that same thread.
[00:55:40]
So you can use worker threads to do work; you do need to send memory back
[00:55:48]
and forth.
[00:55:50]
But this is another potential space that could be very interesting for Elm because this sort
[00:55:55]
of Elm architecture is a very natural fit for performing the main work off of the main
[00:56:01]
thread, and then sending messages back to tell the main thread to update.
[00:56:06]
Who knows if anything like that will ever happen.
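(For what it's worth, Elm can already run headless via Platform.worker. A minimal sketch of the shape such an off-main-thread setup might take, with hypothetical port names and a hypothetical expensiveWork function; the wiring into an actual Web Worker would live on the JavaScript side.)

    port module Worker exposing (main)

    port incoming : (Int -> msg) -> Sub msg

    port outgoing : Int -> Cmd msg

    type Msg
        = Got Int

    -- Receive a request over a port, do the heavy work off the
    -- main thread, and send the result back as a message.
    update : Msg -> () -> ( (), Cmd Msg )
    update (Got n) model =
        ( model, outgoing (expensiveWork n) )

    expensiveWork : Int -> Int
    expensiveWork n =
        List.sum (List.range 1 n)

    main : Program () () Msg
    main =
        Platform.worker
            { init = \_ -> ( (), Cmd.none )
            , update = update
            , subscriptions = \_ -> incoming Got
            }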
[00:56:08]
But these are the things that, again, it's like Elm is a compiler, and what can
[00:56:14]
we do make taking advantage of that.
[00:56:17]
And so from this whole conversation, I really do get the sense that whatever the future
[00:56:23]
holds, there's more opportunity.
[00:56:26]
And we're not even done picking off the low-hanging fruit. But who knows, maybe
[00:56:32]
there's some big thing in the future that could even blow those out of the water.
[00:56:37]
So it'll be interesting to see what happens.
[00:56:39]
Exactly.
[00:56:40]
And there are multiple cases of this also.
[00:56:43]
Like we talked a little bit about Elixir.
[00:56:46]
I mentioned Clojure, right, like immutability by default enables concurrency, or easy concurrency.
[00:56:53]
There is also... so one interesting thing is JavaScript itself.
[00:56:59]
One of the reasons Node.js took off was because it has this event loop built in.
[00:57:06]
And so even though you can't perform computationally expensive things, because you will block the
[00:57:13]
thread, the Node runtime or the JavaScript runtime makes it very easy to do event
[00:57:22]
based programming.
[00:57:23]
And if you write servers that, you know, call a database, and then they just wait for the
[00:57:28]
results, Node was really, really good at utilizing the one thread it has, which languages like
[00:57:35]
Java and .NET, which spawn threads, weren't that good at.
[00:57:40]
Same with Ruby. Ruby's had a lot of issues with blocking file and IO operations.
[00:57:46]
Yeah.
[00:57:47]
So really, the reason why Node took off was because in practice, you managed to get servers
[00:57:53]
which could handle more load without careful engineering, right? Just by
[00:57:59]
default, you could handle tons of requests, as long as those requests weren't doing anything
[00:58:04]
expensive.
[00:58:05]
And JavaScript has a lot of flaws, but even JavaScript, because of the limitations
[00:58:09]
it has, was able to outperform naive implementations in the server space, which is partly why it
[00:58:17]
took off.
[00:58:18]
So today, there are better alternatives, but back in 2009, or whatever it was,
[00:58:27]
it was very interesting how you could handle a lot of requests on a single Node.js server
[00:58:32]
compared to a naive Java program, right? Which is actually, I believe, why Ryan Dahl chose
[00:58:39]
JavaScript as the target language.
[00:58:41]
It wasn't originally his intent.
[00:58:43]
I can't remember, maybe it was Go or something else that he had in mind, but that event-driven
[00:58:49]
architecture was just such a good fit for JavaScript that he went with that.
[00:58:53]
Yeah, I never actually understood whether it was part of JavaScript or just part of
[00:58:59]
the implementations of JavaScript, that it was limited to a single thread.
[00:59:02]
I mean, I think that's the semantics of JavaScript, basically, that anything you do runs on
[00:59:10]
a single thread, but then there's this concept of being able to queue up callbacks,
[00:59:17]
the callback queue and stuff.
[00:59:19]
Like I think the concept of like a callback queue and everything is baked into the semantics
[00:59:24]
of JavaScript.
[00:59:25]
And then the specifics of the things that can be done in a non-blocking way are specific
[00:59:30]
to the Node runtime or to the web runtime. Like setTimeout, for example: setTimeout
[00:59:36]
is not part of JavaScript.
[00:59:38]
setTimeout is part of a runtime like the browser runtime or the Node.js runtime.
[00:59:42]
It doesn't exist independent of that.
[00:59:44]
But it uses the same mechanisms that you mentioned before that are built in.
[00:59:48]
Yes.
[00:59:49]
Part of the spec, I guess.
[00:59:51]
Yeah, exactly.
[00:59:52]
Those same semantics of a callback queue.
[00:59:55]
Yeah.
[00:59:56]
So yeah, so having languages which have limits, those limits can enable certain features that
[01:00:03]
can be very well suited to certain kinds of programs.
[01:00:06]
And Elm definitely, if there's one thing that Elm has a lot of, it's limits.
[01:00:10]
Right.
[01:00:11]
Exactly.
[01:00:12]
And those exact limits can be utilized to some pretty interesting results.
[01:00:16]
Html.Lazy, which we talked about earlier, is one example of that.
[01:00:20]
Doing a similar kind of optimization in React takes a lot more planning.
[01:00:26]
I guess like you need to know that you do not perform mutation in this component or
[01:00:31]
it will be slow or it will produce buggy behavior.
[01:00:35]
Whereas now it's very likely that you can just tap into that optimization.
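For reference, a small sketch of what tapping into that optimization can look like; the Comments module, Model shape, and viewComments are hypothetical names:

```elm
module Comments exposing (view)

import Html exposing (Html, div, li, text, ul)
import Html.Lazy exposing (lazy)


type alias Model =
    { comments : List String }


-- lazy skips the virtual DOM work for viewComments whenever
-- model.comments is the same value as on the previous render.
-- That check is safe by construction, because Elm values never
-- mutate underneath you.
view : Model -> Html msg
view model =
    div [] [ lazy viewComments model.comments ]


viewComments : List String -> Html msg
viewComments comments =
    ul [] (List.map (\comment -> li [] [ text comment ]) comments)
```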
[01:00:40]
I wouldn't say it is limited.
[01:00:43]
I would say it has limitations and those enable you to have no limits.
[01:00:49]
Oh, that's great.
[01:00:50]
Hey, that's another t shirt.
[01:00:52]
Love it.
[01:00:54]
So Robin, when you're sitting down to write Elm application code, I mean, I'm sure performance
[01:01:00]
is this thing that you can't help but think about no matter what you do.
[01:01:05]
But are you typically just focused on writing the application code or do you run into places
[01:01:11]
where as an Elm application developer, you find that you need to really think about performance
[01:01:16]
and tune performance?
[01:01:17]
Does that happen very often?
[01:01:18]
You are correct in that when I write Elm code, it's very difficult for me to not think about
[01:01:24]
this is suboptimal from a performance perspective.
[01:01:27]
Fortunately, that's something I've become better at ignoring as I've grown older.
[01:01:33]
So I would say that today I don't focus too much on performance normally.
[01:01:39]
Nowadays it's like: if we have a performance problem, that's when I'm called in.
[01:01:48]
So like the recent Elm CSS improvements are a result of that.
[01:01:52]
This application is laggy.
[01:01:54]
You know Elm very well.
[01:01:55]
How can you improve the situation?
[01:01:57]
And we improved it by using Html.Lazy.
[01:02:00]
And then I got home and thought about how could we have avoided that optimization in
[01:02:06]
the first place?
[01:02:07]
Like could we have changed the framework to not have needed Html.Lazy in that case?
[01:02:11]
So that's how it works now.
[01:02:13]
But one thing that I have learned is that there are certain things which do improve
[01:02:17]
performance but which also at least I think improve readability of the code.
[01:02:24]
There are many cases where the opposite is true.
[01:02:26]
Like improving performance worsens code.
[01:02:30]
But I've found several things that improve performance and increase readability.
[01:02:36]
And usually this involves data structures.
[01:02:41]
Most often you can recognize a pattern, realize that this would be more efficient and more
[01:02:48]
readable by using the correct data structure.
[01:02:52]
And really in Elm we have this mantra, making impossible states impossible.
[01:02:57]
And in a lot of cases making impossible states impossible also improves performance.
[01:03:03]
Because there's less error handling and it's easier to get exactly what you want with safety
[01:03:08]
guarantees but also performance guarantees.
[01:03:11]
Fewer checks as well.
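A hedged sketch of that idea; the Item alias and all the field names are hypothetical:

```elm
module Request exposing (Request(..))


type alias Item =
    String


-- With a record of flags, contradictory combinations (loading and
-- failed at once, failed and loaded at once) must be guarded
-- against at runtime:
type alias RequestFlags =
    { isLoading : Bool
    , error : Maybe String
    , items : List Item
    }


-- With a custom type, those states cannot be constructed at all,
-- so consuming code is one case expression with no defensive checks:
type Request
    = Loading
    | Failed String
    | Loaded (List Item)
```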
[01:03:14]
So one simple thing that I also use in other programming languages like Java and Kotlin
[01:03:22]
is whenever I see List.find or something similar that does "give me the item with this key".
[01:03:34]
To me that's like this should be a dictionary.
[01:03:37]
Like why isn't this a dictionary?
[01:03:40]
Like sometimes using a dictionary would be worse overall but in many cases it just screams
[01:03:46]
associative lookup.
[01:03:47]
You have a dictionary for this.
[01:03:50]
Yeah, usually I see List.map and List.head and I'm thinking I should reach for find.
[01:03:59]
And then maybe I should reach for dicts.
[01:04:02]
But in the case that you mentioned, List.map, List.head, that's a perfect case for just using
[01:04:09]
a different data structure, which gives you both performance and clearer intent.
[01:04:14]
What are you trying to do?
[01:04:17]
So that's also a valid case.
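A small illustration of that refactor, with a hypothetical User type. Note that elm/core has no List.find, so the linear scan is spelled out with List.filter and List.head:

```elm
module Users exposing (findUser, lookupUser)

import Dict exposing (Dict)


type alias User =
    { id : Int, name : String }


-- Linear scan: O(n), and the intent is buried in the pipeline.
findUser : Int -> List User -> Maybe User
findUser id users =
    users
        |> List.filter (\user -> user.id == id)
        |> List.head


-- Keyed lookup: O(log n), and the type says "associative" up front.
lookupUser : Int -> Dict Int User -> Maybe User
lookupUser id users =
    Dict.get id users
```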
[01:04:19]
Using dictionaries or using sets, instead of deduplicating your data manually or through some other
[01:04:25]
means, usually also improves performance and makes it very clear what the
[01:04:30]
intent is.
[01:04:31]
And then using zippers or nonempty lists, same thing.
[01:04:36]
But retrieving the head of a nonempty list lets you avoid a case expression, which has performance
[01:04:42]
implications.
[01:04:43]
Now granted in many cases the performance improvement we're talking about is small and
[01:04:48]
insignificant.
[01:04:49]
But the true benefit is clearer code.
[01:04:54]
It's nice to realize that you can actually have both.
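A minimal sketch of the nonempty-list idea; packages such as mgold/elm-nonempty-list offer a complete API around it:

```elm
module Nonempty exposing (Nonempty(..), head)


type Nonempty a
    = Nonempty a (List a)


-- With a plain List, getting the head forces a Maybe and a case
-- expression. Here the head is total: no branch, no default value.
head : Nonempty a -> a
head (Nonempty first _) =
    first
```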
[01:04:57]
Right.
[01:04:58]
And you have to really consider the cost if you're doing performance optimization that
[01:05:02]
makes the code harder to reason about.
[01:05:05]
Also what's to prevent someone in the future from looking at that code and saying, oh,
[01:05:09]
this is kind of ugly, and then tweaking it and breaking the performance optimization?
[01:05:13]
But if it's the most elegant way to express it, it's a lasting improvement that's good
[01:05:18]
for your code base.
[01:05:21]
That's also kind of like what motivated me to improve performance of Elm CSS.
[01:05:26]
Because where I work, a lot of the people who write Elm code are working on their first
[01:05:31]
Elm application.
[01:05:32]
They learned Elm because they were hired at Bekk or onto some other Bekk project.
[01:05:39]
And then we teach them Elm in a day or two and then we throw them out into the deep waters
[01:05:44]
of an Elm application.
[01:05:46]
Now figure it out.
[01:05:48]
Exactly.
[01:05:49]
And so there aren't many people that I work with on a day to day basis who have years of
[01:05:55]
Elm experience.
[01:05:56]
And so expecting them to not mess up code that involves Html.Lazy is kind of a stretch.
[01:06:07]
So not having Html.Lazy, like if we didn't need Html.Lazy, it is less likely that performance
[01:06:13]
will degrade at some point.
[01:06:17]
Robin, how do you go about finding your next opportunity?
[01:06:21]
Is it like you were kind of describing with this Elm CSS case, scratching your own itch
[01:06:27]
where you're driving home from work and you're like, hmm, can we avoid doing an Html.Lazy
[01:06:34]
there?
[01:06:35]
Is that usually where you find your next opportunities for improvements?
[01:06:41]
I would love to say yes, because that's the way it should be.
[01:06:47]
But that is only something I realized once I turned 32.
[01:06:54]
Before that, I was probably where Jeroen is now.
[01:06:58]
Like he has discovered that performance work is really fun.
[01:07:03]
And so he starts looking at, well, maybe I can make this faster.
[01:07:09]
And oh, I could.
[01:07:10]
Maybe I should make this faster.
[01:07:12]
And there's not necessarily anything wrong with that.
[01:07:18]
I don't mean to single you out.
[01:07:21]
No, I turn 32 in like three months.
[01:07:24]
Excellent.
[01:07:25]
Excellent.
[01:07:26]
Looking forward to it.
[01:07:29]
Prepare for wisdom.
[01:07:32]
But really, I did the same thing.
[01:07:33]
So the way I got into performance work was that I reimplemented Elm arrays for
[01:07:40]
0.18, I think.
[01:07:43]
And the main reason for that was because Elm arrays were buggy.
[01:07:48]
They were written in JavaScript entirely.
[01:07:53]
And then there was like a very thin layer of Elm code to expose it to Elm.
[01:07:57]
And it did have, in certain cases, mutability, like visible mutability; it
[01:08:05]
could cause runtime exceptions.
[01:08:08]
It wasn't good.
[01:08:09]
It wasn't pretty.
[01:08:11]
So the main reason was to rewrite it in as much Elm code as possible to make it safer.
[01:08:19]
But for it to be acceptable, it had to be at least in the same ballpark of performance
[01:08:28]
as what was already there.
[01:08:29]
And so that's how I got into performance work.
[01:08:32]
I was trying to make an Elm array replacement, which didn't come at the cost of a huge performance
[01:08:39]
decrease.
[01:08:40]
And like, having a benchmark and seeing those numbers go up when you make changes became
[01:08:46]
addictive, and then I just started looking around the Elm core library, seeing what
[01:08:52]
else I could make faster.
[01:08:54]
But really the biggest, the most important performance improvements are the
[01:08:59]
ones where you notice a problem.
[01:09:02]
Because I realized that I've spent a lot of time fixing things which aren't an issue and
[01:09:08]
which aren't necessarily likely to be an issue.
[01:09:11]
Wait, are you saying that improving the performance of String.pad is not a big deal?
[01:09:23]
I'm just saying unless you have a performance problem, fixing a performance problem isn't
[01:09:29]
necessarily going to bring value to someone.
[01:09:33]
That's not to say that making something faster just for the sake of making it faster won't
[01:09:39]
be very useful somewhere down the line.
[01:09:45]
And if you enjoy optimizations, especially optimizations which don't make the code look
[01:09:50]
worse and harder to grasp, then there's no harm in it.
[01:09:54]
But if you want to be entirely certain that the work you do has meaning, then ideally
[01:10:00]
you should just come across something where you think, this should be faster, and then fix
[01:10:05]
that.
[01:10:06]
And I might add, fix that in a scientific way.
[01:10:13]
Don't just think that, oh, if I replace this List.find with Dict.get, then it will be much
[01:10:19]
faster.
[01:10:20]
And while it's probably faster now, do measurements and be certain that you are in fact making
[01:10:27]
something better.
[01:10:28]
And in a noticeable way.
[01:10:30]
Yes, yes.
[01:10:32]
So like, a thousand times improvement is cool on paper, but if in practice it doesn't change
[01:10:39]
anything, then... not saying that you should stop doing what you're doing, you're doing
[01:10:45]
awesome stuff.
[01:10:46]
I'm currently working on something that I think has users as well as improving performance.
[01:10:54]
So I'm very happy about that.
[01:10:56]
Okay, good, good.
[01:10:57]
But yeah, I remember that in some places I thought like, List.append is faster than
[01:11:03]
++, and I started using it everywhere.
[01:11:07]
And then I ran a benchmark, just on List.append versus ++.
[01:11:12]
Yeah, no difference.
[01:11:14]
So I did a lot of changes that were unnecessary and that didn't read much better.
[01:11:20]
So yeah, benchmark it.
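For example, a sketch of such a benchmark using elm-explorations/benchmark; the list sizes here are arbitrary:

```elm
module Main exposing (main)

import Benchmark exposing (Benchmark)
import Benchmark.Runner exposing (BenchmarkProgram, program)


main : BenchmarkProgram
main =
    program suite


-- Compare the two candidates directly instead of guessing.
suite : Benchmark
suite =
    Benchmark.compare "list concatenation"
        "List.append"
        (\_ -> List.append (List.range 1 100) (List.range 101 200))
        "++"
        (\_ -> List.range 1 100 ++ List.range 101 200)
```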
[01:11:22]
And ultimately those things don't last.
[01:11:24]
You know, I mean, like, again, somebody could refactor it because something looks
[01:11:29]
ugly or it's a hack. Or if some code is using List.append and it's a little
[01:11:35]
bit awkward, and they're like, why doesn't this use ++, and change it?
[01:11:39]
They're probably going to change it.
[01:11:41]
Maybe it changes which one's faster than the other.
[01:11:45]
So there's always a cost to making code uglier, right?
[01:11:50]
It's like: make it work.
[01:11:52]
Make it right.
[01:11:53]
Make it fast.
[01:11:54]
But that should be the last resort, if you need to.
[01:11:57]
And if you benchmark it and see there's a problem.
[01:11:59]
Yeah.
[01:12:00]
So don't do this at home, kids.
[01:12:05]
Only do it at work.
[01:12:12]
So yeah, performance work is a hobby.
[01:12:17]
It doesn't always bear fruit, but sometimes it does.
[01:12:19]
And that's great.
[01:12:20]
So it's fine, as long as you're not hurting anyone.
[01:12:25]
Yeah.
[01:12:26]
Well, we do know that we've gotten a lot of amazing performance improvements from your
[01:12:31]
work, Robin.
[01:12:32]
So thank you for your work.
[01:12:34]
Thank you for being on to talk about this with us.
[01:12:36]
And yeah, thanks so much for coming back on.
[01:12:38]
Oh, my pleasure.
[01:12:39]
If anybody wants to find out more, where should they follow you?
[01:12:44]
Where can they go to read more?
[01:12:47]
Any resources to leave people with?
[01:12:48]
I think perhaps the best way is to follow me on Twitter.
[01:12:55]
That's @robheghan.
[01:12:58]
Yeah, we'll drop a link in the show notes for people to do that.
[01:13:02]
Because sometimes, when I do stuff that's related to work or relatable
[01:13:09]
to work, then I post on the Bekk blog.
[01:13:12]
And when I do stuff that's purely my own invention, I do it on my own dev.to account.
[01:13:18]
In either case, it ends up on Twitter.
[01:13:19]
So that's probably the best way.
[01:13:21]
Perfect.
[01:13:22]
All right.
[01:13:23]
Thanks again, Robin.
[01:13:24]
Jeroen, until next time.
[01:13:26]
Until next time.