Elm Code Generation

We discuss different use cases for code generation in Elm applications, and our favorite code generation tips.
May 24, 2021

Watchers for rerunning codegen



Hello, Jeroen.
Hello, Dillon.
Well, what are we talking about today?
Today we're talking about code generation.
Uh huh.
Well, have you ever done code generation?
I have done my fair share.
Uh huh.
I'm guessing you too.
I've done a lot of code generation, and I really enjoy it.
It's very cool.
So like code generation, well, as is our tradition, let's lay out some definitions.
So what is code generation in the context of Elm?
Well, code generation is just writing Elm code.
So you are a code generator, Dillon.
That's true.
You are a very good code generator.
I'm a code generation machine.
And if you want to become even more productive at generating code, what if you wrote code
that could write code?
Wouldn't that be cool?
I thought you were going to say, oh, what if you want to become even better, just buy
my book?
Yeah, no, I mean, I mean really, uh, you know, Elm is a code generator.
A compiler is a code generator.
When we're talking about code generation right now, what we're really talking about is doing
things that you wouldn't be able to do just by writing some simple Elm code or, you know,
perhaps adding some safety mechanisms.
So like, yeah, basically generating code through automated scripts or tools and for different
purposes that we will go through.
So it's like, I mean, one of, um, you know, obviously one of, one of the examples that
is top of mind for me would be Elm GraphQL.
And I think that that's a pretty good, pretty simple example in a way, because, so like
one of the things that I think is really interesting to think about for code generation is what
is the source of truth and what's the mental model.
And so for, for Elm GraphQL, what's the purpose of the code generation?
What is it giving you that just writing code by itself couldn't give you?
Why not just use a plain old API, right?
Like if you can write an API without code generation, that's far better.
You shouldn't reach for code generation unless you really need it.
So first of all, why would you go for code generation and not write the code yourself?
So if you take the example of Elm GraphQL, you've got this external thing, the schema
and Elm doesn't know about that at compile time.
At runtime, you can teach Elm about your schema, but you want compiler guarantees.
So that's one of the really cool things you can do with code generation is you can sort
of bring this information in so that not only is Elm aware of it at runtime when your code
is running, but the compiler knows about it too. You can teach the Elm compiler about external things,
because otherwise you can't just tell the Elm compiler about them.
Now with certain fancy magical languages, you can do this, but Elm is not a fancy magical language.
It's a very simple, predictable, boring compiler, which is really one of the things we like
about it.
But we also like our compiler to know about the types in our GraphQL API.
That's very nice.
So you can try to get a field and the compiler says, uh, there's no such field, or, uh, you're
treating this as if it was an int, but it's actually a string.
We like that.
We like the compiler to know that external information.
So that's one reason that you would do code generation is to bring in information about
some external source of truth.
And well, you know, types without borders, you're erasing the borders between the type
system of the Elm compiler and your GraphQL schema.
If you were to write it yourself, then you have a potential bug because you're saying
that this field is an integer and actually it's a float or something like that.
So at runtime you would have some kind of error, an HTTP error or...
It's a good way to keep two sources of data in sync with the Elm compiler because you
can, you know, you can figure stuff out by running Elm code, but we like to do better.
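To make the "teach the compiler about an external source of truth" idea concrete, here is a toy generator in TypeScript (Node.js is the scripting tool used later in the episode). It takes a hand-written description of one object type, standing in for a real schema, and emits an Elm record type plus a decoder for it. The input format, the `Api.User` module name and the `andMap` helper are assumptions for illustration; this is not how elm-graphql works internally, and its real output is far richer.

```typescript
// Toy sketch: generate an Elm record type and JSON decoder from a
// hand-written description of one object type. NOT elm-graphql's approach;
// the input shape, module layout and names are assumptions.

type Scalar = "String" | "Int" | "Float" | "Boolean";

interface ObjectType {
  name: string; // Elm type name, e.g. "User"
  fields: { name: string; type: Scalar }[];
}

// How each assumed scalar is written as an Elm type and as a Json.Decode decoder.
const elmType: Record<Scalar, string> = {
  String: "String",
  Int: "Int",
  Float: "Float",
  Boolean: "Bool",
};
const elmDecoder: Record<Scalar, string> = {
  String: "Decode.string",
  Int: "Decode.int",
  Float: "Decode.float",
  Boolean: "Decode.bool",
};

function renderObjectModule(obj: ObjectType): string {
  if (obj.fields.length === 0) {
    throw new Error(`${obj.name} has no fields`);
  }
  // "{ id : Int" for the first field, ", name : String" for the rest.
  const recordLines = obj.fields.map(
    (f, i) => `    ${i === 0 ? "{" : ","} ${f.name} : ${elmType[f.type]}`
  );
  // Applicative-style decoding so any number of fields works (Decode.map8 stops at 8).
  const decoderLines = obj.fields.map(
    (f) => `        |> andMap (Decode.field ${JSON.stringify(f.name)} ${elmDecoder[f.type]})`
  );
  return [
    `module Api.${obj.name} exposing (${obj.name}, decoder)`,
    "",
    "import Json.Decode as Decode exposing (Decoder)",
    "",
    "",
    `type alias ${obj.name} =`,
    ...recordLines,
    "    }",
    "",
    "",
    `decoder : Decoder ${obj.name}`,
    "decoder =",
    `    Decode.succeed ${obj.name}`,
    ...decoderLines,
    "",
    "",
    "andMap : Decoder a -> Decoder (a -> b) -> Decoder b",
    "andMap =",
    "    Decode.map2 (|>)",
    "",
  ].join("\n");
}
```

Because the record type and the decoder come from the same description, treating a field the schema calls `Int` as a `String` in Elm becomes a compiler error instead of a runtime decode failure.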
So that's one thing that like some languages have macros and other metaprogramming tools.
And now, so like, let's talk about that a little bit.
So like, like what is a macro?
Elm doesn't have macros.
It doesn't have any macros out of the box.
And there's, I don't think there's a tool out there at the moment that has macro support
or something.
Basically your macros are a way of changing your code.
Probably at compile time or maybe at runtime in some cases, depending on, yeah, I don't
know, like configuration.
I think that's a good way of defining it.
It's basically like a way of programmatically changing code.
So you like essentially take, I mean, sometimes people talk about it as like code as data.
So like a lot of Lisp variants are known for being able to use a macro where you say this
function is going to take this code as an argument.
So whatever code you pass to it, you can sort of go in and tweak the abstract syntax tree
or inspect it, which is quite handy, but it can make it difficult to understand things
and it can make things harder to predict. Like BuckleScript/ReScript, whatever they're
calling it these days.
What was the old name?
Reason ML was the old name.
Now they call it rescript.
They've got a feature, these PPX macros.
And I mean, that's extremely powerful.
You can basically like go do whatever side effects you want to make code available.
But if there's a problem, how do you go figure out what the problem is?
Or if you're trying to understand what your code can do, where do you go to figure that out?
Whereas if you're just using vanilla code generation, which is what Elm gives us, then
you can go and you can look at that generated code.
It's in the folder somewhere.
Your IDE will even help you command-click into the code and see exactly what the code does.
Most of my experience with macros is from C, from a very long time ago, where you basically
just replace constants with other values, like for enums, mostly.
That's what I use it for.
I'm guessing it's way more powerful than that, but not to my knowledge.
I guess the word macro is used and it can mean very different things.
So I think C macros are preprocessor macros, which is very
different than like a Lisp macro, which is arbitrarily allowing you to modify the code.
And the other one is Babel.
So Babel has, that's for JavaScript.
You configure Babel with a bunch of plugins and you tell it what transformations to do.
And basically it allows you to write code using newer JavaScript features and transform it to
ES5 or something.
Or it can allow you to do pretty much anything.
Like if you want to add a macro that says all the strings in the source code, make them
uppercase, that could work.
And it can be very powerful, but as you said, it can be a big pain to debug when something
goes wrong.
And that's actually something that I think it's quite nice to embrace as Elm users that
this is a benefit of Elm that it doesn't have this complexity to deal with.
If something goes wrong, we don't have to dig through some cryptic messages or just
not have any way to understand what's going on.
We can see exactly the code that was generated and that's quite useful.
Because what you have in mind is generating Elm files, right?
That's all it is.
It's writing an Elm file to a disk.
So one of the things that, I mean, we're recording this episode about code generation and I want
to emphasize, I don't think that this concept is only useful to library authors.
I think that this concept is useful.
It's extremely useful for library authors and sure, code generation sounds scary, but
I would like to make it sound less scary.
And I think that code generation should feel like something, again, you don't want to,
if you can do something, if you can accomplish a goal without code generation, you by all
means should.
And it does add complexity.
It's funny what you said, because to me, code generation is mostly for applications.
I would have almost said, you know what, code generation can also be useful for libraries.
That's interesting.
Well, then I guess you're already thinking the way that I'm hoping people will.
I always get the feeling that code generation feels intimidating to people, but maybe, I
mean, I don't know, maybe that's not.
I guess it's something that you have to get used to, because people who have used Elm GraphQL
or similar projects, they've already bought into the idea of, I have my Elm project and
I have a set of build tools that generate the code that I need to run before I run Elm
make or before I need to push something to production.
I think that's most of the upfront cost that you have, or most of the setup cost that you have.
So once you've bought into that, you're free to do a lot more code generation.
Yes, absolutely.
And it is extremely useful for applications, as you say.
And I think that code generation seems very intimidating.
One of the things that I always like to say about code generation is that code generation
makes it sound like such a difficult, complicated thing.
But you could do a code generation script that is, you know, like code generation could
be as simple as what is the build time?
Like you want a timestamp for when you build a script and you want that available as a
value in Elm.
So it's going to be like just a POSIX time value, right?
Or a string like for the build version or something.
So that's going to be some little piece of metadata that you have that, you know, that
timestamp represents your build.
So every time you do a new build, you have a new timestamp.
So how would you go about that?
It's a very simple example.
So let's go through all the steps together.
That sounds great.
So what I always like to say about code generation is that really it's just a type of templating.
And you can use different tools to template.
If you want to get really fancy, you can pull in some abstract syntax trees and output those
to strings.
But why would you do that if you're just generating a module, you know, `module Build exposing (timestamp)`,
and then `timestamp : Time.Posix` and `timestamp = Time.millisToPosix 12345`,
or something like that?
That's the code we're generating.
Why wouldn't you just do string templating?
So what I would say is just templating.
It's just templating.
Just write a little node JS script.
So just write, like, `generate-build-module.js`.
And then, you know, you're just going to you know, if you're not familiar with node, you've
got a couple of things to learn.
You can use whatever your preferred tool is.
If you prefer to use Ruby or Python, you're not going to do it in Elm because how are
you going to write to a file with vanilla Elm?
That's not really going to help you out.
Well then you use node.js.
Oh, yeah.
So at the end of the day, all you're trying to do is write out this file.
In fact, if you want, just use bash.
That that would be fine, too.
That's not what I would reach for.
But let's just say we're using node.js.
So we're just going to, you know, say just write a little string template and we're just
going to do exactly that string I mentioned, except the one, two, three, four, five part.
We're going to do, you know, `Date` dot whatever.
Turn it into POSIX time using some fancy JavaScript thing.
And then we're going to do `fs.writeFileSync`, probably.
And that's it.
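Here is one way the little script just described could look, written in TypeScript for Node.js. It's a minimal sketch: the helper name `renderBuildModule` is hypothetical, it assumes the project depends on elm/time for `Time.millisToPosix`, and actually writing the string to disk with `fs.writeFileSync` is left to the calling script.

```typescript
// Minimal sketch of a build-time code generator: render the Elm source for a
// `Build` module whose `timestamp` is frozen at generation time.
// Assumed (not from the episode): the helper name `renderBuildModule` and
// the use of `Time.millisToPosix` from elm/time.

function renderBuildModule(buildTimeMillis: number): string {
  // Plain string templating; no Elm syntax tree is needed for a module this small.
  return [
    "module Build exposing (timestamp)",
    "",
    "import Time",
    "",
    "",
    "timestamp : Time.Posix",
    "timestamp =",
    `    Time.millisToPosix ${Math.floor(buildTimeMillis)}`,
    "",
  ].join("\n");
}

// A real script would write this to a gitignored file (for example gen/Build.elm,
// using Node's fs.writeFileSync) and run before `elm make`.
console.log(renderBuildModule(Date.now()));
```

Keeping the rendering in a pure function that returns a string makes the template easy to check on its own, separately from the file-writing step.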
Now we've done some code generation and then we're going to hook that into our build.
And every time we run a build, we run that code generation thing.
And then we're probably going to want to get a little bit of extra confidence that it's
a fresh generated file.
And therefore we'll probably gitignore our `Build.elm` module, or put it in, like, a gen folder.
Maybe we'll output it to a gen folder, which is ignored.
Which is usually what I go for.
Yeah, because that's very nice, because then you know that you're not accidentally forgetting
to generate it on your build server, since it would fail to compile.
If you do `elm make`, it's going to say, what is `Build.elm`?
And you say, oh, whoops, I forgot to run my build script.
That's it.
Now we've just done your first code generation.
You do need to change your project's `source-directories` to include that new folder,
if you generated it in a gen folder. Or, you know, you could just generate it
right in the source directory and add `src/Build.elm` to your `.gitignore`.
But in general, I think it's a nice practice to have a separate source directory for generated code.
I don't even see what a drawback for that could be.
No, I think that's that's a great practice.
So that's it.
We just did code generation.
Yeah, I think my biggest pain point for code generation
would be having to set up a new kind of dev server.
Like when you are developing with this project, you need to run the code generation before
every compile, or before you start developing, whichever makes sense
for your use case.
And you probably won't use vanilla `elm make`. You might use Webpack to trigger the code
generation, or just postinstall scripts, whatever makes more sense.
But you have to set something up so that people in your team don't clone the repo and do elm
make and then they get a compiler error.
You need to make it easy for them to to get started with the project.
So, yeah, there's sort of the lifecycle of these generated files, because conceptually,
pretty much any, I'm going to go ahead and say any,
generated code has a source of truth.
That's sort of the point of generating code.
There's some source of truth that you're mirroring.
And depending on what that source of truth is, you that's going to influence when you
want to generate it.
So like in the example of a build timestamp, the lifecycle is simpler
because you just want to generate a unique one for every build.
So, you know, maybe for your dev server, you say, OK, fine, the file needs to exist.
But the important thing is just that the file exists and that every time I ship a new thing
to production, it gives me unique build ID or timestamp.
That's that's the only thing that matters.
So for that source of truth, for that mental model, you just need to be sure that
you're generating a fresh one for each build, which the gitignore sort of takes care of.
If the source of truth is like, let's say, for example, a GraphQL schema, then in order
to mirror that source of truth, you want to rerun the code generation.
You want to make sure that it's generated initially upon starting up a dev server or
running your production build.
But you also want to make sure in dev mode that it's being regenerated as needed ideally.
So that does get a little bit trickier in cases like that where you're mirroring a source
of truth.
You could use chokidar.
You know, chokidar is an npm package which gives you a file watcher.
There's a chokidar CLI, which we'll link to in the show notes.
That can be a handy way to just say, hey, anytime these files change, like these files
are the source of truth.
These GraphQL schema files.
Maybe you've got your, you know, your server of choice set up to output a new GraphQL schema
file every time that changes.
So then that's your source of truth.
And you set up the chokidar watcher to watch that schema file.
Anytime that schema file is touched, it's going to rerun your code generation.
So that definitely is a challenge.
But what you can also do is just have either npm scripts or Makefile scripts so that every
time you start your dev server, it regenerates the code that
is needed.
And then you just tell your teammates that every time your GraphQL schema changes, you
need to rerun the dev server.
And as long as everyone is aware of this, I guess it's a fine trade-off.
And yeah, I mean, even with like a GraphQL schema, if it's not changing that frequently
or for whatever reason, it's, you know, or maybe you typically make non breaking changes
or, you know, in that, in that case, you could say, you know, we're not even going to bother
with a file watcher.
We're just going to leave it to the individual developer to manually run the generate script.
And we're going to guarantee that it's generating it from scratch for every production build.
So it's always up to date for production.
And if something were to go wrong, we would know because it wouldn't compile.
So worst case scenario, it doesn't compile, then the build fails, the developer goes and
fixes the problem.
So in some cases, you might not need to go to the trouble of, or you could start with
something simple.
Yeah, but at least consider your teammates and your own productivity.
You just don't want to make mistakes because your GraphQL schema is out of date
with your generated code.
So make it easy for all of you.
And if you're not going to get a compiler error when something goes wrong, but instead
you're going to have some different values at runtime that weren't what you were testing
against when you were developing locally, you know, that's something to consider too,
because that could cause bugs: you're looking at something different in your development
than what you're shipping to production.
So these are things to keep in mind.
One of the biggest challenges that I've seen with doing code generation for Elm applications,
not libraries is just like coming up with the right mental model.
So I think one of the most important things is you really want to have a very simple mental
model as much as possible.
I think it's really valuable to piggyback on other mental models.
So like with a GraphQL schema, you know, it might seem obvious
if you're not the one designing the source of truth, but GraphQL has a very simple mapping.
You've got this concept, there's a GraphQL schema, you can make requests to it.
And you know, the developers already have a mental model of this GraphQL schema that
they can make queries against.
And so they can take those concepts and they can map those concepts into a slightly modified
concept of, okay, there's this GraphQL schema.
I can make HTTP requests like this.
And they map that concept to, okay, there's this Elm code and it maps onto the GraphQL
schema in this clear one to one mapping like this, right?
So they do have to sort of map their mental model there, but it's a clear mapping.
Another example would be Elm SPA, you know, you've got this file-based
routing, and there's a clear mapping between the wiring that you would do for a single
page app in Elm and the module names on the file system.
Elm SPA is a very different story though, uh, because it's not a matter of a source
of truth.
It's a matter of boilerplate mostly, and maybe convention, or making it easy to develop,
but it's not a matter of source of truth in this case.
I mean, in a sense it is a source of truth as well because it, it enforces the mapping
where if you have, yeah, but just like any Elm code in any way, it's like normally in
a single page app, the source of truth is whatever you do in your top level main.elm,
not your like pages main.elm.
But with Elm SPA, you can, you can map what's happening in your file system to, to know,
okay, this is going to map to a page that's going to get wired in in this way with this
URL routing and you can map all of those concepts to what would actually be happening.
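A hypothetical sketch of that file-system-to-routing mapping: given the page files, generate a `Route` type and a `toPath` function, so the file system becomes the source of truth for the wiring. The conventions here (a `Home_` file for `/`, folders as path segments, PascalCase turned into kebab-case) and the `Gen.Route` module name are assumptions for illustration, not a description of Elm SPA's exact rules.

```typescript
// Hypothetical file-based routing generator. The naming conventions are
// assumptions for illustration, not Elm SPA's exact rules.

interface PageRoute {
  constructorName: string; // Elm constructor, e.g. "Settings__UserProfile"
  urlPath: string; // e.g. "/settings/user-profile"
}

// "UserProfile" -> "user-profile"
function toKebab(segment: string): string {
  return segment.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// "Settings/UserProfile.elm" -> { "Settings__UserProfile", "/settings/user-profile" }
function pageFromFile(relativePath: string): PageRoute {
  const segments = relativePath.replace(/\.elm$/, "").split("/");
  const isHome = segments.length === 1 && segments[0] === "Home_";
  return {
    constructorName: segments.join("__"),
    urlPath: isHome ? "/" : "/" + segments.map(toKebab).join("/"),
  };
}

function renderRouteModule(files: string[]): string {
  const pages = files.map(pageFromFile);
  // One custom type variant per page file.
  const variants = pages.map((p, i) => `    ${i === 0 ? "=" : "|"} ${p.constructorName}`);
  // One case branch per page, so no page can be forgotten in the wiring.
  const branches = pages.map(
    (p) => `        ${p.constructorName} ->\n            ${JSON.stringify(p.urlPath)}`
  );
  return [
    "module Gen.Route exposing (Route(..), toPath)",
    "",
    "",
    "type Route",
    variants.join("\n"),
    "",
    "",
    "toPath : Route -> String",
    "toPath route =",
    "    case route of",
    branches.join("\n\n"),
    "",
  ].join("\n");
}
```

Adding a page file and regenerating adds both the variant and its branch, which is the kind of consistent wiring the generator is there to guarantee.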
So normally in a single page app, the source of truth is what happens in main.elm, which
could or could not follow convention.
And you don't know until you, until you look at the code and you could make a mistake there.
But if you have a code generation tool, it helps keep you honest about that source of truth.
So it can actually simplify the mental model because it can express a certain concept more
But to me, that is the point of tools like Elm SPA: removing potential failures, potential
bugs because you forgot to wire a page into the main, or you
did it wrong.
And code generation is a pretty good tool for making sure that something that is very
similar for plenty of things is done in a very consistent and well done way.
Right, right, right.
So I think we're, I think we're talking about two different sources of truth here because,
you know, my assertion was that anytime you're doing code generation, there is a source of
truth and what you pick as the source of truth is going to be important for like understanding
like how people using that code generator are going to map that source of truth onto
what's actually happening.
They need to create a mental map.
And what you're talking about is what is the value where Elm GraphQL has the value of like
taking the source of truth and letting the Elm compiler know about it and Elm SPA, it's
more convenient than keeping these two things in sync.
Yeah, exactly.
Like Elm SPA, you have a source of truth, which is all the different files in the file
system, but they are not a source of truth in themselves.
You make them into a source of truth.
Whereas GraphQL schema is the source of truth.
It is something that can be made out of sync with the Elm project.
Right, right.
So there's this book, I actually have to admit, I sort of read half of it and didn't finish it.
I really liked the principles in this book, The Design of Everyday Things.
It's a really cool book.
Are you familiar with it at all?
You've read 50% more than I did.
And one of the things, like I think about this sometimes when I'm using everyday things,
the author, Don Norman talks about sort of the mappings of these mental models.
So like, you know, how you can have like a door, basically, he says, if you have a door
that, like, have you ever gone into a door and you try pushing it and then you realize
you have to pull it?
Yeah, like, is it something like, if a door has no handle, then you are expected to push?
So those are like the affordances of the door.
So there are these like clear physical cues that show how to interact with it.
And like, they're not even conventions.
They're like, just here's what exists.
And so what are you going to try to do with a door with no handle?
You're going to push it.
It's like, what will your instincts tell you?
Because just looking at it, you're like, what are the possible interactions I can do with it?
Well, I'm not going to touch the handle because there's no handle.
Yeah, there's nothing that you can pull.
And so that's, there are these affordances that give cues to how something can be used.
If something has a handle, then naturally you want to pull it.
If something has like a little metal bar, then you're like, oh, push on the metal bar.
And I think it's important to consider like what people's intuition will be interacting
with your sort of code generation interface.
And that's the source of truth, right?
So like consider how your source of truth is giving cues that make it easy for people
to understand how to interact with something and what the implications are going to be.
Another thing he talks about is like sort of these mappings.
Like if you have like two different lights on two different sides and one switch is on
the right side and one switch is on the left side and one light is on the right side and
one light is on the left side, you could map either switch to either light.
But if there's a clear physical mapping, it's going to be easier to interact with that and
intuit what the behavior is going to be.
So you can create a clear mental model.
And I think about this with my garage door opener.
I always walk into the garage and I want to open
the door.
I'm looking on the right, and there's a button on the left and a button on the right.
And I always have to flip it in my head.
I'm like, oh, it's the opposite of the intuitive one.
I have to press the left button, and my brain always goes through that process.
So if you can, avoid having the consumers of your code generation interface
go through that mental mapping. My point is just that you are
creating a mental model that users are going to have to learn and interact with.
So be aware of that that mental model that people are going to have to interact with
and what the affordances are, what what cues you show how to use something, you know, make
impossible states impossible.
But make the source of truth have a very clear mapping so that users don't have more concepts
to learn.
And so it's predictable what it's going to do.
It's easier said than done.
But ideally, if you can, try piggybacking on a mental model that people already have.
For instance, for GraphQL, we already have something like decoders.
Is that what you're thinking?
Yeah, yeah, we have we have decoders and and we have the you know, I mean, also, like,
you know, with Elm GraphQL, it's not a perfect one to one mapping of GraphQL.
You know, there are certain concepts like field aliases that don't exist in Elm GraphQL
because it abstracts those away.
But there are other concepts, you know, like objects, interfaces, unions in GraphQL that
have a predictable mapping in ways that you know, objects, the interfaces, you know, the
APIs for objects are generated in a specific way.
And so there are certain conventions where I'm sort of configuring the right switch with
the right light and the left switch with the left light where I'm trying to like find a
clear mapping to make it intuitive for these concepts to go together.
So what are other approachable usages of code generation?
So one that I can think of is like generation for images.
Like static assets, sort of.
So often you have like images, logos, icons that are in your file system, your assets
folder or something, and you want to reference them in your Elm code.
So what you can do is just have a string pointing to their location. What you can
also do, in order to make it more type safe and prevent errors from typos, is to create
an API for it using code generation.
So you create a script which loads the source of truth, which is the list of all the icons in
your assets folder, and then generates an Elm file from it.
And then one nice thing is that when you write `Icon.addButton` or whatever,
you get code completion in your editor and you have something a lot more type safe.
You can even get inline documentation in your IDE if you generate doc comments, which is
pretty cool.
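A minimal sketch of that icon generator, assuming the folder listing is passed in (a real script might read it with Node's `fs.readdirSync`), icons are served under a URL prefix such as `/assets/icons/`, and a file like `add-button.svg` becomes the Elm value `addButton`. The `Icon` module name and helper names are hypothetical.

```typescript
// Sketch: generate an `Icon` module from the file names in an assets folder.
// Assumptions: the listing is passed in, files are served under `urlPrefix`,
// and "add-button.svg" becomes the Elm value `addButton`.

function toCamelCase(fileName: string): string {
  const base = fileName.replace(/\.[^.]+$/, ""); // drop the extension
  const words = base.split(/[-_\s]+/).filter((w) => w.length > 0);
  return words
    .map((w, i) =>
      i === 0 ? w.toLowerCase() : w.charAt(0).toUpperCase() + w.slice(1).toLowerCase()
    )
    .join("");
}

function renderIconModule(fileNames: string[], urlPrefix: string): string {
  const icons = fileNames
    .slice()
    .sort()
    .map((file) => ({ name: toCamelCase(file), url: urlPrefix + file }));

  // Fail the build on names Elm can't use, or on two files mapping to one name.
  const names: string[] = [];
  icons.forEach((icon) => {
    if (!/^[a-z][A-Za-z0-9]*$/.test(icon.name)) {
      throw new Error(`Cannot turn "${icon.url}" into an Elm name`);
    }
    if (names.indexOf(icon.name) !== -1) {
      throw new Error(`Two icons map to the Elm name ${icon.name}`);
    }
    names.push(icon.name);
  });

  // One documented value per icon, so editors can autocomplete and show docs.
  const declarations = icons.map((icon) =>
    [
      `{-| Generated from ${icon.url} -}`,
      `${icon.name} : String`,
      `${icon.name} =`,
      `    ${JSON.stringify(icon.url)}`,
    ].join("\n")
  );

  return [
    `module Icon exposing (${names.join(", ")})`,
    "",
    "",
    declarations.join("\n\n\n"),
    "",
  ].join("\n");
}
```

A typo like `Icon.addButon` is then a compile error rather than a broken image at runtime.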
I often think of like code generation is really just you're instead of like writing an API,
you're writing something that generates an API for a very specific thing.
I saw on Twitter that people have small in-house code generation tools that they use,
and there were a lot of interesting use cases people mentioned. Internationalization
is a cool one too, and you can actually enforce certain things in your code
generation as well.
So you can make impossible states impossible.
You can say, you know, if there's a certain like key for your internationalization dictionary,
you can say, well, if that key is missing in a particular translation dictionary, then
that should be an error.
For example, a build error?
Right. You can actually do that, because code generation introduces a build step.
So you can introduce not just compiler errors, but build errors. So, you
know, the concept of making impossible states impossible applies not only within the code
that you generate, but to the process of, well, I'm only going to generate code with a certain
known valid structure.
And if I can't produce that, I'll give a build error.
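Here is a sketch of that internationalization check as a build step: the generator refuses to emit any Elm code if a language is missing a key, and otherwise produces one function per key. The input shape (language to key to text), the `Translations` module name, lowercase camelCase keys and plain-text values are all assumptions for illustration.

```typescript
// Sketch: generate one Elm function per translation key, and turn a missing
// translation into a build error before any Elm code is emitted.
// Assumptions: input shape, module name, camelCase keys, plain-text values.

type Dictionaries = { [language: string]: { [key: string]: string } };

// Every key used by any language must exist in all of them.
function missingKeyErrors(dicts: Dictionaries): string[] {
  const languages = Object.keys(dicts).sort();
  const allKeys: string[] = [];
  languages.forEach((lang) => {
    Object.keys(dicts[lang]).forEach((key) => {
      if (allKeys.indexOf(key) === -1) allKeys.push(key);
    });
  });
  const errors: string[] = [];
  allKeys.sort().forEach((key) => {
    languages.forEach((lang) => {
      if (!Object.prototype.hasOwnProperty.call(dicts[lang], key)) {
        errors.push(`Missing key "${key}" in language "${lang}"`);
      }
    });
  });
  return errors;
}

function capitalize(s: string): string {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

function renderTranslations(dicts: Dictionaries): string {
  const errors = missingKeyErrors(dicts);
  if (errors.length > 0) {
    // The generator refuses to produce code for an invalid state.
    throw new Error("Translation check failed:\n" + errors.join("\n"));
  }
  const languages = Object.keys(dicts).sort();
  const keys: string[] = languages.length > 0 ? Object.keys(dicts[languages[0]]).sort() : [];
  const constructors = languages.map((l, i) => `    ${i === 0 ? "=" : "|"} ${capitalize(l)}`);
  // e.g. `greeting : Language -> String` with one case branch per language.
  // JSON.stringify is assumed to be close enough to an Elm string literal for plain text.
  const functions = keys.map((key) =>
    [
      `${key} : Language -> String`,
      `${key} language =`,
      "    case language of",
      languages
        .map((l) => `        ${capitalize(l)} ->\n            ${JSON.stringify(dicts[l][key])}`)
        .join("\n\n"),
    ].join("\n")
  );
  return [
    `module Translations exposing (${["Language(..)"].concat(keys).join(", ")})`,
    "",
    "",
    "type Language",
    constructors.join("\n"),
    "",
    "",
    functions.join("\n\n\n"),
    "",
  ].join("\n");
}
```

A missing French translation then stops the build with a readable message, instead of surfacing as an empty string somewhere in the running app.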
Another common one seems to be just taking like you're describing sort of mapping assets
to a particular URL.
I think, you know, another common one is environment variables.
Mm hmm.
Like secrets that are injected at build time in the CI.
Although you shouldn't inject secrets because it's going to be in your bundle.
That is true.
Well, not secrets, but it's production.
Everything else.
Mm hmm.
Wait, I should probably go back to work and remove a few things.
You should go remove those secrets.
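A sketch of the environment-variable case that takes the bundle warning into account: only names on an explicit allowlist become Elm constants, and a missing required variable fails the build. The variable names, the `Env` module name and the allowlist approach are assumptions for illustration; in a Node script you might pass `process.env` as the first argument.

```typescript
// Sketch: turn an explicit allowlist of build-time settings into an Elm
// module of String constants. Everything in this module ends up in the
// compiled JavaScript bundle, so the allowlist is there to keep secrets out.
// Variable names and the `Env` module name are assumptions.

// "API_BASE_URL" -> "apiBaseUrl"
function toElmName(envName: string): string {
  return envName
    .toLowerCase()
    .split("_")
    .filter((w) => w.length > 0)
    .map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1)))
    .join("");
}

function renderEnvModule(
  env: { [name: string]: string | undefined },
  allowlist: string[]
): string {
  const declarations = allowlist.map((name) => {
    const value = env[name];
    if (value === undefined) {
      // Fail the build rather than shipping an empty or wrong value.
      throw new Error(`Missing required build variable ${name}`);
    }
    const elmName = toElmName(name);
    return `${elmName} : String\n${elmName} =\n    ${JSON.stringify(value)}`;
  });
  return [
    `module Env exposing (${allowlist.map(toElmName).join(", ")})`,
    "",
    "",
    declarations.join("\n\n\n"),
    "",
  ].join("\n");
}
```

Anything not on the allowlist, such as a token sitting in the same CI environment, never reaches the generated Elm code.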
So we've already covered a few projects that use code generation.
So we've covered Elm SPA.
We've covered Elm Tailwind Modules.
I've seen a few other libraries which do code generation, which you are probably familiar with,
because that's your primary intent for code generation, apparently.
So Chadtech has an Elm vector package. A vector is like a list or an array of
a fixed size.
The package has Vector1, Vector2, Vector3, up until Vector60.
All of those are modules with functions that go well with each of those.
And they're using the same code for every one of those.
And it would be tedious and error prone to write them by hand.
So what Chad did there is write a script that generates those modules.
And then he stopped at arbitrarily 60 modules.
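To show why a script beats hand-writing dozens of near-identical modules, here is a sketch in the same spirit. It is not the package's actual generator: the module layout and the two functions it emits (`map` and `toList`) are assumptions chosen to keep the example short.

```typescript
// Sketch of generating repetitive fixed-size modules (Vector1, Vector2, ...).
// Not the real package's generator; the layout and functions are assumptions.

// ["a0", "a1", ..., "a(n-1)"]: names for the pattern variables.
function varNames(n: number): string[] {
  const names: string[] = [];
  for (let i = 0; i < n; i++) names.push(`a${i}`);
  return names;
}

function renderVectorModule(n: number): string {
  const name = `Vector${n}`;
  const vars = varNames(n);
  const slots = vars.map(() => "a").join(" "); // "a a a" for n = 3
  const pattern = `(${name} ${vars.join(" ")})`; // "(Vector3 a0 a1 a2)"
  return [
    `module ${name} exposing (${name}(..), map, toList)`,
    "",
    "",
    `type ${name} a`,
    `    = ${name} ${slots}`,
    "",
    "",
    `map : (a -> b) -> ${name} a -> ${name} b`,
    `map f ${pattern} =`,
    `    ${name} ${vars.map((v) => `(f ${v})`).join(" ")}`,
    "",
    "",
    `toList : ${name} a -> List a`,
    `toList ${pattern} =`,
    `    [ ${vars.join(", ")} ]`,
    "",
  ].join("\n");
}

// One call per size; a build script would write each result to src/VectorN.elm.
const vectorModules = [1, 2, 3].map(renderVectorModule);
```

Fixing a bug in `map` means fixing one template and regenerating, rather than editing 60 files by hand.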
So Rupert Smith created a few packages for working with AWS.
He uses elm-aws-codegen, which in turn uses one of his packages,
called Salix, which I don't know much about, but he seems pretty excited
about it. It takes not the documentation, but the API definitions from AWS services,
and then creates packages to work with them.
So he has one to work with authentication, with the Cognito
service, and one for Elastic containers.
He apparently did a good job of generating APIs that work well with those services.
I don't know if it's well made in the sense like it has the Elm philosophy of preventing
impossible states or if it's just a type safe way to create AWS service APIs.
But I think at least as a base tool, which allows you to do anything, it's very interesting.
Yeah, I haven't used that one.
So another code generation tool, I've done so much code generation, I'm like losing track
of all the code generation I've done.
But Elm TS Interop is a code generation tool.
It happens to generate, rather than generating Elm code, it's actually generating TypeScript
declaration files.
But it's that same sort of principle of having a common source of truth and keeping those
in sync.
It's just that it works the other way around.
Instead of taking some external schema and making the Elm compiler aware of it through
generating Elm code, it's taking your Elm decoders and encoders and it's making TypeScript
aware of those so you can use those ports.
It actually, the pro version uses some code generation to make it a little bit more convenient
so you can have a module where you just declare these top level values in your sort of port
definitions file and then it takes those and for every top level exposed encoder, it generates
a port from Elm to JavaScript and for every top level decoder, it generates from JavaScript
to Elm ports unless its name is Flags, in which case it generates your Flags decoder.
So now there's another type of code generation which we haven't gotten into yet, which is
for scaffolding.
I was expecting that one.
So these are like, there's like, what's the mental model?
And then there's like, what is the purpose of the code generation?
So we've talked about keeping your sources of truth in sync, but scaffolding is another
really interesting area.
So you've got some examples written down of scaffolding here.
So Elm SPA is one of those.
It creates a new project with a bunch of predefined pages and files and an `elm.json`.
So a lot of things already.
elm-review also has `new-rule` and `new-package` commands.
So `new-package` creates a new project entirely.
It's not just a git clone.
It's really generation, because you are giving it information and then it generates a project
based on that.
And the same thing for `new-rule`: you give it a rule name and it generates some scaffolding,
some base files with the information that you gave it and changes the rest of the files
that are expected to be there.
So yeah, scaffolding in this case is more about getting started rather than having a
source of truth, I think.
Yes, right.
It gives you a convenient starting point. With the scaffolding approach, you're
generally using it, like you said, to get started, but you're not going to rerun it
every time to keep it in sync.
So take html-to-elm, which we talked about in our Elm
Tailwind Modules episode. It's this tool that I built that generates Elm code from HTML,
and it can also parse out your Tailwind classes to generate Elm Tailwind Modules code for them.
So it's a code generator for a code gen tool.
Because why not, turtles all the way down.
But yeah, the mental model for that
is: here's this HTML, and it's got a clear mapping to Elm code. My Elm review rule version
is able to hook in even more to the source of truth, like
your imports in a particular module, and take that context into account, which is cool.
But you're not generally going to keep that starting HTML around and
then edit the HTML anytime you want to change your templates.
It's like, hey, let me help you get started with this view.
But then you own it from there, rather than, hey, let me keep around this HTML file.
I'm going to run the code generator on that every time to get this Elm code.
And I actually never touch that Elm code.
If I want to change the view, I go change the HTML file, which is the source of truth.
That's something different.
That's not the scaffolding approach.
So when would you go for the scaffolding and when would you go for code generation?
That's a good question.
I think it's really like, if you can accomplish a task without human intervention, then you
may as well just code generate it.
But with a view, you
couldn't just say, here's some HTML, please map this into Elm code, because it would defeat the purpose.
The point is, you want Elm code
so you can say, map over these things to create a list item for each of them
from the model or whatever.
And well, so what are you going to do?
Create like a templating language within your HTML code?
I mean, you could if you wanted to.
But in that case, most of the time, you're just going to want something to help you out
and say, oh, here's some HTML that I'm copying from this handy online template.
It's like, you've got a nice template you want to use, you copy-paste it as your starting point,
and then you take ownership of the code.
So at Humio, we also used code generation, kind of like Elm SPA, to generate all the boilerplate
for linking the main file with the individual pages.
And one reason for that was, sure, to remove any errors you could make because you
forgot to wire something up.
But mostly because it was much more efficient, in a sense.
When you added a page, you had to add a new constructor to a custom type, which was like
type Page = HomePage | ..., etc.
And that type was pattern matched in 20, 30,
I don't know how many places.
So you would have to add a constructor and then spend 10, 20 minutes just fixing compiler
errors because, hey, you forgot to handle this constructor here and there and there.
For code that was very similar to what was next to it.
So you would pretty much copy paste some code, go to another place, copy paste some code
and adapt it to use the constructor.
And do that 20 or 30 times.
If you were well versed with the code base, that would take you 10 minutes.
If you weren't, then it would take longer.
And that was just a lot of time not well spent.
So by refactoring and making it work with code generation, now adding new pages just
takes a few seconds.
We even have a script that creates the file, and now we can focus on more important things.
When you do code generation, you can abstract certain details.
So again, it's an important distinction if you have code that a human is then modifying
versus the human never touches this code.
It's generated and machine maintained.
I think it's very important to have a clear distinction between those.
So for example, in the code that Elm GraphQL generates for your GraphQL schema, I even
put a comment like, auto-generated code, don't modify.
And in fact, I think I have an error there.
If you somehow went in and tweaked it, it would say, looks like a human changed this.
So delete the file and then I'll go regenerate it.
But I'm not going to touch this file because it looks like a human modified it.
Wait, how do you do that?
Like when you regenerate, you say, oh, this file has been changed?
I can't remember exactly what I did.
I mean, obviously it's a heuristic, but roughly:
if the file doesn't have that comment that I generate at the top of every file, or something
like that, then I say, it looks like this isn't a file I created.
And I don't want to touch it, because I'm just going to blow everything away.
So make sure that this folder is a clean working space for me before I do my job, because I'm
just going to blow everything away, with no regard for anything a human touched.
But I think that's very important that there should be a clear separation.
Now, obviously humans use that code, but they shouldn't touch the generated code.
You should either have code generated that's entirely maintained by the code generator,
or you should have code that's scaffolded and then you take ownership over it.
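A sketch of the kind of header check described above, assuming the generator owns one output folder and reads its files into memory before regenerating. The marker text, the script path inside it, and the function names are all hypothetical; Elm GraphQL's actual check may differ.

```typescript
// Hypothetical marker line. Elm allows line comments above the module
// declaration, so the generated file stays valid Elm.
const GENERATED_HEADER =
  "-- Do not edit by hand! This file was generated by scripts/codegen.ts.";

// Prepends the marker so a later run can recognise its own output.
function withHeader(elmSource: string): string {
  return GENERATED_HEADER + "\n" + elmSource;
}

// Heuristic check before wiping the output folder: any file that does not
// start with the marker is assumed to be something a human created or
// replaced, so the generator should stop instead of deleting it.
function filesThatLookHandWritten(files: Map<string, string>): string[] {
  const suspicious: string[] = [];
  files.forEach((contents, path) => {
    if (!contents.startsWith(GENERATED_HEADER)) {
      suspicious.push(path);
    }
  });
  return suspicious;
}
```

Like the original, this only detects files that lost or never had the marker; an edit further down a marked file would still pass, which is why it is a heuristic rather than a guarantee.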
Or parts of it.
Like you can generate part of it, scaffold another, and then use those together.
What would be an example?
I don't know.
I haven't encountered that.
I mean, if there are parts of it that could make
use of code generation and parts of it that need to be changed by someone, then you generate
the one.
I see.
Right, right, right.
You generate the former and scaffold the latter.
Yes, you can certainly mix them within a project.
But it's good to be clear about like a file should either be generated or not.
Like it should either be maintained by a code generator or not.
And it's good to not mix those up.
Another interesting thing that can happen is that you can use actual Elm code,
like parsing its AST, as the source of truth.
That can definitely get complicated, but it can
be a good way to piggyback on top of a certain mental model.
Because you say, okay, well, I know when I write a decoder that does this, I expect
it to work this way.
And then you write some code generation that generates something in support of that.
But that can definitely get messy, because now it's starting to feel more like metaprogramming,
where you change some Elm code, like, oh, I can rename this,
I can change this, and things start to break in unexpected ways.
So it can remove some of the guarantees in a certain way, which is why the mental model
is really important. If you can make it so that you map very
directly onto an existing concept, then somebody could say, well, if I make this change, I
expect it to not break.
If I make this change, I expect it to break.
And you actually retain that property, that guarantee, because you've mapped onto
that mental model.
So what are your tips and tricks that you've learned with code generation?
One of my favorite tips, which I've probably talked about some, actually, maybe I haven't
talked about it on Elm Radio yet.
My biggest tip is end to end testing your generated code.
It's awesome.
I've definitely given you my rant on this before, but I love it.
It's so easy to get confidence about your generated code.
Basically the workflow is, you know, write an end to end test, which is some sort of
snapshot test.
So you're saying, for example with Elm GraphQL, this
is the code it generates given this schema.
And so in your build, you have a test schema.
With Elm GraphQL, I started with the GitHub schema and
the simple Star Wars example schema that they use for all the GraphQL stuff.
I had a few schemas and I said, okay, I'm going to actually run this on my build server.
I'm going to generate code for these schemas and compare the code that
was generated before with the code that's generated for this build.
So every time it changes, I need to check in the changes.
That's kind of the snapshot testing approach.
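The comparison step of a snapshot test can be as small as this sketch, which assumes the previous output is a checked-in file and reports the first line that changed. The names are hypothetical; running the generator in CI and then git diff --exit-code on the output folder gets you the same effect without any custom code.

```typescript
// Result of comparing freshly generated code against the checked-in snapshot.
type SnapshotResult =
  | { ok: true }
  | { ok: false; line: number; expected: string; actual: string };

// Reports the first differing line (1-based). A line missing on one side is
// treated as an empty string, so a lone trailing newline is not a failure.
function compareSnapshot(snapshot: string, generated: string): SnapshotResult {
  const expectedLines = snapshot.split("\n");
  const actualLines = generated.split("\n");
  const length = Math.max(expectedLines.length, actualLines.length);
  for (let i = 0; i < length; i++) {
    const expected = expectedLines[i] ?? "";
    const actual = actualLines[i] ?? "";
    if (expected !== actual) {
      return { ok: false, line: i + 1, expected, actual };
    }
  }
  return { ok: true };
}
```

When a change is intentional, you overwrite the snapshot with the new output and check it in, which is exactly the "every time it changes, I need to check in the changes" step.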
And you're also running actual tests with the generated code, I'm guessing.
Sometimes, uh, yeah, sometimes I do that as well.
And do you generate the code and do you generate the test code?
Well, I do that sometimes.
With the Elm GraphQL generated code, when I was
initially building it, I started by manually writing, quote
unquote, generating, the code that I wanted the output to look like.
So that was sort of the code gen target, and then I used that.
And then I, what I did is I wrote examples in the examples folder and those were sort
of manual tests.
So you know, every time I'm like adding a new feature or changing something, I go through
and make sure all the examples are working as expected.
And not only am I testing the behavior, I'm testing the user experience.
So I'm making sure that interacting with it feels nice.
The types look good.
Could I get it any simpler?
Is it intuitive?
And then once I've got that working with the hand-generated code, I check that in as my
snapshot, and then I iterate on that failing test until it's green, with the
actual generated code matching that code gen target.
So approval testing or snapshot testing is awesome for, for code generation.
I also really like, so this is almost like a whole separate type of code generation,
but as you were hinting at, generating test code is really nice.
Like for Elm TS Interop, I've got this tool for the pro version, which is,
you know, you write some TypeScript type definitions in a little embedded VS Code TypeScript
editor in the browser.
And it generates Elm TS Interop encoders and decoders based on that, which are going
to yield the same TypeScript types.
And you know, there's a lot going on there.
It's non trivial, like that project is non trivial and it would be really easy to mess
something up.
So I, I generate a test suite.
So I have some sample input, which is TypeScript files, and I generate
a test suite.
I actually generate an Elm test test suite for each of those examples.
So I just have an examples folder.
I write a .ts file, and it takes that code
and feeds it into the Elm code that generates the encoders and decoders.
It makes sure that runs, and then it does a round-trip test: it feeds input
in and checks that if you run the encoder and the decoder, you get the same value back.
So it's generating an Elm test test suite for each of those.
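A sketch of generating an Elm test suite with one round-trip test per example, using plain string concatenation. Everything here is an assumption for illustration: the Examples module, the value/encoder/decoder naming, and the generated module name. The generated Elm only relies on elm-explorations/test's Test and Expect modules and on Json.Decode.decodeValue from elm/json.

```typescript
// Hypothetical description of one example. The value, encoder and decoder
// are assumed to be top-level definitions in an `Examples` Elm module,
// and the example name is assumed to contain no double quotes.
type RoundTripExample = {
  name: string;
  value: string;
  encoder: string;
  decoder: string;
};

// One elm-test `test` per example: encode the value, decode it again,
// and expect to get the original value back.
function roundTripTest(example: RoundTripExample): string {
  return (
    'test "' + example.name + '" <|\n' +
    "            \\_ ->\n" +
    "                Examples." + example.value + "\n" +
    "                    |> Examples." + example.encoder + "\n" +
    "                    |> Json.Decode.decodeValue Examples." + example.decoder + "\n" +
    "                    |> Expect.equal (Ok Examples." + example.value + ")"
  );
}

// The whole generated test module. elm-test reports an empty `describe`
// as a failure, so this assumes at least one example.
function roundTripModule(examples: RoundTripExample[]): string {
  return (
    "module GeneratedRoundTripTests exposing (suite)\n\n" +
    "import Examples\n" +
    "import Expect\n" +
    "import Json.Decode\n" +
    "import Test exposing (Test, describe, test)\n\n\n" +
    "suite : Test\n" +
    "suite =\n" +
    '    describe "round trip"\n' +
    "        [ " + examples.map(roundTripTest).join("\n        , ") + "\n" +
    "        ]\n"
  );
}
```

Because the output is an ordinary Elm test module, the Elm compiler and elm-test do the heavy lifting: a generated encoder or decoder with the wrong type fails to compile, and one that loses information fails the equality check.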
So, so generating Elm test test suites is awesome.
That's that's a technique I like to use.
I do that for html-to-elm as well.
I'll put a link to the source code for that example.
So what I do is, for a bunch of different HTML inputs,
I generate a test suite and I make sure that it's compiling.
So the generated code should compile against the default published Elm Tailwind Modules package.
So that's pretty handy.
And I mean, we should point out like Elm test itself is just a code gen tool actually.
People don't, people don't think about it, but.
Yeah, it generates an Elm file and then compiles it.
Elm explorations test does have some kernel code, I think, but a lot of it
is just using code generation.
The same goes for Elm verify examples, which I think we've talked about.
It's a really handy package that lets you write your examples in your Elm doc
comments that it will actually execute and compare your expected output that you write
with a little comment notation in the docs.
It creates a test for each of those, right?
Yeah, exactly.
That's all it does.
You can go in and look at it and then it just runs it with, with Elm tests.
So I would definitely recommend considering that technique if
an opportunity presents itself. I've used it both in libraries I maintain
and in applications.
It's a really handy technique sometimes.
I've wondered about doing something very similar to that for Elm review: give
it a set of examples and then compare the errors that you expect with what you would
really get.
But it's a bit more tricky than that. Like, do you really want to compare the errors?
Code generation can be really nice for end to end testing things.
So for Elm pages 2.0, I've been considering doing that with the routes.
I've gotten rid of the list of routes in the generated code.
I used to have generated code that had all of the routes, but then if you have a thousand
routes and you ever
refer to that generated code anywhere, you've pulled all of it into your bundle.
So it's quite nice if you can avoid pulling that into your bundle, avoid incurring
that cost at runtime and in your bundle size, and instead incur that cost at build time,
where it's a minimal cost to have a reference to a thousand files, by
using Elm review.
So I've been thinking about adding some sort of hook that basically... well, here's what I've
been thinking about.
Actually, tell me what you think of this.
What I've been considering is, so there's a route type that's generated.
So Elm pages 2.0 similar to Elm SPA has this file based routing.
It generates a Route.elm module, which enumerates, you know, you've got your blog
route, you've got your index route.
You know, some of the routes have route parameters.
Some of them don't have route parameters.
For the ones that have route parameters, like blog, well, if you've
got blog/post-1 and blog/post-2, those are your valid static routes.
And you want to say, well, I don't want to link to post negative one because that doesn't
exist, or post zero, which doesn't exist either.
So I want to use an Elm review rule to check all of the static routes.
If they're static pages, then I want to check that they exist.
Before, with Elm pages 1.0, I was generating
a record that referenced all of those URLs, and you could use that to refer to static routes
in a type-safe way.
Now I just have these route variants.
So the blog route is going to be a single route and it's not going to, if you have a
hundred blog posts, it's not going to generate something for each of those.
So what I'm thinking is to have some sort of CLI command for Elm
pages that generates a module you can use in an Elm review rule to
verify that the routes you link to are valid static routes.
So basically, before running Elm review, you would first have to run this
code generation step, because the Elm review rule would depend on that generated code.
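A sketch of that idea, assuming the build already knows the full list of static paths. This is not an actual Elm pages command or API; the StaticRoutes module name and both functions are hypothetical. The generated module is just a list of strings that an Elm review rule could read and compare route links against, without the list ever reaching the runtime bundle.

```typescript
// Escapes a string as an Elm string literal. Assumes no control characters,
// which holds for URL paths.
function elmString(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Emits a module listing every known static path, deduplicated and sorted so
// the output (and any snapshot of it) is stable between runs.
function staticRoutesModule(paths: string[]): string {
  const unique = Array.from(new Set(paths)).sort();
  return (
    "module StaticRoutes exposing (all)\n\n\n" +
    "all : List String\n" +
    "all =\n" +
    "    [ " + unique.map(elmString).join("\n    , ") + "\n" +
    "    ]\n"
  );
}
```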
I think that's a, that's a good idea.
I actually did something very similar recently for work, where I basically detect
unused CSS in a project.
So we have a bunch of CSS files and we have a lot of dead code in there because it's CSS.
So what I did is I go through all of those, find all the classes and then generate an
Elm file.
And then I use that as a source of truth for a rule that I called no unused CSS classes.
And then wherever it found a class, it was marked as used, and the ones that it couldn't
find were marked as unused.
I used that together with another rule that disallows dynamically constructed classes, like
"homepage--" ++ modifier.
So with those two rules in place, I was able to remove all the unused CSS classes
from the project, which was about 4,000 lines of code.
So that was nice.
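A rough sketch of the extraction-and-generation half of that setup. The regex is a deliberate shortcut, and a proper CSS parser from npm would be more robust; the module name Generated.CssClasses and the function names are hypothetical, and the Elm review rules themselves are not shown.

```typescript
// A crude, regex-based class extractor. Comments and url(...) arguments are
// stripped first so that e.g. "hero.png" is not mistaken for a ".png" class.
// Strings inside `content: "..."` could still produce false positives.
function extractClassNames(css: string): string[] {
  const cleaned = css
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/url\([^)]*\)/g, "");
  // A class selector: a dot followed by an identifier that does not start
  // with a digit, so numbers such as "0.5em" are skipped.
  const pattern = /\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)/g;
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(cleaned)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names).sort();
}

// The generated Elm module an Elm review rule could treat as its list of
// known classes. The names matched above never contain quotes, so they can
// be wrapped in double quotes directly.
function cssClassesModule(classNames: string[]): string {
  return (
    "module Generated.CssClasses exposing (all)\n\n\n" +
    "all : List String\n" +
    "all =\n" +
    "    [ " + classNames.map((name) => '"' + name + '"').join("\n    , ") + "\n" +
    "    ]\n"
  );
}
```

This is also why the second rule matters: a class built at runtime with ++ never appears as a literal in the Elm code, so without forbidding it the "unused" list would contain false positives.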
Oh, that's fascinating.
Yeah, that's cool.
That's a, that's a good use case for, for code generation.
So yeah, basically, Elm review could get arbitrary access to files, look
at those, and parse them, but honestly, then you'd have to write a CSS parser
in Elm, et cetera, et cetera, whereas you could probably easily find an npm package
that parses CSS and does all those things.
So yeah.
And then the watch mode can be a bit different, like, oh, which files should I watch?
But those are questions that I have at the top of my mind, but yeah.
I also want to generate files using Elm review, so that you can, yeah,
like you say, do this safe and unsafe value type of thing.
Or, for instance, I could auto-fix the CSS files: give me the CSS files,
I will parse them, look at the classes that are not used,
and remove them for you.
And I just write that back to the original file.
That has a lot of challenges, but it's something that I am interested in exploring.
So a few tips that I have for when you do code generation. One, I would move all the
generated files to a generated folder, like src-gen/ or generated/, whatever.
With Elm review,
since we're on the topic, a lot of people don't include the generated code in the files
that Elm review should look at.
So they run Elm review src/ tests/ so that
it only looks at those files, because they don't want errors for the generated code.
And that makes sense.
But the problem is that you're then limiting the amount of information that Elm review has.
So it will not know the contents of some generated files, which it will need for some rules.
So what I recommend is never calling Elm review with arguments like src or tests, but instead
using ignoreErrorsForDirectories in your review config to ignore the generated code,
and the same for vendor code.
Like, you don't want to report errors for those.
And almost half of the bug reports that I get are false positives because people
limited the amount of information that the rules could have access to.
Another thing that I would really recommend is like at the top of every file that you
generate, add a comment saying, this is how I generated the file.
Like to generate this file, I used this script or this file.
That way people who will look at the generated file will know, Oh, if I need to change something,
I need to go there because sometimes like it is not obvious where it comes from.
If you can indicate the source of truth, that's even better.
So if you are generating icons from files that are in your assets, you can add a comment
or documentation for each of the functions that you generate that says, hey, this represents
icon blah blah blah, which is at assets/whatever.
So I think that's always pretty nice to have, and it really doesn't cost a lot.
It's just, it's just one of those things that's sort of a habit because while, while you're
in there, you've got the context, you've got the data you need, just output it in the generated
code in a comment.
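A sketch of both tips together: a header that says which script produced the file, and a doc comment on every generated function pointing at its source asset. The script path, the Icons module, and exposing each icon as its path String are hypothetical simplifications.

```typescript
// Hypothetical input: one entry per SVG file found in the assets folder.
type IconAsset = { functionName: string; path: string };

// Builds an Elm module whose header says how it was generated and whose
// doc comments point each function at its source file. Each function just
// returns the asset's path; a real generator might inline the SVG instead.
// Assumes at least one icon (an empty `exposing ()` is not valid Elm).
function iconsModule(icons: IconAsset[], generatorScript: string): string {
  const header =
    "-- Generated by " + generatorScript + ". Do not edit by hand:\n" +
    "-- change the files in the assets folder and rerun the script.\n";
  const exposing = icons.map((icon) => icon.functionName).join(", ");
  const functions = icons.map(
    (icon) =>
      "{-| Icon from `" + icon.path + "`.\n-}\n" +
      icon.functionName + " : String\n" +
      icon.functionName + " =\n" +
      '    "' + icon.path + '"\n'
  );
  return (
    header +
    "module Icons exposing (" + exposing + ")\n\n\n" +
    functions.join("\n\n")
  );
}
```

As noted above, the generator already has the path and the script name in hand while it runs, so emitting them costs one string concatenation each.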
Also like do you format your generated code?
What I usually try to do is to make the generated code look kind of like Elm format output, and that
is usually enough,
I find.
That's usually the first thing I reach for, but it, it, it, it depends on the, on the
use case.
Sometimes I, I format it.
I mean, you can get pretty close to Elm format.
Well, depending on how simple it is, if it's something like a build timestamp, then it would be
overkill to run Elm format, because you're just generating a simple
hard-coded thing.
Just make your target output match Elm format.
It's fine to run Elm format, but it's not as fast as just not running it.
And it's another moving part.
And yeah.
What do you think about things like, you know, if you wanted to generate an enum? So you've
got an Elm custom type, let's say for colors,
and you want an allColors value that's a list of every variant, which is all the colors in
your app.
And then you want fromString and toString.
What do you think about using code generation for that?
You mean generating that, that file or?
Generating that file.
And what would you use for the source of truth for that?
Well, it really depends on what your source of truth is.
Like if your source of truth is what your designer gives you, like a Figma file or,
or similar, then if you can get the source of truth as a file or as an API call, I guess,
then sure.
That sounds like a good idea.
When you have something like all colors, like a list of all colors, I would worry about
the order.
Like it, does the order matter?
If so, in which order should you put it?
And yeah.
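One way to sketch the enum case, assuming the source of truth (say, an export from the design tool) has already been read into an ordered list of names. The generated list keeps that source order, which answers the ordering question by letting the source of truth decide it. The function name and the exact shape of the generated module are assumptions; here the list of every variant is simply called all.

```typescript
// Builds an Elm "enum" module: the custom type, a list of every variant,
// and toString / fromString. The list keeps the order of `names`, so the
// source of truth decides the order. Assumes each name is a valid
// lowercase Elm identifier and that there is at least one name.
function enumModule(typeName: string, names: string[]): string {
  const variants = names.map((name) => ({
    name,
    ctor: name.charAt(0).toUpperCase() + name.slice(1),
  }));
  const ctors = variants.map((v) => v.ctor);
  return [
    "module " + typeName + " exposing (" + typeName + "(..), all, fromString, toString)",
    "",
    "",
    "type " + typeName,
    "    = " + ctors.join("\n    | "),
    "",
    "",
    "all : List " + typeName,
    "all =",
    "    [ " + ctors.join("\n    , "),
    "    ]",
    "",
    "",
    "toString : " + typeName + " -> String",
    "toString value =",
    "    case value of",
    variants
      .map((v) => "        " + v.ctor + " ->\n            \"" + v.name + "\"")
      .join("\n\n"),
    "",
    "",
    "fromString : String -> Maybe " + typeName,
    "fromString string =",
    "    case string of",
    variants
      .map((v) => "        \"" + v.name + "\" ->\n            Just " + v.ctor)
      .join("\n\n"),
    "",
    "        _ ->",
    "            Nothing",
    "",
  ].join("\n");
}
```

Generating all four pieces from one list is what keeps them in sync: a new color in the source shows up in the type, the list, and both conversions at once.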
That's a good reminder of just like, it really does depend on the use case.
And I think that's all the more reason to take advantage of this
for your own use cases and do code generation, because if you're only relying on community
code generation tools, they don't know about your data source and your specific domain,
and you do, and you can build in that understanding of those constraints.
So take advantage of that.
And, you know, build simple abstractions and reach for non-code-generated solutions first,
but if there's some source of truth you can keep in sync with, or some helpful
scaffolding tool you can build for yourself, go build it and take advantage of it.
So there's one more thing we didn't touch on.
We sort of hinted at doing simple string templating to generate, you know,
a build timestamp.
So you know, you can do that with a simple node script.
Like, if I write it in Elm, I just go for string concatenation.
I don't even go for a template string where I replace things.
I just go for string concatenation.
It's very easy because Elm is a simple language, so it's easy to generate too.
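For the build timestamp case, plain string concatenation is all it takes. A minimal sketch, assuming a BuildTime module (hypothetical name) and an Elm project that depends on elm/time for Time.millisToPosix.

```typescript
// Plain string concatenation is enough here: the module is tiny and its shape
// never changes, only the number does. Assumes the Elm project depends on
// elm/time.
function buildTimeModule(now: Date): string {
  return (
    "module BuildTime exposing (buildTime)\n\n" +
    "import Time\n\n\n" +
    "buildTime : Time.Posix\n" +
    "buildTime =\n" +
    "    Time.millisToPosix " + now.getTime() + "\n"
  );
}
```

The output is already laid out the way Elm format would lay it out, so there is no need to run Elm format on it.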
Well, yeah.
And that, you know what I find extremely helpful.
This is going to shock you, Jeroen.
But what I find very helpful for keeping myself honest about using the simplest thing that
could work is starting with hard coding.
Oh, I'm not surprised.
I was teasing.
I'm sure our listeners are bored of me saying this at this point, but I find
hard coding extremely useful for code generation as well.
Just to validate the thing you're generating. I mean, first of all, don't even
generate something, just write a module.
Does it work the way you want?
Oh, nice.
It would be great if I had this module generated.
Okay, well, write a JS file, generate it.
Or if you're going into Elm to do it, set up something that runs some Elm code, generate
it from Elm just as a single hard coded string, and then pull out the pieces that you need
to remove the hard coding from.
And well, do that the simplest way you can.
If that's with string concatenation, great.
If you want to use string interpolation with a library for that, like Luke's string interpolation
library, great.
If you want to get really fancy, you can use Rupert's Elm syntax DSL to generate code,
but it really shouldn't be the first thing you reach for.
At least that's not how I like to do it.
I like to do the simplest thing that could possibly work.
If, for the particular problem you're solving, it would be helpful to generate it through
an AST DSL, then pull that abstraction in when you need it.
And if you've got approval tests helping you, you can tweak the way you're generating your
code and if you haven't changed your generation output, it doesn't matter how you arrived
at it.
You just rerun your approval, your snapshot test, and it tells you nothing's changed and
you're good to go.
I'd like to touch on one last thing.
So a lot of people want to use Elm review to forbid writing code one way.
For instance, the page boilerplate that you have in your main code.
Sometimes I hear people saying, oh, it would be useful to have an Elm review rule that
checks whether the boilerplate is well written: that you didn't
forget to do this, and that you did this exactly that way.
And this is one instance where code generation can be useful, as I explained before, because
like Elm review rules are very good at finding things that are wrong, but you need to specify
which ones you're expecting.
Like it needs to look for one specific thing, another specific thing, but it cannot tell
you whether things are okay as a general thing.
Maybe if you multiply the number of rules or the number of checks, but if you want your
code to be written in one very specific way or maybe a few, but a limited number of ways,
then code generation is probably a better bet.
Right, because it can enforce consistency in a different way, in a simpler way.
And also it will be less painful to write.
It's easier to just run a script that generates the code than to write the code yourself,
which takes time, as I mentioned before, and then have Elm review tell you, oh, you did
this wrong.
Oh, you did that wrong.
Oh, and then you get compiler errors.
Oh, well, yeah, just run the script and you're done.
If you can do that, I think that's a good idea.
Right, right.
And if you have to remember to do one thing, one place, another thing, another place, another
thing, another place, that might be a cue to consider code generation.
Although there's certainly a cost and we should reiterate, code generation does add complexity.
It does add moving parts.
It does make it harder to trace why something's behaving a certain way.
If something goes wrong, did we remember to generate this code?
Is the source of truth being pulled in correctly?
Or just more things, more moving parts?
Also, it works really well for things that are very similar.
When I did the code generation for the pages, one problem is that a lot of pages were working
slightly differently.
Their update function took another argument.
Their view function didn't take that argument.
And I had to do a lot of refactoring work to make them all consistent, to have them
all use the same abstraction.
But once I got that, generating the code was very easy.
But it didn't work.
I couldn't have made that code generation work until I got to a place where all the
pages were using the same abstraction, where they were very similar to each other.
And I think it takes experience to be able to see, number one, where you could make those
things become similar first.
And number two, whether they will later start diverging and need to be different.
Because if you start doing code generation and now you're like, oh, well, yeah, the code
generation worked really well for this simple case.
But for this more nuanced case, there's not a clear source of truth.
So I can no longer generate this thing.
You can run into problems there.
But the thing is, if you want to opt out of code generating, well, you have Elm code.
So you just move it and check it into your Git repo.
And probably the biggest thing to watch out for is if your code generation is creating
a whole set of abstractions that people have to learn, that's a red flag.
Because now you're getting bound to this other abstraction.
Unless it's a better abstraction.
If it's a better abstraction.
But I've seen code where there's code generation happening from something where you have to
piece things together just right.
And if they don't get pieced together just right, the generated code is not going to be happy.
Oh, yeah.
And that's a bad time.
And now, well, how do you back out of that and eject that?
Well, you're depending on this extremely complex generated code that you're not going to go
maintain by hand.
And actually, that's another tip too, is try to keep the generated code as simple as possible,
ideally, if you can.
You don't want to have to go and debug generated code.
Because that would mean that you have to debug the code generation scripts, which is not
as easy as debugging Elm code.
And oftentimes, if you can, create intermediary little helper functions or APIs for the generated
code to use,
code that doesn't vary.
So you can just say, here's some code that's only used by the generated code, but it reduces
the actual code that we're generating, because it can leverage those APIs.
All right.
So you covered everything about code generation?
Well, we've certainly covered a lot.
And yeah, hopefully people will give it a try and let us know how it goes.