Codecs do two-way Encode/Decode. We talk about this pattern in Elm, and some codec tools for JSON and Bytes.
November 16, 2020

Backwards compatibility

Keeping data in sync

Elm codec API - string, Bool, object


Hello, Jeroen.
Hello, Dillon.
How are you doing today?
I'm doing pretty well.
How about you?
I'm good.
I'm good.
I am excited to dive into the topic today.
Today we're talking about codecs, which is something we don't talk about very often,
I think.
Yeah, it seems to be an emerging pattern that you see popping up more and more.
I think it's one of those little community patterns that evolves over time and there's
this sort of seed of an idea that sprouts all over in the community.
It seems to be one of these things.
So, maybe let's describe what a codec is.
So, a codec is a short word for...
Man, I don't know, actually.
Let's look it up.
I think it's for encoder and decoder, or decoder and encoder.
I don't know which way it is, but it's a contraction of both.
Code decode.
So, it's a utility or a tool or a library that allows you to encode something from A
to B and decode it from B to A.
Oh, I've got a trivia fact for you, Jeroen.
So, a codec...
Is a portmanteau?
A portmanteau of coder, decoder.
You also looked at the Wikipedia page.
I'm looking at the Wikipedia page as well.
So, it's kind of cool because that is the definition that it is something that decodes
and encodes.
So, that's a very good choice of word, which I think maybe Leonardo, aka miniBill, was the
first one to come up with, I assume.
He created the Elm codec library for JSON encoding and decoding simultaneously.
So that's sort of the origin, I think, of that term in the Elm community.
In the Elm community, probably.
Maybe someone did something for older versions of Elm.
As far as Elm 0.19, I think that's the earliest one I know, at least.
So, that's essentially the...
That's the concept, is something that both encodes and decodes.
And so, that is a...
There is an Elm codec library, but there are actually now multiple libraries emerging for
things like dealing with bytes, for example.
And I think it's worth noting that this is both a pattern and a set of...
A library and a set of libraries, but it's also a pattern.
It's more of a pattern.
It's more a pattern than anything.
And I think that...
One of the things that I think is really interesting is...
So there's this sort of reversible nature, right?
When you're...
This idea that you're building up an encoder and a decoder at the same time.
And that, I think, would be sort of the textbook definition of this concept of a codec.
But then there's another concept here, which I think is really interesting and valuable
in the Elm community, which is the idea that you build up as much information as you can
at once because you sort of...
You can lose that information otherwise.
So if you build up a decoder and an encoder, right, then they're two separate things.
And you have to... maybe you have some fuzz tests to make sure that they're reversible
or you have to do a little extra work to keep them in sync and make sure they are reversible.
Making sure that when you encode something and then decode it, you always get the same
thing back, and the same when you decode and then encode.
And so that's like...
We can do better than just trying hard to do that and then making sure we did a good
job after the fact through...
If we automate it with some tests and stuff, that's great.
But what if we could build it up in such a way where we keep things in lockstep through
the API we use to build it up?
So I don't know if that specific concept is...
Maybe it's part of a codec, but also a pattern that comes up outside of this idea of codecs.
Like for example, in Elm GraphQL, that's an essential concept as well, that you are building
up the information you need to perform a GraphQL query as well as the information you need
to decode that.
And in Elm, these pipelines that you use to build things up, whether it's just a vanilla
decoder, an Elm GraphQL SelectionSet, or all these different data types, like a validation
sort of API for validating form input or whatever.
You're building up data in a way where the Elm type system keeps track of the data as
you build it up, but you're not only building up that data, you can transform that data
along the way, and you're tracking this underlying information about that data.
And so there's something there that maybe that's a separable principle that you can
use outside of this notion of a codec, which is like a reversible encoder, decoder.
So where would you use this as a very simple instance?
Would you use this, for instance, when you deal with HTTP requests?
What is the smallest thing where you would apply codecs?
It's a good question.
I think that probably one of the most common use cases for a codec is going to be data
that you control, because if it's data that you don't control, it's coming back from a
server that you don't control, then there might be a disparity between the data you're
decoding and the data you're encoding.
But if you control the data, for example, if it's user settings and you want to be able
to serialize it and deserialize it, if you want to update the user settings, which you
have in some Elm data type, and then you want to serialize that, put it into local storage,
and then read it back out, that's a perfect example where you just want this reversible codec.
You control the serialization and deserialization, and you control both of those things in your
Elm code.
Yeah, I imagine that you would have multiple decoders built in.
For instance, you want to save your user settings.
Those might change over time.
At first, you only save A, and then later on, you add B. So your first codec would only
encode A and decode A, but your second codec later on would also encode B and decode B,
but it would still need to work in the version where B does not exist.
Yeah, making it backwards compatible and nonbreaking.
That's a shorter way of saying it.
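One hedged sketch of that versioning idea with miniBill's elm-codec. The `Settings` names and default are made up, and this assumes `Codec.oneOf` encodes with the first codec and tries the rest when decoding, which is how I recall the API:

```elm
module Settings exposing (codec)

import Codec exposing (Codec)

-- Version 2 of the settings adds field `b`.
type alias Settings =
    { a : String
    , b : Int
    }

-- The current format: encodes and decodes both fields.
v2Codec : Codec Settings
v2Codec =
    Codec.object Settings
        |> Codec.field "a" .a Codec.string
        |> Codec.field "b" .b Codec.int
        |> Codec.buildObject

-- The old format only had `a`; we default `b` when decoding it.
v1Codec : Codec Settings
v1Codec =
    Codec.object (\a -> Settings a 0)
        |> Codec.field "a" .a Codec.string
        |> Codec.buildObject

-- Always encodes in the v2 format, but can still decode v1 data.
codec : Codec Settings
codec =
    Codec.oneOf v2Codec [ v1Codec ]
```

The point being that old data in local storage keeps decoding after you ship the new field.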
Yeah, that's definitely a whole topic on its own.
And it is interesting how codecs fit into that picture because they are compatible with
it, but they don't in themselves enforce that concept, but they can help with that.
That's a good point.
I don't think I've seen an implementation of that where backwards compatibility would
necessarily work.
I don't know.
They do say things about that in their frequently asked questions.
And Mario Rogic has this idea he calls Evergreen that he gave a really great Elm Europe talk
about that one year.
And he has that actually working in practice in Lamdera.
So this idea of managing breaking changes is something that exists out in the wild.
And you could certainly apply that concept to codecs.
And I think, under the hood, Lamdera may even be using one of these codec libraries.
I know it's serializing and deserializing these things as bytes.
I think I've seen that there are limitations in what you can do with your migrations in
Lamdera because of that.
Because it's trying to stay backwards compatible.
When you succeed, then that's just an amazing experience for your user.
So yeah, I agree with the way you can control the data.
As we said, when you don't control the data, when it doesn't live only in Elm, then you're
dependent on what it looks like from the server or from some public API.
And then it will probably look like something that is a copy of the API response.
What I mean is you don't want to parse the API response and put it into your model as-is;
you want to model it so that you don't have impossible states.
Although you can capture that in a codec where you can build up custom types.
And then of course, it's reversible, so you can encode a custom
type and decode it as well.
And if you have something like, if you use some of these serverless sort of Elm things,
which run Elm code for serverless functions and return JSON data and stuff, right?
If you do code sharing, then you can use a tool like that to ensure that these two data
types are in sync.
Which, of course, you do run into these tricky questions about backwards compatibility there
as well because...
Yeah, and synchronization.
Yeah, just like we talked about in the GraphQL episode last time.
Yeah, those same sort of concepts.
What's that called, the Two Generals' Problem? Some sort of generals problem,
which is this sort of conceptual problem in computer science: if you have a general
who needs to send a message to the troops that they're going to attack, but they will
only attack if the other side attacks at the same time.
Then the troops need to send a confirmation that they received the message to attack because
both sides of this hill need to attack simultaneously.
And if they don't both have a synchronized plan to attack at the same time, then they
need to call off the attack.
So they can only do it if they get confirmation.
So the one side, side A, needs confirmation that side B received the message, but then side B
needs confirmation that side A received their confirmation.
There's no resolution.
It's a mess.
There's no way to solve that problem.
So there are certain conceptual limits to these things, but we kind of discussed that
in the Elm GraphQL episode that there are ways to get around that.
But yeah, those are some tricky problems.
But anywhere that you're able to sort of control the serialization format in your Elm code,
whether that's Elm code serializing data in a serverless function, and then a shared
code base that deserializes it on the client side in a browser in Elm, that same concept
applies: you control the format in a shared Elm code base.
So should we get into the more sort of detailed API of what you can do with miniBill's Elm codec?
So like, I mean, first of all, the simplest thing you could imagine doing is just doing
a codec for a string, right?
So if you just do codec.string, now you have something that knows I can get a decoder for
this, which is json.decode.string under the hood, and I can get an encoder from this,
which is encode.string.
So it keeps track of that information.
And you've got one for every basic type.
You've got floats, ints, chars, and so on.
So if it was as simple as, you know, if you were only dealing with built in Elm types,
you know, string, bool, int, float, and then maybes and lists of those, or even dicts and
sets and tuples, if that was all you were doing, then you wouldn't have to do very much.
You pretty much get all of that for free.
You just say this is a list of strings.
This is a list of ints or whatever, and it's just going to work.
There's no extra effort for you to describe how to serialize and deserialize because it
knows how to encode it into a list of strings, and it knows how to decode that into a list
of strings.
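As a sketch of that "for free" composition with miniBill's elm-codec, assuming the `Codec.decoder` and `Codec.encoder` extraction functions, which is how I recall the API:

```elm
import Codec exposing (Codec)
import Json.Decode as Decode
import Json.Encode as Encode

-- A codec for a list of strings: composed entirely from built-ins.
namesCodec : Codec (List String)
namesCodec =
    Codec.list Codec.string

-- One definition gives you both directions.
namesDecoder : Decode.Decoder (List String)
namesDecoder =
    Codec.decoder namesCodec

encodeNames : List String -> Encode.Value
encodeNames =
    Codec.encoder namesCodec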
But once you get into things like objects, now that becomes a little more intricate because
now if you're building an object, if you're turning that object into a record on the decode
side, on the Elm side, then you need to then tell it how to pick apart that field from
a record, or it may not be a record.
It may be some other data type, but you need to tell it how to grab that field so it can
encode it when it's encoding that value.
Yeah, and it doesn't really have to look just the same on Elm and JavaScript.
So you can have a record on the Elm side and have an array on the JavaScript side if you
try to optimize it one way or another.
You could do it like that if you wanted to.
Can you do that with the Elm codec?
You mean using codec.object?
Yeah, because your codec.object takes a constructor just like decode.succeed.
Which would typically be a record constructor that people would pass in there, but it can
be any function that takes those data types.
And then you use the builder pattern where you say codec.field and then the name of the
field in JavaScript.
So in this case, you're saying, I tried to get this field with this name.
So you're saying you could turn it into a list on the Elm side, but then when it turns
it into a JavaScript object, codec.field is always going to turn it into a JavaScript
object on the encode side.
Yeah, it does look like this API does not allow you to do this, what I said.
You could imagine that you encode it differently in JavaScript.
That's right.
Have you used Elm serialize?
Did you try using that?
Okay, so you want to give a brief intro to that library?
Yeah, so I used MartinSStewart's elm-serialize, which was previously called elm-codec-bytes,
if I remember correctly.
Now it's called Elm serialize.
So it's basically the same thing as Elm codec, but it doesn't work with JSON.
It works with bytes or strings, and the idea is that you generate a more
performant representation, whether that's a compact JSON representation or bytes.
Right, because you don't care about the underlying type that's used.
You just care that you can take a data type in Elm, deserialize it to some format, which
you don't care what that format is, as long as you can also get it back using that codec
that you build up.
So the use case is like you only want yourself or this code to be able to read it.
You don't care about the rest.
And when you're in that use case, you can optimize a lot of it.
So in this case, you do turn things into like an array with integers instead of custom type
names.
And it's a very, very compact JSON object that you get, or bytes.
So I use it in Elm review to cache the internal ASTs for files.
That saves a lot of disk space.
It takes like, I think it saves like 60% of disk space compared to elm-syntax's original
decoder and encoder.
Which is like JSON.
Which is JSON with long field names.
So that one is meant to be human readable.
What I use isn't.
It's only meant for my internal cache.
So yeah, you can use codecs for caching, which has been my only use case for now.
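A rough sketch of what that looks like with MartinSStewart/elm-serialize, as best I recall its API. Note that `Serialize.field` takes no field-name string: the format is positional, which is part of why it's so compact.

```elm
import Serialize

type alias Point =
    { x : Int
    , y : Int
    }

-- Fields are identified by position, not by name, so the
-- serialized output carries no field-name strings at all.
pointCodec : Serialize.Codec e Point
pointCodec =
    Serialize.record Point
        |> Serialize.field .x Serialize.int
        |> Serialize.field .y Serialize.int
        |> Serialize.finishRecord

-- Encode to compact bytes (for a cache), then read them back.
roundTrip : Point -> Result (Serialize.Error e) Point
roundTrip point =
    Serialize.encodeToBytes pointCodec point
        |> Serialize.decodeFromBytes pointCodec
```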
For Elm codec, you could imagine that it's not codec.field, but you could have some codec.index
or codec.at and do something else, where you would have a decoder that decodes one way and an
encoder that encodes another way.
And so that's nice to be able to build up an object.
And it's this underlying principle again, that at the point that you have that information,
you're basically giving it all the information it needs for one step.
So you could build up a decoder and then you could build up an encoder.
But to do that, you now have to duplicate all this information about it.
You say encode.string in one place, and then you have to say decode.string in the
other place.
Whereas when you build up a codec, you don't need to do that because you have that information
in a codec.
And we've talked about making impossible states impossible in terms of data modeling.
We've also talked about the role of API design in making impossible states impossible.
And I think this is one of those instances where maybe this is like another special case
of that where through the API design, you capture all the information when you have
it so you don't have duplicate sources of truth.
You have one source of truth and you take all that information.
So that's what a codec is doing.
Instead of building up the encoder and the decoder separately, where we have to duplicate
all this information, you build them up together.
And there are certain things that you cannot make mistakes on.
And then there are places where you can make a mistake.
So you build up a codec.
If you build up a codec for an object, you do codec.object and then you give it a record constructor.
So the example in the docs is a point with a record that's got an x and y float.
So then you do codec.object, you give it the point record constructor, and then you pipe
that into codec.field with the string x, which is the field name in JavaScript.
Now if you gave that something like empty string, you could mess that up, but you're
going to have to do something like that.
So that's not anything new that you could mess up compared to
JSON decoding or JSON encoding.
And then there's the second argument you give to codec.field.
In the example he gives .x, which would be a function that takes that record type and
pulls off the value for the encoder.
From the Elm code.
Yeah, from the Elm record.
Yeah, right.
It takes the value from the Elm record.
So it could be any function which takes whatever value you're dealing with in this point, it's
a point record, and then gets some sort of float data type back.
And of course you could put .y where you meant .x and you can make mistakes there.
And then you say codec.float as the final argument that says this is how you serialize
and deserialize it.
That part is guaranteed to be in sync.
You can't mess that part up.
So there are just more guardrails to things being in sync, and there is some manual work,
but at least you're capturing the information while you have it.
And I think that's the underlying concept that I really like.
And I think that that concept is applicable in more places than just this pattern.
Yeah, totally.
So as a recap, field takes as a first argument where in the JavaScript you need to write
the data to or read it from.
The second argument is where you get the value from in the Elm record.
And the third argument is what is the type of that field.
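Putting that recap together, the point example from the elm-codec docs looks roughly like this:

```elm
import Codec exposing (Codec)

type alias Point =
    { x : Float
    , y : Float
    }

pointCodec : Codec Point
pointCodec =
    -- The record constructor, just like decode.succeed
    Codec.object Point
        -- "x" is the JSON field name; .x pulls the value
        -- back off the record for the encoder
        |> Codec.field "x" .x Codec.float
        |> Codec.field "y" .y Codec.float
        |> Codec.buildObject
```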
And because you have both decoding and encoding in the same place, it's much
less likely that you're going to make a mistake.
That's another good point.
Because if you give a JavaScript object field name x and then you
give a record accessor function .y, then you can look at them right next to each other and
it's visually more obvious that something's gone wrong.
So yes, not only are you not losing that information as you build it up in two separate places,
but you can see the information together.
So if there's a mistake, it's easier to catch.
And you will not be able to leave out a field, for instance, or at least not as easily,
because you can still do that in the object constructor or the record constructor.
Since you want the encode format and the decode format to always be in sync, you really want
to have all the guardrails available to make sure that you don't mess those up.
And when you have an encode that is very far away from decode, like maybe just a few lines
away, but that might still be too far away, then you're more likely to mess up.
And this is my main reasoning for using something like a codec is when the decoding and encoding
are the same or represent the same thing, you want to have all the tools available.
Should we talk about how you decode custom types?
That's one of the really clever design elements in this library is that solution to creating
a codec for a custom type.
So essentially, there's a little bit of a learning curve to figuring out how to work
with this, but you're building up this codec.custom, which it takes a function, and that function
is going to take an argument for every single variant that you have.
So if you have a Teacher and Student variant for a Person, then you'd have an argument
for teacher, an argument for student, and an argument for the actual value that you're encoding.
So you can do a case statement on that value, on that person, and then say if it's a teacher,
then you destructure that teacher, and then you have to call that argument that you had
for teacher, and that argument is the encoder being passed in.
Yeah, it's so weird.
That's the thing that hurts your brain the most, and that's the really clever thing about
this design is it finds a solution to sort of capturing that information, but it definitely
hurts your brain at first.
So if you think about it, what do you need to teach it in order to have it reversibly
decode and encode a custom type?
And what you need to teach it is, well, how do I pick apart the information from a custom
type and then encode that, and then how do I decode that?
And for encoding it, so you build up these encoders using these variant zero, variant
one, variant two functions, which take a name to serialize for that variant on the JavaScript side,
and then they take the constructor to build it up, so that tells it when it decodes it
how to build it up, and then you take the individual codecs for each of the arguments
of the custom type.
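That brain-hurting design is easier to see in code. This is close to the `Semaphore` example from the elm-codec docs, where the lambda's arguments are the per-variant encoders being passed in:

```elm
import Codec exposing (Codec)

type Semaphore
    = Red Int String
    | Yellow Float
    | Green

semaphoreCodec : Codec Semaphore
semaphoreCodec =
    Codec.custom
        -- One argument per variant, plus the value to encode.
        -- Each argument is the encoder for that variant.
        (\red yellow green value ->
            case value of
                Red i s ->
                    red i s

                Yellow f ->
                    yellow f

                Green ->
                    green
        )
        -- The string is the tag name in the serialized output; the
        -- constructor tells it how to rebuild the variant on decode.
        |> Codec.variant2 "Red" Red Codec.int Codec.string
        |> Codec.variant1 "Yellow" Yellow Codec.float
        |> Codec.variant0 "Green" Green
        |> Codec.buildCustom
```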
So it really hurts your brain, but I find with understanding any complex pattern in
an Elm API, I find it helpful to put yourself in the shoes of the library author and say,
like, instead of just saying, like, oh, this is difficult to write, you say, like, what
does this API need to accomplish?
And then it helps you understand the design.
And so again, this API needs, what it needs to accomplish is you need a way to tell it
how to, you know, how to take these values, decode them from JSON, and then build up a
custom type, and you need to be able to pick apart a custom type and turn it into a serialized
JSON object.
So I think it helps to keep that in mind when you look at it.
Why is it passing in these encoder functions?
Often as a library author, you're trying to avoid people from making mistakes.
So if it looks a bit odd, it's probably because they're trying to give you some guarantees.
And in this case, that is exactly the point.
Do you know what the what the resulting JSON looks like?
Because as you said, there's a name for the variants.
So that's a string that you place somewhere.
So I'm guessing you have something like an object with a type field and the args somewhere?
Yeah, I'm trying to remember the exact format, like what the name is.
There's one field that's called args.
And I think args is just going to have the, you know, if it's variant two, then it's going
to have two values in the args list.
Or if it's variant zero, I think it's going to have zero values in the args list.
And then I don't remember what the actual variant name like teacher or student, that
I don't remember what that's placed under.
Seats or constructor, maybe?
I think it might be called something else.
So this does tie...
So if the library chooses a JSON format for you, then this does not help you with working
with external formats.
So if you want your custom type to look a certain way when communicating
over HTTP requests, for instance, then it's probably not this that you're going to use.
I'm sure that you can use something else.
Yeah, you can use oneOf to do something pretty custom, I think.
Oh, by the way, I found it.
I believe it's called tag.
Tag and args.
So you get an object that has those two fields.
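For reference, if that tag/args memory is right, encoding a hypothetical `Red 1 "hi"` variant built with variant2 would produce JSON along these lines:

```json
{ "tag": "Red", "args": [ 1, "hi" ] }
```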
Yeah, that makes sense.
So there is one tricky thing that I've encountered when using this library, which is...
So sometimes you have certain areas that you don't care to serialize, deserialize.
Like you want to send it one way, but not the other.
And there is, you can sort of trick it a little bit by creating your own custom...
Let me look this up.
Yeah, so you can fudge it a little bit, because you can use a function which
is essentially an escape hatch.
It basically takes an encoder function, something that takes some value
and turns it into an encode value.
And it takes as a second argument a JSON decoder, and you can completely fudge it.
Like you can turn it into an arbitrary value, or you can do encode.null or whatever.
You can do decode.succeed to any value.
So you can sort of use that as an escape hatch if you care about sending it, but you don't
care about receiving it or vice versa.
So that's something I've used.
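If I recall the elm-codec API correctly, the escape hatch being described here is `Codec.build`, which takes a raw encode function and a raw JSON decoder. A sketch of the one-way trick (the `sendOnlyCodec` name is made up):

```elm
import Codec exposing (Codec)
import Json.Decode as Decode
import Json.Encode as Encode

-- A codec for a value we only care about sending: the encode side
-- is real, but the decode side is hardcoded to a dummy value.
sendOnlyCodec : Codec String
sendOnlyCodec =
    Codec.build
        Encode.string
        (Decode.succeed "ignored")
```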
So like one use case that we haven't hit on yet actually, but that I've found this library
useful for is using Elm program test.
So for Elm program test, you can send...
You can make assertions about JS ports that have been sent.
And when you do that, the value sent through the port is some JSON value.
And so you can make assertions about that JSON value, but you need a decoder in order
to do that.
So when you use Elm program test and you want to make assertions about ports, you need to
send both an encoder and a decoder?
Let's say you're making an assertion about a value that you send to JavaScript through
a port in your Elm program test test case.
Now when you're making assertions and you say, I expect this outgoing to JavaScript
port to have been called with some value.
How are you going to assert about that value?
You need to have a decoder in your Elm program test set up that tells it how to turn...
Because it received some JSON value.
And so now you need to be able to decode that.
And so that's pretty low level.
And it would be a pain to be trying to keep them in sync constantly.
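A sketch of that setup, assuming elm-program-test's `expectOutgoingPortValues` function and a made-up "saveSettings" port whose payload is just a string; the codec supplies both the app's encoder and the test's decoder:

```elm
import Codec exposing (Codec)
import Expect
import ProgramTest exposing (ProgramTest)

-- The one source of truth for the port payload format.
settingsCodec : Codec String
settingsCodec =
    Codec.string

-- The app encodes with `Codec.encoder settingsCodec`; the test
-- decodes with the decoder derived from the very same codec.
expectSavedSettings : ProgramTest model msg effect -> Expect.Expectation
expectSavedSettings programTest =
    programTest
        |> ProgramTest.expectOutgoingPortValues
            "saveSettings"
            (Codec.decoder settingsCodec)
            (Expect.equal [ "dark-mode" ])
```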
And again, this gets to the broader sort of way of thinking, that broader mindset, which
is if you notice that you have multiple sources of truth, how can you bring those sources
of truth together?
So there are fewer places where you could introduce a divergence between those two sources
of truth.
And so codecs are an example of that concept, but there are many places you can apply that
concept.
So that's one place that I felt frustrated by having these two sources of truth, that
it's like I'm encoding these values, but then I'm making assertions about these values in
my Elm program test cases where in fact, I don't even know...
I'm going to start having my tests fail if the encode format diverges from the decoder
I wrote in my Elm program tests.
And those are just two sources of truth that are diverging.
So I'd like to have less things that could go wrong there.
But again, there are certain things that, if I don't want to assert on something, it does
get interesting, because then you can use those escape hatches and give
it something that hardcodes either the encoding or the decoding part, depending on what you
care about.
If you do use that, then that is where you would likely need more tests than the rest,
because that part can fail.
Even if you use a codec, it's not a bad idea to have some tests around
your codecs, and it can be useful to have fuzz tests.
It's a really easy fuzz test to write to say that a codec is reversible.
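That round-trip fuzz test really is only a few lines with elm-test, assuming elm-codec's `Codec.encoder` and `Codec.decodeValue`, which is how I recall the API:

```elm
import Codec
import Expect
import Fuzz
import Test exposing (Test, fuzz)

-- Encode a random string, decode it back, expect the original.
roundTripTest : Test
roundTripTest =
    fuzz Fuzz.string "Codec.string is reversible" <|
        \value ->
            value
                |> Codec.encoder Codec.string
                |> Codec.decodeValue Codec.string
                |> Expect.equal (Ok value)
```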
It'd be interesting, along with the codec, to build up a fuzzer that generates
random values for that, and then creates a fuzz test that it's reversible.
You mean with the same API?
But then I wonder whether it would actually help assert anything.
I guess it would.
Which is kind of crazy, but it actually could give additional information about correctness.
Not necessarily perfect information, but additional information at least.
Something interesting.
Someone do that.
It would be a fun experiment.
So Leonardo, aka miniBill, gave a talk at the London online meetup recently, and he was
saying that his initial prototype of the Elm codec library included
things like a form encoder, and he resisted the temptation to add additional features and
just tried to keep it simple, which I think is a good call.
It's usually a good call, yeah.
But that does again illustrate that point that this pattern has a lot of different applications.
So a form codec, is that for instance when you want to save the result of a form to a record
somewhere, but you also want to open that form with preexisting data?
I think that would be the idea.
That's interesting.
I've never thought of it.
There are so many applications of this concept.
I mean, again, it's just relentlessly looking for ways to reduce sources of truth down to
one and to make impossible states impossible.
These are sort of the natural logical conclusions of that mindset, I think.
Another use case I found actually today is for routing.
So we're looking at refactoring the code today in the workplace and we have two functions
for routing.
One to parse the URL into a page or a route.
We call that a route in our case.
And then a function that takes a route and turns it into a string so that you can create
a link.
It's very...
I mean, that must be... almost every single Elm SPA must have that function.
Those ones are defined in different places.
They're also spread across plenty of modules because we want to...
Because at least we split it up.
Which might not be a good idea, but we'll see what we'll do about that.
And yeah, it's pretty easy for them to get out of sync or even just forgetting to add
a route in one place.
So that's quite annoying.
But if we do that through a codec, then we only have to do it in one place.
And that is pretty good, I think.
Pretty nice.
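A hand-rolled sketch of that single-source-of-truth idea, for routes without parameters (the names here are hypothetical):

```elm
module Route exposing (Route(..), fromPath, toPath)

type Route
    = Home
    | About
    | Settings

-- The single source of truth: each route paired with its path.
table : List ( Route, String )
table =
    [ ( Home, "/" )
    , ( About, "/about" )
    , ( Settings, "/settings" )
    ]

-- Route -> URL string, for building links.
toPath : Route -> String
toPath route =
    table
        |> List.filter (\( r, _ ) -> r == route)
        |> List.head
        |> Maybe.map Tuple.second
        |> Maybe.withDefault "/"

-- URL string -> Route, for parsing.
fromPath : String -> Maybe Route
fromPath path =
    table
        |> List.filter (\( _, p ) -> p == path)
        |> List.head
        |> Maybe.map Tuple.first
```

Note that a lookup table like this does not get the compiler's exhaustiveness check, since you can still forget a route in the table; that check is exactly what the codec-style builder with a case expression adds.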
And that's a good point about if you add a new route, like if you add a new variant to
a custom type, then if you're using this sort of codec style pattern where you have a case
statement where you build up that encoder, then it keeps you honest there.
I guess you would probably have a case statement anyway in the encoder regardless, but you're
sharing more information.
Yeah, but you might forget to handle the decoding.
I guess that's where it becomes more useful, isn't it?
On the decoding side, you're less likely to miss it.
And maybe it can do some of the stringification for you.
Because you can't...
Essentially, you cannot have a decoder for which there is no encoder.
Or you cannot have an encoder for which there is no decoder.
Meaning if you had the encoder, you would probably do case route of variant, variant,
variant, variant, and then you encode each of those.
And as you said, you're not going to forget to handle a case of those.
Assuming you don't do a wildcard, the compiler is going to tell you to handle a new case,
and then you'll add the encoder for it.
And that's not going to be a problem so much.
But then as you said, for the decoder part, you may forget to handle that case where you
turn that back into a route.
Or for the URL parser part or the JSON decoder or whatever type of codec we're talking about,
that's the part you might forget, right?
If you add the variant to your custom type.
In a regular encode decode thing.
When you have those two things split apart and defined separately.
But when you do it as a codec, in order to define a new custom type variant, you now
add to the pipe chain and you say codec.variant1 or variant0, depending on how many
arguments that custom type variant has.
And now you get a new argument in that function that gives you that encoder.
And that's the only way to access the encoder to encode the value.
So you have that case statement.
You could handle that case statement, but you don't have the encoder to encode that
custom type.
So it keeps you honest about keeping those things in sync.
So that's the really interesting thing about that, isn't it?
And it's basically the other way around: the compiler will lead you, like, oh, you forgot
to add this case, this branch in this case expression.
So you add it and then, oh, well, I can't encode it.
OK, well, I need to add a variant in the arguments.
Oh, well, you need to add a variant in the builder pattern.
Mm hmm.
And right.
You're done.
There's another example that that has been on my mind a lot, which is, you know, we've
talked on this show about, you know, in Elm pages, how there's this optimized decoder,
which will, you know, you build up a JSON decoder using an API that's exactly like JSON
decode, you know, and then when you...
Go back to episode one, by the way.
And when you access a field in that optimized decoder, it's exactly like accessing a field
in a JSON decoder, except that under the hood, when you run your build, it's tracking which
field you're accessing.
And then it's actually reserializing that JSON that came in and taking out the fields
that you didn't decode, because it can guarantee if you didn't, if you do, you know, decode
dot field X and decode dot field Y, but you don't do decode dot field Z and there's a
Z in there, then it can throw away Z and the decoder will run exactly the same way because
you didn't touch that JSON value.
So it takes the raw JSON value.
It tracks which ones you touch by doing decode dot field and then it takes that JSON and
then you can strip out all of the unused values.
Now that applies not only for unused fields, but for unused indices in JSON arrays.
And it can do that because if you do like decode dot at four, then you're only accessing
index four in this JSON array.
It can null out every single other thing.
So when you run it with a regular JSON decoder, it's guaranteed to access the thing
that was at four in the exact same way, because it's at the same index, and everything
else has been nulled out.
I actually thought it would be a bit cooler.
I thought it would remove the first four items and then change the index in the decoder.
Well, I guess it could maybe.
You could, but that would become pretty tricky.
You'd need to store some meta information in order to do that.
So what I do in Elm pages is, when you do Elm pages build, I take the decoder, run it,
it tracks what data is used in the JSON, marks those things as being consumed, and then
it creates a stripped down version of the JSON, and that gets stored as the data that's
fetched when you actually run the webpage.
So now as a user, when you go and you hit an Elm pages website page, then it's going
to get that minified JSON data, but now it's running a vanilla Elm JSON decoder on that.
And so it can run the decoder without any special fancy logic.
It just runs the decoder and it's guaranteed to decode in an equivalent way.
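To make the stripping idea concrete, here's a hedged sketch of the concept (the record fields and JSON shape are invented for illustration; this is the idea, not Elm pages' actual internals):

```elm
import Json.Decode as Decode exposing (Decoder)

-- A decoder that only touches "title" and the author's "name".
postDecoder : Decoder ( String, String )
postDecoder =
    Decode.map2 Tuple.pair
        (Decode.field "title" Decode.string)
        (Decode.at [ "author", "name" ] Decode.string)

-- At build time, a tracking version of this decoder can record that only
-- those two paths were consumed. Given a response like:
--
--   { "title": "Hi", "body": "...", "author": { "name": "Dillon", "bio": "..." } }
--
-- the stripped JSON that actually gets shipped would be:
--
--   { "title": "Hi", "author": { "name": "Dillon" } }
--
-- and this exact same vanilla decoder runs unchanged against it.
```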
Do you know how much you save by the way, usually?
Oh, I mean, well, it depends, right?
I mean, if it's a giant REST API, I mean, it's not uncommon at all to have deeply nested
fields coming back in REST APIs or, you know, 200 different fields.
So I don't think I can claim to accurately give, like, a median range or something
like that, but I will say, imagine that you're getting two fields out of a REST
response that has a hundred fields, and some of those fields are, you know, arrays
and nested objects and stuff.
Generally, yes.
Yeah, you could sort of do the mental math and imagine that you could easily be
accessing only, you know, 2% of the entire data response.
It's not, it's not such a stretch of the imagination.
So it very much depends on the API.
But so that's another interesting application of that idea of using this information that
you build up.
So you're, you're decoding, but you're also building up information as you decode.
And actually, you know, Ilias originally built this library, JSON decode exploration where
what it does is it, you can run a JSON decoder and you can give a warning if there are unused
JSON fields.
That was where I had the idea.
So I asked him one day, you know, do you think you'd be able to use this to strip out unused
JSON values?
And then within like a day he had built a working prototype, which I think ended up
pretty much being the release, with a couple of, like, updated docs and test cases after that.
But so the sky's the limit with what you can do with this concept of, you know, building
up multiple pieces of information at the same time in a pipeline.
I'm wondering whether you can do something like that for parsers.
Like if you try to parse something like ISO 8601.
Your favorite.
That's it.
I'm sorry, it is 8602.
Brain dump here.
Brain freeze at the last moment.
So yeah, if you parse ISO 8601.
Which is a date format for people, for new listeners.
Because surely you'll know it otherwise.
So yeah, you parse it, but then you might also just want to rewrite it as a string.
So like for parsing languages, or like in Elm Review.
We parse the Elm code and then sometimes we want to rewrite Elm code.
Like, what does Elm format work with as a data structure under the hood?
I don't know if you can.
Yeah, maybe.
It's interesting.
It is very interesting.
Because you can definitely do the same thing.
But the question is, would you be able to do it in a way where you would get
the same guarantees as what you have with the Elm codec API?
That's fascinating.
Well, that's apparently an exercise for the listener.
I think it would depend.
So if you want to make it reversible, if you want to make parsing a syntax tree reversible,
it would need to be a concrete, not an abstract syntax tree.
Meaning you would need to keep track of how many white space characters they used.
You know, if it's a syntax that supports trailing commas, you'd need to keep track of whether
there was a trailing comma, how many spaces were before or after.
Anything that could vary in that syntax would need to be part of the data structure
you parse it into.
I don't think you would need to keep track of every white space, but if you at least
keep track of the position of everything, then you can infer things again.
You'd need enough information to be able to infer that.
You would need to have a lossless format if you wanted it to be truly reversible.
But if your goal is not to be reversible, yeah, I wonder how much mileage you would
get out of that.
It can also be that the parsing is the harder part.
Taking an AST and then printing it out is significantly easier.
But you still don't want to make mistakes when serializing.
Or when deserializing?
You want them to be the same when possible.
Although since syntaxes typically aren't changing that frequently, it might be less of an issue
for that kind of thing.
I wonder.
I wonder.
But it is an interesting concept.
I mean, if you can build a parser the same way, but with these additional guarantees,
that sounds like a win win to me.
And if you're parsing a phone number and then you know you have a way to turn that back
into a phone number in the same format, then if you have confidence that whatever format
you're spitting back out can be parsed, that could be kind of nice potentially.
So let's say that you parse a phone number and you accept, so if you throw away parentheses
and then you parse a digit, parse a digit, parse a digit, throw away a paren, throw
away spaces, throw away dashes, parse a digit, parse a digit, whatever.
Well now, how are you going to serialize that?
You're just going to print out a string of the digits.
Or maybe you like...
Well, you could hide them from the public API, but internally have, you know, a bigger
type which does keep all that white space information.
Yes, you could.
It wouldn't be as performant because you would have a lot of memory that you would maybe
not use.
I guess it's only useful if you change what you parse.
Because if you just want to reprint it, then just keep whatever you parsed originally in
memory and spit it back out.
But if for instance, if you want to do AST manipulations for Elm syntax, then it can
be interesting.
Yeah, it's an interesting concept.
Yeah, I think that this pattern is ripe for use for sure.
There are many places it could apply.
It's kind of neat that we're sort of at the beginning of the exploration of this concept.
And I think we'll be seeing more and more applications over time.
Oh, you know what this could be used for?
Like a lot of people ask, hey, I have this HTML stored in my database.
Can I embed it into Elm?
Well, you could parse it to HTML and then have that HTML somehow be reversible into
a string, but also viewable as HTML with Elm's HTML package.
I guess that can make sense.
I guess I'm expecting a lot from our listeners now.
Yeah, it's very interesting.
One thing that we haven't mentioned that I think is worth mentioning is you can also
use this technique to encapsulate low level serialization details.
For example, if you wanted to serialize something into bytes, again, this is going to be something
that you control the serialization and deserialization and are using the same code to do both, like
serializing some settings or cached data or whatever it may be.
Then it's a lot of work to get a correctly working byte decoder and encoder, which if
you're not familiar, I don't know when, like a year ago maybe, this official Elm bytes
decoding package was released that allows you to do binary decoding and encoding, which
is quite cool.
You can pack the data into a much more condensed data format and potentially get some performance
gains from that.
Although actually, I think that as far as decoding the data, JSON decoders are currently
very fast.
That may change with WebAssembly at some point.
For Elm Review, I still use the JSON optimized version and not the bytes because you can't
send bytes over ports for now.
Got it.
Okay, so you have to serialize the bytes to some sort of like base64 encoded string,
which represents bytes, but it's not actually directly sending bytes.
Which in my case was less performant according to benchmarks.
So the decoding and encoding portion may be slower, and it is slower, than if you were
decoding and encoding JSON, but the storage is smaller.
So one thing that we haven't touched on is that I think that it's a lot of work to get
a byte decoder and encoder lined up correctly.
So there are certain details, like with bytes, it's much more low level than JSON decoding
and encoding.
So if you have a list in JSON, it's just going to have an opening bracket and a
closing bracket for the array.
For bytes, there's no such thing.
You have to tell it when to start looking for a list and when to stop looking for a
list, which means that you typically need to encode the exact number of bytes to
consume until you stop treating it as that list.
And so that's very low level.
But if you're using a package that does like a codec for your bytes, then it can build
in that assumption about how it encodes that.
So it says, I'm going to encode a number at this position in the bytes.
And I'm sure it encodes assumptions about, like, big endian or little endian too,
and things like that, right?
It's a lot more high level dealing with a package like this.
So it's not just that there are fewer things you can get wrong from the decoding encoding
perspective, but it can handle some of those low level serialization details.
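As a sketch of what such a package handles for you under the hood, here is roughly how a length-prefixed list looks with elm/bytes (the helper names are made up; a codec library would derive both halves from a single definition so they can't drift apart):

```elm
import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)

-- Bytes have no "[" or "]", so the encoder writes an explicit
-- length prefix before the items.
encodeList : (a -> Encoder) -> List a -> Encoder
encodeList encodeItem items =
    Encode.sequence
        (Encode.unsignedInt32 BE (List.length items)
            :: List.map encodeItem items
        )

-- The decoder must read that same prefix with the same width and
-- endianness (a 32-bit big-endian int here), then loop that many times.
listDecoder : Decoder a -> Decoder (List a)
listDecoder itemDecoder =
    Decode.unsignedInt32 BE
        |> Decode.andThen
            (\count ->
                Decode.loop ( count, [] )
                    (\( n, acc ) ->
                        if n <= 0 then
                            Decode.succeed (Decode.Done (List.reverse acc))

                        else
                            Decode.map
                                (\item -> Decode.Loop ( n - 1, item :: acc ))
                                itemDecoder
                    )
            )
```

If the encoder and decoder disagree on any of those low level choices, decoding silently reads garbage, which is exactly the kind of mismatch a bytes codec rules out by construction.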
Bytes have been out for two years apparently already.
Two years, nice.
Yeah, in November.
So by the time this episode airs, I think it will have been two years.
Very nice.
We're just starting to see a lot of explorations happening with bytes, I feel like now.
Yeah, that'd be pretty cool.
To see more.
Well, good stuff.
Are there any words of wisdom we want to leave for listeners about how to get started?
Where to look for opportunities to use this pattern?
I'm not sure I have anything worth sharing here.
Is it worth using Elm codec for JSON?
And are there opportunities, you know, that you think you could find to use it, you know,
in the code bases you work with or?
I use it for caching, as I mentioned before, and I think that's a great opportunity for
it, especially using Martin's Elm serialize.
By the way, shout out to him.
He did an amazing job.
And apart from that, I haven't played much with decoders, with codecs.
I think for people who happen to do testing on their ports with Elm program tests, I would
definitely recommend that.
But otherwise, I think just keep an eye out for opportunities.
And you know, especially if you are serializing and deserializing a format, keep an eye out
for that.
And more broadly, whether it's a codec or not, look out for opportunities to keep track
of information as a single source of truth and build up multiple pieces of information
at once rather than separately.
Yeah, definitely.
Yeah, if you can have this, if you have multiple representations of a single thing, then it
might be useful to have something like codec or that kind of pattern.
And let us know what else you come up with.
Tweet us.
We'd love to hear what you end up doing with codecs and with that pattern, and what
instances of that pattern you've seen applied.
And find us on Twitter and ask us a question.
Submit a question and we'll hear from you.
All right.
Thank you for joining us on Elm Radio.
Until next time.
Until next time.