Parse, Don't Validate

We discuss Alexis King's article and how its techniques apply in Elm.
August 24, 2020

the difference between validation and parsing lies almost entirely in how information is preserved

Shotgun parsing is a programming antipattern whereby parsing and input-validating code is mixed with and spread across processing code—throwing a cloud of checks at the input, and hoping, without any systematic justification, that one or another would catch all the “bad” cases.

Why the term "parse"?

a parser is just a function that consumes less-structured input and produces more-structured output [...] some values in the domain do not correspond to any value in the range—so all parsers must have some notion of failure

  • Conditionally return types
  • Don't have to repeatedly check condition
  • Look out for "lowest common denominator" built-in values being passed around (like empty String)
  • Maybe.withDefault might indicate an opportunity to parse

Two ways to use this technique:

  • Weaken return types (e.g., return a Maybe or Result instead of the bare type)
  • Strengthen input types (e.g., require a non-empty list as an argument)


Hello, Jeroen.
Hello, Dillon.
How are you doing today?
I'm doing pretty well.
How about you?
I can't complain.
I'm thinking about this concept of parsing, but not Elm parsers.
Parsing versus validating.
parse don't validate.
I've heard that article before.
Yeah, this is kind of fun.
It's actually, it kind of feels like a book club a little bit.
I reread this article which has sort of circulated around.
It's by Alexis King, Lexi Lambda.
It really encapsulates a lot of really nice ideas and expresses them very well.
Yeah, yeah, definitely.
So can you give a summary of what it is about?
It was nice rereading this article because I sort of tried to capture a couple of quotes
that I think nicely summarized the article itself.
And this one really stuck out to me.
The difference between validation and parsing lies almost entirely in how information is preserved.
So I think that it's really keeping track of information.
As you go deeper into the logic and function calls in your code, you should have more and
more highly structured data.
You should have more things that you've proven through the data types that you have.
A perfect example is JSON, right?
At the periphery of your application, maybe like through like a flag or a port, you could
have some JSON data coming through.
But in the core of your Elm application, you're going to get that in a structured way.
I think it's important to point out what this technique or this technique tries to prevent
you from doing, which pitfalls it prevents you from falling into.
And that is to check things several times.
Yes, exactly.
I think there was an academic article that she was referencing that coined the term
shotgun parsing.
Yeah, shotgun parsing.
So the idea is that you check for something once, like, is this list empty?
In one condition, you do something and the other condition you say, hey, we don't want
to have an empty list in this case.
And then in the condition where you said, where you notice that the list is not empty,
you do some other operations.
And then you have to check because of some other operation that the list is empty or
not again.
And if you did it with an if-then-else statement, then you lost that information and you have
to make the check again, meaning that you have to handle the case where the list is empty again.
You haven't preserved that information that you had.
You had the information and then you lost it.
So the type system can no longer help you with that.
So you have to handle the case again, even though you knew it already.
Or keep track of it in your head and say, oh, at this stage, I know that this data has
already checked for this.
Or I think I know.
I'll check later if I did check that again.
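Here's a minimal Elm sketch of the two approaches (the names are hypothetical): the validating version loses the emptiness proof, while the parsing version encodes it in a type.

```elm
-- Validating: the check happens, but the proof is lost, so later
-- code must re-check (or just trust that someone checked).
averageChecked : List Float -> Maybe Float
averageChecked numbers =
    if List.isEmpty numbers then
        Nothing

    else
        Just (List.sum numbers / toFloat (List.length numbers))


-- Parsing: represent "at least one element" in the type itself.
type alias NonEmpty a =
    ( a, List a )


parse : List a -> Maybe (NonEmpty a)
parse list =
    case list of
        [] ->
            Nothing

        first :: rest ->
            Just ( first, rest )


-- Downstream code takes a NonEmpty and never re-checks.
averageParsed : NonEmpty Float -> Float
averageParsed ( first, rest ) =
    List.sum (first :: rest) / toFloat (1 + List.length rest)
```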
See also JavaScript.
Come on, don't throw JavaScript under the bus.
I mean, we touched on this in our JSON decoding episode a little bit that this is
a very familiar experience.
And you can certainly write JavaScript in a way where you're handling the validation
step up front as sort of a single step rather than this sort of shotgun parsing approach.
But I do find that it's quite common that you see this in JavaScript where you're sort
of mixing processing data with validating the data.
Yeah, so can you explain what shotgun parsing is?
Because I don't think we did.
We explained it.
Yes, I think shotgun parsing is just defined as mixing processing data with validating data.
So you're intermingling those processes rather than sort of validating data upfront
and parsing it in a single process.
And then now that you have the data, process that data.
Does that seem like a fair definition?
Let's just reread the article real quick.
So here's a quote.
Shotgun parsing is a programming antipattern, whereby parsing and input-validating code
is mixed with and spread across processing code, throwing a cloud of checks at the input and
hoping, without any systematic justification, that one or another would catch all the "bad" cases.
Yeah, so for me, it means that you have if else statements all over the place.
And then you hopefully have enough to make sure that all the cases are handled.
And maybe there are maybe they're not.
I was recently working with some some Ruby code.
And there's like, you know, in the Ruby MVC controller, you can say `params` and then
get whatever data has been passed in in the request.
And it's quite common to see Ruby code where you just sort of grab the params, and maybe
you try to remember to check up front, maybe you add these before filters and that sort
of thing in your Rails controllers where you're asserting certain things about the data
before you get it.
But then you're just directly reading like `params[:username]` or something.
`params[:username]`, you mean something like that?
Well, like params square brackets, some value.
So you're grabbing a key out of the hash, which is just grabbing this raw data.
And it seems very prone to this sort of shotgun parsing anti pattern.
Another thing that I've seen, working with some Ruby code recently, that I found
to be a nice pattern: there was this sort of validator class where you could
assert things about the data, and then you pass that data along after you've made the assertions.
So you're never directly accessing the params in your controller, you're grabbing data after
you've made these validations.
So you can certainly use this pattern of thinking, parse don't validate, in non-typed languages.
But when you're working with a typed language, you can actually have your type signatures
keep you honest to make sure that your data types have the proofs in them.
So you could say this has to be a non empty list.
And now you can't just forget, like, oh, I need to be sure to check all these invariants
at the top; you've proven them by the time you call that code, because it requires a non-empty
list type.
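For example, a sketch assuming the `mgold/elm-nonempty-list` package's API:

```elm
import List.Nonempty exposing (Nonempty)


-- The signature demands a non-empty list, so callers must have
-- parsed their plain List (via List.Nonempty.fromList, which
-- returns a Maybe) before they can even call this.
firstChoice : Nonempty String -> String
firstChoice choices =
    List.Nonempty.head choices
```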
Let's talk a little bit about the term parse, because that's certainly something that stuck
out to me when I first read this article.
You know, it's a little confusing.
And she does kind of get into that.
And she says, let me try to convince you that this word parse is actually a good term for
this concept.
It might seem unrelated at first.
So she's sort of saying, what is a parser?
It's a function.
So here's a quote: a parser is just a function that consumes less-structured input and
produces more-structured output. Some values in the domain do not correspond to any value
in the range, so all parsers must have some notion of failure.
So I think that's an important point, right is that if you're able to fail, then we kind
of talked about this in our opaque types episode that you can have a function that conditionally
returns a type.
And the way you do that in Elm is you can return a Result or Maybe type or a custom type.
Yeah, right.
And that represents a failure.
So you have to handle the failure step.
But once you check for failure, so you say, you know, `case parseResult of`, with `Ok` and `Err` branches.
Well, in the `Ok` case, you know, you have that data, you know, you've validated whatever
you validated by the data that it's a non empty list that it's an authenticated user
that it's a valid username, whatever you've validated, you can conditionally return a
type and you know that the only way you can get that type is by running it through a function
that will conditionally give you that type if the validation passes.
So it's this technique of, instead of just doing a conditional and then losing
the proof that you've established in that conditional, you return a type if it's true, and else
you return Nothing or you return an error.
And now you're able to keep track of that proof that you've just done.
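A minimal sketch of such a conditionally returned type in Elm (the Username type and its rules here are hypothetical):

```elm
module Username exposing (Username, parse)

-- Opaque: the constructor is not exposed, so the only way to get
-- a Username is through parse.
type Username
    = Username String


parse : String -> Maybe Username
parse raw =
    let
        trimmed =
            String.trim raw
    in
    if String.isEmpty trimmed then
        Nothing

    else
        Just (Username trimmed)
```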
Yeah, the simplest example that I can find is when you do case of expression on a maybe.
So you do case, maybe something of, and then you have two cases, just something, and then
you have access to that something as if it is not a maybe, and then nothing where you
don't have it.
And then in that case where you have something, you just use it like a normal value.
And you have successfully branched off depending on that value, but you have remembered in
that branch that the value is there.
You don't start again with a maybe something.
Yes, exactly.
You've proven that and so now you can work with that type without having to check for
it again.
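In code, that simplest example might look like:

```elm
greet : Maybe String -> String
greet maybeName =
    case maybeName of
        Just name ->
            -- In this branch, name is a plain String; the branch
            -- itself carries the proof that the value is present.
            "Hello, " ++ name

        Nothing ->
            "Hello, stranger"
```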
And yeah, I think that's a great example.
And what often occurs, like to contrast that with the alternative to what you just described,
you just keep passing the maybe down, even though you've sort of unwrapped it in that
case statement and you've verified that it's present, if you expect it, sometimes you'll
see things like, you know, maybe with default empty string.
Now that may be what you want in some cases, and it's not inherently bad to use `Maybe.withDefault`.
But I think it's good to be aware that sometimes `Maybe.withDefault` is a smell that indicates
that you're basically validating rather than parsing.
Yeah, and in some other way you lose information of what was in the maybe beforehand.
You still have access to that maybe value, but you probably won't use it anymore.
Like let's say that you only want to run a certain bit of logic if you have a username.
If you're a logged in user and you have a username, you pass around this maybe user
and then you say username.
And maybe you pass in `Maybe.withDefault ""`.
Now you've got a string.
You've got a string.
What does the string represent?
Well, it might represent a username or it might be an empty string.
Why is the string empty?
Is the string empty?
Does it represent a pending username that hasn't been saved or does it represent a username
that came from a user or does it represent a guest user who's not logged in and therefore
you used `Maybe.withDefault ""`?
It could possibly represent all those things.
And well, that's, Alexis had the phrase in her article at one point, it's a bug waiting
to happen.
That phrase often pops into my mind actually as well.
And I would call that a bug waiting to happen.
It's a time bomb.
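A sketch of that smell:

```elm
-- Collapsing the Maybe throws away *why* the value was missing.
displayName : Maybe String -> String
displayName maybeUsername =
    Maybe.withDefault "" maybeUsername

-- Downstream, "" could now mean a guest, a pending unsaved
-- username, or a user who really entered an empty string.
```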
So what would you do instead?
Well, I mean, there are a lot of ways you can handle that, but you could have, for example,
a custom type that keeps track of that information.
And so you know if you have a username, if you have a logged in user, if you try to get
the logged in user and you get nothing, check that in one spot and pass that data through,
or maybe you wrap that in a custom type and you turn that into a custom type that's either
a guest or a logged in user.
But whatever it is, you want to preserve that information.
So that's like a common thing.
If you're sort of getting the lowest common denominator data type, you're getting some
like built in data type like a string and you're passing it around and it could represent
different things.
It's a bug waiting to happen and it's probably an opportunity to use this technique of parse
don't validate.
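For instance, a hypothetical custom type that preserves that information in one place:

```elm
type User
    = Guest
    | LoggedIn { username : String }


-- Parse once at the edge; afterwards the type carries the answer.
fromSession : Maybe String -> User
fromSession maybeUsername =
    case maybeUsername of
        Just username ->
            LoggedIn { username = username }

        Nothing ->
            Guest
```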
Yeah, usually when you simplify a complex data structure down to a single primitive
or a way simpler version of it, you're bound to lose data.
And if you do need that piece of data later on, then yeah, you're going to make some assumptions,
which might be correct now but might be incorrect later.
And what you would like is to have the compiler check things for you.
So if you make assumptions by yourself and do `Maybe.withDefault ""`, because "it will
always be true in this case, because I checked it before," then it might be true
now. But later on, because you move code around and the check might not happen
beforehand anymore, at some point things will change and you'll have a bug, basically.
One thing I like to think about is: imagine somebody is new to your code base.
If they come into the code base and they start using some code, how much do you have to tell them?
How many caveats are you going to explain to them?
Like, Oh, okay.
So like, if you're doing a code review on some code they just wrote, how afraid are you
going to be that that's going to get merged without you carefully checking that these
constraints are met?
Like, Oh, when you call this function, like if you call it with an empty string, then
then that's a problem.
So you have to make sure it's never an empty string.
Like see this code over here.
It checks if the string is empty.
Well, that burden shouldn't be on the newcomer to the code base.
That burden should be on the code that protects those conditions.
And it, you know, it needs to validate its input and then preserve that in type information.
And so misuse then becomes impossible, because maybe you have, like, a non-empty string type
that you take as an input, or whatever data type that you validate into.
So one thing that I understand from this technique is that pretty much every time you want to
keep the information from the checks or the parsing that you have beforehand, you kind
of need to have a new or a specific custom type to your guarantees, to your assumptions.
So in the case of, um, I want a non empty list of something.
So yeah, you check beforehand whether your list is empty or not, you're doing a parsing.
So you get a non empty list.
So in the case of a non empty list that is quite generic, but in some cases you probably
need a new custom type.
You will have a new custom type for every step of your function calls.
And I think that's probably okay.
In practice, I don't think you will have so many new types.
That's not something that I noticed in my code base at least.
Yeah, it's not.
I mean, it's so lightweight to create types in Elm and so that's definitely not a problem.
I mean, another thing that Alexis talks about in the article is that you can use built in
types that have the characteristics that you need.
So for example, if you're saying that I can't have duplicate values in this list, then you
parse it into a set and the set data structure is going to take care of that for you.
And when you look at a set, you know that there are no duplicate keys.
You don't have to think about that.
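A small sketch of parsing into a Set, plus a stricter variant for when duplicates should be rejected rather than silently dropped:

```elm
import Set exposing (Set)


-- "No duplicates" becomes true by construction.
uniqueTags : List String -> Set String
uniqueTags tags =
    Set.fromList tags


-- If duplicates should be an error instead of silently collapsed:
uniqueOrFail : List String -> Maybe (Set String)
uniqueOrFail tags =
    let
        set =
            Set.fromList tags
    in
    if Set.size set == List.length tags then
        Just set

    else
        Nothing
```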
She also talks about this technique of weakening return types and strengthening input types.
Yeah, that's nice.
That's a really nice sort of insight that she's captured there.
Those are sort of two tools at your disposal.
So we kind of talked about weakening return types with conditionally returning custom types.
So if you want to check if you have a valid username, you can conditionally return a custom
type that represents a valid username.
And if it's not valid, then you return nothing or error or some data type that doesn't give
you the username type that represents a validated username.
Yeah, so it's weakened compared to always returning a valid username.
Exactly. So instead of returning a username, you return a Maybe, or a type that's either guest or user.
Exactly, exactly. And JSON decoding is the same pattern. I mean, it's parsing. It's
not parsing in the sort of computer science sense of building up an AST and parsing the
tokens; that's already been done with JSON decoders, as we discussed in our JSON episode.
But it is parsing in Alexis King's sense of the term, where you have a way for it to fail.
And if it hasn't failed, then you've guaranteed certain qualities and you capture that in
a data type.
So yeah, that's weakening the return type.
That's a very useful technique.
And then on the other side, they're strengthening the input type and those those two things
go hand in hand.
And strengthening the input type.
She gives the example of requiring a non empty list as an argument.
And I find that sort of thing, strengthening the input type, quite nice, because you
want to write really confident code that doesn't think about validating all these conditions.
You want to demand that your callers give you that data, and then be confident about
just using it, right?
And when you do that, I mean, I think that that sort of gets at that shotgun parsing
concept that you're not sort of blending these responsibilities of processing the data and
validating the data, right?
Like, you should separate those responsibilities.
So if a function's job is to process data, state what data you need upfront with the
types, the types should express the guarantees that you need.
So that's kind of this concept of design by contract, where you're expressing
the contract, and you can work with those assumptions within that code and just confidently
work without checking those assumptions.
In the case of a typed language, and especially Elm, you can encode those guarantees and
that contract into types.
So that's strengthening the input type, be very clear about what you require as a as
a constraint.
And then just confidently work with those assumptions, rather than mixing the checking
and the processing.
Yeah, I find it kind of odd, but it actually feels very natural to people who write Elm
usually because you do this with JSON decoding all the time.
We took it as an example already.
But let's imagine if we didn't do it, what you would probably do is you get a JSON value
from an HTTP request somewhere or from flags.
And then every time you need to access one field of it, you decode that field at the
point of use, but never all at once.
And that just sounds unimaginable.
Which you could do if you wanted to.
Because you can do, like, a `Json.Decode.value`.
And so you could defer decoding and just get a raw `Json.Decode.Value` for a certain
subtree of your JSON, and then parse it again at a later step.
And that would be that sort of like shotgun parsing approach.
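A sketch of the contrast, assuming elm/json and a made-up Config shape:

```elm
import Json.Decode as Decode exposing (Decoder)


type alias Config =
    { apiUrl : String
    , retries : Int
    }


-- Decode the whole structure once, at the edge:
configDecoder : Decoder Config
configDecoder =
    Decode.map2 Config
        (Decode.field "apiUrl" Decode.string)
        (Decode.field "retries" Decode.int)


-- The deferred alternative: carry a raw Decode.Value around and
-- decode fields at each use site, the shotgun-parsing shape.
retriesLater : Decode.Value -> Result Decode.Error Int
retriesLater raw =
    Decode.decodeValue (Decode.field "retries" Decode.int) raw
```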
But we don't. Naturally, people tend not to do that.
I think maybe it's like a cultural thing.
Maybe it's like the educational material and the examples.
I think it's the education.
But also, people don't like writing JSON decoders.
So if you only have to do it once, that's fair.
So a lot of people say, hey, JSON decoding in Elm is a pain.
But you only do it once.
And because it's a pain, yeah, that's why you only do it once.
And that's why you get those guarantees right on the edges of your application.
And you don't have to do shotgun parsing or shotgun thing.
Shotgun parsing, yeah.
Yeah, that's a great point.
So it's something that is actually quite natural.
Actually another reason why I think it's very natural when you write Elm is because you
have to handle the error case.
Unlike even Haskell.
How so?
In Haskell, it's a warning if you have a non-exhaustive case statement, right?
But it's not an error.
And Alexis talked about that even in the article.
One of her examples is doing head in Haskell and how head in Haskell is somewhat unsafe.
By default, it actually returns an element rather than a Maybe element, so it can crash at runtime on an empty list.
Oh, yeah, the default, the standard library's head.
So Elm is unique in how built into the language that approach is.
Let's think of a few other examples because it's actually, I think it's quite widespread.
And I think it's nice to sort of connect these examples that a lot of people will be familiar
with this sort of concept of parse don't validate.
So one that comes to mind for me is remote data.
And I like the way you were framing this.
What would it look like if we didn't use this parse don't validate approach with JSON decoding
in Elm?
Well, what would it look like if we didn't use this parse don't validate approach with
remote data?
And the answer is, well, you could have a loading flag and you could have some data
that you check if it's loading.
RemoteData gives you this nice type that says you're going to need to destructure this
value to see which state it is in, instead of just checking a Boolean to see if it's loading
or if there was an error, for example.
And I think another thing that the remote data example illustrates well is this concept
of dealing with uncertainty at the periphery of your application.
So with remote data, and I think maybe some people even use remote data and aren't familiar
with this sort of technique.
So you can just pass remote data down and check it.
But what works the nicest is if you say, I have these five different RemoteData values
that I depend on, then you can say `RemoteData.map5`.
And if any of them have an error, then you want to show an error.
And if some of them are loading, you want to show a loading screen instead of showing
five different loading spinners like you sometimes see online.
Which is reasonable also.
That's right.
It may be what you want in some cases.
That's absolutely right.
But often you want to take a set of things and maybe you even depend on these different
pieces of data.
So you can't do anything without this sort of aggregate piece of data, even though it
comes from five different HTTP requests.
And you just do `RemoteData.map5`.
And now you've got these five different remote data types that all come together at once
rather than having to check all these different conditions.
And so you handle that uncertainty at the very top level.
And then you just have code that's blissfully ignorant of the possible failures at the periphery.
And it can just confidently use all of that data.
It's got all of it.
It doesn't need to check.
Oh, well, if I have this and I don't have this, then handle it this way.
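A generic sketch of that aggregation, assuming the krisajenkins/remotedata package:

```elm
import RemoteData exposing (RemoteData)


-- Combine three requests: the result is Success only when all
-- three are, and any Failure or Loading propagates automatically.
combine :
    RemoteData err a
    -> RemoteData err b
    -> RemoteData err c
    -> RemoteData err ( a, b, c )
combine a b c =
    RemoteData.map3 (\x y z -> ( x, y, z )) a b c
```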
Yeah, this is usually where I use `Maybe.map2` and all the other mapN functions.
Same pattern.
And it's something that you see quite often.
I saw one example in some code I was working with recently where I did a refactoring and
it turned out quite nicely.
What I was seeing was exactly this validation smell rather than parsing: we were checking
for a certain condition repeatedly and weren't able to use a data type to say, I need this guarantee.
I wanted to strengthen the guarantees of the input I was getting.
So the example was there was this data type representing a document type and one of the
document types was unknown.
And that was just in this big custom type, document type is this type or that type or
that type or unknown.
And so you get code like `isUnknown documentType`, or `documentType == Unknown`.
This is annoying to check for all the other ones.
Because there's special handling if it's unknown.
So I think this is like maybe sort of its own sub pattern that sometimes you have data
that's like mixed in and you want to be able to, again, you know, like we've been discussing,
you want to keep track of what you've proven.
So it's like if you've proven that you have a document type that's not unknown, now that
unknown type is just one of the variants of this big document type.
So to prove that it's not unknown, like there's no data type that can tell you that constraint.
So one thing you could do is, you know, have, like, a known document type wrapper.
But then, if that wrapper contains the full document type, it could still include Unknown.
So now you're possibly representing an impossible state.
So what I did instead is I created a new top level document type that was just document
type and document type was either unknown or a known document type.
And I created a separate type called known document type.
And I just took all of the variants except for unknown and put it into there.
And so now you can assert in certain functions that you have a known document type, in which
case you don't need to check if document type is unknown, then handle it this special way.
If you have known document type as the type of your function, you don't have to think
about that case.
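A sketch of that refactoring, with hypothetical variant names:

```elm
type KnownDocumentType
    = Invoice
    | Receipt
    | Contract


type DocumentType
    = Unknown
    | Known KnownDocumentType


-- Functions that need the guarantee ask for it in the signature,
-- so the Unknown case simply cannot reach them.
fileExtension : KnownDocumentType -> String
fileExtension docType =
    case docType of
        Invoice ->
            ".inv"

        Receipt ->
            ".rcpt"

        Contract ->
            ".pdf"
```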
Yeah, you've checked it once and you don't care about it afterwards.
Yeah, exactly.
And I think that this pattern comes up.
Maybe you have like, like I find this is quite common that you'll have different custom type
variants that have bits of data, you know, and if you know that you have these bits of
data, like maybe you have like a guest user has certain bits of data and a logged in user
has certain bits of data and admin user has certain bits of data.
But if you want to track that, if you've checked and the only users that are allowed to see
this page are admin users, if you have like a user custom type with those three different
variants, admin, logged in and guest, you can't use those variants to say the view function
for this page requires an admin user because it's a variant.
It's not a type.
So I think there's sort of like a, I don't know, maybe this is like an Elm refactoring
technique to sort of take the bits of data of like admin user and then just extract out,
you know, so if you have like admin user and it has either a record with bits of data or
positional values in the constructor, I think that's like a nice refactoring to like pull
out those bits of data, whether it's a record or positional constructor values, and then
create a new custom type called admin user.
And now you have `type User = AdminUser AdminUser | ...`.
It's kind of weird.
But yeah, exactly.
Maybe you have some sort of clever naming convention that makes it less awkward there,
but that's a very common smell that I find that you lose track of which variant you have
and you want to strengthen your input assumptions and you can't do that because it's locked
in this variant, which you can't check for an Elm.
You don't have to extract the admin data at the top level where you defined the user.
You can just create a new admin user data or whatever better name you find locally at
the location where you need it.
So you don't need to refactor your whole application just because of this one case.
That's a good point.
But if it makes sense in other places, then yeah, by all means, please do.
And I think that's a very good point.
And I think maybe it brings up another question, which is how do you control the creation and
the sort of destructuring and reading of that data?
So some data you want to protect, we discussed this in our opaque types episode.
So it might be like a social security number that you want to be careful about how it's
presented and make sure it's encrypted before it's sent over HTTP or whatever.
But you also want to protect sort of low level data, right?
Well, I guess another way to handle protecting low level data is to not make it low level,
to wrap it in a type that represents the assumption.
So I guess it depends on where you want that abstraction to be.
For example, if you have an ID that's just an int or maybe you have like an admin user
ID type or a user ID type, you can have more low level data if it's encapsulated in its
own module that's protected, that you can't directly read the data from.
You can only get it in the functions that are exposed by the admin user module.
So there's the question of do you have access to internal data, which may be too low level
to use?
And then do you have access to create that data?
So if you can just create an admin user, what's to stop you from taking the regular user data
you have and you're like, I just want to see if this admin user page works.
So I'm going to create an admin user from the current logged in user because I don't
want to handle those other cases now.
And so you take those bits of data and then you pass in an empty string for this one thing
you need, and, you know...
So you still want to make impossible states impossible and you still want to use this
sort of gatekeeper approach of protecting how you create and consume data.
So you would basically have a parser function to create your admin.
In a way.
That's right.
So how would you go about this?
You have a condition that you want to check.
So you want to branch off based on some condition, but there's no way to extract information
that is of a different type than what you have in the other condition.
You would have a name, for instance, which in one condition is a string containing the
first name and the last name, and in the other condition you would just have the first name.
So in both cases, you would have a string formatted just like you want to, but they're
the same types.
How would you go about that?
My guiding principle for cases like that is always wrap early, unwrap late.
You know, we talked about that, I think, in the opaque types episode.
And I'm always thinking like you want to have, I think Alexis talks about this in her article
as well.
You want to have the most pure representation of the data.
She talks about this approach of like letting the data types drive the implementation, not
having the implementation drive the data types that you define.
So think about the data types up front.
That's sort of like a data modeling technique and mindset is that you think about your data
types and think about your ideal data types, continually sort of refine and refactor your
data types to make them express your constraints better.
So at the core of your application, you want to have nice data that clearly represents
the semantics that clearly represents the constraints and guarantees, the things that
you've proven.
And as you get more and more into the core of your application, you should have more
and more refined data types and assumptions.
You know, so you should have fewer Maybes, you should have fewer built-in
types, and you should have more custom types and more things that illustrate these guarantees.
Um, more non empty lists and things like that.
So that's my sort of guiding principle is, you know, wrap early.
So I want to avoid this sort of shotgun parsing approach where I'm mixing validation logic
with processing logic.
I want to validate and parse into nice data types that prove my assumptions at the periphery
as soon as I possibly can.
Ideally, I don't even want it to exist as a string, because my decoder immediately turns
it into this value or fails if it can't.
If I expect the username to be a non empty string, or to be of a certain format, or to
not include certain characters, then I want my decoder to actually do that.
So I never have it as a string, it will just fail if it doesn't meet those conditions,
or I'll have it parsed into my valid username type.
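A sketch of such a validating decoder, using Json.Decode.andThen (the blank-username rule is just an example):

```elm
import Json.Decode as Decode exposing (Decoder)


type Username
    = Username String


usernameDecoder : Decoder Username
usernameDecoder =
    Decode.string
        |> Decode.andThen
            (\raw ->
                if String.isEmpty (String.trim raw) then
                    Decode.fail "username must not be blank"

                else
                    Decode.succeed (Username raw)
            )
```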
And then sometimes it's tempting, we kind of talked about this example before, you could
do `Maybe.withDefault ""` and then pass around this empty string that might represent all
these different things.
That's a violation of this concept of wrap early unwrap late.
So you wrap early, you have your ideal form of the data, the nicest data type.
And then at the last possible second, you're forced to have a string, or, you know,
a `Json.Encode.Value` or whatever. You've lost information and semantics, but you have to,
because that's what that API requires.
So at the last possible second, you unwrap that data into these lossier data formats.
And that all happens at the periphery of the app.
Yeah, you lose information as soon as you don't need it anymore.
Yes, exactly. So all along the way, there's not an opportunity for mistakenly using that
data in a way where the guarantees aren't enforced, because your data types help you enforce them.
You don't lose semantic information, because the types help enforce the semantics.
So yeah, I think wrap early unwrap late would be my sort of core guiding principle there.
It's yeah, definitely.
Yeah, it takes practice to find how to do that.
But I think it's always a good idea.
What I'm thinking right now is that if you have the same data in both conditions, the
same primitives and the same underlying data, one way that you could keep
using this technique is to use phantom types.
So phantom types are a way to differentiate your types, even though the data they contain
is exactly the same.
Yes, exactly.
So you would have a parsing function that returns a data type with a type variable,
which is a phantom type.
You could then call a function that requires that the phantom type is in a certain state,
is of a certain type, but you would still have the same data; you wouldn't need to create
a new custom type.
So both techniques are nice, and the situation will tell which one is best.
Yes, and that's actually exactly what Richard Feldman's elm-validate package does.
It runs it through a series of validations, and then it has a sort of phantom type that
it uses to keep track of the fact that you validated those conditions.
So that's definitely a valid approach, if you will.
We'll have to check that.
You'll have to parse that.
That sounds very weird.
So you just opened the bank accounts.
We're going to have to parse your information.
Sorry, what?
Don't validate my information, parse it.
This is unprofessional.
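A rough sketch of that phantom-type idea — the names here are invented, and this isn't elm-validate's actual API, just the general shape:

```elm
-- The phantom parameter `state` tracks what has been checked;
-- the underlying data is a String either way.


type Email state
    = Email String


type Unverified
    = Unverified


type Verified
    = Verified


fromString : String -> Email Unverified
fromString raw =
    Email raw


-- The only way to obtain an `Email Verified` is to pass the check.
verify : Email Unverified -> Maybe (Email Verified)
verify (Email raw) =
    if String.contains "@" raw then
        Just (Email raw)

    else
        Nothing


-- Functions can now demand proof in their type signature.
sendTo : Email Verified -> String
sendTo (Email raw) =
    "sending to " ++ raw
```

The nice part is that `verify` doesn't copy the data into a new custom type; the same `Email` wrapper is reused, and only the phantom parameter changes.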
Yeah, I think phantom types definitely fit in here.
It is interesting because you can unwrap the phantom types in a more permissive way, depending
on how you structure it.
So for example, let's say you have an ID type, and it has a phantom type parameter for
different kinds of IDs.
You can just unwrap the ID; you don't have to use specific ways to unwrap it.
It all requires human judgment to think about: what guarantees do I want to provide, and
how do these different Elm techniques that I'm using help me do that?
You have phantom types that you use to construct things.
Another way of saying it: there are types for which you can only build a value with a
certain phantom type using certain functions.
You can make sure that there's only one way to build a value with this exact phantom
type, and that is by using this function.
This function is your parser function, basically.
Then you can probably destructure it using functions.
This is only true if you're using an opaque type.
Like for the phantom type that you use, you can expose the type,
but not its constructor.
So the only thing that can build values with that phantom type attached, UserId for example,
is that one module, because the constructor is hidden in that module.
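A sketch of what that module boundary might look like — `Id`, `UserId`, and `userId` are hypothetical names for illustration:

```elm
module Id exposing (Id, UserId, userId, toString)

-- `Id` is exposed without `(..)`, so its constructor stays
-- private: values can only be created through this module.


type Id kind
    = Id String


type UserId
    = UserId


-- The only way to obtain an `Id UserId` is this parsing function.
userId : String -> Maybe (Id UserId)
userId raw =
    if String.isEmpty raw then
        Nothing

    else
        Just (Id raw)


-- A permissive unwrap: any kind of Id can be read back out.
toString : Id kind -> String
toString (Id raw) =
    raw
```

Note that `toString` here works for any `kind` — that's the permissive unwrapping mentioned above, which may or may not be what you want.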
So basically the idea is to get you across that border.
To enter this function, you need an ID with this type for a phantom type.
And once that is done, you don't care about the underlying data or the phantom type value
anymore; you just want to access the value of the ID.
I don't remember if I brought up this analogy in our opaque types episode or not.
Sometimes I think about those wristbands that they give you.
If you go to a beer garden and they give you a wristband that validates that you're of
drinking age, or you go to a concert and they give you a wristband that says that you paid
for your ticket.
Once you're in the concert venue, people don't have to keep checking your ticket.
Or once you're at the festival, you don't have to show your ID to buy a drink because
that's already been parsed, if you will.
They don't have to validate it at every stand.
There's absolutely no way that people will exchange their wristbands though.
So it's all totally safe.
That's right.
You have to think about that.
You have to be thinking like a security person.
In this case, I think you want to unwrap a bit later.
I think the constructors are too exposed in this case.
I think you're right.
I think you're right about that.
Another thing about phantom types versus custom types is you can potentially grab the bits
of raw data; it doesn't force you to handle them.
You can just unwrap them.
Again, it's just a matter of using your own judgment to think about what are the possible
ways that these assumptions that I'm trying to provide can be bypassed.
That's the question.
Basically, if you use this technique well, you notice it because one, you don't have
to fake conditions or fake branches.
So basically, you made impossible code paths impossible.
Things that you can't test, those are impossible.
And two, you have not faked any data anywhere in the code.
By passing in, like, a Maybe.withDefault value.
Yes, that's a great point.
That's a major red flag.
If you are just in an impossible case where it's like, this should never happen, instead
of throwing an exception in a language other than Elm, you do a similar thing where you
just say, I'll just return this value because this will never happen.
So I'll just return the string.
This should never happen.
Or worse, you call the function recursively so that the Elm compiler is happy.
Those are code smells.
Yeah, that's the code smell.
If you arrive at this should never happen, it should be in your parsing code at the periphery,
the code that's doing the initial logic to make sure the input is good.
And then you just immediately exit.
Otherwise, you have the good data and you run your core logic.
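One way to eliminate that "this should never happen" branch is to parse into a type that can't represent the bad case at all. A hand-rolled sketch of a non-empty list (community packages offer the same idea):

```elm
type NonEmpty a
    = NonEmpty a (List a)


-- Parsing happens once, at the periphery: the only failure
-- point is here, where we can actually report it.
fromList : List a -> Maybe (NonEmpty a)
fromList list =
    case list of
        [] ->
            Nothing

        first :: rest ->
            Just (NonEmpty first rest)


-- Core logic no longer needs a "this should never happen"
-- branch, because the empty case is unrepresentable.
firstName : NonEmpty String -> String
firstName (NonEmpty first _) =
    first
```

Compare that with a `List String -> String` version, which would be forced to invent a fake return value (like `""`) for the empty list.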
If you do get into that will not happen case, at the very least, leave a comment.
At the very, very, very least.
Because otherwise, people will not be happy when they discover that this is weird.
Or use a coworker's computer to do the commit so that the git blame shows somebody else's name.
Let's talk a little bit about how you would, you know, go through your code base
and look for opportunities to improve your code using this parse, don't validate technique.
What should you be on the lookout for?
Look for those comments.
Look for this should never happen code.
What else should you be on the lookout for?
Then you notice that a certain condition is checked several times.
So if this is empty, and then in that branch, you see "if this is empty" again.
That's not often the case within a single function; usually you will have the checks
in separate functions, which makes it a bit harder to tell.
I sometimes see functions like isThing. Like isJust? Yeah, like isJust.
So if you return a Boolean, that is usually not great; people sometimes refer to that
as Boolean blindness.
But sometimes you also see that when you have a function that returns a tuple, the
first element being a Boolean, and the second element being the value that you are using.
So that is kind of parse, don't validate, but you're still returning a Boolean.
You're not keeping track of it in a very safe way.
Yeah, exactly.
So that is the one that I've seen most and which is the most actionable for me.
Like that is an instant red flag.
And that is not very hard to tell.
But it's quite rare.
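The tuple-plus-Boolean shape versus the parsed shape might look like this (`parseAge` is a made-up example):

```elm
-- Boolean blindness: nothing stops a caller from ignoring
-- the Bool and using the fake 0 anyway.
parseAge : String -> ( Bool, Int )
parseAge raw =
    case String.toInt raw of
        Just n ->
            ( True, n )

        Nothing ->
            ( False, 0 )


-- Parsing instead: the Int only exists when the input was
-- valid, so there is no fake value to misuse.
parseAgeSafely : String -> Maybe Int
parseAgeSafely raw =
    String.toInt raw
```

With the `Maybe` version, the compiler forces every caller to handle the failure case before the `Int` is reachable.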
And then also this other case you were mentioning, where you're essentially, I think of
it as discarding information you'd get from a case expression.
So if you're checking isJust, is that always a bad thing?
No, maybe there are some cases where it's okay, but be on the lookout for it.
Like, I would look very carefully at code that's checking isJust and then doing something.
And if it does isJust followed by Maybe.withDefault, yeah, or using that value anyway.
Exactly, in any way.
Yeah, exactly.
So if it destructures a value, and then uses that value, that's a clear smell.
So one thing that you will see a lot when you try to apply this technique, I think,
is you have a bunch of functions which work with a Maybe String, for instance.
And in one case, you know that it is a String. What do you think: should you then just
wrap it in a Just, or should you duplicate the function into one that takes the String
but doesn't handle the Nothing case?
You're talking about, like, destructuring and then piecing it back together the way it was?
In a way, but not piecing it back into the same value.
Say you have a username.
So you have a function which takes that Maybe username and displays it.
And in some parts of the code, you know that the username is a Just, and we still want
to display it from the username.
So you call that.
So you're putting a Just in front of it because you're calling a function that takes a Maybe
String and shows the username?
Yeah, right.
That's a good question.
I definitely have seen code like that.
Oh, yeah, me too.
It does generally feel like a smell.
I'm not going to go so far as to say that it's always a bad thing.
But I would definitely look very suspiciously at that code.
Yeah, yeah, I think I think you're right.
I mean, Alexis kind of says this in the article, I think: you should
always be progressing towards more structured data.
And you should always be proving more information.
But adding a Just is sort of reducing the amount of information you have.
So that would be going backwards. You're adding chaos.
Yeah, I don't know.
You always want to...
Yeah, unlike the universe, your code should always be decreasing the entropy as it goes deeper.
Yeah, I think so.
Maybe in that case, you could look at all the call sites and check: oh, maybe all the
calls to this function pass a Just, so you can simplify it.
Yeah, maybe not.
Maybe not.
Good point.
But in the case where you're calling the function by wrapping the value in a Just, maybe
the Just branch has logic that could be extracted out of that Just clause.
Yeah, so it can be shared.
So you call it in the Just case.
But maybe you can directly call that function and expose it as well.
Yeah, that sounds good.
Sometimes it's a nice pattern.
Like sometimes I see sort of core business logic code.
I think it's sort of an example of this shotgun parsing idea, that you are checking for
Maybe and doing Maybe.withDefault in your core logic.
But that should be the responsibility of the calling code.
So I think another thing is, it's probably a smell if you have a lot of helper functions
that deal with Maybes. Well, if you think about the responsibilities, right: if you have
a set of Maybe helper functions, then of course they're going to accept Maybes, because
that's their responsibility.
But if it's, like, presenting a username, showing a guest username, and you have showUsername,
you know, and it takes a Maybe, it's actually probably better to not take a Maybe but to
take a username, and it presents it however it does.
And then you do Maybe.withDefault with guest or whatever you're going to do.
But the calling code can take care of that uncertainty.
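A small sketch of that split — `viewUsername` and `greeting` are hypothetical names:

```elm
-- Smell: the presenter absorbs the uncertainty itself.
viewUsernameLoose : Maybe String -> String
viewUsernameLoose maybeName =
    Maybe.withDefault "guest" maybeName


-- Better: the helper takes exactly what it needs...
viewUsername : String -> String
viewUsername name =
    "@" ++ name


-- ...and the caller decides what Nothing means.
greeting : Maybe String -> String
greeting maybeName =
    maybeName
        |> Maybe.map viewUsername
        |> Maybe.withDefault "guest"
```

The core helper stays total and certain; only the boundary code deals with the `Maybe`.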
Otherwise you get into the... I see that happen a lot, where you're validating the data
where you're processing it, right?
So this principle is really helpful: you should be just confidently asserting that
you have this data.
Don't hesitate and second guess yourself and say, if I have this maybe value, and then
I'll handle this nothing case this way, leave that to the caller.
In a lot of cases, that's better.
So another smell, I think, that you can pay attention to is... a lot of these things
come down to what you do when the compiler is telling you something is wrong, when the
compiler seems to be making you do extra work. You can go two different directions. You can
go the direction of: the compiler is telling me I need to handle this case,
so I'll just make it happy and use withDefault, or pass in this hard-coded value here in this
branch that will never happen, right?
That's telling you a smell.
So it's really about what you do when the compiler is telling you you're not handling a case.
That's one thing to pay attention to.
Another thing to pay attention to is when you're writing tests, are you testing things
that you shouldn't even be able to test?
Because if you wrote your types in the right way, where you guaranteed those assumptions
and you made impossible states impossible, you wouldn't even be able to write the test
because it wouldn't compile.
So I'd say another code smell is being able to write compiling tests for states that
should be impossible.
Or in other words, trying to get full coverage of a function where you cannot have full
coverage, because some branches are impossible states.
Yeah, exactly.
You know, another principle I think about a lot, you know, I mentioned this earlier,
if somebody new to your code base comes in, if they use something wrong, I would say the
onus is on the code.
It's something that should be fixed in the code, not, hey, we need to teach you how to
use this properly.
That's a smell.
Yeah, you shouldn't have to train your teammates beyond the fact that they've learned Elm.
So the training should be coming from the compiler telling them you can't do this.
And then they say, hey, why can't I do this?
Maybe you have to explain why the compiler isn't allowing it, but the compiler should
disallow it.
So I feel very uncomfortable if I have to keep track of constraints in my head in the
way I'm using the code.
As much as possible... and, you know, you're not going to have it on day one, on the
first iteration, built to encode all the assumptions possible.
And you'd probably be doing something wrong in your process if you were.
These things emerge over time.
But as you discover ways that it's used, bugs that are happening, you should be making impossible
states more impossible as you discover them happening.
That's part of the process of working with an Elm code base.
Oh, yeah.
So this is a very simple and, in a way, as I said before, quite natural technique for Elm
developers, and less so for people coming from JavaScript or Ruby.
I think we got it pretty well covered.
I hope this was useful to you, listener.
And if it's painful, you might want to try parsing, not validating.
OK, well Dillon, see you next time.
Been a pleasure as always.
See you next time, Jeroen.