Primitive Obsession

We discuss the Primitive Obsession code smell, and how introducing types with semantic meaning can improve your Elm codebase.

Published February 28, 2022

Episode #51

Solving the Boolean Identity Crisis (Elm Conf talk by Jeremy Fairbank)
If It Compiles, It Works episode
Opaque Types episode
Life of a File talk by Evan Czaplicki
Roc language
ianmackenzie/elm-units package
Parse, Don't Validate episode
Using Semantic Types to Squash Bugs - Dillon's talk where he walks through the mechanical steps to safely introduce a wrapper type
Dillon's Entry Gatekeepers blog post

Transcript

What are we talking about today?

Today, we're talking about primitive obsession.

Ah, a code smell.

This is a new type of topic for us, so...

Ah, that is true, yeah.

I mean, we refer to code smells, but we've never done an episode dedicated to a code smell, so let's do some definitions.

First of all, maybe we should define what a code smell is.

Yeah, we should.

And maybe we can rename the episode to Primitive Obsession Avoidance.

That way, we don't talk about a code smell, we talk about a technique to improve the situation.

Don't tell people what not to do, tell them what to do.

Yes, I like that.

That's kind of not what I do with Elm Review, but...

Well, I guess I do both.

So, yeah, primitive obsession.

I quite like the definition that Jeremy Fairbank gave in his talk, Solving the Boolean Identity Crisis, which was: primitive obsession is the act of using primitive data types, like Booleans, strings, or integers, to represent domain concepts.

I like that too.

Yeah, it fits well.

I love the word domain.

To me, that is one of the most valuable concepts for writing better code: being more domain-oriented, imbuing our code with domain semantics, and keeping that in mind.

Yeah, in the same sense that domain driven design does it, right?

And yeah, just being able to read through our code in a high level way.

In a way, declarative programming lends itself to this as opposed to imperative programming, because it's declaratively stating what it's doing rather than how it's doing it.

So in a way, that lends itself more to expressing a domain.

And in a way, moving away from primitive values to more domain-centric types feels more declarative than imperative.

It feels more what rather than how.

So yeah, what are examples of primitive obsession?

And maybe before we get to that, we should just quickly define a code smell, in case people aren't familiar.

It's essentially the concept that you could have something that smells off, but maybe it's just a nice blue cheese and it smells a little funny, but it's okay.

So that's the analogy: a smell is just something to look into.

It doesn't necessarily mean it's bad, but it's something to think about changing.

I live in France and I just cannot accept the blue cheese reference.

It is just not acceptable.

Blue cheese is smelly and it probably tastes bad.

I haven't tasted it.

I won't taste it.

Well, that sounds like a Dutchman speaking, Dutchman in French clothing.

That sounds like me, maybe.

The Dutch actually like the French cheese.

So it sounds like you're having an identity crisis here, a Boolean identity crisis.

So we were saying examples of primitive obsession.

So primitive obsession.

I think a classic example would be if you've got a number being passed around that perhaps represents money, a monetary value.

And I think we can all relate to that being a little bit scary because, for example, you lose the concept of the semantics of that value.

There are a few things you lose: you don't know where it came from.

You don't know how it can be used.

You don't know what it represents.

You lose all these pieces because it's just a number.

And so does it represent cents?

Does it represent a balance?

Does it represent a difference in balances?

Can it be negative?

Do you even know whether it represents money?

Do you know if it represents money?

Maybe it's an account number or maybe it's a different denomination.

Maybe it's a different currency.

So that feels very scary to me to just have a plain old number being passed through and

then to use that to update someone's bank balance or something.

That feels very scary.

So you could say that way then.

So, yeah, this episode is about scaring people into writing better code.

So hopefully we've succeeded.

So your way of scaring people is to present them with something that smells bad?

Well, it sounded like it scared you.

Get a knife, Dillon.

Yeah, so I think that's to me, that's a classic example of this code smell of primitive obsession.

And so like, let's talk about what that fix might look like.

Yeah, I think there are other problems that we can highlight, even just when you're reading the code.

So as you said, you don't know what something might represent.

And that's even more highlighted when you have multiple of those things.

So if you have a function that takes two integers, usually in Elm, if you read the type annotation of a function, you know what it does.

But if you have a function that takes an int, then another int, and returns something else, then you don't know what those two ints can be.

If you have a money and then an int, then it gets a little bit clearer.

Like, okay, maybe this is adding to an amount of money, maybe this is multiplying an amount of money.

But if you have two ints, then it's hard to tell.

And also, you have the issue that this can be a potential bug because you can easily swap the two values.

If you thought that money was the first argument, but it's actually the second one, then the compiler won't help you because the types match, but you will do the wrong thing.
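The swapped-argument problem described here can be sketched in a few lines. This is a minimal illustration, not code from the episode: the module name `Bill` and the functions `splitEvenlyUnsafe` and `splitEvenly` are hypothetical.

```elm
module Bill exposing (Money(..), splitEvenly, splitEvenlyUnsafe)

-- Hypothetical example; none of these names come from the episode.


-- Both arguments are Int, so the annotation does not say which is which.
-- `splitEvenlyUnsafe 4 1000` compiles even if the caller meant
-- "1000 cents split between 4 people".
splitEvenlyUnsafe : Int -> Int -> Int
splitEvenlyUnsafe totalInCents people =
    totalInCents // people


type Money
    = Money Int


-- With a wrapper type the annotation documents the roles, and
-- `splitEvenly 4 (Money 1000)` is rejected by the compiler.
splitEvenly : Money -> Int -> Money
splitEvenly (Money totalInCents) people =
    Money (totalInCents // people)
```

Integer division is used on purpose: unlike multiplication, swapping the arguments silently changes the result.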

And I actually realized, you know, we did our If It Compiles, It Works episode and we were discussing, what is it that gives us that feeling that when our Elm code compiles, it works?

And I realized that this was a major thing that we omitted in that episode. This is one of the things that gives me the most confidence: that feeling that if you gather up the correct data types, you can be very confident that you're piecing them together in a valid, meaningful way, which you sort of lose if you're just passing around a bunch of strings, numbers, bools.

You can write Elm programs with only Json.Encode.Value values.

And that's called JavaScript.

We need to do a live stream sometime where we just try building an app that way.

Is it like a masochistic live stream that you want?

We'll do it on April Fools.

How about that?

Not looking forward to it.

Maybe let's go into what you can do to avoid those issues that we mentioned.

So let's take the money value.

I think people can sort of imagine what we would suggest to avoid this.

You wrap it in a type and even better, wrap that type in a module, make it an opaque type.

You know, in the case of money, maybe you want to have a phantom type for currency.

We've talked about phantom types.

We've talked about opaque types.

You've heard a lot of our thoughts on that, although I feel that it is an underappreciated topic, and I think we will continue talking about opaque types for a long time to come because I think they're a very important concept in Elm.

I almost feel like it's a shame that opaque types wasn't our first episode.

It should have been our first episode.

Surprisingly, that is the basis of a lot of our thoughts.

Yeah, because that's the only way you can truly wrap something and constrain your understanding of how you manipulate that type to a given context.

And that is so powerful.
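The phantom type for currency mentioned a moment ago could look roughly like this. It is a sketch under assumptions: the currency tags `Usd` and `Eur` and every function name are invented for illustration.

```elm
module Money exposing (Eur, Money, Usd, add, eur, toCents, usd)

-- Hypothetical sketch; the currencies and function names are assumptions.


-- `currency` is a phantom type variable: it appears in the type
-- but not in the data the constructor holds.
type Money currency
    = Money Int


-- Tag types that only exist to label a currency at the type level.
type Usd
    = Usd


type Eur
    = Eur


usd : Int -> Money Usd
usd cents =
    Money cents


eur : Int -> Money Eur
eur cents =
    Money cents


-- Both arguments must carry the same currency tag, so
-- `add (usd 100) (eur 100)` does not compile.
add : Money currency -> Money currency -> Money currency
add (Money a) (Money b) =
    Money (a + b)


toCents : Money currency -> Int
toCents (Money cents) =
    cents
```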

Yeah. So maybe let's break down your advice.

So first of all, you said wrap it in a new type.

So if you're manipulating money, then maybe for now it's an integer or it's a float.

What you can do is define a new type, a new custom type that you call Money or Currency or whatever.

So you say type Money = Money Float, or Int.

I'm pretty sure that using floats for money is a very bad practice.

So let's count the number of cents as an integer maybe.

So type Money = Money Int.

So what that does is that that creates a new type that from the point of view of the compiler

will not be the same as an integer.

So you're going to have to wrap it and unwrap it where you're going to create and use it.

But that will avoid a lot of issues like having arguments swapped around or using non-money integers as money or money as an integer.
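As a rough sketch of what the wrapping and unwrapping looks like in practice, here is a tiny program. The value `price` and the function `viewMoney` are made up for the example, and the formatting assumes a non-negative amount in cents.

```elm
module Main exposing (main)

import Html exposing (Html, text)


-- A wrapper around the number of cents. To the compiler this is
-- a different type from Int.
type Money
    = Money Int


-- Wrapping: the constructor turns an Int into Money.
price : Money
price =
    Money 1999


-- Unwrapping: pattern matching in the argument gives the Int back.
-- Assumes a non-negative amount.
viewMoney : Money -> Html msg
viewMoney (Money cents) =
    let
        whole =
            cents // 100

        fraction =
            remainderBy 100 cents
    in
    text
        (String.fromInt whole
            ++ "."
            ++ String.padLeft 2 '0' (String.fromInt fraction)
        )


main : Html msg
main =
    viewMoney price
```

Passing a plain `Int` to `viewMoney`, or passing `price` to a function that expects an `Int`, is a compile error.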

Yes. Yes. And an important distinction.

You said use a custom type. You did not say use a type alias.

And that is very important because if we used a type alias, type alias Money = Int, that actually doesn't prevent us.

We could use our money type alias in one place that money is used and not in another place.

So it doesn't guarantee that the type represents that in any way.

And so that can be a bit misleading.

It feels like we've got a nice name for our type.

But in fact, it doesn't really give us a clear concept of what it represents.

So in fact, I would go as far as to say that using a type alias for a unary primitive, a single value primitive, not a compound one, is probably a code smell.

So using a type alias for a record? Great.

Common practice, often a very good idea. Using a type alias for a tuple?

Maybe that's OK. Maybe, maybe not.

But using a type alias for String? Probably a bad idea.

I can't think of an instance where that would be a good idea.
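To make the alias pitfall concrete, here is a small sketch (all names are hypothetical). The alias gives the annotation a nicer name, but the compiler treats `Money` and `Int` as the same type.

```elm
module AliasPitfall exposing (Money, itemCount, oops, showBalance)

-- Hypothetical example of a type alias over a primitive.


type alias Money =
    Int


-- Not money at all, just a plain Int.
itemCount : Int
itemCount =
    3


showBalance : Money -> String
showBalance cents =
    String.fromInt cents ++ " cents"


-- This compiles without complaint, because Money is only another
-- name for Int. A custom type would reject it.
oops : String
oops =
    showBalance itemCount
```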

I can see some. So first of all, one is performance.

I'm going to contradict myself in a second. Just OK.

But if you type alias, you don't have to wrap it and unwrap it all the time.

So code-wise, you have less to write, and the wrapping would also be more code to be executed.

The nice thing, and this is part of why I contradict myself, is that the Elm compiler strips those away when you have a custom type with only a single constructor.

And that just wraps one data type. So that's actually not an issue.

But where I would see a type alias being used is when you want to switch from a primitive to a custom type.

Yeah. So let's imagine you're still still working with integers for money.

How do you go about going towards using a custom type, a non primitive type?

What you can do is you can create a type alias for your money and start using that.

So where before you had Int -> Html Msg or something, you now have Money -> Html Msg.

And you can add those pretty randomly as you go around.

Maybe in some cases that will actually not be true. So that's a pitfall: when you have something that's not money and use it as money.

Exactly. Right.

But the thing is, when you do that, you're doing a tiny step towards having the type used where you need it to.

And then as we like to do, we commit and then we change the type from the type alias to a custom type with wrapping and unwrapping that we need.

And then we get a lot of compiler errors and we just have to resolve those one by one.

And that is pretty much how we can easily migrate from having a primitive for money to a custom type that does not have the same issues.

Yeah. So I'm a bit torn because I am a huge fan of tiny steps, but I am also uncomfortable with the idea of, well, I guess put it this way.

I'm a huge advocate for tiny steps. I'm also an advocate for feedback.

And I feel that while replacing our money type, our int money type with a type alias checks the box for tiny steps, it doesn't check a box for feedback.

And that's what concerns me. So, for example, we could use it in a place that actually doesn't represent money and not get feedback.

We could forget to use it in a place where we're actually passing through something that we changed to use our money type alias and downstream from it, it's not going to complain that we forgot to change another place.

Yes. So we're cut off from feedback.

Yeah. For that one, I don't mind because my idea at least is to use money in some places.

And then when you switch it, you will be notified of more places you're going to need to add it.

Yeah. And the idea is not to use it everywhere and not have any feedback and to then switch.

Right. It's a preparatory step to reduce the work for the next step when you turn it into an actual custom type.

Yeah. And also, like one of the aspects of tiny steps is improvement. So as long as you improve the code base, a tiny step is a good thing.

And I feel like it's already bringing you some value when you have a function that read as Int -> Int -> Html Msg and now reads as Money -> Int -> Html Msg.

So that also already brings some value, even though the compiler won't help you. It brings some value.

The thing that concerns me is that it can give a false sense of security that we've improved it and we can use it incorrectly and not know.

So let me give an alternative approach that I like to use.

And I think this is kind of an example of a technique that I see sometimes in sort of, you know, code craft circles of trying to take the craft of coding seriously and refactoring and stuff.

I think it's a good technique, which is sort of a semi-automated refactoring where you can basically follow a recipe of using the compiler to make a change.

And I love how you say recipe instead of algorithm. Pretty much an algorithm. Yeah, that's right. So here's my algorithm.

So the algorithm is you've got your money type, which is you're just using a raw int.

Step one, let's say you're only using it in one module for now.

Define type Money = Money Int. So define just a wrapper custom type, then wrap early, unwrap late.

So what that means is you want to be passing around your money custom type everywhere you possibly can.

So you start with the entry points, and that's where you wrap early.

So, you know, you've got an HTTP request, which is running a decoder and decoding an int.

Well, you need to wrap that, so you can just say Decode.int and then pipe that to Decode.map Money.

Now you've wrapped it. And then it's going to say, well, I thought this was a decoder of this one type that has an int in it.

But now it's this one type that has money in it. So you change that. And now you go through.

You change all the places that say, hey, I thought I was getting an int, but now I'm getting money.

And you just change the type annotations everywhere to say money in all those places.

So that's just wiring through. That's in the middle, not the early or the late part, but the middle part.

So all those places, you change it to use your new type in the annotations.

And then finally, you unwrap it when you're forced to, because you actually want to use the value.

Now you need a primitive. So you're doing you're sending an HTTP request and you need to encode it or you need to present it in the UI.

Now you need a primitive value because you can't present money directly.

Then you unwrap it. So now you just follow that through and use those simple steps to mechanically transform the code.
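The "wrap early, unwrap late" recipe could be sketched like this, assuming the `elm/json` package is installed. The `Account` record and the JSON field names `owner` and `balance` are hypothetical.

```elm
module Account exposing (Account, Money, decoder, encodeBalance, viewBalance)

import Json.Decode as Decode exposing (Decoder)
import Json.Encode as Encode


type Money
    = Money Int


-- Hypothetical record; the field names are assumptions.
type alias Account =
    { owner : String
    , balance : Money
    }


-- Wrap early: the Int becomes Money at the entry point,
-- as soon as it is decoded.
decoder : Decoder Account
decoder =
    Decode.map2 Account
        (Decode.field "owner" Decode.string)
        (Decode.field "balance" Decode.int |> Decode.map Money)


-- Unwrap late: only when a primitive is needed for encoding...
encodeBalance : Money -> Encode.Value
encodeBalance (Money cents) =
    Encode.int cents


-- ...or for presenting the value in the UI.
viewBalance : Money -> String
viewBalance (Money cents) =
    String.fromInt cents ++ " cents"
```

Everything between the decoder and these last two functions only passes `Money` along, so the middle of the refactoring is changing annotations from `Int` to `Money`.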

Now, it's not exactly a tiny step. Ideally, this should be something that could be automated, but it is very mechanical, which is good.

When you say it could be automated, you mean using like an IDE or something?

Yeah. Yeah. Do you know about any languages that do that?

Yeah, I mean, languages like Java and, you know, C#, they have tools to turn a parameter into a parameter object.

Yeah, all sorts of things like that. OK, interesting.

Yes. So that's very much something that could one day exist in Elm.

I think it would be very valuable. But see, the idea is that you're doing something so mechanically that it effectively is a tiny step.

It's just a very tedious step that should be automated. But you're following it in such a boring way that it essentially is one.

Yeah, I agree. I feel like the only thing that's annoying is that you could not stop in the middle.

Yes, absolutely. That's right. Because basically what we've done in that process is we've done an atomic step conceptually,

but we've done it in a tedious manual, non atomic way, which means we can't break it up in the middle.

But if it were automated, it would in fact be an actual atomic step.

Right. I just can't think of a better way to do it in a non-automated way that doesn't give any false feedback if we're using it incorrectly.

Well, if you did it with the type alias, as I said, what you can potentially do is you use the type alias.

You start using it in some places, not too many, just one or two where you know where you define them or where you use them.

You commit that and then you change it to a custom type. And then, as you said, you're going to have to change it in all the places.

But if at some point you need to stop because, hey, it's six o'clock or I don't know what time you stop,

then you can commit all the changes that you made except the one about the custom type and the wrapping.

So the change to a custom type and the wrapping and unwrapping, you don't commit those.

You really commit the rest and you can go home and you can continue tomorrow.

Right. And you're right that if you do that and you mess something up, say something is a quantity, not money semantically, and you accidentally replace the int with the money type alias in the annotation there, then when you do the next step after you commit that step, as you're saying, the compiler will tell you and you can sort it out there.

I just am wary of any false sense of progress where I'm cut off from feedback, and that scares me.

So I think that these are both valid approaches, and I think we've presented the tradeoffs pretty well.

So I think people should experiment and let us know how it goes.

I would definitely go with your approach if it's like a small doable change.

If it's something that is used everywhere, I would go with mine.

Interesting. Yeah, that's a good way to look at it.

It's basically the same idea.

Yes. So then the next step, which would be like, let's say you've either done the approach you described or the approach I described.

But then, you know, after using the type alias with your approach, the next step would be to change that to a custom type, make sure that everything lines up, and then it converges with the approach I described.

Yeah, basically. Yeah. So then let's say we get to that point.

The next step from there is instead of just using a custom type that's just sitting around, now you extract that to a module.

OK. In preparation for making it an opaque type.

Yeah. So I would like to ask you, why would you do that? What would be the benefits?

Yes. Yes. Very good question.

So here's how I think of it.

When we extract it to that module, several things happen.

One is, well, OK, it's in a different module.

We sort of get that, you know, Evan gave his talk, The Life of a File, and he sort of says, you know, maybe that's not that important.

I think I agree with that, that that's actually not the important part, moving it to a separate file.

Here's what I think is the important part. And in Life of a File, Evan sort of describes, how do you extract modules?

Well, I think it has something to do with being some logic that's centered around a type.

And that's exactly what we're talking about here.

So you extract a type and some ways to operate on a type.

And what does that give you? Well, that gives you the ability to have an opaque type. For what an opaque type means, see our Opaque Types episode.

It means that you have a custom type for which you do not expose the constructor to the world outside of that module.

That's all it is. So what happens when you do that?

Well, now you can actually have some semantics around this type.

So let's say that you have money and what can you do with money?

Can you multiply money? Can I take one money and multiply it by another money?

That doesn't make sense. That would be bad. You know, like squared euros.

Exactly. It's much more than plain euros. So this is why we should feel very afraid and concerned if we see a raw number value floating around, no pun intended, in our code and not being wrapped in some sort of type.

It's very concerning unless it's in the context of a money module which is defining that logic, which is defining how to add two sums of money together.

Then great. That's where it lives. So it allows us to define the semantics for that type.

What that means is if it's just an int wandering around our code, we don't get to control the semantics in one central place.
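A minimal sketch of such a module, with hypothetical function names, might look like the following. The key detail is the `exposing` line: it lists `Money` rather than `Money(..)`, so the constructor stays private and the type is opaque.

```elm
module Money exposing (Money, add, fromCents, toCents)

-- Hypothetical sketch of an opaque Money type.
-- Exposing `Money` without `(..)` hides the constructor.


type Money
    = Money Int


fromCents : Int -> Money
fromCents cents =
    Money cents


-- Adding two sums of money is a meaningful operation, so it is offered.
add : Money -> Money -> Money
add (Money a) (Money b) =
    Money (a + b)


toCents : Money -> Int
toCents (Money cents) =
    cents


-- There is deliberately no `multiply : Money -> Money -> Money`.
-- Callers cannot write one in a way that bypasses this module's API,
-- because they cannot reach the constructor.
```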

And what that means is if we want to understand, is this money being used correctly? How am I getting negative money?

I don't expect to ever have negative money because I expect that I can only ever have a positive sum of money and add it to another positive sum of money in my domain.

That might be different in different domains. But you are too well off, Dillon.

Not knowing the concept of negative money in a bank. You're too well off.

Right. So in your domain, you don't want to have negative money and you want to say, can I be sure that I can never have negative money? How do you know?

Well, you have to look at every single place that an integer that represents money is ever used in your entire code base now and in the future.

And that's not very fun. Now, imagine you have an opaque type in your money module. Now, can I have negative money?

Well, let's see. What are the operations I can perform on money? I can decode money from the server.

I can create money from, you know, whatever. These are the ways that I can get a money type.

So can it be negative there? You have to figure that out. But you know exactly where to look to ask that question.

Can I take money type that was positive and make it into a negative one? You know exactly where to look to answer that question.

And that's it. And then, how do I turn it into this thing that I'm serializing for the server or whatever?

So, you know where to look for potential issues. So you've kind of wrapped it in this.

I sometimes think of it as a semantic type, and I think this is a very powerful technique.

I think it's probably underused. So, yeah, create semantic type wrappers. Use an opaque type.

Now, again, I really like thinking about this in terms of, you know, what can an opaque type help us do?

Well, it can help us understand like the origin of a type. Where can it come from? How can I get this thing?

If it's an int, where can an int come from? Well, it's just an int. There's nothing special about it.

But where can my opaque type come from? Well, that's another question.

And a much simpler one. It's a much simpler question to answer.

So if you define a decoder that gives you a money type, then it can come from a decoder.

If you define an HTTP request that knows how to get money, it can come from there. So, you know, you get the idea.

Like you can control the origin. You can put a stamp on it. As we've talked about in our opaque types episode, you can conditionally return a type, which allows you to perform a validation and represent a stamp of approval that this is a valid username or this is a positive money value or whatever it is.

And you don't return that type unless you've validated it. So there are just all these things that when you have a primitive type, you have to look.

It's almost this imperative thing where you have to get everything in your head to understand what it's doing rather than giving something semantics and being able to look in one place and then trust those semantics everywhere else.
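The "stamp of approval" idea can be sketched as a constructor function that only returns the type after a check. The names are hypothetical, and the rule that zero counts as valid is an assumption made for the example.

```elm
module Money exposing (Money, add, fromCents, toCents)

-- Hypothetical sketch. Treating zero as valid is an assumption.


type Money
    = Money Int


-- The only way to create Money from outside this module.
-- A negative amount never gets the stamp of approval.
fromCents : Int -> Maybe Money
fromCents cents =
    if cents >= 0 then
        Just (Money cents)

    else
        Nothing


-- Adding two non-negative amounts stays non-negative, so the
-- invariant holds without re-validating.
add : Money -> Money -> Money
add (Money a) (Money b) =
    Money (a + b)


toCents : Money -> Int
toCents (Money cents) =
    cents
```

To answer "can money be negative?", only this module has to be read, because no other code can construct a `Money`.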

So I'm going to try to summarize what every step gives you. So if you have a primitive, then you have no guarantees of any kind and no knowledge of how things were created or how things are used.

So let's imagine you create a type alias money. Then what you have is now a name. So you have a semantic name for this type. So it's easier to understand it and to identify it.

When you switch to a custom type or when you use a custom type from the get go, as some prefer, then you get a lot more guarantees from the compiler because you can't mix them with regular primitives.

And you still have this semantic name. Now you really have a semantic type. And when you switch it to an opaque type, then what you can do is control how it's created, what you can do with it.

And you can also enforce some guarantees, some constraints, some invariants.

Money not being negative, for instance. So yeah, each one of those steps adds more benefits over just a regular primitive. And I think each one is valuable.

I don't know if I would always go to an opaque type, but I also don't see why I would not. So yeah, why would you not use an opaque type?

The only reason I would not use an opaque type in general would be because I want consumers to be able to pattern match on the raw data or variants.

That's fair. Yeah.

So whenever you create an opaque custom type, if you want people to pattern match on it or to use it based on how it looks and what the value is, then either you expose the constructors, meaning it's not opaque, or you create some API which replicates a case expression.

Which is usually not great.

Or maybe you create an intermediate type.

That's right. Which you do expose and it decouples you from the internal, actual opaque type that you use.

Yeah, but then which confuses people because, oh, I thought this was money from this module, but it's money from this module.

Right, right. Yeah.

Yeah, but in general, the way I think about like whether to use an opaque type or not, it's, do I want to enforce constraints about how this thing is used everywhere in my code base or in this one module?

And almost always the answer for me is in this one module.

That's where I want to think about how this can be used and what operations are valid for it.

That's, for me, that is the most powerful technique for making Elm code more maintainable.

Besides like impossible states and things like that.

I may go on a tangent here, but what do you think about, like in Elm if you have an opaque type, you cannot pattern match on it, and you cannot create it.

What would you think if we had a type where we cannot create it, but we can pattern match on it?

I'm not sure. I would have to think about that.

Because I think it would be valuable in a few cases.

Like we definitely encountered a few of those.

But also like now, if you change one of the types, then that's a breaking change.

If you add a new variant, or rename a variant.

Right. And we already have the tools to effectively do the same thing just by having a different type for the internal representation.

And the sort of presentation.

So we can define sort of a presentational custom type that you expose all the variants so you can pattern match on it.

And you can create it if you want. It's just a presentational type.

And then the actual thing that enforces the invariants about it and all of these good things we were talking about.

You own that type. You can make breaking changes to that internal type because you own it.

Whether this is publishing a package or the consumers of this module are all the employees at an Elm shop is the same idea.

You want to shield the consumers from breaking changes. That's a good practice.

So I think we sort of have the tools for that.
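One way to picture the internal type plus presentational type split is the sketch below. The `Account` domain, its fields, and all function names are invented for illustration and are not from the episode.

```elm
module Account exposing (Account, Status(..), close, open, status)

-- Hypothetical sketch of an opaque type with a presentational view.


-- Opaque: the constructor is not exposed, so this representation
-- can change without breaking callers.
type Account
    = Account { balanceInCents : Int, closed : Bool }


-- Presentational: all variants are exposed so callers can
-- pattern match on them.
type Status
    = Open Int
    | Closed


open : Int -> Account
open balanceInCents =
    Account { balanceInCents = balanceInCents, closed = False }


close : Account -> Account
close (Account details) =
    Account { details | closed = True }


-- Convert the internal representation into the exposed one.
status : Account -> Status
status (Account details) =
    if details.closed then
        Closed

    else
        Open details.balanceInCents
```

Callers can write `case Account.status account of ...`, while only this module can create or change an `Account`.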

Yeah. It sounded good for a few seconds in my head.

It's an interesting idea.

Something I've been thinking about lately is, you know, Richard Feldman has given some talks presenting Roc recently.

Definitely worth checking some of those talks out. It's very interesting.

And he has this concept in Roc, this pure functional programming language, which is in development now, that you can have a, what is it?

A tagged union that's an open tagged union or closed tagged union.

But you can have anonymous tags. So you can just use a tag, and it can infer the tags of a union and which tags are possible based on the data that you return or the case expression that you do.

If you return a Maybe, but you only return the Just case, then it's going to notice that this is only a tagged Just.

Exactly. And it doesn't know about the Nothing variant because you didn't use it. And that's quite interesting.

And I mean, it's essentially like in Elm, anytime you create a custom type and a custom type variant, they're nominal.

They're nominal types and values, meaning it has to be that exact thing. If you define type Money = Money Int in your Money module and type Money = Money Int in your Main module, those are two different things. The type itself and the constructor, the variant Money, are different.

They create values of different types and they're not interchangeable. But Roc has this concept of a sort of, you know, structurally typed union tag, essentially. So if you refer to a thing with the same name,

and all the types line up, then I can treat them as the same thing. And without defining a type, you can use those union tags.

And it's very interesting. So I do wonder, does that give us the potential to have a more lightweight way to define types in some instances?

Because sometimes I do think like I don't reach for a custom type and I sort of avoid it and I try to use like a maybe or a tuple or something like that when I'm just like, come on, like it's not that big a deal to just say type.

You know, like I was just doing something the other day where there were like two different things that a thing could return.

And I was like trying really hard to avoid doing that. And I'm just like, you know what, it's not that big a deal. Like I just define a custom type thing one and thing two and it can be those things.

And like that wasn't so hard, was it? You know? Yeah. In the feature you described from the Roc programming language, I feel like in some cases that can lead to new problems.

Like, for instance, if you have an ID type, an ID string which represents a non-primitive string ID, then the ID from one module, you don't want to mix it with the ID of another module.

Absolutely. But actually Roc has a specific syntax for doing opaque types.

I think it's like an at symbol in front of the tag and that allows you to constrain in that way. But yeah, it's quite interesting. Now, I think we should maybe talk about some of the trade-offs of avoiding primitive obsession and when it might not be the right approach.

So one of the things I think about is the standard libraries in Elm. There are all these operations that are defined for a type. And if the semantics match that, then it's quite nice. For example, you know, lists are quite nice.

There are useful things you can do with lists, and results are quite nice. You can do meaningful things with Result and with Maybe. You can abuse Maybe and use it to mean different semantics.

And that can be problematic, but sometimes it can be okay. So there are trade offs.

You do get a lot of things for free with maybe right, you don't have to rewrite things to shortcut the nothing case.

Exactly, exactly. You've got Maybe.withDefault, you've got Result.fromMaybe, all these nice helpers.

So, yeah, I mean, definitely be aware of those trade offs.

I mean, that said, under the hoods, your type can still be a maybe something. So you can have maybe.

So you can have a money containing maybe something and then use maybe map under the hood.

Right. But you just need to add wrapping and unwrapping.

Right. Right. And then sometimes you have to sort of redefine those standard library operations or provide a way to get something as a Maybe, that sort of thing.
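A minimal sketch of the wrapping and unwrapping described here. The `Balance` module, its function names, and the choice of `Maybe Int` (cents, or unknown) are assumptions made for illustration, not code from the episode.

```elm
module Balance exposing (Balance, fromCents, map, toMaybe, unknown, withDefault)

-- Hypothetical opaque type: a balance that may not be known yet.
-- The constructor is not exposed, so callers cannot treat it as a plain Maybe.


type Balance
    = Balance (Maybe Int)


unknown : Balance
unknown =
    Balance Nothing


fromCents : Int -> Balance
fromCents cents =
    Balance (Just cents)


-- Re-exposed standard library operation: unwrap, use Maybe.map, wrap again.
map : (Int -> Int) -> Balance -> Balance
map fn (Balance maybeCents) =
    Balance (Maybe.map fn maybeCents)


-- Escape hatch for callers that really need the Maybe.
toMaybe : Balance -> Maybe Int
toMaybe (Balance maybeCents) =
    maybeCents


withDefault : Int -> Balance -> Int
withDefault fallback (Balance maybeCents) =
    Maybe.withDefault fallback maybeCents
```

The cost mentioned above is visible: every `Maybe` helper you want has to be forwarded by hand, so it is worth exposing only the ones that make sense for the domain.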

So those are some trade offs to be aware of.

I have a limitation in mind, which is comparable.

So, for instance, dictionaries and sets can only contain comparable keys (or comparable values, for sets), and custom types are not comparable.

So what do you do then, Dillon?

That's fair. After you've cried for like two minutes and 27 seconds.

What do you do? Sounds like you've cried for that exact amount of time before.

It's very specific. No, I'm a man. I only cry for two minutes and 12 seconds.

It's very brave of you. I know, right? It's a tough question.

I mean, it would be quite nice if it was comparable.

I mean, obviously there are packages to help use non-comparable keys.

And if you wanted to, you could wrap a dictionary of that type.

Right. So you could sort of wrap constructors and say, OK, I'm passing in this thing that's a custom type, but I'm going to create a wrapper around Dict.insert.

That's actually going to take that custom type and destructure the pieces that I can use to be comparable and turn it into that.

So that's one possibility. I don't know if there's an easy answer there.
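One way to sketch that wrapper, assuming a hypothetical `UserId` that wraps an `Int` and a hypothetical `UserDict` module. The names are illustrative only; the only library calls used are `Dict.empty`, `Dict.insert`, and `Dict.get` from `elm/core`.

```elm
module UserDict exposing (UserDict, UserId, empty, get, insert, userId)

import Dict exposing (Dict)


-- Hypothetical ID type. Custom types are not comparable,
-- so it cannot be used directly as a Dict key.
type UserId
    = UserId Int


userId : Int -> UserId
userId rawId =
    UserId rawId


-- The raw Int keys stay hidden inside the opaque wrapper.
type UserDict value
    = UserDict (Dict Int value)


empty : UserDict value
empty =
    UserDict Dict.empty


-- Destructure the custom type to get at the comparable piece.
insert : UserId -> value -> UserDict value -> UserDict value
insert (UserId rawId) value (UserDict dict) =
    UserDict (Dict.insert rawId value dict)


get : UserId -> UserDict value -> Maybe value
get (UserId rawId) (UserDict dict) =
    Dict.get rawId dict
```

With this shape, a raw `Int` or a different ID type cannot be used as a key by accident, at the price of forwarding each `Dict` operation you need.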

Because it's tough making this trade off of when do you wrap something and sort of expose these operations that exist for other low level data structures versus when do you just go for the low level data structure and just have string keys and a dict and say, you know what?

I'm just going to be sure that I only insert the string keys using this toString function for this custom type.

But you don't trust yourself. You only trust the compiler.

I think for good reason. Yeah, you're right. I don't trust myself.

Because you're a bad developer. Yeah, I think the sooner we admit that the better off we are.

Just admit you're a bad developer and let the compiler take care of things.

That's actually pretty good advice. Just assume you're going to mess up.

Act accordingly, just like with tiny steps. Like, assume you're going to mess up the next step and commit now.

Exactly. Socrates said that he knows more than anyone else in the world because he realizes that he knows nothing.

And that's the only thing that can truly be known. And therefore, he knows more than anyone else.

He knows one thing that he can know nothing. That's the only thing that can truly be known.

But what if someone else knows the same thing?

Well, then they would know as much as him. But no one knows more than him because no one can.

Is it like comparing infinity? Well, one guy knows infinity wisdom and another guy knows infinity wisdom.

So you compare those and that's always true, always less and always more.

I think it's not comparable, Jeroen. That's the problem.

Oh, so knowledge is opaque. OK, we got it. That's what I should have written in my philosophy courses.

Philosophy, knowledge is opaque. If I want to get knowledge from you, I need to use your API to transfer the knowledge to me.

That's true. That's basically what Elm Radio is. We're an API for people's minds.

Yeah. In this format, knowledge is actually a bit unwrapped.

Yes, we're exposing the interface for it.

Yeah, we're going to wrap it again at the end of the episode.

Exactly. Wrap early, unwrap late. So I was thinking about something the other day.

I was writing something and I was like, maybe I should be using a custom type here and a wrapper type.

I wonder what you think. The thing I was encountering was I was inserting things into a list.

And as you do with a list, you usually prepend to it because it's inefficient to add something to the end of the list.

So you go over this list and you're adding things.

And I realized there's actually a semantic concept behind this. This is not a list. It is a reverse list.

And I was thinking maybe I should just create a little wrapper type and have reverse list.

So you would have type ReverseList a = ReverseList (List a).

And you'd need your a type variable in the definition.

So it's just a wrapper for list of something, but it would be an opaque type.

And the important thing would be what can you do with a reverse list?

Well, you can prepend to it and you can get it. You can turn it into a normal list.

What does turning it into a normal list look like? You do a reverse on the ReverseList, plus unwrapping it?

Exactly. Unwrapping it and reversing it. But now here's the thing: you've given a name to these things.

So when you're prepending, you're doing like ReverseList.prepend.

And so, OK, you understand I'm prepending to this thing. And then when you do toList, you just use the list.

And so if you take that reverse list, you can't accidentally pass it somewhere and say, wait a minute, I lost track.

What was the context? Was this reversed at this point or not?

And I think that's one of the powerful things about these sort of semantic types, about wrapping these primitive types with semantics,

is that it gives you this definition of what is the semantic context I'm in, which you can lose if you're passing a primitive because you're passing a list of something.

Well, was it reversed already or not? What stage does this represent?
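The reverse list idea can be sketched as a small opaque module. The type definition follows what is said above; the function names `empty`, `prepend`, and `toList` are assumptions for the operations described.

```elm
module ReverseList exposing (ReverseList, empty, prepend, toList)

-- Opaque wrapper: the constructor is not exposed, so the only way
-- to get a plain List back out is through toList, which reverses it.


type ReverseList a
    = ReverseList (List a)


empty : ReverseList a
empty =
    ReverseList []


-- Cheap: adds to the front of the underlying list.
prepend : a -> ReverseList a -> ReverseList a
prepend item (ReverseList items) =
    ReverseList (item :: items)


-- Unwrap and restore insertion order in one step.
toList : ReverseList a -> List a
toList (ReverseList items) =
    List.reverse items
```

For example, `ReverseList.empty |> ReverseList.prepend 1 |> ReverseList.prepend 2 |> ReverseList.toList` evaluates to `[ 1, 2 ]`. Because the type carries the "still reversed" context, there is no point in the code where you can forget whether the reversal has already happened.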

I'm almost wondering, like, is reverse list a good name for this? Because I agree with the idea.

But I'm like, you're building something. So maybe you have something like what you were building, but let's imagine this was a list of books or something and they have to be ordered in some way.

Right. If you're like searching for books that match a certain criteria.

So I'm like, maybe you could call it reverse list, but you could also call it list of books being built.

Book search results.

Yeah, or temporary or construction in progress with the icon, you know.

But yeah, I agree with the idea. Would you want to use this? Would you want to create a custom type for this?

It depends on how much you would have to transfer this over and over again.

Like if it's an implementation detail only constrained to a single function, just use List. And I'm guessing you could probably in some cases use List.map or List.foldl or something.

And in which case it may not make any sense.

But if it still makes sense and you pass it all around all over the place, then yeah, maybe you can create a custom type, but then maybe give it a better name that more accurately represents the domain.

Right. Yes, I think you're right. And I think that's why I was a little hesitant was because it's, I mean, sometimes you're going to have domain concepts that are just kind of around a particular data structure and that can be okay.

But I think you're right. Sometimes that could be a sign that maybe you need to find something to express it in more specific terms for your domain.

But I think this concept is so powerful that, again, it's like you've got context on this thing.

You know, it's like, you know how you can lose context.

And if you just do something when you have context, it's so much easier because the context is right there. You don't have to context shift. You don't have cache misses in your brain.

And it's the same thing. We have this semantic idea.

So, you know, when we're getting a money value from the server, the current balance from the server, we have context of what that means because we're saying, hey, please get the current balance from the server.

We have this concept of what the thing we're trying to get represents at that point in time.

And the farther away we get from that point of origin, the more we lose context.

And so we have to hold that in our head imperatively to follow. Where did this come from? What were the steps again?

How did this thing get from here to here? And wrapping it in a semantic type just preserves that context so our brain doesn't have to do it.

Yeah. And also it will be more refactor proof.

Right. Exactly. Because we can not only preserve the semantic meaning, but define what operations can be used.

Where can it come from? Where can it go to? How can it be transformed? What operations can be used on it? What invariants does it have?

You can avoid having someone mess it up a few years down the line, for instance, or someone who never had the knowledge in their head forgets about some invariant that they should have kept.

Now, something I've been thinking about lately, I think that sometimes, often in front end frameworks, we'll deal with getting data from servers.

And often that looks like these formats like JSON. And JSON is very low level. JSON is a great format. Don't get me wrong.

But really, it's just all primitives, right? By definition.

Well, no, because objects and arrays are compound types.

Would those not be considered primitives? I don't know.

Well, they contain other, in a way they're primitive, but they contain other kinds of primitives, including themselves.

Yeah, I'm not sure if that would technically be considered a primitive or not.

I guess they're primitive in a way, but they don't have any domain.

Exactly. They are devoid of domain semantics inherently. And because it's just a transfer format.

Yeah. And I mean, even if you have like a list of books at one point in your code base, it can be considered a high level domain term.

But in another part of the code base, it could be considered primitive, potentially.

Because you have a list of books for this, you have a list of books for that. And maybe you want different types for each of those.

Whether something is considered primitive totally depends on your domain, I think.

Yeah, right. And so like what often happens is we've got, you know, maybe you have like a Ruby on Rails server that you're using for your JSON API.

Then you get that data from Elm. In your Ruby on Rails code base, you might have some nice classes that are abstracting away a few things and how to access a particular part of the database.

And you've got these, you know, all these constraints that are sort of represented through these abstractions you have in your Ruby code base.

But then you go and you get all this data. And then Ruby has to say, well, here is some JSON.

It has to send the lowest common denominator. And now all those semantics in order to be serialized and deserialized, you lose all that semantic information.

You lose all those domain concepts and you have to rebuild them somehow.

You should have used Lamdera.

Well, actually, it's very true. I've been thinking about that.

How powerful that is to actually have the exact same context and semantics and that glue code for serializing and deserializing. It's not just like tedious.

It is lossy. You lose semantic information. You lose information about the constraints.

But in Lamdera, you know, if you have an opaque type, you have all these nice constraints we talked about, and in Lamdera that applies to the data that's being persisted and all of this.

Well, you have full stack invariants, which is pretty, pretty powerful stuff.

So, yeah, that avoids a lot of danger. Yes.

Problems that can happen at encoding and decoding. Yes.

Yeah. And, you know, so I mean, how do you fix this problem?

Sure. You can use Lamdera. And if you're not using Lamdera, just be aware that you're losing semantic information.

And even if you're using something like GraphQL, that gives you more safety for accessing those low level values.

And it gives you a few semantic values. Like, GraphQL has these scalar types, which can be quite nice for saying this represents time.

And this is how you decode that. This represents a user ID.

That's great. But still, you are crossing between paradigms, and any invariants you had in your Ruby on Rails backend are not automatically there just because you use these GraphQL codecs and that sort of thing.

You have to be very deliberate about the contract that you're providing and think about that.

And any time you serialize and deserialize, you have to be careful that you're preserving the contract and the semantics and the domain concepts between those two paradigms.
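A minimal sketch of rebuilding domain types at the JSON boundary, in the spirit of "wrap early". The `Account` module, the `UserId` and `Money` types, and the JSON field names `userId` and `balanceInCents` are all assumptions for illustration; the decoder functions are from `elm/json`.

```elm
module Account exposing (Account, Money, UserId, decoder)

import Json.Decode as Decode exposing (Decoder)


-- Hypothetical semantic types. Their constructors are not exposed,
-- so values can only enter the app through this decoder.
type UserId
    = UserId String


type Money
    = Cents Int


type alias Account =
    { owner : UserId
    , balance : Money
    }


-- The JSON only carries a string and an int. The decoder is the one
-- place that knows what they mean, so it wraps them immediately.
decoder : Decoder Account
decoder =
    Decode.map2 Account
        (Decode.field "userId" (Decode.map UserId Decode.string))
        (Decode.field "balanceInCents" (Decode.map Cents Decode.int))
```

This does not bring back any invariants the backend enforced. It only makes sure the meaning assigned at the boundary is not lost again further into the Elm code.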

I was talking to a few of my colleagues about primitive obsession today real quick.

And we're like, yeah, that's a nice technique. It's super helpful for plenty of things.

And then we were like, yeah, but we don't use it that much, actually.

Mm hmm. Like I said, I think opaque types are one of the most underused techniques.

Well, opaque types, yes, but not just that. Like, new types.

Mm hmm. Right. Yes. So what I mean is we still have plenty of instances where we just use strings or integers or

Booleans or whatever. Not that many Booleans, I guess. We still use a lot of primitives. And I mean, things still work pretty well.

We don't have any issues about switching values, switching argument orders, something like that.

So I'm pretty happy with it, but it's still nice to have that additional guarantee that you didn't mess things up.

And a nice overview of what a function does or works with.

But yeah, I'm like, should we use this more? How would we go about it?

Would we do a single pass of the code base to make everything a new type or should we do it incrementally?

Probably more incrementally, I'm guessing. But yeah. Do you have any advice on that?

I mean, yeah, I think as with any of this stuff, it takes judgment and creativity to apply it.

But I would say one thing that I think about is, I like this idea.

For example, people talk about self-describing code, right? Like, if you have a comment that says, OK, here's what this function is doing.

And only use it in this way. Well, OK, maybe you can turn that into the function name.

Make sure that this integer argument is never zero. Yes. Right. Right.

So any time you find yourself wanting to write a comment or have caveats or explain to somebody in a pull request how it should be used or things like that,

I think that's one cue because the thing is, comments can lie. Function names are more likely to evolve over time.

Now, that does require a habit of making small refactorings as you have that context.

So like I've heard it described as you have this context about understanding the code and refactoring in tiny steps is just about taking that context and putting it back into the code.

So you don't have to keep it in your head, especially if you're working with a team that's especially valuable. Right. But even for your future self.

So I think in the same way, when you have some concept that something has a certain semantic meaning, wrap it in a semantic type.
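As a sketch of moving a caveat like "this integer argument must never be zero" out of a comment and into a type, here is a hypothetical `NonZeroInt` module. The module and its function names are assumptions, not something named in the episode.

```elm
module NonZeroInt exposing (NonZeroInt, divideBy, fromInt, toInt)

-- Hypothetical opaque type: the only way to build one is fromInt,
-- which rejects zero, so the invariant lives in the code rather
-- than in a comment.


type NonZeroInt
    = NonZeroInt Int


fromInt : Int -> Maybe NonZeroInt
fromInt value =
    if value == 0 then
        Nothing

    else
        Just (NonZeroInt value)


toInt : NonZeroInt -> Int
toInt (NonZeroInt value) =
    value


-- The divisor's type documents and enforces the caveat.
divideBy : NonZeroInt -> Int -> Int
divideBy (NonZeroInt divisor) dividend =
    dividend // divisor
```

Whether this is worth it depends on whether the zero case is an actual pain point. For a single internal call site, a well-named function may be enough.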

Now, we do have to keep in mind how that can affect composability and not being able to use things with standard APIs.

So things like Maybe and Result types can be quite useful in that regard. But yeah, that's one thing I think about: how can we kind of put that context back into the code?

I'm just trying to pick your brain here. But what would you think of having something like a new type for strings that would be displayed to the user?

So one instance where I would probably not use a new type is when I would have strings that I want to display to the user.

So like the number of their accounts, the user ID, something. Now, not the user ID.

That's a very bad example. But just some text like a button text.

Do you think it would make sense to wrap that into a new type showable text or something?

Displayable text. Interesting. Is that going too far? Like where do you stop? Right.

Exactly. Yeah. This is a great example of kind of what we were talking about, too, with like the reverse list and stuff.

It's like, sure, there's some semantic concept there, but is it the right way to draw those lines?

And does it get you in trouble? Because you can't necessarily easily compose these things together.

And drawing the line one place means that you can't draw the line another way that conflicts with that.

And that's the thing. So my gut feeling with that example is that that's drawing the lines in a way that prevents you from drawing them another way that's more specific to your domain.

So, I mean, first and foremost, with all of this refactoring stuff, the golden rule is address pain points.

So like with this reverse list thing, the reason it came up for me was because I found a bug and the bug was in places.

And I wasn't sure when I was fixing the bug. I was like, do I fix it here? Do I reverse it here, or here and here?

And how do I keep track of that? And how do I know that some other thing isn't going to make that same bug in the future?

So I think that's first and foremost, like address pain points, address concrete pain points.

That's like the best refactoring advice I think you can give.

Yeah. And also, if you try it out and then you notice new pain points, then maybe back off.

All the more reason why tiny refactoring steps are valuable because you can try something out, see how it feels.

But yeah, my gut feeling is it might box you into a corner where you're creating these things that are not core domain concepts.

And it makes it hard to express things that you might want to as core domain concepts.

But I'm not quite sure. Yeah. Because I feel like in some cases you create some text.

Like, you want to show the user ID, so you extract a string from it. Yeah. And then maybe you do some manipulation with it.

And that was not the intent. Like the intent was to produce a string that you can display and then someone removes like the first three characters or something and does something with it.

And that was just not the point. And I feel like that happens more often than I would like to.

Again, it's not a problem, but also right. It's like blue cheese. Exactly. Exactly. The Dutch love it.

Is that what you're trying to say? Yeah, I guess we do. Yeah, I think that because abstractions have a cost and so we should.

That's why it's really important to be addressing pain points when we introduce abstractions, because if we don't, then we're incurring a cost with something that might not give us a benefit to outweigh that cost.

And also not just that, but not only are we incurring a cost, but we might be making another abstraction harder to do in tandem with that because sometimes the lines just don't match up.

You draw the lines one way and now you can't draw them another way. So, yeah, I think, you know, it's important not to do premature abstraction for this type of thing.

Address concrete pain points as much as possible with these types of things. And you do get a sense of like, I'm making a money type.

I'm making a user ID type. Like, these are probably safe things to just go ahead and create an opaque type for. Yeah, right away.

You know, I do feel like the showable text is interesting. Yeah. I also feel like very unsure about it. I know that there's some pain points that it could solve.

Like, yeah, I don't feel good about it either. Right. Yeah. Yeah.

Having like a leaky abstraction or an awkward abstraction can really make it very confusing where you're like, why can't I present this thing on the screen?

Oh, you have to make it showable text like, oh, OK, well, how do I get showable text?

We could do the end. It's like, hmm, it can lend itself to like making code that's scary to touch because you forget what the abstractions mean.

So I think that's another thing to keep in mind is like the abstractions should just really be meaningful.

And it's not something you figure out once and that's it.

It's something that you have to like massage over time to make sure it's meaningful and continues to be meaningful and keeps up with your evolving domain and your evolving understanding of the domain.

So, yeah, I think really like make sure that it's a meaningful concept in your domain.

So one thing I sometimes think about is like can just giving things a name in a record serve this purpose of giving it context?

Now, obviously, you know, you could have a record money that has cents, USD cents in it.

And that's not good enough. I wouldn't feel good enough about that code, you know, that abstraction, for that.

But I don't know. Sometimes if it's like this is a username, this is the first name, this is the last name, maybe it's good enough to give it a name instead of just passing String, String, String for username, first name, last name.

You know, if it's a record, sometimes that's good enough. And then, you know, maybe you preserve those pieces until the last minute.

And then when you're presenting it, you say user.first ++ " " ++ user.last. And you're like, yeah, I'm pretty sure I'm using this correctly.

Yeah. Whenever I see a function that has two arguments with the same type that are next to each other, like Int, Int or String, String, then I tend to create a record.

I could indeed create a new type for this so that we can't mess it up. But I guess in some cases, like, they are the same type.

In that case, like, you have two kinds of money, like you have USD cents and you have euro cents, or you have amount in bank and amount in wallet.

And it's hard to like you probably wouldn't create another type around it. Like you could view money as a primitive at this point.

Again, I feel like just creating records is simpler. And in some cases, it's going to be more descriptive because, OK, I have a money argument.

But what kind of money does it represent? That, again, is lost. That is information that you don't have.

So a record is a pretty good solution for this, I think. For the representation within the opaque type to have a record?

No, for function arguments, for instance. So you have a function that displays the money that you have in your wallet and the money you have in your bank account.

I see. Then you have a record with moneyInWallet : Money and moneyInBankAccount : Money.

Like a presentational record. Or whatever else you need to do with it. But at least it's better than having money, money.

Right, right. Money, money, money. Must be funny.

Yeah, I think sometimes that can be sufficient. And it's a good tool to have in your toolkit.

And again, I think that concept of unwrap late applies there. Because it preserves the semantics of those field names.

Those field names do give some semantic information and that's useful context.
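A small sketch of the record-argument idea, assuming a hypothetical `Money` type that stores non-negative cents and a hypothetical `viewSummary` function. The field names follow the wallet and bank account example from the conversation.

```elm
module Summary exposing (Money, cents, viewSummary)

-- Hypothetical Money type, stored as non-negative cents.


type Money
    = Money Int


cents : Int -> Money
cents amount =
    Money amount


-- Unwrap late: the raw Int only appears here, at display time.
-- Assumes a non-negative amount.
format : Money -> String
format (Money amount) =
    "$"
        ++ String.fromInt (amount // 100)
        ++ "."
        ++ String.padLeft 2 '0' (String.fromInt (remainderBy 100 amount))


-- Both arguments are Money, so a new wrapper type would not tell them
-- apart. The field names carry the context that the types cannot.
viewSummary : { moneyInWallet : Money, moneyInBankAccount : Money } -> String
viewSummary { moneyInWallet, moneyInBankAccount } =
    "Wallet: " ++ format moneyInWallet ++ ", bank: " ++ format moneyInBankAccount
```

Compared with `viewSummary : Money -> Money -> String`, the call site has to name each value, which makes swapping them much harder to do by accident.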

Sometimes I think that tuples can be overused. There are certain things that tuples are quite nice for.

But I think having tuples and having unnamed custom type arguments.

Oh yeah, yeah. Again, a place where you would use records probably.

Yeah, exactly. Custom types work very nicely with records.

I mean, tuples are useful to easily group things up to a certain number of them.

Right. But then when that starts having semantic meaning, now that's concerning.

When it's like, you know, I don't know, you have Result.map2 Tuple.pair.

You're like saying, hey, I've got two results. I want to map them together.

And if things are OK on both of these two things and one of the things is a user and one of the things is their balance.

OK, great. Like, you know, you've got these clear concepts of what they are.

But if you're getting a tuple and the positions are the things that give you information and context of what it means,

anytime you're getting like context from remembering what things in different positions mean, that's a little bit scary.

Yeah. That you might lose context and mix things up. So you just give things names, you know.
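To make the contrast concrete, here is a sketch of combining two results first as a tuple and then as a record. The function names `pairUp` and `named` and the field names are assumptions; `Result.map2` and `Tuple.pair` are from `elm/core`.

```elm
module Combine exposing (named, pairUp)

-- Positional: fine for a short-lived pairing, but the meaning of
-- "first" and "second" lives only in the reader's head.


pairUp : Result error user -> Result error balance -> Result error ( user, balance )
pairUp userResult balanceResult =
    Result.map2 Tuple.pair userResult balanceResult


-- Named: the same combination, but the fields keep their meaning
-- wherever the value travels.
named :
    Result error user
    -> Result error balance
    -> Result error { user : user, balance : balance }
named userResult balanceResult =
    Result.map2
        (\user balance -> { user = user, balance = balance })
        userResult
        balanceResult
```

The tuple version is shorter, which is why it is tempting. The record version costs one small lambda and pays that back as soon as the value is passed more than one step away from where it was built.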

Sometimes one of the most important things that we do with refactoring is giving things names.

You know, like. But naming is hard, Dillon. Naming is hard. Naming is absolutely hard.

Yeah. I mean, sometimes just, you know, extract a little function or a constant and give something a name.

And that's very powerful. Sometimes turn, you know, a tuple into a record and give those values names.

And that it seems like it's not a meaningful difference, but it is because we're humans and we read the code and we infer context from it.

So it matters. I think we should have an honorable mention for a package that contains a lot of non-primitive values, which is elm-units from Ian Mackenzie.

Yeah, absolutely. You're not going to use them in all applications, but it contains a lot of new primitives, in a way, that use Elm core primitives under the hood

for like physics and space and distances and a lot of mathematical concepts like angles, energy, pixels, mass, volume, just naming a few of those.

So if you are dealing with any of those kinds of units, then it's probably good to take a look at this and then you don't have to do it all yourself.

Right. And as to that composability concern, it helps with that because for one thing, you can sort of share these common types that have this semantic meaning with these units.

It gives you operations to deal with them in ways that are meaningful for these sort of concepts of these types of units. And yeah, that's a great example of that.
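A brief sketch of what using `ianmackenzie/elm-units` can look like, assuming the package is installed. The `Trip` module and its function names are hypothetical; `Length.kilometers`, `Length.meters`, `Length.inKilometers`, and `Quantity.plus` come from the package.

```elm
module Trip exposing (exampleInKilometers, totalDistance)

import Length exposing (Length)
import Quantity


-- Both legs are Lengths, whatever unit they were created in.
-- Passing a Duration or a Mass here would be a compile error.
totalDistance : Length -> Length -> Length
totalDistance firstLeg secondLeg =
    Quantity.plus firstLeg secondLeg


-- Wrap early with an explicit unit, unwrap late with an explicit unit.
exampleInKilometers : Float
exampleInKilometers =
    totalDistance (Length.kilometers 5) (Length.meters 250)
        |> Length.inKilometers
```

The unit only has to be named where a raw number enters or leaves, so kilometers and meters can be mixed in between without manual conversion.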

So Jeremy Fairbank, he did a whole talk and a blog post series about solving the Boolean identity crisis. We haven't talked about that at all, almost.

No, we didn't. But it's a very well done talk and very much worth a watch. We'll link to that.

Definitely. I do feel like we have addressed some of his concerns in our episodes like Parse, Don't Validate and Opaque Types.

So I think with those two and this episode, we covered most of it. So go give those episodes a listen again if you forgot about those and his talk as well.

Yeah, we didn't really. He does talk about the concept of parse don't validate a lot. He doesn't refer to it by those terms, but he talks about sort of the idea of like checking for a thing and then getting the thing.

But you lose the context that you actually had the thing, which is parse don't validate. That's the term we hear most often these days.

Yeah, that term, at least the blog post by Lexi Lambda, was from 2019, and the talk is from 2017.

There you go. We can't blame him for the term in advance.

Yes, and that's a great point. Also, I think there's something to just having the context of what a thing means.

And he talks about replacing these sort of true and false with more meaningful terms with custom types.

And I think that's a great practice that, to be honest, I think I underuse. I think too often I'll say, you know, isProduction true or false instead of environment, Prod or Dev.

And I think I could definitely make use of that much more often.

I still feel like I'm using isRequired = True and I'm like, should that be a new type? I don't know.
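A minimal sketch of replacing an `isProduction` Boolean with a custom type. The `Environment` module, its variant names, and the example URLs are assumptions for illustration.

```elm
module Environment exposing (Environment(..), apiUrl)

-- Instead of `isProduction : Bool`, name the possibilities.
-- A call site then reads `apiUrl Production` rather than `apiUrl True`.


type Environment
    = Production
    | Development


-- Placeholder URLs, not real endpoints.
apiUrl : Environment -> String
apiUrl environment =
    case environment of
        Production ->
            "https://api.example.com"

        Development ->
            "http://localhost:3000"
```

If a `Staging` variant is added later, the compiler flags every `case` that needs updating, which a Boolean cannot do.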

Yeah, I think I don't reach for that as readily as I ought to.

I think it's worth trying out to put it like almost everywhere and then evaluate.

Yeah, I love the techniques. Like what I often say is, in Elm, you can try out a few techniques and then it's easy to refactor to remove those as long as you reevaluate early.

Yes, I think we earn our own trust when we do small refactoring steps and commit them often.

We lose trust in our likelihood that we'll ever do a refactoring when we only do big batch refactoring.

We have a giant refactoring branch that we don't merge for a month.

We stop trusting ourselves and with good reason when that's the only time refactoring happens.

So that means we aren't going to experiment as much because we know, I'm probably never going to refactor this.

So, you know, give yourself the luxury of doing small refactorings frequently and shipping them, doing small, safe refactoring steps, because that means you and your team are going to build up more trust that you actually will get to a refactoring.

And that allows you to experiment more.

Also, I gave a talk. I actually haven't really publicized this much or shared the links, but I'll drop a link in the show notes. I gave a talk called Using Semantic Types to Squash Bugs.

And I talk about this concept of wrap early, unwrap late.

I walk through these sort of refactoring steps of how to apply that mechanical refactoring we talked about and how to use it to fix a bug.

And yeah, so I think if those concepts are interesting, check that out.

Is it the one where you talk about social security IDs?

It is. Yes, exactly. And I'll drop a link to a blog post I have, too, about using sort of entry gatekeepers to check those constraints with opaque types.

Well, I think we've covered primitive obsession pretty well.

I hope you liked this episode on tiny steps. I'm sorry. I mean, opaque types. I'm sorry. I mean, primitive obsession.

We are a broken record type alias.

Let's go to the outro, please.

All right, Jeroen, until next time. Until next time.