elm radio
Tune in to the tools and techniques in the Elm ecosystem.
The Root Cause of False Positives
We explore false positives and negatives in static analysis tools, and how Elm helps us avoid them.
Published
August 15, 2022
Episode
#63
Jeroen's tweet on the root cause of false positives
Epistemology
Isabelle and TLA+ (proof systems)
Precautionary principle
array-callback-return ESLint rule
elm-review's ModuleNameLookupTable
Jeroen's Safe unsafe operations in Elm blog post
Jeroen's Lambda Days 2022 talk (video not yet published)
Transcript
[00:00:00]
Hello Jeroen. Hello Dillon. I'm quite positive you're going to enjoy this episode. But I
[00:00:08]
could be wrong. Maybe it's a false positive. Maybe it's a false positive. That's a good
[00:00:15]
one. What are we talking about today? Today let's talk about false positives. Let's talk
[00:00:21]
about how Elm removes kinds of false positives in at least the area that I care about, static
[00:00:28]
analysis. But I think we can find ways how that applies to other things like optimizations,
[00:00:35]
stuff like that. Oh, okay. I like this. So let's start with the definition. What is a
[00:00:40]
false positive? This one is always kind of tricky. Like which one is the positive and
[00:00:44]
which one is the negative? Yeah, because you got false positives, you got false negatives.
[00:00:49]
So yeah, a false positive is when, at least for a linter or for a tool like Elm review
[00:00:55]
or for a compiler, is when the tool reports a problem when it should not. Like it tells
[00:01:03]
you, hey, there's a problem here and actually there's no problem. That is a false positive.
[00:01:07]
So a false positive means that we think there's something and there isn't. And then you've got false
[00:01:14]
negatives, which are when the tool should report a problem, but it doesn't. And then you've
[00:01:20]
got true positives and true negatives, which are like real errors and real non errors,
[00:01:26]
which yeah, maybe those names are weird. I actually don't know if there are names for
[00:01:31]
those probably. Yeah, it seems reasonable. If you take a COVID test and it comes back
[00:01:37]
positive, it could be a false positive, which means the test said you had COVID, but you
[00:01:42]
don't have COVID. It could be a false negative, which means the test said you do not have
[00:01:47]
COVID, but you do have COVID, which in fact is quite common, which kind of makes you think
[00:01:52]
like how much value is there to testing when there's actually like, from what I understand,
[00:01:56]
a fairly high rate of both of those. And it just like makes you like question the whole,
[00:02:05]
how can I make decisions based on this information, which I know is flawed quite often. So it's
[00:02:12]
a strange situation. The same dynamic applies in static analysis. If you can't rely on the
[00:02:18]
results you're getting, it makes you sort of lose faith in what the tool is telling
[00:02:23]
you. Yeah. Well, there's a difference between linters and medicine. There is. Although sometimes,
[00:02:32]
you know, you rely on the guarantees you make in your code for medical procedures and when
[00:02:37]
people's health is on the line. So there can be an overlap. Yeah, absolutely. So yeah,
[00:02:42]
what is in your opinion, the root cause of false positives? Well, I did see a tweet this
[00:02:49]
morning that I was thinking maybe should have been marked with spoiler warning, at least
[00:02:54]
for Elm Radio co-hosts. I regret posting it. Let me try to forget I ever saw that and see
[00:03:04]
if I can answer the question without prior knowledge. Please forget the correct answer.
[00:03:11]
Which was a very good tweet, by the way. We will link to that tweet. I will explain it,
[00:03:16]
I guess. It's hard not to be biased by the nice tweet you wrote, but I would tend to
[00:03:23]
think of it as like maybe complexity. Like how complex is an answer. Like if you're looking
[00:03:33]
at a chess position and you want to say, is there a forced checkmate in this chess position?
[00:03:40]
It's easier to, you know, if you can confirm that there is, you know that there is. But
[00:03:46]
if you can't, maybe there is one, but it's so complex. There's just so much complexity
[00:03:52]
to the position that there are nearly infinite possibilities. So it's very hard for you to
[00:03:58]
say. Like if you look at an opening chess position, is there like a forced checkmate?
[00:04:05]
Like maybe, I don't know, if I had infinite capacity to process infinite lines, then maybe
[00:04:13]
I could answer that. But it's just too complex for me to say yes or no.
[00:04:18]
Yeah, it's really easy to figure out whether there's a forced checkmate if you have two
[00:04:25]
pieces on the board and they can both only move, do one move, then it's really easy.
[00:04:32]
Like you got two or three, four things to check, depending on how things work. And that's
[00:04:38]
it. But if you got 10, 20 pieces on the board and they can all move in so many directions,
[00:04:46]
yeah, things become a lot harder. You have a lot more checks to do. You have a lot more
[00:04:51]
scenarios to evaluate.
[00:04:54]
Complexity is in my opinion, for sure, a cause of false positives.
[00:04:59]
Yeah. And if you can rule things out and if you can eliminate variables, that helps eliminate
[00:05:07]
that. As you said, that's a very good example: if you just have four pieces on the board
[00:05:12]
in a chess game, and can you answer that question? At that point, you can answer that with confidence.
[00:05:18]
Sometimes it's easier to prove something with an existence proof. And an existence
[00:05:24]
proof is easy, I mean, if you can find the example, but sometimes proving absence is difficult.
[00:05:31]
You could prove existence like, do black birds exist? Well, I can look out my window and
[00:05:40]
see a crow and say, yes, there are black birds, but then do...
[00:05:43]
Wrong, it's a dinosaur.
[00:05:45]
Well, then perhaps. And I mean, yeah, it does get quite philosophical. Like what can you
[00:05:52]
know? In a way, this is sort of like the term epistemology comes to mind, which is just
[00:05:57]
sort of what is knowable. And I often found it frustrating in philosophy classes when
[00:06:08]
epistemology comes up. I always found that topic extremely frustrating in a philosophical
[00:06:12]
context because it's kind of a dead end because you say what can be known? And at the end
[00:06:19]
of the day, you basically get to Descartes' conclusion: I think, therefore I am, which
[00:06:25]
is to say that is the only thing that I can prove. The only thing that's knowable because
[00:06:30]
my senses can mislead me. I can see, have I ever seen something and come to the wrong
[00:06:38]
conclusion about what it was and later discovered that I was wrong? So how can we trust anything
[00:06:44]
that we think we know? What is knowable? What is the nature of knowing and what is knowable
[00:06:49]
in the universe? And it's just to me, it's just not a satisfying subject in philosophy
[00:06:54]
because it's just like, well, nothing except that we exist. And so, okay, done. Like what
[00:07:01]
more can we do with that topic except go around in circles and we don't really get anywhere.
[00:07:06]
But with code, what is knowable? With static analysis, what is knowable? That's kind of
[00:07:11]
a more satisfying question because it's a more constrained area where we're looking
[00:07:16]
at. It's not just like questioning, can we trust these axioms about the universe? We
[00:07:23]
can sort of just think about it in terms of what can we know about this code? So yeah,
[00:07:28]
you want to give us your answer to that question. What is the root cause of false positives?
[00:07:33]
Well, the one that I came up with, and maybe we can figure out something even deeper than
[00:07:38]
that, but what I came down to is missing information just in general. Some of that through complexity,
[00:07:46]
some of that through other things, other means. But basically the way that I imagine it is
[00:07:51]
if you were an omniscient being or an omniscient tool, you had knowledge of everything, you
[00:07:57]
knew everything that happened in your program at runtime, at compile time, and you knew
[00:08:04]
what the developer's intent was, then you would always be right in what you report and
[00:08:11]
you would never miss anything. You would know, well, this is never used or this never happens
[00:08:17]
or this is not a problem or this is a problem. If you had all the information in the world,
[00:08:22]
then I think you would never be wrong. Therefore, the problem is missing information and you
[00:08:29]
can get those through different means. So for instance, for static analysis tool, one
[00:08:37]
of the bigger problems is dynamic code. So knowing what a value is at a certain point
[00:08:44]
is hard to figure out. Some tools can do it quite well. TypeScript does it quite well
[00:08:49]
and to some extent, much better than Elm. Elm knows the types, but it doesn't know the
[00:08:55]
values nor does it care about them. Languages that try to do proofs like Isabelle or TLA+,
[00:09:05]
I think they know a lot more about what's going on in the program, but they're also
[00:09:10]
pretty complex and I don't know how they work. I don't know what their limitations are. Dynamic
[00:09:15]
things are complex, for instance. So when you have missing information, what do you
[00:09:20]
do? Because that's going to happen, right?
[00:09:22]
And perhaps you want to use the precautionary principle and say, I don't know if there's
[00:09:28]
a problem, so I will say, I can't prove that everything is okay, therefore, I'm going to
[00:09:37]
say there's potentially a problem because I can't prove that there's not a problem.
[00:09:41]
I guess this or the other approach I can think of is you could say, well, I didn't find a
[00:09:47]
problem, so no problems, right? That's sort of like what TypeScript does, right? It says,
[00:09:53]
I will tell you if I can guarantee that there is a problem rather than I will tell you if
[00:09:58]
I cannot guarantee there are no problems.
[00:10:02]
Yeah. So to me, that is making a presumption. So I might be wrong about the correct word,
[00:10:09]
but a presumption is when you accept something as true on the basis of probabilities. Like
[00:10:16]
it's very likely, I'm missing some information. I don't know whether this is true or not,
[00:10:21]
but I think it's going to be more likely to be true than false. So the example that I'd
[00:10:27]
like to take is an ESLint rule that is called array-callback-return, which is basically
[00:10:35]
when you do array.map and you pass it a function, that function should always return something.
[00:10:40]
Right. So you don't accidentally have a list of... say you're writing TypeScript: you don't
[00:10:46]
write return and so it's returning void, which I think has happened to everybody that goes
[00:10:52]
between like JavaScript and Elm that you forget a return and you're like, why is this value
[00:10:58]
all nulls or undefined?
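To make that concrete, here is a minimal JavaScript sketch (not from the episode) of the mistake the array-callback-return rule catches:

```javascript
// A block-bodied callback passed to Array.prototype.map that never
// `return`s silently produces an array full of `undefined`.
const prices = [1, 2, 3];

// Bug: block body, no return statement.
const doubledBuggy = prices.map((price) => {
  price * 2;
});

// Fix: return the value (or use an expression body).
const doubledFixed = prices.map((price) => price * 2);

console.log(doubledBuggy); // [ undefined, undefined, undefined ]
console.log(doubledFixed); // [ 2, 4, 6 ]
```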
[00:11:00]
Yeah, undefined. Yeah. So it's a very useful rule to have. But the thing is when you analyze
[00:11:06]
JavaScript code, there is something that is missing and that is type information. So TypeScript
[00:11:11]
could potentially help here, but basically when you see array.map, array being a variable
[00:11:18]
or something, it looks like an array or the map method of an array. And therefore you're
[00:11:23]
going to consider, you're going to presume, well, it's pretty much for sure the array.map
[00:11:30]
method. So I'm going to report any problems that I find in the function that is
[00:11:36]
passed to it, but that might be wrong. So when you have missing
[00:11:41]
information, you're going to make presumptions and when they turn out to be wrong, that's
[00:11:47]
when you have a false positive or false negative. Because what we could also do is say, do some
[00:11:52]
more analysis, like is this really an array? Can we find somewhere where it is declared
[00:11:59]
where we can clearly see that this is an array? And if we see that it's an array, then we
[00:12:05]
report a problem. And if we don't see that it's an array, if we don't know, then we don't
[00:12:10]
report anything. And that removes all the false positives that we have, but that creates
[00:12:14]
a lot of false negatives. So whenever you need to make presumptions, you're going to
[00:12:20]
have the choice to lean more towards a false positive or false negative, but you're going
[00:12:25]
to have to choose or do more analysis, which can be complex and maybe you're not going
[00:12:30]
to be able to figure out the answer.
[00:12:32]
Yeah. And it strikes me that this doing more analysis piece, that there are maybe two different
[00:12:39]
categories here that either something is knowable, but takes a lot of work to know potentially
[00:12:49]
to the degree where it's essentially unknowable because it's infinite. Like is there a forced
[00:12:54]
checkmate from the first move on a chessboard? Like technically knowable, but practically
[00:13:00]
unknowable or even if you knew, maybe the sequence of moves that would lead you there
[00:13:07]
is so large that it's not usable information.
[00:13:11]
You mean it's not, you can't compute it?
[00:13:14]
Or if you compute it, it's like, yeah, here's a list of size 30 trillion of all the possible
[00:13:21]
responses to these different lines that gives you guaranteed ways to respond with a forced
[00:13:26]
checkmate and it's like, okay, well I can't really use that. So even if it's technically
[00:13:30]
knowable, it's essentially as the amount of analysis and information to deduce if something
[00:13:37]
is knowable approaches infinity, it starts to resemble being unknowable. But there's
[00:13:47]
like unknowable and then there's knowable with work. Those are two different things.
[00:13:51]
So like an example of something that is literally unknowable would be like, if you take eval
[00:13:59]
into account, if you run eval, what can this code do? Well, if it's from user input, that
[00:14:06]
user input is undefined. You don't know what the bounds are; there are no bounds on it.
[00:14:11]
So well, it's not undefined, it's a string.
[00:14:14]
Right. Or who knows?
[00:14:17]
Maybe it's the string "undefined".
[00:14:20]
Yeah, it's not known. And therefore there's not enough information there to analyze certain
[00:14:28]
things about that. Whereas there are certain scenarios where you often talk about code
[00:14:34]
flow analysis and maybe it's like a massive amount of work. Maybe you need some postdoc
[00:14:43]
programming language researchers to assemble a team to solve this problem, but it's technically
[00:14:50]
knowable and you could do that or it's just a huge amount of work.
[00:14:55]
If you, listener, want to do that, please contact me.
[00:14:59]
And Elm is a very interesting space for these problems because as you've sort of been hinting
[00:15:04]
at, it is more knowable because it's more constrained. Like in our chess analogy, it's more akin
[00:15:11]
to the chessboard that just has a handful of pieces rather than the starting chess position.
[00:15:16]
Yeah. So you're now hinting at something interesting, which is, why is Elm more knowable, more analyzable
[00:15:23]
than other languages like JavaScript? And to me, there are multiple aspects to that.
[00:15:30]
One of which is that there's a compiler. Doesn't seem like a big thing, but it actually is
[00:15:35]
because potentially when you analyze your JavaScript code, it's a bunch of gibberish,
[00:15:40]
like saying A equals A or one equals two or well, actually not. Yeah. But basically what
[00:15:48]
you can have is a code that looks like code, but doesn't mean anything like it references
[00:15:55]
undefined variables or it has invalid semantics.
[00:16:02]
It's syntactically valid, but not well defined code.
[00:16:05]
Yeah, exactly. And those are all things that a compiler checks for. So when you know that
[00:16:11]
these are checked for you by the compiler, by a compiler, then you can start to rely
[00:16:17]
on them. And that is quite important actually. So JavaScript doesn't have a compiler, but
[00:16:22]
what it does, what it does have is a linter. So what ends up happening for a language like
[00:16:29]
JavaScript is that you have a lot of ESLint rules to do the same work that a compiler would
[00:16:36]
do. So you have a rule for undefined references. You have a rule for reporting duplicate declarations,
[00:16:46]
stuff like that. And once you have those, then other rules can kind of depend on those
[00:16:54]
semantic issues being not there. But it is like they can only kind of rely on those, because
[00:17:02]
people can disable the ESLint errors or people can just not enable those. And the fact that
[00:17:11]
we kind of need those rules is a reason for using the recommended configuration
[00:17:17]
for ESLint or other tools. If the tool doesn't ship with it, then they're not going to be
[00:17:22]
enforced and other rules will not be able to depend on those. And that is actually something
[00:17:27]
that we don't need with Elm, because the compiler checks for so many things that
[00:17:33]
Elm review rules don't need any more certainty. They can just rely on the things that the
[00:17:38]
compiler checks for and that's enough for pretty much anything.
[00:17:42]
Right. They might have like a snowball effect where by applying different rules and applying
[00:17:50]
fixes to those rules, you can eliminate more dead code because making one piece of dead
[00:17:59]
code go away makes another piece of dead code go away and there's this snowball effect.
[00:18:03]
But as you say, the language guarantees are enough that you're not depending on, I need
[00:18:10]
this guarantee in order to make my checks, therefore you have to turn on these rules
[00:18:15]
as prerequisites. I mean, you could imagine scenarios like that, but I guess you haven't
[00:18:20]
encountered them yet.
[00:18:22]
Yeah, I haven't yet. But yeah, for instance, if you were trying to evaluate an expression
[00:18:28]
and you saw a reference to a variable and you didn't have the guarantee because you
[00:18:32]
were in JavaScript that that variable was actually declared anywhere, then potentially
[00:18:38]
you would enter a weird state or you would crash because, oh, well, I expected this to
[00:18:44]
be in the scope somewhere. So it's really nice not to have to be defensive about those
[00:18:50]
things. So therefore a compiler really helps with all those things.
[00:18:55]
Right. I could imagine like some rules around divide by zero or not a number or something
[00:19:05]
like that. And you could say, well, there are certain entry points where you could get
[00:19:11]
a number from a port from a JSON decoder. And at those terminal points, maybe you have
[00:19:19]
an Elm review rule that checks that you need to unwrap them into safe types that cannot be
[00:19:19]
NaN. And then that rule could be a prerequisite for another rule that, assuming that none of
[00:19:34]
the number inputs that you're using are NaN to begin with, you're dividing them
[00:19:38]
in a way that you're checking for divide by zero and things like that. And you're going
[00:19:44]
to have well defined values.
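A small Elm sketch of that boundary-checking idea (the type and function names here are hypothetical, just for illustration, not from any published package):

```elm
-- Hypothetical wrapper: raw Floats from ports or JSON decoders are
-- validated once at the boundary, so the rest of the program never
-- sees NaN or infinity.
type SafeFloat
    = SafeFloat Float


fromRaw : Float -> Maybe SafeFloat
fromRaw raw =
    if isNaN raw || isInfinite raw then
        Nothing

    else
        Just (SafeFloat raw)


-- Division that rules out divide-by-zero instead of producing NaN or
-- Infinity downstream.
safeDivide : SafeFloat -> SafeFloat -> Maybe SafeFloat
safeDivide (SafeFloat x) (SafeFloat y) =
    if y == 0 then
        Nothing

    else
        Just (SafeFloat (x / y))
```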
[00:19:46]
Yeah, I could imagine that as well. At work, we have a rule that detects unused CSS classes.
[00:19:54]
So what we do is we take our CSS files and we extract all the classes from those and
[00:20:00]
we turn them into an Elm file that our Elm review configuration then uses. And then we
[00:20:06]
just go through the entire files and find out the ones that are used and report the
[00:20:11]
ones that are left. But to be able to tell that, we also have another rule that checks
[00:20:17]
for any usages of the class function that are too dynamic, that are too hard for Elm
[00:20:24]
review to tell. So they kind of depend on each other. I actually don't remember whether
[00:20:29]
we merged them into one rule. But as long as you don't make anything depend on the other
[00:20:35]
one, like a fix, like imagine you have a fix that you want to apply, that should probably
[00:20:42]
not depend on information that has not been validated before. And because in Elm review
[00:20:48]
fixes take the upper hand, are prioritized compared to non-fix errors, at least in fix mode,
[00:20:57]
that can be kind of dangerous. But yeah, at least the number of guarantees that we have,
[00:21:01]
the number of presumptions that we need to make in Elm, or at least in Elm review, is a
[00:21:06]
lot lower than what you would do in ESLint. So this hasn't been a problem really, so far,
[00:21:13]
in my experience, it could be, but I'm guessing it would be for things that are a lot more
[00:21:19]
precise than what we're currently doing.
[00:21:23]
And this seems like what you're talking about with checking for class names that are too
[00:21:29]
dynamic for you to basically effectively analyze. Because you could imagine pulling on that
[00:21:35]
thread more and more and saying, well, what if it's just an inline concatenation between
[00:21:42]
two string values? Could we just check those two literal concatenated string values? And
[00:21:48]
is that literal enough for us to use? And you say, okay, well, now that we're checking
[00:21:52]
for concatenated string literals, why don't we add something that says, well, what if
[00:21:58]
it's a string constant that's concatenated to another one? And maybe that's quite useful
[00:22:02]
because you want to be a little more dynamic with your class name. So and then you say,
[00:22:07]
well, what if we want to add a number to it? Can we and then we want to be able to do arithmetic
[00:22:14]
on those numbers, or we want to be able to map over a list of numbers and then check
[00:22:19]
those values. And eventually, you're just building like a pre-compilation evaluator
[00:22:26]
that's actually evaluating your program before compile time. And you certainly can do those
[00:22:35]
things. But you're intentionally choosing a strategy there to preemptively give a false
[00:22:41]
positive, or to put a constraint on the rule. I mean, I guess another
[00:22:47]
way to look at it is, rather than a false positive, you're saying it's not a false positive,
[00:22:52]
it's just a constraint. Rather than the rule saying, this could be a false positive,
[00:22:57]
like, hey, this could actually be valid, you're actually saying, here's
[00:23:02]
a rule that adds an additional constraint to your code. And it's not a false positive.
[00:23:06]
It's a true positive: this is not okay, you have to follow this constraint where you only
[00:23:11]
use string literals for class names.
[00:23:13]
Yeah, yeah, as you said, like, if you want to figure out a lot more things, basically,
[00:23:20]
at some point, you're building an interpreter, as I see it, which would definitely be valuable
[00:23:27]
to be able to infer a lot more things.
[00:23:29]
And in Elm, you can do a lot in that regard.
[00:23:32]
I think you can do a lot. Yeah, because it's just pure functions, right? So it's none of
[00:23:38]
them have side effects that make the next things a lot easier. But it would still be
[00:23:42]
a lot of work and would make the tool a lot slower, I think. And for the rule that
[00:23:48]
doesn't report false positives, but reports things that it wants to enforce new constraints,
[00:23:54]
you're absolutely right. Where I would say that it switches from a false positive to
[00:23:59]
a constraint is in the error message. Like if the error message actually explains like,
[00:24:06]
hey, this is not a problem in the sense that it's not going to cause your code to crash
[00:24:12]
or behave weirdly. But for the sake of this other Elm review rule that makes sure that
[00:24:19]
we don't have any unused CSS classes, we require that the argument to class
[00:24:26]
is a static string, a string literal. And that's what we did. So if you explain the
[00:24:33]
problem and if you explain the benefits, then people accept it. Now, I haven't heard anyone
[00:24:41]
complain about this, so I'm very happy about that. But if you have like a one-liner,
[00:24:45]
like in most static analysis tools, that's going to be hard to explain. Like, what
[00:24:50]
is the problem? How to move forward? Why is this a real problem? Like, yeah, people
[00:24:57]
want to understand the problems that you're reporting.
[00:25:00]
So Elm review does, I mean, obviously abstract some things away from the user. In this case,
[00:25:09]
like a review rule author. Like, for example, like you do provide this lookup table, the
[00:25:17]
module name lookup table is sort of somewhat going down this path of being able to provide
[00:25:24]
more information about, you know, like, and it's a very, to me, it's a very interesting
[00:25:29]
path in Elm review. And I'm curious, like, are there any other examples where you sort
[00:25:36]
of do some amount of additional analysis of the code where you can sort of process some
[00:25:42]
information? Not, you know, not a full-on interpreter, but doing a little more analysis.
[00:25:49]
Are there other examples of that in Elm review? And are there things on your mind that you
[00:25:54]
think might be appropriate for Elm review to expose?
[00:25:57]
Yeah. So just to explain the lookup table that you mentioned, the module name lookup
[00:26:04]
table is just basically a dictionary that says at this location in the source code,
[00:26:09]
this is a reference to this value, which comes from this import from this module. Because
[00:26:15]
in the abstract syntax tree, when you say A.b, you reference the b function
[00:26:23]
or the B type, depending on the casing, of the A module.
[00:26:28]
Right, because Html.text could just be text, or it could be import Html as H, and
[00:26:36]
then it's H.text.
[00:26:38]
Yeah, or import Html.Styled as Html, stuff like that. So the module name lookup
[00:26:45]
table is there to make it much easier for you to figure out what is the real original
[00:26:52]
module that this value comes from, which we didn't have at the beginning.
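As a rough sketch of how a rule author might use it (based on elm-review's Review.ModuleNameLookupTable API; treat the exact signatures as approximate):

```elm
import Elm.Syntax.Expression exposing (Expression(..))
import Elm.Syntax.Node as Node exposing (Node)
import Review.ModuleNameLookupTable as ModuleNameLookupTable exposing (ModuleNameLookupTable)


-- True when this expression is a reference to Html.text, whether it was
-- written as `Html.text`, `H.text` (through an alias), or just `text`
-- (through `exposing`). Without the lookup table, a rule author would
-- have to re-derive this from the imports by hand.
isHtmlText : ModuleNameLookupTable -> Node Expression -> Bool
isHtmlText lookupTable node =
    case Node.value node of
        FunctionOrValue _ "text" ->
            ModuleNameLookupTable.moduleNameFor lookupTable node
                == Just [ "Html" ]

        _ ->
            False
```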
[00:26:56]
People probably invented that from scratch or a sort of imperfect version of that, I
[00:27:03]
would imagine.
[00:27:04]
Yeah, basically people were like, do I see an import of Html? If so, what is the alias?
[00:27:11]
And also, does it expose the text function literally or using exposing (..)? And basically,
[00:27:18]
people did it like that way, which in practice is good enough. I don't think you're gonna
[00:27:24]
have a lot of false positives, but it's a lot of work. And you could have some false
[00:27:29]
negatives potentially. So yeah, this was something that I really wanted to add to Elm Review
[00:27:35]
and I got it in there. And now I don't think about this sort of problem, which is really
[00:27:40]
nice. But it did require a few iterations to get right.
[00:27:45]
Yeah, it's basically something that you pass into the context, right?
[00:27:51]
Yeah, the way that you initialize your context, basically your model for going through the
[00:27:58]
AST, you say, hey, I'm interested in having the lookup table because I think that's gonna
[00:28:04]
be useful. Please compute it for me and give it to me.
[00:28:08]
And then people can use it.
[00:28:09]
Which is largely like a performance optimization, right? If it's not needed, then you don't
[00:28:13]
need to compute it.
[00:28:14]
Yes, exactly. Currently, if any rule requires it, then I compute it. I think I want to be
[00:28:21]
smarter in the future, where only if any of the rules that I'm going to run now
[00:28:28]
need it, then I compute it. Because basically, the fix mode is quite slow. And I think I'm
[00:28:34]
going to need to be able to cut up the review phase, running one rule at a time, and be
[00:28:40]
able to stop whenever I find like a fix.
[00:28:44]
Elm is very interesting in that regard because certain times you need to compute certain
[00:28:52]
things upfront in a sort of framework design because the user can't just invoke a method
[00:28:59]
that then mutates some dictionary somewhere and then suddenly it doesn't have that value,
[00:29:05]
but it goes and performs a side effect and puts it in there and memoizes it. So you sort
[00:29:10]
of architect things differently in Elm.
[00:29:13]
Yeah, and also because we can't memoize it. So either we say, well, we're going to compute
[00:29:19]
it once and then people will use it zero to n times, or we're going to compute it lazily
[00:29:26]
and then it will be computed as many times as people require it, which is unfortunate.
[00:29:32]
In practice, it probably works out okay a lot of the time, especially in the context
[00:29:37]
of a browser application. Maybe for CLI applications, it's a little bit different.
[00:29:44]
For performance heavy tools, yeah.
[00:29:47]
So are there more cases where you've considered adding these types of things that provide
[00:29:53]
more sort of, you know, rather than just the abstract syntax tree, the syntax tree with
[00:29:58]
a little processing, with a little extra analysis performed for you that you can access through
[00:30:05]
the Elm review platform?
[00:30:06]
Yeah, I actually just added one this morning, this weekend, basically the module documentation.
[00:30:15]
So that's the {-| ... -} comment that you have at the beginning of your file
[00:30:21]
before the imports. That is the module documentation. And currently Elm syntax, the AST library
[00:30:27]
that we use doesn't have a way to store that as the documentation of the module. It's just
[00:30:34]
among the comments. So what I had to do in a bunch of rules is to go through the comments,
[00:30:41]
find the module documentation, which I just learned this weekend that there was room for
[00:30:48]
false positives because ports also have that problem. Like the documentation for a port is
[00:30:55]
also not attached to the port. Whereas documentation for a function or for a type is properly attached.
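For reference, a sketch of what that ambiguity looks like (both comments have the same {-| ... -} shape, and the AST only keeps the port's comment in the plain comments list):

```elm
port module Ports exposing (save)

{-| This is the module documentation: it sits after the module line
and before any imports.
-}


{-| This comment documents the port below, but since it is not attached
to the port in the AST, it can be confused with module documentation.
-}
port save : String -> Cmd msg
```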
[00:31:02]
So yeah, basically it was possible to confuse the documentation for port as the module documentation.
[00:31:10]
So yeah, that's, it's not super tricky, but it's not nice to have to compute it everywhere.
[00:31:17]
And in all of my implementations it was potentially broken. So I just made it, I added a new visitor
[00:31:26]
or a new context creator function to be able to have that information right away, basically.
[00:31:33]
And so I'm going to publish that in the next version. And the other big one is type information.
[00:31:38]
Yeah. So it is really surprising that Elm review works so well for a typed language,
[00:31:45]
considering we don't have type information. There are two ways that we can do that. One
[00:31:50]
of them is by invoking the compiler, which has a few problems, notably that you can't
[00:31:57]
invoke the compiler in tests. So I'd probably have to write a separate testing framework
[00:32:02]
for Elm review rules where it would create files, run the compiler thousands of times,
[00:32:08]
because I have thousands of tests. And also the whole review process is one giant pure
[00:32:15]
function currently. And if I had to ask for the type information, then I would have to
[00:32:23]
break out of that somehow, especially in the fix all mode, it would be very messy in practice.
[00:32:33]
So the other method is to do the type inference ourselves, which I've tried a few times so
[00:32:39]
far and only got so far.
[00:32:43]
It's a somewhat challenging problem, I would imagine.
[00:32:47]
Yeah. I think you need to know how things work, how the algorithm that Elm uses works
[00:32:54]
so that you get the same results, because there are some edge cases where it can have
[00:33:00]
some differences. But basically you need to know the theory well in order to do a nice
[00:33:06]
implementation. And I've never understood that algorithm properly.
[00:33:11]
Yeah. It's a huge task.
[00:33:13]
I know someone who's working on this on and off. I don't want to put pressure on them.
[00:33:19]
Someone who has a very common name, it would seem.
[00:33:24]
Maybe yes.
[00:33:25]
We know you're listening.
[00:33:28]
Yeah. But yeah, that would be really nice. A few applications of that would be, for instance,
[00:33:39]
the no missing type annotation rule that could generate the missing type annotations.
[00:33:46]
That would be so nice.
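A sketch of what such a fix might do (hypothetical function):

```elm
-- Before: the rule reports that `add` is missing a type annotation.
add a b =
    a + b

-- After: with type inference available, the fix could insert the
-- inferred signature automatically.
add : number -> number -> number
add a b =
    a + b
```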
[00:33:47]
Yeah. So we already have that in the editors. So we know that could work well. It doesn't
[00:33:54]
always give the nicest error, the nicest type annotations. It could still be helpful.
[00:34:00]
But that would unlock more information. You said that removing false positives, it comes
[00:34:06]
down to needing more information. If you had unlimited information, you could remove all
[00:34:12]
false positives. So what are the areas that you could remove false positives with that
[00:34:18]
extra information?
[00:34:19]
Well, it's not necessarily false positives. It's false positives and false negatives because
[00:34:24]
you would be able to know more and therefore you would be able to report more. In Elm Review,
[00:34:31]
there are basically no false positives. So I'm not sure it would help with much. I know
[00:34:38]
of one location where it could potentially help, where we do have a false positive that
[00:34:43]
people report sometimes, which is the no unused custom type constructor args rule. It's a mouthful.
[00:34:54]
Basically, you can create a custom type where you say type A = A Int, or type Id =
[00:35:02]
Id Int, for instance. And then you never extract that identifier, that wrapped value. So the
[00:35:11]
rule reports that as not used. But potentially you could use that in a comparison. Like,
[00:35:20]
is this ID the same one that this one has? And if you use it that way, and never extract
[00:35:27]
the ID in another way, then there's a false positive. So that type information could potentially
[00:35:34]
tell us like, hey, in this comparison, is there a usage of this type? If so, don't report
[00:35:41]
that type. So that's a false positive that we could remove. And then it's mostly going
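The false positive described here can be sketched like this (hypothetical module):

```elm
module User exposing (Id(..), sameUser)

type Id
    = Id Int

sameUser : Id -> Id -> Bool
sameUser a b =
    -- The Int inside Id is never extracted through pattern matching,
    -- so the rule reports it as unused. But the == comparison does
    -- rely on it, making the report a false positive.
    a == b
```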
[00:35:44]
to be about false negatives because there's a bunch of rules that we can't write without
[00:35:51]
that type of information. And well, I don't have that many in mind, but a few, like for
[00:35:56]
instance, the one that I really want and that some people want is reporting unused record
[00:36:02]
fields. That can get quite tricky to do right. Basically we can do it, it's
[00:36:11]
just going to have a lot of false negatives.
[00:36:15]
false negatives or false positives when you don't have information. Right. So basically
[00:36:19]
what we can do is, well, if we see that a function takes an extensible record as an argument
[00:36:27]
and some of those fields are not used, then we can remove those. And I actually already
[00:36:33]
have a prototype of that working, but if you pass that argument to a List.map function,
[00:36:42]
for instance, so you have a list of some records and you pass that to a List.map. Well, now
[00:36:48]
you need to figure out what is the type of that mapper function that you pass to the
[00:36:54]
List.map, because if that one uses some of the fields, then those fields are used; if it
[00:37:00]
doesn't, then they're not used. But if you don't know the type, well, you don't know
[00:37:05]
which fields are used and which ones are unused. So therefore,
[00:37:12]
if we want to be safe and not report false positives, we're just going to say, well, it
[00:37:16]
looks like it could use anything. So we're not going to report anything. And that's the
[00:37:20]
same thing for a model. Like you pass your model, which is usually a record with plenty
[00:37:27]
of fields, you pass a model to some function that is a lambda that is hard to evaluate.
[00:37:33]
Therefore, we can't tell anything about it. So we stop. So having type information here
[00:37:38]
would be very helpful because we could analyze the type of those functions and we
[00:37:45]
could see, well, it seems to be using this field, this field, and that's it.
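A sketch of the two situations (the function and field names are invented; `unknownMapper` stands in for a function whose type has not been inferred):

```elm
-- The extensible record type tells us exactly which fields this
-- function can touch, so an unused `age` field could be reported:
viewName : { r | name : String, age : Int } -> String
viewName user =
    user.name

-- But once the records are handed to a function whose type we have
-- not inferred, we can no longer tell which fields it reads, so to
-- avoid false positives the rule reports nothing:
viewAll : List { name : String, age : Int } -> List String
viewAll users =
    -- unknownMapper is defined elsewhere; its type is not locally known
    List.map unknownMapper users
```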
[00:37:51]
Yeah. It seems like that would unlock a lot of possibilities, not to mention fixes that
[00:37:57]
could, you know, I mean, code generation fixes, all sorts of ideas you could find there.
[00:38:04]
Yeah. I can imagine we will still have plenty of false negatives, but I think we will be
[00:38:09]
able to catch all false positives or we would not have false positives, but that's yeah,
[00:38:15]
again, like how conservative we want to be about things being used or unused. Cause we
[00:38:22]
could go either way. We could potentially have a configuration for the rule that says try
[00:38:26]
to be more aggressive, just for a while. And then you go check the reported errors
[00:38:31]
and maybe you can remove a few
[00:38:35]
things. Maybe you don't, but yeah.
[00:38:37]
Right.
[00:38:38]
But yeah, in general, we want to be very conservative and not report any false positives because
[00:38:42]
those are super annoying.
[00:38:45]
Yeah. So it seems like, I'm not sure if this falls into the same groups you've mentioned
[00:38:51]
of choosing to err towards false positives or err towards false negatives. But when we're
[00:38:58]
talking about ways to work with less information, you don't have as much information as you
[00:39:04]
need to be 100% sure of something that you're checking for. Well, like if we look at the
[00:39:13]
chess example again, you know, what do you do in that situation? If you, if you can concretely
[00:39:18]
determine it, then it's, then it's easy enough. If you can't, then you end up, you know, what
[00:39:24]
do you do for an opening chess move? You tend to rely on strategies and heuristics. So heuristic
[00:39:31]
for, you know, determining whether a chess move is good is you want your pawns to be
[00:39:37]
supporting each other. You want to try to take the opponent's queen if you
[00:39:43]
can, trading your knight for it. That might turn out to be a move that leads to
[00:39:50]
you being checkmated in the next move. But that's a heuristic that you can say, well,
[00:39:55]
let's just kind of generally assume that this is going to tend to be a good thing. And so
[00:40:00]
now your rule is now going back to like Elm review rules in the context of Elm review.
[00:40:06]
Now these heuristics are telling you things about your code that might give you unreliable
[00:40:14]
results. Because essentially what a heuristic does is measure the
[00:40:20]
thing that is not directly what you care about. Like in a chess game, you care about checkmate.
[00:40:26]
That's the only thing you care about. But, and maybe like the number of moves until you
[00:40:30]
checkmate, like that's all you care about. But in this heuristic of trying to take the
[00:40:37]
opponent's queen if you can, you are using a stand-in goal that's easier to determine.
[00:40:45]
But that stand-in might be flawed in some cases; it might actually not
[00:40:51]
yield the result you want, and might lead to you getting checkmated.
[00:40:55]
Yeah. So yeah, in chess, I think computers are powerful enough to basically compute every
[00:41:02]
possible move in a game, or close to it. No, no, probably not.
[00:41:08]
They're actually not. They actually rely a lot on heuristics to like prune the tree because
[00:41:14]
it's an exponentially growing tree. So it's approaching infinite. So computers can't
[00:41:22]
deal with that, but they, so they do have to use heuristics.
[00:41:24]
Yes. They do use heuristics and prune and all those things. Yeah. Let's imagine they could
[00:41:33]
compute every case. Then basically it has perfect information. Right. So whatever it's
[00:41:40]
going to try, it's going to work. If it's slightly limited, which in this case
[00:41:47]
it is, then you can improve the logic by saying, well, this is obviously a bad move. Right.
[00:41:53]
And you can remove some complexity. You can now rely on those. It's going to be a presumption.
[00:41:59]
Yeah. Right. Exactly. So when that turned out to be wrong, you're going to have worse
[00:42:03]
results than expected. But when those are true, then you get some nice results.
[00:42:08]
Right. So is that acceptable to have that in an Elm review rule or do you try to avoid
[00:42:15]
that? To have presumptions? Yeah. To have
[00:42:18]
heuristics because if it's a rule, it's telling you it's an error. There's no way to disable
[00:42:22]
it. And in some cases you might say, well, actually in this case it's okay. Like a code
[00:42:27]
smell, like, well, it's a code smell if you have a function that's over a certain number
[00:42:32]
of lines, but maybe in this particular instance, it's fine.
[00:42:37]
Yeah. In Elm review, I would, well, in general I would say it depends on the criticality
[00:42:45]
of the issue and how much you want to force it. For instance, the unused CSS classes rule,
[00:42:53]
that is basically like going to report false positives by saying, yeah, you should use
[00:42:59]
a literal, but as we said, it's going to be more of a constraint than a false positive
[00:43:05]
depending on how you frame it. Right. Yeah. Because we don't, so those opinionated rules
[00:43:10]
are fine if you opt in to those, I think. You need to have the whole team accept
[00:43:17]
this rule in my opinion, like all of the rules, but in general Elm review doesn't allow ignoring
[00:43:25]
issues. So that's why at least all of the rules that I wrote tend to lean towards
[00:43:31]
false negatives rather than false positives. Right. Instead of heuristics.
[00:43:37]
Using heuristics, like basically using presumptions. I see. Well, I don't know, so I'm going to
[00:43:43]
take the route that I know will lead to people not getting false positives. You can view
[00:43:49]
it as a simple heuristic in a way, I think. So basically a heuristic is how you choose
[00:43:57]
to put some things into the false positive category or choose to put some things into
[00:44:03]
the false negative category. That heuristic is what determines that. Yeah, I'd say so.
[00:44:08]
And I think that Elm review really has this stance to go towards false negatives more
[00:44:14]
than other tools because in those other tools you can disable the errors when you have false
[00:44:19]
positives. And that also impacts how people write those rules or when they choose to write
[00:44:25]
and enable those rules. Because I know if I don't have disable comments, I know that
[00:44:30]
if I report false positives, it's going to be very annoying. And I know that if some
[00:44:35]
rule that reports like a code smell, which is not always bad when it reports an error
[00:44:41]
and shouldn't, well, people are going to be blocked. So if I have a way to tell them like,
[00:44:47]
please write the code this way in order to not have this false positive, then that's
[00:44:53]
acceptable I think. If I don't, then I'm just not going to write the rule. Right. And not
[00:44:59]
writing a rule is basically 100% false negatives. Right, right. Although
[00:45:07]
you could argue that 100% false negatives feels very different than 99% or 1% false
[00:45:14]
negatives because you know you just can't rely on 100% false negatives. Whereas you
[00:45:19]
don't know if it's 1% false negatives. You don't know if you can rely on that or not.
[00:45:25]
But other tools like ESLint have a lot more rules that have the potential for
[00:45:32]
false positives and they're considered okay because you can disable them. So I really
[00:45:36]
think that having the ability to disable errors impacts the way that we choose which rules
[00:45:44]
to write. Yeah. And as you say, it depends on the criticality of the issue if it is a
[00:45:52]
constraint that you really depend on for something that you're doing, then it's going to change
[00:45:58]
the calculus there. Yeah, if it's to report an issue that you know for sure will crash
[00:46:04]
your application, but it might be wrong, then yeah, it is probably something you want to
[00:46:09]
enforce at the cost of being a bit annoying sometimes. So people will have to add
[00:46:16]
disable comments or rewrite the code in a way that the linter will understand that this
[00:46:20]
is not a problem. But yeah, I haven't found any critical problems like that for Elm Review
[00:46:28]
so far, I think. So yeah.
[00:46:30]
So you often mention that code flow analysis is sort of the thing that makes a lot of rules
[00:46:39]
not worth writing. And I wonder... So here's the original tweet that we were talking about
[00:46:47]
earlier where you kind of talked about missing information being the root cause. So you said,
[00:46:51]
missing information is the root cause of false positives slash negatives in linters. Add more
[00:46:56]
information to find more problems and be less wrong at the same time. How? One, the linter
[00:47:01]
should provide more information to rule authors. And two, languages should restrict dynamic
[00:47:07]
features. So one, the linter should provide more information to rule authors. Like what?
[00:47:13]
Like is there information that Elm Review could provide to rule authors to help them
[00:47:18]
with code flow analysis in addition to the module lookup table we discussed? Like comparing
[00:47:23]
references seeing if something refers to the same value.
[00:47:27]
Yeah, for instance, having aliases. And I'm definitely thinking about ways to make analysis
[00:47:33]
easier, which is in a way providing information that would be hard to compute otherwise. Also,
[00:47:40]
there's just simply plenty of information that you sometimes can't get. Not so much
[00:47:45]
with Elm Reviews anymore, but like for instance, only recently I added the function to give
[00:47:52]
you the file path of a module to analyze. Because I thought people might do some weird
[00:47:59]
things with it. That's something that I was quite scared about, like people misusing the
[00:48:04]
tool at the beginning. In practice, not so much. So now I make that available and people
[00:48:11]
do use that for some applications. I don't have any in my head anymore. But so yeah,
[00:48:18]
give all the information that you can. And then yeah, make it possible to analyze codes
[00:48:23]
in a simpler way, like give type inference, give the real module name and yeah, provide
[00:48:31]
code flow analysis tools. I know that ESLint has something like that, which I never understood.
[00:48:38]
So I don't know how that would work. I've also thought about being able to figure out
[00:48:44]
like, is this value an alias to the other function? And that could be interesting. That
[00:48:54]
could catch more things. Definitely.
[00:48:59]
For the performance question, I could imagine, I don't know if this would be a fruitful direction
[00:49:06]
at all, but I could imagine a design where you sort of have, actually very much like
[00:49:11]
the store pattern that Martin was telling us about in our store pattern episode. Essentially,
[00:49:17]
you know, the store pattern you have your, I can't remember what he called it now, but
[00:49:21]
your query of these are the things I depend on for this page. This is the data I need.
[00:49:26]
You could sort of have that as a sort of subscription that says, this is what I need, which as we
[00:49:30]
discussed in the store pattern episode, as more information comes online in the store
[00:49:35]
pattern, it could be getting it with HTTP requests. Then you can do follow up information
[00:49:39]
because it's sort of a subscription that gets called whenever that changes. And then it
[00:49:43]
just keeps going until the information you say you need matches the information that
[00:49:48]
you have already or is a subset of it.
[00:49:52]
So I can imagine something like that where you sort of have like a subscription to like,
[00:49:58]
here's some computationally expensive data I need that you're not just going to go analyze
[00:50:03]
constantly, and then you have these sort of remote data or Maybe values or whatever that
[00:50:10]
you're waiting on. And then you can sort of take all those together once you have them
[00:50:15]
all filled in and then you can continue your analysis. So that could be really interesting
[00:50:20]
to like provide some primitives for doing that sort of thing.
[00:50:23]
I think the way that I understand it is I think already what Elm Review does to some
[00:50:30]
extent because we say like, I request the module name lookup table, therefore please
[00:50:36]
compute it. And the framework could do a better job at computing only what is necessary. And
[00:50:43]
then when it looks at the next file, compute again only what is necessary and so on and
[00:50:48]
so on. That I definitely want to have. And I think that's kind of the same idea like
[00:50:54]
this module depends on the lookup table for this module. So whenever you get to the next
[00:50:58]
module you compute it again for that module, etc.
[00:51:04]
Yeah, it is a similar pattern. I think the main difference would be in the case of a
[00:51:08]
module Elm Review knows what module it's looking at. And so it can fill in that bit of context
[00:51:14]
to say, okay, it's requesting the module lookup table and it's in this module so I can compute
[00:51:20]
it for this specific module. But if it's something more nuanced like I want to pre evaluate this
[00:51:26]
string for example, then it doesn't know which strings to pre evaluate based on some implicit
[00:51:34]
context of the process it's running. So in that case, that sort of store pattern style
[00:51:40]
could work where you can give it that information. You can say, hey, here's the node I'm looking
[00:51:45]
at and I would like to wait until you can finish analyzing, like pre computing this
[00:51:52]
string value, please. And then you wait until it's no longer a maybe and then get it back.
[00:51:58]
And that could allow you to lazily compute and memoize some of these more expensive values
[00:52:05]
with specific context where the user can say, I want it for this node. So anyway, like seems
[00:52:10]
like an interesting path to explore.
[00:52:12]
Yeah, it could be interesting. Yeah. In this case, it would definitely help to be able
[00:52:17]
to say, please compute this now and store it in the store directly, just by mutation.
[00:52:25]
That would definitely make things easier.
[00:52:28]
Right.
[00:52:29]
Yeah.
[00:52:30]
And I guess it's maybe a little bit of a chicken and egg problem to know which of these things
[00:52:37]
would open up interesting possibilities because when you offer this information to review
[00:52:43]
authors, review rule authors, then they do interesting things with it. And then when
[00:52:48]
they do interesting things with that, it builds and snowballs and it sparks people's imaginations.
[00:52:54]
And so it's sort of hard to know which ones to explore before you've seen what people
[00:52:59]
do with them.
[00:53:00]
Yeah. Yeah. Well, I have my own opinions about things that could be interesting or ideas,
[00:53:06]
not opinions. But yeah, I've been surprised by what people came up with. For instance,
[00:53:13]
you made the Elm review HTML to Elm.
[00:53:17]
Yes, that's right. Yeah.
[00:53:19]
Based on what Martin Stewart made credit to him and to you, obviously, but the idea was
[00:53:25]
from Martin.
[00:53:27]
His idea to use Elm review fixes as a code generation tool is 100% credit to him. And
[00:53:32]
I used a bunch of his code for that.
[00:53:34]
So yeah, that one I did not expect. And yeah, that's a pretty cool avenue to explore. Definitely.
[00:53:41]
I also know that some people would like to be able to generate modules to create files
[00:53:46]
on disk based on the same idea. So like, that could be interesting.
[00:53:51]
Yeah. So the type information is like the big one on your wish list right now.
[00:53:56]
Yeah. And also performance for fixes and performance for Elm review because in my opinion, it's
[00:54:04]
too slow. But that's maybe just me being a parent to the tool. At work, it takes
[00:54:11]
like a whole minute to run on our code base, which, yeah, that's too slow. Even
[00:54:17]
if I want to go scroll on Twitter while the review is ongoing, it's too long.
[00:54:24]
Yeah. Right.
[00:54:25]
But I do wonder like, what kinds of use cases could people come up with if there was more
[00:54:33]
information? Like, I wonder if some sort of dependent typed kind of techniques could emerge
[00:54:41]
if people had more tools for doing code flow analysis or, you know, just more information
[00:54:48]
at their fingertips. Because like what Elm can do with all the information it has about
[00:54:53]
your code, both because it's a compiler and has computed all this information and because
[00:54:58]
the constraints of the Elm language, all the things, all the guarantees it has based on
[00:55:03]
how you have to write your code for it to be valid. There are just so many cool things
[00:55:07]
that it can do. And if you start like looking at the compiler code, you're thinking of all
[00:55:12]
these possibilities. Like I know I do with Elm pages. I'm like, oh my God, if I was a
[00:55:16]
compiler, there are so many cool things I could do with the information I would have.
[00:55:22]
Yeah. So a compiler is basically a static analysis tool, just like a linter, right?
[00:55:28]
Right. It's a static analysis tool that the code must pass through in order to run, which
[00:55:36]
that's basically all it is. It's those two things.
[00:55:39]
And then it generates some files.
[00:55:41]
Right. Right. Also then, right.
[00:55:43]
That is a compiler part, but the rest is very important as well. And the thing is the compiler
[00:55:50]
is a general purpose tool, right? So it's only going to be able to infer things that
[00:55:55]
the language tries to allow and to report things that it doesn't want to allow. But
[00:56:02]
then if you want to do something more precise that the language was not designed for, you
[00:56:07]
could potentially do that with a very powerful static analysis tool. So like, I don't know
[00:56:12]
much about dependent types, but being able to figure out at compile time that some number
[00:56:19]
is always smaller than five, you could potentially do it by adding constraints, just like a language
[00:56:26]
with dependent types would do. Maybe, I don't know enough, but you could definitely try
[00:56:31]
to do that and then report errors like, Hey, I am not smart enough to figure this out.
[00:56:39]
Please change the way that you work with your code. Kind of like proof languages, which
[00:56:44]
I think they accept plenty of things, but if it's too hard, then they ask the people
[00:56:49]
to rewrite their code in a way that they can understand.
[00:56:52]
Right. Which I mean, in a way, like, yeah, if you say non empty list, you know, from
[00:57:01]
cons or whatever, right? That's like a lazy approach to that in a way where you're saying,
[00:57:07]
I'm not going to do code flow analysis. You must prove to me by actually passing a single
[00:57:13]
definite value and then a list which could be empty. I don't care. And so you've proven
[00:57:19]
it. That's like the shortcut to proving that. Or you could do code flow analysis and you
[00:57:25]
could say, well, I can analyze your code paths and I can see that you're using this non empty
[00:57:32]
type that promises to be non empty, but maybe not through the compiler, but through Elm
[00:57:38]
review and I see this one pinch point that I know this type will always go through and
[00:57:45]
it adds something to the list. Therefore, you're good. Like that would be the deluxe
[00:57:50]
approach.
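The "shortcut" version, proving non-emptiness by construction rather than by code flow analysis, is the usual opaque type (a minimal sketch):

```elm
module NonEmpty exposing (NonEmpty, fromCons, head)

{-| Non-emptiness is proven by construction: the caller must supply one
definite first element, so no code flow analysis is needed to know that
`head` always succeeds.
-}
type NonEmpty a
    = NonEmpty a (List a)

fromCons : a -> List a -> NonEmpty a
fromCons first rest =
    NonEmpty first rest

head : NonEmpty a -> a
head (NonEmpty first _) =
    first
```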
[00:57:51]
Yeah. But then some things are very hard to infer because it uses code from dependencies
[00:57:58]
that we don't have information about. So again, missing information. There is a request to be able
[00:58:05]
to analyze the code from dependencies before analyzing the project. And I think that would
[00:58:11]
be very valuable. If you do that, you can basically do whole program analysis except
[00:58:16]
for the JavaScript parts. Maybe we would like to be able to analyze CSS and JavaScript files
[00:58:22]
as well, but I think that's getting a bit out of hand, at the moment at least. It
[00:58:29]
should be interesting, but maybe it's better to use two tools like ESLint and Elm review
[00:58:35]
and configure them in a way to give you all the same guarantees.
[00:58:39]
And you can always go the other way too, right? Like if you're wanting to analyze things with
[00:58:45]
your CSS, you can generate CSS from Elm and then you have a more constrained place to
[00:58:53]
analyze it. Whereas if you lack guarantees, you can always flip it on its
[00:58:59]
head. You can say, well, this is too unconstrained and hard to analyze. Therefore I'm going to
[00:59:05]
constrain it. Like to take something from an unconstrained environment to a constrained
[00:59:11]
environment is very, very hard to take something from a constrained environment to an unconstrained
[00:59:17]
environment is very easy, relatively speaking.
[00:59:19]
I remember when I rewrote an Elm application to React, that was really easy. Whereas the
[00:59:27]
opposite would have been way harder, just like basically re implement everything. But
[00:59:32]
for Elm to React, there was a translation, which is much easier.
[00:59:37]
To take a lossless audio file and turn it into a compressed one is easy. To take a compressed
[00:59:44]
audio file or compressed image and turn it into a lossless one or to do the CSI enhance,
[00:59:51]
it's a harder problem.
[00:59:52]
I don't know if you want to talk about side effects as well. That's interesting, but I
[01:00:00]
don't know how we are on time.
[01:00:02]
We could talk a little more and still be in our general time window.
[01:00:06]
Well, we can extend our episodes to be two hours long. That's fine as well. I mean, we
[01:00:13]
did have shorter episodes recently, so we need to compensate, right?
[01:00:18]
One area where you have a lot of false positives or false negatives in a lot of other languages
[01:00:25]
and other linters is with the presence of side effects. For instance, if we take the
[01:00:33]
no unused variables rule for Elm, where you say if you have A equals some function call,
[01:00:41]
and then this value A is never used.
[01:00:45]
In Elm review, we know, well, this function call has no side effects. We can remove the
[01:00:52]
entire declaration from the code, and then we can look at whether that function is used
[01:00:58]
or not used anywhere else.
[01:01:00]
But in a language with side effects, it's very hard to tell that. We know we can remove
[01:01:09]
const A equals, we can remove that part, but we don't know if we can remove the function
[01:01:13]
call because it might have side effects, right?
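In Elm, the whole binding is provably removable (a sketch, with a hypothetical `expensiveFunction`):

```elm
compute : Int -> Int
compute x =
    let
        -- `a` is never used. Because Elm functions cannot have side
        -- effects, elm-review's NoUnused.Variables rule can safely
        -- delete this entire binding, call included.
        a =
            expensiveFunction x
    in
    x + 1
```

In JavaScript, only the `const a =` part could be removed with confidence, since the call itself might have side effects.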
[01:01:17]
And that is going to be true for any language, as far as I know, that is not a pure functional
[01:01:25]
language, or at least where the function is not annotated in some way as being pure.
[01:01:31]
So being able to rely on the fact that functions have no side effects, that actually allows
[01:01:38]
us to do some very cool things, like dead code elimination, a very powerful one,
[01:01:44]
as we've seen.
[01:01:45]
I think removing dead code in Elm using Elm review is something that a lot of people love,
[01:01:50]
and I definitely do. And that is very hard to do if you have side effects. And yeah,
[01:01:57]
then you got things like moving code around where you have one function call after another
[01:02:02]
one. And if you want to optimize the code or make it nicer to read, then potentially
[01:02:11]
you have to reverse the order of those function calls. Well, is that safe to do? Well, we
[01:02:17]
don't know. Unless we have no side effects, then we know we can do it.
[01:02:22]
So we could still do that analysis. Does this function have a side effect? Does this one
[01:02:28]
also have a side effect? Do they impact each other? Do they depend on each other? And that's
[01:02:33]
a lot of work. That's really a big amount of work to do, like a lot of interpretation
[01:02:38]
and a lot of analysis. And potentially at the end, you still don't know the answer.
[01:02:43]
So you're still going to have to make a presumption like, yeah, I think this is going to... We
[01:02:48]
don't know. So we're just going to assume that it has a side effect and that it needs
[01:02:53]
to stay this way.
[01:02:55]
Right. Yeah. It's the poison pill. Things can be very easily tainted. And it's the unconstrained
[01:03:02]
versus constrained environments. And if you can take, as we've talked about in the past,
[01:03:09]
if you take pure functional Elm code, you can do more complex things under the hood
[01:03:17]
preserving those guarantees, like persisting data in Lamedera, for example. So it's pretty
[01:03:25]
compelling how you can still preserve those guarantees and do more complex things when
[01:03:31]
you have that purity. For example, you could even imagine doing some of these kind of costly
[01:03:40]
computations in Elm review. Like instead of doing this sort of Elm store pattern style,
[01:03:46]
you could imagine doing some sort of hacks under the hood, like a sort of Elm review
[01:03:51]
compiler that could...
[01:03:53]
Oh, I never thought about doing that. Definitely on my mind, but so far I've never attempted
[01:04:02]
it because I wanted... For type inference, I think that's going to be slow. Evan said
[01:04:08]
that it's going to be slow in a language where you don't have mutation. So I'm thinking about
[01:04:13]
altering that at compile time to make it much faster. We don't have type inference yet.
[01:04:21]
So I will wait for that to happen.
[01:04:24]
Interesting. Oh, that's cool. Yeah. Yeah. So I could imagine like...
[01:04:29]
But I don't know if that will have any surprising effects. That's going to be interesting to
[01:04:35]
figure out.
[01:04:36]
Well, it's definitely an ambitious path to go down, but it would open up a lot of interesting
[01:04:40]
possibilities. But yeah, you could certainly like, I could imagine you saying here's essentially
[01:04:46]
a magic function that gives you some expensive computational result and under the hood, swap
[01:04:54]
it out to do some optimizations and make it more efficient and not call it if it's not
[01:04:59]
needed and that sort of thing.
[01:05:00]
Yeah, potentially. But yeah, I would definitely not write package code that would depend
[01:05:05]
on this. It would just be like an improvement that people will not notice.
[01:05:10]
Yeah, exactly.
[01:05:11]
In terms of performance, under the hood optimization, that's the only way that I would accept doing
[01:05:17]
something like that.
[01:05:18]
Yes, I agree. Exactly. Yeah. But as long as you can preserve the semantics and expectations
[01:05:24]
of how it's going to behave, you can swap it out for however you achieve that under
[01:05:28]
the hood.
[01:05:29]
Yeah. But it would be kind of tricky to test because you could not use Elm test for this
[01:05:35]
anymore.
[01:05:36]
Yeah.
[01:05:37]
All of these guarantees that we've talked about, things that we can rely on that make
[01:05:45]
analysis easier, apply to linters, but they also apply to code optimizers. For instance,
[01:05:53]
elm-optimize-level-2 knows that it can move some functions or some operations around
[01:06:01]
as long as they don't depend on each other, because it knows: well, this function has
[01:06:04]
no side effect, this function has no side effect, so it can move things. It can
[01:06:09]
do a lot of these things because it knows that the compiler checked the code, that
[01:06:15]
the original code was written in a specific way, that things are valid, that semantics
[01:06:21]
match, that types match as written in the code. So using all of these guarantees that the
[01:06:28]
compiler, the type checker, and the language design give you, you can do a lot of powerful
[01:06:33]
things. But as soon as you're missing one of those, well, some areas, some optimization
[01:06:39]
ideas, some linter rules that you would want to write, they crumble; you can't do
[01:06:46]
them anymore. Or they require a lot more analysis, which we've seen can be hard. So yeah. So
[01:06:53]
that's the part about what I was saying: languages should remove dynamic features, or features
[01:06:59]
that are hard to analyze, like side effects and dynamic values. Those are hard, and therefore,
[01:07:06]
if we can remove those, if we can make them more static, well, that helps static analysis
[01:07:14]
tools. And that is something that I don't think a lot of other languages appreciate fully
[01:07:20]
enough, right? I just wish people knew that more.
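To make that purity argument concrete, here is a minimal Elm sketch (the function names are invented for illustration, not taken from elm-optimize-level-2). Because both bindings are pure and independent, an optimizer is free to reorder them, inline them, or skip one entirely if its result is unused, without changing the program's meaning:

```elm
module Example exposing (total)

-- Two independent, pure computations. With no side effects and no
-- mutation, evaluating `a` before `b`, `b` before `a`, or neither
-- (when unused) is observably identical.
expensiveSum : List Int -> Int
expensiveSum numbers =
    List.foldl (+) 0 numbers

expensiveProduct : List Int -> Int
expensiveProduct numbers =
    List.foldl (*) 1 numbers

total : Int
total =
    let
        a =
            expensiveSum [ 1, 2, 3 ]

        b =
            expensiveProduct [ 4, 5, 6 ]
    in
    a + b
```

In a language with side effects, proving that the two calls don't interact would take exactly the kind of extra analysis discussed above; in Elm, the language design grants it for free.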
[01:07:24]
What I'm taking away from this is basically like move the goalposts. Like instead of trying
[01:07:31]
to solve a hard problem, define the problem in a way that makes it easier, right? So like
[01:07:39]
we talked about with static analysis: if you think, oh, I have to do all this
[01:07:44]
code flow analysis to figure out what the class name is, make the problem
[01:07:49]
easier for yourself by making more assumptions, having more constraints. So you can do that
[01:07:54]
in a language and you can do that in a static analysis rule and any sort of static analysis
[01:07:59]
context you can move the goalposts, make the problem easier for yourself.
[01:08:03]
Yeah. I wrote a blog post called Safe unsafe operations in Elm, which is basically
[01:08:09]
the same idea. The idea is we want to make something like Regex.fromLiteral,
[01:08:15]
where we can basically have a function that doesn't return a Maybe
[01:08:22]
Regex, but a Regex, and elm-review then says: well, this is okay. We know that at compile
[01:08:28]
time this works, because this looks like a valid regex. So this is fine. And whenever
[01:08:34]
you pass in a dynamic value, we move the goalposts by saying: please don't write it
[01:08:39]
this way, we don't understand it. You can do it that way, or you can
[01:08:45]
make the analysis more complex; both work. But as long as at some point you can give
[01:08:51]
the guarantee, then everyone's happy. Otherwise you can fall back on Regex.fromString,
[01:08:59]
which returns a Maybe Regex.
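The pattern described here can be sketched roughly like this (`fromLiteral` is the hypothetical function being discussed, and the safety depends on an elm-review rule validating every call site, not on the function body itself):

```elm
module SafeRegex exposing (fromLiteral)

import Regex exposing (Regex)

-- Safe only because an elm-review rule checks, at analysis time, that
-- every call site passes a literal string that parses as a valid regex.
-- The fallback to `Regex.never` (a regex that never matches) can then
-- never trigger in practice, so callers get a plain `Regex` instead of
-- having to handle a `Maybe Regex`.
fromLiteral : String -> Regex
fromLiteral pattern =
    Regex.fromString pattern
        |> Maybe.withDefault Regex.never
```

A call like `fromLiteral "[0-9]+"` would be accepted, while `fromLiteral userInput` would be reported by the rule, pushing you back to `Regex.fromString` and handling the `Maybe` yourself.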
[01:09:01]
Well, are there any other things people should look at? Any, any blog posts, any conference
[01:09:07]
talks, perhaps soon to be released?
[01:09:10]
Yeah. So a lot of what I said today was explained, hopefully better than today, in a
[01:09:18]
talk that I gave at Lambda Days in mid-July. It's called Static Analysis Tools Love
[01:09:25]
Pure FP. I'm pretty sure it's going to be released
[01:09:29]
after this episode, so hopefully I haven't spoiled too much. I think some parts of it
[01:09:36]
at least. But I think it was a good talk. I'm very pleased
[01:09:41]
with it, at least.
[01:09:42]
I'm excited to watch it. Yeah. Keep an eye on our Twitter account, then; we'll
[01:09:46]
tweet a link to it. We'll try to update the show notes too, though they may
[01:09:50]
be immutable in your podcast app.
[01:09:53]
Yeah. They often are, right?
[01:09:54]
Yeah, I think so. But yeah, keep an eye on our Twitter. And Jeroen, until next time.
[01:10:01]
Until next time.