
The Root Cause of False Positives

We explore false positives and negatives in static analysis tools, and how Elm helps us avoid them.
August 15, 2022
#63

Transcript

[00:00:00]
Hello Jeroen. Hello Dillon. I'm quite positive you're going to enjoy this episode. But I
[00:00:08]
could be wrong. Maybe it's a false positive. Maybe it's a false positive. That's a good
[00:00:15]
one. What are we talking about today? Today let's talk about false positives. Let's talk
[00:00:21]
about how Elm removes certain kinds of false positives, at least in the area that I care about: static
[00:00:28]
analysis. But I think we can find ways that this applies to other things, like optimizations,
[00:00:35]
stuff like that. Oh, okay. I like this. So let's start with the definition. What is a
[00:00:40]
false positive? This one is always kind of tricky. Like which one is the positive and
[00:00:44]
which one is the negative? Yeah, because you got false positives, you got false negatives.
[00:00:49]
So yeah, a false positive, at least for a linter or for a tool like Elm review
[00:00:55]
or a compiler, is when the tool reports a problem when it should not. Like it tells
[00:01:03]
you, hey, there's a problem here and actually there's no problem. That is a false positive.
[00:01:07]
So false positive means that we think there's something and there isn't. And then you've got false
[00:01:14]
negatives, which are the tool should report a problem, but it doesn't. And then you've
[00:01:20]
got true positives and true negatives, which are like real errors and real non errors,
[00:01:26]
which yeah, maybe those names are weird. I actually don't know if there are names for
[00:01:31]
those probably. Yeah, it seems reasonable. If you take a COVID test and it comes back
[00:01:37]
positive, it could be a false positive, which means the test said you had COVID, but you
[00:01:42]
don't have COVID. It could be a false negative, which means the test said you do not have
[00:01:47]
COVID, but you do have COVID, which in fact is quite common, which kind of makes you think
[00:01:52]
like how much value is there to testing when there's actually like, from what I understand,
[00:01:56]
a fairly high rate of both of those. And it just like makes you like question the whole,
[00:02:05]
how can I make decisions based on this information, which I know is flawed quite often. So it's
[00:02:12]
a strange situation. The same dynamic applies in static analysis. If you can't rely on the
[00:02:18]
results you're getting, it makes you sort of lose faith in what the tool is telling
[00:02:23]
you. Yeah. Well, there's a difference between linters and medicine. There is. Although sometimes,
[00:02:32]
you know, you rely on the guarantees you make in your code for medical procedures and when
[00:02:37]
people's health is on the line. So there can be an overlap. Yeah, absolutely. So yeah,
[00:02:42]
what is in your opinion, the root cause of false positives? Well, I did see a tweet this
[00:02:49]
morning that I was thinking maybe should have been marked with spoiler warning, at least
[00:02:54]
for Elm Radio co hosts. I regret posting it. Let me try to forget I ever saw that and see
[00:03:04]
if I can answer the question without prior knowledge. Please forget the correct answer.
[00:03:11]
Which was a very good tweet, by the way. We will link to that tweet. I will explain it,
[00:03:16]
I guess. It's hard not to be biased by the nice tweet you wrote, but I would tend to
[00:03:23]
think of it as like maybe complexity. Like how complex is an answer. Like if you're looking
[00:03:33]
at a chess position and you want to say, is there a forced checkmate in this chess position?
[00:03:40]
It's easier to, you know, if you can confirm that there is, you know that there is. But
[00:03:46]
if you can't, maybe there is one, but it's so complex. There's just so much complexity
[00:03:52]
to the position that there are nearly infinite possibilities. So it's very hard for you to
[00:03:58]
say. Like if you look at an opening chess position, is there like a forced checkmate?
[00:04:05]
Like maybe, I don't know, if I had infinite capacity to process infinite lines, then maybe
[00:04:13]
I could answer that. But it's just too complex for me to say yes or no.
[00:04:18]
Yeah, it's really easy to figure out whether there's a forced checkmate if you have two
[00:04:25]
pieces on the board and they can each only do one move, then it's really easy.
[00:04:32]
Like you got two or three, four things to check, depending on how things work. And that's
[00:04:38]
it. But if you got 10, 20 pieces on the board and they can all move in so many directions,
[00:04:46]
yeah, things become a lot harder. You have a lot more checks to do. You have a lot more
[00:04:51]
scenarios to evaluate.
[00:04:54]
Complexity is in my opinion, for sure, a cause of false positives.
[00:04:59]
Yeah. And if you can rule things out and if you can eliminate variables, that helps eliminate
[00:05:07]
that. As you said, that's a very good example: if you just have four pieces on the board
[00:05:12]
in a chess game, can you answer that question? At that point, you can answer it with confidence.
[00:05:18]
Sometimes it's easier to prove by existence. An existence
[00:05:24]
proof is easy, I mean, if you can find the example, but sometimes proving absence is difficult.
[00:05:31]
You could prove existence like, do black birds exist? Well, I can look out my window and
[00:05:40]
see a crow and say, yes, there are black birds, but then do...
[00:05:43]
Wrong, it's a dinosaur.
[00:05:45]
Well, then perhaps. And I mean, yeah, it does get quite philosophical. Like what can you
[00:05:52]
know? In a way, this is sort of like the term epistemology comes to mind, which is just
[00:05:57]
sort of what is knowable. And I often found it frustrating in philosophy classes when
[00:06:08]
epistemology comes up. I always found that topic extremely frustrating in a philosophical
[00:06:12]
context because it's kind of a dead end because you say what can be known? And at the end
[00:06:19]
of the day, you basically get to Descartes' conclusion: I think, therefore I am, which
[00:06:25]
is to say that is the only thing that I can prove. The only thing that's knowable because
[00:06:30]
my senses can mislead me. I can see, have I ever seen something and come to the wrong
[00:06:38]
conclusion about what it was and later discovered that I was wrong? So how can we trust anything
[00:06:44]
that we think we know? What is knowable? What is the nature of knowing and what is knowable
[00:06:49]
in the universe? And it's just to me, it's just not a satisfying subject in philosophy
[00:06:54]
because it's just like, well, nothing except that we exist. And so, okay, done. Like what
[00:07:01]
more can we do with that topic except go around in circles and we don't really get anywhere.
[00:07:06]
But with code, what is knowable? With static analysis, what is knowable? That's kind of
[00:07:11]
a more satisfying question because it's a more constrained area where we're looking
[00:07:16]
at. It's not just like questioning, can we trust these axioms about the universe? We
[00:07:23]
can sort of just think about it in terms of what can we know about this code? So yeah,
[00:07:28]
you want to give us your answer to that question. What is the root cause of false positives?
[00:07:33]
Well, the one that I came up with, and maybe we can figure out something even deeper than
[00:07:38]
that, but what I came down to is missing information just in general. Some of that through complexity,
[00:07:46]
some of that through other things, other means. But basically the way that I imagine it is
[00:07:51]
if you were omniscient being or omniscient tool, you had knowledge of everything, you
[00:07:57]
knew everything that happened in your program at runtime, at compile time, and you knew
[00:08:04]
what the developer's intent was, then you would always be right in what you report and
[00:08:11]
you would never miss anything. You would know, well, this is never used or this never happens
[00:08:17]
or this is not a problem or this is a problem. If you had all the information in the world,
[00:08:22]
then I think you would never be wrong. Therefore, the problem is missing information and you
[00:08:29]
can get those through different means. So for instance, for static analysis tool, one
[00:08:37]
of the bigger problems is dynamic code. So knowing what a value is at a certain point
[00:08:44]
is hard to figure out. Some tools can do it quite well. TypeScript does it quite well
[00:08:49]
and to some extent, much better than Elm. Elm knows the types, but it doesn't know the
[00:08:55]
values nor does it care about them. Languages that try to do proofs like Isabelle or TLA+,
[00:09:05]
I think they know a lot more about what's going on in the program, but they're also
[00:09:10]
pretty complex and I don't know how they work. I don't know what their limitations are. Dynamic
[00:09:15]
things are complex, for instance. So when you have missing information, what do you
[00:09:20]
do? Because that's going to happen, right?
[00:09:22]
And perhaps you want to use the precautionary principle and say, I don't know if there's
[00:09:28]
a problem, so I will say, I can't prove that everything is okay, therefore, I'm going to
[00:09:37]
say there's potentially a problem because I can't prove that there's not a problem.
[00:09:41]
I guess this or the other approach I can think of is you could say, well, I didn't find a
[00:09:47]
problem, so no problems, right? That's sort of like what TypeScript does, right? It says,
[00:09:53]
I will tell you if I can guarantee that there is a problem rather than I will tell you if
[00:09:58]
I cannot guarantee there are no problems.
[00:10:02]
Yeah. So to me, that is making a presumption. So I might be wrong about the correct word,
[00:10:09]
but a presumption is when you accept something as true on the basis of probabilities. Like
[00:10:16]
it's very likely, I'm missing some information. I don't know whether this is true or not,
[00:10:21]
but I think it's going to be more likely to be true than false. So the example that I'd
[00:10:27]
like to take is an ESLint rule that is called array callback return, which is basically
[00:10:35]
when you do array.map and you pass it a function, that function should always return something.
[00:10:40]
Right. So you don't accidentally have a list of undefineds because, say, you're writing TypeScript, you don't
[00:10:46]
write return and so it's returning void, which I think has happened to everybody that goes
[00:10:52]
between like JavaScript and Elm that you forget a return and you're like, why is this value
[00:10:58]
all nulls or undefined?
[00:11:00]
Yeah, undefined. Yeah. So it's a very useful rule to have. But the thing is when you analyze
[00:11:06]
JavaScript code, there is something that is missing and that is type information. So TypeScript
[00:11:11]
could potentially help here, but basically when you see array.map, array being a variable
[00:11:18]
or something, it looks like an array or the map method of an array. And therefore you're
[00:11:23]
going to consider, you're going to presume, well, it's pretty much for sure the array.map
[00:11:30]
method. So I'm going to report any problems that we find in the function that is
[00:11:36]
passed to it, but that might be wrong. So when you have missing
[00:11:41]
information, you're going to make presumptions and when they turn out to be wrong, that's
[00:11:47]
when you have a false positive or false negative. Because what we could also do is say, do some
[00:11:52]
more analysis, like: is this really an array? Can we find somewhere where it is declared
[00:11:59]
where we can clearly see that this is an array? And if we see that it's an array, then we
[00:12:05]
report a problem. And if we don't see that it's an array, if we don't know, then we don't
[00:12:10]
report anything. And that removes all the false positives that we have, but that creates
[00:12:14]
a lot of false negatives. So whenever you need to make presumptions, you're going to
[00:12:20]
have the choice to lean more towards a false positive or false negative, but you're going
[00:12:25]
to have to choose or do more analysis, which can be complex and maybe you're not going
[00:12:30]
to be able to figure out the answer.
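For contrast, a rough Elm sketch of why that presumption disappears (a minimal example; the point is that the compiler has already pinned down what List.map is and that its callback must return a value, so a rule never has to guess):

```elm
-- List.map : (a -> b) -> List a -> List b
-- The mapper is an expression that must produce a value, so the
-- "forgot to write return" bug behind array-callback-return cannot
-- be written, and there is no ambiguity about what `map` refers to.
doubled : List Int
doubled =
    List.map (\n -> n * 2) [ 1, 2, 3 ]
```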
[00:12:32]
Yeah. And it strikes me that this doing more analysis piece, that there are maybe two different
[00:12:39]
categories here that either something is knowable, but takes a lot of work to know potentially
[00:12:49]
to the degree where it's essentially unknowable because it's infinite. Like is there a forced
[00:12:54]
checkmate from the first move on a chessboard? Like technically knowable, but practically
[00:13:00]
unknowable or even if you knew, maybe the sequence of moves that would lead you there
[00:13:07]
is so large that it's not usable information.
[00:13:11]
You mean it's not, you can't compute it?
[00:13:14]
Or if you compute it, it's like, yeah, here's a list of size 30 trillion of all the possible
[00:13:21]
responses to these different lines that gives you guaranteed ways to respond with a forced
[00:13:26]
checkmate and it's like, okay, well I can't really use that. So even if it's technically
[00:13:30]
knowable, it's essentially as the amount of analysis and information to deduce if something
[00:13:37]
is knowable approaches infinity, it starts to resemble being unknowable. But there's
[00:13:47]
like unknowable and then there's knowable with work. Those are two different things.
[00:13:51]
So like an example of something that is literally unknowable would be like, if you take eval
[00:13:59]
into account, if you run eval, what can this code do? Well, if it's from user input, that
[00:14:06]
user input is undefined. You don't know the bounds, there are no bounds on it.
[00:14:11]
So well, it's not undefined, it's a string.
[00:14:14]
Right. Or who knows?
[00:14:17]
Maybe it's the string "undefined".
[00:14:20]
Yeah, it's not known. And therefore there's not enough information there to analyze certain
[00:14:28]
things about that. Whereas there are certain scenarios where you often talk about code
[00:14:34]
flow analysis and maybe it's like a massive amount of work. Maybe you need some postdoc
[00:14:43]
programming language researchers to assemble a team to solve this problem, but it's technically
[00:14:50]
knowable and you could do that or it's just a huge amount of work.
[00:14:55]
If you listener want to do that, please contact me.
[00:14:59]
And Elm is a very interesting space for these problems because as you've sort of been hinting
[00:15:04]
at it is more knowable because it's more constrained. Like in our chess analogy, it's more akin
[00:15:11]
to the chessboard that just has a handful of pieces rather than the starting chess position.
[00:15:16]
Yeah. So you're now hinting at something interesting is like, why is Elm more knowable, more analyzable
[00:15:23]
than other languages like JavaScript? And to me, there are multiple aspects to that.
[00:15:30]
One of which is that there's a compiler. Doesn't seem like a big thing, but it actually is
[00:15:35]
because potentially when you analyze your JavaScript code, it's a bunch of gibberish,
[00:15:40]
like saying A equals A or one equals two or well, actually not. Yeah. But basically what
[00:15:48]
you can have is code that looks like code but doesn't mean anything, like it references
[00:16:02]
undefined variables, or it has ill-defined semantics.
[00:16:02]
It's syntactically valid, but not well defined code.
[00:16:05]
Yeah, exactly. And those are all things that a compiler checks for. So when you know that
[00:16:11]
these are checked for you by the compiler, by a compiler, then you can start to rely
[00:16:17]
on them. And that is quite important actually. So JavaScript doesn't have a compiler, but
[00:16:22]
what it does, what it does have is a linter. So what ends up happening for language like
[00:16:29]
JavaScript is that you have a lot of ESLint rules to do the same work that a compiler would
[00:16:36]
do. So you have a rule for undefined references. You have a rule for reporting duplicate declarations,
[00:16:46]
stuff like that. And once you have those, then other rules can kind of depend on those
[00:16:54]
semantic issues not being there. But they can't fully rely on those, because
[00:17:02]
people can disable the ESLint errors or people can just not enable those. And the fact that
[00:17:11]
we kind of need those rules is a reason for using the recommended configuration
[00:17:17]
for ESLint or other tools. If the tool doesn't ship with it, then they're not going to be
[00:17:22]
enforced and other rules will not be able to depend on those. And that is actually something
[00:17:27]
that we don't need with Elm, because the compiler checks for so many things that the
[00:17:33]
Elm review rules don't need any more certainty. They can just rely on the things that the
[00:17:38]
compiler checks for and that's enough for pretty much anything.
[00:17:42]
Right. They might have like a snowball effect where by applying different rules and applying
[00:17:50]
fixes to those rules, you can eliminate more dead code because making one piece of dead
[00:17:59]
code go away makes another piece of dead code go away and there's this snowball effect.
[00:18:03]
But as you say, the language guarantees are enough that you're not depending on, I need
[00:18:10]
this guarantee in order to make my checks, therefore you have to turn on these rules
[00:18:15]
as prerequisites. I mean, you could imagine scenarios like that, but I guess you haven't
[00:18:20]
encountered them yet.
[00:18:22]
Yeah, I haven't yet. But yeah, for instance, if you were trying to evaluate an expression
[00:18:28]
and you saw a reference to a variable and you didn't have the guarantee because you
[00:18:32]
were in JavaScript that that variable was actually declared anywhere, then potentially
[00:18:38]
you would enter a weird state or you would crash because, oh, well, I expected this to
[00:18:44]
be in the scope somewhere. So it's really nice not to have to be defensive about those
[00:18:50]
things. So therefore a compiler really helps with all those things.
[00:18:55]
Right. I could imagine like some rules around divide by zero or not a number or something
[00:19:05]
like that. And you could say, well, there are certain entry points where you could get
[00:19:11]
a number from a port or from a JSON decoder. And at those terminal points, maybe you have
[00:19:19]
an Elm review rule that checks that you need to unwrap them into safe types that are never
[00:19:26]
NaN. And then that rule could be a prerequisite for another rule that, assuming that all of
[00:19:34]
the number inputs that you're using are not NaN to begin with, you're dividing them
[00:19:38]
in a way that you're checking for divide by zero and things like that. And you're going
[00:19:44]
to have well defined values.
[00:19:46]
Yeah, I could imagine that as well. At work, we have a rule that detects unused CSS classes.
[00:19:54]
So what we do is we take our CSS files and we extract all the classes from those and
[00:20:00]
we turn them into an Elm file that our Elm review configuration then uses. And then we
[00:20:06]
just go through the entire files and find out the ones that are used and report the
[00:20:11]
ones that are left. But to be able to tell that, we also have another rule that checks
[00:20:17]
for any usages of the class function that are too dynamic, that are too hard for Elm
[00:20:24]
review to tell. So they kind of depend on each other. I actually don't remember whether
[00:20:29]
we merge them into one rule. But as long as you don't make anything depend on the other
[00:20:35]
one, like a fix, like imagine you have a fix that you want to apply, that should probably
[00:20:42]
not depend on information that has not been validated before. And because in Elm review
[00:20:48]
fixes take precedence over errors without fixes, at least in fix mode,
[00:20:57]
that can be kind of dangerous. But yeah, at least the number of guarantees that we have,
[00:21:01]
the number of presumptions that we need to do in Elm, or at least in Elm review is a
[00:21:06]
lot lower than what you would do in ESLint. So this hasn't been a problem really, so far,
[00:21:13]
in my experience, it could be, but I'm guessing it would be for things that are a lot more
[00:21:19]
precise than what we're currently doing.
[00:21:23]
And this seems like what you're talking about with checking for class names that are too
[00:21:29]
dynamic for you to basically effectively analyze. Because you could imagine pulling on that
[00:21:35]
thread more and more and saying, well, what if it's just an inline concatenation between
[00:21:42]
two string values? Could we just check those two literal concatenated string values? And
[00:21:48]
is that literal enough for us to use? And you say, okay, well, now that we're checking
[00:21:52]
for concatenated string literals, why don't we add something that says, well, what if
[00:21:58]
it's a string constant that's concatenated to another one? And maybe that's quite useful
[00:22:02]
because you want to be a little more dynamic with your class name. So and then you say,
[00:22:07]
well, what if we want to add a number to it? And then we want to be able to do arithmetic
[00:22:14]
on those numbers, or we want to be able to map over a list of numbers and then check
[00:22:19]
those values. And eventually, you're just building like a pre-compilation evaluator
[00:22:26]
that's actually evaluating your program before compile time. And you certainly can do those
[00:22:35]
things. But you're intentionally choosing a strategy there to preemptively give a false
[00:22:41]
positive and just say... or to put a constraint on the rule. I mean, I guess another
[00:22:47]
way to look at it is: rather than a false positive, you're saying it's not a false positive,
[00:22:52]
it's just a constraint. The rule is not saying, this is a false positive
[00:22:57]
where, hey, this could actually be valid. You're actually saying, here's
[00:23:02]
a rule that adds an additional constraint to your code. And it's not a false positive.
[00:23:06]
It's a true positive: this is not okay, you have to follow this constraint where you only
[00:23:11]
use string literals for class names.
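A hypothetical before/after of that constraint:

```elm
import Html exposing (Html)
import Html.Attributes exposing (class)

-- Accepted: a string literal that an unused-CSS-classes rule can index.
header : Html msg
header =
    Html.div [ class "site-header" ] []

-- Reported: not wrong at runtime, but too dynamic for the analysis
-- to match against the extracted CSS classes.
column : Int -> Html msg
column n =
    Html.div [ class ("col-" ++ String.fromInt n) ] []
```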
[00:23:13]
Yeah, yeah, as you said, like, if you want to figure out a lot more things, basically,
[00:23:20]
at some point, you're building an interpreter, as I see it, which would definitely be valuable
[00:23:27]
to be able to infer a lot more things.
[00:23:29]
And in Elm, you can do a lot in that regard.
[00:23:32]
I think you can do a lot. Yeah, because it's just pure functions, right? So it's none of
[00:23:38]
them have side effects that make the next things a lot easier. But it would still be
[00:23:42]
a lot of work and would make the tool a lot slower, I think. And for the rule that
[00:23:48]
doesn't report false positives, but reports things that it wants to enforce new constraints,
[00:23:54]
you're absolutely right. Where I would say that it switches from a false positive to
[00:23:59]
a constraint is in the error message. Like if the error message actually explains like,
[00:24:06]
hey, this is not a problem in the sense that it's not going to cause your code to crash
[00:24:12]
or behave weirdly. But for the sake of this other Elm review rule that makes sure that
[00:24:19]
we don't have any unused CSS classes, we require that the argument to class
[00:24:26]
is a static string, a string literal. And that's what we did. So if you explain the
[00:24:33]
problem and if you explain the benefits, then people accept it. Now, I haven't heard anyone
[00:24:41]
complain about this, so I'm very happy about that. But if you have like a one-liner,
[00:24:45]
like in most static analysis tools, that's going to be hard to explain. Like, what
[00:24:50]
is the problem? How do you move forward? Why is this a real problem? Like, yeah, people
[00:24:57]
want to understand the problems that you're reporting.
[00:25:00]
So Elm review does, I mean, obviously abstract some things away from the user. In this case,
[00:25:09]
like a review rule author. Like, for example, like you do provide this lookup table, the
[00:25:17]
module name lookup table is sort of somewhat going down this path of being able to provide
[00:25:24]
more information about, you know, like, and it's a very, to me, it's a very interesting
[00:25:29]
path in Elm review. And I'm curious, like, are there any other examples where you sort
[00:25:36]
of do some amount of additional analysis of the code where you can sort of process some
[00:25:42]
information? Not, you know, not a full on pre interpreter, but doing a little more analysis.
[00:25:49]
Are there other examples of that in Elm review? And are there things on your mind that you
[00:25:54]
think might be appropriate for Elm review to expose?
[00:25:57]
Yeah. So just to explain the lookup table that you mentioned, the module name lookup
[00:26:04]
table is just basically a dictionary that says at this location in the source code,
[00:26:09]
this is a reference to this value, which comes from this import from this module. Because
[00:26:15]
in the abstract syntax tree, when you say A.b, you have a reference to the b function
[00:26:23]
or b type (depending on the casing) of the A module.
[00:26:28]
Right, because Html.text could just be text, or it could be import Html as H, and
[00:26:36]
then it's H.text.
[00:26:38]
Yeah, or import Html.Styled as Html, stuff like that. So the module name lookup
[00:26:45]
table is there to make it much easier for you to figure out what is the real original
[00:26:52]
module that this value comes from, which we didn't have at the beginning.
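For example (a minimal sketch, assuming rtfeldman/elm-css is installed):

```elm
module AliasDemo exposing (view)

import Html.Styled as Html

-- To the bare AST this looks like a call to elm/html's Html.text,
-- but the lookup table resolves it to Html.Styled.text.
view : Html.Html msg
view =
    Html.text "hello"
```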
[00:26:56]
People probably invented that from scratch or a sort of imperfect version of that, I
[00:27:03]
would imagine.
[00:27:04]
Yeah, basically people were like, do I see an import to HTML? If so, what is the alias?
[00:27:11]
And also, does it expose the text function explicitly or using exposing (..)? And basically,
[00:27:18]
people did it like that way, which in practice is good enough. I don't think you're gonna
[00:27:24]
have a lot of false positives, but it's a lot of work. And you could have some false
[00:27:29]
negatives potentially. So yeah, this was something that I really wanted to add to Elm Review
[00:27:35]
and I got it in there. And now I don't think about this sort of problem, which is really
[00:27:40]
nice. But it did require a few iterations to get right.
[00:27:45]
Yeah, it's basically something that you pass into the context, right?
[00:27:51]
Yeah, the way that you initialize your context, basically your model for going through the
[00:27:58]
AST, you say, hey, I'm interested in having the lookup table because I think that's gonna
[00:28:04]
be useful. Please compute it for me and give it to me.
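In code, that request looks roughly like this (a sketch from memory of the elm-review API; details may differ between versions):

```elm
import Review.ModuleNameLookupTable exposing (ModuleNameLookupTable)
import Review.Rule as Rule

type alias Context =
    { lookupTable : ModuleNameLookupTable }

-- Declaring the lookup table in the context creator is what tells
-- the framework that it needs to be computed for this rule.
initialContext : Rule.ContextCreator () Context
initialContext =
    Rule.initContextCreator
        (\lookupTable () -> { lookupTable = lookupTable })
        |> Rule.withModuleNameLookupTable
```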
[00:28:08]
And then people can use it.
[00:28:09]
Which is largely like a performance optimization, right? If it's not needed, then you don't
[00:28:13]
need to compute it.
[00:28:14]
Yes, exactly. Currently, if any rule requires it, then I compute it. I think I want to be
[00:28:21]
smart in the future where only the rules that I'm going to run now will, if any of them
[00:28:28]
are needed, then I compute it. Because basically, the fix mode is quite slow. And I think I'm
[00:28:34]
going to need to be able to cut up the review phase and running one rule at a time and be
[00:28:40]
able to stop whenever I find like a fix.
[00:28:44]
Elm is very interesting in that regard because certain times you need to compute certain
[00:28:52]
things upfront in a sort of framework design because the user can't just invoke a method
[00:28:59]
that then mutates some dictionary somewhere and then suddenly it doesn't have that value,
[00:29:05]
but it goes and performs a side effect and puts it in there and memoizes it. So you sort
[00:29:10]
of architect things differently in Elm.
[00:29:13]
Yeah, and also because we can't memoize it. So either we say, well, we're going to compute
[00:29:19]
it once and then people will use it zero to n times, or we're going to compute it lazily
[00:29:26]
and then it will be computed as many times as people require it, which is unfortunate.
[00:29:32]
In practice, it probably works out okay a lot of the time, especially in the context
[00:29:37]
of a browser application. Maybe for CLI applications, it's a little bit different.
[00:29:44]
For performance heavy tools, yeah.
[00:29:47]
So are there more cases where you've considered adding these types of things that provide
[00:29:53]
more sort of, you know, rather than just the abstract syntax tree, the syntax tree with
[00:29:58]
a little processing, with a little extra analysis performed for you that you can access through
[00:30:05]
the Elm review platform?
[00:30:06]
Yeah, I actually just added one this morning, this weekend, basically the module documentation.
[00:30:15]
So that's the {-| ... -} comment that you have at the beginning of your file
[00:30:21]
before the imports. That is the module documentation. And currently Elm syntax, the AST library
[00:30:27]
that we use doesn't have a way to store that as the documentation of the module. It's just
[00:30:34]
among the comments. So what I had to do in a bunch of rules is to go through the comments,
[00:30:41]
find the module documentation, which I just learned this weekend that there was room for
[00:30:48]
false positives because ports also have that problem. Like the documentation for a port is
[00:30:55]
also not attached to the port. Whereas documentation for a function or for a type is properly attached.
[00:31:02]
So yeah, basically it was possible to confuse the documentation for port as the module documentation.
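A minimal sketch of the confusing case:

```elm
port module Main exposing (send)

{-| The actual module documentation.
-}

import Json.Encode

{-| elm-syntax leaves this comment floating among the other comments
rather than attaching it to the port below, so a naive "find the doc
comment" pass could mistake it for the module documentation.
-}
port send : Json.Encode.Value -> Cmd msg
```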
[00:31:10]
So yeah, it's not super tricky, but it's not nice to have to compute it everywhere.
[00:31:17]
And in all of my implementations it was potentially broken. So I added a new visitor
[00:31:26]
or a new context creator function to be able to have that information right away, basically.
[00:31:33]
And so I'm going to publish that in the next version. And the other big one is type information.
[00:31:38]
Yeah. So it is really surprising that Elm review works so well for a typed language,
[00:31:45]
considering we don't have type information. There are two ways that we can do that. One
[00:31:50]
of them is by invoking the compiler, which has a few problems, notably that you can't
[00:31:57]
invoke the compiler in tests. So I'd probably have to write a separate testing framework
[00:32:02]
for Elm review rules where it would create files, run the compiler thousands of times,
[00:32:08]
because I have thousands of tests. And also the whole review process is one giant pure
[00:32:15]
function currently. And if I had to ask for the type information, then I would have to
[00:32:23]
break out of that somehow, especially in the fix all mode, it would be very messy in practice.
[00:32:33]
So the other method is to do the type inference ourselves, which I've tried a few times so
[00:32:39]
far and only got so far.
[00:32:43]
It's a somewhat challenging problem, I would imagine.
[00:32:47]
Yeah. I think you need to know how the algorithm that Elm uses works
[00:32:54]
so that you have the same results because you got some edge cases where it can have
[00:33:00]
some differences. But basically you need to know the theory well in order to do a nice
[00:33:06]
implementation. And I've never understood that algorithm properly.
[00:33:11]
Yeah. It's a huge task.
[00:33:13]
I know someone who's working on this on and off. I don't want to put pressure on them.
[00:33:19]
Someone who has a very common name, it would seem.
[00:33:24]
Maybe yes.
[00:33:25]
We know you're listening.
[00:33:28]
Yeah. But yeah, that would be really nice. A few applications of that would be, for instance,
[00:33:39]
the no missing type annotation rule that could generate the missing type annotations.
[00:33:46]
That would be so nice.
[00:33:47]
Yeah. So we already have that in the editors. So we know that could work well. It doesn't
[00:33:54]
always give the nicest, the most readable type annotations. It could still be helpful.
[00:34:00]
But that would unlock more information. You said that removing false positives, it comes
[00:34:06]
down to needing more information. If you had unlimited information, you could remove all
[00:34:12]
false positives. So what are the areas that you could remove false positives with that
[00:34:18]
extra information?
[00:34:19]
Well, it's not necessarily false positives. It's false positives and false negatives because
[00:34:24]
you would be able to know more and therefore you would be able to report more. In Elm Review,
[00:34:31]
there are basically no false positives. So I'm not sure it would help with much. I know
[00:34:38]
with one location where it could potentially help, where we do have a false positive that
[00:34:43]
people report sometimes, which is the NoUnused.CustomTypeConstructorArgs rule. It's a mouthful.
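The situation he's about to describe looks roughly like this (a minimal sketch):

```elm
type Id
    = Id Int

-- The Int inside Id is never pattern matched out anywhere, so the
-- rule considers it unused...
sameUser : Id -> Id -> Bool
sameUser left right =
    left == right

-- ...but structural equality (==) still reads that Int, so removing
-- it would change behavior: a false positive.
```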
[00:34:54]
Basically, you can create a custom type where you say type A = A Int, or type Id =
[00:35:02]
Id Int, for instance. And then you never extract that wrapped value. So the
[00:35:11]
rule reports that as not used. But potentially you could use that in a comparison. Like,
[00:35:20]
is this ID the same one that this one has? And if you use it that way, there's a false
[00:35:27]
positive. If you never extract the ID in another way. So that could potentially be able to
[00:35:34]
tell us like, hey, in this comparison, is there a usage of this type? If so, don't report
[00:35:41]
that type. So that's a false positive that we could remove. And then it's mostly going
[00:35:44]
to be about false negatives because there's a bunch of rules that we can't write with
[00:35:51]
that type of information. And well, I don't have that many in mind, but a few, like for
[00:35:56]
instance, the one that I really want and that some people want is reporting unused record
[00:36:02]
fields. That can get quite tricky to do right if you want to, basically we can do it. It's
[00:36:11]
just going to have a lot of false negatives. So as I said, like you can either lean towards
[00:36:15]
false negatives or false positives when you don't have information. Right. So basically
[00:36:19]
what we can do is, well, if we see that a function takes an extensible record as an argument
[00:36:27]
and some of those fields are not used, then we can remove those. And I actually already
[00:36:33]
have a prototype of that working, but if you pass that argument to a List.map function,
[00:36:42]
for instance, so you have a list of some records and you pass that to a List.map. Well, now
[00:36:48]
you need to figure out what is the type of that mapper function that you pass to the
[00:36:54]
List.map, because if that one uses some of the fields, then those fields are used, and if it
[00:37:00]
doesn't, then they're not used. But if you don't know the type, well, you don't know
[00:37:05]
which fields are used and which ones are unused. So therefore,
[00:37:12]
if we want to be safe and not report false positive, we're just going to say, well, it
[00:37:16]
looks like it could use anything. So we're not going to report anything. And that's the
[00:37:20]
same thing for a model. Like you pass your model, which is usually a record with plenty
[00:37:27]
of fields, you pass a model to some function that is a lambda that is hard to evaluate.
[00:37:33]
Therefore, we can't tell anything about it. So we stop. So having type information here
[00:37:38]
would be a lot, very helpful because we could analyze the type of those functions and we
[00:37:45]
could see, well, it seems to be using this field, this field, and that's it.
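A minimal sketch of that situation:

```elm
-- With an annotation, a rule can read off that only .name is used:
greeting : { r | name : String } -> String
greeting person =
    "Hello, " ++ person.name

-- But if `greeting` were an un-annotated lambda passed to List.map,
-- a rule without type inference cannot tell whether .age is ever
-- read, so to stay safe it reports nothing.
greetAll : List { name : String, age : Int } -> List String
greetAll people =
    List.map greeting people
```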
[00:37:51]
Yeah. It seems like that would unlock a lot of possibilities, not to mention fixes that
[00:37:57]
could, you know, I mean, code generation fixes, all sorts of ideas you could find there.
[00:38:04]
Yeah. I can imagine we will still have plenty of false negatives, but I think we will be
[00:38:09]
able to avoid all false positives, or we would not have false positives, but that's, yeah,
[00:38:15]
again, like how conservative we want to be about things being used or unused. Cause we
[00:38:22]
could go either way. We could potentially have a configuration for the rule that says: try
[00:38:26]
to be more aggressive now, just for a while. And then you go check the reported errors,
[00:38:31]
and maybe you can remove a few
[00:38:35]
things. Maybe you don't, but yeah.
[00:38:37]
Right.
[00:38:38]
But yeah, in general, we want to be very conservative and not report any false positives because
[00:38:42]
those are super annoying.
[00:38:45]
Yeah. So it seems like, I'm not sure if this falls into the same groups you've mentioned
[00:38:51]
of choosing to err towards false positives or err towards false negatives. But when we're
[00:38:58]
talking about ways to work with less information, you don't have as much information as you
[00:39:04]
need to be 100% sure of something that you're checking for. Well, like if we look at the
[00:39:13]
chess example again, you know, what do you do in that situation? If you, if you can concretely
[00:39:18]
determine it, then it's, then it's easy enough. If you can't, then you end up, you know, what
[00:39:24]
do you do for an opening chess move? You tend to rely on strategies and heuristics. So heuristic
[00:39:31]
for, you know, determining whether a chess move is good is you want your pawns to be
[00:39:37]
supporting each other. You want to try to take the opponent's queen if you
[00:39:43]
can, trading it for your knight. That might turn out to be a move that leads to
[00:39:50]
you being checkmated in the next move. But that's a heuristic that you can say, well,
[00:39:55]
let's just kind of generally assume that this is going to tend to be a good thing. And so
[00:40:00]
now your rule is now going back to like Elm review rules in the context of Elm review.
[00:40:06]
Now these heuristics are telling you things about your code that might give you unreliable
[00:40:14]
results. Because essentially what a heuristic is measuring is the
[00:40:20]
thing that is not directly what you care about. Like in a chess game, you care about checkmate.
[00:40:26]
That's the only thing you care about. But, and maybe like the number of moves until you
[00:40:30]
checkmate, like that's all you care about. But in this heuristic of trying to take the
[00:40:37]
opponent's queen, if you can, you are using a stand-in goal that's easier to determine,
[00:40:45]
but that stand-in might be flawed. In some cases that stand-in might actually not
[00:40:51]
yield the result you want; it might lead to you getting checkmated.
[00:40:55]
Yeah. So yeah, in chess, I think computers are powerful enough to basically compute every
[00:41:02]
possible move in a game, or close to... no, no, probably not.
[00:41:08]
They're actually not. They actually rely a lot on heuristics to like prune the tree because
[00:41:14]
it's an exponentially growing tree. So it's approaching infinite. So computers can't
[00:41:22]
deal with that, so they do have to use heuristics.
[00:41:24]
Yes. They do use heuristics and pruning and all that. Yeah. Let's imagine they could
[00:41:33]
compute every case. Then basically it has perfect information. Right. So whatever it's
[00:41:40]
going to try, it's going to work. If it's slightly limited, which in this case
[00:41:47]
it is, then you can improve the logic by saying, well, this is obviously a bad move. Right.
[00:41:53]
And you can remove some complexity. You can now rely on those. It's going to be a presumption.
[00:41:59]
Yeah. Right. Exactly. So when that turned out to be wrong, you're going to have worse
[00:42:03]
results than expected. But when those are true, then you get some nice results.
[00:42:08]
Right. So is that acceptable to have that in an Elm review rule or do you try to avoid
[00:42:15]
that? To have presumptions? Yeah. To have
[00:42:18]
heuristics because if it's a rule, it's telling you it's an error. There's no way to disable
[00:42:22]
it. And in some cases you might say, well, actually in this case it's okay. Like a code
[00:42:27]
smell, like, well, it's a code smell if you have a function that's over a certain number
[00:42:32]
of lines, but maybe in this particular instance, it's fine.
[00:42:37]
Yeah. In Elm review, I would, well, in general I would say it depends on the criticality
[00:42:45]
of the issue and how much you want to force it. For instance, the unused CSS classes rule,
[00:42:53]
that is basically like going to report false positives by saying, yeah, you should use
[00:42:59]
a literal, but as we said, it's going to be more of a constraint than a false positive
[00:43:05]
depending on how you frame it. Right. Yeah. So those opinionated rules
[00:43:10]
are fine if you opt in to those, I think. You need to have the whole team accept
[00:43:17]
this rule in my opinion, like all of the rules, but in general Elm review doesn't allow ignoring
[00:43:25]
issues. So that's why at least all of the rules that I wrote tend to lean towards
[00:43:31]
false negatives rather than false positives. Right. Instead of heuristics.
[00:43:37]
Using heuristics, like basically using presumptions. I see. Well, I don't know, so I'm going to
[00:43:43]
take the route that I know will lead to people not getting false positives. You can view
[00:43:49]
it as a simple heuristic in a way, I think. So basically a heuristic is how you choose
[00:43:57]
to put some things into the false positive category or choose to put some things into
[00:44:03]
the false negative category. That heuristic is what determines that. Yeah, I'd say so.
[00:44:08]
And I think that Elm review really has this stance to go towards false negatives more
[00:44:14]
than other tools because in those other tools you can disable the errors when you have false
[00:44:19]
positives. And that also impacts how people write those rules or when they choose to write
[00:44:25]
and enable those rules. Because I know if I don't have disable comments, I know that
[00:44:30]
if I report false positives, it's going to be very annoying. And I know that if some
[00:44:35]
rule that reports like a code smell, which is not always bad when it reports an error
[00:44:41]
and shouldn't, well, people are going to be blocked. So if I have a way to tell them like,
[00:44:47]
please write the code this way in order to not have this false positive, then that's
[00:44:53]
acceptable I think. If I don't, then I'm just not going to write the rule. Right. And not
[00:44:59]
writing a rule is basically 100% false negatives. Right, right. Although
[00:45:07]
you could argue that 100% false negatives feels very different than 99% or 1% false
[00:45:14]
negatives because you know you just can't rely on 100% false negatives. Whereas you
[00:45:19]
don't know if it's 1% false negatives. You don't know if you can rely on that or not.
[00:45:25]
But other tools like ESLint have a lot more rules that have the potential for
[00:45:32]
false positives and they're considered okay because you can disable them. So I really
[00:45:36]
think that having the ability to disable errors impacts the way that we choose which rules
[00:45:44]
to write. Yeah. And as you say, it depends on the criticality of the issue if it is a
[00:45:52]
constraint that you really depend on for something that you're doing, then it's going to change
[00:45:58]
the calculus there. Yeah, if it's to report an issue that you know for sure will crash
[00:46:04]
your application, but it might be wrong, then yeah, it is probably something you want to
[00:46:09]
enforce at the cost of being a bit annoying sometimes. So people will have to add
[00:46:16]
disable comments or rewrite the code in a way that the linter will understand that this
[00:46:20]
is not a problem. But yeah, I haven't found any critical problems like that for Elm Review
[00:46:28]
so far, I think. So yeah.
[00:46:30]
So you often mention that code flow analysis is sort of the thing that makes a lot of rules
[00:46:39]
not worth writing. And I wonder... So here's the original tweet that we were talking about
[00:46:47]
earlier where you kind of talked about missing information being the root cause. So you said,
[00:46:51]
missing information is the root cause of false positives/negatives in linters. Add more
[00:46:56]
information to find more problems and be less wrong at the same time. How? One, the linter
[00:47:01]
should provide more information to rule authors. And two, languages should restrict dynamic
[00:47:07]
features. So one, the linter should provide more information to rule authors. Like what?
[00:47:13]
Like is there information that Elm Review could provide to rule authors to help them
[00:47:18]
with code flow analysis in addition to the module lookup table we discussed? Like comparing
[00:47:23]
references seeing if something refers to the same value.
[00:47:27]
Yeah, for instance, having aliases. And I'm definitely thinking about ways to make analysis
[00:47:33]
easier, which is in a way providing information that would be hard to compute otherwise. Also,
[00:47:40]
there's just simply plenty of information that you sometimes can't get. Not so much
[00:47:45]
with Elm Reviews anymore, but like for instance, only recently I added the function to give
[00:47:52]
you the file path of a module to analyze. Because I thought people might do some weird
[00:47:59]
things with it. That's something that I was quite scared about, like people misusing the
[00:48:04]
tool at the beginning. In practice, not so much. So now I make that available and people
[00:48:11]
do use that for some applications. I don't have any examples in my head right now. But so yeah,
[00:48:18]
give all the information that you can. And then yeah, make it possible to analyze codes
[00:48:23]
in a simpler way, like give type inference, give the real module name and yeah, provide
[00:48:31]
code flow analysis tools. I know that ESLint has something like that, which I never understood.
[00:48:38]
So I don't know how that would work. I've also thought about being able to figure out
[00:48:44]
like, is this value an alias to the other function? And that could be interesting. That
[00:48:54]
could catch more things. Definitely.
[00:48:59]
For the performance question, I could imagine, I don't know if this would be a fruitful direction
[00:49:06]
at all, but I could imagine a design where you sort of have, actually very much like
[00:49:11]
the store pattern that Martin was telling us about in our store pattern episode. Essentially,
[00:49:17]
you know, the store pattern you have your, I can't remember what he called it now, but
[00:49:21]
your query of these are the things I depend on for this page. This is the data I need.
[00:49:26]
You could sort of have that as a sort of subscription that says, this is what I need, which as we
[00:49:30]
discussed in the store pattern episode, as more information comes online in the store
[00:49:35]
pattern, it could be getting it with HTTP requests. Then you can do follow up information
[00:49:39]
because it's sort of a subscription that gets called whenever that changes. And then it
[00:49:43]
just keeps going until the information you say you need matches the information that
[00:49:48]
you have already or is a subset of it.
[00:49:52]
So I can imagine something like that where you sort of have like a subscription to like,
[00:49:58]
here's some computationally expensive data I need that you're not just going to go analyze
[00:50:03]
constantly, and then you have these sort of RemoteData or Maybe values or whatever that
[00:50:10]
you're waiting on. And then you can sort of take all those together once you have them
[00:50:15]
all filled in and then you can continue your analysis. So that could be really interesting
[00:50:20]
to like provide some primitives for doing that sort of thing.
[00:50:23]
I think the way that I understand it is I think already what Elm Review does to some
[00:50:30]
extent because we say like, I request the module name lookup table, therefore please
[00:50:36]
compute it. And the framework could do a better job at computing only what is necessary. And
[00:50:43]
then when it looks at the next file, compute again only what is necessary and so on and
[00:50:48]
so on. That I definitely want to have. And I think that's kind of the same idea like
[00:50:54]
this module depends on the lookup table for this module. So whenever you get to the next
[00:50:58]
module you compute it again for that module, etc.
[00:51:04]
Yeah, it is a similar pattern. I think the main difference would be in the case of a
[00:51:08]
module Elm Review knows what module it's looking at. And so it can fill in that bit of context
[00:51:14]
to say, okay, it's requesting the module lookup table and it's in this module so I can compute
[00:51:20]
it for this specific module. But if it's something more nuanced like I want to pre evaluate this
[00:51:26]
string for example, then it doesn't know which strings to pre evaluate based on some implicit
[00:51:34]
context of the process it's running. So in that case, that sort of store pattern style
[00:51:40]
could work where you can give it that information. You can say, hey, here's the node I'm looking
[00:51:45]
at and I would like to wait until you can finish analyzing, like pre computing this
[00:51:52]
string value, please. And then you wait until it's no longer a maybe and then get it back.
[00:51:58]
And that could allow you to lazily compute and memoize some of these more expensive values
[00:52:05]
with specific context where the user can say, I want it for this node. So anyway, like seems
[00:52:10]
like an interesting path to explore.
[00:52:12]
Yeah, it could be interesting. Yeah. In this case, it would definitely help to be able
[00:52:17]
to say, please compute this now and store it in the store directly, just by mutation.
[00:52:25]
That would definitely make things easier.
[00:52:28]
Right.
[00:52:29]
Yeah.
[00:52:30]
And I guess it's maybe a little bit of a chicken and egg problem to know which of these things
[00:52:37]
would open up interesting possibilities because when you offer this information to review
[00:52:43]
authors, review rule authors, then they do interesting things with it. And then when
[00:52:48]
they do interesting things with that, it builds and snowballs and it sparks people's imaginations.
[00:52:54]
And so it's sort of hard to know which ones to explore before you've seen what people
[00:52:59]
do with them.
[00:53:00]
Yeah. Yeah. Well, I have my own opinions about things that could be interesting or ideas,
[00:53:06]
not opinions. But yeah, I've been surprised by what people came up with. For instance,
[00:53:13]
you made the Elm review HTML to Elm.
[00:53:17]
Yes, that's right. Yeah.
[00:53:19]
Based on what Martin Stewart made credit to him and to you, obviously, but the idea was
[00:53:25]
from Martin.
[00:53:27]
His idea to use Elm review fixes as a code generation tool is 100% credit to him. And
[00:53:32]
I used a bunch of his code for that.
[00:53:34]
So yeah, that one I did not expect. And yeah, that's a pretty cool avenue to explore. Definitely.
[00:53:41]
I also know that some people would like to be able to generate modules to create files
[00:53:46]
on disk based on the same idea. So like, that could be interesting.
[00:53:51]
Yeah. So the type information is like the big one on your wish list right now.
[00:53:56]
Yeah. And also performance for fixes and performance for Elm review because in my opinion, it's
[00:54:04]
too slow. But that's maybe just me being a parent to the tool. Like, ah, at work, it takes
[00:54:11]
like a whole minute to run on our code base, which, like, yeah, that's too slow. Like, if
[00:54:17]
even I want to go scroll on Twitter while the review is ongoing, like, it's too long.
[00:54:24]
Yeah. Right.
[00:54:25]
But I do wonder like, what kinds of use cases could people come up with if there was more
[00:54:33]
information? Like, I wonder if some sort of dependent typed kind of techniques could emerge
[00:54:41]
if people had more tools for doing code flow analysis or, you know, just more information
[00:54:48]
at their fingertips. Because like what Elm can do with all the information it has about
[00:54:53]
your code, both because it's a compiler and has computed all this information and because
[00:54:58]
the constraints of the Elm language, all the things, all the guarantees it has based on
[00:55:03]
how you have to write your code for it to be valid. There are just so many cool things
[00:55:07]
that it can do. And if you start like looking at the compiler code, you're thinking of all
[00:55:12]
these possibilities. Like I know I do with Elm pages. I'm like, oh my God, if I was a
[00:55:16]
compiler, there are so many cool things I could do with the information I would have.
[00:55:22]
Yeah. So a compiler is basically a static analysis tool, just like a linter, right?
[00:55:28]
Right. It's a static analysis tool that the code must pass through in order to run, which
[00:55:36]
that's basically all it is. It's those two things.
[00:55:39]
And then it generates some files.
[00:55:41]
Right. Right. Also then, right.
[00:55:43]
That is a compiler part, but the rest is very important as well. And the thing is the compiler
[00:55:50]
is a general purpose tool, right? So it's only going to be able to infer things that
[00:55:55]
the language tries to allow and to report things that it doesn't want to allow. But
[00:56:02]
then if you want to do something more precise that the language was not designed for, you
[00:56:07]
could potentially do that with a very powerful static analysis tool. So like, I don't know
[00:56:12]
much about dependent types, but being able to figure out at compile time that some number
[00:56:19]
is always smaller than five, you could potentially do it by adding constraints, just like a language
[00:56:26]
with dependent types would do. Maybe, I don't know enough, but you could definitely try
[00:56:31]
to do that and then report errors like, Hey, I am not smart enough to figure this out.
[00:56:39]
Please change the way that you work with your code. Kind of like proof languages, which
[00:56:44]
I think they accept plenty of things, but if it's too hard, then they ask the people
[00:56:49]
to rewrite their code in a way that they can understand.
[00:56:52]
Right. Which I mean, in a way, like, yeah, if you say non empty list, you know, from
[00:57:01]
cons or whatever, right? That's like a lazy approach to that in a way where you're saying,
[00:57:07]
I'm not going to do code flow analysis. You must prove to me by actually passing a single
[00:57:13]
definite value and then a list which could be empty. I don't care. And so you've proven
[00:57:19]
it. That's like the shortcut to proving that. Or you could do code flow analysis and you
[00:57:25]
could say, well, I can analyze your code paths and I can see that you're using this non empty
[00:57:32]
type that promises to be non empty, but maybe not through the compiler, but through Elm
[00:57:38]
review and I see this one pinch point that I know this type will always go through and
[00:57:45]
it adds something to the list. Therefore, you're good. Like that would be the deluxe
[00:57:50]
approach.
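The "proof by construction" version, sketched minimally:

```elm
-- The type itself carries the proof: you cannot build one without
-- supplying a first element, so no code flow analysis is needed.
type NonEmpty a
    = NonEmpty a (List a)

head : NonEmpty a -> a
head (NonEmpty first _) =
    first
```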
[00:57:51]
Yeah. But then some things are very hard to infer because it uses code from dependencies
[00:57:58]
that we don't have information about. So again, missing information. There is a request to be able
[00:58:05]
to analyze the code from dependencies before analyzing the project. And I think that would
[00:58:11]
be very valuable. If you do that, you can basically do whole program analysis except
[00:58:16]
for the JavaScript parts. Maybe we would like to be able to analyze CSS and JavaScript files
[00:58:22]
as well, but I think that's getting a bit out of hand at the moment, at least. It
[00:58:29]
should be interesting, but maybe it's better to use two tools like ESLint and Elm review
[00:58:35]
and configure them in a way to give you all the same guarantees.
[00:58:39]
And you can always go the other way too, right? Like if you're wanting to analyze things with
[00:58:45]
your CSS, you can generate CSS from Elm and then you have a more constrained place to
[00:58:53]
analyze it. Whereas, like, with guarantees, you can always flip it on its
[00:58:59]
head. You can say, well, this is too unconstrained and hard to analyze. Therefore I'm going to
[00:59:05]
constrain it. Like to take something from an unconstrained environment to a constrained
[00:59:11]
environment is very, very hard. To take something from a constrained environment to an unconstrained
[00:59:17]
environment is very easy, relatively speaking.
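As a toy illustration of that flip, here is a hedged sketch where CSS is described as plain Elm data, with a hypothetical `Style` type and `toCss` function; the Elm side is fully analyzable, and the unconstrained CSS text only appears as final output:

```elm
module GeneratedCss exposing (Style, toCss)

-- Hypothetical sketch: styles as ordinary Elm values, which a tool
-- like Elm review can analyze with all of Elm's guarantees.


type alias Style =
    { selector : String
    , properties : List ( String, String )
    }


-- Render a style to raw CSS text at the very end, as output.
toCss : Style -> String
toCss style =
    let
        body =
            style.properties
                |> List.map (\( name, value ) -> name ++ ": " ++ value ++ ";")
                |> String.join " "
    in
    style.selector ++ " { " ++ body ++ " }"
```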
[00:59:19]
I remember when I rewrote an Elm application to React, that was really easy. Whereas the
[00:59:27]
opposite would have been way harder, basically re-implementing everything. But
[00:59:32]
for Elm to React, there was a translation, which is much easier.
[00:59:37]
To take a lossless audio file and turn it into a compressed one is easy. To take a compressed
[00:59:44]
audio file or compressed image and turn it into a lossless one, or to do the CSI enhance,
[00:59:51]
is a harder problem.
[00:59:52]
I don't know if you want to talk about side effects as well. That's interesting, but I
[01:00:00]
don't know how we are on time.
[01:00:02]
We could talk a little more and still be in our general time window.
[01:00:06]
Well, we can extend our episodes to be two hours long. That's fine as well. I mean, we
[01:00:13]
did have shorter episodes recently, so we need to compensate, right?
[01:00:18]
One area where you have a lot of false positives or false negatives in a lot of other languages
[01:00:25]
and other linters is with the presence of side effects. For instance, if we take the
[01:00:33]
no unused variables rule for Elm, where you say if you have A equals some function call,
[01:00:41]
and then this value A is never used.
[01:00:45]
In Elm review, we know, well, this function call has no side effects. We can remove the
[01:00:52]
entire declaration from the code, and then we can look at whether that function is used
[01:00:58]
or not used anywhere else.
[01:01:00]
But in a language with side effects, it's very hard to tell that. We know we can remove
[01:01:09]
const A equals, we can remove that part, but we don't know if we can remove the function
[01:01:13]
call because it might have side effects, right?
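A minimal sketch of the Elm side, with made-up names: because `computeTotal` cannot perform effects, the whole unused declaration can be deleted, call included, which is exactly what a JavaScript linter cannot safely do:

```elm
module UnusedExample exposing (result)

-- Hypothetical pure helper, just for illustration.


computeTotal : List Int -> Int
computeTotal numbers =
    List.sum numbers


result : Int
result =
    let
        -- `a` is never used. Since `computeTotal` has no side
        -- effects, a rule like NoUnused.Variables can remove this
        -- entire declaration, and then check whether `computeTotal`
        -- itself is still used anywhere else.
        a =
            computeTotal [ 1, 2, 3 ]
    in
    0
```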
[01:01:17]
And that is going to be true for any language, as far as I know, that is not a pure functional
[01:01:25]
language, or at least where the function is not annotated in some way as being pure.
[01:01:31]
So being able to rely on the fact that functions have no side effects, that actually allows
[01:01:38]
us to do some very cool things, like dead code elimination, a very powerful one,
[01:01:44]
as we've seen.
[01:01:45]
I think removing dead code in Elm using Elm review is something that a lot of people love,
[01:01:50]
and I definitely do. And that is very hard to do if you have side effects. And yeah,
[01:01:57]
then you got things like moving code around where you have one function call after another
[01:02:02]
one. And if you want to optimize the code or make it nicer to read, then potentially
[01:02:11]
you have to invert the order of those function calls. Well, is that safe to do? Well, we
[01:02:17]
don't know. Unless we have no side effects, then we know we can do it.
[01:02:22]
So we could still do that analysis. Does this function have a side effect? Does this one
[01:02:28]
also have a side effect? Do they impact each other? Do they depend on each other? And that's
[01:02:33]
a lot of work. That's a really big amount of work to do, like a lot of interpretation
[01:02:38]
and a lot of analysis. And potentially at the end, you still don't know the answer.
[01:02:43]
So you're still going to have to make a presumption like, yeah, I think this is going to... We
[01:02:48]
don't know. So we're just going to assume that it has a side effect and that it needs
[01:02:53]
to stay this way.
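In Elm, none of that analysis is needed; here is a small sketch with hypothetical helpers, where swapping the two bindings is always safe because neither call can have a side effect:

```elm
module ReorderExample exposing (report)

-- Hypothetical pure helpers, just for illustration.


slowSum : Int -> Int -> Int
slowSum x y =
    List.sum (List.range x y)


slowProduct : Int -> Int -> Int
slowProduct x y =
    List.product (List.range x y)


report : Int -> Int -> String
report x y =
    let
        -- These two bindings can be evaluated in either order (or
        -- skipped if unused) without changing the result. In a
        -- language with side effects, a tool would first have to
        -- prove the calls don't affect each other.
        total =
            slowSum x y

        totalProduct =
            slowProduct x y
    in
    String.fromInt total ++ " / " ++ String.fromInt totalProduct
```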
[01:02:55]
Right. Yeah. It's the poison pill. Things can be very easily tainted. And it's the unconstrained
[01:03:02]
versus constrained environments. And if you can take, as we've talked about in the past,
[01:03:09]
if you take pure functional Elm code, you can do more complex things under the hood
[01:03:17]
preserving those guarantees, like persisting data in Lamdera, for example. So it's pretty
[01:03:25]
compelling how you can still preserve those guarantees and do more complex things when
[01:03:31]
you have that purity. For example, you could even imagine doing some of these kind of costly
[01:03:40]
computations in Elm review. Like instead of doing this sort of Elm store pattern style,
[01:03:46]
you could imagine doing some sort of hacks under the hood, like a sort of Elm review
[01:03:51]
compiler that could...
[01:03:53]
Oh, I have thought about doing that. It's definitely on my mind, but so far I've never attempted
[01:04:02]
it because I wanted... For type inference, I think that's going to be slow. Evan said
[01:04:08]
that it's going to be slow in a language where you don't have mutation. So I'm thinking about
[01:04:13]
altering that at compile time to make it much faster. We don't have type inference yet.
[01:04:21]
So I will wait for that to happen.
[01:04:24]
Interesting. Oh, that's cool. Yeah. Yeah. So I could imagine like...
[01:04:29]
But I don't know if that will have any surprising effects. That's going to be interesting to
[01:04:35]
figure out.
[01:04:36]
Well, it's definitely an ambitious path to go down, but it would open up a lot of interesting
[01:04:40]
possibilities. But yeah, you could certainly like, I could imagine you saying here's essentially
[01:04:46]
a magic function that gives you some expensive computational result and under the hood, swap
[01:04:54]
it out to do some optimizations and make it more efficient and not call it if it's not
[01:04:59]
needed and that sort of thing.
[01:05:00]
Yeah, potentially. But yeah, I would definitely not write package code that would depend
[01:05:05]
on this. It would just be like an improvement that people will not notice.
[01:05:10]
Yeah, exactly.
[01:05:11]
In terms of performance, under the hood optimization, that's the only way that I would accept doing
[01:05:17]
something like that.
[01:05:18]
Yes, I agree. Exactly. Yeah. But as long as you can preserve the semantics and expectations
[01:05:24]
of how it's going to behave, you can swap it out for however you achieve that under
[01:05:28]
the hood.
[01:05:29]
Yeah. But it would be kind of tricky to test because you could not use Elm test for this
[01:05:35]
anymore.
[01:05:36]
Yeah.
[01:05:37]
All of these guarantees that we've talked about, things that we can rely on that make
[01:05:45]
analysis easier, it applies to linters, but it also applies to code optimizers. For instance,
[01:05:53]
Elm optimize level two, it knows that it can move some functions or some operations around
[01:06:01]
as long as they don't depend on each other because they know, well, this function has
[01:06:04]
no side effect, this function has no side effect, so they can move things. They can
[01:06:09]
do a lot of these things because they know that the compiler wrote code in a specific
[01:06:15]
way, that the original code was in a specific way, that things are valid, that semantics
[01:06:21]
match, that types match as they were in the code. So using all of these guarantees that the
[01:06:28]
compiler, the type checker, and the language design give you, you can do a lot of powerful
[01:06:33]
things. But as soon as you're missing one of those, well, some areas, some optimization
[01:06:39]
ideas, some linter rules that you would want to write, they crumble, you can't do
[01:06:46]
them anymore. Or they require a lot more analysis, which we've seen can be hard. So yeah. So
[01:06:53]
that's part of what I was saying: languages should remove dynamic features, or features
[01:06:59]
that are hard to analyze, like side effects and dynamic values. Those are hard and therefore,
[01:07:06]
if we can remove those, if we can make them more static, well, that helps static analysis
[01:07:14]
tools. And that is something that I don't think a lot of other languages understand fully
[01:07:20]
enough, right? I just wish people knew that more.
[01:07:24]
What I'm taking away from this is basically like move the goalposts. Like instead of trying
[01:07:31]
to solve a hard problem, define the problem in a way that makes it easier, right? So like
[01:07:39]
we talked about with static analysis, like if you think, oh, I have to do all this
[01:07:44]
code flow analysis to figure out what the class name is. Make the problem
[01:07:49]
easier for yourself by making more assumptions, having more constraints. So you can do that
[01:07:54]
in a language and you can do that in a static analysis rule and any sort of static analysis
[01:07:59]
context you can move the goalposts, make the problem easier for yourself.
[01:08:03]
Yeah. I wrote a blog post called Safe unsafe operations in Elm, which is basically
[01:08:09]
the same idea. The idea is we want to make something like Regex.fromLiteral,
[01:08:15]
where we can basically have a function that doesn't return a Maybe Regex,
[01:08:22]
but a Regex, and Elm review then says, well, this is okay. We know that at compile
[01:08:28]
time this works because this looks like a valid regex. So this is fine. And whenever
[01:08:34]
you pass in a dynamic value, we move the goalposts by saying, please don't write it
[01:08:39]
this way, we don't understand it. You can do it that way, or you can
[01:08:45]
make the analysis more complex. Both work. But as long as at some point you can give
[01:08:51]
the guarantee, then everyone's happy. Otherwise you can fall back on Regex.fromString,
[01:08:59]
which returns a Maybe Regex.
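A minimal sketch of that pattern, following the blog post's idea; the Elm review rule that approves literal-only call sites lives outside this code:

```elm
module SafeRegex exposing (fromLiteral)

import Regex exposing (Regex)


-- Returns a `Regex` directly instead of a `Maybe Regex`. An Elm
-- review rule checks every call site: a valid regex literal is
-- approved, while an invalid literal or a dynamic value is reported
-- as an error. In approved code, the `Regex.never` fallback can
-- therefore never actually be reached.
fromLiteral : String -> Regex
fromLiteral string =
    Regex.fromString string
        |> Maybe.withDefault Regex.never
```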
[01:09:01]
Well, are there any other things people should look at? Any blog posts, any conference
[01:09:07]
talks, perhaps soon to be released?
[01:09:10]
Yeah. So a lot of what I said today was explained, hopefully better than today, in a
[01:09:18]
talk that I gave at Lambda Days in mid-July. It's called Static Analysis Tools Love
[01:09:25]
Pure FP. I think it's going to be released. I'm pretty sure it's going to be released
[01:09:29]
after this episode. So hopefully I haven't spoiled too much. I think some parts of it
[01:09:36]
at least. But I think it was a good talk. I'm very pleased
[01:09:41]
with it at least.
[01:09:42]
I'm excited to watch it. Yeah. So keep an eye on our Twitter account and
[01:09:46]
we'll tweet a link to it. We'll try to update the show notes, though they may
[01:09:50]
be immutable in your podcast app.
[01:09:53]
Yeah. They often are, right?
[01:09:54]
Yeah, I think so. But yeah, keep an eye on our Twitter, and Jeroen, until next time.
[01:10:01]
Until next time.