elm-review Data Extractors

We discuss elm-review's Data Extractors and how they open up your elm-review context to external tools like data visualizations.
May 22, 2023


Hello Jeroen.
Hello Dillon.
Well, I've been playing around with a fun feature in Elm Review that we haven't really talked about before.
Wait a second. Where's the pun?
I've got to keep you on your toes Jeroen. The pun might spring out at any point.
We started the recording and I was like, oh no, I forgot he was going to do a pun and I'm not prepared.
The one time you were mentally preparing.
Kind of on the late side and now you're like, there's no pun? What?
I thought I could extract a reaction from you.
Okay, that's fine then. We have standards, man. That's why we have linters. We need standards.
We got to deliver.
So today, we haven't really talked about this on the podcast, but there's this whole feature that's kind of like pretty powerful in Elm Review.
The extractors. So do you want to tell us what Elm Review extractors are?
Yeah, sure. So Elm Review has this new feature, which is extractors, which allows you to gain insight into your code base.
So you run an Elm Review rule and it's going to gather a lot of information.
And then it's going to be able to give you that information as a JSON output.
And you can read that JSON and you can figure out whatever information you wanted.
So what kinds of information can you get? Well, that entirely depends on you and on the rules that you've enabled or written.
So one very small example that I made is making an import graph of your modules.
So what Elm Review is going to do is it's going to look at your whole Elm project.
And it's going to notice, okay, well, this module is importing this thing, this other module, this other module is importing that module, and so on and so on.
And it's going to internally make a graph of the imports.
And at the end, it just says, well, I've made those into a graph or a list of arrows.
And then it's going to format it as a Mermaid diagram or a DOT graph.
And you can now use that to generate an image using the DOT specification, the Mermaid specification, whatever.
So but this information is just like one rule is saying, well, I'm going to gather this information and then present it as a graph.
It's not something that is built into Elm Review. It's just whatever you configure your rule to do.
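The "list of arrows" idea can be sketched on the consuming side. This is a minimal Python sketch, assuming a hypothetical extract shaped as a mapping from each module name to the modules it imports; the actual output of any published rule may look different.

```python
def to_dot(imports: dict[str, list[str]]) -> str:
    """Format a module -> imported-modules mapping as a DOT digraph.

    The input shape is an assumption made for illustration, not the
    format produced by any particular elm-review rule.
    """
    lines = ["digraph imports {"]
    for module in sorted(imports):
        for imported in sorted(imports[module]):
            # One arrow per import: importing module -> imported module.
            lines.append(f'  "{module}" -> "{imported}";')
    lines.append("}")
    return "\n".join(lines)


example = {"Main": ["Page.Home", "Html"], "Page.Home": ["Html"]}
print(to_dot(example))
```

The resulting text can be handed to any tool that understands DOT to render an image.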
Right. Yeah, it's like allowing the context that you build up in Elm Review to escape from the bounds of Elm Review to the outside world.
So you can use it in external tools however you want.
So for anyone who's familiar with writing Elm Review rules and, you know, using visitors to update the context, which we've talked about in previous episodes, that would be akin to your update function updating your model.
You know, you have in the Elm architecture, this tuple you return from update of the model and commands.
And in Elm Review, visitors can change the context, which would be like updating the model.
And they can return errors or fixes, which would be like running a command.
But with extractors, instead of resulting in fixes and reporting errors, you're given the context at the end of running the rule, and then you can turn that into JSON.
Right. That's all it really is.
Absolutely. Yeah.
But what you can do with that is anything that you can do by visiting an Elm project, extracting context from that and turning that into JSON.
Now you have access to that in any external tool.
So we said it's JSON, but in practice, it's whatever can be encoded as JSON.
So if you just want it to be YAML, then you just make a string that is in YAML format.
In practice, the way to use this feature is to call the Elm Review CLI with --extract and also --report=json.
Otherwise, you won't get the extracts. And then it will give you a JSON.
But that contains arbitrary JSON.
So if you want it to be a string that contains YAML, or that is in YAML format, that's fine.
That's kind of what I do with the imports graph, because that's just DOT specification strings.
Right. Right. So yeah, it's JSON at the top level.
But inside of those JSON values, it might just be a single JSON string, which is actually a different data format.
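To make the "JSON at the top level, other format inside" point concrete, here is a small Python sketch of reading one rule's extract from a report. The sample report is hand-written: the top-level "extracts" key and the rule name "ImportGraph" are assumptions for illustration, so check the real CLI output before relying on this shape.

```python
import json

# Hand-written stand-in for what the CLI might print when run with
# --report=json --extract. The "extracts" key and the "ImportGraph"
# rule name are assumptions, not confirmed output.
report_text = json.dumps({
    "errors": [],
    "extracts": {"ImportGraph": 'digraph imports {\n  "Main" -> "Html";\n}'},
})


def get_extract(report_json: str, rule_name: str):
    """Return the extract for one rule, or None if it is absent."""
    report = json.loads(report_json)
    # `or {}` also covers a report where "extracts" is present but null.
    return (report.get("extracts") or {}).get(rule_name)


dot_source = get_extract(report_text, "ImportGraph")
# The report itself is JSON, but this value is a JSON string whose
# content is in a different format (DOT in this example).
print(dot_source)
```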
Yeah, I think there are a few different ways that you could approach it.
I mean, it depends on what you're trying to do with it.
You could directly encode some format if you need some Mermaid format or some Markdown format or whatever you want to encode to.
You could just directly encode to those data formats.
If you're consuming it from another Elm app, then you could use something like elm-codec to share your codec, so you encode the JSON data and then decode it using code shared between the code bases.
That is, if you're consuming it and extracting from Elm code.
So, well, this might be a good opportunity. Jeroen, this might be a bit of an intervention.
Actually, it turns out I've spoken to you about my idea of Elm pages code gen review.
So pages code gen review. OK, sure. Yeah. Are you sure it's not Elm review pages code gen?
We can negotiate. Which comes first? Absolutely. I'm open to that.
But the main point. So, well, I recently kind of brought my Elm pages code gen idea to fruition.
And I'm quite happy with it. I kind of joked about that in our code gen episode with Matt Griffith.
But the challenge there being what was your idea again? Right.
So, OK, so there are Elm pages scripts. Elm pages v3 has this thing called a BackendTask, which, you know,
it's like a pretty full featured API for reading files from the file system and doing globs to list out files and reading environment variables and handling failures.
If something if there's a fatal error, you can pass that up at the top level of the script and it will report that with rich information.
Or you can handle fatal errors and make sure that you've handled every possible failure.
So now Elm code gen, we did an episode on that.
It's an amazing tool that lets you write Elm code to generate Elm code.
And, you know, to me, its main superpower is giving you these sort of type safe ways to put together bits of code and make sure that you're passing them the data that they need,
because it creates these little stubs, like bindings for different packages, that help you make sure you're generating valid code for those APIs.
It also has a little CLI tool that helps you generate code.
But then the challenge becomes, OK, well, but now I want to pass in some data to it.
So, OK, the CLI that Elm code gen provides gives you this little hook that lets you pass in some JSON data.
And then, well, what if you want to read an environment variable?
OK, well, encode that into JSON and pass that in as a flag when you call the Elm code gen CLI.
And then, well, what if you want to read, you know, you want to read an environment variable, you want to read a file and pass in that context.
You kind of have to hack something together.
Yeah, it feels like you need to do a bash script to gather the information and then call Elm code gen.
Exactly. And and then like, well, should Elm code gen give you access to environment variables?
Should it give you a first class way to do that? Should it give you a first class way to read files?
Should it give you a nice way to do error handling and report that something went wrong?
Which it does. But at a certain point, it's like becoming a more general purpose sort of scripting thing.
And BackendTask is a pretty general purpose way of doing that sort of thing.
Elm pages scripts. I've put a lot of polish into making it so that you just do elm-pages run with the path to your Elm pages script, which is in a script project folder.
And it figures out how to compile and run it. And you can even do, you know, elm-pages bundle-script and it will compile and minify that down to a single executable JS file.
So you can just chmod it, make it executable and run it. And if you have Node installed, it'll just run as a single file with all the dependencies inlined.
So Elm code gen, if it tries to do all of that, is going to creep too far towards that.
And we're going to end up like reinventing the wheel over and over. And then like, well, what about Elm review?
Like, should it? Because there are certain things you might want to do. You might want to write some of that context to a file.
You might. So at a certain point you run into a similar thing. And so I was I was recently hacking on some of these ideas that we talked about in our Elm and AI episode.
And I was making an Elm pages script that calls the GPT-4 API, and it's super fun. And I essentially wanted to take the in scope type annotations, kind of like we talked about, like what values are in scope?
Like what are the direct dependencies that are available to me? And what are the like values that are in scope, the let values, the parameters that are in scope, right? Elm review knows.
So that you can feed all of that information to GPT.
Exactly. I can seed the prompt with that context. So like Elm review isn't, you know, you're not going to make Elm review something that can like make HTTP requests and like do scripting tasks and write something to a file or print arbitrary things to the console.
Like at a certain point, you run into the same problem, right?
So Elm review pages code gen.
dash GPT.
Because essentially, like, I want to bridge the gap between like, I essentially want to make an Elm review rule that is like a context gatherer.
Yeah. Which is good, because that's exactly what the Elm review rule does. It gathers context.
Yeah, it gathers context. And then instead of defining a data extractor, like with withDataExtractor, that turns it into JSON, I want to say, like, here's my Elm pages script. And here is my Elm review rule that gathers context, pass that in as an argument to my Elm pages script.
And then given that context, I can now execute a script, which calls Elm code gen.
Yes, exactly. So the Elm code gen thing I already, I mean, it's actually fairly straightforward. You just in your Elm pages script project folder. So just like Elm review has a project folder, a review folder, that's just a regular Elm project.
Elm pages scripts also have, you know, by convention, you can call it a script folder, but it's just any Elm project, which has Elm pages installed as a dependency.
By the way, I really like that we've kind of started using this pattern that I think I introduced.
I believe so. That's definitely what clued me into it.
I mean, it's like, oh, it's a folder for configuration, but it's like, yeah, but it works really well.
So because, yeah, because you can think of it just like a regular Elm project. You can use your editor tooling and it understands what an Elm project is. So yeah, it's amazing.
So in that script project folder, you just have an Elm code gen project, with the generated source as one of the source directories.
And then you're good to go. So then you generate the Elm code gen bindings there. That's all you really need. Right. So there's not that much to that, you know, sort of combining those two tools together.
It's actually a pretty loose coupling, but it works well. So, yeah, I've definitely thought about like how this could be done with Elm review and Elm pages.
Yeah. So you were mentioning it was an intervention. What do you mean with an intervention?
The intervention is...
Where's my family?
Can I pressure you? We're worried about you, Jeroen. And we really think that you need to make Elm review pages code gen.
Yeah. I mean, I think it would be an interesting thing to explore. It's sort of this question of like, so like potentially if Elm review had a way to call it through like some Node.js dependency, so you could like import some Node.js version of Elm review and run it.
That could be interesting.
Okay. You mean calling Elm review programmatically instead of...
Exactly. Right. Because basically, I mean, so you can certainly use, you can certainly like build up project context and use like a codec to share your encoders and decoders. And that works pretty well.
Actually, the Lamdera compiler also has this sort of undocumented way to create bytes encoders and decoders. And so I've definitely considered just hacking together a little prototype. I think I could do this, assuming that they're serializable values; things like functions are not.
That's one benefit to doing it this other way where you can programmatically call Elm review because then it can execute this Elm code for you. So it's interesting. But I mean, at the end of the day, it's, I guess the point I'm trying to make is that like, this is a really powerful feature. And I think we should like build more cool stuff with it.
I'm not gonna convince you otherwise.
I announced it, I think, in November, even though the feature was released probably a little bit earlier than that. No, I announced it in December, but it was released in November. Yeah, writing blog posts is a pain sometimes.
Yeah. Have you heard of any cool things people have built with it?
No, not in practice, or at least people have not told me.
They haven't shared it. Well, listener, if you have built something cool with it, tell Jeroen, because he's interested. Maintainers like to know those things.
Yeah, they like feedback.
Yes, good ones.
Yeah, absolutely. Well, I had a really good time building with it, like just being able to do a direct dependency visitor and extract the information is so nice. You know, I mean, you've built these APIs for going and extracting information for use with reporting rules and fixes. So why not break it outside of the box?
Yeah, I did add extractors to a few of the rules that I made. Some of them are published and some of them aren't yet, at least. So for instance, I have this rule for licenses, which is called NoUnapprovedLicense, which is basically going to forbid you from using licenses that your project has not cleared or accepted.
Like, for instance, yeah, our company is not allowed to use license XYZ.
GPL licenses and things. Yeah.
Yeah, it's something that doesn't allow proprietary licenses, whatever, I'm not that familiar. And this rule is going to tell you, well, this dependency is using a license that you can't use, or this dependency is using a license that you have not mentioned as being okay or not okay. So you need to ask your legal team what they think should be done.
And at my company, we are a security firm. So we need things to be good legally and security wise. And one of the requirements that we have is that we need to know the licenses of the things that we use. And I think we need to make it available publicly. I think I've never seen it before, actually.
So every now and then, or actually, every time we update our dependencies, we need to inform some legal entity in our company, which licenses we're using and which dependencies we're using.
And so what I've made for this rule, I've changed it so that it has a data extractor that gives you the licenses that you're using for each dependency. So dependency elm/core is using MIT or something like that.
So now the process of listing the licenses that we have is automated, which previously we had to do manually. It's not a big deal in the case of our Elm dependencies, because we don't have 50 of them. It's a lot more painful for our JavaScript dependencies or npm dependencies. But it's a nice thing to automate.
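As an illustration of automating that listing, here is a Python sketch that groups dependencies by license. The extract shape (dependency name mapped to a license identifier) and the sample entries are assumptions; the rule's real extract may be structured differently.

```python
from collections import defaultdict


def group_by_license(licenses: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names by the license they declare."""
    grouped = defaultdict(list)
    for dependency, license_name in licenses.items():
        grouped[license_name].append(dependency)
    # Sort both the licenses and the dependencies for a stable report.
    return {name: sorted(deps) for name, deps in sorted(grouped.items())}


# Hypothetical extract: dependency name -> license identifier.
extract = {
    "elm/core": "BSD-3-Clause",
    "elm/json": "BSD-3-Clause",
    "some-author/some-package": "MIT",
}

for license_name, deps in group_by_license(extract).items():
    print(f"{license_name}: {', '.join(deps)}")
```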
Another one that I've worked on, which I have not published yet because I forgot about it, was the NoDeprecated one.
Are you familiar with that rule?
Yeah, I think so. Does it look for like an at deprecated annotation or something in the doc comments?
Yeah, exactly. So everything that you annotate as deprecated, either through that @deprecated in the function's or the module's documentation, or something that has deprecated in its name, which you can easily do in application code, like, oh, this thing is deprecated. So let's rename it. That makes it obvious everywhere.
This rule is really meant to be used with elm-review suppress, so that all the things that you have deprecated, you will be able to continue using, but you should not add more usages of those. So elm-review suppress works very well with that rule, in my opinion.
The problem, though, is that it doesn't tell you what to tackle next. Suppressions are meant to be resolved. They're meant to be tackled at some point. You should get rid of those.
And the suppression files are meant to be readable by humans and even edited, potentially. So you can see which files have the most usages of deprecated things. So file A has 47 issues, file B has 23.
And regardless how you want to tackle those, you can say, okay, well, I'm going to try to remove all the ones from file A or file B or whatever. But that only gives you like one aspect of what to tackle, like which files you should look at.
But it doesn't tell you what deprecated things are used most. That is something that you're going to have to look at yourself, right? And that can be a bit painful if you really want to get that information.
So what I made is I added a data extractor, big surprise here, and that is exactly the information that you're getting out of it: which things are deprecated. And are they deprecated because the module is deprecated?
Or are they deprecated because they have been tagged as deprecated? And then in how many places are they used? And then you can see, okay, well, this function, this deprecated function is used two times.
So it will be pretty easy to fix. And this function is used 230 times, which is like, okay, well, that's going to be harder. Maybe it's more interesting to tackle that one now. So now you have the information of where the deprecations are in which files.
You have the information of which deprecated things are used and how often. And now you have more information to tackle those issues, which I've found to be pretty interesting. So I should just focus on releasing that one.
If it's not released by the time that you listen to this, and you're interested, let me know. Like, kick my butt, something.
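Picking what to tackle first from that kind of extract could look like the following Python sketch. The field names ("name", "deprecated_because", "usages") and the sample entries are made up for illustration; the unpublished rule's actual output is not known here.

```python
def rank_deprecations(entries: list[dict]) -> list[dict]:
    """Sort deprecated elements from most used to least used.

    Ties are broken alphabetically by name so the order is stable.
    """
    return sorted(entries, key=lambda e: (-e["usages"], e["name"]))


# Hypothetical extract; every field name here is an assumption.
extract = [
    {"name": "Old.Button.view", "deprecated_because": "module", "usages": 2},
    {"name": "Utils.deprecatedFormat", "deprecated_because": "name", "usages": 230},
]

for entry in rank_deprecations(extract):
    print(f'{entry["usages"]:>4}  {entry["name"]}  ({entry["deprecated_because"]})')
```

Whether to start with the most used or the least used element is a team decision; reversing the sort gives the "easy wins first" ordering.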
Very cool. I like that idea a lot of kind of bundling an extractor directly into these rules. That seems really cool. So like, do you think that that is a good general practice?
Are there specific contexts where that works better? Like where a rule has context that's meaningful on its own, as opposed to like, like imports, they don't really necessarily tell you anything. Whereas like licenses, you just list out the licenses. I don't know.
I haven't made any, I haven't uncovered any specific rules about rules and data extractors. It's like, it's more like, I need some insight to do some task, and I need to, in order to do something.
And if there happens to be a rule that relates to that, then maybe it's interesting for that rule to do it. In the instance of NoDeprecated, the issue is how do I fix those deprecation issues? So that's really related to the rule.
So I think it makes sense to have the data extractor bundled along with it. Will it? I guess there's probably not much of a performance cost because the no deprecated is going to collect that information regardless.
So then it's just like, do you encode that context to JSON or not? But does it basically skip executing the JSON encoding part if you don't use the extract flag in the CLI?
Exactly. I don't know if people are going to make things that are very expensive to compute, but I'm assuming they will. So yes, I'm skipping work if I notice that we're not reporting in JSON format and we're not asking to extract something.
So we will still need to collect all the information. There's no special case like, oh, if we're trying to extract, then we will collect things differently, which could be nice in practice, but would also make the rule a lot more complex.
But we will skip calling the data extractor if it's not requested. So yeah, for the licenses, I think it makes sense to have a list of licenses that go with that rule. For instance, what rules could you have with no unused variables?
Like, could you make a graph of what other functions could be removed if we removed it? Maybe. Would it be worthwhile? I don't know. I don't think so, so far.
I mean, I think it's a rule of thumb. If it's like a clear cut set of data to extract, then maybe bundle it with it. But you could potentially even make that configurable since ElmReview is configured through plain Elm code. You could pass in options for an extractor. But I mean, at that point, I mean...
You could have a data extractor enabled or disabled based on configurations. It really depends on what you want, right?
Right. What is the purpose of preventExtract? There's something in the ElmReview API to prevent an extractor.
So the use case I had for this was like something for detecting unused CSS or something like that. You could imagine that you have a rule that tries to find all the CSS classes that are used in your application and returns a list of these CSS classes.
This is not even about unused. It's just like this rule will try to find all the CSS classes that are referenced in your Elm code. And then you can pass that into a tool that prunes your CSS files by removing all the CSS classes that are not mentioned and so on.
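The pruning step on the consuming side might be sketched like this in Python. It assumes the extract is a plain list of class names and uses a deliberately naive regular expression to find class selectors, so it would misfire on things like file extensions inside url(...); a real tool would use a CSS parser.

```python
import re

# Naive matcher for simple class selectors such as ".btn" or ".nav-item".
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")


def unused_classes(css_text: str, used_in_elm: set[str]) -> set[str]:
    """Return class names defined in the CSS but absent from the extract."""
    defined = set(CLASS_SELECTOR.findall(css_text))
    return defined - used_in_elm


css = ".btn { color: red; } .card { margin: 0; } .nav-item:hover { color: blue; }"
# Hypothetical extract: the classes the rule found referenced in Elm code.
used = {"btn", "nav-item"}
print(sorted(unused_classes(css, used)))
```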
So for this rule to work, all the places where you would reference a CSS class would have to be in a known position. So for instance, if you pass a CSS class to Html.Attributes.class, then it's known.
It's extractable and it's easy to see that this value is used as a class. But if you pass a variable to that function, to the class function, then it's a lot harder to figure out what the CSS class was.
And in that case, you could have the rule say, hey, I encountered something that is unknown to me and I prefer reporting an error and you will have to fix it. And this is something that we have in our codebase at CrowdStrike.
And so the user will see that there's an error, but potentially you would also want to say, well, given this problem, I don't want to give you the extract. I'm going to prevent this extract because of this issue.
If you ask me to do an extract and there is this kind of issue, I'm just going to give you something that is incorrect. So let me just stop the extraction.
So that's what this function is for. So whenever you create an error, you can annotate it as preventing the extract.
Right. So just to make that explicit, preventExtract is a function that you call on an Elm review error. So in the course of your rule, whether it's your no unused CSS classes rule or whatever it might be called, you can give an error.
And then you can pass that error to preventExtract. And that error will now have that special behavior that when you call the CLI with the extract flag, that error will propagate through and prevent it from giving the final JSON for extract.
That's very cool. So whenever you run Elm review with extract and report JSON, I don't like the UX so far, because it will give you like null or undefined for the extract where you expected it.
And you basically have to run Elm review again without those flags in order to find out which errors have popped up. It's not great, but... I see. Yeah, it's good enough so far, at least until someone brings it up or lets me know if they get an idea.
Right. I guess you could have a special error format and you could provide like an error decoder. I mean, this is assuming that people are consuming it in Elm, but even if they're not consuming it in Elm, you could make it a fairly lightweight format where you give an easy to consume error message.
Yeah, but the problem is that that could be like you would write an arbitrary JSON value to say, hey, this didn't work out. Right. But that could also be in a valid value for that rule. Right. Totally.
I actually don't remember what I do exactly. Maybe I do something like show an error in the JSON. But in practice, when you run it with jq to extract the exact information you want, you get a null. You get a null. Yeah. Yeah. So it's a bit annoying.
That makes sense. Yep. But if you consume this through a Node.js script, then you could do more checks like, oh, if rule name dot error equals something, then something bad happened or something.
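A consuming script can at least fail loudly instead of passing a null along. The Python sketch below assumes a report with a top-level "extracts" object keyed by rule name; the rule name "UsedCssClasses" is hypothetical, and the exact representation of a prevented extract (null, missing key, or something else) should be checked against real output.

```python
import json


class ExtractUnavailable(Exception):
    """Raised when the report has no usable extract for a rule."""


def require_extract(report_json: str, rule_name: str):
    """Return a rule's extract, or raise if it is missing or null."""
    report = json.loads(report_json)
    value = (report.get("extracts") or {}).get(rule_name)
    if value is None:
        # A prevented extract is assumed to show up as null or as a
        # missing key; either way there is nothing safe to use.
        raise ExtractUnavailable(
            f"No extract for {rule_name}; re-run the review without "
            "the extract flags to see which errors were reported."
        )
    return value


ok_report = json.dumps({"extracts": {"UsedCssClasses": ["btn", "card"]}})
print(require_extract(ok_report, "UsedCssClasses"))
```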
Yeah, it totally depends on how you're using it. I would like to think maybe you're using it from Elm review code gen pages. But wait, pages is now at the end.
I'm using it as a bargaining chip to try to build consensus.
The only part that I thought should not be at the beginning is code gen. And that's what you just did. So let's at least call it Elm review something or Elm pages something. But Elm code gen is not appropriate here, I'd say.
Matt would have to come on this show and tell us otherwise. But yeah, that's true. In the meantime. Okay, you can have code gen first or review first.
So why don't we talk about like a couple of other possible use cases? I mean, again, really, it's really just taking these cool features in Elm review, which are, you know, giving you the ability to have visitors that look at expressions, which let you look at the abstract syntax tree of an entire project and walking through looking at what imports for a particular module are there.
What are the direct dependencies, indirect dependencies of project, looking at the readme, looking at doc comments, and gathering up context to connect these things together.
So, you know, of course, like, you can imagine the kinds of context, you would have to build up for marking certain values as unused or things like that. So building up like pretty sophisticated context about where how things connect together and things like that.
So given that, actually, on that topic, like, Elm review is not necessarily a super easy tool to use, right? Like, writing a rule takes a lot of code in practice. Like if you have to do something really quick, I would not use an Elm review rule to extract something out of the code base, I would use something like grep, or Comby, or tree-grepper, which Brian Hicks made a few years ago.
With those, you just say which patterns you're interested in, and you can then extract that information. The thing where Elm review is very powerful is it gives you a little more semantic analysis, like, you know, it's easy to figure out: is this function defined locally in this module, or is it from an imported module or from a dependency? And if so, which one?
Like the module lookup table?
Would that be what you're talking about there? Yeah.
Potentially type inference, potentially other kinds of information. And where Elm review shines is that it's not just a specification of how you can extract certain AST nodes or specific information from ASTs. It's literally code, so you can put all those things into contexts.
You can find connections between those contexts. So for instance, if you want to detect unused variables, you need to connect all the declarations of variables with usages of the variables. If you just try to pattern match on nodes individually, you're not going to be able to do that, you're going to have to write an external system to be able to deduce that.
So that's, I think where Elm review is really powerful is it allows you to put those things into context.
Right. It sounds kind of like, you know, should I use a regex or a parser for this?
And if you're trying to extract semantic information from HTML, then with regex you're going to have a bad time, because, you know.
Yeah. But in some cases, especially if it's for a quick and dirty thing, grep will be very good. Comby will be very good as well. tree-grepper, I'm guessing, as well. It's probably the same thing, I'm just not familiar with the syntax.
But if you need something that will need to be correct or something that needs to be used a lot of times, something where you need less false positives or less false negatives, then Elm review will be very good.
That said, in a lot of cases, like depending on what information you want to get, you will not be able to get it. Like a static analysis tool is sometimes limited, right?
Dynamic values.
There are certain things that Elm review makes very high level that would be a ton of work to gather manually. For example, like extracting information about the direct dependencies of a project, which I was trying to do for my particular use case with this kind of AI prompting context builder.
You provide like a fairly high level API where that information exists somewhere on the file system. You might even have to like run Elm make on something to like make that come into existence somewhere.
I actually download everything manually. I look at ELM_HOME and if it's there, I use that. Otherwise I download things.
Right. So if you're building a quick and dirty script, it's not going to be quick, but it will be dirty if you try to get that information yourself. Whereas if you use Elm review, you just get like a nice Elm type that describes the direct dependencies and all of the types and values it exposes.
So that's pretty powerful. Like the one thing I see that's definitely a little tricky is like Elm review rules are not super composable. Like visitors, it's hard to like bundle up visitors together.
You sort of have to say like, okay, this visitor is going to go and update the context by looking at imports and this visitor is going to update it. But you can't really have a self-contained like, all right, here's my Elm review thing that does like five different types of visitors.
And you can't have like nested TEA, nested Elm Architecture, review visitors, like a combined visitor that somebody published as a helper.
Actually, you can.
Can you really?
Oh, I had no idea.
Not in the shape that you think. The thing that is possible is you can add multiple visitors of the same type. So you can have multiple withExpressionVisitor, multiple withDeclarationVisitor.
The module name lookup table, which you mentioned before, is basically Elm review pre-computing, for every node in your AST, where something was defined.
So if you have a function, is it defined in an import, in a module? And if so, which one or is it defined locally? That was not in Elm review to start with. That was actually in a package that I purposefully did not publish for maintenance reasons.
But you could basically copy paste that code and just add a function that says add visitors for scope. I think it was scope dots, add visitors, something.
And that added all the visitors that that sub rule, sub information gathering thing needed to modify your context. Like it was specifically modifying one field in your context.
And then that was available to all the other rules, to all the other visitors. So it is possible. It's just not in the way you want to. And in practice, it's not done very often.
Like I've basically only used it for that purpose and I quite liked it. So it's still a feature that is not very well documented. And also, I'm thinking about removing it because things are slightly faster if they're not a list of visitors,
like a Maybe visitor under the hood. But it is possible and it is a nice feature when it's necessary. Would that allow you to, you probably wouldn't have access to that, like prepared finished context by the time your visitors run?
Or could you, could you make sure that those visitors all run before your visitors subsequently run so they have access to that context?
Yeah, yeah.
Okay. Yeah. And that's very powerful. Interesting.
Yeah. It depends on what you want to do, but yeah, potentially all those rules can do whatever they want to. Sorry, those helpers can gather all the information you need. Depending on the information, that will be more or less easy, but usually it's pretty easy.
Hmm. Interesting.
Because they're running in a specific order, which makes that easy.
That's pretty cool. I would love to see like people sharing the cool stuff that they build with this. And like, I don't know, maybe we should host a, an ElmReview extractor hackathon Jeroen to get, because there's so much cool stuff you could build with this functionality.
Yeah. So the main thing that I was thinking about, like that could be pretty, that is pretty generic and it could be useful is like extracting metrics out of your code base. So lines of code is the easy one, which there are many powerful tools for that.
There's simply the wc command line tool, there's cloc, and there are plenty of others, I'm sure. So maybe that's not the best metric, but that's the kind of thing that you could do.
At the moment, one of my colleagues is I think working on computing the complexity of the code base. So there's the cognitive complexity, which I made a rule for. I think he will probably try that first, which by the way, there's now a data extractor, I think.
Somewhere, maybe in a branch of mine, maybe it's just on my computer. I think, at least it would be very easy to make it. That's for sure. Because it's just extracting the context to JSON.
But yeah, there's also cyclomatic complexity, which is basically a measure of how many unit tests would you need to cover all the branches for a given function. And that's a reasonable metric, not to tell you how to refactor your code.
Cognitive complexity is better for that. But cyclomatic does tell you, well, this code is pretty complex, or this project is pretty complex, because there's a lot of edge cases that it needs to handle.
And you could make a graph out of that. So at LogScale, we're a log product, we just send all of our logs to LogScale, and then we can make dashboards with it. And when we run Elm Review in our CI, well, we have those metrics available.
Well, we can run Elm Review with --extract --report=json, and that gives us metrics. And then we can extract those and try to make dashboards out of them.
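To make the consuming side concrete, here is a minimal sketch of reading such a report from a script. It assumes the JSON report is an object with an "extracts" field keyed by rule name; the rule name ModuleMetrics and the sample payload are made up for illustration and are not an existing rule.

```python
import json


def read_extract(report_text, rule_name):
    """Return the extract of one rule from an elm-review JSON report.

    Assumes the report is a JSON object whose "extracts" field maps
    rule names to whatever JSON each rule's data extractor produced.
    Returns None when the rule produced no extract.
    """
    report = json.loads(report_text)
    return report.get("extracts", {}).get(rule_name)


# Hypothetical report; real output would come from something like
#   elm-review --extract --report=json > report.json
sample_report = json.dumps({
    "errors": [],
    "extracts": {
        "ModuleMetrics": {"Main": {"lines": 120}, "Page.Home": {"lines": 45}}
    },
})

metrics = read_extract(sample_report, "ModuleMetrics")
total_lines = sum(module["lines"] for module in metrics.values())
print(total_lines)
```

From there, the numbers can be forwarded to whatever logging or dashboard system is already in place.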
And yeah, then whatever metrics you think are useful to you. Like, this is an area that I'm not all that familiar with, like what metrics are interesting to gather for a technical debt finding platform or something.
There's a tool called CodeScene, where they basically gather a lot of this information, notably the complexity, but also the Git information from your code base.
And they say, well, the files that are touched the most often are usually the ones with bugs. And we can try to combine the information of how often something is touched and how complex the file is in terms of cyclomatic complexity.
And you could say, well, this is probably where we need to address technical debt, or this is where it'd be useful to put our eyes on.
Yeah, exactly. I mean, you could even just extract the length of certain functions.
Yeah, as a for instance, yeah. So Elm Review does not have access to Git information. But it does have access to a lot of other information. And I think at some point, I will just make it possible for you to extract information from arbitrary files in your project.
Yeah, cool.
Like just trying to be able to read CSS files, and figure out which CSS files or which CSS classes are available or unused, even. That'd be useful, I think. So that's, that's the idea behind that thing.
Yeah, it's super powerful. Yeah, I mean, and you can start piecing these ideas together, you know, you start gathering metrics. I mean, you know, what metrics can you gather from a code base? Number of lines, number of non-opaque types, perhaps?
Potentially, yeah. Like, I don't know if there are metrics for noticing how coupled one module is to another. But if there are, then potentially that's a thing that you could put on a dashboard and try to follow and try to reduce or try to increase, I don't know.
Test coverage, but yeah, that's not Elm Review. There's a different tool for that.
Yeah, yeah. And there is elm-coverage, which actually works pretty darn well. But yeah, look at things like, you know, checking if something is not an opaque type. As much as we would like for that to be a rule that just reports an error, "error: non-opaque type found", there are use cases where you want an exposed custom type.
And so it's not a problem. It's a thing to be aware of. And so metrics and visualizations and things like that are useful. So that's the kind of area where extracting information rather than reporting an error is what you want.
Yeah, like if your company or if your project has a specific use case of something that they'd like to increase or decrease or collect somehow, then you can make a rule for that. And you can extract the information and display it somewhere. But yeah, like this is really something that I have not looked into. And clearly, like, I don't know what metrics are useful. So if people want to play with that and know like, what could be useful, try it out and let me know.
Yeah, I mean, you could really build, like, a suite of Elm code quality tools and visualizations for it with this, you know. Like, yeah, all the pieces are there. You could look at the number of maybes in a code base too, right? Like, that's an interesting piece of information. Maybes aren't bad, but where are there more of them?
Yeah, or the number of primitives, you know, primitive obsession, that kind of thing. But yeah, metrics are always kind of a scary thing, like, do we want to decrease this number even further? Sometimes yes, but it's not a fixed rule. That's also an area where this can get useful. With Elm Review, it's really hard to say,
well, this rule does not apply everywhere. In Elm Review, basically all the rules have to be 100% correct, in the sense that they don't report false positives. They can have false negatives, but they can't report false positives. But if you add an extractor to that, then you can extract that information without enforcing it on your project, if that makes sense. That's a possibility.
I don't know if that will work out. But it's, it's something you could do.
I think it's really good. It's a good workflow to just have this information, and then say, all right, what could use some refactoring? And then, especially if you cross reference that. Like you said, Elm Review doesn't currently have access to Git information. But if you're building an extractor, you just extract what you need from the command line using Elm Review extractors.
And then you extract some information from the command line from Git about how frequently files are changed, you know, the churn rate of certain modules. And then you can just sort of cross reference those pieces of data in a script. And so you say, okay, well, this file has a lot of churn, it has a lot of non-opaque types, it has a lot of deeply nested conditional statements, it has a lot of maybes and primitive types.
And it's churning a lot, we touch it a lot, or maybe bug fix commits touch this file a lot. So maybe that's just something to be aware of: it might be a hotspot. It doesn't mean the numbers need to go to zero because they're inherently bad. It's just something to focus your attention on when you're looking for areas to refactor.
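The cross-referencing script described above could look roughly like this. It is only a sketch: the file paths, the complexity scores, and the choice to rank by churn multiplied by complexity are all assumptions made for illustration, and the churn input is assumed to be the text printed by `git log --name-only --pretty=format:`.

```python
from collections import Counter


def churn_from_git_log(log_text):
    """Count how often each file path appears in name-only git log output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def rank_hotspots(churn, complexity):
    """Rank files by churn * complexity, highest first.

    `complexity` maps file paths to any extracted metric (here a made-up
    per-file complexity score). Files with no recorded churn score 0.
    """
    scores = {path: churn.get(path, 0) * score for path, score in complexity.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up inputs for illustration.
git_log = """
src/Checkout.elm
src/Main.elm

src/Checkout.elm

src/Checkout.elm
src/Util.elm
"""
complexity = {"src/Checkout.elm": 14, "src/Main.elm": 30, "src/Util.elm": 2}

hotspots = rank_hotspots(churn_from_git_log(git_log), complexity)
print(hotspots)
```

The ranking is a prompt for where to look, not a score to drive to zero.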
So if you're missing some information in Elm Review, then yes, exactly. As you say, you can combine it with other information in a script. You can also go the opposite way: you can provide more information to Elm Review by adding it to the configuration, or you could generate that information into Elm code.
For instance, that's what we do for detecting unused CSS classes at work. We take the CSS files, and we generate an Elm file that the Elm Review configuration then fetches through simple imports. So you could again use Elm Code Gen for that. We chose to have something simpler, especially because it's older than Elm Code Gen.
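A simple generator in that spirit might look like the sketch below. This is not the actual script used at that workplace: the module name CssClasses, the shape of the generated file, and the naive regex-based class detection are all assumptions for illustration.

```python
import re


def css_class_names(css_text):
    """Collect class names used in selectors.

    Naive approach: drop the declaration blocks, then look for `.name`
    in what remains. Not a full CSS parser (no nested rules, no comments).
    """
    selectors_only = re.sub(r"\{[^}]*\}", "{}", css_text)
    return sorted(set(re.findall(r"\.([A-Za-z_][A-Za-z0-9_-]*)", selectors_only)))


def elm_module_source(module_name, class_names):
    """Render an Elm module exposing the class names as a list of strings."""
    items = "\n    , ".join(f'"{name}"' for name in class_names)
    return (
        f"module {module_name} exposing (classes)\n\n\n"
        "classes : List String\n"
        "classes =\n"
        f"    [ {items}\n"
        "    ]\n"
    )


# Made-up stylesheet for illustration.
css = """
.button { color: red; }
.button:hover, .card-title { background: url(img.png); }
"""

source = elm_module_source("CssClasses", css_class_names(css))
print(source)
```

The review configuration could then import the generated module and hand the list to the rule that looks for unused classes.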
Or Elm Review Code Gen pages.
Yeah, exactly.
Problem solved.
Elm Review Code Gen pages, Code Gen Review.
You know, that's maybe one of those pitfalls that we have with the Elm community where we call everything based on what it does, like Elm Code Gen. Okay, it does Code Gen. Elm Review, it reviews. Whereas you have something like React. Oh, that's a cool name. It's not like it reacts to something.
Yeah, maybe here, like, it would be interesting to have a cool name, like, Elm, I don't know how to make cool names.
Elm Linguini.
Exactly. Yeah. Elm Spaghetti. I mean, if it's to find problems in your code base, like Spaghetti Code.
Spaghetti Code.
To Spicy Meatball.
Exactly. I mean, plenty of projects that are named that way for some reason. So yeah, if you're missing some information, you can try to combine it in one way or another.
I like your idea a lot to, like you said, either Code Gen a file which is imported into your Elm Review rule, or you could like Code Gen something into your Elm Review configuration file and then pass that data in.
Or you could use Elm Review pages Code Gen.
I think we should add CM at the end of that now. Just to make it longer. And I'm sure such a complex project would need a manager or a factory or something, you know.
Absolutely. Absolutely.
Yeah. So another, so actually, if we go down the route of what Elm Review can give you as information, it can go pretty far. Like one experiment that I tried was generating the docs.json file for packages or for any kind of projects.
But yeah, mostly packages, which is basically the summary of the package's documentation. And that is what the Elm Packages website uses to show the documentation of a package.
And if you use Elm Doc Preview, that's also what it uses.
So I figured like, well, as a proof of concept, and this is kind of what I played around with to test things, I can generate that docs.json file, which is what the compiler does.
What does the Elm compiler also generate? Well, it generates JavaScript code based on the dependencies.
Crazy. Yes, yes.
Based on the Elm code. And like, technically, maybe with the limitation of how much JSON you can really output.
Potentially, you can write an Elm compiler using Elm Review.
Whoa. Interesting. Yeah.
So the Elm in Elm compiler could be made in Elm Review.
That's a cool concept. Very interesting.
Whether it will work in practice, I don't know. But that said, like, if I ever want to make Elm compile to something else than JavaScript, like, this could be a pretty easy way to get started.
Like, you don't have to read files. You don't have to do all those things.
Just take all the information from the Elm code and make an output. And maybe you'll be pretty far off.
Yeah. And I mean, if you want to bridge the gap between Elm and other contexts, it's a pretty good start for that, too.
I'm not sure if it would, like, help you. I don't think Elm Review would help you resolve.
Maybe it would be fine. Like, if you wanted to do some code generation tasks, for example, for my old Elm TypeScript Interop project, I extracted information with the help of elmi-to-json, which extracts information from the sort of binary stuff in the elm-stuff folder.
And I used that to get the types of all of the ports for the project. But then I also had to do some static analysis to resolve which aliases pointed to which types so that I could turn them into actual primitives that I could send through ports and then build decoders based on that.
Yeah, that's exactly one of the use cases where I could see Elm Review being used instead of a custom tool or custom script. Whether it will do better than another tool, probably not.
It's like Elm Review is not that fast, unfortunately, but maybe I'm self-deprecating here.
Yeah. Well, you'll have to suppress your self-deprecations, Jeroen.
I will try to do so. But it's like trying to impose rules on myself is really hard, you know.
Yeah, I think that's a pretty powerful use case, though. And I guess you totally could build that lookup table that resolves a type alias to what it actually points to. You could totally do that with Elm Review visitors.
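As a rough illustration of that lookup table, here is a sketch of the resolution step on the script side. It assumes the extract has already been flattened into a plain mapping from alias name to target type name, which is a big simplification: real Elm aliases can point at records, tuples, or types with arguments. All the type names are made up.

```python
def resolve_alias(name, aliases, seen=()):
    """Follow alias -> target links until reaching a name that is not an alias.

    `aliases` maps an alias name to the type name it points to.
    Raises ValueError if the links form a cycle.
    """
    if name in seen:
        raise ValueError("cyclic alias: " + " -> ".join(seen + (name,)))
    if name not in aliases:
        return name
    return resolve_alias(aliases[name], aliases, seen + (name,))


# Hypothetical aliases, as a data extractor might report them.
aliases = {"UserId": "Id", "Id": "Int", "Email": "String"}

resolved = {alias: resolve_alias(alias, aliases) for alias in aliases}
print(resolved)
```

With the resolved names in hand, a code generator knows which primitive encoder or decoder to emit for each port argument.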
Yeah, I think that's a really cool use case. And I think that's really cool because you can have certain constraints where you say, this rule requires, for example, type annotations for these things.
Yeah, exactly.
I didn't mention it explicitly, but when you're saying that Elm Review has the ability to generate the docs.json, which is what generates the package documentation for Elm Packages, Elm Review doesn't currently have the ability to do type inference, but it doesn't need that because Elm Packages, in order to be publishable, must have type annotations for all of their top-level functions.
Yeah, for everything that is exposed, at least.
Everything exposed. So you can have constraints like that, and you can say, hey, I'm only able to operate on things that you have given explicit type annotations for. But you can do all sorts of code generation tasks with this, like, kind of bridging, you know, types without borders stuff to kind of connect different paradigms to each other. So that's pretty cool.
Actually, I haven't mentioned why I made this feature in the first place. So I had a project idea that is not Elm Review, but where I would use Elm Review, which I've kind of put a lid on at this moment, or put on pause.
I actually bought the domain name, and it expired this week. So I'm not sure I'm going to get to it soon. But basically, I wanted to show how secure the Elm Package ecosystem was.
I might have mentioned this during the Christmas episode, where we did mention Elm Review's extracts feature. So the idea is that, like, there's not a lot of security issues. There's a few that I fixed a year ago with regards to virtual DOM and XSS or cross-site scripting issues.
And I wanted to know whether we have any packages in the Elm ecosystem that are abusing this feature or this security hole.
So what I did is I went through the packages, and I figured out which ones are depending on elm/html or elm/virtual-dom, which could be using these problematic functions, which have these security issues.
And I noticed, well, most of them are not, but a lot of them are passing the problem along. So one of the problems was with using VirtualDom.nodeNS.
And NS stands for namespace. So it's like a regular Html.node, which takes three arguments: the tag name, so like div or span or button, then a list of attributes and a list of children.
And the functions that have this NS in their name also take a namespace, which is like a specification URL. I actually didn't get what that was.
Some XML schema URL thing. Yeah, yeah.
Yeah. And there was a problem where these were not checked correctly. One of the arguments was not checked correctly. So what I noticed was that a lot of these packages that were using these functions
were passing as arguments things that were themselves taken as arguments. So the first argument that I needed to check for nodeNS was itself an argument to the calling function. So if I wanted to see whether a function was exposing the security issue,
then I needed to check which ones were using this function. But that was potentially in a different package. So what I did was I started working on this data extractor behind the scenes in a branch of the project on my computer.
And I extracted the information of which functions are using these vulnerable functions, making a list of those. And then whenever I visited another package, I took all the vulnerable functions that they depend on,
whether they have virtual DOM as a dependency or one of these intermediate packages that have these vulnerabilities, and so on and so on. And the idea was, well, I'm just going to extract everything.
And at some point, I'm going to be able to find out whether they are, one of them is misusing something. I never completed the research. Well, so far, I haven't been able to find any problematic use, at least.
So that's very positive. And now these issues are fixed anyway. So yeah, my idea was to make something that showed how many vulnerabilities there were in Elm and in its package ecosystem.
And the highlight would have been, oh, there are zero issues.
Just make a website that says zero.
Yeah. And now, I don't know, like companies pay me to have a certification that says, hey, I'm using Elm and we are aware of all the security issues in the package ecosystem.
Like basically, I was trying to do an audit of the ecosystem and trying to monetize that somehow, maybe.
Yeah, yeah.
Turns out I lost interest in that. I don't know if it would be interesting financially. If your company thinks it would be interesting, let me know.
But otherwise, yeah, I prefer working on ElmReview directly, I guess.
So yeah, that was why I started making it. Like I needed to extract information from a package to give to other packages, to the review of other packages, and so on and so on.
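The package-to-package propagation described above boils down to a fixed-point computation over a call graph. The sketch below is a deliberately coarse version: it flags any function that calls a flagged function, whereas the research described here only cared about callers that forward their own argument. The Icons functions are hypothetical, and the call data stands in for what an extract might contain.

```python
def vulnerable_functions(calls, known_vulnerable):
    """Return every function that directly or transitively calls a vulnerable one.

    `calls` maps a fully qualified function name to the functions it calls.
    Repeats passes over the graph until no new function gets flagged.
    """
    vulnerable = set(known_vulnerable)
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            if caller not in vulnerable and vulnerable.intersection(callees):
                vulnerable.add(caller)
                changed = True
    return vulnerable


# Made-up call data, merged from the extracts of several packages.
calls = {
    "Svg.node": ["VirtualDom.nodeNS"],
    "Icons.custom": ["Svg.node"],
    "Icons.static": ["Html.text"],
}

flagged = vulnerable_functions(calls, {"VirtualDom.nodeNS"})
print(sorted(flagged))
```

Feeding the flagged set from one package into the review of the packages that depend on it is what lets the analysis cross package boundaries.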
That's very cool. Yeah, it does open up so many doors. Like the security vulnerability checking is like, I mean, there are some interesting projects in the JS ecosystem these days checking for security vulnerabilities.
But it's just so much easier to get assurances around things like that with Elm and with ElmReview and ElmReview Extractors. It's a really cool space.
And code quality, like, I mean, we don't really have much by way of code quality tools in the Elm community. We have ElmReview.
Right. I mean, those are tools that help ensure code quality, but not code quality tools in the sense of like giving you a code quality report to try to, you know, I mean, of course, like unused functions and things like that are a code quality metric.
But it's sort of like stops being a code quality metric when the tool just fixes it for you. And then like, it's just a static analysis tool.
Yeah, the metric is always zero because it fixed everything.
Exactly. But that would be really cool to have like a suite of code quality tools to look at where to refactor.
Yeah. So I remember talking to the author of Elm Analyze, Mats. He was really liking the direction of ElmReview and he thought it was a great project. So that made me happy.
And he was thinking, well, potentially, I could now make Elm Analyze use ElmReview under the hood and make things like dashboards or code quality metrics.
So exactly this idea. So yeah, I think it's possible. Yeah. What needs to be shown, what is interesting, that remains to be investigated.
Yeah, so many interesting things to explore here. One thing we would be remiss if we did not mention unit testing. There's really not too much to say on this point other than you can do it. You can write unit tests for your data extractors.
If you have tests, then ElmReview's testing framework will force you to spell out what the data extract will look like.
Yeah. Basically, ElmReview's testing framework is like, if there's anything that we can check, we will check it.
Whoa, I did not know this. I'm proud to say I did not know it because I have not encountered this situation before, because I really like writing tests for my ElmReview rules. But wow, that's cool.
Yeah. I mean, if your rule provides a fix, then you will have to say how it's fixed. If you say, well, there's an error, it will say, well, what are the message and details and where is that error?
Basically, everything that we can check, we check. I think sometimes it's annoying because you have to write a lot of things, especially for data extracts. Maybe it's a bit too much. I could imagine adding a feature that says, please don't force me to write out the data extract.
But so far in general, the idea holds very well. I personally see it as a positive. It has worked for my rules, at least.
Yeah, you could potentially have something that allows you to assert on the data extractor or something. If you want to look at a subset of the values or something like that, I don't know. It's an interesting point.
But yeah, currently what it does is you say Review.Test.expectDataExtract, and then you give a JSON string that you expect it to return for your extractor.
I could imagine if you have very rich information, that could become tedious, but it's a great feature.
Yeah, in general, the test cases you use your rules on are pretty small, so I think it's manageable. But yeah, I can absolutely imagine it will be a bit too much.
So if you're hitting that and genuinely it's annoying, open an issue.
Very interesting.
So another use case where I can really see this data extract idea being used, although I think it might be a bit too...
It will not work perfectly, but I would like to see an investigation into this: it's for explaining things.
For instance, one of the things where people can get confused is when you have update functions, where you have one message that triggers a command,
that triggers another message, which in turn triggers another command, and so on and so on.
So for instance, if you need to process a payment, there's a lot of steps to that, and it can be hard to figure out in which order these messages come in.
And I would love to see a diagram that shows in which order these things are applied.
And I'm thinking that using static analysis, we could make this diagram.
We're using ElmReview data extractors. You get all the information and you now make it a diagram.
And I think you can do diagrams for a lot of things, like even, for instance, the module imports that I mentioned before.
Well, if you group them nicely, if you format them nicely, if you remove all the unnecessary details,
this can now become an overview of your project that you can show to newcomers to your projects.
It can become documentation that you can force to always be up to date.
So this is something that I would like to see some exploration in as well.
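Once the edges have been extracted, turning them into a diagram is the easy part. A minimal sketch, assuming the extract is a list of (from, to) pairs and using made-up message and command names for a payment flow:

```python
def mermaid_flowchart(edges):
    """Render (source, target) pairs as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical message -> command -> message chain.
edges = [
    ("SubmitPayment", "chargeCard"),
    ("chargeCard", "GotChargeResult"),
    ("GotChargeResult", "sendReceipt"),
]

diagram = mermaid_flowchart(edges)
print(diagram)
```

The hard part remains the static analysis that discovers which message leads to which command.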
For the example, for the update, it's a lot more local, right?
It's like, well, once you get information for this module, for this function,
maybe it could even be like explaining what the HTML will look like and how it interacts with,
or which buttons, for instance, will trigger what messages. I don't know.
But the part where I'm like, this might not work very well is because you're probably going to have to compute that for your entire project,
which sounds like a lot when you're only interested in a specific part.
You're only interested in the diagram for this one module or this one function.
So maybe the data extract should take an argument.
Something like how Elm Codegen takes information in, which gets very tricky in terms of trying to cache the results,
but it could be doable in practice.
Like just if we don't care about ElmReview being slower, this could be done. Definitely.
So this is something I think could be pretty cool.
I think Richard Feldman also talked about this for Roc, where they want people to be able to build tools for their packages.
And then whenever you install a package, you also get like editor integrations that help explain things.
Like, I don't remember if this was one of the examples, but you could imagine you're adding a Roc HTML package,
and now you can have a preview of the HTML in your editor, or you have a color package and you can have a color picker in your editor.
I don't know. Things like that.
And yeah, I think I could see the same thing with ElmReview, but like performance wise, this will not work.
And I know you've talked about like having language server like...
Yes, yes.
Intents or actions using ElmReview.
And yeah, it would be nice to go that way, I think.
But, like, maybe we should have an intervention, like:
ElmReview is doing too much right now.
But these would be interesting explorations, I think.
Yeah, it does make... I mean, the extract functionality makes ElmReview feel more like a platform, which I think is a great thing.
My sort of AI experiment where I'm trying to build like a type solver that replaces certain Debug.todos.
That's one of the things I'm doing: I'm actually extracting the range of a Debug.todo and then the relevant context around it, of which things are in scope there.
So that the type solver can go and do its thing and then pass that back to ElmReview and give it the range that it's solved for.
And that gives guardrails to the GPT prompt.
So it's not just going and modifying your entire code base.
It's only modifying the part you gave it permission to build an implementation for.
And it ran the Elm compiler on it to make sure it type checks and all that.
But so it's actually going back and forth to ElmReview where it extracts from ElmReview, runs a script and then sends the results of that back to ElmReview to perform a replace rule.
A fix. Pretty interesting possibilities there.
Or we could just give the entire code base to an AI and then it would do the same thing that ElmReview would do.
A lot easier.
That's true. Give it five years. We'll see where we are then.
Oh, you pessimist.
I think there's a collaboration to be had between our traditional static analysis and our more modern artificial intelligence approaches.
I've mentioned this idea before of for a phantom builder extracting a diagram that shows you as a state machine, what are the possible states that this builder API can go through.
Yeah, exactly. State machines are a perfect example. It's very easy to draw. There are tools for that.
So you just need to output to a suitable format. DOT or Mermaid, I'm sure, can do that.
You just need to figure out what information you want a state diagram for and extract it and you're good to go.
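For the phantom builder case, the extract could be a list of transitions, each labeled with the builder function that causes it. A minimal sketch that renders such a list as Graphviz DOT; the state and function names are made up:

```python
def dot_state_machine(name, transitions):
    """Render (from_state, function, to_state) triples as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, label, target in transitions:
        lines.append(f'    "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical phantom-type states of a request builder.
transitions = [
    ("NoUrl", "withUrl", "HasUrl"),
    ("HasUrl", "withHeader", "HasUrl"),
    ("HasUrl", "build", "Request"),
]

dot = dot_state_machine("Builder", transitions)
print(dot)
```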
Yeah. Sky's the limit. I mean, there's so much cool stuff we could build.
So, yeah, again, if you build some cool stuff, let us know. Tweet at us.
Yeah. If you want to get started with this feature, we will link the article that I put on my blog, which is called Gaining Insight into your Codebase with Elm Review.
So read that, and otherwise read the documentation for withDataExtractor in Elm Review's package documentation.
Yeah. And let us know what you come up with.
Yeah. Great stuff. Well, thanks again for the great feature and Jeroen, until next time.
Until next time.