Generate package code: string processing or metaprogramming?

Hi,

I am wrapping a low-level API generated with Clang.jl, and I currently produce the wrapper source code programmatically with string processing (around 30k lines of Julia code; it’s a big library). I am wondering whether I can exploit the Julia AST and avoid this string-based processing.

This thread points out that using metaprogramming to generate code is more elegant and lets one reuse related tools rather than writing one’s own mini-tool for emitting source code from strings. I totally agree, but to me there are some downsides that, as I see it, make it scale very badly.

First, it seems to me that using AST transformations costs a lot of readability. In particular:

  • Exprs can be very hard to read when they are non-trivial, especially for those not familiar with them,
  • Debugging the code can be really hard, since there’s no file/line to look at.

Second, how would you store this code so that it only has to be generated once? For example, code generation may rely on other packages (in my case, packages to parse an XML or JSON specification) which are not relevant outside this code generation context. You don’t want users to re-generate it, even at precompilation time, because they would need those additional dependencies.

One partial solution might be to do the processing on the AST, and convert everything to text code. However it feels like I’m going to need a lot of string-based processing, which I want to avoid. Or maybe there is a practical way of producing human-readable (text) code from AST? If the generated code can be put into a text file to be included in the package, that would remove many of the mentioned problems.

What would you recommend? Stick to generating strings, dumping them to a file and including it? Or generating Exprs and storing them somewhere, somehow, to avoid any fragile string processing?

What do you mean by hard to read? Any parsable expression should not be harder to read than the string version since they are exactly the same.

This is only true for people that strip the line numbers.

You could do this at build time instead of precompilation time. Or, if your goal is only to generate the code once at development time and review it before publishing it as part of your source code, then yes, string processing is fine, and it has little to do with Julia (i.e. you could do that processing in basically any language). You could also store a binary serialization of the code to be evaluated at runtime, but that’s not really necessary.
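For reference, the binary-serialization option mentioned above could look roughly like the sketch below. It uses the `Serialization` standard library; the file name and the `double` function are made up for illustration. One caveat worth keeping in mind: the serialized format is not guaranteed to be readable across Julia versions, which is one more reason a plain text file is usually the simpler choice.

```julia
using Serialization

# A generated definition, held as an Expr (hypothetical example).
ex = :(double(x) = 2x)

# Store the Expr in binary form at generation time.
path = joinpath(mktempdir(), "generated.jls")
serialize(path, ex)

# Later, read it back and evaluate it without the generator's dependencies.
restored = deserialize(path)
mod = Module(:Restored)          # scratch module standing in for the package
Core.eval(mod, restored)
result = Core.eval(mod, :(double(21)))
```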


Your answer makes me realize that I had quite a big misconception about metaprogramming. I didn’t know that you could just print expressions to get valid syntax, so technically you can generate an AST and print everything to a file to produce the code. I also thought that evaluating expressions would leave the user with a feeling similar to macro-based APIs: if it works it’s great, but if it doesn’t, you have to look at the macro definition to have the slightest chance of understanding why. This is not the case at all; the stacktrace is what you would get from hand-written source code anyway.
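To make the "generate an AST, print it to a file, include the file" workflow concrete, here is a minimal sketch. The `spec` list, the function names and the file name are all hypothetical; a real generator would build its expressions from the parsed XML/JSON specification instead.

```julia
# Hypothetical spec: pairs of (function name, constant it returns).
spec = [(:answer, 42), (:greeting, "hello")]

# Build one zero-argument method definition per entry, as Expr objects.
defs = [:($(name)() = $(value)) for (name, value) in spec]

# Print the expressions to a file; printing an Expr emits Julia syntax.
path = joinpath(mktempdir(), "generated.jl")
open(path, "w") do io
    for def in defs
        # Line-number nodes would print as `#= ... =#` comments. Dropping them
        # here is harmless: the parser assigns fresh ones when reading the file.
        println(io, Base.remove_linenums!(def))
    end
end

# Load the file into a scratch module, as a package would do with `include`.
gen = Module(:Generated)
Base.include(gen, path)

# Show what was written.
print(read(path, String))
```

Since the definitions now live in an ordinary file, errors raised by them should be reported against that file, and users only need the file itself, not the packages used to generate it.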

I think I’ll just embrace metaprogramming from now on (where needed of course). Thanks once again @yuyichao! :)


This is generally true for most expressions, though there are expressions that can’t be printed, and there are probably also bugs…
If your goal is to generate and save code (again, totally fine depending on the use case, e.g. at build time for fairly static input), you kind of have to go through strings anyway, so using strings isn’t that bad in this case, and constructing expressions and then printing them isn’t that much better. (You’ll get automatic indentation I guess, which is quite nice…)

Correct, and that’s the reason I really don’t like the strip linenumber function (IIRC from MacroTools) …
It’s useful of course since sometimes you are generating code that will only be run once or isn’t “code” but declarations in which case this matters much less, but still, I feel like people were abusing it (didn’t check recently but I’ll be surprised if it changed…).
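For readers who have not looked at this before, the sketch below shows what the line-number information is. Quoted code carries `LineNumberNode`s, `Base.remove_linenums!` (an unexported Base function, similar in spirit to the MacroTools one) deletes them, and a generator can also attach its own. The `spec.xml` file name and the `fail` function are made up for illustration.

```julia
ex = quote
    x = 1
    error("boom")
end

# True if any direct child of the expression is a line-number node.
has_linenums(e) = any(a -> a isa LineNumberNode, e.args)

# Stripping works in place, so copy first to keep the original intact.
stripped = Base.remove_linenums!(deepcopy(ex))

# A generator can attach its own location instead, for example a line of the
# specification the code was derived from (hypothetical file name).
body = Expr(:block, LineNumberNode(12, Symbol("spec.xml")), :(error("boom")))
fdef = Expr(:function, :(fail()), body)

mod = Module(:Scratch)
Core.eval(mod, fdef)
```

With the node in place, an error thrown from `fail()` should be attributed to `spec.xml:12` in the stacktrace rather than to no location at all, which is exactly the information that stripping throws away.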

MacroTools offers really nice functions, and it complements the existing tooling around expressions well. From what I see, it’s a lot easier to construct expressions than to construct strings, and it’s a lot more robust. For example, there is no longer any need for super-advanced but obscure functions to build a method definition with parametrized argument types. You also don’t need to reimplement some kind of parser to get input into your own sort of “IR”. I would be interested in knowing which kinds of expressions can’t be printed, though. But if those are just corner cases, then I think it’s fine to write a function that outputs a string for those parts of the expressions.
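As an illustration of the method-definition point, here is a sketch using only `Expr` constructors from Base (no MacroTools). The helper `method_expr` and the generated `first_or` function are hypothetical names, not part of any library.

```julia
# Hypothetical helper: build
#   function name(a1::T1, a2::T2, ...) where {params...}; body; end
# from a name, a list of `argname => type` pairs, type parameters and a body.
function method_expr(name::Symbol, args, params, body)
    typed_args = [Expr(:(::), arg, type) for (arg, type) in args]
    sig = Expr(:call, name, typed_args...)
    if !isempty(params)
        sig = Expr(:where, sig, params...)
    end
    return Expr(:function, sig, Expr(:block, body))
end

ex = method_expr(
    :first_or,
    [:v => :(Vector{T}), :default => :T],   # parametrized argument types
    [:T],
    :(isempty(v) ? default : first(v)),
)

# Evaluate the definition in a scratch module and call it.
mod = Module(:Scratch)
Core.eval(mod, ex)
from_empty = Core.eval(mod, :(first_or(Int[], 7)))
from_full = Core.eval(mod, :(first_or([3, 4], 7)))
```

The argument types are ordinary expressions, so nesting type parameters more deeply needs no special-casing, which is where the string version tends to get messy.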

Yes, you’d want some string manipulation when printing to files (so that the code looks OK), but the advantage is that you don’t need it in order to consume the generated code; it’s optional. Plus, for styling you can always use a formatter (e.g. JuliaFormatter); as long as you have valid input it should be straightforward. Then, adding some new lines here and there and organizing the logic into separate files (if you really want to push for a well-written library) shouldn’t be too complicated. As long as the core of the expressions doesn’t have to be constructed with strings, a lot of effort and complexity is saved.

It also makes it easier to run checks on the generated code (e.g. to make sure variables are assigned correctly; you don’t have to figure out whether there is an occurrence of foo on the lhs of an equal sign at the top level of an expression, i.e. presumably outside parentheses, curly braces or stuff like that).
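A check of that kind could be sketched as follows. The helper name `toplevel_assigned` is made up, and the rule it implements (only plain `name = value` statements directly inside the block count) is an assumption about what "assigned at the top level" should mean.

```julia
# Collect the names assigned by plain `name = value` statements that sit
# directly in the block (not nested inside calls, tuples, comparisons, ...).
function toplevel_assigned(block::Expr)
    names = Symbol[]
    for stmt in block.args
        if stmt isa Expr && stmt.head == :(=) && stmt.args[1] isa Symbol
            push!(names, stmt.args[1])
        end
    end
    return names
end

code = quote
    foo = 1
    bar = f(foo = 2)     # keyword argument, not an assignment to foo
    baz = (foo == 3)     # comparison, not an assignment to foo
end

assigned = toplevel_assigned(code)
```

Here `foo` appears three times in the text, but the AST makes it clear that only the first occurrence is a top-level assignment; the keyword argument is a `:kw` node inside a call, and the comparison is a call to `==`.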