Where to start for a macro of `@if`

I have an idea that I would like to try and implement in Julia. Ideally, a workflow would look like this:

df.new = df.x .- mean(df.x) @if df.y .> 1 

We have on one side a function f = x -> x .- mean(x) and on the other a boolean array. The macro @if applies f to the subset of df.x where df.y .> 1, stores the results in df.new[i] for the rows where df.y[i] > 1, and sets df.new[i] to missing otherwise.

In a function, it would probably look something like this. Notice that I subset the vector first, meaning that I could have x = [1, 2, "a string"] and still have the operation work, as long as the boolean array excludes the string, because we never evaluate the whole vector on the left-hand side of @if.

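function my_if(f::Function, v::AbstractVector, b::AbstractVector{Bool})
    # Apply f only to the elements of v selected by the mask b.
    # AbstractVector{Bool} also accepts the BitVector produced by `df.y .> 1`.
    values = f(v[b])
    T = promote_type(eltype(values), Missing)
    out = Vector{T}(undef, length(b))
    nonmissing_counter = 1  # position of the next unused element of `values`
    for i in eachindex(b)
        if b[i]
            out[i] = values[nonmissing_counter]
            nonmissing_counter += 1
        else
            out[i] = missing
        end
    end
    return out
end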
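
For comparison, here is a minimal sketch of the same idea using logical indexing instead of an explicit loop. The name masked_apply is made up for illustration, and the example data is invented; it assumes the string in x is excluded by the boolean vector.

```julia
using Statistics  # for mean

# Hypothetical compact variant of my_if: logical indexing replaces the loop.
function masked_apply(f, v::AbstractVector, b::AbstractVector{Bool})
    values = f(v[b])  # f only ever sees the selected elements
    out = Vector{Union{Missing, eltype(values)}}(missing, length(b))
    out[b] = values   # scatter the results back to the selected positions
    return out
end

x = [1, 2, "a string"]   # the string is never touched because its mask entry is false
y = [2, 3, 0]
result = masked_apply(z -> z .- mean(z), x, y .> 1)
```

Because the output starts out filled with missing, only the selected positions need to be written.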

I like the idea of a macro doing this. But I can’t figure out how to have a macro “look backward”. Presumably it has something to do with the way you write functions like + and isa where you can do x isa Vector etc. But I’m a bit lost on the implementation.

Any help is appreciated. Thanks.

Macros cannot “look backward”. The parser only passes a macro the expressions that come after it. So you’d need to put the macro call at the beginning of the line, or after =.
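
A small sketch of what that means in practice. The macro name @captured is hypothetical; it simply hands back the expression it received, unevaluated, so you can inspect what the parser gave it.

```julia
# Hypothetical macro: return the expression it was handed, without evaluating it.
macro captured(ex)
    return QuoteNode(ex)
end

x = [1, 2, 3]

# The macro sits after `=` and receives only what follows it on the line.
seen = @captured x .- 1

# `seen` is now an Expr describing `x .- 1`; the left-hand side `seen =`
# was never visible to the macro.
```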

That said, maybe you could achieve something similar using DataFramesMeta. For example, @byrow! makes it easy to do if :y > 1; ...; else; missing; end.

Thanks for the feedback. @byrow! is tough because you have to use escape syntax to refer to the whole column rather than just the current row’s element of that column. You also don’t get the subsetting behavior in my toy function above.

Having the @if at the beginning is definitely reasonable (though we leave Stata behind, perhaps for the best).

What are people’s thoughts on the custom parsing of x .- mean(x)? My impression is that it would be difficult to reason about if a macro automatically went in and replaced x with some subset of x. Are there other parts of Julia that do something similar?

Rather, I would bet the implementation would end up looking exactly like my_if, with a separate function doing the work and a macro only there to make the call site more closely resemble Stata-esque code.

I don’t think it would be confusing if the macro replaced x with the subset vector. Though I’d keep the if in the position where it is, and use another name for the macro. This kind of thing could even be an extension of DataFramesMeta’s @with, enabled when one of the arguments is an if block.
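
As a rough sketch of what “replace x with the subset vector” could look like, here is a prefix macro that borrows the quoted-symbol convention (:x for a column). Everything here is an assumption for illustration: the names @masked, fill_masked and rewrite_columns are made up, a NamedTuple stands in for a data frame, and this is not how DataFramesMeta implements @with.

```julia
using Statistics  # for mean

# Hypothetical helper: scatter `values` into the positions where `mask` is true.
function fill_masked(values, mask)
    out = Vector{Union{Missing, eltype(values)}}(missing, length(mask))
    out[mask] = values
    return out
end

# Rewrite quoted symbols such as :x into `tbl.x` (when mask === nothing)
# or `tbl.x[mask]`. Anything else is left alone.
# Limitation of this sketch: qualified names like Statistics.mean are not handled.
rewrite_columns(ex, tbl, mask) = ex
function rewrite_columns(q::QuoteNode, tbl, mask)
    q.value isa Symbol || return q
    col = Expr(:., tbl, q)
    return mask === nothing ? col : Expr(:ref, col, mask)
end
function rewrite_columns(ex::Expr, tbl, mask)
    return Expr(ex.head, map(a -> rewrite_columns(a, tbl, mask), ex.args)...)
end

# Hypothetical macro: evaluate `body` on the rows where `cond` holds,
# and fill the remaining rows with missing.
macro masked(tbl, cond, body)
    t, m = gensym(:tbl), gensym(:mask)
    cond_ex = rewrite_columns(cond, t, nothing)  # full columns in the condition
    body_ex = rewrite_columns(body, t, m)        # subset columns in the body
    return esc(quote
        let $t = $tbl, $m = $cond_ex
            $fill_masked($body_ex, $m)
        end
    end)
end

df = (x = [1.0, 2.0, 3.0, 4.0], y = [0, 2, 1, 5])  # NamedTuple standing in for a data frame
new = @masked(df, :y .> 1, :x .- mean(:x))
```

Here mean(:x) is computed over the selected rows only, which matches the subsetting behavior of my_if. Marking columns with :x rather than bare names keeps it visible at the call site which values get replaced by a subset.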

I will try and figure out a macro environment where I can do something like

@subset begin 
...everything is normal
x = y @if boolean_vector # special behavior
end

And see how hard that is to implement.

One thing that programmers don’t like about Stata is that

if x > y {
   z = x 
}

and

z = x if x > y

mean totally different things. The first actually evaluates to if x[1] > y[1] and then operates like any other if block. The second is the subsetting I am describing. Hopefully whatever I am able to make keeps the ease of Stata’s data cleaning without its sacrifices in programming ability.
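
To make the distinction concrete, here is roughly how the two Stata forms would translate to plain Julia. The data is invented, and ifelse with missing is only one way to express the row-wise form.

```julia
x = [3, 1, 5]
y = [2, 4, 4]

# Block form: a single scalar test on the first observation,
# followed by an ordinary branch over the whole vector.
z_block = x[1] > y[1] ? copy(x) : nothing

# Qualifier form: a row-wise test; rows that fail it are left missing.
z_rows = ifelse.(x .> y, x, missing)
```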