As you know, I mainly use DataFrames.jl, and this is what I get:
julia> @time @pipe filter([:STATUS, :MONTHS_BALANCE] => (x,y) -> x != "C" && y > -23, bureau_bal) |>
setindex!(_, (x -> x=="X" ? 0 : parse(Int, x)).(_.STATUS), !, :STATUS) |> # or transform!, which is a bit slower as it does more work, but nothing significant
groupby(_, :SK_ID_BUREAU) |>
combine(_, :STATUS => maximum => :worst_status_l12m) |>
rightjoin(_, bureau, on = :SK_ID_BUREAU);
2.223288 seconds (7.74 M allocations: 1.383 GiB, 6.63% gc time)
where the most expensive part is rightjoin, which takes over 1 second (as noted above, this is a known area where there is much to be improved).
(On my laptop the R code takes ~3 seconds.)
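If you want to check where the time goes yourself, one option is to split the pipeline into separate steps and time each one. Below is a rough sketch on made-up toy data: the two small tables are hypothetical stand-ins for `bureau_bal` and `bureau`, and the intermediate names `recent` and `worst` are mine. On data this small the numbers mean nothing (and the first call of each function includes compilation time), so it only shows the structure; swap in the real tables to measure.

```julia
using DataFrames

# Hypothetical toy stand-ins for the real `bureau_bal` and `bureau` tables.
bureau_bal = DataFrame(SK_ID_BUREAU   = [1, 1, 2, 2, 3],
                       MONTHS_BALANCE = [-1, -2, -1, -30, -5],
                       STATUS         = ["0", "2", "X", "5", "C"])
bureau = DataFrame(SK_ID_BUREAU = [1, 2, 3, 4])

# Step 1: drop rows with status "C" and rows outside the -23 month window.
@time recent = filter([:STATUS, :MONTHS_BALANCE] => (s, m) -> s != "C" && m > -23,
                      bureau_bal)

# Step 2: recode STATUS to integers, treating "X" as 0.
# `filter` returned a new data frame, so `bureau_bal` itself is not modified.
@time recent.STATUS = [s == "X" ? 0 : parse(Int, s) for s in recent.STATUS]

# Step 3: worst (maximum) status per SK_ID_BUREAU.
@time worst = combine(groupby(recent, :SK_ID_BUREAU),
                      :STATUS => maximum => :worst_status_l12m)

# Step 4: the join, which was the dominant cost in the timing above.
@time result = rightjoin(worst, bureau, on = :SK_ID_BUREAU)
```

Ids that appear in `bureau` but have no surviving rows in `recent` get `missing` in `worst_status_l12m`.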
So the reason for the slow performance is most probably that the convenience packages do not generate efficient low-level DataFrames.jl code.
The new DataFrames backend for @transform, etc. was only merged into master 7 days ago, so any speed improvements won't be reflected on the release branch yet.