Despite the apparent simplicity of _ placeholder syntax, the design of a convenient and general but also simple meaning for _ has resisted all our efforts. (At this point there’s been a ridiculous amount of effort put into exploring that design space.) So now we’ve got a bunch of special purpose packages which make various tradeoffs but are mostly centered around piping.
Underscores.jl is my attempt at a general _ placeholder syntax which happens to work with pipes rather than a special purpose piping syntax package.
(:
At least some of the packages you listed are focused on specific use cases, so it shouldn’t be very difficult to choose among them. For example, Chain.jl clearly aims at dataframe operations with their calling conventions; DataPipes.jl focuses on data processing operations with common functions following the Julia convention; for Underscores.jl, see Chris’s post.
Nevertheless, there is clearly lots of overlap. Something like @p sum(_ * 2, 1:10) would look basically the same with any approach.
It doesn’t seem easily avoidable indeed… For example, Chain and DataPipes cannot follow the same conventions without adding boilerplate, because typical calls of dataframe functions are significantly different from other functions.
Agreed. DataFrames is very clever and convenient with its domain specific column naming syntax. It’s just unfortunate that the syntax doesn’t mesh well with any reasonably general meaning of _.
The situation is kind of unsatisfactory but it’s unclear how to proceed without a good language syntax option for placeholders. A lightweight way to delimit the extent of the lambda helps but there hasn’t been a lot of support for that, nor a really nice syntax candidate. (The delimiter could either go outside the function which accepts the lambda (as in Underscores.jl and DataPipes) or on the lambda itself, a bit like Swift’s shorthand argument names or key paths.)
Their let articleIDs = articles.map { $0.id } is indeed a nice and concise lambda syntax. Does it nest?
Regarding key paths: maybe I missed something, but they seem much less general compared to arbitrary functions. As I understand it, their closest equivalent in Julia is the lenses in Setfield.jl/Accessors.jl, which have even more advanced functionality.
I guess you mean that a good general meaning of _ would swallow the => operators as part of the lambda function body? I more or less disagree.
For a standalone _ (without another piece of syntax to delimit the scope of the lambda) I think the “tight” meaning implemented in #24990 is the best: it’s far from covering all use cases, but it does have general usefulness, and it’s super clear and readable.
For more complex lambda bodies, a delimiter like @_ is required.
We can have both, and they both work nicely with DataFrames:
# Would work if #24990 is merged
transform(df, :a => _.^2 => :a_square)
# Already works with Underscores.jl
transform(df, :a => @_ exp.(_.^2) => :exp_a_square)
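To make the “tight” reading concrete: `=>` is right-associative, so `:a => _.^2 => :a_square` parses as `:a => (_.^2 => :a_square)`, and only `_.^2` would become a lambda. Here is a minimal sketch in plain Julia with the lambda written out by hand. The lowering of `_.^2` to `x -> x .^ 2` is an assumption about #24990, not output from that PR, and the variable names are just for illustration.

```julia
# Hand-written stand-in for what `_.^2` would presumably mean under the tight rule.
square = x -> x .^ 2

# `=>` is right-associative, so this is :a => (square => :a_square).
spec = :a => square => :a_square

source_column = spec.first           # the input column name
transform_fn  = spec.second.first    # the lambda
target_column = spec.second.second   # the output column name

result = transform_fn([1, 2, 3])
```

The `=>` operators stay outside the lambda body, so the column-naming syntax keeps its usual structure.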
It’s worth noting that this exact same DataPipes.mutate syntax works with Underscores.@_ (and has done since shortly after it was released). So this is further evidence that these packages are very similar in design. If we could agree on a general solution for implicit __ vs ↑ and naming of arguments, it may be that these packages can join together in some way.
julia> using TypedTables
julia> t = Table(a=[1,2,3], b=[4,5,6]);
julia> @_ mutate(exp_a_square=exp(_.a^2), a_square=_.a^2, t)
Table with 4 columns and 3 rows:
     a  b  exp_a_square  a_square
   ┌─────────────────────────────
 1 │ 1  4  2.71828       1
 2 │ 2  5  54.5982       4
 3 │ 3  6  8103.08       9
julia> @_ mutate(comp_val=_.a > _.b ? _.a^2 : _.a, t)
Table with 3 columns and 3 rows:
     a  b  comp_val
   ┌───────────────
 1 │ 1  4  1
 2 │ 2  5  2
 3 │ 3  6  3
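For readers without these packages installed, the row-wise behaviour shown above can be approximated in plain Julia. This is a rough sketch only: `mutate_rows` is a hypothetical helper, not DataPipes’ `mutate`, and it works on a vector of NamedTuples rather than a TypedTables `Table`.

```julia
# Rows as a plain vector of NamedTuples; no table package required.
rows = [(a = 1, b = 4), (a = 2, b = 5), (a = 3, b = 6)]

# Hypothetical helper (not DataPipes' actual `mutate`): `f` maps a row to a
# NamedTuple of new fields, which is merged into that row.
mutate_rows(f, rows) = map(r -> merge(r, f(r)), rows)

# Same computation as the `comp_val` example, with the lambda written out by hand.
with_comp = mutate_rows(r -> (comp_val = (r.a > r.b ? r.a^2 : r.a),), rows)

# Every row has a < b, so comp_val is just a.
comp_vals = [r.comp_val for r in with_comp]
```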
mutate seems pretty handy. Perhaps something like it could go into SplitApplyCombine?
mutate seems straight from dplyr; if a package is going to go that route I think it would be good to look more comprehensively at the dplyr API so Julia’s version can get a cohesive look-and-feel.
Totally agree that there is a significant overlap, and short single-step examples work completely the same in these packages (maybe even in Chain.jl).
I myself don’t see a general and still no-boilerplate interpretation of the differences, but would be very curious to know if there is one. DataPipes clearly implements a less general approach, but is more convenient for piped data analysis (hence the name (: ). Also, there are pipe-related features on top, like the @export macro, and I also plan to add an @aside macro like the one in Chain.jl. They don’t seem to fit a really general _-package like Underscores.jl, but I may be mistaken here.
Currently, this function (and some other short ones) is defined in DataPipes, but not documented. The reason is I don’t know where it is best to put them, and they may be changed/removed at any time. Maybe you are right and SAC.jl is the right place for them to go…
I agree with Chris that specific functions like mutate should really be out of scope for DataPipes and similar packages. I don’t know or use dplyr myself, but it would be interesting to see someone attempt to implement a similar interface in Julia, if there is none yet. Currently available functions (Base, SAC.jl, …) may be less “cohesive”, but are more general than dplyr and dataframes.
That’s true, there are some things which only apply to pipes but are super handy, such as having variables for partial results assigned within the pipeline.
Underscores.jl actually does have a small accommodation for |> syntax (also ∘, <|, .|> and .<|), but only in the sense that it recurses into such expressions and applies the same _ replacement rules inside them, rather than treating them as normal call expressions.
That’s a clever and general solution!
Indeed, having a pure syntax indication such as __ in the first pipe step helps distinguish between a function definition and application.
julia> data = 5:12
5:12
julia> @_ data |>
          filter(_>10) |>
          map(_^2)
2-element Vector{Int64}:
121
144
This is type piracy, of course! But this version of Base.filter isn’t defined, and though Base.map(f::Function) is defined the existing definition of mapping over zero collections seems pretty useless.
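The single-argument definitions being discussed are not shown here. A minimal sketch of the same idea without piracy, assuming hypothetical helper names `cfilter` and `cmap` (not part of Base or any package mentioned in this thread), could look like this:

```julia
# Hypothetical, non-pirating helpers: each takes a function and returns a
# one-argument closure that can be used as a pipe step.
cfilter(f) = xs -> filter(f, xs)
cmap(f)    = xs -> map(f, xs)

data = 5:12

# Same pipeline as above, with the `_` lambdas written out by hand.
piped = data |> cfilter(x -> x > 10) |> cmap(x -> x^2)

# Equivalent without curried helpers: each step is an explicit closure.
explicit = data |> (d -> filter(x -> x > 10, d)) |> (d -> map(x -> x^2, d))
```

Using separate names avoids extending Base methods on types you don’t own, at the cost of not reusing the familiar `filter` and `map` spellings.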
Yes Transducers.jl is really cool for many reasons.
But sometimes you just want to do some quick data processing without any extra dependencies, which is why I kind of wish we had versions of normal map() and filter() as above.
Out of habit, I find it easier to write/read code where the functions are embedded (# 2) rather than form # 1.
But I like the ability to shorten the syntax.
The first form I tried was # 5, which doesn’t work. Then I found the other shapes that give the expected result, but I’m not sure I understand why # 5 doesn’t work and # 4 does.
But maybe there is an even more correct way to get what I was looking for.
ok. I have the same version (downloaded yesterday) and now I get the same results as you. I don’t know what to think.
I apologize for the wrong report.
I had initially tried with the expression
@p filter(exp(_)>5, map(_^2,1:4) )
which I then corrected to form # 5. Since it seemed to me (perhaps confusing it with the initial form) that this didn’t work, I tried first with # 3 and then with # 4.
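As a reference point when comparing the numbered forms, the intended nested computation can be written with explicit anonymous functions, so the result doesn’t depend on how any macro scopes `_`. This is a plain-Julia sketch, not a statement about what `@p` does with the expression:

```julia
# Inner step: square each element of 1:4.
squares = map(x -> x^2, 1:4)

# Outer step: keep the squares whose exponential exceeds 5.
# exp(1) ≈ 2.72 fails the test, so 1 is dropped.
kept = filter(x -> exp(x) > 5, squares)

# Same computation as a single nested expression.
nested = filter(x -> exp(x) > 5, map(x -> x^2, 1:4))
```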
Could you in this case use the nested form with the placeholder _1?
PS
Is it possible to retrieve the log of the outputs (I already have the one of the inputs) from yesterday’s session in the VS Code environment?