Fixing the Piping/Chaining Issue

Okay I guess I misunderstood what you meant by “actual real-life example.”

The best examples will not be of methods or types from Base, but from packages that create complicated objects with many tightly-specialized methods, for which method discovery will be most appreciated.

I’ll use DataFrames.jl as an example. Keep in mind, this is just a simple example of how autocomplete could behave, solely by acting on types.

Example Possible Autocomplete Behavior

Here's a walked-through example of how an autocomplete *could* work with underscore syntax.

Let’s create an object df = DataFrame(a=1, b=2). When I type

df |> 

I should see a (very very long) list of methods appear: those which specialize on typeof(df), followed by methods which specialize on supertype(typeof(df)), followed by supertype(supertype(typeof(df))), etc., sorted by degree of specialization. The list is about two thousand entries long, something like this:

df |>
  append!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  append!(df::DataFrame, table; cols, promote)
  copy(df::DataFrame; copycols)

  ⋮ other methods of `DataFrame`
  ⋮ (vdots here to shorten my explanation)

  Array(df::AbstractDataFrame)
  ==(df1::AbstractDataFrame, df2::AbstractDataFrame)
  (Matrix)(df::AbstractDataFrame)

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

The fact that we have underscore syntax in the language means I can call any of these methods conveniently using the pipe operator. The list was simply created by calling methodswith on the type and its supertypes, with no attention paid to argument position.
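For concreteness, here's roughly what that raw candidate list amounts to in code. This is a minimal sketch, assuming InteractiveUtils (auto-loaded in the REPL); note that methodswith with supertypes=true skips methods specialized only on Any, so the Any group above would need separate handling:

using DataFrames, InteractiveUtils

df = DataFrame(a=1, b=2)

# Every method with an argument declared as DataFrame or one of its supertypes.
candidates = methodswith(typeof(df); supertypes=true)
length(candidates)  # a long list, on the order of the entries described above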

Pressing CTRL+B (or some other hotkey combination) might change settings. For example, maybe I want to see only methods that act on abstract types, in which case pressing CTRL+B could bring up:

df |>
  Array(df::AbstractDataFrame)
  ==(df1::AbstractDataFrame, df2::AbstractDataFrame)
  (Matrix)(df::AbstractDataFrame)

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

But for now, I decide I want to see methods specialized to strictly this concrete type. So I press CTRL+B again and I see:

df |>
  append!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  append!(df::DataFrame, table; cols, promote)
  copy(df::DataFrame; copycols)
  delete!(df::DataFrame, inds)
  deleteat!(df::DataFrame, inds::InvertedIndex)
  deleteat!(df::DataFrame, inds::AbstractVector{Bool})

  ⋮

And then I can scroll down the list to find what I’m looking for. One neuron fires in my brain and I remember that the first character is a p. So I type p and I see:

df |> p
  pop!(df::DataFrame)
  popat!(df::DataFrame, i::Integer)
  popfirst!(df::DataFrame)
  prepend!(df1::DataFrame, df2::AbstractDataFrame; cols, promote)
  prepend!(df::DataFrame, table; cols, promote)
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

The list is now sufficiently short that I can see the whole thing and remind myself that the function I wanted to call was pushfirst!, and I type u. Now I see:

df |> pu
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

I hit <tab> and it autocompletes to push:

df |> push
  push!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  push!(df::DataFrame, row::DataFrameRow; cols, promote)
  push!(df::DataFrame, row; promote)
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

Now I type f:

df |> pushf
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

I press <tab> again and the call fully fills out, inserting the underscore for the piped argument and placing the cursor after the comma for the remaining argument. Now I see:

df |> pushfirst!(_, )
  pushfirst!(df::DataFrame, row::Union{AbstractDict, NamedTuple}; cols, promote)
  pushfirst!(df::DataFrame, row::DataFrameRow; cols, promote)
  pushfirst!(df::DataFrame, row; promote)

and the list all but disappears as I fill out the rest of the arguments.

df |> pushfirst!(_, [1, 2])
  pushfirst!(df::DataFrame, row; promote)

The autocomplete has helped me find the method I was looking for by letting me search the methods which specialize on the object's concrete type.

This simple example uses only m.sig (where m is one of the methods returned by methodswith) and doesn't sort by argument position at all, though it could be imagined to do so.

In addition, it could be imagined to use m.module to sort methods by the module that defined them, showing first the methods defined in the same module as the object (via parentmodule(MyType)); to use m.file to show only the methods defined in the same file as the object; or to use any of the other properties of a Method to return better search results. (The autocomplete could have a suite of hotkeys, a settings panel, or a settings popup dialog to control how it searches.) It could also use statistical inference based on function-call data from GitHub, or even your personal use data, to return better search results.
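A rough sketch of that kind of ranking, combining argument position and defining module (first_arg_matches is just an illustrative helper here, not an existing API):

using DataFrames, InteractiveUtils

df = DataFrame(a=1, b=2)
candidates = methodswith(typeof(df); supertypes=true)

# True if the method's first positional argument is declared with a type that
# `df` satisfies, so that `df |> f(_, ...)`-style piping fits naturally.
function first_arg_matches(m::Method, T::Type)
    sig = Base.unwrap_unionall(m.sig)
    length(sig.parameters) ≥ 2 || return false
    argtype = sig.parameters[2]   # parameters[1] is the function's own type
    argtype isa Type && T <: argtype
end

owner = parentmodule(typeof(df))  # DataFrames
sort!(candidates; by = m -> (!first_arg_matches(m, typeof(df)),  # matching first argument first
                             m.module !== owner,                 # then methods defined in DataFrames
                             string(m.name)))                    # then alphabetically by name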

I hope I don't have to write my own autocomplete, as I'm totally incompetent, but if I'm pushed hard enough…

1 Like

Thanks for the detailed comparison!

I’ll admit that I looked at the OP and thought “oh more chaining/anon function stuff” and jumped to conclusions, apparently without reading the OP properly :sweat_smile:

So the significant underlying differences here weren’t exactly intentional! But might be an interesting alternative take.

2 Likes

Just some thoughts: it's been a long time since I used Common Lisp, but if I remember correctly you can actually replace the parser at run time with something else. There's something called the "readtable", and you can make "read" do basically anything… This can be super useful for constructing honest-to-goodness domain-specific languages which are not a subset of the basic syntax of Common Lisp. Similarly, I could imagine something like:

@with_reader foo ...

and in the '…' it would actually call the reader function foo to read and parse the stuff in the …

I'm imagining that JuliaSyntax might actually enable that sort of thing to happen? Whether that's a good idea or not is up in the air :sweat_smile:

Yes, in fact I might be inclined to say it's better! It seems to sacrifice some of the functor-currying / partial-application power of the original proposal, but in return it gets to be a lot simpler and more obvious, and probably easier on the compiler, while still satisfying probably the majority of use cases. I think we can still do things like map(/> myfunc(myparam), myvec)

1 Like

Counterproposal, rewriting yours:

Example Possible Autocomplete Behavior

Let’s create an object df = DataFrame(a=1, b=2). When I type

df |> 

I should see a (very very long) list of methods appear: those which specialize on typeof(df), followed by methods which specialize on supertype(typeof(df)), followed by supertype(supertype(typeof(df))), etc., sorted by degree of specialization. The list is about two thousand entries long, something like this:

df |>
  append!(DataFrame,
  copy(DataFrame;

  ⋮ other methods of `DataFrame`
  ⋮ (vdots here to shorten my explanation)

  Array(AbstractDataFrame)
  ==(AbstractDataFrame,
  (Matrix)(AbstractDataFrame

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

And then I can scroll down the list to find what I’m looking for. One neuron fires in my brain and I remember that the first character is a p. So I type p and I see:

df |> p
  pop!(DataFrame)
  popat!(DataFrame,
  popfirst!(DataFrame)
  prepend!(DataFrame,
  push!(DataFrame,
  pushfirst!(DataFrame,

I think maybe at the first level we should show two levels at once: methods of typeof(df) and of supertype(typeof(df)), but disregarding the distinction between them (unless the supertype is Any?) and ordering them alphabetically as one group?
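Something like this, perhaps (a quick sketch, assuming df from above and InteractiveUtils):

using DataFrames, InteractiveUtils

df = DataFrame(a=1, b=2)
T = typeof(df)

# Merge the methods of the concrete type and its immediate supertype into a
# single group, ignoring which level each came from, and order by name.
group = unique(vcat(methodswith(T), methodswith(supertype(T))))
sort!(group; by = m -> string(m.name))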

2 Likes

Okay, so since in working through that example I basically did all the work needed to figure out how to do it anyway, I threw together a quick and simple script that does what I described for a simple autocomplete (you knew I would, didn't you :triumph:)

Code here

function propose_method_autocompletions(obj, func_name_fragment::String=""; 
    type_depth = 1:typemax(Int), only_same_module = false, 
    prioritize_firstlast = false, github_inference = false, personalized_inference = false)::Vector{Method}

    @assert first(type_depth) ≥ 1 && last(type_depth) ≥ 1
    recs = Vector{Method}[]
    get_type_at_depth(type, depth=1) = depth == 1 ? type : type == Any ? nothing : get_type_at_depth(supertype(type), depth-1)

    for i ∈ type_depth
        stype = get_type_at_depth(typeof(obj), i)
        isnothing(stype) && break

        meths = filter(only_same_module ? methodswith(stype, parentmodule(typeof(obj))) : methodswith(stype)) do m
            length(func_name_fragment) > length(string(m.name)) && return false
            string(m.name)[1:length(func_name_fragment)] == func_name_fragment
        end

        prioritize_firstlast || true # do cool sorting stuff, add later
        github_inference || true # do cool sorting stuff, add later
        personalized_inference || true # do cool sorting stuff, add later

        recs = [recs; meths]
    end

    recs
end

To invoke:

using DataFrames
df = DataFrame(a=1, b=2)

propose_method_autocompletions(df)

By default, it returns all available methods that can act on object df. As you start to type in a function name, the list gets narrowed down quickly:

propose_method_autocompletions(df, "p")
propose_method_autocompletions(df, "pu")
propose_method_autocompletions(df, "pushfirst!")

You can also change the search depth. By default, it searches specializations on typeof(df), as well as supertype(typeof(df)), supertype(supertype(typeof(df))), and so on. This can be changed like so:

propose_method_autocompletions(df, "p"; type_depth=1:2)

This gives methods of DataFrame as well as AbstractDataFrame, but not Any. The iterator can also be reversed 2:-1:1 to show the abstract type’s methods first.

You can also restrict the search to only methods that are defined in the same module as DataFrame:

propose_method_autocompletions(df, "p"; only_same_module = true)

And, one day, it'll work with the prioritize_firstlast, github_inference and personalized_inference options too. :wink:

Someday it could be interesting to make it feed its results into a Channel, so that it can offer results with greater immediacy.
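Something like this wrapper could be a starting point; propose_method_autocompletions_channel is just a hypothetical name, and a real version would produce methods lazily rather than computing the full list up front:

# Stream candidates through a Channel so a UI could start displaying matches
# before the whole search finishes.
function propose_method_autocompletions_channel(obj, fragment::String="")
    Channel{Method}(32) do ch
        for m in propose_method_autocompletions(obj, fragment)
            put!(ch, m)
        end
    end
end

# first(propose_method_autocompletions_channel(df, "push"), 5)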

Unfortunately it doesn’t solve *my* problem, because the module didn’t export its methods so they don’t appear in methodswith :sweat_smile:

Oddly, when calling methodswith(DataFrame, DataFrames) (trying this to see if it improves speed over filtering *after* the methodswith call), it simply doesn’t return a bunch of methods of DataFrame that are indeed declared by the DataFrames module. For example, pushfirst! is missing. Strange.

3 Likes

Yes I see. It’s very funny that we’ve done the opposite things here in terms of precedence, but with the end result being almost the same from the user’s perspective :laughing:

I think /> having high precedence would be surprising to anyone who’s trying to use it for piping. For example, what does this mean?

x /> Mod1.g[1].h(y)(z)
  • Low precedence: Mod1.g[1].h(y)(x, z)
  • High precedence: Mod1.g[1].h(x,y)(z) … presumably?

Couple of thoughts here:

  • Is there a flexible lowering which might allow pipelines of /> and \> to be optimized more completely? For example, turning x \> map(_^2) \> reduce(+) into mapreduce(_^2, +, x). In particular I'm thinking of how transducers are a lot better for compilation than iterators, as described in Comparison to iterators · Transducers.jl, and that we could make use of the iterator-transducer duality if we could give the compiler the right insight. @tkf :slight_smile:
  • What about first-class pipelines without data, such as \> map(isodd) \> filter(_>10)? Well, OK, I guess these can be lowered to a lambda: x -> filter(y->y>10, map(isodd, x))
2 Likes

To answer my own question, yes! The fixbutfirst/fixbutlast lowering already allows this. Adding the following definition to chain will turn a map \> reduce pipeline into a call to mapreduce:

function chain(x, f1::FixButLast{typeof(map)}, f2::FixButLast{typeof(reduce)}, fs...)
    chain(x, fixbutlast(mapreduce, f1.args..., f2.args...; f1.kwargs..., f2.kwargs...), fs...)
end

Presumably this is not the best way to express such transformations, as it handles only a fairly special case. But it does give a tantalizing taste of the possibilities.
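For anyone who wants to play with the idea outside the proposal, here's a self-contained toy version; FixButLast, fixbutlast, and chain below are stand-ins for the proposal's constructs, not an existing API:

struct FixButLast{F,A<:Tuple,K<:NamedTuple}
    f::F
    args::A
    kwargs::K
end
fixbutlast(f, args...; kwargs...) = FixButLast(f, args, values(kwargs))
(p::FixButLast)(lastarg) = p.f(p.args..., lastarg; p.kwargs...)

chain(x) = x
chain(x, f, fs...) = chain(f(x), fs...)

# The fusion rule from above: a map-then-reduce pipeline becomes one mapreduce call.
function chain(x, f1::FixButLast{typeof(map)}, f2::FixButLast{typeof(reduce)}, fs...)
    chain(x, fixbutlast(mapreduce, f1.args..., f2.args...; f1.kwargs..., f2.kwargs...), fs...)
end

chain(1:4, fixbutlast(map, abs2), fixbutlast(reduce, +))  # == mapreduce(abs2, +, 1:4) == 30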

2 Likes

Great, thanks for the implementation!

A nice thing is that such autocomplete can be added without any changes to Julia syntax: just expand to obj |> it->func(<cursor here>, it) instead of obj |> func(<cursor here>, _). So it can already be implemented in a package, using e.g. ReplMaker.
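For example, with the pushfirst! method from the listings above (assuming a DataFrames version that provides it), the lambda expansion the completer would insert already runs today, while the underscore spelling is shown only as a comment:

using DataFrames
df = DataFrame(a=1, b=2)
df |> it -> pushfirst!(it, (a=0, b=0))   # what the completer would actually insert
# df |> pushfirst!(_, (a=0, b=0))        # the proposed underscore spelling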

I tried your propose_method_autocompletions, and some kind of sorting would definitely help.
For a simple table, tbl = [(a=1,)], the first suggestions without typing the method name are:

julia> propose_method_autocompletions(tbl)
[1] ArgumentError(msg) in Core at boot.jl:325
[2] AssertionError(msg) in Core at boot.jl:341
[3] BoundsError(a) in Core at boot.jl:277
[4] BoundsError(a, i) in Core at boot.jl:278
[5] ConcurrencyViolationError(msg) in Core at boot.jl:290
[6] DomainError(val) in Core at boot.jl:296
[7] DomainError(val, msg) in Core at boot.jl:297
[8] ErrorException(msg) in Core at boot.jl:267
[9] InexactError(f::Symbol, T, val) in Core at boot.jl:318
[10] InitError(mod::Symbol, error) in Core at boot.jl:354
[11] InitError(mod, error) in Core at boot.jl:354
[12] LineNumberNode(l::Int64, f) in Core at boot.jl:407
[13] LoadError(file::AbstractString, line::Int64, error) in Core at boot.jl:348
[14] LoadError(file, line, error) in Core at boot.jl:348
[15] MethodError(f, args) in Core at boot.jl:338
[16] MethodError(f, args, world::UInt64) in Core at boot.jl:335
[17] (::Type{T})(itr) where T<:Tuple in Base at tuple.jl:317
[18] NamedTuple(itr) in Base at namedtuple.jl:123
[19] OverflowError(msg) in Core at boot.jl:321
[20] Pair(a, b) in Core at boot.jl:825
...

None of these methods are commonly used on tables/collections.
Typing propose_method_autocompletions(tbl, "push") does find the push! function of course, but still doesn’t show the expected push!(tbl, ...) method among the first suggestions. The first is push!(s::Set, x), so it wants to put tbl as the 2nd argument.

Autocomplete based on the ideas from this topic can already be implemented in a package, and would be great if it turns out actually useful! But for now I’m skeptical, given a huge amount of methods specifying few to no argument types.

3 Likes

Oops! It appears that calling methodswith on a parameterized type doesn’t give the methods of the unparameterized type! My autocomplete was failing to find any methods of Vector because it was being parameterized with the type of its contents.

See this example.

julia> struct MyThing{T} end

julia> foo(::MyThing) = 1
foo (generic function with 1 method)

julia> methodswith(MyThing)
[1] foo(::MyThing) in Main at REPL[157]:1

julia> methodswith(MyThing{1})


julia> bar(::MyThing{1}) = 2
bar (generic function with 1 method)

julia> methodswith(MyThing)
[1] foo(::MyThing) in Main at REPL[157]:1

julia> methodswith(MyThing{1})
[1] bar(::MyThing{1}) in Main at REPL[160]:1

As a result, because an object such as [(a=1,)] is of type Vector{NamedTuple{(:a,), Tuple{Int64}}}, none of the methods of Vector appear.

I've rewritten it so that it finds methods of the parameterized type and appends methods of the unparameterized type, with special-case handling for vectors and matrices. I think it'll have poor behavior in some particular cases, but for a quick hack I think it's not too bad. Might consider doing something more sophisticated later.

Here's the code now.

function propose_method_autocompletions(obj, func_name_fragment::String=""; 
    type_depth = 1:typemax(Int), only_same_module = false, only_imported_modules = false,
    arg_order_inference = false, github_inference = false, personalized_inference = false)::Vector{Method}

    @assert first(type_depth) ≥ 1 && last(type_depth) ≥ 1
    recs = Vector{Method}[]
    get_type_at_depth(type, depth=1) = depth ≤ 1 ? type : type == Any ? nothing : get_type_at_depth(supertype(type), depth-1)

    for i ∈ type_depth
        stype = get_type_at_depth(typeof(obj), i)
        isnothing(stype) && break

        meths = methodswith(stype)
        if !(stype isa UnionAll) && length(stype.parameters) > 0
            if stype <: AbstractVector # special case handling for vectors
                meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)){T,1} where T)]
            elseif stype <: AbstractMatrix
                meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)){T,2} where T)]
            end
            meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)))]
        end

        meths = filter(meths) do m
            length(func_name_fragment) > length(string(m.name)) && return false
            string(m.name)[1:length(func_name_fragment)] == func_name_fragment || return false
            only_same_module && (parentmodule(typeof(obj)) == m.module || return false)
            # How to detect only modules that have been explicitly imported?
            true
        end

        arg_order_inference || true # do cool sorting stuff, add later
        github_inference || true # do cool sorting stuff, add later
        personalized_inference || true # do cool sorting stuff, add later

        recs = [recs; meths]
    end

    recs
end

Calling it on the object [(a=1,)]:


julia> propose_method_autocompletions([(a=1,)])
[1] CapturedException(ex, bt_raw::Vector) in Base at task.jl:12
[2] \(A::LinearAlgebra.Transpose{<:Complex, <:LinearAlgebra.Hermitian{<:Complex, <:SparseArrays.AbstractSparseMatrixCSC}}, B::Vector) in SparseArrays at C:\Users\unime\AppData\Local\Programs\Julia-1.8.0\share\julia\stdlib\v1.8\SparseArrays\src\linalg.jl:872
[3] \(L::SuiteSparse.CHOLMOD.FactorComponent, b::Vector) in SuiteSparse.CHOLMOD at C:\Users\unime\AppData\Local\Programs\Julia-1.8.0\share\julia\stdlib\v1.8\SuiteSparse\src\cholmod.jl:1521
[4] append!(a::Vector, items::AbstractVector) in Base at array.jl:1105
[5] deleteat!(a::Vector, i::Integer) in Base at array.jl:1485
[6] deleteat!(a::Vector, r::AbstractUnitRange{<:Integer}) in Base at array.jl:1491
[7] deleteat!(a::Vector, inds::AbstractVector{Bool}) in Base at array.jl:1590
[8] deleteat!(a::Vector, inds::AbstractVector) in Base at array.jl:1533
[9] deleteat!(a::Vector, inds) in Base at array.jl:1532
[10] empty!(a::Vector) in Base at array.jl:1738

⋮

[3209] groupview(f, X; restype, kwargs...) in FlexiGroups at C:\Users\unime\.julia\packages\FlexiGroups\1ItB2\src\base.jl:41

Beyond that, truly generic methods that don't specialize on anything, whose behavior can be thought of as generalizable across any object, shouldn't float to the top of such a simple autocomplete anyway; if a method is general enough not to specialize on any type, it's probably general enough to have memorized.

It’s possible to make them float to the top anyway, using a sorting technique based on statistical inference of a model fitted to a codebase (imagine if you could train your autocomplete to your own codebase!), but I’m not going to make that at the moment.

Without such inference built in, for methods to gain visibility they need to be specialized to such types as AbstractArray, AbstractDict, etc.

3 Likes

This would be super cool. IMHO compiler tooling should do a lot more of this kind of thing. I want some kind of data-driven approach to the JuliaSyntax.jl diagnostics system. What exactly, I'm not sure yet :slight_smile: I wish for some happy medium between traditional compilers (where all the diagnostic messages are hand-coded at huge engineering effort and are still hard to interpret) and hugely bloated natural language models, which seem hard to train and deploy but can perform amazing feats of inferring what the user intended.

2 Likes

I think this is a relatively straightforward problem, conceptually anyway. One would wish to collect the frequencies with which Tᵢ-typed objects are fed to methods mⱼ across a codebase. Then, when autocompleting methods for a Tᵢ object, simply sort the methods by call frequency.

I think this would probably be an optional mode: for example, one mode of autocomplete might be to sort based on type specialization; another mode might be to sort by frequency of use in your personal codebase; another mode might be to sort by frequency of use in all GitHub repos.

I think it's probably decent to use the GitHub method-use frequencies as a starting point, and then take some sort of weighted average with the frequencies at which they're used in your personal code. That way you don't overfit to your own codebase.

It’s also possible to find the frequencies associated with certain packages being loaded (e.g., somebody who has loaded LinearAlgebra is more likely to call qr), and do a weighted average of the frequencies associated with the packages you’ve loaded in your project to get an autocomplete that’s more likely to be perfect for your specific use case. The possibilities are endless.
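A toy sketch of that kind of frequency-ranked completion; the counts below are invented, and in practice they'd come from mining GitHub, your own code, or per-package statistics:

using DataFrames, InteractiveUtils

# Hypothetical usage counts: how often each function has been applied to a DataFrame.
call_frequency = Dict(
    (DataFrame, :select)  => 300,
    (DataFrame, :push!)   => 120,
    (DataFrame, :append!) => 45,
)

df = DataFrame(a=1, b=2)
candidates = methodswith(typeof(df); supertypes=true)
sort!(candidates; by = m -> -get(call_frequency, (typeof(df), m.name), 0))
first(candidates, 3)  # frequently used methods float to the top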

Indeed, although as we've shown here, compile times for lambdas can get prohibitive if they're used a lot, sometimes two orders of magnitude greater than the alternative. Much better to have dedicated syntax that creates a partial-application functor.

It’s also simply cleaner to have dedicated syntax.
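As a point of reference, Base.Fix2 is an existing partial-application functor, so the difference can already be seen today (a small illustration, not tied to any particular syntax proposal):

add_one_functor = Base.Fix2(+, 1)   # a callable struct of type Base.Fix2{typeof(+), Int}
add_one_lambda  = x -> x + 1        # every lambda literal gets its own anonymous type to compile
@assert add_one_functor(41) == add_one_lambda(41) == 42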

I think the piping and the autocomplete are really orthogonal, though some might prefer certain syntax as a UI to autocomplete.

Imagine something like VS Code, where you could go to the Julia workspaces panel, right-click a variable, and ask it to "explore methods for this variable"; maybe it pops up a new panel with methods, and you can filter them by characters typed into a search box, as well as select which modules to search, etc.

4 Likes

Cool idea!

I like that we’ve moved the conversation from “autocomplete is infeasible” to “this is how I would do it.” :wink:

In the same way that . dot-syntax is syntax sugar that motivates tab-complete for property names, a proper chaining syntax motivates tab-complete for method names. Hopefully this should make idiomatic Julian style easier and more accessible.

I don’t think people claimed “autocomplete is infeasible”, but we did give reasons for why the syntax proposal doesn’t really help the core problem and that the core problem can be solved without new syntax (as you’ve shown, by implementing a rudimentary autocomplete yourself).

I'm also pretty sure that LanguageServer.jl already has that feature, provided it knows the type of the object. That requirement hasn't changed: your code literally has typeof(obj) in it, which is not necessarily something an editor can do, because the editor would potentially have to run your code to figure that out (which you surely don't want), or run type inference on all of your code, which runs into the problem that partial inference is not implemented.

Not every piece of code will be executed in the REPL, where the object already exists (possibly…).

1 Like

My point was that the actual core problem is one of motivation: people need a reason to work on an autocomplete, hopefully beyond it being just an academic pursuit, and syntax sugar provides that reason. In the same way, somebody was motivated to write an autocomplete for . but not for getproperty, despite the former being just sugar for the latter, in the confidence that the autocomplete on . would find heavy use and the effort would not go to waste.

With sufficient motivation, somebody smarter than me will figure this out I’m sure. In the meantime, I’d be pretty happy to have autocomplete in REPL mode.

2 Likes

I don’t think it has yet been mentioned that many of the pipes in this thread can be implemented without a macro by using the Transducers.jl package. For example:

julia> using Transducers

julia> [4, 9, 16] |> Map(sqrt) |> Filter(iseven) |> collect
2-element Vector{Float64}:
 2.0
 4.0

In fact, I suspect that the Clojure threading macros (-> and ->>) that were mentioned in the original post were largely superseded by transducers when transducers were introduced to Clojure. (That's just a guess based on my limited knowledge of transducers in Clojure. I'm not a Clojure programmer.)

I need to study Clojure transducers and Transducers.jl more, but my current understanding is that the technically correct way to combine transducers is via function composition. So the piping syntax above is a bit of a pun. Clojure uses comp for this. For some reason, Transducers.jl uses "opposite compose" rather than regular composition. A Unicode operator for "opposite compose" is provided, so the above example can be written like this:

julia> t = Map(sqrt) ⨟ Filter(iseven);

julia> collect(t, [4, 9, 16])
2-element Vector{Float64}:
 2.0
 4.0
3 Likes

I really need to take a closer look at transducers. I used them once to filter a lot of data read from big CSV files, but I didn't really understand how they were different from iteration.

3 Likes

Can’t wait for the inevitable Transducers.jl x Diffractor.jl crossover episode

3 Likes