Fixing the Piping/Chaining Issue

Counterproposal, rewriting yours:

Example Possible Autocomplete Behavior

Let’s create an object df = DataFrame(a=1, b=2). When I type

df |> 

I should see a (very very long) list of methods appear: those which specialize on typeof(df), followed by methods which specialize on supertype(typeof(df)), followed by supertype(supertype(typeof(df))), etc., sorted by degree of specialization. The list is about two thousand entries long, something like this:

df |>
  append!(DataFrame,
  copy(DataFrame;

  ⋮ other methods of `DataFrame`
  ⋮ (vdots here to shorten my explanation)

  Array(AbstractDataFrame)
  ==(AbstractDataFrame,
  (Matrix)(AbstractDataFrame

  ⋮ other methods of `AbstractDataFrame`

  ArgumentError(msg)
  AssertionError(msg)
  BoundsError(a)

  ⋮ other methods of `Any`

And then I can scroll down the list to find what I’m looking for. One neuron fires in my brain and I remember that the first character is a p. So I type p and I see:

df |> p
  pop!(DataFrame)
  popat!(DataFrame,
  popfirst!(DataFrame)
  prepend!(DataFrame,
  push!(DataFrame,
  pushfirst!(DataFrame,

I think that at the first level we should perhaps merge two levels, typeof(df) and supertype(typeof(df)), disregarding the distinction between them (unless the supertype is Any?) and ordering them alphabetically as one group.
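The ordering described above just walks the supertype chain, which is cheap to compute. A minimal sketch (`type_chain` is a hypothetical helper name, not part of any proposal):

```julia
# Collect the chain typeof(x), supertype(typeof(x)), ..., Any that would
# drive the grouping of suggestions described above.
function type_chain(T::Type)
    chain = Type[T]
    while last(chain) !== Any
        push!(chain, supertype(last(chain)))
    end
    return chain
end

type_chain(Int)  # [Int64, Signed, Integer, Real, Number, Any]
```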

2 Likes

Okay, so since working through that example I basically did all the work needed to figure out how to do it anyway, I threw together a quick and simple script that does what I described for a simple autocomplete (you knew I would, didn’t you :triumph:)

Code here

function propose_method_autocompletions(obj, func_name_fragment::String=""; 
    type_depth = 1:typemax(Int), only_same_module = false, 
    prioritize_firstlast = false, github_inference = false, personalized_inference = false)::Vector{Method}

    @assert first(type_depth) ≥ 1 && last(type_depth) ≥ 1
    recs = Method[]
    get_type_at_depth(type, depth=1) = depth == 1 ? type : type == Any ? nothing : get_type_at_depth(supertype(type), depth-1)

    for i ∈ type_depth
        stype = get_type_at_depth(typeof(obj), i)
        isnothing(stype) && break

        meths = filter(only_same_module ? methodswith(stype, parentmodule(typeof(obj))) : methodswith(stype)) do m
            startswith(string(m.name), func_name_fragment)
        end

        prioritize_firstlast || true # do cool sorting stuff, add later
        github_inference || true # do cool sorting stuff, add later
        personalized_inference || true # do cool sorting stuff, add later

        recs = [recs; meths]
    end

    recs
end

To invoke:

using DataFrames
df = DataFrame(a=1, b=2)

propose_method_autocompletions(df)

By default, it returns all available methods that can act on object df. As you start to type in a function name, the list gets narrowed down quickly:

propose_method_autocompletions(df, "p")
propose_method_autocompletions(df, "pu")
propose_method_autocompletions(df, "pushfirst!")

You can also change the search depth. By default, it searches specializations on typeof(df), as well as supertype(typeof(df)), supertype(supertype(typeof(df))), and so on. This can be changed like so:

propose_method_autocompletions(df, "p"; type_depth=1:2)

This gives methods of DataFrame as well as AbstractDataFrame, but not Any. The iterator can also be reversed 2:-1:1 to show the abstract type’s methods first.

You can also restrict the search to only methods that are defined in the same module as DataFrame:

propose_method_autocompletions(df, "p"; only_same_module = true)

And, one day, it’ll work with the prioritize_firstlast, github_inference and personalized_inference options too. :wink:

Someday it could be interesting to make it feed its results into a Channel, so that it can offer results with greater immediacy.

Unfortunately it doesn’t solve *my* problem, because the module didn’t export its methods so they don’t appear in methodswith :sweat_smile:

Oddly, when calling methodswith(DataFrame, DataFrames) (trying this to see if it improves speed over filtering *after* the methodswith call), it simply doesn’t return a bunch of methods of DataFrame that are indeed declared by the DataFrames module. For example, pushfirst! is missing. Strange.

3 Likes

Yes I see. It’s very funny that we’ve done the opposite things here in terms of precedence, but with the end result being almost the same from the user’s perspective :laughing:

I think /> having high precedence would be surprising to anyone who’s trying to use it for piping. For example, what does this mean?

x /> Mod1.g[1].h(y)(z)
  • Low precedence: Mod1.g[1].h(y)(x, z)
  • High precedence: Mod1.g[1].h(x,y)(z) … presumably?

Couple of thoughts here:

  • Is there a flexible lowering which might allow pipelines of `/>` and `\>` to be optimized more completely? For example, turning x \> map(_^2) \> reduce(+) into mapreduce(_^2, +, x). In particular I’m thinking of how transducers are a lot better for compilation than iterators, as described here Comparison to iterators · Transducers.jl, and that we could make use of the iterator-transducer duality if we could give the compiler the right insight. @tkf :slight_smile:
  • What about first class pipelines without data such as \> map(isodd) \> filter(_>10)? Well ok I guess these can be lowered to a lambda x -> filter(y->y>10, map(isodd, x))
2 Likes

To answer my own question, yes! The fixbutfirst/fixbutlast lowering already allows this. Adding the following definition to chain will turn a map \> reduce pipeline into a call to mapreduce:

function chain(x, f1::FixButLast{typeof(map)}, f2::FixButLast{typeof(reduce)}, fs...)
    chain(x, fixbutlast(mapreduce, f1.args..., f2.args...; f1.kwargs..., f2.kwargs...), fs...)
end

Presumably this is not the best such lowering for such transformations, as this is a fairly special case. But it does give a tantalizing taste of possibility.
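To make the idea concrete outside the proposal, here is a self-contained sketch with stand-in definitions (`FixLast`, `fixlast`, and this `chain` are simplified stand-ins for the proposal’s `FixButLast`/`fixbutlast`/`chain`, ignoring keyword arguments):

```julia
# Stand-in for the proposal's FixButLast: fix all arguments except the last,
# which is supplied by the pipeline.
struct FixLast{F,A}
    f::F
    args::A
end
fixlast(f, args...) = FixLast(f, args)
(p::FixLast)(x) = p.f(p.args..., x)

# Naive chaining: thread x through each stage in turn.
chain(x) = x
chain(x, f, fs...) = chain(f(x), fs...)

# Fusion rule: an adjacent map \> reduce pair collapses into one mapreduce.
chain(x, m::FixLast{typeof(map)}, r::FixLast{typeof(reduce)}, fs...) =
    chain(x, fixlast(mapreduce, m.args..., r.args...), fs...)

chain(1:3, fixlast(map, x -> x^2), fixlast(reduce, +))  # 14, via one mapreduce call
```

The specialized `chain` method is more specific than the generic one, so Julia’s dispatch applies the fusion whenever the two stages are adjacent.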

2 Likes

Great, thanks for the implementation!

A nice thing is that such an autocomplete can be added without any changes to Julia syntax: just expand it to obj |> it->func(<cursor here>, it) instead of obj |> func(<cursor here>, _). So it can already be implemented in a package using e.g. ReplMaker.

I tried your propose_method_autocompletions, and some kind of sorting would definitely help.
For a simple table, tbl = [(a=1,)], the first suggestions without typing the method name are:

julia> propose_method_autocompletions(tbl)
[1] ArgumentError(msg) in Core at boot.jl:325
[2] AssertionError(msg) in Core at boot.jl:341
[3] BoundsError(a) in Core at boot.jl:277
[4] BoundsError(a, i) in Core at boot.jl:278
[5] ConcurrencyViolationError(msg) in Core at boot.jl:290
[6] DomainError(val) in Core at boot.jl:296
[7] DomainError(val, msg) in Core at boot.jl:297
[8] ErrorException(msg) in Core at boot.jl:267
[9] InexactError(f::Symbol, T, val) in Core at boot.jl:318
[10] InitError(mod::Symbol, error) in Core at boot.jl:354
[11] InitError(mod, error) in Core at boot.jl:354
[12] LineNumberNode(l::Int64, f) in Core at boot.jl:407
[13] LoadError(file::AbstractString, line::Int64, error) in Core at boot.jl:348
[14] LoadError(file, line, error) in Core at boot.jl:348
[15] MethodError(f, args) in Core at boot.jl:338
[16] MethodError(f, args, world::UInt64) in Core at boot.jl:335
[17] (::Type{T})(itr) where T<:Tuple in Base at tuple.jl:317
[18] NamedTuple(itr) in Base at namedtuple.jl:123
[19] OverflowError(msg) in Core at boot.jl:321
[20] Pair(a, b) in Core at boot.jl:825
...

None of these methods are commonly used on tables/collections.
Typing propose_method_autocompletions(tbl, "push") does find the push! function of course, but still doesn’t show the expected push!(tbl, ...) method among the first suggestions. The first is push!(s::Set, x), so it wants to put tbl as the 2nd argument.

Autocomplete based on the ideas from this topic can already be implemented in a package, and would be great if it turns out actually useful! But for now I’m skeptical, given a huge amount of methods specifying few to no argument types.

3 Likes

Oops! It appears that calling methodswith on a parameterized type doesn’t give the methods of the unparameterized type! My autocomplete was failing to find any methods of Vector because it was being parameterized with the type of its contents.

See this example.

julia> struct MyThing{T} end

julia> foo(::MyThing) = 1
foo (generic function with 1 method)

julia> methodswith(MyThing)
[1] foo(::MyThing) in Main at REPL[157]:1

julia> methodswith(MyThing{1})


julia> bar(::MyThing{1}) = 2
bar (generic function with 1 method)

julia> methodswith(MyThing)
[1] foo(::MyThing) in Main at REPL[157]:1

julia> methodswith(MyThing{1})
[1] bar(::MyThing{1}) in Main at REPL[160]:1

As a result, because an object such as [(a=1,)] is of type Vector{NamedTuple{(:a,), Tuple{Int64}}}, none of the methods of Vector appear.

I’ve rewritten it so it finds methods of the parameterized type and appends methods of the unparameterized type, with special-case handling for vectors and matrices. I think it’ll have poor behavior in some particular cases, but for a quick hack it’s not too bad. I might do something more sophisticated later.

Here's the code now.

function propose_method_autocompletions(obj, func_name_fragment::String=""; 
    type_depth = 1:typemax(Int), only_same_module = false, only_imported_modules = false,
    arg_order_inference = false, github_inference = false, personalized_inference = false)::Vector{Method}

    @assert first(type_depth) ≥ 1 && last(type_depth) ≥ 1
    recs = Method[]
    get_type_at_depth(type, depth=1) = depth ≤ 1 ? type : type == Any ? nothing : get_type_at_depth(supertype(type), depth-1)

    for i ∈ type_depth
        stype = get_type_at_depth(typeof(obj), i)
        isnothing(stype) && break

        meths = methodswith(stype)
        if !(stype isa UnionAll) && length(stype.parameters) > 0
            if stype <: AbstractVector # special case handling for vectors
                meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)){T,1} where T)]
            elseif stype <: AbstractMatrix
                meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)){T,2} where T)]
            end
            meths = [meths; methodswith(getfield(parentmodule(stype), nameof(stype)))]
        end

        meths = filter(meths) do m
            startswith(string(m.name), func_name_fragment) || return false
            only_same_module && (parentmodule(typeof(obj)) == m.module || return false)
            # How to detect only modules that have been explicitly imported?
            true
        end

        arg_order_inference || true # do cool sorting stuff, add later
        github_inference || true # do cool sorting stuff, add later
        personalized_inference || true # do cool sorting stuff, add later

        recs = [recs; meths]
    end

    recs
end

Calling it on the object [(a=1,)]:


julia> propose_method_autocompletions([(a=1,)])
[1] CapturedException(ex, bt_raw::Vector) in Base at task.jl:12
[2] \(A::LinearAlgebra.Transpose{<:Complex, <:LinearAlgebra.Hermitian{<:Complex, <:SparseArrays.AbstractSparseMatrixCSC}}, B::Vector) in SparseArrays at C:\Users\unime\AppData\Local\Programs\Julia-1.8.0\share\julia\stdlib\v1.8\SparseArrays\src\linalg.jl:872
[3] \(L::SuiteSparse.CHOLMOD.FactorComponent, b::Vector) in SuiteSparse.CHOLMOD at C:\Users\unime\AppData\Local\Programs\Julia-1.8.0\share\julia\stdlib\v1.8\SuiteSparse\src\cholmod.jl:1521
[4] append!(a::Vector, items::AbstractVector) in Base at array.jl:1105
[5] deleteat!(a::Vector, i::Integer) in Base at array.jl:1485
[6] deleteat!(a::Vector, r::AbstractUnitRange{<:Integer}) in Base at array.jl:1491
[7] deleteat!(a::Vector, inds::AbstractVector{Bool}) in Base at array.jl:1590
[8] deleteat!(a::Vector, inds::AbstractVector) in Base at array.jl:1533
[9] deleteat!(a::Vector, inds) in Base at array.jl:1532
[10] empty!(a::Vector) in Base at array.jl:1738

⋮

[3209] groupview(f, X; restype, kwargs...) in FlexiGroups at C:\Users\unime\.julia\packages\FlexiGroups\1ItB2\src\base.jl:41

Beyond that, truly generic methods that don’t specialize on anything, whose behavior generalizes across any object, shouldn’t float to the top of such a simple autocomplete anyway; if a method is general enough not to specialize on any type, it’s probably general enough to have been memorized.

It’s possible to make them float to the top anyway, using a sorting technique based on statistical inference of a model fitted to a codebase (imagine if you could train your autocomplete to your own codebase!), but I’m not going to make that at the moment.

Without such inference built in, methods need to be specialized on types such as AbstractArray or AbstractDict to gain visibility.

3 Likes

This would be super cool. IMHO compiler tooling should do a lot more of this kind of thing. I want some kind of data-driven approach to the JuliaSyntax.jl diagnostics system. What exactly, I’m not sure yet :slight_smile: I wish for some happy medium between traditional compilers (where all the diagnostic messages are hand-coded at huge engineering effort and are still hard to interpret), vs hugely bloated natural language models which seem hard to train and deploy, but can perform amazing feats of inferring what the user intended.

2 Likes

I think this is a relatively straightforward problem, conceptually anyway. One would wish to collect the frequencies in the codebase with which Tᵢ typed objects are fed to mⱼ methods. Then, when autocompleting methods for a Tᵢ object, simply sort the methods by order of call frequency.

I think this would probably be an optional mode: for example, one mode of autocomplete might be to sort based on type specialization; another mode might be to sort by frequency of use in your personal codebase; another mode might be to sort by frequency of use in all GitHub repos.

I think it’s probably decent to use the GitHub repo method use frequencies as a starting point, and then do some sort of weighted average with the frequencies they’re used in personal use. That way you don’t overfit to your own codebase.

It’s also possible to find the frequencies associated with certain packages being loaded (e.g., somebody who has loaded LinearAlgebra is more likely to call qr), and do a weighted average of the frequencies associated with the packages you’ve loaded in your project to get an autocomplete that’s more likely to be perfect for your specific use case. The possibilities are endless.
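A minimal sketch of the frequency-sorting core, assuming the counts have already been mined from some corpus (the `freqs`/`blend` names and the count tables are made up for illustration):

```julia
# Sort candidate methods by how often their function names appear in a
# (hypothetical, pre-mined) call-frequency table; unseen names sink to the end.
sort_by_frequency(meths::Vector{Method}, freqs::Dict{Symbol,Int}) =
    sort(meths; by = m -> get(freqs, m.name, 0), rev = true)

# Blending corpora (e.g. GitHub-wide counts with personal-codebase counts)
# is then just a weighted sum of the count tables.
blend(a::Dict{Symbol,Int}, b::Dict{Symbol,Int}; w = 0.5) =
    Dict(k => w * get(a, k, 0) + (1 - w) * get(b, k, 0)
         for k in union(keys(a), keys(b)))
```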

Indeed, although as we’ve shown here, compile times for lambdas can get prohibitive when they’re used heavily, sometimes two orders of magnitude greater than the alternative. Much better to have dedicated syntax to make a partial-application functor.

It’s also simply cleaner to have dedicated syntax.

I think the piping and the autocomplete are really orthogonal, though some might prefer certain syntax as a UI to autocomplete.

Imagine something like vscode if you could go to the Julia workspaces panel and just right click a variable and ask it to “explore methods for this variable” maybe it pops up a new panel with methods and you can filter them by characters you type into a search box as well as select which modules you want to search etc etc.

4 Likes

Cool idea!

I like that we’ve moved the conversation from “autocomplete is infeasible” to “this is how I would do it.” :wink:

In the same way that . dot-syntax is syntax sugar that motivates tab-complete for property names, a proper chaining syntax motivates tab-complete for method names. Hopefully this should make idiomatic Julian style easier and more accessible.

I don’t think people claimed “autocomplete is infeasible”, but we did give reasons for why the syntax proposal doesn’t really help the core problem and that the core problem can be solved without new syntax (as you’ve shown, by implementing a rudimentary autocomplete yourself).

I’m also pretty sure that LanguageServer.jl already has that feature, provided it knows the type of the object. That requirement hasn’t changed: your code literally has typeof(obj) in it. That’s not necessarily something an editor can do, because it would potentially have to run your code to figure out the type, which you surely don’t want (or run type inference on all of your code, which runs into the problem of “partial inference is not implemented”).

Not every piece of code will be executed in the REPL, where the object already exists (possibly…).

1 Like

My point was that the actual core problem was one of motivation: people need a reason to work on an autocomplete, hopefully beyond just academic pursuit, and syntax sugar provides that reason. In the same way, somebody was motivated to write an autocomplete for . but not getproperty, despite the former being just sugar for the latter, in the confidence that autocomplete on . would find heavy use and the effort would not go to waste.

With sufficient motivation, somebody smarter than me will figure this out I’m sure. In the meantime, I’d be pretty happy to have autocomplete in REPL mode.

2 Likes

I don’t think it has yet been mentioned that many of the pipes in this thread can be implemented without a macro by using the Transducers.jl package. For example:

julia> using Transducers

julia> [4, 9, 16] |> Map(sqrt) |> Filter(iseven) |> collect
2-element Vector{Float64}:
 2.0
 4.0

In fact, I suspect that the Clojure threading macros (-> and ->>) that were mentioned in the original post were largely superseded by transducers when transducers were introduced to Clojure. (That’s just a guess based on my limited knowledge of transducers in Clojure. I’m not a Clojure programmer.)

I need to study Clojure transducers and Transducers.jl more, but my current understanding is that the technically correct way to combine transducers is via function composition, so the piping syntax above is a bit of a pun. Clojure uses comp for this. For some reason, Transducers.jl uses “opposite compose” rather than regular composition. A Unicode operator for “opposite compose” is provided, so the above example can be written like this:

julia> t = Map(sqrt) ⨟ Filter(iseven);

julia> collect(t, [4, 9, 16])
2-element Vector{Float64}:
 2.0
 4.0
3 Likes

I really need to take a closer look at transducers. I used them once to filter a lot of data read from big CSV files, but I didn’t really understand how they were different from iteration.

3 Likes

Can’t wait for the inevitable Transducers.jl x Diffractor.jl crossover episode

3 Likes

Transducers seem to create some functor structs, and then composition is by opcompose, which relies on Julia to do smart inlining? Because of the way they are composed, it winds up producing some extremely efficient code. Is that a correct understanding?

Nope, it’s actually not very reliant on a ‘sufficiently smart compiler’; it’s instead reliant on defining data traversal in terms of folds rather than iteration.

So for instance, if you write

f(x) = 2x
foldl(+, Iterators.map(f, Iterators.filter(iseven, input)))

with iterators, this becomes something like

function map_filter_iterators(xs, init)
    ret = iterate(xs)
    ret === nothing && return init
    acc = init
    @goto filter
    local state, x
    while true
        while true                                    # input
            ret = iterate(xs, state)                  #
            ret === nothing && return acc             #
            @label filter                             #
            x, state = ret                            #
            iseven(x) && break             # filter   :
        end                                #          :
        y = 2x              # imap         :          :
        acc += y    # +     :              :          :
    end             # :     :              :          :
    #                 + <-- imap <-------- filter <-- input
end

whereas the transducers equivalent

foldl(+, Filter(iseven) |> Map(f), input, init=0)

becomes

function map_filter_transducers(xs, init)
    acc = init
    #              input -> Filter --> Map --> +
    for x in xs  # input    :          :       :
        if iseven(x)  #     Filter     :       :
            y = 2x    #                Map     :
            acc += y  #                        +
        end
    end
    return acc
end

This doesn’t rely on smart inlining or the compiler being clever at all to get that algorithmic shape.

2 Likes

I think it’s a little different from that. A transducer is supposed to be a function from a reduction function to a reduction function. In other words, it should have a signature like this:

((acc, item) -> acc) -> ((acc, item) -> acc)
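That signature can be demonstrated with hand-rolled transducers in a few lines of plain Julia (no library needed; `mapping`/`filtering` are ad-hoc names for this sketch):

```julia
# A transducer is a function from a reducing function to a reducing function.
mapping(f)      = rf -> (acc, x) -> rf(acc, f(x))
filtering(pred) = rf -> (acc, x) -> pred(x) ? rf(acc, x) : acc

# Ordinary ∘ composes them; the data flows through the *left* transducer
# first, which is why composition order looks reversed relative to piping.
xf = filtering(iseven) ∘ mapping(x -> 2x)

foldl(xf(+), [1, 2, 3, 4]; init = 0)  # 12 (keeps 2 and 4, doubles, sums)
```

No intermediate collections are created at any point: each element passes through the filter and map steps inside a single fold.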

I was working on an example that I was hoping would demonstrate the advantage of transducers, but unfortunately Transducers.jl seems to perform quite poorly for this example. Here it is anyways:

If we compose a mapper and a filterer, the composed function will create an intermediate vector. However, I think in theory if you compose Map and Filter, there shouldn’t be an intermediate vector. Here’s the Base Julia composition approach:

mapper = xs -> map(x -> 2x, xs)
filterer = xs -> filter(>(100), xs)
f = filterer ∘ mapper

And here’s the Transducers.jl approach:

t = Map(x -> 2x) ⨟ Filter(>(100))

And here’s the benchmark where, unfortunately, Transducers.jl does quite poorly:

julia> @btime f(1:1000);
  1.124 μs (2 allocations: 15.88 KiB)

julia> @btime collect($t, 1:1000);
  10.261 μs (957 allocations: 51.61 KiB)

I could imagine Transducers.jl having more than 2 allocations if it is pushing the result into an initial empty vector, but 957 allocations is too many.

1 Like

The problem here is collect. Generally, Transducers.jl performs very poorly with collect. What it really excels at is doing a reduction over the process. So compare these:

julia> xs = collect(1:100);

julia> @btime sum(Iterators.filter(>(100), Iterators.map(x -> 2x, $xs)))
  75.283 ns (0 allocations: 0 bytes)
7550

julia> @btime sum(filter(>(100), map(x -> 2x, $xs)))
  110.717 ns (2 allocations: 1.75 KiB)
7550

against

julia> using Transducers

julia> @btime sum($xs |> Map(x -> 2x) |> Filter(>(100)))
  33.505 ns (0 allocations: 0 bytes)
7550

collect could be improved with transducers, but doing so is pretty hard because Base.collect on iterators is pretty optimized.

4 Likes