Seeking feedback on Chainables before registering it

Hi all

I think Chain.jl is a phenomenal package.

I’ve submitted a package called Chainables.jl to the General registry, and a registry maintainer suggested I bring up the idea here to get feedback and thoughts.

The goal of the package is to make it easier to keep @chain pipelines going and improve readability in cases where functions expect function/lambda arguments first.

Chainables provides macro versions of common functions with reversed argument order, so the data can appear first in the pipeline. For example:

map(f, x)  →  @map(x, f)

This allows functions to be used more naturally inside @chain blocks, and there are utilities to make other functions more @chain-able as well.
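To make the idea concrete, here is a minimal sketch of how a data-first macro could be written. The name @datafirst_map is hypothetical, and this is not how Chainables.jl implements @map; it only illustrates the argument swap.

```julia
# Hypothetical illustration only: not the Chainables.jl implementation.
# Takes the data first and the function second, then calls map(f, data).
macro datafirst_map(data, f)
    return :(map($(esc(f)), $(esc(data))))
end

squares = @datafirst_map(1:3, x -> x^2)
@assert squares == [1, 4, 9]
```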

Repo:

Example

Current way

@chain 1:10 begin
   zip(21:30)
   collect
   filter(t -> t[2] <= 25, _)
   map(t -> t[1], _)
end

Chainables way

@chain 1:10 begin
    zip(21:30)
    collect
    @filter @unpack (a, b) -> b <= 25
    @map @unpack (a, b) -> a
end

# can use Iterators.filter instead
@chain 1:10 begin
    zip(21:30)
    @filteriter @unpack (a, b) -> b <= 25
    @map @unpack (a, b) -> a
end
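For reference, the lazy variant corresponds roughly to the following plain Julia, with no macros involved. Iterators.filter returns a lazy iterator, so the intermediate collect is not needed. This is a hand-written equivalent, not the code the macros generate.

```julia
zipped = zip(1:10, 21:30)                          # lazy pairs (1, 21), (2, 22), ...
kept = Iterators.filter(t -> t[2] <= 25, zipped)   # still lazy, nothing is materialised
result = map(t -> t[1], kept)                      # map collects into a Vector
@assert result == [1, 2, 3, 4, 5]
```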

I’m mainly interested in hearing:

  • whether people are interested in this kind of utility
  • whether similar functionality already exists elsewhere in the ecosystem

One design choice worth mentioning is that the macros intentionally use the same function names, e.g. map becomes @map.

Keen to hear your thoughts!


Hi @TyronCameron , I’m the maintainer of FunctionChains.jl. I wonder if there might be some synergy potential here, like (optionally) generating FunctionChain instances from chains with a compatible structure?

Hi @oschulz, thanks for reaching out. I haven’t dabbled much with FunctionChains.jl but I strongly think that function composition is the most important part of coding.

Synergy is possible. fchain is of course already compatible with @chain in some sense because one can write

@chain foo fchain(bar) # foo first, then bar

If one wanted a similar style of pleasant code, one could do

# In the package
#---

using FunctionChains: fchain 

macro fchain(expr...)
    function_list = Expr(:tuple, reduce(vcat, _flatten_functions.(expr))...)
    quote
        $fchain($(function_list)...)
    end |> esc 
end 

function _flatten_functions(expr)
    expr isa Expr && expr.head ∈ (:tuple, :block) && 
        return reduce(vcat, _flatten_functions.(expr.args))
    (expr isa Symbol || expr isa Expr) &&
        return [expr]
    return []
end

# Usage
#---

foo(x) = x^2 
bar(x) = 3x

foo_then_bar = @fchain begin
    foo 
    bar 
end

another_foo_then_bar = @fchain foo bar 

@assert fchain(foo, bar)(4) == foo_then_bar(4) == another_foo_then_bar(4) == 48 

(This could be neatened up and made more robust, but the idea stands for now.)

In general, I’m supportive of the concept of adding something like that.

I think in Chainables.jl as it stands, I’ve focused on manipulating the inputs to functions – putting them into different argument positions, allowing function arguments to be written in do-block syntax, and packing and unpacking args from tuples / zips.

The reverse idea has been mostly untouched, which is how to combine functions (fchain), pack and unpack functions (fcprod, I think?), and I like the with_intermediate_results touch!

I probably need to learn a bit about FunctionChains.jl itself (and consequently understand if there’s benefit in synergy).

What do you think?

Totally share the overall goal of making data manipulation more convenient in Julia :slight_smile: But creating a macro for each function one may want to use doesn’t really scale… Even for Iterators.filter you had to create another name that one needs to remember, in addition to the underlying function.

There’s already DataPipes.jl (disclaimer: I’m the author) that’s stable and has been around for ~5 yrs.
DataPipes is specifically designed to be a lightweight code transformation that makes all common data manipulation functions in Julia basically boilerplate-free. Not just Base.xxx, or Iterators.xxx, but all functions that follow the Julian argument order:

With DataPipes.jl:

@p let
   1:10
   zip(21:30)
   collect
   filter(_[1] <= 25)
   map(_[2])
end

Same with Iterators.filter, of course:

@p let
   1:10
   zip(21:30)
   Iterators.filter(_[1] <= 25)
   map(_[2])
end

The transformation is purely syntactic – basically, just two operations, pass the previous step result and transform _ into lambda. No special handling of any functions ever!
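Written out by hand, the two operations amount to something like the following. This is an approximation of the idea rather than the macro's literal output, and it assumes DataPipes' default of passing the previous result as the last argument (which is why the tuple indices differ from the Chain.jl version). The step names are made up for illustration.

```julia
# Hand-written approximation; the variable names are illustrative.
step1 = 1:10
step2 = zip(21:30, step1)                  # previous result appended as the last argument
step3 = collect(step2)
step4 = filter(x -> x[1] <= 25, step3)     # `_[1] <= 25` becomes a lambda
step5 = map(x -> x[2], step4)              # `_[2]` becomes a lambda
@assert step5 == [1, 2, 3, 4, 5]
```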


Ah, maybe I misunderstood a bit - I thought @chain can also build function objects that still need to be given an input, and wondered if those might be representable as FunctionChain instances. But @chain produces values, not functions, correct?

Thank you for this announcement! What I was a little bit concerned about when I saw the package initially is a general question that I’ve now formulated in

On a slightly closer view at your README, I don’t think you’re actually modifying the behavior of macros defined by other packages. Is that correct? If so, I have no concerns about the package in principle. Of course,

is still a great question to discuss here, and I’ll leave it to you and everyone else in this thread to figure that out. At the end of the day, I’d follow your best judgement as to whether Chainables has a place as an independent package in its current or any modified form; so when this discussion has run its course and you’d like me to unblock the registration PR, just tag me there!


:100:

That does not require a macro, though. For example, to reverse the argument order of a function f, simply replace f by splat(f) ∘ reverse ∘ tuple.
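A small sketch of that composition, for illustration. The helper name flipargs is hypothetical. Note that it reverses all arguments, which coincides with "data first" only for two-argument calls.

```julia
# `splat` is exported from Base in Julia 1.9 and later.
# tuple packs the arguments, reverse flips them, splat(f) spreads them into f.
flipargs(f) = splat(f) ∘ reverse ∘ tuple   # hypothetical helper name

data_first_map = flipargs(map)
@assert data_first_map(1:3, x -> x^2) == [1, 4, 9]
```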

Then why use macros instead of function composition?

Some criticism of the Chainables.jl code

There are more issues I did not bring up; I lack the time right now.

pack is just tuple

In Chainables.jl, a function pack is defined like so:

pack(args...) = args

However you might as well have defined it as:

const pack = tuple

Or, even better, use tuple and forget about pack.

::Function constraints are not desirable

In some places in Chainables.jl I see the ::Function constraint placed on a method argument type, such as here:

Do not do that, it is an unnecessary and arbitrary limitation. Function is not special in Julia, any type may be callable, as long as someone defines a method for it.
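A short illustration of the point, using made-up names (Scale, apply_strict, apply_any) that are not from Chainables.jl: a callable struct is not a Function, so a ::Function constraint rejects it even though calling it works fine.

```julia
# Illustrative names only; none of these come from Chainables.jl.
struct Scale
    k::Float64
end
(s::Scale)(x) = s.k * x            # make Scale instances callable

apply_strict(f::Function, x) = f(x)
apply_any(f, x) = f(x)

@assert !(Scale(2.0) isa Function)
@assert apply_any(Scale(2.0), 3) == 6.0
@assert !applicable(apply_strict, Scale(2.0), 3)   # would throw a MethodError if called
```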

Abstractly typed fields are not what you want

Do not do this:

In most cases you want the field types to be concrete.
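As a generic sketch (the struct names Loose and Tight are invented here and are not the Chainables.jl types), a type parameter keeps the field concrete while still accepting any callable:

```julia
# Invented example types, not taken from Chainables.jl.
struct Loose
    f::Function        # abstract field type: the compiler cannot specialise on it
end

struct Tight{F}
    f::F               # concrete for every instance, and not limited to Function
end

@assert !isconcretetype(fieldtype(Loose, :f))
@assert isconcretetype(fieldtype(typeof(Tight(sin)), :f))
```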

This is covered in the Performance Tips in the Manual:

Partial is mostly just Base.Fix1

Probably you want to use Base.Fix1 instead of Partial, or at least implement partial/Partial by relying on Base.Fix.
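For readers who have not used them, a quick sketch of Base.Fix1 and Base.Fix2 (the generalised Base.Fix{N} requires Julia 1.12 or later):

```julia
add3 = Base.Fix1(+, 3)            # behaves like x -> 3 + x
@assert add3(4) == 7

at_most_25 = Base.Fix2(<=, 25)    # behaves like x -> x <= 25
@assert filter(at_most_25, [21, 26, 25]) == [21, 25]

# Unlike a closure, the fixed function and value are inspectable fields.
@assert add3.f === (+) && add3.x == 3
```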

Closure capture boxing and inference failure

The above code surely has bad performance and I think it can not have good inference. state gets captured by the closure and mutated, so it has to be boxed, and I suppose (did not check) it is also inferred as Any (worst-case inference, Any is the top type, the supertype of all types).
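A generic illustration of the boxing issue, not the Chainables.jl code under discussion: a local variable that is reassigned inside a closure gets stored in a Core.Box, whereas mutating the contents of a Ref keeps the captured field concretely typed. The function names are invented for this sketch.

```julia
# Generic illustration; not the Chainables.jl code under discussion.
function counter_boxed()
    state = 0
    inc() = (state += 1)      # `state` is reassigned inside the closure, so it is boxed
    return inc
end

function counter_ref()
    state = Ref(0)            # the binding is never reassigned, only its contents
    inc() = (state[] += 1)
    return inc
end

boxed, typed = counter_boxed(), counter_ref()
@assert boxed() == 1 && typed() == 1
@assert fieldtype(typeof(boxed), :state) === Core.Box
@assert fieldtype(typeof(typed), :state) === Base.RefValue{Int}
```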


Hey @aplavin , had a quick look and it looks like a really well-designed package.

Personally I’ve always liked Chain.jl because when it’s used the intent is very explicit – it documents itself (i.e. you immediately know what @chain means and where it comes from). DataFramesMeta.jl is also a chef’s kiss in that regard – it uses Chain.jl and I find myself using it a lot (exploiting the symbols notation for column headers, and so on).

To clarify, I think that DataPipes.jl is a great alternative to Chain.jl (both use underscore syntax & lambdas to pass values through to the next function).

Chainables.jl, on the other hand, is a connector pack for Chain. It’s about extending other functions to be more compatible / readable. Compatibility is provided through @rev (which solves the same problem everywhere), and the macros generated with @chainable are just nice shorthand for the common functions. Of course, I’ve exported the @chainable macro.

I know it’s really a small thing, but Chainables tries to avoid underscores (which by their nature signal information which is to be put out of mind). [Please forgive me if there is a trick I am not aware of!]

In DataPipes:

using DataPipes

@p "Hello world!" begin
    replace(__, " world!" => "")
end

This is the same thing in Chain.

using Chain

@chain 1:10 begin
    map(x -> x^2, _)
end

With the Chainables layer on top of Chain, the core logic is isolated, which I think aids readability.

using Chainables

@chain 1:10 begin
    @map x -> x^2
end

I’m not a seasoned package dev, but I really love Julia, so keen to get your further thoughts on this. :slight_smile:

Hi @oschulz that’s correct. @chain (which is from Chain.jl which I was not involved with) just passes information from one function to the next in a nice readable way (allowing you to choose which argument you pass the next value into). I just really like Chain.jl and want more things to look more readable when using it. :slight_smile:

Thanks for sharing! I think the most important part is function composition in the mind of the programmer. A core goal is maximum readability of code. Would be interested if a splat(f) ∘ reverse ∘ tuple formulation could achieve that same thing.

I really appreciate the feedback, thanks for providing it.

pack is just tuple – you make a fair point. I’m not opposed to dropping it and relying on the existing functions. pack and unpack have a nice ring to them (clear that they’re opposites), but I agree there’s probably a greater good to be had.

The ::Function constraints: I was not aware, thanks for highlighting.

Abstractly typed fields: fair enough point, I can neaten that up.

Partial is mostly just Base.Fix1 – wasn’t aware of Base.Fix. Looks like an awesome addition to Base. Will need to play with it to see how closely the intention aligns, but high level I would be keen to remove repeated logic. Might keep the syntactic sugar if it’s viable since that’s what this package is about.

Closure capture boxing – this is mainly a performance concern, which is fair. I think it’s mainly in the Partial construction itself, which I don’t think does a lot of work. The point stands, though. Why have slow code if we can have nice fast code? :slight_smile: I’ll have a review of this and the type inference when I have a look at using Fix.


Is @pipe less clear? DataPipes provides @pipe – I always use @p myself because it’s less visual noise, but @pipe is also there.

This package is DataFrames.jl-specific, right? Haven’t used them for years, but don’t see the connection here aside from the fact that they happen to use Chain.jl

Slight correction: I’m pretty sure DataPipes doesn’t use lambdas to pass values through – it simply transforms the syntax by introducing a bunch of temporary intermediate variables.

The same thing is

@p let
    1:10
    map(_^2)
end

in DataPipes.jl. Or one-line: @p 1:10 |> map(_^2) for short pipelines.

And then, say you want to use something from Iterators. With DataPipes: just use Iterators.filter. With Chainables: you need to find and remember another name.
Further, what about functions from IterTools and the wider Julia data manipulation ecosystem? No luck with Chainables, I guess…

Thanks for your view on the matter. Piping is a good topic to dig into.

Worth noting:

@chain 1:10 begin
    @rev Iterators.filter(x -> x > 5) # Or any other function from any package
    collect
end

Also available:

foo(f, x) = f(x^2) # Putting the function arg first 
@chainable foo # generates @foo

@chain 2 begin
    @foo x -> x^2
    isequal(16)
    @assert 
end

To be perfectly honest, I feel like developing a yet firmer grasp of Julia should be a blocker for registration here. From the issues I point out above it seems clear that you are as of yet somewhat of a newbie to Julia, @TyronCameron. It very well might happen that if you were to approach the same problems again in a few months, you might come up with a completely different package design, due to the different perspective as you gain more experience and understanding.

Chainables is a really nice name, basically it is prime real estate in the General registry, and once a package is registered, the package name is more-or-less used up as far as the General registry is concerned. I feel we, the Julia community, do not really do enough to protect the common good that is the General registry name space.

Have you considered/tried just contributing to Chain.jl, instead of registering a new package?

Yet more concrete criticism

unpack

Single-argument unpack is just splat.
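For illustration, here is the thread's zip example written with splat alone (exported from Base in Julia 1.9 and later). The variable names are made up for this sketch.

```julia
# splat(f) turns a function of several arguments into a function of one tuple.
second_at_most_25 = splat((a, b) -> b <= 25)
@assert second_at_most_25((1, 21))
@assert !second_at_most_25((6, 26))

zipped = collect(zip(1:10, 21:30))
firsts = map(splat((a, b) -> a), filter(second_at_most_25, zipped))
@assert firsts == [1, 2, 3, 4, 5]
```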

vectorise

vectorise is just broadcast.

Global variable used instead of a constant

You want this instead:

const var"@∂" = var"@partial"

Reflection abuse

Is Partial meant to be used outside macros? If so, it is not efficient to depend on reflection (methods) at run time. In any case, using reflection like that is not robust.


For what that is worth, here is an alternative approach, using actual function composition and without depending on any macro or local or anonymous function:

F = Base.Fix
f = F{2}(<=, 25) ∘ F{2}(getindex, 2)
g = (
    F{1}(map, first) ∘
    F{1}(filter, f) ∘
    collect ∘
    F{2}(zip, 21:30)
)
g(1:10)

This happens to avoid naming arguments of the created callables, which is known as the point-free style or as tacit programming:

Tacit programming - Wikipedia

Some arguments in favor of this style and against the style as chosen here and in Chain.jl:

  • One benefit of avoiding macros is easier interoperability with other functional code, as it’s not possible to compose (with ∘ or with Base.Fix or similar) a function with a macro.

  • In the case of some macros, a constraint is that they increment world age, that is, some macros are not appropriate to be used in a scope local to a function. This makes macros less general.

  • One benefit of avoiding anonymous and local functions is nicer names, relevant for:

    • display in the REPL

    • debugging

    • profiling

    • Cthulhu.jl, etc.

  • Another benefit of avoiding macros and avoiding local functions is fewer total types loaded in a Julia session. TBH I don’t know if this would make a tangible difference in compiler latency, however perhaps it would given a lot of loaded packages. However a clear drawback of local functions is that dispatch (or other type system-based logic) can not tell that Base.Fix2(==, 3) and x -> x == 3 behave the same.

  • I feel the strongest drawback to macro use (except places where macros are actually essential) is the introduction of a new syntax. A human reading code that uses macros has to learn the new syntax. Someone has to teach the new syntax to tooling, too.
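To illustrate the dispatch point from the list above, here is a small sketch. The function describe is invented for this example. A method can recognise a Base.Fix2 built from ==, whereas an equivalent anonymous function is opaque to the type system.

```julia
# `describe` is a made-up function used only to demonstrate dispatch.
describe(f::Base.Fix2{typeof(==)}) = "equality test against $(f.x)"
describe(f) = "opaque callable"

@assert describe(Base.Fix2(==, 3)) == "equality test against 3"
@assert describe(==(3)) == "equality test against 3"   # ==(3) returns a Fix2
@assert describe(x -> x == 3) == "opaque callable"
```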
