Mutable struct vs closure

I had written a post about how anyone who goes poking around in other people’s structs gets what’s coming to them, but I withdrew it after reading this post.

I’m still not convinced that privacy needs to be enforced by the language (nor how it could be, given multiple dispatch, unless it were at a module level), but, as a software developer, I sometimes forget that the Julia community is dominated by scientists who, according to the cliché (… and according to the links you provided…), are not particularly concerned with best practices.

There’s a quote from Larry Wall that I like:

In Perl 6, we decided it would be better to fix the language than fix the user.

I guess this is especially applicable in languages where the primary users are more interested in a program’s output than its design.

2 Likes

@tomtom This looks like an iterator pattern. I ended up with something pretty similar when I was doing simulation in a functional programming language. You can do some fun stuff if you formalize it into the languages iterator interface, like onesecond_sim = take( sim, 30). You get undo and evolvable snapshots for free, which is really handy. Julia might not do this so well though, since some common data structures are not immutable, i.e. Arrays.

1 Like

^ it’s the point

so far, without changing the language, I see two workarounds:

  1. setting Base.getproperty() - although one can still get around by getfield(), at least it would make the thing more conscious
  2. using functional/closure approach, e.g. in my last function Closure() above. Actually, (as far as I know) there’s NO way to set/get the captured variables “directly” by skipping the closures given.

In essence, the “issue” of flexibility/encapsulation tradeoff, I think, is due to the unique, strong and attractive feature of julia: multiple dispatch. With multiple dispatch, a method typically dues with several types at the same time, so that there’s no “belongings” of a method to a particular type.

Finally, the type system in julia is very “free” in the sense that anyone could define any method to work with existing types. It’s very good, however, at the same time rises some “best practices” issues (in sense of computer science).

indeed, it’s a big problem in terms of “clean code” standards… but at the same time, these content-mutable data structures are also so useful… does there exist a perfect solution??? @_@

Like everything in programming you’ll have tradeoffs. If you wanted to you could operate Julia in immutable mode. Use GitHub - JuliaCollections/FunctionalCollections.jl: Functional and persistent data structures for Julia instead of the native data structures, write some macros to make changing single fields + copy more convenient for structs, or use PersistentHashMap instead of struct. Also the very popular StaticArrays.jl are immutable. You get full control over time evolution of your program state, parallel processing is vastly simplified and some other nice things. It’s a bit of a price to pay in performance and convenience!

I like keeping things as immutable as possible until it’s a performance issue. For numerics this usually means I have a very small number of data structures that are mutable and usually I’m not mutating them after assembling my system matrix anyways. Even then those matrices are built out of data structures that I only mutate on construction, which possibly would be worth switching to PersistentDataStructures for the nice things I mentioned before. Still I tend to try to do things that have the least friction in a language, so I use Array and Dict in practice and some domain specific immutable types .

3 Likes

I expect that as libraries mature, they will develop neat interfaces for everything and this will not be necessary, but keep in mind that many parts of Julia are exploratory and WIP (yes, even in Base, especially if one is pushing the envelope). The current situation is not ideal, but a trade-off between exploration and standardization.

I guess it is assumed to be implicitly understood that solutions using internals are fragile, but perhaps inexperienced users would be better served if these were provided with a warning, or not at all.

1 Like

in the case where you only ever need to create one single instance of the struct, I find that using an anonymous closure to encapsulate the data and defining it as const helps limit the additional load on the local namespace to only 1 additional name, while with an explicit struct declaration would add multiple names and methods to local namespace to handle both the struct, instantiation, and access

edit: so I use explicit structs intended for multiple instance semantic data, and anonymous closures for single instance data objects. this way, the user will only ever be exposed to multi-instance struct types

challenge: make a more elegant or faster implementation of this (here I used a _cache and @pure)

const binomsum_cache = ( () -> begin
        Y = Array{Int,1}[Int[1]]
        return (n::Int,i::Int) -> (begin
                j = length(Y)
                for k ∈ j+1:n
                    push!(Y,cumsum([binomial(k,q) for q ∈ 0:k]))
                end
                i ≠ 0 ? Y[n][i] : 0
            end)
    end)()

Base.@pure binomsum(n::Int,i::Int) = binomsum_cache(n,i)

if there is a much better way, I’d like to see, but I prefer to limit the load on the namespace to 1 for this

1 Like

Generally I try to make sure that closures

  1. don’t escape the scope the created them (so, in practice, I use them for partial application and similar),
  2. don’t have mutable state, or it is very simple.

Closures are indeed very powerful constructs, but the “Lambda the Ultimate …” style wizardry generally ignores whether they

  1. map to efficient compiled code,
  2. whether the resulting programs are maintainable.
julia> function closuremaker()
       x= 1
       function inc()
       x+=1
       x
       end
       inc
       end;
julia> inc = closuremaker();
julia> inc()
2
julia> inc.x.contents=5.6;
julia> inc()
6.6

Apart from one-liners (which I admit are really cool), the verbose way is

julia> mutable struct Inc <:Function
       __cnt::Int
       end

julia> Inc()=Inc(1);
julia> (inc::Inc)() = (inc.__cnt+= 1);

julia> inc = Inc()
(::Inc) (generic function with 1 method)

julia> inc()
2

This way, you have sensible field names (very nice when debugging!) and complete control over the data layout and types of everything that is closed over. There might be an argument about struct Inc __cnt::Base.RefValue{Int64}, but I don’t see it.

In many cases the compiler is smarter than you. In the specific case of closures (choosing types, layout, names for state, choosing which variables are needed) the compiler is pretty mediocre: Making the struct itself mutable instead of having the ref saves one allocation+pointer indirection. Most importantly, typing the contents as ::Int makes a very large performance difference. Part of the reason for this is that the closures are layout during lowering, which does not have access to all the cool optimizations (inference, constant prop, SSA), and the later optimization steps don’t change the layout.

julia> const __binomsum_cache = [[1]];
julia> function binomsum_cache(n::Int, i::Int)
       j=length(__binomsum_cache)
       for k=j+1:n
       push!(__binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
       end
       i!=0 ? __binomsum_cache[n][i] : 0
       end

I don’t think the @pure is valid, though? Something with generated functions and precompilation?

Note that this variant is virtually the same code, but it is much easier to reason about the global mutable state of your module, in the case that precompilation, generated functions, or thread-safety blow up in your face.

Re namespace, you can isolate stuff in nested modules, not export things, or choose scary names, e.g. beginning with double underscore (my adhoc way). But I would like some naming convention for “don’t touch, really don’t” internals that is officially endorsed, used in Base and supported by linters.

Cool! I wonder how that works under the hood (on the JVM level; no big fan of java, but there is a plethora of languages targeting the JVM).

3 Likes

It is mathematically valid as @pure, since the output is always the same for each n and i.

This is needed so type-computations requiring this value can be pre-allocated without a function call.

Now if the user changed the array by modifying it, then it would not be @pure anymore and break it.

To protect its behavior as @pure, the user needs to be kept away from modifying contents of the array

Fair enough, and I would also put @pure, since it behaves like a pure function.

const binomsum_cache = [[1]]
Base.@pure function binomsum(n::Int, i::Int)
    for k=length(binomsum_cache)+1:n
        push!(binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
    end
    i ≠ 0 ? binomsum_cache[n][i] : 0
end

Previously, I did not use the @pure, so originally it was only one name, but with @pure it is 2 names anyway, and this way it is also 2 names, so I suppose this better.

From what I understand @pure doesn’t just mean the regular definition as you’ve stated it. See https://github.com/JuliaLang/julia/pull/24817#issuecomment-381153462

A previously suggested docstring for @pure was

Annotates a function to ease type inference by strictly specifying the
output is completely determined by the input.
Usage can easily lead to whole program corruption or crashes and should be avoided
by all users.
Do not use if the function involved:

  • Involves globals, pointers
  • Does not return exactly (===) the same result for the same input
  • Gets its methods extended after it is called
  • Uses dispatch on any of its arguments
3 Likes

That’s not what @pure means, unfortunately.

Afaik @pure also asserts to the compiler that the function is side-effect free, which is not true in your case. The compiler also cares about side-effects that are invisible to users that adhere to the published API. This specific case might still be OK in practice, but you are lying to the compiler. Hell if I know how @pure, serialization of your module’s global state (during precompilation), multithreading, etc all interact, now and in the future. And yes, your function is decidedly non-thread-safe: Two threads resizing the cache at the same time can probably segfault.

PS: Also everything @dawbarton said.

3 Likes

(as illustrated in my previous example) we need several tricks to build “good” closure. Following your example:

julia> inc.x
Core.Box(6.6)
julia> getfield(inc, :x)
Core.Box(6.6)

the captured variable x is a Core.Box, which is bad.

Now, we need the following tricks:

  1. even for scalar, use Array (of length 1) to store captured variables.
  2. enclose the closure definition by local <closure_name>, let <captured_variable> = <captured_variable> and end
function closuremaker2()
    x = [1]   # trick 1: to avoid Core.Box, even scalar captured variable needs to be store in an Array

    local inc2   # trick 2
    let x = x    # trick 2
    function inc2()
        x[1] += 1    # trick 1
        x[1]     # trick 1
    end
    end    # trick 2

    inc2
end

julia> inc2 = closuremaker2()
inc2 (generic function with 1 method)

julia> inc2()
2

julia> inc2.x.contents=5.6
ERROR: type #inc2#5 has no field x

julia> inc2()
3

julia> ind2.x
ERROR: UndefVarError: ind2 not defined

julia> getfield(inc2, :x)
ERROR: type #inc2#5 has no field x

after using these tricks, the captured variable is completed encapsulated (as far as I know): it’s no longer a Core.Box, and could not be access directly, even with getfield().

sorrie… I don’t understand how this constructor relates to push!… could you kindly explain???

You should play around interactively at the REPL or in a notebook:
after defining closuremaker2 as in your post, do

julia> inc2 = closuremaker2()
inc2 (generic function with 1 method)

julia> inc2.<TAB>

Here, <TAB> means “press the tab key”. This will show you all of the fields inside the object inc2. In this case, I get

inc2.#1#x

The #1#x is a generated name corresponding to x. Then I can see its value using

julia> getfield(inc2, Symbol("#1#x"))
1-element Array{Int64,1}:
 1

Alternatively, you can get this using

julia> typeof(inc2)
getfield(Main, Symbol("#inc2#3")){Array{Int64,1}}

julia> fieldnames(typeof(inc2))
(Symbol("#1#x"),)

It’s certainly more “hidden”, but not completely.

3 Likes

All I’m saying is that if you make a “copy” of the object, it will use the same Vector inside.
The allocation that you showed comes because you effectively made a copy of that vector using the [v; 1] syntax. The allocation is not inherent to having an immutable struct.

1 Like
julia> inc2 = closuremaker2()
inc2 (generic function with 1 method)
julia> inc2()
2
julia> pop!(getfield(inc2, 1));
julia> inc2()
ERROR: BoundsError: attempt to access 0-element Array{Int64,1} at index [1]

Again: Closures are not a first-class concept in julia. They are struct <: Function, and their layout is generated by the compiler during lowering. You can introspect and access their fields just like any other julia struct.

The compiler is not very good at laying out the structures it uses to represent closures. In most cases, it is imo much more transparent (for readers of your code) and efficient to lay out the structure for captured state by hand. This needs no tricks. In many cases, this does not even increase the amount of code!

I mean, what do you think is more readable one year down the road, closuremaker2 or Inc? If some downstream bug leads to some poor sod with an instance of your type that ended up some place it doesn’t belong, what do you think will be more helpful for debugging and grepping the source?

There are imo two exceptions, where compiler-generated layout of closures is warranted: Things that can be expressed as a succinct one-liner, or cases where the closure does not escape, like anonymous functions you want to map over something, do syntax (really awesome!), etc.

Closures really are syntactic sugar, and not having an explicit human-readable description of your types in source code is a liability: Sometimes worth it for succinctness, but often just bad.

Using compiler-generated closures also means that non-breaking updates of the compiler will change the data layout of your stuff! Things can change between bitstype, can change size, inference changes, all kinds of stuff. And there is not a lot of documentation or debug-tooling for the part that decides on this layout (the frontend part that is written in femtolisp). For comparison, there is much better tooling for post-lowering stages, from @code_warntype to Cassette.jl, inspecting llvm compiler passes and native code, etc.

3 Likes

Good point, are there any suggestions on how to make the cache @pure and also grow in size?

Don’t use @pure. It doesn’t mean what you think it means and Julia can often perform the optimizations you want with inlining alone.

2 Likes

Consider this

julia> struct data{N,G} end

julia> const binomsum_cache = [[1]];

julia> function impure_binomsum(n::Int, i::Int)
           for k=length(binomsum_cache)+1:n
              push!(binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
           end
           i!=0 ? binomsum_cache[n][i] : 0
       end;

julia> Base.@pure function pure_binomsum(n::Int, i::Int)
           for k=length(binomsum_cache)+1:n
              push!(binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
           end
           i!=0 ? binomsum_cache[n][i] : 0
       end;

Then compare

julia> pure(::data{N,G}) where {N,G} = float(pure_binomsum(N,G));

julia> impure(::data{N,G}) where {N,G} = float(impure_binomsum(N,G));

julia> @code_warntype impure(data{5,4}())
Body::Float64
1 1 ─ %1 = invoke Main.impure_binomsum($(Expr(:static_parameter, 1))::Int64, $(Expr(:static_parameter, 2))::Int64)::Int64
  │   %2 = (Base.sitofp)(Float64, %1)::Float64                                     │╻╷ float
  └──      return %2                                                               │  

julia> @code_warntype pure(data{5,4}())
Body::Float64
1 1 ─     return 26.0                                                                      │

with @inline it is much much worse

julia> @inline function inline_binomsum(n::Int, i::Int)
           for k=length(binomsum_cache)+1:n
              push!(binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
           end
           i!=0 ? binomsum_cache[n][i] : 0
       end;

julia> inline(::data{N,G}) where {N,G} = float(inline_binomsum(N,G));

julia> @code_warntype inline(data{5,4}())
Body::Float64
1 1 ── %1  = $(Expr(:static_parameter, 1))::Core.Compiler.Const(5, false)
  │    %2  = $(Expr(:static_parameter, 2))::Core.Compiler.Const(4, false)
  │    %3  = Main.binomsum_cache::Array{Array{Int64,1},1}   │╻          inline_binomsum
  │    %4  = (Base.arraylen)(%3)::Int64                     ││╻          length
  │    %5  = (Base.add_int)(%4, 1)::Int64                   ││╻          +
  │    %6  = (Base.sle_int)(%5, %1)::Bool                   │││╻╷╷╷       Type
  │          (Base.sub_int)(%1, %5)                         ││││╻          unitrange_last
  │    %8  = (Base.sub_int)(%5, 1)::Int64                   │││││┃          -
  │    %9  = (Base.ifelse)(%6, %1, %8)::Int64               │││││      
  │    %10 = (Base.slt_int)(%9, %5)::Bool                   │││╻╷╷        isempty
  └───       goto #3 if not %10                             │││        
  2 ──       goto #4                                        │││        
  3 ──       goto #4                                        │││        
  4 ┄─ %14 = φ (#2 => true, #3 => false)::Bool              ││         
  │    %15 = φ (#3 => %5)::Int64                            ││         
  │    %16 = φ (#3 => %5)::Int64                            ││         
  │    %17 = (Base.not_int)(%14)::Bool                      ││         
  └───       goto #31 if not %17                            ││         
  5 ┄─ %19 = φ (#4 => %15, #30 => %71)::Int64               ││         
  │    %20 = φ (#4 => %16, #30 => %72)::Int64               ││         
  │    %21 = %new(getfield(Main, Symbol("##15#16")){Int64}, %19)::getfield(Main, Symbol("##15#16")){Int64}
  │    %22 = (Base.sle_int)(0, %19)::Bool                   │││╻╷╷╷       Type
  │          (Base.sub_int)(%19, 0)                         ││││╻          unitrange_last
  │    %24 = (Base.ifelse)(%22, %19, -1)::Int64             │││││      
  │    %25 = %new(UnitRange{Int64}, 0, %24)::UnitRange{Int64}│││       
  │    %26 = %new(Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##15#16")){Int64}}, %21, %25)::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##15#16")){Int64}}
  │    %27 = invoke Base.collect(%26::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##15#16")){Int64}})::Array{Int64,1}
  │    %28 = Base.cumsum::typeof(cumsum)                    ││╻          cumsum
  │    %29 = (Base.sle_int)(1, 1)::Bool                     │││╻╷╷╷╷      #cumsum
  └───       goto #7 if not %29                             ││││┃│││       isempty
  6 ── %31 = (Base.sle_int)(1, 0)::Bool                     │││││┃│││       iterate
  └───       goto #8                                        ││││││┃│         iterate
  7 ──       nothing                                        │          
  8 ┄─ %34 = φ (#6 => %31, #7 => false)::Bool               │││││││┃          iterate
  └───       goto #10 if not %34                            ││││││││   
  9 ──       invoke Base.getindex(()::Tuple{}, 1::Int64)    ││││││││   
  └───       $(Expr(:unreachable))                          ││││││││   
  10 ─       goto #12                                       ││││││││   
  11 ─       $(Expr(:unreachable))                          ││││││││   
  12 ┄       goto #13                                       │││││││    
  13 ─       goto #14                                       │││││╻          iterate
  14 ─       goto #15                                       │││││      
  15 ─ %43 = invoke Base.:(#cumsum#570)(1::Int64, %28::Function, %27::Array{Int64,1})::Array{Int64,1}
  └───       goto #16                                       ││││       
  16 ─       goto #17                                       │││        
  17 ─ %46 = Main.binomsum_cache::Array{Array{Int64,1},1}   ││         
  │    %47 = (Core.lshr_int)(1, 63)::Int64                  │││╻╷╷╷╷╷╷    _growend!
  │    %48 = (Core.trunc_int)(Core.UInt8, %47)::UInt8       ││││┃│││││     cconvert
  │    %49 = (Core.eq_int)(%48, 0x01)::Bool                 │││││┃││││      convert
  └───       goto #19 if not %49                            ││││││┃││        Type
  18 ─       invoke Core.throw_inexacterror(:check_top_bit::Symbol, Int64::Any, 1::Int64)
  └───       $(Expr(:unreachable))                          ││││││││┃          check_top_bit
  19 ─       goto #20                                       │││││││││  
  20 ─ %54 = (Core.bitcast)(Core.UInt64, 1)::UInt64         ││││││││   
  └───       goto #21                                       ││││││││   
  21 ─       goto #22                                       │││││││    
  22 ─       goto #23                                       ││││││     
  23 ─       goto #24                                       │││││      
  24 ─       $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), :(:ccall), 2, :(%46), :(%54), :(%54)))
  └───       goto #25                                       ││││       
  25 ─ %61 = (Base.arraysize)(%46, 1)::Int64                │││╻╷╷╷╷      lastindex
  │    %62 = (Base.slt_int)(%61, 0)::Bool                   ││││╻╷╷╷       eachindex
  │    %63 = (Base.ifelse)(%62, 0, %61)::Int64              │││││┃│││││     axes1
  │          (Base.arrayset)(true, %46, %43, %63)           │││╻          setindex!
  └───       goto #26                                       ││╻          push!
  26 ─ %66 = (%20 === %9)::Bool                             │││╻          ==
  └───       goto #28 if not %66                            │││        
  27 ─       goto #29                                       │││        
  28 ─ %69 = (Base.add_int)(%20, 1)::Int64                  │││╻          +
  └───       goto #29                                       ││╻          iterate
  29 ┄ %71 = φ (#28 => %69)::Int64                          ││         
  │    %72 = φ (#28 => %69)::Int64                          ││         
  │    %73 = φ (#27 => true, #28 => false)::Bool            ││         
  │    %74 = (Base.not_int)(%73)::Bool                      ││         
  └───       goto #31 if not %74                            ││         
  30 ─       goto #5                                        ││         
  31 ─ %77 = invoke Base.getindex(Main.binomsum_cache::Array{Array{Int64,1},1}, %1::Int64)::Array{Int64,1}
  │    %78 = (Base.arrayref)(true, %77, %2)::Int64          ││╻          getindex
  └───       goto #32                                       ││         
  32 ─ %80 = (Base.sitofp)(Float64, %78)::Float64           ││╻╷         Type
  └───       return %80                                     │          

this is why I wanted to use @pure in my situation, it provides performance boost

how can I achieve similar optimizations without @pure in this situation?