Mutable struct vs closure


#31

Yes. Marking something as “private” in languages like Julia or C++ is just a documentation hint for your downstream users. Dedicated users will point a disassembler at your binary and just poke at the memory. Then, ten years down the line, some poor sod will be stuck supporting backwards compatibility for these hacks.

If you want to enforce encapsulation, then you must run on a virtual machine that does this for you, like the JVM. There is a plethora of languages targeting the JVM besides Java.

Julia targets the hardware, without an intermediate enforcement layer. Since dedicated users will access your internal fields anyway, what is the additional gain above clearly documenting that the field is internal, both by actual docs and by using suggestive names?


#32

I’m not sure what the OP had in mind, but this reminds me of Let Over Lambda by Doug Hoyte. One of the patterns advertised in the book is called let over two lambdas, and illustrates the parallel between closures and objects. It looks like this (in Common Lisp):

(let ((counter 0))
  (values
    (lambda () (incf counter))
    (lambda () (decf counter))))
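
For comparison, a rough Julia rendering of the same pattern might look like the sketch below. The names `increment` and `decrement` are made up for illustration; the point is only that both closures share the one `counter` variable introduced by the `let` block.

```julia
# Two closures sharing one captured variable ("let over two lambdas").
# `increment` and `decrement` are illustrative names, not from the book.
increment, decrement = let counter = 0
    (() -> (counter += 1),   # bumps the shared counter and returns it
     () -> (counter -= 1))   # lowers the same counter and returns it
end

increment()
increment()
decrement()
```

Because `counter` is reassigned inside the closures, Julia stores it in a box behind the scenes, which is convenient but not the fastest layout.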

#33

It depends what you want to do. If you just want to update in place, as Simon said, do

import Base: push!

push!(x::Composite, i) = push!(x.values, i)

(You will also need to update smallest and largest.)

Note that even though Composite is an immutable struct, mutable objects inside the struct are still mutable.

Also, if you make a “copy constructor” as

julia> Composite(x::Composite) = Composite(x.values)

then you do not allocate the vector, but rather reuse it:

julia> push!(x.values, 10)
3-element Array{Int64,1}:
  1
  2
 10

julia> y
Composite(1, 2, [1, 2, 10])

i.e. y has also changed.

If you don’t want this then indeed you have to allocate a new vector (e.g. with copy).
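
To make the aliasing concrete, here is a self-contained sketch. The field layout of `Composite` (smallest, largest, values) is an assumption inferred from the printed output above, and `copied` is a hypothetical helper name.

```julia
# Assumed definition; the real Composite from earlier in the thread may differ.
struct Composite
    smallest::Int
    largest::Int
    values::Vector{Int}
end

# Build from a vector, computing the summary fields once.
Composite(values::Vector{Int}) = Composite(minimum(values), maximum(values), values)

# "Copy constructor" that reuses (aliases) the same vector.
Composite(x::Composite) = Composite(x.values)

# Hypothetical alternative that allocates an independent vector.
copied(x::Composite) = Composite(copy(x.values))

x = Composite([1, 2])
y = Composite(x)   # shares x.values
z = copied(x)      # owns its own vector
push!(x.values, 10)
```

After the `push!`, `y.values` has three elements while `z.values` still has two, and `y.largest` is stale because the summary fields were not updated.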


#34

Can someone explain to me what kind of safety is actually at issue? Is the concern malicious code, or accidentally doing something you didn’t mean to do (or a fear users will accidentally do something unintended)?

If the latter, it seems like @Tamas_Papp’s point about a clear API is ideal - it took me a while to figure out when I first started with Julia, but it’s been a long time since I accessed a field directly (I did it because the accessor didn’t exist, and the package maintainers responded to an issue and added one).


#35

I guess the usual concern is a large team of programmers, where some members could do something quick & dirty by exposing internals, which could then lead to a bug. Internal style guides and code review protect against this to some extent, and as others have pointed out, there is no protection against a sufficiently determined individual who is bent on shooting themselves in the foot.

But some languages, e.g. C++, offer facilities for this that at least raise the cost of accessing internals, and some people miss them in Julia.


#36

I agree that encapsulation vs flexibility is an important discussion. Yes, you can get around encapsulation in many other languages even when they were designed for it. In Java for example you can access and modify private fields using reflection. I think this argument misses the point a bit – IMO the point isn’t that it’s possible, but that it’s so easy to do it in Julia, and that it seems tolerated, or even encouraged. I doubt you’ll find anyone seriously suggesting a workaround based on reflection to access private members on a Java forum, and any such code would likely not pass code review. But I’ve lost count of how many posts I’ve seen on this forum suggesting this, or packages doing it. Sometimes accompanied with warnings, but I’m not sure how much that helps.

A recent example is this thread where a new user asked for help on indexing, and was provided one “proper” solution, and one shorter solution using internal methods. No-one opposed. As we can see in a later post, the approach using internal methods was chosen as a solution.

Some more examples here, here, here, here, here, here, here.

As for risks/safety, it’s not about malicious code IMO, but very well-intentioned code, that becomes unmanageable over time as you reach a large code-base with many authors. Joshua Bloch (author of Effective Java, and many core Java features) writes the following:

"Minimize the accessibility of classes and members"

The single most important factor that distinguishes a well-designed component from a poorly designed one is the degree to which the component hides its internal data and other implementation details from other components. A well-designed component hides all its implementation details, cleanly separating its API from its implementation. Components then communicate only through their APIs and are oblivious to each others’ inner workings. This concept, known as information hiding or encapsulation, is a fundamental tenet of software design.

Information hiding is important for many reasons, most of which stem from the fact that it decouples the components that comprise a system, allowing them to be developed, tested, optimized, used, understood, and modified in isolation. This speeds up system development because components can be developed in parallel. It eases the burden of maintenance because components can be understood more quickly and debugged or replaced with little fear of harming other components. While information hiding does not, in and of itself, cause good performance, it enables effective performance tuning: once a system is complete and profiling has determined which components are causing performance problems, those components can be optimized without affecting the correctness of others. Information hiding increases software reuse because components that aren’t tightly coupled often prove useful in other contexts besides the ones for which they were developed. Finally, information hiding decreases the risk in building large systems because individual components may prove successful even if the system does not.


#37

I just realized this is not true: I’m still doing m.captures to get captures from a regular expression match, and this is how it’s written in the documentation.
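
A small example of the field access in question (the pattern and input string are made up for illustration):

```julia
# `match` returns a RegexMatch; its `captures` field holds the capture groups.
m = match(r"(\d+)-(\d+)", "10-20")

first_group = m.captures[1]   # field access, as shown in the documentation
same_group  = m[1]            # indexing the match gives the same capture
```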


#38

There was one person who pushed back on the _ind2sub solution: you did. And I was grateful you did. I probably would have said something if you hadn’t.

I think some of this comes from the legacy of Julia as it’s been developed. Many of us started working with Julia pre-1.0 when it was more of a necessity to get your hands dirty because there weren’t any other options. That should probably be tempered a bit more now — especially for newer users. There are still cases in those examples you link to where there simply isn’t an official solution. In such cases, I think trying to work them out in the thread makes sense, gives the original poster a (hopefully) temporary fix, and gives us fodder for further improving the language. We should probably do better at converting these into GitHub issues (and ideally PRs).


#40

I guess this concise statement from the Perl manual also applies to Julia:

“Perl doesn’t have an infatuation with enforced privacy. It would prefer that you stayed out of its living room because you weren’t invited, not because it has a shotgun”
― Larry Wall


#41

I feel reminded of the questions that asked how to “inherit” correctly in Julia, with the answer people suggested as most Julian being composition + overloaded accessor functions. But that leads me to the problem of asking “which functions does a third party module need to operate on my struct?” And that’s a tough question to answer for something like a dataframe.

Nor do I have any guarantees (formal or informal) that a future release of a dataframe package won’t break things for me. My impression is that the issue here is that there is no programmatic way to express API expectations.

It seems to me it would be extremely useful to have something like (not CS so my terminology is probably wrong) abstract interfaces. Basically a set of function signatures operating on or with the type. Then a module declaring that it implements an interface for a concrete type would declare that it has implemented a set of functions. And a module could export an interface that it expects for its structs.

The tooling could then also use this information to show me what the interface functions are that I can use on a type.

I suspect that this could all be done as a package, alas it’s beyond my time/abilities.

If I have a composite type

struct foo
    b::bar
end

And I have an interface interf for bar, I could have a macro that implements the interface for foo automatically.

@implement interf b
struct foo
    b::bar
end

Which would just write a bunch of functions:
interf.func(f::foo) = interf.func(f.b)

This would not enforce privacy, but it would communicate what I expect to be private and what I expect to be public, and where you can substitute your behaviour safely.
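
As far as I know there is no such `@implement` macro in Base, but the forwarding part can be sketched today with a plain `@eval` loop. Everything below (`Bar`, `Foo`, and the choice of `length` and `first` as “the interface”) is hypothetical and only illustrates the idea.

```julia
# Hypothetical inner type with a small "interface": length and first.
struct Bar
    data::Vector{Int}
end
Base.length(b::Bar) = length(b.data)
Base.first(b::Bar)  = first(b.data)

# Wrapper type that should expose the same interface.
struct Foo
    b::Bar
end

# Generate one forwarding method per interface function.
for f in (:length, :first)
    @eval Base.$f(x::Foo) = Base.$f(x.b)
end

foo = Foo(Bar([7, 8, 9]))
```

The tuple of function names plays the role of the interface declaration; a macro would mostly be syntactic sugar over this loop.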


#42

I had written a post about how anyone who goes poking around in other people’s structs gets what’s coming to them, but I withdrew it after reading this post.

I’m still not convinced that privacy needs to be enforced by the language (nor how it could be, given multiple dispatch, unless it were at a module level), but, as a software developer, I sometimes forget that the Julia community is dominated by scientists who, according to the cliché (… and according to the links you provided…), are not particularly concerned with best practices.

There’s a quote from Larry Wall that I like:

In Perl 6, we decided it would be better to fix the language than fix the user.

I guess this is especially applicable in languages where the primary users are more interested in a program’s output than its design.


#43

@tomtom This looks like an iterator pattern. I ended up with something pretty similar when I was doing simulation in a functional programming language. You can do some fun stuff if you formalize it into the language’s iterator interface, like onesecond_sim = take(sim, 30). You get undo and evolvable snapshots for free, which is really handy. Julia might not do this so well though, since some common data structures, e.g. Arrays, are not immutable.
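
In Julia, a minimal sketch of this idea could use the `iterate` protocol with the simulation state passed along as the iteration state. `Sim` and its fields are invented for the example and stand in for whatever the real simulation is.

```julia
# Hypothetical simulation: the state x is multiplied by `rate` at each step.
struct Sim
    x0::Float64
    rate::Float64
end

# Each call returns (current value, next state); nothing is mutated.
Base.iterate(s::Sim, x = s.x0) = (x, x * s.rate)
Base.IteratorSize(::Type{Sim}) = Base.IsInfinite()
Base.eltype(::Type{Sim}) = Float64

sim = Sim(1.0, 2.0)
first_steps = collect(Iterators.take(sim, 4))   # snapshots of the first 4 steps
```

Since the state is an immutable value, any snapshot can be kept around and iteration can be resumed from it later.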


#44

^ it’s the point

so far, without changing the language, I see two workarounds:

  1. overloading Base.getproperty() - although one can still get around it with getfield(), at least that makes the access a more conscious choice
  2. using a functional/closure approach, e.g. my last function Closure() above. Actually, (as far as I know) there’s NO way to set/get the captured variables “directly” without going through the closures provided.
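
A minimal sketch of the first workaround, with a made-up `Account` type and `balance` accessor: dot access to the internal field errors out, while the accessor and `getfield` still work.

```julia
# Hypothetical type with one field we would like to treat as internal.
mutable struct Account
    balance::Float64
end

# Block `a.balance`; any other property falls through to the real field.
function Base.getproperty(a::Account, name::Symbol)
    name === :balance && error("balance is internal; use balance(a) instead")
    return getfield(a, name)
end

# The intended public accessor.
balance(a::Account) = getfield(a, :balance)

a = Account(10.0)
blocked = try
    a.balance       # goes through getproperty and throws
    false
catch
    true
end
```

This is a speed bump rather than a lock: `getfield(a, :balance)` bypasses it.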

In essence, the “issue” of the flexibility/encapsulation tradeoff, I think, is due to the unique, strong and attractive feature of Julia: multiple dispatch. With multiple dispatch, a method typically deals with several types at the same time, so a method does not “belong” to any particular type.

Finally, the type system in Julia is very “free” in the sense that anyone can define any method to work with existing types. That is very good, but at the same time it raises some “best practices” issues (in the computer science sense).


#45

indeed, it’s a big problem in terms of “clean code” standards… but at the same time, these content-mutable data structures are also so useful… does there exist a perfect solution??? @_@


#46

Like everything in programming you’ll have tradeoffs. If you wanted to you could operate Julia in immutable mode. Use https://github.com/JuliaCollections/FunctionalCollections.jl instead of the native data structures, write some macros to make changing single fields + copy more convenient for structs, or use PersistentHashMap instead of struct. Also the very popular StaticArrays.jl are immutable. You get full control over time evolution of your program state, parallel processing is vastly simplified and some other nice things. It’s a bit of a price to pay in performance and convenience!
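
As a small illustration of the “change a single field + copy” idea without any macro, one could write a keyword-based update function by hand. `Params` and `update` are hypothetical names for this sketch.

```julia
# Hypothetical immutable settings struct.
struct Params
    dt::Float64
    steps::Int
end

# Return a new Params with selected fields replaced; the original is untouched.
update(p::Params; dt = p.dt, steps = p.steps) = Params(dt, steps)

p = Params(0.1, 10)
q = update(p; steps = 20)
```

A macro or package could generate such functions for every struct, which is presumably what the convenience macros mentioned above would do.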

I like keeping things as immutable as possible until it’s a performance issue. For numerics this usually means I have a very small number of data structures that are mutable, and usually I’m not mutating them after assembling my system matrix anyway. Even then those matrices are built out of data structures that I only mutate on construction, which possibly would be worth switching to PersistentDataStructures for the nice things I mentioned before. Still, I tend to do whatever has the least friction in a language, so I use Array and Dict in practice and some domain-specific immutable types.


#47

I expect that as libraries mature, they will develop neat interfaces for everything and this will not be necessary, but keep in mind that many parts of Julia are exploratory and WIP (yes, even in Base, especially if one is pushing the envelope). The current situation is not ideal, but a trade-off between exploration and standardization.

I guess it is assumed to be implicitly understood that solutions using internals are fragile, but perhaps inexperienced users would be better served if these were provided with a warning, or not at all.


#48

in the case where you only ever need to create a single instance of the struct, I find that using an anonymous closure to encapsulate the data and defining it as const limits the additional load on the local namespace to just 1 name, whereas an explicit struct declaration adds multiple names and methods to the local namespace to handle the struct, its instantiation, and access

edit: so I use explicit structs intended for multiple instance semantic data, and anonymous closures for single instance data objects. this way, the user will only ever be exposed to multi-instance struct types

challenge: make a more elegant or faster implementation of this (here I used a _cache and @pure)

const binomsum_cache = ( () -> begin
        Y = Array{Int,1}[Int[1]]
        return (n::Int,i::Int) -> (begin
                j = length(Y)
                for k ∈ j+1:n
                    push!(Y,cumsum([binomial(k,q) for q ∈ 0:k]))
                end
                i ≠ 0 ? Y[n][i] : 0
            end)
    end)()

Base.@pure binomsum(n::Int,i::Int) = binomsum_cache(n,i)

if there is a much better way, I’d like to see, but I prefer to limit the load on the namespace to 1 for this


#49

Generally I try to make sure that closures

  1. don’t escape the scope that created them (so, in practice, I use them for partial application and similar),
  2. don’t have mutable state, or it is very simple.

Closures are indeed very powerful constructs, but the “Lambda the Ultimate …” style wizardry generally ignores whether

  1. they map to efficient compiled code,
  2. the resulting programs are maintainable.

#50

julia> function closuremaker()
       x= 1
       function inc()
       x+=1
       x
       end
       inc
       end;
julia> inc = closuremaker();
julia> inc()
2
julia> inc.x.contents=5.6;
julia> inc()
6.6

Apart from one-liners (which I admit are really cool), the verbose way is

julia> mutable struct Inc <:Function
       __cnt::Int
       end

julia> Inc()=Inc(1);
julia> (inc::Inc)() = (inc.__cnt+= 1);

julia> inc = Inc()
(::Inc) (generic function with 1 method)

julia> inc()
2

This way, you have sensible field names (very nice when debugging!) and complete control over the data layout and types of everything that is closed over. There might be an argument about struct Inc __cnt::Base.RefValue{Int64}, but I don’t see it.

In many cases the compiler is smarter than you. In the specific case of closures (choosing types, layout, names for state, choosing which variables are needed) the compiler is pretty mediocre. Making the struct itself mutable instead of having the ref saves one allocation + pointer indirection. Most importantly, typing the contents as ::Int makes a very large performance difference. Part of the reason for this is that closures are laid out during lowering, which does not have access to all the cool optimizations (inference, constant prop, SSA), and the later optimization steps don’t change the layout.
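
A middle ground between the plain closure and the hand-written struct is to capture a typed `Ref` that is never reassigned. The sketch below contrasts the two; the function names are made up, and the field inspection relies on closure internals (the same ones poked at above), so treat it as illustration only.

```julia
# Captured variable is reassigned inside the closure, so it gets boxed.
function make_boxed_counter()
    x = 1
    inc() = (x += 1)
    return inc
end

# Captured variable is a typed Ref that is only mutated, never reassigned.
function make_ref_counter()
    x = Ref(1)
    return () -> (x[] += 1)
end

boxed = make_boxed_counter()
typed = make_ref_counter()

boxed_is_box = boxed.x isa Core.Box              # untyped container
typed_is_ref = typed.x isa Base.RefValue{Int}    # concretely typed container
```

The `Ref` version still pays the extra allocation mentioned above, but the contents are typed as `Int`, which is where most of the performance difference comes from.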

julia> const __binomsum_cache = [[1]];
julia> function binomsum_cache(n::Int, i::Int)
       j=length(__binomsum_cache)
       for k=j+1:n
       push!(__binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
       end
       i!=0 ? __binomsum_cache[n][i] : 0
       end

I don’t think the @pure is valid, though? Something with generated functions and precompilation?

Note that this variant is virtually the same code, but it is much easier to reason about the global mutable state of your module, in the case that precompilation, generated functions, or thread-safety blow up in your face.

Re namespace, you can isolate stuff in nested modules, not export things, or choose scary names, e.g. beginning with double underscore (my adhoc way). But I would like some naming convention for “don’t touch, really don’t” internals that is officially endorsed, used in Base and supported by linters.

Cool! I wonder how that works under the hood (on the JVM level; no big fan of Java, but there is a plethora of languages targeting the JVM).


#51

It is mathematically valid as @pure, since the output is always the same for each n and i.

This is needed so type-computations requiring this value can be pre-allocated without a function call.

Now if the user changed the array by modifying it, then it would not be @pure anymore and break it.

To protect its behavior as @pure, the user needs to be kept away from modifying contents of the array

Fair enough, and I would also put @pure, since it behaves like a pure function.

const binomsum_cache = [[1]]
Base.@pure function binomsum(n::Int, i::Int)
    for k=length(binomsum_cache)+1:n
        push!(binomsum_cache, cumsum([binomial(k,q) for q=0:k]))
    end
    i ≠ 0 ? binomsum_cache[n][i] : 0
end

Previously, I did not use the @pure, so originally it was only one name, but with @pure it is 2 names anyway, and this way it is also 2 names, so I suppose this is better.