Why do functions with default values become boxed if returned from another function?

Hi,

I’m completely new to Julia and I’m trying to reuse some of my tricks from MATLAB to prevent my programs from degenerating into passing around deep hierarchical state struct arrays and having a lot of code doing index alignment.

I found this post which had a neat trick using closures to achieve what I (think I) want (see first answer):
https://stackoverflow.com/questions/39133424/how-to-create-a-single-dispatch-object-oriented-class-in-julia-that-behaves-l

However, I found that when I add default values to the returned functions they become boxed. What is the reason for this and is it a sign of something bad?

For example:

function funcs()
    function boxed(str::String = "default")
        println(str)
    end
    function notboxed(str::String)
        println(str)
    end

    return () -> (boxed; notboxed)
end

julia> fs = funcs()
#205 (generic function with 1 method)

julia> fs.boxed
Core.Box(getfield(Main, Symbol("#boxed#206"))())

julia> fs.notboxed
(::getfield(Main, Symbol("#notboxed#207"))) (generic function with 1 method)

Adding an unboxing method lets me circumvent this, so it appears as if the functions work as intended:

function funcs()
    function boxed(str::String = "default")
        println(str)
    end
    function notboxed(str::String)
        println(str)
    end

    return unbox(() -> (boxed; notboxed))
end

function unbox(funcs)
    unboxed = funcs.boxed.contents
    notboxed = funcs.notboxed
    () -> (unboxed, notboxed)
end

julia> fsu = funcs()
#215 (generic function with 1 method)

julia> fsu.unboxed
(::getfield(Main, Symbol("#boxed#212"))) (generic function with 2 methods)

julia> fsu.unboxed()
default

It does however feel a bit silly to always have to wrap things in unboxers like this. Is there a way to avoid the boxing?

I think the short answer to this is probably “don’t do this”. Julia is not Java, and an object-oriented style is unlikely to be as performant or as easy to work with as a more idiomatic approach. The language itself doesn’t guarantee that captured variables in closures won’t be boxed like this (see “performance of captured variables in closures”, issue #15276 in JuliaLang/julia on GitHub), so I think you’ll find this a difficult road to go down.
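As a small illustration of the boxing mentioned above, here is a sketch along the lines of the example in the Performance Tips section of the Julia manual. The names abmult and abmult_let are only for illustration, and the sketch shows boxing caused by reassigning a captured variable, which may or may not be the same mechanism that boxes the default-argument function in the question.

```julia
# `r` is reassigned before it is captured, so the closure holds it in a Core.Box.
function abmult(r::Int)
    if r < 0
        r = -r
    end
    return x -> x * r
end

# Rebinding `r` in a `let` block gives the closure a fresh variable that is
# assigned exactly once, so it can be stored unboxed.
function abmult_let(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end

f = abmult(-3)
g = abmult_let(-3)

# Captured variables are stored as fields of the closure object.
println(getfield(f, :r) isa Core.Box)   # boxed
println(getfield(g, :r) isa Int)        # plain Int
```

Both closures compute the same result; the difference is only in how the captured variable is stored, which is what matters for type inference and performance.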

Fortunately, there are a lot of other ways to solve problems of structuring code in Julia that don’t require these kinds of shenanigans. If you can describe more about what the actual problem you want to solve is, I’m sure we can help figure something out.


+1 to @rdeits’s sentiments: try to write julian code. It’s fun!

I’m not sure why your example does not work. It works if you use anonymous functions:

julia> function funcs()
           f1 = function (str::String = "default")
               println(str)
           end
           f2 = function (str::String)
               println(str)
           end

           return () -> (f1, f2)
       end

Also, are you sure you didn’t mean:

julia> function funcs()
           f1 = function (str::String = "default")
               println(str)
           end
           f2 = function (str::String)
               println(str)
           end

           return f1, f2
       end

Your example returns a function which returns two functions (but does not modify/adjust them in any way).

Thanks for the replies. Despite being an OOP fanboy, I’m definitely wanting to learn how to do things the julian way and see how easy things can be if I let go of old truths. I will try to study the link you sent me. It was a bit too much to extract anything from just skimming it.

On a high level, (and this is probably OOP damage) I often find that whenever I try FP, I end up with functions where some arguments are associated with some configuration or context and others are not. It gets especially nasty when some of the configuration parameters dictate what function shall be used to carry out a certain task and goes into overdrive nastiness when the structures are hierarchical.

As I said, in MATLAB I usually end up using (curried) function variables to deal with this, but I understand that this is basically OOP emulation. I found a lot of instances where people are saying that context.f(data) can just be replaced by f(context, data), but I can’t seem to figure out how to deal with the two nasty cases mentioned above. What if f (or some function which f will call) is different for different contexts, for instance? What if two contexts need to represent their data in different ways? I understand I could define f for different types of contexts, but I can’t see how this would help me. What if f is the same for many different contexts, for example?

Concretely, in this case I’m trying to build a simple logfile parser which will allow me to explore data from (a lot of) logfiles, e.g. by plotting x vs y, CDFs of z when x > k etc., in the REPL. I know there are ready-made solutions for this, but in this case I’m trying to learn something.

Each line in the file could be from a different log and each logline looks something like this [timestamp], LogId, …., param0 = 1234, param2 = {12 34}, param3=etc. up to maybe 100 parameters per line. The lines don’t really have a uniform format so a generic (\w*)=(\w*) won’t work. Furthermore, the values of the parameters are not in any particular format either. Some are plain numbers, but others are in hex or even stranger, requiring a custom function to parse them (in some cases even requiring the value of another parameter on the same line to make sense of it).

Putting this into context with the above, I have the following hierarchy :

  • LogIds I’m interested in
    • Params I’m interested in for a given LogId
      • Function to parse the value of a given parameter

The straightforward approach is to just have code like this:

regexLogId0 = r"timestamppattern, LogId0.*paramX=(\d*).*paramY={(\d*) (\d*)} etc."

together with code to handle each capture based on which parameter it matches. This becomes cumbersome to work with, as I might want to add a parameter between paramX and paramY, and that would change the group indices.
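One thing that may help with the shifting group indices is named capture groups, which Julia's regexes support. The sketch below uses a made-up log line and made-up group names (x, y1, y2), so treat it as an illustration under those assumptions rather than as the format of the real logs.

```julia
# Hypothetical log line, for illustration only.
line = "[h=03 m=13] LogId0, paramX=42, paramY={12 34}"

# Each group is addressed by name, so inserting another group between
# paramX and paramY later does not change how existing groups are read.
rx = r"LogId0.*paramX=(?<x>\d*).*paramY=\{(?<y1>\d*) (?<y2>\d*)\}"
m = match(rx, line)

x  = parse(Int, m[:x])      # index the match with a Symbol ...
y1 = parse(Int, m[:y1])
y2 = parse(Int, m["y2"])    # ... or with a String

println((x, y1, y2))
```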

Using the approach in the OP, I could make it so that adding a parameter to parse was a single line like this: logIdContext.add(, "paramX=(\d*)", parsefunction) and then I can just “press play” again to have all the data. What is the Julian way to achieve the same or a similar thing?

@mauro3: Thanks for the tip with the anonymous functions. The reason for the weird () -> (...) is that it makes the functions named. In your second example you just get a tuple and need to use indexing to call the right function. I have no idea why it works this way; I just copied it from the SO suggestion.

Apologies, I haven’t yet had time to grok your actual application and suggest an alternative approach, but there is at least a very easy way to return named functions, which is to use a NamedTuple. Such a thing didn’t exist when the SO post you cite was written, but it does now:

julia> function funs()
         function default(str="default")
           println(str)
         end
         function nodefault(str)
           println(str)
         end
         (default=default, nodefault=nodefault)
       end
funs (generic function with 1 method)

julia> fs = funs()
(default = getfield(Main, Symbol("#default#7"))(), nodefault = getfield(Main, Symbol("#nodefault#8"))())

julia> fs.default()
default

julia> fs.nodefault("hello")
hello

You can do essentially the same thing in Julia, e.g.

addparam(logparser, "SomeLogID", paramX=r"\d*", parsefunction)

where logparser is some datatype that contains, for example, a dictionary mapping log IDs to parameter info, where the latter is another dictionary mapping strings "paramX" to (regex, parsefunction) tuples. Then you could have another function

logentry = parse(logparser, line)

that parses a line from a file and returns the log entry in some format.
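A minimal sketch of what such a datatype could look like, assuming the nested-dictionary layout described above. The names LogParser, addparam! and parseline are hypothetical (parseline is used instead of parse only to avoid extending Base.parse in a sketch), and the name of the parameter is passed as a plain positional string.

```julia
# Hypothetical container: log ID => (parameter name => (regex, parse function)).
struct LogParser
    params::Dict{String,Dict{String,Tuple{Regex,Function}}}
end
LogParser() = LogParser(Dict{String,Dict{String,Tuple{Regex,Function}}}())

# Register one parameter for one log ID; by default the capture is read as an Int.
function addparam!(lp::LogParser, logid::String, name::String,
                   rx::Regex, parsefun=s -> parse(Int, s))
    table = get!(lp.params, logid) do
        Dict{String,Tuple{Regex,Function}}()
    end
    table[name] = (rx, parsefun)
    return lp
end

# Return `logid => Dict(name => value)` for the first log ID found in `line`,
# or `nothing` if no registered log ID occurs in it.
function parseline(lp::LogParser, line::AbstractString)
    for (logid, table) in lp.params
        occursin(logid, line) || continue
        entry = Dict{String,Any}()
        for (name, (rx, parsefun)) in table
            m = match(rx, line)
            m === nothing || (entry[name] = parsefun(m.captures[1]))
        end
        return logid => entry
    end
    return nothing
end

lp = LogParser()
addparam!(lp, "logID:0", "parA", r"parA=(\d+)")
addparam!(lp, "logID:0", "parB", r"parB=([0-9a-f]+)", s -> parse(Int, s, base=16))

id, entry = parseline(lp, "[h=03 m=13 s=10 ms=1234] logID:0 parA=13 parB=ff")
println(id, " => ", entry)
```

This version runs one regex per parameter, which keeps the sketch short; combining them into a single pattern per log ID is a separate design choice.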

At a superficial level the only difference here is spelling: instead of writing object.method(args...) you write method(object, args...). The advantage of the latter, however, is that:

  • It enables multiple dispatch: if you want, you can dispatch to a method implementation based on all of the argument types, not just on the type of object.

  • You can add new functions for existing types that still participate in dynamic dispatch, without modifying the type’s implementation (which may be in Base or some other package). (In Python etcetera, I can write a function foo(object) that acts on someone else’s object class, but I can only write one foo function — if I want to dispatch to different foo implementations based on the type of object then I have to do manual introspection. For example, the plot(x,y…) function in Matplotlib has to do manual dispatch like this.)
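To make the two points above concrete in the log-parsing setting of this thread, here is a small sketch. All type and function names (AbstractContext, DecimalContext, HexContext, ScaledContext, parsevalue) are hypothetical. It shows one fallback method shared by many context types, a few context-specific overrides, and a method that dispatches on the second argument instead.

```julia
abstract type AbstractContext end

struct DecimalContext <: AbstractContext end
struct HexContext <: AbstractContext end
struct ScaledContext <: AbstractContext
    factor::Int
end

# Shared behaviour: any context without a more specific method uses this one.
parsevalue(::AbstractContext, s::AbstractString) = parse(Int, s)

# Context-specific behaviour: only these types differ from the default.
parsevalue(::HexContext, s::AbstractString) = parse(Int, s, base=16)
parsevalue(c::ScaledContext, s::AbstractString) = c.factor * parse(Int, s)

# Dispatch on the second argument: a vector of strings is parsed elementwise,
# reusing whichever single-string method applies to the context.
parsevalue(c::AbstractContext, ss::AbstractVector) = [parsevalue(c, s) for s in ss]

println(parsevalue(DecimalContext(), "10"))
println(parsevalue(HexContext(), "10"))
println(parsevalue(ScaledContext(1000), "10"))
println(parsevalue(HexContext(), ["a", "b"]))
```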


Thanks again for putting up with me. The NamedTuple was neat as it retains the ability to have multiple functions with the same name but different signatures.

I think the original concrete question was answered through the linked issue as well as the above.

I would love to stick around and discuss how to do what I want in a way which is more harmonious with the intents behind the language.

My example with logIdContext.add was too simplistic to point out my gripes. The add method is just a convenience function for the simple (but most common) case of a single parameter. As a simple example, look at the time stamp; it consists of something like hour=04, minute=23, second=31, millisecond=1234. To get something useful to plot on the time axis, I need to convert these four parameters to a single millisecond value. With the addparam approach, it seems like I would need to create a new function to parse the time stamp into a single value and probably also change the structure of the logparser (unless I want to deal with it in postprocessing, but I don’t :) ).

What I do now is that logIdContext.add("paramX=(\d*)", parsefunction) in turn just calls add(SimpleParam("paramX=(\d*)", parsefunction)). For the timestamp, I can do something like logIdContext.add(SumParams(SimpleParam("hour=(\d*)", hour2ms), SimpleParam("minute=(\d*)", minute2ms),…).

Also, (and I’m not sure it matters) I don’t want to evaluate a regex multiple times for the same line for speed reasons, so both SimpleParam and SumParams can 1) concatenate the regexps so that logIdContext has a single regex to match each line with and 2) store which capture(s) to use based on the “next capture” (which is maintained by the logIdContext).

Example code from logIdContext:

    function add(data)
        # Immutability for immutability's sake. Can't say it is any better than a mutable solution
        newlogdata = Dict(logdata)
        newlogdata[data.name()] = data
        (newpattern, newnext_group) = data.concat_pattern!(pattern, next_group)
        return LogIdContext(newpattern, name, newnext_group, newlogdata)
    end

I can almost see how I can achieve the same thing using multiple dispatch, but the penny has not dropped.

Btw, I’m no fan of Python or duck typing in general, so I don’t have a problem with types or multiple dispatch (unless it means something else in Julia than the usual same function name with different signatures).

The solution was ported from a Python script I wrote, which was too slow to be practical. The Julia version runs much, much faster. I’m not sure whether this is due to Julia itself or just because the IPython REPL is slow somehow, as I wouldn’t expect regexps and file I/O to differ much between languages.

The snippet you posted relies on globals (data, pattern, etc.), so it is hard to reason about it and formulate an idiomatic solution. Can you post something self-contained, but under, say, 25 LOC?

Sorry about being too brief. Here is the full version (convenience methods removed):

function LogContextBuilder(
    pattern::String,
    name::String="",  # was `Nothing`, which is a type and not a valid String default
    next_capturenr::Integer=1,
    logdata::Dict{String, Any}=Dict{String, Any}())

    function add(data)
        # Immutability for immutability's sake. Can't say it is any better than a mutable solution
        newlogdata = Dict(logdata)
        newlogdata[data.name()] = data
        (newpattern, newnext_capturenr) = data.concat_pattern!(pattern, next_capturenr)
        return LogContextBuilder(newpattern, name, newnext_capturenr, newlogdata)
    end

    build() = LogContext(Regex(pattern), logdata, name)

    return  (add=add, build=build)
end

Here are the other players (although this breaks the 25 LOC limit):

function LogContext(pattern, logdata, name)

    function parse!(line)
        matcher = match(pattern, line)
        if !isnothing(matcher)
            extract(matcher)
            return true
        end
        return false
    end

    function extract(matcher)
        for data in values(logdata)
            data.extract!(matcher)
        end
    end

    get(name) = logdata[name].get()

    return (parse! = parse!, get=get)
end



function SimpleParam(_name::String, pattern::String, parsefun=str->parse(Int64, str))
    name() = _name
    capturenr = -1
    data = [] # Result will be stored in here

    function concat_pattern!(whole_pattern, next_capturenr)
        capturenr = next_capturenr
        return (whole_pattern * ".*" * pattern , next_capturenr+1)
    end

    extract!(matcher) = push!(data, parsefun(matcher.captures[capturenr]))

    get() = data

    return (name=name, concat_pattern! = concat_pattern!, extract! = extract!, get=get)
end

function Time()
    name() = "time"
    timepars = [SimpleParam("h", "h=(\\d*)", str -> parse(Int64, str) * 3600_000), # hour2ms
                SimpleParam("m", "m=(\\d*)", str -> parse(Int64, str) * 60_000), #minute2ms
                SimpleParam("s", "s=(\\d*)", str -> parse(Int64, str) * 1000), #second2ms
                SimpleParam("ms", "ms=(\\d*)")]

    function concat_pattern!(whole_pattern, next_group)
        p = ""
        g = next_group
        for param in timepars
            p,g = param.concat_pattern!(p, g)
        end
        return (p * ".*" * whole_pattern, g)
    end

    function extract!(matcher)
        foreach(param -> param.extract!(matcher), timepars)
    end

    function get()
        mapreduce(param -> param.get(), (x,y) -> x .+ y , timepars)
    end
    return (name=name, concat_pattern! = concat_pattern!, extract! = extract!, get=get)
end

As you can see it is OOP emulation without the clarity and safety of interfaces. For this simple throw away code it doesn’t matter though, but it would be good to know how to do things the right way.

Testing code:

testStrLogId0 = "[h=03 m=13 s=10 ms=1234] logID:0 parA=13 parB=666"
testStrLogId1 = "[h=03 m=13 s=10 ms=1234] logID:1 parC=13 parD=666"

# `name` is a positional argument of LogContextBuilder, so it is passed positionally here
logId0Context = LogContextBuilder("logID:0", "Log0").
                add(Time()).
                add(SimpleParam("B", "parB=(\\d*)")).
                build()

logId1Context = LogContextBuilder("logID:1", "Log1").
                add(Time()).
                add(SimpleParam("C", "parC=(\\d*)")).
                build()

success = logId0Context.parse!(testStrLogId0)
println("got match: $success" )

Bs = logId0Context.get("B")
println("Bs: $Bs")

ts = logId0Context.get("time")
println("ts: $ts")

##Etc for logId1Context
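
For comparison, here is one possible sketch of the same design using structs and multiple dispatch instead of closures. It is not a definitive answer: the names AbstractPiece, Literal, build_context, timestamp, values_of and getvalues are invented for this sketch, values are assumed to be integers, and pieces are assumed to be listed in the order they appear on the line (which is why the log ID is passed as a Literal after the timestamp).

```julia
abstract type AbstractPiece end

struct Literal <: AbstractPiece          # text that must match, no capture
    text::String
end

mutable struct SimpleParam <: AbstractPiece
    name::String
    pattern::String
    parsefun::Function
    capturenr::Int
    data::Vector{Int}                    # parsed values accumulate here
end
SimpleParam(name, pattern, parsefun=s -> parse(Int, s)) =
    SimpleParam(name, pattern, parsefun, 0, Int[])

struct SumParams <: AbstractPiece        # several captures summed into one value
    name::String
    parts::Vector{SimpleParam}
end

# Append a piece to the pattern; return the new pattern and next free group number.
concat_pattern!(p::Literal, whole, next) = (whole * ".*" * p.text, next)
function concat_pattern!(p::SimpleParam, whole, next)
    p.capturenr = next
    return (whole * ".*" * p.pattern, next + 1)
end
function concat_pattern!(p::SumParams, whole, next)
    for part in p.parts
        whole, next = concat_pattern!(part, whole, next)
    end
    return (whole, next)
end

extract!(::Literal, m) = nothing
extract!(p::SimpleParam, m) = push!(p.data, p.parsefun(m.captures[p.capturenr]))
extract!(p::SumParams, m) = foreach(part -> extract!(part, m), p.parts)

values_of(p::SimpleParam) = p.data
values_of(p::SumParams) = mapreduce(values_of, (x, y) -> x .+ y, p.parts)

piecename(::Literal) = nothing
piecename(p::AbstractPiece) = p.name

struct LogContext
    regex::Regex
    pieces::Vector{AbstractPiece}
end

# Build a single regex for the whole line from the pieces, in line order.
function build_context(pieces::AbstractPiece...)
    pattern, next = "", 1
    for piece in pieces
        pattern, next = concat_pattern!(piece, pattern, next)
    end
    return LogContext(Regex(pattern), AbstractPiece[pieces...])
end

function parse!(ctx::LogContext, line)
    m = match(ctx.regex, line)
    m === nothing && return false
    foreach(p -> extract!(p, m), ctx.pieces)
    return true
end

function getvalues(ctx::LogContext, name)
    i = findfirst(p -> piecename(p) == name, ctx.pieces)
    i === nothing && throw(KeyError(name))
    return values_of(ctx.pieces[i])
end

timestamp() = SumParams("time", [
    SimpleParam("h", "h=(\\d*)", s -> parse(Int, s) * 3_600_000),
    SimpleParam("m", "m=(\\d*)", s -> parse(Int, s) * 60_000),
    SimpleParam("s", "s=(\\d*)", s -> parse(Int, s) * 1_000),
    SimpleParam("ms", "ms=(\\d*)"),
])

ctx = build_context(timestamp(), Literal("logID:0"), SimpleParam("B", "parB=(\\d*)"))
ok0 = parse!(ctx, "[h=03 m=13 s=10 ms=1234] logID:0 parA=13 parB=666")
ok1 = parse!(ctx, "[h=03 m=13 s=10 ms=1234] logID:1 parC=13 parD=666")
println((ok0, ok1, getvalues(ctx, "B"), getvalues(ctx, "time")))
```

The interface that was implicit in the closures (concat_pattern!, extract!, and a way to read the values) becomes a set of generic functions, and adding a new kind of parameter means adding a struct plus methods for those functions without touching LogContext.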