Best practice for storing functions in a struct


#1

Given that Function is an abstract type, I was wondering whether the following will impact performance on future function calls:

struct FuncStruct
    f::Function
end

Should I instead be doing the following?

struct FuncStruct{T<:Function}
    f::T 
end

I think I should definitely be doing the latter for a mutable struct, but is the former okay for a struct?

Thanks in advance to all responders.

Colin


#2

Depends on what you’re doing. Do you want functions to recompile for each new function, or not?


#3

I actually wanted to have a sequence of functions stored, like a Vector{FuncStruct} (using the definition from my original post), and to iteratively call them with the output of one being the input to the next (the input to the first one would be exogenously determined). The functions would typically only be called once, before a given Vector{FuncStruct} is discarded and a new one created (using existing functions), but this could be happening thousands upon thousands of times.

In other words, ideally I might have some functions f1, …, fK, that are available to use, and I would make lots of these Vector{FuncStruct} using different subsets of f1, …, fK, so no, ideally I would not be re-compiling f1, …, fK, at all, once they’ve been compiled the first time.

Does this make sense?

Cheers,

Colin


#4

That won’t happen. The functions will only be compiled once no matter what. But anyways, it doesn’t matter which you do here since either way Vector{FuncStruct} is not strictly type and so there will be a dynamic part in there somewhere. But of course, whether or not that matters depends on the application (i.e. how expensive the function is). If you want it to be fully specialized (which only makes sense if you have <16 functions), then you can do a recursion over a tuple of functions. Otherwise, you might want to consider using FunctionWrappers.jl which I think could make this a bit more optimized while not relying on full specialization.


#5

Understood. I’d just (sort of) grasped that issue myself.

Each function is likely to be operating on thousands of observations of data. I would guess that would be classified as “relatively expensive”. Does that sound right?

I will definitely have <16 functions in each sequence of functions, so maybe I should be looking at this. However, the number of functions will vary, so I can’t do, eg Tuple{Function, Function}, since sometimes I might actually want Tuple{Function, Function, Function}. Does Julia have something like VarTuple{Function, 3}, that works sort of like Vararg{Function} (or maybe Vararg{Function} is exactly what you’re referring to here…?

I’ll do some reading on this too. Thanks for the pointer!


#6

You’d have to have a constant number for what I was discussing to work, since you’d want to have the recursion unravel at compile time.

Yeah, so as long as you do make sure the result of the function call is inferred properly (but a type assertion on the return), then the total cost of having it dynamic won’t be much at all (100ns or so per function call?) in comparison to the cost of the functions themselves.


#7

This has been an incredibly helpful conversation, thank you very much.

All the functions in question are type stable, and I also have type assertion on returns so that if I have stuffed up somehow I’ll get an error. And I would guess function runtime will be measured in milliseconds, so 100ns will be barely noticeable.

I’ll also have a think about having a constant number of functions in a tuple, and just padding with Base.identity when I actually want less than that constant number.

Many thanks again.

Cheers,

Colin


#8

Storing a lot of Function objects in a struct is almost always the wrong thing to do. (Often, it’s a symptom of someone trying to use OOP syntax in Julia rather than rewriting in a Julia style, but that doesn’t seem to be the case here.)

If you just want to compose a bunch of functions into a new function, why not do exactly that? i.e. do f = f1 ∘ f2 ∘ f3 ∘ f4 (or f = reduce(∘, identity, fcollection) if you have a collection of functions to compose) and then pass the composed function f around as needed? This way, the composed function f will be properly compiled (possibly with inlining) when it is used.

Higher-order functions and anonymous functions are fast in Julia. Take advantage of them!

Alternatively, if you are actually creating “thousands upon thousands” of functions that are used only once, that suggests to me that what you really have might be a single family of functions that are parameterized in some way, in which case a functor object encapsulating the parameters (not the composed functions) might be more appropriate. Can you say a little more about your use case and why so many different function compositions are arising for you?


#9

I agree with @stevengj’s sentiment that you may be going about this the wrong way. Early on, I was certainly guilty of letting OO artifacts from my C++ days creep into my code, and would occasionally contemplate things like this. I am much happier now that I am free of the OO prison and have seen the light of multiple dispatch (my number 1 most beloved aspect of Julia).

That said, FunctionWrappers is extremely useful and in my opinion should really be in Base, since there are probably very few people with the expertise to write or maintain it.


#10

Good question. The broader context of what I am doing is implementing my own general framework for working with reasonably large quantities of time-series data. Let’s take the case where I have an object with f1, f2, f3, and f4. In my framework, this is essentially a set of instructions for building a new time-series from (possibly multiple) existing time-series. We apply f1 to some data, f2 to the output of f1 and so on. I could combine them all into f as you suggest, but then when building, I’d need to start from the beginning every time. The vision in my head that I’m trying to implement instead iterates down from f4, and at each stage it stops and checks whether the time-series it is trying to build has already been built before and saved in my database (nothing fancy, just binary files using the file-system for lookup). It might turn out that in this particular case, f2 is quite expensive, and I’ve already built it before. So my routine would ideally iterate down from f4 looking for pre-built series, stop at f2 and say a ha, we’ve already done this one before, read in that series from the database, then apply f3 and f4 to it.

I’m am certainly open to suggestions about better ways to implement this vision. This is actually my second attempt at such a framework. My first relied almost entirely on multiple dispatch, and ultimately ended up being too unwieldy (but I was fairly new to the language when I did it). One reason I settled on functions is that my needs from the framework can change quite rapidly, and being able to just write new functions and pass them in is a relatively simple solution for someone of my abilities (I’m reasonably experienced with Julia now, but have essentially zero CS background, ie entirely self-taught, so there are some alarming gaps in my knowledge).

Often it does work out as you say - there is a small family of functions and I use many different parameterizations. In this case I have typically just generated an anonymous function for each parameterization I needed and passed it round. The limitation to this (for me anyway) is that I can’t utilize my trick of saving expensive transformations to the hard-drive, since I can’t generate any consistent naming conventions from anonymous functions. So my more recent solution to this was to generate fixed function names for the parameterizations that currently interest me via meta-programming.

In other cases, each transformation is its own expensive, individual function.

Thanks for taking the time to read this thread and think about my problem. I’ll look into functor objects (which I’d never heard of before - see aforementioned alarming gaps), and anything else you might recommend based on the above.

Cheers,

Colin


#11

Honestly I don’t think I’m doing anything OO here. I actually only have a vague understanding of what OO is. I have zero CS background, and used Matlab before coming to Julia, which, based on my understanding, is about as far from an OO language as it is possible to get. Incidentally, as I mentioned in my (much longer) answer to Steven, my first attempt at the framework I’m trying to put together used only multiple dispatch, but it ended up being a little too unwieldy (admittedly I was a lot less experienced with Julia when I did it).

The current idea in my head still uses multiple dispatch quite a lot - all the functions in question are type-stable, and their output has the ability to be saved to a binary format on the SSD, based on the output type, and read in at a later time, again using known type information.

Cheers, and thanks for responding.

Colin


#12

In case it’s helpful, the general term for caching the arguments to a function (and only actually doing the computation for new inputs) is “memoization”. And, in fact, there’s a Julia package that can do it for you (in simple cases): https://github.com/simonster/Memoize.jl

As for functors, that’s just how we refer to a struct that stores some parameters and is itself callable like a function. They’re actually implemented exactly the same way as anonymous functions in Julia, except that we give them a name:

julia> struct Adder
         x::Int
       end

julia> function (a::Adder)(y)  # this is how we make an `Adder` callable like a function
         a.x + y
       end

julia> a1 = Adder(1)
Adder(1)

julia> a1(2)
3

Storing functors like this instead of anonymous functions may make your code easier to reason about.


#13
  • 1 for FunctionWrappers und base. And i also strongly recommend trying this in your context.

It is often but not always true that one can replace an array of functions with functors or other „Julian“ constructs. And even when one can - an array of functions can be very useful at times or just lead to clearer code.


#14

That 's very clear, thank you. And the implementation of Memoize using a Dict is nice and neat too! (I like using things which I have some understanding of how they work internally).

Cheers,

Colin


#15

I have absolutely no idea what you are applying this to, but I can’t think of anything in nature that works this way. The most common types of stochastic time series can be thought of as being generated by a “kernel” that simply takes the previous k values, i.e. \langle x_{n+1}\rangle =f(x_{n-k-1},\ldots,x_{n}), for a particularly simple but particularly important example see here. I’m pretty ignorant about chaotic systems but I think that in those cases one can typically consider a small ensemble of initial states and use them to make statistical inferences, which again seems very different than what you are describing. What you are describing sounds either like a “super chaotic” system in which you have essentially a completely distinct Wiener step being applied at each point (I can’t imagine where such a thing would occur, but perhaps I’m just not imaginative enough), or, more likely something that can be parametrized much more simply.

Perhaps I’m simply out of my depth here, but I find that often if something seems to call for a highly unorthodox implementation it is because I’m thinking about it in some convoluted way.

By the way, for what it’s worth, and at the risk of getting to off-topic, the only case I’ve encountered in Julia in which I really had to do something like this where I store lots of functions in a container is in this simulator of a MOS 6502 CPU in which I have to be able to efficiently call arbitrary functions at run-time (essentially each time the simulated CPU processes an instruction). Even then, I could have used metaprogramming to “trans-compile” 6502 machine instructions into Julia, which I sort of wound up doing post-hoc anyway. On the bright side, I do believe that my final result using function wrappers was efficient enough that I could use it to create a real NES emulator (which would have been totally impossible without that package).


#16

I probably shouldn’t have said “time-series” as I think I’ve sent you off on the wrong track. While many of the transformations I’m interested in are as you suggest (albeit multivariate and often asynchronous), what I’m actually trying to build at the moment is more general. I’m essentially trying to build a framework for myself that will facilitate the investigation of new ideas with minimal additional coding from me. This basically means being able to apply, and mix, new transformations to my pre-existing database, with the possibility of storing the output of some of these new transformations (the computationally expensive ones) in the database so they in turn can be arbitrarily called and transformed again down the track.

My background is in time-series, so I tend to think of everything as time-series, but in this framework, some of the “data” that is stored will be non-numeric, eg strings, booleans, modelling parameters, actual models, etc, albeit it is all stored with an index type that is either DateTime, or some variant on DateTime, eg an interval of DateTime etc.

I guess part of the misunderstanding here is that I can’t really see what is unorthodox about my current approach to this problem. In my head, it is the most rational. I’ve have something, I want to use it as input to a transformation, possibly save the output for later use, or possibly pass it on to another transformation. Phrasing the framework for this as a bunch of type-stable functions seemed quite natural to me :slight_smile:


#17

Maybe what you’re looking to do is instead build an AST to transform models, a la SciCompDSL.jl?


#18

I’m probably doing it wrong, but out of interest
added - closer to working than expected =)

mutable struct funcAdder
  x::Array{Function,1}; ##();
  funcAdder() = new([]);
  end;

function (a::funcAdder)(newFunc::Function)
  push!(a.x, newFunc);
  end;

x = funcAdder();
x(z->z);
x.x[1](1) 
> 1

#19

Yes, in a way this is not that far off what I am attempting to do. Another way of thinking of it is that I’m really trying to write a nice concise language that is specified to exactly the kind of research that I do. However, without a huge investment of learning time on my part, I don’t really have the skills to implement something too fancy, or abstract. Hence this is why I’m focusing on something that (to me) seems relatively simple - a framework for passing transformations around to interact on data of known type, and sometimes store the output.

Cheers,

Colin


#20

Just thought you might be interested to know: I’ve seen plenty of your and ExpandingMan’s comments on here before, and always thought highly of them. So I took a day off and came back with fresh eyes and now agree with you that passing functions around in structs was going to be a mistake. I’ve re-drafted the framework using dedicated transformation types and have come up with something that I think is a Pareto improvement in flexibility on what I had before (and now uses Julia’s dispatch system for everything). In summary, many thanks. You may have saved me some real grief down-the-line.

Cheers,

Colin