I actually wanted to have a sequence of functions stored, like a Vector{FuncStruct} (using the definition from my original post), and to iteratively call them with the output of one being the input to the next (the input to the first one would be exogenously determined). The functions would typically only be called once, before a given Vector{FuncStruct} is discarded and a new one created (using existing functions), but this could be happening thousands upon thousands of times.
In other words, ideally I might have some functions f1, ā¦, fK, that are available to use, and I would make lots of these Vector{FuncStruct} using different subsets of f1, ā¦, fK, so no, ideally I would not be re-compiling f1, ā¦, fK, at all, once theyāve been compiled the first time.
That wonāt happen. The functions will only be compiled once no matter what. But anyways, it doesnāt matter which you do here since either way Vector{FuncStruct} is not strictly type and so there will be a dynamic part in there somewhere. But of course, whether or not that matters depends on the application (i.e. how expensive the function is). If you want it to be fully specialized (which only makes sense if you have <16 functions), then you can do a recursion over a tuple of functions. Otherwise, you might want to consider using FunctionWrappers.jl which I think could make this a bit more optimized while not relying on full specialization.
Understood. Iād just (sort of) grasped that issue myself.
Each function is likely to be operating on thousands of observations of data. I would guess that would be classified as ārelatively expensiveā. Does that sound right?
I will definitely have <16 functions in each sequence of functions, so maybe I should be looking at this. However, the number of functions will vary, so I canāt do, eg Tuple{Function, Function}, since sometimes I might actually want Tuple{Function, Function, Function}. Does Julia have something like VarTuple{Function, 3}, that works sort of like Vararg{Function} (or maybe Vararg{Function} is exactly what youāre referring to hereā¦?
Iāll do some reading on this too. Thanks for the pointer!
Youād have to have a constant number for what I was discussing to work, since youād want to have the recursion unravel at compile time.
Yeah, so as long as you do make sure the result of the function call is inferred properly (but a type assertion on the return), then the total cost of having it dynamic wonāt be much at all (100ns or so per function call?) in comparison to the cost of the functions themselves.
This has been an incredibly helpful conversation, thank you very much.
All the functions in question are type stable, and I also have type assertion on returns so that if I have stuffed up somehow Iāll get an error. And I would guess function runtime will be measured in milliseconds, so 100ns will be barely noticeable.
Iāll also have a think about having a constant number of functions in a tuple, and just padding with Base.identity when I actually want less than that constant number.
Storing a lot of Function objects in a struct is almost always the wrong thing to do. (Often, itās a symptom of someone trying to use OOP syntax in Julia rather than rewriting in a Julia style, but that doesnāt seem to be the case here.)
If you just want to compose a bunch of functions into a new function, why not do exactly that? i.e. do f = f1 ā f2 ā f3 ā f4 (or f = reduce(ā, identity, fcollection) if you have a collection of functions to compose) and then pass the composed function f around as needed? This way, the composed function f will be properly compiled (possibly with inlining) when it is used.
Higher-order functions and anonymous functions are fast in Julia. Take advantage of them!
Alternatively, if you are actually creating āthousands upon thousandsā of functions that are used only once, that suggests to me that what you really have might be a single family of functions that are parameterized in some way, in which case a functor object encapsulating the parameters (not the composed functions) might be more appropriate. Can you say a little more about your use case and why so many different function compositions are arising for you?
I agree with @stevengjās sentiment that you may be going about this the wrong way. Early on, I was certainly guilty of letting OO artifacts from my C++ days creep into my code, and would occasionally contemplate things like this. I am much happier now that I am free of the OO prison and have seen the light of multiple dispatch (my number 1 most beloved aspect of Julia).
That said, FunctionWrappers is extremely useful and in my opinion should really be in Base, since there are probably very few people with the expertise to write or maintain it.
Good question. The broader context of what I am doing is implementing my own general framework for working with reasonably large quantities of time-series data. Letās take the case where I have an object with f1, f2, f3, and f4. In my framework, this is essentially a set of instructions for building a new time-series from (possibly multiple) existing time-series. We apply f1 to some data, f2 to the output of f1 and so on. I could combine them all into f as you suggest, but then when building, Iād need to start from the beginning every time. The vision in my head that Iām trying to implement instead iterates down from f4, and at each stage it stops and checks whether the time-series it is trying to build has already been built before and saved in my database (nothing fancy, just binary files using the file-system for lookup). It might turn out that in this particular case, f2 is quite expensive, and Iāve already built it before. So my routine would ideally iterate down from f4 looking for pre-built series, stop at f2 and say a ha, weāve already done this one before, read in that series from the database, then apply f3 and f4 to it.
Iām am certainly open to suggestions about better ways to implement this vision. This is actually my second attempt at such a framework. My first relied almost entirely on multiple dispatch, and ultimately ended up being too unwieldy (but I was fairly new to the language when I did it). One reason I settled on functions is that my needs from the framework can change quite rapidly, and being able to just write new functions and pass them in is a relatively simple solution for someone of my abilities (Iām reasonably experienced with Julia now, but have essentially zero CS background, ie entirely self-taught, so there are some alarming gaps in my knowledge).
Often it does work out as you say - there is a small family of functions and I use many different parameterizations. In this case I have typically just generated an anonymous function for each parameterization I needed and passed it round. The limitation to this (for me anyway) is that I canāt utilize my trick of saving expensive transformations to the hard-drive, since I canāt generate any consistent naming conventions from anonymous functions. So my more recent solution to this was to generate fixed function names for the parameterizations that currently interest me via meta-programming.
In other cases, each transformation is its own expensive, individual function.
Thanks for taking the time to read this thread and think about my problem. Iāll look into functor objects (which Iād never heard of before - see aforementioned alarming gaps), and anything else you might recommend based on the above.
Honestly I donāt think Iām doing anything OO here. I actually only have a vague understanding of what OO is. I have zero CS background, and used Matlab before coming to Julia, which, based on my understanding, is about as far from an OO language as it is possible to get. Incidentally, as I mentioned in my (much longer) answer to Steven, my first attempt at the framework Iām trying to put together used only multiple dispatch, but it ended up being a little too unwieldy (admittedly I was a lot less experienced with Julia when I did it).
The current idea in my head still uses multiple dispatch quite a lot - all the functions in question are type-stable, and their output has the ability to be saved to a binary format on the SSD, based on the output type, and read in at a later time, again using known type information.
In case itās helpful, the general term for caching the arguments to a function (and only actually doing the computation for new inputs) is āmemoizationā. And, in fact, thereās a Julia package that can do it for you (in simple cases): https://github.com/simonster/Memoize.jl
As for functors, thatās just how we refer to a struct that stores some parameters and is itself callable like a function. Theyāre actually implemented exactly the same way as anonymous functions in Julia, except that we give them a name:
julia> struct Adder
x::Int
end
julia> function (a::Adder)(y) # this is how we make an `Adder` callable like a function
a.x + y
end
julia> a1 = Adder(1)
Adder(1)
julia> a1(2)
3
Storing functors like this instead of anonymous functions may make your code easier to reason about.
1 for FunctionWrappers und base. And i also strongly recommend trying this in your context.
It is often but not always true that one can replace an array of functions with functors or other āJulianā constructs. And even when one can - an array of functions can be very useful at times or just lead to clearer code.
That 's very clear, thank you. And the implementation of Memoize using a Dict is nice and neat too! (I like using things which I have some understanding of how they work internally).
I have absolutely no idea what you are applying this to, but I canāt think of anything in nature that works this way. The most common types of stochastic time series can be thought of as being generated by a ākernelā that simply takes the previous k values, i.e. \langle x_{n+1}\rangle =f(x_{n-k-1},\ldots,x_{n}), for a particularly simple but particularly important example see here. Iām pretty ignorant about chaotic systems but I think that in those cases one can typically consider a small ensemble of initial states and use them to make statistical inferences, which again seems very different than what you are describing. What you are describing sounds either like a āsuper chaoticā system in which you have essentially a completely distinct Wiener step being applied at each point (I canāt imagine where such a thing would occur, but perhaps Iām just not imaginative enough), or, more likely something that can be parametrized much more simply.
Perhaps Iām simply out of my depth here, but I find that often if something seems to call for a highly unorthodox implementation it is because Iām thinking about it in some convoluted way.
By the way, for what itās worth, and at the risk of getting to off-topic, the only case Iāve encountered in Julia in which I really had to do something like this where I store lots of functions in a container is in this simulator of a MOS 6502 CPU in which I have to be able to efficiently call arbitrary functions at run-time (essentially each time the simulated CPU processes an instruction). Even then, I could have used metaprogramming to ātrans-compileā 6502 machine instructions into Julia, which I sort of wound up doing post-hoc anyway. On the bright side, I do believe that my final result using function wrappers was efficient enough that I could use it to create a real NES emulator (which would have been totally impossible without that package).
I probably shouldnāt have said ātime-seriesā as I think Iāve sent you off on the wrong track. While many of the transformations Iām interested in are as you suggest (albeit multivariate and often asynchronous), what Iām actually trying to build at the moment is more general. Iām essentially trying to build a framework for myself that will facilitate the investigation of new ideas with minimal additional coding from me. This basically means being able to apply, and mix, new transformations to my pre-existing database, with the possibility of storing the output of some of these new transformations (the computationally expensive ones) in the database so they in turn can be arbitrarily called and transformed again down the track.
My background is in time-series, so I tend to think of everything as time-series, but in this framework, some of the ādataā that is stored will be non-numeric, eg strings, booleans, modelling parameters, actual models, etc, albeit it is all stored with an index type that is either DateTime, or some variant on DateTime, eg an interval of DateTime etc.
I guess part of the misunderstanding here is that I canāt really see what is unorthodox about my current approach to this problem. In my head, it is the most rational. Iāve have something, I want to use it as input to a transformation, possibly save the output for later use, or possibly pass it on to another transformation. Phrasing the framework for this as a bunch of type-stable functions seemed quite natural to me
Yes, in a way this is not that far off what I am attempting to do. Another way of thinking of it is that Iām really trying to write a nice concise language that is specified to exactly the kind of research that I do. However, without a huge investment of learning time on my part, I donāt really have the skills to implement something too fancy, or abstract. Hence this is why Iām focusing on something that (to me) seems relatively simple - a framework for passing transformations around to interact on data of known type, and sometimes store the output.
Just thought you might be interested to know: Iāve seen plenty of your and ExpandingManās comments on here before, and always thought highly of them. So I took a day off and came back with fresh eyes and now agree with you that passing functions around in structs was going to be a mistake. Iāve re-drafted the framework using dedicated transformation types and have come up with something that I think is a Pareto improvement in flexibility on what I had before (and now uses Juliaās dispatch system for everything). In summary, many thanks. You may have saved me some real grief down-the-line.