I have a function where I pass a lot of arguments for performance. These generally come from other structs and are Refs or vectors. I have to pass these arguments around, which is slightly bothersome. I thought I'd be clever and wrap them in a struct using a macro, and then write another macro to automatically define them within the functions that use them. I wrote the following macros:
macro createArgStruct(name, args...)
    startstr = "struct $name{"
    argstr = ""
    for idx in eachindex(args)
        argstr *= "\t$(args[idx])::T$idx\n"
        startstr *= "T$idx,"
    end
    startstr = startstr[1:end-1]
    startstr *= "}\n"
    startstr *= argstr
    startstr *= "end"
    return esc(Meta.parse(startstr))
end
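For concreteness, the call @createArgStruct ArgStruct someRef arg1 arg2 arg3 used below should expand to a fully parametric struct, roughly:

struct ArgStruct{T1,T2,T3,T4}
    someRef::T1
    arg1::T2
    arg2::T3
    arg3::T4
end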
macro registerStructVars(varname, structname)
    vars = quote end
    objectname = string(varname)
    for name in fieldnames(eval(structname))
        push!(vars.args, Meta.parse("$name = $objectname.$name"))
    end
    return esc(vars)
end
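And inside a function, @registerStructVars argStruct ArgStruct should expand to plain assignments, roughly:

someRef = argStruct.someRef
arg1 = argStruct.arg1
arg2 = argStruct.arg2
arg3 = argStruct.arg3

(Note that this relies on eval at expansion time, so ArgStruct must already be defined when the macro is used.)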
Now the arguments are used in some while loop:
const someRef = Ref(true)

@createArgStruct ArgStruct someRef arg1 arg2 arg3

createArgStruct(otherStruct) = ... # fill in the args

function mainAlgo(someRef, otherStruct, argStruct)
    @registerStructVars argStruct ArgStruct
    while someRef[]
        # do function for some of the arguments
    end
    mainAlgo(someRef, otherStruct, createArgStruct(otherStruct))
end
The idea of structuring the main loop this way is that the arguments can change the type of algorithm that is run, but I don't want to check for changes within the loop, for performance. Thus, if something is changed, I break out of the loop and start a new one with the updated variables (my actual program is a bit more involved, but this is a simple representation of it).
I thought this was a decent solution, as everything should be type stable and it cleans up my code a bit, but it turns out it makes the while loop just a bit slower. From everything I know about Julia I had thought the performance should be identical to passing everything manually, so I guess I don’t understand something fully. Any ideas?
It does; I need to be able to break out of the loop from outside because it's an interactive simulation. Because of this I figured I cannot avoid having at least one check from outside. The reference access and comparison don't take a lot of time compared to the algorithm though, so I think the overall impact on performance is small (I tested quite some time ago, and have a vague memory of it having quite minimal impact). If there's some other way though, I'd be open to suggestions.
This seems weird and non-idiomatic to me — you’re basically using tail calls to write a loop in an imperative language (which may overflow the stack since Julia doesn’t do tail-call optimization). Why not simply write a second loop, for example:
function mainAlgo(someRef, otherStruct, argStruct)
    a, b, c, d = argStruct # unpack the variables for convenience and mutation
    while outerloop_condition
        while someRef[]
            # do function for some of the arguments
        end
        # update the arguments for the next outer iteration
    end
end
(The someRef[] check confuses me, too. Are you thinking of running this loop asynchronously and having some other thread/task update someRef[] to control when mainAlgo terminates? That’s a pretty confusing control-flow structure. If not, why use a Ref argument?)
Compared to passing the arguments directly to the function, as I mentioned in my opening post. I just tried the suggestion and it does decrease performance quite a bit, as a rough estimate about 20% (I think just the same as passing the struct like I was doing before). I don’t understand why. From my understanding of Julia, I would’ve said there would be no difference.
Yes, I'm running the loop on a thread and controlling from outside when it closes. After a lot of testing, this is the only way I could find to keep the loop basically as fast as just running it directly while still being able to "swap out" the algorithm in the while loop. I don't see why this control-flow structure is confusing; can you explain what is confusing about it, and what would be a better option?
Also, thanks for the heads-up on the tail calls, I didn't know this was non-idiomatic. I haven't run into stack overflows yet, though. A problem I do see is that with two loops I cannot swap out the algorithm for a different function and keep the same speed (since the function would become type unstable). The convenient part about doing it the way I was doing before is that I basically don't have to worry about type instability: I can pass the algorithm as an argument to mainAlgo and haven't found that it decreases performance in any way.
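To make the setup concrete, here's a minimal sketch of what I mean (runLoop!, algo, and args are placeholders, not my actual code): the hot loop runs on a spawned task, someRef acts as a stop flag that gets flipped from outside, and the algorithm is passed in as a function argument so the compiler specializes the loop on it.

function runLoop!(algo::F, someRef, args) where {F}  # ::F forces specialization on the passed function
    while someRef[]
        algo(args)  # hot loop; algo has a concrete type here
    end
end

someRef = Ref(true)  # (a Threads.Atomic{Bool} would avoid a data race on the flag)
args = (arg1 = 1.0, arg2 = 2.0)  # placeholder arguments

task = Threads.@spawn runLoop!(args -> nothing, someRef, args)  # placeholder algorithm

# later, from the controlling thread:
someRef[] = false
wait(task)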
There should be no impact on performance either for the NamedTuple or the struct. It is likely that there is an issue with your benchmarking, since this is not trivial to do.
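For reference, the NamedTuple variant I have in mind looks roughly like this (field names are placeholders); since a NamedTuple's type records the concrete type of every field, passing it around and unpacking it should be fully type stable:

argTuple = (someRef = someRef, arg1 = arg1, arg2 = arg2, arg3 = arg3)

function mainAlgo(someRef, argTuple)
    (; arg1, arg2, arg3) = argTuple  # property destructuring (Julia 1.7+)
    while someRef[]
        # do function for some of the arguments
    end
end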
I understand your doubt; as I said before, I was also expecting no performance difference. I'm quite certain it's not my benchmarking that's the problem, however. There are three observations I made that all confirm my simulation is running slower.
First of all, just from looking at the simulation I can see it runs a bit slower (only slightly, but I feel it's noticeable). Second, within the while loop I'm incrementing a Ref to an Int, which is read out and reset from another thread; this also shows fewer updates per second. Lastly, I can start my simulation with a given seed, let it run for some amount of time, then stop it and look at the resulting state. I'm simulating Ising models, which are actually quite predictable in their behavior under specific conditions, so looking at two images it's very easy to see which one has had more updates than the other. Comparing it this way I also see that, after a given amount of time for a particular seed, the simulation run with the NamedTuple as argument is less far advanced.
Thus I'm quite confident that my simulation is just running slower, although the 20% I mentioned earlier is probably not very accurate, and maybe I shouldn't give a concrete number for the performance difference.
If that 20% is on the first run, it could just be compilation overhead. If it shows up on subsequent runs, there is likely a type instability in the struct/NamedTuple handling, which means the compiler cannot generate optimal code for it. In that case we'd need an MWE to investigate further.
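Something along these lines would already help (a sketch with dummy work; kernel_direct and kernel_struct are stand-ins for your actual loop body, and it assumes BenchmarkTools is installed):

using BenchmarkTools

struct Args{T1,T2}
    a::T1
    b::T2
end

# same dummy work, arguments passed directly...
function kernel_direct(n, a, b)
    s = 0.0
    for _ in 1:n
        s += a * b
    end
    return s
end

# ...versus bundled in a parametric (hence concretely typed) struct
function kernel_struct(n, args)
    s = 0.0
    for _ in 1:n
        s += args.a * args.b
    end
    return s
end

args = Args(1.0, 2.0)
@btime kernel_direct(10^6, 1.0, 2.0)
@btime kernel_struct(10^6, $args)  # interpolate to avoid benchmarking a non-constant global
# In the REPL, @code_warntype kernel_struct(10^6, args) shows whether anything infers as Any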