Based on reading this discussion, and since I have to write a large script (I prefer script files), I find that wrapping my “for loops” in functions is a pain in my application. The matrices I’m dealing with are very large (~9000 x 10000), so I have declared them as const and use mutation/vectorization to change their values inside the for loops (I am NOT iterating over rows/cols). I was wondering whether this approach is good enough, or whether I should enclose the script in a “let block” to simulate local scoping for speed. In other words, can let blocks be used instead of functions without a speed penalty? Any advice much appreciated.
Functions are the way to organize behavior in Julia (and most other languages). It would be worth writing your code in functions even if it didn’t improve performance; the fact that it also improves performance is just a bonus on top of the modularity, ease of understanding, and composability that functions provide.
It’s worth getting used to writing and using functions.
But to directly answer your question: as long as all of your global variables are const, then a let block is not necessary for performance, and neither is a function. All of the other benefits of functions still apply, though.
The const annotation means that the binding is constant, not that the value is immutable:
julia> const v = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> v[1] = -5
-5

julia> v
3-element Array{Int64,1}:
 -5
  2
  3
If you use a let block, then you don’t need const, right? As long as you pass all the necessary variables into the let block, which you should do for readability anyway.
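For example, a minimal sketch of that pattern (the array sizes and names are only placeholders):

x = rand(1000, 1000)
y = rand(1000, 1000)

# Bind the globals to local names in the `let` header, so the loop
# body only ever touches local variables.
let x = x, y = y
    for i in 1:10
        x = x + y            # rebinds the *local* x, no `global` needed
        x[x .> 50] .= -1
    end
end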
Thank you. I realized that, and so I was mutating const arrays. I also followed the advice and enclosed the “for loops” in a function. Strangely, compared to my earlier attempt, where I had not declared anything as const and used a “global” declaration inside the for loops (without enclosing them in a function), my script ran faster (41 mins vs 48 mins). Well, I will take a closer look to understand why. This is an application where tree species migrate opportunistically across the landscape via an inverse square law kernel, using convolution and FFT to speed up the migration.
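For context, the core operation is roughly like the following minimal sketch of FFT-based convolution with an inverse-square kernel; the grid size, kernel form, and normalization here are placeholders for illustration only, not my actual code:

using FFTW

n, m = 256, 256                      # small grid for illustration
density = rand(n, m)                 # e.g. current species density

# Inverse-square kernel, wrapped so its peak sits at index (1, 1);
# this keeps the circular convolution from shifting the result.
kernel = [1 / (1 + min(i - 1, n - i + 1)^2 + min(j - 1, m - j + 1)^2)
          for i in 1:n, j in 1:m]
kernel ./= sum(kernel)               # normalise so total mass is conserved

# Convolution theorem: multiply in frequency space, transform back.
migrated = real.(ifft(fft(density) .* fft(kernel)))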
True, but at that point you’ve done all the work that would have been necessary to write a function, but you haven’t gained the modularity and composability benefits of having a function. It seems like a bad trade-off compared to writing a proper function.
My earlier sentence construction was misleading, sorry! Here is what I meant to say: Strangely, compared to my earlier attempt, where I had not declared anything as const and used a “global” declaration inside the for loops (without enclosing them in a function), my improved script ran slower (41 vs 48 mins).
Puzzling, but I will try to probe further.
Also, OP, check out Infiltrator.jl for help converting your script to functions. It will let you “enter” the scope of your function so you can play around with the variables defined only in that scope, simulating the fact that in a script you always have access to all the variables.
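Roughly, the workflow looks like the sketch below, run from the REPL (the function and variable names are made up for illustration):

using Infiltrator

function step!(x, y)
    x .+= y
    @infiltrate          # execution pauses here and drops into a prompt
                         # where the locals x and y can be inspected
    return x
end

step!(rand(3, 3), rand(3, 3))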
Also, I think that at this stage, with Julia 1.5 around the corner, it is worth mentioning that soon it will be possible to use loops with “soft scope”, removing the need to enclose them in functions or let blocks, or to explicitly declare as global the “outer” variables that are also assigned inside the loop:
https://docs.julialang.org/en/v1.5-dev/manual/variables-and-scoping/
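For example, under the 1.5 soft-scope rules this kind of top-level loop works in the REPL without any extra annotation (a small sketch):

julia> total = 0
0

julia> for i in 1:3
           total += i   # Julia 1.5+ REPL: updates the global without `global`
       end

julia> total
6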
(EDIT: of course, the advice to use functions or let blocks to avoid unnecessary globals remains fully valid.)
(Maybe it will be necessary to prepare a sticky post to be published at the time of the 1.5 release, saying something like “if you encounter an UndefVar error or another scoping issue with loops, please check that you are using Julia 1.5+, and if not, try again with the latest version before reporting”.)
That’s good to know. Even though the rules for optimizing code for speed are fairly straightforward, putting them into practice is surprisingly hard for those who don’t code regularly in Julia (or who are not initiated into programming/computer-science concepts). This is because, along with those rules, you have other considerations: vectorization/mutation, type stability, pass-by-reference issues, and so on. I know this is the price to pay for speed, and it is probably well worth it once users start to get into the Julia mold. Unfortunately, I can’t shake off R and Python for a long time yet because of their well developed libraries/applications. Hence the confusion.
Only true in the REPL. In a script, you still need a global annotation to modify a global in a loop.
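A minimal sketch of the script case (a hypothetical file run non-interactively as julia script.jl):

s = 0
for i in 1:3
    global s      # required in a file so the loop updates the global `s`
    s += i
end
println(s)        # prints 6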
Here is a script that I cooked up to test the speed of “for loops” for different scenarios. Tested it via a script file:
julia tstSpeed.jl
x = rand(1.0:100.0, 6940, 7440);
y = rand(1.0:10.0, 6940, 7440);
xc = copy(x)
yc = copy(y)

@time for i in 1:100                # First time output
    global x
    for j in 1:4
        x = x + y
        x[x .> 50] .= -1
    end
end
# NOTE: x gets mutated

# Tuck for loop in a function
function forLoop(x, y)
    @time for i in 1:100            # Second time output
        for j in 1:4
            x = x + y
            x[x .> 50] .= -1
        end
    end
    return x
end

@time xout = forLoop(xc, yc)        # Third time output

# Test the let block
let
    x = copy(xc)
    y = copy(yc)
    @time for i in 1:100            # Fourth time output
        for j in 1:4
            x = x + y
            x[x .> 50] .= -1
        end
    end
end
Here is the typical output I got (on average):
129.479405 seconds (2.02 M allocations: 171.761 GiB, 10.85% gc time)
123.134792 seconds (5.20 k allocations: 171.668 GiB, 11.19% gc time)
123.361243 seconds (763.66 k allocations: 171.700 GiB, 11.17% gc time)
124.004760 seconds (6.80 k allocations: 171.668 GiB, 11.12% gc time)
Hope this is a good test. Any advice or insights?
Thanks!!
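One likely reason for the ~171 GiB of allocations in every variant: x = x + y allocates a brand-new 6940 x 7440 matrix on each pass, and the x .> 50 mask allocates as well. Here is a sketch of an in-place variant of the function, not re-timed here, that should cut the allocations substantially:

function forLoopInPlace!(x, y)
    @time for i in 1:100
        for j in 1:4
            x .+= y              # update x in place instead of allocating x + y
            x[x .> 50] .= -1     # the boolean mask still allocates a BitArray
        end
    end
    return x
end

xout2 = forLoopInPlace!(copy(xc), yc)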