Looking at the following code, I found that a simple broadcast of getproperty is about two to three times slower than a simple for-loop implementation:
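using BenchmarkTools

struct MyStruct
    x::Float64
    y::Float64
    z::Float64
end

# Alternative input: a vector of named tuples instead of structs
# xyz = fill((x = 2.0, y = 3.0, z = 4.0), 50000)
xyz = fill(MyStruct(2.0, 3.0, 4.0), 50000)

# Explicit loop that copies the x field into a new vector
function sl(xyz)
    c = Vector{Float64}(undef, length(xyz))
    for i in eachindex(c, xyz)
        c[i] = xyz[i].x
    end
    c
end

# Broadcast of getproperty with the property name as an argument
function sl2(xyz)
    getproperty.(xyz, :x)
end

display(@benchmark sl($xyz))
display(@benchmark sl2($xyz))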
Output is:
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  21.000 μs … 266.773 ms  ┊ GC (min … max): 0.00% … 99.93%
 Time  (median):     98.800 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   168.594 μs ± 4.606 ms   ┊ GC (mean ± σ):  47.31% ± 1.73%
 21 μs           Histogram: frequency by time          208 μs <
 Memory estimate: 390.69 KiB, allocs estimate: 3.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  84.300 μs … 293.450 ms  ┊ GC (min … max): 0.00% … 99.90%
 Time  (median):     211.100 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   282.448 μs ± 4.864 ms   ┊ GC (mean ± σ):  29.80% ± 1.73%
 84.3 μs         Histogram: frequency by time          298 μs <
 Memory estimate: 390.69 KiB, allocs estimate: 3.
With a vector of named tuples the broadcast is even slower, while the loop keeps about the same speed. The generated code of the broadcasting version is also extremely large for such a simple gathering copy.
This is with Julia 1.11.5 on Windows and an x86_64 CPU.
Is this reproducible on other architectures?
Seems like coding "FORTRAN style" is still the best way to get full performance in Julia, which is a bit sad…
I don't "like" the loop, because it's not a one-liner but much more code. I didn't try the comprehension, because some time ago I read a post here saying that comprehensions were the slowest option. But that has seemingly changed with newer Julia versions…
Broadcasting f.(x), loops, comprehensions, and map(f, x) generally have the same performance, provided they are type-stable.
In general, getproperty(x, prop) is not type-stable: if properties of x have different types, its return type depends on the value of prop, not just the type of prop (which is always Symbol). Sometimes, Julia is able to avoid this instability by constant propagation (as in getproperty(x, :name)), but apparently constprop doesn't propagate through broadcasting.
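To make the instability concrete, here is a minimal sketch. The struct `Mixed` and the helpers `dynamic_get` and `get_a` are hypothetical names introduced for illustration (in the original post all fields are `Float64`, so the return type itself would not vary there). `Base.return_types` reports what inference concludes; the printed form may differ between Julia versions.

```julia
# Hypothetical struct with differently typed fields, for illustration only.
struct Mixed
    a::Int
    b::Float64
end

# The property name arrives as a runtime value, so inference only sees `Symbol`.
dynamic_get(m, name) = getproperty(m, name)

# The property name is a literal in the function body, so it can be constant-propagated.
get_a(m) = getproperty(m, :a)

dynamic_type = only(Base.return_types(dynamic_get, (Mixed, Symbol)))
static_type = only(Base.return_types(get_a, (Mixed,)))

println("symbol passed as argument: ", dynamic_type)
println("symbol hard-coded:         ", static_type)
```

The first inferred type should be a non-concrete union of the field types, while the second should be the concrete type of field `a`.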
A good rule for performance is not to rely on constprop if you can avoid it. For example, here use either the comprehensions suggested above or map(i -> i.x, xyz).
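As a rough sketch of these alternatives, each of the following hard-codes the property access inside a small function or expression instead of passing `:x` as a value. `Point3` is a hypothetical stand-in for the struct from the original post, and the vector is kept short so the snippet runs instantly; timings are not shown here and would need to be measured on your own machine.

```julia
# Hypothetical stand-in for the struct in the question.
struct Point3
    x::Float64
    y::Float64
    z::Float64
end

pts = fill(Point3(2.0, 3.0, 4.0), 5)

# Three one-liners in which the field name is part of the code, not an argument.
via_map = map(p -> p.x, pts)
via_comprehension = [p.x for p in pts]
via_broadcast = (p -> p.x).(pts)

println(via_map == via_comprehension == via_broadcast)
```

All three return a `Vector{Float64}` with the same contents as the explicit loop would.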
Or, if accessing columns is common, use StructArrays.jl instead of regular arrays: there, extracting a column is not just cheap, but free.
As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You're running the value :x through some complicated machinery where it's easy to lose track of its constant-ness. The simple solution is to move the :x "as close as possible" to its actual use. For example:
julia> getx(xyz) = xyz.x # or getproperty(xyz, :x), it doesn't matter anymore
getx (generic function with 1 method)
julia> function sl3(xyz)
           getx.(xyz)
       end
sl3 (generic function with 1 method)
julia> display(@benchmark sl3($xyz))
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  14.083 μs … 5.195 ms    ┊ GC (min … max): 0.00% … 98.65%
 Time  (median):     28.416 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   39.684 μs ± 139.454 μs  ┊ GC (mean ± σ):  29.73% ± 11.94%
 14.1 μs         Histogram: log(frequency) by time     476 μs <
 Memory estimate: 416.06 KiB, allocs estimate: 3.
(For reference, sl has a minimum time of 23.3 µs, and sl2 is 110 µs.)
The compile-time constant is not propagated through the broadcast, see #43333. As discussed in this thread and in that issue, the workaround is to use a function (or comprehension, etc.) that "hard-codes" the property instead. For example, (a -> a.x).(xyz).
Base.Fix2 does not help here. It creates a closure over the value of the second argument (noting only the captured argument's type in the type domain at compile time) and so has the same issue as the broadcasted property.
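A small sketch of what this means in practice: the `Fix2` object stores `:x` as an ordinary field value, and only the type `Symbol` shows up in its type parameters. `Point3` is again a hypothetical stand-in for the struct in the question; the result is the same as with the anonymous function, only the information available to the compiler differs.

```julia
# Hypothetical stand-in for the struct in the question.
struct Point3
    x::Float64
    y::Float64
    z::Float64
end

pts = fill(Point3(2.0, 3.0, 4.0), 5)

# Fix2 fixes the second argument of getproperty to :x.
fix = Base.Fix2(getproperty, :x)

println(typeof(fix))  # the type records `Symbol`, not the specific value :x
println(fix.x)        # the value :x lives in a field and is read at run time

# Both produce the same values; only the anonymous function has :x in its code.
via_fix = fix.(pts)
via_lambda = (p -> p.x).(pts)
println(via_fix == via_lambda)
```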
I don't think the missing feature is constant propagation through higher-order functions to their input functions; it's that dot syntax can't really control which inputs are hard-coded into the fused broadcast kernel and which are iterated. Currently, keyword arguments go into the kernel (which is its own Issue #34737, because sometimes we would like to broadcast over those) and positional arguments are iterated (which we often opt out of by iterating over a Ref wrapper instead, but that's not the same as putting the value right in the kernel). There isn't an obvious motive or way to put values in the kernel either: as pointed out with Base.Fix2, storing a value in a parametric field of a callable (which isn't iterated) can actually impede constant propagation, and capturing variables of inputs (in other cases, not here with :x) is semantically different and can cause an often fundamentally unfixable type instability (Issue #15276).
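The distinction between keyword arguments and positional arguments in a dot call can be seen in a short sketch. The variable names are illustrative only, and `round` and `in` are just convenient Base functions for the demonstration.

```julia
xs = [1.234, 5.678]
allowed = [1, 2, 3]

# Keyword arguments are not broadcast over: `digits = 1` is passed unchanged to every call.
rounded = round.(xs; digits = 1)

# Positional arguments are iterated. Wrapping `allowed` in `Ref` makes broadcast treat it
# as a single value, so each element on the left is tested against the whole vector.
membership = in.([1, 5], Ref(allowed))

println(rounded)
println(membership)
```

Without the `Ref`, `in.([1, 5], allowed)` would try to pair the two vectors element by element and fail because their lengths differ.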
In theory, yes of course, a sufficiently smart compiler could absolutely do that. But if you read what Matt said:
As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You're running the value :x through some complicated machinery where it's easy to lose track of its constant-ness.
He's just saying that doing this sort of thing is generally hard for a compiler. At one point the compiler knows that :x is a constant, but then it has to propagate that information through a bunch of layers of complicated functions and broadcast machinery. Somewhere along the way, the compiler appears to have lost the information that :x was a constant value, so it's no longer able to make the appropriate optimization.
Our compiler is generally getting smarter over time and getting better at doing constant propagation / concrete evaluation and all sorts of other tricks, but it's still a real compiler that has real constraints and it doesn't always perfectly figure everything out.