Why is broadcast of getproperty so slow?

Looking on the following code I found that a simple broadcast of getproperty is about two to three times slower than a simple for loop implementation:

struct MyStruct
    x::Float64
    y::Float64
    z::Float64
end

#xyz = fill((x = 2.0, y = 3.0, z = 4.0), 50000)
xyz = fill(MyStruct(2.0, 3.0, 4.0), 50000)

function sl(xyz)
    c = Vector{Float64}(undef, length(xyz))
    for i in eachindex(c, xyz)
        c[i] = xyz[i].x
    end
    c
end

function sl2(xyz)
    getproperty.(xyz, :x)
end

display(@benchmark sl($xyz))
display(@benchmark sl2($xyz))

Output is:

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   21.000 ΞΌs … 266.773 ms  β”Š GC (min … max):  0.00% … 99.93%
 Time  (median):      98.800 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   168.594 ΞΌs Β±   4.606 ms  β”Š GC (mean Β± Οƒ):  47.31% Β±  1.73%

    β–ˆ                      β–‚ ▁
  β–‚β–ƒβ–ˆβ–‡β–ƒβ–‚β–‚β–‚β–β–‚β–β–β–‚β–β–β–β–β–β–‚β–ƒβ–ƒβ–„β–…β–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–…β–„β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–ƒ
  21 ΞΌs            Histogram: frequency by time          208 ΞΌs <

 Memory estimate: 390.69 KiB, allocs estimate: 3.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   84.300 ΞΌs … 293.450 ms  β”Š GC (min … max):  0.00% … 99.90%
 Time  (median):     211.100 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   282.448 ΞΌs Β±   4.864 ms  β”Š GC (mean Β± Οƒ):  29.80% Β±  1.73%

         β–‚β–…                       β–β–ˆβ–‡β–ƒβ–„β–…β–…β–„β–„β–‚
  β–‚β–‚β–ƒβ–‚β–…β–…β–‚β–ˆβ–ˆβ–‡β–„β–‚β–‚β–β–‚β–β–β–β–β–β–‚β–β–‚β–‚β–β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–„
  84.3 ΞΌs          Histogram: frequency by time          298 ΞΌs <

 Memory estimate: 390.69 KiB, allocs estimate: 3.

With a vector of named tuples the broadcast is even slower, while the loop keeps about the same speed. The generated code of the broadcasting version is also extremely large for such a simple gathering copy.

This is with 1.11.5 on Windows and a x86_64 CPU.

Is this reproducible on other architectures?

Seems like coding β€œFORTRAN style” is still the best way to full performance in Julia, which is a bit sad…

2 Likes

I’m not sure what is the difference there, but this version seems to be even faster:

sl4(xyz) = [ el.x for el in xyz ]
julia> using Chairmarks

julia> @b sl4($xyz)
12.004 ΞΌs (3 allocs: 390.695 KiB)

julia> @b sl($xyz)
16.970 ΞΌs (3 allocs: 390.695 KiB)

And it is the broadcasting that it slow, not the getproperty itself:

julia> sl5(xyz) = [ getproperty(el, :x) for el in xyz ]
sl5 (generic function with 1 method)

julia> @b sl5($xyz)
12.024 ΞΌs (3 allocs: 390.695 KiB)

I can reproduce on 1.12-beta4, same OS and architecture.

If you don’t like the loop maybe a comprehension instead?

julia> function sl3(xyz)
           [i.x for i ∈ xyz]
       end

julia> @b sl2($test)
108.300 ΞΌs (3 allocs: 390.694 KiB)

julia> @b sl3($test)
25.800 ΞΌs (3 allocs: 390.694 KiB)

I don’t β€˜like’ the loop, because its not a one-liner, but much more code. I didn’t try the comprehension, because in former times I had read some post here where comprehensions were the slowest possibility. But that seemingly has changed with newer Jula version…

Broadcasting f.(x), loops, comprehensions, and map(f, x) generally have the same performance – if they are type-stable.

In general, getproperty(x, prop) is not type-stable: if properties of x have different types, its return type depends on the value of prop, not just the type of prop (which is always Symbol). Sometimes, Julia is able to avoid this instability by constant-propagation (as in getproperty(x, :name)), but apparently constprop doesn’t propagate through broadcasting.

A performance advice is not to rely on constprop if you can avoid it. For example, here use either those comprehesions suggested above or map(i -> i.x, xyz).
Or, if accessing columns is common, use StructArrays.jl instead of regular arrays: there, extracting a column is not just cheap, but free.

4 Likes

How about map(Base.Fix2(getproperty, :x), xyz)?

1 Like

As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You’re running the value :x through some complicated machinery where it’s easy to lose track of its constant-ness. The simple solution is to move the :x β€œas close as possible” to its actual use. For example:

julia> getx(xyz) = xyz.x # or getproperty(xyz, :x), it doesn't matter anymore
getx (generic function with 1 method)

julia> function sl3(xyz)
           getx.(xyz)
       end
sl3 (generic function with 1 method)

julia> display(@benchmark sl3($xyz))
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  14.083 ΞΌs …   5.195 ms  β”Š GC (min … max):  0.00% … 98.65%
 Time  (median):     28.416 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   39.684 ΞΌs Β± 139.454 ΞΌs  β”Š GC (mean Β± Οƒ):  29.73% Β± 11.94%

  β–…β–ˆβ–†β–‚                                                         ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–„β–„β–ƒβ–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–β–ƒβ–…β–ƒβ–…β–„β–…β–†β–‡ β–ˆ
  14.1 ΞΌs       Histogram: log(frequency) by time       476 ΞΌs <

 Memory estimate: 416.06 KiB, allocs estimate: 3.

(for reference, sl has a minimum time of 23.3 Β΅s, and sl2 is 110 Β΅s)

6 Likes

The compile-time constant is not propagated through the broadcast, see #43333. As discussed in this thread and in that issue, the workaround is to use a function (or comprehension, etc) that β€œhard-codes” the property instead. For example, (a -> a.x).(xyz).

Base.Fix2 does not help here. It creates a closure over the value of the second argument (noting only the captured argument’s type in the type domain at compile time) and so has the same issue as the broadcasted property.

9 Likes

I don’t think the missing feature is constant propagation through higher order functions to their input functions, it’s that dot syntax can’t really control what inputs are hardcoded into the fused broadcast kernel versus being iterated. Currently it’s just keyword arguments going into the kernel (which is its own Issue #34737 because sometimes we would like to broadcast over those) and positional arguments being iterated (which we often opt out of by iterating over a Ref wrapper instead, but that’s not the same as putting it right in the kernel). There isn’t an obvious motive or way to put values in the kernel either; as pointed out with Base.Fix2, storing a value in a parametric field of a callable (which isn’t iterated) can actually impede constant propagation, and capturing variables of inputs (in other cases, not here with :x) is semantically different and can cause an often fundamentally unfixable type instability (Issue #15276).

Couldn’t the compiler do things like these automatically?

In theory, yes of course, a sufficiently smart compiler could absolutely do that. But if you read what Matt said:

As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You’re running the value :x through some complicated machinery where it’s easy to lose track of its constant-ness.

he’s just saying that doing this sort of thing is generally hard for a compiler. At one point the compiler knows that x is const, but then it has to propagate that information through a bunch of layers of complicated functions and broadcast machinery, and at some point along the way, the compiler appears to have lost the information that :x was a constant value, so it’s no longer able to make the appropriate optimization.

Our compiler is generally getting smarter over time and getting better at doing constant propagation / concrete evaluation and all sorts of other tricks, but it’s still a real compiler that has real constraints and it doesn’t always perfectly figure everything out.

2 Likes

I wonder whether we can have a lint feature to at least notify people the code (and other cases) could be slow due to such type instability?

This case isn’t actually type unstable, since getproperty(::MyStruct, ::Symbol)::Float64. It’s just more branch-ey.

1 Like