Why is broadcast of getproperty so slow?

mwlidar · June 17, 2025, 12:27pm

Looking on the following code I found that a simple broadcast of getproperty is about two to three times slower than a simple for loop implementation:

struct MyStruct
    x::Float64
    y::Float64
    z::Float64
end

#xyz = fill((x = 2.0, y = 3.0, z = 4.0), 50000)
xyz = fill(MyStruct(2.0, 3.0, 4.0), 50000)

function sl(xyz)
    c = Vector{Float64}(undef, length(xyz))
    for i in eachindex(c, xyz)
        c[i] = xyz[i].x
    end
    c
end

function sl2(xyz)
    getproperty.(xyz, :x)
end

display(@benchmark sl($xyz))
display(@benchmark sl2($xyz))

Output is:

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   21.000 μs … 266.773 ms  ┊ GC (min … max):  0.00% … 99.93%
 Time  (median):      98.800 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   168.594 μs ±   4.606 ms  ┊ GC (mean ± σ):  47.31% ±  1.73%

    █                      ▂ ▁
  ▂▃█▇▃▂▂▂▁▂▁▁▂▁▁▁▁▁▂▃▃▄▅▆████▇▆▅▅▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  21 μs            Histogram: frequency by time          208 μs <

 Memory estimate: 390.69 KiB, allocs estimate: 3.
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   84.300 μs … 293.450 ms  ┊ GC (min … max):  0.00% … 99.90%
 Time  (median):     211.100 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   282.448 μs ±   4.864 ms  ┊ GC (mean ± σ):  29.80% ±  1.73%

         ▂▅                       ▁█▇▃▄▅▅▄▄▂
  ▂▂▃▂▅▅▂██▇▄▂▂▁▂▁▁▁▁▁▂▁▂▂▁▂▂▂▂▃▃▅███████████▇▆▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂ ▄
  84.3 μs          Histogram: frequency by time          298 μs <

 Memory estimate: 390.69 KiB, allocs estimate: 3.

With a vector of named tuples the broadcast is even slower, while the loop keeps about the same speed. The generated code of the broadcasting version is also extremely large for such a simple gathering copy.

This is with 1.11.5 on Windows and a x86_64 CPU.

Is this reproducible on other architectures?

Seems like coding “FORTRAN style” is still the best way to full performance in Julia, which is a bit sad…

lmiq · June 17, 2025, 12:50pm

I’m not sure what is the difference there, but this version seems to be even faster:

sl4(xyz) = [ el.x for el in xyz ]

julia> using Chairmarks

julia> @b sl4($xyz)
12.004 μs (3 allocs: 390.695 KiB)

julia> @b sl($xyz)
16.970 μs (3 allocs: 390.695 KiB)

And it is the broadcasting that it slow, not the getproperty itself:

julia> sl5(xyz) = [ getproperty(el, :x) for el in xyz ]
sl5 (generic function with 1 method)

julia> @b sl5($xyz)
12.024 μs (3 allocs: 390.695 KiB)

nilshg · June 17, 2025, 12:52pm

I can reproduce on 1.12-beta4, same OS and architecture.

If you don’t like the loop maybe a comprehension instead?

julia> function sl3(xyz)
           [i.x for i ∈ xyz]
       end

julia> @b sl2($test)
108.300 μs (3 allocs: 390.694 KiB)

julia> @b sl3($test)
25.800 μs (3 allocs: 390.694 KiB)

mwlidar · June 17, 2025, 1:13pm

I don’t ‘like’ the loop, because its not a one-liner, but much more code. I didn’t try the comprehension, because in former times I had read some post here where comprehensions were the slowest possibility. But that seemingly has changed with newer Jula version…

aplavin · June 17, 2025, 1:16pm

Broadcasting f.(x), loops, comprehensions, and map(f, x) generally have the same performance – if they are type-stable.

In general, getproperty(x, prop) is not type-stable: if properties of x have different types, its return type depends on the value of prop, not just the type of prop (which is always Symbol). Sometimes, Julia is able to avoid this instability by constant-propagation (as in getproperty(x, :name)), but apparently constprop doesn’t propagate through broadcasting.

A performance advice is not to rely on constprop if you can avoid it. For example, here use either those comprehesions suggested above or map(i -> i.x, xyz).
Or, if accessing columns is common, use StructArrays.jl instead of regular arrays: there, extracting a column is not just cheap, but free.

gdalle · June 17, 2025, 1:16pm

How about map(Base.Fix2(getproperty, :x), xyz)?

mbauman · June 17, 2025, 1:25pm

As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You’re running the value :x through some complicated machinery where it’s easy to lose track of its constant-ness. The simple solution is to move the :x “as close as possible” to its actual use. For example:

julia> getx(xyz) = xyz.x # or getproperty(xyz, :x), it doesn't matter anymore
getx (generic function with 1 method)

julia> function sl3(xyz)
           getx.(xyz)
       end
sl3 (generic function with 1 method)

julia> display(@benchmark sl3($xyz))
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  14.083 μs …   5.195 ms  ┊ GC (min … max):  0.00% … 98.65%
 Time  (median):     28.416 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   39.684 μs ± 139.454 μs  ┊ GC (mean ± σ):  29.73% ± 11.94%

  ▅█▆▂                                                         ▁
  ████▇▄▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▁▃▅▃▅▄▅▆▇ █
  14.1 μs       Histogram: log(frequency) by time       476 μs <

 Memory estimate: 416.06 KiB, allocs estimate: 3.

(for reference, sl has a minimum time of 23.3 µs, and sl2 is 110 µs)

mikmoore · June 17, 2025, 2:47pm

The compile-time constant is not propagated through the broadcast, see #43333. As discussed in this thread and in that issue, the workaround is to use a function (or comprehension, etc) that “hard-codes” the property instead. For example, (a -> a.x).(xyz).

Base.Fix2 does not help here. It creates a closure over the value of the second argument (noting only the captured argument’s type ~~in the type domain~~ at compile time) and so has the same issue as the broadcasted property.

Benny · June 17, 2025, 9:59pm

I don’t think the missing feature is constant propagation through higher order functions to their input functions, it’s that dot syntax can’t really control what inputs are hardcoded into the fused broadcast kernel versus being iterated. Currently it’s just keyword arguments going into the kernel (which is its own Issue #34737 because sometimes we would like to broadcast over those) and positional arguments being iterated (which we often opt out of by iterating over a Ref wrapper instead, but that’s not the same as putting it right in the kernel). There isn’t an obvious motive or way to put values in the kernel either; as pointed out with Base.Fix2, storing a value in a parametric field of a callable (which isn’t iterated) can actually impede constant propagation, and capturing variables of inputs (in other cases, not here with :x) is semantically different and can cause an often fundamentally unfixable type instability (Issue #15276).

e3c6 · June 18, 2025, 7:21am

Couldn’t the compiler do things like these automatically?

Mason · June 18, 2025, 9:43am

In theory, yes of course, a sufficiently smart compiler could absolutely do that. But if you read what Matt said:

As others have mentioned, the challenge here is generally that getproperty(something, :field) requires some serious constant propagation of :field in order to avoid dynamic-ness. You’re running the value :x through some complicated machinery where it’s easy to lose track of its constant-ness.

he’s just saying that doing this sort of thing is generally hard for a compiler. At one point the compiler knows that x is const, but then it has to propagate that information through a bunch of layers of complicated functions and broadcast machinery, and at some point along the way, the compiler appears to have lost the information that :x was a constant value, so it’s no longer able to make the appropriate optimization.

Our compiler is generally getting smarter over time and getting better at doing constant propagation / concrete evaluation and all sorts of other tricks, but it’s still a real compiler that has real constraints and it doesn’t always perfectly figure everything out.

liuyxpp · June 18, 2025, 3:14pm

I wonder whether we can have a lint feature to at least notify people the code (and other cases) could be slow due to such type instability?

Mason · June 18, 2025, 3:32pm

This case isn’t actually type unstable, since getproperty(::MyStruct, ::Symbol)::Float64. It’s just more branch-ey.

aryavorskiy · July 13, 2025, 5:53pm

Broadcasting treats any object as a scalar (i. e. not broadcasting over its axes) if it is wrapped into a Ref. This is how Symbols were made scalars in the first place. Since references are dynamic by nature, we can say goodbye to constant propagation and also spend time looking up the reference on each iteration.

A hacky fix would be to store the symbol as a type parameter, so const propagation can take it into account:

struct BroadcastConstContainer{T} end
Base.Broadcast.BroadcastStyle(::Type{<:BroadcastConstContainer}) = Base.Broadcast.DefaultArrayStyle{0}()
Base.size(::BroadcastConstContainer) = (1,)
Base.getindex(::BroadcastConstContainer{T}, I) where T = T
Base.Broadcast.broadcastable(sym::Symbol) = BroadcastConstContainer{sym}()

After that performance of sl2 even surpasses sl and gets closer to that of sl4 and similar.
Also note that nothing, missing and some more simple types are also using Ref with broadcast under the hood, which can also lead to performance regressions. Maybe this should be implemented into base Julia in some way?..

[UPD. Read Benny’s message above. This is almost the same, but the object is not callable — does it fix the problem?]

Topic		Replies	Views
Overloading `Base.getproperty` performance Performance question , performance	2	616	January 29, 2021
Unexpected type instability with getproperty but not setproperty! Performance	17	1605	February 4, 2021
Getproperty optimization in struct Performance	1	384	May 22, 2020
Overloading `getproperty` in a type-stable manner General Usage question	4	2059	March 28, 2020
Is it possible to broadcast the property of a structure? General Usage broadcast , struct	2	364	March 28, 2022

Why is broadcast of getproperty so slow?

Related topics