Understanding the limitations of generated functions

The docs say:

Generated functions must not mutate or observe any non-constant global state (including, for example, IO, locks, non-local dictionaries, or using hasmethod). This means they can only read global constants, and cannot have any side effects. In other words, they must be completely pure.

I suspect that pools in the following example is a non-local dictionary and thus prohibited. But this example seems to work, and frankly, I cannot imagine how it could break (not counting future threaded-compilation issues), because pools is append-only and internal.

The example is a cache of vectors that is accessible with types as keys:

const pools = Dict{Type,Vector}()

function createpool(type::Type{T}) where T # Not exported
    pool = Vector{type}()
    pools[T] = pool
    return pool
end

@generated function getpool(type::Type{T}) where T
    if !haskey(pools, T)
        createpool(T)
    end
    retval = pools[T]
    return :($retval)
end

The trick allows compile-time cache resolution:

julia> @btime getpool(Dict{String,Symbol})
  1.124 ns (0 allocations: 0 bytes)
0-element Array{Dict{String,Symbol},1}

Maybe the docs are too strict? Could somebody help me better understand the limitations of @generated?

A supported alternative solution would also be great!

1 Like

Just don’t use a generated function. You can have a regular function whose last line is return retval.

1 Like

getpool is 66x slower that way.

You should use an IdDict when the keys are types.
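
For reference, here is a minimal sketch of what an IdDict-backed pool lookup could look like (idpools and getpool2 are made-up names for illustration, not from this thread):

```julia
# Sketch: an IdDict keyed by types; IdDict looks keys up by
# object identity (===), which is cheap for type objects.
const idpools = IdDict{Type,Vector}()

function getpool2(::Type{T}) where T
    # get! creates and inserts the vector on the first lookup only
    get!(() -> Vector{T}(), idpools, T)::Vector{T}
end
```

Repeated calls return the same vector, e.g. getpool2(Int) === getpool2(Int).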

4 Likes

Thanks, I completely missed IdDict!

Interestingly, while it seems to give a ~20% edge over a normal Dict when the value type is concrete, it slows the code down enormously when the value type is the non-concrete Vector:

julia> const idd=IdDict{Type, Vector}( Int => [1], Float32 => [2], Vector{Int} => [3])
IdDict{Type,Array{T,1} where T} with 3 entries:
  Float32        => [2]
  Array{Int64,1} => [3]
  Int64          => [1]

julia> @btime idd[Vector{Int}]
  623.728 ns (2 allocations: 96 bytes)  #  41.926 ns (0 allocations: 0 bytes) for a normal Dict{Type, Vector}
1-element Array{Int64,1}:
 3

Please also let me note that, thanks to my ignorance, I found:

julia> const intvec_symbol = Symbol(string(Vector{Int}))
Symbol("Array{Int64,1}")

julia> const sd=Dict{Symbol,Vector}( :Int => [1], :Float32 => [2], intvec_symbol => [3])
Dict{Symbol,Array{T,1} where T} with 3 entries:
  :Float32                 => [2]
  Symbol("Array{Int64,1}") => [3]
  :Int                     => [1]

julia> @btime sd[intvec_symbol]
  7.324 ns (0 allocations: 0 bytes)
1-element Array{Int64,1}:
 3

Which allows the following implementation:

julia> const pools = Dict{Symbol, Vector}()
Dict{Symbol,Array{T,1} where T} with 0 entries

julia> function createpool(type::Type{T}) where T
           pool = Vector{type}()
           pools[Symbol(string(T))] = pool
           return pool
       end
createpool (generic function with 1 method)

julia> @generated function getpool(::Type{T}) where T
           s = Symbol(string(T))
           return quote
               if !haskey(pools, $(Base.Meta.quot(s)))
                   createpool(T)
               end
               return pools[$(Base.Meta.quot(s))]
           end
       end
getpool (generic function with 1 method)

julia> @btime getpool(Dict{String,Symbol})
  14.869 ns (0 allocations: 0 bytes)
0-element Array{Dict{String,Symbol},1}

This is my best attempt using only supported features.

This could be the base of a fast type-keyed dict package, but I am not sure if anybody is interested in fast type-keyed dicts. What do you think?

No need for global mutable state or generated functions:

function getpool(::Type{T}) where T
    pool = T[]
    @eval getpool(::Type{$T}) = $pool
    return pool
end

This will probably be as fast as it gets:

julia> @btime getpool(Dict{String,Symbol})
  1.300 ns (0 allocations: 0 bytes)
0-element Array{Dict{String,Symbol},1}

I’m not sure whether you should be measuring it with the Ref trick, though. I think it only gets down to 1 ns because the vector happens to be in the cache when you repeatedly call it without interpolation, as above. With the Ref trick, the times are a bit worse:

julia> @btime getpool($(Ref(Dict{String,Symbol}))[])
  10.481 ns (0 allocations: 0 bytes)
0-element Array{Dict{String,Symbol},1}

Also, I don’t really think it is a good idea, but since you suggested it, here is a rough implementation of making a Dict that goes all-in on using the compiler and method table to make that really fast lookup:

struct CompilerDict{S} end

Base.getindex(::CompilerDict, ::Type{T}) where {T} = throw(KeyError(T))

function Base.setindex!(::CompilerDict{S}, value::V, ::Type{T}) where {S,V<:S,T}
    @eval Base.getindex(::CompilerDict{$S}, ::Type{$T}) = $value
    return value
end
4 Likes

If you don’t need to support using multiple different types of pools at the same time, you could do something like

struct Pool
    data::Vector{UInt8}
end
Pool() = Pool(UInt8[])
struct PoolWrapper{T} <: DenseVector{T}
    pool::Pool
end
Base.resize!(pw::PoolWrapper{T}, N) where {T} = (resize!(pw.pool.data, sizeof(T) * N); pw)
Base.length(pw::PoolWrapper{T}) where {T} = length(pw.pool.data) ÷ sizeof(T)
Base.size(pw::PoolWrapper) = (length(pw),)
function Base.unsafe_convert(::Type{Ptr{T}}, pw::PoolWrapper{T}) where {T}
    Base.unsafe_convert(Ptr{T}, pw.pool.data)
end
@inline function Base.getindex(pw::PoolWrapper, i::Integer)
    @boundscheck (0 < i ≤ length(pw)) || throw(BoundsError(pw, i))
    GC.@preserve pw begin
        v = unsafe_load(pointer(pw), i)
    end
    v
end
@inline function Base.setindex!(pw::PoolWrapper{T}, v, i::Integer) where {T}
    @boundscheck (0 < i ≤ length(pw)) || throw(BoundsError(pw, i))
    GC.@preserve pw begin
        unsafe_store!(pointer(pw), v, i)
    end
    v
end

I added the GC.@preserve blocks so that it’s hopefully safe, but it seems that they prevent vectorization.

1 Like

This one is crazy and generates the same code as my original version, thank you!

After thinking about it a bit, I noticed that what you do is essentially a manual @generated, roughly equivalent to this:

@generated function getpool(::Type{T}) where T
    pool = T[]
    return :($pool)
end
2 Likes

I need multiple pools, so this one doesn’t help, but your code shows how much I still have to learn to master the performance tools of Julia. :slight_smile:

The issue with your IdDict example is that the value-type wasn’t concrete. Compare what happens with an IdDict{Type, Vector} vs an IdDict{Type, Vector{Int}}:

julia> const idd1=IdDict{Type, Vector}( Int => [1], Float32 => [2], Vector{Int} => [3])
IdDict{Type,Array{T,1} where T} with 3 entries:
  Array{Int64,1} => [3]
  Int64          => [1]
  Float32        => [2]

julia> const idd2=IdDict{Type, Vector{Int}}( Int => [1], Float32 => [2], Vector{Int} => [3])
IdDict{Type,Array{Int64,1}} with 3 entries:
  Array{Int64,1} => [3]
  Int64          => [1]
  Float32        => [2]

julia> using BenchmarkTools

julia> @btime idd1[Vector{Int}]
  659.919 ns (2 allocations: 96 bytes)
1-element Array{Int64,1}:
 3

julia> @btime idd2[Vector{Int}]
  10.268 ns (0 allocations: 0 bytes)
1-element Array{Int64,1}:
 3
1 Like

Yeah, I think that would do the same. :smile: But the docs say that

so I wouldn’t feel safe in the assumption that the @generated function wouldn’t suddenly reset the pool. In fact, if you want to reset/recompile the function in the method I suggested, you could manually invoke the generic function with invoke(getpool, Tuple{Type{<:Any}}, Dict{String,Symbol}).

EDIT:
The docs also say about generated functions:

It’s a bit funny that it says that, since it seems to work. Might be worth opening an issue to check whether this is still the case and the example is just a happy coincidence that doesn’t cause a problem, or whether the docs need to be updated.

I think you missed the sentence in my previous post where I wrote about the performance difference depending on concrete vs non-concrete value types of IdDicts.

When writing that post, I initially pasted both examples from my REPL, but then deleted the one with the concrete value type, as it is not applicable to the original problem and the text already covered it. I thought that leaving it out would make the post more readable, but it seems it made it easier to misread.

Sidenote: I see very different timings in the case of the concrete value type on OSX. I will check it on Linux tomorrow.

Julia Version 1.4.2 (2020-05-23), official https://julialang.org/ release:

julia> const idd1=IdDict{Type, Vector}( Int => [1], Float32 => [2], Vector{Int} => [3])
IdDict{Type,Array{T,1} where T} with 3 entries:
  Float32        => [2]
  Array{Int64,1} => [3]
  Int64          => [1]

julia> const idd2=IdDict{Type, Vector{Int}}( Int => [1], Float32 => [2], Vector{Int} => [3])
IdDict{Type,Array{Int64,1}} with 3 entries:
  Float32        => [2]
  Array{Int64,1} => [3]
  Int64          => [1]

julia> using BenchmarkTools

julia> @btime idd1[Vector{Int}]
  648.213 ns (2 allocations: 96 bytes)
1-element Array{Int64,1}:
 3

julia> @btime idd2[Vector{Int}]
  33.921 ns (0 allocations: 0 bytes)
1-element Array{Int64,1}:
 3

It always seemed to me that eval() is a lower-level tool than @generated, meaning that it is possible to implement @generated using eval, but not the other way around. I never tried it, though.

But if somebody could do it, then the resulting solution can hardly be more evil than eval itself.

So it seems to me that this is an issue with the documentation. It says a lot of things about the limitations of generated functions, some of which are really hard to understand, like the one you quoted:

Will it really be generated multiple times with the same argument types? Why and when?

OTOH, after searching for a minute I did not find anything about the potential problems/limitations of eval. It is not entirely clear to me that replacing a method using @eval from inside itself is safe. And what about using closures or generators while doing so?

You can fix that by using Any as the value-type.

The reasons I recommend this approach (and not generated functions) are amply illustrated by running this on your own machine:

julia> const pools = Dict{Type,Vector}()
Dict{Type,Array{T,1} where T}()

julia> function createpool(type::Type{T}) where T # Not exported
           pool = Vector{type}()
           pools[T] = pool
           return pool
       end
createpool (generic function with 1 method)

julia> @generated function getpool(type::Type{T}) where T
           if !haskey(pools, T)
               createpool(T)
           end
           retval = pools[T]
           return :($retval)
       end
getpool (generic function with 1 method)

julia> for i = 1:1000
           getpool(Val{i})                   # wait a loooong time due to all the compilation
       end

julia> const idd = IdDict{Any,Any}()
IdDict{Any,Any}()

julia> for i = 1:1000
           idd[Val{i}] = Val{i}[]                       # wait, but not very long
       end

julia> using BenchmarkTools

julia> function use_getpool(n)
           local v
           for j = 1:n
               v = getpool(Val{rand(1:1000)})
           end
           return v
       end
use_getpool (generic function with 1 method)

julia> function use_idd(n)
           local v
           for j = 1:n
               v = idd[Val{rand(1:1000)}]
           end
           return v
       end
use_idd (generic function with 1 method)

julia> @btime use_getpool(100)
  27.337 μs (31 allocations: 496 bytes)
Val{446}[]

julia> @btime use_idd(100)
  14.608 μs (30 allocations: 480 bytes)
Val{741}[]

If you’re indexing with types, odds are that the types aren’t inferrable at each use site. In that case you’ll need runtime dispatch to figure out which compiled instance of getpool to call, and that’s slower than looking up a value in an IdDict.

Generated functions are good when you need them but best avoided except as a last resort. Among other issues they make your code hard to use in conjunction with Revise.

4 Likes

I accept your argument and I will check if IdDict{Any, Any} is better for my use case. (But I have to say that your example is an edge case: changing 1000 to 10 almost erases the difference, and assuming that every dispatch is dynamic is not very realistic.)

The documentation of IdDict does not say much, and nothing about when to use it. I also wasn’t able to find much useful info by searching. Your first post:

is like a compiler warning. Could you please write a bit more about when we should use an IdDict?

Also, the fact that using UnionAll types instead of Any as the value type of an IdDict makes it slow could also be helpful for others, and is maybe worth including in the documentation. (I could make a PR.) Is this specific to IdDict, or a recurring pattern among collections?

Thank you for your help!

I actually don’t know. My impression is that @generated goes down a different route that is more low-level, but I might be totally wrong. :sweat_smile: With eval, my understanding is that you’re effectively first compiling a method for the type you give it, then eval’ing a new method definition and thus invalidating the old compiled method. This leads to some terrible compiler overhead, since you end up having to compile a method on both the first and second call.

Without knowing your use case, I would be inclined to think that Tim is right, and you should probably just use an IdDict if you’re calling it inside a tight inner loop where the types you’re calling getpool with aren’t known at compile time.

assuming that every dispatch is dynamic is not very lifelike

You know what the rest of your code looks like far better than we do, so I’ll accept that “lifelike” for you means that you’re not planning to loop over lists of objects with different types or other difficult-to-infer operations. Since you seem to like the generated function approach, you can do what you want with your own code but be aware that the downsides are:

  • your compile-time latency will be poor even if you get good runtime performance
  • you can’t empty!(pools) and start from scratch: getpool will forever return the old vector, filled with whatever contents it had last. Likewise, you can’t free the memory of each vector without calling empty!(getpool(T)) on every T that has ever been used.
  • direct manipulations of pools may or may not yield the same result as accessing via getpool. For example direct manipulations of pools may interact badly with precompilation due to the need to rehash the dict

Basically, that “prohibition” on mutation/observation is there because it’s the only way to ensure that the generated function will reproducibly do the same thing. Failing to observe it means that you may open yourself up to surprising behavior depending on the history of what you’ve done. For private code, only you can decide whether these concerns are relevant. If you really do want to use the generated function, I’d recommend a pattern

let pools = Dict{Type,Vector}()
    global getpool
    @generated ...
end

to prevent anyone from manipulating pools directly.

In part, you can think of an IdDict as a Dict with @nospecialize wrapped around all usage of keys. It’s specifically designed to handle the situation where the keys may be of diverse types, circumventing runtime dispatch. The other piece of it will only be clear if you know C: you can think of the objectid as a pointer, and that’s immediately available for types so looking up values is fast.
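
A tiny sketch of that identity-based lookup (an assumed example, not from the original post):

```julia
# IdDict compares keys with === rather than isequal/hash.
d = IdDict{Any,Int}(Vector{Int} => 1)

# Vector{Int} and Array{Int,1} are the very same type object,
# so the identity lookup finds the entry under either spelling.
Vector{Int} === Array{Int,1}  # true
d[Array{Int,1}]               # 1
```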

The reason that having a UnionAll type as the value-type makes it slow is presumably this line, though I didn’t explicitly check. Type assertions are fast when the type being asserted is concrete, but when it’s non-concrete they require subtype analysis. Any gets special-cased.
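
A small sketch of that distinction (an assumed example, not from the original post):

```julia
# Concrete targets assert via a cheap exact type check;
# non-concrete (e.g. UnionAll) targets need subtype analysis.
@assert isconcretetype(Vector{Int})
@assert !isconcretetype(Vector)   # Vector is a UnionAll

xs = Any[[1, 2, 3]]
v = xs[1]::Vector{Int}  # fast path: exact type match
w = xs[1]::Vector       # slower path: subtyping check
```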

3 Likes

You are right, I have checked it.

Yes, that is exactly what I am trying to do. I am a bit closer now, thank you!

2 Likes