Accessing fields within a generated function


#1

Hi,
I would like to write generated functions that can work with, and access properties of, the fields of a type at compile time.

Specifically,
I wrote a lot of “parameter container” types like “Covariance Matrix” and “Positive Vector” that have constraints.
What I want is for someone to be able to build a model, and say

struct MyModelsParameters{T} <: Parameters{T}
    Σ::CovarianceMatrix{3, T}
    Θ::ProbabilityVector{4, T}
    μ::MVector{2, T}
end

and have functions defined on the abstract type “Parameters”:

@generated function Base.size(A::Parameters)
    ...
end
@generated function Base.getindex(A::Parameters, i::Int)
    ...
end
@generated function Base.setindex!(A::Parameters, v::Real, i::Int)
    ...
end

such that for an object of type MyModelsParameters, we have

julia> isa(myparams, MyModelsParameters)
true
julia> size(myparams)
9

That is, I would like size to add 3 + 4 + 2 = 9.
Similarly for get/setindex, I want them to compile to something like

function Base.getindex(A::MyModelsParameters, i::Int)
    if i <= 3
        A.Σ[i]
    elseif i <= 7
        A.Θ[i - 3]
    else
        A.μ[i - 7]
    end
end
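Something like the following sketch might generate that chain automatically from the field types (my own hedged attempt, in the thread's Julia 0.6 syntax; it assumes every field type defines `length` on the type itself, so the lengths are known at compile time):

```julia
# Sketch (Julia 0.6 syntax, matching the rest of the thread). Assumes each
# field type defines `length` on the type itself, e.g.
#   Base.length{p,T}(::Type{CovarianceMatrix{p,T}}) = p
@generated function Base.getindex{T <: Parameters}(A::T, i::Int)
    fnames = fieldnames(T)
    lens   = [length(ft) for ft in T.types]   # compile-time field lengths
    ex     = :(throw(BoundsError(A, i)))      # fallback for out-of-range i
    offset = sum(lens)
    for j in length(fnames):-1:1              # build the chain back to front
        offset -= lens[j]
        fn = QuoteNode(fnames[j])
        ex = :(i <= $(offset + lens[j]) ? getfield(A, $fn)[i - $offset] : $ex)
    end
    return ex
end
```

The returned expression is the nested if/elseif chain above with every offset inlined as a constant.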

The parameter vector will be indexed many thousands of times, so performance is a major concern. But so is ease and intuitiveness of use; having to specify 3 as the length of a 2x2 covariance matrix is already pushing it.

I confess I am new to metaprogramming and don’t have the firmest grasp on many performance issues, so I am not even sure to what extent an implementation like the above would improve performance versus something more like MultiScaleArrays.

EDIT: Now reading the Reflection and introspection section of the manual.
I think this makes things clear. This thread can be closed.

EDIT:
I could add enough code for a working example, but

@generated function Base.size{T <: Parameters}(A::T)
  p = 0
  for i ∈ T.types
    p += length(i)
  end
  return :($p)
end

is not doing what I thought it was.
It does return the correct answer, but it is 300 times slower than

return_seven(a) = 7
@benchmark size(po)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.604 ns (0.00% GC)
  median time:      15.863 ns (0.00% GC)
  mean time:        16.240 ns (0.00% GC)
  maximum time:     35.409 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

@benchmark return_seven(po)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.054 ns (0.00% GC)
  median time:      0.054 ns (0.00% GC)
  mean time:        0.054 ns (0.00% GC)
  maximum time:     0.078 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I had thought the two functions should compile into the same thing.

Also

@code_warntype return_seven(po)
Variables:
  #self#::#return_seven
  a::Any

Body:
  begin 
      return 7
  end::Int64

@code_warntype size(po)
Variables:
  #temp#@_1
  #temp#@_2

Body:
  begin 
      return $(QuoteNode(7))
  end::Int64

Hmm.


#2

You can use getindex(getfield(A,fieldnames(A)[j]),i) to generate this function.

for j in 1:3
  fn = fieldnames(A)[j]            # field name, a Symbol
  p  = A.types[j].parameters[1]    # field length, from its type parameter
  quote
    f = getfield(A, $(QuoteNode(fn)))
    @inbounds f[i - $p]
  end
end

Then of course, put those together in the control-flow statement. I believe that should get you something where the getfield will be type-inferred, since fn will be known at compile time, and i - number will inline the constant. Of course, check the compiled code.
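As a toy illustration of that last step (hypothetical values, using the three fields from the first post with lengths 3, 4, 2), the quoted blocks can be folded into one nested if/elseif expression from the back:

```julia
# Toy assembly: branches[j] is the quoted body for field j, and cutoffs[j]
# is the last flat index that field covers.
function build_chain(cutoffs, branches)
    ex = :(throw(BoundsError(A, i)))   # innermost: out-of-range fallback
    for j in length(branches):-1:1
        ex = Expr(:if, :(i <= $(cutoffs[j])), branches[j], ex)
    end
    return ex                          # the expression a generated
end                                    # getindex would return

ex = build_chain([3, 7, 9], [:(A.Σ[i]), :(A.Θ[i - 3]), :(A.μ[i - 7])])
```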


#3

The performance can be improved over MultiScaleArrays, since it uses bisection and dynamic arrays internally, but I was surprised at how well MultiScaleArrays actually does.

I think that in many cases, the speedups will come from avoiding indexing, though. For these “types which hold arrays to build an array”, you can specialize broadcasting (as MultiScaleArrays does), and then it’s just broadcasts on the component arrays, which can be quite efficient.
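A toy sketch of that idea (a hypothetical container of my own, not the MultiScaleArrays code): an elementwise operation just forwards a broadcast to each component array, never touching a flat index.

```julia
# Hypothetical two-block container holding component arrays.
struct TwoBlocks
    a::Vector{Float64}
    b::Vector{Float64}
end

# Apply f elementwise in place, one broadcast per component array,
# avoiding flat indexing across the blocks entirely.
function apply_each!(f, x::TwoBlocks)
    x.a .= f.(x.a)
    x.b .= f.(x.b)
    return x
end
```

For example, `apply_each!(exp, blocks)` exponentiates every element of both blocks.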

Another thing to look at is just making it a CatView. CatViews.jl has a few tricks to make the iterator fast, and likely does some other performance tricks as well.
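If I recall the package correctly, usage is roughly this (hedged sketch; CatViews.jl is a third-party package, so check its README for the exact current API):

```julia
using CatViews  # third-party: Pkg.add("CatViews")

a = randn(3)
b = randn(4)
v = CatView(a, b)   # behaves like a length-7 vector over a and b
v[4] == b[1]        # indexing reads through to the parent arrays
```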


#4

I’m getting up in <6 hours, so I’m headed to bed shortly, but –

Thanks for the response ChrisRackauckas!
Your MultiScaleArrays package was the inspiration behind the parameter vectors, and one of the intentions was to pass the parameter object to Optim.

A modification of your indexing solution definitely looks like it will work.

CatView looks great.

About broadcasting:
I am doing some operations where this would be useful.
The object must also be passed through Optim (although Optim doesn’t support this yet), and autodiff has to find both first and second derivatives. I still have to dig into the code to see everything that entails.

The entire parameter vector must then be set to new values from a grid of points; that is, all the values have to be reassigned many thousands of times.

Maybe there is a much better way to handle that. Perhaps I could instead work on viewing a vector (i.e., one set of coordinates on the grid) as the parameters of whatever shape users specify?
If I try something like that, I still want a user to only have to somehow list the parameter objects they are using.
Perhaps they could define a parameter block in the same way and pass it through a function that instantiates a model object containing a pre-allocated vector (although I’m unsure whether there is actually any need for that), while flexibly allowing the smaller parameter objects to project views onto any vector (i.e., those of the grid).
I still have a lot to work out, but that could let me pass a regular vector to Optim et al.
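A minimal sketch of that view-based idea (my own toy example, plain Base Julia): each parameter block is a `view` into one shared flat vector, so Optim et al. can be handed the plain `Vector` while the user works with the shaped blocks.

```julia
θ = zeros(9)          # the flat vector Optim et al. would see
Σ = view(θ, 1:3)      # covariance block
p = view(θ, 4:7)      # probability block
μ = view(θ, 8:9)      # mean block

μ[1] = 2.5            # writing a block writes through to θ
θ[8]                  # now 2.5
```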


#5

I tried to reproduce your QuoteNode problem, but as of Julia 0.6, @code_warntype showed return 7. Could you provide a MWE? In any case, you don’t have to write return :($p); just write return p. Generated functions can output constants. Another factor is that the return type of size(po) is unknown if the type of po is not known, but in a type-stable situation, I assume that you would get the same timing results as with return_seven. Try comparing @benchmark size($po) with @benchmark return_seven($po). The dollar sign means that the value will be spliced in, hence Julia will know which size function to call (this is the type-stable situation).


#6

Reproducible example:

julia> abstract type Parameters{T} <: AbstractArray{T,1} end

julia> abstract type ConstrainedParameters{p,T} <: Parameters{T} end

julia> abstract type ConstrainedVector{p,T} <: ConstrainedParameters{p,T} end

julia> @generated function Base.show{T <: Parameters}(io::IO, ::MIME"text/plain", Θ::T)
         quote
           for j in fieldnames(T)
             println(getfield(Θ, j))
           end
         end
       end

julia> Base.IndexStyle(::Parameters) = IndexLinear()

julia> struct PositiveVector{p, T} <: ConstrainedVector{p, T}
         Θ::Vector{T}
         x::Vector{T}
       end

julia> PositiveVector{T}(x::Vector{T}) = PositiveVector{length(x), T}(log.(x), x)
PositiveVector

julia> Base.length{p,T}(::Type{PositiveVector{p,T}}) = p

julia> Base.getindex(x::ConstrainedVector, i::Int) = x.x[i]

julia> Base.show(io::IO, ::MIME"text/plain", Θ::ConstrainedVector) = print(io, Θ.x)

julia> Base.size(x::ConstrainedVector) = size(x.x)

julia> struct param{T} <: Parameters{T}
         pv1::PositiveVector{3,T}
         pv2::PositiveVector{4,T}
       end

julia> @generated function Base.size{T <: Parameters}(A::T)
         p = 0
         for i ∈ T.types
           p += length(i)
         end
         (p, )
       end

julia> po = param(PositiveVector(rand(3)), PositiveVector(rand(4)))
[0.582707, 0.0199256, 0.439869]
[0.372666, 0.168147, 0.204537, 0.224757]


julia> size(po)
(7,)

julia> @code_warntype size(po)
Variables:
  #temp#@_1
  #temp#@_2

Body:
  begin 
      return $(QuoteNode((7,)))
  end::Tuple{Int64}

Oddly, if we rerun the exact same definition of the size method, we get a different result:

julia> @generated function Base.size{T <: Parameters}(A::T)
         p = 0
         for i ∈ T.types
           p += length(i)
         end
         (p, )
       end

julia> @code_warntype size(po)
Variables:
  #self#::Base.#size
  A::Any

Body:
  begin  # line 2:
      return (7,)
  end::Tuple{Int64}

As for running @benchmark size($po), you predicted correctly:

julia> using BenchmarkTools

julia> @benchmark size(po)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.628 ns (0.00% GC)
  median time:      15.837 ns (0.00% GC)
  mean time:        16.879 ns (0.00% GC)
  maximum time:     34.214 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark size($po)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.053 ns (0.00% GC)
  median time:      0.054 ns (0.00% GC)
  mean time:        0.054 ns (0.00% GC)
  maximum time:     0.118 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> return_seven(A) = 7;

julia> @benchmark return_seven(po)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.053 ns (0.00% GC)
  median time:      0.054 ns (0.00% GC)
  mean time:        0.056 ns (0.00% GC)
  maximum time:     1.795 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I think the following helps contextualize the above:

julia> a = randn(7);

julia> @benchmark size(a)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     24.832 ns (0.00% GC)
  median time:      25.506 ns (0.00% GC)
  mean time:        28.013 ns (2.91% GC)
  maximum time:     1.398 μs (91.99% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> @benchmark size($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.944 ns (0.00% GC)
  median time:      2.157 ns (0.00% GC)
  mean time:        2.133 ns (0.00% GC)
  maximum time:     20.403 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I.e., 15 ns is nothing to worry about.

I still find the QuoteNode appearing and then disappearing strange, suggesting there is a lot going on I don’t understand.

PS. I am a fan of Julia 0.6’s easy copy-and-paste in the REPL.


#7

Glad to hear it! :slight_smile: