Types and Memory allocation in the operation of Vector{Matrix{ComplexF64}}

Hi, now, I’m calculating the time evolution of Vector{Matrix{ComplexF64}} by Runge-Kutta method, and I have problems regarding the calculation below::

ρ += (h/6)*(k1 + 2*k2 + 2*k3 + k4)

here, k1~k4, are Vector{Matrix{ComplexF64}}, and h is Float64.
I thought it would be natural to think that ρ will be Vector{Matrix{ComplexF64}}, but the code_warntype teaches me it is Any.

How can I understand this behaviour?

Also, there is memory allocation in this calculation, even though there is no creation of new arrays. How can I get rid of them?

I should have given you more information.
I tried

@. ρ += (T_param.h/6)*(k1 + 2*k2 + 2*k3 + k4)

And it indeed fixed the type instability of ρ, but it increased the memory allocation, investigated by Profile.
How can I solve the type instability problem and the memory allocation problem at the same time?

MWE

julia> k1 = [rand(2,2)];

julia> k2 = copy(k1);

julia> using Test

julia> @inferred k1 + k2
ERROR: return type Vector{Matrix{Float64}} does not match inferred return type Any
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ REPL[17]:1

I wonder why this isn’t being inferred (it seems the eltype is not being inferred). The following seems to be inferred, though:

julia> bc = Broadcast.instantiate(Broadcast.broadcasted(+, k1, k2));

julia> @inferred copy(bc)
1-element Vector{Matrix{Float64}}:
 [0.8596129565473165 1.8419784632408078; 0.4550608871647064 0.4035134314769242]

Wonder if @N5N3 might have some ideas

Hi @J_I, could you maybe post a Minimum Working Example, so that we can experiment with it and show you what’s wrong? I have a few ideas but it’s hard to tell without being able to run the code

Looks like a nested broadcast which our compiler refuse to infer now.

Here is MWE.

function MWE_1()
    h = 0.01
    k1 = [rand(ComplexF64,4,4) for _ in 1:500]
    k2 = [rand(ComplexF64,4,4) for _ in 1:500]
    k3 = [rand(ComplexF64,4,4) for _ in 1:500]
    k4 = [rand(ComplexF64,4,4) for _ in 1:500]
    ρ  = [rand(ComplexF64,4,4) for _ in 1:500]
    ρ = h/6 * (k1 + 2*k2 + 2*k3 + k4)
end

@code_warntype MWE_1()

the result is

Arguments
  #self#::Core.Const(MWE_1)
Locals
  #9::var"#9#14"
  #8::var"#8#13"
  #7::var"#7#12"
  #6::var"#6#11"
  #5::var"#5#10"
  ρ::Any
  k4::Vector{Matrix{ComplexF64}}
  k3::Vector{Matrix{ComplexF64}}
  k2::Vector{Matrix{ComplexF64}}
  k1::Vector{Matrix{ComplexF64}}
  h::Float64
Body::Any

this type instability problem is solved by adding @.

function MWE_2()
    h = 0.01
    k1 = [rand(ComplexF64,4,4) for _ in 1:500]
    k2 = [rand(ComplexF64,4,4) for _ in 1:500]
    k3 = [rand(ComplexF64,4,4) for _ in 1:500]
    k4 = [rand(ComplexF64,4,4) for _ in 1:500]
    ρ  = [rand(ComplexF64,4,4) for _ in 1:500]
    @. ρ = h/6 * (k1 + 2*k2 + 2*k3 + k4)
end

like below

Arguments
  #self#::Core.Const(MWE_2)
Locals
  #19::var"#19#24"
  #18::var"#18#23"
  #17::var"#17#22"
  #16::var"#16#21"
  #15::var"#15#20"
  ρ::Vector{Matrix{ComplexF64}}
  k4::Vector{Matrix{ComplexF64}}
  k3::Vector{Matrix{ComplexF64}}
  k2::Vector{Matrix{ComplexF64}}
  k1::Vector{Matrix{ComplexF64}}
  h::Float64
Body::Vector{Matrix{ComplexF64}}

but this makes another problem of unknown memory allocation.

        - function MWE_2()
        -     h = 0.01
        0     k1 = [rand(ComplexF64,4,4) for _ in 1:500]
        0     k2 = [rand(ComplexF64,4,4) for _ in 1:500]
        0     k3 = [rand(ComplexF64,4,4) for _ in 1:500]
        0     k4 = [rand(ComplexF64,4,4) for _ in 1:500]
        0     ρ  = [rand(ComplexF64,4,4) for _ in 1:500]
   576000     @. ρ = h/6 * (k1 + 2*k2 + 2*k3 + k4)
        - end
        - 
        - # @code_warntype MWE_1()
        - # @code_warntype MWE_2()
        - 
        - using Profile 
        - Profile.clear_malloc_data()
        - MWE_2()

If your real usage is still limited to 4x4.
The simplest fix here is use Vector of SMatrix instead of Vector of Matrix.

StaticArrays.jl has its own broadcast (+, /, …) implementation thus hopefully the rescusion detection would ignore it.

4 Likes

Probably simplest to loop over the arrays instead of using broadcast. This may be faster as well.

2 Likes

I confirmed your suggestion, and it completely solved the problem. Thank you very much!

PS. Please do not post screenshots. Post quoted code. See Please read: make it easier to help you

1 Like