A bug in using varinfo() when a struct contains 3D data!

There seems to be a bug in varinfo when a struct contains three-dimensional data. A 3D array inside the struct makes varinfo respond much more slowly. The results are as follows:

mutable struct Block3d
    data2d::Matrix
    data3d::Array{AbstractFloat, 3}
end

bloct1=Block3d(zeros(80,20000),zeros(80,200,20));
@time varinfo(r"bloct1") # 0.202150 seconds (1.35 M allocations: 47.145 MiB, 10.30% gc time, 20.11% compilation time)
bloct2=Block3d(zeros(80,20000),zeros(80,200,200));
@time varinfo(r"bloct2") # data3d x10: 4.509268 seconds (12.80 M allocations: 418.302 MiB, 13.65% gc time)
bloct3=Block3d(zeros(80,200000),zeros(80,200,20));
@time varinfo(r"bloct3") # data2d x10: 0.233853 seconds (1.28 M allocations: 42.630 MiB)

# without struct
blocu1=zeros(80,200,20);
@time varinfo(r"blocu1") # 0.000525 seconds (157 allocations: 31.719 KiB)
blocu2=zeros(80,200,200);
@time varinfo(r"blocu2") # 0.000941 seconds (158 allocations: 31.758 KiB), very fast

(Note: the data is only scaled up 10 times here as a test. In practice, many three-dimensional datasets are hundreds or thousands of times larger, which makes for a very bad experience.)

Without a struct, varinfo on 3D data is very fast. With a struct, varinfo becomes very slow. The 2D data inside the struct does not affect varinfo's speed.

This is likely due to some type instability in some path in varinfo, judging by the number of allocations and how they scale with the size of data3d. I’m not sure this is expected to be fast; varinfo is only a diagnostic tool after all. Since varinfo has to deal with potentially arbitrary data, I’m not sure this can be fixed in general.


My struct contains a large data3d, and I need to measure the space it occupies.

  1. As long as varinfo is not fixed, how should I declare the type of data3d to speed it up?
  2. Structs can indeed contain many unknown custom types, but varinfo could detect the system's built-in types and take a fast path for them, falling back to the original method for unknown types. The comparison above suggests that the Matrix type of data2d is already detected and handled quickly.
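If the goal is only to count bytes, one possible workaround is to add up sizeof over the array fields instead of going through varinfo. The sketch below is only an illustration: the helper array_field_bytes and the struct Demo are hypothetical names, not part of Julia or of the code above, and the helper counts only the raw array payload, not object headers.

```julia
# Hypothetical helper: sum the payload bytes of all Array fields of a struct.
# sizeof(::Array) does not walk the elements. For an array with an abstract
# element type it counts only the pointer slots, not the values they point to.
function array_field_bytes(x)
    total = 0
    for name in fieldnames(typeof(x))
        field = getfield(x, name)
        if field isa Array
            total += sizeof(field)
        end
    end
    return total
end

# Hypothetical struct with concretely typed array fields.
struct Demo
    data2d::Matrix{Float64}
    data3d::Array{Float64, 3}
end

d = Demo(zeros(80, 200), zeros(80, 200, 20))
array_field_bytes(d)  # 80*200*8 + 80*200*20*8 = 2688000 bytes
```

Base.summarysize(d) gives a fuller figure that also includes object overhead, but it has to traverse the object, so it is subject to the same slowdown when an element type is abstract.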

To my understanding this is not a bug. Rather,

data3d::Array{AbstractFloat, 3}

says that the array is expected to hold mixed element types, and it is stored accordingly: essentially as an array of pointers, one per element. That sends varinfo on a wild chase around memory, and it is absolute poison for performance in general.

Unless you truly need to store different floating-point formats in that array, this is not what you want. Try either
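A small sketch of the difference, using tiny arrays so it runs instantly. The variable names a and b are only for illustration.

```julia
# Concrete element type: the Float64 values are stored inline in one contiguous buffer.
a = zeros(Float64, 4, 4, 4)

# Abstract element type: each slot holds a reference to a separately allocated value.
# This conversion is what happens implicitly when a Float64 array is passed
# to a field declared as Array{AbstractFloat, 3}.
b = Array{AbstractFloat, 3}(a)

isbitstype(eltype(a))      # true
isconcretetype(eltype(b))  # false
b == a                     # true: same values, different storage layout
typeof(b[1])               # Float64
```

Anything that needs the total size of b has to visit every element individually, which would explain why the cost grows with the size of data3d.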

mutable struct Block3d
    data2d::Matrix
    data3d::Array{<:AbstractFloat, 3}
end

julia> bloct2=Block3d(zeros(80,20000),zeros(80,200,200));

julia> @time varinfo(r"bloct2")
  0.000057 seconds (61 allocations: 3.914 KiB)
  name         size summary
  –––––– –––––––––– –––––––
  bloct2 36.621 MiB Block3d

or introduce a type parameter

mutable struct Block3d{T<:AbstractFloat}
    data2d::Matrix{T}
    data3d::Array{T, 3}
end
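A minimal sketch of how the parametric version behaves. The struct is renamed ParamBlock3d here (a hypothetical name) only so that it can be defined in the same session as the earlier Block3d definitions.

```julia
# Same layout as the parametric Block3d above, under a hypothetical name.
mutable struct ParamBlock3d{T<:AbstractFloat}
    data2d::Matrix{T}
    data3d::Array{T, 3}
end

# T is inferred from the arguments; both arrays must share the same element type.
blk = ParamBlock3d(zeros(80, 200), zeros(80, 200, 20))
typeof(blk)                                      # ParamBlock3d{Float64}
isconcretetype(fieldtype(typeof(blk), :data3d))  # true

# The same definition also works for other floating-point types.
blk32 = ParamBlock3d(zeros(Float32, 2, 2), zeros(Float32, 2, 2, 2))
eltype(blk32.data3d)                             # Float32
```

With the `<:AbstractFloat` field variant the stored array is still a concrete Array{Float64, 3}, but the declared field type itself stays abstract, so the compiler cannot infer the type of field accesses. The type parameter makes both the array and the field type concrete.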

PS: Please avoid double posting (Using varinfo() very slow when there is 3d data in the struct structure). This post already attracted attention. Title and category can still be changed after submission.


Thank you very much for your answer, this indeed solves the problem.