Julia / C mutable struct compiles much faster than struct

I’ve wrapped a C library for use in Julia, and have encountered a performance benefit to using a mutable struct as opposed to a struct; this is not what I would have expected. In short: the C library usually deals with pointers, while in Julia we may want to map the structs (unsafe_load) to easily access values. To deal with static arrays we can use SVectors or NTuples; if a struct using either method is unsafe_load’d, the first load is much slower than with a mutable struct.

Take the following code (tested with both 0.7 and 1.0.1), for example:

using StaticArrays

struct S_SA
   f::SVector{1000, Float64}
end
mutable struct M_SA
   f::SVector{1000, Float64}
end

struct S_NT
   f::NTuple{1000, Float64}
end
mutable struct M_NT
   f::NTuple{1000, Float64}
end

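# Allocate uninitialized memory large enough for one `v` and return it as a typed pointer.
# The memory is never freed here; that is acceptable only for a short timing script.
makeptr(v::DataType) = convert(Ptr{v}, Libc.malloc(sizeof(v)))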

p_s = makeptr(S_SA)
p_m = makeptr(M_SA)

println("unsafe_load for SVector mutable struct:")
@time m = unsafe_load(p_m);

println("unsafe_load for SVector struct:")
@time s = unsafe_load(p_s);

println()
println()
println()

p_ns = makeptr(S_NT)
p_nm = makeptr(M_NT)

println("unsafe_load for NTuple struct:")
@time ns = unsafe_load(p_ns);

println("unsafe_load for NTuple mutable struct:")
@time nm = unsafe_load(p_nm);

which nets the following timings for me:

unsafe_load for SVector mutable struct:
  0.002560 seconds (3.03 k allocations: 193.748 KiB)
unsafe_load for SVector struct:
  0.310711 seconds (3.03 k allocations: 192.795 KiB)


unsafe_load for NTuple struct:
  0.295900 seconds (3.03 k allocations: 192.795 KiB)
unsafe_load for NTuple mutable struct:
  0.002644 seconds (3.04 k allocations: 201.889 KiB)

Each time, the first unsafe_load of the mutable struct is faster than that of the immutable struct. On subsequent unsafe_loads, they are the same speed. The C structs I’m actually using are larger and more complex than this (not just static arrays, but other fields as well), and the immutable struct can take up to 30 seconds to unsafe_load the first time, while the mutable struct behaves like the example above.

Does anyone in the community have a recommendation? I find 30+ seconds to be far too long, but would have expected that a struct in this case is the correct Julia thing to do. Suggestions very welcome.

I cannot reproduce.

julia> using BenchmarkTools

julia> @btime unsafe_load($p_m);
  875.717 ns (1 allocation: 7.88 KiB)

julia> @btime unsafe_load($p_s);
  395.254 ns (0 allocations: 0 bytes)

I think you counted compile time. As @foobar_lv2 shows, you should use @btime or @benchmark from BenchmarkTools, which run the expression many times and therefore exclude the one-time compilation cost from the measurement.
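To see the split between compile time and run time without any extra packages, a minimal sketch along these lines may help. The type name S_NT_demo is made up for illustration, and calloc is used instead of malloc only so that the loaded values are well defined.

```julia
# Hypothetical stand-in for the struct from the original post.
struct S_NT_demo
    f::NTuple{1000, Float64}
end

# Zero-initialized memory, so the loaded tuple contains only 0.0.
p = convert(Ptr{S_NT_demo}, Libc.calloc(1, sizeof(S_NT_demo)))

# First call: the measured time includes compiling unsafe_load for this type.
t_first = @elapsed unsafe_load(p)

# Second call: the method is already compiled, so only the load itself is timed.
t_second = @elapsed unsafe_load(p)

s = unsafe_load(p)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")

Libc.free(p)
```

The first number should be dominated by compilation and the second by the actual load, which matches the difference between the @time and @btime results in this thread.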

Yes, this is an issue with the compile time.

Given that the compile time of 30 seconds for my wrapped package would be incurred each time a program is started, this does affect usability. My question is why the compile time would be that much shorter for the mutable struct, and whether there is a way around it. My wrapper module is set to precompile, but the delay still shows up when I use the module, so I’m guessing unsafe_load still needs to be compiled the first time it’s run, unless I can precompile it on package build?

I tried using SnoopCompile (and calling precompile by hand) to push the performance hit into the module precompilation stage, without success.

If someone has a good reference to documentation on how to use precompile() to do this, that would be awesome. I’m not sure if it is doing the compilation within the scope of the module which somehow doesn’t transfer when I use the module elsewhere, or if there’s something else going on.
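For reference, a hand-written precompile directive generally looks like the sketch below. The module name MyWrapper, the type BigStruct, and the helper load are all hypothetical; only precompile and unsafe_load are real Base functions. Whether the compiled result survives into a new session depends on what the Julia version in use stores in its precompile cache, so this is not guaranteed to remove the first-call delay.

```julia
module MyWrapper   # hypothetical wrapper module

# Hypothetical stand-in for a large wrapped C struct.
struct BigStruct
    f::NTuple{1000, Float64}
end

# Thin helper so the package owns a method with a concrete signature.
load(p::Ptr{BigStruct}) = unsafe_load(p)

# Request compilation for these argument types before the first call.
# precompile returns true if a matching method was found and compiled.
precompile(load, (Ptr{BigStruct},))
precompile(unsafe_load, (Ptr{BigStruct},))

end # module
```

The directives are placed at the top level of the module so that they run while the module is being precompiled rather than at first use.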

Package precompilation only caches the output of the Julia front end. The LLVM back end must still generate native code every time you load the package.
The library PackageCompiler.jl, however, can fully compile a package by building it into Julia’s system image.