Removing type instability makes the code slower

I have two versions of the same function. Here is the first:

@code_warntype get_magnetic_symmetry_from_database(1242, 434);
MethodInstance for Spglib.get_magnetic_symmetry_from_database(::Int64, ::Int64)
  from get_magnetic_symmetry_from_database(uni_number, hall_number) @ Spglib ~/.julia/dev/Spglib/src/magnetic.jl:215
Arguments
  #self#::Core.Const(Spglib.get_magnetic_symmetry_from_database)
  uni_number::Int64
  hall_number::Int64
Locals
  num_sym::Int32
  time_reversals::Union{BitVector, Vector{Int32}}
  translations::Union{Matrix{Float64}, Vector{StaticArraysCore.SVector{3, Float64}}}
  rotations::Union{Array{Int32, 3}, Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}}
  @_8::Bool
Body::Tuple{Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}, Vector{StaticArraysCore.SVector{3, Float64}}, BitVector}
1 ─       Core.NewvarNode(:(num_sym))
│         Core.NewvarNode(:(time_reversals))
│         Core.NewvarNode(:(translations))
│         Core.NewvarNode(:(rotations))
│   %5  = (1 <= uni_number)::Bool
└──       goto #3 if not %5
2 ─       (@_8 = uni_number <= 1651)
└──       goto #4
3 ─       (@_8 = false)
4 ┄       goto #6 if not @_8
5 ─       goto #7
6 ─ %12 = Base.AssertionError("1 <= uni_number <= 1651")::Core.Const(AssertionError("1 <= uni_number <= 1651"))
└──       Base.throw(%12)
7 ┄ %14 = Core.apply_type(Spglib.Array, Spglib.Cint, 3)::Core.Const(Array{Int32, 3})
│   %15 = Spglib.undef::Core.Const(UndefInitializer())
│         (rotations = (%14)(%15, 3, 3, 384))
│   %17 = Core.apply_type(Spglib.Matrix, Spglib.Cdouble)::Core.Const(Matrix{Float64})
│   %18 = Spglib.undef::Core.Const(UndefInitializer())
│         (translations = (%17)(%18, 3, 384))
│   %20 = Core.apply_type(Spglib.Vector, Spglib.Cint)::Core.Const(Vector{Int32})
│   %21 = Spglib.undef::Core.Const(UndefInitializer())
│         (time_reversals = (%20)(%21, 384))
│   %23 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %24 = Base.cconvert(%23, rotations::Array{Int32, 3})::Array{Int32, 3}
│   %25 = Core.apply_type(Spglib.Ptr, Spglib.Cdouble)::Core.Const(Ptr{Float64})
│   %26 = Base.cconvert(%25, translations::Matrix{Float64})::Matrix{Float64}
│   %27 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %28 = Base.cconvert(%27, time_reversals::Vector{Int32})::Vector{Int32}
│   %29 = Base.cconvert(Spglib.Cint, uni_number)::Int32
│   %30 = Base.cconvert(Spglib.Cint, hall_number)::Int32
│   %31 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %32 = Base.unsafe_convert(%31, %24)::Ptr{Int32}
│   %33 = Core.apply_type(Spglib.Ptr, Spglib.Cdouble)::Core.Const(Ptr{Float64})
│   %34 = Base.unsafe_convert(%33, %26)::Ptr{Float64}
│   %35 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %36 = Base.unsafe_convert(%35, %28)::Ptr{Int32}
│   %37 = Base.unsafe_convert(Spglib.Cint, %29)::Int32
│   %38 = Base.unsafe_convert(Spglib.Cint, %30)::Int32
│         (num_sym = $(Expr(:foreigncall, :(Core.tuple(:spg_get_magnetic_symmetry_from_database, Spglib.libsymspg)), Int32, svec(Ptr{Int32}, Ptr{Float64}, Ptr{Int32}, Int32, Int32), 0, :(:ccall), :(%32), :(%34), :(%36), :(%37), :(%38), :(%30), :(%29), :(%28), :(%26), :(%24))))
│         Spglib.check_error()
│   %41 = Core.apply_type(Spglib.SMatrix, 3, 3, Spglib.Int32, 9)::Core.Const(StaticArraysCore.SMatrix{3, 3, Int32, 9})
│   %42 = (%41 ∘ Spglib.transpose)::Core.Const(StaticArraysCore.SMatrix{3, 3, Int32, 9} ∘ transpose)
│   %43 = rotations::Array{Int32, 3}
│   %44 = Spglib.:(:)::Core.Const(Colon())
│   %45 = Spglib.:(:)::Core.Const(Colon())
│   %46 = Base.firstindex(rotations::Array{Int32, 3}, 3)::Core.Const(1)
│   %47 = (%46:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %48 = Base.getindex(%43, %44, %45, %47)::Array{Int32, 3}
│   %49 = (:dims,)::Core.Const((:dims,))
│   %50 = Core.apply_type(Core.NamedTuple, %49)::Core.Const(NamedTuple{(:dims,)})
│   %51 = Core.tuple(3)::Core.Const((3,))
│   %52 = (%50)(%51)::Core.Const((dims = 3,))
│   %53 = Core.kwcall(%52, Spglib.eachslice, %48)::Core.PartialStruct(Slices{Array{Int32, 3}, Tuple{Colon, Colon, Int64}, Tuple{Base.OneTo{Int64}}, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, 1}, Any[Array{Int32, 3}, Core.Const((Colon(), Colon(), 1)), Tuple{Base.OneTo{Int64}}])
│         (rotations = Spglib.map(%42, %53))
│   %55 = Core.apply_type(Spglib.SVector, 3, Spglib.Float64)::Core.Const(StaticArraysCore.SVector{3, Float64})
│   %56 = translations::Matrix{Float64}
│   %57 = Spglib.:(:)::Core.Const(Colon())
│   %58 = Base.firstindex(translations::Matrix{Float64}, 2)::Core.Const(1)
│   %59 = (%58:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %60 = Base.getindex(%56, %57, %59)::Matrix{Float64}
│   %61 = Spglib.eachcol(%60)::Core.PartialStruct(ColumnSlices{Matrix{Float64}, Tuple{Base.OneTo{Int64}}, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}, Any[Matrix{Float64}, Core.Const((Colon(), 1)), Tuple{Base.OneTo{Int64}}])
│         (translations = Spglib.map(%55, %61))
│   %63 = Spglib.Bool::Core.Const(Bool)
│   %64 = time_reversals::Vector{Int32}
│   %65 = Base.firstindex(time_reversals::Vector{Int32})::Core.Const(1)
│   %66 = (%65:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %67 = Base.getindex(%64, %66)::Vector{Int32}
│   %68 = Base.broadcasted(%63, %67)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Type{Bool}, Tuple{Vector{Int32}}}, Any[Core.Const(Base.Broadcast.DefaultArrayStyle{1}()), Core.Const(Bool), Tuple{Vector{Int32}}, Core.Const(nothing)])
│         (time_reversals = Base.materialize(%68))
│   %70 = Core.tuple(rotations::Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}, translations::Vector{StaticArraysCore.SVector{3, Float64}}, time_reversals::BitVector)::Tuple{Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}, Vector{StaticArraysCore.SVector{3, Float64}}, BitVector}
└──       return %70

and here is what I call the type-stable one:

@code_warntype get_magnetic_symmetry_from_database2(1242, 434);
MethodInstance for Spglib.get_magnetic_symmetry_from_database2(::Int64, ::Int64)
  from get_magnetic_symmetry_from_database(uni_number, hall_number) @ Spglib ~/.julia/dev/Spglib/src/magnetic.jl:215
Arguments
  #self#::Core.Const(Spglib.get_magnetic_symmetry_from_database2)
  uni_number::Int64
  hall_number::Int64
Locals
  time_reversals::BitVector
  translations::Vector{StaticArraysCore.SVector{3, Float64}}
  rotations::Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}
  num_sym::Int32
  _time_reversals::Vector{Int32}
  _translations::Matrix{Float64}
  _rotations::Array{Int32, 3}
  @_11::Bool
Body::Tuple{Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}, Vector{StaticArraysCore.SVector{3, Float64}}, BitVector}
1 ─       Core.NewvarNode(:(time_reversals))
│         Core.NewvarNode(:(translations))
│         Core.NewvarNode(:(rotations))
│         Core.NewvarNode(:(num_sym))
│         Core.NewvarNode(:(_time_reversals))
│         Core.NewvarNode(:(_translations))
│         Core.NewvarNode(:(_rotations))
│   %8  = (1 <= uni_number)::Bool
└──       goto #3 if not %8
2 ─       (@_11 = uni_number <= 1651)
└──       goto #4
3 ─       (@_11 = false)
4 ┄       goto #6 if not @_11
5 ─       goto #7
6 ─ %15 = Base.AssertionError("1 <= uni_number <= 1651")::Core.Const(AssertionError("1 <= uni_number <= 1651"))
└──       Base.throw(%15)
7 ┄ %17 = Core.apply_type(Spglib.Array, Spglib.Cint, 3)::Core.Const(Array{Int32, 3})
│   %18 = Spglib.undef::Core.Const(UndefInitializer())
│         (_rotations = (%17)(%18, 3, 3, 384))
│   %20 = Core.apply_type(Spglib.Matrix, Spglib.Cdouble)::Core.Const(Matrix{Float64})
│   %21 = Spglib.undef::Core.Const(UndefInitializer())
│         (_translations = (%20)(%21, 3, 384))
│   %23 = Core.apply_type(Spglib.Vector, Spglib.Cint)::Core.Const(Vector{Int32})
│   %24 = Spglib.undef::Core.Const(UndefInitializer())
│         (_time_reversals = (%23)(%24, 384))
│   %26 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %27 = Base.cconvert(%26, _rotations)::Array{Int32, 3}
│   %28 = Core.apply_type(Spglib.Ptr, Spglib.Cdouble)::Core.Const(Ptr{Float64})
│   %29 = Base.cconvert(%28, _translations)::Matrix{Float64}
│   %30 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %31 = Base.cconvert(%30, _time_reversals)::Vector{Int32}
│   %32 = Base.cconvert(Spglib.Cint, uni_number)::Int32
│   %33 = Base.cconvert(Spglib.Cint, hall_number)::Int32
│   %34 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %35 = Base.unsafe_convert(%34, %27)::Ptr{Int32}
│   %36 = Core.apply_type(Spglib.Ptr, Spglib.Cdouble)::Core.Const(Ptr{Float64})
│   %37 = Base.unsafe_convert(%36, %29)::Ptr{Float64}
│   %38 = Core.apply_type(Spglib.Ptr, Spglib.Cint)::Core.Const(Ptr{Int32})
│   %39 = Base.unsafe_convert(%38, %31)::Ptr{Int32}
│   %40 = Base.unsafe_convert(Spglib.Cint, %32)::Int32
│   %41 = Base.unsafe_convert(Spglib.Cint, %33)::Int32
│         (num_sym = $(Expr(:foreigncall, :(Core.tuple(:spg_get_magnetic_symmetry_from_database, Spglib.libsymspg)), Int32, svec(Ptr{Int32}, Ptr{Float64}, Ptr{Int32}, Int32, Int32), 0, :(:ccall), :(%35), :(%37), :(%39), :(%40), :(%41), :(%33), :(%32), :(%31), :(%29), :(%27))))
│         Spglib.check_error()
│   %44 = Core.apply_type(Spglib.SMatrix, 3, 3, Spglib.Int32, 9)::Core.Const(StaticArraysCore.SMatrix{3, 3, Int32, 9})
│   %45 = (%44 ∘ Spglib.transpose)::Core.Const(StaticArraysCore.SMatrix{3, 3, Int32, 9} ∘ transpose)
│   %46 = _rotations::Array{Int32, 3}
│   %47 = Spglib.:(:)::Core.Const(Colon())
│   %48 = Spglib.:(:)::Core.Const(Colon())
│   %49 = Base.firstindex(_rotations, 3)::Core.Const(1)
│   %50 = (%49:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %51 = Base.getindex(%46, %47, %48, %50)::Array{Int32, 3}
│   %52 = (:dims,)::Core.Const((:dims,))
│   %53 = Core.apply_type(Core.NamedTuple, %52)::Core.Const(NamedTuple{(:dims,)})
│   %54 = Core.tuple(3)::Core.Const((3,))
│   %55 = (%53)(%54)::Core.Const((dims = 3,))
│   %56 = Core.kwcall(%55, Spglib.eachslice, %51)::Core.PartialStruct(Slices{Array{Int32, 3}, Tuple{Colon, Colon, Int64}, Tuple{Base.OneTo{Int64}}, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, 1}, Any[Array{Int32, 3}, Core.Const((Colon(), Colon(), 1)), Tuple{Base.OneTo{Int64}}])
│         (rotations = Spglib.map(%45, %56))
│   %58 = Core.apply_type(Spglib.SVector, 3, Spglib.Float64)::Core.Const(StaticArraysCore.SVector{3, Float64})
│   %59 = _translations::Matrix{Float64}
│   %60 = Spglib.:(:)::Core.Const(Colon())
│   %61 = Base.firstindex(_translations, 2)::Core.Const(1)
│   %62 = (%61:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %63 = Base.getindex(%59, %60, %62)::Matrix{Float64}
│   %64 = Spglib.eachcol(%63)::Core.PartialStruct(ColumnSlices{Matrix{Float64}, Tuple{Base.OneTo{Int64}}, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}, Any[Matrix{Float64}, Core.Const((Colon(), 1)), Tuple{Base.OneTo{Int64}}])
│         (translations = Spglib.map(%58, %64))
│   %66 = Spglib.Bool::Core.Const(Bool)
│   %67 = _time_reversals::Vector{Int32}
│   %68 = Base.firstindex(_time_reversals)::Core.Const(1)
│   %69 = (%68:num_sym)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│   %70 = Base.getindex(%67, %69)::Vector{Int32}
│   %71 = Base.broadcasted(%66, %70)::Core.PartialStruct(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Type{Bool}, Tuple{Vector{Int32}}}, Any[Core.Const(Base.Broadcast.DefaultArrayStyle{1}()), Core.Const(Bool), Tuple{Vector{Int32}}, Core.Const(nothing)])
│         (time_reversals = Base.materialize(%71))
│   %73 = Core.tuple(rotations, translations, time_reversals)::Tuple{Vector{StaticArraysCore.SMatrix{3, 3, Int32, 9}}, Vector{StaticArraysCore.SVector{3, Float64}}, BitVector}
└──       return %73

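In the first version, the local variables time_reversals, translations, and rotations are inferred as Union types because each name is reused for two values of different types. In the second version, the raw buffers get their own names with a _ prefix, so every local has a single concrete type. But the first one is actually a little bit faster: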

@btime get_magnetic_symmetry_from_database(1242, 434);
  607.841 ns (10 allocations: 25.45 KiB)

@btime get_magnetic_symmetry_from_database2(1242, 434);
  624.411 ns (10 allocations: 25.45 KiB)

They have the same number of allocations, but the type-stable one is slightly slower (about 17 ns, or under 3%). I wonder why. Is it simply that the second version has a few more instructions?
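To make the two patterns concrete, here is a minimal sketch of the difference between the two versions. The function fill_buffer! is a hypothetical stand-in for the C call, and the names and sizes are invented for illustration. Note that in the first @code_warntype printout every use of rotations, translations, and time_reversals is already annotated with a single concrete type, which suggests the two versions may compile to nearly the same code.

```julia
# Hypothetical stand-in for the C call: fills a preallocated buffer and
# returns how many entries are valid.
function fill_buffer!(buf::Vector{Int32})
    buf[1:3] .= Int32[1, 0, 1]
    return 3
end

# Pattern 1: the name `flags` is rebound to a value of a different type,
# so the local is reported as Union{BitVector, Vector{Int32}}.
function reuse_name()
    flags = Vector{Int32}(undef, 8)
    n = fill_buffer!(flags)
    flags = Bool.(flags[1:n])
    return flags
end

# Pattern 2: the raw buffer gets its own name, so each local has one type.
function separate_names()
    _flags = Vector{Int32}(undef, 8)
    n = fill_buffer!(_flags)
    flags = Bool.(_flags[1:n])
    return flags
end
```

Both functions return the same BitVector, and the return type is concrete in both cases; only the reported type of the local differs.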

What was the slowdown you observed? Can you share the benchmarks?

Yes, I did. Sorry, I posted the question before finishing it.

The difference is very small. Also in the second example you have more variables. It would be easier for us to help you optimize if you shared the source code with us.

The source code is here.
I would greatly appreciate it if you could help me.

Is this the type-unstable code? Which one is the type-stable version?

Will the ccall

num_sym = @ccall libsymspg.spg_get_magnetic_symmetry_from_database(
        _rotations::Ptr{Cint},
...

always take the same amount of time? Am I right to assume that the database is a file stored locally (offline)?

To see what causes the slowdown, it’s probably necessary to look at the two versions side by side.

One more question: is the code faster without views here and in the lines below?

    rotations = map(
        SMatrix{3,3,Int32,9} ∘ transpose, eachslice(_rotations[:, :, begin:num_sym]; dims=3)
    ) 

I guess it might be true that it’s better to allocate this as you are doing so that everything is contiguous, but I was curious.

I think it is. I have tested this function multiple times. The database is offline.

is the code faster by not using views here and in the lines below?

How would I avoid using views here?

No, I mean that you are not using views currently (except for the eachslice).
You could try using views with

rotations = @views map(
    SMatrix{3,3,Int32,9} ∘ transpose, eachslice(_rotations[:, :, begin:num_sym]; dims=3)
)

This should get rid of some of the allocations.
It might be faster to allocate everything as a contiguous array, but you would probably still have to rearrange the indices (e.g. with permutedims), because iterating over the third index is slow.
It might also be that this part is not important for performance anyway.
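As a rough sketch of the copy-versus-view difference, the snippet below slices a small buffer both ways. The buffer buf and the count n are made up for illustration, and collect is swapped in for the SMatrix constructor so that the example runs without StaticArrays.

```julia
# Hypothetical 3×3×4 buffer standing in for `_rotations`; only the first
# `n` slices are treated as valid.
buf = reshape(collect(Int32, 1:36), 3, 3, 4)
n = 2

# Copying version: buf[:, :, 1:n] allocates a new 3×3×n array first.
copied = map(collect ∘ transpose, eachslice(buf[:, :, 1:n]; dims=3))

# View version: @view wraps the same memory without copying it.
viewed = map(collect ∘ transpose, eachslice(@view(buf[:, :, 1:n]); dims=3))
```

Both variants produce the same list of transposed 3×3 matrices; whether the view version is measurably faster in the real function would have to be benchmarked.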

If this is what you mean by type instability: the compiler performs union-splitting for small type unions (roughly 3-4 types), that is, it checks a variable’s type at runtime and branches into code specialized for each inferred type. If there are no red-colored variables in the @code_warntype printout and the return type is a single concrete type, I wouldn’t worry too much about it. I mention the return type because, while one variable with a few inferred types can be handled, combinations of several such variables can easily produce too many types, so it’s best to keep these small type unions contained within a method.
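A small illustration of that last point, using made-up function names: pick lets a two-type union escape through its return value, while contained converts the value before returning, so callers only ever see one concrete type.

```julia
# `x` is inferred as a small union of Int64 and Float64; the union also
# escapes through the return value.
function pick(flag::Bool)
    x = flag ? 1 : 2.0
    return x + 1
end

# The union stays inside the method: the result is converted to Float64
# before returning, so the return type is concrete.
function contained(flag::Bool)
    return Float64(pick(flag))
end

# Inferred return types for a Bool argument.
pick_type = only(Base.return_types(pick, (Bool,)))
contained_type = only(Base.return_types(contained, (Bool,)))
```

Here pick_type is the union of Int64 and Float64, whereas contained_type is just Float64, which is the situation described above as not worth worrying about.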