Type stable code is more expensive than type unstable code. Why?

I wrote a small code (following the advices I got from other post) just for learning some things about type stability. The two first definitions of foo() are type stable. The third definition is not. However, the type unstable function runs faster and involves less allocations than type stable ones. Is this right? I would expect the opposite.

The code is:

using StaticArrays
using Test

function foo(Ke::MMatrix{2,2,Float64}, le::Float64, area::Float64, Ey::Float64)
	xi = gausspoints1d()
	w = gaussweights1d()
	@inbounds for i = 1:10
		dNdx = shapefunder1d(2, xi[i])
		Ke .+= dNdx * dNdx' * w[i]
	end
	Ke .*= Ey * area * 2.0 / le
end

function foo(Ke::MMatrix{3,3,Float64}, le::Float64, area::Float64, Ey::Float64)
	xi = gausspoints1d()
	w = gaussweights1d()
	@inbounds for i = 1:10
		dNdx = shapefunder1d(3, xi[i])
		Ke .+= dNdx * dNdx' * w[i]
	end
	Ke .*= Ey * area * 2.0 / le
end

function foo(::Val{nelementnodes}, le::Float64, area::Float64, Ey::Float64) where nelementnodes
	xi = gausspoints1d()
	w = gaussweights1d()
	Ke = zeros(SMatrix{nelementnodes,nelementnodes})
	@inbounds for i = 1:10
		dNdx = shapefunder1d(Val(nelementnodes), xi[i])
		Ke += dNdx * dNdx' * w[i]
	end
	Ke *= Ey * area * 2.0 / le
	return Ke
end

function foo(nelementnodes::Int, le::Float64, area::Float64, Ey::Float64)
	xi = gausspoints1d()
	w = gaussweights1d()
	Ke = zeros(SMatrix{nelementnodes,nelementnodes})
	@inbounds for i = 1:10
		dNdx = shapefunder1d(nelementnodes, xi[i])
		Ke += dNdx * dNdx' * w[i]
	end
	Ke *= Ey * area * 2.0 / le
	return Ke
end

function shapefunder1d(nelementnodes::Int, xi::Float64)
		nelementnodes == 2 && return @SVector [-0.5, 0.5]
		nelementnodes == 3 && return @SVector [-0.5 + xi, -2.0 * xi, 0.5 + xi]
end

function shapefunder1d(::Val{2}, xi::Float64)
  return @SVector [-0.5, 0.5]
end

function shapefunder1d(::Val{3}, xi::Float64)
  return @SVector [-0.5 + xi, -2.0 * xi, 0.5 + xi]
end

function gausspoints1d()
	return @SVector [- 0.973906528517171720077964012084,
	     						- 0.865063366688984510732096688423,
			 						- 0.679409568299024406234327365115,
			 						- 0.433395394129247290799265943166,
			 						- 0.148874338981631210884826001130,
			 						0.148874338981631210884826001130,
			 						0.433395394129247290799265943166,
			 						0.679409568299024406234327365115,
			 						0.865063366688984510732096688423,
				 					0.973906528517171720077964012084]
end

function gaussweights1d()
	return @SVector [0.0666713443086881375935688098933,
									0.149451349150580593145776339658,
									0.219086362515982043995534934228,
									0.269266719309996355091226921569,
									0.295524224714752870173892994651,
									0.295524224714752870173892994651,
									0.269266719309996355091226921569,
									0.219086362515982043995534934228,
									0.149451349150580593145776339658,
									0.0666713443086881375935688098933]
end

# tests
nelementnodes = 3
Ke = zeros(MMatrix{nelementnodes,nelementnodes})
time1() = @time foo(Ke,  400.0, 0.8, 1000000.0)
time2() = @time foo(Val(nelementnodes), 400.0, 0.8, 1000000.0)
time3() = @time foo(nelementnodes, 400.0, 0.8, 1000000.0)

time1()
time2()
time3()

@inferred foo(Ke,  400.0, 0.8, 1000000.0)
@inferred foo(Val(nelementnodes), 400.0, 0.8, 1000000.0)
@inferred foo(nelementnodes, 400.0, 0.8, 1000000.0)

After running the code several times (copied and pasted to the REPL as the @inferred didn’t work when running as a file), I got this:

time1()
0.036427 seconds (84.85 k allocations: 3.738 MiB)
3×3 MArray{Tuple{3,3},Float64,2,9}:
4666.67 -5333.33 666.667
-5333.33 10666.7 -5333.33
666.667 -5333.33 4666.67

time2()
0.027482 seconds (41.68 k allocations: 2.117 MiB)
3×3 SArray{Tuple{3,3},Float64,2,9}:
4666.67 -5333.33 666.667
-5333.33 10666.7 -5333.33
666.667 -5333.33 4666.67

time3()
0.000030 seconds (84 allocations: 4.484 KiB)
3×3 SArray{Tuple{3,3},Float64,2,9}:
4666.67 -5333.33 666.667
-5333.33 10666.7 -5333.33
666.667 -5333.33 4666.67

@inferred foo(Ke, 400.0, 0.8, 1000000.0)
3×3 MArray{Tuple{3,3},Float64,2,9}:
1.86713e7 -2.13387e7 2.66733e6
-2.13387e7 4.26773e7 -2.13387e7
2.66733e6 -2.13387e7 1.86713e7

@inferred foo(Val(nelementnodes), 400.0, 0.8, 1000000.0)
3×3 SArray{Tuple{3,3},Float64,2,9}:
4666.67 -5333.33 666.667
-5333.33 10666.7 -5333.33
666.667 -5333.33 4666.67

@inferred foo(nelementnodes, 400.0, 0.8, 1000000.0)
ERROR: return type SArray{Tuple{3,3},Float64,2,9} does not match inferred return type Any
Stacktrace:
[1] top-level scope at none:0

As I mentioned in the other thread, please use BenchmarkTools for benchmarking. And remember to interpolate the variables.

1 Like

I think you are mostly measuring compilation time.

julia> using BenchmarkTools

julia> time1() = @btime foo(Ke,  400.0, 0.8, 1000000.0)
time1 (generic function with 1 method)

julia> time2() = @btime foo(Val(nelementnodes), 400.0, 0.8, 1000000.0)
time2 (generic function with 1 method)

julia> time3() = @btime foo(nelementnodes, 400.0, 0.8, 1000000.0)
time3 (generic function with 1 method)

julia> time1()
  75.153 ns (0 allocations: 0 bytes)
3×3 MArray{Tuple{3,3},Float64,2,9}:
  Inf  -Inf   Inf
 -Inf   Inf  -Inf
  Inf  -Inf   Inf

julia> time2()
  4.569 μs (1 allocation: 80 bytes)
3×3 SArray{Tuple{3,3},Float64,2,9}:
  4666.67   -5333.33    666.667
 -5333.33   10666.7   -5333.33 
   666.667  -5333.33   4666.67 

julia> time3()
  3.205 μs (84 allocations: 4.48 KiB)
3×3 SArray{Tuple{3,3},Float64,2,9}:
  4666.67   -5333.33    666.667
 -5333.33   10666.7   -5333.33 
   666.667  -5333.33   4666.67 
2 Likes

Thank you! @jw3126

Ok. Now I see the difference between btime and time. Thank you @DNF