Array performance Julia 0.6 vs 0.5


#1

Hello,
a simple test returns strange results …

I’ve placed the following code in a file, speed05vs06.jl:

function speedtest()
	a = [1:10...]
	b = 2*a + 1
	@show a
	@show b
	
	if (VERSION < v"0.6.0")
		@time v = (a .< 7) & (b .< 7)
		@time v = (a .< 7) & (b .< 7)
	else
		@time v = (a .< 7) .& (b .< 7)
		@time v = (a .< 7) .& (b .< 7)	
	end
	
	return v
end

After including the code with include("speed05vs06.jl") and running it with speedtest(), I obtain these results on the same machine:

Julia 0.5.2:

julia> speedtest()
a = [1,2,3,4,5,6,7,8,9,10]
b = [3,5,7,9,11,13,15,17,19,21]
  0.088614 seconds (5.69 k allocations: 263.064 KB)
  0.000035 seconds (11 allocations: 8.797 KB)
10-element BitArray{1}:
  true
  true
 false
 false
 false
 false
 false
 false
 false
 false

Julia 0.6.2:

julia> speedtest()
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
  3.140345 seconds (150.74 k allocations: 7.619 MiB)
  0.077298 seconds (8.67 k allocations: 453.101 KiB)
10-element BitArray{1}:
  true
  true
 false
 false
 false
 false
 false
 false
 false
 false

So Julia 0.5.2 seems to be faster than Julia 0.6.2 for array operations,
even when excluding the first execution (which includes compilation)!

I’ve read https://docs.julialang.org/en/latest/manual/performance-tips/ , especially the “More dots” paragraph, but I’m still confused …

Can anyone explain what’s wrong in my code?

Many thanks in advance

Leonardo


@time vs @btime
#2

I think there is something wrong with your code, especially a = [1:10...], which should read a = [1:10;] instead.

Here are my timings in four versions of Julia:

Version: 0.5
julia> @time [speedtest() for i=1:10^6];
 14.647543 seconds (24.02 M allocations: 17.085 GB, 43.95% gc time)

Version: 0.6
julia> @time [speedtest() for i=1:10^6];
  3.949769 seconds (10.01 M allocations: 8.561 GiB, 49.83% gc time)

Version: 0.6.2
julia> @time [speedtest() for i=1:10^6];
  3.943212 seconds (10.01 M allocations: 8.561 GiB, 50.22% gc time)

Version: 0.7
julia> @time [speedtest() for i=1:10^6];
  3.256629 seconds (8.02 M allocations: 8.532 GiB, 55.83% gc time)

Without a doubt, the 0.7-DEV version is the fastest. All versions were run with the flags: -O3 --check-bounds=no --math-mode=fast.


#3

As @Seif_Shebl noted, the splatting operator (...) is the wrong choice, as it isn’t type-stable. Traceur.jl is a good (new!) tool to identify such instabilities, although you should clean your script of timing cruft before running a trace:

julia> using Traceur

julia> function speedtest1()
               a = [1:10...]
               b = 2*a + 1
               v = (a .< 7) .& (b .< 7)
       end

julia> @trace speedtest1()
(speedtest1)() at REPL[38]:2
  b is assigned as Any at line 3
  a is assigned as Union{Array{Any,1}, Array{Int64,1}} at line 2
  v is assigned as Any at line 4
  dynamic dispatch to 2a at line 3
  dynamic dispatch to #temp# + 1 at line 3
  dynamic dispatch to (Base.broadcast)(#51, a, b) at line 4
  returns Any

Using a = collect(1:10) or a = [1:10;] as suggested fixes the instability:

julia> function speedtest2()
               a = [1:10;]
               b = 2*a + 1
               v = @. (a < 7) & (b < 7)
       end

julia> using BenchmarkTools

julia> @btime speedtest1();
  5.636 μs (27 allocations: 5.91 KiB)

julia> @btime speedtest2();
  1.053 μs (8 allocations: 4.83 KiB)
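
The runtime types alone do not reveal the instability reported above: all three ways of building the array produce the same values with the same type. A minimal sketch using only Base Julia (the variable names are made up for illustration); the difference lies in what the compiler could infer before running the code, which is what Traceur and @code_warntype inspect.

```julia
# Three ways to turn the range 1:10 into a Vector (names are illustrative).
a_splat   = [1:10...]      # splats the range into ten separate arguments
a_concat  = [1:10;]        # vertical concatenation of the range
a_collect = collect(1:10)  # explicit conversion of the range

# At run time the three results are indistinguishable.
println(typeof(a_splat) == typeof(a_concat) == typeof(a_collect))
println(a_splat == a_concat == a_collect)
```

So printing typeof(a) inside a function cannot distinguish the variants; the inferred types have to be inspected instead.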

#4

Thanks to all, also for the tip about command-line options (I run Julia on Win64).

I was interested only in testing array comparison, so I’ve modified the code as follows:

function speedtest1a()
	a = [1:10...]
	b = 2*a + 1
	@show a
	@show typeof(a)
	@show b
	@show typeof(b)
	
	if (VERSION < v"0.6.0")
		@time v = (a .< 7) & (b .< 7)
		@time v = (a .< 7) & (b .< 7)
		@time v = (a .< 7) & (b .< 7)
	else
		@time v = (a .< 7) .& (b .< 7)
		@time v = (a .< 7) .& (b .< 7)	
		@time v = (a .< 7) .& (b .< 7)	
	end
end

function speedtest1b()
	a = [1:10;]
	b = 2*a + 1
	@show a
	@show typeof(a)
	@show b
	@show typeof(b)
	
	if (VERSION < v"0.6.0")
		@time v = (a .< 7) & (b .< 7)
		@time v = (a .< 7) & (b .< 7)
		@time v = (a .< 7) & (b .< 7)
	else
		@time v = (a .< 7) .& (b .< 7)
		@time v = (a .< 7) .& (b .< 7)	
		@time v = (a .< 7) .& (b .< 7)	
	end
end

I understand that a = [1:10;] is better than a = [1:10...], but I cannot understand why the performance of the subsequent array comparison is also worse in Julia 0.6.2 for speedtest2a() than for speedtest1a(),
especially given that there is no apparent difference between the types of a in the two functions …

How can I find out how the data is actually stored in a in the real world? (This is simplified code; in my case a contains data read from a CSV file and then processed further.)

Many thanks again

Leonardo


#5

If you want to test only the array comparison speed, then perhaps something like this is a better benchmark:

function speedtest(a, b)
    if VERSION < v"0.6.0"
        (a .< 7) & (b .< 7)
    else
        (a .< 7) .& (b .< 7)
    end
end

a = [1:10;]
b = 2*a + 1

using BenchmarkTools
@btime speedtest($a, $b)

I get the following times:

v0.5.2:  3.910 μs (11 allocations: 8.80 KiB)
v0.6.2:  641.472 ns (4 allocations: 4.33 KiB)
v0.7.0:  402.601 ns (3 allocations: 4.31 KiB)

#6

That’s a much better test. The only thing I would change is using @static if instead of if.
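
A sketch of that change applied to the benchmark function from the previous post. @static evaluates the condition when the macro is expanded, so only the selected branch ends up in the function body. The name speedtest_static is made up here, and b is built with dotted operators so that the setup also runs on newer Julia versions, where 2*a + 1 is no longer accepted for arrays.

```julia
# Hypothetical variant of speedtest(a, b): the version check is resolved
# at macro-expansion time instead of being a branch in the compiled code.
function speedtest_static(a, b)
    @static if VERSION < v"0.6.0"
        (a .< 7) & (b .< 7)
    else
        (a .< 7) .& (b .< 7)
    end
end

a = [1:10;]
b = 2 .* a .+ 1          # elementwise, same values as 2*a + 1 on 0.5/0.6
v = speedtest_static(a, b)
println(v)
```

With BenchmarkTools loaded, it can be timed in the same way as before, with @btime speedtest_static($a, $b).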


#7

Good point.

But I thought VERSION would be known at compile time and optimised out.
Is that not the case? If not, why not?


#8

Not sure why not, but it definitely matters on 0.6.2, as witnessed by the code_warntype. The benchmark is only about 2-3 percent faster with @static though.


#9

Constant propagation and branch elimination used to rely almost entirely upon LLVM’s optimization routines. Julia 0.7 is now doing more of this itself before it hands off to LLVM, but both are fairly dependent upon inlining. It looks like the necessary version comparison methods didn’t inline on 0.6, but they do on 0.7.


#10

Thanks to all.

I’ve understood that @btime measures performance better than @time.

Then, simplifying my test further:

a =[1:10;]
@btime a .< 7

returns:
Julia 0.5.2:
10.488 µs (4 allocations: 4.33 KiB)

Julia 0.6.2:
20.438 µs (20 allocations: 4.94 KiB)

Why do I obtain such different results after making this simple change to the code?

Sorry for my (probably) stupid question …

Leonardo


#11

@btime $a .< 7


#12

Well, this restores the expected performance.

But why are the results without interpolation so different between these Julia versions?

Many thanks for patience …

Leonardo
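
One likely factor, which the thread itself does not confirm, is that a is a non-constant global variable: without $, the benchmarked expression has to look it up at run time without knowing its type, and the cost of that may differ between Julia versions. The BenchmarkTools manual recommends interpolating such variables with $ for this reason. A rough sketch of the same distinction using only Base Julia (the function names are made up for illustration):

```julia
a = [1:10;]              # non-constant global, as in the post above

# Reads the global directly: the compiler cannot assume the type of `a`,
# roughly the situation of `@btime a .< 7`.
lt7_global() = a .< 7

# Receives the array as an argument: its type is known when the method is
# compiled, roughly the situation of `@btime $a .< 7`.
lt7_arg(x) = x .< 7

# Both compute the same result; only the work done to get there differs.
println(lt7_global() == lt7_arg(a))
```

This is also why the benchmark in post #5 passes a and b as function arguments and interpolates them.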