Are you measuring compile time?
I am not sure what @time really does. I thought things would get faster in Julia.
Can you run the command twice?
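For example, something like this (foo here is just a made-up stand-in; the first @time includes JIT compilation, the second doesn't, and recent versions of @time also print a compilation-time percentage when compilation happens):
julia> foo(n) = sum(abs2, 1:n);   # hypothetical example function

julia> @time foo(10^6);           # first call: includes compiling foo for Int

julia> @time foo(10^6);           # second call: method already compiled, run time only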
While compilation time is important, the running time is even more important.
This is a silly benchmark that is almost certainly hitting swap. Run it again and you'll likely see wildly varying numbers on all versions of Julia depending upon what else is happening on your computer and how much memory you have available.
Looks like my computer has been consistently silly for more than 5 minutes.
I should see if it's varying widely tomorrow.
Seriously, this is not meaningful.
Sometimes when you run the "test", your operating system needs to make space for a 7GB array. Sometimes, that space is already available.
That overhead has nothing to do with Julia versions.
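For scale: the result is 10^9 Int64s at 8 bytes each, which is exactly the 7.451 GiB that @time reports:
julia> 10^9 * sizeof(Int) / 2^30   # 10^9 elements * 8 bytes per Int64, in GiB
7.450580596923828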
Let's not jump to conclusions about whether a benchmark is silly or not. @mbauman You don't even know anything about the machine they're running it on.
FWIW, I can also see a slowdown, though not as drastic. I'm running this on a machine with 20GB RAM, with plenty free to fit everything without having to swap.
julia> @time (1:10^9).^2;
3.392685 seconds (2 allocations: 7.451 GiB, 0.24% gc time)
julia> GC.gc()
julia> @time (1:10^9).^2;
3.690638 seconds (2 allocations: 7.451 GiB, 9.07% gc time)
julia> GC.gc()
julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
vs.
julia> @time (1:10^9).^2;
5.470563 seconds (2 allocations: 7.451 GiB, 0.17% gc time)
julia> @time (1:10^9).^2;
5.437031 seconds (2 allocations: 7.451 GiB, 0.11% gc time)
julia> GC.gc()
julia> @time (1:10^9).^2;
5.735879 seconds (2 allocations: 7.451 GiB, 4.50% gc time)
julia> versioninfo()
Julia Version 1.9.0-DEV.614
Commit 0488abc625* (2022-05-24 15:28 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 4 × Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 4 on 4 virtual cores
Environment:
JULIA_NUM_THREADS = 4
@mindaslab could you post the output of versioninfo(;verbose=true), please?
Julia is not guaranteed to make every operation faster with a new version, but I do have to agree that the benchmark is kind of arbitrary: it literally only measures a single allocation and the application of a single function. It's not representative of all the things you can do, so concluding that 1.8 is slower in general seems a little premature to me (especially considering that a great many other operations have gotten substantially faster).
Apologies, that output above does not include the full picture. Here's my output of versioninfo(;verbose=true), minus my PATH:
julia> versioninfo(;verbose=true)
Julia Version 1.9.0-DEV.614
Commit 0488abc625* (2022-05-24 15:28 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
"Arch Linux"
uname: Linux 5.17.9-arch1-1 #1 SMP PREEMPT Wed, 18 May 2022 17:30:11 +0000 x86_64 unknown
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz:
speed user nice sys idle irq
#1 2418 MHz 97177 s 65 s 21188 s 184522 s 4351 s
#2 2800 MHz 99033 s 65 s 21217 s 184243 s 2350 s
#3 2800 MHz 102550 s 72 s 19599 s 183762 s 1928 s
#4 2800 MHz 100925 s 64 s 22680 s 182784 s 2273 s
Memory: 19.400531768798828 GB (15737.29296875 MB free)
Uptime: 31265.1 sec
Load Avg: 1.15 1.29 2.09
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 4 on 4 virtual cores
Environment:
JULIA_NUM_THREADS = 4
HOME = /home/sukera
TERM = foot
Is there a difference in the @code_native?
The code is different for sure. I can also replicate the regression; on my computer it's only about 25%, but it's there.
Yes, but I'm not sure how much is due to a difference in the code used.
1.7.3
julia> @code_native (1:10^9).^2;
.text
; β @ none within `##dotfunction#291#1`
pushq %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $56, %rsp
movq %rsi, %r14
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, (%rsp)
movq $0, 16(%rsp)
movq %fs:0, %rax
movq -8(%rax), %rcx
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:643 within `axes`
; ββββββ @ range.jl:699 within `length`
; βββββββ @ range.jl:775 within `last`
; ββββββββ @ Base.jl:42 within `getproperty`
movq $4, (%rsp)
movq (%rcx), %rax
movq %rax, 8(%rsp)
movq %rsp, %rax
movq %rcx, 32(%rsp)
movq %rax, (%rcx)
; ββββββββ
; βββββββ @ int.jl:86 within `-`
movq (%rdi), %r15
movq 8(%rdi), %rbp
subq %r15, %rbp
; βββββββ
; ββββββ @ range.jl:700 within `length`
; βββββββ @ int.jl:87 within `+`
leaq 1(%rbp), %r13
; βββββββ
; ββββββ @ range.jl:413 within `oneto`
; βββββββ @ range.jl:411 within `OneTo` @ range.jl:398
; ββββββββ @ promotion.jl:479 within `max`
movq %r13, %rax
sarq $63, %rax
andnq %r13, %rax, %r12
; ββββββββ
; βββββ @ broadcast.jl:504 within `broadcast_shape`
; ββββββ @ broadcast.jl:508 within `_bcs`
movq %r12, 48(%rsp)
movabsq $jl_system_image_data, %rdi
movabsq $140407002443377, %rax # imm = 0x7FB30D812E71
; ββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:211 within `similar` @ broadcast.jl:212 @ abstractarray.jl:828 @ abstractarray.jl:829
; βββββ @ boot.jl:474 within `Array` @ boot.jl:466 @ boot.jl:457
movq %r12, %rsi
callq *%rax
movq %rax, %rbx
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ array.jl:151 within `size`
movq 24(%rax), %rax
; ββββββ
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1043 within `==`
cmpq $1, %rax
je L183
testq %rax, %rax
jne L256
; ββββββββ @ range.jl:609 within `isempty`
; βββββββββ @ operators.jl:378 within `>`
; ββββββββββ @ int.jl:83 within `<`
testq %r12, %r12
; ββββββββββ
jle L189
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ tuple.jl:221 within `map`
L154:
movq %rax, 40(%rsp)
; ββββββ
movabsq $throwdm, %rax
leaq 40(%rsp), %rdi
leaq 48(%rsp), %rsi
callq *%rax
ud2
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1044 within `==`
; ββββββββ @ range.jl:1057 within `_has_length_one`
; βββββββββ @ promotion.jl:468 within `==`
L183:
cmpq $1, %r12
; βββββββββ
jne L154
; ββββ
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:643 within `axes`
; ββββββ @ range.jl:413 within `oneto`
; βββββββ @ range.jl:411 within `OneTo` @ range.jl:398
; ββββββββ @ promotion.jl:479 within `max`
L189:
testq %r13, %r13
; ββββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:72 within `macro expansion`
jle L317
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:957
; βββββ @ broadcast.jl:940 within `preprocess`
; ββββββ @ broadcast.jl:943 within `preprocess_args`
; βββββββ @ broadcast.jl:941 within `preprocess`
; ββββββββ @ broadcast.jl:637 within `extrude`
; βββββββββ @ broadcast.jl:586 within `newindexer`
; ββββββββββ @ broadcast.jl:587 within `shapeindexer`
; βββββββββββ @ broadcast.jl:592 within `_newindexer`
; ββββββββββββ @ operators.jl:278 within `!=`
; βββββββββββββ @ promotion.jl:468 within `==`
testq %rbp, %rbp
; βββββββββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:75 within `macro expansion`
je L263
; βββββ
; βββββ @ simdloop.jl within `macro expansion`
xorl %ebp, %ebp
movabsq $power_by_squaring, %r13
nopw %cs:(%rax,%rax)
; βββββ
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:642 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:666 within `_getindex`
; βββββββββ @ broadcast.jl:636 within `_broadcast_getindex`
; ββββββββββ @ range.jl:860 within `getindex`
; βββββββββββ @ int.jl:87 within `+`
L224:
leaq (%r15,%rbp), %rdi
movq %rbx, 16(%rsp)
; βββββββββββ
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:290 within `^`
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:903 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r12
; ββββββ
jne L224
jmp L317
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1045 within `==` @ promotion.jl:468
L256:
cmpq %r12, %rax
je L189
jmp L154
; βββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ promotion.jl:0
L263:
xorl %ebp, %ebp
movabsq $power_by_squaring, %r13
nopw %cs:(%rax,%rax)
L288:
movq %rbx, 16(%rsp)
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:290 within `^`
movq %r15, %rdi
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:903 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r12
; ββββββ
jne L288
; βββββ
; β @ simdloop.jl within `##dotfunction#291#1`
L317:
movq 8(%rsp), %rax
movq 32(%rsp), %rcx
movq %rax, (%rcx)
movq %rbx, %rax
addq $56, %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
retq
nopl (%rax)
; β
1.8+/1.9-DEV
julia> @code_native (1:10^9).^2;
.text
.file "##dotfunction#312#3"
.globl "julia_##dotfunction#312#3_61" # -- Begin function julia_##dotfunction#312#3_61
.p2align 4, 0x90
.type "julia_##dotfunction#312#3_61",@function
"julia_##dotfunction#312#3_61": # @"julia_##dotfunction#312#3_61"
; β @ none within `##dotfunction#312#3`
.cfi_startproc
# %bb.0: # %top
pushq %rbp
.cfi_def_cfa_offset 16
pushq %r15
.cfi_def_cfa_offset 24
pushq %r14
.cfi_def_cfa_offset 32
pushq %r13
.cfi_def_cfa_offset 40
pushq %r12
.cfi_def_cfa_offset 48
pushq %rbx
.cfi_def_cfa_offset 56
subq $56, %rsp
.cfi_def_cfa_offset 112
.cfi_offset %rbx, -56
.cfi_offset %r12, -48
.cfi_offset %r13, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
.cfi_offset %rbp, -16
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, (%rsp)
movq %rsi, %r14
movq $0, 16(%rsp)
#APP
movq %fs:0, %rax
#NO_APP
movq -8(%rax), %rcx
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:688 within `axes`
; ββββββ @ range.jl:737 within `length`
; βββββββ @ range.jl:811 within `first`
; ββββββββ @ Base.jl:37 within `getproperty`
movq $4, (%rsp)
movq (%rcx), %rax
movq %rax, 8(%rsp)
movq %rsp, %rax
movq %rcx, 32(%rsp) # 8-byte Spill
movq %rax, (%rcx)
; ββββββββ
; ββββββ @ range.jl:740 within `length`
; βββββββ @ int.jl:86 within `-`
movq (%rdi), %rax
movq 8(%rdi), %r12
movq %r12, %r13
movq %rax, %rbp
subq %rax, %r13
; βββββββ
; βββββββ @ int.jl:87 within `+`
leaq 1(%r13), %rax
xorl %r15d, %r15d
movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
; βββββββ
; ββββββ @ range.jl:455 within `oneto`
; βββββββ @ range.jl:453 within `OneTo` @ range.jl:440
; ββββββββ @ promotion.jl:488 within `max`
; βββββββββ @ essentials.jl:556 within `ifelse`
cmpq %rcx, %r13
cmovbq %rax, %r15
; βββββββββ
; βββββ @ broadcast.jl:504 within `broadcast_shape`
; ββββββ @ broadcast.jl:508 within `_bcs`
movq %r15, 48(%rsp)
movabsq $140165121583952, %rdi # imm = 0x7F7ABC482B50
movabsq $140165461506848, %rax # imm = 0x7F7AD08AFB20
; ββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:211 within `similar` @ broadcast.jl:212 @ abstractarray.jl:852 @ abstractarray.jl:853
; βββββ @ boot.jl:478 within `Array` @ boot.jl:470 @ boot.jl:461
movq %r15, %rsi
callq *%rax
movq %rax, %rbx
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ array.jl:149 within `size`
movq 8(%rax), %rax
; ββββββ
; βββββ @ tuple.jl:458 within `==`
; ββββββ @ tuple.jl:462 within `_eq`
; βββββββ @ range.jl:1106 within `==` @ promotion.jl:477
cmpq %r15, %rax
; βββββββ
jne .LBB0_9
# %bb.1: # %L23
movabsq $9223372036854775806, %rax # imm = 0x7FFFFFFFFFFFFFFE
; ββββ
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:688 within `axes`
; ββββββ @ range.jl:455 within `oneto`
; βββββββ @ range.jl:453 within `OneTo` @ range.jl:440
; ββββββββ @ promotion.jl:488 within `max`
; βββββββββ @ essentials.jl:556 within `ifelse`
cmpq %rax, %r13
; βββββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:72 within `macro expansion`
ja .LBB0_8
# %bb.2: # %L23
testq %r15, %r15
jle .LBB0_8
# %bb.3: # %L43.lr.ph
movq %rbp, %r13
xorl %ebp, %ebp
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:957
; βββββ @ broadcast.jl:940 within `preprocess`
; ββββββ @ broadcast.jl:943 within `preprocess_args`
; βββββββ @ broadcast.jl:941 within `preprocess`
; ββββββββ @ broadcast.jl:637 within `extrude`
; βββββββββ @ broadcast.jl:586 within `newindexer`
; ββββββββββ @ broadcast.jl:587 within `shapeindexer`
; βββββββββββ @ broadcast.jl:592 within `_newindexer`
; ββββββββββββ @ operators.jl:269 within `!=`
; βββββββββββββ @ promotion.jl:477 within `==`
cmpq %r13, %r12
; βββββββββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:75 within `macro expansion`
jne .LBB0_4
# %bb.6: # %L43.us.preheader
movabsq $j_power_by_squaring_63, %r13
.p2align 4, 0x90
.LBB0_7: # %L43.us
# =>This Inner Loop Header: Depth=1
movq %rbx, 16(%rsp)
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:297 within `^`
movq %r12, %rdi
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:959 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r15
; ββββββ
jne .LBB0_7
jmp .LBB0_8
.LBB0_4: # %L43.preheader
movabsq $j_power_by_squaring_63, %r12
.p2align 4, 0x90
.LBB0_5: # %L43
# =>This Inner Loop Header: Depth=1
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:642 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:666 within `_getindex`
; βββββββββ @ broadcast.jl:636 within `_broadcast_getindex`
; ββββββββββ @ range.jl:901 within `getindex`
; βββββββββββ @ int.jl:87 within `+`
movq %r13, %rdi
addq %rbp, %rdi
movq %rbx, 16(%rsp)
; βββββββββββ
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:297 within `^`
movq %r14, %rsi
callq *%r12
; βββββββββ
; ββββββ @ array.jl:959 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r15
; ββββββ
jne .LBB0_5
.LBB0_8: # %L110
movq 8(%rsp), %rax
movq 32(%rsp), %rcx # 8-byte Reload
movq %rax, (%rcx)
; βββββ
movq %rbx, %rax
addq $56, %rsp
.cfi_def_cfa_offset 56
popq %rbx
.cfi_def_cfa_offset 48
popq %r12
.cfi_def_cfa_offset 40
popq %r13
.cfi_def_cfa_offset 32
popq %r14
.cfi_def_cfa_offset 24
popq %r15
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
retq
.LBB0_9: # %L99
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ tuple.jl:273 within `map`
; βββββββ @ range.jl:455 within `oneto`
; ββββββββ @ range.jl:453 within `OneTo` @ range.jl:440
.cfi_def_cfa_offset 112
movq %rax, 40(%rsp)
; ββββββββ
movabsq $j_throwdm_67, %rax
leaq 40(%rsp), %rdi
leaq 48(%rsp), %rsi
callq *%rax
ud2
.Lfunc_end0:
.size "julia_##dotfunction#312#3_61", .Lfunc_end0-"julia_##dotfunction#312#3_61"
.cfi_endproc
; ββββ
# -- End function
.section ".note.GNU-stack","",@progbits
That being said, wrapping the expression in a function restores performance for me, so it may have to do with inlining heuristics or timing at the top level & global scope (which should be avoided anyway):
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
3.127988 seconds (2 allocations: 7.451 GiB, 0.22% gc time)
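For a back-to-back comparison in the same session (the absolute numbers are machine-dependent; the gap between the two forms is the interesting part):
julia> f() = (1:10^9).^2;

julia> @time (1:10^9).^2;   # broadcast at top level, in global scope

julia> @time f();           # the same computation inside a function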
Hello all,
I have rerun it and added the version info so that it might be useful for someone.
For 1.8:
julia> versioninfo(;verbose=true)
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
uname: Darwin 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101 arm64 arm
CPU: Apple M1:
speed user nice sys idle irq
#1 24 MHz 268397 s 0 s 226552 s 2012133 s 0 s
#2 24 MHz 252151 s 0 s 206049 s 2048840 s 0 s
#3 24 MHz 216621 s 0 s 173810 s 2116610 s 0 s
#4 24 MHz 193609 s 0 s 150167 s 2163265 s 0 s
#5 24 MHz 249916 s 0 s 90322 s 2166801 s 0 s
#6 24 MHz 193540 s 0 s 47919 s 2265581 s 0 s
#7 24 MHz 147989 s 0 s 32894 s 2326158 s 0 s
#8 24 MHz 111387 s 0 s 24122 s 2371532 s 0 s
Memory: 16.0 GB (4686.4375 MB free)
Uptime: 713907.0 sec
Load Avg: 2.3203125 1.94775390625 1.94873046875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
julia> @time (1:10^9).^2;
11.966374 seconds (2 allocations: 7.451 GiB, 0.03% gc time)
For 1.7:
julia> versioninfo(;verbose=true)
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.2.0)
uname: Darwin 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101 arm64 arm
CPU: Apple M1:
speed user nice sys idle irq
#1 24 MHz 268968 s 0 s 227072 s 2015937 s 0 s
#2 24 MHz 252696 s 0 s 206528 s 2052712 s 0 s
#3 24 MHz 217099 s 0 s 174210 s 2120628 s 0 s
#4 24 MHz 194038 s 0 s 150514 s 2167386 s 0 s
#5 24 MHz 250359 s 0 s 90518 s 2171058 s 0 s
#6 24 MHz 193898 s 0 s 48033 s 2270005 s 0 s
#7 24 MHz 148184 s 0 s 32955 s 2330798 s 0 s
#8 24 MHz 111516 s 0 s 24166 s 2376257 s 0 s
Memory: 16.0 GB (4438.984375 MB free)
Uptime: 714396.0 sec
Load Avg: 3.8828125 2.6142578125 2.22265625
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)
julia> @time (1:10^9).^2;
2.412986 seconds (2 allocations: 7.451 GiB, 0.37% gc time)
Well, irrespective of the slowdown at the top level being a thing, you're certainly hitting swap, and with that, disk I/O. You're allocating an array larger than your remaining free RAM. I'd suggest closing some background tasks and rerunning it.
Either way, can you please also run the following?
f() = (1:10^9).^2
@time f();
It's just 1.42 seconds! Wow! For Julia 1.8:
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
1.415760 seconds (2 allocations: 7.451 GiB, 0.32% gc time)
And 1.40 for Julia 1.7.2:
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
1.404488 seconds (2 allocations: 7.451 GiB, 0.32% gc time)
Can anyone explain why wrapping it in a function makes such a huge difference?
That's what I like to see!
We can take a look at what @time is actually doing with @macroexpand:
julia> @macroexpand @time (1:10^9).^2
quote
begin
$(Expr(:meta, :force_compile))
local var"#8#stats" = Base.gc_num()
local var"#10#elapsedtime" = Base.time_ns()
Base.cumulative_compile_timing(true)
local var"#11#compile_elapsedtimes" = Base.cumulative_compile_time_ns()
local var"#9#val" = $(Expr(:tryfinally, :((1:10 ^ 9) .^ 2), quote
var"#10#elapsedtime" = Base.time_ns() - var"#10#elapsedtime"
Base.cumulative_compile_timing(false)
var"#11#compile_elapsedtimes" = Base.cumulative_compile_time_ns() .- var"#11#compile_elapsedtimes"
end))
local var"#12#diff" = Base.GC_Diff(Base.gc_num(), var"#8#stats")
local var"#13#_msg" = Base.nothing
local var"#14#has_msg" = !(Base.isnothing(var"#13#_msg"))
var"#14#has_msg" && Base.print(var"#13#_msg", ": ")
Base.time_print(var"#10#elapsedtime", (var"#12#diff").allocd, (var"#12#diff").total_time, Base.gc_alloc_count(var"#12#diff"), Base.first(var"#11#compile_elapsedtimes"), Base.last(var"#11#compile_elapsedtimes"), true, !var"#14#has_msg")
var"#9#val"
end
end
(I've removed all the comments signifying where the macro was defined.)
@time does nothing more than wrap your code (see the local var"#9#val" = ... part) and calculate how long it took to run. This has the side effect of running your code exactly as you wrote it, meaning that in this case it's all run in global scope. Global scope does not allow the same optimizations that can be done in a function, hence there will always be a performance difference between the two. Since the slowdown does not appear in a function, I'm inclined to conclude that this is not a bug in Julia 1.8, but rather a benchmarking artifact due to benchmarking in global scope (which is a bad idea, as you've noticed).
This is also mentioned in the Performance Tips:
Performance critical code should be inside a function
Any code that is performance critical should be inside a function. Code inside functions tends to run much faster than top level code, due to how Julia's compiler works.
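The classic illustration of that tip is summing via a top-level loop over a non-const global versus inside a function; a minimal sketch (exact timings are machine-dependent, but the global version is reliably much slower):
julia> s = 0;

julia> @time for i in 1:10^7
           global s += i   # s is an untyped global: every iteration boxes and dispatches dynamically
       end

julia> function sumloop(n)
           s = 0           # local variable: the compiler knows it is an Int
           for i in 1:n
               s += i
           end
           return s
       end;

julia> @time sumloop(10^7);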
Can I recommend that anyone who is trying this change it to:
using BenchmarkTools
f() = (1:10^6).^2
@btime f();
As Matt says, there's really no point in allocating a 7GB array as part of a benchmark. Your results will depend entirely on details of your hardware and what you have loaded concurrently. This really is meaningless, and no one should draw conclusions from the other variants here.
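One more BenchmarkTools habit worth adopting: interpolate any global inputs with $ so that the benchmark measures the work itself rather than the global-variable lookup. A sketch (g and n are just illustrative names):
using BenchmarkTools

g(n) = (1:n).^2

n = 10^6
@btime g($n);   # $n splices the value in, so n being a non-const global doesn't skew the timing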