Are you measuring compile time?
I am not sure what @time really does. I thought things would get faster in Julia.
Can you run the command twice?
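For example, something like this (foo here is just a made-up stand-in; the first @time includes JIT compilation, the second doesn't, and recent versions of @time also print a compilation-time percentage when compilation happens):
julia> foo(n) = sum(abs2, 1:n);   # hypothetical example function

julia> @time foo(10^6);           # first call: includes compiling foo for Int

julia> @time foo(10^6);           # second call: method already compiled, run time only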
While compilation time is important, the running time is even more important.
This is a silly benchmark that is almost certainly hitting swap. Run it again and you'll likely see wildly varying numbers on all versions of Julia depending upon what else is happening on your computer and how much memory you have available.
Looks like my computer has been consistently silly for more than 5 minutes.
I should see if it's varying widely tomorrow.
Seriously, this is not meaningful.
Sometimes when you run the "test", your operating system needs to make space for a 7GB array. Sometimes, that space is already available.
That overhead has nothing to do with Julia versions.
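For scale: the result is 10^9 Int64s at 8 bytes each, which is exactly the 7.451 GiB that @time reports:
julia> 10^9 * sizeof(Int) / 2^30   # 10^9 elements * 8 bytes per Int64, in GiB
7.450580596923828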
Let's not jump to conclusions about whether a benchmark is silly or not. @mbauman You don't even know anything about the machine they're running it on.
FWIW, I can also see a slowdown, though not as drastic. I'm running this on a machine with 20GB RAM, with plenty free to fit everything without having to swap.
julia> @time (1:10^9).^2;
3.392685 seconds (2 allocations: 7.451 GiB, 0.24% gc time)
julia> GC.gc()
julia> @time (1:10^9).^2;
3.690638 seconds (2 allocations: 7.451 GiB, 9.07% gc time)
julia> GC.gc()
julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
vs.
julia> @time (1:10^9).^2;
5.470563 seconds (2 allocations: 7.451 GiB, 0.17% gc time)
julia> @time (1:10^9).^2;
5.437031 seconds (2 allocations: 7.451 GiB, 0.11% gc time)
julia> GC.gc()
julia> @time (1:10^9).^2;
5.735879 seconds (2 allocations: 7.451 GiB, 4.50% gc time)
julia> versioninfo()
Julia Version 1.9.0-DEV.614
Commit 0488abc625* (2022-05-24 15:28 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 4 × Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 4 on 4 virtual cores
Environment:
JULIA_NUM_THREADS = 4
@mindaslab could you post the output of versioninfo(;verbose=true), please?
Julia is not guaranteed to make every operation faster with a new version, but I do have to agree that the benchmark is kind of arbitrary: it literally only measures a single allocation and the application of a single function. It's not representative of all the things you can do, so concluding that 1.8 is slower in general seems a little premature to me (especially considering that a great many other operations have gotten substantially faster).
Apologies, that output above does not include the full picture. Here's my output of versioninfo(;verbose=true), minus my PATH:
julia> versioninfo(;verbose=true)
Julia Version 1.9.0-DEV.614
Commit 0488abc625* (2022-05-24 15:28 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
"Arch Linux"
uname: Linux 5.17.9-arch1-1 #1 SMP PREEMPT Wed, 18 May 2022 17:30:11 +0000 x86_64 unknown
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz:
speed user nice sys idle irq
#1 2418 MHz 97177 s 65 s 21188 s 184522 s 4351 s
#2 2800 MHz 99033 s 65 s 21217 s 184243 s 2350 s
#3 2800 MHz 102550 s 72 s 19599 s 183762 s 1928 s
#4 2800 MHz 100925 s 64 s 22680 s 182784 s 2273 s
Memory: 19.400531768798828 GB (15737.29296875 MB free)
Uptime: 31265.1 sec
Load Avg: 1.15 1.29 2.09
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 4 on 4 virtual cores
Environment:
JULIA_NUM_THREADS = 4
HOME = /home/sukera
TERM = foot
Is there a difference in the @code_native?
The code is different for sure. I can also replicate the regression; on my computer it's only about 25%, but it's there.
Yes, but I'm not sure how much is due to a difference in the code used.
1.7.3
julia> @code_native (1:10^9).^2;
.text
; β @ none within `##dotfunction#291#1`
pushq %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $56, %rsp
movq %rsi, %r14
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, (%rsp)
movq $0, 16(%rsp)
movq %fs:0, %rax
movq -8(%rax), %rcx
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:643 within `axes`
; ββββββ @ range.jl:699 within `length`
; βββββββ @ range.jl:775 within `last`
; ββββββββ @ Base.jl:42 within `getproperty`
movq $4, (%rsp)
movq (%rcx), %rax
movq %rax, 8(%rsp)
movq %rsp, %rax
movq %rcx, 32(%rsp)
movq %rax, (%rcx)
; ββββββββ
; βββββββ @ int.jl:86 within `-`
movq (%rdi), %r15
movq 8(%rdi), %rbp
subq %r15, %rbp
; βββββββ
; ββββββ @ range.jl:700 within `length`
; βββββββ @ int.jl:87 within `+`
leaq 1(%rbp), %r13
; βββββββ
; ββββββ @ range.jl:413 within `oneto`
; βββββββ @ range.jl:411 within `OneTo` @ range.jl:398
; ββββββββ @ promotion.jl:479 within `max`
movq %r13, %rax
sarq $63, %rax
andnq %r13, %rax, %r12
; ββββββββ
; βββββ @ broadcast.jl:504 within `broadcast_shape`
; ββββββ @ broadcast.jl:508 within `_bcs`
movq %r12, 48(%rsp)
movabsq $jl_system_image_data, %rdi
movabsq $140407002443377, %rax # imm = 0x7FB30D812E71
; ββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:211 within `similar` @ broadcast.jl:212 @ abstractarray.jl:828 @ abstractarray.jl:829
; βββββ @ boot.jl:474 within `Array` @ boot.jl:466 @ boot.jl:457
movq %r12, %rsi
callq *%rax
movq %rax, %rbx
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ array.jl:151 within `size`
movq 24(%rax), %rax
; ββββββ
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1043 within `==`
cmpq $1, %rax
je L183
testq %rax, %rax
jne L256
; ββββββββ @ range.jl:609 within `isempty`
; βββββββββ @ operators.jl:378 within `>`
; ββββββββββ @ int.jl:83 within `<`
testq %r12, %r12
; ββββββββββ
jle L189
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ tuple.jl:221 within `map`
L154:
movq %rax, 40(%rsp)
; ββββββ
movabsq $throwdm, %rax
leaq 40(%rsp), %rdi
leaq 48(%rsp), %rsi
callq *%rax
ud2
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1044 within `==`
; ββββββββ @ range.jl:1057 within `_has_length_one`
; βββββββββ @ promotion.jl:468 within `==`
L183:
cmpq $1, %r12
; βββββββββ
jne L154
; ββββ
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:643 within `axes`
; ββββββ @ range.jl:413 within `oneto`
; βββββββ @ range.jl:411 within `OneTo` @ range.jl:398
; ββββββββ @ promotion.jl:479 within `max`
L189:
testq %r13, %r13
; ββββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:72 within `macro expansion`
jle L317
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:957
; βββββ @ broadcast.jl:940 within `preprocess`
; ββββββ @ broadcast.jl:943 within `preprocess_args`
; βββββββ @ broadcast.jl:941 within `preprocess`
; ββββββββ @ broadcast.jl:637 within `extrude`
; βββββββββ @ broadcast.jl:586 within `newindexer`
; ββββββββββ @ broadcast.jl:587 within `shapeindexer`
; βββββββββββ @ broadcast.jl:592 within `_newindexer`
; ββββββββββββ @ operators.jl:278 within `!=`
; βββββββββββββ @ promotion.jl:468 within `==`
testq %rbp, %rbp
; βββββββββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:75 within `macro expansion`
je L263
; βββββ
; βββββ @ simdloop.jl within `macro expansion`
xorl %ebp, %ebp
movabsq $power_by_squaring, %r13
nopw %cs:(%rax,%rax)
; βββββ
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:642 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:666 within `_getindex`
; βββββββββ @ broadcast.jl:636 within `_broadcast_getindex`
; ββββββββββ @ range.jl:860 within `getindex`
; βββββββββββ @ int.jl:87 within `+`
L224:
leaq (%r15,%rbp), %rdi
movq %rbx, 16(%rsp)
; βββββββββββ
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:290 within `^`
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:903 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r12
; ββββββ
jne L224
jmp L317
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ tuple.jl:381 within `==`
; ββββββ @ tuple.jl:385 within `_eq`
; βββββββ @ range.jl:1045 within `==` @ promotion.jl:468
L256:
cmpq %r12, %rax
je L189
jmp L154
; βββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ promotion.jl:0
L263:
xorl %ebp, %ebp
movabsq $power_by_squaring, %r13
nopw %cs:(%rax,%rax)
L288:
movq %rbx, 16(%rsp)
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:290 within `^`
movq %r15, %rdi
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:903 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r12
; ββββββ
jne L288
; βββββ
; β @ simdloop.jl within `##dotfunction#291#1`
L317:
movq 8(%rsp), %rax
movq 32(%rsp), %rcx
movq %rax, (%rcx)
movq %rbx, %rax
addq $56, %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
retq
nopl (%rax)
; β
1.8+/1.9-DEV
julia> @code_native (1:10^9).^2;
.text
.file "##dotfunction#312#3"
.globl "julia_##dotfunction#312#3_61" # -- Begin function julia_##dotfunction#312#3_61
.p2align 4, 0x90
.type "julia_##dotfunction#312#3_61",@function
"julia_##dotfunction#312#3_61": # @"julia_##dotfunction#312#3_61"
; β @ none within `##dotfunction#312#3`
.cfi_startproc
# %bb.0: # %top
pushq %rbp
.cfi_def_cfa_offset 16
pushq %r15
.cfi_def_cfa_offset 24
pushq %r14
.cfi_def_cfa_offset 32
pushq %r13
.cfi_def_cfa_offset 40
pushq %r12
.cfi_def_cfa_offset 48
pushq %rbx
.cfi_def_cfa_offset 56
subq $56, %rsp
.cfi_def_cfa_offset 112
.cfi_offset %rbx, -56
.cfi_offset %r12, -48
.cfi_offset %r13, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
.cfi_offset %rbp, -16
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, (%rsp)
movq %rsi, %r14
movq $0, 16(%rsp)
#APP
movq %fs:0, %rax
#NO_APP
movq -8(%rax), %rcx
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:688 within `axes`
; ββββββ @ range.jl:737 within `length`
; βββββββ @ range.jl:811 within `first`
; ββββββββ @ Base.jl:37 within `getproperty`
movq $4, (%rsp)
movq (%rcx), %rax
movq %rax, 8(%rsp)
movq %rsp, %rax
movq %rcx, 32(%rsp) # 8-byte Spill
movq %rax, (%rcx)
; ββββββββ
; ββββββ @ range.jl:740 within `length`
; βββββββ @ int.jl:86 within `-`
movq (%rdi), %rax
movq 8(%rdi), %r12
movq %r12, %r13
movq %rax, %rbp
subq %rax, %r13
; βββββββ
; βββββββ @ int.jl:87 within `+`
leaq 1(%r13), %rax
xorl %r15d, %r15d
movabsq $9223372036854775807, %rcx # imm = 0x7FFFFFFFFFFFFFFF
; βββββββ
; ββββββ @ range.jl:455 within `oneto`
; βββββββ @ range.jl:453 within `OneTo` @ range.jl:440
; ββββββββ @ promotion.jl:488 within `max`
; βββββββββ @ essentials.jl:556 within `ifelse`
cmpq %rcx, %r13
cmovbq %rax, %r15
; βββββββββ
; βββββ @ broadcast.jl:504 within `broadcast_shape`
; ββββββ @ broadcast.jl:508 within `_bcs`
movq %r15, 48(%rsp)
movabsq $140165121583952, %rdi # imm = 0x7F7ABC482B50
movabsq $140165461506848, %rax # imm = 0x7F7AD08AFB20
; ββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:211 within `similar` @ broadcast.jl:212 @ abstractarray.jl:852 @ abstractarray.jl:853
; βββββ @ boot.jl:478 within `Array` @ boot.jl:470 @ boot.jl:461
movq %r15, %rsi
callq *%rax
movq %rax, %rbx
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ array.jl:149 within `size`
movq 8(%rax), %rax
; ββββββ
; βββββ @ tuple.jl:458 within `==`
; ββββββ @ tuple.jl:462 within `_eq`
; βββββββ @ range.jl:1106 within `==` @ promotion.jl:477
cmpq %r15, %rax
; βββββββ
jne .LBB0_9
# %bb.1: # %L23
movabsq $9223372036854775806, %rax # imm = 0x7FFFFFFFFFFFFFFE
; ββββ
; βββ @ broadcast.jl:281 within `instantiate`
; ββββ @ broadcast.jl:499 within `combine_axes`
; βββββ @ range.jl:688 within `axes`
; ββββββ @ range.jl:455 within `oneto`
; βββββββ @ range.jl:453 within `OneTo` @ range.jl:440
; ββββββββ @ promotion.jl:488 within `max`
; βββββββββ @ essentials.jl:556 within `ifelse`
cmpq %rax, %r13
; βββββββββ
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:72 within `macro expansion`
ja .LBB0_8
# %bb.2: # %L23
testq %r15, %r15
jle .LBB0_8
# %bb.3: # %L43.lr.ph
movq %rbp, %r13
xorl %ebp, %ebp
; βββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:957
; βββββ @ broadcast.jl:940 within `preprocess`
; ββββββ @ broadcast.jl:943 within `preprocess_args`
; βββββββ @ broadcast.jl:941 within `preprocess`
; ββββββββ @ broadcast.jl:637 within `extrude`
; βββββββββ @ broadcast.jl:586 within `newindexer`
; ββββββββββ @ broadcast.jl:587 within `shapeindexer`
; βββββββββββ @ broadcast.jl:592 within `_newindexer`
; ββββββββββββ @ operators.jl:269 within `!=`
; βββββββββββββ @ promotion.jl:477 within `==`
cmpq %r13, %r12
; βββββββββββββ
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:960
; βββββ @ simdloop.jl:75 within `macro expansion`
jne .LBB0_4
# %bb.6: # %L43.us.preheader
movabsq $j_power_by_squaring_63, %r13
.p2align 4, 0x90
.LBB0_7: # %L43.us
# =>This Inner Loop Header: Depth=1
movq %rbx, 16(%rsp)
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:297 within `^`
movq %r12, %rdi
movq %r14, %rsi
callq *%r13
; βββββββββ
; ββββββ @ array.jl:959 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r15
; ββββββ
jne .LBB0_7
jmp .LBB0_8
.LBB0_4: # %L43.preheader
movabsq $j_power_by_squaring_63, %r12
.p2align 4, 0x90
.LBB0_5: # %L43
# =>This Inner Loop Header: Depth=1
; βββββ @ simdloop.jl:77 within `macro expansion` @ broadcast.jl:961
; ββββββ @ broadcast.jl:597 within `getindex`
; βββββββ @ broadcast.jl:642 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:666 within `_getindex`
; βββββββββ @ broadcast.jl:636 within `_broadcast_getindex`
; ββββββββββ @ range.jl:901 within `getindex`
; βββββββββββ @ int.jl:87 within `+`
movq %r13, %rdi
addq %rbp, %rdi
movq %rbx, 16(%rsp)
; βββββββββββ
; βββββββ @ broadcast.jl:643 within `_broadcast_getindex`
; ββββββββ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; βββββββββ @ intfuncs.jl:297 within `^`
movq %r14, %rsi
callq *%r12
; βββββββββ
; ββββββ @ array.jl:959 within `setindex!`
movq (%rbx), %rcx
movq %rax, (%rcx,%rbp,8)
; ββββββ
; βββββ @ simdloop.jl:78 within `macro expansion`
; ββββββ @ int.jl:87 within `+`
incq %rbp
; ββββββ
; βββββ @ simdloop.jl:75 within `macro expansion`
; ββββββ @ int.jl:83 within `<`
cmpq %rbp, %r15
; ββββββ
jne .LBB0_5
.LBB0_8: # %L110
movq 8(%rsp), %rax
movq 32(%rsp), %rcx # 8-byte Reload
movq %rax, (%rcx)
; βββββ
movq %rbx, %rax
addq $56, %rsp
.cfi_def_cfa_offset 56
popq %rbx
.cfi_def_cfa_offset 48
popq %r12
.cfi_def_cfa_offset 40
popq %r13
.cfi_def_cfa_offset 32
popq %r14
.cfi_def_cfa_offset 24
popq %r15
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
retq
.LBB0_9: # %L99
; ββ @ broadcast.jl:860 within `materialize`
; βββ @ broadcast.jl:885 within `copy`
; ββββ @ broadcast.jl:913 within `copyto!` @ broadcast.jl:949
; βββββ @ abstractarray.jl:95 within `axes`
; ββββββ @ tuple.jl:273 within `map`
; βββββββ @ range.jl:455 within `oneto`
; ββββββββ @ range.jl:453 within `OneTo` @ range.jl:440
.cfi_def_cfa_offset 112
movq %rax, 40(%rsp)
; ββββββββ
movabsq $j_throwdm_67, %rax
leaq 40(%rsp), %rdi
leaq 48(%rsp), %rsi
callq *%rax
ud2
.Lfunc_end0:
.size "julia_##dotfunction#312#3_61", .Lfunc_end0-"julia_##dotfunction#312#3_61"
.cfi_endproc
; ββββ
# -- End function
.section ".note.GNU-stack","",@progbits
That being said, wrapping the expression in a function restores performance for me, so it may have to do with inlining heuristics or timing at the top level & global scope (which should be avoided anyway):
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
3.127988 seconds (2 allocations: 7.451 GiB, 0.22% gc time)
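For a back-to-back comparison in the same session (the absolute numbers are machine-dependent; the gap between the two forms is the interesting part):
julia> f() = (1:10^9).^2;

julia> @time (1:10^9).^2;   # broadcast at top level, in global scope

julia> @time f();           # the same computation inside a function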
Hello all,
I have rerun it and added the version info so that it might be useful for someone.
For 1.8:
julia> versioninfo(;verbose=true)
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
uname: Darwin 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101 arm64 arm
CPU: Apple M1:
speed user nice sys idle irq
#1 24 MHz 268397 s 0 s 226552 s 2012133 s 0 s
#2 24 MHz 252151 s 0 s 206049 s 2048840 s 0 s
#3 24 MHz 216621 s 0 s 173810 s 2116610 s 0 s
#4 24 MHz 193609 s 0 s 150167 s 2163265 s 0 s
#5 24 MHz 249916 s 0 s 90322 s 2166801 s 0 s
#6 24 MHz 193540 s 0 s 47919 s 2265581 s 0 s
#7 24 MHz 147989 s 0 s 32894 s 2326158 s 0 s
#8 24 MHz 111387 s 0 s 24122 s 2371532 s 0 s
Memory: 16.0 GB (4686.4375 MB free)
Uptime: 713907.0 sec
Load Avg: 2.3203125 1.94775390625 1.94873046875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
julia> @time (1:10^9).^2;
11.966374 seconds (2 allocations: 7.451 GiB, 0.03% gc time)
For 1.7:
julia> versioninfo(;verbose=true)
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.2.0)
uname: Darwin 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101 arm64 arm
CPU: Apple M1:
speed user nice sys idle irq
#1 24 MHz 268968 s 0 s 227072 s 2015937 s 0 s
#2 24 MHz 252696 s 0 s 206528 s 2052712 s 0 s
#3 24 MHz 217099 s 0 s 174210 s 2120628 s 0 s
#4 24 MHz 194038 s 0 s 150514 s 2167386 s 0 s
#5 24 MHz 250359 s 0 s 90518 s 2171058 s 0 s
#6 24 MHz 193898 s 0 s 48033 s 2270005 s 0 s
#7 24 MHz 148184 s 0 s 32955 s 2330798 s 0 s
#8 24 MHz 111516 s 0 s 24166 s 2376257 s 0 s
Memory: 16.0 GB (4438.984375 MB free)
Uptime: 714396.0 sec
Load Avg: 3.8828125 2.6142578125 2.22265625
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)
julia> @time (1:10^9).^2;
2.412986 seconds (2 allocations: 7.451 GiB, 0.37% gc time)
Well, irrespective of the slowdown at the top level being a thing, you're certainly hitting swap, and with that, disk I/O. You're allocating an array larger than your remaining free RAM. I'd suggest closing some background tasks and rerunning it.
Either way, can you please also run the following?
f() = (1:10^9).^2
@time f();
It's just 1.42 seconds! Wow! For Julia 1.8:
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
1.415760 seconds (2 allocations: 7.451 GiB, 0.32% gc time)
And 1.40 for Julia 1.7.2:
julia> f() = (1:10^9).^2
f (generic function with 1 method)
julia> @time f();
1.404488 seconds (2 allocations: 7.451 GiB, 0.32% gc time)
Can anyone explain why wrapping it in a function makes such a huge difference?
That's what I like to see!
We can take a look at what @time is actually doing with @macroexpand:
julia> @macroexpand @time (1:10^9).^2
quote
begin
$(Expr(:meta, :force_compile))
local var"#8#stats" = Base.gc_num()
local var"#10#elapsedtime" = Base.time_ns()
Base.cumulative_compile_timing(true)
local var"#11#compile_elapsedtimes" = Base.cumulative_compile_time_ns()
local var"#9#val" = $(Expr(:tryfinally, :((1:10 ^ 9) .^ 2), quote
var"#10#elapsedtime" = Base.time_ns() - var"#10#elapsedtime"
Base.cumulative_compile_timing(false)
var"#11#compile_elapsedtimes" = Base.cumulative_compile_time_ns() .- var"#11#compile_elapsedtimes"
end))
local var"#12#diff" = Base.GC_Diff(Base.gc_num(), var"#8#stats")
local var"#13#_msg" = Base.nothing
local var"#14#has_msg" = !(Base.isnothing(var"#13#_msg"))
var"#14#has_msg" && Base.print(var"#13#_msg", ": ")
Base.time_print(var"#10#elapsedtime", (var"#12#diff").allocd, (var"#12#diff").total_time, Base.gc_alloc_count(var"#12#diff"), Base.first(var"#11#compile_elapsedtimes"), Base.last(var"#11#compile_elapsedtimes"), true, !var"#14#has_msg")
var"#9#val"
end
end
(I've removed all the comments signifying where the macro was defined.)
@time does nothing more than wrap your code (see the local var"#9#val" = ... part) and calculate how long it took to run. This has the side effect of running your code exactly as you wrote it, meaning that in this case it's all run in global scope. Global scope does not allow the same optimizations that can be done in a function, hence there will always be a performance difference between the two. Since the slowdown does not appear in a function, I'm inclined to conclude that this is not a bug in Julia 1.8, but rather a benchmarking artifact due to benchmarking in global scope (which is a bad idea, as you've noticed).
This is also mentioned in the Performance Tips:
Performance critical code should be inside a function
Any code that is performance critical should be inside a function. Code inside functions tends to run much faster than top level code, due to how Julia's compiler works.
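The classic illustration of that tip is summing via a top-level loop over a non-const global versus inside a function; a minimal sketch (exact timings are machine-dependent, but the global version is reliably much slower):
julia> s = 0;

julia> @time for i in 1:10^7
           global s += i   # s is an untyped global: every iteration boxes and dispatches dynamically
       end

julia> function sumloop(n)
           s = 0           # local variable: the compiler knows it is an Int
           for i in 1:n
               s += i
           end
           return s
       end;

julia> @time sumloop(10^7);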
Can I recommend that anyone who is trying this change it to:
using BenchmarkTools
f() = (1:10^6).^2
@btime f();
As Matt says, there's really no point in allocating a 7GB array as part of a benchmark. Your results will depend entirely on details of your hardware and what you have loaded concurrently. This really is meaningless, and no one should draw conclusions from the other variants here.
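One more BenchmarkTools habit worth adopting: interpolate any global inputs with $ so that the benchmark measures the work itself rather than the global-variable lookup. A sketch (g and n are just illustrative names):
using BenchmarkTools

g(n) = (1:n).^2

n = 10^6
@btime g($n);   # $n splices the value in, so n being a non-const global doesn't skew the timing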