It seems their summary page hasn't been updated. On my decade-old Core Duo laptop (with e.g. a browser running) I get a time that is slower than yours, but way faster than the one on the web page (for Julia 1.4-DEV; I also tried 1.1.0):
real 0m13,142s
See the discussion at:
And also in the linked issue. The big obvious difference between the benchmark CPU and modern CPUs is AVX instructions, but it evidently doesn't end there.
@Palli since you happen to have a core2 CPU, would you mind running the julia-4 benchmark as well to compare with julia-3? And possibly even send me the @code_native it's generating for both?
Why are there a bunch of explicit VecElements there? Tuples of VecElements exist so that things are passed to LLVM as LLVM vectors instead of LLVM arrays, and then you can write llvmcall code on them, but they have almost no purpose on their own.
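For anyone following along, here is a minimal sketch of what that looks like in practice. The names `Vec4` and `vadd` are made up for illustration and are not taken from the benchmark code; the only point is that an `NTuple` of `VecElement`s shows up in the IR as an LLVM vector type, so `llvmcall` can operate on it directly.

```julia
# Hypothetical names: Vec4 and vadd are illustrative only.
const Vec4 = NTuple{4, VecElement{Float64}}

# The tuple arrives in the IR as <4 x double>, so one vector fadd applies.
function vadd(a::Vec4, b::Vec4)
    Base.llvmcall("""
        %res = fadd <4 x double> %0, %1
        ret <4 x double> %res
        """, Vec4, Tuple{Vec4, Vec4}, a, b)
end

a = map(VecElement, (1.0, 2.0, 3.0, 4.0))
b = map(VecElement, (10.0, 20.0, 30.0, 40.0))
c = vadd(a, b)

# Unwrap the VecElements again to look at the plain numbers.
println(map(x -> x.value, c))
```

Without the `llvmcall`, a plain tuple of `Float64` would probably do just as well here, which I take to be the point of the question.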
On my machine at least, the nbody-fast.jl code in your repo is faster, and as a bonus it is simpler and cleaner. Not sure if that would be true on whatever machine these benchmarks are being run on, as noted by @non-Jedi.
Yes, I think now might be a good time to try all that multithreading stuff, if for nothing else than to put it to the test before it's released!
this could also benefit from new threading runtime to start working on a specific sequence while still reading input
This was the version I was thinking of, but you might be right that the other option might be faster. We'll need to test.
fasta: obvious opportunity to parallelize, but I haven't taken the time to grok what the benchmark is actually doing yet.
Well, maybe I can help with that, and if anyone else wants to play along and try their best, even better. Here we go, my current implementation:
# Just FYI, I completely restructured the code compared to the
# version on the website, not sure if it runs like this.
const OUT = stdout
const LINE_LENGTH = 60
# First task: just repeat this string over and over with \n
# in the right places
const ALU = codeunits(
    "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG" *
    "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA" *
    "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT" *
    "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA" *
    "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG" *
    "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC" *
    "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA")
# I want to always be able to take the next 60 chars of that
# string (without going over the edge) so I repeat it at the end.
function repeat_fasta(str, n)
    # This is a reaaally ugly way of repeating a string, but
    # it was consistently faster than nicer alternatives, so ... :shrug:
    len = length(str)
    src = Vector{UInt8}(undef, len + LINE_LENGTH)
    for i in 1:len
        @inbounds src[i] = str[i]
    end
    for i in 1:LINE_LENGTH
        @inbounds src[i+len] = str[i]
    end
    # Well, write the required amount of chars of that string,
    # skip to the beginning of the string if you went too far.
    i = 1
    lines, rest = divrem(n, LINE_LENGTH)
    for _ in 1:lines
        write(OUT, @inbounds @view src[i:i+LINE_LENGTH-1])
        write(OUT, '\n')
        i += LINE_LENGTH
        i > len && (i -= len)
    end
    write(OUT, @inbounds @view src[i:i+rest-1])
    write(OUT, '\n')
end
# That was easy, now the more interesting part.
# We have an alphabet of chars with associated probabilities and
# we have to pick n chars from that alphabet according to a LCG
# random number generator.
# This inherently means we can't really parallelize the RNG because
# the numbers need to be in the right order. Of course there are
# opportunities for playing with the sweet new threads elsewhere.
# This is the RNG as defined on the website. Not sure there is
# much opportunity here because it needs to be pretty much exactly
# like this. The constants come first because IM is needed when
# the lookup tables below are built.
const IM = Int32(139968)
const IA = Int32(3877)
const IC = Int32(29573)
const last_rnd = Ref(Int32(42))
gen_random() = (last_rnd[] = (last_rnd[] * IA + IC) % IM)
# The RNG works with `Int32`s and the probabilities are given as
# `Float`s so I scale the [0, 1) range of accumulated probabilities
# up to the [0, IM) range of the RNG and store that with the
# corresponding char. Store the Aminoacids as a const `Tuple`.
struct Aminoacids
    c::UInt8
    p::Int32
end
function make_Aminoacids(cs, ps)
    cum_p = 0.0
    tmp = Aminoacids[]
    for (c, p) in zip(cs, ps)
        cum_p += p * IM
        # the comparison is with Int32, so use it here as well
        push!(tmp, Aminoacids(c, floor(Int32, cum_p)))
    end
    return (tmp...,)
end
# create Aminoacids with accumulated probabilities and make
# the result a constant
const IUB = let
    iub_c = b"acgtBDHKMNRSVWY"
    iub_p = [0.27, 0.12, 0.12, 0.27, 0.02, 0.02, 0.02,
             0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]
    make_Aminoacids(iub_c, iub_p)
end
const HOMOSAPIENS = let
    homosapiens_c = b"acgt"
    homosapiens_p = [0.3029549426680, 0.1979883004921,
                     0.1975473066391, 0.3015094502008]
    make_Aminoacids(homosapiens_c, homosapiens_p)
end
# After we generated a new number we need to pick out the
# corresponding Aminoacid. Some implementations use a binary
# search but I found that simply going through the Tuple seems
# to be faster.
function random_char(genelist)
    r = gen_random()
    for aminoacid in genelist
        aminoacid.p >= r && return aminoacid.c
    end
    return genelist[end].c
end
# A little helper method. I need to fill a vector with chars
# to print out, but that can be shorter than the line length
# (and I must not generate more chars than needed, because
# that would leave the RNG in the wrong state for the next
# run).
function fillrand!(line, genelist, n)
    for i in 1:n
        @inbounds line[i] = random_char(genelist)
    end
end
# Not much to see here, just fill lines until we got the required
# amount of chars printed out.
function random_fasta(genelist, n)
    line = Vector{UInt8}(undef, LINE_LENGTH+1)
    line[end] = UInt8('\n')
    while n > LINE_LENGTH
        fillrand!(line, genelist, LINE_LENGTH)
        write(OUT, line)
        n -= LINE_LENGTH
    end
    fillrand!(line, genelist, n)
    line[n+1] = UInt8('\n')
    write(OUT, @view line[1:n+1])
end
# Simply calling everything. Do two random ones with different
# alphabets.
function main(n)
    write(OUT, ">ONE Homo sapiens alu\n")
    repeat_fasta(ALU, 2n)
    write(OUT, ">TWO IUB ambiguity codes\n")
    random_fasta(IUB, 3n)
    write(OUT, ">THREE Homo sapiens frequency\n")
    random_fasta(HOMOSAPIENS, 5n)
end
main(parse(Int, ARGS[1]))
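To make the lookup step easier to play with in isolation, here is a small standalone sketch of the scaled-threshold idea with both a linear scan and a binary search. The helper names `linear_pick` and `binary_pick` are made up for this example, and the thresholds are built with `cumsum` rather than the running sum used in the listing, so treat it as an illustration rather than a drop-in replacement.

```julia
const IM = Int32(139968)

chars = b"acgt"
probs = [0.3029549426680, 0.1979883004921, 0.1975473066391, 0.3015094502008]

# Cumulative probabilities scaled from [0, 1) up to the RNG range [0, IM).
thresholds = floor.(Int32, cumsum(probs) .* IM)

# Linear scan: return the first char whose threshold is >= r.
function linear_pick(chars, thresholds, r)
    for i in eachindex(thresholds)
        thresholds[i] >= r && return chars[i]
    end
    return chars[end]
end

# Binary search: searchsortedfirst finds the first threshold >= r.
function binary_pick(chars, thresholds, r)
    i = searchsortedfirst(thresholds, r)
    return chars[min(i, length(chars))]
end

# Both strategies should agree for every value the RNG can produce.
agree = all(linear_pick(chars, thresholds, r) == binary_pick(chars, thresholds, r)
            for r in Int32(0):(IM - Int32(1)))
println("linear and binary lookup agree: ", agree)
```

With only 4 or 15 entries it is plausible that the linear scan wins, since the most likely chars come first in the table; which one is actually faster would have to be measured.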
Now, the main opportunity for parallelization would be to let the RNG generate numbers in the background (into a Channel, I think? I haven't worked with those yet), let a second thread convert these numbers into the corresponding chars, and let a final thread print everything out. Of course, with the thread overhead this might be too heavy, but at least that would be my starting point.
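A rough sketch of the Channel idea, under a few assumptions: `rng_channel` and `consume` are hypothetical helpers written for this example, the probabilities are placeholder values, and the printing stage is left out. The RNG state lives in a single producer task, so the numbers still come out in the required order.

```julia
# Same LCG constants as in the listing above.
const IM = Int32(139968)
const IA = Int32(3877)
const IC = Int32(29573)

# Producer (hypothetical helper): one task owns the RNG state and
# pushes n numbers into a buffered Channel, strictly in order.
function rng_channel(n; seed = Int32(42), bufsize = 1024)
    return Channel{Int32}(bufsize) do ch
        r = seed
        for _ in 1:n
            r = (r * IA + IC) % IM
            put!(ch, r)
        end
    end
end

# Consumer (hypothetical helper): map each number to a char using
# cumulative thresholds scaled to [0, IM).
function consume(ch, chars, thresholds)
    out = UInt8[]
    for r in ch
        i = findfirst(t -> t >= r, thresholds)
        push!(out, chars[i === nothing ? length(chars) : i])
    end
    return out
end

chars = b"acgt"
# Placeholder probabilities, only for the demo.
thresholds = floor.(Int32, cumsum([0.30, 0.20, 0.20, 0.30]) .* IM)
line = consume(rng_channel(60), chars, thresholds)
write(stdout, line, '\n')
```

As written the producer is an ordinary task on the same thread; the `Channel` constructor accepts `spawn = true` (Julia 1.3 and later) to schedule it on any available thread. One `put!` per number is very likely too much overhead for this benchmark, so passing whole blocks of numbers through the Channel would be the next thing to try.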
It's slower, with both version 4 and version 3, when I set export JULIA_NUM_THREADS=4 (my computer has 2 cores).
-O3 seems to always be slightly slower than -O1 for me:
time ~/julia-1.4.0-DEV-8ebe5643ca/bin/julia -O1 -- nbody.julia-4.julia 50000000
real 0m19,348s
user 0m19,144s
sys 0m0,212s
time ~/julia-1.4.0-DEV-8ebe5643ca/bin/julia -O3 -- nbody.julia-4.julia 50000000
real 0m21,855s
user 0m21,656s
sys 0m0,212s
vs.:
I seem to get for version 3:
time ~/julia-1.4.0-DEV-8ebe5643ca/bin/julia -O3 -- nbody.julia-3.julia 50000000
real 0m13,319s
user 0m13,028s
sys 0m0,236s
export JULIA_NUM_THREADS=1
time ~/julia-1.4.0-DEV-8ebe5643ca/bin/julia -O3 -- nbody.julia-3.julia 50000000
real 0m13,158s
user 0m13,008s
sys 0m0,208s
--cpu-target=core2 doesn't seem to change much, as I guess it's the default.
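To double-check what Julia itself reports for the machine and the thread setting, something like the following can be run in the REPL. It is only a small sketch that reads values Base exposes; the exact CPU name string depends on the LLVM version.

```julia
# CPU name as detected by LLVM; this is what a "native" target resolves to.
println("CPU name:            ", Sys.CPU_NAME)
# Number of logical CPU threads the OS reports.
println("Logical CPU threads: ", Sys.CPU_THREADS)
# Number of Julia threads, controlled by JULIA_NUM_THREADS.
println("Julia threads:       ", Threads.nthreads())
```

If the reported name already matches the value passed to `--cpu-target`, it is not surprising that the flag makes no measurable difference.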
For version 4 with -O3:
julia> @code_native main(stdout, (50000000), 0.01)
.text
; β @ REPL[13]:2 within `main'
pushq %rbp
movq %rsp, %rbp
; β @ REPL[13]:32 within `main'
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $472, %rsp # imm = 0x1D8
xorpd %xmm1, %xmm1
movapd %xmm1, -80(%rbp)
movsd %xmm0, -56(%rbp)
movq %rsi, %rbx
movq %rdi, %r13
movapd %xmm1, -96(%rbp)
movq %fs:0, %rax
movq $4, -96(%rbp)
movq -15712(%rax), %rcx
movq %rcx, -88(%rbp)
movabsq $140581406747936, %r12 # imm = 0x7FDBA8CFAD20
leaq -96(%rbp), %rcx
movq %rcx, -15712(%rax)
movapd %xmm1, -496(%rbp)
movapd %xmm1, -512(%rbp)
movabsq $140581334880848, %rcx # imm = 0x7FDBA4871250
movaps (%rcx), %xmm0
movaps %xmm0, -480(%rbp)
movabsq $140581334880864, %rcx # imm = 0x7FDBA4871260
movaps (%rcx), %xmm0
movaps %xmm0, -464(%rbp)
movabsq $140581334880880, %rcx # imm = 0x7FDBA4871270
movaps (%rcx), %xmm0
movaps %xmm0, -448(%rbp)
movabsq $140581334880896, %rcx # imm = 0x7FDBA4871280
movapd (%rcx), %xmm0
movapd %xmm0, -432(%rbp)
movabsq $140581334881056, %rcx # imm = 0x7FDBA4871320
xorpd %xmm0, %xmm0
movhpd (%rcx), %xmm0 # xmm0 = xmm0[0],mem[0]
movapd %xmm0, -416(%rbp)
movabsq $140581334880912, %rcx # imm = 0x7FDBA4871290
movapd (%rcx), %xmm0
movapd %xmm0, -400(%rbp)
movabsq $140581334881064, %rcx # imm = 0x7FDBA4871328
xorpd %xmm0, %xmm0
movhpd (%rcx), %xmm0 # xmm0 = xmm0[0],mem[0]
movapd %xmm0, -384(%rbp)
movabsq $140581334880928, %rcx # imm = 0x7FDBA48712A0
movaps (%rcx), %xmm0
movaps %xmm0, -368(%rbp)
movabsq $140581334881072, %rcx # imm = 0x7FDBA4871330
movsd (%rcx), %xmm0 # xmm0 = mem[0],zero
movaps %xmm0, -352(%rbp)
movabsq $140581334880944, %rcx # imm = 0x7FDBA48712B0
movaps (%rcx), %xmm0
movaps %xmm0, -336(%rbp)
movabsq $140581334881080, %rcx # imm = 0x7FDBA4871338
movsd (%rcx), %xmm0 # xmm0 = mem[0],zero
movaps %xmm0, -320(%rbp)
movabsq $140581334880960, %rcx # imm = 0x7FDBA48712C0
movaps (%rcx), %xmm0
movaps %xmm0, -304(%rbp)
movabsq $140581334880976, %rcx # imm = 0x7FDBA48712D0
movapd (%rcx), %xmm0
movapd %xmm0, -288(%rbp)
movabsq $140581334881088, %rcx # imm = 0x7FDBA4871340
xorpd %xmm0, %xmm0
movhpd (%rcx), %xmm0 # xmm0 = xmm0[0],mem[0]
leaq -15712(%rax), %r14
movapd %xmm0, -272(%rbp)
movabsq $140581334880992, %rax # imm = 0x7FDBA48712E0
movaps (%rax), %xmm0
movaps %xmm0, -256(%rbp)
movabsq $140581334881096, %rax # imm = 0x7FDBA4871348
movhpd (%rax), %xmm1 # xmm1 = xmm1[0],mem[0]
movapd %xmm1, -240(%rbp)
movabsq $140581334881008, %rax # imm = 0x7FDBA48712F0
movaps (%rax), %xmm0
movaps %xmm0, -224(%rbp)
movabsq $140581334881104, %rax # imm = 0x7FDBA4871350
movsd (%rax), %xmm0 # xmm0 = mem[0],zero
movaps %xmm0, -208(%rbp)
movabsq $140581334881024, %rax # imm = 0x7FDBA4871300
movaps (%rax), %xmm0
movaps %xmm0, -192(%rbp)
movabsq $140581334881112, %rax # imm = 0x7FDBA4871358
movsd (%rax), %xmm0 # xmm0 = mem[0],zero
movaps %xmm0, -176(%rbp)
movabsq $4566835785178257836, %rax # imm = 0x3F60A8F3531799AC
movq %rax, -160(%rbp)
; ββ @ array.jl:130 within `vect'
; βββ @ array.jl:612 within `_array_for'
; ββββ @ abstractarray.jl:671 within `similar' @ abstractarray.jl:672
; βββββ @ boot.jl:413 within `Array' @ boot.jl:404
leaq 277347728(%r12), %rax
movabsq $140581406097168, %rdi # imm = 0x7FDBA8C5BF10
movl $5, %esi
callq *%rax
movq %rax, %r15
; βββββ
; ββ @ array.jl:780 within `vect'
movq (%r15), %rax
movq $-288, %rcx # imm = 0xFEE0
xorl %edx, %edx
nopl (%rax)
L608:
movups -224(%rbp,%rcx), %xmm0
movupd -208(%rbp,%rcx), %xmm1
movups -192(%rbp,%rcx), %xmm2
movups -176(%rbp,%rcx), %xmm3
movq -160(%rbp,%rcx), %rsi
movups %xmm0, 288(%rax,%rcx)
movupd %xmm1, 304(%rax,%rcx)
movups %xmm2, 320(%rax,%rcx)
movups %xmm3, 336(%rax,%rcx)
movq %rsi, 352(%rax,%rcx)
; ββ @ array.jl:130 within `vect'
; βββ @ range.jl:597 within `iterate'
; ββββ @ promotion.jl:399 within `=='
testq %rcx, %rcx
; ββββ
je L735
; βββ @ tuple.jl:24 within `getindex'
addq $72, %rcx
incq %rdx
cmpq $5, %rdx
jb L608
movabsq $jl_bounds_error_unboxed_int, %rax
leaq -512(%rbp), %rdi
movl $6, %edx
movq %r12, %rsi
callq *%rax
; βββ
; β @ tuple.jl within `main'
L735:
movq %r14, -64(%rbp)
movq %r15, -72(%rbp)
; β
; β @ REPL[13]:34 within `main'
movabsq $julia_energy_16666, %rax
movq %r15, %rdi
callq *%rax
movsd %xmm0, -48(%rbp)
movabsq $getbuf, %rax
callq *%rax
movsd -48(%rbp), %xmm0 # xmm0 = mem[0],zero
movq %rax, %r12
; ββ @ float.jl:553 within `isfinite'
; βββ @ float.jl:403 within `-'
movapd %xmm0, %xmm1
subsd %xmm1, %xmm1
; βββ
; βββ @ float.jl:488 within `==' @ float.jl:454
xorps %xmm2, %xmm2
; βββ
ucomisd %xmm2, %xmm1
jne L802
jnp L879
; ββ @ printf.jl:150 within `macro expansion'
; βββ @ float.jl:503 within `<' @ float.jl:458
L802:
ucomisd %xmm0, %xmm2
; βββ
movabsq $140581402197232, %rax # imm = 0x7FDBA88A3CF0
movabsq $jl_system_image_data, %rcx
cmovbeq %rax, %rcx
; βββ @ float.jl:535 within `isnan'
; ββββ @ float.jl:456 within `!='
ucomisd %xmm0, %xmm0
; ββββ
movabsq $jl_system_image_data, %rax
cmovnpq %rcx, %rax
; ββ
; ββ @ io.jl:179 within `print'
; βββ @ io.jl:177 within `write'
; ββββ @ gcutils.jl:91 within `macro expansion'
; βββββ @ string.jl:85 within `sizeof'
movq (%rax), %rdx
movq %rax, -80(%rbp)
; βββββ
; βββββ @ string.jl:81 within `pointer'
; ββββββ @ pointer.jl:59 within `unsafe_convert'
; βββββββ @ pointer.jl:159 within `+'
leaq 8(%rax), %rsi
; βββββββ
movabsq $unsafe_write, %rax
movq %r13, %rdi
callq *%rax
jmp L1051
; ββββ
; β @ gcutils.jl within `main'
L879:
movq %r13, %r14
; β
; β @ REPL[13]:34 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ array.jl:214 within `length'
movq 8(%r12), %rax
; βββ
; βββ @ int.jl:52 within `-'
decq %rax
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ int.jl:49
cmpq $10, %rax
movl $9, %edx
; ββ
; β @ printf.jl:992 within `main'
cmovlq %rax, %rdx
movq %r12, -80(%rbp)
; β @ printf.jl:993 within `main'
movabsq $grisu, %rax
leaq -144(%rbp), %rdi
movl $2, %esi
movq %r12, %rcx
callq *%rax
; β
; β @ REPL[13]:34 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:994
; βββ @ promotion.jl:399 within `=='
movq -144(%rbp), %r13
testq %r13, %r13
; βββ
je L1423
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r13d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r13
jne L1514
; βββββ @ boot.jl:580 within `checked_trunc_sint'
movq -136(%rbp), %rdx
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %edx, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %rdx
jne L1551
; βββββ
movb -128(%rbp), %al
; ββ
testb %al, %al
je L1016
; ββ @ char.jl:229 within `print'
; βββ @ io.jl:647 within `write'
L988:
movabsq $write, %rax
movl $45, %esi
movq %r14, %rdi
movq %rdx, -48(%rbp)
callq *%rax
movq -48(%rbp), %rdx
; βββ
L1016:
movabsq $print_fixed, %rax
movl $9, %esi
movl $1, %r8d
movq %r14, %rdi
movl %r13d, %ecx
movq %r12, %r9
callq *%rax
movq %r14, %r13
; ββ @ char.jl:229 within `print'
; βββ @ io.jl:647 within `write'
L1051:
movabsq $write, %r12
movl $10, %esi
movq %r13, %rdi
callq *%r12
; βββ
; β @ REPL[13]:35 within `main'
; ββ @ range.jl:5 within `Colon'
; βββ @ range.jl:277 within `UnitRange'
; ββββ @ range.jl:282 within `unitrange_last'
; βββββ @ operators.jl:341 within `>='
; ββββββ @ int.jl:424 within `<='
testq %rbx, %rbx
; ββββββ
jle L1104
movabsq $"julia_next!_16667", %r14
nop
; β @ REPL[13]:36 within `main'
L1088:
movq %r15, %rdi
movsd -56(%rbp), %xmm0 # xmm0 = mem[0],zero
callq *%r14
; ββ @ range.jl:597 within `iterate'
; βββ @ promotion.jl:399 within `=='
decq %rbx
; βββ
jne L1088
; β @ REPL[13]:38 within `main'
L1104:
movq %r15, %rdi
movabsq $julia_energy_16666, %rax
callq *%rax
movsd %xmm0, -56(%rbp)
movabsq $getbuf, %rax
callq *%rax
movsd -56(%rbp), %xmm0 # xmm0 = mem[0],zero
movq %rax, %rbx
; ββ @ float.jl:553 within `isfinite'
; βββ @ float.jl:403 within `-'
movapd %xmm0, %xmm1
subsd %xmm1, %xmm1
; βββ
; βββ @ float.jl:488 within `==' @ float.jl:454
xorps %xmm2, %xmm2
; βββ
ucomisd %xmm2, %xmm1
jne L1163
jnp L1244
; ββ @ printf.jl:150 within `macro expansion'
; βββ @ float.jl:503 within `<' @ float.jl:458
L1163:
ucomisd %xmm0, %xmm2
; βββ
movabsq $140581402197232, %rax # imm = 0x7FDBA88A3CF0
movabsq $jl_system_image_data, %rcx
cmovbeq %rax, %rcx
; βββ @ float.jl:535 within `isnan'
; ββββ @ float.jl:456 within `!='
ucomisd %xmm0, %xmm0
; ββββ
movabsq $jl_system_image_data, %rax
cmovnpq %rcx, %rax
; ββ
; ββ @ io.jl:179 within `print'
; βββ @ io.jl:177 within `write'
; ββββ @ gcutils.jl:91 within `macro expansion'
; βββββ @ string.jl:85 within `sizeof'
movq (%rax), %rdx
movq %rax, -80(%rbp)
; βββββ
; βββββ @ string.jl:81 within `pointer'
; ββββββ @ pointer.jl:59 within `unsafe_convert'
; βββββββ @ pointer.jl:159 within `+'
leaq 8(%rax), %rsi
; βββββββ
movabsq $unsafe_write, %rax
movq %r13, %rdi
callq *%rax
movq -64(%rbp), %rbx
jmp L1390
; ββββ
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ array.jl:214 within `length'
L1244:
movq 8(%rbx), %rax
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ int.jl:52
decq %rax
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ operators.jl:294 within `>'
; ββββ @ int.jl:49 within `<'
cmpq $10, %rax
movl $9, %edx
; ββββ
cmovlq %rax, %rdx
movq %rbx, -80(%rbp)
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:993
movabsq $grisu, %rax
leaq -120(%rbp), %rdi
movl $2, %esi
movq %rbx, %rcx
callq *%rax
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:994
; βββ @ promotion.jl:399 within `=='
movq -120(%rbp), %r15
testq %r15, %r15
; βββ
je L1469
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r15d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r15
jne L1585
; βββββ @ boot.jl:580 within `checked_trunc_sint'
movq -112(%rbp), %r14
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r14d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r14
jne L1622
; βββββ
movb -104(%rbp), %al
; ββ
testb %al, %al
je L1351
; ββ @ char.jl:229 within `print'
; βββ @ io.jl:647 within `write'
L1340:
movl $45, %esi
movq %r13, %rdi
callq *%r12
; βββ
L1351:
movabsq $print_fixed, %rax
movl $9, %esi
movl $1, %r8d
movq %r13, %rdi
movl %r14d, %edx
movl %r15d, %ecx
movq %rbx, %r9
callq *%rax
movq -64(%rbp), %rbx
; ββ @ char.jl:229 within `print'
; βββ @ io.jl:647 within `write'
L1390:
movl $10, %esi
movq %r13, %rdi
callq *%r12
movq -88(%rbp), %rax
movq %rax, (%rbx)
; βββ
leaq -40(%rbp), %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
retq
; β @ REPL[13]:34 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L1423:
cmpq $0, 8(%r12)
je L1659
movq (%r12), %rax
movb $48, (%rax)
movl $1, %r13d
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:996
movb -128(%rbp), %al
movl $1, %edx
; ββ
testb %al, %al
jne L988
jmp L1016
; β @ REPL[13]:38 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L1469:
cmpq $0, 8(%rbx)
je L1697
movq (%rbx), %rax
movb $48, (%rax)
movl $1, %r15d
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:996
movb -104(%rbp), %al
movl $1, %r14d
; ββ
testb %al, %al
jne L1340
jmp L1351
; β @ REPL[13]:34 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:582 within `checked_trunc_sint'
L1514:
movabsq $throw_inexacterror, %rax
movabsq $140581378234496, %rdi # imm = 0x7FDBA71C9880
movabsq $jl_system_image_data, %rsi
movq %r13, %rdx
callq *%rax
ud2
L1551:
movabsq $throw_inexacterror, %rax
movabsq $140581378234496, %rdi # imm = 0x7FDBA71C9880
movabsq $jl_system_image_data, %rsi
callq *%rax
ud2
; βββββ
; β @ REPL[13]:38 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:582 within `checked_trunc_sint'
L1585:
movabsq $throw_inexacterror, %rax
movabsq $140581378234496, %rdi # imm = 0x7FDBA71C9880
movabsq $jl_system_image_data, %rsi
movq %r15, %rdx
callq *%rax
ud2
L1622:
movabsq $throw_inexacterror, %rax
movabsq $140581378234496, %rdi # imm = 0x7FDBA71C9880
movabsq $jl_system_image_data, %rsi
movq %r14, %rdx
callq *%rax
ud2
; βββββ
; β @ REPL[13]:34 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L1659:
movq %rsp, %rax
leaq -16(%rax), %rsi
movq %rsi, %rsp
movq $1, -16(%rax)
movabsq $jl_bounds_error_ints, %rax
movl $1, %edx
movq %r12, %rdi
callq *%rax
; βββ
; β @ REPL[13]:38 within `main'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L1697:
movq %rsp, %rax
leaq -16(%rax), %rsi
movq %rsi, %rsp
movq $1, -16(%rax)
movabsq $jl_bounds_error_ints, %rax
movl $1, %edx
movq %rbx, %rdi
callq *%rax
nopw (%rax,%rax)
; βββ
I hit the character limit 32000 (not 32768; I was 100 letters over so posting separately)
For version 3 with -O3:
julia> @code_native NBody.perf_nbody(50000000)
.text
; β @ REPL[1]:132 within `perf_nbody'
pushq %rbp
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $200, %rsp
movq %rdi, %r14
xorps %xmm0, %xmm0
movaps %xmm0, -144(%rbp)
movaps %xmm0, -160(%rbp)
movaps %xmm0, -176(%rbp)
movq $0, -128(%rbp)
movq %fs:0, %rax
; ββ @ REPL[1]:119 within `initbody'
; βββ @ REPL[1]:17 within `Body'
movq $10, -176(%rbp)
movq -15712(%rax), %rcx
movq %rcx, -168(%rbp)
leaq -176(%rbp), %rcx
movq %rcx, -15712(%rax)
leaq -15712(%rax), %r13
movabsq $jl_gc_pool_alloc, %rbx
movl $1520, %esi # imm = 0x5F0
movl $96, %edx
movq %r13, %rdi
callq *%rbx
movq %rax, %r12
movabsq $139695149897424, %r15 # imm = 0x7F0D4FC956D0
movq %r15, -8(%r12)
movabsq $-4631240860977730576, %rax # imm = 0xBFBA86F96C25EBF0
movq %rax, 16(%r12)
movabsq $139695052337072, %rax # imm = 0x7F0D49F8AFB0
movaps (%rax), %xmm0
movaps %xmm0, (%r12)
movabsq $-4640446117579192555, %rax # imm = 0xBF99D2D79A5A0715
movq %rax, 48(%r12)
movabsq $139695052337088, %rax # imm = 0x7F0D49F8AFC0
movaps (%rax), %xmm0
movaps %xmm0, 32(%r12)
movabsq $4585593052079010776, %rax # imm = 0x3FA34C95D9AB33D8
movq %rax, 64(%r12)
movq %r12, -160(%rbp)
; βββ
; β @ REPL[1]:140 within `perf_nbody'
; ββ @ REPL[1]:119 within `initbody'
; βββ @ REPL[1]:17 within `Body'
movl $1520, %esi # imm = 0x5F0
movl $96, %edx
movq %r13, %rdi
callq *%rbx
movq %rbx, %rcx
movq %rax, %rbx
movq %r15, -8(%rbx)
movabsq $-4622431185293064580, %rax # imm = 0xBFD9D353E1EB467C
movq %rax, 16(%rbx)
movabsq $139695052337104, %rax # imm = 0x7F0D49F8AFD0
movaps (%rax), %xmm0
movaps %xmm0, (%rbx)
movabsq $4576004977915405236, %rax # imm = 0x3F813C485F1123B4
movq %rax, 48(%rbx)
movabsq $139695052337120, %rax # imm = 0x7F0D49F8AFE0
movaps (%rax), %xmm0
movaps %xmm0, 32(%rbx)
movabsq $4577659745833829943, %rax # imm = 0x3F871D490D07C637
movq %rax, 64(%rbx)
movq %rbx, -152(%rbp)
; βββ
; β @ REPL[1]:148 within `perf_nbody'
; ββ @ REPL[1]:119 within `initbody'
; βββ @ REPL[1]:17 within `Body'
movl $1520, %esi # imm = 0x5F0
movl $96, %edx
movq %r13, %rdi
callq *%rcx
movq %r15, -8(%rax)
movabsq $-4626158513131520608, %rcx # imm = 0xBFCC9557BE257DA0
movq %rcx, 16(%rax)
movabsq $139695052337136, %rcx # imm = 0x7F0D49F8AFF0
movaps (%rcx), %xmm0
movaps %xmm0, (%rax)
movabsq $-4645973824767902084, %rcx # imm = 0xBF862F6BFAF23E7C
movq %rcx, 48(%rax)
movabsq $139695052337152, %rcx # imm = 0x7F0D49F8B000
movaps (%rcx), %xmm0
movaps %xmm0, 32(%rax)
movabsq $4565592097032511155, %rcx # imm = 0x3F5C3DD29CF41EB3
movq %rcx, 64(%rax)
movq %rax, -104(%rbp)
movq %rax, -144(%rbp)
; βββ
; β @ REPL[1]:156 within `perf_nbody'
; ββ @ REPL[1]:119 within `initbody'
; βββ @ REPL[1]:17 within `Body'
movl $1520, %esi # imm = 0x5F0
movl $96, %edx
movq %r13, %rdi
movabsq $jl_gc_pool_alloc, %rax
callq *%rax
movq %r15, -8(%rax)
movabsq $4595626498235032896, %rcx # imm = 0x3FC6F1F393ABE540
movq %rcx, 16(%rax)
movabsq $139695052337168, %rcx # imm = 0x7F0D49F8B010
movaps (%rcx), %xmm0
movaps %xmm0, (%rax)
movabsq $-4638202354754755082, %rcx # imm = 0xBFA1CB88587665F6
movq %rcx, 48(%rax)
movabsq $139695052337184, %rcx # imm = 0x7F0D49F8B020
movaps (%rcx), %xmm0
movaps %xmm0, 32(%rax)
movabsq $4566835785178257836, %rcx # imm = 0x3F60A8F3531799AC
movq %rcx, 64(%rax)
movq %rax, -48(%rbp)
movq %rax, -136(%rbp)
; βββ
; β @ REPL[1]:164 within `perf_nbody'
; ββ @ REPL[1]:119 within `initbody'
; βββ @ REPL[1]:17 within `Body'
movl $1520, %esi # imm = 0x5F0
movl $96, %edx
movq %r13, -184(%rbp)
movq %r13, %rdi
movabsq $jl_gc_pool_alloc, %rax
callq *%rax
movq %rax, %r13
movq %r15, -8(%r13)
xorps %xmm0, %xmm0
movaps %xmm0, (%r13)
movq $0, 16(%r13)
movaps %xmm0, 32(%r13)
movq $0, 48(%r13)
movabsq $4630752910647379422, %rax # imm = 0x4043BD3CC9BE45DE
movq %rax, 64(%r13)
movq %r13, -128(%rbp)
; βββ
; β @ REPL[1]:166 within `perf_nbody'
; ββ @ array.jl:130 within `vect'
; βββ @ array.jl:612 within `_array_for'
; ββββ @ abstractarray.jl:671 within `similar' @ abstractarray.jl:672
; βββββ @ boot.jl:413 within `Array' @ boot.jl:404
movabsq $jl_system_image_data, %rax
leaq 214180000(%rax), %rax
movabsq $139695149907168, %rdi # imm = 0x7F0D4FC97CE0
movl $5, %esi
callq *%rax
movq %rax, %r15
movzwl 16(%r15), %eax
andl $3, %eax
cmpl $3, %eax
; βββββ
; ββ @ tuple.jl:24 within `vect'
jne L892
; ββ
; ββ @ array.jl:130 within `vect'
; βββ @ array.jl:780 within `setindex!'
movq (%r15), %rcx
movq 40(%r15), %rdi
movq -8(%rdi), %rax
andl $3, %eax
cmpq $3, %rax
jne L740
testb $1, -8(%r13)
je L2237
L740:
movq %r13, (%rcx)
movq 40(%r15), %rdi
movq -8(%rdi), %rax
andl $3, %eax
cmpq $3, %rax
jne L772
testb $1, -8(%r12)
je L2262
L772:
movq %r12, 8(%rcx)
movq 40(%r15), %rdi
movq -8(%rdi), %rax
andl $3, %eax
cmpq $3, %rax
movabsq $jl_system_image_data, %r12
jne L813
testb $1, -8(%rbx)
je L2285
L813:
movq %rbx, 16(%rcx)
movq 40(%r15), %rdi
movq -8(%rdi), %rax
andl $3, %eax
cmpq $3, %rax
movq -104(%rbp), %rbx
jne L848
testb $1, -8(%rbx)
je L2308
L848:
movq %rbx, 24(%rcx)
movq 40(%r15), %rdi
movq -8(%rdi), %rax
andl $3, %eax
cmpq $3, %rax
movq -48(%rbp), %rbx
jne L883
testb $1, -8(%rbx)
je L2331
L883:
movq %rbx, 32(%rcx)
; βββ
; β @ REPL[1]:168 within `perf_nbody'
jmp L1050
; β @ REPL[1]:166 within `perf_nbody'
; ββ @ array.jl:130 within `vect'
; βββ @ array.jl:780 within `setindex!'
L892:
movq -8(%r15), %rax
movq (%r15), %rcx
andl $3, %eax
cmpq $3, %rax
jne L919
testb $1, -8(%r13)
je L2354
L919:
movq %r13, (%rcx)
movq -8(%r15), %rax
andl $3, %eax
cmpq $3, %rax
jne L947
testb $1, -8(%r12)
je L2382
L947:
movq %r12, 8(%rcx)
movq -8(%r15), %rax
andl $3, %eax
cmpq $3, %rax
movabsq $jl_system_image_data, %r12
jne L984
testb $1, -8(%rbx)
je L2408
L984:
movq %rbx, 16(%rcx)
movq -8(%r15), %rax
andl $3, %eax
cmpq $3, %rax
movq -104(%rbp), %rbx
jne L1015
testb $1, -8(%rbx)
je L2434
L1015:
movq %rbx, 24(%rcx)
movq -8(%r15), %rax
andl $3, %eax
cmpq $3, %rax
movq -48(%rbp), %rbx
jne L1046
testb $1, -8(%rbx)
je L2460
L1046:
movq %rbx, 32(%rcx)
; βββ
; β @ array.jl within `perf_nbody'
L1050:
movq %r15, -160(%rbp)
; β
; β @ REPL[1]:168 within `perf_nbody'
movabsq $julia_init_sun_16583, %rax
movq %r15, %rdi
callq *%rax
fstp %st(0)
; β @ REPL[1]:170 within `perf_nbody'
movq (%r12), %rbx
movq %rbx, -136(%rbp)
movabsq $julia_energy_16584, %rax
movq %r15, %rdi
callq *%rax
movsd %xmm0, -48(%rbp)
movabsq $getbuf, %rax
callq *%rax
movsd -48(%rbp), %xmm0 # xmm0 = mem[0],zero
movq %rax, %r13
; ββ @ float.jl:553 within `isfinite'
; βββ @ float.jl:403 within `-'
movapd %xmm0, %xmm1
subsd %xmm1, %xmm1
; βββ
; βββ @ float.jl:488 within `==' @ float.jl:454
xorps %xmm2, %xmm2
; βββ
ucomisd %xmm2, %xmm1
jne L1144
jnp L1234
; ββ @ printf.jl:150 within `macro expansion'
; βββ @ float.jl:503 within `<' @ float.jl:458
L1144:
ucomisd %xmm0, %xmm2
; βββ
movabsq $139695111274064, %rax # imm = 0x7F0D4D7BFE50
movabsq $jl_system_image_data, %rcx
cmovbeq %rax, %rcx
; βββ @ float.jl:535 within `isnan'
; ββββ @ float.jl:456 within `!='
ucomisd %xmm0, %xmm0
; ββββ
movabsq $jl_system_image_data, %rax
cmovnpq %rcx, %rax
; ββ
movq %rbx, -96(%rbp)
movq %rax, -88(%rbp)
movabsq $jl_apply_generic, %rax
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
callq *%rax
jmp L1530
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ array.jl:214 within `length'
L1234:
movq 8(%r13), %rax
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ int.jl:52
decq %rax
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ operators.jl:294 within `>'
; ββββ @ int.jl:49 within `<'
cmpq $10, %rax
movl $9, %edx
; ββββ
cmovlq %rax, %rdx
movq %r13, -128(%rbp)
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:993
movabsq $grisu, %rax
leaq -232(%rbp), %rdi
movl $2, %esi
movq %r13, %rcx
callq *%rax
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:994
; βββ @ promotion.jl:399 within `=='
movq -232(%rbp), %r12
testq %r12, %r12
; βββ
movq %r13, -104(%rbp)
je L2141
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r12d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r12
jne L2486
; βββββ @ boot.jl:580 within `checked_trunc_sint'
movq -224(%rbp), %rcx
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %ecx, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %rcx
jne L2523
; βββββ
movb -216(%rbp), %al
; ββ
testb %al, %al
je L1401
L1346:
movq %rbx, -96(%rbp)
movabsq $jl_system_image_data, %rax
movq %rax, -88(%rbp)
movabsq $jl_apply_generic, %rax
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
L1401:
movabsq $jl_box_int32, %r13
movl %ecx, %edi
callq *%r13
movq %r13, %rcx
movq %rax, %r13
movq %r13, -144(%rbp)
movl %r12d, %edi
callq *%rcx
movq %rax, -152(%rbp)
movq %rbx, -96(%rbp)
movabsq $139695095480928, %rcx # imm = 0x7F0D4C8B0260
movq %rcx, -88(%rbp)
movq %r13, -80(%rbp)
movq %rax, -72(%rbp)
movabsq $jl_system_image_data, %rax
movq %rax, -64(%rbp)
movq -104(%rbp), %rax
movq %rax, -56(%rbp)
movabsq $jl_apply_generic, %rax
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $6, %edx
callq *%rax
movabsq $jl_system_image_data, %r12
L1530:
movq %rbx, -96(%rbp)
movabsq $139695095517360, %rax # imm = 0x7F0D4C8B90B0
movq %rax, -88(%rbp)
movabsq $jl_apply_generic, %rax
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
callq *%rax
; β @ REPL[1]:172 within `perf_nbody'
; ββ @ range.jl:5 within `Colon'
; βββ @ range.jl:277 within `UnitRange'
; ββββ @ range.jl:282 within `unitrange_last'
; βββββ @ operators.jl:341 within `>='
; ββββββ @ int.jl:424 within `<='
testq %r14, %r14
; ββββββ
jle L1631
movabsq $julia_advance_16585, %rbx
movabsq $139695052337248, %rax # imm = 0x7F0D49F8B060
movsd (%rax), %xmm0 # xmm0 = mem[0],zero
movsd %xmm0, -48(%rbp)
nopl (%rax)
; β @ REPL[1]:173 within `perf_nbody'
L1616:
movq %r15, %rdi
movsd -48(%rbp), %xmm0 # xmm0 = mem[0],zero
callq *%rbx
; ββ @ range.jl:597 within `iterate'
; βββ @ promotion.jl:399 within `=='
decq %r14
; βββ
jne L1616
; β @ REPL[1]:175 within `perf_nbody'
L1631:
movq (%r12), %r13
movq %r13, -144(%rbp)
movq %r15, %rdi
movabsq $julia_energy_16584, %rax
callq *%rax
movsd %xmm0, -48(%rbp)
movabsq $getbuf, %rax
callq *%rax
movsd -48(%rbp), %xmm0 # xmm0 = mem[0],zero
movq %rax, %r14
; ββ @ float.jl:553 within `isfinite'
; βββ @ float.jl:403 within `-'
movapd %xmm0, %xmm1
subsd %xmm1, %xmm1
; βββ
; ββ @ float.jl:454 within `isfinite'
xorps %xmm2, %xmm2
; ββ
ucomisd %xmm2, %xmm1
jne L1701
jnp L1791
; ββ @ printf.jl:150 within `macro expansion'
; βββ @ float.jl:503 within `<' @ float.jl:458
L1701:
ucomisd %xmm0, %xmm2
; βββ
movabsq $139695111274064, %rax # imm = 0x7F0D4D7BFE50
movabsq $jl_system_image_data, %rcx
cmovbeq %rax, %rcx
; βββ @ float.jl:535 within `isnan'
; ββββ @ float.jl:456 within `!='
ucomisd %xmm0, %xmm0
; ββββ
movabsq $jl_system_image_data, %rax
cmovnpq %rcx, %rax
; ββ
movq %r13, -96(%rbp)
movq %rax, -88(%rbp)
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
movabsq $jl_apply_generic, %rbx
callq *%rbx
jmp L2070
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:992
; βββ @ array.jl:214 within `length'
L1791:
movq 8(%r14), %rax
; βββ
; βββ @ int.jl:52 within `-'
decq %rax
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ int.jl:49
cmpq $10, %rax
movl $9, %edx
; ββ
; β @ printf.jl:992 within `perf_nbody'
cmovlq %rax, %rdx
movq %r14, -136(%rbp)
; β @ printf.jl:993 within `perf_nbody'
movabsq $grisu, %rax
leaq -208(%rbp), %rdi
movl $2, %esi
movq %r14, %rcx
callq *%rax
; β
; β @ REPL[1]:175 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:994
; βββ @ promotion.jl:399 within `=='
movq -208(%rbp), %r12
testq %r12, %r12
; βββ
je L2189
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r12d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r12
jne L2560
; βββββ @ boot.jl:580 within `checked_trunc_sint'
movq -200(%rbp), %r15
; βββββ @ boot.jl:581 within `checked_trunc_sint'
movslq %r15d, %rax
; βββββ @ boot.jl:582 within `checked_trunc_sint'
cmpq %rax, %r15
jne L2597
; βββββ
movb -192(%rbp), %al
; ββ
testb %al, %al
je L1951
L1902:
movq %r13, -96(%rbp)
movabsq $jl_system_image_data, %rax
movq %rax, -88(%rbp)
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
movabsq $jl_apply_generic, %rax
callq *%rax
L1951:
movabsq $jl_box_int32, %rbx
movl %r15d, %edi
callq *%rbx
movq %rbx, %rcx
movabsq $jl_apply_generic, %r15
movq %rax, %rbx
movq %rbx, -152(%rbp)
movl %r12d, %edi
callq *%rcx
movq %rax, -160(%rbp)
movq %r13, -96(%rbp)
movabsq $139695095480928, %rcx # imm = 0x7F0D4C8B0260
movq %rcx, -88(%rbp)
movq %rbx, -80(%rbp)
movq %rax, -72(%rbp)
movabsq $jl_system_image_data, %rax
movq %rax, -64(%rbp)
movq %r14, -56(%rbp)
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $6, %edx
callq *%r15
movq %r15, %rbx
L2070:
movq %r13, -96(%rbp)
movabsq $139695095517360, %rax # imm = 0x7F0D4C8B90B0
movq %rax, -88(%rbp)
movabsq $jl_system_image_data, %rdi
leaq -96(%rbp), %rsi
movl $2, %edx
callq *%rbx
movq -168(%rbp), %rax
movq -184(%rbp), %rcx
movq %rax, (%rcx)
leaq -40(%rbp), %rsp
popq %rbx
popq %r12
popq %r13
popq %r14
popq %r15
popq %rbp
retq
; β @ REPL[1]:170 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L2141:
cmpq $0, 8(%r13)
je L2634
movq (%r13), %rax
movb $48, (%rax)
movl $1, %r12d
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:996
movb -216(%rbp), %al
movl $1, %ecx
; ββ
testb %al, %al
jne L1346
jmp L1401
; β @ REPL[1]:175 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L2189:
cmpq $0, 8(%r14)
je L2672
movq (%r14), %rax
movb $48, (%rax)
movl $1, %r12d
; βββ
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:996
movb -192(%rbp), %al
movl $1, %r15d
; ββ
testb %al, %al
jne L1902
jmp L1951
; β @ REPL[1]:166 within `perf_nbody'
; ββ @ array.jl:130 within `vect'
; βββ @ array.jl:780 within `setindex!'
L2237:
movabsq $jl_gc_queue_root, %rax
movq %rcx, -112(%rbp)
callq *%rax
movq -112(%rbp), %rcx
jmp L740
L2262:
movabsq $jl_gc_queue_root, %rax
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L772
L2285:
movabsq $jl_gc_queue_root, %rax
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L813
L2308:
movabsq $jl_gc_queue_root, %rax
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L848
L2331:
movabsq $jl_gc_queue_root, %rax
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L883
L2354:
movabsq $jl_gc_queue_root, %rax
movq %r15, %rdi
movq %rcx, -112(%rbp)
callq *%rax
movq -112(%rbp), %rcx
jmp L919
L2382:
movabsq $jl_gc_queue_root, %rax
movq %r15, %rdi
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L947
L2408:
movabsq $jl_gc_queue_root, %rax
movq %r15, %rdi
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L984
L2434:
movabsq $jl_gc_queue_root, %rax
movq %r15, %rdi
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L1015
L2460:
movabsq $jl_gc_queue_root, %rax
movq %r15, %rdi
movq %rcx, %r13
callq *%rax
movq %r13, %rcx
jmp L1046
; βββ
; β @ REPL[1]:170 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:582 within `checked_trunc_sint'
L2486:
movabsq $throw_inexacterror, %rax
movabsq $139695095699584, %rdi # imm = 0x7F0D4C8E5880
movabsq $jl_system_image_data, %rsi
movq %r12, %rdx
callq *%rax
ud2
L2523:
movabsq $throw_inexacterror, %rax
movabsq $139695095699584, %rdi # imm = 0x7F0D4C8E5880
movabsq $jl_system_image_data, %rsi
movq %rcx, %rdx
callq *%rax
ud2
; βββββ
; β @ REPL[1]:175 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:998
; βββ @ boot.jl:709 within `Int32'
; ββββ @ boot.jl:619 within `toInt32'
; βββββ @ boot.jl:582 within `checked_trunc_sint'
L2560:
movabsq $throw_inexacterror, %rax
movabsq $139695095699584, %rdi # imm = 0x7F0D4C8E5880
movabsq $jl_system_image_data, %rsi
movq %r12, %rdx
callq *%rax
ud2
L2597:
movabsq $throw_inexacterror, %rax
movabsq $139695095699584, %rdi # imm = 0x7F0D4C8E5880
movabsq $jl_system_image_data, %rsi
movq %r15, %rdx
callq *%rax
ud2
; βββββ
; β @ REPL[1]:170 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L2634:
movq %rsp, %rax
leaq -16(%rax), %rsi
movq %rsi, %rsp
movq $1, -16(%rax)
movabsq $jl_bounds_error_ints, %rax
movl $1, %edx
movq %r13, %rdi
callq *%rax
; βββ
; β @ REPL[1]:175 within `perf_nbody'
; ββ @ printf.jl:841 within `fix_dec' @ printf.jl:995
; βββ @ array.jl:780 within `setindex!'
L2672:
movq %rsp, %rax
leaq -16(%rax), %rsi
movq %rsi, %rsp
movq $1, -16(%rax)
movabsq $jl_bounds_error_ints, %rax
movl $1, %edx
movq %r14, %rdi
callq *%rax
nopw %cs:(%rax,%rax)
; βββ
For fasta, I looked into whether a faster RNG would help (yes, that is probably disallowed by the rules, but along the way I discovered a change that is likely legal). Strangely, the program hangs with my choice of RNG, whatever datatype I tried (even with a cast to 32-bit for type stability).
I noticed that the Julia code uses signed integers, while the fastest code (currently C++) uses unsigned. I also noticed that the RNG only returns 16 bits, I think, not 32 bits, with the rest zero-padded.
#const last_rnd = Ref(Int32(42)) # I tried changing this (and the lines above) to UInt32; that works
#gen_random() = (last_rnd[] = (last_rnd[] * IA + IC) % IM)
using RandomNumbers.Xorshifts
r = Xoroshiro128Plus(0x1234567890abcdef) # with a certain seed. Note that the seed must be non-zero.
gen_random() = UInt32(rand(r, UInt8))
static auto get_random = [] {
static unsigned last = 42;
return (last = (last * Config::ia + Config::ic) % Config::im);
};
Could [any of] you check the timing for the UInt32 change (or look into another RNG)? My change to unsigned alone should have been faster, since the assembly code is shorter, but on my old laptop it was slightly slower (but then, so was -O3 compared with -O2):
| Variant | real | user | sys |
|---|---|---|---|
| Original with -O3 | 0m5,080s | 0m4,944s | 0m0,192s |
| Original with -O2 | 0m5,076s | 0m4,936s | 0m0,216s |
| My modified with UInt32 and -O3 | 0m5,224s | 0m5,096s | 0m0,212s |
| My modified with -O2 | 0m5,205s | 0m5,064s | 0m0,212s |
@code_native gen_random() # For UInt32 (slightly shorter than the Int32 version, which follows after it):
.text
; β @ REPL[3]:2 within `gen_random'
movabsq $139625163228512, %rcx # imm = 0x7EFD04418560
; ββ @ int.jl:54 within `*'
imull $3877, (%rcx), %eax # imm = 0xF25
; ββ
; ββ @ int.jl:53 within `+'
addl $29573, %eax # imm = 0x7385
; ββ
; ββ @ int.jl:231 within `rem'
imulq $502748801, %rax, %rdx # imm = 0x1DF75681
shrq $46, %rdx
imull $139968, %edx, %edx # imm = 0x222C0
subl %edx, %eax
; ββ
; ββ @ refvalue.jl:33 within `setindex!'
; βββ @ Base.jl:21 within `setproperty!'
movl %eax, (%rcx)
; βββ
retq
nopl (%rax,%rax)
; β
julia> @code_native gen_random()
.text
; β @ REPL[8]:2 within `gen_random'
movabsq $139793992281584, %rcx # imm = 0x7F2453406DF0
; ββ @ int.jl:54 within `*'
imull $3877, (%rcx), %eax # imm = 0xF25
; ββ
; ββ @ int.jl:53 within `+'
addl $29573, %eax # imm = 0x7385
; ββ
; ββ @ int.jl:229 within `rem'
cltq
imulq $502748801, %rax, %rdx # imm = 0x1DF75681
movq %rdx, %rsi
shrq $63, %rsi
sarq $46, %rdx
addl %esi, %edx
imull $139968, %edx, %edx # imm = 0x222C0
subl %edx, %eax
; ββ
; ββ @ refvalue.jl:33 within `setindex!'
; βββ @ Base.jl:21 within `setproperty!'
movl %eax, (%rcx)
; βββ
retq
nopw %cs:(%rax,%rax)
; β
For xoroshiro there's no multiply:
julia> @code_native rand(r, UInt64)
.text
; β @ xoroshiro128.jl:68 within `rand'
; ββ @ xoroshiro128.jl:35 within `xorshift_next'
; βββ @ xoroshiro128.jl:68 within `getproperty'
movq (%rdi), %rcx
movq 8(%rdi), %rax
; βββ
; ββ @ xoroshiro128.jl:37 within `xorshift_next'
; βββ @ int.jl:317 within `xor'
movq %rcx, %rdx
xorq %rax, %rdx
; βββ
; ββ @ int.jl:53 within `xorshift_next'
addq %rcx, %rax
; ββ
; ββ @ xoroshiro128.jl:38 within `xorshift_next'
; βββ @ common.jl:1 within `xorshift_rotl'
; ββββ @ int.jl:316 within `|'
rolq $24, %rcx
; ββββ
; βββ @ int.jl:317 within `xor'
xorq %rdx, %rcx
; βββ
; βββ @ int.jl:446 within `<<' @ int.jl:439
movq %rdx, %rsi
shlq $16, %rsi
; βββ
; ββ @ int.jl:317 within `xorshift_next'
xorq %rcx, %rsi
; ββ
; ββ @ xoroshiro128.jl:38 within `xorshift_next'
; βββ @ Base.jl:21 within `setproperty!'
movq %rsi, (%rdi)
; βββ
; ββ @ int.jl:316 within `xorshift_next'
rolq $37, %rdx
; ββ
; ββ @ xoroshiro128.jl:39 within `xorshift_next'
; βββ @ Base.jl:21 within `setproperty!'
movq %rdx, 8(%rdi)
; βββ
retq
nopl (%rax)
; β
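For readers who prefer source over assembly, here is a rough plain-Julia sketch of what the listing above appears to compute: one xoroshiro128+ step with the rotation and shift constants visible in the instructions (24, 16, 37). The names `XoroState`, `rotl` and `next!` are made up for illustration and are not the RandomNumbers.jl API.

```julia
# Hypothetical names; a sketch of one xoroshiro128+ step, not the package's code.
mutable struct XoroState
    s0::UInt64
    s1::UInt64
end

# Rotate a 64-bit word left by k bits (0 < k < 64).
rotl(x::UInt64, k::Int) = (x << k) | (x >> (64 - k))

function next!(st::XoroState)
    s0, s1 = st.s0, st.s1
    result = s0 + s1                          # output: wrapping 64-bit add (addq)
    s1 ⊻= s0                                  # xorq
    st.s0 = rotl(s0, 24) ⊻ s1 ⊻ (s1 << 16)    # rolq $24, shlq $16, two xorq
    st.s1 = rotl(s1, 37)                      # rolq $37
    return result
end

st = XoroState(1, 2)
x = next!(st)
```

Only adds, xors, shifts and rotates are involved, which matches the observation that there is no multiply.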
List of comparisons just got changed on the website. It now has a comparison to C and to SB Common Lisp instead of Chapel.
I've created the JuliaPerf organization and moved the BenchmarksGame.jl repo there https://github.com/juliaperf/BenchmarksGame.jl. I've also invited @non-Jedi as an owner to that organization.
The BenchmarksGame.jl repo supports correctness checking and performance checking so I feel it would be useful if we could collect the community efforts in improving the benchmarks to that repo. Feel free to maintain the repo as you wish or ignore it if you feel it isn't useful.
How about calling it JuliaPerf since caps is pretty idiomatic for Julia orgs?
The RNG can't be changed, as all implementations need to have the same output (that's how correctness is checked), therefore the same stream of random numbers, therefore the same RNG.
Using unsigned ints would definitely be valid though; not sure if I forgot to test that or if it was slower...
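A quick way to convince yourself that switching the state to unsigned keeps the output identical is to run the generator with both types and compare. This is only a sketch: the constants are the ones visible in the assembly listings (3877, 29573, 139968), the seed 42 is from the code above, and `lcg_next` and `lcg_stream` are hypothetical helper names, not part of the benchmark code.

```julia
# Constants as they appear in the assembly listings above.
const IM = 139968
const IA = 3877
const IC = 29573

# One LCG step, generic over the integer type of the state.
lcg_next(x::T) where {T<:Integer} = (x * T(IA) + T(IC)) % T(IM)

# Collect the first n states for a given state type, starting from seed 42.
function lcg_stream(::Type{T}, n::Integer) where {T<:Integer}
    out = Vector{Int}(undef, n)
    x = T(42)
    for i in 1:n
        x = lcg_next(x)
        out[i] = Int(x)
    end
    return out
end

signed_stream   = lcg_stream(Int32, 100_000)
unsigned_stream = lcg_stream(UInt32, 100_000)
same = signed_stream == unsigned_stream
```

If `same` is true, the two variants are interchangeable for correctness and only the timing remains to be compared.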
I also noticed that the RNG only returns 16-bits I think not 32-bits, with rest zero-padded.
Hm... IM > typemax(Int16), so I think Int32/UInt32 is correct here.
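The arithmetic behind that can be checked directly. A small sketch, assuming the constants visible in the assembly listings; the variable names are made up for illustration:

```julia
# Constants as they appear in the assembly listings above.
modulus, multiplier, increment = 139968, 3877, 29573

# The state lives in 0:modulus-1, which does not fit in 16 bits.
needs_more_than_16_bits = modulus - 1 > typemax(Int16)

# Largest value that can occur before the remainder is taken.
largest_intermediate = (modulus - 1) * multiplier + increment

# It fits in 32 bits either way, so neither Int32 nor UInt32 wraps around.
fits_int32  = largest_intermediate <= typemax(Int32)
fits_uint32 = largest_intermediate <= typemax(UInt32)
```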
I've created the JuliaPerf organization and moved the BenchmarksGame.jl repo there
That's a nice idea, I'll see if I can cook up a threaded fasta version there.
I wrote that, GitHub changed it to lower case, I deleted the org and tried again, GitHub changed it again... No idea why.
Maybe write a support message to GitHub? Not being able to use capitals in org names is a significant regression.
GitHub support agrees that this is a bug, not an official policy change. You can change the name of the org after it's created to have caps, so maybe try that.
Renamed!
Inspired by the Mary McGrath talk at JuliaCon I played around with the benchmark data because I wanted to see the correlation between code length and performance. The data is averaged over the best entry of each language for all benchmarks:
I think Julia comes away from that quite nicely.
I'm not quite sure if there is a mistake in my calculations because the plot on the benchmark site has a different ordering for, e.g., Chapel and Haskell. I haven't been able to pin down why.
Ada is not shown because its gzipped code size of ~2700 made the plot rather unreadable.
I knew that Ada was verbose, but dang. I would have thought that compression would reduce some of the bloat. What about non-compressed code size? While I tend to agree that the difference between fn as a keyword and function as a keyword shouldn't matter, I do think that general verbosity matters. For example, if you always have to write String s = String(...), that's pretty compressible but it's still really annoying.
Did you remove comments and consecutive whitespace and use only gzip --fast? Afaict, that's the procedure on the site, and it'd make a big difference for some cases. He might also be including all benchmark implementations rather than just the fastest ones.
As a side note, I've been very impressed by the Chapel code I've seen in the benchmarks. Its conciseness and clarity combined with speed has been shocking at times. Of course other times its non-explicit handling of state and concurrency has been puzzling.
Oh, I just scraped all the info from the pages (e.g. this one), so the data should be identical to whatever he did. So for now I don't have access to the raw source files, but I might do that later.
He might also be including all benchmark implementations rather than just the fastest ones.
That might be possible but unlikely, I think; the data overall is just too fitting. For example, for the mandelbrot benchmark Julia has two implementations: a fast one (factor 3.0) and a slow one (factor 29). If both were considered equally, that would really screw our results.
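To put a number on it, here is a tiny sketch using only the two mandelbrot factors quoted above; the variable names are made up for illustration:

```julia
# Factors of Julia's two mandelbrot entries, as quoted above.
mandelbrot_factors = [3.0, 29.0]

# Taking only the fastest entry, as I did for the plot.
best_only = minimum(mandelbrot_factors)

# Weighting every entry equally instead.
all_entries = sum(mandelbrot_factors) / length(mandelbrot_factors)
```

With both entries weighted equally the factor for this benchmark would be 16.0 instead of 3.0, which would visibly shift Julia's position in the plot.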