# Version 1.7 much faster than 1.6 in a script that calls wrapped external libraries

Hi,

This is not really a question but an amazing fact that I wanted to share.
The code below is the main part of a script where all heavy work is done by external libraries (GMT, GDAl & PROJ). However, version 1.7beta is nearly 6 times faster than 1.6 or 1.5. Howβs that possible?

	...
for n = 1:np-1
tic()
for k = -n_beams:n_beams
endpt = geod(coast[n,:], az1[n] - 90 + k*2, 350000)[1]		# Compute end point coords at azim and distance 350 km
o = orthodrome([coast[n:n,:]; endpt], step=2000)			# Interpolate line at step increments
Di = grdtrack(o, G, s=true, V=:q)
Dd = mapproject(Di, G="+a+ue", o=3)							# Compute the accumulated distances in meters
dist_T = [Dd[1].data Di[1].data[:,3]]						# Matrix with [lon lat accumdist]
Dt = trend1d(dist_T, F="xm", N="p1+r", V=:q)				# Fit a straight line to the tracked Ts
if (!all(isnan.(Dt[1].data[:,2])))
D   = intersection(Dt[1].data, dist_T)					# Compute intersections between fit and track
ind = argmin(abs.(Dd[1].data .- D[1].data[1, 1]))[1]
front[m += 1, 1:2] = Di[1].data[ind, 1:2]				# Retain only the first intersection
end
end
toc()
end


with version v1.6 and 1.5

elapsed time: 2.5549329 seconds
elapsed time: 2.544714 seconds
elapsed time: 2.6814375 seconds
elapsed time: 2.7668562 seconds
elapsed time: 2.8290057 seconds
...


with 1.7.0-beta2

elapsed time: 0.4300048 seconds
elapsed time: 0.4277238 seconds
elapsed time: 0.4435151 seconds
elapsed time: 0.4234343 seconds
elapsed time: 0.4340528 seconds
...

17 Likes

This is really cool. For a hint, I would examine @code_native or @code_llvm and see where the compiled code diverges. I would pay particular attention to references to out of bounds (oob) checks.

1 Like

Suspiciously short, but anyway exactly equal between 1.6 & 1.7

julia> @code_native find_apueles(week=30);
.text
; β @ find_apueles.jl:1 within find_apueles##kw
pushq   %rbp
movq    %rsp, %rbp
subq    $32, %rsp ; β @ find_apueles.jl:3 within find_apueles##kw movq (%rcx), %rcx movabsq$"#find_apueles#4", %rax
callq   *%rax
addq    32, %rsp popq %rbp retq nopl (%rax) ; β julia> @code_llvm find_apueles(week=30); ; @ C:\j\jls\find_apueles.jl:1 within find_apueles##kw ; Function Attrs: uwtable define nonnull {}* @"julia_find_apueles##kw_1663"([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0) #0 { top: ; @ C:\j\jls\find_apueles.jl:3 within find_apueles##kw ; β @ namedtuple.jl:127 within getindex %1 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0 ; β %2 = load i64, i64* %1, align 8 %3 = call nonnull {}* @"j_#find_apueles#4_1666"(i64 signext %2) #0 ret {}* %3 } julia> versioninfo() Julia Version 1.7.0-beta2 Commit b570546b68 (2021-06-20 06:31 UTC) Platform Info: OS: Windows (x86_64-w64-mingw32) CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-12.0.0 (ORCJIT, skylake) Environment: JULIA_NUM_THREADS = 6  julia> @code_native find_apueles(week=1); .text ; β @ find_apueles.jl:1 within find_apueles##kw' pushq %rbp movq %rsp, %rbp subq32, %rsp
; β @ find_apueles.jl:3 within find_apueles##kw'
movq    (%rcx), %rcx
movabsq $"#find_apueles#4", %rax callq *%rax addq$32, %rsp
popq    %rbp
retq
nopl    (%rax)
; β

julia> @code_llvm find_apueles(week=30);
;  @ C:\j\jls\find_apueles.jl:1 within find_apueles##kw'
; Function Attrs: uwtable
define nonnull {}* @"julia_find_apueles##kw_1760"([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0) #0 {
top:
;  @ C:\j\jls\find_apueles.jl:3 within find_apueles##kw'
; β @ namedtuple.jl:118 within getindex'
%1 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
; β
%2 = load i64, i64* %1, align 8
%3 = call nonnull {}* @"j_#find_apueles#4_1763"(i64 signext %2) #0
ret {}* %3
}

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:


I see zero differences in the benchmarks of Cuba.jl for julia v1.6 vs master:

% CUBACORES=0 julia benchmark.jl
[ Info: Performance of Cuba.jl:
0.251215 seconds (Vegas)
0.647562 seconds (Suave)
0.303899 seconds (Divonne)
0.230467 seconds (Cuhre)
[ Info: Performance of Cuba Library in C:
0.190322 seconds (Vegas)
0.620006 seconds (Suave)
0.248154 seconds (Divonne)
0.175083 seconds (Cuhre)
[ Info: Performance of Cuba Library in Fortran:
0.184770 seconds (Vegas)
0.596291 seconds (Suave)
0.226675 seconds (Divonne)
0.155555 seconds (Cuhre)
% CUBACORES=0 julia-master benchmark.jl
[ Info: Performance of Cuba.jl:
0.255095 seconds (Vegas)
0.646028 seconds (Suave)
0.312925 seconds (Divonne)
0.227860 seconds (Cuhre)
[ Info: Performance of Cuba Library in C:
0.195879 seconds (Vegas)
0.629609 seconds (Suave)
0.247174 seconds (Divonne)
0.173440 seconds (Cuhre)
[ Info: Performance of Cuba Library in Fortran:
0.178327 seconds (Vegas)
0.587523 seconds (Suave)
0.226824 seconds (Divonne)
0.155499 seconds (Cuhre)


A 6x difference in FFI overhead is quite large, to not say unrealistic

I agree itβs unrealistic, but what ever is the cause, it exists β¦ and allocates a lot of memory.

# V1.7
julia> @time find_apueles(week=30);
183.546047 seconds (47.71 M allocations: 2.168 GiB, 0.12% gc time)


and

# V1.6
julia> @time find_apueles(week=30);
1089.193155 seconds (48.76 G allocations: 728.116 GiB, 1.51% gc time, 0.08% compilation time)


There is wrapper code before calling the C & C++ libs and the cause likely lies there but still the difference exists and is huge.

Have you tried to remove the calls to external libraries? Itβs next to impossible the difference youβre seeing has to do with FFI overhead

Iβm not sticking to the hypothesis that the speed up comes from FFI improvements. I first thought on it because almost all lines in the script call wrapper code, but after seeing the amount of allocated memory I know nothing anymore.

Canβt remove the calls to external libraries because that would broke the program. I might be able to change one call to one wrapper to another but no more.

I have a suspicion that this is Windows specific.

It may be related to this:

1 Like

Yes, you are probably right. However, joa-quim also mentioned βwrapper codeβ which might be code loading related:

More generally my thought is that this has something to do with slow file system operations on Windows and Julia 1.7 reducing its dependence on those file system operations.

Another example might be reducing the use of realpath:

Some of the eternal library use cited above is done by running external processes.

Anyways, Iβm speculating wildly since we have very few leads.

Letβs do some profiling to narrow down the possibilities of where the speedup is happening. It would also be great if we could produce a minimum working example showing the speedup. Is it a particular external library or process that is faster?

1 Like

Iβve done some more investigation and I think the above line is the one where the large time difference occurs. What it does in current case is to take an in memory grid G (basically a 2D array) and a Mx2 array, send them to the GMT lib that will interpolate the grid G at locations of the lon, lat positions o and return a Mx3 matrix with the result. Now, given the Julia column-major vs C row-major nature it normally has make a copy and transpose, though I try to avoid it when I can. However, nothing of this is specific of the grdtrack module. Many of the array operations in GMT go trough the same function.

This will need further investigation to find out where the culprit hides but I donβt have much time in the next days. Iβve also no idea on how to make a MWE of this that does not involve the use of that GMT module.

Could you build an example around grdtrack? Setup a small self-contained example where you load the required modules and initialize o and G for us. Demonstrate the disparate timings on 1.6 and 1.7.

We can then run the same small example and see if we see the same timings.

Ok, this simple one shows the difference. It builds a dumb grid of 1001x1001 Float32 with a constant value of 1 and interpolates at at single location (lon=-10, lat=35)

v1.6

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
58.958 ms (3007132 allocations: 45.94 MiB)

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:


Now v1.7

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
7.638 ms (1142 allocations: 73.59 KiB)

julia> versioninfo()
Julia Version 1.7.0-beta2
Commit b570546b68 (2021-06-20 06:31 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:

2 Likes

I was able to replicate your example:

Julia 1.6.0

julia> using GMT

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
58.716 ms (3007131 allocations: 45.94 MiB)

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)


Julia 1.7.01.7.0-DEV.1129 (1172c6b129)

julia> using GMT

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
5.983 ms (1116 allocations: 72.25 KiB)

julia> versioninfo()
Julia Version 1.7.0-DEV.1129
Commit 1172c6b129* (2021-05-24 11:41 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)


Hereβs what I was able to profile via the following commands

julia> using Profile

julia> @profile grdtrack([-10 35], G)
Vector{GMT.GMTdataset} with 1 segments
First segment DATA
1Γ3 Matrix{Float64}:
-10.0  35.0  1.0

julia> Profile.print(; C = true)


Julia 1.6.0

 β    β    β    β    β    β    β 68 /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1703; jl_apply
β    β    β    β    β    β    β  68 @GMT\src\grdtrack.jl:121; grdtrack(arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2})
β    β    β    β    β    β    β   68 @GMT\src\grdtrack.jl:121; grdtrack
β    β    β    β    β    β    β    68 @GMT\src\grdtrack.jl:121; #grdtrack#237
β    β    β    β    β    β    β     68 @GMT\src\grdtrack.jl:64; grdtrack
β    β    β    β    β    β    β    β 68 @GMT\src\grdtrack.jl:94; grdtrack(cmd0::String, arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2}; kwargs::Base.Iterators.Pairs{Uni...
β    β    β    β    β    β    β    β  68 @GMT\src\common_options.jl:3062; common_grd(::Dict{Symbol, Any}, ::String, ::Matrix{Int64}, ::GMT.GMTgrid{Float32, 2})
β    β    β    β    β    β    β    β   66 @GMT\src\gmt_main.jl:272; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
β    β    β    β    β    β    β    β    66 [unknown stackframe]
β    β    β    β    β    β    β    β     66 @GMT\src\gmt_main.jl:759; GMTJL_Set_Object(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, ptr::GMT.GMTgrid{Float32, 2}, pad::Int64)
2β    β    β    β    β    β    β    β    β 66 @GMT\src\gmt_main.jl:866; grid_init(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, Grid::GMT.GMTgrid{Float32, 2}, pad::Int64, cube::Bool)
1β    β    β    β    β    β    β    β    β  1  @Base\array.jl:801; getindex
β    β    β    β    β    β    β    β    β  33 @Base\operators.jl:560; +
1β    β    β    β    β    β    β    β    β   14 /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:841; jl_box_int64
2β    β    β    β    β    β    β    β    β    8  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
1β    β    β    β    β    β    β    β    β     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1206; jl_gc_pool_alloc
2β    β    β    β    β    β    β    β    β     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1223; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1240; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β    β 2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1183; add_page
β    β    β    β    β    β    β    β    β    β  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc-pages.c:250; jl_gc_alloc_page
β    β    β    β    β    β    β    β    β    β   2  C:\WINDOWS\System32\KERNELBASE.dll:?; VirtualAlloc
β    β    β    β    β    β    β    β    β    β    2  C:\WINDOWS\System32\umppc13702.dll:?;
β    β    β    β    β    β    β    β    β    β     2  C:\WINDOWS\System32\umppc13702.dll:?;
2β    β    β    β    β    β    β    β    β    β    β 2  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtAllocateVirtualMemory
1β    β    β    β    β    β    β    β    β     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1245; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β    2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:117; jl_get_ptls_states
2β    β    β    β    β    β    β    β    β     2  /usr/x86_64-w64-mingw32/sys-root/mingw/include/psdk_inc\intrin-impl.h:790; __readgsqword
β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:125; jl_get_ptls_states
1β    β    β    β    β    β    β    β    β     1  C:\WINDOWS\System32\KERNEL32.DLL:?; BaseDumpAppcompatCacheWorker
2β    β    β    β    β    β    β    β    β    2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:127; jl_get_ptls_states
1β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2413; jl_apply_generic
1β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2414; jl_apply_generic
β    β    β    β    β    β    β    β    β   10 /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2415; jl_apply_generic
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2288; jl_lookup_generic_
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2314; jl_lookup_generic_
2β    β    β    β    β    β    β    β    β    4  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2336; jl_lookup_generic_
2β    β    β    β    β    β    β    β    β     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2250; sig_match_fast
1β    β    β    β    β    β    β    β    β    3  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2337; jl_lookup_generic_
2β    β    β    β    β    β    β    β    β     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2250; sig_match_fast
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2410; jl_lookup_generic_
1β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2420; jl_apply_generic
1β    β    β    β    β    β    β    β    β   6  [unknown stackframe]
1β    β    β    β    β    β    β    β    β    1  @Base\int.jl:923; +(a::Int64, b::UInt64)
β    β    β    β    β    β    β    β    β    4  /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:842; jl_box_uint64
1β    β    β    β    β    β    β    β    β     4  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
1β    β    β    β    β    β    β    β    β    β 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1195; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β    β 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1204; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β    β  1  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_threads.h:306; maybe_collect
1β    β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_threads.h:313; jl_gc_state_set
1β    β    β    β    β    β    β    β    β    β 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1244; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β  1  @Base\range.jl:674; iterate
1β    β    β    β    β    β    β    β    β   1  @Base\promotion.jl:410; ==
3β    β    β    β    β    β    β    β    β  10 /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:798; jl_box_float32
β    β    β    β    β    β    β    β    β   7  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
β    β    β    β    β    β    β    β    β    3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1204; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β     3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:880; maybe_collect
β    β    β    β    β    β    β    β    β    β 3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3241; jl_gc_collect
β    β    β    β    β    β    β    β    β    β  1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3034; _jl_gc_collect
β    β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:2207; gc_mark_loop
β    β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1816; gc_mark_scan_objarray
1β    β    β    β    β    β    β    β    β    β     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1705; gc_try_setmark
β    β    β    β    β    β    β    β    β    β  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3148; _jl_gc_collect
β    β    β    β    β    β    β    β    β    β   2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1530; gc_sweep_pool
β    β    β    β    β    β    β    β    β    β    2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1455; sweep_pool_pagetable
β    β    β    β    β    β    β    β    β    β     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1425; sweep_pool_pagetable1
β    β    β    β    β    β    β    β    β    β    β 2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1405; sweep_pool_pagetable0
β    β    β    β    β    β    β    β    β    β    β  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1385; sweep_pool_page
1β    β    β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1317; sweep_page
1β    β    β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1337; sweep_page
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1223; jl_gc_pool_alloc
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1224; jl_gc_pool_alloc
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1228; jl_gc_pool_alloc
1β    β    β    β    β    β    β    β    β    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1243; jl_gc_pool_alloc
β    β    β    β    β    β    β    β    β  3  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2414; jl_apply_generic
1β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:115; jl_get_ptls_states
1β    β    β    β    β    β    β    β    β   2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:125; jl_get_ptls_states
1β    β    β    β    β    β    β    β    β    1  C:\WINDOWS\System32\KERNEL32.DLL:?; BaseDumpAppcompatCacheWorker
β    β    β    β    β    β    β    β    β  10 /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2415; jl_apply_generic
1β    β    β    β    β    β    β    β    β   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2288; jl_lookup_generic_
4β    β    β    β    β    β    β    β    β   4  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2336; jl_lookup_generic_
3β    β    β    β    β    β    β    β    β   5  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2337; jl_lookup_generic_
β    β    β    β    β    β    β    β    β    2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2247; sig_match_fast
1β    β    β    β    β    β    β    β    β     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2253; sig_match_fast
1β    β    β    β    β    β    β    β    β     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2256; sig_match_fast
1β    β    β    β    β    β    β    β    β  1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2420; jl_apply_generic
1β    β    β    β    β    β    β    β    β  5  [unknown stackframe]
β    β    β    β    β    β    β    β    β   4  @Base\multidimensional.jl:639; setindex!(::Vector{Float32}, ::Float32, ::UInt64)
4β    β    β    β    β    β    β    β    β    4  @Base\array.jl:839; setindex!
β    β    β    β    β    β    β    β   1  @GMT\src\gmt_main.jl:276; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
β    β    β    β    β    β    β    β    1  @GMT\src\libgmt.jl:147; GMT_Call_Module
β    β    β    β    β    β    β    β     1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Call_Module
β    β    β    β    β    β    β    β    β 1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_grdtrack
β    β    β    β    β    β    β    β    β  1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_end_module
β    β    β    β    β    β    β    β    β   1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmtlib_free_tmp_arrays
β    β    β    β    β    β    β    β    β    1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_func
β    β    β    β    β    β    β    β    β     1  C:\WINDOWS\System32\ucrtbase.dll:?; free_base
β    β    β    β    β    β    β    β    β    β 1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlFreeHeap
β    β    β    β    β    β    β    β    β    β  1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
β    β    β    β    β    β    β    β    β    β   1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
β    β    β    β    β    β    β    β    β    β    1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlTryAcquirePebLock
1β    β    β    β    β    β    β    β    β    β     1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtFreeVirtualMemory
β    β    β    β    β    β    β    β   1  @GMT\src\gmt_main.jl:313; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
β    β    β    β    β    β    β    β    1  @GMT\src\libgmt.jl:86; GMT_Destroy_Data
β    β    β    β    β    β    β    β     1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Destroy_Data
β    β    β    β    β    β    β    β    β 1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_whole_earth
β    β    β    β    β    β    β    β    β  1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_grid
β    β    β    β    β    β    β    β    β   1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmtlib_free_grid_ptr
β    β    β    β    β    β    β    β    β    1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_func
β    β    β    β    β    β    β    β    β     1  C:\WINDOWS\System32\ucrtbase.dll:?; free_base
β    β    β    β    β    β    β    β    β    β 1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlFreeHeap
β    β    β    β    β    β    β    β    β    β  1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
β    β    β    β    β    β    β    β    β    β   1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
β    β    β    β    β    β    β    β    β    β    1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlTryAcquirePebLock
1β    β    β    β    β    β    β    β    β    β     1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtFreeVirtualMemory


Julia 1.7.0-DEV.1129

 β    β    β    β    β    β    β 5 /cygdrive/c/Users/kittisopikulm/source/repos/julia/src\julia.h:1760; jl_apply
β    β    β    β    β    β    β  5 @GMT\src\grdtrack.jl:121; grdtrack(arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2})
β    β    β    β    β    β    β   5 @GMT\src\grdtrack.jl:121; grdtrack
β    β    β    β    β    β    β    5 @GMT\src\grdtrack.jl:121; #grdtrack#237
β    β    β    β    β    β    β     5 @GMT\src\grdtrack.jl:64; grdtrack
β    β    β    β    β    β    β    β 5 @GMT\src\grdtrack.jl:94; grdtrack(cmd0::String, arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTup...
β    β    β    β    β    β    β    β  5 @GMT\src\common_options.jl:3062; common_grd(::Dict{Symbol, Any}, ::String, ::Matrix{Int64}, ::GMT.GMTgrid{Float32, 2})
β    β    β    β    β    β    β    β   4 @GMT\src\gmt_main.jl:272; gmt(::String, ::Matrix{Int64}, ::Vararg{Any})
β    β    β    β    β    β    β    β    4 [unknown stackframe]
β    β    β    β    β    β    β    β     4 @GMT\src\gmt_main.jl:759; GMTJL_Set_Object(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, ptr::GMT.GMTgrid{Float32, 2}, pad::Int64)
β    β    β    β    β    β    β    β    β 4 @GMT\src\gmt_main.jl:866; grid_init(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, Grid::GMT.GMTgrid{Float32, 2}, pad::Int64, cube::Bool)
1β    β    β    β    β    β    β    β    β  1 @Base\int.jl:87; +
β    β    β    β    β    β    β    β    β  3 @Base\multidimensional.jl:646; setindex!
3β    β    β    β    β    β    β    β    β   3 @Base\array.jl:875; setindex!
β    β    β    β    β    β    β    β   1 @GMT\src\gmt_main.jl:276; gmt(::String, ::Matrix{Int64}, ::Vararg{Any})
β    β    β    β    β    β    β    β    1 @GMT\src\libgmt.jl:147; GMT_Call_Module
β    β    β    β    β    β    β    β     1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Call_Module
β    β    β    β    β    β    β    β    β 1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_grdtrack
β    β    β    β    β    β    β    β    β  1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Read_Data
β    β    β    β    β    β    β    β    β   1 C:\programs\gmt6\bin\gmt_w64.dll:?; gmt_whole_earth
1β    β    β    β    β    β    β    β    β    1 C:\programs\gmt6\bin\gmt_w64.dll:?; gmt_grd_BC_set


The main difference seems to be in the execution of gmt_main.jl:866 at GMT v0.35.0:

Julia 1.6.0

Julia 1.7.0-DEV

Julia 1.6.0 seems to spend much more time on Line 560 of base/operators.jl which involves argument list folding via afoldl:

While that line has not changed in Julia 1.7, afoldl has changed:

It looks like you have @kslimes to thank for this amazing improvement in speed.

24 Likes

Thanks for digging this out. A nice example also on how to use the profiler. Big thanks to @kshyatt

6 Likes

Iβm beginning to think I may be pointing at the wrong change since redefining afoldl` before the above change does not result in a significant slow down. It may be this change by @jameson instead:

2 Likes