Version 1.7 much faster than 1.6 in a script that calls wrapped external libraries

Hi,

This is not really a question but an amazing fact that I wanted to share.
The code below is the main part of a script where all heavy work is done by external libraries (GMT, GDAl & PROJ). However, version 1.7beta is nearly 6 times faster than 1.6 or 1.5. How’s that possible?

	...
	for n = 1:np-1
		tic()
		for k = -n_beams:n_beams
			endpt = geod(coast[n,:], az1[n] - 90 + k*2, 350000)[1]		# Compute end point coords at azim and distance 350 km
			o = orthodrome([coast[n:n,:]; endpt], step=2000)			# Interpolate line at step increments
			Di = grdtrack(o, G, s=true, V=:q)
			Dd = mapproject(Di, G="+a+ue", o=3)							# Compute the accumulated distances in meters
			dist_T = [Dd[1].data Di[1].data[:,3]]						# Matrix with [lon lat accumdist]
			Dt = trend1d(dist_T, F="xm", N="p1+r", V=:q)				# Fit a straight line to the tracked Ts
			if (!all(isnan.(Dt[1].data[:,2])))
				D   = intersection(Dt[1].data, dist_T)					# Compute intersections between fit and track
				ind = argmin(abs.(Dd[1].data .- D[1].data[1, 1]))[1]
				front[m += 1, 1:2] = Di[1].data[ind, 1:2]				# Retain only the first intersection
			end
		end
		toc()
	end

with version v1.6 and 1.5

elapsed time: 2.5549329 seconds
elapsed time: 2.544714 seconds
elapsed time: 2.6814375 seconds
elapsed time: 2.7668562 seconds
elapsed time: 2.8290057 seconds
...

with 1.7.0-beta2

elapsed time: 0.4300048 seconds
elapsed time: 0.4277238 seconds
elapsed time: 0.4435151 seconds
elapsed time: 0.4234343 seconds
elapsed time: 0.4340528 seconds
...
17 Likes

This is really cool. For a hint, I would examine @code_native or @code_llvm and see where the compiled code diverges. I would pay particular attention to references to out of bounds (oob) checks.

1 Like

Suspiciously short, but anyway exactly equal between 1.6 & 1.7

julia> @code_native find_apueles(week=30);
        .text
; β”Œ @ find_apueles.jl:1 within `find_apueles##kw`
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $32, %rsp
; β”‚ @ find_apueles.jl:3 within `find_apueles##kw`
        movq    (%rcx), %rcx
        movabsq $"#find_apueles#4", %rax
        callq   *%rax
        addq    $32, %rsp
        popq    %rbp
        retq
        nopl    (%rax)
; β””

julia> @code_llvm find_apueles(week=30);
;  @ C:\j\jls\find_apueles.jl:1 within `find_apueles##kw`
; Function Attrs: uwtable
define nonnull {}* @"julia_find_apueles##kw_1663"([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0) #0 {
top:
;  @ C:\j\jls\find_apueles.jl:3 within `find_apueles##kw`
; β”Œ @ namedtuple.jl:127 within `getindex`
   %1 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
; β””
  %2 = load i64, i64* %1, align 8
  %3 = call nonnull {}* @"j_#find_apueles#4_1666"(i64 signext %2) #0
  ret {}* %3
}

julia> versioninfo()
Julia Version 1.7.0-beta2
Commit b570546b68 (2021-06-20 06:31 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
julia> @code_native find_apueles(week=1);
        .text
; β”Œ @ find_apueles.jl:1 within `find_apueles##kw'
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $32, %rsp
; β”‚ @ find_apueles.jl:3 within `find_apueles##kw'
        movq    (%rcx), %rcx
        movabsq $"#find_apueles#4", %rax
        callq   *%rax
        addq    $32, %rsp
        popq    %rbp
        retq
        nopl    (%rax)
; β””

julia> @code_llvm find_apueles(week=30);
;  @ C:\j\jls\find_apueles.jl:1 within `find_apueles##kw'
; Function Attrs: uwtable
define nonnull {}* @"julia_find_apueles##kw_1760"([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0) #0 {
top:
;  @ C:\j\jls\find_apueles.jl:3 within `find_apueles##kw'
; β”Œ @ namedtuple.jl:118 within `getindex'
   %1 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
; β””
  %2 = load i64, i64* %1, align 8
  %3 = call nonnull {}* @"j_#find_apueles#4_1763"(i64 signext %2) #0
  ret {}* %3
}

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

I see zero differences in the benchmarks of Cuba.jl for julia v1.6 vs master:

% CUBACORES=0 julia benchmark.jl
[ Info: Performance of Cuba.jl:
  0.251215 seconds (Vegas)
  0.647562 seconds (Suave)
  0.303899 seconds (Divonne)
  0.230467 seconds (Cuhre)
[ Info: Performance of Cuba Library in C:
  0.190322 seconds (Vegas)
  0.620006 seconds (Suave)
  0.248154 seconds (Divonne)
  0.175083 seconds (Cuhre)
[ Info: Performance of Cuba Library in Fortran:
  0.184770 seconds (Vegas)
  0.596291 seconds (Suave)
  0.226675 seconds (Divonne)
  0.155555 seconds (Cuhre)
% CUBACORES=0 julia-master benchmark.jl 
[ Info: Performance of Cuba.jl:
  0.255095 seconds (Vegas)
  0.646028 seconds (Suave)
  0.312925 seconds (Divonne)
  0.227860 seconds (Cuhre)
[ Info: Performance of Cuba Library in C:
  0.195879 seconds (Vegas)
  0.629609 seconds (Suave)
  0.247174 seconds (Divonne)
  0.173440 seconds (Cuhre)
[ Info: Performance of Cuba Library in Fortran:
  0.178327 seconds (Vegas)
  0.587523 seconds (Suave)
  0.226824 seconds (Divonne)
  0.155499 seconds (Cuhre)

A 6x difference in FFI overhead is quite large, to not say unrealistic

I agree it’s unrealistic, but what ever is the cause, it exists … and allocates a lot of memory.

# V1.7
julia> @time find_apueles(week=30);
183.546047 seconds (47.71 M allocations: 2.168 GiB, 0.12% gc time)

and

# V1.6
julia> @time find_apueles(week=30);
1089.193155 seconds (48.76 G allocations: 728.116 GiB, 1.51% gc time, 0.08% compilation time)

There is wrapper code before calling the C & C++ libs and the cause likely lies there but still the difference exists and is huge.

Have you tried to remove the calls to external libraries? It’s next to impossible the difference you’re seeing has to do with FFI overhead

I’m not sticking to the hypothesis that the speed up comes from FFI improvements. I first thought on it because almost all lines in the script call wrapper code, but after seeing the amount of allocated memory I know nothing anymore.

Can’t remove the calls to external libraries because that would broke the program. I might be able to change one call to one wrapper to another but no more.

I have a suspicion that this is Windows specific.

It may be related to this:

Isn’t the pull request you linked specific to package loading and what is timed here is not package loading?

1 Like

Yes, you are probably right. However, joa-quim also mentioned β€œwrapper code” which might be code loading related:

More generally my thought is that this has something to do with slow file system operations on Windows and Julia 1.7 reducing its dependence on those file system operations.

Another example might be reducing the use of realpath:

Some of the eternal library use cited above is done by running external processes.

Anyways, I’m speculating wildly since we have very few leads.

Let’s do some profiling to narrow down the possibilities of where the speedup is happening. It would also be great if we could produce a minimum working example showing the speedup. Is it a particular external library or process that is faster?

1 Like

I’ve done some more investigation and I think the above line is the one where the large time difference occurs. What it does in current case is to take an in memory grid G (basically a 2D array) and a Mx2 array, send them to the GMT lib that will interpolate the grid G at locations of the lon, lat positions o and return a Mx3 matrix with the result. Now, given the Julia column-major vs C row-major nature it normally has make a copy and transpose, though I try to avoid it when I can. However, nothing of this is specific of the grdtrack module. Many of the array operations in GMT go trough the same function.

This will need further investigation to find out where the culprit hides but I don’t have much time in the next days. I’ve also no idea on how to make a MWE of this that does not involve the use of that GMT module.

Could you build an example around grdtrack? Setup a small self-contained example where you load the required modules and initialize o and G for us. Demonstrate the disparate timings on 1.6 and 1.7.

We can then run the same small example and see if we see the same timings.

Ok, this simple one shows the difference. It builds a dumb grid of 1001x1001 Float32 with a constant value of 1 and interpolates at at single location (lon=-10, lat=35)

v1.6

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
  58.958 ms (3007132 allocations: 45.94 MiB)

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

Now v1.7

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
  7.638 ms (1142 allocations: 73.59 KiB)

julia> versioninfo()
Julia Version 1.7.0-beta2
Commit b570546b68 (2021-06-20 06:31 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
2 Likes

I was able to replicate your example:

Julia 1.6.0

julia> using GMT

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
  58.716 ms (3007131 allocations: 45.94 MiB)

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

Julia 1.7.01.7.0-DEV.1129 (1172c6b129)

julia> using GMT

julia> G = gmt("grdmath -R-20/0/25/45 -I0.02 1");

julia> using BenchmarkTools

julia> @btime grdtrack([-10 35], G);
  5.983 ms (1116 allocations: 72.25 KiB)

julia> versioninfo()
Julia Version 1.7.0-DEV.1129
Commit 1172c6b129* (2021-05-24 11:41 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

Here’s what I was able to profile via the following commands

julia> using Profile

julia> @profile grdtrack([-10 35], G)
Vector{GMT.GMTdataset} with 1 segments
First segment DATA
1Γ—3 Matrix{Float64}:
 -10.0  35.0  1.0


julia> Profile.print(; C = true)  

Julia 1.6.0

 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 68 /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1703; jl_apply
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  68 @GMT\src\grdtrack.jl:121; grdtrack(arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   68 @GMT\src\grdtrack.jl:121; grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    68 @GMT\src\grdtrack.jl:121; #grdtrack#237
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     68 @GMT\src\grdtrack.jl:64; grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 68 @GMT\src\grdtrack.jl:94; grdtrack(cmd0::String, arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2}; kwargs::Base.Iterators.Pairs{Uni...
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  68 @GMT\src\common_options.jl:3062; common_grd(::Dict{Symbol, Any}, ::String, ::Matrix{Int64}, ::GMT.GMTgrid{Float32, 2})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   66 @GMT\src\gmt_main.jl:272; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    66 [unknown stackframe]
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     66 @GMT\src\gmt_main.jl:759; GMTJL_Set_Object(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, ptr::GMT.GMTgrid{Float32, 2}, pad::Int64)
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 66 @GMT\src\gmt_main.jl:866; grid_init(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, Grid::GMT.GMTgrid{Float32, 2}, pad::Int64, cube::Bool)
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  @Base\array.jl:801; getindex
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  33 @Base\operators.jl:560; +
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   14 /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:841; jl_box_int64
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    8  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1206; jl_gc_pool_alloc
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1223; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1240; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1183; add_page
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc-pages.c:250; jl_gc_alloc_page
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   2  C:\WINDOWS\System32\KERNELBASE.dll:?; VirtualAlloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    2  C:\WINDOWS\System32\umppc13702.dll:?;
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  C:\WINDOWS\System32\umppc13702.dll:?;
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 2  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtAllocateVirtualMemory
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1245; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:117; jl_get_ptls_states
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /usr/x86_64-w64-mingw32/sys-root/mingw/include/psdk_inc\intrin-impl.h:790; __readgsqword
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:125; jl_get_ptls_states
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  C:\WINDOWS\System32\KERNEL32.DLL:?; BaseDumpAppcompatCacheWorker
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:127; jl_get_ptls_states
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2413; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2414; jl_apply_generic
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   10 /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2415; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2288; jl_lookup_generic_
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2314; jl_lookup_generic_
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    4  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2336; jl_lookup_generic_
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2250; sig_match_fast
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    3  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2337; jl_lookup_generic_
2β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2250; sig_match_fast
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2410; jl_lookup_generic_
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2420; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   6  [unknown stackframe]
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  @Base\int.jl:923; +(a::Int64, b::UInt64)
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    4  /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:842; jl_box_uint64
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     4  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1195; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1204; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_threads.h:306; maybe_collect
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_threads.h:313; jl_gc_state_set
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1244; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  @Base\range.jl:674; iterate
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  @Base\promotion.jl:410; ==
3β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  10 /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:798; jl_box_float32
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   7  /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:285; jl_gc_alloc_
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1204; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:880; maybe_collect
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 3  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3241; jl_gc_collect
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3034; _jl_gc_collect
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:2207; gc_mark_loop
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1816; gc_mark_scan_objarray
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1705; gc_try_setmark
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3148; _jl_gc_collect
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1530; gc_sweep_pool
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1455; sweep_pool_pagetable
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1425; sweep_pool_pagetable1
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1405; sweep_pool_pagetable0
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  2  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1385; sweep_pool_page
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1317; sweep_page
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1337; sweep_page
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1223; jl_gc_pool_alloc
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1224; jl_gc_pool_alloc
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1228; jl_gc_pool_alloc
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1243; jl_gc_pool_alloc
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  3  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2414; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:115; jl_get_ptls_states
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   2  /cygdrive/c/buildbot/worker/package_win64/build/src\threading.c:125; jl_get_ptls_states
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  C:\WINDOWS\System32\KERNEL32.DLL:?; BaseDumpAppcompatCacheWorker
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  10 /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2415; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2288; jl_lookup_generic_
4β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   4  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2336; jl_lookup_generic_
3β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   5  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2337; jl_lookup_generic_
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    2  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2247; sig_match_fast
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2253; sig_match_fast
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2256; sig_match_fast
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2420; jl_apply_generic
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  5  [unknown stackframe]
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   4  @Base\multidimensional.jl:639; setindex!(::Vector{Float32}, ::Float32, ::UInt64)
4β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    4  @Base\array.jl:839; setindex!
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  @GMT\src\gmt_main.jl:276; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  @GMT\src\libgmt.jl:147; GMT_Call_Module
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Call_Module
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_end_module
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmtlib_free_tmp_arrays
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_func
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  C:\WINDOWS\System32\ucrtbase.dll:?; free_base
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlFreeHeap
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlTryAcquirePebLock
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtFreeVirtualMemory
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  @GMT\src\gmt_main.jl:313; gmt(::String, ::Matrix{Int64}, ::Vararg{Any, N} where N)
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  @GMT\src\libgmt.jl:86; GMT_Destroy_Data
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  c:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Destroy_Data
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_whole_earth
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_grid
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmtlib_free_grid_ptr
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  c:\programs\gmt6\bin\gmt_w64.dll:?; gmt_free_func
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  C:\WINDOWS\System32\ucrtbase.dll:?; free_base
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlFreeHeap
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlGetCurrentServiceSessionId
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; RtlTryAcquirePebLock
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1  C:\WINDOWS\SYSTEM32\ntdll.dll:?; NtFreeVirtualMemory

Julia 1.7.0-DEV.1129

 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 5 /cygdrive/c/Users/kittisopikulm/source/repos/julia/src\julia.h:1760; jl_apply
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  5 @GMT\src\grdtrack.jl:121; grdtrack(arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   5 @GMT\src\grdtrack.jl:121; grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    5 @GMT\src\grdtrack.jl:121; #grdtrack#237
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     5 @GMT\src\grdtrack.jl:64; grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 5 @GMT\src\grdtrack.jl:94; grdtrack(cmd0::String, arg1::Matrix{Int64}, arg2::GMT.GMTgrid{Float32, 2}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTup...
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  5 @GMT\src\common_options.jl:3062; common_grd(::Dict{Symbol, Any}, ::String, ::Matrix{Int64}, ::GMT.GMTgrid{Float32, 2})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   4 @GMT\src\gmt_main.jl:272; gmt(::String, ::Matrix{Int64}, ::Vararg{Any})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    4 [unknown stackframe]
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     4 @GMT\src\gmt_main.jl:759; GMTJL_Set_Object(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, ptr::GMT.GMTgrid{Float32, 2}, pad::Int64)
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 4 @GMT\src\gmt_main.jl:866; grid_init(API::Ptr{Nothing}, X::GMT.GMT_RESOURCE, Grid::GMT.GMTgrid{Float32, 2}, pad::Int64, cube::Bool)
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1 @Base\int.jl:87; +
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  3 @Base\multidimensional.jl:646; setindex!
3β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   3 @Base\array.jl:875; setindex!
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1 @GMT\src\gmt_main.jl:276; gmt(::String, ::Matrix{Int64}, ::Vararg{Any})
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1 @GMT\src\libgmt.jl:147; GMT_Call_Module
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž     1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Call_Module
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž 1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_grdtrack
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž  1 C:\programs\gmt6\bin\gmt_w64.dll:?; GMT_Read_Data
 β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž   1 C:\programs\gmt6\bin\gmt_w64.dll:?; gmt_whole_earth
1β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    β•Ž    1 C:\programs\gmt6\bin\gmt_w64.dll:?; gmt_grd_BC_set

The main difference seems to be in the execution of gmt_main.jl:866 at GMT v0.35.0:

Julia 1.6.0

Julia 1.7.0-DEV

Julia 1.6.0 seems to spend much more time on Line 560 of base/operators.jl which involves argument list folding via afoldl:

While that line has not changed in Julia 1.7, afoldl has changed:

It looks like you have @kslimes to thank for this amazing improvement in speed.

24 Likes

Thanks for digging this out. A nice example also on how to use the profiler. Big thanks to @kshyatt

6 Likes

I’m beginning to think I may be pointing at the wrong change since redefining afoldl before the above change does not result in a significant slow down. It may be this change by @jameson instead:
https://github.com/JuliaLang/julia/pull/39414

2 Likes