Extremely high first-call latency in Julia 1.6 versus 1.5 with a multiphysics PDE solver

Would it be possible to simplify the code (e.g. parameters like grid size, number of equations, etc.) so that it runs in, say, seconds instead of minutes on Julia 1.5? I’d then expect an MWE on Julia 1.6 to run in minutes instead of hours, which might simplify things for analysis.

There is this post: How to cut down compile time when inference is not the problem? However, I do not know whether the causes underlying the observations in this thread are actually related to those in that one.

Then they might want to modify all @noinline methods in SparseMatrixAssemblers.jl by adding @nospecialize, like so:

@noinline function _numeric_loop_vector!(vec,caches,cell_vals,cell_rows)
  @nospecialize  # do not specialize this method on the (deeply nested) argument types
  add_cache, vals_cache, rows_cache = caches
  @assert length(cell_vals) == length(cell_rows)
  add! = AddEntriesMap(+)
  for cell in 1:length(cell_rows)
    # fetch the row ids and local values contributed by this cell
    rows = getindex!(rows_cache,cell_rows,cell)
    vals = getindex!(vals_cache,cell_vals,cell)
    # accumulate the local contribution into the global vector
    evaluate!(add_cache,add!,vec,vals,rows)
  end
end

This seems to make things workable again for me on Julia 1.7.0-rc3. But be aware of this open issue.

Hi @goerch! I implemented the function _numeric_loop_vector! in Gridap.jl.

To give more background: in this function call, caches and cell_vals are potentially VERY complex nested immutable objects (so no surprise that this is a hot spot). I am curious how @nospecialize would affect the run-time performance, taking into account that length(cell_rows) can be of the order of 10^7 for a large finite element mesh.

This seems to make things workable again for me on Julia 1.7.0-rc3. But be aware of this open issue.

How about calling Base.inferencebarrier on caches and cell_vals, which are the bad guys? Do you think this can also improve things?
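For illustration, a minimal sketch of what that could look like (the wrapper name is made up; this is not actual Gridap code):

function assemble_vector_barrier!(vec,caches,cell_vals,cell_rows)
  # Hypothetical wrapper: Base.inferencebarrier hides the deeply nested
  # types of caches and cell_vals from the caller's inference, so the
  # complex types are not propagated into this call site; the call is
  # then resolved by dynamic dispatch at run time.
  _numeric_loop_vector!(vec,
                        Base.inferencebarrier(caches),
                        Base.inferencebarrier(cell_vals),
                        cell_rows)
end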

I am aware that we have complex types in Gridap.jl, but for sophisticated equations this was not a fatal problem until Julia 1.6.

Hi @fverdugo,

my first suspicion would be that the increased compile time is due to new optimizations. So I’d expect a discussion about missed optimizations next, of course ;)

Will check the example from here with @btime next.

Edit: done.
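For reference, this is the kind of measurement meant; a minimal sketch, where solve_problem is just a placeholder for the MWE's entry point:

using BenchmarkTools

@time solve_problem()   # first call: dominated by compilation latency
@btime solve_problem()  # repeated calls: steady-state run time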

I have an issue with compile-time latency with a neural network in Julia 1.6 and 1.7-rc2 (compared to Julia 1.5) which might be related:

If one uses the option -O1, the compile times are vastly reduced. When inspecting the run with callgrind, the CPU time is mostly spent in LLVM, even on the second call of the same function with the same arguments (using the default optimization level).
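For anyone who wants to try this, a minimal sketch of the two ways to lower the optimization level (the module name is hypothetical):

# Globally, at startup:
#   julia -O1 script.jl
#
# Or per module (Base.Experimental.@optlevel, available since Julia 1.5),
# so only the latency-critical code is compiled at the lower level:
module MyLatencyCriticalCode
Base.Experimental.@optlevel 1
# ... code whose compile time matters more than its run time ...
end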

I tried compiling Julia from source with changed values for the parameters max_methods, tupletype_depth, tuple_splat, and inline_tupleret_bonus, but without luck.

When I interrupted Julia during the excessive compilation, it seemed to be busy in the DAGCombiner phase of LLVM.

Interesting, that idea hadn’t even occurred to me: is this simply an (occasionally very) expensive optimization that is disabled in -O1 and enabled in -O2?

Edit: OK, I checked this hypothesis for @amartinhuertas’ problem, and indeed the original test case works in reasonable time on Julia 1.6.4, without adding @nospecialize, if one simply reduces the optimization level to -O1!

Edit: updated https://github.com/JuliaLang/julia/issues/43206#issuecomment-980599991

While trying to reduce the original MWE, I stumbled upon quite a few type instabilities and got the distinct impression that there is a relation between type instabilities and longer compile times on -O2, at least under newer versions of Julia (I worked with 1.8.0 to make the best use of JET). Therefore I filed gridapapps/GridapGeosciences.jl#28 and gridapapps/GridapGeosciences.jl#29.
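In case it helps with reproducing this, a minimal sketch of how such instabilities can be located with JET; run_simulation is just a placeholder for the MWE's entry point:

using JET

# Reports runtime dispatch and other optimization problems
# that type inference finds along the call graph.
@report_opt run_simulation()

# For a single suspicious call, Base's @code_warntype shows the
# inferred (possibly abstract) variable and return types directly.
@code_warntype run_simulation()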

Can someone confirm this impression?

Edit: not a native speaker…

For the record, we could already “solve” (bypass) the issue. See https://github.com/JuliaLang/julia/issues/43206#issuecomment-983474073 for more details.
