Extremely high first-call latency in Julia 1.6 versus 1.5 with a multiphysics PDE solver

Would it be possible to simplify the code (e.g. parameters like grid size, number of equations, etc.) so that it runs in, say, seconds instead of minutes on Julia 1.5? I’d then expect an MWE on Julia 1.6 to run in minutes instead of hours, which might simplify things for analysis.

There is this post: How to cut down compile time when inference is not the problem? However, I do not know whether the causes underlying the observations in this thread are actually related to those in that one.

Then they might want to modify all @noinline methods in SparseMatrixAssemblers.jl by adding @nospecialize, like so:

@noinline function _numeric_loop_vector!(vec,caches,cell_vals,cell_rows)
  @nospecialize  # do not specialize this method on the (deeply nested) argument types
  add_cache, vals_cache, rows_cache = caches
  @assert length(cell_vals) == length(cell_rows)
  add! = AddEntriesMap(+)
  for cell in 1:length(cell_rows)
    # fetch the row ids and local values contributed by this cell
    rows = getindex!(rows_cache,cell_rows,cell)
    vals = getindex!(vals_cache,cell_vals,cell)
    # accumulate the local contribution into the global vector
    evaluate!(add_cache,add!,vec,vals,rows)
  end
end

This seems to make things workable again for me on Julia 1.7.0-rc3. But be aware of this open issue.

Hi @goerch! I implemented the function _numeric_loop_vector! in Gridap.jl.

To give more background: in this function call, caches and cell_vals are potentially VERY complex nested immutable objects (so no surprise that this is a hot spot). I am curious how @nospecialize would affect the run-time performance, taking into account that length(cell_rows) can be of the order of 10^7 for a large finite element mesh.

This seems to make things workable again for me on Julia 1.7.0-rc3. But be aware of this open issue.

How about calling Base.inferencebarrier on caches and cell_vals, which are the bad guys? Do you think this can also improve things?
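For illustration, a minimal sketch of what that could look like (the wrapper name is made up; this is not actual Gridap code):

function assemble_vector_barrier!(vec,caches,cell_vals,cell_rows)
  # Hypothetical wrapper: Base.inferencebarrier hides the deeply nested
  # types of caches and cell_vals from the caller's inference, so the
  # complex types are not propagated into this call site; the call is
  # then resolved by dynamic dispatch at run time.
  _numeric_loop_vector!(vec,
                        Base.inferencebarrier(caches),
                        Base.inferencebarrier(cell_vals),
                        cell_rows)
end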

I am aware that we have complex types in Gridap.jl, but for sophisticated equations this was not a fatal problem until Julia 1.6.

Hi @fverdugo,

my first suspicion would be that the increased compile time is due to new optimizations. So I’d expect a discussion about missed optimizations next, of course ;)

Will check the example from here with @btime next.

Edit: done.
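For reference, this is the kind of measurement meant; a minimal sketch, where solve_problem is just a placeholder for the MWE's entry point:

using BenchmarkTools

@time solve_problem()   # first call: dominated by compilation latency
@btime solve_problem()  # repeated calls: steady-state run time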

I have an issue with compile-time latency with a neural network in Julia 1.6 and 1.7-rc2 (compared to Julia 1.5) which might be related:

If one uses the option -O1, the compile times are vastly reduced. When inspecting the run with callgrind, the CPU time is mostly spent in LLVM, even on the second call of the same function with the same arguments (using the default optimization level).
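For anyone who wants to try this, a minimal sketch of the two ways to lower the optimization level (the module name is hypothetical):

# Globally, at startup:
#   julia -O1 script.jl
#
# Or per module (Base.Experimental.@optlevel, available since Julia 1.5),
# so only the latency-critical code is compiled at the lower level:
module MyLatencyCriticalCode
Base.Experimental.@optlevel 1
# ... code whose compile time matters more than its run time ...
end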

I tried compiling Julia from source with changed values for the parameters max_methods, tupletype_depth, tuple_splat, and inline_tupleret_bonus, but without luck.

When I interrupted Julia during the excessive compilation, it seemed to be busy in the DAGCombiner phase of LLVM.

Interesting, that idea hadn’t even occurred to me: is this simply an (occasionally very) expensive optimization that is disabled in -O1 and enabled in -O2?

Edit: OK, I checked this hypothesis for @amartinhuertas’ problem, and indeed the original test case works in reasonable time on Julia 1.6.4, without adding @nospecialize, if one simply reduces the optimization level to -O1!

Edit: updated https://github.com/JuliaLang/julia/issues/43206#issuecomment-980599991

While trying to reduce the original MWE, I stumbled upon quite a few type instabilities and got the distinct impression that there is a relation between type instabilities and longer compile times on -O2, at least under newer versions of Julia (I worked with 1.8.0 to make the best use of JET). Therefore I filed gridapapps/GridapGeosciences.jl#28 and gridapapps/GridapGeosciences.jl#29.
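In case it helps with reproducing this, a minimal sketch of how such instabilities can be located with JET; run_simulation is just a placeholder for the MWE's entry point:

using JET

# Reports runtime dispatch and other optimization problems
# that type inference finds along the call graph.
@report_opt run_simulation()

# For a single suspicious call, Base's @code_warntype shows the
# inferred (possibly abstract) variable and return types directly.
@code_warntype run_simulation()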

Can someone confirm this impression?

Edit: not a native speaker…

For the record, we could already “solve” (bypass) the issue. See https://github.com/JuliaLang/julia/issues/43206#issuecomment-983474073 for more details.
