On OS X Catalina, running 'solve' in DifferentialEquations causes crash

On my MacBook Pro, I have encountered a number of kernel crashes while running the ‘solve’ command from the DifferentialEquations.jl package. One such situation occurs when I run the exact code from Example 3 here: Ordinary Differential Equations

I have tried with ‘vanilla’ Julia, with JuliaPro, with and without Jupyter, and these crashes happen in each case, any time there is a differential equation that’s not completely basic (I can run ‘Step 1’ from the link above, for example). Normally my terminal simply closes, or Jupyter informs me that the kernel crashed during code execution, but a couple of times the kernel did print an error to the screen, which included: “inline asm error. This value type register class is not natively supported”.

I’m currently running Julia 1.4, with DifferentialEquations 6.14. Any tips on what I’m doing wrong?

@YingboMa @Elrod was this hunted down? Xref: https://github.com/SciML/DifferentialEquations.jl/issues/623

That should have been solved on the latest VectorizationBase, because it should now be querying LLVM for the supported instructions rather than checking the actual CPU.

Perhaps RecursiveFactorization should set a lower bound on this VectorizationBase version?

@perseid28 mind adding VectorizationBase, telling me the version it installed, and running solve again? Versions >= v0.12.7 should work.
Also, mind posting your full versioninfo() as well? I’m guessing you’re on a new MacBook with an Ice Lake CPU?
In case you’re curious about the problem: this CPU has a lot of new features, but because of how Apple does something, LLVM doesn’t recognize this and therefore cannot/will not use them. That bug should be fixed in Julia master, but VectorizationBase probably should have been checking with LLVM for features instead of CpuId anyway, so that’s what it started doing in v0.12.7.

yeah we should probably add that.

I can make a PR. I’ll make it a later release to also add a check for if they aren’t using the usual LLVM (e.g., installed using Gentoo’s package manager). While I hope folks don’t do that, if it’s easy enough to support…

Okay, thanks!

Thanks, that is good to know. I haven’t used Julia in a few months but this experience of it not “just working” was something new. In regards to your advice, I added VectorizationBase, v.0.12.12, but even after adding the package to my code I get the crash on the solve() call. Here’s my versioninfo():

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, goldmont)

Do you mind to try out Julia nightly https://julialang.org/downloads/nightlies/ ? It should work after https://github.com/JuliaLang/julia/pull/36330, but since I don’t have Apple hardware, I cannot test it.

We very recently updated to use these new vectorization tools since it gave some pretty massive accelerations on small stiff ODEs (<200) over OpenBLAS and MKL (without lots of tweaking). So it’s somewhat new, and it means we have to iron out a few things on a few CPUs, but the experience is so much better (10x faster small stiff ODEs than OpenBLAS for users who don’t manually change thread counts for example) that I want to just keep pushing this and fix any issues that come up, so thanks for reporting the issue because this is how the brave new world of performance will become stable.

1 Like

Could you also post the result of

using Pkg; Pkg.pkg"st -m"

please? By the way, it’s easier for others to read if you quote the output with triple-backticks ```, see PSA: make it easier to help you for more details.

Ok, thanks for letting me know.

julia> using Pkg; Pkg.pkg"st -m"
Status `~/.julia/environments/v1.4/Manifest.toml`
  [c3fe647b] AbstractAlgebra v0.9.2
  [1520ce14] AbstractTrees v0.3.3
  [79e6a3ab] Adapt v1.1.0
  [ec485272] ArnoldiMethod v0.0.4
  [4fba245c] ArrayInterface v2.9.0
  [4c555306] ArrayLayouts v0.2.6
  [aae01518] BandedMatrices v0.15.7
  [6e4b80f9] BenchmarkTools v0.5.0
  [764a87c0] BoundaryValueDiffEq v2.5.0
  [6e34b625] Bzip2_jll v1.0.6+2
  [fa961155] CEnum v0.4.1
  [a603d957] CanonicalTraits v0.2.1
  [d360d2e6] ChainRulesCore v0.8.1
  [35d6a980] ColorSchemes v3.9.0
  [3da002f7] ColorTypes v0.10.4
  [5ae59095] Colors v0.12.2
  [861a8166] Combinatorics v1.0.2
  [bbf7d656] CommonSubexpressions v0.2.0
  [34da2185] Compat v3.12.0
  [e66e0078] CompilerSupportLibraries_jll v0.3.3+0
  [8f4d0f93] Conda v1.4.1
  [88cd18e8] ConsoleProgressMonitor v0.1.2
  [187b0558] ConstructionBase v1.0.0
  [d38c429a] Contour v0.5.3
  [adafc99b] CpuId v0.2.2
  [9a962f9c] DataAPI v1.3.0
  [864edb3b] DataStructures v0.17.19
  [bcd4f6db] DelayDiffEq v5.24.1
  [2b5f629d] DiffEqBase v6.39.1
  [459566f4] DiffEqCallbacks v2.13.3
  [5a0ffddc] DiffEqFinancial v2.4.0
  [c894b116] DiffEqJump v6.9.2
  [77a26b50] DiffEqNoiseProcess v3.11.0
  [055956cb] DiffEqPhysics v3.2.0
  [163ba53b] DiffResults v1.0.2
  [b552c78f] DiffRules v1.0.1
  [0c46a032] DifferentialEquations v6.14.0
  [c619ae07] DimensionalPlotRecipes v1.2.0
  [b4f34e82] Distances v0.8.2
  [ffbed154] DocStringExtensions v0.8.2
  [d4d017d3] ExponentialUtilities v1.7.0
  [c87230d0] FFMPEG v0.3.0
  [b22a6f82] FFMPEG_jll v4.1.0+3
  [1a297f60] FillArrays v0.8.11
  [6a86dc24] FiniteDiff v2.3.2
  [53c48c17] FixedPointNumbers v0.8.1
  [59287772] Formatting v0.4.1
  [f6369f11] ForwardDiff v0.10.10
  [d7e528f0] FreeType2_jll v2.10.1+2
  [559328eb] FriBidi_jll v1.0.5+3
  [069b7b12] FunctionWrappers v1.1.1
  [28b8d3ca] GR v0.50.1
  [6b9d7cbe] GeneralizedGenerated v0.2.4
  [01680d73] GenericSVD v0.3.0
  [4d00f742] GeometryTypes v0.8.3
  [cd3eb016] HTTP v0.8.15
  [7073ff75] IJulia v1.21.2
  [18364772] IPython v0.5.0
  [d25df0c9] Inflate v0.1.2
  [83e8ac13] IniFile v0.5.0
  [42fd0dbc] IterativeSolvers v0.8.4
  [82899510] IteratorInterfaceExtensions v1.0.0
  [682c06a0] JSON v0.21.0
  [b14d175d] JuliaVariables v0.2.0
  [c1c5ebd0] LAME_jll v3.100.0+1
  [929cbde3] LLVM v1.7.0
  [b964fa9f] LaTeXStrings v1.1.0
  [2ee39098] LabelledArrays v1.2.2
  [23fbe1c1] Latexify v0.13.5
  [1d6d02ad] LeftChildRightSiblingTrees v0.1.2
  [dd192d2f] LibVPX_jll v1.8.1+1
  [093fc24a] LightGraphs v1.3.3
  [d3d80556] LineSearches v7.0.1
  [e6f89c97] LoggingExtras v0.4.1
  [bdcacae8] LoopVectorization v0.8.7
  [d00139f3] METIS_jll v5.1.0+4
  [d8e11817] MLStyle v0.3.1
  [1914dd2f] MacroTools v0.5.5
  [739be429] MbedTLS v1.0.2
  [c8ffd9c3] MbedTLS_jll v2.16.6+0
  [442fdcdd] Measures v0.3.1
  [e1d29d7a] Missings v0.4.3
  [961ee093] ModelingToolkit v3.10.2
  [46d2c3a1] MuladdMacro v0.2.2
  [f9640e96] MultiScaleArrays v1.8.1
  [d41bc354] NLSolversBase v7.6.1
  [2774e3e8] NLsolve v4.4.0
  [77ba4419] NaNMath v0.3.3
  [71a1bf82] NameResolution v0.1.3
  [6fe1bfb0] OffsetArrays v1.1.0
  [e7412a2a] Ogg_jll v1.3.4+0
  [4536629a] OpenBLAS_jll v0.3.9+4
  [458c3c95] OpenSSL_jll v1.1.1+4
  [efe28fd5] OpenSpecFun_jll v0.5.3+3
  [91d4177d] Opus_jll v1.3.1+1
  [bac558e1] OrderedCollections v1.2.0
  [1dea7af3] OrdinaryDiffEq v5.41.0
  [65888b18] ParameterizedFunctions v5.3.0
  [d96e819e] Parameters v0.12.1
  [69de0a69] Parsers v1.0.6
  [ccf2f8ad] PlotThemes v2.0.0
  [995b91a9] PlotUtils v1.0.5
  [91a5bcdd] Plots v1.4.3
  [e409e4f3] PoissonRandom v0.4.0
  [8162dcfd] PrettyPrint v0.1.0
  [33c8b6b6] ProgressLogging v0.1.3
  [92933f4c] ProgressMeter v1.3.1
  [438e738f] PyCall v1.91.4
  [e6cf234a] RandomNumbers v1.4.0
  [3cdcf5f2] RecipesBase v1.0.1
  [01d81517] RecipesPipeline v0.1.10
  [731186ca] RecursiveArrayTools v2.5.0
  [f2c3362d] RecursiveFactorization v0.1.2
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.0.1
  [ae5879a3] ResettableStacks v1.0.0
  [f2b01f46] Roots v1.0.2
  [21efa798] SIMDPirates v0.8.10
  [476501e8] SLEEFPirates v0.5.1
  [1bc83da4] SafeTestsets v0.0.1
  [992d4aef] Showoff v0.3.1
  [699a6c99] SimpleTraits v0.9.2
  [b85f4697] SoftGlobalScope v1.0.10
  [a2af1166] SortingAlgorithms v0.3.1
  [47a9eef4] SparseDiffTools v1.9.0
  [276daf66] SpecialFunctions v0.10.3
  [90137ffa] StaticArrays v0.12.3
  [2913bbd2] StatsBase v0.33.0
  [9672c7b4] SteadyStateDiffEq v1.5.1
  [789caeaf] StochasticDiffEq v6.20.0
  [bea87d4a] SuiteSparse_jll v5.4.0+8
  [c3572dad] Sundials v4.2.3
  [fb77eaff] Sundials_jll v5.2.0+0
  [d1185830] SymbolicUtils v0.3.4
  [3783bdb8] TableTraits v1.0.0
  [5d786b92] TerminalLoggers v0.1.2
  [a759f4b9] TimerOutputs v0.5.6
  [a2a6695c] TreeViews v0.3.0
  [3a884ed6] UnPack v1.0.1
  [1986cc42] Unitful v1.3.0
  [3d5dd08c] VectorizationBase v0.12.12
  [81def892] VersionParsing v1.2.0
  [19fa3120] VertexSafeGraphs v0.1.2
  [c2297ded] ZMQ v1.2.1
  [8f1865be] ZeroMQ_jll v4.3.2+4
  [83775a58] Zlib_jll v1.2.11+14
  [700de1a5] ZygoteRules v0.2.0
  [0ac62f75] libass_jll v0.14.0+2
  [f638f0a6] libfdk_aac_jll v0.1.6+2
  [f27f6e37] libvorbis_jll v1.3.6+4
  [1270edf5] x264_jll v2019.5.25+2
  [dfaa095f] x265_jll v3.0.0+1
  [2a0f44e3] Base64 
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [8ba89e20] Distributed 
  [7b1f6079] FileWatching 
  [b77e0a4c] InteractiveUtils 
  [76f85450] LibGit2 
  [8f399da3] Libdl 
  [37e2e46d] LinearAlgebra 
  [56ddb016] Logging 
  [d6f4376e] Markdown 
  [a63ad114] Mmap 
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [1a1011a3] SharedArrays 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [4607b0f0] SuiteSparse 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode

Yes, that does look better :blush:

VectorizationBase v0.12.13 has been released. It should solve the problem. There was a bug in this check:

that is now fixed.

That said, the Julia nightly should have better on those still solvers, as LLVM should be able to use AVX512 instructions there.

Thanks very much for the assistance; after upgrading VectorizationBase it runs fine now.

It does work just fine on this build 1.6.0-DEV.306, without the VectorizationBase package. @Elrod mentions possible performance increases on the nightly build as well; I’m not sure how to check this, but if there’s something you want me to run just send me some code. Otherwise I’ll go ahead and remove this build, since 1.4 has been sorted.

In both Julia 1.4 and 1.6, you could try:

using Pkg
Pkg.add(["RecursiveFactorization", "VegaLite", "DataFrames", "BenchmarkTools"])
using RecursiveFactorization
include(joinpath(pkgdir(RecursiveFactorization), "perf", "lu.jl"))

and share your results here.

You could also try benchmarking the particular problem you were solving between versions. Solving that problem uses RecursiveFactorization.jl.

Something else simple you can try to compare versions:

using BenchmarkTools
function mysum(x)
    s = zero(eltype(x))
    @simd for xᵢ ∈ x
        s += xᵢ
    end
    s
end
x = rand(128);
@btime mysum($x)

For Example 3 from that tutorial page I linked on the first page, it appears 1.4.2 runs significantly faster:

julia> @btime solve(prob);
  101.799 μs (3991 allocations: 112.48 KiB)

julia> @btime solve(prob);
  102.888 μs (3991 allocations: 112.48 KiB)

julia> versioninfo()
Julia Version 1.4.2
julia> @btime solve(prob);
  660.923 μs (3989 allocations: 113.94 KiB)

julia> @btime solve(prob);
  658.327 μs (3989 allocations: 113.94 KiB)

julia> versioninfo()
Julia Version 1.6.0-DEV.306

And for your sum code:

julia> @btime mysum($x)
  6.459 ns (0 allocations: 0 bytes)
59.37744793932639

julia> versioninfo()
Julia Version 1.4.2
julia> @btime mysum($x)
  6.332 ns (0 allocations: 0 bytes)
66.60390799012274

julia> versioninfo()
Julia Version 1.6.0-DEV.306

What if you do @btime solve($prob)? It’s kind of weird that Julia 1.6 is much slower.

I don’t remember what I imported last time, but here I’m only using DifferentialEquations and BenchmarkTools. I ran updates just prior, so both 1.4 and 1.6 are running VectorizationBase 0.12.16.

julia> @btime solve($prob);
  104.578 μs (3991 allocations: 112.48 KiB)

julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, goldmont)

And 1.6 is still quite a bit slower:

julia> @btime solve($prob);
  716.312 μs (3989 allocations: 113.94 KiB)

julia> versioninfo()
Julia Version 1.6.0-DEV.306
Commit 59b8dde7c1 (2020-06-26 09:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, icelake-client)

Interesting. Mind showing the @code_native debuginfo=:none and/or the @code_llvm debuginfo=:none of both?
The fact that sum didn’t improve suggests it is still just using ymm registers on Julia 1.6.

Then, mind also showing this for both versions:
VectorizationBase.REGISTER_COUNT and VectorizationBase.REGISTER_SIZE?
VectorizationBase.REGISTER_SIZE is now obviously in agreement with LLVM (if it isn’t, you’ll get crashes), but I’m worried about REGISTER_COUNT. Depending on those results, I’ll have something else for you to run to see if that is wrong.
If it is wrong, code will run, but much more slowly.

So that is a possible explanation for the performance degradation, which is why I want to look into it.

Sorry, I meant @code_native debuginfo=:none mysum(x)