On OS X Catalina, running 'solve' in DifferentialEquations causes crash

I can make a PR. I’ll make it a later release to also add a check for if they aren’t using the usual LLVM (e.g., installed using Gentoo’s package manager). While I hope folks don’t do that, if it’s easy enough to support…

Okay, thanks!

Thanks, that is good to know. I haven’t used Julia in a few months but this experience of it not “just working” was something new. In regards to your advice, I added VectorizationBase, v.0.12.12, but even after adding the package to my code I get the crash on the solve() call. Here’s my versioninfo():

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, goldmont)

Do you mind to try out Julia nightly https://julialang.org/downloads/nightlies/ ? It should work after https://github.com/JuliaLang/julia/pull/36330, but since I don’t have Apple hardware, I cannot test it.

We very recently updated to use these new vectorization tools since it gave some pretty massive accelerations on small stiff ODEs (<200) over OpenBLAS and MKL (without lots of tweaking). So it’s somewhat new, and it means we have to iron out a few things on a few CPUs, but the experience is so much better (10x faster small stiff ODEs than OpenBLAS for users who don’t manually change thread counts for example) that I want to just keep pushing this and fix any issues that come up, so thanks for reporting the issue because this is how the brave new world of performance will become stable.

1 Like

Could you also post the result of

using Pkg; Pkg.pkg"st -m"

please? By the way, it’s easier for others to read if you quote the output with triple-backticks ```, see PSA: make it easier to help you for more details.

Ok, thanks for letting me know.

julia> using Pkg; Pkg.pkg"st -m"
Status `~/.julia/environments/v1.4/Manifest.toml`
  [c3fe647b] AbstractAlgebra v0.9.2
  [1520ce14] AbstractTrees v0.3.3
  [79e6a3ab] Adapt v1.1.0
  [ec485272] ArnoldiMethod v0.0.4
  [4fba245c] ArrayInterface v2.9.0
  [4c555306] ArrayLayouts v0.2.6
  [aae01518] BandedMatrices v0.15.7
  [6e4b80f9] BenchmarkTools v0.5.0
  [764a87c0] BoundaryValueDiffEq v2.5.0
  [6e34b625] Bzip2_jll v1.0.6+2
  [fa961155] CEnum v0.4.1
  [a603d957] CanonicalTraits v0.2.1
  [d360d2e6] ChainRulesCore v0.8.1
  [35d6a980] ColorSchemes v3.9.0
  [3da002f7] ColorTypes v0.10.4
  [5ae59095] Colors v0.12.2
  [861a8166] Combinatorics v1.0.2
  [bbf7d656] CommonSubexpressions v0.2.0
  [34da2185] Compat v3.12.0
  [e66e0078] CompilerSupportLibraries_jll v0.3.3+0
  [8f4d0f93] Conda v1.4.1
  [88cd18e8] ConsoleProgressMonitor v0.1.2
  [187b0558] ConstructionBase v1.0.0
  [d38c429a] Contour v0.5.3
  [adafc99b] CpuId v0.2.2
  [9a962f9c] DataAPI v1.3.0
  [864edb3b] DataStructures v0.17.19
  [bcd4f6db] DelayDiffEq v5.24.1
  [2b5f629d] DiffEqBase v6.39.1
  [459566f4] DiffEqCallbacks v2.13.3
  [5a0ffddc] DiffEqFinancial v2.4.0
  [c894b116] DiffEqJump v6.9.2
  [77a26b50] DiffEqNoiseProcess v3.11.0
  [055956cb] DiffEqPhysics v3.2.0
  [163ba53b] DiffResults v1.0.2
  [b552c78f] DiffRules v1.0.1
  [0c46a032] DifferentialEquations v6.14.0
  [c619ae07] DimensionalPlotRecipes v1.2.0
  [b4f34e82] Distances v0.8.2
  [ffbed154] DocStringExtensions v0.8.2
  [d4d017d3] ExponentialUtilities v1.7.0
  [c87230d0] FFMPEG v0.3.0
  [b22a6f82] FFMPEG_jll v4.1.0+3
  [1a297f60] FillArrays v0.8.11
  [6a86dc24] FiniteDiff v2.3.2
  [53c48c17] FixedPointNumbers v0.8.1
  [59287772] Formatting v0.4.1
  [f6369f11] ForwardDiff v0.10.10
  [d7e528f0] FreeType2_jll v2.10.1+2
  [559328eb] FriBidi_jll v1.0.5+3
  [069b7b12] FunctionWrappers v1.1.1
  [28b8d3ca] GR v0.50.1
  [6b9d7cbe] GeneralizedGenerated v0.2.4
  [01680d73] GenericSVD v0.3.0
  [4d00f742] GeometryTypes v0.8.3
  [cd3eb016] HTTP v0.8.15
  [7073ff75] IJulia v1.21.2
  [18364772] IPython v0.5.0
  [d25df0c9] Inflate v0.1.2
  [83e8ac13] IniFile v0.5.0
  [42fd0dbc] IterativeSolvers v0.8.4
  [82899510] IteratorInterfaceExtensions v1.0.0
  [682c06a0] JSON v0.21.0
  [b14d175d] JuliaVariables v0.2.0
  [c1c5ebd0] LAME_jll v3.100.0+1
  [929cbde3] LLVM v1.7.0
  [b964fa9f] LaTeXStrings v1.1.0
  [2ee39098] LabelledArrays v1.2.2
  [23fbe1c1] Latexify v0.13.5
  [1d6d02ad] LeftChildRightSiblingTrees v0.1.2
  [dd192d2f] LibVPX_jll v1.8.1+1
  [093fc24a] LightGraphs v1.3.3
  [d3d80556] LineSearches v7.0.1
  [e6f89c97] LoggingExtras v0.4.1
  [bdcacae8] LoopVectorization v0.8.7
  [d00139f3] METIS_jll v5.1.0+4
  [d8e11817] MLStyle v0.3.1
  [1914dd2f] MacroTools v0.5.5
  [739be429] MbedTLS v1.0.2
  [c8ffd9c3] MbedTLS_jll v2.16.6+0
  [442fdcdd] Measures v0.3.1
  [e1d29d7a] Missings v0.4.3
  [961ee093] ModelingToolkit v3.10.2
  [46d2c3a1] MuladdMacro v0.2.2
  [f9640e96] MultiScaleArrays v1.8.1
  [d41bc354] NLSolversBase v7.6.1
  [2774e3e8] NLsolve v4.4.0
  [77ba4419] NaNMath v0.3.3
  [71a1bf82] NameResolution v0.1.3
  [6fe1bfb0] OffsetArrays v1.1.0
  [e7412a2a] Ogg_jll v1.3.4+0
  [4536629a] OpenBLAS_jll v0.3.9+4
  [458c3c95] OpenSSL_jll v1.1.1+4
  [efe28fd5] OpenSpecFun_jll v0.5.3+3
  [91d4177d] Opus_jll v1.3.1+1
  [bac558e1] OrderedCollections v1.2.0
  [1dea7af3] OrdinaryDiffEq v5.41.0
  [65888b18] ParameterizedFunctions v5.3.0
  [d96e819e] Parameters v0.12.1
  [69de0a69] Parsers v1.0.6
  [ccf2f8ad] PlotThemes v2.0.0
  [995b91a9] PlotUtils v1.0.5
  [91a5bcdd] Plots v1.4.3
  [e409e4f3] PoissonRandom v0.4.0
  [8162dcfd] PrettyPrint v0.1.0
  [33c8b6b6] ProgressLogging v0.1.3
  [92933f4c] ProgressMeter v1.3.1
  [438e738f] PyCall v1.91.4
  [e6cf234a] RandomNumbers v1.4.0
  [3cdcf5f2] RecipesBase v1.0.1
  [01d81517] RecipesPipeline v0.1.10
  [731186ca] RecursiveArrayTools v2.5.0
  [f2c3362d] RecursiveFactorization v0.1.2
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v1.0.1
  [ae5879a3] ResettableStacks v1.0.0
  [f2b01f46] Roots v1.0.2
  [21efa798] SIMDPirates v0.8.10
  [476501e8] SLEEFPirates v0.5.1
  [1bc83da4] SafeTestsets v0.0.1
  [992d4aef] Showoff v0.3.1
  [699a6c99] SimpleTraits v0.9.2
  [b85f4697] SoftGlobalScope v1.0.10
  [a2af1166] SortingAlgorithms v0.3.1
  [47a9eef4] SparseDiffTools v1.9.0
  [276daf66] SpecialFunctions v0.10.3
  [90137ffa] StaticArrays v0.12.3
  [2913bbd2] StatsBase v0.33.0
  [9672c7b4] SteadyStateDiffEq v1.5.1
  [789caeaf] StochasticDiffEq v6.20.0
  [bea87d4a] SuiteSparse_jll v5.4.0+8
  [c3572dad] Sundials v4.2.3
  [fb77eaff] Sundials_jll v5.2.0+0
  [d1185830] SymbolicUtils v0.3.4
  [3783bdb8] TableTraits v1.0.0
  [5d786b92] TerminalLoggers v0.1.2
  [a759f4b9] TimerOutputs v0.5.6
  [a2a6695c] TreeViews v0.3.0
  [3a884ed6] UnPack v1.0.1
  [1986cc42] Unitful v1.3.0
  [3d5dd08c] VectorizationBase v0.12.12
  [81def892] VersionParsing v1.2.0
  [19fa3120] VertexSafeGraphs v0.1.2
  [c2297ded] ZMQ v1.2.1
  [8f1865be] ZeroMQ_jll v4.3.2+4
  [83775a58] Zlib_jll v1.2.11+14
  [700de1a5] ZygoteRules v0.2.0
  [0ac62f75] libass_jll v0.14.0+2
  [f638f0a6] libfdk_aac_jll v0.1.6+2
  [f27f6e37] libvorbis_jll v1.3.6+4
  [1270edf5] x264_jll v2019.5.25+2
  [dfaa095f] x265_jll v3.0.0+1
  [2a0f44e3] Base64 
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [8ba89e20] Distributed 
  [7b1f6079] FileWatching 
  [b77e0a4c] InteractiveUtils 
  [76f85450] LibGit2 
  [8f399da3] Libdl 
  [37e2e46d] LinearAlgebra 
  [56ddb016] Logging 
  [d6f4376e] Markdown 
  [a63ad114] Mmap 
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [1a1011a3] SharedArrays 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [4607b0f0] SuiteSparse 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode

Yes, that does look better :blush:

VectorizationBase v0.12.13 has been released. It should solve the problem. There was a bug in this check:

that is now fixed.

That said, the Julia nightly should have better on those still solvers, as LLVM should be able to use AVX512 instructions there.

Thanks very much for the assistance; after upgrading VectorizationBase it runs fine now.

It does work just fine on this build 1.6.0-DEV.306, without the VectorizationBase package. @Elrod mentions possible performance increases on the nightly build as well; I’m not sure how to check this, but if there’s something you want me to run just send me some code. Otherwise I’ll go ahead and remove this build, since 1.4 has been sorted.

In both Julia 1.4 and 1.6, you could try:

using Pkg
Pkg.add(["RecursiveFactorization", "VegaLite", "DataFrames", "BenchmarkTools"])
using RecursiveFactorization
include(joinpath(pkgdir(RecursiveFactorization), "perf", "lu.jl"))

and share your results here.

You could also try benchmarking the particular problem you were solving between versions. Solving that problem uses RecursiveFactorization.jl.

Something else simple you can try to compare versions:

using BenchmarkTools
function mysum(x)
    s = zero(eltype(x))
    @simd for xᵢ ∈ x
        s += xᵢ
    end
    s
end
x = rand(128);
@btime mysum($x)

For Example 3 from that tutorial page I linked on the first page, it appears 1.4.2 runs significantly faster:

julia> @btime solve(prob);
  101.799 μs (3991 allocations: 112.48 KiB)

julia> @btime solve(prob);
  102.888 μs (3991 allocations: 112.48 KiB)

julia> versioninfo()
Julia Version 1.4.2
julia> @btime solve(prob);
  660.923 μs (3989 allocations: 113.94 KiB)

julia> @btime solve(prob);
  658.327 μs (3989 allocations: 113.94 KiB)

julia> versioninfo()
Julia Version 1.6.0-DEV.306

And for your sum code:

julia> @btime mysum($x)
  6.459 ns (0 allocations: 0 bytes)
59.37744793932639

julia> versioninfo()
Julia Version 1.4.2
julia> @btime mysum($x)
  6.332 ns (0 allocations: 0 bytes)
66.60390799012274

julia> versioninfo()
Julia Version 1.6.0-DEV.306

What if you do @btime solve($prob)? It’s kind of weird that Julia 1.6 is much slower.

I don’t remember what I imported last time, but here I’m only using DifferentialEquations and BenchmarkTools. I ran updates just prior, so both 1.4 and 1.6 are running VectorizationBase 0.12.16.

julia> @btime solve($prob);
  104.578 μs (3991 allocations: 112.48 KiB)

julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, goldmont)

And 1.6 is still quite a bit slower:

julia> @btime solve($prob);
  716.312 μs (3989 allocations: 113.94 KiB)

julia> versioninfo()
Julia Version 1.6.0-DEV.306
Commit 59b8dde7c1 (2020-06-26 09:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, icelake-client)

Interesting. Mind showing the @code_native debuginfo=:none and/or the @code_llvm debuginfo=:none of both?
The fact that sum didn’t improve suggests it is still just using ymm registers on Julia 1.6.

Then, mind also showing this for both versions:
VectorizationBase.REGISTER_COUNT and VectorizationBase.REGISTER_SIZE?
VectorizationBase.REGISTER_SIZE is now obviously in agreement with LLVM (if it isn’t, you’ll get crashes), but I’m worried about REGISTER_COUNT. Depending on those results, I’ll have something else for you to run to see if that is wrong.
If it is wrong, code will run, but much more slowly.

So that is a possible explanation for the performance degradation, which is why I want to look into it.

Sorry, I meant @code_native debuginfo=:none mysum(x)

Sorry about that.

julia> versioninfo(); @code_native debuginfo=:none mysum(x)
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, goldmont)
	.section	__TEXT,__text,regular,pure_instructions
	movq	8(%rdi), %rax
	testq	%rax, %rax
	jle	L26
	movq	(%rdi), %rcx
	cmpq	$16, %rax
	jae	L31
	vxorpd	%xmm0, %xmm0, %xmm0
	xorl	%edx, %edx
	jmp	L130
L26:
	vxorps	%xmm0, %xmm0, %xmm0
	retq
L31:
	movq	%rax, %rdx
	leaq	96(%rcx), %rsi
	vxorpd	%xmm0, %xmm0, %xmm0
	vxorpd	%xmm1, %xmm1, %xmm1
	vxorpd	%xmm2, %xmm2, %xmm2
	vxorpd	%xmm3, %xmm3, %xmm3
	andq	$-16, %rdx
	movq	%rdx, %rdi
	nopl	(%rax)
L64:
	vaddpd	-96(%rsi), %ymm0, %ymm0
	vaddpd	-64(%rsi), %ymm1, %ymm1
	vaddpd	-32(%rsi), %ymm2, %ymm2
	vaddpd	(%rsi), %ymm3, %ymm3
	subq	$-128, %rsi
	addq	$-16, %rdi
	jne	L64
	vaddpd	%ymm0, %ymm1, %ymm0
	cmpq	%rdx, %rax
	vaddpd	%ymm0, %ymm2, %ymm0
	vaddpd	%ymm0, %ymm3, %ymm0
	vextractf128	$1, %ymm0, %xmm1
	vaddpd	%ymm1, %ymm0, %ymm0
	vpermilpd	$1, %xmm0, %xmm1 ## xmm1 = xmm0[1,0]
	vaddpd	%xmm1, %xmm0, %xmm0
	je	L158
L130:
	subq	%rdx, %rax
	leaq	(%rcx,%rdx,8), %rcx
	nopl	(%rax)
L144:
	vaddsd	(%rcx), %xmm0, %xmm0
	addq	$8, %rcx
	addq	$-1, %rax
	jne	L144
L158:
	vzeroupper
	retq
	nopw	%cs:(%rax,%rax)
	nopl	(%rax)
julia> versioninfo(); @code_native debuginfo=:none mysum(x)
Julia Version 1.6.0-DEV.306
Commit 59b8dde7c1 (2020-06-26 09:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, icelake-client)
	.section	__TEXT,__text,regular,pure_instructions
	movq	8(%rdi), %rax
	testq	%rax, %rax
	jle	L29
	movq	(%rdi), %rcx
	cmpq	$32, %rax
	jae	L34
	vxorpd	%xmm0, %xmm0, %xmm0
	xorl	%edx, %edx
	jmp	L160
L29:
	vxorps	%xmm0, %xmm0, %xmm0
	retq
L34:
	movl	%eax, %esi
	andl	$31, %esi
	movq	%rax, %rdx
	subq	%rsi, %rdx
	vxorpd	%xmm0, %xmm0, %xmm0
	xorl	%esi, %esi
	vxorpd	%xmm1, %xmm1, %xmm1
	vxorpd	%xmm2, %xmm2, %xmm2
	vxorpd	%xmm3, %xmm3, %xmm3
	nop
L64:
	vaddpd	(%rcx,%rsi,8), %zmm0, %zmm0
	vaddpd	64(%rcx,%rsi,8), %zmm1, %zmm1
	vaddpd	128(%rcx,%rsi,8), %zmm2, %zmm2
	vaddpd	192(%rcx,%rsi,8), %zmm3, %zmm3
	addq	$32, %rsi
	cmpq	%rsi, %rdx
	jne	L64
	vaddpd	%zmm0, %zmm1, %zmm0
	vaddpd	%zmm0, %zmm2, %zmm0
	vaddpd	%zmm0, %zmm3, %zmm0
	vextractf64x4	$1, %zmm0, %ymm1
	vaddpd	%zmm1, %zmm0, %zmm0
	vextractf128	$1, %ymm0, %xmm1
	vaddpd	%xmm1, %xmm0, %xmm0
	vpermilpd	$1, %xmm0, %xmm1 ## xmm1 = xmm0[1,0]
	vaddsd	%xmm1, %xmm0, %xmm0
	cmpq	%rdx, %rax
	je	L173
L160:
	vaddsd	(%rcx,%rdx,8), %xmm0, %xmm0
	incq	%rdx
	cmpq	%rdx, %rax
	jne	L160
L173:
	vzeroupper
	retq
	nopw	%cs:(%rax,%rax)
	nopl	(%rax,%rax)
julia> versioninfo(); VectorizationBase.REGISTER_COUNT
Julia Version 1.4.2
...
16
julia> versioninfo(); VectorizationBase.REGISTER_COUNT
Julia Version 1.6.0-DEV.306
...
16

Just to note the following happened while I was getting 1.6 ready to run this:

julia> import VectorizationBase
ERROR: ArgumentError: Package VectorizationBase not found in current path:
- Run `import Pkg; Pkg.add("VectorizationBase")` to install the VectorizationBase package.

Stacktrace:
 [1] require(::Module, ::Symbol) at ./loading.jl:893

(@v1.6) pkg> add VectorizationBase
  Resolving package versions...
Updating `~/.julia/environments/v1.6/Project.toml`
  [3d5dd08c] + VectorizationBase v0.12.16
No Changes to `~/.julia/environments/v1.6/Manifest.toml`

VectorizationBase was definitely on board earlier this evening when I ran the update. Might not be relevant, but I thought I’d mention it.

(@v1.6) pkg> update
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
Updating `~/.julia/environments/v1.6/Project.toml`
  [a93c6f00] ↑ DataFrames v0.21.3 ⇒ v0.21.4
  [91a5bcdd] ↑ Plots v1.4.4 ⇒ v1.5.0
Updating `~/.julia/environments/v1.6/Manifest.toml`
  [aae01518] ↑ BandedMatrices v0.15.13 ⇒ v0.15.14
  [3da002f7] ↑ ColorTypes v0.10.4 ⇒ v0.10.5
  [5ae59095] ↑ Colors v0.12.2 ⇒ v0.12.3
  [bbf7d656] ↑ CommonSubexpressions v0.2.0 ⇒ v0.3.0
  [a93c6f00] ↑ DataFrames v0.21.3 ⇒ v0.21.4
  [f6369f11] ↑ ForwardDiff v0.10.10 ⇒ v0.10.11
  [5c1252a2] ↑ GeometryBasics v0.2.13 ⇒ v0.2.14
  [bdcacae8] ↑ LoopVectorization v0.8.7 ⇒ v0.8.8
  [961ee093] ↑ ModelingToolkit v3.10.2 ⇒ v3.11.0
  [91a5bcdd] ↑ Plots v1.4.4 ⇒ v1.5.0
  [01d81517] ↑ RecipesPipeline v0.1.10 ⇒ v0.1.11
  [21efa798] ↑ SIMDPirates v0.8.10 ⇒ v0.8.13
  [d1185830] ↑ SymbolicUtils v0.3.4 ⇒ v0.4.2
  [3d5dd08c] ↑ VectorizationBase v0.12.13 ⇒ v0.12.16

Thanks. I think I’ll have to change how VectorizationBase builds and defines flags, so that it can be 16 on 1.4.2 and 32 with Julia 1.6.
That said, 16 on both should work and at least shouldn’t hurt performance of 1.6 relative to 1.4.

So I believe the issue is something else.
Mind providing example code I can run to test the solve regression and see if I can reproduce it locally?

No problem; I have been running “Example 3” from this demo: https://docs.sciml.ai/v6.12/tutorials/ode_example/#Example-3:-Solving-Nonhomogeneous-Equations-using-Parameterized-Functions-1

using DifferentialEquations

l = 1.0; m = 1.0; g = 9.81

function pendulum!(du,u,p,t)
    du[1] = u[2]                    # θ'(t) = ω(t)
    du[2] = -3g/(2l)*sin(u[1]) + 3/(m*l^2)*p(t) # ω'(t) = -3g/(2l) sin θ(t) + 3/(ml^2)M(t)
end

u₀ = [0.01, 0.0]                       # initial state vector
tspan = (0.0,10.0)                  # time interval

M = t->0.1sin(t)                    # external torque [Nm]

prob = ODEProblem(pendulum!,u₀,tspan,M)
sol = solve(prob)