I wanted to make updated SLEEF wrappers, but I am getting lots of segfaults.
https://github.com/chriselrod/SLEEFwrap.jl
A few months ago, when I experimented with using NTuple{N,Core.VecElement{T}}
as a data type, I ran into lots of random segfaults. I don’t recall all the details, but I do recall unsafe_load
ing them, and that the segfaults went away when I replaced it with NTuple{N,T}
.
However, I haven’t run into segfaults with SIMD.jl, and it would be nice for SLEEF and SIMD.jl to play well together.
I cannot use regular NTuple{N,T}
with SLEEF, because then I get garbage answers (looks like uninitialized memory).
NTuple{N,Core.VecElement{T}}
mostly seems to work. SIMD.jl’s Vec
type is also just a wrapper for this, yet benchmarking any use of SIMD’s Vec type combined with SLEEF throws instant segfaults.
julia> using SLEEFwrap
julia> using SIMD, BenchmarkTools
julia> const Vec8 = Vec{8,Float64}
Vec{8,Float64}
julia> const V8 = NTuple{8,Core.VecElement{Float64}}
NTuple{8,VecElement{Float64}}
julia> xv = ntuple(i -> Core.VecElement(rand()), Val(8))
(VecElement{Float64}(0.8583526847040217), VecElement{Float64}(0.2687536152757337), VecElement{Float64}(0.8884708219081905), VecElement{Float64}(0.7902480329753863), VecElement{Float64}(0.34023049622713186), VecElement{Float64}(0.5061351730568031), VecElement{Float64}(0.011915278521906103), VecElement{Float64}(0.8420847167831675))
julia> # xvec = Vec8(ntuple(i -> rand(), Val(8)))
xvec = Vec8(xv)
8-element Vec{8,Float64}:
Float64⟨0.8583526847040217, 0.2687536152757337, 0.8884708219081905, 0.7902480329753863, 0.34023049622713186, 0.5061351730568031, 0.011915278521906103, 0.8420847167831675⟩
julia> SLEEFwrap.sincos(xv)
((VecElement{Float64}(0.7567667649017668), VecElement{Float64}(0.26553000049665487), VecElement{Float64}(0.7761083562736459), VecElement{Float64}(0.7105278274130774), VecElement{Float64}(0.3337043846735693), VecElement{Float64}(0.48480060287253957), VecElement{Float64}(0.011914996580888924), VecElement{Float64}(0.7460329717928119)), (VecElement{Float64}(0.6536849880027183), VecElement{Float64}(0.9641025976711434), VecElement{Float64}(0.6305995712987914), VecElement{Float64}(0.7036691029679022), VecElement{Float64}(0.9426777729689156), VecElement{Float64}(0.8746247054905448), VecElement{Float64}(0.9999290139087261), VecElement{Float64}(0.6659090065451776)))
julia> SLEEFwrap.sincos(xvec)
(Float64⟨0.7567667649017668, 0.26553000049665487, 0.7761083562736459, 0.7105278274130774, 0.3337043846735693, 0.48480060287253957, 0.011914996580888924, 0.7460329717928119⟩, Float64⟨0.6536849880027183, 0.9641025976711434, 0.6305995712987914, 0.7036691029679022, 0.9426777729689156, 0.8746247054905448, 0.9999290139087261, 0.6659090065451776⟩)
julia> SLEEFwrap.log(xv)
(VecElement{Float64}(-0.15274020952031248), VecElement{Float64}(-1.3139602474419678), VecElement{Float64}(-0.11825347164732995), VecElement{Float64}(-0.23540841700638693), VecElement{Float64}(-1.0781319609829232), VecElement{Float64}(-0.6809515049418576), VecElement{Float64}(-4.429933792967661), VecElement{Float64}(-0.17187465604430383))
julia> SLEEFwrap.log(xvec)
8-element Vec{8,Float64}:
Float64⟨-0.15274020952031248, -1.3139602474419678, -0.11825347164732995, -0.23540841700638693, -1.0781319609829232, -0.6809515049418576, -4.429933792967661, -0.17187465604430383⟩
julia> SLEEFwrap.exp(xv)
(VecElement{Float64}(2.359271027581049), VecElement{Float64}(1.3083327481283562), VecElement{Float64}(2.4314087500618013), VecElement{Float64}(2.2039430090099676), VecElement{Float64}(1.4052714630067908), VecElement{Float64}(1.6588675537938549), VecElement{Float64}(1.011986548237915), VecElement{Float64}(2.321200982909682))
julia> SLEEFwrap.exp(xvec)
8-element Vec{8,Float64}:
Float64⟨2.359271027581049, 1.3083327481283562, 2.4314087500618013, 2.2039430090099676, 1.4052714630067908, 1.6588675537938549, 1.011986548237915, 2.321200982909682⟩
julia> @btime SLEEFwrap.log($xv)
10.203 ns (-4628700970558799018 allocations: 0 bytes)
(VecElement{Float64}(-0.15274020952031248), VecElement{Float64}(-1.3139602474419678), VecElement{Float64}(-0.11825347164732995), VecElement{Float64}(-0.23540841700638693), VecElement{Float64}(-1.0781319609829232), VecElement{Float64}(-0.6809515049418576), VecElement{Float64}(-4.429933792967661), VecElement{Float64}(-0.17187465604430383))
julia> @btime SLEEFwrap.log($xvec)
signal (11): Segmentation fault
in expression starting at no file:0
#_run#8 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:336
unknown function (ip: 0x7f3ca00b30bb)
jl_fptr_trampoline at /home/chriselrod/Documents/languages/julia-ob/src/gf.c:1843
jl_apply_generic at /home/chriselrod/Documents/languages/julia-ob/src/gf.c:2198
inner at ./none:0
I’ve tried:
- Letting ccall use SIMD.jl’s convert method to convert the
Vec{N,T}
into a NTuple{N,Core.VecElement{T}}
.
- Extracting the field from the struct, ie
a.elts
where a is an instance of Vec{N,T}
.
- Telling ccall I was passing it a
Vec{N,T}
.
but these all led to segfaults.
For the sincos
method, which works from the REPL, both using either the NTuple{N,Core.VecElement{T}}
or Vec{N,T}
result in segfaults when trying to benchmark.
Anyone have an idea what is going on, and how I may be able to address this?
I also don’t understand enough about Julia or BenchmarkTools internals to interpret the segfault stack trace. The line #_run#8 at /home/chriselrod/.julia/packages/BenchmarkTools/dtwnm/src/execution.jl:336
sometimes says #_run#7
or #_run#9
.
Pretty fast though:
julia> @btime SLEEFwrap.log($xv)
10.205 ns (-4623537962973049998 allocations: 0 bytes)
(VecElement{Float64}(-0.34208491394127016), VecElement{Float64}(-0.1277428509493214), VecElement{Float64}(-0.19021957729279487), VecElement{Float64}(-0.8016561211097802), VecElement{Float64}(-2.067217212156899), VecElement{Float64}(-0.1172006596374793), VecElement{Float64}(-2.668476338119487), VecElement{Float64}(-0.5178181325967083))
julia> @btime SLEEFwrap.log_fast($xv)
5.435 ns (-4623537962973049999 allocations: 0 bytes)
(VecElement{Float64}(-0.3420849139412701), VecElement{Float64}(-0.12774285094932142), VecElement{Float64}(-0.19021957729279487), VecElement{Float64}(-0.8016561211097802), VecElement{Float64}(-2.0672172121568986), VecElement{Float64}(-0.1172006596374793), VecElement{Float64}(-2.6684763381194867), VecElement{Float64}(-0.5178181325967083))
julia> @btime SLEEFwrap.exp($xv)
7.829 ns (4611763878698450332 allocations: 0 bytes)
(VecElement{Float64}(2.0345769062548253), VecElement{Float64}(2.4110917704333157), VecElement{Float64}(2.2859405809045037), VecElement{Float64}(1.5660952771191212), VecElement{Float64}(1.1348919169060723), VecElement{Float64}(2.4336853349651544), VecElement{Float64}(1.071819661958549), VecElement{Float64}(1.8145166598251141))
julia> @btime SLEEFwrap.sin($xv)
10.824 ns (4604048382445319899 allocations: 0 bytes)
(VecElement{Float64}(0.6520520679002532), VecElement{Float64}(0.7707896329958752), VecElement{Float64}(0.7357528052669569), VecElement{Float64}(0.4336913600120062), VecElement{Float64}(0.12620000898890807), VecElement{Float64}(0.7766981851802598), VecElement{Float64}(0.06930222836378491), VecElement{Float64}(0.5611869269352823))
julia> @btime SLEEFwrap.sin_fast($xv)
7.702 ns (4604048382445319899 allocations: 0 bytes)
(VecElement{Float64}(0.6520520679002532), VecElement{Float64}(0.7707896329958752), VecElement{Float64}(0.7357528052669569), VecElement{Float64}(0.4336913600120062), VecElement{Float64}(0.12620000898890807), VecElement{Float64}(0.7766981851802598), VecElement{Float64}(0.06930222836378491), VecElement{Float64}(0.5611869269352823))
julia> @btime SLEEFwrap.cos($xv)
12.725 ns (4605004245534695369 allocations: 0 bytes)
(VecElement{Float64}(0.7581741889216512), VecElement{Float64}(0.6370897438085814), VecElement{Float64}(0.6772501823859506), VecElement{Float64}(0.9010614874973497), VecElement{Float64}(0.9920048173931413), VecElement{Float64}(0.629872946821572), VecElement{Float64}(0.9975957102673476), VecElement{Float64}(0.8276890920127764))
julia> @btime SLEEFwrap.cos_fast($xv)
7.756 ns (4605004245534695369 allocations: 0 bytes)
(VecElement{Float64}(0.7581741889216512), VecElement{Float64}(0.6370897438085814), VecElement{Float64}(0.6772501823859507), VecElement{Float64}(0.9010614874973497), VecElement{Float64}(0.9920048173931413), VecElement{Float64}(0.629872946821572), VecElement{Float64}(0.9975957102673476), VecElement{Float64}(0.8276890920127764))
julia> @btime SLEEFwrap.tan($xv)
17.598 ns (4605921675420273776 allocations: 0 bytes)
(VecElement{Float64}(0.8600293671664883), VecElement{Float64}(1.2098603697306811), VecElement{Float64}(1.086382587118547), VecElement{Float64}(0.481311615277844), VecElement{Float64}(0.12721713319955963), VecElement{Float64}(1.2331029441724537), VecElement{Float64}(0.06946925257448479), VecElement{Float64}(0.6780165793541951))
julia> @btime SLEEFwrap.tan_fast($xv)
12.264 ns (4605921675420273776 allocations: 0 bytes)
(VecElement{Float64}(0.8600293671664883), VecElement{Float64}(1.2098603697306811), VecElement{Float64}(1.0863825871185468), VecElement{Float64}(0.481311615277844), VecElement{Float64}(0.1272171331995596), VecElement{Float64}(1.2331029441724537), VecElement{Float64}(0.06946925257448477), VecElement{Float64}(0.6780165793541951))
julia> @btime SLEEFwrap.erf($xv)
54.018 ns (4604343905645784797 allocations: 0 bytes)
(VecElement{Float64}(0.6848617340469577), VecElement{Float64}(0.7867287531423094), VecElement{Float64}(0.7576930731379372), VecElement{Float64}(0.47417732523365935), VecElement{Float64}(0.14202377140324365), VecElement{Float64}(0.7915398934983942), VecElement{Float64}(0.07813660993221432), VecElement{Float64}(0.6005564766788369))
julia> @btime SLEEFwrap.erfc($xv)
68.593 ns (4599348646226371142 allocations: 0 bytes)
(VecElement{Float64}(0.3151382659530423), VecElement{Float64}(0.21327124685769064), VecElement{Float64}(0.24230692686206282), VecElement{Float64}(0.5258226747663407), VecElement{Float64}(0.8579762285967564), VecElement{Float64}(0.2084601065016058), VecElement{Float64}(0.9218633900677857), VecElement{Float64}(0.399443523321163))
julia> @btime SLEEFwrap.gamma($xv)
120.838 ns (4608452683576307722 allocations: 0 bytes)
(VecElement{Float64}(1.282055440401566), VecElement{Float64}(1.0852391465483837), VecElement{Float64}(1.1354506736208367), VecElement{Float64}(1.9743762331544568), VecElement{Float64}(7.437973193493231), VecElement{Float64}(1.0773258578660838), VecElement{Float64}(13.905308301696595), VecElement{Float64}(1.4988631037761113))
julia> @btime SLEEFwrap.lgamma($xv)
128.714 ns (4598119901031446912 allocations: 0 bytes)
(VecElement{Float64}(0.248464602806461), VecElement{Float64}(0.08180037427069305), VecElement{Float64}(0.12702964139168493), VecElement{Float64}(0.6802525171536727), VecElement{Float64}(2.006598391502465), VecElement{Float64}(0.07448191310473816), VecElement{Float64}(2.6322706594427645), VecElement{Float64}(0.4047068899174618))
julia> @btime SLEEFwrap.sqrt($xv)
6.018 ns (4605766359090098911 allocations: 0 bytes)
(VecElement{Float64}(0.8427857905804464), VecElement{Float64}(0.9381256108319143), VecElement{Float64}(0.9092731011250698), VecElement{Float64}(0.6697652101969048), VecElement{Float64}(0.35572098476673925), VecElement{Float64}(0.9430836162446596), VecElement{Float64}(0.2633587338823797), VecElement{Float64}(0.7718932109593795))
julia> @btime SLEEFwrap.sqrt_fast($xv)
13.171 ns (4606226399220695390 allocations: 0 bytes)
(VecElement{Float64}(0.8938605051044239), VecElement{Float64}(0.6748056558334341), VecElement{Float64}(0.8640737421502825), VecElement{Float64}(0.7593243813245429), VecElement{Float64}(0.6839764178162592), VecElement{Float64}(0.9716645016644863), VecElement{Float64}(0.41321471619861017), VecElement{Float64}(0.20800569859917845))
julia> @btime SLEEFwrap.cbrt($xv)
26.468 ns (4606211737266519962 allocations: 0 bytes)
(VecElement{Float64}(0.892232701193268), VecElement{Float64}(0.958312886649549), VecElement{Float64}(0.9385618465517778), VecElement{Float64}(0.7655056316877106), VecElement{Float64}(0.5020415448485336), VecElement{Float64}(0.9616863831147828), VecElement{Float64}(0.41086437256436426), VecElement{Float64}(0.8414690511688571))
julia> @btime SLEEFwrap.cbrt_fast($xv)
13.819 ns (4606211737266519962 allocations: 0 bytes)
(VecElement{Float64}(0.892232701193268), VecElement{Float64}(0.958312886649549), VecElement{Float64}(0.9385618465517778), VecElement{Float64}(0.7655056316877107), VecElement{Float64}(0.5020415448485336), VecElement{Float64}(0.9616863831147828), VecElement{Float64}(0.41086437256436426), VecElement{Float64}(0.8414690511688571))
I’m also not inclined to believe the number of allocations. =/