I had not heard of JET.jl. I’ll check it out.
There is also a lot of good information in this thread. It helps me feel out if what I am seeing is expected or not.
So what is the correct way to allocate the array in C, pass the pointer to Julia, and copy the pre-existing Ref’d Julia array into the memory behind the C pointer? I thought it was an unsafe_wrap. But the only reason I do that is to enable the copy! from one Julia array to the C array now visible in Julia. If that is causing an allocation, I don’t want it.
On Julia 1.10 you could use Libc.memcpy directly:
arr = Array{Int}(undef, 1024)
Libc.memcpy(arr, cptr, sizeof(arr))
function memcpy(dst::Ptr, src::Ptr, n::Integer)
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end
Regarding code instability: does using the @error macro to print a problem to the screen before returning an error code to the calling function cause any kind of type instability or allocations, even when that path is not taken?
Also, I have noticed that on my home-brewed quaternion and xyz-vector classes, my normalization calls tend to allocate now even though they are immutable composite types. I do a divide-by-zero protection in there. I thought I protected against type instability, but I may not have. On one of them, I think I will just throw a domain error. Could the option to throw a DomainError instead of returning a normalized quaternion cause type instability?
function normalize(q::quaternion{T}) where T<:QUATTYPE
n = norm(q)
n > eps(T) ? quaternion{T}(q.s / n, q.v / n) : throw(DomainError(q,"Normalizing Null Quaternion"))
end
norm(q::quaternion) = sqrt(q.s^2 + q.v.x^2 + q.v.y^2 + q.v.z^2)
Also, I recently expanded QUATTYPE to const QUATTYPE = Number
to allow it to work with Symbolics. Right now it is being defined as Float64.
I didn’t think the normalize calls used to allocate when I first designed the class. But as I’ve been hacking at it for symbolics and other things, I may have introduced an error.
I think I also noticed other strange things, like accessing my vector as v.x was more expensive per @allocated on the REPL than v[1] was: v.x allocated 64 bytes, and v[1] allocated 16. The v.x access calls getfield according to @code_warntype. My v[1] just runs through an if-elseif structure in getindex that returns getproperty(v, :x).
So that is also confusing to me. I need to check with JET. I think these immutable composite-type manipulations allocating unexpectedly are part of the problem, and I wonder if I have broken something trying to make them work with Symbolics or some other patch along the way, or if this is because I’m evaluating on the REPL, and inside a compiled function it wouldn’t have allocated.
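One way to separate REPL effects from genuine allocations is to measure inside a compiled function. A minimal sketch, using a stand-in Vec3 type rather than the actual xyzVecT definition:

```julia
# Field access on an isbits immutable struct should not allocate inside a
# compiled function, even if @allocated at the REPL reports bytes
# (non-const globals are untyped there, so the measurement itself allocates).
struct Vec3{T}   # hypothetical stand-in for the xyzVecT type discussed above
    x::T
    y::T
    z::T
end

getx(v) = v.x            # compiles down to a plain field load

inner(v) = @allocated getx(v)   # measure inside a function, not at top level

v = Vec3(1.0, 2.0, 3.0)
inner(v)                 # warm up / compile
println(inner(v))        # expect 0 once compiled
```

If this prints 0 while the same expression at the REPL reports bytes, the allocations seen interactively are measurement artifacts, not a problem with the type.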
I’ll try JET.jl later today. I think I must have some instability in my xyzvector type (which is an immutable struct) caused by my recent changes to optimize it and make it work better with Symbolics. In the past, it used to have a “Real” parent type, but now it is a “Number”.
abstract type AbstractVec{T<:Number} end
" xyzVecT is the object definition to contain the cartesian information "
struct xyzVecT{T} <:AbstractVec{T}
x::T
y::T
z::T
end
# Constructors
(::Type{xyzVecT{T}})(p::Tuple{Number,Number,Number}) where {T<:Number} = xyzVecT{T}(p...)
(::Type{xyzVecT{T}})() where {T<:Number} = xyzVecT{T}(convert(T,0),convert(T,0),convert(T,0))
(::Type{xyzVecT{T}})(v::AbstractArray{Any,1}) where{T<:Number} = length(v)==0 ? xyzVecT{T}(convert(T,0),convert(T,0),convert(T,0)) : xyzVecT{T}(convert(T,v[1]),convert(T,v[2]),convert(T,v[3]))
xVec(x::T) where {T<:Number} = xyzVecT{T}(x,zero(x),zero(x)) # To help specify unit vectors
yVec(y::T) where {T<:Number} = xyzVecT{T}(zero(y),y,zero(y)) # To help specify unit vectors
zVec(z::T) where {T<:Number} = xyzVecT{T}(zero(z),zero(z),z) # To help specify unit vectors
(::Type{xyzVecT{T}})(v::AbstractArray) where {T<:Number} = xyzVecT{T}(convert(T,v[1]),convert(T,v[2]),convert(T,v[3]))
(::Type{xyzVecT{T}})(v::xyzVecT{T}) where {T<:Number} = v
(::Type{xyzVecT{T}})(v::xyzVecT) where {T<:Number} = xyzVecT{T}(v.x,v.y,v.z)
xyzVecT(x::T,y::T,z::T) where T<:Number = xyzVecT{T}(x,y,z) # This is needed to stop the stack overflowing from
xyzVecT(x::Number,y::Number,z::Number) = xyzVecT(promote(x,y,z)...) # this.
xyzVecT() = xyzVecT{Float64}(0.0,0.0,0.0)
xyzVecT(v::AbstractArray) = xyzVecT(v[1],v[2],v[3])
[there is more]
This was one of the first things I wrote when learning Julia, so I’m sure it is full of problems still.
Yeah, I just did a test, and the quaternion, which uses the xyzVecT as a field, is not allocating when I initialize it with 4 scalars, but when I initialize it with a scalar and a vector it is allocating (this is the default constructor). It may be calling normalize.
const QUATTYPE=Number
struct quaternion{T<:QUATTYPE} <:Number
s::T
v::xyzVecT{T}
end
quaternion(s::T, v1::T, v2::T, v3::T) where T<:QUATTYPE = quaternion{T}(s, xyzVecT{T}(v1, v2, v3))
quaternion(s::QUATTYPE, v1::QUATTYPE, v2::QUATTYPE, v3::QUATTYPE) = quaternion(promote(s,v1,v2,v3)...) # Get everything to the same type
quaternion{T}(s::QUATTYPE, v1::QUATTYPE, v2::QUATTYPE, v3::QUATTYPE) where T<:QUATTYPE = quaternion{T}(convert(T,s), xyzVecT{T}(v1, v2, v3))
quaternion() = quaternion{Float64}(1.0, xyzVecT{Float64}())
quaternion{T}() where T<:QUATTYPE = quaternion{T}(1.0, xyzVecT{T}())
quaternion(v::xyzVecT{T}) where T<:QUATTYPE = quaternion{T}(convert(T,0), v)
quaternion(s::T) where T<:QUATTYPE = quaternion{T}(s, xyzVecT{T}())
quaternion(a::AbstractArray{T,1}) where T<:QUATTYPE = quaternion{T}(a[1], xyzVecT{T}(a[2], a[3], a[4]))
Very strange, I have a lot to learn.
julia> @allocated q=quaternion{Float64}(1.0,2.0,3.0,4.0)
0
julia> @allocated q=quaternion{Float64}(1.0,v)
48
julia> @code_warntype quaternion{Float64}(1.0,v)
MethodInstance for quaternion{Float64}(::Float64, ::xyzVecT{Float64})
from (var"#ctor-self#"::Type{quaternion{T}} where T<:Number)(s, v) @ LMQuaternions C:\Users\bakerar\.julia\packages\LMQuaternions\vxMqo\src\LMQuaternions.jl:68
Arguments
#ctor-self#::Core.Const(quaternion{Float64})
s::Float64
v::xyzVecT{Float64}
Body::quaternion{Float64}
1 ─ %1 = Core.fieldtype(#ctor-self#, 1)::Core.Const(Float64)
│ %2 = Base.convert(%1, s)::Float64
│ %3 = Core.fieldtype(#ctor-self#, 2)::Core.Const(xyzVecT{Float64})
│ %4 = Base.convert(%3, v)::xyzVecT{Float64}
│ %5 = %new(#ctor-self#, %2, %4)::quaternion{Float64}
└── return %5
julia> @code_warntype quaternion{Float64}(1.0,2.0,3.0,4.0)
MethodInstance for quaternion{Float64}(::Float64, ::Float64, ::Float64, ::Float64)
from quaternion{T}(s::Number, v1::Number, v2::Number, v3::Number) where T<:Number @ LMQuaternions C:\Users\bakerar\.julia\packages\LMQuaternions\vxMqo\src\LMQuaternions.jl:75
Static Parameters
T = Float64
Arguments
#self#::Core.Const(quaternion{Float64})
s::Float64
v1::Float64
v2::Float64
v3::Float64
Body::quaternion{Float64}
1 ─ %1 = Core.apply_type(LMQuaternions.quaternion, $(Expr(:static_parameter, 1)))::Core.Const(quaternion{Float64})
│ %2 = LMQuaternions.convert($(Expr(:static_parameter, 1)), s)::Float64
│ %3 = Core.apply_type(LMQuaternions.xyzVecT, $(Expr(:static_parameter, 1)))::Core.Const(xyzVecT{Float64})
│ %4 = (%3)(v1, v2, v3)::xyzVecT{Float64}
│ %5 = (%1)(%2, %4)::quaternion{Float64}
└── return %5
AFAIK printing does cause run-time dispatch, at least as it’s implemented in the standard library. If the path isn’t chosen, there will be no run-time dispatch, so it’s not problematic from a performance perspective, though. You’ll probably get extra noise from JET because of the prints: JET does static analysis, so it doesn’t care whether the prints are actually executed.
Regarding the rest of your message, try creating a self-contained example. For example, you (rightfully) complain that quaternion{Float64}(1.0, v) allocates, but you didn’t provide the source for that constructor, as far as I can see.
Are you aware of StaticArrays.jl? It implements all of that.
I don’t think it was a thing when I started that vector type, but I was trying to do something equivalent by staying in static memory. I also know I only need 3- and 2-element vectors for most of the stuff. I’ve looked at swapping it out before, but it always seemed to allocate more or run slower than what I had every time I tried it. Maybe one day I’ll switch it out under the hood once I’ve had a chance to use it more. Also, not relying on external packages that could change a lot, and that might also have licenses and other legal concerns, can be beneficial.
Yeah, I think it must be the default constructor.
It looks like the try-catch block dynamically dispatches according to JET, as does the logging. I removed all the logging to make sure. Otherwise I don’t see any dynamic dispatches. I think I’ll take out the try-catch and do the direct memory copy for one of the functions.
I tried to do this, but it is not correct. The memory gets corrupted.
I am doing tests where I am calling this wholly in Julia right now, so it is a Julia generated pointer in a test function. This used to give the correct answer when I was using the unsafe_wrap which had the allocations I was trying to get rid of. The memory copy seems to start several values over in the array.
#in module
const outputRef_db_mag=Ref{Vector{Float64}}()
const outputRef_deg_phase=Ref{Vector{Float64}}()
# in setup function
# number is set by another data reader
outputRef_db_mag[]=Vector{Float64}(undef,number)
outputRef_deg_phase[]=Vector{Float64}(undef,number)
# in the calc function (where db_mag and deg_phase are separately calculated by another routine for this example)
map!(m->(voltage2db(abs(m))),outputRef_db_mag[],intermediateRef_magphase[])
map!(p->(angle_deg(p)),outputRef_deg_phase[],intermediateRef_magphase[])
function memcpy(dst::Ptr, src::Ptr, n::Integer)
ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end
# The Function called from C
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
try
outlen=length(outputRef_db_mag[])
if outlen == len
memcpy(cmag, pointer_from_objref(outputRef_db_mag[]), sizeof(Cdouble)*len)
memcpy(cphase, pointer_from_objref(outputRef_deg_phase[]), sizeof(Cdouble)*len)
else
#@error "Pointer to Array of length $outlen is required! Function called with len=$len."
return Cint(1)
end
catch err
#@error "Problem copying arrays from Julia to C: $(err.msg)"
Base.invokelatest(Base.display_error, Base.catch_stack())
return Cint(2)
end
return Cint(0)
else
#@error "Results need recalculating first."
return Cint(3)
end
end
pointer_from_objref gives you the pointer to the Array object, not the data itself. What you want is pointer.
You could also not strongly type the memcpy and have the ccall do the conversion for you correctly.
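The difference can be sketched like this, reusing the same memcpy wrapper from earlier in the thread:

```julia
# Minimal sketch: pointer(arr) yields the address of the Float64 data buffer,
# while pointer_from_objref(arr) yields the address of the Array object header,
# so memcpy'ing from the latter reads metadata instead of the first elements.
function memcpy(dst::Ptr, src::Ptr, n::Integer)
    ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), dst, src, n)
end

src = [1.0, 2.0, 3.0, 4.0]
dst = zeros(4)
# GC.@preserve keeps both arrays rooted while raw pointers are in use.
GC.@preserve src dst begin
    memcpy(pointer(dst), pointer(src), sizeof(src))  # correct: data pointers
end
@show dst   # now matches src
```

This also explains the “starts several values over” symptom: copying from the object header means the first few copied values are Array metadata, not data.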
This appears to work with zero allocations.
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
try
outlen=length(outputRef_db_mag[])
if outlen == len
for ii=1:outlen
unsafe_store!(cmag, outputRef_db_mag[][ii],ii)
unsafe_store!(cphase, outputRef_deg_phase[][ii],ii)
end
else
#@error "Pointer to Array of length $outlen is required! Function called with len=$len."
return Cint(1)
end
catch err
#@error "Problem copying arrays from Julia to C: $(err.msg)"
Base.invokelatest(Base.display_error, Base.catch_stack())
return Cint(2)
end
return Cint(0)
else
#@error "Results need recalculating first."
return Cint(3)
end
end
Hopefully this is an ok solution.
I think you could simplify to:
@noinline function report_error(outlen, len)
@error "Pointer to Array of length $outlen is required! Function called with len=$len."
end
@noinline function report_recalculating()
@error "Results need recalculating first."
end
Base.@ccallable function julia_get_mag_phase(cmag::Ptr{Cdouble},cphase::Ptr{Cdouble},len::Csize_t)::Cint
if isCalculationValid[]
db_mag = outputRef_db_mag[]
deg_phase = outputRef_deg_phase[]
outlen = length(db_mag)
if outlen != len
report_error(outlen, len)
return Cint(1)
end
@inbounds for ii=1:outlen
unsafe_store!(cmag, db_mag[ii], ii)
unsafe_store!(cphase, deg_phase[ii], ii)
end
return Cint(0)
else
report_recalculating()
return Cint(3)
end
end
Alternatively, if you are not doing double buffering or something on the C side, you could “just” Base.unsafe_wrap(Array, cmag, len; own=false)
and use that Array for outputRef_db_mag.
Then your ccallable collapses down to the isCalculationValid check, and you don’t do the data movement at all.
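A minimal sketch of that idea, assuming the C side guarantees the buffer outlives the wrap (Libc.malloc stands in here for the C-side allocation):

```julia
# Wrap a C-owned buffer as a Julia Vector without copying and without
# taking ownership (own=false means Julia's GC will never free it).
len  = 8
cbuf = Libc.malloc(len * sizeof(Cdouble))   # stand-in for the C allocation
cmag = convert(Ptr{Cdouble}, cbuf)

mag = unsafe_wrap(Vector{Cdouble}, cmag, len; own=false)
mag .= 1.0:len               # compute results directly in place;
                             # the C side sees them with no copy step
@show unsafe_load(cmag, 3)   # reads back through the raw pointer: 3.0

Libc.free(cbuf)              # in real use, the C side frees its own buffer
```

The wrapped Vector is only valid as long as the C buffer is; freeing the buffer while the wrap is still in use is undefined behavior.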
This has some really cool insight in it. Thanks!
I need to think a bit on this. I’m not sure I understand it, but what I have is not allocating at this point… there… it goes back to the data access I originally proposed.
Ok. State of where I am.
I got rid of all allocations except for the original function call in the original message that accessed the 4D array from the struct. I tried making that struct mutable and it did not matter.
I reworked my code to get rid of the ambiguities at the line.
The allocations are coming from accessing the obj.data array.
This is a read operation. There are no other allocations in any of the functions, and the amount of memory being allocated adds up to all of the calls to these variables.
There was some conversation earlier in this thread, but I couldn’t find the conclusion in all of the discussion.
Here is the trace for the allocations… where I have edited some of the function and variable names just to keep it generic.
0 for ii=1:number
1024 a=obj.data[ii,j0,k0,ifreq]
1024 b=obj.data[ii,j1,k0,ifreq]
1024 c=obj.data[ii,j1,k1,ifreq]
1024 d=obj.data[ii,j0,k1,ifreq]
0 result[ii]=fnc_inlined(a,
- b,
- c,
- d,
- t,u)
If you already told me why this makes sense, can you tell me again?
While I ponder what is going on here… I have disabled GC right before this line and re-enabled it right after. This removes all garbage collection from the @benchmark runs, even for a function that calls this 1e6 times.
To make it sustainable, I guess I need to create a little C-callable function that runs the GC at an idle time, or after the read of the data.
I’m wondering, if I made a, b, c, and d Refs, would that cure all my allocations? It seems a bit hacky.
Thanks again for the tremendous support.
Try using the allocation profiler instead of --track-allocation.
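For reference, a minimal sketch of driving the allocation profiler from the REPL (Julia 1.8+; `work` is a hypothetical placeholder for the interpolation routine above):

```julia
using Profile

# Hypothetical workload standing in for the function being investigated.
work() = sum(sum(rand(10)) for _ in 1:100)

work()                                          # compile first, so codegen
                                                # allocations aren't recorded
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 work()    # sample_rate=1 records every allocation
allocs = Profile.Allocs.fetch().allocs
println(length(allocs), " allocations recorded")
# Each entry carries .type, .size, and .stacktrace,
# which pinpoints the exact allocating line.
```

Unlike --track-allocation, this doesn’t require restarting Julia with a flag, and the stack traces show which caller triggered each allocation.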
Do you have a standalone MWE for that? It would make it easier to show how I analyze where allocations are coming from.
Here is my attempt at a minimal working example. It is strange. When I have used these immutable structures in the past, I did not think that there was any allocation, and I have lots of functions that seem to do this, but when I make an example, it seems to have similar problems to what I am seeing. Now the + seems to be allocating on the heap. Anyone know where my knowledge gap is?
#allocation test
struct mystruct1{T<:Real}
a::T
b::T
c::T
end
@inline Base.:+(a::mystruct1{T},b::mystruct1{T}) where T<:Real=mystruct1{T}(a.a+b.a,a.b+b.b,a.c+b.c)
f=[mystruct1{Float64}(i,j,k) for i=1:10,j=1:10,k=1:3];
x=mystruct1{Float64}(1.0,2.0,3.0)
y=mystruct1{Float64}(4.0,5.0,6.0)
x+y
@show @allocated x+y
c=Array{mystruct1{Float64}}(undef,1)
c[1]=x+y
@show @allocated c[1]=x+y
@show @allocated c[1]=f[1,1,1]+f[1,2,3]
a=10.0
b=20.0
g=[1.0]
@show @allocated a+b
@show @allocated g[1]=a+b
@inline function dostuff(x1,x2)
x1+x2
end
dostuff(a,b)
dostuff(x,y)
dostuff(f[1,1,1],f[1,2,3])
@show @allocated dostuff(a,b)
@show @allocated dostuff(f[1,1,1],f[1,2,3])
Results of include
include("allocation_test.jl")
#= allocation_test.jl:18 =# @allocated(x + y) = 32
#= allocation_test.jl:23 =# @allocated(c[1] = x + y) = 32
#= allocation_test.jl:25 =# @allocated(c[1] = f[1, 1, 1] + f[1, 2, 3]) = 96
#= allocation_test.jl:34 =# @allocated(a + b) = 16
#= allocation_test.jl:36 =# @allocated(g[1] = a + b) = 16
#= allocation_test.jl:46 =# @allocated(dostuff(a, b)) = 16
#= allocation_test.jl:47 =# @allocated(dostuff(f[1, 1, 1], f[1, 2, 3])) = 96