I have a simple volume rendering loop written in Julia. It uses fixed-size 3D vector types built on the StaticArrays package, derived from FieldVector:
using StaticArrays

struct Vector3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end
const v3f = Vector3D{Float32}
This generally keeps them on the stack, and I have verified that no memory allocation occurs in the functions that use them:
function TraceRay( vPos::Vector3D, vDir::Vector3D )::v3f
    local vCurPos = vPos;
    local vStep = vDir * 0.02f0;
    local vColor = v3f( 0f0, 0f0, 0f0 )
    local flOpacity::Float32 = 1.0
    for i in 0:50
        local density = 0.1 * Density( vCurPos )
        local lighting = Lighting( vCurPos )
        vColor += flOpacity * density * lighting
        flOpacity *= (1f0 - density)
        vCurPos += vStep
    end
    return vColor
end
@time begin
    local acc::v3f = v3f( 0, 0, 0 )
    for i in 1:1000
        acc += TraceRay( v3f( i, 0 ,0 ), v3f( 0, 0, 1 ) )
    end
end
No allocations reported!!
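For anyone who wants to reproduce the zero-allocation check, here is a minimal sketch of one way to measure it. It wraps the loop in a function, calls it once to warm up, then uses `@allocated` so compilation is not counted. Everything named here is hypothetical: `Vec3` is a plain struct standing in for `v3f` so the sketch runs without packages, and `density_stub` and `lighting_stub` are placeholders because the real `Density` and `Lighting` are not shown.

```julia
# Hypothetical stand-in for v3f: a plain immutable struct, so no packages are needed.
struct Vec3
    x::Float32
    y::Float32
    z::Float32
end
Base.:+(a::Vec3, b::Vec3) = Vec3(a.x + b.x, a.y + b.y, a.z + b.z)
Base.:*(s::Float32, v::Vec3) = Vec3(s * v.x, s * v.y, s * v.z)

# Placeholder field functions (assumptions, not the real Density/Lighting).
density_stub(p::Vec3) = 0.1f0 * exp(-(p.x^2 + p.y^2 + p.z^2))
lighting_stub(p::Vec3) = Vec3(1f0, 0.5f0, 0.25f0)

# Same marching structure as TraceRay, using the stand-ins above.
function trace_ray_stub(pos::Vec3, dir::Vec3)
    step = 0.02f0 * dir
    color = Vec3(0f0, 0f0, 0f0)
    opacity = 1f0
    for _ in 0:50
        d = density_stub(pos)
        color += (opacity * d) * lighting_stub(pos)
        opacity *= 1f0 - d
        pos += step
    end
    return color
end

# Keeping the accumulation loop inside a function avoids global-scope effects.
function accumulate_rays(n::Int)
    acc = Vec3(0f0, 0f0, 0f0)
    for i in 1:n
        acc += trace_ray_stub(Vec3(Float32(i), 0f0, 0f0), Vec3(0f0, 0f0, 1f0))
    end
    return acc
end

accumulate_rays(1)                        # warm-up call so compilation is not measured
nbytes = @allocated accumulate_rays(1000) # bytes allocated by the measured call only
println("bytes allocated: ", nbytes)
```

The same pattern (warm up, then `@allocated` on a function call) should apply unchanged to the real `TraceRay`.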
So I use the TraceRay function to fill an array:
function Render()
    local myimage = Matrix{v3f}(undef, 32, 32 )
    for y in 1:size( myimage, 2 )
        for x in 1:size( myimage, 1 )
            local flX::Float32 = -.5f0 + x / size( myimage, 1 )
            local flY::Float32 = -.5f0 + y / size( myimage, 2 )
            myimage[x,y] = TraceRay( v3f(flX, flY, 0 ), v3f( 0, 0, 1) )
        end
    end
    return reinterpret( RGB{Float32}, myimage )
end
@time Render();
One allocation is reported: the initialization of myimage.
I then thread the outer loop:
function RenderThreaded()
    local myimage = Matrix{v3f}(undef, 32, 32 )
    Threads.@threads for y in 1:size( myimage, 2 )
        for x in 1:size( myimage, 1 )
            local flX::Float32 = -.5f0 + x / size( myimage, 1 )
            local flY::Float32 = -.5f0 + y / size( myimage, 2 )
            myimage[x,y] = TraceRay( v3f(flX, flY, 0 ), v3f( 0, 0, 1) )
        end
    end
    return reinterpret( RGB{Float32}, myimage )
end
@time RenderThreaded();
Suddenly it’s doing 120K allocations instead of 1!!!
0.044553 seconds (120.35 k allocations: 7.423 MiB, 0.01% compilation time)
Neither the number of mysterious allocations nor their total size is affected by the image size or the number of loop iterations. This happens on every run, so it is not a one-time startup allocation from the first use of threading.
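A minimal sketch of how the size independence can be checked, in case someone wants to reproduce it without my renderer. `fill_threaded!` is a hypothetical helper with a trivial per-pixel expression in place of TraceRay, so it only isolates what the threaded loop itself allocates. It does not claim to reproduce the 120K figure.

```julia
# Hypothetical helper: fill a matrix with a threaded outer loop.
# The per-pixel expression is trivial and should not allocate by itself.
function fill_threaded!(img::Matrix{Float32})
    Threads.@threads for y in 1:size(img, 2)
        for x in 1:size(img, 1)
            img[x, y] = Float32(x) / size(img, 1) + Float32(y) / size(img, 2)
        end
    end
    return img
end

fill_threaded!(Matrix{Float32}(undef, 8, 8))   # warm-up so compilation is not measured
println("threads: ", Threads.nthreads())

for n in (32, 128, 512)
    img = Matrix{Float32}(undef, n, n)          # allocated outside the measurement
    loopbytes = @allocated fill_threaded!(img)  # bytes allocated by the threaded loop only
    println("n = ", n, ": ", loopbytes, " bytes")
end
```

If the reported bytes stay roughly constant as `n` grows, that would suggest the cost comes from setting up the threaded loop rather than from the per-pixel work. That is an interpretation to verify, not a confirmed diagnosis.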
Any idea what’s going on? Note that this system has 24 threads as set by the environment variable.
Thanks for any help