Improve performance of this index creation and accessing code

Now that there is some easier code to play with, I’ve tried running run_benchmark. Once with the given method, and once with:

        # C[iex] = ext_itp(log(rr), th, ph)
        C[iex] = log(rr)+th+ph

replacing the interpolation. The conclusion: > 60% of the time is spent in the interpolation. So, no dramatic efficiency gain from rearranging stuff around that (though I did manage to get a 10% gain by fiddling a bit).


What is the use of that? It is fictitious. :joy: I will get the realistic color from the interpolation.

Useful just for profiling / benchmarking. Comparing with vs. without interpolation.


You’re still using the global variables xmin etc. in loaddata. After that’s corrected, there still seems to be type instability in indexing in e.g. x_all:

julia> @btime $x_all[1, 1]
  7.475 μs (20 allocations: 624 bytes)
1.677044f0

So it’s better to just collect it once in advance, and then work with an Array:

julia> cx_all = @btime reshape(collect($x_all), size($x_all));
  24.100 μs (21 allocations: 61.93 KiB)

As a proof of concept, when you replace the @threads for b in 1:blocks by for b in 1, the difference in allocations is already massive:

julia> @time loaddata_original(Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks); # no xmin, xmax, etc.
🧩 Allocated 190×50×170 grid.
  0.599230 seconds (14.53 M allocations: 252.602 MiB, 19.25% gc time)

julia> @time loaddata_minmax(Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax);  # typeof(x_all) == HDF5.Dataset
🧩 Allocated 190×50×170 grid.
  0.364104 seconds (3.23 M allocations: 80.101 MiB, 24.37% gc time)

julia> @time loaddata_minmax(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax);  # typeof(cx_all) == Matrix{Float32}
🧩 Allocated 190×50×170 grid.
  0.184638 seconds (49 allocations: 30.811 MiB, 2.06% gc time)

For the full loaddata (threaded (4C 8T), all blocks) I then get

julia> @time loaddata_original(Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks);
🧩 Allocated 190×50×170 grid.
122.128465 seconds (12.67 G allocations: 188.904 GiB, 10.82% gc time, 770 lock conflicts)

julia> @time loaddata_minmax(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax);
🧩 Allocated 190×50×170 grid.
 45.027201 seconds (6.17 k allocations: 35.547 MiB)

You could still reduce allocations by getting rid of the log. and exp. in

ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())
r_eval = exp.(LinRange(log(re[1]), log(re[end]), 10(MeshBlockSize[1]+1)))

by precomputing / lazily computing / using Base.LogRange / … But I imagine that fundamentally at this point really only the for ii ... loop matters (the body of which will in total be executed 1.4 billion times). @JADekker has already given some ideas for that above. If you split the for ..., ..., ... into three nested loops, you might also be able to reuse some calculations. Likely the only way to get a major performance boost (if at all possible) is to try to make the interpolation more efficient somehow.
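For illustration, here is a sketch of the split-and-hoist idea with a simplified loop body (the names mirror the thread's `r_eval`, `θ_eval`, `ϕ_eval`, but the computation is a stand-in, not the actual interpolation):

```julia
# Flattened triple loop: log and sin are recomputed on every innermost iteration.
function fill_flat!(out, r_eval, θ_eval, ϕ_eval)
    iex = 1
    for ii in eachindex(r_eval), jj in eachindex(θ_eval), kk in eachindex(ϕ_eval)
        out[iex] = log(r_eval[ii]) + sin(θ_eval[jj]) * ϕ_eval[kk]
        iex += 1
    end
    return out
end

# Split into nested loops: loop-invariant work is hoisted out and reused.
function fill_nested!(out, r_eval, θ_eval, ϕ_eval)
    iex = 1
    for ii in eachindex(r_eval)
        logr = log(r_eval[ii])        # computed once per ii, reused for all jj, kk
        for jj in eachindex(θ_eval)
            s = sin(θ_eval[jj])       # computed once per (ii, jj), reused for all kk
            for kk in eachindex(ϕ_eval)
                out[iex] = logr + s * ϕ_eval[kk]
                iex += 1
            end
        end
    end
    return out
end
```

Both versions produce identical output; the nested one computes `log` only `length(r_eval)` times instead of once per innermost iteration.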


I also notice a speedup, but there is some instability in the code's performance. Note the large variation in allocations:

julia> include("Modular2.jl")
 64.548171 seconds (7.06 G allocations: 274.009 GiB, 20.45% gc time, 18.34% compilation time: 3% of which was recompilation)
GLMakie.Screen(...)

julia> include("Modular2.jl")
 38.827293 seconds (1.66 M allocations: 163.253 MiB, 0.10% gc time, 4.24% compilation time)
GLMakie.Screen(...)

julia> include("Modular2.jl")
 59.855314 seconds (7.04 G allocations: 272.950 GiB, 23.98% gc time, 5.20% compilation time)
GLMakie.Screen(...)

See the code

full code
using HDF5, GLMakie, Interpolations, Statistics, Profile, Base.Threads, BenchmarkTools
@time begin
function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br::Array{Float32, 4}, Bθ::Array{Float32, 4}, Bϕ::Array{Float32, 4} = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	MeshBlockSize::Vector{Int32} = read((HDF5.attributes(h5))["MeshBlockSize"])
	blocks::Int32 = read(HDF5.attributes(h5),"NumMeshBlocks") 
	cx_all::Matrix{Float32} = reshape(collect(x_all), size(x_all))
	cy_all::Matrix{Float32} = reshape(collect(y_all), size(y_all))
	cz_all::Matrix{Float32} = reshape(collect(z_all), size(z_all))
	cxe_all::Matrix{Float32} = reshape(collect(xe_all), size(xe_all))
	cye_all::Matrix{Float32} = reshape(collect(ye_all), size(ye_all))
	cze_all::Matrix{Float32} = reshape(collect(ze_all), size(ze_all))
	return Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks
end

Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks = readhdf("sane00.prim.01800.athdf")

function bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	XX,YY,ZZ = ntuple(_ -> zeros(Float32, blocks* prod(MeshBlockSize.+1)), 3)
	icx=1
	for b in 1:blocks
		@views r::Vector{Float32}, θ::Vector{Float32}, ϕ::Vector{Float32} = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		for rr in r, th in θ, ph in ϕ
			XX[icx] = rr * sin(th) * cos(ph)
			YY[icx] = rr * sin(th) * sin(ph)
			ZZ[icx] = rr * cos(th)
			icx += 1
		end
	end
	xmin,xmax = extrema(XX); ymin,ymax = extrema(YY); zmin,zmax = extrema(ZZ)
	println("  x ∈ [$xmin, $xmax], y ∈ [$ymin, $ymax], z ∈ [$zmin, $zmax]")
	return xmin, xmax, ymin, ymax, zmin, zmax
end
	
xmin, xmax, ymin, ymax, zmin, zmax = bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	

function clp(X,Y,Z,C,xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)	
	xi = trunc(Int, (X-xmin) * sx) + 1
	yi = trunc(Int, (Y-ymin) * sy) + 1
	zi = trunc(Int, (Z-zmin) * sz) + 1
	ix = clamp(xi, 1, nx)
	iy = clamp(yi, 1, ny)
	iz = clamp(zi, 1, nz)
	field[ix, iy, iz] = C
end
	
function cart(iex,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
	rr, th, ph = r_eval[ii], θ_eval[jj], ϕ_eval[kk]
	xy, z = rr .* sincos(th)
    y, x = xy .* sincos(ph)
    X[iex] = x
    Y[iex] = y
    Z[iex] = z
	C[iex] = ext_itp(log(rr), th, ph)
end

function loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)
	nx,ny,nz = 10(MeshBlockSize.+1)
	field = fill(NaN32, nx,ny,nz)

	sx::Float32 = (nx-1)/(xmax-xmin); sy::Float32 = (ny-1)/(ymax-ymin); sz::Float32 = (nz-1)/(zmax-zmin)
	X, Y, Z, C = ntuple(_ -> zeros(Float32, nx*ny*nz), 4)

	@threads for b in 1:blocks
		@views r, θ, ϕ = cx_all[:, b], cy_all[:, b], cz_all[:, b]
		@views re, θe, ϕe = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		@views Bp_block = Bp[:, :, :, b]
		ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())
		r_eval = exp.(LinRange(log(re[1]), log(re[end]), nx))
		θ_eval = LinRange(θe[1], θe[end], ny)
		ϕ_eval = LinRange(ϕe[1], ϕe[end], nz)
		iex::Int32 = 1
		@inbounds for ii in 1:nx, jj in 1:ny, kk in 1:nz
			cart(iex,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
			clp(X[iex],Y[iex],Z[iex],C[iex],xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)
			iex += 1
		end
	end
	return field
end

field = loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)

function tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
	finite_vals = filter(!isnan, vec(field))
    q_low, q_high = quantile(finite_vals, (0.05, 0.95))
    fig = Figure()
    ax::LScene = LScene(fig[1,1], show_axis=false)
	volume!(ax, xmin..xmax, ymin..ymax, zmin..zmax, field, transparency=true; colorrange=(q_low, q_high))
	display(fig)
end

tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
end

@eldee Why does memory allocation vary so much?

updated code
using HDF5, GLMakie, Interpolations, Statistics, Profile, Base.Threads, BenchmarkTools
@time begin
function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br::Array{Float32, 4}, Bθ::Array{Float32, 4}, Bϕ::Array{Float32, 4} = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	MeshBlockSize::Vector{Int32} = read((HDF5.attributes(h5))["MeshBlockSize"])
	blocks::Int32 = read(HDF5.attributes(h5),"NumMeshBlocks") 
	cx_all::Matrix{Float32} = reshape(collect(x_all), size(x_all))
	cy_all::Matrix{Float32} = reshape(collect(y_all), size(y_all))
	cz_all::Matrix{Float32} = reshape(collect(z_all), size(z_all))
	cxe_all::Matrix{Float32} = reshape(collect(xe_all), size(xe_all))
	cye_all::Matrix{Float32} = reshape(collect(ye_all), size(ye_all))
	cze_all::Matrix{Float32} = reshape(collect(ze_all), size(ze_all))
	return Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks
end

Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks = readhdf("sane00.prim.01800.athdf")

function bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	XX,YY,ZZ = ntuple(_ -> zeros(Float32, blocks* prod(MeshBlockSize.+1)), 3)
	icx=1
	for b in 1:blocks
		@views r::Vector{Float32}, θ::Vector{Float32}, ϕ::Vector{Float32} = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		for rr in r, th in θ, ph in ϕ
			XX[icx] = rr * sin(th) * cos(ph)
			YY[icx] = rr * sin(th) * sin(ph)
			ZZ[icx] = rr * cos(th)
			icx += 1
		end
	end
	xmin,xmax = extrema(XX); ymin,ymax = extrema(YY); zmin,zmax = extrema(ZZ)
	println("  x ∈ [$xmin, $xmax], y ∈ [$ymin, $ymax], z ∈ [$zmin, $zmax]")
	return xmin, xmax, ymin, ymax, zmin, zmax
end
	
xmin, xmax, ymin, ymax, zmin, zmax = bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	

function clp(X,Y,Z,C,xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)	
	xi = trunc(Int, (X-xmin) * sx) + 1
	yi = trunc(Int, (Y-ymin) * sy) + 1
	zi = trunc(Int, (Z-zmin) * sz) + 1
	ix = clamp(xi, 1, nx)
	iy = clamp(yi, 1, ny)
	iz = clamp(zi, 1, nz)
	field[ix, iy, iz] = C
end
	
function kerr2cart(iex,a,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
	rr, th, ph = r_eval[ii], θ_eval[jj], ϕ_eval[kk]
	sθ, cθ = sincos(th)
    sϕ, cϕ = sincos(ph)
    X[iex] = (rr*cϕ + a*sϕ) * sθ
    Y[iex] = (rr*sϕ - a*cϕ) * sθ
    Z[iex] = rr * cθ
	C[iex] = ext_itp(log(rr), th, ph)
end

function loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)
	nx,ny,nz = 10(MeshBlockSize.+1)
	nxs,nys,nzs = nx,ny,nz #200, 200,200
	a::Float32 = 0.0	#0.98
	field = fill(NaN32, nx,ny,nz)

	sx::Float32 = (nx-1)/(xmax-xmin); sy::Float32 = (ny-1)/(ymax-ymin); sz::Float32 = (nz-1)/(zmax-zmin)
	X, Y, Z, C = ntuple(_ -> zeros(Float32, nxs*nys*nzs), 4)

	@threads for b in 1:blocks
		@views r, θ, ϕ = cx_all[:, b], cy_all[:, b], cz_all[:, b]
		@views re, θe, ϕe = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		@views Bp_block = Bp[:, :, :, b]
		ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())
		r_eval = exp.(LinRange(log(re[1]), log(re[end]), nxs))
		θ_eval = acos.(LinRange(cos(θe[1]), cos(θe[end]), nys))		# used cos and acos for smooth sampling
		ϕ_eval = LinRange(ϕe[1], ϕe[end], nzs)
		iex::Int32 = 1
		@inbounds for ii in 1:nxs, jj in 1:nys, kk in 1:nzs
			kerr2cart(iex,a,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
			clp(X[iex],Y[iex],Z[iex],C[iex],xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)
			iex += 1
		end
	end
	return field
end

field = loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)

function tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
	finite_vals = filter(!isnan, vec(field))
    q_low, q_high = quantile(finite_vals, (0.05, 0.95))
    fig = Figure(nan_color=(:white, 0.5))
    ax::LScene = LScene(fig[1,1], show_axis=false)
	volume!(ax, xmin..xmax, ymin..ymax, zmin..zmax, field, transparency=true; colorrange=(q_low, q_high))
	display(fig)
end

tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
end
julia> include("Inter.jl")
 42.455227 seconds (24.58 M allocations: 1.264 GiB, 0.78% gc time, 29.76% compilation time: 3% of which was recompilation)

julia> include("Inter.jl")
 47.951952 seconds (7.04 G allocations: 251.978 GiB, 22.38% gc time, 4.57% compilation time)

julia> include("Inter.jl")
 54.267853 seconds (7.04 G allocations: 251.968 GiB, 20.92% gc time, 4.81% compilation time)

julia> include("Inter.jl")
 34.263482 seconds (1.57 M allocations: 158.984 MiB, 0.05% gc time, 5.29% compilation time)

I would not expect such massive variations (excluding the first run), nor can I observe them:

julia> include("code.jl")
  x ∈ [-1200.0, 1200.0], y ∈ [-1200.0, 1200.0], z ∈ [-1200.0, 1200.0]
 55.999274 seconds (19.50 M allocations: 1.082 GiB, 0.84% gc time, 36.19% compilation time)
GLMakie.Screen(...)

julia> include("code.jl")
  x ∈ [-1200.0, 1200.0], y ∈ [-1200.0, 1200.0], z ∈ [-1200.0, 1200.0]
 62.853175 seconds (1.19 M allocations: 143.744 MiB, 0.08% gc time, 5.48% compilation time)
GLMakie.Screen(...)

julia> include("code.jl")
  x ∈ [-1200.0, 1200.0], y ∈ [-1200.0, 1200.0], z ∈ [-1200.0, 1200.0]
 53.756447 seconds (1.15 M allocations: 141.678 MiB, 0.46% gc time, 7.29% compilation time)
GLMakie.Screen(...)

julia> include("code.jl")
  x ∈ [-1200.0, 1200.0], y ∈ [-1200.0, 1200.0], z ∈ [-1200.0, 1200.0]
 57.703731 seconds (1.07 M allocations: 137.641 MiB, 0.33% gc time, 10.25% compilation time)
GLMakie.Screen(...)

(Note that for some reason you don’t have bound’s printlns.)

So no idea. If I were you, I would definitely test again without the volume! call, though as long as you don’t interact with the plot (within the @time, e.g. via a wait(display(fig))), it still shouldn’t cause the variability.

I think that a GPU would improve the performance of this code, but how do I use one? The KernelAbstractions.jl documentation is not detailed enough. Could you please show a GPU version of this code? It takes about half an hour to run if I substitute nxs,nys,nzs = 2nx,2ny,2nz.

Could you first explain what the code is actually doing / trying to do?

On a high level you’re performing some quantised interpolation and conversion from spherical to cartesian coordinates. But what is e.g. Bp? In particular, what’s up with the fourth axis (also in cx_all, etc.)? Why doesn’t every b-iteration overwrite some (or all) of the field[ix, iy, iz] values from a ‘previous’ iteration? (The notation of ‘previous’ here is complicated by multithreading.) Why do we need log.(r) in interpolate? Why ‘smooth’ θ_eval, but not ϕ_eval? What is a? Do we need the extrapolate? …

You should use more descriptive variable names and add comments/documentation.

Hi @eldee, I am trying to make a 3D plot of black hole simulation data. I am converting Kerr-Schild coordinates to Cartesian coordinates.
Bp is the magnetic field ratio (poloidal/toroidal). I need log.(r) in interpolate because r (a geometric series) is not uniform, so I have to make it uniform.
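As a small self-contained check (with synthetic r0 and q, not the thread's data): the logs of a geometric series are uniformly spaced, which is why working with log.(r) turns the stretched radial grid into an evenly spaced one.

```julia
# A geometric grid r[k] = r0 * q^k has ratio q between neighbours,
# so diff(log.(r)) is constant (= log(q)): the grid is uniform in log-space.
r0, q = 1.6, 1.1
r = r0 .* q .^ (0:9)      # geometric, hence non-uniformly spaced
steps = diff(log.(r))     # all equal to log(q), up to floating-point error
spread = maximum(steps) - minimum(steps)
```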

I could not see any improvement in the plot when making changes to ϕ_eval.
a is the spin of the black hole. Yes, we need extrapolate: we interpolate on cx_all, which are the locations where the color is assigned, while evaluating out to cxe_all. Extrapolation could be dropped if I tolerated a little error/approximation in the color positions and interpolated directly on cx_all.

That I too don’t understand, but it gives the correct plot: I have matched it with a scatter plot and every feature looks the same.

I agree that the documentation is a bit too sparse for me (last time I checked, a couple of months ago). So below is a CUDA.jl implementation instead. It’s about 30x faster than the multithreaded CPU version (for an Intel i7-7700K, NVIDIA RTX 3070):

julia> CUDA.@time loaddata_gpu(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, mins, maxs);
  1.620541 seconds (126.47 k CPU allocations: 13.120 MiB, 0.86% gc time) (3.49 k GPU allocations: 10.119 MiB, 0.97% memmgmt time)

About 90% of the time is spent in the interpolation. I’m pretty sure it’s possible to speed that up significantly via better memory patterns (e.g. using shared memory), but this requires more effort than I’m willing to put in.

Code
using HDF5, GLMakie, Interpolations, Statistics, Profile, Base.Threads, BenchmarkTools
using CUDA, StaticArrays
using CUDA: i32

function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br::Array{Float32, 4}, Bθ::Array{Float32, 4}, Bϕ::Array{Float32, 4} = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	# Don't need MeshBlockSize or blocks, as this is embedded in the shape of Bp (and the x_all, ...)
	cx_all::Matrix{Float32} = reshape(collect(x_all), size(x_all))
	cy_all::Matrix{Float32} = reshape(collect(y_all), size(y_all))
	cz_all::Matrix{Float32} = reshape(collect(z_all), size(z_all))
	cxe_all::Matrix{Float32} = reshape(collect(xe_all), size(xe_all))
	cye_all::Matrix{Float32} = reshape(collect(ye_all), size(ye_all))
	cze_all::Matrix{Float32} = reshape(collect(ze_all), size(ze_all))
	return Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all
end

function find_bounds(cxe_all, cye_all, cze_all)
	# No need to collect the cartesian coordinates
	mins = @SVector fill(Inf32, 3)
	maxs = @SVector fill(-Inf32, 3)
	for b = axes(cxe_all, 2)
		@views r, θ, ϕ = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		# Don't add a type declaration, as this will force a conversion from SubArray to Array
		# i.e. 
		#	@views r::Vector{Float32} = cxe_all[:, b]
		# still copies, and will be slower than 
		#	r = cxe_all[:, b]
	
		# For reference, non-separated approach:
		# 	for rr in r, th in θ, ph in ϕ
		#		# No a here?
		# 		x = rr * sin(th) * cos(ph)
		# 		y = rr * sin(th) * sin(ph)
		# 		z = rr * cos(th)
		#		xyz = SA[x, y, z]
		#		mins = min.(mins, xyz)
		#		maxs = max.(maxs, xyz)
		# 	end
		
		# Cf. abraemer's compute_bounds
		extr_r = extrema(r)
		extr_sin_th = extrema(sin, θ)
		extr_cos_th = extrema(cos, θ)
		extr_sin_ph = extrema(sin, ϕ)
		extr_cos_ph = extrema(cos, ϕ)
		
		minmaxs = extrema.((
			(rr * sth * cph for rr in extr_r, sth in extr_sin_th, cph in extr_cos_ph),
			(rr * sth * sph for rr in extr_r, sth in extr_sin_th, sph in extr_sin_ph),
			(rr * cth for rr in extr_r, cth in extr_cos_th)
		))
		mins = min.(mins, first.(minmaxs))
		maxs = max.(maxs, last.(minmaxs))
	end
	println("  " * join(("$v ∈ [$m, $M]" for (v, m, M) in zip(("x", "y", "z"), mins, maxs)), ", "))
	return mins, maxs
end

function loaddata_gpu(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, mins, maxs)
	field_dims = 10 .* (size(Bp)[1:3] .+ 1)
	eval_dims = field_dims #(200, 200,200)
	a = 0.0f0	#0.98f0
	cu_field = CUDA.fill(NaN32, field_dims...)

	voxel_sizes = (field_dims .- 1f0) ./ (maxs .- mins)

	# We need explicit Vectors as on the GPU it seems we require the grid (Abstract)Vectors
	# to be the same type (not a mix of CuArray and SubArray{..., CuArray})
	# (Bp_block can be of a different type)
	logr = Vector{Float32}(undef, size(cx_all, 1))
	θ = Vector{Float32}(undef, size(cy_all, 1))
	ϕ = Vector{Float32}(undef, size(cz_all, 1))
	cu_θ_eval = CuVector{Float32}(undef, eval_dims[2])
	for b = axes(Bp, 4)
		logr .= log.(view(cx_all, :, b))
		θ .= view(cy_all, :, b)
		ϕ .= view(cz_all, :, b)
		Bp_block = view(Bp, :, :, :, b)
		
		itp = interpolate((logr, θ, ϕ), Bp_block, Gridded(Linear()))
		# First on CPU, then adapt to GPU (cf. # https://juliamath.github.io/Interpolations.jl/stable/devdocs/#GPU-Support)
		cu_itp = adapt(CuArray, itp)
		cu_ext_itp = extrapolate(cu_itp, Interpolations.Line())

		@views re, θe, ϕe = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		r_eval = Base.LogRange(re[begin], re[end], eval_dims[1])  # isbits
		cu_θ_eval .= acos.(LinRange(cos(θe[begin]), cos(θe[end]), eval_dims[2]))  # used cos and acos for smooth sampling
		# (Could create an isbits struct with lazy getindex, if we want)
		ϕ_eval = LinRange(ϕe[begin], ϕe[end], eval_dims[3])
		
		kernel = @cuda launch=false _fill_field_kernel!(cu_field, r_eval, cu_θ_eval, ϕ_eval, cu_ext_itp, mins, maxs, voxel_sizes, a)
        config = launch_configuration(kernel.fun)
		nb_eval_elems = prod(eval_dims)
        nb_threads = min(nb_eval_elems, config.threads)
        nb_blocks = clamp(cld(nb_eval_elems, nb_threads), config.blocks, 2^31 - 1)
        kernel(
			cu_field, r_eval, cu_θ_eval, ϕ_eval, cu_ext_itp, mins, maxs, voxel_sizes, a;
			threads = nb_threads, blocks = nb_blocks
		)
	end
	
	return Array(cu_field)
end

function _fill_field_kernel!(field, r_eval, θ_eval, ϕ_eval, ext_itp, mins, maxs, voxel_sizes, a)
	idx = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
	CI = CartesianIndices((length(r_eval), length(θ_eval), length(ϕ_eval)))
	@inbounds while idx <= length(CI)
		ridx, θidx, ϕidx = Tuple(CI[idx])
		rr = r_eval[ridx]; th = θ_eval[θidx]; ph = ϕ_eval[ϕidx]
		sθ, cθ = sincos(th)
	    sϕ, cϕ = sincos(ph)
	    x = (rr * cϕ + a * sϕ) * sθ
	    y = (rr * sϕ - a * cϕ) * sθ
	    z = rr * cθ

		field_ids = trunc.(Int32, (SA[x, y, z] .- mins) .* voxel_sizes) .+ 1
	    field_ids = clamp.(field_ids, 1, size(field))  
		# (Is this clamping necessary, and if so does it make sense?)
		field[field_ids...] = ext_itp(log(rr), th, ph)
		
		idx += blockDim().x * gridDim().x
	end
	return
end

function tdplot(mins, maxs, field)
	finite_vals = filter(!isnan, vec(field))
    q_low, q_high = quantile(finite_vals, (0.05, 0.95))
    fig = Figure(nan_color=(:white, 0.5))
    ax = LScene(fig[1,1], show_axis=false)
	volume!(ax, (m..M for (m, M) in zip(mins, maxs))..., field, transparency=true; colorrange=(q_low, q_high))
	display(fig)
end

function main()
    Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all = readhdf("sane00.prim.01800.athdf")
    mins, maxs = find_bounds(cxe_all, cye_all, cze_all)
    field = loaddata_gpu(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, mins, maxs)
    tdplot(mins, maxs, field)
end

Why do you want to extrapolate the data? I mean, in your data e.g. cxe_all[1, 1] < cx_all[1, 1], so you would indeed need extrapolate. But why do you want to evaluate outside of the data range, where you have less information to rely on to get a good estimate?

Like JM_Beckers in the topic you linked to, I don’t think this is the way to go. If you want to construct a dense field, instead of adding more interpolated samples in spherical space, just work backwards. I.e. for every voxel in field find the corresponding spherical (or Kerr-Schild) coordinates. Then interpolate in Bp to obtain a value. (The mysterious fourth Bp axis does make this a bit more confusing, though.)


@eldee Thank you very much, although I have an Intel GPU.

Because it gives the full sphere; otherwise I get a crack in the sphere. cx_all are the grid cell body-center positions, while cxe_all are the cell interface positions along the x direction.
The fourth Bp axis is the block number, which we iterate over in the @threads for loop.

I couldn’t understand what you are suggesting :innocent:.

I have made it linear by using log.(r), because I am using linear interpolation, Gridded(Linear()).

ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())

I believe this is the one thing that you must understand to make any real progress.

Like many others I have a hard time understanding your code, but as far as I can tell you:

  1. Have data in a polar coordinate system.
  2. Interpolate in the polar coordinate system to get more data points.
  3. Project each point forward to a Cartesian grid and write each data value to the grid.
  4. As a result some grid points may get written multiple times and whatever happened to be written last is your output, whereas other grid points may not be hit at all.

If my understanding is correct, there is a big problem because this is a fundamentally flawed way of resampling data.

The correct way is to

  1. Loop over the Cartesian output grid.
  2. Project each point backwards to the polar coordinate system.
  3. Interpolate or extrapolate from your data to these points.

There may be more intricacies to how you do step 3 best, but this is the general approach you should use.
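The three steps above can be sketched as follows (a minimal illustration; `itp` stands in for an interpolant/extrapolant built on the spherical data, like the thread's ext_itp, and is here just a placeholder callable):

```julia
# Backward resampling: visit every output voxel once, map its Cartesian
# centre back to spherical coordinates, and sample the interpolant there.
function resample_backward!(field, xs, ys, zs, itp)
    for (k, z) in pairs(zs), (j, y) in pairs(ys), (i, x) in pairs(xs)
        r = hypot(x, y, z)
        θ = r == 0 ? zero(r) : acos(z / r)
        ϕ = atan(y, x)                  # in (-π, π]; shift if the knots use [0, 2π)
        field[i, j, k] = itp(r, θ, ϕ)   # each voxel is written exactly once
    end
    return field
end

# Placeholder interpolant: returns the radius, so the result is easy to check.
demo_itp(r, θ, ϕ) = r
field = resample_backward!(zeros(2, 2, 2), [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], demo_itp)
```

In contrast to the forward approach, no voxel is written twice and none is skipped.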

If this is still unclear I would strongly recommend you to find a textbook on the topic of resampling and interpolation. (Sorry, can’t give a recommendation because I studied this a long time ago.)


Hi @eldee, could you please give me an MWE? When I do what you suggested (which I don’t understand clearly), it becomes too slow.

My wrong MWE Code
using GLMakie, Interpolations
r = Float32[1.677044, 1.8385518, 2.0156136, 2.2097273, 2.4225352, 2.6558375, 2.911608, 3.1920104, 3.499417, 3.8364286, 4.205896, 4.6109447, 5.055002, 5.5418243, 6.0755296, 6.660634, 7.3020864, 8.005314];
θ = Float32[0.049087387, 0.14726216, 0.24543692, 0.3436117];
ϕ = Float32[0.19634955, 0.5890486, 0.9817477, 1.3744467, 1.7671459, 2.1598449, 2.552544, 2.9452431, 3.3379421, 3.7306414, 4.12334, 4.5160394, 4.9087386, 5.3014374, 5.6941366, 6.086836];
Bp_block = rand(18,4,16);

function cart(iex,ii,jj,kk,r_eval,θ_eval,ϕ_eval,X,Y,Z)
    rr, th, ph = r_eval[ii], θ_eval[jj], ϕ_eval[kk]
    xy, z = rr .* sincos(th)
    y, x = xy .* sincos(ph)
    X[iex] = x
    Y[iex] = y
    Z[iex] = z
end

function cartesian_to_spherical(x, y, z)
    r = hypot(x,y,z)
    θ = r == 0 ? 0.0 : acos(z / r)
    ϕ = atan(y, x)   # robust quadrant-aware atan2
    return r, θ, ϕ
end


nx,ny,nz = 19,5,17
ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())
r_eval = exp.(LinRange(log(1.6), log(8.373081), nx))
θ_eval = LinRange(0.0, 0.3926991, ny)
ϕ_eval = LinRange(0.0, 6.2831855, nz)
X, Y, Z, C= ntuple(_ -> Vector{Float32}(undef, nx*ny*nz), 4);
global iex::Int32 = 1
@inbounds for ii in 1:nx, jj in 1:ny, kk in 1:nz
	cart(iex,ii,jj,kk,r_eval,θ_eval,ϕ_eval,X,Y,Z)
	global iex += 1
end
#@show size(X), size(unique(X)), size(Y), size(unique(Y)), size(Z), size(unique(Z)) 
color = Array{Float32}(undef, length(unique(X)), length(unique(Y)), length(unique(Z)))
println(size(color))
icx = 1
for x in unique(X), y in unique(Y), z in unique(Z)
	ri,θi,ϕi = cartesian_to_spherical(x, y, z)
	color[icx] = ext_itp(log(ri),θi,ϕi)  # the interpolant was built on log.(r)
	global icx += 1
end	

fig = Figure()
ax::LScene = LScene(fig[1,1], show_axis=false)
volume!(ax, extrema(X), extrema(Y), extrema(Z), color; transparency=true)
fig

You don’t need to store the Cartesian coordinates (in X, …) (also not for the extrema, see Improve performance of this index creation and accessing code - #24 by abraemer) and more importantly, you should loop over your output field cells (voxels), not the discrete samples.

using GLMakie, Interpolations

function get_spherical_data()
	rs = Float32[1.677044, 1.8385518, 2.0156136, 2.2097273, 2.4225352, 2.6558375, 2.911608, 3.1920104, 3.499417, 3.8364286, 4.205896, 4.6109447, 5.055002, 5.5418243, 6.0755296, 6.660634, 7.3020864, 8.005314];
	θs = Float32[0.049087387, 0.14726216, 0.24543692, 0.3436117];
	ϕs = Float32[0.19634955, 0.5890486, 0.9817477, 1.3744467, 1.7671459, 2.1598449, 2.552544, 2.9452431, 3.3379421, 3.7306414, 4.12334, 4.5160394, 4.9087386, 5.3014374, 5.6941366, 6.086836];
	spherical_values = rand(18, 4, 16)
	return rs, θs, ϕs, spherical_values
end

function cartesian_to_spherical(x, y, z)
    r = hypot(x, y, z)
    θ = r == 0 ? 0.0 : acos(z / r)
    ϕ = atan(y, x) + pi   # In [0, 2pi) (like ϕs)
    return r, θ, ϕ
end

function spherical_to_cartesian(r, θ, ϕ)
    x = r * sin(θ) * cos(ϕ)
	y = r * sin(θ) * sin(ϕ)
	z = r * cos(θ)
	return x, y, z
end

function find_cartesian_bounds(rs, θs, ϕs)
	extr_r = extrema(rs)
	extr_sin_th = extrema(sin, θs)
	extr_cos_th = extrema(cos, θs)
	extr_sin_ph = extrema(sin, ϕs)
	extr_cos_ph = extrema(cos, ϕs)
	
	minmaxs = extrema.((
		(r * sth * cph for r in extr_r, sth in extr_sin_th, cph in extr_cos_ph),
		(r * sth * sph for r in extr_r, sth in extr_sin_th, sph in extr_sin_ph),
		(r * cth for r in extr_r, cth in extr_cos_th)
	))

	return first.(minmaxs), last.(minmaxs)
end

function create_cartesian_field(mins, maxs, resolution, rs, θs, ϕs, spherical_values)
	# mins, maxs determine the extent of the cartesian grid
	# resolution is the number of grid cells per axis
	cartesian_values = Array{Float32, 3}(undef, resolution)
	steps = (maxs .- mins) ./ (resolution .- 1)
	
	# There's bound to be a proper way to do periodic interpolation (not based on boundary points, 
	# as it is in https://juliamath.github.io/Interpolations.jl/stable/iterate/#Periodic).
	# You could always create a wrapper class (same for the log, if you want).
	# But here's an ugly workaround, where we first interpolate to find values for ϕ = 0
	itp_ϕ = interpolate((rs, θs, [ϕs[end] - 2pi, ϕs[begin]]), spherical_values[:, :, [end, begin]], Gridded(Linear()))
	ϕs = [0.f0; ϕs; Float32(2*pi)]
	spherical_values_ϕ0 = reshape([itp_ϕ(r, θ, 0.f0) for r in rs, θ in θs], length(rs), length(θs), 1)
	spherical_values = cat(spherical_values_ϕ0, spherical_values, spherical_values_ϕ0, dims=3)
	itp = extrapolate(interpolate((rs, θs, ϕs), spherical_values, Gridded(Linear())), 0.f0)
	# Don't extrapolate: 0.f0 is transparent
	
	# Could use a single for I = CartesianIndices(cartesian_values) loop,
	# but then we won't be able to reuse the index calculations
	# (though I doubt this will make a measurable impact on performance)
	z = mins[3]
	for k = 1:resolution[3]
		y = mins[2]
		for j = 1:resolution[2]
			x = mins[1]
			for i = 1:resolution[1]
				r, θ, ϕ = cartesian_to_spherical(x, y, z)
				cartesian_values[i, j, k] = itp(r, θ, ϕ)
				x += steps[1]
			end
			y += steps[2]
		end
		z += steps[3]
	end
	return Colon().(mins, steps, maxs)..., cartesian_values  # xs, ys, zs, cartesian_values
end


function main()
	rs, θs, ϕs, spherical_values = get_spherical_data()
	mins, maxs = find_cartesian_bounds(rs, θs, ϕs)
	resolution = (100, 100, 100)
	
	xs, ys, zs, cartesian_values = create_cartesian_field(
		mins, maxs, resolution, rs, θs, ϕs, spherical_values
	)

	fig = Figure()
	ax = LScene(fig[1,1], show_axis=false)
	volume!(ax, (m..M for (m, M) in zip(mins, maxs))..., cartesian_values; transparency=true)
        # Apparently volume!(ax, xs, ys, zs, cartesian_values) is deprecated.
        # Since the xs are spaced uniformly, the extrema do suffice
	display(fig)  # (or wait(...), when used in a script)
end


We should close the HDF5 file handle.

function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br, Bθ, Bϕ = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	MeshBlockSize = read((HDF5.attributes(h5))["MeshBlockSize"])
	blocks::Int32 = read(HDF5.attributes(h5),"NumMeshBlocks")

	close(h5) # close the file handle

	return Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks
end

Perhaps an easier way to do this would be to use a do block closure.

function readhdf(h5file::String)
	return h5open(h5file, "r") do h5
		B = h5["B"]
		@views Br, Bθ, Bϕ = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
		Bp = @. hypot(Br, Bθ) / Bϕ
		x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
		xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
		MeshBlockSize = read((HDF5.attributes(h5))["MeshBlockSize"])
		blocks::Int32 = read(HDF5.attributes(h5),"NumMeshBlocks")
		return Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks
	end
end


@eldee Thanks. I have difficulty understanding this interpolation part.

Please improve the code and try it with this different dataset. It fills a lot of RAM when iterating over blocks.

Volume code
using GLMakie, Interpolations, HDF5, Base.Threads
@time begin
function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br, Bθ, Bϕ = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	MeshBlockSize = read((HDF5.attributes(h5))["MeshBlockSize"])
	blocks::Int32 = 1	#read(HDF5.attributes(h5),"NumMeshBlocks")
	return Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks
end

Bp, x_all, y_all, z_all, xe_all, ye_all, ze_all, MeshBlockSize, blocks = readhdf("sane98.prim.01900.athdf")

function cartesian_to_spherical(x, y, z)
    r = hypot(x, y, z)
    θ = r == 0 ? 0.0 : acos(z/r)
    ϕ = mod2pi(atan(y, x))   # In [0, 2pi) (like ϕs)
    return r, θ, ϕ
end

function spherical_to_cartesian(r, θ, ϕ)
    x = r * sin(θ) * cos(ϕ)
	y = r * sin(θ) * sin(ϕ)
	z = r * cos(θ)
	return x, y, z
end

function find_cartesian_bounds(rs, θs, ϕs)
	extr_r = extrema(rs)
	extr_sin_th = extrema(sin, θs)
	extr_cos_th = extrema(cos, θs)
	extr_sin_ph = extrema(sin, ϕs)
	extr_cos_ph = extrema(cos, ϕs)
	minmaxs = extrema.(((r * sth * cph for r in extr_r, sth in extr_sin_th, cph in extr_cos_ph),
		(r * sth * sph for r in extr_r, sth in extr_sin_th, sph in extr_sin_ph),
		(r * cth for r in extr_r, cth in extr_cos_th)))
	return first.(minmaxs), last.(minmaxs)
end

function create_cartesian_field(mins, maxs, resolution, rs, θs, ϕs, Bp_block)
	# minx, maxs determine the extent of the cartesian grid
	# resolution is the number of grid cells per axis
	cartesian_values = fill(NaN, resolution)
	steps = (maxs .- mins) ./ (resolution .- 1)
	
	# There's bound to be a proper way to do periodic interpolation (not based on boundary points, 
	# as it is in https://juliamath.github.io/Interpolations.jl/stable/iterate/#Periodic).
	# You could always create a wrapper class (same for the log, if you want).
	# But here's an ugly workaround, where we first interpolate to find values for ϕ = 0
	itp_ϕ = interpolate((rs, θs, [ϕs[end] - 2pi, ϕs[begin]]), Bp_block[:, :, [end, begin]], Gridded(Linear()))
	ϕs = [0.f0; ϕs; Float32(2*pi)]
	Bp_block_ϕ0 = reshape([itp_ϕ(r, θ, 0.f0) for r in rs, θ in θs], length(rs), length(θs), 1)
	Bp_block = cat(Bp_block_ϕ0, Bp_block, Bp_block_ϕ0, dims=3)
	itp = extrapolate(interpolate((rs, θs, ϕs), Bp_block, Gridded(Linear())), NaN)
	
	# I replaced 0.f0 with NaN.
	# Could use a single for I = CartesianIndices(cartesian_values) loop,
	# but then we won't be able to reuse the index calculations
	# (though I doubt this will make a measurable impact on performance)
	z = mins[3]
	for k = 1:resolution[3]
		y = mins[2]
		for j = 1:resolution[2]
			x = mins[1]
			for i = 1:resolution[1]
				r, θ, ϕ = cartesian_to_spherical(x, y, z)
				cartesian_values[i, j, k] = itp(r, θ, ϕ)
				x += steps[1]
			end
			y += steps[2]
		end
		z += steps[3]
	end
	return Colon().(mins, steps, maxs)..., cartesian_values  # xs, ys, zs, cartesian_values
end

fig = Figure()
ax = LScene(fig[1,1], show_axis=false)
for b in 2000    #1:blocks
	@views r, θ, ϕ = x_all[:, b], y_all[:, b], z_all[:, b]
	@views re, θe, ϕe = xe_all[:, b], ye_all[:, b], ze_all[:, b]
	@views Bp_block = Bp[:, :, :, b]
	mins, maxs = find_cartesian_bounds(re, θe, ϕe)
	resolution = (100, 100, 100)
	xs, ys, zs, cartesian_values = create_cartesian_field(mins, maxs, resolution, r, θ, ϕ, Bp_block)
	volume!(ax, (m..M for (m, M) in zip(mins, maxs))..., cartesian_values; transparency=true)
end
display(fig)
end
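
As a small standalone sanity check (not from the thread; Base Julia only), the two coordinate conversions above should be inverses of each other once ϕ is reduced with `mod2pi`:

```julia
# Round-trip check for the thread's coordinate conversions, with the mod2pi fix.
function cartesian_to_spherical(x, y, z)
    r = hypot(x, y, z)
    θ = r == 0 ? 0.0 : acos(z / r)
    ϕ = mod2pi(atan(y, x))  # in [0, 2π), matching the ϕs grid
    return r, θ, ϕ
end

function spherical_to_cartesian(r, θ, ϕ)
    return r * sin(θ) * cos(ϕ), r * sin(θ) * sin(ϕ), r * cos(θ)
end

for (r, θ, ϕ) in ((1.0, 0.3, 0.1), (5.0, 2.0, 4.0), (100.0, 1.5, 6.2))
    x, y, z = spherical_to_cartesian(r, θ, ϕ)
    r2, θ2, ϕ2 = cartesian_to_spherical(x, y, z)
    @assert isapprox(r, r2; atol=1e-9)
    @assert isapprox(θ, θ2; atol=1e-9)
    @assert isapprox(ϕ, ϕ2; atol=1e-9)
end
println("round trip ok")
```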

We have to extrapolate out to re, θe, ϕe = xe_all[:, b], ye_all[:, b], ze_all[:, b]. Is there still an error hidden somewhere? I have doubts about the extrapolation part.


I noticed that for this different data set the scatter plot doesn’t match the volume plot, so there must be an error somewhere.

Scatter code
using HDF5, GLMakie, LinearAlgebra, Statistics

GLMakie.activate!(ssao = true)
GLMakie.closeall()

function plothdf_scatter(h5file::String)
    h5 = h5open(h5file, "r")
    num_blocks::Int32 = read(HDF5.attributes(h5), "NumMeshBlocks")
    x_all::Matrix{Float32} = read(h5["x1v"])
    y_all::Matrix{Float32} = read(h5["x2v"])
    z_all::Matrix{Float32} = read(h5["x3v"])
    B::Array{Float32, 5} = read(h5["B"])
    close(h5)
    Br, Bθ, Bϕ = @views B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
    Bp = @. hypot(Br, Bθ) / (Bϕ + eps(Float32))   # avoid divide-by-zero
	
    fig = Figure(size = (1000, 800))
    ax = LScene(fig[1, 1]; show_axis = false)
    
    for b in 2000#:num_blocks
        r = x_all[:, b]
        θ = y_all[:, b]
        ϕ = z_all[:, b]
        Nx, Ny, Nz = length(r), length(θ), length(ϕ)
        Ntot = Nx * Ny * Nz
        
        # preallocate fixed-size Float32 arrays
        X, Y, Z, V = ntuple(_ -> Vector{Float32}(undef, Ntot), 4)

        idx = 1
        for i in 1:Nx, j in 1:Ny, k in 1:Nz
            ri, th, ph = r[i], θ[j], ϕ[k]
            X[idx] = ri * sin(th) * cos(ph)
            Y[idx] = ri * sin(th) * sin(ph)
            Z[idx] = ri * cos(th)
            V[idx] = abs(Bp[i, j, k, b]) + eps(Float32)
            idx += 1
        end
        V ./= maximum(abs, V)  # normalize per block
        cr = extrema(V)        # colorrange must match the normalized values
        scatter!(ax, X, Y, Z; markersize = 50, color = V, colormap = :hawaii, colorrange = cr, colorscale = log10, transparency = true)
    end
    fig
end

@time plothdf_scatter("sane98.prim.01900.athdf")

Meanwhile, my earlier code, which @JM_Beckers termed conceptually wrong, gives a volume plot similar to the scatter plot. :thinking:

My earlier conceptually wrong code
using HDF5, GLMakie, Interpolations, Statistics, Base.Threads, BenchmarkTools
@time begin
function readhdf(h5file::String)
	h5 = h5open(h5file, "r")
	B = h5["B"]
	@views Br::Array{Float32, 4}, Bθ::Array{Float32, 4}, Bϕ::Array{Float32, 4} = B[:,:,:,:,1], B[:,:,:,:,2], B[:,:,:,:,3]
	Bp = @. hypot(Br, Bθ) / Bϕ
	x_all, y_all, z_all = h5["x1v"], h5["x2v"], h5["x3v"]
	xe_all, ye_all, ze_all = h5["x1f"], h5["x2f"], h5["x3f"]
	MeshBlockSize::Vector{Int32} = read((HDF5.attributes(h5))["MeshBlockSize"])
	blocks::Int32 = read(HDF5.attributes(h5),"NumMeshBlocks") 
	cx_all::Matrix{Float32} = reshape(collect(x_all), size(x_all))
	cy_all::Matrix{Float32} = reshape(collect(y_all), size(y_all))
	cz_all::Matrix{Float32} = reshape(collect(z_all), size(z_all))
	cxe_all::Matrix{Float32} = reshape(collect(xe_all), size(xe_all))
	cye_all::Matrix{Float32} = reshape(collect(ye_all), size(ye_all))
	cze_all::Matrix{Float32} = reshape(collect(ze_all), size(ze_all))
	return Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks
end

Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks = readhdf("sane98.prim.01900.athdf")

function bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	XX,YY,ZZ = ntuple(_ -> zeros(Float32, 1* prod(MeshBlockSize.+1)), 3)
	icx=1
	for b in 2000	#1:blocks
		@views r::Vector{Float32}, θ::Vector{Float32}, ϕ::Vector{Float32} = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		for rr in r, th in θ, ph in ϕ
			XX[icx] = rr * sin(th) * cos(ph)
			YY[icx] = rr * sin(th) * sin(ph)
			ZZ[icx] = rr * cos(th)
			icx += 1
		end
	end
	xmin,xmax = extrema(XX); ymin,ymax = extrema(YY); zmin,zmax = extrema(ZZ)
	println("  x ∈ [$xmin, $xmax], y ∈ [$ymin, $ymax], z ∈ [$zmin, $zmax]")
	return xmin, xmax, ymin, ymax, zmin, zmax
end
	
xmin, xmax, ymin, ymax, zmin, zmax = bound(cxe_all, cye_all, cze_all, MeshBlockSize, blocks)
	

function clp(X,Y,Z,C,xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)	
	xi = trunc(Int, (X-xmin) * sx) + 1
	yi = trunc(Int, (Y-ymin) * sy) + 1
	zi = trunc(Int, (Z-zmin) * sz) + 1
	ix = clamp(xi, 1, nx)
	iy = clamp(yi, 1, ny)
	iz = clamp(zi, 1, nz)
	field[ix, iy, iz] = C
end
	
function kerr2cart(iex,a,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
	rr, th, ph = r_eval[ii], θ_eval[jj], ϕ_eval[kk]
	sθ, cθ = sincos(th)
    sϕ, cϕ = sincos(ph)
    X[iex] = (rr*cϕ + a*sϕ) * sθ
    Y[iex] = (rr*sϕ - a*cϕ) * sθ
    Z[iex] = rr * cθ
	C[iex] = ext_itp(log(rr), th, ph)
end

function loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)
	nx,ny,nz = 10(MeshBlockSize.+1)
	nxs,nys,nzs = nx,ny,nz	#300, 200,300
	a::Float32 = 0.0	#0.98
	field = fill(NaN32, nx,ny,nz)

	sx::Float32 = (nx-1)/(xmax-xmin); sy::Float32 = (ny-1)/(ymax-ymin); sz::Float32 = (nz-1)/(zmax-zmin)
	X, Y, Z, C = ntuple(_ -> zeros(Float32, nxs*nys*nzs), 4)

	@threads for b in 2000	#1:blocks
		@views r, θ, ϕ = cx_all[:, b], cy_all[:, b], cz_all[:, b]
		@views re, θe, ϕe = cxe_all[:, b], cye_all[:, b], cze_all[:, b]
		@views Bp_block = Bp[:, :, :, b]
		ext_itp = extrapolate(interpolate((log.(r), θ, ϕ), Bp_block, Gridded(Linear())), Interpolations.Line())
		r_eval = exp.(LinRange(log(re[1]), log(re[end]), nxs))
		θ_eval = acos.(LinRange(cos(θe[1]), cos(θe[end]), nys))		# used cos and acos for smooth sampling
		ϕ_eval = LinRange(ϕe[1], ϕe[end], nzs)
		iex::Int32 = 1
		@inbounds for ii in 1:nxs, jj in 1:nys, kk in 1:nzs
			kerr2cart(iex,a,ii,jj,kk,r_eval,θ_eval,ϕ_eval,ext_itp,X,Y,Z,C)
			clp(X[iex],Y[iex],Z[iex],C[iex],xmin,ymin,zmin,sx,sy,sz,nx,ny,nz,field)
			iex += 1
		end
	end
	return field
end

field = loaddata(Bp, cx_all, cy_all, cz_all, cxe_all, cye_all, cze_all, MeshBlockSize, blocks, xmin, xmax, ymin, ymax, zmin, zmax)

function tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
	finite_vals = filter(!isnan, vec(field))
    q_low, q_high = quantile(finite_vals, (0.05, 0.95))
    fig = Figure(nan_color=(:white, 0.5))
    ax::LScene = LScene(fig[1,1], show_axis=false)
	volume!(ax, xmin..xmax, ymin..ymax, zmin..zmax, field, transparency=true; colorrange=(q_low, q_high))
	display(fig)
end

tdplot(xmin, xmax, ymin, ymax, zmin, zmax, field)
end

Honestly, this sounds very entitled to me. You don’t understand the code, therefore it must be bad, and thus it must be fixed (by others).

In any case, the ‘problem’ is we have periodicity (\phi = \phi + 2k \pi for any angle \phi and k \in \mathbb{Z}), which is not known by Interpolations.jl. So if we want to interpolate for (e.g.) \phi = 0, which falls ‘in between’ ϕs[end] == 6.086836f0 and ϕs[begin] == 0.19634955f0, we would get a BoundsError. You could circumvent this using extrapolation, but this is silly, as we do have information on both sides. So instead, we explicitly use ϕs[end] - 2pi \approx -0.196 and ϕs[begin] to get the values at \phi = 0. Then we enlarge ϕs with 0 and 2pi, and spherical_values with these extra interpolated values. At this point we no longer have issues with later interpolation, since every phi we get in the loop lies between 0 and 2pi.

On that note, my line

ϕ = atan(y, x) + pi  # In [0, 2pi) (like ϕs)

in cartesian_to_spherical was incorrect, and should instead read

ϕ = mod2pi(atan(y, x))  # In [0, 2pi) (like ϕs)

Now as I mentioned as a comment, the periodic interpolation approach is still a workaround (which needlessly allocates an extra potentially big Array), and a proper solution would be to either implement this type of periodicity directly into Interpolations.jl, or to create a wrapper which when called reduces \bmod 2 \pi and if necessary (when you fall below the first or above the last reference domain value) also considers the first and last reference function values.
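
Here is a minimal sketch of such a wrapper, under the assumption that the underlying interpolant is any callable `(r, θ, ϕ) -> value` defined for ϕ between the first and last grid angles. The type and field names are made up, and a plain function stands in for the Interpolations.jl object so the example is self-contained:

```julia
# Illustrative sketch: wrap a callable interpolant so the ϕ axis is reduced
# mod 2π, and queries in the "seam" between ϕs[end] and ϕs[begin] + 2π are
# linearly blended across the periodic boundary.
struct PeriodicPhi{F}
    itp::F          # callable (r, θ, ϕ) -> value, defined for ϕ ∈ [ϕlo, ϕhi]
    ϕlo::Float64
    ϕhi::Float64
end

function (p::PeriodicPhi)(r, θ, ϕ)
    ϕ = mod2pi(ϕ)
    p.ϕlo <= ϕ <= p.ϕhi && return p.itp(r, θ, ϕ)
    # In the seam: blend the values at the last and (2π-shifted) first angles
    gap = (p.ϕlo + 2π) - p.ϕhi
    t = mod2pi(ϕ - p.ϕhi) / gap   # fraction of the way across the seam
    return (1 - t) * p.itp(r, θ, p.ϕhi) + t * p.itp(r, θ, p.ϕlo)
end

# Stand-in "interpolant" for demonstration (no Interpolations.jl needed):
f(r, θ, ϕ) = r + θ + ϕ
p = PeriodicPhi(f, 0.196, 6.087)
p(1.0, 0.5, 3.0)   # interior: just forwards to f → 4.5
p(1.0, 0.5, 0.0)   # seam: blend of f(..., 6.087) and f(..., 0.196)
```

This avoids the extra allocation of the enlarged array, at the cost of two interpolant evaluations per seam query.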


The only significant amount of memory you really need (which cannot be freed) is the one from cartesian_values. Unless you want to start playing around with e.g. sparse octrees, you cannot get around this. Of course, the resolution parameter allows you to tune the size of cartesian_values.


Why? If you just want to visualise your data, you don’t need this extrapolation. So what do you even want to achieve? And, again, why? How do we know you’re not just asking us to solve your homework or whatever? Also note that we’ve strayed far from the original topic.


You should really first know where this data comes from, and what it all means, in particular the final axis. Also, either this is publicly available data, in which case you should refer us to something like a project page instead of your personal university sharepoint. Or it is not freely available, and you should probably not share it.

There was an error in cartesian_to_spherical. But also, I’m not sure what I would expect to get when overlaying multiple volume plots in Makie.jl. Do you? Ideally, (for algorithm = :mip) I suppose it should take the (pixel/raywise) maximum of all individual plots, but from a quick test, I doubt it does. (Also, what would (you want to) happen if you mix different algorithms?)

Coming back to what the data means, does it consist of point samples, or of (averaged) volumetric samples? Should you visualise them as points, or as a volume? (How) are we supposed to reduce over the final axis? If there is a conceptually sound way to reduce, you only need to keep one cartesian_values field, which will presumably solve your memory issues (assuming volume and not scatter is the way to go).

4 Likes

I want to get a volume plot. I would like to use the :mip and :iso algorithms, etc. This is Athena++ data and it is a cell-centered dataset. Yes, I only need to keep the cartesian_values field. I think it would be better to calculate the Cartesian bounds over all the blocks and create a single cartesian_values field to plot, instead of calculating a cartesian_values field for each block separately.

Is the above statement correct? 0 ∉ [0.19634955f0, 6.086836f0]

In my data ϕ always lies between 0 and 2pi and it never goes beyond 2pi.

Because re, θe, ϕe (the cell-interface positions) contain points which lie outside r, θ, ϕ (the cell-center positions). Notice the difference between the r, θ, ϕ and re, θe, ϕe values. If you plot more than one block, you will notice the gap left between them.


There is another way to find the bounds of all the data, as the extreme values are stored in the file. But it will still be better to calculate them per block, since I may want to plot blocks separately if needed.

RootGridX1 = read((HDF5.attributes(h5))["RootGridX1"]) = [1.6, 1200.0, 1.0963050292915717]    #last quantity is Geometric ratio in that direction.
RootGridX2 = read((HDF5.attributes(h5))["RootGridX2"]) = [0.0, 3.141592653589793, 1.0]
RootGridX3 = read((HDF5.attributes(h5))["RootGridX3"]) = [0.0, 6.283185307179586, 1.0]
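
The attribute values suggest (my assumption; the Athena++ docs should confirm) that the third entry of RootGridX1 is the ratio between successive cell widths. If so, the root-grid cell faces can be reconstructed from [rmin, rmax, ratio] plus the cell count nr, which is not stored in these three numbers (nr = 72 below is inferred from 1200/1.6 ≈ ratio^72, so treat it as a guess):

```julia
# Hedged sketch: assume RootGridX1 = [rmin, rmax, ratio] with face spacings
# w, w*ratio, ..., w*ratio^(nr-1) that must sum to rmax - rmin.
function geometric_faces(rmin, rmax, ratio, nr)
    ratio ≈ 1 && return collect(range(rmin, rmax; length=nr + 1))
    w = (rmax - rmin) * (ratio - 1) / (ratio^nr - 1)  # first cell width
    faces = Vector{Float64}(undef, nr + 1)
    faces[1] = rmin
    for k in 1:nr
        faces[k + 1] = faces[k] + w * ratio^(k - 1)
    end
    return faces
end

faces = geometric_faces(1.6, 1200.0, 1.0963050292915717, 72)
# By construction faces[1] == 1.6 and faces[end] ≈ 1200.0
```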

Assuming that the blocks correspond to disjoint subsets of the (spherical) space, indeed, just create a big Cartesian grid. Every voxel here will correspond to at most one point within the interpolated spherical data. In particular if the blocks form a (disjoint) tiling you immediately know in which block (or blocks, if you’re on the boundary) to look for the interpolation. But if this were the case, why have blocks at all?

However, if the blocks overlap you can (and probably should?) perform your interpolation in multiple blocks, and need to combine these somehow, in a way that makes sense (which again depends on what the blocks mean).
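
If the blocks do overlap, one simple combination rule is averaging: accumulate per-voxel sums and counts while merging each block's interpolated field into the shared grid, then divide at the end. A hypothetical sketch (`merge_block!` and the toy arrays are made up for illustration):

```julia
# Merge one block's interpolated values into a shared grid, skipping NaNs
# (voxels the block does not cover) and counting contributions for averaging.
function merge_block!(sum_field, count_field, block_values)
    for I in eachindex(block_values)
        v = block_values[I]
        if !isnan(v)
            count_field[I] += 1
            sum_field[I] = isnan(sum_field[I]) ? v : sum_field[I] + v
        end
    end
end

sum_field = fill(NaN32, 4, 4, 4); count_field = zeros(Int, 4, 4, 4)
blockA = fill(NaN32, 4, 4, 4); blockA[1, 1, 1] = 2.0f0
blockB = fill(NaN32, 4, 4, 4); blockB[1, 1, 1] = 4.0f0
merge_block!(sum_field, count_field, blockA)
merge_block!(sum_field, count_field, blockB)
merged = sum_field ./ max.(count_field, 1)   # stays NaN where no block contributed
merged[1, 1, 1]  # → 3.0f0, the average of the two overlapping blocks
```

Whether averaging is the right reduction again depends on what the blocks physically represent.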

Yes, that’s the point. Draw these angles on a unit circle.

ϕ has been used for both a Vector and a scalar, so I’m not sure what you mean here. But the scalar version, corresponding to some point in Cartesian coordinates, can lie outside of the range of the reference Vector version (which only spans a strict subset of [0, 2\pi)). This means the interpolation is not straightforward any more, cf. supra.

I don’t know what the concepts of cell-interface or cell-center positions mean. But the question remains, if all you want to do is visualise the data, why do you apparently need to get values at specific predetermined positions (the _e versions)?

Presumably because you are not interpolating across block boundaries.