Optimizing the calculation of many distances

jsbyrnes · March 16, 2021, 9:59pm

Hey everyone, I’m translating some code from Julia to MATLAB and I have a seismic imaging program that now works in Julia and is a bit faster than MATLAB - but not a ton. The profiler says that the function pasted below takes up the bulk of the calculation - is there is an obvious Julia error I’m making? The three list comprehensions are the vast bulk of the time.

New to Julia so the answer may be simple. I edited this so that the code includes the constructors and some dummy constructors to make the code executable.

This code is a simple interpolation scheme for pasting values from the model (scattered points in x, y with a value v in three separate fields) onto ray paths defined by xy coodinates, ray.x and ray.y. Then the last line calculates the time of a seismic phase through a medium with the seismic velocity (u, which is 1/velocity) and two anisotropic parameters (A, the magnitude of anisotropy, t, fast direction).

I’ve tried some MATLAB tricks like reducing references to structures and substructures, or the Julia trick of rewriting the loop to use column-major matrices, but with little to no improvement.

For context, ray and model are immutable structures and I’m storing stuff in their fields that are defined as Float64 arrays with 1 dimension (hence the broadcasting when storing stuff in ray’s fields).

struct mod_sub
 n::Array{Float64,1}
 x::Array{Float64,1}
 y::Array{Float64,1}
 v::Array{Float64,1}
end

struct model_const
 u::mod_sub
 A::mod_sub
 t::mod_sub
 error::Array{Float64,1}
 BIC::Array{Float64,1}
 fit::Array{Float64,1}
 nfit::Array{Float64,1}
 llh::Array{Float64,1}
end

struct ray_const
 L::Array{Float64,1}
 u::Array{Float64,1}
 A::Array{Float64,1}
 theta::Array{Float64,1}
 phi::Array{Float64,1}
 t::Array{Float64,1}
 x::Array{Float64,1}
 y::Array{Float64,1}
 sta::Float64
 evt::Float64
 pick::Array{Float64,1}
end

function raytracer!( )

r = 1:20

model = model_const( mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), rand(Float64, 8)),
    mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), 0.1*rand(Float64, 8)),
    mod_sub(8*ones(1), 100*rand(Float64, 8), 100*rand(Float64, 8), pi*rand(Float64, 8) .+ -pi),
        0.01*ones(1), ones(1), ones(1), ones(1), ones(1))

ray = ray_const(ones(length(r) - 1), ones(length(r) - 1), ones(length(r) - 1), ones(length(r) - 1),
        ones(length(r) - 1), ones(1), range(0, 100, length=20), range(0, 100, length=20), 1, 1, ones(1))

np = length(ray.x)

u = zeros(np)
A = zeros(np)
t = zeros(np)

for k = 1:np

    ru = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.u.x, my in model.u.y]
    rA = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.A.x, my in model.A.y]
    rt = [ ((ray.x[k] - mx).^2) .+ ((ray.y[k] - my).^2) for mx in model.t.x, my in model.t.y]

    u[k] = sum((model.u.v ./ ru))./sum(1 ./ ru)
    A[k] = sum((model.A.v ./ rA))./sum(1 ./ rA)
    t[k] = sum((model.t.v ./ rt))./sum(1 ./ rt)

end

dx      = diff(ray.x);
dy      = diff(ray.y);

ray.L   .= sqrt.(dx.^2 .+ dy.^2)
ray.phi .= atan.(dy./dx)

ray.u     .= 0.5*(2*u[1:(length(u)-1)] .+ diff(u));
ray.A     .= 0.5*(2*A[1:(length(A)-1)] .+ diff(A));
ray.theta .= 0.5*(2*t[1:(length(t)-1)] .+ diff(t));

ray.t .= sum(ray.L.*(ray.u.*(1 .+ ray.A.*cos.(2*(ray.phi .- ray.theta)))))

return ray

end

lmiq · March 16, 2021, 10:16pm

If you can provide a minimal working example it will be much easier to help. There are many thing that can be improved apparently, but what really matters will depend on the sizes of your arrays, for example how large is np or the arrays you are allocating there as ru, rt, ...

jsbyrnes · March 16, 2021, 10:18pm

Yea, was doing that as you typed, sorry about that. Edited version should be executable.

jsbyrnes · March 16, 2021, 10:22pm

I think the ray.x[k] - mx, etc term changes on each iteration of the loop and each list comprehension since mx is defined by model fields on each of the three lines. But not sure if this is optimal. Tried vectorizing it, then found the part of the performance guide that says that probably won’t help in Julia and didn’t get much speed up.

jling · March 16, 2021, 10:24pm

you’re right, I mis-read your original code.

jsbyrnes · March 16, 2021, 10:25pm

no worries, thanks for responding. Appreciate it.

jling · March 16, 2021, 10:31pm

    for k = 1:np

        helper(x,y) = ((ray.x[k] - x).^2) .+ ((ray.y[k] - y).^2)
        ru = map(helper,model.u.x, model.u.y)
        rA = map(helper,model.A.x, model.A.y)
        rt = map(helper,model.t.x, model.t.y)

this seems to help a bit.

Also this saves some allocation:

sum(1 ./ ru) ->
sum(inv, ru)

jsbyrnes · March 16, 2021, 11:08pm

order of magnitude improvement with @btime on the full code - can’t thank you enough my friend.

Jeff_Emanuel · March 16, 2021, 11:15pm

(ray.x[k] - mx).^2 stays the same for each element of the my in ... iterations. Similarly, ((ray.y[k] - my).^2) stays the same for each value of mx. You can compute those separately and then combine them so you would calculate (r-m)^2 roughly n times instead of n^2, but you need to (re-)use pre-allocated storage so that extra allocations to store the sub-expressions don’t eat any gains.

This doesn’t reuse pre-allocated storage, but it demonstrates the idea:

rmx2 = [(ray.x[k] - mx).^2 for mx in model.u.x]
rmy2 = [(ray.y[k] - my).^2 for mx in model.u.y]
ru = [x+y for x in rmx2, y in rmy2]

rdeits · March 16, 2021, 11:18pm

By the way, this is unnecessary in Julia (unlike Matlab), so there’s no need to worry about it. In fact, adding more structure to your data can help improve clarity and even improve performance when it gives the compiler more information about what’s going on.

jsbyrnes · March 16, 2021, 11:26pm

this about Julia

WschW · March 16, 2021, 11:32pm

I am noticing a lot of calls to ones(1). If the size of these are fixed I would suggest either making the struct mutable and storing the mutable values directly in it, or use Base.RefValue instead of Array

Jeff_Emanuel · March 16, 2021, 11:32pm

MATLAB port showing through.

jsbyrnes · March 16, 2021, 11:34pm

Yea just started using Julia. No one in my corner of geophysics has really caught on to the fact that there’s a high level language with speeds like C so everyone is stuck on MATLAB. Trying to fix that.

jling · March 16, 2021, 11:38pm

oops I think I have a bug that map produces a vector instead of a matrix. The speedup from this map is just an illusion

lmiq · March 16, 2021, 11:38pm

Didn’t have time to do more, but: if your vectors of modsub are actually of size 8, it is possible that you can use StaticArrays and improve the performance quite a bit.

Also, you can use Parameters and make the code much clearer. Here is an example of how it can look for your first two structures:

using Parameters
using StaticArrays

@with_kw struct ModSub{N}
 n::Int = N
 x::SVector{N,Float64} = 100*rand(SVector{N,Float64})
 y::SVector{N,Float64} = 100*rand(SVector{N,Float64})
 z::SVector{N,Float64} = rand(SVector{N,Float64})
end

@with_kw struct ModelConst{N}
 u::ModSub{N} = ModSub{N}()
 A::ModSub{N} = ModSub{N}()
 t::ModSub{N} = ModSub{N}()
 error::Float64 = 0.01
 BIC::Float64 = 1.0
 fit::Float64 = 1.0
 nfit::Float64 = 1.0
 llh::Float64 = 1.0
end

# A default ModSub of length 8 can be defined with:
m = ModSub{8}()

# This is how you will, then, initialize your model, changing some parameters
# of the other ModSubs
n=8
model = ModelConst{n}( u = ModSub{n}(),
                       A = ModSub{n}(z = 0.1*rand(SVector{n,Float64})),
                       t = ModSub{n}(z = π*rand(SVector{n,Float64}) - π) )

you can follow the same line with RayConst (structs are usually defined with uppercases like that in Julia).

Edit: the use of static arrays is optional. You can use Parameters to organize the code with normal arrays as well. Note that Array{Float64,1} is the same as Vector{Float64}

Oscar_Smith · March 16, 2021, 11:43pm

Also worth noting that having n::Int = N doesn’t provide any benefit. (and just slightly increases memory usage).

lmiq · March 16, 2021, 11:46pm

Maybe if that is used for something else, it is cumbersome to recover that from the type signature, isn’t it?

Jeff_Emanuel · March 17, 2021, 12:05am

I haven’t looked at this at all, but maybe you’ll find it useful https://github.com/SeismicJulia

Peter_Adelman · March 17, 2021, 2:34am

I hacked on it a bit here, leaving the evolution for clarity.

@btime comes from the BenchmarkTools package and otherwise I stayed within base and left all variable types alone / tried not to introduce too many assumptions (I think length(ModSub.x) == length(ModSub.v) might be required always anyway).

5x speedup and removed >99% of allocations, which will help you a lot if this is a function you are going to call frequently.

Topic		Replies	Views
Making a Function FASTER! New to Julia question	26	2372	November 14, 2020
Performance improvements for simple molecular surface caclulation Performance	15	644	September 14, 2020
Ray Tracing in a week-end - Julia vs SIMD-optimized C++ Performance gpu , graphics , benchmark , pluto , staticarrays	81	14091	August 9, 2024
Understanding why one function is faster than the other New to Julia	16	1202	November 1, 2020
I translated Peter Shirley's Raytracer to Julia. C++ 1min -> Julia 2m30s -> 15s (@threads) Visualization images , graphics	27	3196	February 21, 2022

Optimizing the calculation of many distances

Related topics