Making efficient some small subroutines

After fixing the line D[i,p] = ldiv!(lu!(B), Dl) and changing your program to use in-place functions, I’d think further optimizations would be microscopic.

I don’t know if you’ll get any speedup, but in the function H(i,n,X,Y,p,γ,sh), you could move the square root in the assignment of c to the exponents within D(c, _, 2,γ,sh).

Once you’re satisfied with the performance, it would be interesting to see some kind of comparison with the python version of your program.

In original full version of my program the timings were as follows:

Julia->34 seconds
Python+Numpy->28 seconds

After improvements in Julia codes, I get

BenchmarkTools.Trial: 
  memory estimate:  6.57 GiB
  allocs estimate:  5684639
  --------------
  minimum time:     4.599 s (2.48% GC)
  median time:      4.636 s (3.40% GC)
  mean time:        4.636 s (3.40% GC)
  maximum time:     4.673 s (4.30% GC)
  --------------
  samples:          2
  evals/sample:     1

This is much better. Thanks to everyone who contributed once again.