Hi,

I’m working on an application with maximum entropy networks. To determine some parameters, a system of non-linear equations F(x)=0 needs to be solved. For a network of n nodes, each node i has an associated equation that looks like this (in its most basic form):

F_i(x) = \sum_{j\ne i}\dfrac{x_ix_j}{1+x_ix_j} - k_i\ , where k_i is known.
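
One algebraic rewriting of the summand that might be worth keeping in mind (an observation only, not benchmarked here): each term can be written with a single division and no product in the numerator, and the constant part of the sum collapses to n-1.

```latex
\frac{x_i x_j}{1+x_i x_j} = 1 - \frac{1}{1+x_i x_j}
\quad\Longrightarrow\quad
\sum_{j\ne i}\frac{x_i x_j}{1+x_i x_j} - k_i
  = (n-1) - k_i - \sum_{j\ne i}\frac{1}{1+x_i x_j}
```

The division still dominates the cost, so any gain is likely small. When the products x_i x_j are small, the right-hand side subtracts two nearly equal numbers and may lose precision, so the original form is probably the safer default.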

I’m using a matrix-free quasi-Newton method to solve the system. The evaluation of F(x) becomes a bottleneck for larger applications. For a real application, the number of equations could go up to 1\text{e}7.

I’ve gained some improvement by using multithreading and `@inbounds`, and by removing the `i ≠ j` check from the inner loop, as shown in `ML_threaded_bounds_noif` below. Could this be made faster still? I haven’t used pre-calculated values, to avoid using a lot of memory (the same motivation as for using a matrix-free quasi-Newton method), as there are n(n-1)/2 unique values to be calculated.
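
Since the summand is symmetric in i and j, one idea that avoids storing the unique values is to compute each pair once and add it to both equations on the fly. The following is a minimal serial sketch of that idea; `ML_symmetric` is a hypothetical name, and the function has not been benchmarked here.

```julia
# Sketch: visit each unordered pair (i, j) with i < j exactly once and
# add its term to both F[i] and F[j]. Serial only: a threaded version
# would need per-thread accumulators because F[j] is written across rows.
function ML_symmetric(x::Vector{Float64}, k::Vector{Float64})
    n = length(x)
    F = -k                      # new vector, starts at -k[i]
    @inbounds for i in 1:n-1
        xi = x[i]
        acc = 0.0               # local accumulator for row i
        for j in i+1:n
            t = xi * x[j]
            f = t / (1 + t)
            acc += f            # contribution to equation i
            F[j] += f           # same value contributes to equation j
        end
        F[i] += acc
    end
    return F
end
```

This halves the number of divisions at the cost of scattered writes to `F[j]`, so whether it beats a threaded full double loop would have to be measured.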

Any suggestions are welcome.

```
using BenchmarkTools

# Baseline: serial double loop with an explicit i ≠ j branch.
function ML_baseline(x::Array{Float64,1},k::Array{Float64,1})
    n = length(x)
    F = zeros(Float64,n)
    for i = 1:n
        for j = 1:n
            if i≠j
                F[i] += x[i]*x[j]/(1+x[i]*x[j])
            else
                F[i] -= k[i]
            end
        end
    end
    return F
end

# Threaded version: sum over all j (including j == i) without a branch,
# then subtract the extra diagonal term together with k[i].
function ML_threaded_bounds_noif(x::Array{Float64,1},k::Array{Float64,1})
    n = length(x)
    F = zeros(Float64,n)
    Threads.@threads for i = 1:n
        for j = 1:n
            @inbounds F[i] += x[i]*x[j]/(1+x[i]*x[j])
        end
        @inbounds F[i] -= x[i]*x[i]/(1+x[i]*x[i]) + k[i]
    end
    return F
end

# setup
n = Int(1e4)
x = rand(n)
k = round.(rand(n)*n);

@benchmark ML_baseline(x,k)
@benchmark ML_threaded_bounds_noif(x,k)
```

```
BenchmarkTools.Trial:
  memory estimate:  78.20 KiB
  allocs estimate:  2
  --------------
  minimum time:     231.226 ms (0.00% GC)
  median time:      242.516 ms (0.00% GC)
  mean time:        246.103 ms (0.00% GC)
  maximum time:     271.991 ms (0.00% GC)
  --------------
  samples:          21
  evals/sample:     1

BenchmarkTools.Trial:
  memory estimate:  80.95 KiB
  allocs estimate:  25
  --------------
  minimum time:     68.404 ms (0.00% GC)
  median time:      73.837 ms (0.00% GC)
  mean time:        81.320 ms (0.00% GC)
  maximum time:     254.281 ms (0.00% GC)
  --------------
  samples:          62
  evals/sample:     1
```
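
Another variant that might be worth benchmarking: keep `x[i]` and the running sum in local variables so the inner loop does not read and write `F[i]` on every iteration, and mark the inner loop with `@simd` so the compiler is allowed to reorder the reduction. This is only a sketch; `ML_threaded_local` is a hypothetical name and no timings are claimed for it.

```julia
# Sketch: threaded rows, local accumulator, @simd inner reduction.
function ML_threaded_local(x::Vector{Float64}, k::Vector{Float64})
    n = length(x)
    F = Vector{Float64}(undef, n)   # every entry is assigned below
    Threads.@threads for i in 1:n
        xi = @inbounds x[i]
        acc = 0.0
        # Sum over all j, including j == i, to keep the loop branch-free.
        @inbounds @simd for j in 1:n
            t = xi * x[j]
            acc += t / (1 + t)
        end
        d = xi * xi
        # Remove the diagonal term again and subtract k[i].
        @inbounds F[i] = acc - d / (1 + d) - k[i]
    end
    return F
end
```

Because `@simd` permits reassociating the floating-point sum, the result can differ from the serial loop in the last few bits, so comparisons should use `≈` rather than `==`.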

The benchmark results above were obtained using 4 threads. System info is below for completeness.

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)