Your function is already faster than you think. It could be improved marginally by adding @inbounds and by not filling the array with zeros first, and more substantially by making i the inner loop, so that elements adjacent in memory are written one after another (Julia arrays are stored column-major).
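As a rough sketch of those three changes together: the name `compute_array_serial` is made up for illustration, and the body assumes the function fills an `Int32` matrix with `i*i + j*j`, as the threaded version in the transcript does.

```julia
# Hypothetical serial version; the name and signature are assumptions, not the original code.
function compute_array_serial(m, n=m)
    x = Array{Int32}(undef, m, n)   # uninitialised storage, so no zero-fill pass
    for j in 0:n-1                  # columns in the outer loop
        for i in 0:m-1              # rows in the inner loop: walks memory in column-major order
            @inbounds x[i+1, j+1] = Int32(i*i + j*j)   # skip bounds checks on the write
        end
    end
    return x
end
```

Swapping the two loops is the change that matters most here; `@inbounds` and `undef` are smaller wins.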
At this size, I think you want to parallelise with Threads rather than Distributed:
julia> @time println(compute_array_normal(t, t)[t, t])
1996002
0.002596 seconds (11 allocations: 3.815 MiB) # i.e. 2.6 ms
julia> using BenchmarkTools
julia> @btime compute_array_normal(1000, 1000);
777.709 μs (2 allocations: 3.81 MiB)
julia> function compute_array_threads(m, n=m)
           x = Array{Int32}(undef, (m, n))
           @inbounds Threads.@threads for j = 0:n - 1
               for i = 0:m - 1
                   x[i+1, j+1] = Int32(i*i + j*j)
               end
           end
           return x
       end;
julia> @btime compute_array(1000, 1000); # without @threads
628.875 μs (2 allocations: 3.81 MiB)
julia> @btime compute_array_threads(1000, 1000);
198.875 μs (23 allocations: 3.82 MiB)
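The threaded version can only beat the serial one if Julia was started with more than one thread, for example with `julia --threads 4` or with the `JULIA_NUM_THREADS` environment variable set. A minimal sketch for checking this is below. The sizes `m` and `n` are arbitrary, and the small loop only confirms that a threaded fill gives the same values as the formula.

```julia
# Number of threads this session can use; with 1, Threads.@threads gives no speed-up.
nt = Threads.nthreads()
println("Julia threads available: ", nt)

# Illustrative sanity check: each column j is written by exactly one task, so there is no data race.
m, n = 100, 80
x = Array{Int32}(undef, m, n)
Threads.@threads for j in 0:n-1
    for i in 0:m-1
        x[i+1, j+1] = Int32(i*i + j*j)
    end
end
@assert x == [Int32(i*i + j*j) for i in 0:m-1, j in 0:n-1]
```

If this reports a single thread, the two `@btime` results should come out roughly equal.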