Also, I’m far from an expert in parallel computing, but in an example like the one you’ve posted, where every operation is trivially cheap, I would suspect that the overhead of sending data between processes vastly outweighs any performance gain from spreading the A[i] = i assignments across workers.
To get a meaningful timing, you will need to put your code inside a function and call it once first so that it is compiled before you time it:
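n = 10^7
A = SharedArray{Int}(n)

# On Julia 1.x: add `using Distributed, SharedArrays` and use `@distributed` instead of `@parallel`.
function myfill!(A, n)
    # Split the iterations across the worker processes and wait for all of them to finish
    @sync @parallel for i in 1:n
        A[i] = i
    end
end

# compile (a small first call triggers JIT compilation)
myfill!(A, 20)

# run (time the already-compiled function)
@time myfill!(A, n)
println(A[end])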
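
To check whether the parallel loop is actually worth it here, you could time a plain serial loop in the same way and compare the two numbers. This is only a minimal sketch: `serialfill!` and `B` are names I made up for illustration, and the result will depend on your machine and on how many workers you started.

```julia
# Serial baseline for comparison (hypothetical helper, not from the original post)
function serialfill!(B, n)
    for i in 1:n
        B[i] = i
    end
    return B
end

n = 10^7
B = zeros(Int, n)        # ordinary, non-shared array

# compile with a small first call, as above
serialfill!(B, 20)

# run and time the compiled function
@time serialfill!(B, n)
println(B[end])
```

If the serial version comes out faster, that would support the suspicion that communication overhead dominates for work this cheap.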