I'm trying to understand the best way to code loops and calculations over a simple array; my code is below. I would like to use all 4 CPUs on my laptop and am trying to work out how to do this. I find the manual hard to follow for such a simple task, given all the seemingly possible options. NB. I have not used @parallel on the loops, since I expected the dot operations in dotparallel to use all threads.

NB. I set the number of threads to 4 in the shell before starting Julia.
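To confirm that the setting took effect, something like the following can be run at the start of the session. This is a minimal sketch: it assumes the thread count was set through the JULIA_NUM_THREADS environment variable, which the note above does not state explicitly.

```julia
# Report how many threads this Julia session can actually use.
nt = Threads.nthreads()
println("Julia threads: ", nt)

# Assumption: the shell setting was made via JULIA_NUM_THREADS.
# If the variable is not set, say so instead of failing.
println("JULIA_NUM_THREADS = ", get(ENV, "JULIA_NUM_THREADS", "(not set)"))
```

If the first line prints 1, the session is single-threaded and Threads.@threads cannot give any speed-up.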

```
using Printf  # provides @sprintf on Julia 0.7 and later

N = 1e8  # not used below; trial() is called with explicit sizes
Threads.nthreads()  # confirm the thread count (4 expected)

function trial(N::Int64)
    # Element-wise sin(x^2); works on scalars and arrays
    function dotparallel(x)
        return sin.(x.^2)
    end
    # Despite the `!`, this only rebinds the local `x` and returns a new
    # value; it does not modify its argument in place
    function dotparallel!(x)
        x = sin.(x.^2)
    end
    # Pre-compile before testing
    o1 = 1.
    o1 = dotparallel(o1)
    o2 = 0.
    o2 = dotparallel!(o2)
    # Now run parallel trials
    ## Run first trial - plain serial loop
    ck1 = zeros(N)
    @time begin
        for x = 1:N
            ck1[x] = dotparallel(x)
        end
    end
    println("Check", @sprintf("%g", ck1[N]))
    ## Run second trial - threaded loop
    ck1 = zeros(N)
    @time begin
        Threads.@threads for x = 1:N
            ck1[x] = dotparallel(x)
        end
    end
    println("Check", @sprintf("%g", ck1[N]))
    ## Run third trial - dot operator over the whole array
    ck1 = [x for x = 1:N]
    @time ck1 = dotparallel.(ck1)
    println("Check", @sprintf("%g", ck1[N]))
    ## Run fourth trial - dot operator with the `!` version
    ck1 = [x for x = 1:N]
    @time ck1 = dotparallel!.(ck1)
    println("Check", @sprintf("%g", ck1[N]))
end

# Run pre-compilation trial
println("Pre-compilation run")
trial(2)
# Run benchmark case
println("Speed run")
trial(10000)
```
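For comparison, here is a minimal sketch of how the same calculation might be written so that the loop is split across threads and the result is written into a preallocated array. It relies on two points: a dotted call such as sin.(x.^2) fuses into a single loop but runs on one thread, and Threads.@threads divides a for loop among the threads Julia was started with. The names sinsq_threaded! and sinsq_serial are hypothetical, not taken from the code above, and the sketch has not been benchmarked against the timings reported in this post.

```julia
# Hypothetical helper: fill `out` with sin(x[i]^2), splitting the loop across threads.
function sinsq_threaded!(out::Vector{Float64}, x::Vector{Float64})
    length(out) == length(x) ||
        throw(DimensionMismatch("out and x must have the same length"))
    Threads.@threads for i in eachindex(out)
        @inbounds out[i] = sin(x[i]^2)   # each iteration writes a distinct element
    end
    return out
end

# Single-threaded reference using the fused dot syntax.
sinsq_serial(x::Vector{Float64}) = sin.(x .^ 2)

n = 10_000
x = collect(1.0:n)       # 1.0, 2.0, ..., n as Float64
out = zeros(n)           # preallocated output, reused between calls

sinsq_threaded!(out, x)  # warm-up call so compilation is not timed
@time sinsq_threaded!(out, x)
@time sinsq_serial(x)

println("Results agree: ", out ≈ sinsq_serial(x))
```

Passing the arrays in as function arguments, rather than reassigning one variable such as ck1 several times inside the same function, may also help: variables that are reassigned and then captured by the threaded loop body can become type-unstable. For an array this small, thread start-up overhead may still outweigh any gain.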

Results:

Pre-compilation run

0.002612 seconds (28 allocations: 1.688 KB)

Check-0.756802

0.040687 seconds (7.34 k allocations: 298.478 KB)

Check-0.756802

0.067961 seconds (12.17 k allocations: 520.295 KB)

Check-0.756802

0.066354 seconds (12.79 k allocations: 550.111 KB)

Check-0.756802

Speed run

33.462584 seconds (159.47 k allocations: 8.537 MB)

Check0.931639

52.864397 seconds (159.47 k allocations: 8.537 MB, 0.02% gc time)

Check0.931639

117.130010 seconds (160.03 k allocations: 8.622 MB)

Check0.931639

137.695686 seconds (169.52 k allocations: 8.767 MB)

Check0.931639
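To make the speed-run comparison easier to read, the four timings above can be expressed relative to the plain loop. This is only arithmetic on the reported times; the labels are shorthand for the four trials.

```julia
# Speed-run timings in seconds, copied from the results above.
times = [
    "plain loop"         => 33.462584,
    "Threads.@threads"   => 52.864397,
    "dotparallel.(ck1)"  => 117.130010,
    "dotparallel!.(ck1)" => 137.695686,
]

baseline = times[1].second   # time of the plain loop

# A ratio above 1 means slower than the plain loop.
ratios = [label => t / baseline for (label, t) in times]
for (label, r) in ratios
    println(rpad(label, 20), round(r, digits=2), "x")
end
```

On these numbers the threaded loop is about 1.6 times slower than the plain loop, and the two dotted versions are about 3.5 and 4.1 times slower.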