Rewriting a CPU function for execution on the GPU

I am trying to run a function, which currently runs on the CPU, on the GPU instead. There are a few bottlenecks, like accessing individual elements of an array, which fails with "scalar getindex is disallowed".

The code is below:

using CUDAdrv, CUDAnative, CuArrays
CuArrays.allowscalar(false)
upagents=[10,20,30,35,25,15,12,11,9]
cn=fill(0.0,40,1)
cedg=fill(0.0,40)
s=fill(0.0,40)
c=fill(0.0,40)
thet=rand(40)
the=fill(0.0,40)
C=rand(40)
A=zeros(Int,40)
A[10:20].=1

cupagents=cu(upagents)
ccn=cu(cn)
ccedg=cu(cedg)
cs=cu(s)
cc=cu(c)
cthet=cu(thet)
cthe=cu(the)
cC=cu(C)
cA=cu(A)

function update!(upagents,cn,s,c,cedg,C,thet,the,A)
  for h in upagents
    cn.=0.0
    s.=0.0
    c.=0.0
    cedg.=0.0
    cn[findall(x->x==1||x!=1&&rand()<0.5,A)].=1.0
    cedg=exp.(-abs.(thet[h].-thet)/(1.0-C[h])).*cn #--> thet[h],C[h] doesn't work
    cedg./=sum(cedg)
    s=cedg.*sin.(thet)
    c=cedg.*cos.(thet)
    the[h]=atan(sum(s),sum(c)) #-->the[h] doesn't work
  end 
end

It would be helpful to hear comments and suggestions on how to implement this better.

If these sizes are representative of the real problem, I don't think the GPU will speed things up. A GPU is useful for much larger arrays (think images).

No, they are not. This is just a representative example. The real arrays will be on the order of >10,000 elements.

Instead of findall, try logical indexing:

cn[(A .== 1) .| ((A .!= 1) .& (rand.() .< 0.5))] .= 1.0

or something like that
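
For what it's worth, a minimal (untested) sketch of that idea, with the random draw materialised as a device array so the comparison is element-wise rather than reusing a single scalar rand; mask is a name introduced here just for illustration:

mask = cu(rand(Float32, length(A)))                                    # one random draw per element, copied to the device
cn[(A .== 1) .| ((A .!= 1) .& (mask .< 0.5f0))] .= 1.0f0               # logical indexing instead of findall

# If logical setindex! still falls back to scalar indexing on this CuArrays version,
# a plain broadcast sidesteps indexing altogether (cn was already zeroed, so else = 0):
cn .= ifelse.((A .== 1) .| ((A .!= 1) .& (mask .< 0.5f0)), 1.0f0, 0.0f0)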

I tried this: Writing simple kernel functions (EM algorithm) - #7 by ennvvy. Also, the scalar indexing that I have highlighted doesn't work! I get the following error:

scalar getindex is disallowed

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /root/.julia/packages/GPUArrays/0lvhc/src/indexing.jl:14
 [3] getindex at /root/.julia/packages/GPUArrays/0lvhc/src/indexing.jl:54 [inlined]
 [4] iterate at ./abstractarray.jl:914 [inlined]
 [5] iterate at ./abstractarray.jl:912 [inlined]
 [6] update!(::CuArray{Float32,1,Nothing}, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}) at ./In[7]:25
 [7] ##core#472() at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:297
 [8] ##sample#473(::BenchmarkTools.Parameters) at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:303
 [9] #_run#8(::Bool, ::String, ::Base.Iterators.Pairs{Symbol,Integer,NTuple{4,Symbol},NamedTuple{(:samples, :evals, :gctrial, :gcsample),Tuple{Int64,Int64,Bool,Bool}}}, ::typeof(BenchmarkTools._run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#471")}, ::BenchmarkTools.Parameters) at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:331
 [10] (::Base.var"#inner#2"{Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}},typeof(BenchmarkTools._run),Tuple{BenchmarkTools.Benchmark{Symbol("##benchmark#471")},BenchmarkTools.Parameters}})() at ./none:0
 [11] #invokelatest#1 at ./essentials.jl:713 [inlined]
 [12] #invokelatest at ./none:0 [inlined]
 [13] #run_result#37 at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:32 [inlined]
 [14] #run_result at ./none:0 [inlined]
 [15] #run#39(::Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}}, ::typeof(run), ::BenchmarkTools.Benchmark{Symbol("##benchmark#471")}, ::BenchmarkTools.Parameters) at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:46
 [16] #run at ./none:0 [inlined] (repeats 2 times)
 [17] #warmup#42 at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:79 [inlined]
 [18] warmup(::BenchmarkTools.Benchmark{Symbol("##benchmark#471")}) at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:79
 [19] top-level scope at /root/.julia/packages/BenchmarkTools/7aqwe/src/execution.jl:390
 [20] top-level scope at In[9]:100:
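
For reference, here is a rough, untested sketch of one way the loop could be restructured so that no scalar indexing ever touches a CuArray: iterate over a plain CPU vector of agent indices (iterating a CuArray is what triggers the getindex error in the stacktrace above), read thet[h] and C[h] from CPU copies, write the[h] into a CPU copy, and keep only the element-wise broadcasts and the reductions on the device. The _cpu-suffixed names and the mask variable are introduced here purely for illustration.

function update!(upagents_cpu, cn, cedg, thet, thet_cpu, C_cpu, the_cpu, A)
  # cn, cedg, thet, A are assumed to be CuVectors of equal length here;
  # thet_cpu, C_cpu, the_cpu are ordinary Arrays used only for the scalar accesses
  n = length(A)
  for h in upagents_cpu                           # plain Vector{Int}: iteration stays on the CPU
    th = Float32(thet_cpu[h])                     # scalar reads from the CPU copies
    ch = Float32(C_cpu[h])
    mask = cu(rand(Float32, n))                   # fresh per-element random draws, copied to the device
    cn .= ifelse.((A .== 1) .| (mask .< 0.5f0), 1.0f0, 0.0f0)
    cedg .= exp.(-abs.(th .- thet) ./ (1.0f0 - ch)) .* cn
    cedg ./= sum(cedg)                            # device-side reduction
    s = sum(cedg .* sin.(thet))                   # scalar results come back to the host
    c = sum(cedg .* cos.(thet))
    the_cpu[h] = atan(s, c)                       # scalar write goes to the CPU copy
  end
  return the_cpu
end

After the loop, the_cpu can be copied back to the device with cu(the_cpu) if the rest of the pipeline needs it there. Whether any of this beats the CPU at sizes above 10,000 would still need benchmarking, given the per-iteration host-to-device transfer for mask and the repeated reductions.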