Synchronizing CUDA kernels

Sorry for not writing for so long. As for the device properties, I now know everything I need. In case somebody is interested:

using CUDAdrv

dev = CuDevice(0)  # first CUDA device

println("Name of device: $(dev)")
println("Total amount of memory on the device: $(totalmem(dev))")

# Print device attributes 1 through 85 together with their values
for i = 1:85
    attr = CUDAdrv.CUdevice_attribute(i)
    println("$(attr) : $(attribute(dev, attr))")
end

As for my actual problem: I want to be able to use any number of threads (of course not within a single function). I want to run one function, and once that function has finished its calculations I want it to free its threads so that the second function can run.
It must be possible, because I run these two functions in a for loop, so it should work.
Pseudocode:

a = [rand(4), rand(4), rand(4)]
c = [rand(4), rand(4), rand(4)]
Table = CuArray{Float32}(undef, lengthOfTable * length(a))
for i = 1:length(a)
    b = cu(a[i])
    d = cu(c[i])
    @cuda blocks=numberOfBlocks threads=numberOfThreads someFunction(Table, b)
    @cuda blocks=numberOfBlocks threads=numberOfThreads someSecondFunction(Table, d)
    # these (hypothetical) functions modify only the 'Table' array, but need the b and d arrays in their calculations
end
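To make the pseudocode concrete, here is a minimal sketch of the same loop with two toy kernels. The kernels `add_first!` and `scale_first!`, the table length of 12 and the launch configuration of 256 threads are all hypothetical stand-ins, not the functions from my project. It assumes the CUDAdrv/CUDAnative/CuArrays package versions where `CUDAdrv.synchronize()` can be called without arguments; other versions may need a different call. Kernels launched one after another on the same stream run in launch order, so the explicit synchronization here only makes the host wait until both kernels have finished before the next iteration starts.

```julia
using CUDAdrv, CUDAnative, CuArrays

# Hypothetical first kernel: adds b[1] to every element of table.
function add_first!(table, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(table)
        @inbounds table[i] += b[1]
    end
    return nothing
end

# Hypothetical second kernel: multiplies every element of table by d[1].
function scale_first!(table, d)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(table)
        @inbounds table[i] *= d[1]
    end
    return nothing
end

a = [rand(Float32, 4) for _ in 1:3]
c = [rand(Float32, 4) for _ in 1:3]
table = cu(zeros(Float32, 12))        # hypothetical table size

threads = 256                          # hypothetical launch configuration
blocks = cld(length(table), threads)   # enough blocks to cover the table

for i in 1:length(a)
    b = cu(a[i])
    d = cu(c[i])
    @cuda blocks=blocks threads=threads add_first!(table, b)
    @cuda blocks=blocks threads=threads scale_first!(table, d)
    # Assumed no-argument form: block the host until both kernels are done.
    CUDAdrv.synchronize()
end

result = Array(table)  # copy the table back to the CPU
println(result)
```

Copying the table back with `Array(table)` also waits for the pending kernels, so the result can be compared against the same calculation done on the CPU.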

This works for numberOfThreads=512 or less (the maximum is 1024 threads). I say "or less" because in my project it did not work even for 1020 threads in total (340*2 for the first function plus 340 for the second function, which gives 1020), but it did work for 960 threads, and it worked even inside the loop.
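The thread arithmetic above can be written out in plain Julia (no GPU needed). The names `MAX_THREADS`, `first_kernel_threads` and `second_kernel_threads` are hypothetical labels for the numbers in this post. The sketch only shows that both the total and each launch taken separately stay below 1024, so it does not by itself explain why the 1020 case failed.

```julia
# Numbers from the post; the variable names are hypothetical.
const MAX_THREADS = 1024

first_kernel_threads = 340 * 2   # first function
second_kernel_threads = 340      # second function
total = first_kernel_threads + second_kernel_threads

println("first launch:  ", first_kernel_threads)
println("second launch: ", second_kernel_threads)
println("total:         ", total)

# Each launch on its own, and the sum, compared against the limit.
println(first_kernel_threads <= MAX_THREADS)
println(second_kernel_threads <= MAX_THREADS)
println(total <= MAX_THREADS)
```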