using StaticArrays
Values = rand(SVector{3,Float64},10^6)
I know that I can split this into components as such:
ValuesX = getindex.(Values,1)
@benchmark getindex.($Values,1)
BenchmarkTools.Trial: 1104 samples with 1 evaluation.
Range (min … max): 3.079 ms … 24.049 ms ┊ GC (min … max): 0.00% … 79.77%
Time (median): 3.742 ms ┊ GC (median): 0.00%
Time (mean ± σ): 4.519 ms ± 3.131 ms ┊ GC (mean ± σ): 14.44% ± 16.67%
▇▆█▇▃ ▃▃▃ ▁
██████████▇▁▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▄███▇ █
3.08 ms Histogram: log(frequency) by time 17.5 ms <
And so on for ValuesY and ValuesZ.
How can I perform this operation more efficiently? Using @views does not seem to make any difference.
I need this because my input data comes in the format/type of Values, but for the code to work on the GPU I have to split it into columns.
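For reference, here is a minimal CPU-side sketch of one way to get the components without copying, assuming the element type is SVector{3,Float64} as above. The names flat, mat and ValuesXdense are only illustrative, and the array is made smaller than in the post. The idea is to reinterpret the memory as Float64 and reshape it into a 3×N matrix, so that each component is a strided view of a row:

```julia
using StaticArrays

# Small stand-in for the Values array from the post
Values = rand(SVector{3,Float64}, 10^3)

# Reinterpret the memory as a flat Float64 vector, then treat it as a 3×N matrix.
# Neither step copies the data.
flat = reinterpret(Float64, Values)
mat = reshape(flat, 3, :)

# Strided views of each component (stride 3 in memory, no copy of the data)
ValuesX = view(mat, 1, :)
ValuesY = view(mat, 2, :)
ValuesZ = view(mat, 3, :)

# Materialize a contiguous Vector only where one is really required
ValuesXdense = collect(ValuesX)
```

The views avoid the allocation that getindex.(Values, 1) makes, but they are strided, so code that needs a contiguous column still has to pay for one copy (the collect call).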
I still think it has potential though because doing:
@benchmark @CUDA.sync CuArray(view(transpose(reshape(reinterpret(eltype(eltype($ValuesCu)), Array($ValuesCu)), 3, :)), :, $(1)))
BenchmarkTools.Trial: 254 samples with 1 evaluation.
Range (min … max): 14.121 ms … 77.067 ms ┊ GC (min … max): 0.00% … 35.87%
Time (median): 16.216 ms ┊ GC (median): 0.00%
Time (mean ± σ): 19.746 ms ± 8.772 ms ┊ GC (mean ± σ): 14.05% ± 18.84%
▃▄▅█
████▇▆▄▄▅▄▅▃▃▁▁▁▂▁▁▁▁▂▁▁▁▁▁▂▁▁▂▁▂▁▂▃▄▄▃▂▃▂▃▁▃▁▁▁▁▁▁▁▁▁▁▁▂▁▃ ▃
14.1 ms Histogram: frequency by time 49 ms <
Memory estimate: 30.52 MiB, allocs estimate: 9.
That is still 15 times faster than what I have now. In the end it becomes really slow, though, because it mixes “CuArray” and “Array” operations and therefore constantly transfers data between the CPU and the GPU, and I have to do this 12 times per loop. So I need to find out how to remove the Array call, i.e. how to avoid the transfer to the CPU.
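One possible direction, sketched under assumptions: drop the Array(...) round trip and apply reinterpret and reshape to the device array directly. The helper split_components below is hypothetical (it is not part of StaticArrays or CUDA.jl) and is written against AbstractVector, so it can be checked on the CPU. Whether it runs unchanged on ValuesCu depends on CUDA.jl supporting reinterpret, reshape and row indexing for a CuArray of SVector{3,Float64}, which I have not benchmarked here:

```julia
using StaticArrays

# Hypothetical helper: accepts any AbstractVector of SVector{3,T}.
function split_components(v::AbstractVector{SVector{3,T}}) where {T}
    # 3×N matrix over the same memory as v
    mat = reshape(reinterpret(T, v), 3, :)
    # Indexing a row materializes a new contiguous vector for that component
    return mat[1, :], mat[2, :], mat[3, :]
end

# CPU check; the intended GPU call would be split_components(ValuesCu)
Values = rand(SVector{3,Float64}, 10^3)
X, Y, Z = split_components(Values)
```

If this does work on the device, all three columns come out of a single call with no host copy in between, which matters when the split happens 12 times per loop.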