Did you remember the dollar sign? As in
Yes, the dollar sign was included.
If I run exactly the code you posted here, taking care to use the $ in the benchmark call, I get:
julia> benchout = @benchmark testfun02($B);

julia> benchout
BenchmarkTools.Trial: 54 samples with 1 evaluations.
 Range (min … max):  93.013 ms … 102.730 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     93.293 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   93.632 ms ±   1.378 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  ▄▃█▇▄▄▃▄▅▄▅▄▄▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  93 ms           Histogram: frequency by time         96.9 ms <

 Memory estimate: 234.45 KiB, allocs estimate: 2.
I don’t think you should be surprised by the cost of the memory access. Arithmetic operations are very fast in the processor, and here you do almost no computation, so the time is dominated by reading and writing memory.
If you run over the rows and columns in the wrong order, the code gets about 10x slower, purely because of the memory access order. That gives you an idea of how much memory access costs.
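To make the loop-order point concrete, here is a minimal sketch. It is not the code from this thread: the function names `sum_colmajor` and `sum_rowmajor` and the matrix size are made up for illustration. Julia stores matrices column-major, so the first index should vary fastest in the innermost loop. The sketch uses plain `@elapsed` from Base to stay dependency-free; the ratio you see will depend on the matrix size and the machine.

```julia
# Inner loop runs down a column: consecutive elements are contiguous in memory.
function sum_colmajor(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2)        # outer loop over columns
        for i in axes(A, 1)    # inner loop over rows (fast index)
            s += A[i, j]
        end
    end
    return s
end

# Inner loop runs along a row: each access jumps a whole column length ahead.
function sum_rowmajor(A::AbstractMatrix)
    s = zero(eltype(A))
    for i in axes(A, 1)        # outer loop over rows
        for j in axes(A, 2)    # inner loop over columns (strided access)
            s += A[i, j]
        end
    end
    return s
end

A = rand(2000, 2000)           # illustrative size, large enough to exceed cache

# Call each function once first so compilation time is not measured.
sum_colmajor(A); sum_rowmajor(A)

t_col = @elapsed sum_colmajor(A)
t_row = @elapsed sum_rowmajor(A)
println("column-order traversal: ", t_col, " s")
println("row-order traversal:    ", t_row, " s")
```

Both functions do the same additions; only the order in which memory is touched differs, so any gap between the two timings comes from memory access alone.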
This is why it is so important to use immutable structures and to avoid unnecessary allocations and memory accesses in performance-critical code.
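As a rough illustration of the allocation point (again a sketch with made-up names, `sum_with_vector` and `sum_with_tuple`, not anything from the original code): a small temporary `Vector` created inside a loop is normally heap-allocated on every iteration, while an immutable tuple holding the same values usually needs no heap allocation at all. `@allocated` from Base reports the bytes allocated by a call; the exact numbers depend on the Julia version, so none are quoted here.

```julia
# Builds a temporary Vector on every iteration (normally a heap allocation each time).
function sum_with_vector(n::Integer)
    s = 0.0
    for i in 1:n
        v = [Float64(i), 2.0 * i]   # mutable array
        s += v[1] + v[2]
    end
    return s
end

# Same arithmetic with an immutable tuple, which the compiler can keep in registers.
function sum_with_tuple(n::Integer)
    s = 0.0
    for i in 1:n
        t = (Float64(i), 2.0 * i)   # immutable tuple
        s += t[1] + t[2]
    end
    return s
end

# Warm up so compilation is not counted in the measurements.
sum_with_vector(10); sum_with_tuple(10)

bytes_vec = @allocated sum_with_vector(100_000)
bytes_tup = @allocated sum_with_tuple(100_000)
println("Vector version allocated: ", bytes_vec, " bytes")
println("Tuple version allocated:  ", bytes_tup, " bytes")
```

The two functions return the same value; the difference, if your Julia version shows one, is only in how much memory is allocated along the way.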
Thanks. That is probably the explanation. I have to be very careful in developing my project, as speed plays a crucial role. Thanks all for the support.