Now that b will be treated as a SubArray, there may be additional overhead in computations. Do SubArray methods in general identify these cases internally and provide the same level of performance?
It is important for performance that a function's return type depends only on the types of its inputs and not on their values (search for “type stable”, and also look at @code_warntype). Thus this is correct.
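As a rough illustration of what “type stable” means, here is a minimal sketch assuming Julia 1.x. The function names `unstable` and `stable` are made up for this example and do not come from the thread. It uses `Base.return_types` to query what the compiler infers, which is the same information `@code_warntype` highlights interactively.

```julia
# Hypothetical example functions, not taken from the thread.

# The return type depends on the *value* of x:
# an Int on one branch, a Float64 on the other.
unstable(x::Int) = x > 0 ? x : 0.0

# The return type depends only on the input *type*: always Float64.
stable(x::Int) = x > 0 ? float(x) : 0.0

# Ask the compiler what it infers for an Int argument.
unstable_types = Base.return_types(unstable, (Int,))
stable_types = Base.return_types(stable, (Int,))

println("unstable(::Int) infers: ", unstable_types[1])
println("stable(::Int)   infers: ", stable_types[1])
println("stable is concrete:     ", isconcretetype(stable_types[1]))
```

In the REPL, `@code_warntype unstable(1)` should flag the non-concrete return type, while `@code_warntype stable(1)` should not.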
@traktofon the comparison is relative here, so ideally the effect should show up equally in both cases. In any case, here are the results with local variables:
julia> f_hsB_SubArray_view()
BenchmarkTools.Trial:
memory estimate: 1.06 KiB
allocs estimate: 36
--------------
minimum time: 58.579 ms (0.00% GC)
median time: 59.293 ms (0.00% GC)
mean time: 59.487 ms (0.00% GC)
maximum time: 65.059 ms (0.00% GC)
--------------
samples: 85
evals/sample: 1
julia> f_hsB_SubArray_noview()
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 43.517 ms (0.00% GC)
median time: 44.126 ms (0.00% GC)
mean time: 44.310 ms (0.00% GC)
maximum time: 49.490 ms (0.00% GC)
--------------
samples: 113
evals/sample: 1
My concern with local variables was that the optimizer may eliminate the computation and substitute the final result, since the function takes no inputs and processes none.
For example, a function such as int factorial_5() may be short-circuited to the fixed value 120 after the first JIT compilation.
Since the arrays I use are random number sequences, the result depends on the system state. Also note that @benchmark is applied only to the computation, not to the allocations.
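The constant-folding concern can be sketched as follows, assuming Julia 1.x. The names `factorial_5`, `factorial_n` and `n_runtime` are hypothetical and only mirror the example in the text. Whether the compiler actually folds the zero-argument version depends on the Julia version, so treat this as a way to check, not as a statement of what happens.

```julia
# Hypothetical functions mirroring the factorial example in the text.

# Takes no inputs, so the compiler is free to precompute the result.
factorial_5() = factorial(5)

# Takes its input at run time, so the work cannot be replaced
# by a constant unless the argument itself is a known constant.
factorial_n(n::Int) = factorial(n)

# A Ref hides the value from constant propagation when it is read
# inside a benchmarked expression.
n_runtime = Ref(5)

println(factorial_5())
println(factorial_n(n_runtime[]))

# Inspect what the compiler generated for the zero-argument version;
# a body that just returns a literal would indicate folding.
println(code_typed(factorial_5, ()))
```

With BenchmarkTools, the documented way to pass run-time data into `@benchmark` is `$` interpolation, for example `@benchmark factorial_n($(n_runtime[]))`.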
I understand the concern raised about the type instability of global variables. In theory, as I understand it, that should only be a concern when assigning to previously declared or unassigned variables. That is not the case here, where the variable is initialized with a pre-specified concrete datatype. But I am sure there are practical constraints under which such rules do not hold.
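One way to check what the compiler assumes about a global is to compare inferred return types. This is a minimal sketch assuming Julia 1.x; the names `x_global`, `x_const`, `sum_global` and `sum_const` are made up for illustration. A non-`const` global can later be rebound to a value of any type, which is why its initial concrete type is not enough for inference.

```julia
# Hypothetical globals, not the arrays used in the thread's benchmarks.
x_global = rand(10)          # non-const: may be rebound to any type later
const x_const = rand(10)     # const: the binding's type is fixed

sum_global() = sum(x_global)
sum_const() = sum(x_const)

# Inferred return types for the two zero-argument functions.
t_global = Base.return_types(sum_global, ())[1]
t_const = Base.return_types(sum_const, ())[1]

println("non-const global infers: ", t_global)
println("const global infers:     ", t_const)
```

Passing the array as a function argument, or declaring the global `const`, are the usual ways to give the compiler a concrete type to work with.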
@inline number_from_hex(c::UInt) = begin
    DIGIT_ZERO = UInt('0')
    DIGIT_NINE = UInt('9')
    LATIN_UPPER_A = UInt('A')
    LATIN_UPPER_F = UInt('F')
    LATIN_A = UInt('a')
    LATIN_F = UInt('f')
    return (DIGIT_ZERO <= c <= DIGIT_NINE) ? c - DIGIT_ZERO :
           (LATIN_UPPER_A <= c <= LATIN_UPPER_F) ? c - LATIN_UPPER_A + 10 :
           (LATIN_A <= c <= LATIN_F) ? c - LATIN_A + 10 :
           throw(ArgumentError("Not a hexadecimal number"))
end
function hex2bytes!(d::AbstractVector{UInt8}, s::AbstractVector{UInt8})
    i, j = start(s), 0
    # These declarations are important: they ensure the computation happens at
    # word width rather than byte width. Byte-width computation can be almost
    # 10 times slower.
    n::UInt = 0
    c1::UInt = 0
    c2::UInt = 0
    while !done(s, i)
        n = 0
        c1, i = next(s, i)
        done(s, i) && throw(ArgumentError("source vector length must be even"))
        c2, i = next(s, i)
        n = number_from_hex(c1)
        n <<= 4
        n += number_from_hex(c2)
        d[j+=1] = (n & 0xFF)
    end
    return d
end
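The code above uses the `start`/`next`/`done` iteration protocol, which was replaced by `iterate` in Julia 1.0. The following is a sketch of how the same loop might look on Julia 1.x. The name `hex2bytes_iterate!` is hypothetical, `number_from_hex` is copied from the post so the block is self-contained, and the word-width widening is kept as in the original. It has not been benchmarked here, so the performance claims in the thread are not verified for this version.

```julia
# Copied from the post above so this sketch is self-contained.
@inline number_from_hex(c::UInt) = begin
    DIGIT_ZERO = UInt('0')
    DIGIT_NINE = UInt('9')
    LATIN_UPPER_A = UInt('A')
    LATIN_UPPER_F = UInt('F')
    LATIN_A = UInt('a')
    LATIN_F = UInt('f')
    return (DIGIT_ZERO <= c <= DIGIT_NINE) ? c - DIGIT_ZERO :
           (LATIN_UPPER_A <= c <= LATIN_UPPER_F) ? c - LATIN_UPPER_A + 10 :
           (LATIN_A <= c <= LATIN_F) ? c - LATIN_A + 10 :
           throw(ArgumentError("Not a hexadecimal number"))
end

# Hypothetical Julia 1.x port of the loop; not the thread author's code.
function hex2bytes_iterate!(d::AbstractVector{UInt8}, s::AbstractVector{UInt8})
    j = firstindex(d) - 1
    state = iterate(s)
    while state !== nothing
        c1, st = state
        state = iterate(s, st)
        state === nothing && throw(ArgumentError("source vector length must be even"))
        c2, st = state
        # Widen each byte to UInt so the arithmetic runs at word width.
        n = (number_from_hex(UInt(c1)) << 4) + number_from_hex(UInt(c2))
        d[j += 1] = n & 0xFF
        state = iterate(s, st)
    end
    return d
end

out = zeros(UInt8, 2)
hex2bytes_iterate!(out, Vector{UInt8}("0aFF"))
println(out)
```

Like the original, this sketch does not check that `d` is long enough for the decoded output; a caller would need to size `d` to at least half the length of `s`.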