Out of curiosity, have you looked at R’s source code? (note: it is under the GPL)
Also, on Julia 0.7 with a Ryzen processor (Julia launched with -O3):
julia> bitshift(x) = (unsafe_load(Ptr{UInt}(pointer(x))) << (8 - sizeof(x)))*8 >> (8- sizeof(x))*8
WARNING: deprecated syntax "call to `*` inside call to bitshift operator" at REPL[1]:1.
Use "parenthesized call to `*`" instead.
WARNING: deprecated syntax "call to `*` inside call to bitshift operator" at REPL[1]:1.
Use "parenthesized call to `*`" instead.
bitshift (generic function with 1 method)
julia> rawload(x) = unsafe_load(Ptr{UInt}(pointer(x)))
rawload (generic function with 1 method)
julia> masking(x) = unsafe_load(Ptr{UInt}(pointer(x))) & UInt(2^sizeof(x)-1)
masking (generic function with 1 method)
julia> using BenchmarkTools
julia> @benchmark bitshift("abc") # 5-6 ns
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.222 ns (0.00% GC)
median time: 1.233 ns (0.00% GC)
mean time: 1.248 ns (0.00% GC)
maximum time: 16.611 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark rawload("abc") # 1.5ns
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.222 ns (0.00% GC)
median time: 1.232 ns (0.00% GC)
mean time: 1.249 ns (0.00% GC)
maximum time: 58.880 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark masking("abc")
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.222 ns (0.00% GC)
median time: 1.232 ns (0.00% GC)
mean time: 1.231 ns (0.00% GC)
maximum time: 4.890 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark bitshift("abcdefghijklmnopqrstuvwxyz") # 5-6 ns
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.222 ns (0.00% GC)
median time: 1.232 ns (0.00% GC)
mean time: 1.248 ns (0.00% GC)
maximum time: 21.651 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark rawload("abcdefghijklmnopqrstuvwxyz") # 1.5ns
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.222 ns (0.00% GC)
median time: 1.223 ns (0.00% GC)
mean time: 1.234 ns (0.00% GC)
maximum time: 12.614 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark masking("abcdefghijklmnopqrstuvwxyz")
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.169 ns (0.00% GC)
median time: 5.180 ns (0.00% GC)
mean time: 5.235 ns (0.00% GC)
maximum time: 24.166 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> versioninfo()
Julia Version 0.7.0-DEV.2778
Commit 622ab91* (2017-12-06 21:59 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
WORD_SIZE: 64
BLAS: libopenblas (ZEN)
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, znver1)
Environment:
This doesn’t mean they’re equally fast, just that they’ve hit the floor (about 1.2ns). So…
julia> @generated function do_f_x_times(f, ::Val{x}, args...) where x
:(Base.Cartesian.@nexprs $x i -> f(args...) )
end
do_f_x_times (generic function with 1 method)
julia> do_f_x_times(println, Val(3), "hi")
hi
hi
hi
julia> @benchmark do_f_x_times(bitshift, Val(10), "abc")
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.550 ns (0.00% GC)
median time: 5.591 ns (0.00% GC)
mean time: 5.755 ns (0.00% GC)
maximum time: 55.595 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark do_f_x_times(rawload, Val(10), "abc")
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 7.200 ns (0.00% GC)
median time: 7.873 ns (0.00% GC)
mean time: 12.832 ns (31.45% GC)
maximum time: 28.613 μs (99.96% GC)
--------------
samples: 10000
evals/sample: 999
julia> @benchmark do_f_x_times(masking, Val(10), "abc")
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 29.875 ns (0.00% GC)
median time: 29.885 ns (0.00% GC)
mean time: 30.501 ns (0.00% GC)
maximum time: 91.197 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995