Yes, it is written in Julia 0.6 (I edited the comment to state this explicitly). I suspect the few allocations came from the closure that the macro creates. Following your suggestion brought it down to zero allocations:
using BenchmarkTools
using StaticArrays
@inline relu(x::T) where {T <: AbstractFloat} = max(zero(T), x)
function profile()
    @btime W2*relu.(W1*input+b1)+b2 setup=(input = @SVector rand(5); W1 = @SMatrix rand(10,5); b1 = @SVector rand(10); W2 = @SMatrix rand(1,10); b2 = @SVector rand(1);)
end
profile()
# 22.266 ns (0 allocations: 0 bytes)
# 1-element StaticArrays.SArray{Tuple{1},Float64,1,1}:
# 10.2185
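For comparison, BenchmarkTools also supports interpolating globals into the benchmarked expression with `$`, which avoids timing untyped global access. The sketch below assumes Julia 0.6 or later with BenchmarkTools and StaticArrays installed. The global names simply mirror the `setup` block above and are illustrative. Unlike `setup`, the inputs are generated once rather than per sample.

```julia
using BenchmarkTools
using StaticArrays

# Same activation as above; the `where` form is valid in Julia 0.6 and 1.x
@inline relu(x::T) where {T <: AbstractFloat} = max(zero(T), x)

# Illustrative globals: 5 inputs, a hidden layer of width 10, one output
input = @SVector rand(5)
W1 = @SMatrix rand(10, 5)
b1 = @SVector rand(10)
W2 = @SMatrix rand(1, 10)
b2 = @SVector rand(1)

# `$` splices each global's current value into the benchmark, so the timed
# expression works on concretely typed static arrays instead of untyped globals.
# @btime prints the timing and returns the value of the expression.
y = @btime $W2*relu.($W1*$input + $b1) + $b2
```

Because `W2` has non-negative entries and `relu` outputs are non-negative, the single output `y[1]` should never be smaller than `b2[1]`.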