Building off on @Elrod’s suggestion, note that you can also do
julia> function foo(x::Array{T}) where {T<:Real}
           M = zeros(T, 1, 1, 1)
           # do stuff with M
       end
foo (generic function with 1 methods)
julia> ForwardDiff.gradient(foo, rand(31))
Note that parametrizing the type of the input function will also lead to the kind of specialized code which I think @Sukera is hinting at in his reply. The current issue seems to be that the conversion to Real is happening at run-time which is slowing down your code.