ForwardDiff: Using on functions that are defined on floats64


#1

I understand that ForwardDiff works with Dual Numbers, so far example this throws an error:

f(x::Float64) = x^2
f_diff(x) = ForwardDiff.derivative(f, x)
f_diff(2.0)
ERROR: MethodError: no method matching f(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(f),Float64},Float
64,1})
Closest candidates are:
  f(::Float64)

What I have done now, is to define my functions without the restriction on Float64, but this is suboptimal since some of these functions are truly restricted to Floats, for example in the case of parameters etc.

Is there a good way to work around this differently?


#2

Suboptimal in what way? They obviously aren’t actually restricted, if you could conceivably call them with dual numbers.

Note that there is no performance penalty for not specifying type.

julia> f(x) = x^2
f (generic function with 1 method)

julia> @code_native f(2)
	.text
; Function f {
; Location: REPL[3]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: REPL[3]:1
	imulq	%rdi, %rdi
;}}
	movq	%rdi, %rax
	retq
	nopl	(%rax,%rax)
;}

julia> @code_native f(2.0)
	.text
; Function f {
; Location: REPL[3]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: REPL[3]:1
	vmulsd	%xmm0, %xmm0, %xmm0
;}}
	retq
	nopw	%cs:(%rax,%rax)
;}

julia> @code_native f(2f0)
	.text
; Function f {
; Location: REPL[3]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: REPL[3]:1
	vmulss	%xmm0, %xmm0, %xmm0
;}}
	retq
	nopw	%cs:(%rax,%rax)
;}

julia> @code_native f(rand(2,2))
	.text
; Function f {
; Location: REPL[3]:1
	pushq	%rax
	movq	%rsi, (%rsp)
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: dense.jl:366
	movabsq	$power_by_squaring, %rax
	movq	(%rsi), %rdi
	movl	$2, %esi
	callq	*%rax
;}}}
	popq	%rcx
	retq
;}

julia> @code_native f("hi")
	.text
; Function f {
; Location: REPL[3]:1
	pushq	%rax
	movq	%rsi, (%rsp)
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: basic.jl:674
	movabsq	$repeat, %rax
	movq	(%rsi), %rdi
	movl	$2, %esi
	callq	*%rax
;}}}
	popq	%rcx
	retq
;}

Every time you call f with a new combination of types, a new version of f specialized for those types gets compiled. You can see that each version has different assembly above.
Dispatch (which version of f to call) gets resolved at compile time, assuming your functions are type-stable. Therefore there is no run-time penalty to being generic.

However, if you want to make it clear the function only applies to numbers, you could say f(x::Number).

EDIT:

If you feel like getting experimental, there is Zygote. This is useful, because while you can always write your own code to be generic, but code from libraries wont always be.

julia> using Zygote
[ Info: Precompiling Zygote [e88e6eb3-aa80-5325-afca-941959d7151f]

julia> f64(x::Float64) = x^2
f64 (generic function with 1 method)

julia> Zygote.derivative(f64, 3.0)
6.0

julia> @code_native Zygote.derivative(f64, 3.0)
	.text
; ┌ @ interface.jl:47 within `derivative'
; │┌ @ interface.jl:44 within `gradient'
; ││┌ @ interface.jl:38 within `#66'
; │││┌ @ REPL[3]:1 within `f64'
; ││││┌ @ intfuncs.jl:243 within `literal_pow'
; │││││┌ @ lib.jl:6 within `accum'
; ││││││┌ @ interface.jl:47 within `+'
	vaddsd	%xmm0, %xmm0, %xmm0
; │└└└└└└
	retq
	nopw	%cs:(%rax,%rax)
; └

Zygote is slow to compile, but it often ends up producing optimal code – note it’s compiled down to only adding a number to itself (ie, multiply by 2)!


#3

Wouldn’t specifying types already avoid situations in which functions become not type-stable.

For example, if I have

f(x,y) = x*y

and both variables are not restricted to be of the same type, this might actually cause the code to slow down. In my case I only work with Floats, so it seemed appropiate to restrict all functions to handle Floats only.

Zygote looks interesting indeed! Thanks for the link :slight_smile:


#4

Nope, that’s not how Julia works. You can try this yourself:

julia> f(x,y) = x*y
f (generic function with 1 method)

julia> @code_warntype f(1, 2.0)
Body::Float64
1 1 ─ %1 = (Base.sitofp)(Float64, x)::Float64                                                                                                                                                                   │╻╷╷╷╷ *
  │   %2 = (Base.mul_float)(%1, y)::Float64                                                                                                                                                                     ││╻     *
  └──      return %2 

Type annotations on function arguments are useful for:

  1. Dispatching to the correct method for a particular type
  2. Documentation for users of your code as to what type of input your function expects

but they are generally completely irrelevant for performance.


#5

For a function to be type stable, the return type must only be a function of argument types (and not of their values). This is why

julia> 4 / 1
4.0

julia> typeof(ans)
Float64

even though 4/1 = 4, which could of course still be represented with an integer.
Functions in base and other libraries tend to follow this rule, although there are exceptions (like LinearAlgebra.factorize), which normally makes it easy to follow yourself in your own code. Make sure your functions return the same type whenever their input types are the same.

Sometimes, like when writing for loops, you may have to use promote_type:

function dot_product(x,y)
    out = zero(promote_type(eltype(x), eltype(y)))
    @inbounds for i ∈ eachindex(x, y)
        out += x[i] * y[i]
    end
    out
end

where just saying out = 0.0 would cause problems when x and y contain BigFloat. Then the loop would promote the type, and the output would be either Float64 or BigFloat, depending on whether or not x and y are of length 0.
The function also assumes that the return type of * is the same as the promotion of the input types, which isn’t necessarily true (what if x contains SMatrix{4,6}, and y contains SMatrix{6,5}?), but is true often enough.

If you’re concerned about type stability, @code_warntype is your friend. Traceur could also be worth experimenting with.


#6

I see that there is still something to learn for me! Thanks a lot for your replies. Probably my concerns were premature optimization. I’ll read more on the topic and think about it.

What would be the best way to handle the following situation: I have a type that contains the parameters of my model. The model parameters are numbers and positive. Should I write a function that calls the constructor of the type, but checks first its inputs?


#7

Yes, you’d want to use an inner constructor for that case. Something like:

struct Foo{T <: Number}
  x::T 
  
  function Foo{T}(x::T) where {T <: Number}
    x > 0 || throw(ArgumentError("positive value required"))
    new{T}(x)
  end
end

You’d probably also want to define an outer constructor so that Foo(1) works (instead of requiring an explicit Foo{Int}(1) ):

Foo(x::T) where {T <: Number} = Foo{T}(x)

For a more complete example, check out: https://docs.julialang.org/en/v1/manual/constructors/#Case-Study:-Rational-1


#8

I’d like to add that if you’re doing something like optimization or Hamiltonian Monte Carlo (ie, using an algorithm where you’re taking steps in the parameter space), a library like TransformVariables may be worth looking at. For example, if x is positive, storing y = log(x) instead of x itself. log(x) will give you an appropriate error when x is real and negative.
Then you can apply the algorithm to y instead of x, which is defined on the whole real line. Thus you avoid issues like taking steps / making proposals outside of the parameter space.
x will also often behave poorly close the boundary (ie, functions defined on “x” will behave in a highly non-linear fashion).
TransformVariables handles the Jacobians of the transformations for you, and is compatible with ForwardDiff.

I bring this up because many algorithms that use gradients often benefit from these unconstraining transformations (eg, BFGS, or HMC).
Of course, if you don’t need them, it’s faster not to repeatedly take logarithms and exp.


#9

Thanks again for your comments! I’ll take a look at both suggestions. Transforming variables sounds indeed useful, but it is not relevant for my model at the moment, but could very well be in the future. :slight_smile: