Function calculating in BigFloat/Float64

Hi all

What would be the Julian approach to writing a function that, depending on an Integer parameter (the precision) would calculate using either Float64 or BigFloat and return the result in Float64 or BigFloat accordingly? The function uses some numerical constants (like one, two). Should these be defined in an if-block?

Hence I want a function that looks like this:

function foo(x, y; prec::Int)
 #function does something with x, y and constants like one, pi, etc. 

  return result #type of result depends on prec
end

If prec is not specified, the function should run in Float64, if it is specified it should run in that precision. The variables x, y can be of Float64 or BigFloat type (and I guess some promotion/demotion could be required)

Thanks for your insights
Gert

In general, you don’t have to do anything. It should just work according to promotion rules.

For example, if you have a function like…

julia> foo(x, y) = x + y
foo (generic function with 1 method)

julia> typeof(foo(1.0, 2.0))
Float64

julia> typeof(foo(1.0, big"2.0"))
BigFloat

There really isn’t a need to specify the precision that the function should run in but just let that be determined by the types of your input arguments.

Now, there is a small caveat when it comes to constants or anything hard coded within the function. Irrational constants like pi should also work to some degree depending on how they are used.

julia> foo(x, y) = x*pi + y
foo (generic function with 1 method)

If you use pi like this because it is an irrational constant it will do the multiplication with x in the precision of x so there really isn’t something you need to do.

But if you do anything like…

julia> foo(x, y) = sqrt(pi/2)*x + y
foo (generic function with 1 method)

You should be careful because now the sqrt(pi/2) will always be done in Float64 precision and rounded before multiplying with x. So if you do any operations that could round your intermediate results your final answer won’t be in the full precision. To avoid this, it is probably best to have some type parameter that depends on your input arguments and make sure every constant is promoted to that type… so something like

julia> function foo(x,y)
           T = promote_type(typeof(x),  typeof(y))
           return sqrt(T(pi)/2) * x + y
       end
foo (generic function with 1 method)

Will be generic enough to work in any precision.

julia> typeof(foo(1.2f0, 1.3f0))
Float32

julia> typeof(foo(1.2f0, 1.3))
Float64

julia> typeof(foo(1.2, 1.3))
Float64

julia> typeof(foo(1.2, big"1.3"))
BigFloat

julia> typeof(foo(Float16(1.2), Float16(1.3)))
Float16
2 Likes

The most Julian way to do this, I think, is to remove the prec::Int argument and compute using the (promotion) type of x,y. You might not even have to specialize the code in any way, e.g.

function foo(x,y)
   2x+3y
end

will work equally well on Int64s, Float64s, BigFloats, Rationals, etc. If foo needs to do something like a series approximation, you can set the number of terms for a given precision in terms of eps(promote_type(x,y))

julia> x = 3.0; y = BigFloat(2.0);

julia> promote_type(typeof(x), typeof(y))
BigFloat

julia> eps(promote_type(typeof(x), typeof(y)))
1.727233711018888925077270372560079914223200072887256277004740694033718360632485e-77

I should have been more specific, my apologies.

I don’t want to eliminate the prec parameter, I want to give the user the option to run the function in Float64 (absence of prec) or with BigFloat (prec given). I don’t want to revert by default to BigFloat since, correct me if I’m wrong, BigFloat with 53 bits specified does not default to Float64 (i.e. math operations are not run as double precision math operations but suffer from some BigFloat overhead).

So more specifically:

function foo(x, y; prec::Int)
 #function does something with x, y and constants like one, pi, etc. 
 
 # if prec is set, there should be a
 setprecision(prec)

  return result #type of result depends on prec
end

where x and y can be Float64 or BigFloat but not necessarily both the same type.

What I want to achieve is a function that calculates a certain outcome either in Float64s or in BigFloats (depending on the value of prec). So some examples:

a) x is Float64, y is Float64, prec is absent → : function runs using Float64 for all variables and math operations
b) x is Float64, y is Float64, prec is 128 → : function promotes x and y to BigFloats, and all math operations in the function are done using BigFloats with precision of 128 bits
c) x is Float64, y is BigFloat with precision 128, prec is 96 → function does all math operations using BigFloats with precision 96 bits (I assume y gets demoted on the fly, or at least the math operations where y is used are done using 96 bits and results are stored as BigFloats in 96 bits
etc etc

My goal is to keep the code as clean as possible and hence I want to avoid a bunch of if-then’s…

The easy way to get this is to define the function without prec generically, and then if the user runs foo(x::Float64, y::Float64) it will be in Float64, and if the user runs foo(x::BigFloat, y::BigFloat) it will be in BigFloat precision. If the user wants to change the precision, they can call setprecision(prec) before calling foo

2 Likes