Best way to handle variable types in function arguments

performance

#1

Hello!

In my work (climate sciences) we often use netCDF files. I’m wondering about the best way to handle the possible element types in these files, and how they affect the functions that use the arrays extracted from those files.

The problem is that the data inside the netCDF files can be either Float32 or Float64. My extraction function fetches the data and puts everything into an AxisArray (which is then wrapped in a custom type, ClimGrid, that also holds the metadata from the netCDF file). As a result, the data inside the AxisArray is sometimes Float32 and sometimes Float64, depending on the file.

My question is thus: what should I do for functions that act on the data? Should I write two methods, one taking a Float32 argument and one taking a Float64 argument?

e.g.
foo(x::Float32)
foo(x::Float64)

Or should I just promote everything to Float64? That is costly: we are talking about arrays of size 365 x 1068 x 510 for a single year of data, and this can easily extend to 60-70 years.
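
A quick back-of-the-envelope check of that cost, using only the grid size quoted above: one year is about 199 million values, so roughly 0.8 GB as Float32 versus 1.6 GB as Float64, and 60 to 70 years comes to roughly 48 to 56 GB versus 95 to 111 GB, before counting the temporary copy that a conversion such as `Float64.(x)` allocates.

```julia
# Memory footprint of one year of daily data on the 365 x 1068 x 510 grid from the question
nelem = 365 * 1068 * 510               # 198_808_200 values per year
gb32 = nelem * sizeof(Float32) / 1e9   # 4 bytes per value
gb64 = nelem * sizeof(Float64) / 1e9   # 8 bytes per value

println("values per year: ", nelem)
println("Float32: ", round(gb32; digits = 2), " GB/year, ", round(70 * gb32; digits = 1), " GB for 70 years")
println("Float64: ", round(gb64; digits = 2), " GB/year, ", round(70 * gb64; digits = 1), " GB for 70 years")
```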

I suspect there is an easier answer involving a “parametric” approach, but I must admit that I’m slightly lost with that approach.

Any hint or examples would be greatly appreciated!


#2

For reference, here’s the ClimGrid struct, in case it helps to understand the problem.

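using AxisArrays

struct ClimGrid
  data::AxisArray   # bare AxisArray: whether the data is Float32 or Float64 is not part of ClimGrid's type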
  model::String
  experiment::String
  run::String
  filename::String
  dataunits::String
  latunits::String
  lonunits::String
  var::String
end

Here, data::AxisArray can store either Float32 or Float64 data.
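
Why this matters for performance: `AxisArray` on its own (without its type parameters) is not a concrete type, so a field declared as `data::AxisArray` can hold any AxisArray, and code that reads that field cannot know the element type at compile time. The sketch below shows the same effect with the standard `Array` type standing in for `AxisArray` (both are parametric types used here without parameters), so it runs without AxisArrays installed; `LooseGrid` is a hypothetical name.

```julia
# Hypothetical stand-in for the original ClimGrid: the field type is the bare
# parametric type `Array`, just as the original uses the bare `AxisArray`.
struct LooseGrid
    data::Array          # element type and dimension are NOT part of the struct type
    var::String
end

g32 = LooseGrid(rand(Float32, 4, 3), "pr")
g64 = LooseGrid(rand(Float64, 4, 3), "pr")

# Both values have the same type, so the struct type carries no element-type information
println(typeof(g32) == typeof(g64))                    # true
println(isconcretetype(fieldtype(LooseGrid, :data)))   # false: the compiler only sees `Array`
```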


#3

First, you probably want to parameterize ClimGrid like:

struct ClimGrid{A <: AxisArray}
    data::A
    # ... remaining fields as before
end
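
A runnable version of this idea, again with a plain array standing in for `AxisArray` (the hypothetical `ParamGrid` uses `A <: AbstractArray`; since `AxisArray` is a subtype of `AbstractArray`, the `A <: AxisArray` bound above works the same way). The parameter `A` is filled in from the argument, so Float32 and Float64 data produce two distinct, concrete struct types without ever writing the full array type by hand:

```julia
# Hypothetical parametric stand-in for ClimGrid: the concrete array type becomes a type parameter
struct ParamGrid{A <: AbstractArray}
    data::A
    var::String
end

p32 = ParamGrid(rand(Float32, 4, 3), "pr")   # A is inferred from the Float32 matrix
p64 = ParamGrid(rand(Float64, 4, 3), "pr")   # A is inferred from the Float64 matrix

println(typeof(p32))
println(typeof(p64))
println(isconcretetype(fieldtype(typeof(p32), :data)))   # true: the field type is now concrete
```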

As to the function, you can just use a signature like foo(x::AbstractFloat) to catch both Float64 and Float32 arguments. In Julia it is common to write generic functions that work on multiple types.
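
As a sketch of what this looks like in practice (the function names `scale` and `anomaly` are hypothetical): a single method annotated with `AbstractFloat` accepts both scalar types, and for the arrays extracted from the files the analogous signature is `AbstractArray{<:AbstractFloat}`. Writing the body so that it stays in the input's own element type keeps Float32 data in Float32 instead of silently promoting it.

```julia
using Statistics  # for mean

# One method covers Float32, Float64 (and other AbstractFloat subtypes)
scale(x::AbstractFloat) = 2x

# Array version: accepts an array of any floating-point element type and any dimension.
# Subtracting the mean keeps the element type, so Float32 input gives Float32 output.
anomaly(x::AbstractArray{<:AbstractFloat}) = x .- mean(x)

a32 = rand(Float32, 5, 4)
a64 = rand(Float64, 5, 4)

println(typeof(scale(1.5f0)))     # Float32
println(typeof(scale(1.5)))       # Float64
println(eltype(anomaly(a32)))     # Float32
println(eltype(anomaly(a64)))     # Float64
```

The annotations only restrict which arguments a method accepts; they do not make the code faster, because Julia specializes on the concrete argument types either way.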


#4

Nice, thanks for your help!

I knew the answer would point towards some generic approach, but I couldn’t see how to do it correctly.


#5

Is there anything else I should change when using the struct declaration you provided? Right now I can no longer construct a ClimGrid. Here is an example of the error I get:

struct ClimGrid{A <: AxisArray}
    data::A
    # ... remaining fields as before
    function ClimGrid(data; model = "N/A", experiment = "N/A", run = "N/A", filename = "N/A", 
                                 dataunits = "N/A", latunits = "N/A", lonunits = "N/A", variable = "N/A", 
                                 typeofvar = "N/A", typeofcal = "N/A")


      new(data, model, experiment, run, filename, dataunits, latunits, lonunits, variable, typeofvar, 
           typeofcal)

    end
end
julia> axisdata = AxisArray(data, Axis{:time}(d), Axis{:lon}(1:2), Axis{:lat}(1:2))
3-dimensional AxisArray{Float64,3,...} with axes:
    :time, 2003-01-01:1 day:2005-12-31
    :lon, 1:2
    :lat, 1:2
And data, a 1096×2×2 Array{Float64,3}:
[...]

julia> C = ClimateTools.ClimGrid(axisdata, variable = "pr")
ERROR: MethodError: no method matching ClimateTools.ClimGrid(::AxisArrays.AxisArray{Float64,3,Array{Float64,3},Tuple{AxisArrays.Axis{:time,StepRange{Date,Base.Dates.Day}},AxisArrays.Axis{:lon,UnitRange{Int64}},AxisArrays.Axis{:lat,UnitRange{Int64}}}}; variable="pr")

I’m quite lost, as I thought that replacing the initial struct with a parameterized struct would have no effect elsewhere. I guess it has to do with the type of the AxisArray axisdata, but surely we should not have to be that specific in the struct declaration (?).

julia> typeof(axisdata)
AxisArrays.AxisArray{Float64,3,Array{Float64,3},Tuple{AxisArrays.Axis{:time,StepRange{Date,Base.Dates.Day}},AxisArrays.Axis{:lon,UnitRange{Int64}},AxisArrays.Axis{:lat,UnitRange{Int64}}}}

#6

I would just use an outer constructor (defined outside the struct) instead:

using AxisArrays
struct ClimGrid{A <: AxisArray}
    data::A
    model::String
    experiment::String
    # too lazy to add more fields
end

# Outer constructor with keyword defaults; the default constructor infers A from data
function ClimGrid(data; model = "N/A", experiment = "N/A")
    ClimGrid(data, model, experiment)
end

data = randn(3,2,2)
d = 1:3
axisdata = AxisArray(data, Axis{:time}(d), Axis{:lon}(1:2), Axis{:lat}(1:2))
ClimGrid(axisdata)
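
The same pattern can be checked without AxisArrays by using a plain array; `KwGrid` below is a hypothetical stand-in. The point is that the outer constructor never mentions `A`: it forwards to the default positional constructor, which infers `A` from the data, whether that data is Float32 or Float64.

```julia
# Hypothetical stand-in for ClimGrid with a plain AbstractArray instead of an AxisArray
struct KwGrid{A <: AbstractArray}
    data::A
    model::String
    experiment::String
end

# Outer constructor with keyword defaults; the default positional constructor infers A
KwGrid(data; model = "N/A", experiment = "N/A") = KwGrid(data, model, experiment)

g32 = KwGrid(rand(Float32, 3, 2, 2))
g64 = KwGrid(rand(Float64, 3, 2, 2); model = "some model")

println(eltype(g32.data), " ", g32.model)   # Float32 N/A
println(eltype(g64.data), " ", g64.model)   # Float64 some model
```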

#7

Thanks! I see now where I made a mistake.

Somehow it works now that the function ClimGrid is outside the struct statement.


#8

Inner constructors for parametric types are a bit confusing. I would only use them when you need them. If you do, it looks like this:

using AxisArrays

struct ClimGrid{A <: AxisArray}
    data::A
    model::String
    experiment::String
    # too lazy to add more fields
    # Inner constructor: the method's own type parameter A is passed explicitly to new{A}
    function ClimGrid(data::A; model = "N/A", experiment = "N/A") where {A <: AxisArray}
        new{A}(data, model, experiment)
    end
end


data = randn(3,2,2)
d = 1:3
axisdata = AxisArray(data, Axis{:time}(d), Axis{:lon}(1:2), Axis{:lat}(1:2))
ClimGrid(axisdata)
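
One consequence of defining an inner constructor, sketched below with a plain-array stand-in (`InnerGrid` is hypothetical): Julia then no longer generates the default positional constructor, so the keyword form is the only way to build the object. Bounding the method's own parameter (`where {A <: AbstractArray}` here, or `A <: AxisArray` for the real ClimGrid) also makes a wrong argument fail with a MethodError at the call instead of an error inside `new`.

```julia
struct InnerGrid{A <: AbstractArray}
    data::A
    model::String
    # Inner constructor: the method's own type parameter A is passed on to new{A}
    function InnerGrid(data::A; model = "N/A") where {A <: AbstractArray}
        new{A}(data, model)
    end
end

g = InnerGrid(rand(Float32, 3, 2); model = "m")
println(eltype(g.data))                             # Float32: element type is preserved
println(applicable(InnerGrid, rand(3, 2), "m"))     # false: no default positional constructor
```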

#9

Thanks for your help, my package is now much better and way less redundant!


#10

I’m wondering about the mechanics of abstract types, for example a function using AbstractFloat (foo(x::AbstractFloat)) or another abstract type like AbstractArray (I have been using foo(x::AbstractArray{T, 2}) where T for 2D arrays when I don’t know whether the array contains Float32 or Float64).

How does it work? Does the function get compiled into a Float32 version and a Float64 version (at run time)? That is my understanding from what I read in the documentation; I just wanted to know if I’m right.

Thanks!


#11

Yep.


#12

Yes. Here’s a post about handling types and dispatch that might clear things up.
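
A small way to see this specialization yourself, assuming Julia 1.x: the hypothetical `total` below has a single method written against `AbstractMatrix` (shorthand for `AbstractArray{T,2} where T`), yet `code_typed` from the InteractiveUtils standard library (loaded automatically in the REPL) shows separately inferred code for Float32 and Float64 input. Julia compiles each specialization the first time the method is called with that combination of concrete argument types and reuses it afterwards, which is also why the first call with a new element type is slower than later ones.

```julia
using InteractiveUtils  # code_typed; loaded automatically in the REPL

# One generic method: AbstractMatrix is shorthand for AbstractArray{T,2} where T
function total(x::AbstractMatrix)
    s = zero(eltype(x))     # accumulator starts in the input's own element type
    for v in x
        s += v
    end
    return s
end

# Inferred return type of the code generated for each concrete argument type
rt32 = code_typed(total, (Matrix{Float32},))[1].second
rt64 = code_typed(total, (Matrix{Float64},))[1].second
println(rt32)   # Float32
println(rt64)   # Float64
```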


#13

Thanks for the clarification!