Reading different types of data in Julia

Hello all, I am new to Julia, coming from Python. I have collections of observational data and GCM output in NetCDF format, but of different “types” (different grids, frequencies, conventions, etc.; not Julia types). Each needs slightly different treatment after being read from disk and before being fed to the analysis utilities. This may include renaming coordinates, adding missing grid metrics required by the analysis tools, and so on. A reading function should be able to take any “type” of collection as input and return an NCDataset (or similar) object with all required metadata/metrics included and with coordinate/variable names following the same convention. No regridding/resampling is needed at this step, just putting the data into the same format. I can think of several ways to do this in Julia:

  1. Using ‘if’ blocks,
function read_collection(colname, ...; coltype="LATLON")
  if coltype=="LATLON"
    ds=read_latlon(colname)
  elseif coltype == "Tripolar"
    ds=read_tripolar(colname)
  ......................
  end
  return ds
end
  2. Using a dictionary
function read_collection(colname, ...; coltype="LATLON")
  readers=Dict("LATLON" => read_latlon, "Tripolar" => read_tripolar, ...)
  return readers[coltype](colname)
end

I do something like this in Python.
3. Dispatch on a dummy type

struct LatLon
end

struct Tripolar
end

function read_collection(colname, ::LatLon)
  return read_latlon(colname)
end

function read_collection(colname, ::Tripolar)
  return read_tripolar(colname)
end
  4. Dispatch on a value type
# Val parameters must be isbits values or Symbols, so use :LATLON rather than "LATLON"
function read_collection(colname, ::Val{:LATLON})
  return read_latlon(colname)
end

function read_collection(colname, ::Val{:Tripolar})
  return read_tripolar(colname)
end

Which way would be preferable in Julia in terms of performance and ease of extending to more collection types in the future?

If you really want to handle all kinds of geospatial data arrangements, take a look at the georef function from the GeoStats.jl stack:

https://juliaearth.github.io/GeoStats.jl/stable/data.html

We still need to work on some important details like CRS, but we already have the most flexible domain types you could need in practice, including grids of cells, unstructured meshes, point sets, geometry sets, etc.

If you follow this approach with georef, you will gain a lot of functionality and transforms for free, including

https://juliaearth.github.io/GeoStats.jl/stable/transforms.html

https://juliaearth.github.io/GeoStats.jl/stable/splitapplycombine.html

and visualization recipes with both the Plots.jl and the Makie.jl stacks, provided by GeoStatsPlots.jl and GeoStatsViz.jl, respectively.


To me the 3rd option seems the most idiomatic, and it is similar to, for example, the parse and read functions:

https://docs.julialang.org/en/v1/base/io-network/#Base.read

When the compiler knows the types, the function calls can be inlined.
Options 1 and 2 are probably not type-stable.
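
A minimal sketch of how option 3 might grow over time, assuming stub readers in place of the real ones. The abstract supertype `CollectionKind`, the extra `Cubed` kind, the `KINDS` table, and the NamedTuple return values are all hypothetical and only here for illustration; the real `read_latlon` and `read_tripolar` would return an NCDataset or similar.

```julia
# Hypothetical supertype grouping the dummy types from option 3.
abstract type CollectionKind end
struct LatLon   <: CollectionKind end
struct Tripolar <: CollectionKind end

# Stand-ins for the real readers (assumption: they take only the collection name).
read_latlon(colname)   = (name = colname, grid = :latlon)
read_tripolar(colname) = (name = colname, grid = :tripolar)

# One method per kind; the method is selected from the type of the second argument.
read_collection(colname, ::LatLon)   = read_latlon(colname)
read_collection(colname, ::Tripolar) = read_tripolar(colname)

# Adding a kind later means adding a struct and a method,
# without editing any of the existing methods.
struct Cubed <: CollectionKind end
read_collection(colname, ::Cubed) = (name = colname, grid = :cubed)

# Optional bridge for when the kind only arrives as a string, e.g. from a config file.
const KINDS = Dict{String,CollectionKind}(
    "LATLON"   => LatLon(),
    "Tripolar" => Tripolar(),
)
read_collection(colname, kind::AbstractString) = read_collection(colname, KINDS[kind])

ds = read_collection("obs_sst", LatLon())
```

The string method does a lookup and a dynamic dispatch at run time, much like option 2, but that happens once per collection and is likely negligible next to reading from disk. Code that already knows the kind can pass `LatLon()` directly and let the compiler resolve the call.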


Thanks all, I’ll go with option 3 and look into GeoStats later.

I suspect, from those mentions of latlon, that GMT.jl would make your life simpler.
