Reading different types of data in Julia

yvikhlya · April 25, 2023, 3:47pm

Hello all, I am new to Julia, coming from Python. I have collections of observational data and GCM output in NetCDF format, but of different “types” (different grids, frequencies, conventions, etc., not Julia types). Each requires slightly different treatment after being read from disk and before being passed to the analysis utilities: renaming coordinates, adding missing grid metrics required by the analysis tools, and so on. A reading function should accept any “type” of collection and return an NCDataset (or similar) object with all required metadata and metrics included, and with coordinate and variable names following the same convention. No regridding or resampling is needed at this step, only putting the data into a common format. I can think of several ways to do this in Julia:

1. Using ‘if’ blocks

function read_collection(colname, ...; coltype="LATLON")
  if coltype=="LATLON"
    ds=read_latlon(colname)
  elseif coltype == "Tripolar"
    ds=read_tripolar(colname)
  ......................
  end
  return ds
end

2. Using a dictionary

function read_collection(colname, ...; coltype="LATLON")
  readers=Dict("LATLON" => read_latlon, "Tripolar" => read_tripolar, ...)
  return readers[coltype](colname)
end

I do something like this in Python.

3. Dispatch on a dummy type

struct LatLon
end

struct Tripolar
end

function read_collection(colname, ::LatLon)
  return read_latlon(colname)
end

function read_collection(colname, ::Tripolar)
  return read_tripolar(colname)
end

4. Dispatch on a value type

function read_collection(colname, ::Val{:LATLON})
  return read_latlon(colname)
end

function read_collection(colname, ::Val{:Tripolar})
  return read_tripolar(colname)
end
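One detail about value-type dispatch: a `String` such as "LATLON" cannot be a type parameter in Julia, because type parameters must be types, symbols, or isbits values, so `Val` is usually parameterized with a `Symbol`. Below is a minimal sketch of that variant. The two reader functions are hypothetical stand-ins that return strings instead of opening files, and the string-accepting convenience method is an assumption added for illustration.

```julia
# Stand-in readers (hypothetical); real ones would open NetCDF files.
read_latlon(colname) = "latlon:" * colname
read_tripolar(colname) = "tripolar:" * colname

# Symbols are valid type parameters, so Val{:LATLON} can be dispatched on.
read_collection(colname, ::Val{:LATLON}) = read_latlon(colname)
read_collection(colname, ::Val{:Tripolar}) = read_tripolar(colname)

# Convenience method for a string that is only known at run time.
# The Val is constructed at run time here, so this call is resolved
# by dynamic dispatch rather than at compile time.
read_collection(colname, coltype::AbstractString) =
    read_collection(colname, Val(Symbol(coltype)))

a = read_collection("sst", Val(:LATLON))
b = read_collection("sst", "Tripolar")
```

Passing `Val(:LATLON)` as a literal lets the compiler pick the method statically. Going through a runtime string gives up that advantage, although for a function dominated by file I/O the cost is likely negligible.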

Which approach would be preferable in Julia in terms of performance and ease of extending to more collection types in the future?

juliohm · April 25, 2023, 3:58pm

If you really want to handle all kinds of geospatial data arrangements, take a look at the georef function from the GeoStats.jl stack:

https://juliaearth.github.io/GeoStats.jl/stable/data.html

We still need to work on some important details like CRS, but we already have the most flexible domain types you could need in practice, including grids of cells, unstructured meshes, point sets, geometry sets, etc.

If you follow this approach with georef, you will gain many features and transforms for free, including

https://juliaearth.github.io/GeoStats.jl/stable/transforms.html

https://juliaearth.github.io/GeoStats.jl/stable/splitapplycombine.html

and visualization recipes with both the Plots.jl and the Makie.jl stacks, provided by GeoStatsPlots.jl and GeoStatsViz.jl, respectively.

Alexander-Barth · April 28, 2023, 11:59am

To me, the 3rd option seems to be the most idiomatic; it is similar to, for example, the parse and read functions:

https://docs.julialang.org/en/v1/base/io-network/#Base.read

When the compiler knows the types, the function calls can be inlined.
Options 1 and 2 are probably not type-stable.
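As a rough illustration of how option 3 could grow, here is a minimal sketch, not a definitive design. The abstract type `AbstractGrid`, the helper `read_raw`, the `Cubed` type, and the NamedTuple return values are all hypothetical stand-ins; real readers would return an NCDataset or similar.

```julia
# Hypothetical grid "kinds"; the names are illustrative, not from any package.
abstract type AbstractGrid end
struct LatLon <: AbstractGrid end
struct Tripolar <: AbstractGrid end

# Stand-in readers: real ones would open the files, e.g. with NCDatasets.jl.
read_raw(colname, ::LatLon) = (name = colname, coords = ["lon", "lat"])
read_raw(colname, ::Tripolar) = (name = colname, coords = ["x", "y"])

# Shared entry point: grid-specific reading followed by common post-processing
# (here it only records which grid kind was used).
function read_collection(colname, grid::AbstractGrid)
    ds = read_raw(colname, grid)
    return merge(ds, (grid = nameof(typeof(grid)),))
end

# Supporting a new kind of collection needs only a new type and one method;
# read_collection itself stays untouched.
struct Cubed <: AbstractGrid end
read_raw(colname, ::Cubed) = (name = colname, coords = ["xc", "yc", "face"])

ds = read_collection("ocean_monthly", Tripolar())
```

Because the grid kind is part of the argument's type, the compiler can resolve `read_raw` statically whenever the caller passes a concrete instance such as `Tripolar()`.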

yvikhlya · April 28, 2023, 4:06pm

Thanks, all. I’ll go with option 3 and look into GeoStats later.

joa-quim · April 28, 2023, 4:53pm

I smell, from those mentions of latlon, that GMT.jl would make your life simpler.
