Implementing an equivalent of python's xarray (labeled nd-arrays) in julia

package

#1

One of the python packages I use the most is an excellent library called xarray, which is very convenient for handing multi-dimensional data. This package provides a DataArray object which associates a n-d array with named dimensions and coordinate data, and provides convenient methods for indexing, aligning, and broadcasting. For instance, if I have two data arrays temp(time,lat,lon) and meantemp(lat,lon), xarray allows me perform simple operations like np.sqrt(((temp-meantemp)**2).mean('time')). This expression involves automatic broadcasting, and a reduction across the named dimension. Also, the final sqrt call returns an DataArray object transparently. It’s a pretty magical tool, and very useful to people involved in atmospheric science and oceanography.

I have started a implementation in julia at https://github.com/nbren12/XArray.jl. Currently, DataArray inherits from AbstractArray, but I am not sure if this is a good idea. My main questions are

  1. Should the DataArray type inherit from AbstractArray? Or should I manually redefine all the functions like +,-,*,sqrt?
  2. If I don’t subclass from AbstractArray, how can I ensure that simple function calls like sqrt(temp) work?
  3. If I do subclass from AbstractArray, how do I define Base.similar(x::DataArray, dims) so that named coordinate information is passed along to the new object.

I would love your input, and would welcome anybody who is interested in contributing to this project. If there is already an equivalent package in julia please let me know.


#2

Before you work too much on this, have a look at NamedArrays.jl and AxisArrays.jl.

Regarding your questions, you should definitely inherit from AbstractArray. As for defining similar, you just need to add a method for your new type which creates the array with the names.

BTW, please do not call this DataArray as this is already the name of a well-know package and type.


#3

Thanks for the info. At a quick glance. AxisArray looks like what I want.
Thanks for pointing it out to me.

Just for kicks I might carry on with my project as a learning exercise. I
will change the name so it doesn’t conflict with anything.


#4

It is true that the name AxisArrays.jl is not very descriptive / discoverable (though I don’t have a better description).


#5

xarray is quite a popular library, at least in my field, so it might help
to link to it on the AxisArrays github page. That way a google search of
xarray julia would turn up something.


#6

That’s a great idea. I’ve definitely taken some inspiration from xarray. A PR would be most welcome!


#7

I was trying out AxisArrays, and it looks like the output of sqrt(A::AxisArray) is a float array.

julia> sqrt(AxisArray(rand(5), Axis{:i}(collect(1:5)))) |> typeof
Array{Float64,1}

Is this supposed to happen?


#8

Yes, at the moment we’ve not extended many functions to be axis aware. So it’s expected, but not necessarily ideal.

The best path forward here would probably be to specialize broadcast and make it axis-aware. Then any dot-broadcast function (like sqrt.(A)) will automatically be axis-aware.


#9

I agree. IMO, the best thing about xarray is it’s ability to broadcast arrays in a very smooth and intelligent fashion. I have started working on this in my little toy package, and it seems like it shouldn’t be too hard. My current solution is to pass the full inds::Tuple{Axis, Axis...} object around in Base.similar instead of turning it into a tuple of axis lengths, if that makes any sense to you. I have also implemented some of the easier broadcast! methods (e.g. addition by scalar).

Since this functionality is pretty useful to me, and is fun learning exercise besides, I would like to add some of it to AxisArrays.jl. Would you be interested in a pull request at some point?


#10

That’d be great! A pull request that added broadcast specializations would be very welcome. I think you could lean on Base.broadcast! pretty heavily, and just specialize the creation of the destination array.

We’ve taken a very similar approach to similar, but to be honest I’ve not found that method to be very useful since it’s really the caller that must be axis-aware… in which case they often just construct the AxisArray directly. Note that if you omit the dimensions entirely from similar, then it can return an AxisArray with the same axes as the original array.

Most of AxisArray’s development dates back to a Julia before the indices() function. I wonder if we couldn’t shoe-horn axis information into the actual indices of the array. It’s a very different situation from OffsetArrays, since the axes are not constrained to be integer unit ranges… but we could have a special integer unit range that also carries an axis vector. It’d describe both the integer indexing as well as the special axis behaviors. I’m not sure how much it’d gain us… but it might be worth investigating.


#11

I went ahead a pull request with a first pass at this broacasting stuff https://github.com/JuliaArrays/AxisArrays.jl/pull/54.

I think your comment about indices and similar is relevant since that shows up the julia broadcast.jl file in a few places, but in my pull request, I didn’t try to integrate that deeply with the broadcast.jl code.