Multithreaded broadcast on a grouped DimensionalData array

The groupby array is new and may not be fully tested for generic broadcast. A GitHub issue is the best place to resolve this.

(And sorry for the delay, Iโ€™m on holiday)

Donโ€™t worry, it is your duty to enjoy your holidays!

I will make the GitHub issue, but, please, donโ€™t return to this while you are on holidays.

Have fun!

Hi, @danielwe , can you repeat your attempt, but after starting julia with the option --treads=# (# = number of desired threads)?

It seems that it does not work when multiple threads are enabled.

Youโ€™re right, I get a MethodError for resize!(::DimGroupByArray,...). Sorry about that, I thought I had used multiple threads when trying this out, but apparently not.

So, could this be an issue for OhMyThreads instead of Rasters?

No, I wouldnโ€™t say that. DimensionalData defines some special data types and implements dedicated methods for them. For example, map(tmean, groups) works because of a special method of Base.map implemented in DimensionalData, see here: DimensionalData.jl/src/array/methods.jl at 2012cded9dab12cc1f622a082d4b22cdefeb1e7b ยท rafaqz/DimensionalData.jl ยท GitHub

When Julia is running single-threaded, tmap from OhMyThreads falls back to Base.map, so everything works. When Julia is running with several threads, tmap tries to do its own multithreaded thing, but it doesnโ€™t work because the types in DimensionalData need special treatment. Packages like OhMyThreads work through generic interfaces and canโ€™t be held responsible for such special cases.

Iโ€™d like to emphasize that this is almost certainly a very simple problem for someone familiar with DimensionalData. Distributing the operations across threads is straightforward, the part thatโ€™s causing trouble is reassembling the results. Unfortunately, Iโ€™ve never used DimensionalData, I just responded because Iโ€™ve got some experience with LoopVectorization and multithreading. Hopefully, someone who knows DimensionalData well will chime in eventually.

Yes, FastBroadcast only applies to broadcasts with DefaultArrayStyle. Otherwise, it falls back to whatever the custom styleโ€™s implementation is.

Hi all, sorry it took a while to get to this.

May I suggest just a threaded loop, before getting too complicated:

using DimensionalData, Dates, Statistics

tempo = range(DateTime(2000); step=Hour(1), length=365*24*2)
A = rand(X(1:0.01:2), Ti(tempo))
groups = groupby(A, Ti => month)

newdimarray = similar(groups, Float64)

Then:

julia> @btime for i in eachindex(newdimarray)
          newdimarray[i] = mean(groups[i])
       end
  5.837 ms (181 allocations: 12.22 KiB)

julia> @btime Threads.@threads :static for i in eachindex(newdimarray)
          newdimarray[i] = mean(groups[i])
       end
  2.963 ms (212 allocations: 16.02 KiB)

For the packages, it seems the main problems were append! and resize! not working.

There are two reasons this is difficult in DimensionalData.jl. The first is that irregular, unordered and categorical lookups cant be resized larger as we donโ€™t know the new lookup values. Those are not actually problems here. The second is that a DimArray can hold a view and other lazy objects that also donโ€™t define append! or resize!` - as with a DimGroupByArray here, it normally holds some other wrapper array that does the grouping. I guess we could just error in all those cases and work wherever possible.

(there is another more difficult reason. A DimArray has mostly immutable fields for performance and GPU compatability, so you canโ€™t actually update an AbstractRange lookup in resize! or append! - and the lookup size always has to match the array size so we simply cant do it. Use-cases like this need to use cat rather than append! to just return a new object)

With broadcasts, DimensionalStyle is what lets a DimArray or a Raster keep its lookups and metadata through broadcasts, and is pretty fundamental to the package. If another package needs DefaultArrayStyle then either DimensionalData.jl or that package needs an extension with dispatch on DimensionalStyle.

Thank you, @Raf .

It does not work. Each element of the outer Ti dimension is not a Float64, but another raster with two dimensions: X and Ti.

julia> groups
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ 12-element DimGroupByArray{DimArray{Float64,1},1} โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ dims โ”
  โ†“ Ti Sampled{Int64} [1, 2, โ€ฆ, 11, 12] ForwardOrdered Irregular Points
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ metadata โ”ค
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ group dims โ”ค
  โ†“ X, โ†’ Ti
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  1  101ร—1488 DimArray
  2  101ร—1368 DimArray
  โ‹ฎ  
 11  101ร—1440 DimArray
 12  101ร—1464 DimArray

Your code gives:

julia> newdimarray
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ 12-element DimArray{Float64,1} โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ dims โ”
  โ†“ Ti Sampled{Int64} [1, 2, โ€ฆ, 11, 12] ForwardOrdered Irregular Points
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ metadata โ”ค
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  1  6.36837e-310
  2  6.36837e-310
  3  6.36837e-310
  โ‹ฎ  
 10  3.1e-322
 11  9.7e-322
 12  6.36838e-310

And, in the mean() call, we have to specify dims=:Ti.
So, newdimarray should have the same structure as groups, but with the inner Rasters having dimensions 101ร—1 instead of 101ร—1488.
I have attempted to create such an structure, but could not figure out how.

It was an exampleโ€ฆ you can equally make a DimArray of Rasters. You can even just take the mean of the first one to get the type.

out = similar(groups, typeof(mean(first(groups))); dims=Ti))

Or just do out = similar(groups, Raster) to leave it loosely typed.

I get it that easier threading without defining the array would be nice here. But adding threading to DimensionalData.jl directly is pretty far out of scope, so to make it work someone just has to add an extension on a threading package. I would totally merge that, but I donโ€™t have time or need to write it myself.

Thank you, @Raf , it works!

In fact, I already tested this solution, but I got an error. I realize now that the error is with regards to showing the data, but not necessarily to the data itself (please, confirm). So, just adding a semicolon at the end of the line, avoids the error.

I dump here the error:

newdimarray = similar(groups, typeof(mean(first(groups), dims=Ti)) )
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ 12-element DimGroupByArray{DimArray{Float64,2},1} โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ dims โ”
  โ†“ Ti Sampled{Int64} [1, 2, โ€ฆ, 11, 12] ForwardOrdered Irregular Points
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ metadata โ”ค
  Dict{Symbol, Any} with 1 entry:
  :groupby => :Ti=>month
Error showing value of type DimensionalData.DimGroupByArray{DimMatrix{Float64, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Float64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{DateTime, DateTime}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.Lookups.NoMetadata}, 1, Tuple{Ti{DimensionalData.Dimensions.Lookups.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Vector{DimMatrix{Float64, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Float64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{DateTime, DateTime}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.Lookups.NoMetadata}}, Symbol, Dict{Symbol, Any}}:
ERROR: UndefRefError: access to undefined reference
Stacktrace:
  [1] getindex
    @ ./essentials.jl:13 [inlined]
  [2] getindex
    @ ~/.julia/packages/DimensionalData/BZbYQ/src/array/indexing.jl:28 [inlined]
  [3] first
    @ ./abstractarray.jl:452 [inlined]
  [4] show_after(io::IOContext{โ€ฆ}, mime::MIME{โ€ฆ}, A::DimensionalData.DimGroupByArray{โ€ฆ})
    @ DimensionalData ~/.julia/packages/DimensionalData/BZbYQ/src/groupby.jl:57
  [5] show(io::IOContext{โ€ฆ}, mime::MIME{โ€ฆ}, A::DimensionalData.DimGroupByArray{โ€ฆ})
    @ DimensionalData ~/.julia/packages/DimensionalData/BZbYQ/src/array/show.jl:17
  [6] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:273
  [7] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
  [8] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:259
  [9] display
    @ ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:278 [inlined]
 [10] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:340
 [11] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [12] invokelatest
    @ ./essentials.jl:889 [inlined]
 [13] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{โ€ฆ})
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:315
 [14] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:284
 [15] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
 [16] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:282
 [17] (::REPL.var"#do_respond#80"{โ€ฆ})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:911
 [18] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [19] invokelatest
    @ ./essentials.jl:889 [inlined]
 [20] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/LineEdit.jl:2656
 [21] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:1312
 [22] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:386
Some type information was truncated. Use `show(err)` to see complete types.

Yeah thatโ€™s just show failing with the nested arrays. You can put a semicolon after the line to avoid it.

Maybe make a GitHub issue with your mwe and we can fix that too.

Ok, I will open the issue.

Additionally, I have observed that, for the example given above, the non threaded solution is faster than the threaded one, but for my data, the threaded solution is faster.

Yeah, thatโ€™s just threading generally.

I also tested on a much larger array to check it was worthwhile.