The groupby array is new and may not be fully tested for generic broadcast. A GitHub issue is the best place to resolve this.
(And sorry for the delay, Iโm on holiday)
The groupby array is new and may not be fully tested for generic broadcast. A GitHub issue is the best place to resolve this.
(And sorry for the delay, Iโm on holiday)
Donโt worry, it is your duty to enjoy your holidays!
I will make the GitHub issue, but, please, donโt return to this while you are on holidays.
Have fun!
Hi, @danielwe , can you repeat your attempt, but after starting julia with the option --treads=# (# = number of desired threads)?
It seems that it does not work when multiple threads are enabled.
Youโre right, I get a MethodError
for resize!(::DimGroupByArray,...)
. Sorry about that, I thought I had used multiple threads when trying this out, but apparently not.
So, could this be an issue for OhMyThreads instead of Rasters?
No, I wouldnโt say that. DimensionalData defines some special data types and implements dedicated methods for them. For example, map(tmean, groups)
works because of a special method of Base.map
implemented in DimensionalData, see here: DimensionalData.jl/src/array/methods.jl at 2012cded9dab12cc1f622a082d4b22cdefeb1e7b ยท rafaqz/DimensionalData.jl ยท GitHub
When Julia is running single-threaded, tmap
from OhMyThreads falls back to Base.map
, so everything works. When Julia is running with several threads, tmap
tries to do its own multithreaded thing, but it doesnโt work because the types in DimensionalData need special treatment. Packages like OhMyThreads work through generic interfaces and canโt be held responsible for such special cases.
Iโd like to emphasize that this is almost certainly a very simple problem for someone familiar with DimensionalData. Distributing the operations across threads is straightforward, the part thatโs causing trouble is reassembling the results. Unfortunately, Iโve never used DimensionalData, I just responded because Iโve got some experience with LoopVectorization and multithreading. Hopefully, someone who knows DimensionalData well will chime in eventually.
Yes, FastBroadcast only applies to broadcasts with DefaultArrayStyle
. Otherwise, it falls back to whatever the custom styleโs implementation is.
Hi all, sorry it took a while to get to this.
May I suggest just a threaded loop, before getting too complicated:
using DimensionalData, Dates, Statistics
tempo = range(DateTime(2000); step=Hour(1), length=365*24*2)
A = rand(X(1:0.01:2), Ti(tempo))
groups = groupby(A, Ti => month)
newdimarray = similar(groups, Float64)
Then:
julia> @btime for i in eachindex(newdimarray)
newdimarray[i] = mean(groups[i])
end
5.837 ms (181 allocations: 12.22 KiB)
julia> @btime Threads.@threads :static for i in eachindex(newdimarray)
newdimarray[i] = mean(groups[i])
end
2.963 ms (212 allocations: 16.02 KiB)
For the packages, it seems the main problems were append!
and resize!
not working.
There are two reasons this is difficult in DimensionalData.jl. The first is that irregular, unordered and categorical lookups cant be resized larger as we donโt know the new lookup values. Those are not actually problems here. The second is that a DimArray
can hold a view and other lazy objects that also donโt define append! or
resize!` - as with a DimGroupByArray here, it normally holds some other wrapper array that does the grouping. I guess we could just error in all those cases and work wherever possible.
(there is another more difficult reason. A DimArray
has mostly immutable fields for performance and GPU compatability, so you canโt actually update an AbstractRange
lookup in resize!
or append!
- and the lookup size always has to match the array size so we simply cant do it. Use-cases like this need to use cat
rather than append!
to just return a new object)
With broadcasts, DimensionalStyle
is what lets a DimArray
or a Raster
keep its lookups and metadata through broadcasts, and is pretty fundamental to the package. If another package needs DefaultArrayStyle
then either DimensionalData.jl or that package needs an extension with dispatch on DimensionalStyle
.
Thank you, @Raf .
It does not work. Each element of the outer Ti
dimension is not a Float64, but another raster with two dimensions: X
and Ti
.
julia> groups
โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ 12-element DimGroupByArray{DimArray{Float64,1},1} โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโ dims โ
โ Ti Sampled{Int64} [1, 2, โฆ, 11, 12] ForwardOrdered Irregular Points
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ metadata โค
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ group dims โค
โ X, โ Ti
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1 101ร1488 DimArray
2 101ร1368 DimArray
โฎ
11 101ร1440 DimArray
12 101ร1464 DimArray
Your code gives:
julia> newdimarray
โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ 12-element DimArray{Float64,1} โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ dims โ
โ Ti Sampled{Int64} [1, 2, โฆ, 11, 12] ForwardOrdered Irregular Points
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ metadata โค
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1 6.36837e-310
2 6.36837e-310
3 6.36837e-310
โฎ
10 3.1e-322
11 9.7e-322
12 6.36838e-310
And, in the mean()
call, we have to specify dims=:Ti.
So, newdimarray
should have the same structure as groups, but with the inner Rasters having dimensions 101ร1
instead of 101ร1488
.
I have attempted to create such an structure, but could not figure out how.
It was an exampleโฆ you can equally make a DimArray of Rasters. You can even just take the mean of the first one to get the type.
out = similar(groups, typeof(mean(first(groups))); dims=Ti))
Or just do out = similar(groups, Raster)
to leave it loosely typed.
I get it that easier threading without defining the array would be nice here. But adding threading to DimensionalData.jl directly is pretty far out of scope, so to make it work someone just has to add an extension on a threading package. I would totally merge that, but I donโt have time or need to write it myself.
Thank you, @Raf , it works!
In fact, I already tested this solution, but I got an error. I realize now that the error is with regards to showing the data, but not necessarily to the data itself (please, confirm). So, just adding a semicolon at the end of the line, avoids the error.
I dump here the error:
newdimarray = similar(groups, typeof(mean(first(groups), dims=Ti)) )
โญโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฎ
โ 12-element DimGroupByArray{DimArray{Float64,2},1} โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโ dims โ
โ Ti Sampled{Int64} [1, 2, โฆ, 11, 12] ForwardOrdered Irregular Points
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ metadata โค
Dict{Symbol, Any} with 1 entry:
:groupby => :Ti=>month
Error showing value of type DimensionalData.DimGroupByArray{DimMatrix{Float64, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Float64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{DateTime, DateTime}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.Lookups.NoMetadata}, 1, Tuple{Ti{DimensionalData.Dimensions.Lookups.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Vector{DimMatrix{Float64, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Float64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{DateTime, DateTime}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.Lookups.NoMetadata}}, Symbol, Dict{Symbol, Any}}:
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] getindex
@ ~/.julia/packages/DimensionalData/BZbYQ/src/array/indexing.jl:28 [inlined]
[3] first
@ ./abstractarray.jl:452 [inlined]
[4] show_after(io::IOContext{โฆ}, mime::MIME{โฆ}, A::DimensionalData.DimGroupByArray{โฆ})
@ DimensionalData ~/.julia/packages/DimensionalData/BZbYQ/src/groupby.jl:57
[5] show(io::IOContext{โฆ}, mime::MIME{โฆ}, A::DimensionalData.DimGroupByArray{โฆ})
@ DimensionalData ~/.julia/packages/DimensionalData/BZbYQ/src/array/show.jl:17
[6] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:273
[7] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
[8] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:259
[9] display
@ ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:278 [inlined]
[10] display(x::Any)
@ Base.Multimedia ./multimedia.jl:340
[11] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[12] invokelatest
@ ./essentials.jl:889 [inlined]
[13] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{โฆ})
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:315
[14] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:284
[15] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
[16] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:282
[17] (::REPL.var"#do_respond#80"{โฆ})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:911
[18] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[19] invokelatest
@ ./essentials.jl:889 [inlined]
[20] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
@ REPL.LineEdit ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/LineEdit.jl:2656
[21] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:1312
[22] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:386
Some type information was truncated. Use `show(err)` to see complete types.
Yeah thatโs just show
failing with the nested arrays. You can put a semicolon after the line to avoid it.
Maybe make a GitHub issue with your mwe and we can fix that too.
Ok, I will open the issue.
Additionally, I have observed that, for the example given above, the non threaded solution is faster than the threaded one, but for my data, the threaded solution is faster.
Yeah, thatโs just threading generally.
I also tested on a much larger array to check it was worthwhile.