Suppose you have an array of Dataframes. Each Dataframe contains the time and date of the datapoint in the DateTime format. Is there a way to filter/ remove rows from the DataFrame based on the time of day the data is recorded? So for example if I had the following DataFrame D.
Data DateTime
1 │ 1 2022-03-18T05:00:00
2 │ 2 2022-03-18T10:54:00
3 │ 3 2022-03-18T13:53:00
4 │ 4 2022-03-18T20:52:00
5 │ 5 2022-03-19T05:51:00
6 │ 6 2022-03-19T11:50:00
7 │ 7 2022-03-19T15:49:00
8 │ 8 2022-03-19T20:48:00
And on each day I only wanted to keep data from a given Start and Stop time, say 9:00 AM to 5:00 PM (17:00), i.e. rows 2,3,6,7 so that the resulting Dataframe looked like this.
Data DateTime
1 │ 2 2022-03-18T10:54:00
2 │ 3 2022-03-18T13:53:00
3 │ 6 2022-03-19T11:50:00
4 │ 7 2022-03-19T15:49:00
Is there an efficient way to achieve this without having to split the Dataframe up into different days and can it be done by indexing? Would something in this format work how would one enter the appropriate DateTime criteria?
D[(D.DateTime.>= Start) .& (D.DateTime.<=Stop) , : ]
Any insights would be greatly appreciated, Thanks!
I was able to get it to work with something like this but am leaving the post up in case this is not the best solution or if it is helpful to others.
D[(hour.(D.DateTime).>= 9) .& (hour.(D.DateTime).<=16) , : ]
1 Like
This is mostly how you could write it. An alternative would be:
D[@. 9 <= hour(D.DateTime) <= 17, : ]
or
filter(:DateTime => x-> 9 <= hour(x) <= 17, D)
2 Likes
Thanks! this is amazingly helpful. Would you mind clarifying what the "@. " is doing in the code ? Is that another way of referencing the Dataframe? Also I have noticed my solution doesn’t work very well in the sense that if I wanted the cutoff time to be 5:00 pm I would actually have to place the cut off at hour(D.DateTime)<= 16
because otherwise I would retain all the datapoints that occurred 5:01, 5:32 etc. However the problem then becomes I loose all datapoints that occur at exactly 5:00pm. Is there a better way so that I could include those datapoints?
This is just an equivalent of 9 .<= hour.(D.DateTime) .<= 17
to avoid typing .
three times.
As for the second question use:
Time(9) .<= Time.(D.DateTime) .<= Time(17)
2 Likes
how to proceed!!!
julia> D[@. 9 <= hour(D.DateTime) <= 17, : ]
ERROR: MethodError: no method matching getindex(::DataFrame, ::Tuple{BitVector, Colon})
Closest candidates are:
getindex(::AbstractDataFrame, ::CartesianIndex{2}) at C:\Users\Hermesr\.julia\packages\DataFrames\6xBiG\src\other\broa
dcasting.jl:3
getindex(::AbstractDataFrame, ::Integer, ::Colon) at C:\Users\Hermesr\.julia\packages\DataFrames\6xBiG\src\dataframero
w\dataframerow.jl:210
getindex(::AbstractDataFrame, ::Integer, ::Union{Colon, Regex, AbstractVector, All, Between, Cols, InvertedIndex}) at
C:\Users\Hermesr\.julia\packages\DataFrames\6xBiG\src\dataframerow\dataframerow.jl:208
...
Stacktrace:
[1] top-level scope
@ REPL[32]:100:
#########
julia> D[[@. 9 <= hour(D.datetime) <= 17], : ]
ERROR: ArgumentError: invalid index: BitVector[[0, 1, 1, 0, 0, 1, 1, 0]] of type Vector{BitVector}
Stacktrace:
[1] to_index(I::Vector{BitVector})
@ Base .\indices.jl:297
[2] to_index(A::Vector{Int64}, i::Vector{BitVector})
@ Base .\indices.jl:277
[3] to_indices
@ .\indices.jl:333 [inlined]
[4] to_indices
@ .\indices.jl:325 [inlined]
[5] getindex(A::Vector{Int64}, I::Vector{BitVector})
@ Base .\abstractarray.jl:1218
[6] _threaded_getindex(selected_rows::Vector{BitVector}, selected_columns::UnitRange{Int64}, df_columns::Vector{Abstrac
tVector}, idx::DataFrames.Index)
@ DataFrames C:\Users\Hermesr\.julia\packages\DataFrames\6xBiG\src\dataframe\dataframe.jl:543
[7] getindex(df::DataFrame, row_inds::Vector{BitVector}, #unused#::Colon)
@ DataFrames C:\Users\Hermesr\.julia\packages\DataFrames\6xBiG\src\dataframe\dataframe.jl:586
[8] top-level scope
@ REPL[34]:1
julia>
1 Like
Try with brackets:
D[(@. 9 <= hour(D.DateTime) <= 17), :]
2 Likes
or
D[@.(9 <= hour(D.DateTime) <= 17), :]
I made an error with @.
scope.
1 Like
just for future reference the initial solution was not ideal because if you wanted the cutoff time to be 5:00 PM, using hour.(D.DateTime).<= 16
would exclude all data at exactly 5:00 pm while using hour.(D.DateTime). <=17
would include say 5:59, etc. The solution as stated in bkamins’ comments below is summarized here:
D[@.(Time(9) <= Time(D.DateTime) <=Time(17), :]
for anyone also new to Julia it might be helpful to read through bkamins comments to understand what the code is doing.
added missing parenthresis:
D[@.(Time(9) <= Time(D.DateTime) <= Time(17)), :]
1 Like