If you have a data frame and want to iterate over each row, how do you access the row number?
data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c'])
r = collect(eachrow(data))[1]
The row number is obviously stored, but the normal way of accessing fields does not work:
> fieldnames(typeof(r))
(:df, :row)
> r.row
ERROR: KeyError: key :row not found
Stacktrace:
[1] getindex at ./dict.jl:478 [inlined]
[2] getindex at /home/paul/.julia/packages/DataFrames/3sRhW/src/other/index.jl:137 [inlined]
[3] getindex(::DataFrame, ::Int64, ::Symbol) at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframe/dataframe.jl:265
[4] getindex at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframerow/dataframerow.jl:20 [inlined]
[5] getproperty(::DataFrameRow{DataFrame}, ::Symbol) at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframerow/dataframerow.jl:37
Looking in the source code, I found two solutions:
> data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c'])
> r = collect(eachrow(data))[1]
> DataFrames.row(r)
1
> getfield(r, :row)
1
DataFrames.row is an internal function, so it is better not to use it. I recommend enumerate(eachrow(data)).
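A minimal sketch of that suggestion, assuming the same data frame as in the question (the names `i`, `row`, and `numbered` are only illustrative):

```julia
using DataFrames

data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c'])

# enumerate pairs each DataFrameRow with a 1-based counter,
# so no internal fields of the row need to be touched
for (i, row) in enumerate(eachrow(data))
    println("row ", i, ": a = ", row.a, ", b = ", row.b)
end

# The same pattern works in a comprehension
numbered = [(i, row.a) for (i, row) in enumerate(eachrow(data))]
```

Note that the counter here is just the iteration position, so it matches the row number only when you iterate over all rows in order.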
To add to that: DataFrames.row should not be used, because it gives different information (the row number in the parent DataFrame) than rownumber (the row number in the data frame from which the DataFrameRow was taken).
anandj
April 21, 2021, 2:07am
An example demonstrating the difference:
julia> df = DataFrame(a=1:4)
4×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4
julia> dfv = @view df[[3,2], :]
2×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     3
   2 │     2
julia> dfr = dfv[1, :]
DataFrameRow
 Row │ a
     │ Int64
─────┼───────
   3 │     3
julia> rownumber(dfr)
1
julia> DataFrames.row(dfr)
3
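Applied to the original question, the exported `rownumber` function can be called directly on each row while iterating. The following is a small sketch reusing the view from the example above; the names `nums` and `vals` are only illustrative:

```julia
using DataFrames

df = DataFrame(a = 1:4)
dfv = @view df[[3, 2], :]

# rownumber reports the position within the data frame being iterated
# (here the view dfv), not the position in the parent df
for r in eachrow(dfv)
    println("rownumber = ", rownumber(r), ", a = ", r.a)
end

nums = [rownumber(r) for r in eachrow(dfv)]
vals = [r.a for r in eachrow(dfv)]
```

This avoids both `getfield(r, :row)` and the internal `DataFrames.row`, at the cost of requiring a DataFrames version that exports `rownumber`.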