DataFrameRow Row Number

If you have a data frame and want to iterate over each row, how do you access the row number?

data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c']) 
r = collect(eachrow(data))[1]

The row number is obviously stored, but the normal way of accessing fields does not work:

> fieldnames(typeof(r))
(:df, :row)
> r.row
ERROR: KeyError: key :row not found
Stacktrace:
 [1] getindex at ./dict.jl:478 [inlined]
 [2] getindex at /home/paul/.julia/packages/DataFrames/3sRhW/src/other/index.jl:137 [inlined]
 [3] getindex(::DataFrame, ::Int64, ::Symbol) at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframe/dataframe.jl:265
 [4] getindex at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframerow/dataframerow.jl:20 [inlined]
 [5] getproperty(::DataFrameRow{DataFrame}, ::Symbol) at /home/paul/.julia/packages/DataFrames/3sRhW/src/dataframerow/dataframerow.jl:37

Looking in the source code, I found two solutions:

> data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c']) 
> r = collect(eachrow(data))[1]
> DataFrames.row(r)
1
> getfield(r, :row)
1

DataFrames.row is an internal function, so it is better not to use it. I recommend enumerate(eachrow(data)) instead.
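
A minimal sketch of that suggestion, reusing the data frame from the question. The `seen` vector is only there for illustration and is not part of the original post:

```julia
using DataFrames

data = DataFrame(a = [1, 2, 3], b = ['a', 'b', 'c'])

# enumerate yields (counter, DataFrameRow) pairs, counting from 1
seen = Int[]  # hypothetical helper, collects the counters we visit
for (i, r) in enumerate(eachrow(data))
    println("row ", i, ": a = ", r.a, ", b = ", r.b)
    push!(seen, i)
end
```

Note that the counter comes from enumerate, not from the row itself, so it simply counts iterations and knows nothing about the data frame.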


Use rownumber(r).

To add: DataFrames.row should not be used, because it returns different information than rownumber. DataFrames.row gives the row number in the parent DataFrame, whereas rownumber gives the row number in the data frame from which the DataFrameRow was taken.


An example demonstrating the difference:

julia> df = DataFrame(a=1:4)
4×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4
julia> dfv = @view df[[3,2], :]
2×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     3
   2 │     2
julia> dfr = dfv[1, :]
DataFrameRow
 Row │ a
     │ Int64
─────┼───────
   3 │     3
julia> rownumber(dfr)
1
julia> DataFrames.row(dfr)
3
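
The same distinction applies when looping, which was the original question. The sketch below assumes a DataFrames.jl version that exports rownumber, and uses parentindices (not mentioned above) as the public way to get the row number in the parent; the names `nums` and `parent_rows` are made up for illustration:

```julia
using DataFrames

df = DataFrame(a = 1:4)
dfv = @view df[[3, 2], :]

# rownumber reports the position within dfv, the frame being iterated
nums = [rownumber(r) for r in eachrow(dfv)]

# the first element of parentindices is the row number in the parent df
parent_rows = [parentindices(r)[1] for r in eachrow(dfv)]

println(nums)
println(parent_rows)
```

If you only ever iterate over a plain DataFrame rather than a view, the two numbers coincide, which is why the difference is easy to miss.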