How can I get and change row indices/row names in a Julia DataFrame?

Suppose I have a Julia DataFrame like the one below:

julia> using DataFrames
julia> df = DataFrame(col1 = ['α', 'β', 'γ'], col2 = ['A', 'B', 'C'])
3×2 DataFrame
│ Row │ col1 │ col2 │
│     │ Char │ Char │
├─────┼──────┼──────┤
│ 1   │ 'α'  │ 'A'  │
│ 2   │ 'β'  │ 'B'  │
│ 3   │ 'γ'  │ 'C'  │

We can get the column names with the names() function and change them with the rename!() function:

julia> names(df)
2-element Array{String,1}:
 "col1"
 "col2"

julia> rename!(df, ["I", "II"])
3×2 DataFrame
│ Row │ I    │ II   │
│     │ Char │ Char │
├─────┼──────┼──────┤
│ 1   │ 'α'  │ 'A'  │
│ 2   │ 'β'  │ 'B'  │
│ 3   │ 'γ'  │ 'C'  │

But how do I get the row names/indices of this DataFrame df, and how do I change them?

You cannot. DataFrames, unlike pandas, has no concept of row indices. You can obviously change the columns "I" and "II", and you can kind of treat those as indices via findfirst, but there isn’t really a one-to-one translation to pandas in that way.
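
A minimal sketch of the findfirst approach mentioned above, assuming the renamed columns I and II from the question. The variable names row and value and the lookup key 'β' are only illustrative:

```julia
using DataFrames

df = DataFrame(I = ['α', 'β', 'γ'], II = ['A', 'B', 'C'])

# Treat column I as a makeshift "index": find the position of the first row
# whose I value equals 'β' (findfirst returns nothing if there is no match).
row = findfirst(==('β'), df.I)

# Use that position for ordinary integer row indexing.
value = df[row, :II]
```

This is a linear scan of the column on every lookup, so it is convenient for small tables rather than a substitute for a real index.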

Thank you @pdeffebach for this info.
I notice that the DataFrames package has now been released as version 1.0. Will it never have any method to show/change rownames as we see in R/Pandas?

Maybe! Not in the near future since it would be a fairly major addition to the API.

GroupedDataFrame objects are also highly performant and can be indexed by the values of the grouping columns (via a Tuple or NamedTuple). That’s one way to emulate the behavior already.
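
As a rough sketch of that kind of indexing, here is a toy table (the column names key and value and the data are made up for illustration) grouped on one column and then looked up by group value:

```julia
using DataFrames

# Toy data; `key` plays the role of the would-be row index.
df = DataFrame(key = ["a", "b", "c"], value = [10, 20, 30])

# Group on the column whose values we want to look rows up by.
gdf = groupby(df, :key)

# Index with a Tuple of grouping values ...
by_tuple = gdf[("b",)]

# ... or with a NamedTuple that names the grouping column.
by_named = gdf[(key = "c",)]
```

Each lookup returns a SubDataFrame containing all rows of that group, so keys that repeat are handled naturally.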

Will it never have any method to show/change rownames as we see in R/Pandas?

In what use cases do you need this functionality? I am open to discussing adding it, but so far we have not been given a use case that cannot be easily handled by what we already provide.

I was looking at the TimeArray ohlc from the MarketData package, which I can convert into a DataFrame:

julia> using MarketData, DataFrames

julia> ta = ohlc

julia> df = DataFrame(ta)
500×5 DataFrame
│ Row │ timestamp  │ Open    │ High    │ Low     │ Close   │
│     │ Date       │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2000-01-03 │ 104.88  │ 112.5   │ 101.69  │ 111.94  │
⋮
│ 499 │ 2001-12-28 │ 21.97   │ 23.0    │ 21.96   │ 22.43   │
│ 500 │ 2001-12-31 │ 22.51   │ 22.66   │ 21.83   │ 21.9    │

Now, I wanted to take that Date column as row-names/row-indices, the way we manipulate time series indices in Pandas. That made me curious about how to get/change rownames in a Julia DataFrame.

I have used Pandas quite a bit, but found row indices more annoying than useful: a row index is in the end just another column (with some special semantics in Pandas). The same goes for MultiIndex.
I think it is good that they do not exist in DataFrames.jl.

I think it is good that they do not exist in DataFrames.jl.

I respect the design of Pandas, as I know that a lot of thought was given to making it good (and it is super successful), but what you say is exactly what other Pandas users have told me.

Now, I wanted to take that Date column as row-names/row-indices

As @lungben commented, you can just keep it as a column and everything will work. If you really want fast lookup on this column, then use gdf = groupby(df, :timestamp) and you can do gdf[(your_date,)] to quickly get the subset of the data frame matching this date.
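
A small sketch of that lookup, using three rows copied from the ohlc output shown earlier instead of loading MarketData (the names day and mask_rows are illustrative):

```julia
using DataFrames, Dates

# Stand-in for DataFrame(ohlc): three timestamp/Close pairs taken from the
# output shown earlier in the thread.
df = DataFrame(
    timestamp = [Date(2000, 1, 3), Date(2001, 12, 28), Date(2001, 12, 31)],
    Close = [111.94, 22.43, 21.9],
)

# Group once on the date column, then look rows up by date.
gdf = groupby(df, :timestamp)
day = gdf[(Date(2001, 12, 28),)]

# Without grouping, an ordinary boolean mask on the column selects the same rows.
mask_rows = df[df.timestamp .== Date(2001, 12, 28), :]
```

The grouping is built once and can be reused for many lookups, while the boolean mask scans the whole column each time.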

Note that fast lookup is not guaranteed by Pandas (it is sometimes fast, but not always; it depends on the data).
