If I take a Julia DataFrame like the one below:
julia> using DataFrames
julia> df = DataFrame(col1 = ['α', 'β', 'γ'], col2 = ['A', 'B', 'C'])
3×2 DataFrame
│ Row │ col1 │ col2 │
│     │ Char │ Char │
├─────┼──────┼──────┤
│ 1   │ 'α'  │ 'A'  │
│ 2   │ 'β'  │ 'B'  │
│ 3   │ 'γ'  │ 'C'  │
We can get the column names using the names() function and change them using the rename!() function:
julia> names(df)
2-element Array{String,1}:
"col1"
"col2"
julia> rename!(df, ["I", "II"])
3×2 DataFrame
│ Row │ I    │ II   │
│     │ Char │ Char │
├─────┼──────┼──────┤
│ 1   │ 'α'  │ 'A'  │
│ 2   │ 'β'  │ 'B'  │
│ 3   │ 'γ'  │ 'C'  │
But how do I get the row names/indices of this DataFrame df, and how do I change them?
You cannot. DataFrames.jl, unlike pandas, has no concept of row indices. You can of course change the columns "I" and "II", and you can sort of treat those as indices via findfirst, but there isn't really a one-to-one translation from pandas in that respect.
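A minimal sketch of the findfirst approach, assuming the renamed data frame from the question (columns I and II). Rows are only addressable by position (1:nrow(df)); a "row label" is just a value you search for in an ordinary column, and "changing the row labels" means changing that column. The variable names here are illustrative only.

```julia
using DataFrames

df = DataFrame(I = ['α', 'β', 'γ'], II = ['A', 'B', 'C'])

# The only built-in row "index" is the position, 1:nrow(df)
positions = 1:nrow(df)

# Treat column :I as a row label: find the position of the row labelled 'β'
# (findfirst returns `nothing` if no row matches; this is a linear scan)
i = findfirst(==('β'), df.I)
val = df[i, :II]

# "Changing the row labels" is just replacing the values of that column
df.I = ['a', 'b', 'c']
```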
Thank you @pdeffebach for this info.
DataFrames.jl has now been released as version 1.0. Will it never have any method to show/change rownames as we see in R/Pandas?
Maybe! Not in the near future, since it would be a fairly major addition to the API.
GroupedDataFrames are also highly performant and allow indexing (via a Tuple or NamedTuple) by the values of the grouping columns. That's already one way to emulate the behavior.
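A minimal sketch of that emulation, assuming the small data frame from the question with column I playing the role of the index. Grouping by I and then indexing the GroupedDataFrame with a Tuple or a NamedTuple key returns the matching rows as a SubDataFrame.

```julia
using DataFrames

df = DataFrame(I = ['α', 'β', 'γ'], II = ['A', 'B', 'C'])

# Group by the column that plays the role of the row index
gdf = groupby(df, :I)

# Look up a group by its key value, either as a Tuple ...
by_tuple = gdf[('β',)]

# ... or as a NamedTuple that names the grouping column
by_namedtuple = gdf[(I = 'β',)]
```

Unlike a pandas index, the key column stays an ordinary column of df, so nothing else about the data frame changes.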
Will it never have any method to show/change rownames as we see in R/Pandas?
In what use cases do you need this functionality? I am open to discussing adding it, but so far we have not been given a use case that cannot be easily handled by what we already provide.
I was looking at the TimeArray ohlc from the MarketData package, which I can convert into a DataFrame:
julia> using MarketData, DataFrames
julia> ta = ohlc
julia> df = DataFrame(ta)
500×5 DataFrame
│ Row │ timestamp  │ Open    │ High    │ Low     │ Close   │
│     │ Date       │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2000-01-03 │ 104.88  │ 112.5   │ 101.69  │ 111.94  │
⋮
│ 499 │ 2001-12-28 │ 21.97   │ 23.0    │ 21.96   │ 22.43   │
│ 500 │ 2001-12-31 │ 22.51   │ 22.66   │ 21.83   │ 21.9    │
Now, I wanted to take that Date column as row-names/row-indices, the way we manipulate time series indices in Pandas. I just became curious: how do you get/change row names in a Julia DataFrame?
I used Pandas quite a bit, but found row indices more annoying than useful: a row index is in the end just another column (with some special semantics in Pandas). The same goes for MultiIndex.
I think it is good that they do not exist in DataFrames.jl.
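To illustrate the "just another column" point, here is a minimal sketch of two common pandas index operations done on an ordinary column. The data are three rows taken from the DataFrame(ohlc) output above (Close only, deliberately put out of order); the pandas analogies (sort_index, label-based .loc selection) are a loose comparison, not an exact equivalence.

```julia
using DataFrames, Dates

# Three rows from the DataFrame(ohlc) output above, out of order on purpose
df = DataFrame(timestamp = [Date(2001, 12, 31), Date(2000, 1, 3), Date(2001, 12, 28)],
               Close = [21.9, 111.94, 22.43])

# Roughly pandas sort_index(): sort the rows by the key column
sort!(df, :timestamp)

# Roughly label-based .loc selection: a boolean mask on the key column
late2001 = df[df.timestamp .>= Date(2001, 12, 1), :]
```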
I think it is good that they do not exist in DataFrames.jl.
I respect the Pandas design, as I know that a lot of thought went into making it good (and it is super successful), but what you say is exactly what other Pandas users have been telling me.
Now, I wanted to take that Date column as row-names/row-indices
As @lungben commented, you can just keep it as a column and everything will work. If you really want fast lookup on this column, use gdf = groupby(df, :timestamp) and then gdf[(your_date,)] to quickly get the subset of the data frame matching this date.
Note that fast lookup is not guaranteed by Pandas either (it is sometimes fast but not always; it depends on the data).
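A minimal sketch of the groupby lookup described above, assuming a small stand-in for DataFrame(ohlc) built from three rows of the output shown earlier (Close only). The key passed to gdf is a one-element Tuple holding the Date; a key that matches no group raises a KeyError.

```julia
using DataFrames, Dates

# Stand-in for DataFrame(ohlc): three rows copied from the output above
df = DataFrame(timestamp = [Date(2000, 1, 3), Date(2001, 12, 28), Date(2001, 12, 31)],
               Close = [111.94, 22.43, 21.9])

# Group once by the column you would have used as the pandas index
gdf = groupby(df, :timestamp)

# Fast lookup of all rows for one date via a Tuple key
day = gdf[(Date(2001, 12, 28),)]
```

Grouping once and then indexing is what makes repeated lookups cheap; a single one-off lookup can just as well use a boolean mask on df.timestamp.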