DataFrame inplace change columns order

Is it possible to change index of two columns by each other in place?

I know than I can use something df[!, [:b, :a]], but I can have many columns… so is any efficient way to change columns with function like:

function colreorder!(df, :a => :b)
    # Something
end

select! might help?

I think this is a good need and something I’ve often wanted in DataFrames.jl. For example, if the following methods existed, they would maybe solve your problem:

julia> movecolumn(df, name::Union{Symbol, String}, i::Int) = begin end;

julia> movecolumn(df, name::Union{Symbol, String}; after::Union{Symbol, String, Int}, before::Union{Symbol, String, Int} = nothing) = begin end;

This can be done as follows:

julia> using Permutations

julia> df = DataFrame(x=1, a=2, y=3, b=4, z=5)
1×5 DataFrame
 Row │ x      a      y      b      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      2      3      4      5

julia> select(df, collect(Transposition(ncol(df), columnindex(df, :a), columnindex(df, :b))))
1×5 DataFrame
 Row │ x      b      y      a      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      4      3      2      5

Clearly this is not super convenient and requires extra package.

In fact, we have an internal implementation of efficient permutation handling using cycle notation, it is just not exposed.

Could you open an issue with a description of a functionality that would be requested? (in particular I am not sure if what @pdeffebach is needed or rather a support for specifying cycles in a general permutation?)

The other way to do it is:

julia> using Permutations

julia> df = DataFrame(x=1, a=2, y=3, b=4, z=5)
1×5 DataFrame
 Row │ x      a      y      b      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      2      3      4      5

julia> select(df, collect(Transposition(ncol(df), columnindex(df, :a), columnindex(df, :b))))
1×5 DataFrame
 Row │ x      b      y      a      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      4      3      2      5

it is better but still boilerplate code is needed.

It work in situations like this:

select!(df, 2,1,:)

now I make it this way:

function swap!(df, pair::Pair{Int, Int})
    range = 1:size(df,2)
    if !(pair[1] in range) || !(pair[2] in range) error("Inds not in range") end
    inds = collect(1:size(df,2))
    @inbounds inds[pair[1]], inds[pair[2]] = inds[pair[2]], inds[pair[1]]
    select!(df, inds)
end

Thank you! In this case Permutations needs and this is very unintuitive.

Ah - I pasted wrong output in the second solution:

julia> df = DataFrame(x=1, a=2, y=3, b=4, z=5)
1×5 DataFrame
 Row │ x      a      y      b      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      2      3      4      5

julia> rename!(transform(df, :a => :b, :b => :a), :a => :b, :b => :a)
1×5 DataFrame
 Row │ x      b      y      a      z
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      4      3      2      5

if the data types of the involved columns are compatible, a schema of this type should be used

function dfswapcols(df, col1, col2)
    
    df[:,col1], df[:,col2] = df[:,col2], df[:,col1]
    rename!(df, names(df)[col1]=>names(df)[col2], names(df)[col2]=>names(df)[col1])

end

Wouldn’t something like this work?


dfswapcols(df, c1::String, c2::String) = select!(df, replace(names(df), c1 => c2, c2 => c1))

dfswapcols(df, c1::Symbol, c2::Symbol) = dfswapcols(df, String(c1), String(c2))

dfswapcols(df, c1::Int64, c2::Int64) = select!(df, replace(collect(1:ncol(df)), c1 => c2, c2 => c1))

Credit rocco_sprmnt21 for the function name

for example like this, if you have a dataframe with 5 columns and you want to swap 4 and 5 or permute in a more generic way (may be not in a random way)

select!(df, permute!(names(df), [1,2,3,5,4]))


using Random
select!(df, permute!(names(df), shuffle(1:5)))

You probably don’t even need to permute! the names since select! accepts a vector of integers as an argument
select!(df, [1,2,3,5,4]) works as well

I propose this way to get some clarification on a possible implementation (even if it doesn’t make sense to do it, for some reason).

nt=Tables.columntable(df)
perm=[3,2,4,5,1]
DataFrame(NamedTuple{Tuple(permute!(collect(keys(nt)),perm ))}(nt))

I would like to know if and how to use a generated function similar to what @aplavin did here

@generated sortkeys(nt::NamedTuple{KS}) where {KS} =
    :( NamedTuple{$(Tuple(sort(collect(KS))))}(nt) )

I don’t want to offer this as a solution to the problem, but it’s just a snippet so I can ask a specific question.
It would be possible and would make sense to implement a function of the requested type

function color order!(df, old_order .=> new_order)
     #Something
end

internally, along the following lines?


julia> using DataFrames

julia> m='A':'C'
'A':1:'C'

julia> sc=10:10:30
10:10:30

julia> lt=1:3
1:3

julia> df=DataFrame(;m,sc,lt)
3×3 DataFrame
 Row │ m     sc     lt    
     │ Char  Int64  Int64
─────┼────────────────────
   1 │ A        10      1
   2 │ B        20      2
   3 │ C        30      3

julia> DataFrames.index(df).lookup[:m]=2
2

julia> DataFrames.index(df).lookup[:sc]=1
1

julia> DataFrames.index(df).names.=[:sc,:m,:lt]
3-element Vector{Symbol}:
 :sc
 :m
 :lt

julia> df
3×3 DataFrame
 Row │ sc    m      lt    
     │ Char  Int64  Int64
─────┼────────────────────
   1 │ A        10      1
   2 │ B        20      2
   3 │ C        30      3