If I have a DataFrame and want to get `unique` columns but treat each column as a

If I have a DataFrame and want to get the unique columns, treating each column as a separate entity, what is the best way to do it?
MATLAB has a convenience argument, `unique(A(:,1:2),'rows')` (yuk!), but I can’t find an equivalent way to do it in Julia.


`unique` in DataFrames works by row. To deduplicate columns instead, you could do:


julia> using DataFrames

julia> df = DataFrame(a = [1, 2], b = [3, 4], c = [1, 2]);

julia> unique_cols = unique(i -> i[2], pairs(eachcol(df)))
2-element Array{Pair{Symbol,AbstractArray{T,1} where T},1}:
 :a => [1, 2]
 :b => [3, 4]

julia> DataFrame(unique_cols)
2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      3
   2 │     2      4

`unique` has a method, `unique(f, itr)`, that applies a function `f` to each element of the iterator and keeps one element (the first encountered) for each distinct value that `f` returns.
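To see that behaviour in isolation, here is a minimal sketch using only Base Julia. The variable names and sample data are made up for illustration; only `unique(f, itr)` itself comes from the answer above.

```julia
# unique(f, itr) keeps the first element of itr for each distinct value of f(x).
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_first_letter = unique(w -> first(w), words)   # compare words by first letter only

# The same idea on name => vector pairs: compare only the second item of each pair.
pairs_vec = [:a => [1, 2], :b => [3, 4], :c => [1, 2]]
unique_by_value = unique(p -> p[2], pairs_vec)   # :c is dropped, its values repeat :a
```

Note that the returned elements are the original inputs (whole words, whole pairs), not the values produced by the function.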

`pairs(eachcol(df))` is an iterator of `colname => vector` pairs.

You apply the function `i -> i[2]` to each of these `colname => vector` pairs because you only want to compare the vectors. The column names are always distinct, so comparing whole pairs would never find a duplicate.

You are left with a vector of pairs. The `DataFrame` constructor accepts a vector of `colname => vector` pairs, so passing the result to it finishes the job.
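If you need this more than once, the steps above could be wrapped in a small function. This is only a sketch: `unique_columns` is a hypothetical name, not something DataFrames.jl provides, and it uses `last` in place of `i -> i[2]`, which returns the same second item of each pair.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl):
# keep the first column for each distinct vector of values.
function unique_columns(df::AbstractDataFrame)
    # Compare only the column vectors; the names are ignored for the comparison.
    kept = unique(last, pairs(eachcol(df)))
    # Rebuild a DataFrame from the surviving name => vector pairs.
    return DataFrame(kept)
end

df = DataFrame(a = [1, 2], b = [3, 4], c = [1, 2])
result = unique_columns(df)   # keeps :a and :b; :c repeats the values of :a
```

When two columns hold the same values, the one that appears first in the DataFrame is the one whose name survives.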
