If I have a DataFrame and want to get its unique columns, treating each column as a separate entity, what is the best way to do it?
MATLAB has a convenience argument, unique(A(:,1:2),'rows') (yuk!), but I can't find an equivalent way to do it in Julia.
unique in DataFrames works by row. You could do:
julia> using DataFrames
julia> df = DataFrame(a = [1, 2], b = [3, 4], c = [1, 2]);
julia> unique_cols = unique(i -> i[2], pairs(eachcol(df)))
2-element Array{Pair{Symbol,AbstractArray{T,1} where T},1}:
:a => [1, 2]
:b => [3, 4]
julia> DataFrame(unique_cols)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4
unique has a method that takes a function as its first argument. It applies the function to each element of the iterator and keeps the first element for each distinct result.
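As a small illustration of that method on plain arrays (no DataFrames needed; the values are made up for this example):

```julia
# unique(f, itr) keeps the first element of itr for each distinct value of f(x).
squares = unique(x -> x^2, [1, -1, 2, -2, 3])

# last(pair) returns the second half of a Pair, so only the values are compared.
named = unique(last, ["a" => 1, "b" => 2, "c" => 1])
```

Here squares should come out as [1, 2, 3] and named as ["a" => 1, "b" => 2], because -1, -2 and "c" => 1 each repeat a result that was already seen.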
pairs(eachcol(df)) is an iterator of colname => vector pairs.
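You apply the function i -> i[2], which returns the second element of the pair (the vector), to each of these colname => vector pairs because you only want to compare the vectors. Comparing whole pairs would never find a duplicate, since every column has a unique name.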
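One thing to be aware of when comparing the vectors: as far as I know, unique decides equality with isequal and hash, so columns of different element types that hold equal values count as duplicates, and so do columns with missing in the same positions. A minimal sketch with plain pairs (the names and values are invented for the example):

```julia
cols = [
    :a => [1, 2],
    :b => [1.0, 2.0],     # isequal to :a even though the eltype differs
    :c => [1, missing],
    :d => [1, missing],   # isequal to :c; missing is isequal to missing
    :e => [3, 4],
]

kept = unique(last, cols)
kept_names = first.(kept)   # names of the columns that survive
```

With these inputs kept_names should be [:a, :c, :e]. If you need Int and Float64 columns to be treated as different, you would have to compare something like (eltype(v), v) instead of just v.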
You are left with a vector of pairs. The DataFrame constructor accepts a vector of colname => vector pairs, so passing the result to it finishes the job.
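If you need this more than once, the steps can be wrapped in a small function. This is only a sketch: unique_columns is a name I made up, not something DataFrames provides, and it uses last instead of i -> i[2], which does the same thing for a Pair.

```julia
using DataFrames

# Hypothetical helper: keep the first column of each group of columns
# whose contents are equal, and return them as a new DataFrame.
function unique_columns(df::AbstractDataFrame)
    kept = unique(last, pairs(eachcol(df)))   # compare only the column vectors
    return DataFrame(kept)                    # rebuild from name => vector pairs
end

df = DataFrame(a = [1, 2], b = [3, 4], c = [1, 2])
out = unique_columns(df)
```

For the df above, out should contain only columns a and b, since c repeats the contents of a.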