Hi, I have a 2-D NDSparse table (with one value column), and I am trying to select the max value for each distinct value of the first index dimension, but I am not sure how to do that (groupby removes the second dimension, which I need to keep). For example, with a table like this:
2-d NDSparse with 8 values (Float64):
a b │
─────┼──────────
1 3 │ 0.630628
2 4 │ 0.521345
3 8 │ 0.816977
In pandas, you can usually filter out duplicates based on one index of a multi-indexed dataframe, but I’m not sure how to closely approximate that here.
Thank you!
Thanks. However, that doesn’t seem to do the selection based on only the first index while preserving whatever the second index is at that point. For example:
Sorry, I misunderstood the question. The way to do what you want would probably be with mapslices, but it’s a bit verbose. I believe something like:
function get_max(dd)
    data = values(dd)
    n = indmax(data)
    NDSparse([keys(dd,2)[n]], [data[n]])
end
mapslices(get_max, df, 2)
should work.
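To make the intended result concrete, here is a minimal sketch of the same per-group selection in plain Julia, with no IndexedTables calls at all. The helper name `max_per_first_index` is hypothetical, and the rows are assumed to be `(a, b, value)` tuples; the first, third and fourth rows come from the table in the question, while the other two are made up so that some groups have more than one candidate.

```julia
# Hypothetical helper (not part of IndexedTables): for each value of the
# first index `a`, keep the second index `b` and the value of the row
# with the largest value.
function max_per_first_index(rows)
    best = Dict{Int,Tuple{Int,Float64}}()   # a => (b, value)
    for (a, b, v) in rows
        # Store the row if `a` is new or if it beats the current best.
        # On ties the earlier row is kept.
        if !haskey(best, a) || v > best[a][2]
            best[a] = (b, v)
        end
    end
    return best
end

# Illustrative data: rows 1, 3 and 4 are from the question, rows 2 and 5 are made up.
rows = [(1, 3, 0.630628), (1, 5, 0.2), (2, 4, 0.521345),
        (3, 8, 0.816977), (3, 1, 0.9)]

result = max_per_first_index(rows)
```

Each entry of `result` maps a first-index value to the `(b, value)` pair that won, which is the shape the mapslices call is meant to produce: one row per `a`, with the second index preserved.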
I’m not sure if there’s a simple way to change which columns are index and which are data, though that’s certainly interesting. If you need that functionality, maybe you’re better off with a simple Table.
Thanks! I wasn’t aware of mapslices, I will definitely check that out.
Edit: That seems to throw an exception:
julia> mapslices(get_max, df, 2)
ERROR: MethodError: all(::IndexedTables.##195#200, ::Array{Any,1}) is ambiguous. Candidates:
all(f, A::AbstractArray{#s3,N} where N where #s3>:Nulls.Null) in Nulls at /Users/christopheralexander/.julia/v0.6/Nulls/src/Nulls.jl:221
all(f, A::AbstractArray{#s3,N} where N where #s3>:Missings.Missing) in Missings at /Users/christopheralexander/.julia/v0.6/Missings/src/Missings.jl:411
Possible fix, define
all(::Any, ::AbstractArray{#s3,N} where N where #s3>:Union{Missings.Missing, Nulls.Null})