I have the following structure:
- A DataFrame containing parameters and results from numeric simulations.
- A Vector{Dict} object. Each Dict contains keys equivalent to some of the columns of the DataFrame and the corresponding values.
- A mapping of sorts that assigns some result to each case in the list of Dicts. This mapping is hard-coded, so I manually assign a value for each case.
MWE:
using DataFrames
df = DataFrame(:a => [1,2,3,1], :b => [5,6,7,8], :c =>[9,10,11,9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case,value) in zip(cases,["result1","result2"]))
What I would like to have in the end is:
df_final = DataFrame(:a => [1,2,3,1], :b => [5,6,7,8], :c =>[9,10,11,9], :result => ["result1", missing,"result2","result1"])
4×4 DataFrame
 Row │ a      b      c      result
     │ Int64  Int64  Int64  String?
─────┼──────────────────────────────
   1 │     1      5      9  result1
   2 │     2      6     10  missing
   3 │     3      7     11  result2
   4 │     1      8      9  result1
So essentially I would like to loop through the list of cases, find all rows of df where all parameters are identical to the current case (this will be several rows), and then set the value of the result column accordingly. It would be nice if I didn't have to explicitly spell out all of the parameters to check.
A very ugly hack to achieve this would be
for (case, value) in mapping
    # Row mask: true where every key of `case` equals the value in that row.
    rows = vec(all(hcat((df[!, k] .== case[k] for k in keys(case))...), dims=2))
    @view(df[rows, :]).result .= value
end
but I would much rather get there with DataFrames or DataFramesMeta syntax.
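One DataFrames-style direction, as a rough sketch rather than a settled answer: turn the cases into a small lookup table and left-join it onto df. The names keycols and lookup are made up for illustration, and the sketch assumes that every case Dict has the same set of keys, that no two cases are identical, and that leftjoin! is available in the installed DataFrames version.

```julia
using DataFrames

df = DataFrame(:a => [1, 2, 3, 1], :b => [5, 6, 7, 8], :c => [9, 10, 11, 9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case, value) in zip(cases, ["result1", "result2"]))

# Assumption: all cases share the same keys, so the first one defines the join columns.
keycols = collect(keys(first(cases)))

# Hypothetical lookup table: one row per case, holding its parameters and its result.
lookup = DataFrame([k => [case[k] for case in cases] for k in keycols])
lookup.result = [mapping[case] for case in cases]

# In-place left join; rows of df without a matching case get `missing` in :result.
leftjoin!(df, lookup; on = keycols)
```

This avoids listing the parameters by hand, since the join columns are read from the case Dicts themselves. Cases with differing key sets would need one lookup table (and one join) per key set.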
I could also try to create a GroupedDataFrame first, based on the keys from the case Dict:
gdf = groupby(df, [keys(cases[1])...])
But then I still have to map each group to its corresponding Dict/result, if it exists, which is kind of the wrong way round compared to the loop.
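For reference, the group-first direction could look roughly like the sketch below. It rebuilds a Dict from each group key and looks it up in mapping, falling back to missing. The name keycols is made up for illustration, and the sketch again assumes that all cases share the same keys.

```julia
using DataFrames

df = DataFrame(:a => [1, 2, 3, 1], :b => [5, 6, 7, 8], :c => [9, 10, 11, 9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case, value) in zip(cases, ["result1", "result2"]))

# Assumption: all cases share the same keys, so the first one defines the grouping columns.
keycols = collect(keys(first(cases)))

# Pre-allocate the result column so that it can hold both String and missing.
df.result = Vector{Union{Missing, String}}(missing, nrow(df))

for (key, sdf) in pairs(groupby(df, keycols))
    # Convert the group key into a Dict so it can be compared with the hard-coded cases.
    case = Dict(pairs(NamedTuple(key)))
    # Each group is a view into df, so this writes into the parent's column.
    sdf[:, :result] .= get(mapping, case, missing)
end
```

The lookup still runs from group to case rather than from case to group, so this only removes the hand-written comparisons, not the inversion of the loop.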
Does anyone have advice on how to best achieve this?