I have the following structure:
- A DataFrame containing parameters and results from numeric simulations.
- A Vector{Dict} object. Each Dict contains keys equivalent to some of the columns of the DataFrame and the corresponding values.
- A mapping of sorts that assigns some result to each case in the list of Dicts. This mapping is hard-coded, so I manually assign a value for each case.
MWE:
using DataFrames
df = DataFrame(:a => [1,2,3,1], :b => [5,6,7,8], :c =>[9,10,11,9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case,value) in zip(cases,["result1","result2"]))
What I would like to have in the end is:
df_final = DataFrame(:a => [1,2,3,1], :b => [5,6,7,8], :c => [9,10,11,9], :value => ["result1", missing, "result2", "result1"])
4×4 DataFrame
 Row │ a      b      c      value
     │ Int64  Int64  Int64  String?
─────┼──────────────────────────────
   1 │     1      5      9  result1
   2 │     2      6     10  missing
   3 │     3      7     11  result2
   4 │     1      8      9  result1
So essentially I would like to loop through the list of cases, find all rows of df where all parameters are identical to the current case (this will be several rows), and then set the value of the result column accordingly. It would be nice if I didn't have to explicitly spell out all of the parameters to check.
A very ugly hack to achieve this would be
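df.value = Vector{Union{Missing,String}}(missing, nrow(df))  # the column has to exist before it can be assigned into
for (case, value) in mapping
    # mask is true for rows where every key of `case` matches the corresponding column
    mask = vec(all(hcat((df[!, k] .== case[k] for k in keys(case))...), dims=2))
    @view(df[mask, :]).value .= value
end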
but I would much rather get there with DataFrames or DataFramesMeta syntax.
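For reference, a join-based version might look roughly like the sketch below. It is only a sketch under assumptions: every case uses the same set of keys, `leftjoin!` is available (DataFrames 1.3 or later, as far as I know), and `keycols` and `lookup` are names made up for this example.

```julia
using DataFrames

df = DataFrame(:a => [1, 2, 3, 1], :b => [5, 6, 7, 8], :c => [9, 10, 11, 9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case, value) in zip(cases, ["result1", "result2"]))

# Assumption: every case uses the same set of keys.
keycols = collect(keys(first(cases)))

# Hypothetical lookup table: one row per case, plus its hard-coded value.
lookup = DataFrame([k => [case[k] for case in cases] for k in keycols])
lookup.value = [mapping[case] for case in cases]

# In-place left join: rows of df without a matching case get `missing`.
leftjoin!(df, lookup; on = keycols)
```

This avoids listing the parameters by hand, since the join columns come from the keys of the cases, but it would not cover cases with differing key sets without building one lookup table per key set.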
I could also try to create a GroupedDataFrame first based on the keys from the case Dict:
gdf = groupby(df, [keys(cases[1])...])
But then I would still have to map each group to its corresponding Dict and result, if one exists, which turns the loop the wrong way round.
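To make the grouped variant concrete, here is a minimal sketch of what I mean. It assumes all cases share the same keys, and the tuple-keyed `lookup` Dict and `keycols` are helper names invented for this example. Each group key is converted to a plain tuple and looked up, with `missing` as the fallback.

```julia
using DataFrames

df = DataFrame(:a => [1, 2, 3, 1], :b => [5, 6, 7, 8], :c => [9, 10, 11, 9])
cases = [Dict(:a => 1, :c => 9), Dict(:a => 3, :c => 11)]
mapping = Dict(case => value for (case, value) in zip(cases, ["result1", "result2"]))

# Assumption: every case uses the same set of keys.
keycols = collect(keys(first(cases)))

# Hypothetical lookup keyed by the case values, in the same order as keycols.
lookup = Dict(Tuple(case[k] for k in keycols) => value for (case, value) in mapping)

# The column has to exist before group-wise assignment.
df.value = Vector{Union{Missing,String}}(missing, nrow(df))

gdf = groupby(df, keycols)
for (key, sdf) in pairs(gdf)
    # Groups without a matching case keep `missing`.
    sdf[:, :value] .= get(lookup, Tuple(key), missing)
end
```

Because each group is a view into df, the assignment writes through to the parent DataFrame.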
Does anyone have advice on how to best achieve this?