Impute by index

julia> using DataFrames

julia> df_missing = DataFrame(id = [5,2,1,4,6,8], val = [1, missing, 3,8,2,missing])
6×2 DataFrame
│ Row │ id    │ val     │
│     │ Int64 │ Int64?  │
├─────┼───────┼─────────┤
│ 1   │ 5     │ 1       │
│ 2   │ 2     │ missing │
│ 3   │ 1     │ 3       │
│ 4   │ 4     │ 8       │
│ 5   │ 6     │ 2       │
│ 6   │ 8     │ missing │

julia> df_completion = DataFrame(id = [2, 8], val = [5, 13])
2×2 DataFrame
│ Row │ id    │ val   │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 5     │
│ 2   │ 8     │ 13    │

julia> df_missing[in(df_completion.id).(df_missing.id), :val] = df_completion.val
2-element Array{Int64,1}:
  5
 13

julia> df_missing
6×2 DataFrame
│ Row │ id    │ val    │
│     │ Int64 │ Int64? │
├─────┼───────┼────────┤
│ 1   │ 5     │ 1      │
│ 2   │ 2     │ 5      │
│ 3   │ 1     │ 3      │
│ 4   │ 4     │ 8      │
│ 5   │ 6     │ 2      │
│ 6   │ 8     │ 13     │

I see you’re asking for a solution “without creating a loop”. Note that you do not have to avoid loops in Julia: unlike in R or Python, loops are not slow, since they compile to native code much as they would in C. In many cases a loop is the cleanest, most readable, and fastest way to implement an algorithm in Julia.
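For comparison, here is a minimal sketch of a loop version, using the same two data frames as above. The `lookup` dictionary is a name introduced here for illustration. One reason to consider it: the broadcasted assignment above relies on the matching rows of `df_missing` appearing in the same order as the rows of `df_completion`, and it overwrites `val` for every matching `id`, whether or not it was missing. The loop below matches by `id` and only fills entries that are actually `missing`.

```julia
using DataFrames

df_missing = DataFrame(id = [5, 2, 1, 4, 6, 8], val = [1, missing, 3, 8, 2, missing])
df_completion = DataFrame(id = [2, 8], val = [5, 13])

# Map each id to its replacement value, so row order does not matter.
lookup = Dict(zip(df_completion.id, df_completion.val))

for i in 1:nrow(df_missing)
    # Fill only rows whose value is missing and whose id has a replacement.
    if ismissing(df_missing.val[i]) && haskey(lookup, df_missing.id[i])
        df_missing.val[i] = lookup[df_missing.id[i]]
    end
end
```

On this data the loop should leave `df_missing.val` with the same values as the one-liner (5 for `id` 2 and 13 for `id` 8). Ids without an entry in `df_completion` simply stay `missing`.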
