Strategy for recoding

I have a number of similar (but slightly different variables) that I need to recode in order to harmonize over several different years. The numbers correspond to an occupational scheme, so they have no inherent meaning other then identification.

The following code does what I want, but it becomes very verbose as I need slight variations for each variable (x in this case). Does anyone have suggestions on how I could do this in a better/less verbose (but still clear) way?

df = DataFrame(x1 = vcat(1:15, missing),
               x2 = [11, 12, 21, 35, 36, 45, 46, 56, 57, 58, 59, 71, 72, 87, 88, 99])

       transform!(df, :x1 => ByRow(passmissing(x ->
              x == 1       ? 11 :
              x == 2       ? 22 :
              x == 3       ? 33 :
              x == 4       ? 36 :
              x == 5       ? 45 :
              x == 6       ? 56 :
              x ∈ [9, 10, 11]  ? 71 :
              x ∈ [7, 8]    ? 79 :
              Integer(x))) => :x1)

       transform!(df, :x2 => ByRow(passmissing(x ->
              x ∈ [11, 12] ? 11 :
              x ∈ [21, 22] ? 22 :
              x ∈ [35, 36] ? 36 :
              x ∈ [45, 46] ? 45 :
              x ∈ [56, 57, 58, 59, 60] ? 56 :
              x ∈ [71, 72, 73, 74] ? 71 :
              x ∈ [86, 87, 88, 89] ? 79 :
              x == 99 ? missing :
              Integer(x))) => :x2)

Thanks!

I tend to keep these mappings seperate in a dictionary, and then get the values I need like this:

julia> codes = Dict(1 => 11, 2 => 22, 3 => 33, 4 => 36, 5 => 45, 6 => 56, ([9, 10, 11] .=> 71)..., ([7, 8] .=> 79)...)
Dict{Int64, Int64} with 11 entries:
  5  => 45
  7  => 79
  8  => 79
  1  => 11
  4  => 36
  6  => 56
  2  => 22
  10 => 71
  11 => 71
  9  => 71
  3  => 33

julia> get.((codes, ), df.x1, missing)
16-element Vector{Union{Missing, Int64}}:
 11
 22
 33
 36
 45
 56
 79
 79
 71
 71
 71
   missing
   missing
   missing
   missing
   missing
4 Likes

It’s not clear to me if you’re looking for some pattern that can better synthesize the sequence of transforms or whatever.
Something like this achieves the same result as the example.
If there are no “regularities” in the transforms you use, your best bet is what Nilshg suggests

transform!(df, :x1 => ByRow(passmissing(x ->
            x ∈ [1,2,3] ? 11x :   
            x ∈ [4,5] ? 9x :      
              x == 6 ? 56 :
              x ∈ [9, 10, 11]  ? 71 :
              x ∈ [7, 8]    ? 79 :
              Integer(x))) => :x1)

Thanks for the answer! Why is the splat needed to make this work?

Because we want to turn [7, 8] .=> 79 into multiple arguments to the Dict constructor, in order to mix “normal” arguments like 1 => 11 with the vector arguments that use broadcasting. Consider:

julia> Dict([7, 8] .=> 79)
Dict{Int64, Int64} with 2 entries:
  7 => 79
  8 => 79

julia> Dict(5, [7, 8] .=> 79)
ERROR: MethodError: no method matching Dict(::Int64, ::Vector{Pair{Int64, Int64}})
1 Like