Creating one column from another

Suppose I have a dataframe column (or a vector) with a sequence of numbers ranging from 1 to 4 (for example). Suppose also that a second column or vector must be constructed in the following way:

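Each row of the new column holds the next different value that the first column changes to. For instance, a row where the first column is 4 gets a 3 in the new column because 3 is the value the first column changes to after having been 4. So, for example: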

Col1: [4, 4, 4, 3, 3, 1, 2, 2, 2]

would result in

Col2: [3, 3, 3, 1, 1, 2, NaN, NaN, NaN]

Note that the last three items are NaN because there is no information about what comes after 2.

My question is simply: what Julia code finds Col2 in an efficient way?


This should work

julia> using DataFramesMeta;

julia> using ShiftedArrays: lead;  # imported explicitly, since recent ShiftedArrays releases do not export `lead`

julia> df = DataFrame(Col1 = [4, 4, 4, 3, 3, 1, 2, 2, 2]);

julia> df2 = @chain df begin
           unique(:Col1)
           @transform :Col2 = lead(:Col1)
       end
4×2 DataFrame
 Row │ Col1   Col2    
     │ Int64  Int64?  
─────┼────────────────
   1 │     4        3
   2 │     3        1
   3 │     1        2
   4 │     2  missing 

julia> leftjoin(df, df2, on = "Col1")
9×2 DataFrame
 Row │ Col1   Col2    
     │ Int64  Int64?  
─────┼────────────────
   1 │     4        3
   2 │     4        3
   3 │     4        3
   4 │     3        1
   5 │     3        1
   6 │     1        2
   7 │     2  missing 
   8 │     2  missing 
   9 │     2  missing 

Here is my attempt for a vector:

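function f(c1)
    # Float64 output so that NaN can mark "no later value"
    c2 = zeros(length(c1))
    c2[end] = prev = NaN

    # Walk backwards, carrying the most recent "next different value" in prev
    for i in length(c1)-1:-1:1
        if c1[i] == c1[i+1]
            c2[i] = prev            # same streak: reuse the value found so far
        else
            c2[i] = prev = c1[i+1]  # streak boundary: the following value is the new answer
        end
    end

    return c2
end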

c1 = [4, 4, 4, 3, 3, 1, 2, 2, 2]
f(c1) |> println

# [3.0, 3.0, 3.0, 1.0, 1.0, 2.0, NaN, NaN, NaN]

Great solution! The `unique`/`leftjoin` approach assumes that a value won't reappear after its streak has ended (e.g., [4, 4, 4, 3, 3, 1, 2, 2, 2, 1, 2, 3, 4]), but that may be a safe assumption, depending on the context.

True. Good catch.
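For the case where values do reappear after a streak has ended, one possible workaround is to label each streak with its own id rather than keying on the value itself. The following is only a sketch using Base Julia; the names `is_start`, `run_id`, `starts` and `next_vals` are made up for illustration, and it assumes a non-empty numeric vector:

```julia
c1 = [4, 4, 4, 3, 3, 1, 2, 2, 2, 1, 2, 3, 4]

# true wherever a new streak begins (the first element always starts one)
is_start = [true; c1[2:end] .!= c1[1:end-1]]

# running streak id: 1 for the first streak, 2 for the second, and so on
run_id = cumsum(is_start)

# value of each streak, in order of appearance
starts = c1[is_start]

# value that follows each streak; NaN after the last one
next_vals = [Float64.(starts[2:end]); NaN]

# broadcast the per-streak answer back to every row
c2 = next_vals[run_id]
```

Because the lookup is by streak id, the second streak of 2s or 1s gets its own answer instead of inheriting the one from the first occurrence.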

The most straightforward solution, though it can be less efficient for very long streaks of equal values, since every element scans forward to the end of its own streak:

julia> col1 = [4., 4, 4, 3, 3, 1, 2, 2, 2];

julia> col2 = map(enumerate(col1)) do (i, x)
           ix = findnext(!=(x), col1, i)
           isnothing(ix) ? NaN : col1[ix]
       end
9-element Vector{Float64}:
   3.0
   3.0
   3.0
   1.0
   1.0
   2.0
 NaN
 NaN
 NaN