# Creating one column from another

Suppose I have a dataframe column (or a vector) with a sequence of numbers ranging from 1 to 4 (for example). Suppose also that a second column or vector must be constructed in the following way:

Each entry of the new column is the next different value that the first column changes to. A row holds 3, say, because 3 is the number the first vector changes to after having been 4. So, for example:

Col1: [4, 4, 4, 3, 3, 1, 2, 2, 2]

would result in

Col2: [3, 3, 3, 1, 1, 2, NaN, NaN, NaN]

Note that the last three items are NaN because there is no information about what comes after 2.

My question is simply: what Julia code computes Col2 in an efficient way?


This should work

```julia
julia> using DataFramesMeta, ShiftedArrays;

julia> df = DataFrame(Col1 = [4, 4, 4, 3, 3, 1, 2, 2, 2]);

julia> df2 = @chain df begin
           unique(:Col1)
           @transform :Col2 = lead(:Col1)
       end
4×2 DataFrame
 Row │ Col1   Col2
     │ Int64  Int64?
─────┼────────────────
   1 │     4        3
   2 │     3        1
   3 │     1        2
   4 │     2  missing

julia> leftjoin(df, df2, on = "Col1")
9×2 DataFrame
 Row │ Col1   Col2
     │ Int64  Int64?
─────┼────────────────
   1 │     4        3
   2 │     4        3
   3 │     4        3
   4 │     3        1
   5 │     3        1
   6 │     1        2
   7 │     2  missing
   8 │     2  missing
   9 │     2  missing
```

Here is my attempt for a vector:

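```julia
function f(c1)
    # Float64 output so that NaN can mark "no later change"
    c2 = zeros(length(c1))
    c2[end] = prev = NaN

    # Walk backwards, remembering the value the sequence changes to next
    for i in lastindex(c1)-1:-1:firstindex(c1)
        if c1[i] == c1[i+1]
            c2[i] = prev            # same streak: reuse the remembered value
        else
            c2[i] = prev = c1[i+1]  # streak boundary: the next element is the answer
        end
    end

    return c2
end

c1 = [4, 4, 4, 3, 3, 1, 2, 2, 2]
f(c1) |> println

# [3.0, 3.0, 3.0, 1.0, 1.0, 2.0, NaN, NaN, NaN]
```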

Great solution! It assumes that values won't be repeated after a streak has ended (e.g., [4, 4, 4, 3, 3, 1, 2, 2, 2, 1, 2, 3, 4]), but that should be safe to do, depending on the context.

True. Good catch.
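For inputs where a value can come back after its streak has ended, a single backward pass that only compares neighbouring elements avoids that assumption. Here is a minimal sketch using only Base; the name `next_distinct` is made up for illustration, and it returns `missing` rather than NaN so that integer input keeps its element type:

```julia
# Hypothetical helper (not from any package): for each position, return the
# next value that differs from the current streak, or `missing` if none follows.
function next_distinct(col::AbstractVector)
    out = Vector{Union{Missing, eltype(col)}}(missing, length(col))
    next = missing                      # value the sequence changes to next
    for i in reverse(eachindex(col))
        # At a streak boundary, the element to the right is the new answer
        if i < lastindex(col) && col[i] != col[i + 1]
            next = col[i + 1]
        end
        out[i] = next
    end
    return out
end

next_distinct([4, 4, 4, 3, 3, 1, 2, 2, 2])
next_distinct([4, 4, 4, 3, 3, 1, 2, 2, 2, 1, 2, 3, 4])
```

With a DataFrame, this could be used as `df.Col2 = next_distinct(df.Col1)`, assuming `df` has a `Col1` column as in the question.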

The most straightforward solution, though it can be less efficient for very long streaks of equal values:

```julia
julia> col1 = [4., 4, 4, 3, 3, 1, 2, 2, 2];

julia> col2 = map(enumerate(col1)) do (i, x)
           ix = findnext(!=(x), col1, i)
           isnothing(ix) ? NaN : col1[ix]
       end
9-element Vector{Float64}:
   3.0
   3.0
   3.0
   1.0
   1.0
   2.0
 NaN
 NaN
 NaN
```