# How to correctly create "slices" of a column using indexes from another column in a DataFrame

Hi all, I have a question regarding how to correctly solve this:

I have a `DataFrame` that has the following columns:

```julia
df.i1 = [[1,2,3,4,5,1,2],[1,2,3,4,5,6,8,10,11],[1,2,3,4,5]]
df.i2 = [[4,2,1],[3,3,3],[1,2,2]]
```

I need to generate three new columns holding the values of `df.i1` sliced according to `df.i2`. For the first row, the first slice should go from index 1 to 4, the second from `4+1` to `4+2`, and the last from `4+2+1` to `4+2+1` (which is also the total length of the sequence, since `df.i2` is generated by applying the `rle` function to another column). So the result should look like this:

```julia
df.v1 = [[1,2,3,4],[1,2,3],[1]]

df.v2 = [[5,1],[4,5,6],[2,3]]

df.v3 = [[2],[8,10,11],[4,5]]
```
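The slice boundaries follow from a running sum of the run lengths: each slice ends at the cumulative total and starts one past the previous end. A minimal sketch for the first row, where `lens`, `stops`, `starts`, `ranges` and `vals` are hypothetical names used only for illustration:

```julia
lens = [4, 2, 1]                  # run lengths for the first row (df.i2[1])
stops = cumsum(lens)              # each slice ends at the running total: [4, 6, 7]
starts = stops .- lens .+ 1       # and starts one past the previous end: [1, 5, 7]
ranges = [s:e for (s, e) in zip(starts, stops)]

vals = [1, 2, 3, 4, 5, 1, 2]      # df.i1[1]
slices = [vals[r] for r in ranges]
```

Because `df.i2` comes from `rle`, the last entry of `stops` always equals the length of the row.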

So far I have tried something like this:

```julia
for i in eachrow(df)
    mi = i.i[1]
    # If i.zeros holds a vector of Ints, each x is an Int, so x[1:mi] fails with the error below
    df_a.v1 = [x[1:mi] for x in i.zeros]
end
```

That can be extended to the other two cases, but I am getting the following error:

```
MethodError: no method matching getindex(::Int64, ::UnitRange{Int64})
```

Any help is welcome. I suppose I am on the right track by looping over the rows of the DataFrame and generating the slices that way, but I might be wrong.
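For reference, here is a minimal sketch of why the loop fails and of a row-wise loop that does work. It assumes plain vectors `i1` and `i2` stand in for the two columns (hypothetical names) and that every row has exactly three runs. Iterating over a single cell yields its `Int` elements, and an `Int` cannot be indexed with a range, which is where the `MethodError` comes from. Slicing the vector itself avoids it.

```julia
# Stand-in data with the same contents as the question's i1 and i2 columns
i1 = [[1, 2, 3, 4, 5, 1, 2], [1, 2, 3, 4, 5, 6, 8, 10, 11], [1, 2, 3, 4, 5]]
i2 = [[4, 2, 1], [3, 3, 3], [1, 2, 2]]

# Why the original loop fails: an element of a cell is an Int,
# and there is no getindex method for (Int, UnitRange)
x = first(i1[1])
err = try
    x[1:4]
catch e
    e
end

# Row-wise loop that slices the vectors themselves (assumes three runs per row)
v1, v2, v3 = Vector{Int}[], Vector{Int}[], Vector{Int}[]
for (vals, lens) in zip(i1, i2)
    stop1 = lens[1]
    stop2 = stop1 + lens[2]
    stop3 = stop2 + lens[3]
    push!(v1, vals[1:stop1])
    push!(v2, vals[stop1+1:stop2])
    push!(v3, vals[stop2+1:stop3])
end
```

With a `DataFrame`, the same loop could iterate over `eachrow(df)` and read `row.i1` and `row.i2`.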

Thanks!
Cheers.

Iβm not sure I understand your description, and Iβm on my phone so I canβt check if this works, but maybe

```julia
transform!(df,
    [:i1, :i2] => ByRow((i1, i2) -> [i1[1:i] for i in i2]) => :v3
)
```

Even if this isnβt quite the solution, maybe it will help you come up with an answer β `transform!()` is for adding new columns to a dataframe.

The second line does the following:

1. Selects the input columns.
2. Applies an anonymous function to each row of the input columns.
3. Saves the output in the column named on the right (`:v3`).

Note: You canβt use `:v3` in subsequent transformations within the same `transform!()` call. You need to call it again on `df` if you want to use `:v3` as an input in addition transformations.

Is this what you want?

```julia
julia> df
3×2 DataFrame
 Row │ i1                             i2
     │ Array…                         Array…
─────┼──────────────────────────────────────────
   1 │ [1, 2, 3, 4, 5, 1, 2]          [4, 2, 1]
   2 │ [1, 2, 3, 4, 5, 6, 8, 10, 11]  [3, 3, 3]
   3 │ [1, 2, 3, 4, 5]                [1, 2, 2]

julia> function cutv(x, idx)
           ends = cumsum(idx)
           starts = [1; ends[1:end-1] .+ 1]
           return [x[s:e] for (s, e) in zip(starts, ends)]
       end
cutv (generic function with 1 method)

julia> function trans(xs, idxs)
           newvs = [cutv(xs[i], idxs[i]) for i in eachindex(xs, idxs)]
           return [[v[i] for v in newvs] for i in eachindex(newvs...)]
       end
trans (generic function with 1 method)

julia> df.v1, df.v2, df.v3 = trans(df.i1, df.i2);

julia> df
3×5 DataFrame
 Row │ i1                             i2         v1            v2         v3
     │ Array…                         Array…     Array…        Array…     Array…
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │ [1, 2, 3, 4, 5, 1, 2]          [4, 2, 1]  [1, 2, 3, 4]  [5, 1]     [2]
   2 │ [1, 2, 3, 4, 5, 6, 8, 10, 11]  [3, 3, 3]  [1, 2, 3]     [4, 5, 6]  [8, 10, 11]
   3 │ [1, 2, 3, 4, 5]                [1, 2, 2]  [1]           [2, 3]     [4, 5]
```

Yes!! Both worked perfectly, thanks. I will try to get my head around them, as I'm not really sure I understand the second one.

Thanks again!
Juan

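I assume you mean my solution. The process is:

1. In `cutv`, we cut an individual `i1` vector using the run lengths from the matching `i2` vector.
2. In `trans`, we first create `newvs` by cutting every vector in `i1`, and then rearrange this vector of vectors so that the k-th slices of all rows are collected together, which gives one new column per slice.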
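The two steps can be seen separately in a small sketch on plain vectors. `cutv` is copied from the answer above so the block runs on its own; `newvs` follows the answer, while `cols` and the `getindex.` broadcast used for the regrouping are an illustrative alternative to the nested comprehension in `trans`, not the original code.

```julia
# cutv as in the answer above
function cutv(x, idx)
    ends = cumsum(idx)
    starts = [1; ends[1:end-1] .+ 1]
    return [x[s:e] for (s, e) in zip(starts, ends)]
end

i1 = [[1, 2, 3, 4, 5, 1, 2], [1, 2, 3, 4, 5, 6, 8, 10, 11], [1, 2, 3, 4, 5]]
i2 = [[4, 2, 1], [3, 3, 3], [1, 2, 2]]

# Step 1: newvs[row][k] is the k-th slice of that row
newvs = [cutv(x, idx) for (x, idx) in zip(i1, i2)]

# Step 2: regroup, so cols[k] holds the k-th slice of every row
cols = [getindex.(newvs, k) for k in eachindex(first(newvs))]
```

Like `trans`, this assumes every row has the same number of runs.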

Here is another version of the `cutv` function that takes iterators instead of vectors.

Perhaps some other tinkerer can make it more compact/Julian/faster:

```julia
julia> i1 = [1, 2, 3, 4, 5, 1, 2]; i2 = [4, 2, 1];   # first row of df

julia> cutitr(itr1, itr2) = last(
           foldl(   # state: (remaining part of itr1, slices collected so far)
               (s, t) -> (
                   (r1, e1) = foldl((x, _) -> (   # peel t elements off the remaining iterator
                       (e, r) = Iterators.peel(x[1]);
                       (r, push!(x[2], e))
                   ), 1:t; init = (s[1], Int[]));
                   (r1, push!(s[2], e1))
               ), itr2; init = (itr1, Vector{Int}[])
           )
       )
cutitr (generic function with 1 method)

julia> cutitr(i1, i2)
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 1]
 [2]

julia> cutitr(i1, i2) == cutv(i1, i2)
true
```

(On my machine it is about 1.4x slower than the vector version.)
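In the spirit of the request for a more compact iterator version, one possible sketch (not benchmarked here) wraps the input in `Iterators.Stateful`, so that each `Iterators.take` consumes the next `n` elements. `cutstateful` is a hypothetical name.

```julia
# Hypothetical name; cuts any iterator into consecutive chunks of the given lengths
function cutstateful(itr, lens)
    st = Iterators.Stateful(itr)   # keeps track of how far we have consumed
    return [collect(Iterators.take(st, n)) for n in lens]
end

i1 = [1, 2, 3, 4, 5, 1, 2]
i2 = [4, 2, 1]
chunks = cutstateful(i1, i2)
```

Whether this is faster or slower than `cutitr` or `cutv` would need to be measured.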


Nice. I did not optimize my code for speed/allocations but for being easy to understand (but I appreciate the "need for speed").


Nice!! I can tell that this one is faster, only because the code took me a while to understand haha. If my data gets bigger, I will switch to this version. Thanks!!
Best,