How to correctly create "slices" of a column using indexes from another column in a DataFrame

Hi all, I have a question regarding how to correctly solve this:

I have a DataFrame that has the following columns:

df.i1 = [[1,2,3,4,5,1,2],[1,2,3,4,5,6,8,10,11],[1,2,3,4,5]]
df.i2 = [[4,2,1],[3,3,3],[1,2,2]]

And I need to generate three new columns containing the values from df.i1, sliced according to df.i2. Each entry of df.i2 is the length of the next consecutive chunk, so for the first row the first slice goes from 1 to 4, the second from 4+1 to 4+2, and the last from 4+2+1 to 4+2+1 (which is also the total length of the sequence, since df.i2 is generated by applying the rle function to another column). The result should look like this:

df.v1 = [[1,2,3,4],[1,2,3],[1]]

df.v2 = [[5,1],[4,5,6],[2,3]]

df.v3 = [[2],[8,10,11],[4,5]]
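The slice bounds described above are running sums of the run lengths. As a minimal Base-Julia sketch (plain vectors standing in for the first row of each column; the names `lens`, `ends` and `ranges` are only for illustration), the index ranges can be computed like this:

```julia
lens = [4, 2, 1]                 # first row of i2 (run lengths)
ends = cumsum(lens)              # [4, 6, 7]: last index of each slice
# each slice starts right after the previous one ends
ranges = [e - l + 1:e for (l, e) in zip(lens, ends)]   # [1:4, 5:6, 7:7]

x = [1, 2, 3, 4, 5, 1, 2]        # first row of i1
slices = [x[r] for r in ranges]  # [[1, 2, 3, 4], [5, 1], [2]]
```

Because the run lengths come from rle, `last(ends)` equals `length(x)`, so the final range always ends at the last element.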

So far I tried using something like this:

for i in eachrow(df)
    mi = i.i[1]
    df_a.v1 = [x[1:mi] for x in i.zeros]
end

That can be extended to the other two cases, but I am getting the following error:

MethodError: no method matching getindex(::Int64, ::UnitRange{Int64})
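The error arises because `x in i.zeros` iterates over the elements of a single vector, so each `x` is an `Int64`, and an integer cannot be indexed with a range. A small reproduction on plain vectors (values taken from the first row of df.i1); the zip-based fix at the end is a sketch that assumes the two columns line up row by row:

```julia
x = [1, 2, 3, 4, 5, 1, 2]   # one row's i1 value: a whole vector

ok = x[1:4]                 # slicing the vector itself works: [1, 2, 3, 4]

# Iterating over x yields Int64 elements, so elem[1:4] calls
# getindex(::Int64, ::UnitRange{Int64}), which has no method.
err = try
    [elem[1:4] for elem in x]
catch ex
    ex
end
println(typeof(err))        # MethodError

# Fix: pair each row's i1 vector with that row's i2 vector and slice the vector.
i1 = [[1, 2, 3, 4, 5, 1, 2], [1, 2, 3, 4, 5, 6, 8, 10, 11], [1, 2, 3, 4, 5]]
i2 = [[4, 2, 1], [3, 3, 3], [1, 2, 2]]
v1 = [v[1:lens[1]] for (v, lens) in zip(i1, i2)]   # [[1, 2, 3, 4], [1, 2, 3], [1]]
```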

Any help is welcome. I suppose I am on the right track by looping over the rows of the df and generating the slices that way, but I might be wrong.

Thanks!
Cheers.

I’m not sure I understand your description, and I’m on my phone so I can’t check if this works, but maybe

transform!(df,
    [:i1, :i2] => ByRow((i1, i2) -> [i1[1:i] for i in i2]) => :v3
)

Even if this isn’t quite the solution, maybe it will help you come up with an answer. transform!() adds new columns to a DataFrame.

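The second line does the following:

  1. Selects the input columns (:i1 and :i2).
  2. Applies an anonymous function to each row of the input columns. This requires wrapping the function in ByRow; a bare function receives the whole columns instead.
  3. Stores the output in a new column named by the last element of the pair (:v3).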

Note: You can’t use :v3 in subsequent transformations within the same transform!() call. Call transform!() on df again if you want to use :v3 as an input to additional transformations.
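For intuition about the per-row step: `ByRow(f)` applied to `[:i1, :i2]` calls `f` once per row with that row's two values, which on plain vectors is roughly `map(f, i1, i2)`. The sketch below uses Base Julia only; `prefixes` and `rowslices` are hypothetical helper names. The first reproduces the prefix comprehension above, the second returns consecutive slices, which is what the question asks for.

```julia
i1 = [[1, 2, 3, 4, 5, 1, 2], [1, 2, 3, 4, 5, 6, 8, 10, 11], [1, 2, 3, 4, 5]]
i2 = [[4, 2, 1], [3, 3, 3], [1, 2, 2]]

# Prefix logic: every slice starts at index 1
prefixes(x, lens) = [x[1:l] for l in lens]

# Hypothetical helper: consecutive chunks whose lengths are given by lens
function rowslices(x, lens)
    ends = cumsum(lens)
    return [x[e - l + 1:e] for (l, e) in zip(lens, ends)]
end

# What ByRow does conceptually: call the function once per row
first_prefixes = map(prefixes, i1, i2)[1]   # [[1, 2, 3, 4], [1, 2], [1]]
per_row = map(rowslices, i1, i2)
per_row[1]                                  # [[1, 2, 3, 4], [5, 1], [2]]
```

Note that `per_row` holds one vector of three slices per row; turning that into three separate columns still needs a rearrangement step.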

Is this what you want?

julia> df
3×2 DataFrame
 Row │ i1                             i2
     │ Array…                         Array…
─────┼──────────────────────────────────────────
   1 │ [1, 2, 3, 4, 5, 1, 2]          [4, 2, 1]
   2 │ [1, 2, 3, 4, 5, 6, 8, 10, 11]  [3, 3, 3]
   3 │ [1, 2, 3, 4, 5]                [1, 2, 2]

julia> function cutv(x,idx)
           ends = cumsum(idx)
           starts = [1; ends[1:end-1] .+ 1]
           return [x[s:e] for (s,e) in zip(starts, ends)]
       end
cutv (generic function with 1 method)

julia> function trans(xs, idxs)
           newvs = [cutv(xs[i], idxs[i]) for i in eachindex(xs, idxs)]
           return [[v[i] for v in newvs] for i in eachindex(newvs...)]
       end
trans (generic function with 1 method)

julia> df.v1, df.v2, df.v3 = trans(df.i1, df.i2);

julia> df
3×5 DataFrame
 Row │ i1                             i2         v1            v2         v3
     │ Array…                         Array…     Array…        Array…     Array…
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ [1, 2, 3, 4, 5, 1, 2]          [4, 2, 1]  [1, 2, 3, 4]  [5, 1]     [2]
   2 │ [1, 2, 3, 4, 5, 6, 8, 10, 11]  [3, 3, 3]  [1, 2, 3]     [4, 5, 6]  [8, 10, 11]
   3 │ [1, 2, 3, 4, 5]                [1, 2, 2]  [1]           [2, 3]     [4, 5]

Yes!! Both functions worked perfectly, thanks. I will try to get my head around both functions, as I’m not really sure I understand the second one.

Thanks again!
Juan

I assume you mean the second function in my solution.
The process is:

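  1. In cutv we cut each individual i1 vector into consecutive chunks whose lengths come from the matching i2 vector: cumsum gives the end index of each chunk, and each start is the previous end plus one.
  2. In trans we first build newvs by applying cutv to every row, then rearrange this vector of vectors so that the k-th output vector collects the k-th chunk from every row (one vector per new column).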
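To make the two steps concrete, here are the intermediate values with the thread's data typed in as plain vectors. This is a sketch that copies cutv from above and inlines the body of trans; it assumes every row of i2 has the same number of runs, since `eachindex(newvs...)` throws a `DimensionMismatch` otherwise.

```julia
function cutv(x, idx)
    ends = cumsum(idx)                  # [4, 6, 7] for idx = [4, 2, 1]
    starts = [1; ends[1:end-1] .+ 1]    # [1, 5, 7]: each start is the previous end + 1
    return [x[s:e] for (s, e) in zip(starts, ends)]
end

i1 = [[1, 2, 3, 4, 5, 1, 2], [1, 2, 3, 4, 5, 6, 8, 10, 11], [1, 2, 3, 4, 5]]
i2 = [[4, 2, 1], [3, 3, 3], [1, 2, 2]]

# Step 1: cut every row; newvs[r][k] is the k-th slice of row r
newvs = [cutv(i1[r], i2[r]) for r in eachindex(i1, i2)]

# Step 2: "transpose", so cols[k][r] is the k-th slice of row r;
# cols[1], cols[2], cols[3] become the new columns v1, v2, v3
cols = [[v[k] for v in newvs] for k in eachindex(newvs...)]
```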

Here is another version of the cutv function that takes iterators instead of vectors.

Perhaps some other tinkerer can make it more compact/Julian/faster:

julia> cutitr(itr1,itr2) = last(
         foldl(
           (s,t)->(
             (r1, e1) = foldl( (x,y)->( 
               (e,r) = Iterators.peel(x[1]) ; 
               (r, push!(x[2],e))
             ), 1:t ; init=(s[1], Int[]) ) ; 
             (r1, push!(s[2], e1)) 
           ), itr2 ; init=(itr1,Vector{Int}[])
         )
       )
cutitr (generic function with 1 method)

julia> cutitr(i1,i2)
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 1]
 [2]

julia> cutitr(i1,i2) == cutv(i1,i2)
true

(it is about 1.4x slower than the vector version on my machine)
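In the spirit of the "more compact" request, here is an alternative that keeps the iterator-based idea but lets `Iterators.Stateful` remember how much of the input has been consumed, instead of threading the remainder through `Iterators.peel` by hand. `cut_stateful` is a hypothetical name, and this sketch has not been benchmarked against cutv or cutitr.

```julia
# Hypothetical compact variant of cutitr: split itr1 into consecutive chunks
# whose lengths come from itr2, consuming itr1 only once.
function cut_stateful(itr1, itr2)
    s = Iterators.Stateful(itr1)                           # remembers its position
    return [collect(Iterators.take(s, n)) for n in itr2]   # take the next n elements
end

i1 = [1, 2, 3, 4, 5, 1, 2]
i2 = [4, 2, 1]
cut_stateful(i1, i2)   # [[1, 2, 3, 4], [5, 1], [2]]
```

The comprehension runs in order, so each `take` starts where the previous chunk stopped.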


Nice. I did not optimize my code for speed/allocations but for being easy to understand :smile: (but I appreciate the “need for speed”).


Nice!! I can tell that this one is faster, only because the code took me a while to understand haha. If my data becomes bigger, I will switch to this version. Thanks!!
Best,