Help needed with making lists, and modifying them, inside a DataFrame

Let’s first define a helper function that creates the output array you want for each row:

julia> function indicate_missings(missings_range, seqs_length)  
         v = zeros(Bool, seqs_length)
         v[first(missings_range):last(missings_range)] .= true
         v
       end
indicate_missings (generic function with 1 method)

This takes a single missings_range element and a single seqs_length value, and returns the desired vector for that pair. You can test it with, for example:

julia> indicate_missings([2, 5], 10) |> println
Bool[0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

Then, you can do a transform like this:

julia> transform(df, 
         [:missings_range, :seqs_length] => ByRow(indicate_missings) => :ranges_missing)
6×4 DataFrame
 Row │ pdb_names  missings_range  seqs_length  ranges_missing                    
     │ String     Vector{Int64}   Int64        Vector{Bool}                      
─────┼───────────────────────────────────────────────────────────────────────────
   1 │ C1_1       [2, 5]                   10  Bool[0, 1, 1, 1, 1, 0, 0, 0, 0, …
   2 │ C1_1       [9, 10]                  10  Bool[0, 0, 0, 0, 0, 0, 0, 0, 1, …
   3 │ C1_2       [9, 10]                  10  Bool[0, 0, 0, 0, 0, 0, 0, 0, 1, …
   4 │ C1_3       [1, 4]                    7  Bool[1, 1, 1, 1, 0, 0, 0]
   5 │ C2_1       [1, 4]                    7  Bool[1, 1, 1, 1, 0, 0, 0]
   6 │ C2_2       [1, 4]                    7  Bool[1, 1, 1, 1, 0, 0, 0]
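
If it helps to see what ByRow is doing here, below is a minimal sketch in plain Julia, without DataFrames. It assumes the two columns are ordinary vectors; the names `ranges` and `lengths` are made up for illustration, with values copied from the first and fourth rows above. It calls the helper once per pair of elements, which is essentially what `ByRow(indicate_missings)` does for each row:

```julia
# Same helper as above, repeated so this snippet runs on its own.
function indicate_missings(missings_range, seqs_length)
    v = zeros(Bool, seqs_length)
    v[first(missings_range):last(missings_range)] .= true
    return v
end

# Hypothetical stand-ins for the :missings_range and :seqs_length columns.
ranges  = [[2, 5], [1, 4]]
lengths = [10, 7]

# One call per (range, length) pair, similar to what ByRow does per row.
result = map(indicate_missings, ranges, lengths)

# Print each resulting Bool vector on its own line.
foreach(println, result)
```

Here `result` is a vector of Bool vectors, one per input pair, which is the same shape of data that ends up in the new `ranges_missing` column.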

If you have questions about any part of this, please feel free to ask!
