Expand row to multiple rows

I have this data:

      df=DataFrame(name=["aaa","bbbb","ccc"],
                    frqty=["apple\\4\\melon\\5","plume\\12","graphes\\7\\mango\\9\\persic\\11"])

the goal is to get a group of rows for each name and two columns with the fruit name and quantity.

like this:


6×3 DataFrame
 Row │ name    fr         qty       
     │ String  SubStrin…  SubStrin… 
─────┼──────────────────────────────
   1 │ aaa     apple      4
   2 │ aaa     melon      5
   3 │ bbbb    plume      12
   4 │ ccc     graphes    7
   5 │ ccc     mango      9
   6 │ ccc     persic     11

I have found a solution but I would like to see (and learn) different ways of achieving the result.
I also need some clarification on the solution I found.

             function odds(x,car)
                   split(x,car)[1:2:end]
             end

            function evens(x,car)
               split(x,car)[2:2:end]
           end

        transform!( df,:frqty=>(x->(zip(odds.(x,"\\"),evens.(x,"\\"))))=>[:fr,:qty])
       select!(df,Not(:frqty))
      ddff=vcat([DataFrame(:name=>repeat([r.name],length(r.fr)), :fr=>r.fr,:qty=>r.qty) for r in eachrow(df)]...)

Are you reading this data from some source file? If so, how are you handling that part? If it was me, I would prefer to deal with the \\ delimiter when reading the file rather than storing it like this in a DataFrame and having to sort it out afterwards. Parsing the data appropriately at the point of ingestion should also eliminate the issue you now have with your fruit and quantity columns being SubStrings.

1 Like

How about this:

fruitinfo(s) = map(Iterators.partition(split(s, "\\"), 2)) do (f, n)
    (fruit = f, number = parse(Int, n))
end

combine(
    groupby(df, :name),
    :frqty => (fs -> reduce(vcat, fruitinfo.(fs))) => AsTable)
1 Like

I say this: :clap: :clap:

now I look at it well and try to understand how it works.

tanks!

Hi.
it’s not a real problem. I’m just trying to do exercises to get used to the Julian environment.

Ah actually the second part could just be

combine(
    groupby(df, :name),
    :frqty => fruitinfo ∘ only => AsTable)

The problem is that you need combine if you want to do a transformation that results in more output rows than input rows, while transform needs to return the same number.
But then you want to run your function on every row, so you group by the variable name. Then you know you will receive a vector with only one element as the input in each row, so you extract that element with only (so it errors if your assumption is wrong) and then do fruitinfo on that one value. You get a list of named tuples back, and the sink argument => AsTable directly converts that into two new columns.

1 Like

You want flatten, I think.

julia> df=DataFrame(name=["aaa","bbbb","ccc"],
                           frqty=["apple\\4\\melon\\5","plume\\12","graphes\\7\\mango\\9\\persic\\11"])
3×2 DataFrame
 Row │ name    frqty                            
     │ String  String                           
─────┼──────────────────────────────────────────
   1 │ aaa     apple\\4\\melon\\5
   2 │ bbbb    plume\\12
   3 │ ccc     graphes\\7\\mango\\9\\persic\\11

julia> df.x = split.(df.frqty, "\\")
3-element Array{Array{SubString{String},1},1}:
 ["apple", "4", "melon", "5"]
 ["plume", "12"]
 ["graphes", "7", "mango", "9", "persic", "11"]

julia> flatten(df, :x)
12×3 DataFrame
 Row │ name    frqty                             x         
     │ String  String                            SubStrin… 
─────┼─────────────────────────────────────────────────────
   1 │ aaa     apple\\4\\melon\\5                apple
   2 │ aaa     apple\\4\\melon\\5                4
   3 │ aaa     apple\\4\\melon\\5                melon
   4 │ aaa     apple\\4\\melon\\5                5
   5 │ bbbb    plume\\12                         plume
   6 │ bbbb    plume\\12                         12
   7 │ ccc     graphes\\7\\mango\\9\\persic\\11  graphes
   8 │ ccc     graphes\\7\\mango\\9\\persic\\11  7
   9 │ ccc     graphes\\7\\mango\\9\\persic\\11  mango
  10 │ ccc     graphes\\7\\mango\\9\\persic\\11  9
  11 │ ccc     graphes\\7\\mango\\9\\persic\\11  persic
  12 │ ccc     graphes\\7\\mango\\9\\persic\\11  11
1 Like

Thank you very much for explaining how things work.

I hadn’t read the details of the combine function description yet. I think that in this example you have produced one will find a lot of the possibilities of this function and you have saved me a mountain of time studying it.

Things seem also work like this:

           combine(groupby(df, :name),:frqty => (fs -> fruitinfo(fs[1])) => AsTable)

or this:

           combine(groupby(df, :name), :frqty => fruitinfo ∘ first => AsTable)

One of the questions I wanted to have about my naive solution was how to convert the string format of the qty column to numeric.
your solution solves this problem at the root.

A further lesson is that of the use of the do … end construct which I have to see because I did not know it.
Wouldn’t a “simple” map (itr, func) suffice in this case?

Flattening lists was an idea I had thought of, but I didn’t know how to put it into practice.
This is Columbus’s egg!!!

julia> select!(df,Not(:frqty))
3×3 DataFrame
 Row │ name    fr                                 qty
     │ String  Array…                             Array…
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │ aaa     SubString{String}["apple", "melo…  SubString{String}["4", "5"]
   2 │ bbbb    SubString{String}["plume"]         SubString{String}["12"]
   3 │ ccc     SubString{String}["graphes", "ma…  SubString{String}["7", "9", "11"]

julia> flatten(df, [:fr,:qty])
6×3 DataFrame
 Row │ name    fr         qty
     │ String  SubStrin…  SubStrin…
─────┼──────────────────────────────
   1 │ aaa     apple      4
   2 │ aaa     melon      5
   3 │ bbbb    plume      12
   4 │ ccc     graphes    7
   5 │ ccc     mango      9
   6 │ ccc     persic     11

Now I can ask the two questions I wanted to ask.
How do you convert the qty column to integers?
And why does the transform function also put the type in front of the new column values?

       transform!( df,:frqty=>(x->(zip(odds.(x,"\\"),evens.(x,"\\"))))=>[:fr,:qty])
  1. You have to use Parse, or tryparse (be sure to convert to missing if tryparse falls back to nothing
  2. Those are just for printing. So you can see the types of your columns, they aren’t the names of the columns.

for 2. I am referring to the text circled in red

for 1.


julia> ddff=vcat([DataFrame(:name=>repeat([r.name],length(r.fr)), :fr=>r.fr,:qty=>parse.(Int,r.qty)) for r in eachrow(df)]...)
6×3 DataFrame
 Row │ name    fr         qty   
     │ String  SubStrin…  Int64 
─────┼──────────────────────────
   1 │ aaa     apple          4
   2 │ aaa     melon          5
   3 │ bbbb    plume         12
   4 │ ccc     graphes        7
   5 │ ccc     mango          9
   6 │ ccc     persic        11

Yes, this is how Vectors of SubString{String} are printed (in certain conditions). In fact, Vectors in general are printed as TypeValidForAllElements[element1, element2, ...].

1 Like