Expand row to multiple rows

rocco_sprmnt21 · December 15, 2020, 1:26pm

I have this data:

      df=DataFrame(name=["aaa","bbbb","ccc"],
                    frqty=["apple\\4\\melon\\5","plume\\12","graphes\\7\\mango\\9\\persic\\11"])

the goal is to get a group of rows for each name and two columns with the fruit name and quantity.

like this:


6×3 DataFrame
 Row │ name    fr         qty       
     │ String  SubStrin…  SubStrin… 
─────┼──────────────────────────────
   1 │ aaa     apple      4
   2 │ aaa     melon      5
   3 │ bbbb    plume      12
   4 │ ccc     graphes    7
   5 │ ccc     mango      9
   6 │ ccc     persic     11

I have found a solution but I would like to see (and learn) different ways of achieving the result.
I also need some clarification on the solution I found.

             function odds(x,car)
                   split(x,car)[1:2:end]
             end

            function evens(x,car)
               split(x,car)[2:2:end]
           end

        transform!( df,:frqty=>(x->(zip(odds.(x,"\\"),evens.(x,"\\"))))=>[:fr,:qty])
       select!(df,Not(:frqty))
      ddff=vcat([DataFrame(:name=>repeat([r.name],length(r.fr)), :fr=>r.fr,:qty=>r.qty) for r in eachrow(df)]...)

mthelm85 · December 15, 2020, 1:45pm

Are you reading this data from some source file? If so, how are you handling that part? If it was me, I would prefer to deal with the \\ delimiter when reading the file rather than storing it like this in a DataFrame and having to sort it out afterwards. Parsing the data appropriately at the point of ingestion should also eliminate the issue you now have with your fruit and quantity columns being SubStrings.

jules · December 15, 2020, 2:22pm

How about this:

fruitinfo(s) = map(Iterators.partition(split(s, "\\"), 2)) do (f, n)
    (fruit = f, number = parse(Int, n))
end

combine(
    groupby(df, :name),
    :frqty => (fs -> reduce(vcat, fruitinfo.(fs))) => AsTable)

rocco_sprmnt21 · December 15, 2020, 2:38pm

I say this:

now I look at it well and try to understand how it works.

tanks!

rocco_sprmnt21 · December 15, 2020, 2:43pm

Hi.
it’s not a real problem. I’m just trying to do exercises to get used to the Julian environment.

jules · December 15, 2020, 3:07pm

Ah actually the second part could just be

combine(
    groupby(df, :name),
    :frqty => fruitinfo ∘ only => AsTable)

The problem is that you need combine if you want to do a transformation that results in more output rows than input rows, while transform needs to return the same number.
But then you want to run your function on every row, so you group by the variable name. Then you know you will receive a vector with only one element as the input in each row, so you extract that element with only (so it errors if your assumption is wrong) and then do fruitinfo on that one value. You get a list of named tuples back, and the sink argument => AsTable directly converts that into two new columns.

pdeffebach · December 15, 2020, 4:52pm

You want flatten, I think.

julia> df=DataFrame(name=["aaa","bbbb","ccc"],
                           frqty=["apple\\4\\melon\\5","plume\\12","graphes\\7\\mango\\9\\persic\\11"])
3×2 DataFrame
 Row │ name    frqty                            
     │ String  String                           
─────┼──────────────────────────────────────────
   1 │ aaa     apple\\4\\melon\\5
   2 │ bbbb    plume\\12
   3 │ ccc     graphes\\7\\mango\\9\\persic\\11

julia> df.x = split.(df.frqty, "\\")
3-element Array{Array{SubString{String},1},1}:
 ["apple", "4", "melon", "5"]
 ["plume", "12"]
 ["graphes", "7", "mango", "9", "persic", "11"]

julia> flatten(df, :x)
12×3 DataFrame
 Row │ name    frqty                             x         
     │ String  String                            SubStrin… 
─────┼─────────────────────────────────────────────────────
   1 │ aaa     apple\\4\\melon\\5                apple
   2 │ aaa     apple\\4\\melon\\5                4
   3 │ aaa     apple\\4\\melon\\5                melon
   4 │ aaa     apple\\4\\melon\\5                5
   5 │ bbbb    plume\\12                         plume
   6 │ bbbb    plume\\12                         12
   7 │ ccc     graphes\\7\\mango\\9\\persic\\11  graphes
   8 │ ccc     graphes\\7\\mango\\9\\persic\\11  7
   9 │ ccc     graphes\\7\\mango\\9\\persic\\11  mango
  10 │ ccc     graphes\\7\\mango\\9\\persic\\11  9
  11 │ ccc     graphes\\7\\mango\\9\\persic\\11  persic
  12 │ ccc     graphes\\7\\mango\\9\\persic\\11  11

rocco_sprmnt21 · December 15, 2020, 5:10pm

Thank you very much for explaining how things work.

I hadn’t read the details of the combine function description yet. I think that in this example you have produced one will find a lot of the possibilities of this function and you have saved me a mountain of time studying it.

Things seem also work like this:

           combine(groupby(df, :name),:frqty => (fs -> fruitinfo(fs[1])) => AsTable)

or this:

           combine(groupby(df, :name), :frqty => fruitinfo ∘ first => AsTable)

One of the questions I wanted to have about my naive solution was how to convert the string format of the qty column to numeric.
your solution solves this problem at the root.

A further lesson is that of the use of the do … end construct which I have to see because I did not know it.
Wouldn’t a “simple” map (itr, func) suffice in this case?

rocco_sprmnt21 · December 15, 2020, 5:22pm

Flattening lists was an idea I had thought of, but I didn’t know how to put it into practice.
This is Columbus’s egg!!!

julia> select!(df,Not(:frqty))
3×3 DataFrame
 Row │ name    fr                                 qty
     │ String  Array…                             Array…
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │ aaa     SubString{String}["apple", "melo…  SubString{String}["4", "5"]
   2 │ bbbb    SubString{String}["plume"]         SubString{String}["12"]
   3 │ ccc     SubString{String}["graphes", "ma…  SubString{String}["7", "9", "11"]

julia> flatten(df, [:fr,:qty])
6×3 DataFrame
 Row │ name    fr         qty
     │ String  SubStrin…  SubStrin…
─────┼──────────────────────────────
   1 │ aaa     apple      4
   2 │ aaa     melon      5
   3 │ bbbb    plume      12
   4 │ ccc     graphes    7
   5 │ ccc     mango      9
   6 │ ccc     persic     11

Now I can ask the two questions I wanted to ask.
How do you convert the qty column to integers?
And why does the transform function also put the type in front of the new column values?

       transform!( df,:frqty=>(x->(zip(odds.(x,"\\"),evens.(x,"\\"))))=>[:fr,:qty])

pdeffebach · December 15, 2020, 5:31pm

You have to use Parse, or tryparse (be sure to convert to missing if tryparse falls back to nothing
Those are just for printing. So you can see the types of your columns, they aren’t the names of the columns.

rocco_sprmnt21 · December 15, 2020, 5:48pm

for 2. I am referring to the text circled in red

for 1.


julia> ddff=vcat([DataFrame(:name=>repeat([r.name],length(r.fr)), :fr=>r.fr,:qty=>parse.(Int,r.qty)) for r in eachrow(df)]...)
6×3 DataFrame
 Row │ name    fr         qty   
     │ String  SubStrin…  Int64 
─────┼──────────────────────────
   1 │ aaa     apple          4
   2 │ aaa     melon          5
   3 │ bbbb    plume         12
   4 │ ccc     graphes        7
   5 │ ccc     mango          9
   6 │ ccc     persic        11

Henrique_Becker · December 15, 2020, 5:50pm

Yes, this is how Vectors of SubString{String} are printed (in certain conditions). In fact, Vectors in general are printed as TypeValidForAllElements[element1, element2, ...].

Topic		Replies	Views
Split dataframe row into multiple rows Data dataframes	8	1795	May 1, 2022
Split and flatten a field from CSV and use as DataFrame New to Julia dataframes	5	725	May 27, 2020
Separating a column into a variable number of possible columns Data dataframes	22	1408	October 3, 2021
Split string to multiple columns New to Julia	10	3566	August 1, 2019
Split Column in Dataframe New to Julia dataframes	5	2192	February 4, 2022

Expand row to multiple rows

Related topics