Side effects (or intended effects) in TidierData?

julia> df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = [join(rand('a':'z',4)) for _ in 1:6], ab = 12:-1:7)
6×5 DataFrame
 Row │ x       y      yz     a       ab
     │ String  Int64  Int64  String  Int64
─────┼─────────────────────────────────────
   1 │ a           1     13  coes       12
   2 │ b           2     14  nwoz       11
   3 │ a           3     15  gber       10
   4 │ b           4     16  ompu        9
   5 │ C           5     17  ktgq        8
   6 │ a           6     18  edkt        7

julia> nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
 Row │ x       n3             n2
     │ String  DataFrame      DataFrame
─────┼──────────────────────────────────────
   1 │ a       3×2 DataFrame  3×2 DataFrame
   2 │ b       2×2 DataFrame  2×2 DataFrame
   3 │ C       1×2 DataFrame  1×2 DataFrame

julia> @chain nested_df begin
           @unnest_wider(n3:n2, names_sep = nothing)        
           @unnest_longer(y:ab)
       end
9×5 DataFrame
 Row │ x       y        yz       a     ab
     │ String  Int64?   Int64?   Any   Int64?
─────┼─────────────────────────────────────────
   1 │ a             1       13  coes       12
   2 │ a             3       15  gber       10
   3 │ a             6       18  edkt        7
   4 │ b             2       14  nwoz       11
   5 │ b             4       16  ompu        9
   6 │ C             5       17  k           8
   7 │ C       missing  missing  t     missing
   8 │ C       missing  missing  g     missing
   9 │ C       missing  missing  q     missing

That looks unfortunate; you probably want to open an issue. If you use CategoricalArrays, Symbols, or similar types instead, you get an error that there is no iterate method for them. A String, however, is iterable, so no error is raised and it gets split into its characters instead.
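The difference comes from Julia's iteration protocol. A String is a collection of Chars, so code that expands every iterable cell splits "ktgq" into 'k', 't', 'g', 'q'; a Symbol defines no iterate method, hence the error; and a number iterates once, as a single element, which is why the Int columns were not split. The check below is plain Base Julia and only illustrates the language behaviour; it makes no claim about how TidierData is implemented internally:

```julia
# Strings implement the iteration protocol: iterating yields Chars.
@show applicable(iterate, "ktgq")   # true
@show collect("ktgq")               # the four Chars 'k', 't', 'g', 'q'

# Symbols have no iterate method, so generic "expand each element" code fails.
@show applicable(iterate, :ktgq)    # false

# Numbers iterate exactly once, behaving like a one-element collection.
@show applicable(iterate, 8)        # true
```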

I don’t know how to open an issue. I just wanted to point out something that (unless I am mistaken) seems inconsistent with the other examples in the manual chapter on the @unnest_* macros.
If you also think it is relevant, please open the issue yourself.

That is really not difficult and worth learning. Just go to GitHub · Where software is built and click the “New Issue” button.

Then describe the issue you encountered, preferably including an example.

Is that okay?

Not really. An issue should contain:

  • the code you executed
  • the output of executing that code
  • the expected output

and usually also the output of the command:

versioninfo()

A link to the Discourse discussion can be useful as extra context, but an issue should not consist of only that link.
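A minimal way to collect that information in a form ready to paste into an issue. versioninfo(io) and Pkg.status() are standard library calls; capturing the text through an IOBuffer is just one convenient option, not a requirement:

```julia
using InteractiveUtils   # provides versioninfo; the REPL loads it automatically
using Pkg

# Capture versioninfo() as a String so it can be pasted into the issue.
buf = IOBuffer()
versioninfo(buf)
julia_info = String(take!(buf))
print(julia_info)

# Versions of the packages in the active environment (look for the TidierData line).
Pkg.status()
```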

Thank you for pointing this out. This behavior is not intended.

The problem appeared when unnesting wider before longer.

I have now fixed it in the “unnest_wider_edgecase” branch.

julia> nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
 Row │ x       n3             n2
     │ String  DataFrame      DataFrame
─────┼──────────────────────────────────────
   1 │ a       3×2 DataFrame  3×2 DataFrame
   2 │ b       2×2 DataFrame  2×2 DataFrame
   3 │ C       1×2 DataFrame  1×2 DataFrame

julia> @chain nested_df begin
                  @unnest_wider(n3:n2, names_sep = nothing)        
                  @unnest_longer(y:ab)
              end
6×5 DataFrame
 Row │ x       y      yz     a       ab
     │ String  Int64  Int64  String  Int64
─────┼─────────────────────────────────────
   1 │ a           1     13  yzyb       12
   2 │ a           3     15  tijm       10
   3 │ a           6     18  dxcd        7
   4 │ b           2     14  ijkj       11
   5 │ b           4     16  gavn        9
   6 │ C           5     17  zvtt        8

julia> @chain nested_df begin
                  @unnest_longer(n3:n2)
                  @unnest_wider(n3:n2, names_sep = nothing)        
              end
6×5 DataFrame
 Row │ x       yz     y      a       ab
     │ String  Int64  Int64  String  Int64
─────┼─────────────────────────────────────
   1 │ a          13      1  yzyb       12
   2 │ a          15      3  tijm       10
   3 │ a          18      6  dxcd        7
   4 │ b          14      2  ijkj       11
   5 │ b          16      4  gavn        9
   6 │ C          17      5  zvtt        8

Is it possible to update to the fixed version?
How?

You can add the specific branch from the package REPL (press ] at the julia> prompt):

(@v1.11) pkg> add TidierData#unnest_wider_edgecase
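If you prefer a script to Pkg mode, the same request can be written with the Pkg API. This is a sketch: PackageSpec with a rev keyword and Pkg.free are existing Pkg functions, but the helper names track_branch and back_to_release are made up here, and they are wrapped in functions only because installing needs network access:

```julia
using Pkg

# Equivalent of `pkg> add TidierData#unnest_wider_edgecase`.
spec = Pkg.PackageSpec(name = "TidierData", rev = "unnest_wider_edgecase")

# Hypothetical helpers; call them explicitly when you want to switch.
track_branch() = Pkg.add(spec)              # install and track the branch
back_to_release() = Pkg.free("TidierData")  # later: go back to registered releases
```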

I updated following the instructions.
I don’t want to take up too much of your attention (I confess that none of the following cases are of practical interest to me; it’s pure curiosity), so if you don’t find these observations worth dwelling on, I’d understand.
Now suppose we have the following DataFrame.
The result of the first flattening below is clear to me, but the results of the last two are not.


julia> df4
6×6 DataFrame
 Row │ x       y      yz     a               b       ab
     │ String  Int64  Int64  Any             String  Any
─────┼──────────────────────────────────────────────────────
   1 │ a           1     13  ["AAA", "BBB"]  ttyu    12
   2 │ b           2     14  quk             ojx     11
   3 │ a           3     15  sfgdb           ufo     10
   4 │ b           4     16  upku            pgafaq  9
   5 │ C           5     17  dhmvk           yggc    8
   6 │ a           6     18  hfz             zpfun   [7, 7]

julia> @unnest_longer( df4,[ab])
7×6 DataFrame
 Row │ x       y      yz     a               b       ab
     │ String  Int64  Int64  Any             String  Int64
─────┼─────────────────────────────────────────────────────
   1 │ a           1     13  ["AAA", "BBB"]  ttyu       12
   2 │ b           2     14  quk             ojx        11
   3 │ a           3     15  sfgdb           ufo        10
   4 │ b           4     16  upku            pgafaq      9
   5 │ C           5     17  dhmvk           yggc        8
   6 │ a           6     18  hfz             zpfun       7
   7 │ a           6     18  hfz             zpfun       7

julia> @unnest_longer( df4,a)
22×6 DataFrame
 Row │ x       y      yz     a    b       ab
     │ String  Int64  Int64  Any  String  Any
─────┼───────────────────────────────────────────
   1 │ a           1     13  AAA  ttyu    12
   2 │ a           1     13  BBB  ttyu    12
   3 │ b           2     14  q    ojx     11
   4 │ b           2     14  u    ojx     11
   5 │ b           2     14  k    ojx     11
   6 │ a           3     15  s    ufo     10
   7 │ a           3     15  f    ufo     10
   8 │ a           3     15  g    ufo     10
   9 │ a           3     15  d    ufo     10
  10 │ a           3     15  b    ufo     10
  11 │ b           4     16  u    pgafaq  9
  12 │ b           4     16  p    pgafaq  9
  13 │ b           4     16  k    pgafaq  9
  14 │ b           4     16  u    pgafaq  9
  15 │ C           5     17  d    yggc    8
  16 │ C           5     17  h    yggc    8
  17 │ C           5     17  m    yggc    8
  18 │ C           5     17  v    yggc    8
  19 │ C           5     17  k    yggc    8
  20 │ a           6     18  h    zpfun   [7, 7]
  21 │ a           6     18  f    zpfun   [7, 7]
  22 │ a           6     18  z    zpfun   [7, 7]

julia> @unnest_longer( df4,a,ab)
22×6 DataFrame
 Row │ x       y      yz     a    b       ab
     │ String  Int64  Int64  Any  String  Int64?
─────┼────────────────────────────────────────────
   1 │ a           1     13  AAA  ttyu         12
   2 │ a           1     13  BBB  ttyu    missing
   3 │ b           2     14  q    ojx          11
   4 │ b           2     14  u    ojx     missing
   5 │ b           2     14  k    ojx     missing
   6 │ a           3     15  s    ufo          10
   7 │ a           3     15  f    ufo     missing
   8 │ a           3     15  g    ufo     missing
   9 │ a           3     15  d    ufo     missing
  10 │ a           3     15  b    ufo     missing
  11 │ b           4     16  u    pgafaq        9
  12 │ b           4     16  p    pgafaq  missing
  13 │ b           4     16  k    pgafaq  missing
  14 │ b           4     16  u    pgafaq  missing
  15 │ C           5     17  d    yggc          8
  16 │ C           5     17  h    yggc    missing
  17 │ C           5     17  m    yggc    missing
  18 │ C           5     17  v    yggc    missing
  19 │ C           5     17  k    yggc    missing
  20 │ a           6     18  h    zpfun         7
  21 │ a           6     18  f    zpfun         7
  22 │ a           6     18  z    zpfun   missing

This was actually an edge case that slipped past me in testing; thank you for pointing it out. It is now fixed in that same branch if you re-download it. I have also compared the results with the R implementation (tidyr) to confirm that they match.

julia> df4 = DataFrame(
           x = ["a", "b", "a", "b", "C", "a"],
           y = [1, 2, 3, 4, 5, 6],
           yz = [13, 14, 15, 16, 17, 18],
           a = [ ["AAA", "BBB"], "quk", "sfgdb", "upku", "dhmvk", "hfz" ],
           b = ["ttyu", "ojx", "ufo", "pgafaq", "yggc", "zpfun"],
           ab = [12, 11, 10, 9, 8, [7, 7]]
       );

julia> @unnest_longer( df4,a)
7×6 DataFrame
 Row │ x       y      yz     a       b       ab
     │ String  Int64  Int64  String  String  Any
─────┼──────────────────────────────────────────────
   1 │ a           1     13  AAA     ttyu    12
   2 │ a           1     13  BBB     ttyu    12
   3 │ b           2     14  quk     ojx     11
   4 │ a           3     15  sfgdb   ufo     10
   5 │ b           4     16  upku    pgafaq  9
   6 │ C           5     17  dhmvk   yggc    8
   7 │ a           6     18  hfz     zpfun   [7, 7]

julia> @unnest_longer(df4, a, ab)
8×6 DataFrame
 Row │ x       y      yz     a       b       ab
     │ String  Int64  Int64  String  String  Int64
─────┼─────────────────────────────────────────────
   1 │ a           1     13  AAA     ttyu       12
   2 │ a           1     13  BBB     ttyu       12
   3 │ b           2     14  quk     ojx        11
   4 │ a           3     15  sfgdb   ufo        10
   5 │ b           4     16  upku    pgafaq      9
   6 │ C           5     17  dhmvk   yggc        8
   7 │ a           6     18  hfz     zpfun       7
   8 │ a           6     18  hfz     zpfun       7
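
One way to read the fixed output: a Vector cell is expanded, every other value (strings included) counts as a single element, and when several columns are unnested together a length-1 cell is recycled to the length of the other cell in the same row, rather than padded with missing. That seems consistent with the results above and with the remark that they match R, but the plain-Julia sketch below is only a hypothetical model: unnest_longer_sketch and cell_values are made-up names, not TidierData functions, and a NamedTuple of column vectors stands in for a DataFrame.

```julia
# Hypothetical model of multi-column unnest_longer (not TidierData's code).
# Only vectors count as list cells; strings and numbers are single elements.
cell_values(v::AbstractVector) = v
cell_values(v) = [v]

function unnest_longer_sketch(cols::NamedTuple, targets::Vector{Symbol})
    colnames = keys(cols)
    out = Dict(k => Any[] for k in colnames)
    for i in eachindex(first(cols))
        cells = Dict(t => cell_values(cols[t][i]) for t in targets)
        lens = [length(c) for c in values(cells)]
        n = maximum(lens)
        # Recycling rule: each cell must have length 1 or the row's maximum length.
        all(l -> l == 1 || l == n, lens) ||
            error("row $i: incompatible lengths $lens")
        for j in 1:n, k in colnames
            if k in targets
                c = cells[k]
                push!(out[k], length(c) == 1 ? c[1] : c[j])  # recycle length-1 cells
            else
                push!(out[k], cols[k][i])                    # repeat other columns
            end
        end
    end
    return NamedTuple{colnames}(Tuple(out[k] for k in colnames))
end

# The two interesting rows (1 and 6) of the example DataFrame.
example = (x = ["a", "a"],
           a = Any[["AAA", "BBB"], "hfz"],
           ab = Any[12, [7, 7]])

println(unnest_longer_sketch(example, [:a]))
println(unnest_longer_sketch(example, [:a, :ab]))
```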

Is this maybe a typo? Based on this commit, perhaps you meant to say “It is now fixed” there?