julia> df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = [join(rand('a':'z',4)) for _ in 1:6], ab = 12:-1:7)
6×5 DataFrame
 Row │ x       y      yz     a       ab
     │ String  Int64  Int64  String  Int64
─────┼────────────────────────────────────
   1 │ a           1     13  coes       12
   2 │ b           2     14  nwoz       11
   3 │ a           3     15  gber       10
   4 │ b           4     16  ompu        9
   5 │ C           5     17  ktgq        8
   6 │ a           6     18  edkt        7
julia> nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
 Row │ x       n3             n2
     │ String  DataFrame      DataFrame
─────┼─────────────────────────────────────
   1 │ a       3×2 DataFrame  3×2 DataFrame
   2 │ b       2×2 DataFrame  2×2 DataFrame
   3 │ C       1×2 DataFrame  1×2 DataFrame
julia> @chain nested_df begin
           @unnest_wider(n3:n2, names_sep = nothing)
           @unnest_longer(y:ab)
       end
9×5 DataFrame
 Row │ x       y        yz       a     ab
     │ String  Int64?   Int64?   Any   Int64?
─────┼────────────────────────────────────────
   1 │ a             1       13  coes       12
   2 │ a             3       15  gber       10
   3 │ a             6       18  edkt        7
   4 │ b             2       14  nwoz       11
   5 │ b             4       16  ompu        9
   6 │ C             5       17  k           8
   7 │ C       missing  missing  t     missing
   8 │ C       missing  missing  g     missing
   9 │ C       missing  missing  q     missing
That looks unfortunate; you probably want to open an issue. If you use CategoricalArrays, Symbols, or something similar, you get an error that there is no iterate method for the value. A String goes through without an error because it is iterable.
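To make the point about iteration concrete, here is a minimal base-Julia sketch (no TidierData involved, and not the package's internal code). It only shows which kinds of values have an iterate method, which would be consistent with the single row for "C" being split into the characters k, t, g, q above.

```julia
# Base Julia only; no packages required.

# A String is iterable character by character.
chars = collect("ktgq")                           # four Char elements: k, t, g, q

# `applicable` reports whether a method exists for the given arguments.
string_iterable = applicable(iterate, "ktgq")     # true
symbol_iterable = applicable(iterate, :ktgq)      # false: a Symbol has no iterate method
number_iterable = applicable(iterate, 8)          # true: a number iterates as one element

println((string_iterable, symbol_iterable, number_iterable))
```

So code that blindly iterates over each cell would split strings, pass numbers through, and throw on Symbols.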
I don't know how to open an issue. I just wanted to point out something that apparently (but I may be wrong) isn't consistent with the rest of the examples in the manual chapter on the @unnest_xyz macros.
If you also think it is relevant, please open the issue yourself.
That is really not difficult and worth learning. Just go to GitHub · Where software is built and click the "New Issue" button.
Then describe the issue you encountered, preferably including an example.
Not really. An issue should contain:
- the code you executed
- the output of executing that code
- the expected output
and usually also the output of the command:
versioninfo()
Sometimes a link to discourse can be useful, but an issue should not only contain the link to a discussion on discourse.
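As a rough sketch of the checklist above, the pieces can be gathered into one block of text to paste into the issue. The helper name issue_report is made up for illustration; versioninfo comes from the InteractiveUtils standard library, and the example strings are taken from the outputs discussed in this thread.

```julia
using InteractiveUtils  # provides versioninfo(); the REPL loads it automatically

# Hypothetical helper: assemble code, actual output, expected output and
# the versioninfo() text into a single report string.
function issue_report(code::AbstractString, actual::AbstractString, expected::AbstractString)
    return join([
        "Code executed:", code, "",
        "Actual output:", actual, "",
        "Expected output:", expected, "",
        "versioninfo():", sprint(versioninfo),
    ], "\n")
end

report = issue_report(
    "@unnest_longer(df4, a)",
    "22×6 DataFrame (strings split into characters)",
    "7×6 DataFrame (strings kept whole)",
)
print(report)
```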
Thank you for pointing this out. This is not intended.
It seemed to be an issue when unnesting wider before longer… which was not intended.
I have now fixed it in the "unnest_wider_edgecase" branch.
julia> nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
 Row │ x       n3             n2
     │ String  DataFrame      DataFrame
─────┼─────────────────────────────────────
   1 │ a       3×2 DataFrame  3×2 DataFrame
   2 │ b       2×2 DataFrame  2×2 DataFrame
   3 │ C       1×2 DataFrame  1×2 DataFrame
julia> @chain nested_df begin
           @unnest_wider(n3:n2, names_sep = nothing)
           @unnest_longer(y:ab)
       end
6×5 DataFrame
 Row │ x       y      yz     a       ab
     │ String  Int64  Int64  String  Int64
─────┼────────────────────────────────────
   1 │ a           1     13  yzyb       12
   2 │ a           3     15  tijm       10
   3 │ a           6     18  dxcd        7
   4 │ b           2     14  ijkj       11
   5 │ b           4     16  gavn        9
   6 │ C           5     17  zvtt        8
julia> @chain nested_df begin
           @unnest_longer(n3:n2)
           @unnest_wider(n3:n2, names_sep = nothing)
       end
6×5 DataFrame
 Row │ x       yz     y      a       ab
     │ String  Int64  Int64  String  Int64
─────┼────────────────────────────────────
   1 │ a          13      1  yzyb       12
   2 │ a          15      3  tijm       10
   3 │ a          18      6  dxcd        7
   4 │ b          14      2  ijkj       11
   5 │ b          16      4  gavn        9
   6 │ C          17      5  zvtt        8
Is it possible to update to the fixed version?
How?
You can add the specific branch by doing
(@v1.11) pkg> add TidierData#unnest_wider_edgecase
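If you prefer a script over the Pkg REPL prompt, the same thing can be written with the functional Pkg API. This is only a sketch: install_branch is a made-up wrapper name, and the call is left commented out because it needs network access and changes the active environment.

```julia
using Pkg

# Hypothetical wrapper; Pkg.add(name = ..., rev = ...) is the functional
# counterpart of `pkg> add Name#rev`.
install_branch(pkg::AbstractString, branch::AbstractString) =
    Pkg.add(name = pkg, rev = branch)

# Uncomment to install the branch mentioned above:
# install_branch("TidierData", "unnest_wider_edgecase")
```

Once the fix lands in a registered release, `pkg> free TidierData` should switch the package back to tracking registry versions.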
I updated following the instructions.
I don't want to overwhelm your attention (I also confess that none of the following cases are of practical interest to me, it's just pure curiosity), and if you don't find it useful to dwell on these observations, I'd understand.
Now suppose we have the following DataFrame.
The results of the following flattenings are clear to me for the first case but not for the last two.
julia> df4
6×6 DataFrame
 Row │ x       y      yz     a               b       ab
     │ String  Int64  Int64  Any             String  Any
─────┼─────────────────────────────────────────────────────
   1 │ a           1     13  ["AAA", "BBB"]  ttyu    12
   2 │ b           2     14  quk             ojx     11
   3 │ a           3     15  sfgdb           ufo     10
   4 │ b           4     16  upku            pgafaq  9
   5 │ C           5     17  dhmvk           yggc    8
   6 │ a           6     18  hfz             zpfun   [7, 7]
julia> @unnest_longer( df4,[ab])
7×6 DataFrame
 Row │ x       y      yz     a               b       ab
     │ String  Int64  Int64  Any             String  Int64
─────┼────────────────────────────────────────────────────
   1 │ a           1     13  ["AAA", "BBB"]  ttyu       12
   2 │ b           2     14  quk             ojx        11
   3 │ a           3     15  sfgdb           ufo        10
   4 │ b           4     16  upku            pgafaq      9
   5 │ C           5     17  dhmvk           yggc        8
   6 │ a           6     18  hfz             zpfun       7
   7 │ a           6     18  hfz             zpfun       7
julia> @unnest_longer( df4,a)
22×6 DataFrame
 Row │ x       y      yz     a    b       ab
     │ String  Int64  Int64  Any  String  Any
─────┼──────────────────────────────────────────
   1 │ a           1     13  AAA  ttyu    12
   2 │ a           1     13  BBB  ttyu    12
   3 │ b           2     14  q    ojx     11
   4 │ b           2     14  u    ojx     11
   5 │ b           2     14  k    ojx     11
   6 │ a           3     15  s    ufo     10
   7 │ a           3     15  f    ufo     10
   8 │ a           3     15  g    ufo     10
   9 │ a           3     15  d    ufo     10
  10 │ a           3     15  b    ufo     10
  11 │ b           4     16  u    pgafaq  9
  12 │ b           4     16  p    pgafaq  9
  13 │ b           4     16  k    pgafaq  9
  14 │ b           4     16  u    pgafaq  9
  15 │ C           5     17  d    yggc    8
  16 │ C           5     17  h    yggc    8
  17 │ C           5     17  m    yggc    8
  18 │ C           5     17  v    yggc    8
  19 │ C           5     17  k    yggc    8
  20 │ a           6     18  h    zpfun   [7, 7]
  21 │ a           6     18  f    zpfun   [7, 7]
  22 │ a           6     18  z    zpfun   [7, 7]
julia> @unnest_longer( df4,a,ab)
22×6 DataFrame
 Row │ x       y      yz     a    b       ab
     │ String  Int64  Int64  Any  String  Int64?
─────┼───────────────────────────────────────────
   1 │ a           1     13  AAA  ttyu         12
   2 │ a           1     13  BBB  ttyu    missing
   3 │ b           2     14  q    ojx          11
   4 │ b           2     14  u    ojx     missing
   5 │ b           2     14  k    ojx     missing
   6 │ a           3     15  s    ufo          10
   7 │ a           3     15  f    ufo     missing
   8 │ a           3     15  g    ufo     missing
   9 │ a           3     15  d    ufo     missing
  10 │ a           3     15  b    ufo     missing
  11 │ b           4     16  u    pgafaq        9
  12 │ b           4     16  p    pgafaq  missing
  13 │ b           4     16  k    pgafaq  missing
  14 │ b           4     16  u    pgafaq  missing
  15 │ C           5     17  d    yggc          8
  16 │ C           5     17  h    yggc    missing
  17 │ C           5     17  m    yggc    missing
  18 │ C           5     17  v    yggc    missing
  19 │ C           5     17  k    yggc    missing
  20 │ a           6     18  h    zpfun         7
  21 │ a           6     18  f    zpfun         7
  22 │ a           6     18  z    zpfun   missing
This was an edge case that slipped by me in testing, actually. Thank you for pointing it out. It is now fixed in that same branch if you re-download it. I have also compared it to the R implementation to confirm they match.
julia> df4 = DataFrame(
           x = ["a", "b", "a", "b", "C", "a"],
           y = [1, 2, 3, 4, 5, 6],
           yz = [13, 14, 15, 16, 17, 18],
           a = [["AAA", "BBB"], "quk", "sfgdb", "upku", "dhmvk", "hfz"],
           b = ["ttyu", "ojx", "ufo", "pgafaq", "yggc", "zpfun"],
           ab = [12, 11, 10, 9, 8, [7, 7]]
       );
julia> @unnest_longer( df4,a)
7×6 DataFrame
 Row │ x       y      yz     a       b       ab
     │ String  Int64  Int64  String  String  Any
─────┼─────────────────────────────────────────────
   1 │ a           1     13  AAA     ttyu    12
   2 │ a           1     13  BBB     ttyu    12
   3 │ b           2     14  quk     ojx     11
   4 │ a           3     15  sfgdb   ufo     10
   5 │ b           4     16  upku    pgafaq  9
   6 │ C           5     17  dhmvk   yggc    8
   7 │ a           6     18  hfz     zpfun   [7, 7]
julia> @unnest_longer(df4, a, ab)
8×6 DataFrame
 Row │ x       y      yz     a       b       ab
     │ String  Int64  Int64  String  String  Int64
─────┼────────────────────────────────────────────
   1 │ a           1     13  AAA     ttyu       12
   2 │ a           1     13  BBB     ttyu       12
   3 │ b           2     14  quk     ojx        11
   4 │ a           3     15  sfgdb   ufo        10
   5 │ b           4     16  upku    pgafaq      9
   6 │ C           5     17  dhmvk   yggc        8
   7 │ a           6     18  hfz     zpfun       7
   8 │ a           6     18  hfz     zpfun       7
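The two outputs above are consistent with a simple rule: a cell that is not a vector (a String included) counts as one element, and when several columns are unnested together, one-element cells are repeated to match the longer cell in the same row. Below is a base-Julia sketch of that rule. The names as_vector and expand_cells are made up for illustration; this is not TidierData's actual implementation.

```julia
# Illustrative helpers, not TidierData's implementation.

# Treat anything that is not a vector (including a String) as a single element.
as_vector(x) = x isa AbstractVector ? x : [x]

# Expand one row's cells from several columns together,
# repeating one-element cells to match the longest cell.
function expand_cells(cells...)
    vecs = map(as_vector, cells)
    n = maximum(length, vecs)
    all(v -> length(v) in (1, n), vecs) || error("incompatible lengths")
    return [Tuple(length(v) == 1 ? v[1] : v[i] for v in vecs) for i in 1:n]
end

row1 = expand_cells(["AAA", "BBB"], 12)   # [("AAA", 12), ("BBB", 12)]
row6 = expand_cells("hfz", [7, 7])        # [("hfz", 7), ("hfz", 7)]

# Columns a and ab from df4 above, expanded row by row.
a  = [["AAA", "BBB"], "quk", "sfgdb", "upku", "dhmvk", "hfz"]
ab = [12, 11, 10, 9, 8, [7, 7]]
expanded = reduce(vcat, [expand_cells(x, y) for (x, y) in zip(a, ab)])
println(length(expanded))                 # 8, matching the 8×6 result
```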
This is a typo maybe? Based on this commit, perhaps you meant to say "It is now fixed" there?