Hi all,
I am pretty new to DataFrames (but loving it so far). However, I could not find an obvious way in the docs to achieve the following transformation.
Suppose I have the data frame df below:
julia> using DataFrames
julia> df = DataFrame(A=[1,2,3,4], B=["a; b", missing, "b", "a; c"])
4×2 DataFrame
 Row │ A      B
     │ Int64  String?
─────┼────────────────
   1 │     1  a; b
   2 │     2  missing
   3 │     3  b
   4 │     4  a; c
and I want to convert it into:
julia> df2 = DataFrame(A=[1,1,2,3,4,4], B=["a", "b", missing, "b", "a","c"])
6×2 DataFrame
 Row │ A      B
     │ Int64  String?
─────┼────────────────
   1 │     1  a
   2 │     1  b
   3 │     2  missing
   4 │     3  b
   5 │     4  a
   6 │     4  c
What is the best way to achieve this? I tried:
transform(
    df,
    :B =>
        ByRow(
            x ->
                ismissing(x) ? missing :
                string.(split(x, ";")),
        ) => :C,
)
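But this only adds a new column of vectors:
 Row │ A      B        C
     │ Int64  String?  Array…?
─────┼─────────────────────────────
   1 │     1  a; b     ["a", " b"]
   2 │     2  missing  missing
   3 │     3  b        ["b"]
   4 │     4  a; c     ["a", " c"]
This is not quite right: C holds one vector per row instead of one row per element, and the split pieces keep their leading spaces. Does anybody have any ideas?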
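One direction that might work, as a minimal sketch rather than a confirmed best practice: split and strip each entry as above, then expand the resulting vectors into rows with `flatten` from DataFrames.jl. The helper name `split_or_missing` is made up for this illustration, and wrapping `missing` in a one-element vector is an assumption made here so that the row with the missing value is kept as a single row.

```julia
using DataFrames

df = DataFrame(A=[1,2,3,4], B=["a; b", missing, "b", "a; c"])

# Hypothetical helper (not part of DataFrames.jl):
# split on ";" and strip surrounding whitespace from each piece.
# A missing entry becomes a one-element vector so that it still
# produces exactly one row after flattening.
split_or_missing(x) =
    ismissing(x) ? Union{Missing,String}[missing] :
    String.(strip.(split(x, ";")))

# Overwrite B with the vectors, then expand each vector element into its own row.
df2 = flatten(transform(df, :B => ByRow(split_or_missing) => :B), :B)
```

With the example data, `df2.A` should come out as `[1, 1, 2, 3, 4, 4]` and `df2.B` as `["a", "b", missing, "b", "a", "c"]`, which matches the target table above.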