Why does this DataFrames.jl example use a semicolon at the beginning?

I was reading some examples (from https://bkamins.github.io) about how to use DataFrames.jl to select, filter, combine… and I have a question.

using DataFrames

df = DataFrame(
  id = 1:4,
  name = ["Aaron Aardvark", "Belen Barboza",  "Elżbieta Elbląg", "Felipe Fittipaldi"],
  age = [50, 45, 30, 25],
  eye = ["blue", "brown", "blue",  "brown"],
  grade_1 = [95, 90, 95, 90],
  grade_2 = [75, 90, 75, 95],
  grade_3 = [85, 85, 80, 85]
  )

select(df, :name => ByRow(x -> (; ([:firsname, :lastname] .=> split(x))...)))

What I don’t understand is this part:

(; ([:firsname, :lastname] .=> split(x))...)

Why are they using a semicolon (;) there?
Why do they need to splat (...) the output?

There is another thread about the semicolon

But I think it’s not the same. In that thread they use it to suppress printing the output.

Maybe in my example it’s related to “positional arguments” for functions, but I don’t understand how that applies here.

It’s because the anonymous function creates a named tuple. In Julia you can construct a named tuple programmatically from Pairs.

julia> (; ([:a, :b] .=> 5)...)
(a = 5, b = 5)

That’s a pretty opaque piece of code for sure.
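
To unpack it a little, here is a minimal sketch in plain Julia (no DataFrames needed). The variable names ps, nt, nt_literal, nt_pairs and t are made up for illustration. It shows that the splatted pairs after the semicolon give the same result as writing the fields out by hand, and that without the semicolon the same splat just builds an ordinary tuple of pairs.

```julia
# A vector of Symbol => value pairs, built by broadcasting as in the REPL line above.
ps = [:a, :b] .=> 5

# After the semicolon, each splatted pair becomes one name = value field.
nt = (; ps...)

# Two hand-written forms of the same named tuple.
nt_literal = (; a = 5, b = 5)
nt_pairs = (; :a => 5, :b => 5)

# Without the semicolon, the splat yields an ordinary tuple of Pair objects.
t = (ps...,)

println(nt)
println(t)
```

So the semicolon is what tells Julia "what follows are names and values", much like the semicolon that separates keyword arguments in a function call.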

Can I achieve the same result without creating a named tuple?

You could put them in separate commands and write things out more explicitly.

If you find this syntax messy, there’s another way to write it that may be clearer:

select(df, :name => ByRow(x -> NamedTuple([:firsname, :lastname] .=> split(x))))
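
Writing things out more explicitly could look something like the sketch below. The helper split_name and its field names firstname and lastname are hypothetical, not something from the blog post, and it assumes every name has at least two whitespace-separated parts.

```julia
# Hypothetical helper: split a full name into two named fields.
# Assumes the name contains at least two whitespace-separated parts.
function split_name(full_name)
    parts = split(full_name)    # e.g. ["Aaron", "Aardvark"]
    return (firstname = parts[1], lastname = parts[2])
end

nt = split_name("Aaron Aardvark")
println(nt.firstname)
println(nt.lastname)
```

A named function like this could then be passed as ByRow(split_name) in place of the anonymous function.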

There’s no requirement to use the select/combine/transform minilanguage for everything you do in DataFrames. I would probably just write this as:

x = split.(df.name);
df.first_name, df.last_name = first.(x), last.(x)

Why does my example need a splat

select(df, :name => ByRow(x -> (; ([:firsname, :lastname] .=> split(x))...)))

but yours doesn’t
select(df, :name => ByRow(x -> NamedTuple([:firsname, :lastname] .=> split(x))))
if both create a NamedTuple?

These are two different mechanisms that work differently. NamedTuple(itr) is a constructor call that takes a single argument, an iterator of pairs. Because it is a single argument, you must not splat it. The other way to construct a named tuple is the (; ...) syntax. This one takes a list of entries, where each pair is a separate “argument”. So to turn a single iterator into that list of arguments, you need the splat.
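
A small sketch of the difference, assuming a recent Julia 1.x release where both NamedTuple(itr) and the (; ps) shorthand are available. The variable names are invented for the example.

```julia
# An iterator (here a vector) of Symbol => value pairs.
ps = [:firstname, :lastname] .=> split("Belen Barboza")

# Constructor: one argument, the whole iterator of pairs.
nt_constructor = NamedTuple(ps)

# Literal syntax: the splat turns the iterator into separate pair "arguments".
nt_syntax = (; ps...)

# Both routes give the same named tuple.
println(nt_constructor == nt_syntax)

# Without the splat, (; ps) is shorthand for (; ps = ps):
# a single field named ps that holds the whole vector.
nt_unsplatted = (; ps)
println(keys(nt_unsplatted))
```

In other words, forgetting the splat in the (; ...) form does not raise an error when you pass a plain variable. You just get a different named tuple than the one you wanted.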
