Why does this DataFrames.jl example use a semicolon at the beginning?

Juan · November 9, 2021, 8:21pm

I was reading some examples (from https://bkamins.github.io) about how to use DataFrames.jl to select, filter, combine… and I have a question.

using DataFrames

df = DataFrame(
  id = 1:4,
  name = ["Aaron Aardvark", "Belen Barboza",  "Elżbieta Elbląg", "Felipe Fittipaldi"],
  age = [50, 45, 30, 25],
  eye = ["blue", "brown", "blue",  "brown"],
  grade_1 = [95, 90, 95, 90],
  grade_2 = [75, 90, 75, 95],
  grade_3 = [85, 85, 80, 85]
  )

select(df, :name => ByRow(x -> (; ([:firstname, :lastname] .=> split(x))...)))

What I don’t understand is this part:

(; ([:firstname, :lastname] .=> split(x))...)

Why are they using a semicolon ; there?
Why do they need to splat (...) the output?

There is another thread about the semicolon, but I think it's not the same: in that thread it's used to suppress printing the output.

Maybe in my example it's related to "positional arguments" for functions, but I don't understand how that applies here.

pdeffebach · November 9, 2021, 8:57pm

It's because the anonymous function creates a named tuple. You can construct a named tuple programmatically in Julia by splatting a collection of Pairs into the (; ...) syntax:

julia> (; ([:a, :b] .=> 5)...)
(a = 5, b = 5)
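To see what the row function does, here is a minimal sketch that applies it step by step to one name from the example, using the field names `firstname` and `lastname` (Base Julia only; the variable names `parts`, `ps`, `nt` and `t` are illustrative). It also suggests why the semicolon matters: with `;` the splatted pairs become named-tuple fields, while without it the same splat just builds an ordinary tuple of pairs.

```julia
# One value from the `name` column of the example data frame.
x = "Aaron Aardvark"

parts = split(x)                        # Vector{SubString{String}}: ["Aaron", "Aardvark"]
ps = [:firstname, :lastname] .=> parts  # vector of pairs: [:firstname => "Aaron", :lastname => "Aardvark"]

# With the leading semicolon, each splatted pair becomes field name => value.
nt = (; ps...)
println(nt == (firstname = "Aaron", lastname = "Aardvark"))  # true

# Without the semicolon, splatting just builds a plain tuple of two Pair objects.
t = (ps...,)
println(t isa Tuple{Vararg{Pair}})      # true
```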

That’s a pretty opaque piece of code for sure.

Juan · November 10, 2021, 1:06am

Can I achieve the same result without creating a named tuple?

pdeffebach · November 10, 2021, 1:10am

You could split it into separate commands and write things out more explicitly.
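As a minimal sketch of what writing it out more explicitly could look like, here is a hypothetical helper `split_name` (not part of DataFrames.jl) that does the same row-wise work with the field names spelled out literally and an explicit check on the number of words. It still returns a named tuple, just without building it from pairs.

```julia
# Hypothetical helper: split a full name into first and last name,
# spelling out the field names instead of generating them from pairs.
function split_name(x::AbstractString)
    parts = split(x)  # split on whitespace
    if length(parts) != 2
        throw(ArgumentError("expected exactly two words, got $(repr(x))"))
    end
    return (firstname = parts[1], lastname = parts[2])
end

split_name("Belen Barboza")  # (firstname = "Belen", lastname = "Barboza")
```

It could then be passed as `ByRow(split_name)` instead of the anonymous function. The length check is a deliberate choice: in the broadcast version, a three-word name makes `[:firstname, :lastname] .=> split(x)` throw a `DimensionMismatch`, and a one-word name is silently broadcast into both fields.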

sijo · November 10, 2021, 7:39am

If you find this syntax messy, there's another way to write it that may be clearer:

select(df, :name => ByRow(x -> NamedTuple([:firstname, :lastname] .=> split(x))))
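A minimal check in Base Julia (assuming a recent Julia version that has the `NamedTuple(itr)` constructor) that both spellings build the same value for one row; `ps`, `nt1` and `nt2` are illustrative names:

```julia
ps = [:firstname, :lastname] .=> split("Felipe Fittipaldi")

nt1 = NamedTuple(ps)  # constructor: the whole vector of pairs is a single argument
nt2 = (; ps...)       # literal syntax: the pairs are splatted in one by one

println(nt1 == nt2)   # true
println(keys(nt1))    # (:firstname, :lastname)
```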

nilshg · November 10, 2021, 8:47am

There’s no requirement to use the select/combine/transform minilanguage for everything you do in DataFrames. I would probably just write this as:

x = split.(df.name);                               # one vector of words per name
df.first_name, df.last_name = first.(x), last.(x)  # add two new columns in place

Juan · November 10, 2021, 6:09pm

Why does my example need a splat

select(df, :name => ByRow(x -> (; ([:firstname, :lastname] .=> split(x))...)))

but yours doesn't

select(df, :name => ByRow(x -> NamedTuple([:firstname, :lastname] .=> split(x))))

if both create a NamedTuple?

sijo · November 10, 2021, 6:39pm

These are just two different constructs that work differently. NamedTuple(itr) is a function (a constructor) that takes a single argument: an iterator of pairs. Because it's a single argument, you must not splat it. The other way to construct a named tuple is the (; ...) syntax. This one takes a list of pairs, where each pair is a separate "argument", much like a keyword argument. So to turn a single iterator into a list of arguments, you need the splat.
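The difference can be sketched with two hypothetical helpers that mirror the two constructs: `one_arg` takes a single iterable argument (like `NamedTuple(itr)`), and `from_kwargs` collects keyword arguments (like the `(; ...)` syntax). The sketch also shows what happens when the splat is used the wrong way round; `(; ps)` relies on the implicit-name shorthand of recent Julia versions.

```julia
ps = [:firstname => "Belen", :lastname => "Barboza"]

# Hypothetical helpers mirroring the two constructs.
one_arg(itr) = NamedTuple(itr)         # one positional argument: an iterator of pairs
from_kwargs(; kw...) = NamedTuple(kw)  # any number of keyword arguments

a = one_arg(ps)           # no splat: the vector itself is the single argument
b = from_kwargs(; ps...)  # splat: each pair becomes its own keyword argument
println(a == b)           # true

# Splat used the wrong way round:
c = (; ps)                # no splat: shorthand for (ps = ps,), a single field named `ps`
println(keys(c))          # (:ps,)

try
    one_arg(ps...)        # splat: two positional arguments, but one_arg takes one
catch err
    println(err isa MethodError)  # true
end
```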
