Inline function on transform

So, I am using the DataFrames package and have a dataset called salestrain with a column called date. date is a column of strings holding the date of each sale, written like "12.10.2014", but I’m only interested in the month, which in this example is 10.

So I create a grouped dataframe:
gsalestrain = groupby(salestrain, :item_id)

And now I try to add my desired month column:
salestrainmonth = transform(gsalestrain, :date => x -> split.(x, '.')[2])

But that gives me an error:
ERROR: LoadError: BoundsError: attempt to access 1-element Array{Array{SubString{String},1},1} at index [2]

Now, if I create an auxiliary function:
getmonth(x) = split(x,'.')[2]

and try:

salestrainmonth = transform(gsalestrain, :date => x -> getmonth.(x))

it works fine.

I imagine that’s because the ‘dot notation’ also applies to the index selection, right? My question is: is it really necessary to create an auxiliary function, or is there a way to make it work with an inline function?


The issue here is unrelated to transform or DataFrames:

julia> x = ["1.1.2000", "1.2.2000"]
2-element Array{String,1}:
 "1.1.2000"
 "1.2.2000"

julia> split.(x, ".")
2-element Array{Array{SubString{String},1},1}:
 ["1", "1", "2000"]
 ["1", "2", "2000"]

julia> split.(x, ".")[2]
3-element Array{SubString{String},1}:
 "1"
 "2"
 "2000"

julia> getmonth(x) = split(x, '.')[2]
getmonth (generic function with 1 method)

julia> getmonth.(x)
2-element Array{SubString{String},1}:
 "1"
 "2"

So the getmonth function splits its single argument and then takes the second element of the result, while split.(x, '.') splits every element of x and therefore returns an array of arrays, each holding the split result for one string. In getmonth you are indexing into the split result, whereas in split.(x, '.')[2] you are taking the second element of the outer array of arrays, which is itself an array (the result of splitting the second element of x, so ["1", "2", "2000"] in my example). This also explains your BoundsError: the message shows that split.(x, '.') had only one element, which is what you get for a group containing a single row, so there is no index 2 to take.
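To see where the BoundsError in the question comes from, here is a minimal Base Julia sketch that mimics a single-row group. The one-element vector x is a hypothetical stand-in for such a group's date column, not data from the original post.

```julia
# Hypothetical date column of a group that contains only one sale.
x = ["12.10.2014"]

# Broadcasting split gives a vector of vectors; here it has a single
# element, the vector ["12", "10", "2014"].
parts = split.(x, '.')

# split.(x, '.')[2] indexes the *outer* vector, which has no second
# element, so it throws the same kind of error as in the question.
err = try
    parts[2]
    nothing
catch e
    e
end
println(typeof(err))   # prints BoundsError

# The month lives in the *inner* vector, at index 2.
println(parts[1][2])   # prints 10
```

For a group with two or more rows the same expression would not throw, but it would silently return the whole split of the second date instead of a column of months.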

What you therefore want is to broadcast the selection of the second element as well:

julia> getindex.(split.(x, '.'), 2)
2-element Array{SubString{String},1}:
 "1"
 "2"

Here I’m using getindex, which is the function that [] indexing calls under the hood, and broadcasting it so that it takes index 2 of each element of split.(x, '.').
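So no auxiliary function is strictly needed. Below is a sketch of a few inline forms that all extract the month, using a made-up two-element column x. The last variant is an alternative via the Dates standard library; it assumes every string is a valid day.month.year date and returns integers rather than strings.

```julia
using Dates  # standard library; only needed for variant 4

x = ["12.10.2014", "03.01.2015"]  # hypothetical sample column

# 1. Broadcast the indexing as well (getindex is what [] calls).
m1 = getindex.(split.(x, '.'), 2)

# 2. Broadcast an anonymous function that splits *one* string and
#    indexes it; this is the inline equivalent of getmonth.(x).
m2 = (s -> split(s, '.')[2]).(x)

# 3. A comprehension over the column.
m3 = [split(s, '.')[2] for s in x]

# 4. Parse each string as a Date and take its month as an Int
#    (assumes the dd.mm.yyyy layout holds for every entry).
fmt = dateformat"dd.mm.yyyy"
m4 = [month(Date(s, fmt)) for s in x]

println(m1)
println(m4)
```

Any of the first three bodies can replace getmonth in the original call, for example transform(gsalestrain, :date => x -> getindex.(split.(x, '.'), 2)). Whether you want the month as a string or as an Int depends on what you do with the column next.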