So, I am using the DataFrames package and have this dataset called salestrain that has a column called date. Now date is a column of strings representing the date of the sales, each written like "12.10.2014"; but I'm only interested in the month, in this particular example: 10.
So I create a grouped dataframe:
gsalestrain = groupby(salestrain, :item_id)
And now I try to add my desired month column:
salestrainmonth = transform(gsalestrain, :date => x -> split.(x, '.')[2])
But that gives me an error:
ERROR: LoadError: BoundsError: attempt to access 1-element Array{Array{SubString{String},1},1} at index [2]
Now, if I create an auxiliary function:
getmonth(x) = split(x,'.')[2]
and try:
salestrainmonth = transform(gsalestrain, :date => x -> getmonth.(x))
it works fine.
I imagine that's because the 'dot notation' is also applied to the index selection, right? My question is: is it really necessary to create an auxiliary function, or is there a way to make it work with an inline function?
The issue here is unrelated to transform or DataFrames:
julia> x = ["1.1.2000", "1.2.2000"]
2-element Array{String,1}:
"1.1.2000"
"1.2.2000"
julia> split.(x, ".")
2-element Array{Array{SubString{String},1},1}:
["1", "1", "2000"]
["1", "2", "2000"]
julia> split.(x, ".")[2]
3-element Array{SubString{String},1}:
"1"
"2"
"2000"
julia> getmonth(x) = split(x, '.')[2]
getmonth (generic function with 1 method)
julia> getmonth.(x)
2-element Array{SubString{String},1}:
"1"
"2"
So the getmonth function splits its single argument and then takes the second element of the result, while split.(x, '.') splits every element in x and therefore returns an array of arrays, each containing the split results for one element. In getmonth you are indexing into the split result, while in split.(x, '.')[2] you're taking the second element of an array of arrays, which is itself an array (the result of splitting the second element of x, so ["1", "2", "2000"] in my example). This also explains your BoundsError: for a group with only one row, split.(x, '.') is a 1-element array, so there is no index 2.
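To connect this back to the error in the question, here is a minimal sketch using only Base Julia. It assumes a group that contains a single row; the `dates` vector is a made-up stand-in for that group's date column, not data from the original post.

```julia
# Hypothetical stand-in for the date column of a group with one row
dates = ["12.10.2014"]

# Broadcasting split gives one inner vector per input string
parts = split.(dates, '.')

# Indexing the outer vector at 2 fails, because it has only one element
err = try
    parts[2]
catch e
    e
end

println(length(parts))        # number of inner vectors
println(err isa BoundsError)  # whether the outer indexing failed as in the question

# Indexing the inner vector instead gives the month
month = parts[1][2]
println(month)
```

The outer index selects a row, the inner index selects a piece of the split date, and only the inner one is what the question is after.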
What you therefore want is to also broadcast the selection of the second element:
julia> getindex.(split.(x, '.'), 2)
2-element Array{SubString{String},1}:
"1"
"2"
Here, I'm using getindex, which is the function that [] brackets actually call under the hood, and telling it to get the second element of each element in split.(x, '.').
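To answer the "inline function" part of the question directly, here is a short sketch of three anonymous functions that should all do the same job without a named helper. The names `f1`, `f2`, and `f3` are just labels for this illustration, and the example uses only Base Julia so it can be run without DataFrames.

```julia
x = ["1.1.2000", "1.2.2000"]

# Broadcast both the split and the indexing
f1 = v -> getindex.(split.(v, '.'), 2)

# Broadcast an anonymous per-element function (an inline version of getmonth)
f2 = v -> (s -> split(s, '.')[2]).(v)

# Use a comprehension instead of dot syntax
f3 = v -> [split(s, '.')[2] for s in v]

# All three should agree on the month strings
println(f1(x) == f2(x) == f3(x))

# If integer months are wanted rather than strings, parse can be broadcast too
months = parse.(Int, f1(x))
println(months)
```

Any of these could presumably be passed in place of the helper in the transform call from the question, for example `:date => f1`. DataFrames also provides `ByRow`, which wraps a per-element function such as `s -> split(s, '.')[2]` so it is applied to each row; whether that reads better is a matter of taste.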