The Splits column type differs: Any vs. Array{SubString{String},1}. Why? I just wanted to save memory, so I stored the value (which can be repeated) in emptyStringArray.
Even if I understand that I did something bad to the :Splits column, I don't understand why X1's type changed to Any in batches6. I would have thought that the columns are independent. Is something wrong with @mutate?
I don't know what Emptystringarray is. The reason for the Any is likely that there is no method promote_type(x::Vector{<:AbstractString}, y::Emptystringarray), so Julia defaults to promoting the vector to type Any.
The problem is that type inference sometimes breaks down if you reference a global variable in a closure, and Query.jl currently depends on type inference not breaking down.
Two ways to fix this at the moment:
You can declare emptyStringArray to be const, i.e. const emptyStringArray = ...
You can put emptyStringArray and the query into a function
The proper solution to this is to drop the dependency on type inference in Query.jl. It has been on my todo list for about 2-3 years now. There is no fundamental reason that this could not be done, but it is a bit of a pain to implement. At some point I'll push myself to do it, but no promises.
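The two workarounds above can be illustrated without Query.jl at all. The following is a minimal sketch using only Base; the names empty_global, EMPTY_CONST, split_or_global, split_or_const and split_or_local are hypothetical and not taken from the original code. It asks the compiler which return type it infers when a function reads a non-const global, a const global, or a local variable.

```julia
# Hypothetical names for illustration; only Base is used, no Query.jl.
empty_global = SubString{String}[]        # non-const global: could be rebound to any type later
const EMPTY_CONST = SubString{String}[]   # const global: the compiler can rely on its type

# Identical logic, differing only in which global is referenced.
split_or_global(s) = length(s) > 0 ? split(s, ',') : empty_global
split_or_const(s)  = length(s) > 0 ? split(s, ',') : EMPTY_CONST

# Second workaround: keep the value inside a function so no global is involved.
function split_or_local(s)
    empty_local = SubString{String}[]
    return length(s) > 0 ? split(s, ',') : empty_local
end

# Ask the compiler what it infers for a String argument.
rt_global = Base.return_types(split_or_global, (String,))[1]
rt_const  = Base.return_types(split_or_const, (String,))[1]
rt_local  = Base.return_types(split_or_local, (String,))[1]

println("non-const global: ", rt_global)
println("const global:     ", rt_const)
println("local variable:   ", rt_local)
```

On current Julia versions the non-const variant should be inferred as Any, while the const and local variants should be inferred as a concrete vector of SubString{String}. That difference is what a library relying on inference, as Query.jl is described to do here, ends up seeing.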
So the reason the Splits column here turns into Any is that the expression length(_.Splits) > 0 ? split(_.Splits, ',') : 1. can return either an array of SubString, or a Float64 value. So we need to make the column type one that can handle both of these cases, which is Any. We call this type of situation a type instability, because that expression returns a value of a different type depending on the values of the inputs.
That the X1 column then also turns into Any is, at the end of the day, a bug in Query.jl that is just cumbersome for me to fix. But that is no good excuse, of course. That bug is triggered by the type instability in the other column.
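To make the type instability concrete, here is a small sketch in plain Julia, again without Query.jl. The function names unstable and stable and the sample inputs are made up for illustration. It evaluates the same kind of expression over a vector of strings and compares the resulting element types. Returning an empty array of SubString{String} in the fallback branch is one assumed way to make both branches agree; it is not necessarily what the original poster needs.

```julia
# Hypothetical helpers; the first mirrors the expression discussed above.
unstable(s) = length(s) > 0 ? split(s, ',') : 1.                   # array or Float64
stable(s)   = length(s) > 0 ? split(s, ',') : SubString{String}[]  # always an array

inputs = ["a,b", ""]   # made-up sample values

# The return type of `unstable` depends on the value of the input.
println(typeof(unstable("a,b")))
println(typeof(unstable("")))

# Collecting the results forces one element type that fits every value.
unstable_col = map(unstable, inputs)
stable_col   = map(stable, inputs)

println(eltype(unstable_col))
println(eltype(stable_col))
```

Because the only common supertype of an array of SubString and a Float64 is Any, the unstable version yields a container with element type Any, whereas the stable version keeps a concrete element type.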