Convert collection (Array, DataFrame, ...) to concrete eltype

Suppose I have a collection, e.g. a DataFrame, whose eltype is Any even though all elements have the same concrete type:

df = DataFrame(a=Any[1, 2, 3])

For further processing I need to make it type-stable, but I don’t see how to do that. Is there an obvious way I’m missing here?

Such a situation occurs when reading a “dirty” dataset with all kinds of wrong values, and cleaning it afterwards.

Perhaps not the most efficient, but:

julia> df = DataFrame(a=Any[1, 2, 3], b=Any[1., 2, 3])
3×2 DataFrame
│ Row │ a   │ b   │
│     │ Any │ Any │
├─────┼─────┼─────┤
│ 1   │ 1   │ 1.0 │
│ 2   │ 2   │ 2   │
│ 3   │ 3   │ 3   │

julia> for n in names(df)
           df[!,n] = [x for x in df[!,n]]
       end

julia> df
3×2 DataFrame
│ Row │ a     │ b    │
│     │ Int64 │ Real │
├─────┼───────┼──────┤
│ 1   │ 1     │ 1.0  │
│ 2   │ 2     │ 2    │
│ 3   │ 3     │ 3    │

I might be misunderstanding, but will this do?

julia> using DataFrames

julia> df = DataFrame(a=Any[1, 2, 3])

julia> df.a = Int64.(df.a)

julia> df
3×1 DataFrame
│ Row │ a     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 2     │
│ 3   │ 3     │

or maybe

eltype(df.a[1]).(df.a)

if you want it to be more generic (and can rely on that first value…)

I find comprehensions tend to solve this automatically for me

julia> using DataFrames

julia> df = DataFrame(a=Any[1, 2, 3])
3×1 DataFrame
│ Row │ a   │
│     │ Any │
├─────┼─────┤
│ 1   │ 1   │
│ 2   │ 2   │
│ 3   │ 3   │

julia> [aa for aa in df.a]
3-element Array{Int64,1}:
 1
 2
 3
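For what it's worth, here is a minimal sketch of why this works, using plain vectors rather than a DataFrame so it runs without any packages. A comprehension computes the result eltype from the values it actually sees, not from the declared eltype of the input. As far as I know, broadcasting `identity` narrows the same way; that alternative is my addition and was not suggested above.

```julia
# A vector declared as Any, although every element is an Int.
v = Any[1, 2, 3]

# The comprehension derives the eltype from the actual values.
narrowed = [x for x in v]

# Broadcasting identity is another common idiom with the same effect.
narrowed_bc = identity.(v)

# With mixed numeric values the result is the closest common
# supertype, which matches the Real column shown earlier.
mixed = [x for x in Any[1.0, 2, 3]]

println(eltype(narrowed))     # Int64 on a 64-bit machine
println(eltype(narrowed_bc))
println(eltype(mixed))
```

Note that the mixed case ends up as Real, which is still abstract, so narrowing only gives a concrete eltype when the values really do share one.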

Thanks for the suggestions! For now, comprehensions seem like the best easy choice:

for n in names(df)
    df[!,n] = [x for x in df[!,n]]
end

Explicitly using the type of the first element, as in typeof(df.a[1]).(df.a), is definitely less general (note typeof rather than the suggested eltype, so that it also works when the elements are themselves arrays). For example, it doesn’t work for Union{..., Nothing} columns, which are pretty common, or for other small unions, which comprehensions handle well.
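A small sketch of the difference, again on a plain vector and assuming a column that mixes integers with `nothing` (the sample values are made up for illustration). The comprehension produces a small union, while converting everything to the type of the first element throws.

```julia
# A column with a "hole" in it, declared as Any.
v = Any[1, nothing, 3]

# The comprehension widens only as far as needed: Union{Nothing, Int}.
narrowed = [x for x in v]

# Converting to the type of the first element fails on `nothing`,
# because there is no Int constructor that accepts Nothing.
first_type_failed = try
    typeof(v[1]).(v)
    false
catch err
    err isa MethodError
end

println(eltype(narrowed))
println(first_type_failed)
```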

For larger datasets where performance matters, it would be better to have a helper function that skips columns which already have proper types. Unfortunately, I don’t think it’s possible to determine whether a column’s type is correct without checking all of its values anyway…

I wonder if this would be a nice feature to build into DataFrames. Something like narrowtypes!(df), which in its simplest form does your loop but could be made more efficient by skipping any column that already has a concrete eltype. Like this:

for n in names(df)
    isconcretetype(eltype(df[!, n])) && continue
    df[!,n] = [x for x in df[!,n]]
end
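Wrapped up as a function, the loop above could look like the sketch below. To be clear, `narrowtypes!` is a hypothetical name from this thread and not an existing DataFrames function; the sample columns are made up for illustration.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames: narrow each column's
# eltype in place, skipping columns whose eltype is already concrete.
function narrowtypes!(df::DataFrame)
    for n in names(df)
        col = df[!, n]
        isconcretetype(eltype(col)) && continue
        # Re-collect so the eltype is derived from the actual values.
        df[!, n] = [x for x in col]
    end
    return df
end

df = DataFrame(a=Any[1, 2, 3],           # narrowed to Int
               b=[1.0, 2.0, 3.0],        # already concrete, skipped
               c=Any[1, missing, 3])     # narrowed to Union{Missing, Int}
narrowtypes!(df)
println(eltype.(eachcol(df)))
```

One caveat with the isconcretetype check: a small union such as Union{Missing, Int} is not a concrete type, so a column like `c` would be re-collected on every call even though it cannot be narrowed any further.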