# Finding the "minimal" necessary type of a vector and converting to it

I am having a hard time finding the “minimal” type necessary for a vector and converting to it. By “minimal” I mean the type that can still hold all of the values without being too general. For example:

`String > Float64 > Int`

since a `String` can represent any `Float64`, which in turn can represent an `Int`. I understand that not every `Int` can be expressed exactly as a `Float64`, but Julia's own promotion rules make the same choice:

```julia
julia> promote_type(Int, Float64)
Float64
```
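As a small sketch of the caveat above (the variable names are only illustrative), the following shows that promotion picks `Float64` even though a large `Int` does not survive the round trip, and that `String` is not part of Base's numeric promotion at all:

```julia
# Base promotion picks Float64 for a mix of Int and Float64.
promoted = promote_type(Int, Float64)

# 2^53 + 1 is the smallest positive Int that Float64 cannot represent exactly.
n = 2^53 + 1
roundtrip = Float64(n)   # rounds to the nearest representable value
lossy = roundtrip != n   # Julia compares Int and Float64 by exact value

# There is no promotion rule between numbers and strings,
# so Base falls back to the common supertype.
with_string = promote_type(Int, String)

println((promoted, lossy, with_string))
```

This is why a `String > Float64 > Int` ordering has to be written by hand rather than borrowed from `promote_type`.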

My attempt so far is the following:

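```julia
using Parsers
using StatsBase: sample  # `sample` is not in Base; it comes from StatsBase

# Result of the guess: the minimal element type and whether `missing` occurs.
struct ConvType
    T::DataType
    needsmissing::Bool
end

# Guess the minimal type from `v`, or from a random sample of `n` elements
# when `v` is longer than `n`.
function guesstype(v; n = 10000)
    if n >= length(v)
        vu = copy(v)
    else
        inds = sample(1:length(v), n, replace = false)
        vu = v[inds]
    end
    missings = ismissing.(vu)
    needsmissing = any(missings)
    vu = vu[missings .== false]
    min_T = Int
    for val in vu
        new_T = _promote(val)
        if new_T <: AbstractFloat || new_T == String
            min_T = new_T
        end
        # String is the most general type, so stop early.
        min_T == String && break
    end
    return ConvType(min_T, needsmissing)
end

# A string is "really" a number if it parses as Float64.
function _promote(s::String)
    s = strip(s)
    p = Parsers.tryparse(Float64, s)
    if isnothing(p)
        return String
    else
        return _promote(p)
    end
end

# A float with no fractional part is treated as an Int.
function _promote(n::T) where T <: AbstractFloat
    if round(n) == n
        return Int
    else
        return T
    end
end

# Fallback: keep the value's own type.
function _promote(a::T) where T
    return T
end

# Drop a trailing fractional part such as ".00" before parsing as an integer.
function convone(s::String, ::Type{T}) where T <: Integer
    s = replace(s, r"\.\d*" => "")
    return Parsers.parse(T, s)
end

convone(s::String, ::Type{T}) where T <: Number = Parsers.parse(T, s)

function convone(n::T, ::Type{String}) where T <: Number
    string(n)
end

convone(a::T, ::Type{T}) where T = a

function convone(a::T, ::Type{S}) where T<:Number where S<:Number
    S(a)
end

# `where T` is required here; without it `T` is undefined.
convone(::Missing, ::Type{T}) where T = missing

# Convert every element in place, then narrow the container's element type.
# Note: this writes into `v`, so `v` must be able to hold the converted values.
function conveach(v)
    T = guesstype(v)
    for (i, val) in enumerate(v)
        v[i] = convone(val, T.T)
    end
    OT = T.needsmissing ? Union{T.T, Missing} : T.T
    Vector{OT}(v)
end
```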

Here is some example output:

```julia
julia> conveach(Any[1, "123", 1., "1.00"])
4-element Array{Int64,1}:
   1
 123
   1
   1

julia> conveach(["1.1", "2", 1, 2])
4-element Array{Float64,1}:
 1.1
 2.0
 1.0
 2.0

julia> conveach(["a", "1", 1, 1.1])
4-element Array{String,1}:
 "a"
 "1"
 "1"
 "1.1"
```

I understand that Julia’s type system/hierarchy is extremely complicated, but this code seems quite involved just to promote between `String`, `Float64` and `Int`. Any suggestions on making this easier?

Your algorithm is inevitably going to be a bit messy here because it is type-unstable: your output depends on the values of the data and not just on the types. Moreover, that type instability will make everything downstream of your `conveach` function more complicated as well, because it has to deal with data that might be represented by strings or numbers…
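To make the type-instability point concrete, here is a minimal sketch. The helpers `narrow` and `keep` are hypothetical and not part of the code above; they only contrast a function whose result type depends on the value with one whose result type depends only on the argument type.

```julia
# Hypothetical, value-dependent: returns an Int for whole numbers,
# otherwise a Float64, so the compiler cannot pick a single return type.
narrow(x::Float64) = isinteger(x) ? Int(x) : x

# Hypothetical, type-stable: the result is always a Float64.
keep(x::Float64) = x

# Ask the compiler what it can infer for a Float64 argument.
unstable = first(Base.return_types(narrow, (Float64,)))
stable = first(Base.return_types(keep, (Float64,)))

println("narrow infers as: ", unstable)
println("keep infers as:   ", stable)
```

Every caller of something like `narrow` has to handle both possibilities, which is the downstream cost described above.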

In what context does this arise? If you are dealing with data in a messy format, I would try to write a preprocessing script that cleans up your data before processing. For example, why not just make everything floating-point?

Yeah, the data is kind of weird. It’s basically a bunch of individual data from an API pasted together, and the API seems not to be consistent. So I figured I would write something generic and it got out of hand. I guess just converting everything to `Float64` is the best solution. The `convert` vs `parse` situation is still a bit annoying but easy to handle. Thanks for your help!
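For reference, a minimal sketch of the "everything to `Float64`" approach, using only Base. The names `tofloat` and `tofloats` are hypothetical helpers, and the sketch assumes every string in the data is a valid number (otherwise `parse` throws). Dispatch takes care of the `convert` vs `parse` split:

```julia
# Hypothetical helper: coerce a single value to Float64.
tofloat(x::Real) = Float64(x)                          # numbers: convert
tofloat(s::AbstractString) = parse(Float64, strip(s))  # strings: parse
tofloat(::Missing) = missing                           # keep missing values

# `map` narrows the element type of the result to what was actually produced.
tofloats(v) = map(tofloat, v)

cleaned = tofloats(Any[1, " 2.5", missing, 3f0])
println(cleaned)
println(eltype(cleaned))
```

Swapping `parse` for `tryparse` would return `nothing` for malformed strings instead of throwing, if that suits the data better.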

You might want to check out BangBang.jl for this, especially `push!!`. If you just need regular `promote` behavior, it’s as easy as:

```julia
julia> using BangBang

julia> foldl(push!!, Any[1, 1.1, 0x2, 3f0], init=Union{}[])
4-element Array{Float64,1}:
 1.0
 1.1
 2.0
 3.0
```
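If an extra dependency is not wanted, the regular `promote` behavior can also be sketched with Base alone. This is a different technique from `push!!` (it makes two passes, one to find the type and one to convert), and it assumes a non-empty vector of numbers:

```julia
v = Any[1, 1.1, 0x2, 3f0]

# First pass: fold promote_type over the element types.
T = mapreduce(typeof, promote_type, v)

# Second pass: convert every element to the common type.
out = convert(Vector{T}, v)

println(T)
println(out)
```

For mixed numbers and strings this falls back to `Any`, so it does not replace the value-based logic in `_promote`.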

To get the behavior you want, it’s a little more complicated, but using your functions `convone` and `_promote`, not much more:

```julia
julia> foldl(Any[1, "123", 1., "1.00"], init=Union{}[]) do a, i
           push!!(a, convone(i, _promote(i)))
       end
4-element Array{Int64,1}:
   1
 123
   1
   1
```

Thank you. I’ll try that out.