I was analysing some code today and found that I cannot read data from a simple file while keeping type stability. I want to pass the results to functions that have to be heavily optimized, so I believe this is an important issue…
Initially, I just had a CSV file like this:
a,b
test1,1.
test2,2.
test3,3.
test4,4.
that I was reading naively using CSV.jl and DataFrames.jl:
using CSV, DataFrames

function f()
    fracfile = DataFrame(CSV.File("test.csv"))
    x = fracfile[1, :b]
    return x
end
but running @code_warntype f() says that x is of type Any. A quick search led me to discover that DataFrames.jl data frames are not type stable. However, the blog post I found suggests that one can use Tables.columntable to generate type-stable data. I tried:
using Tables

function f()
    fracfile = DataFrame(CSV.File("test.csv"))
    stable = Tables.columntable(fracfile)[:b]
    x = stable[1]
    return x
end
but again no luck. Both stable and x are Any. I further tried to cast the variables to Vector{Float64}, but it does not change anything. I also tried to pass explicit column types to CSV.File using the argument types = Dict(1 => String, 2 => Float64). Finally, I tried TypedTables.jl instead of DataFrames.jl, as apparently it should offer type stability, but again to no effect.
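A minimal sketch of why the result may stay Any, using only Base Julia. The Dict below is a hypothetical stand-in for a table that keeps its columns behind an abstract type (an assumption about the mechanism, not a copy of any package's internals), while the NamedTuple mirrors a column table whose column types are part of its own type. The names loose, typed and first_b are made up for illustration.

```julia
# Hypothetical stand-in: columns stored behind an abstract element type,
# so the concrete column type is only known at run time.
loose = Dict{Symbol,AbstractVector}(:a => ["test1", "test2"], :b => [1.0, 2.0])

# A NamedTuple of vectors carries the column types in its own type.
typed = (a = ["test1", "test2"], b = [1.0, 2.0])

# The same access pattern works on both containers.
first_b(t) = t[:b][1]

# Ask the compiler what it can infer for each container type.
loose_ret = Base.return_types(first_b, (typeof(loose),))[1]
typed_ret = Base.return_types(first_b, (typeof(typed),))[1]

println("loose container: ", loose_ret)
println("typed container: ", typed_ret)
```

If the conversion to a typed container happens inside the same function that reads the file, the compiler presumably still cannot know the resulting type at that point, which would be consistent with what @code_warntype reports above.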
The only way I found to get type-stable code is hacky and probably not very correct…
function f()
    fracfile = DataFrame(CSV.File("test.csv"))
    # x::Float64 = parse(Float64, fracfile[1,2])  # all blue in @code_warntype!
    x::Vector{Float64} = Vector{Float64}(fracfile[:, 2])  # one temporary is still red in @code_warntype, but x is of the intended type
    return x
end
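A possibly less hacky variant of the same idea is a type assertion on the extracted column instead of a conversion. The sketch below uses a Base-only Dict as a hypothetical stand-in for the table, and first_value is an illustrative name. With DataFrames.jl the analogous line would presumably be fracfile[!, :b]::Vector{Float64}, which is an assumption and only holds if the parsed column really is a Vector{Float64}.

```julia
# Hypothetical stand-in for a column lookup whose result type is not inferable.
columns = Dict{Symbol,AbstractVector}(:b => [1.0, 2.0, 3.0, 4.0])

function first_value(cols)
    # The assertion throws a TypeError if the column is not a Vector{Float64};
    # otherwise the compiler treats `x` as Vector{Float64} from here on.
    x = cols[:b]::Vector{Float64}
    return x[1]
end

println(first_value(columns))
```

Unlike Vector{Float64}(...), the assertion does not copy or convert the data, so it fails loudly when the file does not contain what the code expects.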
I understand that the compiler cannot know the contents of the file beforehand, so type inference is complicated. But I do not really understand why the compiler still cannot infer anything when I use TypedTables.jl or explicitly cast the variables. Is there a simple solution for this? Thank you very much in advance!
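One commonly suggested pattern for this situation is a function barrier: do the type-unstable loading in an outer function and pass the extracted column to an inner function, which Julia compiles for the concrete type it receives at run time. The sketch below is Base-only; LooseTable, getcol, kernel and process are hypothetical names, and LooseTable is only a stand-in for a table whose column types the compiler cannot see.

```julia
# Hypothetical stand-in for a table whose column types are hidden from the compiler.
struct LooseTable
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Look a column up by name; the result is inferred only as AbstractVector.
getcol(t::LooseTable, name::Symbol) = t.columns[findfirst(==(name), t.names)]

# Inner kernel: specialized for the concrete type of `v` it is called with.
function kernel(v::AbstractVector{<:Real})
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

# Outer function: type-unstable, but the dynamic dispatch happens only once,
# at the call to `kernel`.
function process(t::LooseTable)
    col = getcol(t, :b)
    return kernel(col)
end

t = LooseTable([:a, :b], AbstractVector[["test1", "test2"], [1.0, 2.0]])
println(process(t))
```

With this layout the outer function may still show red in @code_warntype, but the performance-critical loop inside kernel is compiled with full type information.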