Hello all!
I’m working with some data using CSV.read and the DataFrame type. The raw data is seldom needed in my code, which made me wonder whether I need to store it in my mutable struct at all. If I do store it, would it be more efficient to store it as a DataFrame or as a sort of “pointer” to a CSV.read call?
Here’s the overall idea:
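using CSV, DataFrames

# Option 1: keep the parsed table itself in the struct
mutable struct foo{D<:DataFrame}
    a::D
    otherfields
end

# Option 2: keep a function that reads the file on demand
mutable struct bar{F<:Function}
    b::F
    otherfields
end

function baz(filepath::String)
    data = CSV.read(filepath, DataFrame)
    otherfields = ... # not real code
    foo(data, otherfields)
end

function qux(filepath::String)
    # note: @eval evaluates at module level, so `data` becomes a global here
    @eval data = () -> CSV.read($filepath, DataFrame)
    otherfields = ... # not real code
    bar(data, otherfields)
end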
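For comparison, here is a minimal sketch of the "pointer" idea written with a plain closure instead of `@eval`. The names `LazyTable` and `lazy_table` are hypothetical stand-ins for `bar` and `qux`, and the temporary CSV file is made up so the snippet runs on its own. I am assuming an ordinary anonymous function that captures `filepath` is enough here.

```julia
using CSV, DataFrames

# Hypothetical stand-in for `bar`: stores a zero-argument loader, not the data.
mutable struct LazyTable{F<:Function}
    load::F            # closure that reads the file when called
    otherfields::Any   # placeholder for the remaining fields
end

# A plain closure captures `filepath`; no `@eval` is involved.
function lazy_table(filepath::String, otherfields=nothing)
    loader = () -> CSV.read(filepath, DataFrame)
    return LazyTable(loader, otherfields)
end

# Write a tiny CSV to a temporary file so the sketch is self-contained.
path = tempname() * ".csv"
write(path, "x,y\n1,2\n3,4\n")

t = lazy_table(path)
df = t.load()   # the file is parsed here, and again on every later call
```

With this layout nothing is read until `t.load()` is called, but each call parses the file from scratch, so it only pays off if the raw data really is needed rarely.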
Bearing in mind that these structs would be repeatedly passed down in my code, is there anything I should consider efficiency-wise?
I tried benchmarking both routes separately from the rest of my code and saw slightly lower memory use and fewer allocations when storing the actual DataFrame, which came as quite a surprise to me.
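This is not the exact benchmark I ran, just a sketch of one way to compare the two routes at the point where the raw data is actually used. It uses Base's `@allocated` rather than a benchmarking package; the file contents and the helpers `use_eager` and `use_lazy` are invented for the example.

```julia
using CSV, DataFrames

# Small temporary CSV so the comparison is self-contained (made-up data).
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "x,y")
    for i in 1:1_000
        println(io, i, ",", 2i)
    end
end

eager = CSV.read(path, DataFrame)         # parsed once, then kept in memory
lazy  = () -> CSV.read(path, DataFrame)   # parsed again on every call

# Hypothetical consumers that need the raw table.
use_eager(df::DataFrame) = sum(df.y)
use_lazy(loader) = sum(loader().y)

# Warm up first so compilation is not counted in the measurement.
use_eager(eager); use_lazy(lazy)

bytes_eager = @allocated use_eager(eager)
bytes_lazy  = @allocated use_lazy(lazy)
println("eager: ", bytes_eager, " bytes, lazy: ", bytes_lazy, " bytes")
```

Measured this way, the stored DataFrame costs memory once up front, while the lazy version pays the full parse every time the data is touched.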