I want to read a CSV file and do some path-dependent calculations involving multiple columns (i.e., vectorization is not an option in the actual problem).
I was surprised to see that, in a simple example, iterating over a DataFrame with `eachrow` introduces over 100x overhead compared with iterating over a vector of NamedTuples. Is there a faster way to iterate over rows? Alternatively, is there a way to parse a CSV directly into NamedTuples?
julia> using DataFrames, BenchmarkTools

julia> d = [(a=rand(), b=rand()) for _ in 1:10^6];

julia> df = DataFrame(d);

julia> function f(xs)
           s = 0.0
           for x in xs
               s += x.a * x.b
           end
           s
       end
f (generic function with 1 method)

julia> function g(xs)
           s = 0.0
           for x in eachrow(xs)
               s += x.a * x.b
           end
           s
       end
g (generic function with 1 method)

julia> @btime f($d)
  577.269 μs (0 allocations: 0 bytes)
249855.20496448214

julia> @btime g($df)
  105.782 ms (6998979 allocations: 122.05 MiB)
249855.20496448386
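For comparison, here is a minimal sketch of one direction that might avoid the overhead, assuming the slowdown comes from the column types of a DataFrame not being visible to the compiler inside the loop. It goes through the Tables.jl interface: `Tables.columntable` turns the DataFrame into a NamedTuple of column vectors, and `Tables.namedtupleiterator` yields rows as NamedTuples. The helper names `g_cols` and `g_rows` are made up for illustration, and I have not benchmarked this here.

```julia
using DataFrames, Tables

# Hypothetical helper: takes a NamedTuple of column vectors, so the element
# types of `a` and `b` are part of the argument type (a function barrier).
function g_cols(cols)
    s = 0.0
    for i in eachindex(cols.a, cols.b)
        s += cols.a[i] * cols.b[i]
    end
    return s
end

# Hypothetical helper: same loop as `f`, for any iterator of NamedTuple rows.
function g_rows(rows)
    s = 0.0
    for x in rows
        s += x.a * x.b
    end
    return s
end

df = DataFrame(a = rand(10^6), b = rand(10^6))

s_cols = g_cols(Tables.columntable(df))        # column-wise access by index
s_rows = g_rows(Tables.namedtupleiterator(df)) # row-wise, typed NamedTuples
```

If a materialized vector of NamedTuples like `d` is needed, `Tables.rowtable(df)` should produce one, at the cost of copying the data.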