Accessing a column value from DataFrameRow allocates

It depends on what you want to do:

  1. If you want to perform a single operation then it probably does not matter;
  2. If you want to do millions of such operations then:
    • either use the higher-level functions provided by DataFrames.jl, like select or combine, which will be efficient;
    • if you want to use low-level operations, like loops, then:
      • if your data frame is not wide, convert it to a NamedTuple of columns with Tables.columntable - this operation is cheap, and everything you later do with the result is type stable;
      • if your data frame is very wide but you do not need to process all columns, drop the unneeded columns first and then do what I described in the point above;
      • if your data frame is very wide and you need all columns, then you have a problem - this is the case where writing type-stable code is hard, and you should rather consider using combine or select, as they are optimized to handle such cases efficiently.
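
To make the low-level option concrete, here is a minimal sketch of the Tables.columntable approach. The data frame, its column names (a, b, c), and the function names are made up for illustration; only eachrow, select, and Tables.columntable come from the packages.

```julia
using DataFrames, Tables

# Hypothetical example data; column `c` stands in for columns we do not need.
df = DataFrame(a = 1:5, b = [0.5, 1.5, 2.5, 3.5, 4.5], c = fill("x", 5))

# Type-unstable loop: the column types are not part of the type of `df`,
# so the compiler cannot infer the types of `row.a` and `row.b`.
function rowsum_unstable(df)
    s = 0.0
    for row in eachrow(df)
        s += row.a * row.b
    end
    return s
end

# Type-stable loop: `nt` is a NamedTuple of vectors, so the element type of
# every column is known to the compiler once `nt` is passed to the function.
function rowsum_stable(nt)
    s = 0.0
    for i in eachindex(nt.a, nt.b)
        s += nt.a[i] * nt.b[i]
    end
    return s
end

# Keep only the needed columns, then convert to a NamedTuple of columns.
nt = Tables.columntable(select(df, [:a, :b]; copycols = false))

total_stable = rowsum_stable(nt)
total_unstable = rowsum_unstable(df)
```

Both functions return the same value; the difference is only in how much the compiler knows inside the loop. Comparing them with @allocated or BenchmarkTools.jl on a large data frame is a reasonable way to check whether the conversion pays off for your data.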

In summary - being type stable is not a free lunch, as it heavily burdens the Julia compiler: the column names and types become part of the object's type, and code has to be compiled for each such type. DataFrames.jl was designed to be maximally flexible, but this means that a data frame must be type unstable (otherwise you would not be able to, e.g., dynamically add columns to it). To compensate, the functions provided by DataFrames.jl, such as select and combine, were optimized to automatically "enable" type stability for the operations you pass to them. Finally - as I have said - if your data is narrow, then turning it into a type-stable NamedTuple is cheap.
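
As an illustration of the high-level route, here is a small sketch using combine and select. The data and the helper function dot_ab are hypothetical; the point is only that the function you pass receives the selected columns as ordinary, concretely typed vectors, so its body is compiled for those types.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = 1:5, b = [0.5, 1.5, 2.5, 3.5, 4.5])

# A user-defined kernel; inside it `a` and `b` are plain vectors.
function dot_ab(a, b)
    s = 0.0
    for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

# combine passes the selected columns to `dot_ab` and stores the result in `:dot`.
res = combine(df, [:a, :b] => dot_ab => :dot)

# Row-wise variant with select and ByRow.
prod_df = select(df, [:a, :b] => ByRow(*) => :prod)
```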
