NamedTuple vs Field


#1

I was wondering whether there is a performance penalty for using a NamedTuple instead of plain struct fields. Say, for example, I have an Atom, and depending on which calculations I want to do, different data could be attached to this Atom struct. Before NamedTuple I used to keep adding fields, with constructors initializing them to 0 or whatever in cases where that particular data field wasn’t used. With NamedTuple, I wonder if there will be a performance penalty for making the following substitution:

struct Atom{T <: AbstractFloat}
  name::Symbol 
  Z::Int
  position :: Point3{T} #x,y,z
  l_soc::T
  U::T
  ....
end

to something like

struct Atom{T <: AbstractFloat}
  name::Symbol 
  Z::Int
  position :: Point3{T} #x,y,z
  data::NamedTuple
end

where things like l_soc and U etc would now be stored by name and value in the data field.

If then I originally have a function

function hubbard_model(atoms::Array{<:Atom,1},...)
  for at in atoms
    U = at.U
    ....
  end
  ...
end

Would/should there be a performance difference when using the following function (together with the second definition of Atom)?

function hubbard_model(atoms::Array{<:Atom,1},...)
  for at in atoms
    U = at.data[:U]
    ....
  end
  ...
end

I’m trying to understand just how much NamedTuples differ from plain extra fields. If there is no performance difference, there would seem to be no benefit to using separate fields over a single NamedTuple field. The only reason I would ever consider the latter is for prototyping and for reusing the same type across different calculations/workflows.


#2

::NamedTuple won’t be very efficient, since it is abstractly-typed. You could use Dict to store random extra properties pretty efficiently if you start to have a lot of them. Otherwise, just declaring the fields explicitly as it sounds like you have been doing is probably best.
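The "abstractly-typed" point can be checked directly. The following is a minimal sketch, not the original poster's code: AtomLoose is a hypothetical name, and NTuple{3,T} stands in for Point3{T} so the block has no package dependencies.

```julia
# Sketch only: AtomLoose is a hypothetical name for the second Atom definition,
# and NTuple{3,T} is an assumed stand-in for Point3{T}.
struct AtomLoose{T<:AbstractFloat}
    name::Symbol
    Z::Int
    position::NTuple{3,T}
    data::NamedTuple   # no parameters given, so the field type is abstract
end

# NamedTuple without its parameters is not a concrete type.
isconcretetype(NamedTuple)                            # false
isconcretetype(fieldtype(AtomLoose{Float64}, :data))  # false

# A NamedTuple type with names and element types filled in is concrete.
isconcretetype(typeof((l_soc = 0.1, U = 4.0)))        # true
```

Because the field type is abstract, the compiler cannot know the type of at.data[:U] inside a loop over such atoms, which is where the slowdown would come from.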


#4

I see, yeah, I guess if I specified which NamedTuple it is, that would sort of defeat the purpose. OK, I’ll keep doing what I’m doing for now and use typed Dicts if I want flexibility. Thanks!


#5

Could you elaborate on this? Dict{Symbol,Any} is not concrete either and will slow down performance, no?


#6

Performance cannot easily be defined in the abstract. There are cases when it’ll be slower, there are cases when it’ll be faster. It’ll almost definitely load and start executing much faster. It may be completely irrelevant to the runtime performance of the program (the most likely scenario).


#7

Dict{Symbol,Any} is concrete
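A short sketch of the distinction being made here, assuming nothing beyond Base: the container type is concrete, while the values it holds are typed as Any. The keys and values below are illustrative only.

```julia
# The Dict type itself is concrete, so a struct field declared with it
# has a known type.
isconcretetype(Dict{Symbol,Any})   # true

# Illustrative contents; the keys are borrowed from the first post.
d = Dict{Symbol,Any}(:U => 4.0, :l_soc => 0.1)

# The value type is Any, so the result of a lookup is not inferable.
valtype(d)                         # Any

# A type assertion at the point of use gives later code a concrete type.
U = d[:U]::Float64
```

Whether the untyped lookup matters depends on how often it runs relative to the rest of the work, which is the point made in post #6.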


#8

Yes, that is absolutely true. My comment was meant more as a question: can a struct containing a Dict{String,Any} have performance issues or not?

My actual problem is a large-scale Gtk application where program startup takes more than 60 seconds the first time, while the second time it’s less than 1 second. I profiled that once and it seems to be an inference issue. Since I use several Dict{String,Any} in my custom Gtk widgets, I thought that this might be the cause.


#9

Since you seem to know the types of the additional fields, you can just specify the types of the NamedTuple fields. That should be more efficient than a Dict.
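One way to read this suggestion is to make the NamedTuple type a parameter of the struct. The sketch below is an assumption about what was meant, with a hypothetical AtomNT name and NTuple{3,T} again standing in for Point3{T}.

```julia
# Sketch only: AtomNT is a hypothetical name; D captures the exact
# NamedTuple type, so the data field is concretely typed per instance.
struct AtomNT{T<:AbstractFloat,D<:NamedTuple}
    name::Symbol
    Z::Int
    position::NTuple{3,T}   # assumed stand-in for Point3{T}
    data::D
end

# Illustrative values only.
at = AtomNT(:Fe, 26, (0.0, 0.0, 0.0), (l_soc = 0.05, U = 4.0))

isconcretetype(typeof(at))   # true
at.data.U                    # 4.0
```

Note that atoms carrying different sets of keys then have different types, so an array mixing them would itself have an abstract element type.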


#10

If I read the OP question correctly, the question was about whether it would be more efficient (no) or less typing (also no), not whether it was feasible.

And NT fields are not “more efficient” than a Dict, since efficiency is not an absolute measure. On a relative scale, Dict is more efficient in general (O(1) vs O(n) lookup), but indeed, many NT operations manage to constant-fold back to O(1). In exchange, NT provides some extra features (element types are tracked on a per-key basis). However, the trade-off is that NT can be harder to work with (immutability does have its drawbacks), and the optimizations that make it O(1) in rare cases can also cause it to accidentally become O(n^2). These are roughly the same observations as when comparing a Tuple and an Array, for which we have more empirical evidence of how they get used and how they perform.
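The immutability and per-key typing trade-offs can be seen in a few lines. This is an illustrative sketch using only Base functions; the key J is a made-up example.

```julia
nt = (l_soc = 0.05, U = 4.0)

# Element types are tracked per key.
fieldtypes(typeof(nt))          # (Float64, Float64)

# NamedTuples are immutable: "adding" a key builds a new value,
# and that value has a different type from the original.
nt2 = merge(nt, (J = 0.9,))     # J is a hypothetical extra key
typeof(nt2) == typeof(nt)       # false

# A Dict is updated in place and keeps the same type throughout.
d = Dict{Symbol,Any}(:l_soc => 0.05, :U => 4.0)
d[:J] = 0.9
```

Every distinct combination of keys and element types is a new NamedTuple type that the compiler has to specialize on, whereas the Dict version compiles once.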


#11

No, that’s the point: it’s not that I don’t know the types at any point during program evaluation, it’s that I don’t know all the possible fields, and their types, that I will use in certain (future) calculations. But they are all atoms, so I’d like to keep using the Atom type without having to redefine everything I did before. Probably a dictionary with the varying data, plus a couple of fields for the ‘always there’ data, is the best solution.
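A minimal sketch of that layout, under stated assumptions: AtomDict and total_U are hypothetical names, NTuple{3,T} stands in for Point3{T}, and the loop body is a placeholder rather than an actual Hubbard model.

```julia
# Sketch only: fixed fields for the 'always there' data,
# a Dict for calculation-specific extras.
struct AtomDict{T<:AbstractFloat}
    name::Symbol
    Z::Int
    position::NTuple{3,T}    # assumed stand-in for Point3{T}
    data::Dict{Symbol,Any}
end

# Hypothetical helper: sums U over the atoms, treating a missing :U as zero.
function total_U(atoms::Vector{AtomDict{T}}) where {T}
    s = zero(T)
    for at in atoms
        # The ::T assertion restores a concrete type after the untyped lookup.
        s += get(at.data, :U, zero(T))::T
    end
    return s
end

# Illustrative values only.
atoms = [AtomDict(:Fe, 26, (0.0, 0.0, 0.0), Dict{Symbol,Any}(:U => 4.0)),
         AtomDict(:O, 8, (1.0, 0.0, 0.0), Dict{Symbol,Any}())]

total_U(atoms)   # 4.0
```

All atoms share one concrete type here regardless of which extras they carry, so the array stays concretely typed; the cost is the dictionary lookup and type assertion on each access.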