I’m not surprised: if allocating empty dictionaries takes a significant fraction of your compute time, then you must be doing only a small number of subsequent operations per dictionary. For very small dictionaries that are accessed only a limited number of times, it will probably be faster to use another data structure (even linear search over a short array).
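To make the linear-search idea concrete, here is a minimal sketch in Python (used only for illustration; the helper names `linear_get` and `linear_set` are made up for this example). It stores key/value pairs in a plain list and scans it on every access. Whether this actually beats a dictionary depends on the language, the key type, and how many entries there are, so it is worth benchmarking on your real workload before switching.

```python
# Hypothetical helpers: a tiny "map" kept as a list of (key, value) pairs.
# Lookups are O(n), but for a handful of entries there is no hashing
# and no hash-table allocation.

def linear_get(pairs, key, default=None):
    """Return the value stored for key, or default if key is absent."""
    for k, v in pairs:
        if k == key:
            return v
    return default


def linear_set(pairs, key, value):
    """Insert key/value, overwriting the existing entry if key is present."""
    for i, (k, _) in enumerate(pairs):
        if k == key:
            pairs[i] = (key, value)
            return
    pairs.append((key, value))


pairs = []
linear_set(pairs, "a", 1)
linear_set(pairs, "b", 2)
linear_set(pairs, "a", 3)   # overwrites the first entry in place
```

After these calls `pairs` holds two entries, and `linear_get(pairs, "a")` walks the list until it finds the matching key.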
(For optimizing dictionary performance, one may also want to consider a custom hash function; see the thread "Dictionary with custom hash function".)
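As a rough illustration of what a custom hash function can look like, here is a sketch in Python, where a key type supplies its own hash by defining `__hash__` together with `__eq__`. The class `GridKey` and the multiplier 65537 are assumptions made up for this example, not something taken from the linked thread, and other languages expose this differently.

```python
class GridKey:
    """Hypothetical key type with a cheap, hand-written hash."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Equal keys must also produce equal hashes.
        return (
            isinstance(other, GridKey)
            and self.x == other.x
            and self.y == other.y
        )

    def __hash__(self):
        # Simple integer mix; assumes x and y are ints.
        return self.x * 65537 + self.y


cells = {GridKey(1, 2): "start"}
value = cells[GridKey(1, 2)]  # found via the custom __hash__ and __eq__
```

A cheap hash like this only helps if it still spreads the keys well; a poor mix leads to more collisions and can make lookups slower rather than faster.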