I see that Julia allocations are 16-byte aligned (the same holds if you go straight to Libc.malloc).
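For anyone who wants to check this on their own system, here is a minimal C sketch that measures the alignment of whatever `malloc` returns. The helper names `pointer_alignment` and `malloc_alignment` are made up for illustration, and the result depends on the platform's allocator (16 bytes is what I'd expect on a typical 64-bit Linux, but that is an assumption, not a guarantee).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns the largest power of two (capped at 4096)
   that divides the address held in p. */
static size_t pointer_alignment(const void *p) {
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;
    size_t align = 1;
    /* Keep doubling while the corresponding low bit of the address is zero. */
    while (align < 4096 && (addr & align) == 0) {
        align <<= 1;
    }
    return align;
}

/* Hypothetical helper: allocates `size` bytes with malloc, measures the
   alignment of the returned pointer, and frees the block again.
   Returns 0 if the allocation failed. */
static size_t malloc_alignment(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        return 0;
    }
    size_t align = pointer_alignment(p);
    free(p);
    return align;
}
```

Calling `malloc_alignment` for a range of sizes shows the minimum alignment you actually get. An individual pointer can happen to be more aligned than the guarantee, so the minimum over many allocations is the interesting number.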
I’m thinking of a 32-bit pointer (Ptr32) idea that wouldn’t have byte addressability, i.e. it would only be used for base pointers; if you add 1 or some other offset, you get promoted to a regular Ptr.
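To make the idea a bit more concrete, here is one possible reading of it as a C sketch. It is an assumption on my part, not a worked-out design: the 32-bit value stores an offset from a heap base in 16-byte units (so the low 4 bits never need storing), which would cover 64 GiB. The names `ptr32`, `ptr32_encode`, `ptr32_promote` and `heap_base` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTR32_SHIFT 4 /* assumed 16-byte granularity: low 4 bits are always zero */

/* Hypothetical compressed base pointer: an offset from heap_base in 16-byte units. */
typedef struct {
    uint32_t units;
} ptr32;

/* Compress a 16-byte-aligned address relative to heap_base. */
static ptr32 ptr32_encode(uint64_t heap_base, uint64_t addr) {
    assert(addr >= heap_base);
    uint64_t off = addr - heap_base;
    assert((off & ((UINT64_C(1) << PTR32_SHIFT) - 1)) == 0); /* must be 16-byte aligned */
    assert((off >> PTR32_SHIFT) <= UINT32_MAX);              /* must fit in 32 bits */
    ptr32 p = { (uint32_t)(off >> PTR32_SHIFT) };
    return p;
}

/* "Promote" to a full-width address, optionally adding a byte offset.
   The result is an ordinary 64-bit address and is byte-addressable again. */
static uint64_t ptr32_promote(uint64_t heap_base, ptr32 p, uint64_t byte_offset) {
    return heap_base + ((uint64_t)p.units << PTR32_SHIFT) + byte_offset;
}
```

The point of the sketch is only that adding an offset has to go through the promoting step, since the compressed form cannot represent an address that is not a multiple of 16.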
Even without my idea, I’m not sure the status quo is better when you have 64-byte cache lines.
It works for correctness, but is there a drawback if different unrelated allocations share a cache line?
I think it might actually be OK for single-threaded code, or if you have separate threads that hit the same L1 cache.
But if not, the cache line would be fine in L2+ (or at least in L3), but it would go to one L1 at a time, so does it bounce around between cores and slow things down?
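What I'm describing is usually called false sharing, and the usual workaround is to pad or align data so that independently written values land on different cache lines. A minimal C11 sketch of the layout difference, assuming a 64-byte line (typical for x86-64, not universal); the struct names and `same_cache_line` are made up for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */

/* Two counters packed together: 16 bytes, so both can sit in one cache line. */
struct packed_counters {
    uint64_t a;
    uint64_t b;
};

/* One counter aligned (and therefore padded) to a full cache line. */
struct padded_counter {
    alignas(CACHE_LINE) uint64_t value;
};

/* Two counters that are guaranteed to live on different cache lines. */
struct padded_counters {
    struct padded_counter a;
    struct padded_counter b;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each padded counter occupies exactly one cache line");

/* Returns 1 if the two addresses fall in the same cache line, else 0. */
static int same_cache_line(const void *x, const void *y) {
    return (uintptr_t)x / CACHE_LINE == (uintptr_t)y / CACHE_LINE;
}
```

The padded version costs 128 bytes instead of 16, which is exactly the memory versus contention trade-off in question.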
This may be a rare issue in practice, so is it more valuable not to waste memory?
It seems to me that a 64-byte minimum allocation isn’t too bad; most allocations are that large or larger anyway. My Ptr32 idea is actually meant to optimize Strings, so don’t worry too much about such smaller allocations.
Trees (and linked lists) need a minimum of two pointers, so 16 bytes, yes, but nodes often carry something more. Also, aren’t B-trees popular for in-RAM use by now too?
What are the main arguments for or against larger (or smaller) minimum allocation sizes?
Even large allocations are only 16-byte aligned, at least with Libc, but couldn’t they at least be 64-byte aligned, or more?
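For reference, C11 already lets you ask for stricter alignment explicitly with `aligned_alloc`, so cache-line alignment is available on request even if it isn't the default. A minimal sketch, again assuming a 64-byte line; `round_up_to_line` and `alloc_line_aligned` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache line size */

/* Round size up to a multiple of CACHE_LINE; C11 aligned_alloc expects the
   size to be a multiple of the requested alignment. */
static size_t round_up_to_line(size_t size) {
    return (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Allocate at least `size` bytes starting on a cache-line boundary.
   Returns NULL on failure; release the block with free(). */
static void *alloc_line_aligned(size_t size) {
    void *p = aligned_alloc(CACHE_LINE, round_up_to_line(size));
    assert(p == NULL || (uintptr_t)p % CACHE_LINE == 0);
    return p;
}
```

The rounding shows the cost side: a 1000-byte request turns into 1024 bytes, and very small requests are padded up to a full 64 bytes.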