I see that Julia allocations are 16-byte aligned (and also if you go straight to Libc.malloc).
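(For reference, here's roughly how I checked; just a quick snippet looking at the low bits of the returned addresses, nothing authoritative.)

```julia
# Check the alignment of addresses from the Julia allocator and from Libc.malloc.
# The exact guarantee is an implementation detail of the runtime / libc.

# Pointer to the data of a freshly heap-allocated Julia array:
v = Vector{UInt8}(undef, 100)
pjl = pointer(v)
println("Julia allocation: ", pjl, "  16-byte aligned: ", UInt(pjl) % 16 == 0)

# Raw C allocation through libc:
pc = Libc.malloc(100)
println("Libc.malloc:      ", pc, "  16-byte aligned: ", UInt(pc) % 16 == 0)
Libc.free(pc)
```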
I’m thinking of a 32-bit pointer (Ptr32) idea that wouldn’t have byte-addressability, i.e. it would only hold base pointers; if you then add e.g. 1 or some other offset, you’d get promoted to a regular Ptr.
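To make that concrete, here's a very rough sketch of what I have in mind; the name Ptr32, the 4-bit shift, and the 16-byte granularity are just my assumptions, nothing that exists anywhere:

```julia
# Hypothetical sketch only: a Ptr32{T} that stores a 32-bit "slot" (address >> 4),
# valid only for 16-byte-aligned base pointers; 2^32 slots * 16 bytes = 64 GB reachable.
struct Ptr32{T}
    slot::UInt32
end

# Construct from a full pointer; requires a 16-byte-aligned address in the first 64 GB.
function Ptr32{T}(p::Ptr{T}) where {T}
    a = UInt(p)
    a % 16 == 0 || throw(ArgumentError("Ptr32 needs a 16-byte-aligned base pointer"))
    (a >> 4) <= typemax(UInt32) || throw(ArgumentError("address beyond 64 GB"))
    return Ptr32{T}(UInt32(a >> 4))
end

# Widen back to an ordinary pointer.
Base.convert(::Type{Ptr{T}}, p::Ptr32{T}) where {T} = Ptr{T}(UInt(p.slot) << 4)

# Adding any offset "promotes" to a regular Ptr, since the result need not stay aligned.
Base.:+(p::Ptr32{T}, offset::Integer) where {T} = convert(Ptr{T}, p) + offset
```

In reality the 32 bits would presumably be an offset from some region base rather than an absolute address (real malloc addresses usually aren't in the low 64 GB), but the promotion-on-arithmetic part is the point here.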
Even without my idea, I’m not sure the status quo is better when cache lines are 64 bytes.
It works for correctness, but is there a drawback if different, unrelated allocations share a cache line?
I think it might actually be ok for single-threaded code, or if the separate threads touching it hit the same L1 cache.
But if not, the cache line will be fine in L2+ (or at least L3), yet it can only be owned by one L1 at a time for writes, so does it bounce between cores and slow things down?
This is maybe a rare issue in practice, so is it more valuable not to waste memory?
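Here's a sketch of how I'd try to measure that worry: two independent atomic counters allocated back-to-back (likely on the same 64-byte cache line, given 16-byte-granular pool allocation) versus two counters pushed apart by spacer allocations. Whether the pairs really land where I expect depends on the allocator, and the size of the effect depends on the CPU; run with `julia -t 2` or more.

```julia
using Base.Threads

# Increment one atomic counter many times from one task.
function hammer!(c::Atomic{Int}, n::Int)
    for _ in 1:n
        atomic_add!(c, 1)
    end
end

# Hammer two counters from two concurrent tasks.
function contend(a::Atomic{Int}, b::Atomic{Int}, n::Int)
    t1 = @spawn hammer!(a, n)
    t2 = @spawn hammer!(b, n)
    wait(t1); wait(t2)
end

function demo(n = 10_000_000)
    addr(x) = UInt(pointer_from_objref(x))

    a = Atomic{Int}(0); b = Atomic{Int}(0)              # consecutive small allocations
    println("a/b gap:   ", signed(addr(b) - addr(a)), " bytes")
    println("adjacent:  ", @elapsed(contend(a, b, n)), " s")

    c = Atomic{Int}(0)
    spacers = [Atomic{Int}(0) for _ in 1:8]             # push `d` well past c's cache line
    d = Atomic{Int}(0)
    println("c/d gap:   ", signed(addr(d) - addr(c)), " bytes")
    println("separated: ", @elapsed(contend(c, d, n)), " s")
    return nothing
end

demo()
```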
It seems to me a 64-byte minimum allocation wouldn’t be too bad; most allocations are that large or larger anyway. My Ptr32 idea is really meant to optimize Strings, so don’t worry too much about smaller allocations.
Trees (and linked lists) need a minimum of two pointers, so 16 bytes, yes, but they usually carry something more; and aren’t B-trees becoming popular for in-RAM data structures too?
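For a rough sense of node sizes, something like this (purely illustrative, the type is my own):

```julia
# A binary tree node already needs left/right references (2 × 8 bytes on 64-bit)
# before any key, so it exceeds a 16-byte allocation even without the object header.
mutable struct TreeNode{K}
    key::K
    left::Union{TreeNode{K}, Nothing}
    right::Union{TreeNode{K}, Nothing}
end

println(sizeof(TreeNode{Int}))   # 24 bytes of fields on 64-bit, before the per-object tag
```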
What are the main arguments for or against larger (or smaller) minimum allocation sizes?
Even large allocations are only 16-byte aligned, at least with Libc’s malloc; couldn’t they at least be 64-byte (or more) aligned?
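If you want stricter alignment from libc today you can ccall posix_memalign directly (POSIX-only sketch; on Windows it would be _aligned_malloc, and aligned_malloc below is just my own helper name, not an existing API):

```julia
# Request memory with a given alignment straight from libc; free with Libc.free.
function aligned_malloc(sz::Integer, alignment::Integer = 64)
    out = Ref{Ptr{Cvoid}}(C_NULL)
    err = ccall(:posix_memalign, Cint, (Ptr{Ptr{Cvoid}}, Csize_t, Csize_t),
                out, alignment, sz)
    err == 0 || error("posix_memalign failed with error code $err")
    return out[]
end

p = aligned_malloc(1000, 64)
println("64-byte aligned: ", UInt(p) % 64 == 0)
Libc.free(p)
```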