Improved allocation design, with 4-byte pointers, and sometimes 5-byte in effect

Sorry, I didn’t mean “map a file in the filesystem”; I was referring to “issue a syscall to map memory”, which is typically mmap with an anonymous mapping, possibly followed by madvise.
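To make that concrete, here is one way to get such an anonymous mapping from within Julia itself; the Mmap stdlib’s type-plus-length form (no file argument) issues an anonymous mmap under the hood, and the 1 MiB size is just a placeholder for this sketch.

using Mmap

# Anonymous mapping: no file involved, the pages come straight from the kernel
# and are handed back as an ordinary Array.
buf = Mmap.mmap(Vector{UInt8}, 1 << 20)   # 1 MiB of zero-filled anonymous memory

# mmap hands out whole pages, so the base address is page aligned.
@show UInt(pointer(buf)) % 4096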

In the julia world, we just ask malloc for the backing memory of large Memory instances. malloc in turn asks the OS kernel.
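As a minimal sketch of that hand-off (assuming a glibc-style allocator): above the allocator’s mmap threshold, a single large malloc is typically served by a fresh anonymous mapping, so the returned pointer sits at a small, allocator-dependent offset from a page boundary.

# Ask libc directly for a large block, well above glibc's default mmap
# threshold, then look at where the pointer falls within its page.
p = Libc.malloc(1 << 24)     # 16 MiB
@show UInt(p) % 4096         # small offset reserved for the allocator's bookkeeping
Libc.free(p)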

This, btw, is why many users have such performance issues with the dTLB! Due to this architecture, we don’t get to control the fine details of the request and are at the mercy of e.g. glibc defaults that are not tuned to julia. It is extremely advisable to inform the kernel that userspace (aka julia) is too stupid to correctly convey madvise details about desired huge-pages, and to override, system-wide, userspace choices that are bad for julia performance (just don’t run large MySQL instances on the same machine). Cf. e.g. here.
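For the system-wide override, the usual knob on Linux is the transparent-hugepage policy in sysfs; a quick way to inspect it from the Julia REPL (the path is standard on kernels built with THP support):

# The bracketed entry is the active policy: "always" overrides userspace madvise
# choices, "madvise" trusts them, "never" disables THP entirely.
println(read("/sys/kernel/mm/transparent_hugepage/enabled", String))
# Changing it is done outside Julia, as root, e.g.:
#   echo always > /sys/kernel/mm/transparent_hugepage/enabled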

In the JVM world, they have to talk to the OS kernel relatively directly, because compressed object references together with the JVM array memory layout require very careful management of precious virtual address space. But they do set all the madvise options correctly, no system-wide kernel tuning required!

For large arrays, you can simply try out the alignment:
julia> @noinline function foo(n)
       a = Vector{Int}(undef, n)
       trailing_zeros(reinterpret(UInt64, pointer(a)))
       end
foo (generic function with 1 method)

julia> function foo2(n)
       mi = 64
       ma = 0
       for i=1:1000
       t=foo(n)
       mi = min(mi, t)
       ma = max(ma, t)
       end
       mi, ma
       end
foo2 (generic function with 1 method)

julia> begin
       @show foo2(1)
       @show foo2(1<<10)
       @show foo2(1<<16)
       @show foo2(1<<24)
       end
foo2(1) = (5, 13)
foo2(1 << 10) = (6, 15)
foo2(1 << 16) = (6, 12)
foo2(1 << 24) = (6, 6)

You see that large array allocations are always aligned to exactly 64 bytes (trailing_zeros of 6): the request goes to the OS kernel, gets page / hugepage aligned, and the initial 64 bytes are filled with metadata that allows free to clean up later.
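If you want to see the metadata offset itself rather than just the alignment, you can also check where the data pointer lands within its page; per the explanation above the 4 KiB offset should come out as 64 on a typical setup, but the exact value is allocator- and Julia-version dependent, so treat this as a probe rather than a guarantee. The 2 MiB check shows whether the mapping even starts on a hugepage boundary.

a = Vector{Int}(undef, 1 << 24)
@show UInt(pointer(a)) % 4096        # offset within a 4 KiB page
@show UInt(pointer(a)) % (1 << 21)   # offset within a 2 MiB hugepage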

Yes.

I think that is pretty much guaranteed by hardware, i.e. it would not be possible to write an OS that does it differently. The mapping between a file and physical memory is pure software, but the mapping between virtual addresses and physical addresses is done by the MMU, directly in silicon, and the layout of the page tables comes with certain alignment guarantees.
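To make those alignment guarantees concrete, here is a sketch of how a 4-level x86-64 page walk (4 KiB pages assumed) splits a virtual address: only bits 12-47 feed the table indices, and the low 12 bits are copied through unchanged, which is why a mapping can only be established in whole, page-aligned units.

# x86-64, 4-level paging, 4 KiB pages: the hardware page walker indexes four
# tables with 9 bits each and leaves the low 12 bits untranslated.
function pt_indices(vaddr::UInt64)
    (
        pml4   = (vaddr >> 39) & 0x1ff,   # bits 39-47: top-level index
        pdpt   = (vaddr >> 30) & 0x1ff,   # bits 30-38
        pd     = (vaddr >> 21) & 0x1ff,   # bits 21-29 (2 MiB hugepage granularity)
        pt     = (vaddr >> 12) & 0x1ff,   # bits 12-20 (4 KiB page granularity)
        offset = vaddr & 0xfff            # bits 0-11: pass straight into the physical address
    )
end

pt_indices(UInt64(0x00007f3a12345678))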

So if you wrote a kernel that allows you to map a file at an offset of e.g. 1 byte: sure, no problem for your first process, you just need to record that and handle it correctly on page fault. But then a different process maps the same file at an offset of zero and the two use it for IPC, and now you’re fucked, because the structure of the page table, as required by the in-silicon page walkers, simply has no space to encode an unaligned virtual-to-physical mapping. So even though userspace never sees physical addresses, this implies a constraint on the alignment / offset between two different mappings that alias, i.e. target the same physical memory.
