A major flaw with JSON is its inability to store non-UTF-8 strings, which inadvertently makes it unreliable for storing Unix filenames (here a filename is defined as any sequence of bytes except `NUL` and `/`, including invalid UTF-8).
[…]
Some prior art on this would be Rust’s raw string literal notation and qsn, which is a format based on those ideas. It is documented quite nicely here, along with good general coverage of this topic: http://www.oilshell.org/release/latest/doc/qsn.html
ZON strings are no different from Zig string literals in this context, so the answer is: yes, ZON strings store any sequence of bytes, agnostic to their interpretation. The literal in the file must itself be UTF-8 encoded, because Zig sources - and by extension ZON - are always UTF-8 encoded, but you can represent any byte sequence which is not valid UTF-8 by using simple escape sequences (the main relevant one here being `\xNN`).
shell> touch \xff
julia> touch("\xff") # The former didn't work, see:
-rw-rw-r-- 1 pharaldsson pharaldsson 0 feb 23 17:23 xff
-rw-rw-r-- 1 pharaldsson pharaldsson 0 feb 23 17:24 ''$'\377'
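To check what the Julia literal in the second command actually contains, here is a minimal sketch using only Base (the variable names are just illustrative). It inspects the raw code units of the string:

```julia
# "\xff" in a Julia string literal is a single byte, 0xff
# (octal 377, which is how ls displays it above).
name = "\xff"

bytes = collect(codeunits(name))   # the raw bytes stored in the String
@assert bytes == [0xff]
@assert 0o377 == 0xff              # same byte, written in octal

# The string is not valid UTF-8, yet Julia stores it without complaint.
@assert !isvalid(name)

# The first command produced a file named plain "xff", which is ordinary ASCII.
@assert isvalid("xff")
@assert ncodeunits("xff") == 3
```

So the two listed files differ in more than spelling: one name is three ASCII bytes, the other is a single byte that no UTF-8 decoder will accept.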
I can open this file, but let’s say I want to serialize the filename itself, or that Pkg has to handle such a name (is that even a possibility, if it is generated from within Julia? I suppose you could make such a package/module).
You can do `escape_string(filename)`, and it seems you always need to be careful, unless your code or the Julia API already does it. You can do it redundantly with no ill effect, only a bit of expansion; it’s just that you must then unescape twice too, so we the users need to know whether it is needed or not.
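A small sketch of that round trip, assuming only Base's `escape_string` and `unescape_string` (the variable names are illustrative). It shows that escaping once gives text that is safe to serialize, and that escaping twice is harmless but has to be undone twice:

```julia
name = "\xff"                       # filename with a byte that is not valid UTF-8

escaped = escape_string(name)       # here: the four ASCII characters \ x f f
@assert escaped == "\\xff"
@assert isvalid(escaped) && isascii(escaped)
@assert unescape_string(escaped) == name      # one unescape restores the bytes

# Escaping twice only grows the string, but then needs two unescapes.
twice = escape_string(escaped)
@assert twice == "\\\\xff"
@assert unescape_string(twice) == escaped
@assert unescape_string(unescape_string(twice)) == name
```

The catch is the one described above: the stored text does not record how many times it was escaped, so the reader has to know that out of band.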
Does it happen by default for [filenames] in JSON packages? Probably not; it would mean the package needs to know whether a string is potentially invalid UTF-8 (which is always the case for non-validated strings, and filenames can’t be validated). It seems to me that Pkg (using TOML) doesn’t escape (or only in one place), since it doesn’t deal with filenames. It’s unclear whether TOML does it for you (in other contexts). Python and C++ have good path/filesystem APIs, and there’s talk of something similar in Julia, rather than just naked strings, which should then likely handle escaping for you. Strictly speaking, the escaping is needed for all Julia strings… unless they are known to be validated UTF-8.
On Windows with UTF-16, are all even-length byte sequences also valid? And even odd-length ones, like the filename above?
I was looking into the ZON encoding and whether we should possibly support it (it’s not just for Zig or its package system):