Julia install issue (foreign name)

I installed Julia, and am trying to execute it by typing julia in Windows PowerShell. Then, the terminal shows

PS C:\Users\김시현> julia
ERROR: could not load library "C:\Users\김?�현\.julia\juliaup\julia-1.10.3+0.x64.w64.mingw32\lib\julia\sys.dll"
The specified module could not be found.

I think the error results from

김?�현
I don’t understand where this weird character comes from. How should I fix this?

Perhaps it is your user name on your computer?

I think that Julia indeed will not work if you install it in a home folder with non-ASCII letters or with spaces…
What you can do if you use bash:

export JULIA_DEPOT_PATH="/.julia"

if the folder /.julia exists and can be written as normal user…

You might need a different command for powershell…

The key message: create an environment variable with the name JULIA_DEPOT_PATH and set it to a path that exists, is writable, and contains only ASCII characters…

Then everything should work fine…
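
To make the three conditions above concrete (exists, writable, ASCII-only), here is a minimal Python sketch that checks a candidate directory before you point JULIA_DEPOT_PATH at it. The function names and the example paths are hypothetical and only for illustration; Julia itself does not ship such a check.

```python
import os


def has_only_ascii(path: str) -> bool:
    """Return True if every character in the path is plain ASCII."""
    return path.isascii()


def depot_path_problems(path: str) -> list[str]:
    """List reasons why `path` may be a poor choice for JULIA_DEPOT_PATH."""
    problems = []
    if not has_only_ascii(path):
        problems.append("contains non-ASCII characters")
    if not os.path.isdir(path):
        problems.append("directory does not exist")
    elif not os.access(path, os.W_OK):
        problems.append("directory is not writable")
    return problems


if __name__ == "__main__":
    # Hypothetical candidates: a user folder with Hangul and an ASCII-only folder.
    for candidate in (r"C:\Users\김시현\.julia", r"C:\julia_depot"):
        print(candidate, "->", depot_path_problems(candidate) or "looks fine")
```

An empty list means the directory passed all three checks; it does not guarantee that Julia will start.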

This looks like a bug in either julia or juliaup on Windows. Do the 64 bit “portable” or “installer” manual downloads from Download Julia work?

You may also want to change the environment variable JULIAUP_DEPOT_PATH:

Indeed, we’ve seen many reports of users having trouble loading libraries from paths that include non-ASCII characters:

  • Failed process 'gdk-pixbuf-query-loaders.exe' during load on Windows (Issue #461, JuliaGraphics/Gtk.jl)
  • (julia.exe:7396): Gtk-WARNING **: 21:07:31.246: Could not load a pixbuf from icon theme. (Issue #497, JuliaGraphics/Gtk.jl)
  • julia.exe starting error with Korean path-name. (Issue #33486, JuliaLang/julia)

This may have been fixed by “update libuv to v2-1.48.0” by vtjnash (Pull Request #49937, JuliaLang/julia), which is in Julia v1.11, although I’m still not 100% sure libuv is related to this specific issue. I guess some Windows users will have to confirm that.

An update to libuv (PR #49937) is included in v1.11-beta.
I tested it in a clean Windows sandbox:

It seems to work fine.


“你好” means “hello” in Chinese.

Did you experience issues in the same configuration with Julia < 1.11 though?

The issue was that Windows paths use invalid UTF-16 to encode Korean names. Libuv uses UTF-8. Until recently libuv insisted on strict UTF-8 which means that when Windows gives it invalid UTF-16 (unpaired surrogates), it simply cannot handle it. There is, however, a thing called WTF-8, which is just UTF-8 allowing surrogate code points (they’re fine from UTF-8’s perspective, just technically disallowed), which allows handling arbitrary invalid UTF-16. @jameson and I finally got libuv to use WTF-8, so this is expected to be fixed now.
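
The difference between strict UTF-8 and WTF-8 can be sketched in a few lines of Python. This is only an illustration, not libuv’s implementation: Python’s `surrogatepass` error handler is used as a stand-in, and it produces the same bytes as WTF-8 for a lone surrogate.

```python
# An unpaired high surrogate, as can appear in an ill-formed UTF-16 file name.
lone_surrogate = "\ud800"

# Strict UTF-8 refuses to encode a surrogate code point.
try:
    lone_surrogate.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes it with the ordinary 3-byte UTF-8 bit pattern,
# which is the byte sequence WTF-8 uses for a lone surrogate.
wtf8_bytes = lone_surrogate.encode("utf-8", errors="surrogatepass")

# The round trip gives back the original code point, so nothing is lost.
round_trip = wtf8_bytes.decode("utf-8", errors="surrogatepass")

print(strict_ok)                     # False
print(wtf8_bytes.hex())              # eda080
print(round_trip == lone_surrogate)  # True
```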

Bad news: after more careful testing, this issue still exists on v1.11-beta1.

The problem with my last test: my main system language is set to Chinese and the code page is 936; in the sandbox the language was set to English, but the code page was also 936.

cp936 seems to maintain compatibility with UTF-8, so no invalid UTF-16 is generated.

I then set the global language of the sandbox to Korean, which changes the default code page to cp949.

I created a new test folder “테스트” again. Neither v1.10.3 nor v1.11.0-beta1 can load sys.dll from it.
When the path contains only ASCII characters, everything works fine.


The problem seems to be in the call to MultiByteToWideChar, where it is assumed that the input string is encoded in UTF-8, while the actual incoming encoding is affected by the system language and code page settings.
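
The kind of mismatch described here can be modelled in Python, assuming the path bytes arrive in code page 949 and are then decoded as if they were UTF-8. This is a sketch of the encoding mismatch only; it does not call the Win32 MultiByteToWideChar function and is not Julia’s loader code.

```python
name = "테스트"  # the test folder name used in the sandbox experiment

# Bytes as a Korean-locale (code page 949) process would hand them over.
cp949_bytes = name.encode("cp949")

# Wrong assumption: treat those bytes as UTF-8. Invalid sequences become
# U+FFFD, and some byte pairs happen to decode to unrelated characters.
misread = cp949_bytes.decode("utf-8", errors="replace")

# Right assumption: decode with the code page that produced the bytes.
correct = cp949_bytes.decode("cp949")

print(repr(misread))
print(correct == name)
```

The misread string no longer names the folder that exists on disk, which is consistent with a "module could not be found" error.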

I tested this on Julia 1.11 and got the same error. I was able to fix the issue and run Julia on a test user after enabling “Beta: Use Unicode UTF-8 for worldwide language support” in Time & Language → Language → “Administrative Language Settings”, and rebooting.

That makes more sense to me, thanks. What’s failing is basically our implementation of dlopen, which explains the issues I linked above (they’re about loading libraries), and has nothing to do with libuv, which explains why the mentioned PR didn’t solve this specific issue (as I expected).

Are you able to open an issue describing this or make a PR to address it? It would be great if this just worked now that libuv supports it. Of course, I also think that everyone should just use UTF-8 for stuff like this, so I don’t see the “workaround” as a workaround really; it’s the way things should be done.

For anyone who has the same problem, I recommend simply reinstalling Windows and choosing an English user name; that would be faster than searching the internet for a solution and figuring it out yourself.

On Windows, Julia does not work well when it is installed to a path that contains Korean characters (for historical reasons, such paths are not encoded as UTF-8).

Installing other programming languages (such as Python, R, or MATLAB) runs into the same problem.

I believe this is no longer an “upstream” issue (in libuv). If anyone wants to fix this open issue in Julia, please make sure the fix introduces no security issue, or fix that as well; see below.

That seems like overkill, but if you do reinstall, choose a good code page: either UTF-8 or GB_18030, which encodes most Chinese characters in 2 bytes (versus 3 in UTF-8) and also supports all languages. Just changing the path (and all file names used by Julia) to English, or rather ASCII, should be enough to start Julia and get everything to work, without a reinstall. I was assuming the Korean EUC-KR encoding is in use; many encodings, including e.g. cp936, have an ASCII subset, and if you use only that subset I believe you’re safe. If you DO reinstall and choose an appropriate encoding, e.g. UTF-8, then you should get away with any path or file name, not just English/ASCII, though likely not spaces in many contexts… (good software should handle those too, though).

You may want to avoid e.g. cp936 for security reasons, at least for PHP. I would like to know whether the security issue applies to Julia too, either for certain or at least potentially, and whether it also applies to GB_18030, which is seemingly a superset of cp936.

If you do not want to rename, e.g. your user folder to ASCII, then maybe keeping it and making an ASCII symlink to it for Julia would help:

There are only three Unicode encodings, of which IMHO only one or two are good: UTF-8, and the other one is this (not UTF-16):

So use UTF-8 (known on Windows as code page 65001, or CP_UTF8 in source code).

Or use GB_18030, which is also good for the web. UTF-16 is no good for the web since it’s not ASCII compatible. Other fairly important encodings, none of them Unicode UTFs and all far less popular on the web than UTF-8, are Windows-1251 for Russian/Cyrillic, Unified Hangul Code/CP949/EUC-KR, EUC-JP, and the even less used Shift_JIS, whose use seems to be dropping sharply (though both it and UTF-8 are supported by QR codes).

Are you using (most likely): Code page 936 (Microsoft Windows) - Wikipedia

Windows code page 936 (abbreviated MS936, Windows-936 or (ambiguously) CP936)

or (likely not, and maybe the info below also confuses the two): Code page 936 (IBM) - Wikipedia

Exploit PoC is public.

Details: Security Alert: CVE-2024-4577 - PHP CGI Argument Injection Vulnerability | DEVCORE

Note currently verified as exploitable on installation with the following locales:

  • Traditional Chinese (Code Page 950)
  • Simplified Chinese (Code Page 936)
  • Japanese (Code Page 932)

When “Best Fit” isn’t

“A nasty bug with a very simple exploit—perfect for a Friday afternoon,” researchers with security firm WatchTowr wrote.

CVE-2024-4577, as the vulnerability is tracked, stems from errors in the way PHP converts unicode characters into ASCII. A feature built into Windows known as Best Fit allows attackers to use a technique known as argument injection to pass user-supplied input into commands executed by an application, in this case, PHP. Exploits allow attackers to bypass CVE-2012-1823, a critical code execution vulnerability patched in PHP in 2012.

IMPACT:

EXPLOITATION MECHANISM:

  • The Windows feature “Best Fit” converts 0xAD to 0x2D.
  • Malicious code can be passed through “php://input” and executed using the “auto_prepend_file” option to call “include_path.”
  • We have also seen the “auto_append_file” option.

EXPLOITATION IN THE WILD:

  • Since June 6th, our telemetry has revealed numerous exploitation attempts against this vulnerability.

SOLUTION:

  • Upgrade PHP on any vulnerable hosts to the latest version.
  • To mitigate this vulnerability, we recommend our customers install the latest content updates for Palo Alto Networks Advanced Threat Protection (ATP).

@essenciary does Genie.jl have anything like auto_prepend_file or auto_append_file? Do you think this is very PHP specific, not an issue for your package or any Julia-related or other languages?

About the 2024 security issue for PHP (now patched), affecting at least cp936: it may or may not apply to Julia too (probably not, since it seems to rely on an obscure PHP feature: Using "auto_prepend_file" into ".user.ini" file in PHP - Stack Overflow).

I.e. Best Fit converts the soft hyphen (0x00ad, also called syllable hyphen or optional hyphen, a usually invisible character) to the visible hyphen-minus (0x002d). I can see that being insecure. The mapping is listed here:

https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit936.txt

0x00ad 0x002d ;-

note also there:

WCTABLE 24482
[…]
0xad 0x4f03 ;伃

and

0x2d 0x002d ;-

Info on this here: Index of /Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit

0x0063 0x63 ;Latin Small Letter C

0x221e 0x38 ;Infinity << Best Fit Mapping

0xff41 0x61 ;Fullwidth Latin Small Letter A << Best Fit Mapping

It seems to me that “fullwidth forms for legacy CJK font compatibility” like

julia> Char(0xff41)
'ａ': Unicode U+FF41 (category Ll: Letter, lowercase)

will be converted to:

julia> Char(0x61)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

It seems intentional, but maybe not always desired. Though I think it’s probably not a security issue. More like you can’t have different files with only this change in the filename or path. Just as Windows file system is case insensitive.
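
Both effects discussed above (the soft hyphen turning into a visible hyphen-minus, and two names collapsing into one) can be imitated with a toy mapping in Python. The table below contains only two entries quoted in this thread and the function name is made up for illustration; it is not the Windows conversion routine, which Python’s codecs do not reproduce.

```python
# Best Fit entries quoted in this thread (Unicode code point -> byte).
# This tiny table is illustrative only, not the full Windows mapping.
BEST_FIT = {
    0x00AD: 0x2D,  # soft hyphen -> hyphen-minus (from bestfit936.txt)
    0xFF41: 0x61,  # fullwidth Latin small letter a -> ASCII a (sample entry)
}


def best_fit(text: str) -> str:
    """Apply the quoted entries; leave every other character untouched."""
    return "".join(chr(BEST_FIT.get(ord(ch), ord(ch))) for ch in text)


# A soft hyphen in front of "d" turns into something that looks like a flag.
print(best_fit("\u00add"))  # -d

# Two file names that differ only by a fullwidth "a" collapse to the same name.
print(best_fit("d\uff41ta.txt") == best_fit("data.txt"))  # True
```

The first case is the shape of the PHP argument injection; the second is the file name collision mentioned above.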