Impact of Spectre on Julia?


#1

Just wondering what the impact of the Spectre vulnerability will be on Julia. It looks like LLVM will introduce a mitigation that basically disables the use of indirect jump instructions, which has the potential to negate most of the performance benefit of branch prediction. More qualified analysis can be found in this Hacker News comment thread.

What I’m curious about:

  • Do Julia devs think that such a change should be incorporated into Julia? Or could any security impact on Julia code be ruled out?
  • If such a change is necessary, it’s probably too late for Julia 1.0. Could it be done in a non-breaking way during the 1.x development line?
  • If I understand correctly, the performance hit could be rather big for code that is heavy on virtual function calls. But “good” Julia code should then not be affected so much, right? After all, having more information available at compile time and aggressive inlining should reduce the need for virtual function calls.

Thanks for your insights!


#2

Not a julia core dev, but I spent the last day reading the papers and happily nerding out about this beautiful new vulnerability class.

So, first (to clarify for everyone, even though you already know this): Meltdown is irrelevant for mitigations in julia/llvm (but optimization passes might slightly change, in order to reflect the increased cost of syscalls on Intel). We are only talking about spectre here.

Now, in order for spectre to be dangerous to julia programs, you need the following three conditions:

  1. You handle secret data.
  2. You share a core with an attacker.
  3. The attacker knows your memory layout.

I would argue that the combination of (1) and (2) is already rare enough that any costly mitigations should probably be command-line or even compile-time flag only.

Note that malicious javascript is an issue, but is expected to be mitigated by browsers. Also note that malicious software in other guests (if you compute in the cloud) should be mitigated by the cloud provider, but you will need to wait for guidance from the cloud providers on this.

I also recommend the post describing the retpoline: https://support.google.com/faqs/answer/7625886
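For reference, the retpoline replaces an indirect jump with a construct that traps any speculative execution in a harmless infinite loop while the real target is pushed onto the stack. A rough sketch of the pattern from that post (x86-64, AT&T syntax, indirect-jump target assumed to be in %r11):

```
	call	set_up_target       # push address of capture_spec, jump below
capture_spec:
	pause                       # speculation lands here and spins harmlessly
	lfence
	jmp	capture_spec
set_up_target:
	mov	%r11, (%rsp)        # overwrite the return address with the real target
	ret                         # "return" to the intended target
```

The CPU's return predictor speculates into capture_spec, which leaks nothing; the architectural control flow still ends up at the intended target.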

Any other opinions?

Does anyone here use julia in settings where (1) and (2) are fulfilled?


#3

From what I understand, C++ is very vulnerable because of the indirect branches used for virtual tables.
Julia, on the other hand (in any performant code where you’ve been careful about type stability, at least) isn’t doing any indirect branches of that sort.
If I am correct, it will be interesting to benchmark Julia code against other OO languages that use techniques like C++’s, once those have been recompiled to mitigate these new vulnerabilities. Whereas before you might have been happy to get parity with C++, you might now find that your Julia code is relatively faster than C++ :smile:


#4

Ah, but julia does emit a lot of indirect jumps (callq) even if the target should be known. Simple example on 0.6:

f(V,i)=V[i]

X=rand(4)
@code_native f(X,2)

	.text
	leaq	-1(%rsi), %rax
	cmpq	24(%rdi), %rax
	jae	L20
	movq	(%rdi), %rax
	vmovsd	-8(%rax,%rsi,8), %xmm0  # xmm0 = mem[0],zero
	retq
L20:
	pushq	%rbp
	movq	%rsp, %rbp
	movq	%rsp, %rcx
	leaq	-16(%rcx), %rax
	movq	%rax, %rsp
	movq	%rsi, -16(%rcx)
	movabsq	$jl_bounds_error_ints, %rcx
	movl	$1, %edx
	movq	%rax, %rsi
	callq	*%rcx
	nopw	(%rax,%rax)

This code is vulnerable to spectre. I can mistrain the branch predictor to assume that the bound-check fails (jae) and mistrain the branch-target-predictor to jump (callq) to my favorite gadget.
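To make the mistraining half of that concrete, here is a hypothetical attacker-side sketch (illustration only: it does not actually leak anything, it just sets up the mispredicted branch; the function and values are the ones from the example above):

```julia
f(V, i) = V[i]
X = rand(4)

# Train the bounds-check branch (the jae) to predict "in bounds":
for _ in 1:1000
    f(X, 2)
end

# Now pass an out-of-bounds index. Architecturally this raises a
# BoundsError, but the CPU may already have speculated past the check
# (and, in the chained variant, through the mistrained callq) before
# the misprediction is rolled back.
try
    f(X, 1_000_000)
catch err
    @assert err isa BoundsError
end
```

The leak itself would then come from measuring cache side effects left behind by the rolled-back speculative execution, which is out of scope for this sketch.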

Edit: Technically, this chains attacks 1 and 2 in the google terminology.

Edit 2: Maybe these indirect calls are worth fixing anyway; that should even improve performance (in a mostly negligible way, mostly by reducing code size and speeding up exceptions). There is no reason for codegen not to emit a direct call (we know which exception we want to raise).


#5

We’re aware of the details of the attack. I think it’s rare to be using julia in situations where this will cause problems (no operating systems or web browsers in julia at the moment). We also don’t tend to emit indirect branches in code emitted by the static compiler. Since the indirect branch mitigation is in LLVM, we can of course import it at any time. Importantly though, it’d have to be applied across both julia and its binary dependencies to be effective. I think the best thing to do is to let this shake out a bit. My hunch is that the chicken bits Intel is flipping with their latest microcode patches might be a better mitigation for people using julia in scenarios where such an attack is a concern.


#6

Could you explain why Julia is emitting lots of indirect branches? I wouldn’t have expected that. At least I don’t usually see it when I look at my own code with @code_native, but maybe that’s because of the low-level nature of most of the code I write, where all types are known at compile time.


#7

“Chicken bits” is the new henchmen unrolling.


#8

Could you explain why Julia is emitting lots of indirect branches

Case statements (chained if statements) can also trigger emission of this code pattern, and it is a common compiler optimization (indirect branches, a.k.a. jump tables or vtables, are sometimes both the fastest way to structure the code and the most flexible). Modern CPUs have also gotten really good at executing them, so there’s often little or no penalty for emitting them, but it can make code emission substantially easier (and/or may be required by the platform ABI).
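As a hypothetical illustration, a dense chain of integer comparisons is exactly the kind of code a compiler may lower to a jump table, i.e. one indirect jump through a table of targets (whether LLVM actually does so for this snippet depends on the LLVM version and target):

```julia
# A cascade of equality tests over a dense range of keys; compilers
# often replace such cascades with a table of jump targets plus a
# single indirect jump.
function spell(x::Int)
    if x == 1
        "one"
    elseif x == 2
        "two"
    elseif x == 3
        "three"
    elseif x == 4
        "four"
    else
        "other"
    end
end

# Inspect the generated code to see what your LLVM decided:
# @code_native spell(3)
```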

Now, in order for spectre to be dangerous to julia programs, you need the following three conditions:

I believe (2) isn’t necessary. You don’t need to be on the same core, but you do need your attacker to be running in the same process (so same address space, not same core). I’m not sure about (3). It seems like it may not be sufficient in some cases (the attacker also needs to be able to control values in memory), but it seems like it may also not be necessary (starting with zero knowledge of the layout, it may be possible to probe, albeit perhaps much more slowly at the start).

“Chicken bits” is the new henchmen unrolling.

Chicken bit is an existing term (see wiktionary)


#9

I can’t tell you why julia was made this way, but this appears to be the mechanism for raising exceptions, e.g. in boundschecks. I’d call that “lots”. Funny that you are so diligent about @inbounds that you never encountered those.

But as Keno said, spectre does not look too problematic for most julia uses. And also, yes, codegen, the runtime and all other linked libs would need to be checked, and it would still be an incomplete mitigation: even if you secure the Branch Target Buffer, possibly using shiny new microcode updates, a chain of mispredicted branches can still get you into leaking states, and finding all such possible leaking execution paths looks quite NP-complete to me. And AFAIK the currently known microcode updates do nothing about ordinary branch prediction, but correct me if I’m wrong.

Edit: Ok, that’s just the way ccall is done. So, not just exceptions.


#10

Spectre needs you to share the branch predictor state which is AFAIK shared per-core. Different thread in same process, but pinned to different core should be secure; separate process on the same core should be vulnerable, even if in a different VM (though the published POCs for spectre were in the same process, as javascript reading browser memory).


#11

The indirect branch pattern you mentioned is only there in jitted code, because that’s the platform ABI for the large code model (there’s no call instruction that takes a 64 bit immediate). It’s not present in code we statically compile, because we don’t use the large code model there. (For the jit, we don’t control the memory placement at the moment, so we need to be able to handle code at arbitrary positions in the address space).
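For context, the difference boils down to instruction encodings; a sketch (AT&T syntax, symbolic addresses, not taken from actual julia output):

```
	# small code model: the callee is within ±2 GB of the call site,
	# so a direct, statically-predicted call encoding exists
	callq	jl_bounds_error_ints

	# large code model: the callee may live anywhere in the 64-bit
	# address space, so the address must first be materialized in a
	# register, and the call becomes indirect
	movabsq	$jl_bounds_error_ints, %rcx
	callq	*%rcx
```

The second form is the movabsq/callq pair visible in the @code_native dump earlier in this thread.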


#12

Thanks for the explanation, interesting! I had always mildly wondered about the way these calls look in @code_native.


#13

What would be involved in controlling the memory placement?
One thing that has concerned me is that (from what I was told, but this was ages ago, and Julia development moves quite fast) generated code itself is not garbage collected.

One thing that would be of interest to me is being able to have a shared memory segment, where code for methods could be placed, and then shared between all Julia processes, and done so that if no process is using a particular chunk anymore, it can be freed up.

(You may think that seems kind of strange, but it’s the architecture of the language I worked on before: the object code (for a virtual machine) of compiled routines (each a collection of functions) is stored in a persistent database and loaded as needed, into buffers of 2K–64K in shared memory, which can get reused.)


#14

Whatever triggers this: instead of disallowing the code generation in general, would it be sufficient to stop it only in security-sensitive places, i.e. where you deal with passwords? [What’s the slowdown for disallowing it in general, assuming it’s not needed…]

Could a macro be made to stop the code pattern then? [There’s already a switch / case macro in a package.]

[I also recall a discussion on special handling for strings that are meant for passwords.]


#15

Sufficient for what? There’s no reason to arbitrarily slow down security-sensitive code. (Spectre is an attack against non-security-sensitive code that forces it to reveal secrets that the code doesn’t have access to).


#16

I ran the julia benchmark suite on my computer (an i7-5600U) with kernel 4.4.0-103 without any microcode updates, and again with all the latest kernel and microcode updates. The saved results are 250 MB each, so a little too large to upload here or to compare by just going through the tests one by one.
Are there any scripts available that give a short summary when comparing two of these files?


#17

If you’re on macOS, Linux, or another *nix, what happens if you use ‘diff’ to compare the files? I.e. are they the same?


#18

diff is not really useful, because (1) the files do not have line breaks and (2) timings are not going to be identical.
(1) could be solved by some kind of JSON auto-formatter; for (2) you have to apply some actual statistics, but I would have to dig into the actual structure of the JSON, which I was hoping to avoid.
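For point (1), a minimal sketch (assuming the JSON.jl package is installed; `prettify` is just a throwaway helper name):

```julia
using JSON  # assumes the JSON.jl package

# Re-indent a single-line JSON file so that line-oriented tools
# like `diff` have something to work with.
function prettify(infile, outfile)
    data = JSON.parsefile(infile)
    open(outfile, "w") do io
        JSON.print(io, data, 2)  # pretty-print with 2-space indent
    end
end
```

This only solves the formatting half, of course; the timings would still differ on every line.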


#19

You may have seen this already, but some instructions on processing the results locally are here: https://github.com/JuliaCI/BaseBenchmarks.jl#recipe-for-testing-a-julia-pr-locally


#20

The benchmarks were generated with the following script on Ubuntu 16.04 and julia 0.7.0-DEV.3394. I did not touch the computer during the benchmarks, but I also did not reserve processors; the BLAS libraries used two cores:

using BenchmarkTools, BaseBenchmarks
BaseBenchmarks.loadall!() # load all benchmarks
results = run(BaseBenchmarks.SUITE; verbose = true) # run all benchmarks
BenchmarkTools.save("filename.json", results) # save results to JSON file

Please correct, if the following is wrong:

#                                      kernel               microcode
before = BenchmarkTools.load("i7_5600U_4.4.0-103_2017-12-04_0x25_2017-01-27.json")
after  = BenchmarkTools.load("i7_5600U_4.4.0-109_2018-01-09_0x28_2017-11-17.json")

regs = regressions(judge(minimum(before[1]), minimum(after[1])))
ppss = leaves(regs)

[ppss[i][2].ratio.time for i in eachindex(ppss)] |> mean

1.3734714397941863                                      

gives a 37% average regression, which is way more than in any official benchmark.