EXCEPTION_ACCESS_VIOLATION on Windows but not MacOS

I am running the exact same code on Windows and on a Mac, with Julia 1.11.4 installed on both. I have to assume this error message relates to an out-of-bounds request. I’ll paste the huge error message below. I can share the code, but it’s a lot of code. The error relates to the use of the @view macro in 2 places that you can see in the code. Once again, it’s surprising that this works on Mac and not on Windows. Are the underlying Julia code or the underlying linear algebra libraries really that different between the two implementations?

My purpose for getting a Windows machine is to have a machine with a high-end NVIDIA graphics card for working with CUDA from both Julia and Python (PyTorch). Too bad Windows is so often flawed relative to macOS, despite offering a wider variety of hardware.

After this initial inquiry, perhaps someone on the team can point me to a way to debug and narrow down the problem that would be more helpful for diagnosing it.

In the code below (too bad we can’t paste line numbers) the error occurs on this line, which is at the end of the innermost loop body:

                    layer.grad_weight[fi, fj, ic, oc] += sum(local_patch .* err)

Here is the complete function:

function compute_grad_weight!(layer, n_samples)
    H_out, W_out, _, batch_size = size(layer.eps_l)
    f_h, f_w, _, _ = size(layer.grad_weight)
    # @assert f_h == 3 && f_w == 3  # given 3x3 filters (for clarity)

    # Initialize grad_weight to zero
    fill!(layer.grad_weight, 0.0) # no allocations; faster than assignment

    # Use @views to avoid copying subarrays
    @inbounds for oc in axes(layer.eps_l, 3)      # 1:out_channels
        # View of the error for this output channel (all spatial positions, all batches)
        err = @view layer.eps_l[:, :, oc, :]      # size H_out × W_out × batch_size
        for ic in axes(layer.a_below, 3)          # 1:in_channels
            # View of the input activation for this channel
            # (We'll slide this view for each filter offset)
            input_chan = @view layer.a_below[:, :, ic, :]   # size H_in × W_in × batch_size
            for fj in axes(layer.weight,2)
                for fi in axes(layer.weight,1)
                    # Extract the overlapping region of input corresponding to eps_l[:, :, oc, :]
                    local_patch = @view input_chan[fi:fi+H_out-1, fj:fj+W_out-1, :]
                    # Accumulate gradient for weight at (fi,fj, ic, oc)
                    layer.grad_weight[fi, fj, ic, oc] += sum(local_patch .* err)
                end
            end
        end
    end

    # Average over batch (divide by batch_size)
    layer.grad_weight .*= (1 / n_samples)
    return   # nothing
end

Here is the voluminous error message:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.       
Exception: EXCEPTION_ACCESS_VIOLATION at 0x2172d2a3166 -- getindex at .\essentials.jl:917 [inlined]
getindex at .\array.jl:930 [inlined]
getindex at .\subarray.jl:320 [inlined]
_getindex at .\abstractarray.jl:1358 [inlined]
getindex at .\abstractarray.jl:1312 [inlined]
_broadcast_getindex at .\broadcast.jl:644 [inlined]
_getindex at .\broadcast.jl:674 [inlined]
_broadcast_getindex at .\broadcast.jl:650 [inlined]
getindex at .\broadcast.jl:610 [inlined]
macro expansion at .\broadcast.jl:973 [inlined]
macro expansion at .\simdloop.jl:77 [inlined]
copyto! at .\broadcast.jl:972 [inlined]
copyto! at .\broadcast.jl:925 [inlined]
copy at .\broadcast.jl:897 [inlined]
materialize at .\broadcast.jl:872 [inlined]
compute_grad_weight! at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:597
in expression starting at REPL[10]:1
getindex at .\essentials.jl:917 [inlined]
getindex at .\array.jl:930 [inlined]
getindex at .\subarray.jl:320 [inlined]
_getindex at .\abstractarray.jl:1358 [inlined]
getindex at .\abstractarray.jl:1312 [inlined]
_broadcast_getindex at .\broadcast.jl:644 [inlined]
_getindex at .\broadcast.jl:674 [inlined]
_broadcast_getindex at .\broadcast.jl:650 [inlined]
getindex at .\broadcast.jl:610 [inlined]
macro expansion at .\broadcast.jl:973 [inlined]
macro expansion at .\simdloop.jl:77 [inlined]
copyto! at .\broadcast.jl:972 [inlined]
copyto! at .\broadcast.jl:925 [inlined]
copy at .\broadcast.jl:897 [inlined]
materialize at .\broadcast.jl:872 [inlined]
compute_grad_weight! at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:597
layer_backward! at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:566
unknown function (ip: 000002172d2aa8b7)
backprop! at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:801
#train_loop!#17 at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:864
train_loop! at C:\Users\lewis\code\Convolution\chatgpt_conv_code\src\sample_code.jl:821
unknown function (ip: 0000021703b9be70)
jl_apply at C:/workdir/src\julia.h:2157 [inlined]
do_call at C:/workdir/src\interpreter.c:126
eval_value at C:/workdir/src\interpreter.c:223
eval_stmt_value at C:/workdir/src\interpreter.c:174 [inlined]
eval_body at C:/workdir/src\interpreter.c:684
jl_interpret_toplevel_thunk at C:/workdir/src\interpreter.c:824
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:943
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:886
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:886
ijl_toplevel_eval at C:/workdir/src\toplevel.c:952 [inlined]
ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:994
eval at .\boot.jl:430 [inlined]
eval_user_input at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:245
repl_backend_loop at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:342
#start_repl_backend#59 at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:327
start_repl_backend at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:324
#run_repl#72 at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:483
run_repl at C:\workdir\usr\share\julia\stdlib\v1.11\REPL\src\REPL.jl:469
jfptr_run_repl_10360.1 at C:\Users\lewis\AppData\Local\Programs\Julia-1.11.4\share\julia\compiled\v1.11\REPL\u0gqU_hz07T.dll (unknown line)
#1150 at .\client.jl:446
jfptr_YY.1150_15097.1 at C:\Users\lewis\AppData\Local\Programs\Julia-1.11.4\share\julia\compiled\v1.11\REPL\u0gqU_hz07T.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:2157 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:875
#invokelatest#2 at .\essentials.jl:1055 [inlined]
invokelatest at .\essentials.jl:1052 [inlined]
run_main_repl at .\client.jl:430
repl_main at .\client.jl:567 [inlined]
_start at .\client.jl:541
jfptr__start_75324.1 at C:\Users\lewis\AppData\Local\Programs\Julia-1.11.4\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:2157 [inlined]
true_main at C:/workdir/src\jlapi.c:900
jl_repl_entrypoint at C:/workdir/src\jlapi.c:1059
mainCRTStartup at C:/workdir/cli\loader_exe.c:58
BaseThreadInitThunk at C:\Windows\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
Allocations: 39319545 (Pool: 39318320; Big: 1225); GC: 63

Well, since your code claims to always be in bounds, an out-of-bounds access will naturally produce a segmentation fault. Remove the @inbounds and you should get a better error message, or run with --check-bounds=yes.
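
For example, from a shell (the script name here is just illustrative):

    julia --check-bounds=yes my_script.jl

With that flag, every array access is bounds-checked, even inside @inbounds blocks, so the failing index is reported as an ordinary BoundsError instead of a crash.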

I suspect it has to do with your indexing - you iterate over the third axis, but index with that into the fourth dimension of layer.grad_weight.


An unrelated comment: since you’re concerned about unnecessary copying, this construction does an unnecessary allocation for the result of local_patch .* err. You can avoid it by doing mapreduce(splat(*), +, zip(local_patch, err)) instead, or possibly dot(local_patch, err).
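
For reference, a minimal sketch of the three variants side by side (the array sizes are made up just so the snippet runs; in the function above, local_patch and err are equally-sized 3-d views):

using LinearAlgebra   # for dot

local_patch = rand(26, 26, 32)   # stand-in for the input view
err         = rand(26, 26, 32)   # stand-in for the error view

sum(local_patch .* err)                        # allocates a full temporary array
mapreduce(splat(*), +, zip(local_patch, err))  # pairwise reduce, no temporary
dot(local_patch, err)                          # inner product over all elements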

Of course. I’ll see if I get a better message.

But, I’ll point out that this code was developed and tested using Julia on a Mac.

The problem is in the Julia runtime for Windows. I don’t expect to be able to fix this.

Thanks. Great suggestions. I’ve been trying to purge the allocations and still miss obvious alternatives.

If you run this code without that @inbounds on your Mac, I’d expect that you’d get a BoundsError just the same. Windows & Mac don’t necessarily react the same when it comes to accessing data out of bounds. They are different operating systems with possibly different page sizes & thus different access bounds before the OS itself errors, which is what EXCEPTION_ACCESS_VIOLATION is communicating. Just because you don’t get an exception on one OS doesn’t mean that the code itself is correct.

This difference in behavior is not generally due to an implementation detail in Julia itself.

That is quite a strong claim, and a minimal example of that would certainly be appreciated!


Fair enough.

The code provided runs on the Mac without segfaulting and with correct answers. If macOS were allowing an out-of-bounds memory access (bad, bad), it would be a remarkable coincidence that the answers were always correct. Removing @inbounds in the Mac code would not fix anything, but it would enable Julia to catch the out-of-bounds condition, terminate the running code gracefully, and report the error without abend’ing (ancient terminology). Will try it and report back.

I was not trying to be provocative. The code provided is the example. Hard to break it down, I realize.


You were right! Much more useful message because it shows the array indices being attempted, which points to the problem directly.

Also, the suggested mapreduce(…) and dot variants did run without a problem, but didn’t produce convergence when training the model. So what the Mac code was doing was strange indeed: implicit zero padding? Not literally, but “helpfully” returning zeros? Who knows.

Back to the drawing board a little. A good discovery given how the error had been masked.

Thanks, all.

I am replying by email. I can go into Discourse and mark this closed, or go ahead and really fix the code and then report back.


The insidious thing about out of bounds accesses is that not all of them will always produce a segmentation fault, even on Mac or Linux - or Windows!

Under the hood, an OS chunks the available memory into what are called pages: blocks of memory usually a few kilobytes in size. When you read memory, the OS checks whether the page that memory is on is physically present in RAM and loads it if necessary (this is partly how processes are isolated from one another, as well as how you can allocate more memory than your physical RAM). A segmentation fault (or EXCEPTION_ACCESS_VIOLATION on Windows) happens when you try to access memory that isn’t allocated to your program, but the OS can only check that at the granularity of a single page! If that check happened on every memory access, everything would be terribly slow. Because the sizes of those memory pages differ between OSes, faulty code can produce correct results on some OS: the memory there may be (coincidentally!) zeroed, or the access may not touch a new page, masking the problem.

You can get faulty results even without a segmentation fault if the memory previously contained other data; the memory may, after all, be reused from a previous allocation without being zeroed in between. It really is just a coincidence that the code worked on the Mac; it might not work on a different machine!
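
A tiny illustration of that nondeterminism (deliberately incorrect code; the result is undefined and varies by machine and OS):

f(v, i) = @inbounds v[i]   # promises the index is in bounds, so the check is elided

a = rand(10)
f(a, 11)   # out of bounds: may return stale garbage, may segfault, may “work”

Reading one element past a small array usually stays on the same heap page, so it often returns whatever bytes happen to be there instead of faulting.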

Apologies for the snarky reply, I was just really sure that the @inbounds was masking the actual problem :) I hope you can find the real bug!

Fixed.

“Same” padding requires explicit padding for the backpropagation of convolution weights, though not for the activations (layer loss). Simple to add. Works on Mac and Windows.
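
For anyone landing here later, a minimal sketch of the kind of explicit padding involved (the names and pad width are illustrative assumptions, not the actual fix):

# Zero-pad the spatial dims so the fi:fi+H_out-1 windows in
# compute_grad_weight! stay in bounds for every filter offset.
# For a 3x3 filter with “same” padding, pad = 1.
function pad_input(a_below::Array{Float64,4}, pad::Int)
    H, W, C, N = size(a_below)
    padded = zeros(eltype(a_below), H + 2pad, W + 2pad, C, N)
    padded[pad+1:pad+H, pad+1:pad+W, :, :] .= a_below
    return padded
end

The weight-gradient loop then slides its local_patch views over pad_input(layer.a_below, 1) instead of layer.a_below.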

Would never have discovered this problem if I hadn’t run the code on Windows. Even on Windows, the segfault with @inbounds was interesting: about 20-30 iterations would run before the crash. Fascinating, because I pre-allocate all the training arrays in advance. I’ve tried to get rid of allocations, but haven’t eliminated them all yet. So, probably on Windows some allocations created the need to move some stuff in memory, and after moving to the new page(s), we crossed a page boundary.

Interesting that this hadn’t (yet) occurred on the Mac, not because I was so brilliant as to avoid all allocations, but maybe because the page size is different and the pages were zero-initialized (does the OS even do that?), so it looked like the effect of (slightly misaligned) zero padding.

Gradient descent survives some surprising sloppiness as long as the sloppiness is consistent across feedforward and backprop. But sometimes it’s not so resilient. Amazing that a new industry is built on such ad hoc (less politely, hacky) technology. (shhh… don’t let the public or VCs know!)

That’s the thing about out-of-bounds access: it’s undefined behaviour. Anything can happen, because your out-of-bounds access can hit anything. It can overwrite data, so you get a wrong result. It can overwrite internal book-keeping data in Julia (counters, pointers, whatever), which may cause some other access to be wrong much later in the execution. If you’re lucky, it hits non-mapped memory directly and you get an exception.