Intermittent `IOError: SSL_ERROR_SYSCALL` on put to S3

I have a loop that writes bytes to S3. It works about 95% of the time. Then periodically, I get a cryptic error from the depths of HTTP.jl about SSL_ERROR_SYSCALL.

The function looks like this:

```julia
using Retry: @repeat
using AWSS3
using HTTP  # referenced in the catch block; HTTP.jl must be a direct dependency (`pkg> add HTTP`)
using AWS.AWSExceptions: AWSException

function write_and_log(path::S3Path, bytes::Vector{UInt8})
    # Attempt the upload up to 4 times
    @repeat 4 try
        @info "START: Running put to s3" path length(bytes)
        write(path, bytes)
        @info "END: successfully put file to s3" path
    catch e
        # Retry only on HTTP-level errors or AWS service errors
        @retry if e isa HTTP.Exceptions.HTTPError || e isa AWSException
            bt = catch_backtrace()
            @error "END: Failed to put file to s3. Will try again." path exception=(e, bt)
        end
        @error "END: Failed to write file to s3" path=path exception=(e, catch_backtrace())
    end
end
```
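For comparison, the same retry-on-transient-error pattern can be written with only Base's `retry` and `ExponentialBackOff`, which add a growing delay between attempts instead of retrying immediately. This is a minimal sketch, not a fix for the SSL error: `TransientError` and `flaky_put` are hypothetical stand-ins (the latter fails once and then succeeds), and in real code the `check` callback would test for `HTTP.Exceptions.HTTPError` or `AWSException` as above. Note also that the stack trace below includes frames from HTTP.jl's `RetryRequest` layer and AWS.jl's `AWSExponentialBackoff`, so some retrying may already happen beneath this loop.

```julia
# Minimal sketch: retry with exponential backoff using only Base (no Retry.jl).
# `TransientError` and `flaky_put` are hypothetical stand-ins for the real
# HTTP/AWS errors and for `write(path, bytes)`.

struct TransientError <: Exception
    msg::String
end

const attempts = Ref(0)

# Fails on the first call, succeeds afterwards.
function flaky_put(bytes::Vector{UInt8})
    attempts[] += 1
    attempts[] == 1 && throw(TransientError("simulated SSL_ERROR_SYSCALL"))
    return length(bytes)
end

function put_with_retry(bytes::Vector{UInt8}; tries::Int = 4)
    # `n` counts retries, so the function is called at most `tries` times in total.
    delays = ExponentialBackOff(n = tries - 1, first_delay = 0.1, max_delay = 5.0)
    # Retry only errors we consider transient; anything else is rethrown at once.
    check = (state, e) -> begin
        e isa TransientError || return false
        @warn "put failed, retrying" exception=e
        return true
    end
    return retry(() -> flaky_put(bytes); delays = delays, check = check)()
end

nbytes = put_with_retry(zeros(UInt8, 10))
```

Here the first call throws, `check` approves a retry, and the second call returns the byte count after a short delay.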

The error thrown looks like this:

```
┌ Error: END: Failed to put file to s3. Will try again.
│   job.path = p"s3://ppad-processing-processed-datanode-output-prod/rollout2024q2-150/consolidated_rollout2024q2-150_innetwork/data_version_p=1.1.0/versioneddatasourceid_p=150/sourcefileid_p=1446234/batch_2_chunk_0.parq"
│   exception =
│    HTTP.RequestError:
│    HTTP.Request:
│    HTTP.Messages.Request:
│    """
│    PUT /ppad-processing-processed-datanode-output-prod/rollout2024q2-150/consolidated_rollout2024q2-150_innetwork/data_version_p%3D1.1.0/versioneddatasourceid_p%3D150/sourcefileid_p%3D1446234/batch_2_chunk_0.parq HTTP/1.1
│    Content-Type: application/octet-stream
│    User-Agent: AWS.jl/1.0.0
│    Host: s3.us-east-1.amazonaws.com
│    x-amz-date: 20240619T154349Z
│    x-amz-content-sha256: a167eba412e99ca1b5a51fab79f014a783fab46274cde03b5fe2b94a65434381
│    Content-MD5: Hf6y6YV/Vt4O6NhptirfpQ==
│    x-amz-security-token: # [redacted] because I don't know if this should be shared :)
│    Authorization: AWS4-HMAC-SHA256 Credential=[redacted]/us-east-1/s3/aws4_request, SignedHeaders=content-md5;content-type;host;user-agent;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=b94a0794fdbe81efa864a85afb59b4c22697ae92f192294d5a1ef80750be9b78
│    Accept: */*
│    Content-Length: 302537168
│    Accept-Encoding: gzip
│ 
│ 
│    ⋮
│    302537168-byte body
│    """Underlying error:
│    IOError: SSL_ERROR_SYSCALL
│    Stacktrace:
│      [1] (::HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}})(req::HTTP.Messages.Request; proxy::Nothing, socket_type::Type, socket_type_tls::Type, readtimeout::Int64, connect_timeout::Int64, logerrors::Bool, logtag::Nothing, kw::@Kwargs{iofunction::Nothing, decompress::Nothing, verbose::Int64})
│        @ HTTP.ConnectionRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\ConnectionRequest.jl:143
│      [2] (::HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}})(req::HTTP.Messages.Request; retry::Bool, retries::Int64, retry_delays::ExponentialBackOff, retry_check::Function, retry_non_idempotent::Bool, kw::@Kwargs{iofunction::Nothing, decompress::Nothing, verbose::Int64})
│        @ HTTP.RetryRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\RetryRequest.jl:35
│      [3] manageretries
│        @ C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\RetryRequest.jl:30 [inlined]
│      [4] (::HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}})(req::HTTP.Messages.Request; cookies::Bool, cookiejar::HTTP.Cookies.CookieJar, kw::@Kwargs{iofunction::Nothing, decompress::Nothing, verbose::Int64, retry::Bool})
│        @ HTTP.CookieRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\CookieRequest.jl:42
│      [5] managecookies
│        @ C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\CookieRequest.jl:19 [inlined]
│      [6] (::HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}})(req::HTTP.Messages.Request; iofunction::Nothing, decompress::Nothing, basicauth::Bool, detect_content_type::Bool, canonicalize_headers::Bool, kw::@Kwargs{verbose::Int64, retry::Bool})
│        @ HTTP.HeadersRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\HeadersRequest.jl:71
│      [7] defaultheaders
│        @ C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\HeadersRequest.jl:14 [inlined]
│      [8] (::HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}})(req::HTTP.Messages.Request; redirect::Bool, redirect_limit::Int64, redirect_method::Nothing, forwardheaders::Bool, response_stream::Base.BufferStream, kw::@Kwargs{verbose::Int64, retry::Bool})
│        @ HTTP.RedirectRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\RedirectRequest.jl:17
│      [9] redirects
│        @ C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\RedirectRequest.jl:14 [inlined]
│     [10] (::HTTP.MessageRequest.var"#makerequest#3"{HTTP.MessageRequest.var"#makerequest#1#4"{HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}}}})(method::String, url::URIs.URI, headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}; copyheaders::Bool, response_stream::Base.BufferStream, http_version::HTTP.Strings.HTTPVersion, verbose::Int64, kw::@Kwargs{redirect::Bool, retry::Bool})
│        @ HTTP.MessageRequest C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\MessageRequest.jl:35
│     [11] makerequest
│        @ C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\clientlayers\MessageRequest.jl:24 [inlined]
│     [12] request(stack::HTTP.MessageRequest.var"#makerequest#3"{HTTP.MessageRequest.var"#makerequest#1#4"{HTTP.RedirectRequest.var"#redirects#3"{HTTP.RedirectRequest.var"#redirects#1#4"{HTTP.HeadersRequest.var"#defaultheaders#2"{HTTP.HeadersRequest.var"#defaultheaders#1#3"{HTTP.CookieRequest.var"#managecookies#4"{HTTP.CookieRequest.var"#managecookies#1#5"{HTTP.RetryRequest.var"#manageretries#3"{HTTP.RetryRequest.var"#manageretries#1#4"{HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"{HTTP.ExceptionRequest.var"#exceptions#1#3"{typeof(HTTP.StreamRequest.streamlayer)}}}}}}}}}}}}}}}}, method::String, url::URIs.URI, h::Vector{Pair{SubString{String}, SubString{String}}}, b::Vector{UInt8}, q::Nothing; headers::Vector{Pair{SubString{String}, SubString{String}}}, body::Vector{UInt8}, query::Nothing, kw::@Kwargs{redirect::Bool, retry::Bool, response_stream::Base.BufferStream})
│        @ HTTP C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\HTTP.jl:457
│     [13] #request#20
│        @ HTTP C:\Users\mrufsvold\.julia\packages\HTTP\Y2JKB\src\HTTP.jl:315 [inlined]
│     [14] macro expansion
│        @ C:\Users\mrufsvold\.julia\packages\Mocking\Q17aB\src\mock.jl:29 [inlined]
│     [15] (::AWS.var"#48#50"{AWS.Request, OrderedCollections.LittleDict{Symbol, Any, Vector{Symbol}, Vector{Any}}})()
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\utilities\request.jl:225
│     [16] (::Base.var"#96#98"{Base.var"#96#97#99"{AWS.AWSExponentialBackoff, AWS.var"#49#51", AWS.var"#48#50"{AWS.Request, OrderedCollections.LittleDict{Symbol, Any, Vector{Symbol}, Vector{Any}}}}})(; kwargs::@Kwargs{})
│        @ Base .\error.jl:308
│     [17] (::Base.var"#96#98"{Base.var"#96#97#99"{AWS.AWSExponentialBackoff, AWS.var"#49#51", AWS.var"#48#50"{AWS.Request, OrderedCollections.LittleDict{Symbol, Any, Vector{Symbol}, Vector{Any}}}}})()
│        @ Base .\error.jl:291
│     [18] _http_request(http_backend::AWS.HTTPBackend, request::AWS.Request, response_stream::IOBuffer)
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\utilities\request.jl:250
│     [19] macro expansion
│        @ C:\Users\mrufsvold\.julia\packages\Mocking\Q17aB\src\mock.jl:29 [inlined]
│     [20] (::AWS.var"#41#44"{AWS.AWSConfig, AWS.Request, IOBuffer, Vector{Int64}})()
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\utilities\request.jl:134
│     [21] (::AWS.var"#42#46"{AWS.var"#41#44"{AWS.AWSConfig, AWS.Request, IOBuffer, Vector{Int64}}, IOBuffer})()
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\utilities\request.jl:149
│     [22] (::Base.var"#96#98"{Base.var"#96#97#99"{AWS.AWSExponentialBackoff, AWS.var"#43#47"{AWS.AWSConfig, Vector{String}, Vector{String}, Int64}, AWS.var"#42#46"{AWS.var"#41#44"{AWS.AWSConfig, AWS.Request, IOBuffer, Vector{Int64}}, IOBuffer}}})(; kwargs::@Kwargs{})
│        @ Base .\error.jl:296
│     [23] (::Base.var"#96#98"{Base.var"#96#97#99"{AWS.AWSExponentialBackoff, AWS.var"#43#47"{AWS.AWSConfig, Vector{String}, Vector{String}, Int64}, AWS.var"#42#46"{AWS.var"#41#44"{AWS.AWSConfig, AWS.Request, IOBuffer, Vector{Int64}}, IOBuffer}}})()
│        @ Base .\error.jl:291
│     [24] submit_request(aws::AWS.AWSConfig, request::AWS.Request; return_headers::Nothing)
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\utilities\request.jl:200
│     [25] (::AWS.RestXMLService)(request_method::String, request_uri::String, args::Dict{String, Any}; aws_config::AWS.AWSConfig, feature_set::AWS.FeatureSet)
│        @ AWS C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\AWS.jl:287
│     [26] RestXMLService
│        @ C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\AWS.jl:251 [inlined]
│     [27] #put_object#172
│        @ C:\Users\mrufsvold\.julia\packages\AWS\SchLh\src\services\s3.jl:5754 [inlined]
│     [28] s3_put(aws::AWS.AWSConfig, bucket::SubString{String}, path::String, data::Vector{UInt8}, data_type::String, encoding::String; acl::String, metadata::Dict{String, String}, tags::Dict{String, String}, parse_response::Bool, kwargs::@Kwargs{})
│        @ AWSS3 C:\Users\mrufsvold\.julia\packages\AWSS3\8cxdr\src\AWSS3.jl:1037
│     [29] s3_put
│        @ AWSS3 C:\Users\mrufsvold\.julia\packages\AWSS3\8cxdr\src\AWSS3.jl:985 [inlined]
│     [30] write(fp::S3Path{Nothing}, content::Vector{UInt8}; part_size_mb::Int64, multipart::Bool, returns::Symbol, other_kwargs::@Kwargs{})
│        @ AWSS3 C:\Users\mrufsvold\.julia\packages\AWSS3\8cxdr\src\s3path.jl:696
│     [31] write
│        @ C:\Users\mrufsvold\.julia\packages\AWSS3\8cxdr\src\s3path.jl:674 [inlined]
│     [32] macro expansion
│        @ c:\Users\mrufsvold\Projects\DIL-price-transparency-psd\TableConsolidator.jl\src\jobs\GetSendJob.jl:17 [inlined]
│     [33] macro expansion
│        @ C:\Users\mrufsvold\.julia\packages\Retry\vS1bg\src\repeat_try.jl:192 [inlined]
│     [34] write_and_log(job::Main.TableConsolidator.SendJob)
│        @ Main.TableConsolidator c:\Users\mrufsvold\Projects\DIL-price-transparency-psd\TableConsolidator.jl\src\jobs\GetSendJob.jl:15
│     [35] (::Main.TableConsolidator.var"#60#61"{Channel{Main.TableConsolidator.SendJob}})()
│        @ Main.TableConsolidator c:\Users\mrufsvold\Projects\DIL-price-transparency-psd\TableConsolidator.jl\src\TableConsolidator.jl:71
└ @ Main.TableConsolidator c:\Users\mrufsvold\Projects\DIL-price-transparency-psd\TableConsolidator.jl\src\jobs\GetSendJob.jl:22
```

Some version info:

```
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores
```

```
(TableConsolidator) pkg> status
Project TableConsolidator v0.1.0
Status `C:\Users\mrufsvold\Projects\DIL-price-transparency-psd\TableConsolidator.jl\Project.toml`
⌃ [fbe9abb3] AWS v1.90.3
  [1c724243] AWSS3 v0.11.2
  [336ed68f] CSV v0.10.14
  [0f8b85d8] JSON3 v1.14.0
⌃ [e6f89c97] LoggingExtras v1.0.2
  [98105f81] LoggingFormats v1.5.0
⌃ [98572fba] Parquet2 v0.2.19
⌃ [2dfb63ee] PooledArrays v1.4.2
  [20febd7b] Retry v0.4.1
⌃ [bd369af6] Tables v1.10.1
⌃ [28d57a85] Transducers v0.4.78
⌃ [9d95f2ec] TypedTables v1.4.3
  [56ddb016] Logging
  [9a3f8284] Random
Info Packages marked with ⌃ have new versions available and may be upgradable.
```

The files are not huge, around 300 MB each (the failing request above has `Content-Length: 302537168`), and the retry loop usually succeeds on the second attempt. I notice that when it is going to fail, it hangs for a long time before finally throwing this error.

Edit: I can’t seem to reproduce this error in isolation. This write function is called in a program that also runs a number of async read operations. So maybe there is an issue with SSL being used concurrently by multiple tasks (Julia is running on a single thread here, per `versioninfo`) to authenticate the reads while my write loop is running?

Edit2: The Stack Overflow question "openssl - SSL_read failing with SSL_ERROR_SYSCALL error" indicates that S3 might be closing the connection before the actual EOF is reached. So maybe my write task is getting interrupted, S3 drops the connection, and then when it tries to write again, we hit this error. If that’s the case, I’m not sure how to convince Julia that a task should not be interrupted.

Edit3: I’m more and more convinced that this is happening because of task switching. I consistently hit this error when the start of a write step unblocks a new batch of reads. I think what might be happening is this: I `take!` a send job and it starts, but that unblocks upstream `put!`s from the read tasks. The writer is then suspended while the scheduler runs the read tasks that consume the new batch of files, and in the meantime S3 closes the connection. The retry succeeds immediately because the upstream tasks are blocked again.
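The interleaving described in Edit3 can be reproduced in miniature. This is a minimal sketch assuming a small bounded `Channel` between a reader task and the writer; the real program's channel setup isn't shown, so the names and sizes are hypothetical. On one thread, tasks switch only when the running task blocks or yields: `take!` frees a slot, which makes the blocked reader runnable, and the reader actually runs at the writer's next yield (here a `sleep` standing in for upload I/O). It demonstrates the scheduling order only, not the SSL failure itself.

```julia
# Minimal sketch of cooperative scheduling between a producer ("reader") and a
# consumer ("writer"). All names are hypothetical stand-ins.
function interleaving_demo()
    events = String[]
    jobs = Channel{Int}(1)          # room for one queued job

    # "Reader" task: blocks on `put!` whenever the channel is full.
    reader = @async for i in 1:3
        put!(jobs, i)
        push!(events, "reader queued job $i")
    end

    # "Writer" (the current task): `take!` frees a slot and makes the reader
    # runnable; the reader only runs once the writer yields.
    for _ in 1:3
        job = take!(jobs)
        push!(events, "writer start job $job")
        sleep(0.01)                 # stand-in for the S3 upload; I/O waits yield here
        push!(events, "writer end job $job")
    end
    wait(reader)
    return events
end

events = interleaving_demo()
foreach(println, events)
```

In the printed log, "reader queued job 2" appears between "writer start job 1" and "writer end job 1": the reader ran in the middle of the writer's simulated upload.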

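I see that there is no API to tell a task not to yield. So is this a bug in AWSS3, in that it doesn’t handle retries of large uploads more gracefully? (The failing request in the trace is a single 302537168-byte PUT via `s3_put`, not a multipart upload.)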

I added a sync point before moving on to the next batch of files, and I am no longer getting this error. I had hoped to let the IO run async to the main thread, but it is what it is!
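A sync point of that kind might look roughly like the following. This is a minimal sketch under assumptions: the actual restructuring isn't shown, and `read_one`/`upload_one` are hypothetical stand-ins for the real S3 reads and `write(path, bytes)`. Reads within a batch still run concurrently, but `@sync` waits for all of them before any upload starts, and the next batch's reads only start after the uploads finish.

```julia
# Minimal sketch of a per-batch sync point. `read_one` and `upload_one` are
# hypothetical stand-ins for the real S3 read and `write(path, bytes)` calls.

read_one(id) = (sleep(0.01); Vector{UInt8}(string(id)))  # pretend download (yields)
upload_one(id, bytes) = length(bytes)                     # pretend upload

function process_batches(batches)
    events = String[]
    for batch in batches
        results = Vector{Vector{UInt8}}(undef, length(batch))
        # Reads within one batch still run concurrently...
        @sync for (k, id) in enumerate(batch)
            @async begin
                results[k] = read_one(id)
                push!(events, "read $id")
            end
        end
        # ...but `@sync` has waited for all of them, so no read task from this
        # batch is pending, and the next batch's reads have not started yet.
        for (k, id) in enumerate(batch)
            upload_one(id, results[k])
            push!(events, "upload $id")
        end
    end
    return events
end

events = process_batches([[1, 2, 3], [4, 5]])
foreach(println, events)
```

The trade-off is the one noted above: uploads no longer overlap with the next batch's reads.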
