I’m trying to write a Parquet file to S3 and I can’t work out what the issue is. This code has worked in the past, and the underlying data doesn’t seem to have changed significantly. It is possible that every value in a single column is “missing”, but I don’t believe that is causing the problem.
I am getting the following error:
ERROR: LoadError: KeyError: key Union{} not found
Stacktrace:
[1] getindex at ./dict.jl:467 [inlined]
[2] write_col(::FilePathsBase.FileBuffer, ::Array{Missing,1}, ::String, ::Int32, ::Int32; nchunks::Int64) at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:369
[3] _write_parquet(::FilePathsBase.FileBuffer, ::Tables.Columns{DataFrames.DataFrameColumns{DataFrame}}, ::Array{Symbol,1}, ::Int64; ncols::Int64, encoding::Dict{String,Int32}, codec::Dict{String,Int32}) at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:563
[4] write_parquet(::FilePathsBase.FileBuffer, ::DataFrame; compression_codec::String) at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:506
[5] #83 at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:526 [inlined]
[6] open(::Parquet.var"#83#84"{String,DataFrame}, ::S3Path{Nothing}, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:325
[7] open at ./io.jl:323 [inlined]
[8] #write_parquet#82 at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:525 [inlined]
[9] write_parquet(::S3Path{Nothing}, ::DataFrame) at /home/ec2-user/.julia/packages/Parquet/O0PXc/src/writer.jl:525
[10] top-level scope at /home/ec2-user/environment/Analytics Testing/julia_orch.jl:383
[11] include(::Function, ::Module, ::String) at ./Base.jl:380
[12] include(::Module, ::String) at ./Base.jl:368
[13] exec_options(::Base.JLOptions) at ./client.jl:296
[14] _start() at ./client.jl:506
in expression starting at /home/ec2-user/environment/Analytics Testing/julia_orch.jl:383
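One observation, in case it helps: frame [2] shows `write_col` receiving an `Array{Missing,1}`, and the missing key is `Union{}`, which is what Julia returns when `Missing` is stripped from the element type `Missing`. The sketch below uses only Base and reproduces the same error message. The `type_codes` dictionary is a hypothetical stand-in; I have not checked what writer.jl line 369 actually looks up, so the link to Parquet.jl is an assumption.

```julia
# A column whose values are all `missing` has element type Missing.
col = [missing, missing, missing]
@assert eltype(col) === Missing

# Removing Missing from that element type leaves the empty type Union{}.
key = nonmissingtype(eltype(col))
@assert key === Union{}

# Hypothetical lookup table keyed by element type.
# This is an illustration only, NOT Parquet.jl's actual table.
type_codes = Dict{Type,Int}(Int64 => 1, Float64 => 2, String => 3)

# Looking up Union{} in such a table raises a KeyError.
err = try
    type_codes[key]
catch e
    e
end
println(sprint(showerror, err))
```

If the writer does something along these lines, any column typed as plain `Missing` would trigger the error regardless of the other columns.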
The code that throws the error is shown below:
# Write the link prediction results to an S3 bucket.
path = S3Path("s3://$s3_bucket_output/$s3_filepath_output/link_prediction_df_" * unique_identifier * ".parquet")
@info("Outputting link prediction data to $path")
write_parquet(path, link_prediction_df)
Lastly, the contents of the dataframe are shown below:
5×18 DataFrame
Row │ MONTHS_FROM_START TOPIC_I TOPIC_J PIJ PI_PJ PI_GIVEN_J RIJ RIJ_HAT DIJ EIJ DIJ_HAT EIJ_HAT RIJ_DELTA RIJ_TREND PI_PJ_NOVELTY PI_PJ_LOG YEAR MONTH
│ Int64 String String Float64 Float64 Float64 Float64 Float64 Float64? Float64? Float64? Float64? Missing Missing Missing Float64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1 TOPIC_38 TOPIC_29 0.000288662 0.000337401 0.0157594 -0.225085 -0.0919115 -0.0100146 0.000345866 -0.00286437 0.000323948 missing missing missing -7.99424 2018 2
2 │ 1 TOPIC_38 TOPIC_18 0.00032709 0.000377291 0.0159694 -0.205989 -0.0822713 -0.00976723 0.000231847 -0.00268832 0.000228191 missing missing missing -7.88249 2018 2
3 │ 1 TOPIC_38 TOPIC_19 0.000414862 0.000508113 0.0150398 -0.292518 -0.0412777 -0.011874 0.000683656 -0.00253039 0.000313784 missing missing missing -7.58481 2018 2
4 │ 1 TOPIC_38 TOPIC_21 0.000422538 0.000489185 0.0159107 -0.2113 -0.0424972 -0.0103839 0.000352965 -0.00261318 0.000287849 missing missing missing -7.62277 2018 2
5 │ 1 TOPIC_26 TOPIC_38 0.000405344 0.000451404 0.0220053 -0.155272 0.0410278 -0.0110256 0.000526242 -0.00253196 0.000418073 missing missing missing -7.70315 2018 2
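Three columns above (RIJ_DELTA, RIJ_TREND, PI_PJ_NOVELTY) have element type `Missing` rather than `Float64?`. A possible workaround, which I have not confirmed against Parquet.jl, is to widen such columns to `Union{Missing,Float64}` before writing, so they look like the `Float64?` columns. The helper name `widen_all_missing` and the choice of `Float64` as the target type are my own assumptions; the demo runs on a plain named tuple of vectors so it needs nothing beyond Base.

```julia
# Hypothetical helper: if a column's element type is exactly Missing,
# convert it to Vector{Union{Missing,T}}; otherwise return it unchanged.
function widen_all_missing(col::AbstractVector, ::Type{T}=Float64) where {T}
    eltype(col) === Missing || return col
    return convert(Vector{Union{Missing,T}}, col)
end

# Small stand-in for two columns of the DataFrame shown above.
cols = (PI_PJ_LOG = [-7.99424, -7.88249], RIJ_DELTA = [missing, missing])

# Apply the helper to every column; the named tuple keeps its column names.
fixed = map(widen_all_missing, cols)

@assert eltype(fixed.PI_PJ_LOG) === Float64              # untouched
@assert eltype(fixed.RIJ_DELTA) === Union{Missing,Float64}
@assert all(ismissing, fixed.RIJ_DELTA)                  # values unchanged
```

On the real data this would mean looping over `names(link_prediction_df)` and assigning `link_prediction_df[!, name] = widen_all_missing(link_prediction_df[!, name])` before calling `write_parquet`. Whether the resulting file then writes cleanly is something I would still need to test.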