When I try to convert an Arrow file to Avro using this code:
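using Arrow, Avro, DataFrames, Tables

function t_arrowtoavro(path, file)
    # Load the Arrow file into a DataFrame
    a = Arrow.Table(joinpath(path, "$(file).arrow")) |> DataFrame
    println(Tables.schema(a))
    # Write all columns, including the string column, to a zstd-compressed Avro file
    Avro.writetable(joinpath(path, "$(file).avro"), a; compress=:zstd)
end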
I get the following error whenever the input file contains a string column. For example, with this schema:
Tables.Schema:
:IndividualId Int32
:ResultDate Union{Missing, Date}
:HIVResult Union{Missing, String}
ERROR: ArgumentError: internal writing error: buffer too small, len = 1563130
In contrast, this version, which drops the string column before writing, does not produce the error:
function t_arrowtoavro(path, file)
    a = Arrow.Table(joinpath(path, "$(file).arrow")) |> DataFrame
    println(Tables.schema(a))
    # Keep only the non-string columns
    b = select(a, :IndividualId, :ResultDate)
    println(Tables.schema(b))
    Avro.writetable(joinpath(path, "$(file).avro"), b; compress=:zstd)
end
Output:
Tables.Schema:
:IndividualId Int32
:ResultDate Union{Missing, Date}
:HIVResult Union{Missing, String}
Tables.Schema:
:IndividualId Int32
:ResultDate Union{Missing, Date}
"D:\\Data\\Demography\\AHRI\\Staging\\HIVResults.avro"
How do I resolve this problem if I need a string column in my output file?