Correct. My “solution” did not produce an error message, but it did not do the job either.
With
[collect(Missings.replace(a, 0)) for a in df.columns]
I now get
MethodError: Cannot `convert` an object of type String to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] replace(::Array{Union{Int64, Missings.Missing},1}, ::String) at /home/js/.julia/v0.6/Missings/src/Missings.jl:276
[2] collect_to!(::Array{CategoricalArrays.CategoricalArray{String,1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Union{}},1}, ::Base.Generator{Array{Any,1},##21#22}, ::Int64, ::Int64) at ./array.jl:508
[3] collect(::Base.Generator{Array{Any,1},##21#22}) at ./array.jl:476
[4] include_string(::String, ::String) at ./loading.jl:522
Maybe I should first convert all the values in the dataframe to strings before trying this.
In the end it will be written out as strings to a pg_dump file anyhow.
Not that I know how to do it at this stage. But I will find out.
d = Dict([Union{Int64, Missing}=>0, Union{String, Missing}=>""])
d2 = [collect(Missings.replace(a, d[eltype(a)])) for a in df.columns];
> key union{Int64, Missing} not found
Passing a full set of replacement values for missing (one per column) should be okay though, e.g. iterating over pairs (a, b) taken from df.columns and a vector such as dfmissingvec = [0, 0, "", "etc"].
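One possible way around the failed lookup is to key the dictionary on the non-missing element type rather than on Union{T, Missing}. This is only a sketch: it assumes Julia 1.3 or later (where missing, coalesce and nonmissingtype are in Base), and cols is a hypothetical stand-in for the columns of df, not something from the posts above.

```julia
# Hypothetical stand-in for the columns of `df`.
cols = Any[
    Union{Int, Missing}[1, missing, 3],
    Union{String, Missing}["a", missing, "c"],
]

# Defaults keyed on the element type with Missing stripped off.
defaults = Dict{DataType, Any}(Int => 0, String => "")

# Look up each column's default and fill the missing entries with it.
filled = [coalesce.(c, defaults[nonmissingtype(eltype(c))]) for c in cols]
```

A column whose element type is not in defaults would still raise a KeyError, so this only helps when the set of column types is known beforehand.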
for (v, name) in eachcol(df)
df[name] = collect(Missings.replace(v, "\\N"))
end
collect(Missings.replace(v, "\\N")) can also be coalesce.(v, "\\N") or recode(v, missing => "\\N") (the latter is in CategoricalArrays). But if you don’t know the type of the columns in advance I don’t see how you could choose an appropriate replacement…
Thanks. However that will only work on columns with a String type.
With names(df) I get an array of column headers. How can I use that to create a second dataframe with those names but with type String for each column? If I can do that, I can possibly copy the non-string values to the second dataframe with string(v)?
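A rough sketch of the "convert everything to strings" idea, assuming Julia 1.x. The helper to_pg and the NamedTuple cols are made up for illustration; cols just mimics a table with two columns so the example runs without any packages.

```julia
# Hypothetical stand-in for a table: a NamedTuple of column vectors.
cols = (
    id   = Union{Int, Missing}[1, missing, 3],
    name = Union{String, Missing}["a", "b", missing],
)

# Hypothetical helper: missing becomes the marker \N, everything else goes through string().
to_pg(x) = ismissing(x) ? "\\N" : string(x)

# map over a NamedTuple keeps the column names; each column becomes a Vector{String}.
strcols = map(c -> to_pg.(c), cols)
```

With a recent DataFrames release the same helper could presumably be applied column by column, for example df[!, name] = to_pg.(df[!, name]) for each name in names(df), but I have not checked that against the version used in this thread.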
coalesce.(df, 0)
# or rather
x = coalesce.(df, "\\N")
I’d prefer to use replace with the pairs syntax for the transform, like the following, but the SO example only works for a single column; the coalesce broadcast replaces missing in every row of every column.
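For what it is worth, the pairs form of replace can be applied to each column in a comprehension. This is a minimal sketch assuming Julia 1.x, where replace(A, old => new) is in Base; cols is again a hypothetical stand-in for the columns of the dataframe.

```julia
# Hypothetical stand-in for the columns of the dataframe.
cols = Any[
    Union{Int, Missing}[1, missing, 3],
    Union{String, Missing}["a", missing],
]

# Apply the pair-based replace to every column in turn.
replaced = [replace(c, missing => "\\N") for c in cols]
```

Note that the Int column ends up holding a mix of integers and strings, so a string conversion would still be needed before writing the dump.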