Hi all,
I'm working with an API that sometimes returns incomplete JSON: some properties are missing. This can be reproduced with the following code:
using Query
using CSV
using JSON
using DataFrames
testjson = """
[{"A":5, "B":6}, {"A":7, "D":8}]
"""
js = JSON.parse(testjson)
df = DataFrame.(js)
vcat(df...)
Note that the first object contains properties A and B, while the second contains A and D. As a result I'd like to have a DataFrame with 2 rows and columns A, B, D.
Unsurprisingly, vcat returns an error:
julia> vcat(df...)
ERROR: ArgumentError: column(s) D are missing from argument(s) 1, and column(s) B are missing from argument(s) 2
Stacktrace:
[1] _vcat(::Array{DataFrame,1}; cols::Symbol) at C:\Users\u\.julia\packages\DataFrames\3ZmR2\src\abstractdataframe\abstractdataframe.jl:1421
My "solution" looks quite ugly. Is there a better way to do this?
using Query
using CSV
using JSON
using DataFrames
testjson = """
[{"A":5, "B":6}, {"A":7, "D":8}]
"""
js = JSON.parse(testjson)
df = DataFrame.(js)
dfs = DataFrame.(df);
# get all column names through all JSON objects (= A, B, D)
dfcols = collect.(keys.(js)) |>
    Iterators.flatten |>
    @groupby(_) |>
    @map(Name=key(_)) |>
    collect
# for each DataFrame find missing columns and place some default value
for df = dfs
    missingcolumns = setdiff(dfcols, names(df))
    for col = missingcolumns
        df[!, col] .= ""
    end
end
# finally possible to concat the dataframes
dffinal = vcat(dfs...)
The result looks like this:
julia> dffinal
2×3 DataFrame
│ Row │ A     │ B   │ D   │
│     │ Int64 │ Any │ Any │
├─────┼───────┼─────┼─────┤
│ 1   │ 5     │ 6   │     │
│ 2   │ 7     │     │ 8   │
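One thing I noticed: the stack trace above mentions a cols keyword on vcat. Here is a minimal sketch of a shorter variant, assuming the installed DataFrames version accepts cols=:union (I have not checked every release). It should keep every column that appears in any of the inputs and fill the gaps with missing instead of an empty string:

```julia
using JSON
using DataFrames

testjson = """
[{"A":5, "B":6}, {"A":7, "D":8}]
"""
js = JSON.parse(testjson)

# one single-row DataFrame per JSON object, as in the code above
dfs = DataFrame.(js)

# assumption: cols=:union is supported by the installed DataFrames version;
# columns absent from an input are filled with missing
dffinal = vcat(dfs...; cols=:union)
```

The difference from the loop above is that the placeholder is missing rather than "", so columns B and D would no longer hold a mix of integers and strings.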