Fixing GBQ.jl: converting a Dict to a DataFrame

Hi All,

I am trying to fix GBQ.jl, as it seems to be the only really easy way to access Google BigQuery (correct me if I am wrong). It works fine with DataFrames 0.27, but with the latest version it no longer works (I read somewhere that a lot of older packages stopped working around DataFrames 1.0?). The error I get is:

ERROR: MethodError: Cannot `convert` an object of type Dict{Any, Any} to an object of type DataFrame
Closest candidates are:
  convert(::Type{DataFrame}, ::SubDataFrame) at ~/.julia/packages/DataFrames/JZ7x5/src/subdataframe/subdataframe.jl:304
  convert(::Type{T}, ::T) where T at Base.jl:61
  DataFrame(::AbstractDict; copycols) at ~/.julia/packages/DataFrames/JZ7x5/src/dataframe/dataframe.jl:275
  ...
Stacktrace:
 [1] _gbq_parse(response::Vector{Any})
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:60
 [2] gbq_query(query::String; use_legacy_sql::Bool, quiet::Bool, max_rows::Int64)
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:69
 [3] gbq_query(query::String)
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:67
 [4] top-level scope
   @ /home/jupyter/xxxxx/Customer Segmentation.jl:31

From what I can see, the rows returned from BigQuery are Dicts, and GBQ.jl tries to convert a Dict to a DataFrame, which fails. Looking at the library, I believe the error occurs here:

# internal function to parse json response returned from big query
#
# returns a dataframe
function _gbq_parse(response)
    cols = collect(keys(response[1]))
    values = Dict()
    for key in cols
        values[key] = []
    end
    for dict in response
        for key in cols
            push!(values[key], dict[key])
        end
    end
    return convert(DataFrame, values)
end


# Execute a query
#
# Returns a dataframe
function gbq_query(query; use_legacy_sql=false, quiet=true, max_rows=100000000)
  response = JSON.parse(read(`bq --format=json  --quiet="$quiet" query --use_legacy_sql="$use_legacy_sql" --max_rows="$max_rows" "$query"`, String))
  return _gbq_parse(response)
end

However, I am not sure what is wrong with _gbq_parse, since the library relies on it extensively…

Is convert(DataFrame, values) no longer a valid way to convert a Dict? I have also tried DataFrame(values), with the same result.


1.0 was a breaking release of DataFrames.jl.

The problem here is an incorrect use of convert that the old DataFrames.jl allowed and that was removed in the 1.0 release.
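A minimal sketch of the difference, assuming DataFrames 1.0 or later. The dict d and its contents are made up; it only mirrors the Dict{Any, Any} that _gbq_parse builds with Dict():

```julia
using DataFrames

# Shaped like the dict `_gbq_parse` builds: String keys and Vector{Any} columns,
# with the Dict itself typed Dict{Any,Any} because it was created with `Dict()`.
d = Dict()
d["a"] = Any[1, 2]
d["b"] = Any["x", "y"]

# DataFrames >= 1.0 defines no `convert(DataFrame, ::Dict)` method,
# so this call throws a MethodError; we capture it to inspect it.
converted = try
    convert(DataFrame, d)
catch err
    err
end

# The constructor is the supported way to build a DataFrame from a dict
# whose keys are Strings (as here) or Symbols.
df = DataFrame(d)
```

The constructor sorts the keys of a plain Dict to decide the column order, so the columns come out in key order rather than insertion order.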

Now you need to write DataFrame(values) instead of convert(DataFrame, values). As long as the keys of your dict are Strings or Symbols, this will work (your dict has key type Any, so I cannot tell whether that is the case).
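A sketch of how _gbq_parse could look with that change. It assumes, as the stack trace suggests, that JSON.parse hands it a Vector{Any} of row dicts with String keys. The empty-response guard, the variable name columns, and the sample rows are illustrative additions, not part of GBQ.jl:

```julia
using DataFrames

# Patched sketch of `_gbq_parse` for DataFrames >= 1.0.
# Assumes `response` is a vector of row dicts with String keys.
function _gbq_parse(response)
    # Illustrative guard (not in GBQ.jl): a query with no rows gives an empty vector.
    isempty(response) && return DataFrame()
    cols = collect(keys(response[1]))
    # Typing the keys as String makes it explicit that they can become column names.
    columns = Dict{String,Vector{Any}}(key => Any[] for key in cols)
    for row in response
        for key in cols
            push!(columns[key], row[key])
        end
    end
    # The constructor replaces the removed `convert(DataFrame, ::Dict)` method.
    return DataFrame(columns)
end

# Hypothetical two-row response, shaped like the parsed `bq --format=json` output.
response = Any[
    Dict{String,Any}("customer_id" => "1", "segment" => "A"),
    Dict{String,Any}("customer_id" => "2", "segment" => "B"),
]
df = _gbq_parse(response)
```

A DataFrame(...) call does not itself produce a "Cannot `convert` ... Dict{Any, Any} ... DataFrame" error, so if that exact message still appears at GBQ.jl:60 after the edit, the running session is probably still using the unedited code (for example, the session was not restarted, or a different copy of GBQ.jl was edited).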

It still throws an error when I switch to DataFrame(values). As I briefly mentioned in my first post, that was the first and simplest fix I tried, but when I got the same error I figured I would ask the community:

ERROR: MethodError: Cannot `convert` an object of type Dict{Any, Any} to an object of type DataFrame
Closest candidates are:
  convert(::Type{DataFrame}, ::SubDataFrame) at ~/.julia/packages/DataFrames/JZ7x5/src/subdataframe/subdataframe.jl:304
  convert(::Type{T}, ::T) where T at Base.jl:61
  DataFrame(::AbstractDict; copycols) at ~/.julia/packages/DataFrames/JZ7x5/src/dataframe/dataframe.jl:275
  ...
Stacktrace:
 [1] _gbq_parse(response::Vector{Any})
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:60
 [2] gbq_query(query::String; use_legacy_sql::Bool, quiet::Bool, max_rows::Int64)
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:69
 [3] gbq_query(query::String)
   @ GBQ ~/.julia/packages/GBQ/KCvh5/src/GBQ.jl:67
 [4] top-level scope
   @ /home/jupyter/xxxx/Customer Segmentation.jl:31