Hello.
I have been looking at the Mongoc.jl documentation on and off for several months, but I haven’t been able to find a way to convert BSON data to a DataFrame.
The data I am using is from the Yelp dataset, found here: https://raw.githubusercontent.com/melqkiades/yelp/master/notebooks/yelp_academic_dataset_business.json
This is the code that I have so far:
using DataFrames
using CSV
using Pipe
using Mongoc
client = Mongoc.Client()
database = client["local"]
collection = database["yelp"]
bson_options = Mongoc.BSON("""
{ "projection" : { "_id" : false, "city" : true,
"attributes.Noise Level" : true,
"attributes.Ambience.casual" : true },
"sort" : { "city" : 1, "attributes.Noise Level" : 1, "attributes.Ambience.casual" : 1 } }
""")
mongo_data = Mongoc.find(collection,options=bson_options)
vector_bson_docs_mongo_data = collect(mongo_data)
I basically want to do the same as I would with PostgreSQL’s JSON manipulation capabilities:
using LibPQ
using DataFrames
conn = LibPQ.Connection("dbname=myusername")
df = DataFrame(execute(conn, """
SELECT
info ->> 'city' AS city,
info -> 'attributes' ->> 'Noise Level' AS noise_level,
info -> 'attributes' -> 'Ambience' ->> 'casual' AS casual
FROM yelpo
ORDER BY city ASC, noise_level ASC, casual ASC;
"""))
df
How would one go about converting all the MongoDB data to a DataFrame?
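The closest thing I can think of is flattening each document by hand. Below is a rough sketch of that idea, not something I have confirmed against a live collection. It assumes each BSON document can be turned into a nested `Dict` (I believe `Mongoc.as_dict` does this, but treat that as an assumption). The helper `getnested`, the stand-in `docs` vector, and the column names are my own and are not part of Mongoc.jl.

```julia
using DataFrames

# Hypothetical helper (not part of Mongoc.jl): follow `path` through nested
# Dicts and return `missing` as soon as a key is absent.
function getnested(doc::AbstractDict, path::Vector{String})
    value = doc
    for key in path
        (value isa AbstractDict && haskey(value, key)) || return missing
        value = value[key]
    end
    return value
end

# Stand-in documents so the sketch runs without a database. With a real
# collection these would presumably be `Mongoc.as_dict.(vector_bson_docs_mongo_data)`
# (assumption: `Mongoc.as_dict` returns a nested Dict per document).
docs = [
    Dict{String,Any}(
        "city" => "Phoenix",
        "attributes" => Dict{String,Any}(
            "Noise Level" => "average",
            "Ambience" => Dict{String,Any}("casual" => true),
        ),
    ),
    Dict{String,Any}("city" => "Tempe", "attributes" => Dict{String,Any}()),
]

# One column per projected field; absent fields become `missing`.
df = DataFrame(
    city        = [getnested(d, ["city"]) for d in docs],
    noise_level = [getnested(d, ["attributes", "Noise Level"]) for d in docs],
    casual      = [getnested(d, ["attributes", "Ambience", "casual"]) for d in docs],
)
```

This gives the same three columns as the PostgreSQL query, but every path has to be spelled out by hand, so I am hoping there is a more direct way.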