Connecting to BigQuery with GoogleCloud.jl

Hi,

I have a question regarding the usage of the package GoogleCloud.jl

Following the documentation, I managed to connect to a bucket in Google Cloud Storage, but the documentation doesn't provide an example of connecting to BigQuery.

The GCP package is not maintained anymore, and I have tested the TidierDB.jl package, which gives me the same error as the following code:

```julia
using GoogleCloud
using HTTP
using JSON3

credentials_path = "path_to_key_json_file"
creds = JSONCredentials(credentials_path)

# Using a public dataset to ensure there is no connection problem
url = "https://bigquery.googleapis.com/bigquery/v2/projects/bigquery-public-data/queries"

# Both bigquery and general cloud-platform scopes
session = GoogleSession(creds, ["https://www.googleapis.com/auth/bigquery",
                                "https://www.googleapis.com/auth/cloud-platform"])

auth = authorize(session)

query = """
SELECT
  name,
  SUM(number) AS total
FROM
  `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY
  name
ORDER BY
  total DESC
LIMIT
  10;
"""

body = JSON3.write(Dict("query" => query,
                        "useLegacySql" => false,
                        "location" => "EU"))

token = auth[:access_token]

headers = ["Authorization" => "Bearer $token", "Content-Type" => "application/json"]

response = HTTP.post(url, headers, body)
```

This code gives me the following error:

```json
"error": {
  "code": 403,
  "message": "Access Denied: Project bigquery-public-data: User does not have bigquery.jobs.create permission in project bigquery-public-data."
  ...
}
```

I don't understand why I get this error, or what I forgot in the code to make this small example work. The dataset is public, so any user should be able to access it. The problem seems to be with the scopes or with some other missing parameter.

If someone has any ideas, or has already found a way to connect to BigQuery with Julia…

Thank you.

Based on your error, it seems you may not have permission to create jobs in that project, so if you are using a personal account, you may have to make sure you have granted yourself the necessary permissions.

I was just able to reconnect to my GBQ account and query a table via TidierDB's gbq connect function.

I had a lot of trouble working through the Google Cloud API because of the sparse documentation around the package. You may be able to take a look at that backend's API in the TidierDB source code and see if that helps guide you.

Hi Daniel,

Thank you for your answer.

It seems that I don't have any permission issues, as I can connect to my private (or public) database using Python. However, with the same JSON key, I'm unable to connect using Julia, whether with the TidierDB.jl package or directly with the GoogleCloud.jl package. So, to clarify my question: is there something additional that needs to be added, in the code or elsewhere, to allow the connection when accessing a private dataset?

Hmm, that is strange. I just tried a different dataset example and it succeeded, in addition to the one you originally had.

The issue with your code is this line:

```julia
url = "https://bigquery.googleapis.com/bigquery/v2/projects/my-project-id/queries"
```

You need to specify your own project ID in the URL (the project the query job is created in, not `bigquery-public-data`). Then it should work (or at least it did for me after making this change):

```julia
project_id = JSONCredentials(credentials_path).project_id
url = "https://bigquery.googleapis.com/bigquery/v2/projects/$project_id/queries"
response = HTTP.post(url, headers, body)  # succeeded
```
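As an aside, in case it is useful: the `jobs.query` response encodes rows as nested `f`/`v` records, with column names in the schema. A minimal sketch of extracting the results from the JSON (assuming the request above succeeded and JSON3 is loaded):

```julia
using JSON3

result = JSON3.read(response.body)

# Column names come from the schema; values come from the rows' f/v records
cols = [field.name for field in result.schema.fields]
rows = [[cell.v for cell in row.f] for row in result.rows]

for row in rows
    println(join(row, "\t"))
end
```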
```julia
using TidierDB
using GoogleCloud

conn = connect(gbq(), "credentials_path", "project_id")
taxis = db_table(conn, "bigquery-public-data.new_york_taxi_trips.taxi_zone_geom")

@chain t(taxis) begin
    @group_by borough
    @summarise(n = n())
    @collect
end
```

```
6×2 DataFrame
 Row │ borough        n
     │ String         Int64
─────┼──────────────────────
   1 │ EWR                1
   2 │ Bronx             43
   3 │ Queens            69
   4 │ Brooklyn          61
   5 │ Manhattan         69
   6 │ Staten Island     20
```
```julia
@chain db_table(conn, "bigquery-public-data.usa_names.usa_1910_2013") begin
    @group_by name
    @summarise(total = sum(number))
    @arrange desc(total)
    @collect
end
```

```
29828×2 DataFrame
   Row │ name     total
       │ String   Int64
───────┼──────────────────
     1 │ James    4942431
     2 │ John     4834422
     3 │ Robert   4718787
     4 │ Michael  4297230
     5 │ William  3822209
     6 │ Mary     3737679
   ⋮   │    ⋮        ⋮
 29824 │ Jeziel         5
 29825 │ Yahshua        5
 29826 │ Ej             5
 29827 │ Getzel         5
 29828 │ Stiles         5
        29817 rows omitted
```

Hi,

Thanks again for your answer. The code you provided works for the dataset you mention, but still not for my private database.

However, I finally found the problem I had with other databases, and in particular with my private GBQ database. In the TidierDB.jl package, the function get_table_metadata() hard-codes the location in the query to "US", and I was doing the same when using GoogleCloud.jl directly in my HTTP request.

Obviously, if the database is stored in a different location ("EU", for example), the request returns the error shown above, which is not very explicit…
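In case it helps anyone hitting the same wall: a dataset's actual location can be checked beforehand with the REST `datasets.get` endpoint, so the `"location"` field in the query body can be matched to it. A minimal sketch, reusing the authorized `headers` from the earlier snippet (the project and dataset IDs here are placeholders):

```julia
using HTTP, JSON3

# datasets.get returns the dataset's metadata, including its "location" field
meta_url = "https://bigquery.googleapis.com/bigquery/v2/projects/my-project-id/datasets/my_dataset"
meta = JSON3.read(HTTP.get(meta_url, headers).body)
println(meta.location)  # e.g. "EU" — the query body's "location" must match this
```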

I will submit an issue to the TidierDB package. Adding a location parameter to get_table_metadata() should solve the problem…

Thank you.
