GCP.jl - Google Cloud Platform APIs / BigQuery

GCP.jl provides a Google Cloud Platform BigQuery API client and the ability to auto-generate new client APIs from Google’s discovery service. It is a 100% Julia implementation, with no reliance on a command-line installation of the Google Cloud SDK or on Python 2 (which the Google Cloud SDK requires).

The BigQuery API is generated, functional, and tested for getting data in and out of the BigQuery service. Compare to GBQ, GoogleCloud and GoogleCloudObjectStores.

GCP.jl is available from the official Julia package registry.

Example usage

using GCP
using GCP.Bigquery

# Path to a credentials JSON file for your project
CredentialFilename = expanduser("~/secrets/your-project.json")

# Authenticate and build the generated clients with the cloud-platform scope
p = GCP.Project(CredentialFilename, ["cloud-platform"])

# Create a new dataset in the project
dataset = Bigquery.Dataset()
dataset.datasetReference = Bigquery.DatasetReference()
dataset.datasetReference.datasetId = "your-dataset"
res = p.bigquery.datasets.insert(dataset)

@info res
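
A speculative extension of the example, shown only as a sketch: the discovery document also defines Table, TableReference, TableSchema and TableFieldSchema resources, so if GCP.jl generates those types the same way it generates Dataset and DatasetReference, creating a table inside the new dataset might look roughly like this (the type names and the tables.insert signature are assumptions, not verified against the package):

# Assumed, not verified: the generated types mirror the discovery resource names and
# tables.insert takes the dataset id plus the table resource.
table = Bigquery.Table()
table.tableReference = Bigquery.TableReference()
table.tableReference.datasetId = "your-dataset"
table.tableReference.tableId = "your-table"
table.schema = Bigquery.TableSchema()
field = Bigquery.TableFieldSchema()
field.name = "id"
field.type = "INTEGER"
table.schema.fields = [field]
res = p.bigquery.tables.insert("your-dataset", table)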

Why? This is an itch I needed to scratch. For anyone who needs BigQuery access, or needs to generate further Google SDKs, I hope this will help you. At the time of writing, 251 more APIs could be generated.

Cheers


@Rana_Ian This is great, thank you for pulling this together. I might be missing it in my quick glance at the test cases, but is there an example you could point me to of pulling a query result down into a local DataFrame using GCP.jl?

I have a solution that I’m using (I’ve written an alternative to the way the GBQ module handles this that is a touch more vectorized and checks data types), but I would like to move away from the reliance on the command line installation.

I’m happy to open PRs to add any of the above items if they’re not in there and it is useful.


I am able to connect, but how do we query tables?

CredentialFilename = expanduser("~/xxx/creds.json")
p = GCP.Project(CredentialFilename, ["cloud-platform"])
dataset = Bigquery.Dataset()
dataset.datasetReference = Bigquery.DatasetReference()

I checked the examples and didn’t see one for selecting from specific tables - there is the “get” command, but that didn’t seem to work as a querying tool…

Is there anything similar to Python’s google.cloud client, where one can just run a SELECT statement against the tables?
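
For reference, the underlying REST API runs SQL through the jobs.query endpoint, which takes a QueryRequest (a SQL string plus a useLegacySql flag) and returns the schema and rows. A speculative, untested sketch of what that might look like through the generated client - the Bigquery.QueryRequest type and the p.bigquery.jobs.query method are assumptions based on how datasets.insert works above, not verified against GCP.jl:

# Assumed, not verified: GCP.jl generates Bigquery.QueryRequest from the discovery
# document and exposes it as p.bigquery.jobs.query, mirroring datasets.insert.
req = Bigquery.QueryRequest()
req.query = "SELECT column_a, column_b FROM `your-project.your-dataset.your-table` LIMIT 10"
req.useLegacySql = false   # use standard SQL
res = p.bigquery.jobs.query(req)
@info res   # the REST response carries schema.fields plus rows of "f"/"v" string cells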

I had a similar question last year; let me know if you come across anything that works for you, @Billpete002 - I’d be open to collaborating on a project to get something closer to R’s bigrquery, since R, like Julia, lacks the dedicated BQ API that Python has.


Sorry I missed this earlier - I’ve just been using GBQ, but the [ANY] format for data is a bummer. Could you provide your solution?

Also, do you know of a way to write a dataframe back to BQ?

@Billpete002 it is funny; I was just starting to think about writing a process to upload data to BQ last week, so your timing is pretty good. I’ve mostly been doing local analysis in Julia and only recently had a reason to push data back to BQ (I was still using Python jobs for my pushes to BQ).

Here is my current solution, which builds off the work the GBQ.jl folks did. I changed the way the DataFrame is constructed relative to GBQ.jl and added a type check, since I had issues with INT and FLOAT columns not getting parsed correctly. Feel free to make any comments.
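
Not the actual code, just a minimal sketch of the kind of schema-driven type conversion described above. It assumes the jobs.query / getQueryResults response has already been parsed into nested Dicts/Vectors in the shape the BigQuery REST API returns (e.g. the output of JSON.parse: schema.fields with a type per column, rows as lists of "f"/"v" cells where every value arrives as a string); GCP.jl may well return typed structs instead, in which case the field access would change:

using DataFrames

# Convert one cell; BigQuery's REST responses return every value as a string (or null).
parse_cell(v, t) =
    (v === nothing || v === missing) ? missing :
    t == "INTEGER" ? parse(Int64, v) :
    t == "FLOAT"   ? parse(Float64, v) :
    t == "BOOLEAN" ? (v == "true") :
    v  # STRING, TIMESTAMP, DATE, ... left as strings

# `resp` is assumed to be an already-parsed query response, i.e. nested
# Dicts/Vectors with resp["schema"]["fields"] and resp["rows"].
function response_to_dataframe(resp)
    fields   = resp["schema"]["fields"]
    colnames = [f["name"] for f in fields]
    coltypes = [f["type"] for f in fields]
    rows     = get(resp, "rows", [])
    cols = [[parse_cell(row["f"][i]["v"], coltypes[i]) for row in rows]
            for i in eachindex(fields)]
    return DataFrame(cols, Symbol.(colnames))
end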

For uploads, I’ve started to look at writing a wrapper around the CLI tool. These are the upload docs I found.
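
Not a real implementation, just a rough sketch of the CLI-wrapper idea: write the DataFrame to CSV and shell out to bq load (the Google Cloud SDK's BigQuery tool, which must be installed and authenticated). The upload_dataframe helper name is made up; --source_format, --skip_leading_rows and --autodetect are standard bq load flags:

using CSV, DataFrames

# Hypothetical helper: dump the DataFrame to a temporary CSV and hand it to `bq load`.
function upload_dataframe(df::DataFrame, dataset::AbstractString, table::AbstractString)
    path = tempname() * ".csv"
    CSV.write(path, df)
    dest = "$dataset.$table"
    # --autodetect asks BigQuery to infer the schema from the CSV header and values
    run(`bq load --source_format=CSV --skip_leading_rows=1 --autodetect $dest $path`)
end

# upload_dataframe(df, "your-dataset", "your-table")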


This is great stuff - you should DM Martin, the GBQ.jl maintainer, to incorporate your fix. The wrapper around the CLI tool sounds promising.


That is good to hear; I’ll reach out about opening a PR.

Opened: Update GBQ.jl by ChaimKesler · Pull Request #10 · martineastwood/GBQ · GitHub