BigQuery

Use KX BigQuery API to interact with Google BigQuery

You need QPacker installed and the BigQuery QPK, bq.qpk.

Set up your workspace

To run inside a Docker container and access private BigQuery projects, you must register your BigQuery authentication token manually.

A simple way to do this is to pass an access token from your local machine into the container and use it for registration.

Create a project with the structure below.

$ mkdir app
$ cd app
$ tree .
.
├── bq.qpk
├── main.q
└── qp.json

0 directories, 3 files

In qp.json specify bq as a dependency.

{
    "app": {
        "entry": [ "main.q" ],
        "depends": [ "bq" ]
    }
}

Build and tag an image to run.

qp build
qp tag app 0.1.0

Run the container, passing in the access token as an environment variable.

docker run --rm -it --env-file qpbuild/.env \
  -e GOOGLE_TOKEN=$(gcloud auth print-access-token) app:0.1.0

Inside the container, register with Kurl manually.

.kurl.register (`oauth2; "*.googleapis.com"; ""; 
  enlist[`access_token]!enlist getenv `GOOGLE_TOKEN)
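If registration appears to succeed but requests later fail with authentication errors, a common cause is an empty token. A minimal guard to run before registering (this sketch assumes only that `GOOGLE_TOKEN` was passed in as shown above):

```q
/ abort with a clear error if the token was not passed into the container
if[""~getenv`GOOGLE_TOKEN;
  '"GOOGLE_TOKEN is empty - was -e GOOGLE_TOKEN=... passed to docker run?"]
```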

Explore the workspace

The workspace contains all of the BigQuery resources and their available methods.

List them:

q).bq.disc.listResources[]
`rowAccessPolicies`models`tabledata`datasets`tables`projects`routines`jobs

and get a description of the methods a given resource supports.

q).bq.disc.describeMethodsForResource`datasets
method httpMethod parameters           request             response                description   ..
-------------------------------------------------------------------------------------------------..
update PUT        `projectId`datasetId (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Updates infor..
list   GET        ,`projectId          ::                  (,`$ref)!,"DatasetList" "Lists all dat..
patch  PATCH      `projectId`datasetId (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Updates infor..
delete DELETE     `projectId`datasetId ::                  ::                      "Deletes the d..
get    GET        `projectId`datasetId ::                  (,`$ref)!,"Dataset"     "Returns the d..
insert POST       ,`projectId          (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Creates a new..

As you can see, each method takes parameters. Get more detailed information on them.

q).bq.disc.getParametersForMethod `datasets.list
parameter type     location description                     required format
---------------------------------------------------------------------------..
projectId "string" "path"   "Project ID of the new dataset" 1        ::    
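This parameter table maps directly onto the dictionary you pass as the second argument to .bq.run. For datasets.list only projectId is required, so the dictionary is a one-entry mapping (the project ID "cloudpak" here is just the example used throughout this page):

```q
/ build the parameter dictionary for datasets.list
/ projectId is the only required parameter
params:enlist[`projectId]!enlist "cloudpak"
/ then: .bq.run[`datasets.list; params; ()]
```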

Your first request

Use .bq.run to request a list of the datasets to which you have access.

q).bq.run[`datasets.list; enlist[`projectId]!enlist "cloudpak"; ()]
kind    | "bigquery#datasetList"
etag    | "PiOY7GfDPfxBWU56fwyYbg=="
datasets| +`kind`id`datasetReference`location!(("bigquery#dataset";"bigquery#dataset";"bigquery#d..

The first argument is the method name and the second argument is a dictionary of the method parameters. We’ll get to the third argument below.

The response column returned from .bq.disc.describeMethodsForResource tells you what type of object is returned. In this case it is a DatasetList.

You can find detailed information about these objects in the BigQuery API docs, for example, DatasetList.

Use the datasets key to pull out the exact list of datasets from the response.

q).bq.run[`datasets.list; enlist[`projectId]!enlist "cloudpak"; ()]`datasets
kind               id                             datasetReference                               ..
-------------------------------------------------------------------------------------------------..
"bigquery#dataset" "cloudpak:kx_bma_bigquery_ds"  `datasetId`projectId!("kx_bma_bigquery_ds";"clo..
"bigquery#dataset" "cloudpak:kx_test_bigquery_ds" `datasetId`projectId!("kx_test_bigquery_ds";"cl..
"bigquery#dataset" "cloudpak:test_dataset"        `datasetId`projectId!("test_dataset";"cloudpak"..
"bigquery#dataset" "cloudpak:test_dataset2"       `datasetId`projectId!("test_dataset2";"cloudpak..
"bigquery#dataset" "cloudpak:weather"             `datasetId`projectId!("weather";"cloudpak")    ..
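Because the response is an ordinary q dictionary of tables, you can slice it with standard q. For example, to pull out just the datasetId of each dataset (a sketch against the response shown above):

```q
/ datasets key of the response is a table
ds:.bq.run[`datasets.list; enlist[`projectId]!enlist "cloudpak"; ()]`datasets
/ datasetReference is a column of dictionaries; index into each one
ids:{x`datasetId} each ds`datasetReference
```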

Your second request

From the description of the methods, you can see that methods with an HTTP method of POST or PUT take a request object.

These request objects are the same as the response objects returned from the API.

To use the insert method, pass a Dataset as the request field.

Dataset

From the docs, you see that datasetReference is the only required field for the Dataset object. The value of datasetReference is another object that contains the fields datasetId and projectId. Of those fields, only datasetId is required.

So a minimal Dataset object looks like this:

{
    "datasetReference" : { 
        "datasetId" : "MyDataset"
    }

}

Using this information, create a new dataset using .bq.run.

q)request:enlist[`datasetReference]!enlist enlist[`datasetId]!enlist "MyDataset"

q).j.j request
"{\"datasetReference\":{\"datasetId\":\"MyDataset\"}}"

q).bq.run[`datasets.insert; enlist[`projectId]!enlist "cloudpak"; .j.j request]
kind            | "bigquery#dataset"
etag            | "KQdjOs1o+xP1TfMFdstYog=="
id              | "cloudpak:MyDataset"
selfLink        | "https://bigquery.googleapis.com/bigquery/v2/projects/cloudpak/datasets/MyDatas..
datasetReference| `datasetId`projectId!("MyDataset";"cloudpak")
access          | (`role`specialGroup!("WRITER";"projectWriters");`role`specialGroup!("OWNER";"pr..
creationTime    | "1612263987441"
lastModifiedTime| "1612263987441"
location        | "US"

By now you have probably realized that the third argument of .bq.run is a JSON string containing the request object for the method you are calling.
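Conversely, methods whose request column shows :: take no request body, so the third argument is empty. For example, to delete the dataset created above (datasets.delete takes projectId and datasetId as parameters and returns nothing):

```q
/ clean up the example dataset; no request body, so pass () as the third argument
.bq.run[`datasets.delete; `projectId`datasetId!("cloudpak";"MyDataset"); ()]
```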

Next steps

You now have the tools to run any supported BigQuery method.

Explore some of the public datasets in BigQuery and build something cool.
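For instance, the jobs.query method runs SQL against BigQuery; per the methods table it takes a projectId parameter and a QueryRequest body. A sketch querying a public dataset, assuming jobs.query is invoked the same way as the methods above (the project ID and query are illustrative; see the QueryRequest reference for the full set of fields):

```q
/ build a QueryRequest as JSON; useLegacySql must be false for standard SQL
req:.j.j `query`useLegacySql!(
  "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 5";0b)
.bq.run[`jobs.query; enlist[`projectId]!enlist "cloudpak"; req]
```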


BigQuery API documentation