Tutorial

Running using qce

Run qce and load the BigQuery library with

q)\l bq.q_

If you want to access private projects during development, you can register an access token

$ export GOOGLE_TOKEN=$(gcloud auth print-access-token)
$ qce
q)\l bq.q_
q).kurl.register (`oauth2; "*.googleapis.com"; ""; enlist[`access_token]!enlist getenv `GOOGLE_TOKEN)

Running as a qp application

When running inside a Docker container, you will need to register your BigQuery authentication token manually in order to access private BigQuery projects.

A simple way to do this is to pass the access token from your local computer into the container so that you can use it for registration.

Create a project with the structure below

$ mkdir app
$ cd app
$ tree .
.
├── bq.qpk
├── main.q
└── qp.json

0 directories, 3 files

Make sure that the project specifies bq as a dependency

{
    "app": {
        "entry": [ "main.q" ],
        "depends": [ "bq" ]
    }
}

Build and tag an image to run

$ qp build
$ qp tag app 0.1.0

Run the container and pass in the access token as an environment variable

$ docker run --rm -it --env-file qpbuild/.env -e GOOGLE_TOKEN=$(gcloud auth print-access-token) app:0.1.0

Inside the container register with kurl manually

q).kurl.register (`oauth2; "*.googleapis.com"; ""; enlist[`access_token]!enlist getenv `GOOGLE_TOKEN)

Exploring the workspace

The workspace contains all of the BigQuery resources and their available methods.

You can list the available resources

q).com_kx_bq.disc.listResources[]
`rowAccessPolicies`models`tabledata`datasets`tables`projects`routines`jobs

and get a description of the methods that a given resource supports.

q).com_kx_bq.disc.describeMethodsForResource`datasets
method httpMethod parameters           request             response                description   ..
-------------------------------------------------------------------------------------------------..
update PUT        `projectId`datasetId (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Updates infor..
list   GET        ,`projectId          ::                  (,`$ref)!,"DatasetList" "Lists all dat..
patch  PATCH      `projectId`datasetId (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Updates infor..
delete DELETE     `projectId`datasetId ::                  ::                      "Deletes the d..
get    GET        `projectId`datasetId ::                  (,`$ref)!,"Dataset"     "Returns the d..
insert POST       ,`projectId          (,`$ref)!,"Dataset" (,`$ref)!,"Dataset"     "Creates a new..

As the table shows, each method takes a set of parameters. You can get more detailed information on these parameters.

q).com_kx_bq.disc.getParametersForMethod `datasets.list
parameter type     location description                     required format
---------------------------------------------------------------------------..
projectId "string" "path"   "Project ID of the new dataset" 1        ::    

Your first request

From this information, you can make a request to list the datasets that you have access to using .com_kx_bq.run

q).com_kx_bq.run[`datasets.list; enlist[`projectId]!enlist "cloudpak"; ()]
kind    | "bigquery#datasetList"
etag    | "PiOY7GfDPfxBWU56fwyYbg=="
datasets| +`kind`id`datasetReference`location!(("bigquery#dataset";"bigquery#dataset";"bigquery#d..

The first parameter is the method name and the second parameter is a dictionary of the method parameters. We'll get to the third parameter later.

The response column returned by .com_kx_bq.disc.describeMethodsForResource tells you what type of object each method returns; in this case it is a DatasetList.

You can find detailed information about these objects in the BigQuery API docs, for example https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#DatasetList

From the response, you can pull out the exact list of datasets using the datasets key

q).com_kx_bq.run[`datasets.list; enlist[`projectId]!enlist "cloudpak"; ()]`datasets
kind               id                             datasetReference                               ..
-------------------------------------------------------------------------------------------------..
"bigquery#dataset" "cloudpak:kx_bma_bigquery_ds"  `datasetId`projectId!("kx_bma_bigquery_ds";"clo..
"bigquery#dataset" "cloudpak:kx_test_bigquery_ds" `datasetId`projectId!("kx_test_bigquery_ds";"cl..
"bigquery#dataset" "cloudpak:test_dataset"        `datasetId`projectId!("test_dataset";"cloudpak"..
"bigquery#dataset" "cloudpak:test_dataset2"       `datasetId`projectId!("test_dataset2";"cloudpak..
"bigquery#dataset" "cloudpak:weather"             `datasetId`projectId!("weather";"cloudpak")    ..

Your second request

From the description of the methods, you can see that methods with an HTTP method of POST or PUT take a request object.

These request objects are the same as the response objects that are returned from the API.

If you want to use the insert method, you need to pass a Dataset as the request field.

You can get information about this object in the BigQuery API docs https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#Dataset

From the docs, you can see that datasetReference is the only required field for the Dataset object. The value of datasetReference is another object that contains the fields datasetId and projectId. Of those fields, only datasetId is required.

So a minimal Dataset object would look like

{
    "datasetReference": {
        "datasetId": "MyDataset"
    }
}

Using this information, you can create a new dataset using .com_kx_bq.run

q)request:enlist[`datasetReference]!enlist enlist[`datasetId]!enlist "MyDataset"

q).j.j request
"{\"datasetReference\":{\"datasetId\":\"MyDataset\"}}"

q).com_kx_bq.run[`datasets.insert; enlist[`projectId]!enlist "cloudpak"; .j.j request]
kind            | "bigquery#dataset"
etag            | "KQdjOs1o+xP1TfMFdstYog=="
id              | "cloudpak:MyDataset"
selfLink        | "https://bigquery.googleapis.com/bigquery/v2/projects/cloudpak/datasets/MyDatas..
datasetReference| `datasetId`projectId!("MyDataset";"cloudpak")
access          | (`role`specialGroup!("WRITER";"projectWriters");`role`specialGroup!("OWNER";"pr..
creationTime    | "1612263987441"
lastModifiedTime| "1612263987441"
location        | "US"

By now you have probably realised that the third argument of .com_kx_bq.run is a JSON string containing the request for the method that you are calling.
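For methods whose request column is :: in the methods table (such as datasets.delete), there is no request body, so you pass an empty third argument. For example, to clean up the dataset created above (a sketch based on the parameters listed in the datasets methods table):

q).com_kx_bq.run[`datasets.delete; `projectId`datasetId!("cloudpak";"MyDataset"); ()]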

Next steps

You now have the tools to run any supported BigQuery method.

Now go and explore some of the public datasets available in BigQuery and build something cool.

https://cloud.google.com/bigquery/public-data
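As a starting point, the jobs resource in the BigQuery API exposes a query method that runs SQL and returns the results. The sketch below assumes the library surfaces it as `jobs.query, and uses the query and useLegacySql fields from the QueryRequest object in the API docs:

q)qry:"SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` LIMIT 5"
q)req:.j.j `query`useLegacySql!(qry;0b)    / build the QueryRequest as a JSON string
q).com_kx_bq.run[`jobs.query; enlist[`projectId]!enlist "cloudpak"; req]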