Get Data - Object Storage

This page provides a walkthrough demonstrating how to ingest data from object storage into a database.

We have provided a weather dataset, hosted on each of the major cloud providers, for use in this walkthrough.

No kdb+ knowledge required

No prior experience with q/kdb+ is required to build this pipeline.

Before you import data, ensure the insights-demo database is created, as described here.

The following sections describe how to:

  • Create a pipeline. Create the weather pipeline and add it to the insights-demo package created here. The pipeline is composed of the following nodes, sketched in code after this list:
    • Readers. To read data from its source: Google Cloud Storage, Amazon S3, or Microsoft Azure Storage.
    • Decoders. To decode the ingested CSV data.
    • Schema. To convert the data to a type compatible with a kdb+ database.
    • Writers. To write the data to a kdb Insights Enterprise database.
  • Deploy the pipeline. To run the pipeline you have just created to ingest data into the insights-demo database.
  • Teardown the pipeline. The pipeline can be torn down after data has been ingested. This frees up resources and is good practice.
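
Although the wizard assembles these nodes for you, it can help to see the same four stages written out. The sketch below is illustrative only, using the kdb Insights Stream Processor q API; exact node signatures and option names vary by release, and weatherSchema is a placeholder for the schema you load in the wizard.

    / Illustrative sketch only: the four stages the Import Wizard assembles,
    / expressed with the kdb Insights Stream Processor q API. Check the SP
    / API reference for exact signatures; weatherSchema is a placeholder.
    .qsp.run
      .qsp.read.fromAmazonS3["s3://kxs-prd-cxt-twg-roinsightsdemo/weather.csv"]  / reader (region etc. set via options)
      .qsp.decode.csv[weatherSchema]                      / decoder: parse CSV text into a table
      .qsp.transform.schema[weatherSchema]                / schema: cast columns to kdb+ types
      .qsp.write.toDatabase[`weather; `$"insights-demo"]  / writer: persist to the database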

Create the pipeline

Use the Import Wizard to create the pipeline:

  1. On the Overview page, choose Import Data under Databases on the Quick Actions panel.

    Select Import Data on the Quick Actions panel.

  2. In the Import your data screen, select a cloud provider: Google Cloud Storage, Microsoft Azure Storage, or Amazon S3.

    Select one of the cloud providers.

  3. Complete the reader properties for the selected cloud provider.

    Google Cloud Storage properties

    Setting              Value
    GS URI*              gs://kxevg/weather/temp.csv
    Project ID           kx-evangelism
    Tenant               Not applicable
    File Mode*           Binary
    Offset*              0
    Chunking*            Auto
    Chunk Size*          1MB
    Use Watching         No
    Use Authentication   No

    Microsoft Azure Storage properties

    Setting              Value
    MS URI*              ms://kxevg/temp.csv
    Account*             kxevg
    Tenant               Not applicable
    File Mode*           Binary
    Offset*              0
    Chunking*            Auto
    Chunk Size*          1MB
    Use Watching         No
    Use Authentication   No

    Amazon S3 properties

    Setting              Value
    S3 URI*              s3://kxs-prd-cxt-twg-roinsightsdemo/weather.csv
    Region*              eu-west-1
    Tenant               kxinsights
    File Mode*           Binary
    Offset*              0
    Chunking*            Auto
    Chunk Size*          1MB
    Use Watching         No
    Use Authentication   No
  4. Click Next to select a decoder.

  5. Select CSV, as shown below, since the weather data is a CSV file.

    Select the CSV decoder for the weather dataset.

  6. In the Configure CSV screen, keep the default CSV decoder settings. (A standalone example of what the decoder and schema stages do appears after these wizard steps.)

    Keep the default CSV decoder settings.

  7. Click Next to open the Configure Schema screen.

    Configure schema screen

  8. In the Configure Schema screen:

    Leave the following unchanged:

    Setting          Value
    Apply a Schema   Enabled
    Data Format      Any

    Then:

    1. Click Load Schema.

    2. Select insights-demo as the Database.

    3. Select weather as the Table.

    Database and table

  9. Click Load.

  10. Click Next to open the Configure Writer screen.

  11. Configure the writer settings as follows:

    Setting    Value
    Database   insights-demo
    Table      weather

    Leave the remaining settings unchanged.

  12. Click Open Pipeline to display the Create Pipeline dialog.

    • Enter weather-1 as the Pipeline Name.
    • Click Select a Package and select insights-demo.
    • Click Create.

    Create a pipeline

    If insights-demo is not available for selection, open the Packages Index page and select Teardown from the actions menu beside insights-demo. If insights-demo does not appear in the packages list, create it as described here.

  13. You can review the pipeline, as shown below. Note that the first node in the pipeline differs depending on the selected reader type.

    Completed pipeline following the import steps

  14. Click Save.

At this stage you are ready to ingest the data.
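
To get a feel for what the Decoder and Schema nodes do, here is a small, self-contained q example: the 0: operator parses delimited text into a table, with the type characters ("P" for timestamp, "F" for float) playing the role of the schema. The column names and sample rows here are invented for illustration; they are not the actual columns of the weather dataset. (The commented curl line shows one way to fetch the real file, assuming the bucket allows anonymous HTTPS access.)

    / Standalone q illustration of CSV decoding plus schema typing.
    / Column names and values are invented for this example.
    lines:("timestamp,airtemp";
      "2021.01.01D00:00:00,5.2";
      "2021.01.01D01:00:00,4.8")

    / If the demo bucket is publicly readable over HTTPS, you could fetch
    / the real file instead (requires curl), e.g.
    / lines:system"curl -s https://storage.googleapis.com/kxevg/weather/temp.csv"

    / "PF" gives the column types; enlist"," means comma-delimited with a header row
    t:("PF";enlist",") 0: lines

    meta t  / shows timestamp as type p and airtemp as type f

In the pipeline, the Decoder node performs the parse and the Schema node applies the type conversions defined in the insights-demo database schema.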

Deploy the pipeline

Deploy the package containing the database and pipeline to ingest the data into the database.

  1. Go to the Packages Index page, click the three dots beside the insights-demo package, and click Deploy.

Note

It may take Kubernetes several minutes to spin up the necessary resources to deploy the pipeline.

If the package or its database is already deployed, you must first tear it down: on the Packages Index page, click the three dots beside the insights-demo package and click Teardown.

  2. You can check the progress of the pipeline under the Running Pipelines panel of the Overview tab. The data is ready to query when Status = Finished. (A quick sanity-check query is sketched below.)

    A running weather pipeline
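
Beyond checking the Finished status, you can sanity-check the ingested data with a short q query, assuming a session where the weather table is in scope (for example, after pulling it into the scratchpad of the Query window). The exact columns reported depend on the weather schema you loaded.

    / Assumes a q session where the ingested weather table is in scope.
    count weather            / total number of rows ingested
    5#select from weather    / first five rows
    meta weather             / column names and kdb+ types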

Pipeline warnings

Once the pipeline is running, some warnings may be displayed in the Running Pipelines panel of the Overview tab. These are expected and can be ignored.

Pipeline teardown

Once the CSV file has been ingested, the weather pipeline can be torn down. This ingest is a batch operation rather than an ongoing stream, so it is safe to tear down the pipeline once the data has been ingested. Tearing down a pipeline frees up resources, which is good practice when the pipeline is no longer needed.

  1. Click X in Running Pipelines on the Overview tab to tear down the pipeline.

    Teardown a pipeline.

  2. Check Clean up resources after teardown, as the resources are no longer required now that the CSV file has been ingested.

    Teardown a pipeline to free up resources.

  3. Click Teardown Pipeline.

Troubleshoot pipelines

If any errors are reported, they can be checked against the logs of the deployment process. Click View Diagnostics in the Running Pipelines section of the Overview tab to review the status of a pipeline deployment.

Next steps

Now that data has been ingested into the weather table, you can query and explore it.

Further reading

Use the following links to learn more about specific topics mentioned on this page: