Skip to content

Generate and extend a database

This page explains how to create and expand databases using PyKX.

Tip: For the best experience, we recommend reading Databases in PyKX first. If you already have access to a database and only need to load it, you can skip this page and jump right to load database.

Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted database. In the following sections we complete the following:

  1. Create a new database containing a single table trade and multiple days of data.
  2. Add a new day worth of data for today to the database for the trade table.
  3. On-board a new table (quote) which contains data from today.
  4. Ensure that the new table is queryable.

Bring your own data

The below example makes use of randomly-generated data using PyKX, where we use trade or quote tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.

1. Create database

For more information on database structures, see the linked section on what is a database. With PyKX, use the pykx.DB class for all database interactions in Python. This class lets you create, expand, and maintain on-disk partitioned databases. First, we need to create a database.

In the next cell, we create a trade table with data from multiple days in the chat.

>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
...     'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     })

Now that we have generated our trade table, we can persist it to disk at the location /tmp/db.

>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')

That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:

>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.PartitionedTable'>

The above database persistence uses the default parameters within the create function. If you need to compress/encrypt the persisted database partitions or need to define a by or specify the symbol enumeration name, you can follow the API documentation here.

2. Add new database partition

Now that you have generated a database, you can add extra partitions using the same database class and the create function. In this example we will add new data for the current day created in the below cell:

>>> N = 2000000
>>> trade = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     })

Note that in comparison to the original database creation logic, we do not have a date column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:

>>> db.create(trade, 'trade', kx.DateAtom('today'))

In the below example, we compress data within the persisted partition using gzip. For further details on supported compression formats see here or look at the API reference here.

>>> gzip = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(trade, 'trade', kx.DateAtom('today'), compress=gzip)

In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see here or look at the API reference here.

>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)

3. Add new table to database

After onboarding your first table to a database, a common question is “How can I add a new table of related data?”. You can use the database class and the create function to do this. For instance, let’s add a quote table for the current day:

>>> N = 1000000
>>> quote = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'ask': kx.random.random(N, 100),
...     'bid': kx.random.random(N, 100)
... })

We can now add this as the data for the current day to the quote table and see that the table is defined:

>>> db.create(quote, 'quote', kx.DateAtom('today'))
>>> db.tables
['quote', 'trade']
>>> type(db.quote)
<class 'pykx.wrappers.PartitionedTable'>

4. Ensure new table is queryable

You have now persisted another table to your database, however, you will notice if you access the quote table that the return is surprising:

>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))

The reason for this is that you currently do not have data in each partition of your database for the quote table. To rectify this, run the fill_database method off the database class which adds relevant empty quote data to tables to the partitions from which it's missing:

>>> db.fill_database()

Now you should be able to access the quote data for query:

>>> db.quote

Next Steps