Generate and extend a database
This page explains how to create and expand databases using PyKX.
Tip: For the best experience, we recommend reading Databases in PyKX first. If you already have access to a database and only need to load it, you can skip this page and jump right to load database.
Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted database. In the following sections we complete the following:
- Create a new database containing a single table
trade
and multiple days of data. - Add a new day worth of data for
today
to the database for thetrade
table. - On-board a new table (
quote
) which contains data fromtoday
. - Ensure that the new table is queryable.
Bring your own data
The below example makes use of randomly-generated data using PyKX, where we use trade
or quote
tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.
1. Create database
For more information on database structures, see the linked section on what is a database. With PyKX, use the pykx.DB
class for all database interactions in Python. This class lets you create, expand, and maintain on-disk partitioned databases. First, we need to create a database.
In the next cell, we create a trade
table with data from multiple days in the chat.
>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
... 'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0)
... })
Now that we have generated our trade table, we can persist it to disk at the location /tmp/db
.
>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')
That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.PartitionedTable'>
The above database persistence uses the default parameters within the create
function. If you need to compress/encrypt the persisted database partitions or need to define a by
or specify the symbol enumeration name, you can follow the API documentation here.
2. Add new database partition
Now that you have generated a database, you can add extra partitions using the same database class and the create
function. In this example we will add new data for the current day created in the below cell:
>>> N = 2000000
>>> trade = kx.Table(data={
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0)
... })
Note that in comparison to the original database creation logic, we do not have a date
column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:
>>> db.create(trade, 'trade', kx.DateAtom('today'))
In the below example, we compress data within the persisted partition using gzip
. For further details on supported compression formats see here or look at the API reference here.
>>> gzip = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(trade, 'trade', kx.DateAtom('today'), compress=gzip)
In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see here or look at the API reference here.
>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)
3. Add new table to database
After onboarding your first table to a database, a common question is “How can I add a new table of related data?”. You can use the database
class and the create
function to do this. For instance, let’s add a quote
table for the current day:
>>> N = 1000000
>>> quote = kx.Table(data={
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'ask': kx.random.random(N, 100),
... 'bid': kx.random.random(N, 100)
... })
We can now add this as the data for the current day to the quote
table and see that the table is defined:
>>> db.create(quote, 'quote', kx.DateAtom('today'))
>>> db.tables
['quote', 'trade']
>>> type(db.quote)
<class 'pykx.wrappers.PartitionedTable'>
4. Ensure new table is queryable
You have now persisted another table to your database, however, you will notice if you access the quote
table that the return is surprising:
>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))
The reason for this is that you currently do not have data in each partition of your database for the quote
table. To rectify this, run the fill_database
method off the database
class which adds relevant empty quote data to tables to the partitions from which it's missing:
>>> db.fill_database()
Now you should be able to access the quote
data for query:
>>> db.quote
Next Steps
- Load an existing database.
- Modify the contents of your database
- Query your database with Python
- Compress/encrypt data for persisting database partitions.