Database Creation and Management¶
This notebook provides a walkthrough of some of the functionality available for users looking to create and maintain large databases using PyKX.
In particular, this notebook refers to creating and maintaining partitioned kdb+ databases. Go to Q for Mortals for more in-depth information about partitioned databases in kdb+.
You can download this walkthrough as a .ipynb notebook file using the following link.
This walkthrough provides examples of the following tasks:
- Creating a database from a historical dataset
- Adding a new partition to the database
- Managing the on-disk database by:
- Renaming a table and column
- Creating a copy of a column to the database
- Applying a Python function to a column of the database
- Updating the data type of a column
- Adding a new table to the most recent partition of the database
For full information on the functions available you can reference the API section.
Initial setup¶
Import all required libraries and create a temporary directory which will be used to store the database we create for this walkthrough
import os
os.environ['PYKX_BETA_FEATURES'] = 'true'
import pykx as kx
from datetime import date
import tempfile
tempdir = tempfile.TemporaryDirectory()
Database interactions are facilitated through use of the pykx.DB class. All methods/attributes used in this notebook are contained within this class.
Initialise the DB class to start. The expected input is the file path where you intend to save the partitioned database and its associated tables. In this case we're going to use the temporary directory we just created.
db = kx.DB(path = tempdir.name + '/db')
For details on any methods contained within this class, you can use the help method.
help(db.create)
Help on method create in module pykx.db:
create(table, table_name, partition, *, by_field=None, sym_enum=None, log=True, compress=None, encrypt=None) method of pykx.db.DB instance
Create an on-disk partitioned table within a kdb+ database from a supplied
`pykx.Table` object. Once generated this table will be accessible
as an attribute of the `DB` class or a sub attribute of `DB.table`.
Parameters:
table: The `pykx.Table` object which is to be persisted to disk
table_name: The name with which the table will be persisted and accessible
once loaded and available as a `pykx.PartitionedTable`
partition: The name of the column which is to be used to partition the data if
supplied as a `str` or if supplied as non string object this will be used as
the partition to which all data is persisted
by_field: A field of the table to be used as a by column, this column will be
the second column in the table (the first being the virtual column determined
by the partitioning column)
sym_enum: The name of the symbol enumeration table to be associated with the table
log: Print information about status of partitioned datab
compress: `pykx.Compress` initialized class denoting the
compression settings to be used when persisting a partition/partitions
encrypt: `pykx.Encrypt` initialized class denoting the encryption setting to be used
when persisting a partition/partitions
Returns:
A `None` object on successful invocation, the database class will be
updated to contain attributes associated with the available created table
Examples:
Generate a partitioned table from a table containing multiple partitions
```python
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 1000
>>> qtab = kx.Table(data = {
... 'date': kx.q.asc(kx.random.random(N, kx.q('2020.01 2020.02 2020.03m'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', 'date', by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month sym price size
---------------------------
2020.01 AAPL 7.979004 85
2020.01 AAPL 5.931866 55
2020.01 AAPL 5.255477 49
2020.01 AAPL 8.15255 74
2020.01 AAPL 4.771067 80
..
'))
```
Add a table as a partition to an on-disk database, in the example below we are adding
a partition to the table generated above
```python
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month sym price size
---------------------------
2020.01 AAPL 7.979004 85
2020.01 AAPL 5.931866 55
2020.01 AAPL 5.255477 49
2020.01 AAPL 8.15255 74
2020.01 AAPL 4.771067 80
..
'))
```
Add a table as a partition to an on-disk database, in the example below we are
additionally applying gzip compression to the persisted table
```python
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> compress = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), compress=compress)
>>> kx.q('{-21!hsym x}', '/tmp/newDB/2020.04/stocks/price')
pykx.Dictionary(pykx.q('
compressedLength | 2064
uncompressedLength| 2680
algorithm | 2i
logicalBlockSize | 17i
zipLevel | 2i
'))
```
Create the sample dataset¶
Create a dataset called trades containing time-series data spanning multiple dates, and columns of various types:
N = 1000000
trades = kx.Table(data={
'date': kx.random.random(N, [date(2020, 1, 1), date(2020, 1, 2)]),
'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'price': kx.random.random(N, 10.0),
'size': kx.random.random(N, 1000)
})
Create the database¶
Create the database using the date column as the partition, and add trades as a table called trade_data within it.
db.create(trades, 'trade_data', 'date')
Writing Database Partition 2020.01.01 to table trade_data Writing Database Partition 2020.01.02 to table trade_data
This now exists as a table and is saved to disk.
db.tables
['trade_data']
When a table is saved, an attribute is added to the db class for it. For our newly generated table, this is db.trade_data
db.trade_data
| date | sym | price | size | |
|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 7.079266 | 800 |
| 1 | 2020.01.01 | AAPL | 1.824321 | 65 |
| 2 | 2020.01.01 | MSFT | 2.408259 | 292 |
| 3 | 2020.01.01 | GOOG | 1.675438 | 7 |
| 4 | 2020.01.01 | AAPL | 8.311168 | 183 |
| 5 | 2020.01.01 | AAPL | 2.208693 | 989 |
| 6 | 2020.01.01 | MSFT | 6.068126 | 567 |
| 7 | 2020.01.01 | AAPL | 4.918926 | 794 |
| 8 | 2020.01.01 | AAPL | 9.331869 | 39 |
| 9 | 2020.01.01 | AAPL | 1.142611 | 507 |
| 10 | 2020.01.01 | AAPL | 2.685874 | 581 |
| 11 | 2020.01.01 | AAPL | 3.483591 | 163 |
| 12 | 2020.01.01 | AAPL | 0.4422525 | 466 |
| 13 | 2020.01.01 | MSFT | 7.406654 | 976 |
| 14 | 2020.01.01 | MSFT | 2.493871 | 171 |
| 15 | 2020.01.01 | AAPL | 9.242088 | 28 |
| 16 | 2020.01.01 | MSFT | 0.3954522 | 747 |
| 17 | 2020.01.01 | MSFT | 0.3441191 | 512 |
| 18 | 2020.01.01 | GOOG | 9.662762 | 998 |
| 19 | 2020.01.01 | AAPL | 9.601674 | 812 |
| 20 | 2020.01.01 | AAPL | 4.969858 | 910 |
| 21 | 2020.01.01 | GOOG | 1.048204 | 830 |
| 22 | 2020.01.01 | GOOG | 0.9817644 | 595 |
| ... | ... | ... | ... | ... |
| 999999 | 2020.01.02 | GOOG | 1.470716 | 636 |
1,000,000 rows × 4 columns
Add a new partition to the database¶
Once a table has been generated, you can add more partitions to the database through reuse of the create method. In this case we are adding the new partition 2020.01.03 to the database.
N = 10000
new_day = kx.Table(data={
'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'price': kx.random.random(N, 10.0),
'size': kx.random.random(N, 100)
})
db.create(new_day, 'trade_data', date(2020, 1, 3))
Writing Database Partition 2020-01-03 to table trade_data
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trade_data" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
Manage the database¶
This section covers updating the contents of a database. The examples below demonstrate a number of common tasks that would be completed regularly when updating a database.
The name of a table can be updated using the rename_table method. Below, we are updating the table trade_data to be called trade.
db.rename_table('trade_data', 'trades')
2024.10.22 13:25:16 renaming :/tmp/tmpiw21h3k2/db/2020.01.01/trade_data to :/tmp/tmpiw21h3k2/db/2020.01.01/trades
2024.10.22 13:25:16 renaming :/tmp/tmpiw21h3k2/db/2020.01.02/trade_data to :/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 renaming :/tmp/tmpiw21h3k2/db/2020.01.03/trade_data to :/tmp/tmpiw21h3k2/db/2020.01.03/trades
During the rename process, the attribute in the db class is also updated.
db.trades
| date | sym | price | size | |
|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 7.079266 | 800 |
| 1 | 2020.01.01 | AAPL | 1.824321 | 65 |
| 2 | 2020.01.01 | MSFT | 2.408259 | 292 |
| 3 | 2020.01.01 | GOOG | 1.675438 | 7 |
| 4 | 2020.01.01 | AAPL | 8.311168 | 183 |
| 5 | 2020.01.01 | AAPL | 2.208693 | 989 |
| 6 | 2020.01.01 | MSFT | 6.068126 | 567 |
| 7 | 2020.01.01 | AAPL | 4.918926 | 794 |
| 8 | 2020.01.01 | AAPL | 9.331869 | 39 |
| 9 | 2020.01.01 | AAPL | 1.142611 | 507 |
| 10 | 2020.01.01 | AAPL | 2.685874 | 581 |
| 11 | 2020.01.01 | AAPL | 3.483591 | 163 |
| 12 | 2020.01.01 | AAPL | 0.4422525 | 466 |
| 13 | 2020.01.01 | MSFT | 7.406654 | 976 |
| 14 | 2020.01.01 | MSFT | 2.493871 | 171 |
| 15 | 2020.01.01 | AAPL | 9.242088 | 28 |
| 16 | 2020.01.01 | MSFT | 0.3954522 | 747 |
| 17 | 2020.01.01 | MSFT | 0.3441191 | 512 |
| 18 | 2020.01.01 | GOOG | 9.662762 | 998 |
| 19 | 2020.01.01 | AAPL | 9.601674 | 812 |
| 20 | 2020.01.01 | AAPL | 4.969858 | 910 |
| 21 | 2020.01.01 | GOOG | 1.048204 | 830 |
| 22 | 2020.01.01 | GOOG | 0.9817644 | 595 |
| ... | ... | ... | ... | ... |
| 1009999 | 2020.01.03 | AAPL | 9.750387 | 99 |
1,010,000 rows × 4 columns
Renaming a column in a table is achieved using the rename_column method. For example, let's update the sym column in the trade table to be called ticker.
db.rename_column('trades', 'sym', 'ticker')
2024.10.22 13:25:16 renaming sym to ticker in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 renaming sym to ticker in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 renaming sym to ticker in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
db.trades
| date | ticker | price | size | |
|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 7.079266 | 800 |
| 1 | 2020.01.01 | AAPL | 1.824321 | 65 |
| 2 | 2020.01.01 | MSFT | 2.408259 | 292 |
| 3 | 2020.01.01 | GOOG | 1.675438 | 7 |
| 4 | 2020.01.01 | AAPL | 8.311168 | 183 |
| 5 | 2020.01.01 | AAPL | 2.208693 | 989 |
| 6 | 2020.01.01 | MSFT | 6.068126 | 567 |
| 7 | 2020.01.01 | AAPL | 4.918926 | 794 |
| 8 | 2020.01.01 | AAPL | 9.331869 | 39 |
| 9 | 2020.01.01 | AAPL | 1.142611 | 507 |
| 10 | 2020.01.01 | AAPL | 2.685874 | 581 |
| 11 | 2020.01.01 | AAPL | 3.483591 | 163 |
| 12 | 2020.01.01 | AAPL | 0.4422525 | 466 |
| 13 | 2020.01.01 | MSFT | 7.406654 | 976 |
| 14 | 2020.01.01 | MSFT | 2.493871 | 171 |
| 15 | 2020.01.01 | AAPL | 9.242088 | 28 |
| 16 | 2020.01.01 | MSFT | 0.3954522 | 747 |
| 17 | 2020.01.01 | MSFT | 0.3441191 | 512 |
| 18 | 2020.01.01 | GOOG | 9.662762 | 998 |
| 19 | 2020.01.01 | AAPL | 9.601674 | 812 |
| 20 | 2020.01.01 | AAPL | 4.969858 | 910 |
| 21 | 2020.01.01 | GOOG | 1.048204 | 830 |
| 22 | 2020.01.01 | GOOG | 0.9817644 | 595 |
| ... | ... | ... | ... | ... |
| 1009999 | 2020.01.03 | AAPL | 9.750387 | 99 |
1,010,000 rows × 4 columns
To safely apply a function to modify the price column within the database, first create a copy of the column.
db.copy_column('trades', 'price', 'price_copy')
2024.10.22 13:25:16 copying price to price_copy in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 copying price to price_copy in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 copying price to price_copy in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
db.trades
| date | ticker | price | size | price_copy | |
|---|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 7.079266 | 800 | 7.079266 |
| 1 | 2020.01.01 | AAPL | 1.824321 | 65 | 1.824321 |
| 2 | 2020.01.01 | MSFT | 2.408259 | 292 | 2.408259 |
| 3 | 2020.01.01 | GOOG | 1.675438 | 7 | 1.675438 |
| 4 | 2020.01.01 | AAPL | 8.311168 | 183 | 8.311168 |
| 5 | 2020.01.01 | AAPL | 2.208693 | 989 | 2.208693 |
| 6 | 2020.01.01 | MSFT | 6.068126 | 567 | 6.068126 |
| 7 | 2020.01.01 | AAPL | 4.918926 | 794 | 4.918926 |
| 8 | 2020.01.01 | AAPL | 9.331869 | 39 | 9.331869 |
| 9 | 2020.01.01 | AAPL | 1.142611 | 507 | 1.142611 |
| 10 | 2020.01.01 | AAPL | 2.685874 | 581 | 2.685874 |
| 11 | 2020.01.01 | AAPL | 3.483591 | 163 | 3.483591 |
| 12 | 2020.01.01 | AAPL | 0.4422525 | 466 | 0.4422525 |
| 13 | 2020.01.01 | MSFT | 7.406654 | 976 | 7.406654 |
| 14 | 2020.01.01 | MSFT | 2.493871 | 171 | 2.493871 |
| 15 | 2020.01.01 | AAPL | 9.242088 | 28 | 9.242088 |
| 16 | 2020.01.01 | MSFT | 0.3954522 | 747 | 0.3954522 |
| 17 | 2020.01.01 | MSFT | 0.3441191 | 512 | 0.3441191 |
| 18 | 2020.01.01 | GOOG | 9.662762 | 998 | 9.662762 |
| 19 | 2020.01.01 | AAPL | 9.601674 | 812 | 9.601674 |
| 20 | 2020.01.01 | AAPL | 4.969858 | 910 | 4.969858 |
| 21 | 2020.01.01 | GOOG | 1.048204 | 830 | 1.048204 |
| 22 | 2020.01.01 | GOOG | 0.9817644 | 595 | 0.9817644 |
| ... | ... | ... | ... | ... | ... |
| 1009999 | 2020.01.03 | AAPL | 9.750387 | 99 | 9.750387 |
1,010,000 rows × 4 columns
You can now apply a function to the copied column without the risk of losing the original data. Below we are modifying the copied column by multiplying the contents by 2.
db.apply_function('trades', 'price_copy', kx.q('{2*x}'))
2024.10.22 13:25:16 resaving column price_copy (type 9) in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 resaving column price_copy (type 9) in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 resaving column price_copy (type 9) in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
db.trades
| date | ticker | price | size | price_copy | |
|---|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 7.079266 | 800 | 14.15853 |
| 1 | 2020.01.01 | AAPL | 1.824321 | 65 | 3.648642 |
| 2 | 2020.01.01 | MSFT | 2.408259 | 292 | 4.816519 |
| 3 | 2020.01.01 | GOOG | 1.675438 | 7 | 3.350875 |
| 4 | 2020.01.01 | AAPL | 8.311168 | 183 | 16.62234 |
| 5 | 2020.01.01 | AAPL | 2.208693 | 989 | 4.417385 |
| 6 | 2020.01.01 | MSFT | 6.068126 | 567 | 12.13625 |
| 7 | 2020.01.01 | AAPL | 4.918926 | 794 | 9.837851 |
| 8 | 2020.01.01 | AAPL | 9.331869 | 39 | 18.66374 |
| 9 | 2020.01.01 | AAPL | 1.142611 | 507 | 2.285222 |
| 10 | 2020.01.01 | AAPL | 2.685874 | 581 | 5.371748 |
| 11 | 2020.01.01 | AAPL | 3.483591 | 163 | 6.967183 |
| 12 | 2020.01.01 | AAPL | 0.4422525 | 466 | 0.8845049 |
| 13 | 2020.01.01 | MSFT | 7.406654 | 976 | 14.81331 |
| 14 | 2020.01.01 | MSFT | 2.493871 | 171 | 4.987742 |
| 15 | 2020.01.01 | AAPL | 9.242088 | 28 | 18.48418 |
| 16 | 2020.01.01 | MSFT | 0.3954522 | 747 | 0.7909045 |
| 17 | 2020.01.01 | MSFT | 0.3441191 | 512 | 0.6882382 |
| 18 | 2020.01.01 | GOOG | 9.662762 | 998 | 19.32552 |
| 19 | 2020.01.01 | AAPL | 9.601674 | 812 | 19.20335 |
| 20 | 2020.01.01 | AAPL | 4.969858 | 910 | 9.939716 |
| 21 | 2020.01.01 | GOOG | 1.048204 | 830 | 2.096408 |
| 22 | 2020.01.01 | GOOG | 0.9817644 | 595 | 1.963529 |
| ... | ... | ... | ... | ... | ... |
| 1009999 | 2020.01.03 | AAPL | 9.750387 | 99 | 19.50077 |
1,010,000 rows × 4 columns
Once you are happy with the new values within the price_copy column, you can safely delete the price column, then rename the price_copy column to be called price.
db.delete_column('trades', 'price')
db.rename_column('trades', 'price_copy', 'price')
2024.10.22 13:25:16 deleting column price from `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 deleting column price from `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 deleting column price from `:/tmp/tmpiw21h3k2/db/2020.01.03/trades 2024.10.22 13:25:16 renaming price_copy to price in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 renaming price_copy to price in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 renaming price_copy to price in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
db.trades
| date | ticker | size | price | |
|---|---|---|---|---|
| 0 | 2020.01.01 | MSFT | 800 | 14.15853 |
| 1 | 2020.01.01 | AAPL | 65 | 3.648642 |
| 2 | 2020.01.01 | MSFT | 292 | 4.816519 |
| 3 | 2020.01.01 | GOOG | 7 | 3.350875 |
| 4 | 2020.01.01 | AAPL | 183 | 16.62234 |
| 5 | 2020.01.01 | AAPL | 989 | 4.417385 |
| 6 | 2020.01.01 | MSFT | 567 | 12.13625 |
| 7 | 2020.01.01 | AAPL | 794 | 9.837851 |
| 8 | 2020.01.01 | AAPL | 39 | 18.66374 |
| 9 | 2020.01.01 | AAPL | 507 | 2.285222 |
| 10 | 2020.01.01 | AAPL | 581 | 5.371748 |
| 11 | 2020.01.01 | AAPL | 163 | 6.967183 |
| 12 | 2020.01.01 | AAPL | 466 | 0.8845049 |
| 13 | 2020.01.01 | MSFT | 976 | 14.81331 |
| 14 | 2020.01.01 | MSFT | 171 | 4.987742 |
| 15 | 2020.01.01 | AAPL | 28 | 18.48418 |
| 16 | 2020.01.01 | MSFT | 747 | 0.7909045 |
| 17 | 2020.01.01 | MSFT | 512 | 0.6882382 |
| 18 | 2020.01.01 | GOOG | 998 | 19.32552 |
| 19 | 2020.01.01 | AAPL | 812 | 19.20335 |
| 20 | 2020.01.01 | AAPL | 910 | 9.939716 |
| 21 | 2020.01.01 | GOOG | 830 | 2.096408 |
| 22 | 2020.01.01 | GOOG | 595 | 1.963529 |
| ... | ... | ... | ... | ... |
| 1009999 | 2020.01.03 | AAPL | 99 | 19.50077 |
1,010,000 rows × 4 columns
To convert the data type of a column, you can use the set_column_type method. Before we do that, we can look at the metadata information for the table using the meta method.
kx.q.meta(db.trades)
| t | f | a | |
|---|---|---|---|
| c | |||
| date | "d" | ||
| ticker | "s" | ||
| size | "j" | ||
| price | "f" |
Currently the size column is the type LongAtom. We will update this to be a type ShortAtom.
db.set_column_type('trades', 'size', kx.ShortAtom)
2024.10.22 13:25:16 resaving column size (type 5) in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 resaving column size (type 5) in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 resaving column size (type 5) in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
Now let's apply the grouped attribute to the size column. For more information on attributes in kdb+, please refer to the Q for Mortals Attributes section.
db.set_column_attribute('trades', 'ticker', 'grouped')
2024.10.22 13:25:16 resaving column ticker (type 20) in `:/tmp/tmpiw21h3k2/db/2020.01.01/trades 2024.10.22 13:25:16 resaving column ticker (type 20) in `:/tmp/tmpiw21h3k2/db/2020.01.02/trades 2024.10.22 13:25:16 resaving column ticker (type 20) in `:/tmp/tmpiw21h3k2/db/2020.01.03/trades
Let's revisit the metadata of the table to ensure they have been applied correctly.
kx.q.meta(db.trades)
| t | f | a | |
|---|---|---|---|
| c | |||
| date | "d" | ||
| ticker | "s" | g | |
| size | "h" | ||
| price | "f" |
Onboarding your next table¶
Now that you have successfully set up one table, you may want to add a second table. We follow the same method as before and create the quotes table using the create method. In this example, the quotes table only contains data for 2020.01.03.
quotes = kx.Table(data={
'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
'open': kx.random.random(N, 10.0),
'high': kx.random.random(N, 10.0),
'low': kx.random.random(N, 10.0),
'close': kx.random.random(N, 10.0)
})
db.create(quotes, 'quotes', date(2020, 1, 3), by_field = 'sym')
Writing Database Partition 2020-01-03 to table quotes
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "trades" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
All tables within a database must contain the same partition structure. To ensure the new table can be accessed, the quotes table needs to exist in every partition within the database, even if there is no data for that partition. This is called backfilling data. For the partitions where the quotes table is missing, we use the fill_database method.
db.fill_database()
Successfully filled missing tables to partition: :/tmp/tmpiw21h3k2/db/2020.01.02 Successfully filled missing tables to partition: :/tmp/tmpiw21h3k2/db/2020.01.01
/usr/local/lib/python3.10/site-packages/pykx/db.py:381: UserWarning: A database table "quotes" would overwrite one of the pykx.DB() methods, please access your table via the table attribute
warn(f'A database table "{i}" would overwrite one of the pykx.DB() methods, please access your table via the table attribute') # noqa: E501
Now that the database has resolved the missing tables within the partitions, we can view the new quotes table
db.quotes
| date | sym | open | high | low | close | |
|---|---|---|---|---|---|---|
| 0 | 2020.01.03 | AAPL | 8.204026 | 0.9115201 | 3.916864 | 9.813545 |
| 1 | 2020.01.03 | AAPL | 8.092754 | 6.019578 | 0.08513137 | 2.825277 |
| 2 | 2020.01.03 | AAPL | 1.425043 | 8.881719 | 4.285461 | 7.820761 |
| 3 | 2020.01.03 | AAPL | 7.172736 | 3.33985 | 5.999403 | 3.010211 |
| 4 | 2020.01.03 | AAPL | 2.974185 | 1.559372 | 2.76356 | 5.182052 |
| 5 | 2020.01.03 | AAPL | 3.200759 | 7.485088 | 7.928813 | 6.437041 |
| 6 | 2020.01.03 | AAPL | 7.749599 | 5.559444 | 0.3300404 | 9.424896 |
| 7 | 2020.01.03 | AAPL | 4.885961 | 4.677432 | 8.288318 | 4.366883 |
| 8 | 2020.01.03 | AAPL | 7.412891 | 5.082189 | 9.214036 | 7.900838 |
| 9 | 2020.01.03 | AAPL | 6.625847 | 9.792139 | 6.208818 | 9.195079 |
| 10 | 2020.01.03 | AAPL | 2.075797 | 5.340321 | 0.4038709 | 0.7533655 |
| 11 | 2020.01.03 | AAPL | 4.797642 | 8.373317 | 4.98156 | 6.299731 |
| 12 | 2020.01.03 | AAPL | 0.8688765 | 1.967616 | 3.349573 | 4.094004 |
| 13 | 2020.01.03 | AAPL | 2.684143 | 0.05767352 | 8.878174 | 2.166685 |
| 14 | 2020.01.03 | AAPL | 3.181093 | 4.686113 | 0.8967613 | 7.39341 |
| 15 | 2020.01.03 | AAPL | 3.630268 | 0.4563809 | 2.89025 | 6.428857 |
| 16 | 2020.01.03 | AAPL | 7.342469 | 9.298404 | 7.098509 | 1.698009 |
| 17 | 2020.01.03 | AAPL | 1.293144 | 8.125834 | 7.214184 | 5.946857 |
| 18 | 2020.01.03 | AAPL | 8.051322 | 1.446192 | 9.436185 | 4.824975 |
| 19 | 2020.01.03 | AAPL | 1.018781 | 1.299401 | 1.18181 | 0.6091787 |
| 20 | 2020.01.03 | AAPL | 4.002909 | 4.115772 | 5.036211 | 1.680549 |
| 21 | 2020.01.03 | AAPL | 0.9864104 | 4.75085 | 0.5140735 | 2.468647 |
| 22 | 2020.01.03 | AAPL | 8.388561 | 6.170405 | 1.067153 | 2.034476 |
| ... | ... | ... | ... | ... | ... | ... |
| 9999 | 2020.01.03 | MSFT | 2.832818 | 1.466171 | 3.457545 | 5.985203 |
10,000 rows × 6 columns
Finally, to view the amount of saved data you can count the number of rows per partition using partition_count
db.partition_count()
| quotes | trades | |
|---|---|---|
| 2020.01.01 | 0 | 500425 |
| 2020.01.02 | 0 | 499575 |
| 2020.01.03 | 10000 | 10000 |
Cleanup temporary database created¶
tempdir.cleanup()