Manage a database
This page explains how to modify databases generated in PyKX.
Tip: For the best experience, we recommend reading Databases in PyKX, Generate a database and Load a database first.
With PyKX, you can use various methods to modify your on-disk database. These changes can take many forms:
- Add new columns to the database
- Apply functions to existing columns
- Rename columns
- Delete columns
A cautionary note
Operations on persisted databases can lead to changes that are hard to undo. For instance, applying functions that modify row values in a column can result in updated values that make it impossible to retrieve the original data. Before using this functionality for complex tasks, ensure you understand the impact of your changes and have a backup of your data to mitigate any issues.
The next section demonstrates how to edit the trade
table generated here to extract information from the table columns, sanitize the data, and update the database schema.
Update your database
Over time, the data you work with will change. This includes the names and types of columns, and even which columns are in the table. These changes can occur as new sensors are introduced in a manufacturing setting or when your data provider updates the information they supply in the financial sector.
To that end, we can take the trade
table and make the following changes:
- Rename the column
sym
tosymbol
. - Change the type of the
price
column from apykx.FloatAtom
topykx.RealAtom
to reduce storage requirements. - Add a new column
exchange
which initially has an emptypykx.SymbolAtom
entry under the expectation that newly added partitions will have this column available.
>>> import pykx as kx
>>> db = kx.DB(path='/tmp/db')
>>> db.rename_column('trade', 'sym', 'symbol')
>>> db.set_column_type('trade', 'price', kx.RealAtom)
>>> db.add_column('trade', 'exchange', kx.SymbolAtom.null)
Now that we’ve made some basic changes, we can proceed with more detailed modifications to the database. These changes can significantly impact the data since they involve free-form edits to individual columns and partitions. If you’re unsure about the changes or your ability to undo them, it’s a good idea to make a copy of the column first.
In the below cell, we complete the following:
- Cache the order of columns prior to changes.
- Make a copy of the column
price
namedprice_copy
. - Adjust the value of the stock price on the copied column to account for a two-for-one stock split by multiplying the price by half.
- Delete the original
price
column. - Rename the copied column
symbol_copy
to besymbol
. - Reorder the columns.
>>> col_order = db.trade.columns.py()
>>> db.copy_column('trade', 'price', 'price_copy')
>>> db.apply_function('trade', 'price_copy', lambda x: x * 0.5)
>>> db.delete_column('trade', 'price')
>>> db.rename_column('trade', 'price_copy', 'price')
>>> db.reorder_columns(col_order)
Next Steps
- Query your database with Python
- Compress/encrypt data for persisting database partitions.