Database interaction and management
This page documents the API for managing kdb+ databases using PyKX.
DB
DB(*, path=None, change_dir=True, load_scripts=True)
Bases: _TABLES
Initialize a database class used within your process. This is a singleton class from which all interactions with your database will be made. On load if supplied with a 'path' this functionality will attempt to load the database at this location. If no database exists at this location the path supplied will be used when a new database is created.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path |
Optional[Union[str, Path]]
|
The location at which your database is/will be located. |
None
|
change_dir |
Optional[bool]
|
Should the working directory be changed to the location of the
loaded database, for q 4.0 this is the only supported behavior, please
set |
True
|
load_scripts |
Optional[bool]
|
Should any q scripts find in the database directory be loaded,
for q 4.0 this is the only supported behavior, please set
|
True
|
Returns:
Type | Description |
---|---|
None
|
A database class which can be used to interact with a partitioned database. |
Examples:
Load a partitioned database at initialization
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/db')
>>> db.tables
['quote', 'trade']
Define the path to be used for a database which does not initially exist
>>> import pykx as kx
>>> db = kx.DB(path = 'db')
>>> db.tables
>>> db.path
PosixPath('/usr/projects/pykx/db')
create
create(
table,
table_name,
partition,
*,
by_field=None,
sym_enum=None,
log=True,
compress=None,
encrypt=None,
change_dir=True,
load_scripts=True
)
Create an on-disk partitioned table within a kdb+ database from a supplied
pykx.Table
object. Once generated this table will be accessible
as an attribute of the DB
class or a sub attribute of DB.table
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
k.Table
|
The |
required |
table_name |
str
|
The name with which the table will be persisted and accessible
once loaded and available as a |
required |
partition |
Union[int, str, k.DateAtom]
|
The name of the column which is to be used to partition the data if
supplied as a |
required |
by_field |
Optional[str]
|
A field of the table to be used as a by column, this column will be the second column in the table (the first being the virtual column determined by the partitioning column) |
None
|
sym_enum |
Optional[str]
|
The name of the symbol enumeration table to be associated with the table |
None
|
log |
Optional[bool]
|
Print information about status while persisting the partitioned database |
True
|
compress |
Optional[Compress]
|
|
None
|
encrypt |
Optional[Encrypt]
|
|
None
|
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Generate a partitioned database from a table containing multiple partitions.
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 1000
>>> qtab = kx.Table(data = {
... 'date': kx.q.asc(kx.random.random(N, kx.q('2020.01 2020.02 2020.03m'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', 'date', by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month sym price size
---------------------------
2020.01 AAPL 7.979004 85
2020.01 AAPL 5.931866 55
2020.01 AAPL 5.255477 49
2020.01 AAPL 8.15255 74
2020.01 AAPL 4.771067 80
..
'))
Add a table as a partition to an on-disk database.
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), by_field = 'sym', sym_enum = 'symbols')
>>> db.tables
['stocks']
>>> db.stocks
pykx.PartitionedTable(pykx.q('
month sym price size
---------------------------
2020.01 AAPL 7.979004 85
2020.01 AAPL 5.931866 55
2020.01 AAPL 5.255477 49
2020.01 AAPL 8.15255 74
2020.01 AAPL 4.771067 80
..
'))
Add a table as a partition to an on-disk database and apply gzip compression to the persisted table
>>> import pykx as kx
>>> db = kx.DB(path = '/tmp/newDB')
>>> N = 333
>>> qtab = kx.Table(data = {
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0),
... 'size': kx.random.random(N, 100)
... })
>>> compress = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(qtab, 'stocks', kx.q('2020.04m'), compress=compress)
>>> kx.q('{-21!hsym x}', '/tmp/newDB/2020.04/stocks/price')
pykx.Dictionary(pykx.q('
compressedLength | 2064
uncompressedLength| 2680
algorithm | 2i
logicalBlockSize | 17i
zipLevel | 2i
'))
load
load(
path, *, change_dir=True, load_scripts=True, overwrite=False, encrypt=None
)
Load the tables associated with a kdb+ database. Once loaded, a table
is accessible as an attribute of the DB
class or a sub-attribute
of DB.table
. Note this can alternatively be called when providing a path
on initialisation of the DB class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path |
Union[Path, str]
|
The file system path at which your database is located |
required |
change_dir |
Optional[bool]
|
Should the working directory be changed to the location of the
loaded database, for q 4.0 this is the only supported behavior, please
set |
True
|
load_scripts |
Optional[bool]
|
Should any q scripts find in the database directory be loaded,
for q 4.0 this is the only supported behavior, please set
|
True
|
overwrite |
Optional[bool]
|
Should loading of the database overwrite any currently loaded databases |
False
|
encrypt |
Optional[Encrypt]
|
The encryption key object to be loaded prior to database load |
None
|
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Load an on-disk database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testData')
>>> db.tables
['testData']
>>> db.testData
pykx.PartitionedTable(pykx.q('
month sym time price size
---------------------------------------
2020.01 FDP 00:00:00.004 90.94738 12
2020.01 FDP 00:00:00.005 33.81127 15
2020.01 FDP 00:00:00.027 88.89853 16
2020.01 FDP 00:00:00.035 78.33244 9
2020.01 JPM 00:00:00.055 68.65177 1
..
'))
>>> db.table.testData
pykx.PartitionedTable(pykx.q('
month sym time price size
---------------------------------------
2020.01 FDP 00:00:00.004 90.94738 12
2020.01 FDP 00:00:00.005 33.81127 15
2020.01 FDP 00:00:00.027 88.89853 16
2020.01 FDP 00:00:00.035 78.33244 9
2020.01 JPM 00:00:00.055 68.65177 1
..
'))
Load an on-disk database when initialising the class
>>> import pykx as kx
>>> db = kx.DB(path = 'testData')
>>> db.tables
['testData']
>>> db.testData
pykx.PartitionedTable(pykx.q('
month sym time price size
---------------------------------------
2020.01 FDP 00:00:00.004 90.94738 12
2020.01 FDP 00:00:00.005 33.81127 15
2020.01 FDP 00:00:00.027 88.89853 16
2020.01 FDP 00:00:00.035 78.33244 9
2020.01 JPM 00:00:00.055 68.65177 1
..
'))
>>> db.table.testData
pykx.PartitionedTable(pykx.q('
month sym time price size
---------------------------------------
2020.01 FDP 00:00:00.004 90.94738 12
2020.01 FDP 00:00:00.005 33.81127 15
2020.01 FDP 00:00:00.027 88.89853 16
2020.01 FDP 00:00:00.035 78.33244 9
2020.01 JPM 00:00:00.055 68.65177 1
..
'))
rename_column
rename_column(table, original_name, new_name)
Rename a column within a loaded kdb+ Database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table containing the column to be renamed |
required |
original_name |
str
|
Name of the column which is to be renamed |
required |
new_name |
str
|
Updated column name |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Rename the column of a table
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.rename_column('testTable', 'sym', 'symbol')
>>> db.list_columns('testTable')
['month', 'symbol', 'time', 'price', 'size']
delete_column
delete_column(table, column)
Delete a column from a loaded kdb+ Database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table containing the column to be deleted |
required |
column |
str
|
Name of the column which is to be deleted from the table |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Delete the column of a table
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.delete_column('testTable', 'size')
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price']
rename_table
rename_table(original_name, new_name)
Rename a table within a loaded kdb+ Database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original_name |
str
|
The name of the table which is to be renamed |
required |
new_name |
str
|
Updated table name |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Rename a database table
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.rename_table('testTable', 'updated_table')
>>> db.tables
['updated_table']
list_columns
list_columns(table)
List the columns of a table within a loaded kdb+ Database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table whose columns are to be listed |
required |
Returns:
Type | Description |
---|---|
None
|
A list of strings defining the columns of the table |
Examples:
List the columns of a table in a database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
add_column
add_column(table, column_name, default_value)
Add a column to a table within a loaded kdb+ Database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table to which a column is to be added |
required |
column_name |
str
|
Name of the column to be added |
required |
default_value |
Any
|
The default value to be used for all existing partitions |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Add a column to a table within a partitioned database where all items are an integer null
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.add_column('testTable', 'test', kx.IntAtom.null)
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
find_column
find_column(table, column_name)
Functionality for finding a column across partitions within a loaded kdb+ Database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table within which columns are to be found |
required |
column_name |
str
|
The name of the column to be found within a table |
required |
Returns:
Type | Description |
---|---|
None
|
A |
None
|
If a column does not exist in a specified partition, an error is raised |
None
|
and the logs indicate which columns did not contain the specified column. |
Examples:
Find a column that exists
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.find_column('price')
2023.11.10 16:48:57 column price (type 0) in `:/usr/pykx/db/2015.01.01/testTable
2023.11.10 16:48:57 column price (type 0) in `:/usr/pykx/db/2015.01.02/testTable
Attempt to find a column that does not exist
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.find_column('side')
2023.11.10 16:49:02 column side *NOT*FOUND* in `:/usr/pykx/db/2015.01.01/testTable
2023.11.10 16:49:02 column side *NOT*FOUND* in `:/usr/pykx/db/2015.01.02/testTable
Traceback (most recent call last):
...
pykx.exceptions.QError: Requested column not found in all partitions, see log output above
reorder_columns
reorder_columns(table, new_order)
Reorder the columns of a persisted kdb+ database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table within which columns will be rearranged |
required |
new_order |
list
|
The ordering of the columns following update |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Update the order of columns for a persisted kdb+ database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> col_list = db.list_columns('testTable')
>>> col_list
['month', 'sym', 'time', 'price', 'size']
>>> col_list.reverse()
>>> col_list
['size', 'price', 'time', 'sym', 'month']
>>> db.reorder_columns('testTable', col_list)
2023.11.13 17:56:17 reordering columns in `:/usr/pykx/2015.01.01/testTable
2023.11.13 17:56:17 reordering columns in `:/usr/pykx/2015.01.02/testTable
['month', 'sym', 'time', 'price', 'size']
set_column_attribute
set_column_attribute(table, column_name, new_attribute)
Set an attribute associated with a column for an on-disk database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table within which an attribute will be set |
required |
column_name |
str
|
Name of the column to which the attribute will be applied |
required |
new_attribute |
str
|
The attribute which is to be applied, this can be one of
|
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Add an attribute to a column of a persisted database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| j
p | f
sym | s
'))
>>> db.set_column_attribute('testTable', 'sym', 'grouped')
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| j
p | f
sym | s g
'))
set_column_type
set_column_type(table, column_name, new_type)
Convert/set the type of a column to a specified type
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table within which a column is to be converted |
required |
column_name |
str
|
Name of the column which is to be converted |
required |
new_type |
k.K
|
PyKX type to which a column is to be converted |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Convert the type of a column within a database table
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| j
p | f
sym | s
'))
>>> db.set_column_type('testTable', 'test', kx.FloatAtom)
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| f
p | f
sym | s
'))
clear_column_attribute
clear_column_attribute(table, column_name)
Clear an attribute associated with a column of an on-disk database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The name of the table within which the attribute of a column will be removed |
required |
column_name |
str
|
Name of the column from which an attribute will be removed |
required |
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Remove an attribute of a column of a persisted database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.tables
['testTable']
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| j
p | f
sym | s g
'))
>>> db.clear_column_attribute('testTable', 'sym')
>>> kx.q.meta(db.testTable)
pykx.KeyedTable(pykx.q('
c | t f a
----| -----
date| d
test| j
p | f
sym | s
'))
copy_column
copy_column(table, original_column, new_column)
Create a copy of a column within a table
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
Name of the table |
required |
original_column |
str
|
Name of the column to be copied |
required |
new_column |
str
|
Name of the copied column |
required |
Returns:
Type | Description |
---|---|
None
|
A |
None
|
database following column copy |
Examples:
Copy a column within a kdb+ database
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.list_columns('testTable')
['month', 'sym', 'time', 'price', 'size']
>>> db.copy_column('testTable', 'size', 'dup_size')
['month', 'sym', 'time', 'price', 'size', 'dup_size']
apply_function
apply_function(table, column_name, function)
Apply a function per partition on a column of a persisted kdb+ database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
Name of the table |
required |
column_name |
str
|
Name of the column on which the function is to be applied |
required |
function |
callable
|
Callable function to be applied on a column vector per partition |
required |
Returns:
Type | Description |
---|---|
None
|
A |
None
|
and reloading of the database |
Examples:
Apply a q function to a specified column per partition
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month symbol time price
-------------------------------------
2020.01 FDP 00:00:00.004 90.94738
2020.01 FDP 00:00:00.005 33.81127
2020.01 FDP 00:00:00.027 88.89853
2020.01 FDP 00:00:00.035 78.33244
2020.01 JPM 00:00:00.055 68.65177
..
'))
>>> db.apply_function('testTable', 'price', kx.q('2*'))
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month symbol time price
-------------------------------------
2020.01 FDP 00:00:00.004 181.8948
2020.01 FDP 00:00:00.005 67.62254
2020.01 FDP 00:00:00.027 177.7971
2020.01 FDP 00:00:00.035 156.6649
2020.01 JPM 00:00:00.055 137.3035
..
'))
Apply a Python function to the content of a specified column per partition
>>> import pykx as kx
>>> db = kx.DB()
>>> db.load('testDB')
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month symbol time price
-------------------------------------
2020.01 FDP 00:00:00.004 90.94738
2020.01 FDP 00:00:00.005 33.81127
2020.01 FDP 00:00:00.027 88.89853
2020.01 FDP 00:00:00.035 78.33244
2020.01 JPM 00:00:00.055 68.65177
..
'))
>>> db.apply_function('testTable', 'price', lambda x:2*x.np())
>>> db.testTable
pykx.PartitionedTable(pykx.q('
month symbol time price
-------------------------------------
2020.01 FDP 00:00:00.004 181.8948
2020.01 FDP 00:00:00.005 67.62254
2020.01 FDP 00:00:00.027 177.7971
2020.01 FDP 00:00:00.035 156.6649
2020.01 JPM 00:00:00.055 137.3035
..
'))
fill_database
fill_database()
Fill missing tables from partitions within a database using the most recent partition as a template, this will report the partitions but not the tables which are being filled.
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Fill missing tables from a database
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.fill_database()
Successfully filled missing tables to partition: :/usr/newDB/2020.04
Successfully filled missing tables to partition: :/usr/newDB/2020.03
Successfully filled missing tables to partition: :/usr/newDB/2020.02
Successfully filled missing tables to partition: :/usr/newDB/2020.01
partition_count
partition_count(*, subview=None)
Count the number of rows per partition for the presently loaded database.
Use of the parameter subview
can allow users to count only the rows
in specified partitions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subview |
Optional[list]
|
An optional list of partitions from which to retrieve the per partition count |
None
|
Returns:
Type | Description |
---|---|
k.Dictionary
|
A |
Warning
Using this function results in any specified subview
of the data
being reset, if you require the use of a subview for queries please set using
the database subview
command.
Examples:
Copy a column within a kdb+ database
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.partition_count()
pykx.Dictionary(pykx.q('
| trades quotes
-------| -------------
2020.01| 334 0
2020.02| 324 0
2020.03| 342 1000
'))
Copy a column within a kdb+ database
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.partition_count(sub_view = kx.q('2020.01 2020.02m'))
pykx.Dictionary(pykx.q('
| trades quotes
-------| -------------
2020.01| 334 0
2020.02| 324 0
'))
subview
subview(view=None)
Specify the subview to be used when querying a partitioned table
Parameters:
Name | Type | Description | Default |
---|---|---|---|
view |
list
|
A list of partition values which will serve as a filter
for all queries against any partitioned table within the
database. If view is supplied as |
None
|
Returns:
Type | Description |
---|---|
None
|
A |
Examples:
Set the subview range to include only 2020.02
and 2020.03
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.subview(kx.q('2020.02 2020.03m')
>>> kx.q.qsql.select(db.trades, 'month')
pykx.Table(pykx.q('
month
-------
2020.02
2020.03
'))
Reset the database subview to include a fully specified range
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> db.subview()
>>> kx.q.qsql.select(db.trades, 'month')
pykx.Table(pykx.q('
month
-------
2020.01
2020.02
2020.03
2020.04
2020.05
'))
enumerate
enumerate(table, *, sym_file=None)
Perform an enumeration on a user specified table against the current sym files associated with the database
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str
|
The |
required |
sym_file |
Optional[str]
|
The name of the sym file contained in the folder specified by
the |
None
|
Returns:
Type | Description |
---|---|
k.Table
|
The supplied table with enumeration applied |
Examples:
Enumerate the symbol columns of a table without specifying the sym_file
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> N = 1000
>>> tab = kx.Table(data = {
... 'x': kx.random.random(N, ['a', 'b', 'c']),
... 'x1': kx.random.random(N, 1.0),
... 'x2': kx.random.random(N, 10)
... }
>>> tab = db.enumerate(tab)
>>> tab['x']
pykx.EnumVector(pykx.q('`sym$`a`b`a`c`b..'))
Enumerate the symbol columns of a table specifying the sym_file
>>> import pykx as kx
>>> db = kx.DB(path = 'newDB')
>>> N = 1000
>>> tab = kx.Table(data = {
... 'x': kx.random.random(N, ['a', 'b', 'c']),
... 'x1': kx.random.random(N, 1.0),
... 'x2': kx.random.random(N, 10)
... }
>>> tab = db.enumerate(tab, sym_file = 'mysym')
>>> tab['x']
pykx.EnumVector(pykx.q('`mysym$`a`b`a`c`b..'))