Querying
pykx.query
Query interfaces for PyKX.
QSQL
QSQL(q)
Generates and submits functional q SQL queries.
Instances of this class can be accessed as the qsql
attribute of any pykx.Q
. For
instance, pykx.q.qsql
, or pykx.QConnection(...).qsql
.
The QSQL
class provides Python users with a method of querying q simple, keyed, splayed and
partitioned tables using a single set of functionality.
This is achieved by wrapping the logic contained within the q functional select, exec, update, and delete functionality. For more information on this functionality please refer to Chapter 9 Section 12 of Q for Mortals.
While it is also conceivable that the interface could compile a qSQL statement to achieve the same end goal there are a number of advantages to using the more complex functional form.
- Users that are unfamiliar with q who use the interface are introduced to the more powerful version of querying with q, while still operating within a familiar setting.
-
Using the functional form provides the ability when running functional updates to update the q tables with data derived from Python:
qtable = pykx.q('([]1 2 3;4 5 6)') pykx.q.qsql.update(qtable, {'x': [10, 20, 30]})
-
It makes development and maintenance of the interface easier when dealing across the forms of supported table within q within which the functional forms of interacting with tables are more natural.
select
select(table, columns=None, where=None, by=None, inplace=False)
Apply a q style select statement on tables defined within the process.
This implementation follows the q functional select syntax with limitations on structures supported for the various clauses a result of this.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[k.Table, str]
|
The q table or name of the table (provided the table is named within the q memory space) on which the select statement is to be applied. |
required |
columns |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the name to be given to a column and the logic to be applied in aggregation to that column both as strings. |
None
|
where |
Optional[Union[List[str], str, k.SymbolAtom, k.SymbolVector]]
|
Conditional filtering used to select subsets of the data on which by-clauses and appropriate aggregations are to be applied. |
None
|
by |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the names to be assigned to the produced columns and the columns whose results are used to construct the groups of the by clause. |
None
|
inplace |
bool
|
Whether the result of an update is to be persisted. This operates for tables referenced by name in q memory or general table objects https://code.kx.com/q/basics/qsql/#result-and-side-effects. |
False
|
Examples:
Define a q table in python, and give it a name in q memory
qtab = pykx.q('([]col1:100?`a`b`c;col2:100?1f;col3:100?0b;col4:100?10f)')
pykx.q['qtab'] = qtab
Select all items in the table
pykx.q.qsql.select(qtab)
pykx.q.qsql.select('qtab')
Filter table based on various where conditions
pykx.q.qsql.select(qtab, where='col2<0.5')
pykx.q.qsql.select(qtab, where=['col1=`a', 'col2<0.3'])
Retrieve statistics by grouping data on symbol columns
pykx.q.qsql.select(qtab, columns={'maxCol2': 'max col2'}, by={'col1': 'col1'})
pykx.q.qsql.select(qtab, columns={'avgCol2': 'avg col2', 'minCol4': 'min col4'}, by={'col1': 'col1'})
Retrieve grouped statistics with restrictive where condition
pykx.q.qsql.select(qtab, columns={'avgCol2': 'avg col2', 'minCol4': 'min col4'}, by={'col1': 'col1'}, where='col3=0b')
exec
exec(table, columns=None, where=None, by=None)
Apply a q style exec statement on tables defined within the process.
This implementation follows the q functional exec syntax with limitations on structures supported for the various clauses a result of this.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[k.Table, str]
|
The q table or name of the table (provided the table is named within the q memory space) on which the exec statement is to be applied. |
required |
columns |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the name to be given to a column and the logic to be applied in aggregation to that column both as strings. A string defining a single column to be retrieved from the table as a list. |
None
|
where |
Optional[Union[List[str], str, k.SymbolAtom, k.SymbolVector]]
|
Conditional filtering used to select subsets of the data on which by clauses and appropriate aggregations are to be applied. |
None
|
by |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the names to be assigned to the produced columns and the the columns whose results are used to construct the groups of the by clause. |
None
|
Examples:
Define a q table in python and named in q memory
pykx.q['qtab'] = pd.DataFrame.from_dict({
'col1': [['a', 'b', 'c'][randint(0, 2)] for _ in range(100)],
'col2': [random() for _ in range(100)],
'col3': [randint(0, 1) == 1 for _ in range(100)],
'col4': [random() * 10 for _ in range(100)]
})
Select last item of the table
pykx.q.qsql.exec(qtab)
pykx.q.qsql.exec('qtab')
Retrieve a column from the table as a list
pykx.q.qsql.exec(qtab, 'col3')
Retrieve a set of columns from a table as a dictionary
pykx.q.qsql.exec(qtab, {'symcol': 'col1'})
pykx.q.qsql.exec(qtab, {'symcol': 'col1', 'boolcol': 'col3'})
Filter columns from a table based on various where conditions
pykx.q.qsql.exec(qtab, 'col3', where='col1=`a')
pykx.q.qsql.exec(qtab, {'symcol': 'col1', 'maxcol4': 'max col4'}, where=['col1=`a', 'col2<0.3'])
Retrieve data grouping by data on symbol columns
pykx.q.qsql.exec(qtab, 'col2', by={'col1': 'col1'})
pykx.q.qsql.exec(qtab, columns={'maxCol2': 'max col2'}, by={'col1': 'col1'})
pykx.q.qsql.exec(qtab, columns={'avgCol2': 'avg col2', 'minCol4': 'min col4'}, by={'col1': 'col1'})
Retrieve grouped statistics with restrictive where condition
pykx.q.qsql.exec(qtab, columns={'avgCol2': 'avg col2', 'minCol4': 'min col4'}, by={'col1': 'col1'}, where='col3=0b')
update
update(table, columns=None, where=None, by=None, modify=False, inplace=False)
Apply a q style update statement on tables defined within the process.
This implementation follows the q functional update syntax with limitations on structures supported for the various clauses a result of this.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[k.Table, str]
|
The q table or name of the table (provided the table is named within the q memory space) on which the update statement is to be applied. |
required |
columns |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the name of a column present in the table or one to be added to the contents which are to be added to the column, this content can be a string denoting q data or the equivalent Python data. |
None
|
where |
Optional[Union[List[str], str, k.SymbolAtom, k.SymbolVector]]
|
Conditional filtering used to select subsets of the data on which by-clauses and appropriate aggregations are to be applied. |
None
|
by |
Optional[Union[Dict[str, str], k.Dictionary]]
|
A dictionary mapping the names to be assigned to the produced columns and the columns whose results are used to construct the groups of the by clause. |
None
|
modify |
bool
|
|
False
|
inplace |
bool
|
Whether the result of an update is to be persisted. This operates for tables referenced by name in q memory or general table objects https://code.kx.com/q/basics/qsql/#result-and-side-effects. |
False
|
Examples:
Define a q table in python and named in q memory
qtab = pykx.Table(
[
['tom', 28, 'fair', 'green'],
['dick', 29, 'dark', 'brown'],
['harry', 35, 'fair', 'gray']
],
columns=['name', 'age', 'hair', 'eye']
)
pykx.q['qtab'] = qtab
Update all the contents of a column
pykx.q.qsql.update(qtab, {'eye': '`blue`brown`green'})
pykx.q.qsql.update(qtab, {'age': [25, 30, 31]})
Update the content of a column restricting scope using a where clause
pykx.q.qsql.update(qtab, {'eye': ['blue']}, where='hair=`fair')
Define a q table suitable for by clause example
byqtab = pykx.q('([]p:100?`a`b`c;name:100?`nut`bolt`screw;color:100?`red`green`blue;weight:0.5*100?20;city:100?`london`paris`rome)')
pykx.q['byqtab'] = byqtab
Apply an update grouping based on a by phrase
pykx.q.qsql.update(byqtab, {'weight': 'avg weight'}, by={'city': 'city'})
Apply an update grouping based on a by phrase and persist the result using the modify keyword
python
pykx.q.qsql.update('byqtab', columns={'weight': 'avg weight'}, by={'city': 'city'}, inplace=True)
pykx.q['byqtab']
delete
delete(table, columns=None, where=None, modify=False, inplace=False)
Apply a q style delete statement on tables defined within the process.
This implementation follows the q functional delete syntax with limitations on structures supported for the various clauses a result of this.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[k.Table, str]
|
The q table or name of the table (provided the table is named within the q memory space) on which the delete statement is to be applied. |
required |
columns |
Optional[Union[List[str], k.SymbolVector]]
|
Denotes the columns to be deleted from a table. |
None
|
where |
Optional[Union[List[str], str, k.SymbolAtom, k.SymbolVector]]
|
Conditional filtering used to select subsets of the data which are to be deleted from the table. |
None
|
modify |
bool
|
|
False
|
inplace |
bool
|
Whether the result of an update is to be persisted. This operates for tables referenced by name in q memory or general table objects https://code.kx.com/q/basics/qsql/#result-and-side-effects. |
False
|
Examples:
Define a q table in python and named in q memory
qtab = pykx.q('([]name:`tom`dick`harry;age:28 29 35;hair:`fair`dark`fair;eye:`green`brown`gray)')
pykx.q['qtab'] = qtab
Delete all the contents of the table
pykx.q.qsql.delete(qtab)
pykx.q.qsql.delete('qtab')
Delete single and multiple columns from the table
pykx.q.qsql.delete(qtab, 'age')
pykx.q.qsql.delete('qtab', ['age', 'eye'])
Delete rows of the dataset based on where condition
pykx.q.qsql.delete(qtab, where='hair=`fair')
pykx.q.qsql.delete('qtab', where=['hair=`fair', 'age=28'])
Delete a column from the dataset named in q memory and persist the result using the modify keyword
pykx.q.qsql.delete('qtab', 'age', modify=True)
pykx.q['qtab']
SQL
SQL(q)
Wrapper around the KX Insights Core ANSI SQL interface.
Lots of examples within this interface use a table named trades, an example of this table is
>>> kx.q['trades'] = kx.toq(
pd.DataFrame.from_dict({
'sym': [['AAPL', 'GOOG', 'MSFT'][randint(0, 2)] for _ in range(100)],
'date': [[date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3)][randint(0, 2)] for _ in range(100)],
'price': [random() * 1000 for _ in range(100)]
})
)
__call__
__call__(query, *args)
Compile and run a SQL statement.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str
|
The SQL query, using KX Insights Core SQL, documented at https://code.kx.com/insights/core/sql.html |
required |
*args |
Any
|
The arguments for the query, which will be interpolated into the query. Each
argument will be converted into a |
()
|
Returns:
Type | Description |
---|---|
k.Table
|
The result of the evaluation of |
Avoid interpolating the table into the query when running over IPC.
It's common to interpolate a pykx.Table
object into the query as '$1'
. This works
well when running embedded within the process, but when the Q
instance is an
IPC connection this will result in the entire table being sent over
the connection, which will negatively impact performance. Instead, when running over
IPC, write the name of the table (as defined in the connected q server) directly into
the query.
Examples:
Query a table by name:
>>> q.sql('select * from trades')
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.02 484.4727
AAPL 2022.01.02 682.7999
MSFT 2022.01.01 153.227
MSFT 2022.01.03 535.0923
..
'))
Query a pykx.Table
instance by interpolating it in as the first argument:
>>> q.sql('select * from $1', trades) # where `trades` is a `pykx.Table` object
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.02 484.4727
AAPL 2022.01.02 682.7999
MSFT 2022.01.01 153.227
MSFT 2022.01.03 535.0923
..
'))
Query a table using interpolated conditions:
>>> q.sql('select * from trades where date = $1 and price < $2', date(2022, 1, 2), 500.0)
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.02 484.4727
MSFT 2022.01.02 457.328
GOOG 2022.01.02 8.062521
MSFT 2022.01.02 338.0097
..
'))
prepare
prepare(query, *args)
Prepare a parametrized query to be executed later, the parameter types are deduced from the types of the arguments used to prepare the statement.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str
|
The SQL query, using KX Insights Core SQL, documented at https://code.kx.com/insights/core/sql.html |
required |
*args |
Any
|
The arguments for the query, these arguments are not used in the query. They are used to determine the types of the parameters that will later be used as parameters when executing the query. |
()
|
Returns:
Type | Description |
---|---|
k.List
|
The parametrized query, which can later be used with |
Examples:
When preparing a query with K types you don't have to fully construct one.
For example you can pass kx.LongAtom(1)
as a value to the prepare function as well as
just pykx.LongAtom
. This only works for Atom and Vector types. There is also a
helper function for tables that you can use called pykx.Table.prototype
.
Prepare a query for later execution that will expect a table with 3 columns a, b, and c with
ktypes pykx.SymbolVector
, pykx.FloatVector
, and pykx.LongVector
respectively.
>>> p = q.sql.prepare('select * from $1', kx.q('([] a:``; b: 0n 0n; c: 0N 0N)'))
You can also use the pykx.Table.prototype
helper function to build a table to pass into a
prepared SQL query.
>>> p = q.sql.prepare('select * from $1', kx.Table.prototype({
'a': kx.SymbolVector,
'b': kx.FloatVector,
'c': kx.LongVector})
)
Prepare a query for later execution that will take a date and a float as input at execution time to query the trades table.
>>> p = q.sql.prepare('select * from trades where date = $1 and price < $2',
date(1, 1, 2),
500.0
)
You can also directly pass in the pykx.K
types you wish to use instead.
>>> p = q.sql.prepare('select * from trades where date = $1 and price < $2',
kx.DateAtom,
kx.FloatAtom
)
execute
execute(query, *args)
Execute a prepared query the parameter types must match the types of the arguments used in the prepare statement.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
k.List
|
A prepared SQL statement returned by a call to |
required |
*args |
Any
|
The arguments for the query, which will be interpolated into the query. Each
argument will be converted into a |
()
|
Returns:
Type | Description |
---|---|
k.K
|
The result of the evaluation of |
Avoid interpolating the table into the query when running over IPC.
It's common to interpolate a pykx.Table
object into the query as '$1'
. This
works well when running embedded within the process, but when the Q
instance is an
IPC connection this will result in the entire table being sent over
the connection, which will negatively impact performance. Instead, when running over
IPC, write the name of the table (as defined in the connected q server) directly into
the query.
Examples:
Execute a prepared query passing in a pykx.Table
with 3 columns a, b, and c with
ktypes pykx.SymbolVector
, pykx.FloatVector
, and pykx.LongVector
respectively.
>>> p = q.sql.prepare('select * from $1', kx.q('([] a:``; b: 0n 0n; c: 0N 0N)'))
>>> q.sql.execute(p, kx.q('([] a:`a`b`c`d; b: 1.0 2.0 3.0 4.0; c: 1 2 3 4)'))
pykx.Table(pykx.q('
a b c
-----
a 1 1
b 2 2
c 3 3
d 4 4
'))
Execute a prepared query that takes a date and a float as input to query the trades table.
>>> p = q.sql.prepare('select * from trades where date = $1 and price < $2',
date(1, 1, 2),
500.0
)
>>> q.sql.execute(p, date(2022, 1, 2), 500.0)
pykx.Table(pykx.q('
sym date price
------------------------
AAPL 2022.01.02 484.4727
MSFT 2022.01.02 457.328
GOOG 2022.01.02 8.062521
MSFT 2022.01.02 338.0097
..
'))
get_input_types
get_input_types(prepared_query)
Get the pykx.K
types that are expected to be used with a prepared query.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prepared_query |
k.List
|
A prepared SQL statement returned by a call to |
required |
Returns:
Type | Description |
---|---|
List[str]
|
A Python list object containing the string representations of the expected K types for use with the prepared statement. |
Examples:
>>> p = q.sql.prepare('select * from trades where date = $1 and price < $2',
date(1, 1, 1),
0.0
)
>>> q.sql.get_input_types(p)
['DateAtom/DateVector', 'FloatAtom/FloatVector']
TableAppend
TableAppend(_q)
Helper class for the q insert function
Insert
Insert(_q)
Bases: TableAppend
Helper class for the q insert and upsert functions
__call__
__call__(table, row, match_schema=False, test_insert=False)
Helper function around q
's insert
function which inserts a row or multiple rows into
a q table object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[str, k.SymbolAtom]
|
The name of the table to be inserted onto. |
required |
row |
Union[List, k.List]
|
A list of objects to be inserted as a row, or a list of lists containing objects to insert multiple rows at once. |
required |
match_schema |
bool
|
Whether the row/rows to be inserted must match the tables current schema. |
False
|
test_insert |
bool
|
Causes the function to modify a small local copy of the table and return the modified example, this can only be used with embedded q and will not modify the source tables contents. |
False
|
Returns:
Type | Description |
---|---|
Union[None, k.Table]
|
A |
Union[None, k.Table]
|
|
Union[None, k.Table]
|
last 5 rows of the table with the new rows inserted onto the end, this does not modify |
Union[None, k.Table]
|
the actual table object. |
Raises:
Type | Description |
---|---|
PyKXException
|
If the |
Examples:
Insert a single row onto a table named tab
ensuring that the row matches the tables
schema. Will raise an error if the row does not match
>>> q.insert('tab', [1, 2.0, datetime.datetime(2020, 2, 24)], match_schema=True)
Insert multiple rows onto a table named tab
ensuring that each of the rows being added
match the tables schema.
>>> q.insert(
'tab',
[[1, 2], [2.0, 4.0], [datetime.datetime(2020, 2, 24), datetime.datetime(2020,3 , 19)]],
match_schema=True
)
Run a test insert to modify a local copy of the table to test what the table would look like after inserting the new rows.
>>> kx.q['tab'] = kx.Table([[1, 1.0, 'a'], [2, 2.0, 'b'], [3, 3.0, 'c']], columns=['a', 'b', 'c'])
>>> kx.q.insert('tab', [4, 4.0, 'd'], test_insert=True) # example of table after insert
pykx.Table(pykx.q('
a b c
-----
1 1 a
2 2 b
3 3 c
4 4 d
'))
>>> kx.q('tab') # table object was not modified
pykx.Table(pykx.q('
a b c
-----
1 1 a
2 2 b
3 3 c
'))
Upsert
Upsert(_q)
Bases: TableAppend
Helper class for the q upsert function
__call__
__call__(table, row, match_schema=False, test_insert=False)
Helper function around q
's upsert
function which inserts a row or multiple rows into
a q table object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
Union[str, k.SymbolAtom, k.Table]
|
A |
required |
row |
Union[List, k.List]
|
A list of objects to be inserted as a row, if the table is within embedded q you may also pass in a table object to be upserted. |
required |
match_schema |
bool
|
Whether the row/rows to be inserted must match the tables current schema. |
False
|
test_insert |
bool
|
Causes the function to modify a small local copy of the table and return the modified example, this can only be used with embedded q and will not modify the source tables contents. |
False
|
Returns:
Type | Description |
---|---|
Union[None, k.Table]
|
The modified table if a |
Union[None, k.Table]
|
If the |
Union[None, k.Table]
|
with the new rows inserted onto the end, this does not modify the actual table object. |
Raises:
Type | Description |
---|---|
PyKXException
|
If the |
Examples:
Upsert a single row onto a table named tab
ensuring that the row matches the tables
schema. Will raise an error if the row does not match
>>> q.upsert('tab', [1, 2.0, datetime.datetime(2020, 2, 24)], match_schema=True)
>>> table = q.upsert(table, [1, 2.0, datetime.datetime(2020, 2, 24)], match_schema=True)
Upsert multiple rows onto a table named tab
ensuring that each of the rows being added
match the tables schema. Upserting multiple rows only works within embedded q.
>>> q.upsert(
'tab',
q('([] a: 1 2; b: 1.0 2.0; c: `b`c)'),
match_schema=True
)
>>> table = q.upsert(
table,
[[1, 2], [2.0, 4.0], [datetime.datetime(2020, 2, 24), datetime.datetime(2020,3 , 19)]],
match_schema=True
)
Run a test insert to modify a local copy of the table to test what the table would look like after inserting the new rows.
>>> kx.q['tab'] = kx.Table([[1, 1.0, 'a'], [2, 2.0, 'b'], [3, 3.0, 'c']], columns=['a', 'b', 'c'])
>>> kx.q.upsert('tab', [4, 4.0, 'd'], test_insert=True) # example of table after insert
pykx.Table(pykx.q('
a b c
-----
1 1 a
2 2 b
3 3 c
4 4 d
'))
>>> kx.q('tab') # table object was not modified
pykx.Table(pykx.q('
a b c
-----
1 1 a
2 2 b
3 3 c
'))