How to Perform a Non-Transformed TSS Search in KDB.AI
This page details how to execute a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) search in KDB.AI.
Tip: For the best experience, start by reading about KDB.AI Non-Transformed TSS.
To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions:
- Takes simple time series (numerical sequence stored in a kdb+ column) as input.
- Scans the time series with a sliding window (of the same size as the query vector; size can be changed between queries).
- Computes the list of distances between the query vector and each occurrence of the sliding window.
- Returns the k-nearest neighbors.
Setup
Before you start, make sure you have:
- An active KDB.AI Cloud or Server license
- The latest version of KDB.AI Cloud or Server installed
- A valid API key if you're using KDB.AI Cloud
- Python Client
To store and search temporal data using the Non-Transformed TSS method, follow these steps:
1. Import dependencies
Start by importing the following dependencies:
import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np
2. Create schema
Open a KDB.AI session to create a schema:
session = kdbai.Session()
session.database('default').tables # check what tables are already created
schema = [
{"name": "realTime", "type": "datetime64[ns]"},
{"name": "sym", "type": "str"},
{"name": "price", "type": "float64"},
{"name": "priceReals", "type": "float32"},
{"name": "volumeLongs", "type": "int64"},
{"name": "volumeInts", "type": "int32"},
{"name": "volumeShorts", "type": "int16"},
{"name": "size", "type": "int32"},
]
table = session.database('default').create_table('trade', schema)
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{
"table":"trade",
"schema": [
{
"name": "realTime",
"type": "timestamp"
},
{
"name": "sym",
"type": "symbol"
},
{
"name": "price",
"type": "float"
},
{
"name": "priceReals",
"type": "real"
},
{
"name": "volumeLongs",
"type": "long"
},
{
"name": "volumeInts",
"type": "int"
},
{
"name": "volumeShorts",
"type": "short"
},
{
"name": "size",
"type": "int"
}
]
}' | jq .
`gw set hopen 8082;
dims:10;
mySchema:flip `name`type!(`realTime`sym`price`priceReals`volumeLongs`volumeInts`volumeShorts`size;`p`s`f`e`j`i`h`j);
// create
p:`database`table`schema!(`default;`trade;mySchema);
gw(`createTable;p);
or alternatively, if you have an existing kdb+ table on disk and would like to create from it, run the command below:
table = session.database('default').create_table(table="trade",external_data_references=[{"path":b'/db', "provider" :"kx"}])
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{
"table":"trade",
"externalDataReferences": [{
"path": "/db",
"provider": "kx"
}]
}' | jq .
`gw set hopen 8082;
ref:enlist `path`provider!("/db";`kx);
p:`database`table`externalDataReferences!(`default;`trade;ref);
gw(`createTable;p);
3. Insert data
Create the data df that contains your time series data and the numerical column(s) you wish to search:
numRows = 100
df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb', 'ccc'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['priceReals'] = [x.astype('float32') for x in np.random.rand(numRows)]
df['volumeLongs'] = [x.astype('int64') for x in np.random.randint(0, 100, numRows)]
df['volumeInts'] = [x.astype('int32') for x in np.random.randint(0, 100, numRows)]
df['volumeShorts'] = [x.astype('int16') for x in np.random.randint(0, 100, numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')
splits:34 33 33;
N:sum splits;
t:([] realTime:asc N?0p;sym:raze {x#y}'[splits;`aaa`bbb`ccc]; price:N?1f; priceReals:"e"$N?1f; volumeLongs:"j"$N?1f; volumeInts:"i"$N?1f; volumeShorts:"h"$N?1f; size:til N);
Insert df into the table:
table.insert(df)
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/insert \
--header 'Content-Type: application/json' \
--data '{
"payload": [
{
"realTime": "2001.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 1.1,
"priceReals": 14.7,
"volumeLongs": 27,
"volumeInts": 37,
"volumeShorts": 47,
"size": 42
},
{
"realTime": "2002.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 2.2,
"priceReals": 14.7,
"volumeLongs": 27,
"volumeInts": 37,
"volumeShorts": 47,
"size": 36
},
{
"realTime": "2003.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 3.4,
"priceReals": 14.7,
"volumeLongs": 27,
"volumeInts": 37,
"volumeShorts": 47,
"size": 24
},
{
"realTime": "2004.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 4.7,
"priceReals": 14.7,
"volumeLongs": 27,
"volumeInts": 37,
"volumeShorts": 47,
"size": 11
},
{
"realTime": "2004.01.01D00:48:57.051633652",
"sym": "bbb",
"price": 4.7,
"priceReals": 14.7,
"volumeLongs": 27,
"volumeInts": 37,
"volumeShorts": 47,
"size": 11
}
]
}' | jq .
r:gw(`insertData;`database`table`payload!(`default;`trade;t));
Run a query to check the contents of the table:
table.query()
curl -s -X POST localhost:8081/api/v2/databases/default/tables/trade/query | jq .
(gw(`query;`database`table!(`default;`trade)))[`result];
4. Perform searches
Now you can conduct a temporal similarity search, (searching for a pattern in any one of the numerical columns), as below:
# single query search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0]
# multiple queries search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]]}, n=3, type="tss")
# single query search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss"
}' | jq .
# multiple queries search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.3,2.3,3.3], [1.4,2.4,3.4]]},
"n": 1,
"type": "tss"
}' | jq .
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3; // single query search
tqry2:enlist[`price]!enlist (1.1 1.2 1.3;2.1 2.2 2.3); // multiple queries search
first (gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;10;`tss)))[`result];
(gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry2;10;`tss)))[`result];
If the column contains fewer values than the pattern being searched for, add force to options. This is particularly useful when searching a partitioned table where some partitions may be empty or contain very few rows.
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=5, type="tss", options={'force':True})[0]
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"force" : true}
}' | jq .
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqry1;3;`tss;(enlist `force)!(enlist 1b))))[`result];
Outlier searches
You can also perform an outlier search of the numerical column by specifying a negative value for n:
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0] # similarity search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=-3, type="tss")[0] # outlier search
# similarity search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss"
}' | jq .
# outlier search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": -1,
"type": "tss"
}' | jq .
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3;
first (gw(`search;`database`table`vectors`n`type`searchBy`options!(`default;`trade;tqry1;10;`tss;`sym;enlist[`force]!enlist 1b)))[`result];
first (gw(`search;`database`table`vectors`n`type`searchBy`options!(`default;`trade;tqry1;10;`tss;`sym;`force`returnMatches!11b)))[`result];
TSS search by group
Sometimes you may want to search by different categories instead of searching the input column as a whole. For example, in a trading use case the prices in the data would refer to different stocks, so we may want to perform our searches by symbol; or in IoT, we may want to search patterns by sensor.
To do this, add the searchBy argument to the search function.
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", search_by="sym", options={"force":True})[0]
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", search_by="sym", options={"force":True, "returnMatches":True})[0]
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"searchBy": ["sym"],
"options":{"force" : true}
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"searchBy": ["sym"],
"options":{"force" : true, "returnMatches" : true}
}' | jq .
first (gw(`search;`database`table`vectors`n`type`searchBy!(`default;`trade;tqry1;10;`tss)))[`result];
first (gw(`search;`database`table`vectors`n`type`searchBy`options!(`default;`trade;tqry1;10;`tss;`sym;enlist[`returnMatches]!enlist 1b)))[`result];
Tip: When using the searchBy option, the TSS searches for each group are parallelized using multiple threads.
The searchBy may return a lot more results than a simple TSS search. In the case of a splayed table, it will return n matches for each group. In the case of a partitioned table, it will return n matches for each group and each partition.
Return the matched patterns
If you would like to obtain the original values of the captured data, add returnMatches to options. This inserts an extra column nnMatch in the result, which contains the original values:
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqry1;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
Searching different data types
Apart from the standard floats, you can also search other columns of reals, longs, ints and shorts.
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
table.search(vectors={'priceReals': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
table.search(vectors={'volumeLongs': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
table.search(vectors={'volumeInts': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
table.search(vectors={'volumeShorts': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"priceReals" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"volumeLongs" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"volumeInts" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"volumeShorts" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
tqryF:enlist[`price]!enlist enlist "f"$1.1 1.2 1.3; // single query search
tqryE:enlist[`priceReals]!enlist enlist "e"$1.1 1.2 1.3; // single query search
tqryJ:enlist[`volumeLongs]!enlist enlist "j"$1.1 1.2 1.3; // single query search
tqryI:enlist[`volumeInts]!enlist enlist "i"$1.1 1.2 1.3; // single query search
tqryH:enlist[`volumeShorts]!enlist enlist "h"$1.1 1.2 1.3; // single query search
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryF;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryE;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryJ;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryI;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryH;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result];
End-to-end example
By putting the snippets of create/insert/search together, you obtain a complete example of the Non-transformed TSS method.
Example: Non-Transformed TSS search
import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np
session = kdbai.Session()
session.database('default').tables # check what tables are already created
schema = [
{"name": "realTime", "type": "datetime64[ns]"},
{"name": "sym", "type": "str"},
{"name": "price", "type": "float64"},
{"name": "size", "type": "int32"},
]
table = session.database('default').create_table('trade', schema)
numRows = 40
df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')
table.insert(df)
table.query()
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=5, type="tss")[0]
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]]}, n=5, type="tss")
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0] # similarity search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=-3, type="tss")[0] # outlier search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0] # return original values
table.search(vectors={'size': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0] # search the integer column
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", search_by = "sym",options={"force":True})[0] # search by sym
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{
"table":"trade",
"schema": [
{
"name": "realTime",
"type": "timestamp"
},
{
"name": "sym",
"type": "symbol"
},
{
"name": "price",
"type": "float"
},
{
"name": "size",
"type": "int"
}
]
}' | jq .
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/insert \
--header 'Content-Type: application/json' \
--data '{
"payload": [
{
"realTime": "2001.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 1.1,
"size": 42
},
{
"realTime": "2002.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 2.2,
"size": 36
},
{
"realTime": "2003.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 3.4,
"size": 24
},
{
"realTime": "2004.01.01D00:48:57.051633652",
"sym": "aaa",
"price": 4.7,
"size": 11
},
{
"realTime": "2004.01.01D00:48:57.051633652",
"sym": "bbb",
"price": 4.7,
"size": 11
}
]
}' | jq .
curl -s -X POST localhost:8081/api/v2/databases/default/tables/trade/query | jq .
# single query search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss"
}' | jq .
# multiple queries search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.3,2.3,3.3], [1.4,2.4,3.4]]},
"n": 1,
"type": "tss"
}' | jq .
# outlier search
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": -1,
"type": "tss"
}' | jq .
# return matches
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
# other data type
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"size" : [[1,3,7]]},
"n": 1,
"type": "tss",
"options":{"returnMatches" : true}
}' | jq .
# search by sym
curl -s -X POST http://localhost:8081/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{
"vectors":{"price" : [[1.2,2.2,3.2]]},
"n": 1,
"type": "tss",
"searchBy":["sym"],
"options":{"force" : true}
}' | jq .
`gw set hopen 8082;
dims:10;
mySchema:flip `name`type!(`realTime`sym`price`size;`p`s`f`j);
// create
p:`database`table`schema!(`default;`trade;mySchema);
gw(`createTable;p);
// insert
splits:34 33 33;
N:sum splits;
t:([] realTime:asc N?0p;sym:raze {x#y}'[splits;`aaa`bbb`ccc]; price:N?1f; size:til N);
r:gw(`insertData;`database`table`payload!(`default;`trade;t));
(gw(`query;`database`table!(`default;`trade)))[`result];
// search
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3; // single query search
tqry2:enlist[`price]!enlist (1.1 1.2 1.3;2.1 2.2 2.3); // multiple queries search
tqryInt:enlist[`size]!enlist enlist 1 3 6; // single query search
first (gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;10;`tss)))[`result];
(gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry2;10;`tss)))[`result];
first (gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;-10;`tss)))[`result]; // outlier search
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqry1;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result]; // return original pattern
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqryInt;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result]; // search integer column
first (gw(`search;`database`table`vectors`n`type`searchBy`options!(`default;`trade;tqry1;10;`tss;`sym;enlist[`force]!enlist 1b)))[`result]; // return original pattern
As you can see in the above comparison, the main grammatical differences between running the Non-Transformed TSS search vs. other cases are:
| Non-Transformed TSS | Transformed TSS or Non-TSS | |
|---|---|---|
type |
tss |
flat, hnsw etc. |
dims |
Not required | Required |
| Entries in the search column | Scalars | Vectors |
| Outlier search | Available | N/A |
We support all numerical types (for example, float64, float32, int64, int32, int16) as data and Non-Transformed TSS query. However, for numerical precision, the Non-Transformed TSS calculation uses float64 in all cases.
Next steps
Now that you're familiar with a Non-Transformed TSS search, we suggest the following:
- Try a Non-Transformed TSS in our sample project.
- Review our article on discovering time series insights with Temporal Similarity Search.
- Watch our video to learn more about Non-Transformed TSS or our video tutorial to get started with your own real data.
- Discover some of our integrations like LlamaIndex for RAG or kdb+ for seamless time-series insights