Skip to content

How to perform a Non-Transformed TSS search

This page details how to execute a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) search in KDB.AI.

Tip: For the best experience, we recommend reading about KDB.AI Non-Transformed TSS first.

Before we dive in, go to the Understanding Non-Transformed TSS search page to learn about this method.

To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions:

  1. Takes simple time series (numerical sequence stored in a kdb+ column) as input.
  2. Scans the time series with a sliding window (of same size as the query vector; size can change between two queries).
  3. Computes the list of distances between the query vector and each occurrence of the sliding window.
  4. Returns the k-nearest neighbors.

Setup

Before you start, make sure you have:

To store and search temporal data using the Non-Transformed TSS method, follow these steps:

  1. Import dependencies
  2. Create schema
  3. Insert data
  4. Perform searches

1. Import dependencies

Start by importing the following dependencies:

import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

2. Create schema

Open a KDB.AI session to create a schema:

session = kdbai.Session()
session.database('default').tables # check what tables are already created

schema = [
            {"name": "realTime", "type": "datetime64[ns]"},
            {"name": "sym", "type": "str"},
            {"name": "price", "type": "float64"},
            {"name": "size", "type": "int32"},
            ]

table = session.database('default').create_table('trade', schema)
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{  
"table":"trade",
"schema": [
    {
    "name": "realTime",
    "type": "timestamp"
    },
    {
    "name": "sym",
    "type": "symbol"
    },
    {
    "name": "price",
    "type": "float"
    },
    {
    "name": "size",
    "type": "int"
    }
]
    }' | jq .
`gw set hopen 8082;

dims:10;
mySchema:flip `name`type!(`realTime`sym`price`size;`p`s`f`j);

// create
p:`database`table`schema!(`default;`trade;mySchema);
gw(`createTable;p);

or alternatively, if you have an existing kdb+ table on disk and would like to create from it, run below:-

trade = db.create_table(table="trade",external_data_references=[{"path":b'/db', "provider" :"kx"}])
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{  
"table":"trade",
"externalDataReferences": [{
    "path": "/db",
    "provider": "kx"
    }]
    }' | jq .
`gw set hopen 8082;

ref:enlist `path`provider!("/db";`kx);
p:`database`table`externalDataReferences!(`default;`trade;ref);
gw(`createTable;p);

3. Insert data

Create the data df that contains the time series column price:

numRows = 40

df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')
N:100; 
t:([] realTime:asc N?0p;sym:N?`3; price:N?1f; size:til N);

Insert df into the table:

table.insert(df)
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/insert \
--header 'Content-Type: application/json' \
--data '{  
    "payload": [
            {
            "realTime": "2001.01.01D00:48:57.051633652",
            "sym": "aaa",
            "price": 1.1,
            "size": 42
            },
            {
            "realTime": "2002.01.01D00:48:57.051633652",
            "sym": "bbb",
            "price": 2.2,
            "size": 36
            },
            {
            "realTime": "2003.01.01D00:48:57.051633652",
            "sym": "ccc",
            "price": 3.4,
            "size": 24
            },
            {
            "realTime": "2004.01.01D00:48:57.051633652",
            "sym": "ddd",
            "price": 4.7,
            "size": 11
            }
        ]
    }' | jq .
r:gw(`insertData;`database`table`payload!(`default;`trade;t));

Run a query to check the contents of the table:

table.query()
curl -s -X POST localhost:8082/api/v2/databases/default/tables/trade/query | jq .
(gw(`query;`database`table!(`default;`trade)))[`result];

4. Perform searches

Now you can conduct a similarity search (searching along either the dense column) as below:

# single query search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0]

# multiple queries search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]]}, n=3, type="tss")
# single query search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": 1,
    "type": "tss"
    }' | jq .

# multiple queries search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.3,2.3,3.3], [1.4,2.4,3.4]]},
    "n": 1,
    "type": "tss"
    }' | jq .
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3; // single query search
tqry2:enlist[`price]!enlist (1.1 1.2 1.3;2.1 2.2 2.3); // multiple queries search
gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;10;`tss));
gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry2;10;`tss));

If it is a partitioned table and you would like to force search even thought the query is longer than the number of rows in some partitions, add force to options:-

trade.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=5, type="tss", options={'force':True})[0]
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": 1,
    "type": "tss",
    "options":{"force" : true}
    }' | jq .
first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqry1;3;`tss;(enlist `force)!(enlist 1b))))[`result];

You can also perform an outlier search along the dense column using a negative n:

table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0] # similarity search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=-3, type="tss")[0] # outlier search
# similarity search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": 1,
    "type": "tss"
    }' | jq .

# outlier search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": -1,
    "type": "tss"
    }' | jq .
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3;
gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;3;`tss));  // similarity search
gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry2;-3;`tss)); // outlier search

Summary

By putting the above snippets of create/insert/search together, we obtain the below example snippet for the Non-transformed TSS method. If you're already familiar with the basic usage of KDB.AI, we attached a snippet with a Non-transformed TSS case so you can compare the two. Feel free to switch between the two tabs to spot the differences.

Example: Non-Transformed TSS search
import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.database('default').tables # check what tables are already created

schema = [
            {"name": "realTime", "type": "datetime64[ns]"},
            {"name": "sym", "type": "str"},
            {"name": "price", "type": "float64"},
            {"name": "size", "type": "int32"},
            ]

table = session.database('default').create_table('trade', schema)

numRows = 40

df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')

table.insert(df)
table.query()

table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=5, type="tss")[0]
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]]}, n=5, type="tss")
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss")[0] # similarity search
table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=-3, type="tss")[0] # outlier search

table.search(vectors={'price': [[0,1,2,3,4,0,1,2,3,4]]}, n=3, type="tss", options={"returnMatches":True})[0]  # return original values
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables \
--header 'Content-Type: application/json' \
--data '{  
"table":"trade",
"schema": [
    {
    "name": "realTime",
    "type": "timestamp"
    },
    {
    "name": "sym",
    "type": "symbol"
    },
    {
    "name": "price",
    "type": "float"
    },
    {
    "name": "size",
    "type": "int"
    }
]
    }' | jq .

curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/insert \
--header 'Content-Type: application/json' \
--data '{  
    "payload": [
            {
            "realTime": "2001.01.01D00:48:57.051633652",
            "sym": "aaa",
            "price": 1.1,
            "size": 42
            },
            {
            "realTime": "2002.01.01D00:48:57.051633652",
            "sym": "bbb",
            "price": 2.2,
            "size": 36
            },
            {
            "realTime": "2003.01.01D00:48:57.051633652",
            "sym": "ccc",
            "price": 3.4,
            "size": 24
            },
            {
            "realTime": "2004.01.01D00:48:57.051633652",
            "sym": "ddd",
            "price": 4.7,
            "size": 11
            }
        ]
    }' | jq .

curl -s -X POST localhost:8082/api/v2/databases/default/tables/trade/query | jq .

# single query search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": 1,
    "type": "tss"
    }' | jq .

# multiple queries search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.3,2.3,3.3], [1.4,2.4,3.4]]},
    "n": 1,
    "type": "tss"
    }' | jq .

# outlier search
curl -s -X POST http://localhost:8082/api/v2/databases/default/tables/trade/search \
--header 'Content-Type: application/json' \
--data '{  
    "vectors":{"price" : [[1.2,2.2,3.2]]},
    "n": -1,
    "type": "tss"
    }' | jq .
`gw set hopen 8082;

dims:10;
mySchema:flip `name`type!(`realTime`sym`price`size;`p`s`f`j);

// create
p:`database`table`schema!(`default;`trade;mySchema);
gw(`createTable;p);

// insert
N:100; 
t:([] realTime:asc N?0p;sym:N?`3; price:N?1f; size:til N);
r:gw(`insertData;`database`table`payload!(`default;`trade;t));

(gw(`query;`database`table!(`default;`trade)))[`result];

// search
tqry1:enlist[`price]!enlist enlist 1.1 1.2 1.3; // single query search
tqry2:enlist[`price]!enlist (1.1 1.2 1.3;2.1 2.2 2.3); // multiple queries search
first (gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;10;`tss)))[`result];
(gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry2;10;`tss)))[`result];
first (gw(`search;`database`table`vectors`n`type!(`default;`trade;tqry1;-10;`tss)))[`result]; // outlier search

first (gw(`search;`database`table`vectors`n`type`options!(`default;`trade;tqry1;10;`tss;enlist[`returnMatches]!enlist 1b)))[`result]; // return original pattern

As you can see in the above comparison, the main grammatical differences between running the Non-Transformed TSS search vs. other cases are:

Non-Transformed TSS Transformed TSS or Non-TSS
type tss flat, hnsw etc.
dims Not required Required
Entries in the search column Scalars Vectors
pytype of the search column float64 float32
Outlier search Available N/A

Next steps

Now that you're familiar with a Non-Transformed TSS search, try the following: