
How to perform a Non-Transformed TSS search

This section explains how to run a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) in KDB.AI.

Added in v1.1.0.

Before we dive in, go to the Understanding Non-Transformed TSS search page to learn about this method.

To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions:

  1. Takes a simple time series (a numerical sequence stored in a kdb+ column) as input.
  2. Scans the time series with a sliding window of the same size as the query vector (the size can differ from one query to the next).
  3. Computes the list of distances between the query vector and each occurrence of the sliding window.
  4. Returns the k-nearest neighbors.
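
The four steps above can be sketched locally with NumPy. This is only an illustration of the idea, not KDB.AI's implementation: the function name `tss_search_sketch` is hypothetical, and plain Euclidean (L2) distance is assumed for the metric.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def tss_search_sketch(series, query, k):
    """Return (start_index, distance) pairs for the k windows closest to query."""
    series = np.asarray(series, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Step 2: every contiguous window with the same length as the query.
    windows = sliding_window_view(series, query.size)
    # Step 3: Euclidean (L2) distance between the query and each window.
    distances = np.linalg.norm(windows - query, axis=1)
    # Step 4: indices of the k smallest distances, nearest first.
    nearest = np.argsort(distances, kind="stable")[:k]
    return [(int(i), float(distances[i])) for i in nearest]

# Step 1: a plain numerical sequence, no vector extraction needed.
series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0]
result = tss_search_sketch(series, query=[1.0, 2.0, 3.0], k=2)
print(result)  # [(1, 0.0), (7, 0.0)]
```

In this sketch, a series of N points and a query of length m produce N - m + 1 candidate windows, so a longer query means fewer windows to compare.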

Setup

Before you start, make sure you have:

To store and search temporal data using the Non-Transformed TSS method, follow these steps:

  1. Import dependencies
  2. Create schema
  3. Insert data
  4. Perform searches

1. Import dependencies

Start by importing the following dependencies:

import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

2. Create schema

Open a KDB.AI session to create a schema:

session = kdbai.Session()
session.list() # lists the tables in the session, for example trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price', 
            pytype='float64',
            vectorIndex=
                dict(
                    type='tss', # selects the Non-Transformed TSS index
                    metric='L2'
                    )
            ),
        dict(
            name='size', 
            pytype='int32'
            ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

3. Insert data

Create the data df that contains the time series column price:

numRows = 40

df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = np.random.rand(numRows).astype('float64')
df['size'] = np.random.randint(100, size=numRows).astype('int32')
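
Before inserting, it can help to confirm that each column's dtype matches the pytype declared in the schema. The following is a minimal local sketch that does not call KDB.AI; the fixed seed and the `checks` dictionary are illustrative assumptions, not part of the client API.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so reruns build the same frame
num_rows = 40

df = pd.DataFrame({
    # Sorted random nanosecond timestamps, as in the snippet above.
    "realTime": np.sort(rng.integers(0, 2**62, size=num_rows)).astype("datetime64[ns]"),
    "sym": rng.choice(["aaa", "bbb"], size=num_rows).astype(str),
    "price": rng.random(num_rows),  # float64 by default
    "size": rng.integers(0, 100, size=num_rows).astype("int32"),
})

# One check per schema column, mirroring the declared pytype values.
checks = {
    "realTime": pd.api.types.is_datetime64_ns_dtype(df["realTime"]),
    "sym": pd.api.types.is_string_dtype(df["sym"]),
    "price": df["price"].dtype == np.float64,
    "size": df["size"].dtype == np.int32,
}
mismatched = [name for name, ok in checks.items() if not ok]
print(mismatched)  # an empty list means every column matches the schema
```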

Insert df into the table:

table.insert(df)

Run a query to check the contents of the table:

table.query()

4. Perform searches

Now you can run a similarity search along the dense column, as shown below:

# single query search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5) # search along the dense column

# multiple queries search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)  # search along the dense column

You can also perform an outlier search along the dense column using a negative n:

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
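
One way to picture the effect of the sign of n is the local NumPy sketch below. It rests on an assumption that this page does not confirm: that an outlier search returns the |n| windows farthest from the query. The helper `ranked_windows` is hypothetical and is not part of kdbai_client.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ranked_windows(series, query, n):
    """Start indices of the |n| nearest (n > 0) or farthest (n < 0) windows."""
    series = np.asarray(series, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    windows = sliding_window_view(series, query.size)
    distances = np.linalg.norm(windows - query, axis=1)
    order = np.argsort(distances, kind="stable")  # nearest first
    if n < 0:
        order = order[::-1]  # assumed outlier behavior: farthest first
    return [int(i) for i in order[:abs(n)]]

series = [1, 2, 3, 10, 3, 2, 1]
similar = ranked_windows(series, [1, 2, 3], 2)    # closest windows
outliers = ranked_windows(series, [1, 2, 3], -2)  # most distant windows
print(similar, outliers)  # [0, 4] [3, 2]
```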

Summary

Putting the create, insert, and search snippets together gives the complete example below for the Non-Transformed TSS method. If you're already familiar with the basic usage of KDB.AI, compare it with a non-TSS case; the table after the example lists the main differences.

Example: Non-Transformed TSS search
import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.list() # lists the tables in the session, for example trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price', 
            pytype='float64',
            vectorIndex=
                dict(
                    type='tss', # selects the Non-Transformed TSS index
                    metric='L2'
                    )
            ),
        dict(
            name='size', 
            pytype='int32'
            ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

numRows = 40

df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = np.random.rand(numRows).astype('float64')
df['size'] = np.random.randint(100, size=numRows).astype('int32')
table.insert(df)

table.query()

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search

The main syntactic differences between running a Non-Transformed TSS search and the other cases (Transformed TSS or non-TSS) are:

                               Non-Transformed TSS   Transformed TSS or Non-TSS
type                           tss                   flat, hnsw, etc.
dims                           Not required          Required
Entries in the search column   Scalars               Vectors
pytype of the search column    float64               float32
Outlier search                 Available             N/A
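
The first four rows of the table can be read off the column definitions side by side. The sketch below only builds plain dictionaries and does not contact KDB.AI. The non-TSS column is hypothetical: its name `embedding` and the value `dims=8` are placeholders chosen for illustration.

```python
# Search column for Non-Transformed TSS, as used in this guide.
tss_column = dict(
    name='price',
    pytype='float64',                           # one scalar per row
    vectorIndex=dict(type='tss', metric='L2'),  # no dims entry
)

# Hypothetical non-TSS counterpart: name and dims are placeholders.
flat_column = dict(
    name='embedding',
    pytype='float32',                           # one vector per row
    vectorIndex=dict(type='flat', metric='L2', dims=8),
)

# Keys that appear only in the non-TSS index definition.
extra_keys = set(flat_column['vectorIndex']) - set(tss_column['vectorIndex'])
print(extra_keys)  # {'dims'}
```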

Next steps

Now that you're familiar with a Non-Transformed TSS search, try the following: