
How to perform a Non-Transformed TSS search

This page details how to execute a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) in KDB.AI.

This feature is currently only available for KDB.AI Server 1.1.0.

Before we dive in, go to the Understanding Non-Transformed TSS search page to learn about this method.

To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions (a conceptual sketch follows the list):

  1. Takes a simple time series (a numerical sequence stored in a kdb+ column) as input.
  2. Scans the time series with a sliding window of the same size as the query vector; the window size can change between queries.
  3. Computes the distance between the query vector and each position of the sliding window.
  4. Returns the k-nearest neighbors.
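For intuition only, here is a minimal NumPy sketch of those steps. It is not the KDB.AI implementation (the scan runs server-side on the indexed column), and the helper name tss_scan is hypothetical:

import numpy as np

def tss_scan(series, query, k):
    # 1-2. slide a window of the query's length over the series
    windows = np.lib.stride_tricks.sliding_window_view(series, len(query))
    # 3. L2 distance between the query and every window position
    dists = np.linalg.norm(windows - query, axis=1)
    # 4. indices (and distances) of the k nearest windows
    order = np.argsort(dists)[:k]
    return order, dists[order]

positions, distances = tss_scan(np.random.rand(1000).astype('float32'),
                                np.array([0, 1, 2, 3, 4], dtype='float32'),
                                k=5)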

Setup

Before you start, make sure you have a running KDB.AI Server instance and the Python packages used below (kdbai_client, pykx, pandas, and numpy) installed.

To store and search temporal data using the Non-Transformed TSS method, follow these steps:

  1. Import dependencies.
  2. Create schema.
  3. Insert data.
  4. Perform searches.

1. Import dependencies

Start by importing the following dependencies:

import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

2. Create schema

Open a KDB.AI session to create a schema:

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime',
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym',
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',
            vectorIndex=dict(
                type='tss',   # Non-Transformed TSS index; note that no dims are required
                metric='L2'
                )
            ),
        dict(
            name='size',
            pytype='int64'
            ),
        dict(
            name='sparseVectors',
            pytype='dict',    # sparse vectors are stored as dictionaries
            sparseIndex={
                'k': 1.25,
                'b': 0.75
                },
            ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

Read more about how to change the values of k and b on the Understanding hybrid search page.
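For intuition, k and b are the usual BM25 parameters: k controls how quickly repeated terms saturate, and b controls how strongly scores are normalized by document length. Below is a rough sketch of the standard BM25 term weight (illustration only; KDB.AI's exact scoring is described on the linked page):

import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k=1.25, b=0.75):
    # inverse document frequency of the term
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # length normalisation, controlled by b
    norm = 1 - b + b * doc_len / avg_doc_len
    # term-frequency saturation, controlled by k
    return idf * tf * (k + 1) / (tf + k * norm)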

3. Insert data

Create a DataFrame df that contains both the time series column price and the sparse column sparseVectors:

numRows1 = 40
numRows2 = 70
# random sparse vectors: {token id: frequency} dictionaries of varying length
svPartial1 = [{int(y + 1): 1 for y in np.random.choice(range(12000), x + 1, replace=False)} for x in np.random.choice(range(100), numRows1)]
svPartial2 = [{int(y + 1): 1 for y in np.random.choice(range(12000), x + 1, replace=False)} for x in np.random.choice(range(120), numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd()
df['sparseVectors'] = sparseVectors
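If you prefer not to build the sample data with q, an equivalent DataFrame can be constructed in plain pandas/NumPy. This is a sketch only, assuming the column names and dtypes declared in the schema above:

rng = np.random.default_rng()
n = numRows1 + numRows2
df = pd.DataFrame({
    'realTime': pd.to_datetime(np.sort(rng.integers(0, 10**9, n)), unit='s'),  # ascending timestamps
    'sym': rng.choice(['aaa', 'bbb'], n),
    'price': rng.random(n, dtype='float32'),            # float32 to match the schema
    'size': rng.integers(0, 100, n).astype('int64'),
})
df['sparseVectors'] = sparseVectors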

Insert df into the table:

table.insert(df)

Run a query to check the contents of the table:

table.query()
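As a quick sanity check, you can also confirm that the row count and column dtypes match the schema (this assumes, as above, that query() returns a pandas DataFrame):

result = table.query()
print(len(result))      # expect numRows1 + numRows2 = 110 rows
print(result.dtypes)    # realTime datetime64[ns], price float32, size int64, ...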

4. Perform searches

Now you can run a similarity search (along either the dense column or the sparse column) or a hybrid search (using both the dense and the sparse column), as shown below:

# single query search/hybridSearch
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5) # search along the dense column
table.search(sparseVectors[:1],5) # search along the sparse column
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5) # hybrid search using both the dense column and the sparse column

# multiple queries search/hybridSearch
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)  # search along the dense column
table.search(sparseVectors[:2],5) # search along the sparse column
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)  # hybrid search using both the dense column and the sparse column
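Each search returns one result set per query vector. For example, assuming the client returns a list of DataFrames (one per query), you can inspect the matches and their distances like this:

results = table.search(vectors=[[0,1,2,3,4,0,1,2,3,4], [7,1,2,3,4,7,1,2,3,4]], n=5)
for i, res in enumerate(results):
    print(f'query {i}:')
    pprint(res)   # the 5 nearest matches for this query vector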

You can also perform an outlier search along the dense column using a negative n:

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
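Conceptually, an outlier search keeps the window positions with the largest distances instead of the smallest. Continuing the illustrative NumPy sketch from earlier (not the KDB.AI API):

series = np.random.rand(1000).astype('float32')
query = np.array([0, 1, 2, 3, 4], dtype='float32')
dists = np.linalg.norm(np.lib.stride_tricks.sliding_window_view(series, len(query)) - query, axis=1)
nearest = np.argsort(dists)[:3]          # n=3: the 3 most similar window positions
outliers = np.argsort(dists)[::-1][:3]   # n=-3: the 3 most dissimilar window positions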

Summary

By putting the above create/insert/search snippets together, we obtain the example below for the Non-Transformed TSS method. If you're already familiar with the basic usage of KDB.AI, we have also attached an equivalent snippet for the Transformed TSS case so you can compare the two. Feel free to switch between the two to spot the differences.

Example: Non-Transformed TSS hybrid search vs. Transformed TSS hybrid search

Non-Transformed TSS hybrid search:
import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',  # default option; can be skipped
            # pytype='float64',  # float64 is also available for TSS
            vectorIndex=dict(
                type='tss',   # Non-Transformed TSS index; no dims required
                metric='L2'
                )
            ),
        dict(
            name='size', 
            pytype='int64'
            ),
        dict(
            name="sparseVectors",
            pytype="dict",
            sparseIndex={
                "k": 1.25,
                "b": 0.75
            },
        ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

numRows1 = 40
numRows2 = 70
svPartial1 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(100),numRows1)]
svPartial2 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(120),numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd() # if price column uses float32 in schema
# df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd()  # if price column uses float64 in schema
df['sparseVectors'] = sparseVectors

table.insert(df)

table.query()

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(sparseVectors[:1],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(sparseVectors[:2],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
Transformed TSS (flat index) hybrid search:

import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',
            vectorIndex=dict(
                type='flat',
                dims=10,      # dims is required here, while TSS has no dims
                metric='L2'
                )
            ),
        dict(
            name='size', 
            pytype='int64'
            ),
        dict(
            name="sparseVectors",
            pytype="dict",
            sparseIndex={
                "k": 1.25,
                "b": 0.75
            },
        ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

numRows1 = 40
numRows2 = 70
svPartial1 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(100),numRows1)]
svPartial2 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(120),numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: (x;10)#(x*10)?1e; size:x?100j)}', numRows1 + numRows2).pd()

df['sparseVectors'] = sparseVectors

table.insert(df)

table.query()

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(sparseVectors[:1],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(sparseVectors[:2],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)

As the comparison above shows, the main differences in usage between the Non-Transformed TSS search and the Transformed TSS search are:

                               TSS                                Non-TSS
type                           tss                                flat, hnsw, etc.
dims                           Not required                       Required
Entries in the search column   Scalars                            Vectors
pytype of the search column    float32 (the default) or float64   Must be float32
Outlier search                 Available                          Not available
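In schema terms, the difference comes down to the vectorIndex definition on the search column (taken from the two snippets above):

# Non-Transformed TSS: scalar price column, no dims
vectorIndex=dict(type='tss', metric='L2')

# Transformed TSS (flat, hnsw, etc.): vector price column, dims required
vectorIndex=dict(type='flat', dims=10, metric='L2')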

Note

The type of the inserted data should be in line with the schema. For example, if float32 is used in the schema, the inserted data should be of kdb+ type e (real); if float64 is used in the schema, the inserted data should be of kdb+ type f (float).

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd() # data to insert for float32 at schema
df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd() # data to insert for float64 at schema

Tip

Use float64 to get more accurate distances in the search results.
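Combining the float64 comments from the snippet above, the price column definition and matching insert for a float64 schema look like this:

dict(
    name='price',
    pytype='float64',                            # float64 for more accurate distances
    vectorIndex=dict(type='tss', metric='L2')
    ),

# matching data uses kdb+ type f (float64):
df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd()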

Next steps

Now that you're familiar with a Non-Transformed TSS search, try the following: