
How to perform a Non-Transformed TSS search

This page details how to execute a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) in KDB.AI.

This feature is currently only available for KDB.AI Server 1.1.0.

Before we dive in, go to the Understanding Non-Transformed TSS search page to learn about this method.

To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions (a conceptual sketch follows the list):

  1. Takes a simple time series (a numerical sequence stored in a kdb+ column) as input.
  2. Scans the time series with a sliding window of the same size as the query vector; the window size can change between queries.
  3. Computes the distance between the query vector and each position of the sliding window.
  4. Returns the k-nearest neighbors.
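For intuition only, here is a minimal NumPy sketch of those steps. It is not the KDB.AI implementation (the scan runs server-side on the indexed column), and the helper name tss_scan is hypothetical:

import numpy as np

def tss_scan(series, query, k):
    # 1-2. slide a window of the query's length over the series
    windows = np.lib.stride_tricks.sliding_window_view(series, len(query))
    # 3. L2 distance between the query and every window position
    dists = np.linalg.norm(windows - query, axis=1)
    # 4. indices (and distances) of the k nearest windows
    order = np.argsort(dists)[:k]
    return order, dists[order]

positions, distances = tss_scan(np.random.rand(1000).astype('float32'),
                                np.array([0, 1, 2, 3, 4], dtype='float32'),
                                k=5)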

Setup

Before you start, make sure you have a running KDB.AI Server instance and the Python packages used below (kdbai_client, pykx, pandas, and numpy) installed.

To store and search temporal data using the Non-Transformed TSS method, follow these steps:

  1. Import dependencies.
  2. Create schema.
  3. Insert data.
  4. Perform searches.

1. Import dependencies

Start by importing the following dependencies:

import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

2. Create schema

Open a KDB.AI session to create a schema:

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime',
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym',
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',
            vectorIndex=dict(
                type='tss',   # Non-Transformed TSS index; note that no dims are required
                metric='L2'
                )
            ),
        dict(
            name='size',
            pytype='int64'
            ),
        dict(
            name='sparseVectors',
            pytype='dict',    # sparse vectors are stored as dictionaries
            sparseIndex={
                'k': 1.25,
                'b': 0.75
                },
            ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

Read more about how to change the values of k and b on the Understanding hybrid search page.
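For intuition, k and b are the usual BM25 parameters: k controls how quickly repeated terms saturate, and b controls how strongly scores are normalized by document length. Below is a rough sketch of the standard BM25 term weight (illustration only; KDB.AI's exact scoring is described on the linked page):

import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k=1.25, b=0.75):
    # inverse document frequency of the term
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # length normalisation, controlled by b
    norm = 1 - b + b * doc_len / avg_doc_len
    # term-frequency saturation, controlled by k
    return idf * tf * (k + 1) / (tf + k * norm)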

3. Insert data

Create a DataFrame df that contains both the time series column price and the sparse column sparseVectors:

numRows1 = 40
numRows2 = 70
# random sparse vectors: {token id: frequency} dictionaries of varying length
svPartial1 = [{int(y + 1): 1 for y in np.random.choice(range(12000), x + 1, replace=False)} for x in np.random.choice(range(100), numRows1)]
svPartial2 = [{int(y + 1): 1 for y in np.random.choice(range(12000), x + 1, replace=False)} for x in np.random.choice(range(120), numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd()
df['sparseVectors'] = sparseVectors
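If you prefer not to build the sample data with q, an equivalent DataFrame can be constructed in plain pandas/NumPy. This is a sketch only, assuming the column names and dtypes declared in the schema above:

rng = np.random.default_rng()
n = numRows1 + numRows2
df = pd.DataFrame({
    'realTime': pd.to_datetime(np.sort(rng.integers(0, 10**9, n)), unit='s'),  # ascending timestamps
    'sym': rng.choice(['aaa', 'bbb'], n),
    'price': rng.random(n, dtype='float32'),            # float32 to match the schema
    'size': rng.integers(0, 100, n).astype('int64'),
})
df['sparseVectors'] = sparseVectors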

Insert df into the table:

table.insert(df)

Run a query to check the contents of the table:

table.query()
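As a quick sanity check, you can also confirm that the row count and column dtypes match the schema (this assumes, as above, that query() returns a pandas DataFrame):

result = table.query()
print(len(result))      # expect numRows1 + numRows2 = 110 rows
print(result.dtypes)    # realTime datetime64[ns], price float32, size int64, ...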

4. Perform searches

Now you can run a similarity search (along either the dense column or the sparse column) or a hybrid search (using both the dense and the sparse column), as shown below:

# single query search/hybridSearch
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5) # search along the dense column
table.search(sparseVectors[:1],5) # search along the sparse column
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5) # hybrid search using both the dense column and the sparse column

# multiple queries search/hybridSearch
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)  # search along the dense column
table.search(sparseVectors[:2],5) # search along the sparse column
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)  # hybrid search using both the dense column and the sparse column
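Each search returns one result set per query vector. For example, assuming the client returns a list of DataFrames (one per query), you can inspect the matches and their distances like this:

results = table.search(vectors=[[0,1,2,3,4,0,1,2,3,4], [7,1,2,3,4,7,1,2,3,4]], n=5)
for i, res in enumerate(results):
    print(f'query {i}:')
    pprint(res)   # the 5 nearest matches for this query vector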

You can also perform an outlier search along the dense column using a negative n:

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
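Conceptually, an outlier search keeps the window positions with the largest distances instead of the smallest. Continuing the illustrative NumPy sketch from earlier (not the KDB.AI API):

series = np.random.rand(1000).astype('float32')
query = np.array([0, 1, 2, 3, 4], dtype='float32')
dists = np.linalg.norm(np.lib.stride_tricks.sliding_window_view(series, len(query)) - query, axis=1)
nearest = np.argsort(dists)[:3]          # n=3: the 3 most similar window positions
outliers = np.argsort(dists)[::-1][:3]   # n=-3: the 3 most dissimilar window positions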

Summary

By putting the above create/insert/search snippets together, we obtain the example below for the Non-Transformed TSS method. If you're already familiar with the basic usage of KDB.AI, we have also attached an equivalent snippet for the Transformed TSS case so you can compare the two. Feel free to switch between the two to spot the differences.

Example: Non-Transformed TSS hybrid search vs. Transformed TSS hybrid search

Non-Transformed TSS hybrid search:
import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',  # default option; can be skipped
            # pytype='float64',  # float64 is also available for TSS
            vectorIndex=dict(
                type='tss',   # Non-Transformed TSS index; no dims required
                metric='L2'
                )
            ),
        dict(
            name='size', 
            pytype='int64'
            ),
        dict(
            name="sparseVectors",
            pytype="dict",
            sparseIndex={
                "k": 1.25,
                "b": 0.75
            },
        ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

numRows1 = 40
numRows2 = 70
svPartial1 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(100),numRows1)]
svPartial2 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(120),numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd() # if price column uses float32 in schema
# df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd()  # if price column uses float64 in schema
df['sparseVectors'] = sparseVectors

table.insert(df)

table.query()

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(sparseVectors[:1],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(sparseVectors[:2],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
Transformed TSS (flat index) hybrid search:

import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np

session = kdbai.Session()
session.list() # list existing tables, e.g. trade and quote

schema = dict(
    columns=[
        dict(
            name='realTime', 
            pytype='datetime64[ns]'
            ),
        dict(
            name='sym', 
            pytype='str'
            ),
        dict(
            name='price',
            pytype='float32',
            vectorIndex=dict(
                type='flat',
                dims=10,      # dims is required here, while TSS has no dims
                metric='L2'
                )
            ),
        dict(
            name='size', 
            pytype='int64'
            ),
        dict(
            name="sparseVectors",
            pytype="dict",
            sparseIndex={
                "k": 1.25,
                "b": 0.75
            },
        ),
        ]
    )

if 'trade' in session.list():
    table = session.table('trade')
    table.drop()

table = session.create_table('trade', schema)

numRows1 = 40
numRows2 = 70
svPartial1 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(100),numRows1)]
svPartial2 = [{int(y+1):1 for y in np.random.choice(range(12000),x+1,replace=False)}for x in np.random.choice(range(120),numRows2)]
sparseVectors = svPartial1 + svPartial2

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: (x;10)#(x*10)?1e; size:x?100j)}', numRows1 + numRows2).pd()

df['sparseVectors'] = sparseVectors

table.insert(df)

table.query()

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(sparseVectors[:1],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4]], sparse_vectors=sparseVectors[:1],n=5)

table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(sparseVectors[:2],5)
table.hybrid_search(dense_vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]], sparse_vectors=sparseVectors[:2],n=5)

As the comparison above shows, the main differences in usage between the Non-Transformed TSS search and the Transformed TSS search are:

                               TSS                                Non-TSS
type                           tss                                flat, hnsw, etc.
dims                           Not required                       Required
Entries in the search column   Scalars                            Vectors
pytype of the search column    float32 (the default) or float64   Must be float32
Outlier search                 Available                          Not available
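In schema terms, the difference comes down to the vectorIndex definition on the search column (taken from the two snippets above):

# Non-Transformed TSS: scalar price column, no dims
vectorIndex=dict(type='tss', metric='L2')

# Transformed TSS (flat, hnsw, etc.): vector price column, dims required
vectorIndex=dict(type='flat', dims=10, metric='L2')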

Note

The type of the inserted data should be in line with the schema. For example, if float32 is used in the schema, the inserted data should be of kdb+ type e (real); if float64 is used in the schema, the inserted data should be of kdb+ type f (float).

df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1e; size:x?100j)}', numRows1 + numRows2).pd() # data to insert for float32 at schema
df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd() # data to insert for float64 at schema

Tip

Use float64 to get more accurate distances in the search results.
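Combining the float64 comments from the snippet above, the price column definition and matching insert for a float64 schema look like this:

dict(
    name='price',
    pytype='float64',                            # float64 for more accurate distances
    vectorIndex=dict(type='tss', metric='L2')
    ),

# matching data uses kdb+ type f (float64):
df = kx.q('{([] realTime:asc x?0p; sym:x?`aaa`bbb; price: x?1f; size:x?100j)}', numRows1 + numRows2).pd()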

Next steps

Now that you're familiar with a Non-Transformed TSS search, try the following: