How to perform a Non-Transformed TSS search
This section details how to execute a Non-Transformed Temporal Similarity Search (Non-Transformed TSS) in KDB.AI.
Added in v1.1.0.
Before we dive in, go to the Understanding Non-Transformed TSS search page to learn about this method.
To use the Non-Transformed TSS search, you don't need to extract vectors from the time series. The algorithm performs the following actions:
- Takes a simple time series (a numerical sequence stored in a kdb+ column) as input.
- Scans the time series with a sliding window (of the same size as the query vector; the size can change from one query to the next).
- Computes the list of distances between the query vector and each occurrence of the sliding window.
- Returns the k-nearest neighbors.
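The steps above can be sketched in plain Python. This is a minimal illustration of the sliding-window idea using L2 distance, not the KDB.AI implementation; the function name sliding_window_search and the sample prices are hypothetical.

```python
import math

def sliding_window_search(series, query, k):
    """Return the k windows of `series` closest to `query` under L2 distance.

    Each result is a (start_index, distance) pair, nearest first.
    """
    w = len(query)
    if w == 0 or w > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    results = []
    # One window per start position: len(series) - w + 1 windows in total.
    for start in range(len(series) - w + 1):
        window = series[start:start + w]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        results.append((start, dist))
    # Sort by distance and keep the k nearest windows.
    results.sort(key=lambda pair: pair[1])
    return results[:k]

prices = [1.0, 2.0, 3.0, 9.0, 1.0, 2.0, 4.0, 0.0]
matches = sliding_window_search(prices, [1.0, 2.0, 3.0], k=2)
print(matches)  # [(0, 0.0), (4, 1.0)]
```

Because the window length is taken from the query, a query of a different length can be run against the same series without rebuilding anything, which mirrors the point that the size can change between queries.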
Setup
Before you start, make sure you have:
- An active KDB.AI Cloud or Server license
- The latest version of KDB.AI Cloud or Server installed
- A valid API key if you're using KDB.AI Cloud
- The KDB.AI Python client
To store and search temporal data using the Non-Transformed TSS method, follow these steps:
1. Import dependencies
Start by importing the following dependencies:
import sys
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np
2. Create schema
Open a KDB.AI session to create a schema:
session = kdbai.Session()
session.list() # for example, lists existing tables such as trade and quote
schema = dict(
    columns=[
        dict(
            name='realTime',
            pytype='datetime64[ns]'
        ),
        dict(
            name='sym',
            pytype='str'
        ),
        dict(
            name='price',
            pytype='float64',
            vectorIndex=dict(
                type='tss', # Note this line!!
                metric='L2'
            )
        ),
        dict(
            name='size',
            pytype='int32'
        ),
    ]
)
# drop the table first if it already exists
if 'trade' in session.list():
    table = session.table('trade')
    table.drop()
table = session.create_table('trade', schema)
3. Insert data
Create the DataFrame df that contains the time series column price:
numRows = 40
df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')
Insert df into the table:
table.insert(df)
Run a query to check the contents of the table:
table.query()
4. Perform searches
Now you can conduct a similarity search along the dense column, as shown below:
# single query search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5) # search along the dense column
# multiple queries search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5) # search along the dense column
You can also perform an outlier search along the dense column by using a negative n:
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
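To make the effect of the sign of n concrete, here is a small sketch in plain Python. It assumes that an outlier search returns the windows farthest from the query, which is how the negative n is interpreted here; the helper select_by_n and the sample distances are hypothetical and are not part of the KDB.AI client.

```python
def select_by_n(distances, n):
    """Pick window indices from a list of per-window distances.

    n > 0: indices of the n smallest distances (similarity search).
    n < 0: indices of the |n| largest distances (outlier search, assumed semantics).
    """
    if n == 0:
        return []
    # Ascending order for similarity search, descending for outlier search.
    order = sorted(range(len(distances)), key=lambda i: distances[i], reverse=(n < 0))
    return order[:abs(n)]

distances = [0.5, 4.0, 0.1, 9.0, 2.5]  # illustrative distances, one per window
nearest = select_by_n(distances, 3)    # [2, 0, 4]
outliers = select_by_n(distances, -3)  # [3, 1, 4]
print(nearest, outliers)
```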
Summary
Combining the create, insert, and search snippets above gives the complete example below for the Non-Transformed TSS method. If you're already familiar with the basic usage of KDB.AI, compare it with a Transformed TSS or non-TSS example to spot the differences.
Example: Non-Transformed TSS search
import sys
import pykx as kx
import kdbai_client as kdbai
from pprint import pprint # for pretty printing
import pandas as pd
import numpy as np
session = kdbai.Session()
session.list() # for example, lists existing tables such as trade and quote
schema = dict(
    columns=[
        dict(
            name='realTime',
            pytype='datetime64[ns]'
        ),
        dict(
            name='sym',
            pytype='str'
        ),
        dict(
            name='price',
            pytype='float64',
            vectorIndex=dict(
                type='tss', # Note this line!!
                metric='L2'
            )
        ),
        dict(
            name='size',
            pytype='int32'
        ),
    ]
)
# drop the table first if it already exists
if 'trade' in session.list():
    table = session.table('trade')
    table.drop()
table = session.create_table('trade', schema)
numRows = 40
df = pd.DataFrame()
df['realTime'] = sorted(np.random.randint(sys.maxsize, size=numRows).astype('datetime64[ns]'))
df['sym'] = np.random.choice(['aaa', 'bbb'], size=numRows).astype('str')
df['price'] = [x.astype('float64') for x in np.random.rand(numRows)]
df['size'] = np.random.randint(100, size=numRows).astype('int32')
table.insert(df)
table.query()
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=5)
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4],[7,1,2,3,4,7,1,2,3,4]],n=5)
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=3) # similarity search
table.search(vectors=[[0,1,2,3,4,0,1,2,3,4]],n=-3) # outlier search
The main syntactic differences between running a Non-Transformed TSS search and the other cases are:
| | Non-Transformed TSS | Transformed TSS or Non-TSS |
|---|---|---|
| type | tss | flat, hnsw, etc. |
| dims | Not required | Required |
| Entries in the search column | Scalars | Vectors |
| pytype of the search column | float64 | float32 |
| Outlier search | Available | N/A |
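The schema-level differences in the table can be seen side by side in the sketch below. The tss column definition is taken from the example above; the flat column is an illustrative assumption, including the column name embeddings and the value dims=10, so check the schema reference for your KDB.AI version before using it.

```python
# Non-Transformed TSS: a scalar float64 column, with no dims in the index definition.
tss_column = dict(
    name='price',
    pytype='float64',
    vectorIndex=dict(type='tss', metric='L2'),
)

# Vector-index case (flat shown here): float32 vectors of a fixed dimension.
# The column name 'embeddings' and dims=10 are illustrative assumptions.
flat_column = dict(
    name='embeddings',
    pytype='float32',
    vectorIndex=dict(type='flat', metric='L2', dims=10),
)

# Show which index settings appear in only one of the two definitions.
only_in_flat = set(flat_column['vectorIndex']) - set(tss_column['vectorIndex'])
print(only_in_flat)  # {'dims'}
```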
Next steps
Now that you're familiar with a Non-Transformed TSS search, try the following:
- Explore best practices and use cases on the KDB.AI Learning hub.
- Discover our GitHub repo, open the sample or run the notebook directly in Google Colab.
- Run the pattern matching notebook in Google Colab.
- Download the pattern matching Jupyter notebook and accompanying files from GitHub.
- Watch this YouTube video about Temporal Similarity Search for vector databases.