Brute Force (Flat)
This page describes the brute force (flat) index parameters as part of AI libs.
A flat index (or brute-force index) is the most direct form of vector indexing, storing every vector embedding without transformation, compression, or clustering. It guarantees 100% recall and precision by exhaustively comparing the query vector against all stored vectors to return the k-nearest neighbors. While highly accurate, this approach is slower and less efficient than optimized or approximate indexing methods.
.ai.flat.normalize
The .ai.flat.normalize function normalizes vectors so they can be compared using an inner product metric instead of cosine similarity.
Cosine similarity (CS) is mathematically equivalent to the inner product (IP) metric performed on normalized vectors. By normalizing your vectors before inserting them into hnsw or flat, and also normalizing incoming search vectors, you can use the inner product metric instead of cosine similarity, thus yielding identical results. This significantly reduces search and insert times, as it removes repeated normalization.
You can use this function; however, note that it is optimized for speed, not memory utilization. If you're converting large vector stores, it's best to process them in smaller chunks using the formula {8h$x%sqrt sum x*x}.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| embs | real[][] | The original un-normalized vector embeddings |
Returns
| Type | Description |
|---|---|
| real[][] | The normalized vector embeddings |
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[100000;10];
q)\ts:1000 res1:.ai.flat.search[vecs;first vecs;5;`CS]
676 2099232
q)nvecs:.ai.flat.normalize vecs;
q)\ts:1000 res2:.ai.flat.search[nvecs;first nvecs;5;`IP]
544 2099232
q)res1[1]~res2[1]
1b
This example shows that cosine similarity (CS) is equivalent to inner product (IP) when vectors are normalized. It first performs a CS search on unnormalized vectors, then normalizes them using .ai.flat.normalize and repeats the search with the IP metric. The results are identical, but the normalized version runs faster (544 ms vs. 676 ms per 1000 searches): pre-normalization removes the need for repeated normalization at query time, improving performance without changing accuracy.
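The equivalence the example relies on can also be sketched outside q. The following plain-Python illustration (the helper names here are hypothetical, not part of the library) normalizes vectors to unit length, mirroring the formula above, and checks that cosine similarity on raw vectors matches the inner product on the normalized ones:

```python
import math

def normalize(vecs):
    """Scale each vector to unit length (the q formula {8h$x%sqrt sum x*x})."""
    out = []
    for v in vecs:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v])
    return out

def cosine(a, b):
    """Cosine similarity on raw (possibly unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def inner(a, b):
    """Plain inner product."""
    return sum(x * y for x, y in zip(a, b))

vecs = [[3.0, 4.0], [1.0, 2.0], [0.5, 0.5]]
nvecs = normalize(vecs)
q, nq = vecs[0], nvecs[0]
# CS on raw vectors equals IP on normalized vectors, up to float rounding
for v, nv in zip(vecs, nvecs):
    assert abs(cosine(q, v) - inner(nq, nv)) < 1e-12
```

Normalizing once at insert time moves the divide-by-norm work out of every subsequent comparison, which is exactly why the IP search in the example is faster than the CS search.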
.ai.flat.search
The .ai.flat.search function conducts a parallelized flat search, returning k nearest neighbors under the provided metric.
By leveraging multiple threads, this function accelerates similarity search across large vector datasets. It is designed for high-performance retrieval where speed and scalability are critical, making it the preferred method over .scan when working with larger workloads or latency-sensitive applications.
A flat search performs an exhaustive scan through a set of embeddings for a given query vector or set of query vectors. Because the distance between the query vector and every vector in the search space is calculated, the recall is 100%, unlike the other available indexes, which explore subspaces of the embeddings and don't guarantee 100% recall. Because it is an exhaustive search, however, the query time is slower than other indexes, so it's primarily recommended for small vector spaces or for real-time ingestion where the time taken to insert into an index would otherwise create a bottleneck.
Parameters
| Name | Type(s) | Description |
|---|---|---|
| embs | real[][] | The set of vectors to conduct kNN search against |
| q | real[] \| float[] | The query vector |
| k | short \| int \| long | The number of nearest neighbors to return |
| metric | symbol | A metric for search, one of (L2; CS; IP) |
Returns
| Type | Description |
|---|---|
| (real[];long[]) | The distances to the k nearest neighbors and their corresponding indices, under the given metric |
Example
q).ai:use`kx.ai
q)vecs:{(x;y)#(x*y)?1e}[1000;10];
q).ai.flat.search[vecs;first vecs;5;`L2]
0 0.2658583 0.3118592 0.3484065 0.3911282
0 404 631 93 241
This example highlights a Euclidean distance (L2) search, showing both the distances to and the indices of the 5 nearest neighbors of the query vector.
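As a rough illustration of what an exhaustive flat search does, here is a minimal plain-Python sketch. This is not the library's implementation, and whether it reports Euclidean or squared Euclidean distances is an assumption; this version uses squared L2 and supports only the L2 and IP metrics:

```python
def flat_search(embs, q, k, metric="L2"):
    """Exhaustive kNN: score the query against every stored vector,
    then keep the k best. Returns (distances, indices)."""
    def l2(a, b):
        # squared Euclidean distance (assumption: no final sqrt)
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def ip(a, b):
        # negate the inner product so that smaller score = closer
        return -sum(x * y for x, y in zip(a, b))

    score = l2 if metric == "L2" else ip
    scored = [(score(q, v), i) for i, v in enumerate(embs)]
    scored.sort()
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

dists, idx = flat_search([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]], [0.0, 0.0], 2)
# the query vector itself is its own nearest neighbor at distance 0
```

Because every stored vector is scored, the result is exact (100% recall), at the cost of work that grows linearly with the number of embeddings; the library's parallelized version spreads that scan across threads.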