Manage Tables

This page provides details on managing tables within the KDB.AI vector database.

In KDB.AI, the table is the fundamental structure for storing and organizing your data. Each table not only holds the actual vector data, but also includes crucial metadata that defines how the data is indexed and organized.

To create a table object, use the create_table function, which allows you to specify various attributes and settings for your table. Alternatively, you can retrieve an existing table using table function, which provides access to previously defined tables within your session.

Information on available tables

After you have connected with your KDB.AI session and fetched a database, you can retrieve information about the existing tables.

In python client tables property is cached on the database instance. If the tables are used by multiple users/sessions data.refresh() method updates the database's state

PythonRESTq

database = session.database('default')
database.tables

curl -s localhost:8082/api/v2/databases/default/tables

// gw is a handler to the gateway
gw(`listTables;enlist[`database]!enlist `default)

Schema

Before defining a new table, you must first design the schema. This is defined as dictionary that contains a list of columns. For each column you need to define the name and that it's a type- refer to the list of supported data types (type).

The column name must be unique within a table. In addition, avoid using the reserved column names date, int or the prefix label_.

The vector embeddings column should be type of float32s.

Parameter	Description
`metric`	The choice metric depends on the specific context and nature of your data. See available metrics in KDB.AI here.
`type`	Like metrics, the one you choose depends on your data and your overall performance requirements. See available indexes in KDB.AI here.

Depending on the choice of an index, there are additional parameters specific to that index that require configuration. For more information about these parameters and their default values, please refer to the dedicated index section.

PythonJSONq

schema = [
     {'name': 'id', 'type': 'int16'},
     {'name': 'tag', 'type': 'bool'},
     {'name': 'author', 'type': 'str'},
     {'name': 'length', 'type': 'int32'},
     {'name': 'content', 'type': 'bytes'},
     {'name': 'createdDate', 'type': 'datetime64[ns]'},
     {'name': 'embeddings', 'type': 'float32s'}
]

{
    "table": "documents",
    "schema": [
        {"name": "id", "type": "short"},
        {"name": "tag", "type": "boolean"},
        {"name": "author", "type": "char"},
        {"name": "length", "type": "int"},
        {"name": "content", "type": "char"},
        {"name": "createdDate", "type": "timestamp"},
        {"name": "embeddings", "type": "reals"}
    ]
}

schema: flip `name`type!(`id`tag`author`length`content`createdDate`embeddings;`h`b`s`i`C`p`E)

This helps you create a Sparse Index or conduct a Hybrid Search, a Transformed Temporal Similarity Search, and a Non-Transformed Temporal Similarity Search.

Indexes

To be able to run similarity search you need to create at least one index. A table can have multiple indexes even on the same column but index names must be unique per table. More details about supported indexes: supported indexes

PythonJSONq

indexes = [
    {'name': 'flat_index', 'column': 'embeddings', 'type': 'flat', 'params': {'dims': 25}}
    {'name': 'hnsw_fast', 'column': 'embeddings', 'type': 'hnsw', 'params': {'dims': 25, 'M': 8, 'efConstruction': 8}}
    {'name': 'sparse_index', 'column': 'sparse_column', 'type': '', 'params': {'k': 1.25, 'b': 0.75}}
]

{
    "indexes": [
        {"name": "flat_index", "column": "embeddings", "type": "flat", "params": {"dims": 25}}
        {"name": "hnsw_fast", "column": "embeddings", "type": "hnsw", "params": {"dims": 25, "M": 8, "efConstruction": 8}}
        {"name": "sparse_index", "column": "sparse_column", "type": "bm25", "params": {"k": 1.25, "b": 0.75}}
    ]
}

flatIndex: `name`column`type`params!(`flat_index;`embeddings;`flat;enlist[`dims]!enlist 25)
hnswFast: `name`column`type`params!(`hnsw_fast;`embeddings;`hnsw;`dims`M`efConstruction!(25;8;8))
sparseIndex: `name`column`type`params!(`sparse_index;`sparse_column;`bm25;`k`b!(1.25;0.75))
indexes: (flat_index;hnsw_fast;sparse_index)

Partition column

A table can be partitioned by providing a valid column name in partition_column parameter.

Embedding configurations

To reduce embeddings' size TSC configurations can be provided with embedding_configurations dictionary where keys are the column names.

PythonJSONq

embedding_configurations = {'embeddings': {'dims': 4, 'type': 'tsc', 'on_insert_error': 'reject_all'}}

{
   "embeddingConfigurations": {
        "embeddings": {
            "dims": 4,
            "type": "tsc",
            "on_insert_error": "reject_all"
        }
   } 
}

embeddingConfigurations: enlist[`embeddings]!enlist `dims`type`on_insert_error!(4;`tsc;`reject_all)

Reference tables

Since KDB.AI 1.4.0 you can use existing kdb+ tables by providing path on disk.

PythonJSONq

external_data_references = [{'path': b'/tmp/kx/remote', 'provider': 'kx'}]

{
   "externalDataReferences": [
        {
            "path": "/tmp/kx/remote",
            "provider": "kx"
        }
   ]
}

externalDataReferences: enlist `path`provider!("/tmp/kx/remote";`kx)

Create table

You can create multiple tables to suit your data organization needs.

When creating a new table, the associated index is also automatically generated based on the configuration provided on the indexs parameter. To set up a table after defining its schema, provide the desired table name along with the schema details specified above. For example, to create a table titled documents, run this command:

PythonRESTq

documents = session.create_table("documents", schema=schema, indexes=indexes)

curl -X POST -H "Content-Type: application/json" -d @schemaAbove.json localhost:8082/api/v2/databases/default/

// gw is a handler to the gateway
gw(`createTable;`database`table`schema`indexes!(`default;`documents`;schema;indexes))

Table configuration

The schema and index parameters that have been explicitly chosen during the table creation appear in the table configuration.

PythonRESTq

documents.schema
documents.indexes

curl -s localhost:8082/api/v2/databases/default/tables/documents

// gw is a handler to the gateway
gw(`getTable;`database`table!(`default;`documents`))

Delete table

Deleting a table deletes all the data together with the associated index.

You can delete a table in the KDB.AI Cloud UI from the Tables section, using the trash icon. You can only delete a complete table, not row level data.

Alternatively, from the command line, use:

PythonRESTq

documents.drop()

curl -s -X DELETE localhost:8082/api/v2/databases/default/tables/documents

// gw is a handler to the gateway
gw(`deleteTable;`database`table!(`default;`documents`))

Do not perform this action on a production database or in any environment where data deletion is not intended, because this action cannot be reverted.

Next steps

Once you have some tables created and your schema ready, you can do the following:

Ingest some data.
Perform a Hybrid search or a Transformed Temporal Similarity Search.