Manage Sym Files

This guide explains how to manage symbol files in KDB-X. You will learn how to persist enumerations, verify data integrity, safely migrate tables using IPC, and reclaim resources with compaction.

Overview

Understand symbol enumeration: How KDB-X stores symbols
Set up the environment: Create a workspace
Populate the sym file: Create dummy data and save it
Verify the enumeration: Check data integrity
Migrate data between databases: Move tables safely
Clean up the sym file: Remove unused symbols
Manage custom domains: Handle specialized tasks
Optimize sym file performance: Manage bloat and contention
Explore advanced sym file maintenance: Read the white paper

Understand symbol enumeration

KDB-X stores symbols as integers to save space, a process called enumeration. The system uses a sym file to map text strings to these integers, which significantly accelerates key comparisons. For more details, read Q for Mortals.

flowchart LR
    %% Styles
    classDef dataNode fill:#1565C0,stroke:#0D47A1,color:#fff,stroke-width:2px;
    classDef fileNode fill:#E65100,stroke:#BF360C,color:#fff,stroke-width:2px;
    classDef colNode fill:#2E7D32,stroke:#1B5E20,color:#fff,stroke-width:2px;

    subgraph Input [Input Symbols]
        direction TB
        I1("#96;AAPL"):::dataNode
        I2("#96;GOOG"):::dataNode
        I3("#96;AAPL"):::dataNode
    end

    subgraph SymMap [Sym File]
        direction TB
        Sym("<div style='text-align: left'>0: #96;AAPL<br/>1: #96;GOOG</div>"):::fileNode
    end

    subgraph Result [Enumerated Column]
        direction TB
        Out("<div style='text-align: center'>0<br/>1<br/>0</div>"):::colNode
    end

    %% Connections
    I1 -- "Map" ---> Sym
    I2 -- "Map" ---> Sym
    I3 -- "Reuse" ---> Sym
    Sym -- " Indices " ----> Out

    linkStyle default stroke:#666,stroke-width:2px,fill:none;

Figure 1: Symbol enumeration process
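
The mapping in Figure 1 can be reproduced in memory. The following is a minimal sketch rather than part of the guide's workflow; the variable name demoSym is hypothetical and stands in for the contents of a sym file.

```q
/ demoSym is a hypothetical in-memory stand-in for a sym file
demoSym:`AAPL`GOOG
/ Enumerate ($) replaces each symbol with its index in demoSym
e:`demoSym$`AAPL`GOOG`AAPL
show `int$e            / 0 1 0i, the underlying indices
show value e           / `AAPL`GOOG`AAPL, de-enumerated back to symbols
/ Enum Extend (?) appends unseen symbols to the domain before enumerating
e2:`demoSym?`MSFT`AAPL
show demoSym           / `AAPL`GOOG`MSFT
show `int$e2           / 2 0i
```

Enumerate signals an error when a symbol is missing from the domain, whereas Enum Extend grows the domain, which is why functions that write to disk rely on the latter.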

Set up the environment

To follow this guide, create a dedicated workspace and start a KDB-X session on port 5001. Using a fresh environment ensures the guide's files are contained within a single directory.

Linux / macOS:

mkdir -p guide_workspace && cd guide_workspace
q -p 5001

Windows (PowerShell):

New-Item -ItemType Directory -Force -Path "guide_workspace"; Set-Location "guide_workspace"
q -p 5001

Populate the sym file

To understand how sym files work, you will create two common database structures: a splayed table and a partitioned database.

Begin by generating dummy data for the tests:

q)// Define paths for the two test databases
q)srcSplayed:`:hdb_splayed;
q)srcPartitioned:`:hdb_partitioned;
q)// Generate 10,000 rows of dummy data
q)n:10000;
q)daily:([]time:n?09:30:00.0;sym:n?`GOOG`MSFT`AAPL`AMZN;price:n?100.0;size:n?100);
q)trade:([]time:n?09:30:00.0;sym:n?`TSLA`NFLX`META`AAPL;price:n?100.0;size:n?100);

Save the daily table as a splayed table. Use the .Q.en function to enumerate all symbol columns against a file named sym in the target directory. This function uses enum extend (?) as the underlying operator.

q)// Save 'daily' table splayed
q)// .Q.en enumerates the 'sym' column against `:hdb_splayed/sym
q)`:hdb_splayed/daily/ set .Q.en[srcSplayed] daily;

Next, save the trade table as a partitioned table. The .Q.dpft function handles enumeration automatically and creates a new, independent sym file for this database.

q)// Save 'trade' table partitioned by date (2026.01.01)
q)// .Q.dpft handles enumeration automatically against `:hdb_partitioned/sym
q).Q.dpft[srcPartitioned;2026.01.01;`sym;`trade];

Separate sym files

You now have two separate sym files: one in hdb_splayed and one in hdb_partitioned. They are not compatible with each other because they likely map the same integers to different symbols (or vice versa).
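
To see the difference for yourself, you can read both sym files and compare them. This is a small sketch that assumes both databases were just written from the current working directory; the names symA and symB are introduced here for illustration only.

```q
/ Assumes hdb_splayed and hdb_partitioned exist in the current directory
symA:get `:hdb_splayed/sym
symB:get `:hdb_partitioned/sym
show symA
show symB
/ Position of AAPL in each domain; the two indices need not agree
show symA?`AAPL
show symB?`AAPL
/ Symbols that appear in only one of the two domains
show symA except symB
show symB except symA
```

Because the dummy data is random, the order of the symbols in each file can differ from run to run.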

Verify the enumeration

After saving the data, verify that KDB-X correctly converted the symbols to integers.

The most practical way to verify this is to cast the column to an integer. This works even if the sym file is loaded in memory.

q)// Reload database to ensure sym is loaded
q)\l hdb_splayed
q)// Cast to integer to see the underlying indices
q)show raw:`int$get `:daily/sym;
0 1 2 1 2 1 3 0 2 2 3 3 0 0 2 1 3 2 1 0 2 2 2 3 1 0 ..
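
A slightly stronger check is to confirm that the raw indices resolve to the same symbols that q reports for the column. The sketch below assumes the session has just loaded hdb_splayed as shown above, so that sym is in memory; col and idx are illustrative names.

```q
/ Assumes \l hdb_splayed has been run, so sym is loaded and paths are relative to it
col:get `:daily/sym
show type col                   / an enumeration type (20h or above), not 11h
idx:`int$col
/ Indexing the in-memory sym list with the raw integers should reproduce the symbols
show (sym idx)~value col        / 1b if the column and sym file agree
/ Every index must fall inside the sym list; an out-of-range index signals a mismatch
show all idx<count sym
```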

Exploratory: Simulate a missing sym file

To deeply understand how KDB-X depends on the sym file, you can simulate a missing or corrupted file by deleting the sym variable from memory.

Note: This is strictly for demonstration. In a production environment, you would never delete the sym variable.

q)// 1. Delete sym to simulate missing/corrupted file
q)delete sym from `.

q)// 2. Read the 'sym' column directly
q)// Returns integers (indices) only because the map is gone
q)show raw:get `:daily/sym;
`sym!0 1 2 1 2 1 3 0 2 2 3 3 0 0 2 1 3 2 1 0 2 2 2 3 1 0 ..
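
The migration section that follows queries this same session, so sym needs to be back in memory before you continue. One way to restore it, assuming the working directory is still hdb_splayed after the earlier load, is to reload the database:

```q
/ Assumes the working directory is still hdb_splayed (set by the earlier \l)
system"l ."                     / reload the database, which restores sym from disk
show `sym in key `.             / 1b once sym is back in the root namespace
show 3#select from daily        / the sym column displays symbols again
```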

Migrate data between databases

Symbol mappings often differ between databases. For example, AAPL might be 0 in your source database but 1 in your destination. If you copy a file directly, the destination database will misread the integer 0, corrupting your data.

To migrate safely, you must convert indices back to symbols (de-enumerate) and then map them to the new database's indices (re-enumerate).
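
The difference between copying raw indices and re-enumerating can be shown with two small in-memory domains. This is an illustrative sketch; srcSym and dstSym are hypothetical names standing in for the two sym files.

```q
/ Hypothetical in-memory domains standing in for two sym files
srcSym:`AAPL`GOOG
dstSym:`GOOG`AAPL`TSLA
idx:`int$`srcSym$`AAPL`AAPL`GOOG     / indices as stored in the source: 0 0 1i
/ Reading the copied indices through the destination domain gives the wrong symbols
show dstSym idx                       / `GOOG`GOOG`AAPL
/ De-enumerate against the source, then re-enumerate against the destination
syms:srcSym idx                       / `AAPL`AAPL`GOOG
show `int$`dstSym$syms                / 1 1 0i
```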

Use IPC for this. IPC automatically converts indices to symbols when you query a remote table.

In this example, you will migrate the daily table from the splayed database to the partitioned database.

Process breakdown

Start a Destination Session: Open a new terminal window
Connect to Source: Connect to the source process on port 5001
Query & Re-enumerate: Pull the data. IPC automatically converts indices to symbols. Then, use .Q.en to map those symbols to the destination's sym file
Save: Write the re-enumerated table to disk

Open a new terminal window and navigate to your workspace. Start a fresh q session (do not set a port):

cd guide_workspace
q

Now, within this new session, migrate the data:

q)// 1. Connect to the Source process
q)src:hopen 5001;
q)dst:`:hdb_partitioned;
q)// 2. Query remote table 'daily'
q)//    IPC de-enumerates automatically: the sym column arrives as plain symbols
q)t:src "select from daily";
q)// 3. Re-enumerate against Destination Database
q)t: .Q.en[dst] t;
q)// 4. Save as 'trade' in a new partition (for example, 2026.01.02)
q)`:hdb_partitioned/2026.01.02/trade/ set t;
`:hdb_partitioned/2026.01.02/trade/
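
Before moving on, it is worth confirming that the write succeeded. The following sketch assumes t, src, and the partition written above still exist in the destination session; saved is an illustrative name.

```q
/ Assumes t, src and the new partition from the steps above
saved:get `:hdb_partitioned/2026.01.02/trade/
show (count t)=count saved                       / row counts should match
/ Every symbol in the migrated table should now be in the destination sym file
show all (distinct value t`sym) in get `:hdb_partitioned/sym
hclose src                                       / release the connection to the source
```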

Clean up the sym file

Over time, sym files grow and collect unused symbols, such as delisted stocks. "Compacting" the file removes these unused entries.

Compaction rewrites the sym file and updates every column that references it. This takes time and resources, so you should only run it during scheduled maintenance.

How the compaction algorithm works

High-level process:

Backup: Rename the current sym file (for example, to zym)
Reset: Create a new, empty sym file
Identify: Find every symbol column in the database
Re-enumerate: For each column:
- Load data using the old sym file
- Save data using the new sym file
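
The steps above can be sketched in memory before touching any files. In this illustration, oldSym plays the role of the backed-up sym file and newSym the freshly reset one; both names and the sample symbols are hypothetical.

```q
/ Hypothetical in-memory stand-ins: oldSym plays the backed-up sym file (zym)
oldSym:`AAPL`DELISTED`GOOG`MSFT
col:`oldSym$`GOOG`AAPL`GOOG          / a column that uses only two of the four symbols
newSym:`symbol$()                     / reset: a new, empty domain
/ Re-enumerate: resolve through the old domain, then Enum Extend (?) into the new one
newCol:`newSym?value col
show newSym                           / `GOOG`AAPL, unused symbols are gone
show `int$newCol                      / 0 1 0i
show (value newCol)~value col         / 1b, the data itself is unchanged
```

The new domain contains only symbols that a column actually references, in order of first appearance, which is why every enumerated column must be rewritten.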

Compaction Script

Create a file named compact.q in your workspace with the following content. This script defines a compactSym function that switches to the target database directory, performs the compaction, and then restores your original working directory. The backup step calls the mv shell command, so the script runs as written on Linux and macOS only.

Use at your own risk

This is an all-or-nothing approach. Run the code below at your own risk.

Ensure you understand what it does, and test it against a dev HDB you are happy to destroy in the event of an error.

compactSym:{[hdb]
  cwd:system"cd";             / save current directory
  system"cd ",1_string hdb;   / switch to hdb directory
  system"mv sym zym";         / backup sym file
  `:sym set `symbol$();       / create new empty sym file

  files:key `:.;
  dates:files where files like "????.??.??";

  {[d]
    root:":",string d;
    tableNames:string key `$root;
    tableRoot:root,/:"/",/:tableNames;
    files:raze {`$x,/:"/",/:string key `$x}each tableRoot;
    files:files where not files like "*#";
    types:type each get each files;
    enumeratedFiles:files where types within 20 76h;
    / if we have more than one enum better get help
    if[any types within 21 76h;'"too difficult"];  
    {
        `sym set get `:zym;
        s:get x;
        a:attr s;
        s:value s;
        `sym set get `:sym;
        s:a#.Q.en[`:.;([]s:s)]`s;
        x set s;
        -1 "re-enumerated ", string x;
    }each enumeratedFiles;
   }each dates;

   system"cd ",cwd;           / restore directory
   -1 "Compaction Complete.";
 };

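To execute the compaction, load the script from a q session whose working directory is guide_workspace (for example, the destination session from the previous section) and run the function against your partitioned database: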

q)\l compact.q
q)compactSym[`:hdb_partitioned];
re-enumerated :2026.01.01/trade/sym
re-enumerated :2026.01.02/trade/sym
Compaction Complete.
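
After the run, you can compare the backup with the compacted file before deciding whether to keep the backup. This sketch assumes compactSym has just completed and that the session is back in guide_workspace; old and new are illustrative names.

```q
/ Assumes compactSym has just run against hdb_partitioned
old:get `:hdb_partitioned/zym          / backup written by the script
new:get `:hdb_partitioned/sym          / compacted sym file
show (count old;count new)              / entries before and after
show old except new                     / symbols dropped as unused
/ Raw indices in a rewritten column should all fall inside the new sym file
show all (`int$get `:hdb_partitioned/2026.01.01/trade/sym)<count new
```

In this guide every symbol is still in use, so the two counts may well be equal; the saving appears only when the old file holds symbols that no column references.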

Manage custom domains

Most databases use a single, automatic sym file. However, you might sometimes need to enumerate symbols against a specific, manual list. This is useful for legacy systems or when you need strict control over integer mappings.

q)// Create dummy data (if needed)
q)trade:([]sym:10?`AAPL`GOOG`MSFT;price:10?100.0);
q)// 1. Create a custom domain (list of symbols)
q)manualSym:distinct trade`sym;
q)// 2. Enumerate against this domain using '$' (Enumerate)
q)trade:update `manualSym$sym from trade;
q)// 3. Save the domain and the table
q)`:hdb_manual/manualSym set manualSym;
q)`:hdb_manual/trade/ set trade;
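
Reading the table back requires the domain to be in memory under the same name that the column was enumerated against. The following sketch assumes the table and domain above were saved under hdb_manual relative to the current directory; t is an illustrative name.

```q
/ Assumes hdb_manual/manualSym and hdb_manual/trade/ were written as shown above
/ The in-memory variable must be called manualSym, matching the enumeration
manualSym:get `:hdb_manual/manualSym
t:get `:hdb_manual/trade/
show key t`sym               / name of the domain the column refers to
show `int$t`sym              / raw indices into manualSym
show value t`sym             / symbols resolved through manualSym
```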

Protect your sym file

The sym file is the only link between your data and its meaning. If this file is deleted or corrupted, the integers stored in your database become unreadable, effectively destroying your dataset. Always maintain backups of your sym file.

Reduce risk with multiple sym files

Relying on a single global sym file creates a single point of failure and complicates dependency management. To mitigate this risk, consider giving each table (or small group of tables) its own sym file. This limits the impact of corruption to just that table and makes moving or managing individual tables much safer.
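
One possible way to do this is with .Q.ens, which takes the name of the sym file as a third argument. The sketch below is an assumption about how you might lay this out, not a prescribed structure; hdb_multi, quoteSym, and orderSym are hypothetical names, and .Q.ens may not be available in older q versions.

```q
/ Sketch only: hdb_multi, quoteSym and orderSym are hypothetical names
quote:([]sym:`AAPL`GOOG;bid:99.5 101.25)
order:([]sym:`MSFT`AAPL;qty:100 250)
/ .Q.ens[dir;table;name] enumerates against the file called name in dir
`:hdb_multi/quote/ set .Q.ens[`:hdb_multi;quote;`quoteSym]
`:hdb_multi/order/ set .Q.ens[`:hdb_multi;order;`orderSym]
show get `:hdb_multi/quoteSym           / symbols used by quote only
show get `:hdb_multi/orderSym           / symbols used by order only
```

Each table now depends only on its own file, at the cost of having to load every domain that a query touches.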

Optimize sym file performance

Managing sym files effectively requires choosing between a monolithic or distributed approach based on your system's needs.

Avoid sym bloat

"Sym bloat" happens when a sym file gets too big, often containing millions of unused or rarely accessed symbols.

Use multiple sym files: Assign a separate sym file to each table. This keeps files small and isolates errors
Avoid symbols for high-cardinality data: Do not use symbols for columns with millions of unique values (such as order IDs). Use strings or GUIDs instead, so that these values never enter the sym file
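
As an illustration of the second point, the sketch below stores an identifier column as GUIDs and shows that only the symbol column reaches the sym file. The table, column names, and the hdb_orders_demo directory are hypothetical.

```q
/ Sketch only: orders and hdb_orders_demo are hypothetical names
n:5
orders:([]orderId:neg[n]?0Ng;sym:n?`AAPL`GOOG;qty:n?100)
show meta orders                         / orderId has type g, sym has type s
/ .Q.en enumerates symbol columns only, so the GUIDs never enter the sym file
enumerated:.Q.en[`:hdb_orders_demo] orders
show get `:hdb_orders_demo/sym           / contains values from the sym column only
show type enumerated`orderId             / 2h, still a plain GUID list
```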

Understand performance bottlenecks

Large sym files cause two primary problems:

Slow Enumeration: Checking new symbols against a large file takes longer, slowing down data loading
Sym Contention: Multiple processes accessing the same sym file must wait for file locks. Using multiple sym files reduces this waiting time

Compression

You can compress your data columns to save space, but never compress the sym file itself, as it requires constant random access.
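
One way to compress a single column while leaving the sym file alone is to pass compression parameters to set for that file only. This is a sketch under assumptions: hdb_zip_demo is a hypothetical scratch directory, the parameters 17 2 6 (128 kB logical blocks, gzip, level 6) are example values, and gzip requires zlib to be available to q.

```q
/ Sketch only: hdb_zip_demo is a hypothetical scratch directory
/ (file;logicalBlockSize;algorithm;level) set data compresses this one file
(`:hdb_zip_demo/price;17;2;6) set 100000?100.0
show -21!`:hdb_zip_demo/price            / compression statistics for the column file
/ The sym file is written with a plain set, so no compression is applied
`:hdb_zip_demo/sym set `AAPL`GOOG
show -21!`:hdb_zip_demo/sym              / empty dictionary: the file is not compressed
```

Setting compression per file avoids a global default that could also apply to files you want to keep uncompressed.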

Explore advanced sym file maintenance

This guide covers standard maintenance. Complex architectures may require advanced techniques. Read the "Working with sym files" white paper for details on:

Advanced Enumeration: 64-bit enumerations (20h) versus legacy 32-bit
Incremental Maintenance: Strategies for rsync and dbmaint.q
Datatype Choices: Optimization guidelines for symbols versus strings versus GUIDs
Multithreaded Operations: Parallel compaction with peach
Multi-Sym Architectures: Using .Q.ens for segmented databases

Summary

In this guide, you learned how to:

Enumerate symbols: Map text strings to integers for efficient storage
Populate data: Create splayed and partitioned databases
Verify enumeration: Inspect raw integers vs. mapped symbols
Migrate data: Use IPC to safely move tables between databases
Compact files: Remove unused symbols to reclaim space
Manage custom domains: Handle manual enumeration lists
Optimize performance: Manage bloat, contention, and lookups