Manage Sym Files
This guide explains how to manage symbol files in KDB-X. You will learn how to persist enumerations, verify data integrity, safely migrate tables using IPC, and reclaim resources with compaction.
Overview
- Understand symbol enumeration: How KDB-X stores symbols
- Set up the environment: Create a workspace
- Populate the sym file: Create dummy data and save it
- Verify the enumeration: Check data integrity
- Migrate data between databases: Move tables safely
- Clean up the sym file: Remove unused symbols
- Manage custom domains: Handle specialized tasks
- Optimize sym file performance: Manage bloat and contention
- Explore advanced sym file maintenance: Read the white paper
Understand symbol enumeration
KDB-X stores symbols as integers to save space, a process called enumeration. The system uses a sym file to map text strings to these integers, which significantly accelerates key comparisons. For more details, read Q for Mortals.
flowchart LR
%% Styles
classDef dataNode fill:#1565C0,stroke:#0D47A1,color:#fff,stroke-width:2px;
classDef fileNode fill:#E65100,stroke:#BF360C,color:#fff,stroke-width:2px;
classDef colNode fill:#2E7D32,stroke:#1B5E20,color:#fff,stroke-width:2px;
subgraph Input [Input Symbols]
direction TB
I1("#96;AAPL"):::dataNode
I2("#96;GOOG"):::dataNode
I3("#96;AAPL"):::dataNode
end
subgraph SymMap [Sym File]
direction TB
Sym("<div style='text-align: left'>0: #96;AAPL<br/>1: #96;GOOG</div>"):::fileNode
end
subgraph Result [Enumerated Column]
direction TB
Out("<div style='text-align: center'>0<br/>1<br/>0</div>"):::colNode
end
%% Connections
I1 -- "Map" ---> Sym
I2 -- "Map" ---> Sym
I3 -- "Reuse" ---> Sym
Sym -- " Indices " ----> Out
linkStyle default stroke:#666,stroke-width:2px,fill:none;
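The mapping in the diagram can be reproduced in memory. The following is a minimal sketch, assuming a fresh session in which `sym` is an ordinary list variable rather than a file on disk:

```q
sym:`symbol$();                  / start with an empty domain
e:`sym?`AAPL`GOOG`AAPL;          / enum extend: adds unseen symbols, reuses known ones
show sym;                        / `AAPL`GOOG
show `int$e;                     / 0 1 0i
show value e;                    / `AAPL`GOOG`AAPL
```

The repeated AAPL reuses index 0, so the domain holds each distinct symbol only once.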
Set up the environment
To follow this guide, create a dedicated workspace and start a KDB-X session on port 5001. Using a fresh environment ensures the guide's files are contained within a single directory.
Linux or macOS:
mkdir -p guide_workspace && cd guide_workspace
q -p 5001
Windows (PowerShell):
New-Item -ItemType Directory -Force -Path "guide_workspace"; Set-Location "guide_workspace"
q -p 5001
Populate the sym file
To understand how sym files work, you will create two common database structures: a splayed table and a partitioned database.
Begin by generating dummy data for the tests:
q)// Define paths for the two test databases
q)srcSplayed:`:hdb_splayed;
q)srcPartitioned:`:hdb_partitioned;
q)// Generate 10,000 rows of dummy data
q)n:10000;
q)daily:([]time:n?09:30:00.0;sym:n?`GOOG`MSFT`AAPL`AMZN;price:n?100.0;size:n?100);
q)trade:([]time:n?09:30:00.0;sym:n?`TSLA`NFLX`META`AAPL;price:n?100.0;size:n?100);
Save the daily table as a splayed table. Use the .Q.en function to enumerate all symbol columns against a file named sym in the target directory. This function uses enum extend (?) as the underlying operator.
q)// Save 'daily' table splayed
q)// .Q.en enumerates the 'sym' column against `:hdb_splayed/sym
q)`:hdb_splayed/daily/ set .Q.en[srcSplayed] daily;
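To see what .Q.en returns before it is written to disk, you can run it against a throwaway directory. This is a sketch only: the directory name scratch_db and the table t are hypothetical and are not part of the guide's databases.

```q
db:`:scratch_db;                          / hypothetical scratch directory
t:([]sym:`GOOG`AAPL`GOOG;price:1 2 3f);
e:.Q.en[db] t;                            / writes scratch_db/sym and returns the enumerated table
show type e`sym;                          / 20h: an enumerated column
show get ` sv db,`sym;                    / the domain stored in scratch_db/sym
show value e`sym;                         / `GOOG`AAPL`GOOG
```

Only the symbol column changes type; the price column is returned untouched.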
Next, save the trade table as a partitioned table. The .Q.dpft function handles enumeration automatically and creates a new, independent sym file for this database.
q)// Save 'trade' table partitioned by date (2026.01.01)
q)// .Q.dpft handles enumeration automatically against `:hdb_partitioned/sym
q).Q.dpft[srcPartitioned;2026.01.01;`sym;`trade];
Separate sym files
You now have two separate sym files: one in hdb_splayed and one in hdb_partitioned. They are not compatible with each other because they likely map the same integers to different symbols (or vice versa).
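The incompatibility is easy to see with plain lists. In this sketch, symA and symB are hypothetical stand-ins for the contents of two sym files, and idx stands for indices written against the first one:

```q
symA:`GOOG`MSFT`AAPL;     / hypothetical domain of database A
symB:`TSLA`AAPL`NFLX;     / hypothetical domain of database B
idx:0 2 0;                / indices stored against database A
show symA idx;            / `GOOG`AAPL`GOOG
show symB idx;            / `TSLA`NFLX`TSLA
```

The same integers resolve to different symbols depending on which domain is used to read them.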
Verify the enumeration
After saving the data, verify that KDB-X correctly converted the symbols to integers.
The most practical way to verify this is to cast the column to an integer. This works even if the sym file is loaded in memory.
q)// Reload database to ensure sym is loaded
q)\l hdb_splayed
q)// Cast to integer to see the underlying indices
q)show raw:`int$get `:daily/sym;
0 1 2 1 2 1 3 0 2 2 3 3 0 0 2 1 3 2 1 0 2 2 2 3 1 0 ..
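Beyond eyeballing the indices, you can check them programmatically. The sketch below uses a small in-memory domain and column (both hypothetical) to confirm that every index falls inside the domain and resolves back to the original symbols:

```q
sym:`GOOG`MSFT`AAPL`AMZN;           / hypothetical domain
col:`sym$`AAPL`GOOG`AAPL`AMZN;      / column enumerated against it
idx:`int$col;                       / underlying indices: 2 0 2 3i
show all idx<count sym;             / 1b: no index points past the end of the domain
show sym[idx]~value col;            / 1b: indices resolve to the original symbols
```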
Exploratory: Simulate a missing sym file
To deeply understand how KDB-X depends on the sym file, you can simulate a missing or corrupted file by deleting the sym variable from memory.
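Note: This is strictly for demonstration. In a production environment, you would never delete the sym variable. When you finish this step, reload the database (for example, with \l .) to restore sym before you continue with the rest of the guide.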
q)// 1. Delete sym to simulate missing/corrupted file
q)delete sym from `.
q)// 2. Read the 'sym' column directly
q)// Returns integers (indices) only because the map is gone
q)show raw:get `:daily/sym;
`sym!0 1 2 1 2 1 3 0 2 2 3 3 0 0 2 1 3 2 1 0 2 2 2 3 1 0 ..
Migrate data between databases
Symbol mappings often differ between databases. For example, AAPL might be 0 in your source database but 1 in your destination. If you copy a file directly, the destination database will misread the integer 0, corrupting your data.
To migrate safely, you must convert indices back to symbols (de-enumerate) and then map them to the new database's indices (re-enumerate).
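The two steps can be sketched in memory before involving any files. Here srcSym and dstSym are hypothetical domains standing in for the source and destination sym files:

```q
srcSym:`GOOG`MSFT`AAPL;             / hypothetical source domain
dstSym:`TSLA`AAPL;                  / hypothetical destination domain
col:`srcSym$`AAPL`GOOG`AAPL;        / enumerated against the source: indices 2 0 2
plain:value col;                    / de-enumerate back to plain symbols
new:`dstSym?plain;                  / re-enumerate, extending the destination domain
show dstSym;                        / `TSLA`AAPL`GOOG
show `int$new;                      / 1 2 1i
```

The symbols are unchanged, but the stored indices differ because they now refer to the destination domain.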
Use IPC for this. IPC automatically converts indices to symbols when you query a remote table.
In this example, you will migrate the daily table from the splayed database to the partitioned database.
Process breakdown
- Start a Destination Session: Open a new terminal window
- Connect to Source: Connect to the source process on port 5001
- Query & Re-enumerate: Pull the data. IPC automatically converts indices to symbols. Then, use .Q.en to map those symbols to the destination's sym file
- Save: Write the re-enumerated table to disk
Open a new terminal window and navigate to your workspace. Start a fresh q session (do not set a port):
cd guide_workspace
q
Now, within this new session, migrate the data:
q)// 1. Connect to the Source process
q)src:hopen 5001;
q)dst:`:hdb_partitioned;
q)// 2. Query remote table 'daily'
q)// IPC de-enumerates automatically: the result arrives with plain symbols, not indices
q)t:src "select from daily";
q)// 3. Re-enumerate against Destination Database
q)t: .Q.en[dst] t;
q)// 4. Save as 'trade' in a new partition (for example, 2026.01.02)
q)`:hdb_partitioned/2026.01.02/trade/ set t;
`:hdb_partitioned/2026.01.02/trade/
Clean up the sym file
Over time, sym files grow and collect unused symbols, such as delisted stocks. "Compacting" the file removes these unused entries.
Compaction rewrites the sym file and updates every column that references it. This takes time and resources, so you should only run it during scheduled maintenance.
How the compaction algorithm works
High-level process:
- Backup: Rename the current sym file (for example, to zym)
- Reset: Create a new, empty sym file
- Identify: Find every symbol column in the database
- Re-enumerate: For each column:
  - Load data using the old sym file
  - Save data using the new sym file
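The steps above can be sketched in memory on a single column. This is an illustration of the idea only, with a hypothetical domain and column; it does not touch any files:

```q
sym:`AAPL`DELISTED`GOOG`OLD;        / hypothetical bloated domain
col:`sym$`GOOG`AAPL`GOOG;           / only two of the four symbols are still in use
zym:sym;                            / backup: keep the old domain
s:value col;                        / load the data using the old domain
sym:`symbol$();                     / reset: new, empty domain
col:`sym?s;                         / save the data against the new domain
show sym;                           / `GOOG`AAPL
show `int$col;                      / 0 1 0i
```

The new domain contains only the symbols that a column actually references, which is where the space saving comes from.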
Compaction Script
Create a file named compact.q in your workspace with the following content. This script defines a compactSym function that safely switches to the target database directory, performs the compaction, and then restores your original path.
Use at your own risk
This is an all-or-nothing approach. Run the code below at your own risk.
Ensure you understand what it does, and test it against a dev HDB you are happy to destroy in the event of an error.
compactSym:{[hdb]
  cwd:system"cd";                   / save current directory
  system"cd ",1_string hdb;         / switch to hdb directory
  system"mv sym zym";               / backup sym file
  `:sym set `symbol$();             / create new empty sym file
  files:key `:.;
  dates:files where files like "????.??.??";
  {[d]
    root:":",string d;
    tableNames:string key `$root;
    tableRoot:root,/:"/",/:tableNames;
    files:raze {`$x,/:"/",/:string key `$x}each tableRoot;
    files:files where not files like "*#";
    types:type each get each files;
    enumeratedFiles:files where types within 20 76h;
    / if we have more than one enum better get help
    if[any types within 21 76h;'"too difficult"];
    {
      `sym set get `:zym;
      s:get x;
      a:attr s;
      s:value s;
      `sym set get `:sym;
      s:a#.Q.en[`:.;([]s:s)]`s;
      x set s;
      -1 "re-enumerated ", string x;
      }each enumeratedFiles;
    }each dates;
  system"cd ",cwd;                  / restore directory
  -1 "Compaction Complete.";
  };
To execute the compaction, load the script and run the function against your partitioned database:
q)\l compact.q
q)compactSym[`:hdb_partitioned];
re-enumerated :2026.01.01/trade/sym
re-enumerated :2026.01.02/trade/sym
Compaction Complete.
Manage custom domains
Most databases use a single, automatic sym file. However, you might sometimes need to enumerate symbols against a specific, manual list. This is useful for legacy systems or when you need strict control over integer mappings.
q)// Create dummy data (if needed)
q)trade:([]sym:10?`AAPL`GOOG`MSFT;price:10?100.0);
q)// 1. Create a custom domain (list of symbols)
q)manualSym:distinct trade`sym;
q)// 2. Enumerate against this domain using '$' (Enumerate)
q)trade:update `manualSym$sym from trade;
q)// 3. Save the domain and the table
q)`:hdb_manual/manualSym set manualSym;
q)`:hdb_manual/trade/ set trade;
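Unlike enum extend, enumerating with $ never adds to the domain, which is what gives you strict control over the mapping. The sketch below uses a hypothetical three-symbol domain to show both the normal case and what happens when a value is missing:

```q
manualSym:`AAPL`GOOG`MSFT;                  / hypothetical fixed domain
c:`manualSym$`GOOG`AAPL;                    / enumerate: every value must already be in the domain
show key c;                                 / `manualSym: the domain the column refers to
show `int$c;                                / 1 0i
r:@[{`manualSym$x};`TSLA;{x}];              / TSLA is not in the domain: the error is trapped as a string
show r;
show manualSym;                             / `AAPL`GOOG`MSFT: the domain is unchanged
```

If you want unseen symbols to be appended instead of rejected, use enum extend (?) as .Q.en does.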
Protect your sym file
The sym file is the only link between your data and its meaning. If this file is deleted or corrupted, the integers stored in your database become unreadable, effectively destroying your dataset. Always maintain backups of your sym file.
Reduce risk with multiple sym files
Relying on a single global sym file creates a single point of failure and complicates dependency management. To mitigate this risk, consider giving each table (or small group of tables) its own sym file. This limits the impact of corruption to that table and makes moving or managing individual tables much safer.
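One way to give a table its own sym file is .Q.ens, which takes the name of the domain as a third argument. The following is a sketch, assuming .Q.ens is available in your build; the directory scratch_multi and the domain name tradeSym are hypothetical:

```q
db:`:scratch_multi;                          / hypothetical scratch directory
t:([]sym:`GOOG`AAPL`GOOG;price:1 2 3f);
e:.Q.ens[db;t;`tradeSym];                    / enumerate against scratch_multi/tradeSym
show key e`sym;                              / `tradeSym: the column refers to its own domain
show get ` sv db,`tradeSym;                  / the domain stored in its own file
```

A process that reads this table must load tradeSym as well, so keep the naming consistent across the tables that share a domain.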
Optimize sym file performance
Managing sym files effectively requires choosing between a monolithic or distributed approach based on your system's needs.
Avoid sym bloat
"Sym bloat" happens when a sym file gets too big, often containing millions of unused or rarely accessed symbols.
- Use multiple sym files: Assign a separate sym file to each table. This keeps files small and isolates errors
- High Cardinality Data: Do not use symbols for columns with millions of unique values (like Order IDs). Use strings or GUIDs. This keeps your sym files fast
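As a rough illustration of the second point, the hypothetical table below keeps identifiers as GUIDs and strings and reserves the symbol type for a low-cardinality column:

```q
n:5;
orders:([]id:n?0Ng;ref:string 1000+til n;sym:n?`AAPL`GOOG);   / hypothetical order table
show meta orders;                    / id is type g, ref is type C, sym is type s
```

Only the sym column would be enumerated on save, so the identifiers never reach the sym file.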
Understand performance bottlenecks
Large sym files cause two primary problems:
- Slow Enumeration: Checking new symbols against a large file takes longer, slowing down data loading
- Sym Contention: Multiple processes accessing the same sym file must wait for file locks. Using multiple sym files reduces this waiting time
Compression
You can compress your data columns to save space, but never compress the sym file itself, because it requires constant random access.
Explore advanced sym file maintenance
This guide covers standard maintenance. Complex architectures may require advanced techniques. Read the working with sym files white paper for details on:
- Advanced Enumeration: 64-bit enumerations (20h) versus legacy 32-bit
- Incremental Maintenance: Strategies for rsync and dbmaint.q
- Datatype Choices: Optimization guidelines for symbols versus strings versus GUIDs
- Multithreaded Operations: Parallel compaction with peach
- Multi-Sym Architectures: Using .Q.ens for segmented databases
Summary
In this guide, you learned how to:
- Enumerate symbols: Map text strings to integers for efficient storage
- Populate data: Create splayed and partitioned databases
- Verify enumeration: Inspect raw integers vs. mapped symbols
- Migrate data: Use IPC to safely move tables between databases
- Compact files: Remove unused symbols to reclaim space
- Manage custom domains: Handle manual enumeration lists
- Optimize performance: Manage bloat, contention, and lookups