Caching in Kxreaper
This page explains how caching works in Kxreaper.
Because cloud storage has higher latency than local storage, the objstor module in KDB-X can cache the results of requests on a local high-performance disk. The amount of data that can be added to the cache is limited by the KX_OBJSTR_CACHE_SIZE environment variable.
When files are first retrieved from cloud storage, they are automatically stored in the local cache. Subsequent requests for the same data are then served from the local cache instead of cloud storage, reducing both query time and transfer costs.
KDB-X only uses the local cache if the environment variable KX_OBJSTR_CACHE_PATH is defined.
This variable sets the path to the directory where cached files are stored. For example:
export KX_OBJSTR_CACHE_PATH=/myfastssd/kxs3cache
Temporary files are written under this directory, so KDB-X requires write permission to it. Cached files are stored in the subdirectory $KX_OBJSTR_CACHE_PATH/objects.
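Before starting KDB-X, it can help to confirm that the cache directory exists and is writable. The following is a minimal sketch; the fallback path /tmp/kxs3cache is a placeholder chosen for illustration, not a recommended location.

```shell
# Use the configured cache path, or fall back to a placeholder for illustration.
export KX_OBJSTR_CACHE_PATH="${KX_OBJSTR_CACHE_PATH:-/tmp/kxs3cache}"

# Create the directory if it does not exist yet.
mkdir -p "$KX_OBJSTR_CACHE_PATH"

# KDB-X needs write permission here for temporary and cached files.
if [ -w "$KX_OBJSTR_CACHE_PATH" ]; then
  cache_status="writable"
else
  cache_status="not writable"
fi
echo "cache directory $KX_OBJSTR_CACHE_PATH is $cache_status"
```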
Shared cache
If multiple KDB-X instances are using the same HDB (historical database), they should all share a single cache directory.
The local cache is not cleared upon KDB-X start-up or shutdown.
It can only be deleted manually, and only once all processes using the cache have stopped. To control disk usage, use Kxreaper to evict files from the cache directory. Without Kxreaper, the local cache continues to grow with each new dataset retrieved from object storage.
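To see how much disk space a cache directory currently occupies, the standard du utility is usually enough. The helper below is a small sketch; the function name cache_usage_mb is hypothetical and is not part of KDB-X or Kxreaper.

```shell
# Print the disk usage of a directory in whole MB.
cache_usage_mb() {
  # Fail early if the directory does not exist.
  [ -d "$1" ] || return 1
  # du -sk reports the total in 1024-byte blocks.
  used_kb=$(du -sk "$1" | awk '{ print $1 }')
  echo $(( used_kb / 1024 ))
}
```

For example, cache_usage_mb "$KX_OBJSTR_CACHE_PATH" prints the current size of the shared cache.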
Kxreaper
Kxreaper is a command-line application that manages the amount of object storage data cached on local disk.
Kxreaper can run as a daemon and takes two arguments:
- the cache root path
- the maximum cache size in MB, given as an integer
Kxreaper continuously monitors file access within the specified directory and enforces a size limit by deleting files based on a Least Recently Used (LRU) algorithm. Any file moved into this directory becomes a candidate for deletion, as this area is exclusively used by KDB-X.
Files written by KDB-X in this area are first created as filename$. Once writing completes, they are renamed to the final filename. At this point, the OS notifies Kxreaper of this addition. If the cache exceeds its configured limit, Kxreaper deletes the least recently used files to free space.
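The write-then-rename pattern described above can be illustrated in shell. This is only a sketch of the pattern, not the code KDB-X itself runs; the function name publish_to_cache is hypothetical.

```shell
# Copy a file to a temporary name ending in "$", then rename it to its
# final name only after the copy has completed.
publish_to_cache() {
  src=$1
  dest=$2
  tmp="${dest}\$"
  cp "$src" "$tmp" && mv "$tmp" "$dest"
}
```

A rename within a single filesystem is atomic, so a process watching the directory never sees a partially written file under its final name.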
On startup, Kxreaper scans the directory and deletes the oldest files if the total size exceeds the configured limit.
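To get a feel for which files a least-recently-used policy would pick, the following dry run lists candidates by last access time until the directory would fit within a limit. It is an illustration only and makes several assumptions: it requires GNU find, treats 1 MB as 1024 * 1024 bytes, relies on filesystem access times (which can be coarse under mount options such as relatime), and uses the hypothetical function name lru_candidates. Kxreaper tracks file access itself, so its actual choices may differ.

```shell
# Print the files an LRU policy would remove to bring DIR under LIMIT_MB.
# Dry run only: nothing is deleted.
lru_candidates() {
  dir=$1
  limit_bytes=$(( $2 * 1024 * 1024 ))
  # Total size of all regular files, in bytes.
  total=$(find "$dir" -type f -printf '%s\n' |
    awk '{ s += $1 } END { printf "%.0f\n", s }')
  # Emit "atime size path" per file, least recently accessed first, and
  # print paths until the running total is within the limit.
  find "$dir" -type f -printf '%A@ %s %p\n' | sort -n |
    awk -v total="$total" -v limit="$limit_bytes" '
      total > limit { total -= $2; sub(/^[^ ]+ [^ ]+ /, ""); print }'
}
```

For example, lru_candidates "$KX_OBJSTR_CACHE_PATH" 5000 prints nothing while the cache is within 5000 MB.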
If Kxreaper gets out of sync with the filesystem - for example, after manual cache file deletions - you can manually trigger a rescan by sending a SIGHUP signal to the Kxreaper process. If Kxreaper becomes too slow to process disk notifications, it rescans automatically.
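A manual rescan can be requested with the standard pgrep and kill utilities. The sketch below assumes the process is named kxreaper, as in the example on this page; the function name request_rescan is hypothetical.

```shell
# Send SIGHUP to every process whose name is exactly NAME (default: kxreaper).
request_rescan() {
  name="${1:-kxreaper}"
  pids=$(pgrep -x "$name") || {
    echo "no $name process found" >&2
    return 1
  }
  # $pids is intentionally unquoted so that each PID becomes its own argument.
  kill -HUP $pids
}
```

Calling request_rescan with no arguments signals all running kxreaper processes and returns a non-zero status if none is found.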
Using Network Attached Storage (NAS)
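Kxreaper relies on file notifications from the local operating system, so it should run on the same machine as the HDB reader processes that write to the cache. With NAS, changes made by other hosts may not be reported to Kxreaper, which is why NAS is not a recommended setup. For optimal performance, the cache should be on locally attached storage.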
Example
The following example runs Kxreaper as a daemon with both STDOUT and STDERR redirected to the system log:
kxreaper $KX_OBJSTR_CACHE_PATH 5000 2>&1 | logger &
View the log with:
sudo tail /var/log/syslog
Alternatively, redirect the output to a log file:
kxreaper $KX_OBJSTR_CACHE_PATH 5000 > reaper.log 2>&1 &
Troubleshooting
If you observe the following error:
inotify_init: Too many open files
check the values in:
/proc/sys/fs/inotify/max_queued_events
/proc/sys/fs/inotify/max_user_instances
/proc/sys/fs/inotify/max_user_watches
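The current limits can be printed in one step. This is a small sketch for Linux systems; the function name show_inotify_limits is hypothetical.

```shell
# Print the current inotify limits, or "unavailable" if a file cannot be read.
show_inotify_limits() {
  for name in max_queued_events max_user_instances max_user_watches; do
    file="/proc/sys/fs/inotify/$name"
    if [ -r "$file" ]; then
      printf '%s = %s\n' "$name" "$(cat "$file")"
    else
      printf '%s = unavailable\n' "$name"
    fi
  done
}
show_inotify_limits
```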
Increase these values if needed to match the number of files that may be cached. For example, as root:
echo 8192 > /proc/sys/fs/inotify/max_user_instances
echo 8192 > /proc/sys/fs/inotify/max_user_watches
If the environment variable KX_CACHE_REAPER_TRACE is set, Kxreaper prints tracing information while processing events.