Caching
This page explains how kdb Insights Core caching works.
Because cloud storage has high latency, kdb Insights Core caches the results of requests on a local high-performance disk.
When files are retrieved from cloud storage, they are automatically stored in the local cache. Subsequent requests for the same data are then served from the local cache instead of cloud storage, saving time and transfer costs.
kdb+ uses the local cache only when the environment variable KX_OBJSTR_CACHE_PATH is defined.
This variable sets the path to the directory where cached files are stored. For example:
export KX_OBJSTR_CACHE_PATH=/myfastssd/kxs3cache
Temporary files are written under this directory, so kdb+ requires write permission to it. Cached files are stored in the subdirectory $KX_OBJSTR_CACHE_PATH/objects.
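A minimal sketch of preparing the cache directory before starting kdb+, assuming a POSIX shell. The fallback path under the temporary directory is a placeholder used only when the variable is not already set; in practice, point the variable at a fast local disk.

```shell
# Placeholder default for illustration only; substitute a fast local disk.
: "${KX_OBJSTR_CACHE_PATH:=${TMPDIR:-/tmp}/kxs3cache}"
export KX_OBJSTR_CACHE_PATH

# Create the directory if it does not exist yet.
mkdir -p "$KX_OBJSTR_CACHE_PATH"

# kdb+ needs write permission here, so report a problem before starting it.
if [ -w "$KX_OBJSTR_CACHE_PATH" ]; then
    echo "cache directory ready: $KX_OBJSTR_CACHE_PATH"
else
    echo "no write permission on $KX_OBJSTR_CACHE_PATH" >&2
fi
```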
Shared cache
Multiple kdb+ instances using the same HDB (historical database) should share a cache area.
The local cache is not cleared when kdb+ starts up or shuts down. Delete the cache manually only when no process using it is still running. To control disk usage, use Kxreaper to evict files from the cache folder. Without Kxreaper, the local cache continues to grow with each new dataset retrieved from object storage.
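To see how much the cache has grown, its size on disk can be checked with du. This is a small sketch; the fallback path is the example path used earlier on this page and is only an illustration.

```shell
# Report the current size of the cache in MB (0 if the directory is missing).
cache_dir="${KX_OBJSTR_CACHE_PATH:-/myfastssd/kxs3cache}"
if [ -d "$cache_dir" ]; then
    # du -s gives one summary line; -m reports the size in MB.
    used_mb=$(du -sm "$cache_dir" | cut -f1)
else
    used_mb=0
fi
echo "cache usage: ${used_mb} MB"
```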
Kxreaper
Kxreaper is a command-line application that can be used to limit the amount of object storage data cached on local disk. Kxreaper may be started as a daemon. It takes two arguments:
- the cache root path
- an integer representing the size to limit the cache to in MB
Kxreaper continuously monitors file access within the specified directory and maintains a space limit by deleting files according to a Least Recently Used (LRU) algorithm. Any file moved into this directory becomes a candidate for deletion, as this area is exclusively used by kdb+.
Files written by kdb+ in this area initially have the form filename$ and are automatically moved by kdb+ to their final filename when writing completes. The OS notifies Kxreaper of this addition and, if the space used then exceeds the configured limit, Kxreaper deletes the least recently used files to bring the space used back within the limit.
On startup, Kxreaper scans the directory and deletes the oldest files if the total size exceeds the configured limit.
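The eviction policy described above can be illustrated with a dry-run sketch. This is not Kxreaper's implementation: it only approximates LRU ordering using file access times, assumes GNU find, and treats 1 MB as 1024*1024 bytes (an assumption). The function name lru_candidates is hypothetical. It prints the files that would have to be removed to bring a directory under a limit given in MB, and deletes nothing.

```shell
# lru_candidates DIR LIMIT_MB
# Print, least recently accessed first, the files that would need to go
# to bring DIR under LIMIT_MB. Dry run only: nothing is deleted.
lru_candidates() {
    dir=$1
    limit_bytes=$(( $2 * 1024 * 1024 ))   # assumption: 1 MB = 1024*1024 bytes
    # Total size of all regular files, in bytes.
    total=$(find "$dir" -type f -printf '%s\n' |
        awk '{ t += $1 } END { print t + 0 }')
    # Sort by access time (oldest first) and list files until the total fits.
    find "$dir" -type f -printf '%A@ %s %p\n' | sort -n |
        awk -v total="$total" -v limit="$limit_bytes" '
            total > limit {
                total -= $2
                sub(/^[^ ]+ [^ ]+ /, "")   # strip the time and size fields
                print
            }'
}
```

For example, lru_candidates "$KX_OBJSTR_CACHE_PATH" 5000 lists what would be evicted under a 5000 MB limit.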
If Kxreaper gets out of sync with the filesystem (for example, because cache files were deleted manually), a rescan can be triggered by sending a SIGHUP signal to the Kxreaper process. If Kxreaper becomes too slow to process disk notifications, it rescans automatically.
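One way to send the signal is with pkill, sketched below. It assumes the process name is kxreaper, as in the commands on this page, and that pkill (from procps) is installed. The function name request_rescan is hypothetical.

```shell
# request_rescan [NAME]
# Send SIGHUP to every process whose name is exactly NAME (default: kxreaper).
request_rescan() {
    name=${1:-kxreaper}
    # pkill -x matches the whole process name; it exits non-zero if nothing matched.
    if pkill -HUP -x "$name"; then
        echo "rescan requested"
    else
        echo "no $name process found" >&2
        return 1
    fi
}
```

Calling request_rescan with no arguments targets kxreaper.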
Using Network Attached Storage (NAS)
When the cache is on NAS, the Kxreaper process must run on the same machine as the HDB reader process. Because of this constraint, NAS is not a recommended setup. For optimal performance, locate the cache on locally attached storage.
Example
The following example runs Kxreaper as a daemon with both STDOUT and STDERR redirected to the system log:
kxreaper $KX_OBJSTR_CACHE_PATH 5000 2>&1 | logger &
The log may be viewed with:
sudo tail /var/log/syslog
The output could equally be redirected to a log file:
kxreaper $KX_OBJSTR_CACHE_PATH 5000 > reaper.log 2>&1 &
Troubleshooting
If you observe the error
inotify_init: Too many open files
check the values in
/proc/sys/fs/inotify/max_queued_events
/proc/sys/fs/inotify/max_user_instances
/proc/sys/fs/inotify/max_user_watches
and, if necessary, increase them to accommodate the number of files that might need to be cached. For example, as root:
echo 8192 > /proc/sys/fs/inotify/max_user_instances
echo 8192 > /proc/sys/fs/inotify/max_user_watches
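Before changing anything, it can help to print the current limits side by side. The sketch below only reads the three files listed above. The function name show_inotify_limits is hypothetical.

```shell
# Print the current value of each inotify limit, one per line.
show_inotify_limits() {
    for name in max_queued_events max_user_instances max_user_watches; do
        file=/proc/sys/fs/inotify/$name
        if [ -r "$file" ]; then
            printf '%s = %s\n' "$name" "$(cat "$file")"
        else
            # Not a Linux system, or /proc is not mounted.
            printf '%s = (not available)\n' "$name"
        fi
    done
}
show_inotify_limits
```

Values written under /proc/sys apply to the running system only and are lost on reboot.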
If the environment variable KX_CACHE_REAPER_TRACE is set, Kxreaper prints tracing information when processing events.