Limit the amount of object storage data cached on local SSD.
Cloud object storage such as AWS S3 is slow relative to local storage such as SSD. The performance of kdb+ when working with S3 can be improved by caching S3 data. Each query to S3 has a financial cost; caching the resulting data can help to reduce it.
Multiple kdb+ instances using the same HDB (historical database) should share a cache area, the base of which is specified in the environment variable KX_S3_CACHE_PATH.
Kdb+ writes temporary files under this directory and so requires write permission to it.
Kxreaper continuously monitors file access within this directory and keeps the space used within a limit by deleting files according to an LRU (least recently used) policy. Because this is a scratch area for the exclusive use of kdb+, any file moved into this directory becomes a candidate for deletion.
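The LRU policy can be illustrated with a short shell sketch. This is not Kxreaper's implementation, which learns of file activity from OS notifications. It is a hypothetical function, reap_lru, that polls the directory instead. It assumes GNU find (for -printf) and uses each file's access time as the recency signal. File names containing newlines are not handled.

```sh
# Hypothetical sketch of LRU eviction (not Kxreaper's code).
# Usage: reap_lru CACHE_DIR LIMIT_MB
# Deletes the least recently accessed files until the total size of
# regular files under CACHE_DIR is at most LIMIT_MB * 1024 * 1024 bytes.
reap_lru() {
  # List "atime size path" for every file, least recently accessed first.
  find "$1" -type f -printf '%A@ %s %p\n' | LC_ALL=C sort -n |
    awk -v limit="$(( $2 * 1024 * 1024 ))" '
      {
        size[NR] = $2
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)   # strip the atime and size fields
        paths[NR] = path
        total += $2                      # running total of cache usage
      }
      END {
        # Emit the oldest paths until the remaining total fits the limit.
        for (i = 1; i <= NR && total > limit; i++) {
          print paths[i]
          total -= size[i]
        }
      }' |
    while IFS= read -r p; do
      rm -f -- "$p"
    done
}
```

For example, reap_lru "$KX_S3_CACHE_PATH" 5000 would trim the cache to at most 5000 MiB, deleting the least recently accessed files first.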
Files written by kdb+ in this area initially have the form filename$ and are automatically moved by kdb+ to their final filename once writing completes. Kxreaper is notified of this addition by the OS and, if the space used then exceeds the configured limit, deletes least-recently-used files to bring the space used back within the limit.
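The temporary-name convention can be sketched in shell. cache_put is a hypothetical helper, not part of kdb+: it writes the complete contents under NAME$ and only then renames the file to NAME. Because a rename within one filesystem is atomic, a watcher such as Kxreaper never sees a partially written file under its final name.

```sh
# Hypothetical helper illustrating the write-then-rename pattern.
# Usage: cache_put SRC_FILE CACHE_DIR NAME
cache_put() {
  # Write the full contents under the temporary name NAME$ first, and
  # only if that succeeds, rename it to NAME in a single atomic step.
  cp -- "$1" "$2/$3\$" &&
    mv -f -- "$2/$3\$" "$2/$3"
}
```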
Kxreaper may be started as a daemon. It takes two arguments:
- the cache root path
- an integer giving the cache size limit in MB
Here it is started as a daemon, with both STDOUT and STDERR sent to the system log:
kxreaper "$KX_S3_CACHE_PATH" 5000 2>&1 | logger &
and the log may be viewed with:
sudo tail /var/log/syslog
On startup, Kxreaper scans the directory and, if the sum of the file sizes exceeds the configured limit, deletes the oldest files until the total is within the limit again.
In case Kxreaper gets out of sync with the filesystem, for example due to manual deletions of cache files, a rescan can be triggered manually by sending the Kxreaper process a SIGHUP. Should Kxreaper become too slow to process disk notifications, it will rescan automatically.
If you observe the error
inotify_init: Too many open files
check the values in
/proc/sys/fs/inotify/max_queued_events
/proc/sys/fs/inotify/max_user_instances
/proc/sys/fs/inotify/max_user_watches
and, if necessary, raise them (as root) to accommodate the number of files that might need to be cached. For example:
echo 8192 >> /proc/sys/fs/inotify/max_user_instances
echo 8192 >> /proc/sys/fs/inotify/max_user_watches
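Raising a limit only when it is below the desired value avoids accidentally lowering one that is already set higher. The helper raise_limit below is hypothetical; the loop that applies it to the real /proc files is left commented out because it must be run as root. Values written under /proc/sys take effect immediately but are not preserved across a reboot.

```sh
# Hypothetical helper: raise the integer in FILE to at least MIN,
# never lowering an existing larger value.
# Usage: raise_limit FILE MIN
raise_limit() {
  current=$(cat "$1") || return 1
  if [ "$current" -lt "$2" ]; then
    echo "$2" > "$1"
  fi
}

# Example (run as root):
# for f in max_queued_events max_user_instances max_user_watches; do
#   raise_limit "/proc/sys/fs/inotify/$f" 8192
# done
```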
If the environment variable KX_CACHE_REAPER_TRACE is set, Kxreaper prints tracing information as it processes events.
To run a simple test of file reaping:
- start the Kxreaper process
- use the test/populateCache.sh script to generate a sequence of files
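The contents of test/populateCache.sh are not reproduced here. As a hypothetical stand-in, populate_cache writes COUNT files of SIZE_MB MiB each into a cache directory, using the same filename$-then-rename pattern as kdb+ so that each completed file appears under its final name.

```sh
# Hypothetical stand-in for test/populateCache.sh (not its actual contents).
# Usage: populate_cache CACHE_DIR COUNT SIZE_MB
populate_cache() {
  i=1
  while [ "$i" -le "$2" ]; do
    f="$1/file$i"
    # Write under the temporary name file<i>$, then rename to the final name.
    dd if=/dev/zero of="$f\$" bs=1048576 count="$3" 2>/dev/null &&
      mv -f -- "$f\$" "$f"
    i=$((i + 1))
  done
}
```

With Kxreaper watching the same directory with a 50 MB limit, populate_cache "$KX_S3_CACHE_PATH" 100 1 writes 100 files of 1 MiB each, so the earliest files should be deleted as later ones arrive.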
For a more complex test, rather than using populateCache.sh, run a set of kdb+ processes that query S3-based data.