Caching

Limit the amount of object storage data cached on local SSD.

Cloud object storage such as AWS S3 is slow relative to local storage such as SSD. The performance of kdb+ when working with S3 can be improved by caching S3 data. Each query to S3 has a financial cost; caching the resulting data can help to reduce it.

Multiple kdb+ instances using the same HDB (historical database) should share a cache area, the base of which is specified in the environment variable KX_S3_CACHE_PATH, e.g.

export KX_S3_CACHE_PATH=/fastssd/s3cache

Kdb+ writes temporary files under this directory and therefore requires write permission to it.
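For example, an administrator might create the cache root ahead of time with permissions that allow the kdb+ account to write to it; the user and group names below are placeholders.

mkdir -p /fastssd/s3cache
chown kdbuser:kdbusers /fastssd/s3cache   # placeholder account and group names
chmod 775 /fastssd/s3cache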

Kxreaper continuously monitors file access within this directory and maintains a limit on the space used by deleting files according to an LRU (least recently used) algorithm. Any file moved into this directory becomes a candidate for deletion, as this is a scratch area for the exclusive use of kdb+.

Files written by kdb+ in this area initially have the form filename$ and are automatically moved by kdb+ to their final filename on completion of writing. Kxreaper is notified by the OS of this addition and, if the space used then exceeds the configured limit, deletes least-recently-used files to bring the space used within the limit again.

Kxreaper may be started as a daemon. It takes two arguments:

  • the cache root path
  • the cache size limit in MB, as an integer

Here it is started as a daemon with both STDOUT and STDERR redirected to the system log

kxreaper $KX_S3_CACHE_PATH 5000 2>&1 | logger &

and the log may be viewed via

sudo tail /var/log/syslog
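On systemd-based hosts the same messages are typically also captured by the journal, so, assuming such a setup, they can be followed with

sudo journalctl -f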

On startup, Kxreaper scans the directory and, if the sum of the file sizes exceeds the configured limit, deletes the oldest files until the total is within the limit again.

In case Kxreaper gets out of sync with the filesystem, for example due to manual deletions of cache files, a rescan can be triggered manually by sending the Kxreaper process a SIGHUP. Should Kxreaper become too slow to process disk notifications, it will rescan automatically.
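For example, assuming a single kxreaper process is running on the host, a rescan can be triggered with

kill -HUP $(pidof kxreaper)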

Troubleshooting

If you observe the error inotify_init: Too many open files, check the values in

/proc/sys/fs/inotify/max_queued_events
/proc/sys/fs/inotify/max_user_instances
/proc/sys/fs/inotify/max_user_watches

and, if necessary, update them to larger values to accommodate the number of files that might need to be cached, e.g.

echo 8192 >> /proc/sys/fs/inotify/max_user_instances
echo 8192 >> /proc/sys/fs/inotify/max_user_watches
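Note that writing to /proc/sys requires root, and changes made this way do not survive a reboot. One way to make them persistent is a sysctl configuration file, e.g. (the file name here is an arbitrary choice)

echo fs.inotify.max_user_instances=8192 | sudo tee /etc/sysctl.d/90-kxreaper.conf
echo fs.inotify.max_user_watches=8192 | sudo tee -a /etc/sysctl.d/90-kxreaper.conf
sudo sysctl --system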

If the environment variable KX_CACHE_REAPER_TRACE is set, Kxreaper prints tracing info when processing events.
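Since only being set is mentioned, any value is assumed to suffice here, e.g.

export KX_CACHE_REAPER_TRACE=1
kxreaper $KX_S3_CACHE_PATH 5000 2>&1 | logger &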

Testing

For a simple test of the reaping of files

  1. set KX_S3_CACHE_PATH
  2. start the Kxreaper process
  3. use the test/populateCache.sh script to generate a sequence of files

For a more complex test, rather than use populateCache.sh, run a set of kdb+ processes that query S3-based data.
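If populateCache.sh is not to hand, a rough, purely illustrative stand-in is to write files into the cache by hand, mimicking the temporary-name-then-rename pattern described above; the cache limit, file count and file sizes below are arbitrary.

export KX_S3_CACHE_PATH=/fastssd/s3cache
kxreaper $KX_S3_CACHE_PATH 100 2>&1 | logger &        # limit the cache to 100 MB
for i in $(seq 1 20); do
  dd if=/dev/zero of="$KX_S3_CACHE_PATH/file$i\$" bs=1M count=10 2>/dev/null   # write a temporary file ending in $
  mv "$KX_S3_CACHE_PATH/file$i\$" "$KX_S3_CACHE_PATH/file$i"                   # rename to its final name
done
du -sh $KX_S3_CACHE_PATH        # total should settle near the configured 100 MB limit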