Contrib/CompactingHdbSym

From Kx Wiki
Jump to: navigation, search

301 Permanent move

This article is now at code.kx.com/q/cookbook/compacting-hdb-sym.

Please update your bookmark. The old wiki will remain here for a while. If you prefer it to the new format, please tell the Librarian why.


Under some scenarios, the sym enum file in an hdb can become bloated - this is the sym file sitting in the root of the hdb dir. This is due to symbols no longer being used as earlier parts of an hdb may have been archived.

Some users have expressed interest in being able to compact this sym enum file. This essentially requires re-enumeration of all enumerated columns against a new empty sym file. It can take some time to execute, and nothing else should try to read or write to the hdb area whilst this is running.

The code below is for a vanilla hdb, with date partitions, a single enumeration (sym), and only splayed tables present.

This is an all or nothing approach. If you choose to run the code below, it is at your own risk, and you should make sure you understand everything it is doing, and test it against a dev hdb that you are happy to destroy in the event that it goes wrong.

This should really ever only be a one off process. If you find that your sym file is growing beyond reasonable sizes, you very likely have non-repeating strings which would be better stored as char vectors than symbols. This process is not a fix for a poor choice of schema!

/cd hdb
/q
system "mv sym zym";
`:sym set `symbol$(); / create a new empty sym file
files:key `:.;
dates:files where files like "????.??.??";
{[d]
  root:":",string d;
  tableNames:string key `$root;
  tableRoot:root,/:"/",/:tableNames;
  files:raze {`$x,/:"/",/:string key `$x}each tableRoot;
  files:files where not files like "*#";
  types:type each get each files;
  enumeratedFiles:files where types within 20 76h;
  if[any types within 21 76h;'"too difficult"];  / if we have more than one enum better get help
  {
      `sym set get `:zym;
      s:get x;
      a:attr s;
      s:value s;
      `sym set get `:sym;
      s:a#.Q.en[`:.;([]s:s)]`s;
      x set s;
      -1 "re-enumerated ", string x;
  }each enumeratedFiles;
 }each dates

Don't forget to rm the zym file at end of processing.


Multi-threaded sym rewrite code

Here's some multi-threaded (can run single threaded) and more memory intensive but hugely faster sym file rewrite code that handles partitioned and splayed tables and par.txt. Note you lose the `g# which isn't supported in threads so you'll have to apply it later.

system"l ." /load the HDB - can change this if you don't start Q from your hdb root
allpaths:{[dbdir;table] / from dbmaint.q + an extra check for paths that exist (to support .Q.bv)
 files:key dbdir;
 if[any files like"par.txt";:raze allpaths[;table]each hsym each`$read0(`)sv dbdir,`par.txt];
 files@:where files like"[0-9]*";
 files:(`)sv'dbdir,'files,'table;
 files where 0<>(count key@)each files}
sym:oldSym:get`:sym /to unenumerate
symFiles:raze` sv/:/:raze{allpaths[`:.;x],/:\:exec c from meta[x] where t in "s"}peach tables[] where {1b~.Q.qp value x}each tables[] /sym files from parted tables
symFiles,:raze{` sv/: hsym[x],/:exec c from meta x where t in "s"}each tables[] where {0b~.Q.qp value x}each tables[] /sym files from splayed tables
allsyms:distinct raze{[file] :distinct @[value get@;file;`symbol$()] } peach symFiles; /symbol files we're dealing with - memory intensive
.Q.gc[] /memory intensive so gc
/*** the part above this line doesn't make changes to the HDB. You can estimate the savings with
/count[allsyms]%count sym
/*** the rest of the script makes changes so there is no going back once you start. There shouldn't be anything writing to the HDB while the script is in progress
system"mv sym zym" /make backup of sym file
`:sym set `symbol$() /reset sym file - scary part
`sym set get`:sym 
.Q.en[`:.;([]allsyms)] /enumerate all syms at once
{[file]
  s:get file; /file contents
  a:first `p`s inter attr s; /attributes - due to no`g# error in threads - this can be just a:attr s if your version of kdb+ does support setting `g# in threads
  s:oldSym`int$s; /unenumerate against old sym file
  file set a#`sym$s; /enumerate against new sym file & add attrib & write to disk
  0N!"re-enumerated ", string file;
  } peach symFiles

PS: Take backups!! PPS: The script deliberately has no ; at the end of lines - it's important you understand what's going on instead of just running the whole thing blindly

Personal tools
Namespaces
Variants
Actions
Navigation
Print/export
Toolbox