Cookbook/regex


Regular Expressions (Regex)

kdb+ has some built-in pattern-matching features, available through like and ssr.
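
As a brief illustration of the built-ins, the lines below can be entered at the q prompt. The patterns are glob-style rather than full regular expressions; the strings and symbols used here are made up for the example, and the results are shown in the trailing comments.

```q
/ like: ? matches one character, * any run, [] a character set
"hello world" like "hello*"            / 1b
`abc`abd`xyz like "ab?"                / 110b
"a1" like "[a-c][0-9]"                 / 1b

/ ssr: search and replace using the same pattern syntax
ssr["hello world";"world";"there"]     / "hello there"
ssr["hello world";"w?rld";"there"]     / "hello there"
ssr["hello world";"ll";upper]          / "heLLo world" (replacement computed by a function)
```

Anything beyond these wildcards, such as alternation or repetition counts, is outside what like and ssr handle.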

For those who need something more flexible, it is possible to use an external regex library such as re2, as described below.

The home for re2 can be found at [1]. The code below was compiled for kdb+ v3.1 with this release [2]. The k.h file can be downloaded from [3]. For 64-bit Linux, this can be compiled as

g++ -m64 -O2 re2.cc -o re2.so -I . re2/obj/libre2.a -DKXVER=3 -shared -static

and the resulting re2.so should be copied into the $QHOME/l64 subdirectory.

It can then be loaded and called in kdb+ via

q)f:`re2 2:(`FullMatch;2) / bind FullMatch to f
q)f["hello world";"hello ..rld"]

The source for re2.cc is

#include <re2/re2.h>
#include <re2/filtered_re2.h>
#include <stdlib.h>  //malloc, free
#include <string.h>  //memcpy
#include <stdio.h>
#include"k.h"

using namespace re2;

extern "C" {
Z S makeErrStr(S s1,S s2){Z __thread char b[256];snprintf(b,256,"%s - %s",s1,s2);R b;}
Z __inline S c2s(S s,J n){S r=(S)malloc(n+1);R r?memcpy(r,s,n),r[n]=0,r:(S)krr((S)"wsfull (re2)");}
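// FullMatch(x,y): x is the text (string, symbol, symbol vector or list of strings),
// y is the pattern (string). Returns a boolean atom, or a boolean vector for list input.
K FullMatch(K x,K y){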
  S s,sy;K r;
  P(x->t&&x->t!=KC&&x->t!=KS&&x->t!=-KS||y->t!=KC,krr((S)"type"))
  U(sy=c2s((S)kC(y),y->n))
  RE2 pattern(sy,RE2::Quiet);
  free(sy);
  P(!pattern.ok(),krr(makeErrStr((S)"bad regex",(S)pattern.error().c_str())))
  if(!x->t||x->t==KS){
    J i=0;
    K r=ktn(KB,x->n);
    for(;i<x->n;i++){
      K z=0;
      P(!x->t&&(z=kK(x)[i])->t!=KC,(r0(r),krr((S)"type")))
      s=z?c2s((S)kC(z),z->n):kS(x)[i];P(!s,(r0(r),(K)0))
      kG(r)[i]=RE2::FullMatch(s,pattern);
      if(z)free(s);
    }
    R r;
  }
  s=x->t==-KS?x->s:c2s((S)kC(x),x->n);
  r=kb(RE2::FullMatch(s,pattern));
  if(s!=x->s)free(s);
  R r;
}
}
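
Reading the source above, FullMatch appears to accept a string, a symbol, a symbol vector or a general list of strings as its first argument, and to require the pattern to be a char vector. The sketch below assumes re2.so has been built and installed as described; the results in the comments are what the source implies, not output from a recorded session.

```q
f:`re2 2:(`FullMatch;2)

f["hello world";"hello ..rld"]                   / string: expect 1b
f[`hello;"h.*o"]                                 / symbol atom: expect 1b
f[`abc`abd`xyz;"ab."]                            / symbol vector: expect 110b
f[("hello world";"hello there");"hello ..rld"]   / list of strings: expect 10b

/ an invalid pattern signals an error; trapping it returns the message,
/ which the source builds as "bad regex - " followed by re2's own text
@[f["abc";];"a(";{x}]
```

Because the type check requires a char vector, a one-character pattern would need to be passed as enlist".", and a char atom as the text would be rejected with 'type.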


It's also possible to create a regex matcher in q, using a state machine, e.g.

/ want to match "x*fz*0*0"
q)m:({0};{2*x="x"};{2+x="f"};{4*x="z"};{4+x="0"};{5+x="0"};{7-x="0"};{7-x="0"})
q)f:{6=1 m/x}
q)f"xyzfz000"
1b

although this does not return until all input characters have been processed, even if a match can be ruled out on the first character. An early exit can be accommodated as follows:

q)f:{6~last{$[count x 1;((m x 0)[first x 1];1 _ x 1);(0;first x)]}/[{0<x 0};(1;x)]}
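
To see what the state machine is doing, the transitions can be traced with an explicit step function. This is a minimal sketch: m is copied from the example above, while step, adv and alive are hypothetical helper names introduced here for illustration. State 1 is the start, state 6 is the accepting state and state 0 is the dead state.

```q
/ transition table copied from the example above
m:({0};{2*x="x"};{2+x="f"};{4*x="z"};{4+x="0"};{5+x="0"};{7-x="0"};{7-x="0"})

/ step (hypothetical helper): from state s, consume character c
step:{[s;c] m[s] c}

/ states visited after each character, starting from state 1
1 step\"xyzfz000"                   / 2 2 2 3 4 5 6 6 : ends in accepting state 6
1 step\"ayz"                        / 0 0 0 : dead after the first character

/ early-exit form: carry (state;remaining input) and stop once the state is 0
adv:{$[count x 1;(m[x 0] first x 1;1 _ x 1);(0;first x)]}
alive:{0<x 0}
count adv\[alive;(1;"xyzfz000")]    / 10 : start, 8 characters, final hand-off
count adv\[alive;(1;"ayz")]         / 2 : start, then immediate failure
```

The two counts show the difference between the forms: the plain fold always consumes every character, whereas the while form stops as soon as the state drops to 0.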

Another library which can be loaded into kdb+ is available at http://q.o.potam.us/?p=pcre, although you should check whether it has been updated for kdb+ v3.0.
