Cookbook/Unicode

The wiki is moving to a new format and this page is no longer maintained. You can find the new page at code.kx.com/q/cookbook/unicode.

Unicode

Unicode text can be stored in symbol, byte and character datatypes.

Since the data is simply a sequence of bytes, any Unicode encoding can be stored. However, it is best to use an encoding such as UTF-8 or GBK that extends 7-bit ASCII, i.e. one in which a single byte in the range 00–7f means the same thing as it does in ASCII. Q will load a script in such an encoding, but it will not load scripts in other formats. When using these encodings, avoid a byte order mark (BOM) prefix on the data.
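As an illustration of the byte order mark advice, here is a minimal C sketch that detects and skips a UTF-8 BOM (the three bytes EF BB BF) at the start of a buffer before the data is passed on. The function names has_utf8_bom and skip_utf8_bom are hypothetical helpers invented for this example; they are not part of q or its C interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the UTF-8 byte order mark EF BB BF, else 0. */
int has_utf8_bom(const unsigned char *buf, size_t len) {
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    assert(buf != NULL);
    return len >= 3 && memcmp(buf, bom, 3) == 0;
}

/* Returns a pointer just past the BOM if one is present, else buf itself.
   *len is reduced by 3 when a BOM is skipped. */
const unsigned char *skip_utf8_bom(const unsigned char *buf, size_t *len) {
    assert(len != NULL);
    if (has_utf8_bom(buf, *len)) {
        *len -= 3;
        return buf + 3;
    }
    return buf;
}
```

A caller would run skip_utf8_bom over file contents once, right after reading them, and use the returned pointer and updated length from then on.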

The Q language itself uses only 7-bit ASCII. For example, the statement "2+3" should be given as the 3 bytes with decimal values 50 43 51, as in:

q)`char$50 43 51
"2+3"
q)value `char$50 43 51
5

Unicode formats built from multi-byte code units, such as UTF-16, cannot be used, because they do not extend ASCII. For example, in little-endian UTF-16, "2+3" would be the 6 bytes with decimal values 50 0 43 0 51 0, and Q does not recognize this:

q)value `char$50 0 43 0 51 0
'char
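To see where the extra zero bytes come from, the following C sketch encodes a 7-bit ASCII string as little-endian UTF-16. It is only an illustration of the byte layout; ascii_to_utf16le is a hypothetical helper written for this example and it handles ASCII input only.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes a 7-bit ASCII string as UTF-16LE: each character becomes its
   byte value followed by a zero byte. Returns the number of bytes written,
   or 0 if out is too small or s contains a non-ASCII byte. */
size_t ascii_to_utf16le(const char *s, unsigned char *out, size_t cap) {
    assert(s != NULL && out != NULL);
    size_t n = 0;
    for (; *s; ++s) {
        unsigned char c = (unsigned char)*s;
        if (c > 0x7F || n + 2 > cap) return 0;
        out[n++] = c;   /* low byte: same value as in ASCII */
        out[n++] = 0;   /* high byte: always zero for ASCII characters */
    }
    return n;
}
```

Applied to "2+3", this yields the six bytes 50 0 43 0 51 0 used in the q example above, so the zero bytes sit between the characters that q would otherwise parse.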

The display console should have the matching code page set, or you will not be able to view the data correctly. For example, if you store data as UTF-8, ensure that the code page for the display is also UTF-8.

Table and column names should be plain ASCII.

For example, the following has Chinese characters in symbol and character columns:

sym:`apples`bananas`oranges
name:(`$"蘋果";`$"香蕉";`$"橙")
text:("每日一蘋果, 醫生遠離我";"香蕉船是一道可口的甜品";"從佛羅里達州來的鮮橙很甜美")
t:([]sym;name;text)

You can work with this table as usual, but note that while symbols are displayed as is, the Q console displays the entries of the character column text as octal escape sequences, one per byte:

q)select sym,name from t
sym     name
--------------
apples  蘋果
bananas 香蕉
oranges 橙

q)select from t where name=`$"香蕉"
sym     name   text                                      ..
---------------------------------------------------------..
bananas 香蕉 "\351\246\231\350\225\211\350\210\271\346\..

Display with -1 to show formatted text:

q)-1 text 0;
每日一蘋果, 醫生遠離我
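The escaped form shown in the table output can be reproduced outside q. The C sketch below approximates it by writing every byte of 0x80 or above as a backslash followed by three octal digits and copying other bytes unchanged. It is an approximation for illustration, not q's actual display routine, and octal_escape and octal_escape_str are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes in[0..len) to out, replacing each byte >= 0x80 with a backslash
   and three octal digits. Returns the length written (excluding the
   terminator), or 0 if out is too small. */
size_t octal_escape(const unsigned char *in, size_t len, char *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] < 0x80) {
            if (n + 2 > cap) return 0;   /* 1 char + terminator */
            out[n++] = (char)in[i];
        } else {
            if (n + 5 > cap) return 0;   /* 4 chars + terminator */
            n += (size_t)snprintf(out + n, cap - n, "\\%03o", (unsigned)in[i]);
        }
    }
    if (n + 1 > cap) return 0;
    out[n] = '\0';
    return n;
}

/* Convenience wrapper for NUL-terminated input. */
size_t octal_escape_str(const char *s, char *out, size_t cap) {
    assert(s != NULL);
    return octal_escape((const unsigned char *)s, strlen(s), out, cap);
}
```

Given the six UTF-8 bytes of 香蕉 (hex E9 A6 99 E8 95 89), this produces \351\246\231\350\225\211, which matches the start of the text entry in the query output above.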

Example assignments using the C interface:

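#define KXVER 3                // expected by recent versions of k.h; adjust to your kdb+ version
#include <unistd.h>            // close
#include "k.h"

int main(){
  int c=khp("localhost",5001); // connect to a q process listening on port 5001
  if(c<=0)return 1;            // connection failed
  k(c,"set",ks("a"),kp("香蕉"),(K)0);                     // literal, source file saved as UTF-8
  k(c,"set",ks("b"),kp("\351\246\231\350\225\211"),(K)0); // the same bytes as octal escapes
  close(c);
  return 0;
}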
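The two assignments in the C interface example are intended to store the same value: the second literal spells out, as octal escapes, the UTF-8 bytes of the first. A small standalone check, sketched below, confirms this without needing k.h or a running q process. It assumes the source file is saved as UTF-8 without a BOM; same_bytes and literals_match are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the two NUL-terminated strings hold identical bytes. */
int same_bytes(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Compares the literal with its octal-escaped spelling.
   Assumes this source file is saved as UTF-8 without a BOM. */
int literals_match(void) {
    return same_bytes("香蕉", "\351\246\231\350\225\211");
}
```

If the source file were saved in another encoding such as GBK, the first literal would compile to different bytes and the check would fail, which is one reason to prefer the escaped form when the file encoding is not under your control.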