{ "cells": [ { "cell_type": "markdown", "id": "015ba887", "metadata": {}, "source": [ "# Database Creation and Management\n", "\n", "This notebook provides a walkthrough of some of the functionality available for users looking to create and maintain large databases using PyKX.\n", "\n", "This notebook refers to creating and maintaining large [partitioned kdb+ databases](https://code.kx.com/q/kb/partition/) using PyKX. Go to [Q for Mortals](https://code.kx.com/q4m3/14_Introduction_to_Kdb+/#143-partitioned-tables) for more in-depth information about partitioned databases in kdb+.\n", "\n", "You can download this walkthrough as a `.ipynb` notebook file.\n", "\n", "This walkthrough provides examples of the following tasks:\n", "\n", "1. Creating a database from a historical dataset\n", "1. Adding a new partition to the database\n", "1. Managing the on-disk database by:\n", " - Renaming a table and column\n", " - Creating a copy of a column to the database\n", " - Applying a Python function to a column of the database\n", " - Updating the data type of a column\n", "1. Adding a new table to the most recent partition of the database\n", "\n", "For full information on the functions available, go to the [API section](https://code.kx.com/pykx/api/db.html).\n", "\n", "---\n", "\n", "## Initial setup\n", "\n", "Import all required libraries and create a temporary directory which will be used to store the database we create for this walkthrough." ] }, { "cell_type": "code", "execution_count": null, "id": "04341da6", "metadata": { "tags": [ "hide_code" ] }, "outputs": [], "source": [ "import os\n", "os.environ['PYKX_IGNORE_QHOME'] = '1' # Ignore symlinking PyKX q libraries to QHOME\n", "os.environ['PYKX_Q_LOADED_MARKER'] = '' # Only used here for running Notebook under mkdocs-jupyter during document generation." ] }, { "cell_type": "code", "execution_count": 1, "id": "0afee62a", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['PYKX_BETA_FEATURES'] = 'true'\n", "\n", "import pykx as kx\n", "from datetime import date\n", "import tempfile" ] }, { "cell_type": "code", "execution_count": 2, "id": "64c18054", "metadata": {}, "outputs": [], "source": [ "tempdir = tempfile.TemporaryDirectory()" ] }, { "cell_type": "markdown", "id": "2e91160e", "metadata": {}, "source": [ "Database interactions are facilitated through use of the `pykx.DB` class. All methods/attributes used in this notebook are contained within this class. \n", "\n", "Initialise the `DB` class to start. The expected input is the file path where you intend to save the partitioned database and its associated tables. In this case we're going to use the temporary directory we just created. " ] }, { "cell_type": "code", "execution_count": 3, "id": "90d9eac3", "metadata": {}, "outputs": [], "source": [ "db = kx.DB(path = tempdir.name + '/db')" ] }, { "cell_type": "markdown", "id": "143e0886", "metadata": {}, "source": [ "For details on any methods contained within this class, use the `help` method. " ] }, { "cell_type": "code", "execution_count": 4, "id": "0e817132", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method create in module pykx.db:\n", "\n", "create(table, table_name, partition, *, by_field=None, sym_enum=None, log=True) method of pykx.db.DB instance\n", " Create an on-disk partitioned table within a kdb+ database from a supplied\n", " `pykx.Table` object. Once generated this table will be accessible\n", " as an attribute of the `DB` class or a sub attribute of `DB.table`.\n", " \n", " Parameters:\n", " table: The `pykx.Table` object which is to be persisted to disk\n", " table_name: The name with which the table will be persisted and accessible\n", " once loaded and available as a `pykx.PartitionedTable`\n", " partition: The name of the column which is to be used to partition the data if\n", " supplied as a `str` or if supplied as non string object this will be used as\n", " the partition to which all data is persisted\n", " by_field: A field of the table to be used as a by column, this column will be\n", " the second column in the table (the first being the virtual column determined\n", " by the partitioning column)\n", " sym_enum: The name of the symbol enumeration table to be associated with the table\n", " log: Print information about status of partitioned datab\n", " \n", " Returns:\n", " A `None` object on successful invocation, the database class will be\n", " updated to contain attributes associated with the available created table\n", " \n", " Examples:\n", " \n", " Generate a partitioned table from a table containing multiple partitions\n", " \n", " ```python\n", " >>> import pykx as kx\n", " >>> db = kx.DB(path = 'newDB')\n", " >>> N = 1000\n", " >>> qtab = kx.Table(data = {\n", " ... 'date': kx.q.asc(kx.random.random(N, kx.q('2020.01 2020.02 2020.03'))),\n", " ... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n", " ... 'price': kx.random.random(N, 10.0),\n", " ... 'size': kx.random.random(N, 100)\n", " ... })\n", " >>> db.create(qtab, 'stocks', 'date', by_field = 'sym', sym_enum = 'symbols')\n", " >>> db.tables\n", " ['stocks']\n", " >>> db.stocks\n", " pykx.PartitionedTable(pykx.q('\n", " month sym price size\n", " ---------------------------\n", " 2020.01 AAPL 7.979004 85\n", " 2020.01 AAPL 5.931866 55\n", " 2020.01 AAPL 5.255477 49\n", " 2020.01 AAPL 8.15255 74\n", " 2020.01 AAPL 4.771067 80\n", " ..\n", " '))\n", " ```\n", " \n", " Add a table as a partition to an on-disk database, in the example below we are adding\n", " a partition to the table generated above\n", " \n", " ```python\n", " >>> import pykx as kx\n", " >>> db = kx.DB(path = 'newDB')\n", " >>> N = 333\n", " >>> qtab = kx.Table(data = {\n", " ... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n", " ... 'price': kx.random.random(N, 10.0),\n", " ... 'size': kx.random.random(N, 100)\n", " ... })\n", " >>> db.create(qtab, 'stocks', kx.q('2020.04'), by_field = 'sym', sym_enum = 'symbols')\n", " >>> db.tables\n", " ['stocks']\n", " >>> db.stocks\n", " pykx.PartitionedTable(pykx.q('\n", " month sym price size\n", " ---------------------------\n", " 2020.01 AAPL 7.979004 85\n", " 2020.01 AAPL 5.931866 55\n", " 2020.01 AAPL 5.255477 49\n", " 2020.01 AAPL 8.15255 74\n", " 2020.01 AAPL 4.771067 80\n", " ..\n", " '))\n", " ```\n", "\n" ] } ], "source": [ "help(db.create)" ] }, { "cell_type": "markdown", "id": "607599f8", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "3516ab83", "metadata": {}, "source": [ "## Create the sample dataset\n", "\n", "Create a dataset called `trades` containing time-series data spanning multiple dates, and columns of various types:" ] }, { "cell_type": "code", "execution_count": 5, "id": "686441cc", "metadata": {}, "outputs": [], "source": [ "N = 1000000\n", "trades = kx.Table(data={\n", " 'date': kx.random.random(N, [date(2020, 1, 1), date(2020, 1, 2)]),\n", " 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n", " 'price': kx.random.random(N, 10.0),\n", " 'size': kx.random.random(N, 1000)\n", "})" ] }, { "cell_type": "markdown", "id": "d0529e7c", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "0738729d", "metadata": {}, "source": [ "## Create the database" ] }, { "cell_type": "markdown", "id": "0fb4659b", "metadata": {}, "source": [ "Create the database using the `date` column as the partition, and add `trades` as a table called `trade_data` within it." ] }, { "cell_type": "code", "execution_count": 6, "id": "db8b9a04", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing Database Partition 2020.01.01 to table trade_data\n", "Writing Database Partition 2020.01.02 to table trade_data\n" ] } ], "source": [ "db.create(trades, 'trade_data', 'date')" ] }, { "cell_type": "markdown", "id": "ad2fa6f9", "metadata": {}, "source": [ "This now exists as a table and is saved to disk." ] }, { "cell_type": "code", "execution_count": 7, "id": "82796fbc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['trade_data']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.tables" ] }, { "cell_type": "markdown", "id": "c0ecec19", "metadata": {}, "source": [ "When a table is saved, an attribute is added to the `db` class for it. For our newly generated table, this is `db.trade_data`." ] }, { "cell_type": "code", "execution_count": 8, "id": "29606b7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datesympricesize
02020.01.01MSFT7.079266800
12020.01.01AAPL1.82432165
22020.01.01MSFT2.408259292
32020.01.01GOOG1.6754387
42020.01.01AAPL8.311168183
52020.01.01AAPL2.208693989
62020.01.01MSFT6.068126567
72020.01.01AAPL4.918926794
82020.01.01AAPL9.33186939
92020.01.01AAPL1.142611507
102020.01.01AAPL2.685874581
112020.01.01AAPL3.483591163
122020.01.01AAPL0.4422525466
132020.01.01MSFT7.406654976
142020.01.01MSFT2.493871171
152020.01.01AAPL9.24208828
162020.01.01MSFT0.3954522747
172020.01.01MSFT0.3441191512
182020.01.01GOOG9.662762998
192020.01.01AAPL9.601674812
202020.01.01AAPL4.969858910
212020.01.01GOOG1.048204830
222020.01.01GOOG0.9817644595
...............
9999992020.01.02GOOG1.470716636
\n", "

1,000,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date sym price size\n", "------------------------------\n", "2020.01.01 MSFT 7.079266 800 \n", "2020.01.01 AAPL 1.824321 65 \n", "2020.01.01 MSFT 2.408259 292 \n", "2020.01.01 GOOG 1.675438 7 \n", "2020.01.01 AAPL 8.311168 183 \n", "2020.01.01 AAPL 2.208693 989 \n", "2020.01.01 MSFT 6.068126 567 \n", "2020.01.01 AAPL 4.918926 794 \n", "2020.01.01 AAPL 9.331869 39 \n", "2020.01.01 AAPL 1.142611 507 \n", "2020.01.01 AAPL 2.685874 581 \n", "2020.01.01 AAPL 3.483591 163 \n", "2020.01.01 AAPL 0.4422525 466 \n", "2020.01.01 MSFT 7.406654 976 \n", "2020.01.01 MSFT 2.493871 171 \n", "2020.01.01 AAPL 9.242088 28 \n", "2020.01.01 MSFT 0.3954522 747 \n", "2020.01.01 MSFT 0.3441191 512 \n", "2020.01.01 GOOG 9.662762 998 \n", "2020.01.01 AAPL 9.601674 812 \n", "..\n", "'))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trade_data" ] }, { "cell_type": "markdown", "id": "5ed4224e", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "889dfb46", "metadata": {}, "source": [ "## Add a new partition to the database\n", "\n", "Once a table has been generated, you can add more partitions to the database through reuse of the `create` method. In this case we are adding the new partition `2020.01.03` to the database." ] }, { "cell_type": "code", "execution_count": 9, "id": "7cce4947", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing Database Partition 2020-01-03 to table trade_data\n" ] } ], "source": [ "N = 10000\n", "new_day = kx.Table(data={\n", " 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n", " 'price': kx.random.random(N, 10.0),\n", " 'size': kx.random.random(N, 100)\n", "})\n", "db.create(new_day, 'trade_data', date(2020, 1, 3))" ] }, { "cell_type": "markdown", "id": "e24ecc1d", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "id": "09f0bd28", "metadata": {}, "source": [ "## Manage the database\n", "\n", "This section covers updating the contents of a database. The examples below demonstrate a number of common tasks that would be completed regularly when updating a database.\n", "\n", "The name of a table can be updated using the `rename_table` method. Below, we are updating the table `trade_data` to be called `trade`." ] }, { "cell_type": "code", "execution_count": 10, "id": "ae9d244b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:14:22 renaming :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trade_data to :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:22 renaming :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trade_data to :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:22 renaming :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trade_data to :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.rename_table('trade_data', 'trades')" ] }, { "cell_type": "markdown", "id": "5edc2eba", "metadata": {}, "source": [ "During the rename process, the attribute in the `db` class is also updated. " ] }, { "cell_type": "code", "execution_count": 11, "id": "00eaf253", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datesympricesize
02020.01.01MSFT7.079266800
12020.01.01AAPL1.82432165
22020.01.01MSFT2.408259292
32020.01.01GOOG1.6754387
42020.01.01AAPL8.311168183
52020.01.01AAPL2.208693989
62020.01.01MSFT6.068126567
72020.01.01AAPL4.918926794
82020.01.01AAPL9.33186939
92020.01.01AAPL1.142611507
102020.01.01AAPL2.685874581
112020.01.01AAPL3.483591163
122020.01.01AAPL0.4422525466
132020.01.01MSFT7.406654976
142020.01.01MSFT2.493871171
152020.01.01AAPL9.24208828
162020.01.01MSFT0.3954522747
172020.01.01MSFT0.3441191512
182020.01.01GOOG9.662762998
192020.01.01AAPL9.601674812
202020.01.01AAPL4.969858910
212020.01.01GOOG1.048204830
222020.01.01GOOG0.9817644595
...............
10099992020.01.03AAPL9.75038799
\n", "

1,010,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date sym price size\n", "------------------------------\n", "2020.01.01 MSFT 7.079266 800 \n", "2020.01.01 AAPL 1.824321 65 \n", "2020.01.01 MSFT 2.408259 292 \n", "2020.01.01 GOOG 1.675438 7 \n", "2020.01.01 AAPL 8.311168 183 \n", "2020.01.01 AAPL 2.208693 989 \n", "2020.01.01 MSFT 6.068126 567 \n", "2020.01.01 AAPL 4.918926 794 \n", "2020.01.01 AAPL 9.331869 39 \n", "2020.01.01 AAPL 1.142611 507 \n", "2020.01.01 AAPL 2.685874 581 \n", "2020.01.01 AAPL 3.483591 163 \n", "2020.01.01 AAPL 0.4422525 466 \n", "2020.01.01 MSFT 7.406654 976 \n", "2020.01.01 MSFT 2.493871 171 \n", "2020.01.01 AAPL 9.242088 28 \n", "2020.01.01 MSFT 0.3954522 747 \n", "2020.01.01 MSFT 0.3441191 512 \n", "2020.01.01 GOOG 9.662762 998 \n", "2020.01.01 AAPL 9.601674 812 \n", "..\n", "'))" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trades" ] }, { "cell_type": "markdown", "id": "4c44fab2", "metadata": {}, "source": [ "To rename a column in a table, use the `rename_column` method. For example, let's rename the `sym` column (in the `trade` table) to `ticker`." ] }, { "cell_type": "code", "execution_count": 12, "id": "1c52d0b0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:14:25 renaming sym to ticker in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:25 renaming sym to ticker in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:25 renaming sym to ticker in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.rename_column('trades', 'sym', 'ticker')" ] }, { "cell_type": "code", "execution_count": 13, "id": "b03c5c17", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetickerpricesize
02020.01.01MSFT7.079266800
12020.01.01AAPL1.82432165
22020.01.01MSFT2.408259292
32020.01.01GOOG1.6754387
42020.01.01AAPL8.311168183
52020.01.01AAPL2.208693989
62020.01.01MSFT6.068126567
72020.01.01AAPL4.918926794
82020.01.01AAPL9.33186939
92020.01.01AAPL1.142611507
102020.01.01AAPL2.685874581
112020.01.01AAPL3.483591163
122020.01.01AAPL0.4422525466
132020.01.01MSFT7.406654976
142020.01.01MSFT2.493871171
152020.01.01AAPL9.24208828
162020.01.01MSFT0.3954522747
172020.01.01MSFT0.3441191512
182020.01.01GOOG9.662762998
192020.01.01AAPL9.601674812
202020.01.01AAPL4.969858910
212020.01.01GOOG1.048204830
222020.01.01GOOG0.9817644595
...............
10099992020.01.03AAPL9.75038799
\n", "

1,010,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date ticker price size\n", "--------------------------------\n", "2020.01.01 MSFT 7.079266 800 \n", "2020.01.01 AAPL 1.824321 65 \n", "2020.01.01 MSFT 2.408259 292 \n", "2020.01.01 GOOG 1.675438 7 \n", "2020.01.01 AAPL 8.311168 183 \n", "2020.01.01 AAPL 2.208693 989 \n", "2020.01.01 MSFT 6.068126 567 \n", "2020.01.01 AAPL 4.918926 794 \n", "2020.01.01 AAPL 9.331869 39 \n", "2020.01.01 AAPL 1.142611 507 \n", "2020.01.01 AAPL 2.685874 581 \n", "2020.01.01 AAPL 3.483591 163 \n", "2020.01.01 AAPL 0.4422525 466 \n", "2020.01.01 MSFT 7.406654 976 \n", "2020.01.01 MSFT 2.493871 171 \n", "2020.01.01 AAPL 9.242088 28 \n", "2020.01.01 MSFT 0.3954522 747 \n", "2020.01.01 MSFT 0.3441191 512 \n", "2020.01.01 GOOG 9.662762 998 \n", "2020.01.01 AAPL 9.601674 812 \n", "..\n", "'))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trades" ] }, { "cell_type": "markdown", "id": "148207eb", "metadata": {}, "source": [ "To safely apply a function to modify the `price` column within the database, first create a copy of the column." ] }, { "cell_type": "code", "execution_count": 14, "id": "f7d2f116", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:14:29 copying price to price_copy in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:29 copying price to price_copy in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:29 copying price to price_copy in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.copy_column('trades', 'price', 'price_copy')" ] }, { "cell_type": "code", "execution_count": 15, "id": "9bad2096", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetickerpricesizeprice_copy
02020.01.01MSFT7.0792668007.079266
12020.01.01AAPL1.824321651.824321
22020.01.01MSFT2.4082592922.408259
32020.01.01GOOG1.67543871.675438
42020.01.01AAPL8.3111681838.311168
52020.01.01AAPL2.2086939892.208693
62020.01.01MSFT6.0681265676.068126
72020.01.01AAPL4.9189267944.918926
82020.01.01AAPL9.331869399.331869
92020.01.01AAPL1.1426115071.142611
102020.01.01AAPL2.6858745812.685874
112020.01.01AAPL3.4835911633.483591
122020.01.01AAPL0.44225254660.4422525
132020.01.01MSFT7.4066549767.406654
142020.01.01MSFT2.4938711712.493871
152020.01.01AAPL9.242088289.242088
162020.01.01MSFT0.39545227470.3954522
172020.01.01MSFT0.34411915120.3441191
182020.01.01GOOG9.6627629989.662762
192020.01.01AAPL9.6016748129.601674
202020.01.01AAPL4.9698589104.969858
212020.01.01GOOG1.0482048301.048204
222020.01.01GOOG0.98176445950.9817644
..................
10099992020.01.03AAPL9.750387999.750387
\n", "

1,010,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date ticker price size price_copy\n", "-------------------------------------------\n", "2020.01.01 MSFT 7.079266 800 7.079266 \n", "2020.01.01 AAPL 1.824321 65 1.824321 \n", "2020.01.01 MSFT 2.408259 292 2.408259 \n", "2020.01.01 GOOG 1.675438 7 1.675438 \n", "2020.01.01 AAPL 8.311168 183 8.311168 \n", "2020.01.01 AAPL 2.208693 989 2.208693 \n", "2020.01.01 MSFT 6.068126 567 6.068126 \n", "2020.01.01 AAPL 4.918926 794 4.918926 \n", "2020.01.01 AAPL 9.331869 39 9.331869 \n", "2020.01.01 AAPL 1.142611 507 1.142611 \n", "2020.01.01 AAPL 2.685874 581 2.685874 \n", "2020.01.01 AAPL 3.483591 163 3.483591 \n", "2020.01.01 AAPL 0.4422525 466 0.4422525 \n", "2020.01.01 MSFT 7.406654 976 7.406654 \n", "2020.01.01 MSFT 2.493871 171 2.493871 \n", "2020.01.01 AAPL 9.242088 28 9.242088 \n", "2020.01.01 MSFT 0.3954522 747 0.3954522 \n", "2020.01.01 MSFT 0.3441191 512 0.3441191 \n", "2020.01.01 GOOG 9.662762 998 9.662762 \n", "2020.01.01 AAPL 9.601674 812 9.601674 \n", "..\n", "'))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trades" ] }, { "cell_type": "markdown", "id": "3c63e2bb", "metadata": {}, "source": [ "You can now apply a function to the copied column without the risk of losing the original data. Below, let's modify the copied column by multiplying the contents by 2." ] }, { "cell_type": "code", "execution_count": 16, "id": "483a3b48", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:14:31 resaving column price_copy (type 9) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:31 resaving column price_copy (type 9) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:31 resaving column price_copy (type 9) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.apply_function('trades', 'price_copy', kx.q('{2*x}'))" ] }, { "cell_type": "code", "execution_count": 17, "id": "e5285600", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetickerpricesizeprice_copy
02020.01.01MSFT7.07926680014.15853
12020.01.01AAPL1.824321653.648642
22020.01.01MSFT2.4082592924.816519
32020.01.01GOOG1.67543873.350875
42020.01.01AAPL8.31116818316.62234
52020.01.01AAPL2.2086939894.417385
62020.01.01MSFT6.06812656712.13625
72020.01.01AAPL4.9189267949.837851
82020.01.01AAPL9.3318693918.66374
92020.01.01AAPL1.1426115072.285222
102020.01.01AAPL2.6858745815.371748
112020.01.01AAPL3.4835911636.967183
122020.01.01AAPL0.44225254660.8845049
132020.01.01MSFT7.40665497614.81331
142020.01.01MSFT2.4938711714.987742
152020.01.01AAPL9.2420882818.48418
162020.01.01MSFT0.39545227470.7909045
172020.01.01MSFT0.34411915120.6882382
182020.01.01GOOG9.66276299819.32552
192020.01.01AAPL9.60167481219.20335
202020.01.01AAPL4.9698589109.939716
212020.01.01GOOG1.0482048302.096408
222020.01.01GOOG0.98176445951.963529
..................
10099992020.01.03AAPL9.7503879919.50077
\n", "

1,010,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date ticker price size price_copy\n", "-------------------------------------------\n", "2020.01.01 MSFT 7.079266 800 14.15853 \n", "2020.01.01 AAPL 1.824321 65 3.648642 \n", "2020.01.01 MSFT 2.408259 292 4.816519 \n", "2020.01.01 GOOG 1.675438 7 3.350875 \n", "2020.01.01 AAPL 8.311168 183 16.62234 \n", "2020.01.01 AAPL 2.208693 989 4.417385 \n", "2020.01.01 MSFT 6.068126 567 12.13625 \n", "2020.01.01 AAPL 4.918926 794 9.837851 \n", "2020.01.01 AAPL 9.331869 39 18.66374 \n", "2020.01.01 AAPL 1.142611 507 2.285222 \n", "2020.01.01 AAPL 2.685874 581 5.371748 \n", "2020.01.01 AAPL 3.483591 163 6.967183 \n", "2020.01.01 AAPL 0.4422525 466 0.8845049 \n", "2020.01.01 MSFT 7.406654 976 14.81331 \n", "2020.01.01 MSFT 2.493871 171 4.987742 \n", "2020.01.01 AAPL 9.242088 28 18.48418 \n", "2020.01.01 MSFT 0.3954522 747 0.7909045 \n", "2020.01.01 MSFT 0.3441191 512 0.6882382 \n", "2020.01.01 GOOG 9.662762 998 19.32552 \n", "2020.01.01 AAPL 9.601674 812 19.20335 \n", "..\n", "'))" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trades" ] }, { "cell_type": "markdown", "id": "a7db5560", "metadata": {}, "source": [ "Once you are happy with the new values within the `price_copy` column, you can safely delete the `price` column, then rename the `price_copy` column to be called `price`." ] }, { "cell_type": "code", "execution_count": 18, "id": "fbb0fe94", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:14:33 deleting column price from `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:33 deleting column price from `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:33 deleting column price from `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n", "2023.12.15 16:14:33 renaming price_copy to price in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:14:33 renaming price_copy to price in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:14:33 renaming price_copy to price in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.delete_column('trades', 'price')\n", "db.rename_column('trades', 'price_copy', 'price')" ] }, { "cell_type": "code", "execution_count": 19, "id": "2810b08f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datetickersizeprice
02020.01.01MSFT80014.15853
12020.01.01AAPL653.648642
22020.01.01MSFT2924.816519
32020.01.01GOOG73.350875
42020.01.01AAPL18316.62234
52020.01.01AAPL9894.417385
62020.01.01MSFT56712.13625
72020.01.01AAPL7949.837851
82020.01.01AAPL3918.66374
92020.01.01AAPL5072.285222
102020.01.01AAPL5815.371748
112020.01.01AAPL1636.967183
122020.01.01AAPL4660.8845049
132020.01.01MSFT97614.81331
142020.01.01MSFT1714.987742
152020.01.01AAPL2818.48418
162020.01.01MSFT7470.7909045
172020.01.01MSFT5120.6882382
182020.01.01GOOG99819.32552
192020.01.01AAPL81219.20335
202020.01.01AAPL9109.939716
212020.01.01GOOG8302.096408
222020.01.01GOOG5951.963529
...............
10099992020.01.03AAPL9919.50077
\n", "

1,010,000 rows × 4 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date ticker size price \n", "--------------------------------\n", "2020.01.01 MSFT 800 14.15853 \n", "2020.01.01 AAPL 65 3.648642 \n", "2020.01.01 MSFT 292 4.816519 \n", "2020.01.01 GOOG 7 3.350875 \n", "2020.01.01 AAPL 183 16.62234 \n", "2020.01.01 AAPL 989 4.417385 \n", "2020.01.01 MSFT 567 12.13625 \n", "2020.01.01 AAPL 794 9.837851 \n", "2020.01.01 AAPL 39 18.66374 \n", "2020.01.01 AAPL 507 2.285222 \n", "2020.01.01 AAPL 581 5.371748 \n", "2020.01.01 AAPL 163 6.967183 \n", "2020.01.01 AAPL 466 0.8845049\n", "2020.01.01 MSFT 976 14.81331 \n", "2020.01.01 MSFT 171 4.987742 \n", "2020.01.01 AAPL 28 18.48418 \n", "2020.01.01 MSFT 747 0.7909045\n", "2020.01.01 MSFT 512 0.6882382\n", "2020.01.01 GOOG 998 19.32552 \n", "2020.01.01 AAPL 812 19.20335 \n", "..\n", "'))" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.trades" ] }, { "cell_type": "markdown", "id": "119a373b", "metadata": {}, "source": [ "To convert the data type of a column, use the `set_column_type` method. Before we do that, let's look at the metadata information for the table using the `meta` method:\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "45f01b75", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tfa
c
date\"d\"
ticker\"s\"
size\"j\"
price\"f\"
" ], "text/plain": [ "pykx.KeyedTable(pykx.q('\n", "c | t f a\n", "------| -----\n", "date | d \n", "ticker| s \n", "size | j \n", "price | f \n", "'))" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kx.q.meta(db.trades)" ] }, { "cell_type": "markdown", "id": "ffad39b1", "metadata": {}, "source": [ "Currently the `size` column is the type `LongAtom`. Let's update this to be a type `ShortAtom`:" ] }, { "cell_type": "code", "execution_count": 21, "id": "3706ad43", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:20:03 resaving column size (type 5) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:20:03 resaving column size (type 5) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:20:03 resaving column size (type 5) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.set_column_type('trades', 'size', kx.ShortAtom)" ] }, { "cell_type": "markdown", "id": "319317bf", "metadata": {}, "source": [ "Now let's apply the `grouped` attribute to the size column. For more information on attributes in kdb+, refer to the Q for Mortals [Attributes section](https://code.kx.com/q4m3/8_Tables/#88-attributes)." ] }, { "cell_type": "code", "execution_count": 22, "id": "fd550ac7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.12.15 16:20:04 resaving column ticker (type 20) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01/trades\n", "2023.12.15 16:20:04 resaving column ticker (type 20) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02/trades\n", "2023.12.15 16:20:04 resaving column ticker (type 20) in `:/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.03/trades\n" ] } ], "source": [ "db.set_column_attribute('trades', 'ticker', 'grouped')" ] }, { "cell_type": "markdown", "id": "95e9a5a9", "metadata": {}, "source": [ "Let's revisit the metadata of the table to ensure they have been applied correctly." ] }, { "cell_type": "code", "execution_count": 23, "id": "debf733d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tfa
c
date\"d\"
ticker\"s\"g
size\"h\"
price\"f\"
" ], "text/plain": [ "pykx.KeyedTable(pykx.q('\n", "c | t f a\n", "------| -----\n", "date | d \n", "ticker| s g\n", "size | h \n", "price | f \n", "'))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kx.q.meta(db.trades)" ] }, { "cell_type": "markdown", "id": "e75b07ae", "metadata": {}, "source": [ "## Onboard your next table\n", "\n", "Now that you have successfully set up one table, you may want to add a second table. We follow the same method as before and create the `quotes` table using the `create` method. In this example, the `quotes` table only contains data for `2020.01.03`:" ] }, { "cell_type": "code", "execution_count": 24, "id": "b04c2f77", "metadata": {}, "outputs": [], "source": [ "quotes = kx.Table(data={\n", " 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),\n", " 'open': kx.random.random(N, 10.0),\n", " 'high': kx.random.random(N, 10.0),\n", " 'low': kx.random.random(N, 10.0),\n", " 'close': kx.random.random(N, 10.0)\n", "})" ] }, { "cell_type": "code", "execution_count": 25, "id": "6914a50e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing Database Partition 2020-01-03 to table quotes\n" ] } ], "source": [ "db.create(quotes, 'quotes', date(2020, 1, 3), by_field = 'sym')" ] }, { "cell_type": "markdown", "id": "87670793", "metadata": {}, "source": [ "All tables within a database must contain the same partition structure. To ensure you can access the new table, the `quotes` table needs to exist in every partition within the database, even if there is no data for that partition. This is called backfilling data. For the partitions where the `quotes` table is missing, we use the `fill_database` method:" ] }, { "cell_type": "code", "execution_count": 26, "id": "e6f873e0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Successfully filled missing tables to partition: :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.02\n", "Successfully filled missing tables to partition: :/var/folders/l8/t7s11kcs02x3dchm9_m48mq80000gn/T/tmp2ts68edc/db/2020.01.01\n" ] } ], "source": [ "db.fill_database()" ] }, { "cell_type": "markdown", "id": "e41e8589", "metadata": {}, "source": [ "Now that the database has resolved the missing tables within the partitions, we can view the new `quotes` table:" ] }, { "cell_type": "code", "execution_count": 27, "id": "b3be6075", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datesymopenhighlowclose
02020.01.03AAPL8.2040260.91152013.9168649.813545
12020.01.03AAPL8.0927546.0195780.085131372.825277
22020.01.03AAPL1.4250438.8817194.2854617.820761
32020.01.03AAPL7.1727363.339855.9994033.010211
42020.01.03AAPL2.9741851.5593722.763565.182052
52020.01.03AAPL3.2007597.4850887.9288136.437041
62020.01.03AAPL7.7495995.5594440.33004049.424896
72020.01.03AAPL4.8859614.6774328.2883184.366883
82020.01.03AAPL7.4128915.0821899.2140367.900838
92020.01.03AAPL6.6258479.7921396.2088189.195079
102020.01.03AAPL2.0757975.3403210.40387090.7533655
112020.01.03AAPL4.7976428.3733174.981566.299731
122020.01.03AAPL0.86887651.9676163.3495734.094004
132020.01.03AAPL2.6841430.057673528.8781742.166685
142020.01.03AAPL3.1810934.6861130.89676137.39341
152020.01.03AAPL3.6302680.45638092.890256.428857
162020.01.03AAPL7.3424699.2984047.0985091.698009
172020.01.03AAPL1.2931448.1258347.2141845.946857
182020.01.03AAPL8.0513221.4461929.4361854.824975
192020.01.03AAPL1.0187811.2994011.181810.6091787
202020.01.03AAPL4.0029094.1157725.0362111.680549
212020.01.03AAPL0.98641044.750850.51407352.468647
222020.01.03AAPL8.3885616.1704051.0671532.034476
.....................
99992020.01.03MSFT2.8328181.4661713.4575455.985203
\n", "

10,000 rows × 6 columns

" ], "text/plain": [ "pykx.PartitionedTable(pykx.q('\n", "date sym open high low close \n", "---------------------------------------------------------\n", "2020.01.03 AAPL 8.204026 0.9115201 3.916864 9.813545 \n", "2020.01.03 AAPL 8.092754 6.019578 0.08513137 2.825277 \n", "2020.01.03 AAPL 1.425043 8.881719 4.285461 7.820761 \n", "2020.01.03 AAPL 7.172736 3.33985 5.999403 3.010211 \n", "2020.01.03 AAPL 2.974185 1.559372 2.76356 5.182052 \n", "2020.01.03 AAPL 3.200759 7.485088 7.928813 6.437041 \n", "2020.01.03 AAPL 7.749599 5.559444 0.3300404 9.424896 \n", "2020.01.03 AAPL 4.885961 4.677432 8.288318 4.366883 \n", "2020.01.03 AAPL 7.412891 5.082189 9.214036 7.900838 \n", "2020.01.03 AAPL 6.625847 9.792139 6.208818 9.195079 \n", "2020.01.03 AAPL 2.075797 5.340321 0.4038709 0.7533655\n", "2020.01.03 AAPL 4.797642 8.373317 4.98156 6.299731 \n", "2020.01.03 AAPL 0.8688765 1.967616 3.349573 4.094004 \n", "2020.01.03 AAPL 2.684143 0.05767352 8.878174 2.166685 \n", "2020.01.03 AAPL 3.181093 4.686113 0.8967613 7.39341 \n", "2020.01.03 AAPL 3.630268 0.4563809 2.89025 6.428857 \n", "2020.01.03 AAPL 7.342469 9.298404 7.098509 1.698009 \n", "2020.01.03 AAPL 1.293144 8.125834 7.214184 5.946857 \n", "2020.01.03 AAPL 8.051322 1.446192 9.436185 4.824975 \n", "2020.01.03 AAPL 1.018781 1.299401 1.18181 0.6091787\n", "..\n", "'))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.quotes" ] }, { "cell_type": "markdown", "id": "43366fab", "metadata": {}, "source": [ "Finally, to view the amount of saved data, count the number of rows per partition using `partition_count`:" ] }, { "cell_type": "code", "execution_count": 28, "id": "78b45d91", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quotestrades
2020.01.010500425
2020.01.020499575
2020.01.031000010000
" ], "text/plain": [ "pykx.Dictionary(pykx.q('\n", " | quotes trades\n", "----------| -------------\n", "2020.01.01| 0 500425\n", "2020.01.02| 0 499575\n", "2020.01.03| 10000 10000 \n", "'))" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.partition_count()" ] }, { "cell_type": "markdown", "id": "b03cfb4b", "metadata": {}, "source": [ "## Clean up temporary database created" ] }, { "cell_type": "code", "execution_count": 29, "id": "f3883344", "metadata": {}, "outputs": [], "source": [ "tempdir.cleanup()" ] }, { "cell_type": "markdown", "id": "90049e04", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 5 }