Sizing Guidance - User Node Pool
The 'User Node Pool' on your Azure AKS cluster is the powerhouse for your data capture, processing and querying. The Reference Lookup provides a quick guideline on initial sizing for systems until their exact usage profile is established.
The following are some specific use cases. For variations see Reference Lookup.
| persona | description | suggested 'user node pool' |
| --- | --- | --- |
| Data Scientist | Expects to work with datasets of up to 10 million records per day (4 GiB / day) using queries of Moderate complexity | 3 x Standard_D8s_v5 |
| Data Engineer | Expects to connect real-time financial datasets of up to 4 billion records per day (600 GiB / day). Streaming logic of Medium Memory Usage will complement Complex queries. | 4 x Standard_D64ds_v5 |
With reference to the definitions of Query Complexity and Streaming Logic below, the following table provides guidance on User Node Pool sizes, expressed as node count x memory (GiB) per node, for data volumes up to the daily volume (GiB / day) listed in each column header. A short lookup sketch follows the definitions below.
| query complexity | streaming logic | 10 GiB / day | 30 GiB / day | 750 GiB / day | 2000 GiB / day | 3000 GiB / day | 4000 GiB / day |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Simple | Low Memory Usage | 4 x 16 | 3 x 32 | 3 x 128 | 3 x 256 | 3 x 384 | 3 x 512 |
| Simple | Medium Memory Usage | 4 x 16 | 3 x 32 | 3 x 128 | 3 x 256 | 3 x 384 | 3 x 512 |
| Simple | High Memory Usage | 5 x 16 | 4 x 32 | 4 x 128 | 4 x 256 | 4 x 384 | 4 x 512 |
| Moderate | Low Memory Usage | 4 x 16 | 3 x 32 | 3 x 128 | 3 x 256 | 3 x 384 | 3 x 512 |
| Moderate | Medium Memory Usage | 4 x 16 | 4 x 32 | 4 x 128 | 4 x 256 | 4 x 384 | 4 x 512 |
| Moderate | High Memory Usage | 5 x 16 | 4 x 32 | 4 x 128 | 4 x 256 | 4 x 384 | 4 x 512 |
| Complex | Low Memory Usage | 4 x 32 | 3 x 64 | 4 x 256 | 4 x 384 | 4 x 512 | 4 x 672 |
| Complex | Medium Memory Usage | 4 x 32 | 4 x 64 | 4 x 256 | 4 x 384 | 4 x 512 | 4 x 672 |
| Complex | High Memory Usage | 4 x 32 | 4 x 64 | 4 x 256 | 4 x 384 | 4 x 512 | 4 x 672 |
Note: a number of Data Access points are deployed by default. To service additional concurrent queries, these may need to be scaled further.
| query complexity | description |
| --- | --- |
| Simple | Short time windows (e.g. small result sets)<br>Non-complex query logic<br>Quick execution < 10ms |
| Moderate | Large time windows with aggregations (e.g. small result sets)<br>Execution time < 1sec (although < 500ms should cover most) |
| Complex | Large time windows and/or large datasets<br>Complex query logic<br>Execution time > 1sec |
| streaming logic | description |
| --- | --- |
| Low Memory Usage | In-flight calculations<br>Decoding of file format for ingestion and storage |
| Medium Memory Usage | Transformations: simple aggregations and time bucketing |
| High Memory Usage | Complex data joins over significant time periods<br>In-flight actions (ML, AI)<br>OR multiple medium memory pipelines |
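For estimation purposes, the sizing table can be expressed as a small lookup. The sketch below is a minimal Python illustration, not part of the product: `suggest_pool` is a hypothetical helper, and the values are transcribed from the table above, with each entry being node count x memory (GiB) per node.

```python
# Minimal sketch: the User Node Pool sizing table as a lookup.
# Each entry is (node count, memory GiB per node); the inner dict keys are the
# daily-volume bands (GiB / day) from the table's column headers.
SIZING = {
    ("Simple",   "Low"):    {10: (4, 16), 30: (3, 32), 750: (3, 128), 2000: (3, 256), 3000: (3, 384), 4000: (3, 512)},
    ("Simple",   "Medium"): {10: (4, 16), 30: (3, 32), 750: (3, 128), 2000: (3, 256), 3000: (3, 384), 4000: (3, 512)},
    ("Simple",   "High"):   {10: (5, 16), 30: (4, 32), 750: (4, 128), 2000: (4, 256), 3000: (4, 384), 4000: (4, 512)},
    ("Moderate", "Low"):    {10: (4, 16), 30: (3, 32), 750: (3, 128), 2000: (3, 256), 3000: (3, 384), 4000: (3, 512)},
    ("Moderate", "Medium"): {10: (4, 16), 30: (4, 32), 750: (4, 128), 2000: (4, 256), 3000: (4, 384), 4000: (4, 512)},
    ("Moderate", "High"):   {10: (5, 16), 30: (4, 32), 750: (4, 128), 2000: (4, 256), 3000: (4, 384), 4000: (4, 512)},
    ("Complex",  "Low"):    {10: (4, 32), 30: (3, 64), 750: (4, 256), 2000: (4, 384), 3000: (4, 512), 4000: (4, 672)},
    ("Complex",  "Medium"): {10: (4, 32), 30: (4, 64), 750: (4, 256), 2000: (4, 384), 3000: (4, 512), 4000: (4, 672)},
    ("Complex",  "High"):   {10: (4, 32), 30: (4, 64), 750: (4, 256), 2000: (4, 384), 3000: (4, 512), 4000: (4, 672)},
}

def suggest_pool(complexity: str, streaming: str, gib_per_day: float) -> tuple[int, int]:
    """Return (node count, memory GiB per node) for the smallest volume band
    that covers the requested daily volume."""
    bands = SIZING[(complexity, streaming)]
    for limit in sorted(bands):
        if gib_per_day <= limit:
            return bands[limit]
    raise ValueError("volume exceeds the guidance table; size this case individually")

# Data Engineer persona: Complex queries, Medium Memory Usage streaming, ~600 GiB / day
print(suggest_pool("Complex", "Medium", 600))  # (4, 256)
```

For example, the Data Engineer persona above resolves to 4 x 256 GiB, in line with the 4 x Standard_D64ds_v5 suggestion (a Standard_D64ds_v5 provides 256 GiB of memory).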
How much data do I have?
For the majority of use cases, the amount of data being captured is the biggest factor driving infrastructure sizing.
The following table provides guidance on data volumes, assuming a 50-column table.
| range | rows / day (realtime) | node size for data capture (GiB) | SKU (excluding local storage) | SKU (including local SSD storage for rook-ceph) |
| --- | --- | --- | --- | --- |
| < 30 GiB / day | 90,000,000 | 32 | Standard_D8s_v5 | rook-ceph not recommended given the additional resource requirement |
| < 75 GiB / day | 200,000,000 | 64 | Standard_D16s_v5 | Standard_D16ds_v5 |
| 75 => 1000 GiB / day | 3,000,000,000 | 128 | Standard_D32s_v5 | Standard_D32ds_v5 |
| 1000 => 2500 GiB / day | 7,000,000,000 | 256 | Standard_E32s_v5 / | |
| 2500 => 3500 GiB / day | 10,000,000,000 | 384 | Standard_E48s_v5 / | |
| 3500 => 5000 GiB / day | 14,000,000,000 | 512 | Standard_E64s_v5 | Standard_E64ds_v5 |
- For sizing purposes the concept of fields is used. Field entries are based on the multiplication of rows by columns, e.g. 15 fields could be 5 rows x 3 columns or vice versa. For estimation a field size of 8 bytes is used (for variations see https://code.kx.com/q/basics/datatypes/). A worked example follows this list.
- SKUs are for guidance only and may not suit every use case, depending on performance, cost, quota or configuration preferences.
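As a worked example of the field-size rule above, the hypothetical helper below (a Python sketch, not part of the product) estimates daily volume from row and column counts, using the 8-byte average field size and 50-column assumption already stated in this section.

```python
# Minimal sketch: estimate daily capture volume as rows x columns x 8 bytes.
FIELD_BYTES = 8  # assumed average field size (see the bullet above)

def gib_per_day(rows_per_day: int, columns: int) -> float:
    """Approximate raw capture volume in GiB per day."""
    return rows_per_day * columns * FIELD_BYTES / 2**30

# Data Scientist persona: 10 million records/day over an assumed 50-column table
print(round(gib_per_day(10_000_000, 50), 1))   # ~3.7, i.e. the quoted ~4 GiB / day

# 200 million rows/day over a 50-column table
print(round(gib_per_day(200_000_000, 50), 1))  # ~74.5, i.e. the "< 75 GiB / day" band
```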
What if my requirements change?
Sizing requirements can be adjusted via configuration changes, often with little interruption to your system. Right-sizing and cost optimisation are easiest with a predictable usage profile.
What else impacts infrastructure sizing?
If your use case involves a considerable amount of late data, this will impact your sizing needs.
The memory required to capture data often provides ample vCPU for the associated processing and query workloads, e.g. a 128 GiB server will often include 32 vCPU (see the brief sketch after the list below).
Exceptions to this rule would be:
- complex data pipelines - for example, pipelines leveraging multiple workers may need additional vCPU to maximise throughput
- additional shards - where data is split to reduce the maximum memory requirement, this also distributes, and slightly increases, the vCPU burden.
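To illustrate the ratio, the D-series v5 SKUs used for capture provide roughly 4 GiB of memory per vCPU (the E-series v5 roughly 8 GiB per vCPU), so sizing for memory generally brings a proportionate vCPU allocation with it. A brief sketch using publicly documented SKU specs:

```python
# Quick check of the memory-to-vCPU ratio, using public specs of the
# D-series v5 SKUs from the capture table: (vCPU, memory GiB).
D_SERIES_V5 = {
    "Standard_D8s_v5":  (8, 32),
    "Standard_D16s_v5": (16, 64),
    "Standard_D32s_v5": (32, 128),
}

for sku, (vcpu, mem_gib) in D_SERIES_V5.items():
    print(f"{sku}: {mem_gib // vcpu} GiB per vCPU")  # 4 GiB per vCPU throughout
```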
Why do I need 3 nodes?
The resilience model utilised requires at least 3 nodes in this pool (see docs on RT).