Overview

The Query Router (QR) framework in Kx Control is used to manage client requests and database availability. The goal is to dispatch and load-balance queries efficiently to make the most of available resources.

In large, heavy-usage environments, there is often contention of resources with multiple clients trying to access the same processes. This can lead to long wait times against some processes, while others are idle. The framework aims to minimize such issues by acting as a layer between the clients and databases, managing the load and making more efficient routing decisions.

Screenshot

The image above shows the lifecycle of a request.

The framework consists of two main components: the Query Router (QR) and the Query Processor (QP). The QR is the entry point for client requests and performs the routing decisions. It keeps track of database availability for every process registered with it, only ever routing to available ones.

Once a query is ready to be dispatched, it gets assigned to a QP process. The selected QP will send the request to the database and collect the results. Once complete it will send the result back to the client directly. This asymmetric return path means the QR is isolated from the overhead of handling results. All communications in the framework are asynchronous to avoid blocking if a single process gets locked. This includes all communications with end-clients.

By default, the framework is deployed as a single cluster, where this is one set of QR and QP processes. The QRs run in a hot-warm cluster where only the master handles requests and the rest are available to take over should the master fail. The QPs are relatively state-less so operate hot-hot. They are assigned requests in a load-balanced manner. The framework can be split into shards to handle increasing loads.

Sharding

The QRs and QPs are deployed as standard process templates and instances. In their definitions, they provide analytic hooks which allow developers to customize selected behaviors. For the QR, this includes;

  • How it dispatches requests: could prioritize certain users or types of requests, but by default, dispatches first in, first out
  • Choose what QP to assign to: by default assigns to the least busy
  • Decide what routings to apply: used for for routed requests
  • How to parse the incoming request: for deciding if routed or not

Request targets

For standard requests, a client is required to know the name of the target they want to execute against. There are a couple of types of targets;

  • instances
  • connections
  • connection groups
  • service classes

Instances and connections each correspond to a single process running on a single server. While this is useful for some cases, it lacks in load-balancing and failover, as it requires the client to know of each underlying database. If one fails, the client would need to switch to another.

Connection groups and service classes are groups of multiple processes. The client specifies the name of one of these and the QR assigns the request to an available process in that group according to the configured mode for it.

There are three supported database types and the logic for each is described in the table below.

type logic
Default The database group will define the component connections with primary or backup status. The QR will route to the first available primary process or if none are connected, it will select the first available backup.
Round Robin Uses the same logic as above but rotates through the list of primaries or backups.
Mastered Processes register with their master/slave state. There should only ever be one master. QR will select this process when dispatching queries providing it is available. If the master goes down, another process should become the master and re-register this information with the QR.

Routed requests

By default, a single request operates on a single underlying database. Routed requests provide the ability to hit multiple databases from a single client request. The QR splits this type of request into one or more sub-requests and processes each of them separately. When all sub-requests complete, the results are aggregated and sent to the client.

Routed requests

Synchronous support

As mentioned before, all communication in the framework is asynchronous in order to prevent blocking and increase concurrency. However not all clients can support asynchronous requests or use a supported client interface so the ability to execute sync requests is provided.

Sync requests

Polling requests

Most requests are single-shot, meaning they're executed once only. For clients that require polling requests, support is provided.

Polling requests

Clients

To execute requests through the framework, a client must use one of the supported interfaces (kdb+, Java and C#). The kdb+ client is available in all kdb+ processes launched from Kx Control and full details of the APIs are in the Template API Guide. The Java and C# documentation is bundled with their respective packages.

Upon registering with the framework, the client will automatically register with all QR and QP processes. This process is asynchronous so a callback is provided to execute when complete. The interface supports one-shot or polling queries.

The registration API also allows the client to specify a heartbeat frequency. This will send heartbeats to the master QR process on a timer to check the process is still responsive, guarding against server or network failures. If a heartbeat timeout occurs, it will disconnect and retry registration with the cluster again.

Customizations

The QRs and QPs are deployed as standard process templates and instances. In their definitions, they provide analytic hooks which allow developers to customize selected behaviors. For the QR, this includes;

  • How it dispatches requests: could prioritize certain users or types of requests, but by default, dispatches first in, first out
  • Choose what QP to assign to: by default assigns to the least busy
  • Decide what routings to apply: used for for routed requests
  • How to parse the incoming request: for deciding if routed or not

Failover and timeouts

The QR processes run in a hot-warm cluster where multiple run at the same time but only one is designated as master to handle requests. The rest are available to take over should the master fail. The QPs are relatively state-less, so operate hot-hot, where every running QP is assigned queries in a load-balanced manner.

Requests will be timed out if they don’t complete within a configured interval. This protects the framework from continuing to target unavailable resources, which can be triggered by events such as server, network or application failures. If the query breaches the timeout threshold, the request will be marked as expired and the client notified.

There are two main reasons a timeout can occur;

  • Database too busy to service the request or it takes too long to execute. More often than not this is the cause
  • Server or network failure between the components

After a timeout occurs, databases that haven't re-registered as available will be disconnected. This protects further requests from targeting them and forces them to re-register when they're healthy again.

QP heartbeats

QPs can also be the cause of a timeout due to either of the reasons above. To monitor their health, these processes heartbeat to the QR. If the QR doesn't receive heartbeats within a configured period, it will disconnect the QP and remove it from the rotation. This protects further requests from being dispatched to an unavailable QP and causing further timeouts.

The QR heartbeats are enabled by default with a frequency of 30s and a timeout of 45. These values can be changed using the .qr.qpHeartbeatFreq and .qr.qpHeartbeatTimeout instance parameters.

Instance parameters

Deferred requests

When running a request on a database, the result is expected to be returned to the QP immediately. In some cases, the user might want to defer the result to complete some other tasks, i.e. if not all data is available for the result. In this case, the user would delay returning until the it received the missing data.

A sample workflow might look like this:

  • Client sends request to QR
  • Request is passed through and executed on the database
  • Request code notifies that the result will be deferred until later and awaits for a condition to be met
  • When the result is ready, it is passed back to the framework with a correlator ID
  • The result is then sent back to the QP and to the original client

The API calls are detailed in the QueryRouter.Deferred module in the Template API guide.

Triggered failover

By default the QR cluster will decide the master during handshaking of the processes. The master will only change if the previous master fails or becomes unavailable. However, solutions can manually trigger a failover of the QR cluster and direct which should become master by specifying the preferred instances. The cluster will then trigger a failover and choose the first available preferred instance as master. Some examples where this is useful:

  • Primary database cluster changes and the QR master needs to be close to the data
  • Where the current QR master is unreliable due to environmental issues

The QR cluster will subscribe to the QR topic so the solution code simply needs to broadcast a failover message for that topic with the preferred list of instances. The topic subscription and callback are already handled in the QR processes.

Failover APIs

Housekeeping

By default the QR will run housekeeping every 60 minutes. This housekeeping removes completed requests by trimming the request and query tables. By default garbage collection isn't performed but this can enabled using the .qr.runGC instance parameter.