Skip to content

christiancadieux/kubequery-postgres

Repository files navigation

kubequery Aggregator using Postgres

centralized kubequery information in Postgres database.

Architecture

kubequery

Create a centralized 'loader' that polls a collection of kubernetes clusters using kubequery and store all results in a single Postgres database using the same schema as kubequery.

Features

Kubequery is usually installed on specific clusters and used to make 'live' queries against the local kubernetes resources of that cluster using SQL. This is a different approach where this software and kubequery is installed on only one cluster(the aggregator) and polls other clusters(the targets) using kubequery remote-access. The results are saved in a centralized Postgres database.

Benefits:

  • Allows to store kube resources from hundreds of clusters in a single Postgres database.

  • Allows to make almost real-time queries against hundreds of clusters since all the data is in the same Postgres database.

  • Index can be added to the PG database for improved access time. This is not available on the original sqlite virtual tables.

  • Polling the clusters and saving the results in the database is fast and can be done every 10 minutes if needed. Clusters can be polled in parallel for improved speed.

  • On large clusters, SQL joins can be really slow(hours) against the sqlite database used by kubequery. The Postgres tables are not virtual and can be optimized to solve this problem.

  • Some of the kubequery schema columns like labels and annotations are TEXT but contain json information. These columns can be converted to Postgres JSONB for easy access.

  • Only the basic "select * from kubernetes_.." queries are executed by kubequery on the target clusters during aggregation. All complex queries (JOINS, multi-cluster) are done on the Postgres database.

  • Note: For speed, the Postgres database should be located on the aggregator cluster.

Kubequery changes

For this application, kubequery had to be modified to support remote cluster access. Currently, kubequery assumes that it is running inside the k8s cluster being probed. This change allows kubequery to connect to a remote cluster and extract resource data. See ./kubequery/ for details.

Kubequery script using parameters generated by the loader:

#!/bin/sh
# runquery <table> <token> <cluster_addr> <cluster> <uuid> <ix>

TABLE=$1
export KUBEQUERY_TOKEN=$2
export KUBEQUERY_ADDR=$3
export CLUSTER_NAME=$4
export CLUSTER_UID=$5
export CLUSTER_IX=$6

echo "select * from $TABLE;" | /opt/uptycs/bin/basequery  --flagfile=/opt/uptycs/etc/kubequery.flags  --config_path=/opt/uptycs/etc/kubequery.conf --extensions_socket=/opt/uptycs/var/kubequeryi.em$TABLE.$CLUSTER_IX  --extensions_autoload=/opt/uptycs/etc/autoload.exts  --extensions_require=kubequery  --extension_event_tables=kubernetes_events  --disable_database  --json  --disable_events=false  -S > /tmp/${CLUSTER_NAME}-$TABLE.json

Example queries

Note: The strings have been obfuscated. Query time is ~300ms.

# PODS BY REGIONS (using an 'additional field')
=> select region, count(distinct cluster_name) cluster_cnt, count(distinct namespace) ns_count , count(*) pod_cnt from kubernetes_pods group by region;
        region         | cluster_cnt | ns_count | pod_cnt 
-----------------------+-------------+----------+---------
 national1             |           4 |      343 |    9636
 national2             |          25 |       43 |   30324
 national3             |           4 |       29 |    9988
 regional1             |           8 |       49 |    8342
 regional2             |           8 |       50 |    6845
 regional4             |           4 |       16 |    2943
 regional5             |           6 |       33 |    3062
 regional6             |           6 |       38 |    4489
 regional7             |           8 |       33 |    7748
 regional8             |           3 |       42 |    1650
 regional9             |           3 |       36 |   20784

(11 rows)

Time: 314.390 ms


# MOST POPULAR CONTAINER IMAGES
=> select c.image, count(*) pod_cnt from kubernetes_pod_containers c group by c.image order by count(*) desc limit 10;
                            image                             | pod_cnt
--------------------------------------------------------------+--------
 hub.obfusca.net/xxxxx-packager/system/system:3.5.1-1         | 18080
 hub.obfusca.net/xxxxx-arch/varnish-image:7.0.2-dev1          | 11376
 hub.obfusca.net/xxxxx-packager/system/system:3.3.2-1         | 10820
 hub.obfusca.net/library/telegraf:1.13.4                      |  5434
 registry.xxxxxxx.net/anchorfree/twemproxy:latest             |  5384
 hub.obfusca.net/rio/services/rccs-decoder:1.1.13_959         |  4217
 hub.obfusca.net/k8s-eng/xxxxx/rdei/k8s-dns-node-cache:1.21.1 |  3727
 hub.obfusca.net/xxxxx/rdei/node-exporter:v1.3.1              |  3382
 hub.obfusca.net/k8s-eng/xxxxx/rdei/sumatra:0.42.07           |  3315
 hub.obfusca.net/rio/services/bmw:1.9.0_1473                  |  3299
(10 rows)

Time: 230.233 ms
 

Sample Loader Run

# concurrency=5
$ /loader -c 5 

4 Cluster START testcluster-01 - https://testcluster-01:6443 
1 Cluster START testcluster2-01 - https://testcluster2-01:6443 
2 Cluster START testcluster3-01 - https://testcluster3-01:6443 
3 Cluster START testcluster5-01 - https://testcluster5-01:6443 
0 Cluster START testcluster6-01 - https://testcluster-02:6443 
------------------------------------------------------------------------------
2 Cluster REPORT testcluster3-01 - https://testcluster3-01:6443 
------------------------------------------------------------------------------
  testcluster3-01 kubernetes_namespaces TABLE
  testcluster3-01 kubernetes_namespaces Rows=19, Duration=1.05929269s
  testcluster3-01 kubernetes_nodes TABLE
  testcluster3-01 kubernetes_nodes Rows=6, Duration=976.341347ms
  testcluster3-01 kubernetes_pods TABLE
  testcluster3-01 kubernetes_pods Rows=204, Duration=1.688286795s
  testcluster3-01 kubernetes_pod_containers TABLE
  testcluster3-01 kubernetes_pod_containers Rows=274, Duration=1.883914113s

5 Cluster START testcluster5 - https://testcluster5:6443 
------------------------------------------------------------------------------
4 Cluster REPORT testcluster-01 - https://testcluster-01:6443 
------------------------------------------------------------------------------
  testcluster-01 kubernetes_namespaces TABLE
  testcluster-01 kubernetes_namespaces Rows=33, Duration=894.902131ms
  testcluster-01 kubernetes_nodes TABLE
  testcluster-01 kubernetes_nodes Rows=60, Duration=970.209593ms
  testcluster-01 kubernetes_pods TABLE
  testcluster-01 kubernetes_pods Rows=647, Duration=2.524690614s
  testcluster-01 kubernetes_pod_containers TABLE
  testcluster-01 kubernetes_pod_containers Rows=953, Duration=3.389727493s

...

About

centralized kubequery postgres database

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published