grafana · dimitarvdimitrov · Dec 10, 2024 · Nov 24, 2024 · Dec 10, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -169,6 +169,7 @@
 
 ### Documentation
 
+* [CHANGE] Add production tips related to cache size, heavy multi-tenancy and latency spikes. #9978
 * [BUGFIX] Send native histograms: update the migration guide with the corrected dashboard query for switching between classic and native histograms queries. #10052
 
 ### Tools

@@ -147,6 +147,12 @@ The chunks caches store portions of time series samples fetched from object stor
 Entries in this cache tend to be large (several kilobytes) and are fetched in batches by the store-gateway components.
 This results in higher bandwidth usage compared to other caches.
 
+### Cache size
+
+The Memcached [extstore](https://docs.memcached.org/features/flashstorage/) feature lets you extend Memcached’s memory space onto flash (or similar) storage.
+
+Refer to [how we scaled Grafana Cloud Logs' Memcached cluster to 50TB and improved reliability](https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/).
+
 ## Security
 
 We recommend securing the Grafana Mimir cluster.
@@ -176,3 +182,20 @@ To configure gRPC compression, use the following CLI flags or their YAML equival
 | `-ruler.query-frontend.grpc-client-config.grpc-compression` | `ingester_client.grpc_client_config.grpc_compression`      |
 | `-alertmanager.alertmanager-client.grpc-compression`        | `query_scheduler.grpc_client_config.grpc_compression`      |
 | `-ingester.client.grpc-compression`                         | `ruler.query_frontend.grpc_client_config.grpc_compression` |
+
+## Heavy multi-tenancy
+
+For each tenant, Mimir opens and maintains a TSDB in memory. If you have a significant number of tenants, the memory overhead might become prohibitive.
+To reduce the associated overhead, consider the following:
+
+- Reduce `-blocks-storage.tsdb.head-chunks-write-buffer-size-bytes`, default `4MB`. For example, try `1MB` or `128KB`.
+- Reduce `-blocks-storage.tsdb.stripe-size`, default `16384`. For example, try `256`, or even `64`.
+- Configure [shuffle sharding](https://grafana.com/docs/mimir/latest/configure/configure-shuffle-sharding/).
+
+## Periodic latency spikes when cutting blocks
+
+Depending on the workload, you might observe latency spikes when Mimir cuts blocks.
+To reduce the impact of this behavior, consider the following:
+
+- Upgrade to Mimir `2.15` or later. For details, refer to <https://github.com/grafana/mimir/commit/03f2f06e1247e997a0246d72f5c2c1fd9bd386df>.
+- Reduce `-blocks-storage.tsdb.block-ranges-period`, default `2h`. For example, try `1h`.