docs: Replace SSD sizing tool with cluster tiers for distributed/microservices mode #15820
Open
poyzannur wants to merge 9 commits into main from poyzan/update-sizing-calculator
+56 −83
Commits (all by poyzannur):
- ff37a37 add new tables
- c4558f8 use tabs instead
- c24071e Add summary and general notes
- 9546b8b remove whitespace
- 3812456 add total numbers and adjust order
- c3d679e improve text
- 995f7a4 fix grammer
- 04192aa add link
- df9cf45 remove no list
@@ -1,7 +1,4 @@
---
_build:
list: false
noindex: true
title: Size the cluster
menuTitle: Size the cluster
description: Provides a tool that generates a Helm Chart values.yaml file based on expected ingestion, retention rate, and node type, to help size your Grafana deployment.

@@ -17,72 +14,70 @@ weight: 100
<!-- vale Grafana.Quotes = NO -->
<!-- vale Grafana.Quotes = YES -->

This tool helps to generate a Helm Charts `values.yaml` file based on specified
expected ingestion, retention rate and node type. It will always configure a
[scalable]({{< relref "../../get-started/deployment-modes#simple-scalable" >}}) deployment. The storage needs to be configured after generation.
This section is a guide to sizing the base resource needs of a Loki cluster.

Based on the expected ingestion volume, Loki clusters can be categorized into three tiers. The recommendations below are based on the p90 resource utilization of the relevant components. Each tab represents a different tier.
Use this document as a rough guide for setting CPU and memory requests in your deployment. It currently covers only [microservices/distributed](https://grafana.com/docs/loki/latest/get-started/deployment-modes/#microservices-mode) mode.
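As a rough illustration of turning a table row into deployment settings, the JavaScript sketch below converts a per-replica CPU request (in cores) and memory request (in Gi) into Kubernetes-style quantity strings (millicores and Mi). The helper name `toResourceRequests` and the returned shape are hypothetical and not part of Loki or its Helm chart; where such an object goes in your values file depends on the chart you deploy with.

```javascript
// Hypothetical helper: converts one sizing-table row into a Kubernetes-style
// `resources.requests` object. Not part of Loki or its Helm chart.
function toResourceRequests(cpuCores, memoryGi) {
  // CPU in millicores: 0.5 cores -> "500m", 2 cores -> "2000m".
  const cpu = `${Math.round(cpuCores * 1000)}m`;
  // Memory in mebibytes: 0.5 Gi -> "512Mi", 2 Gi -> "2048Mi".
  const memory = `${Math.round(memoryGi * 1024)}Mi`;
  return { requests: { cpu, memory } };
}

// Example: an index gateway replica requesting 0.5 CPU and 2 Gi.
console.log(JSON.stringify(toResourceRequests(0.5, 2)));
// {"requests":{"cpu":"500m","memory":"2048Mi"}}
```

Millicores and Mi are used only so that fractional table values such as 0.5 become whole-number quantities.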

Query resource needs can vary greatly depending on usage patterns and configuration. General notes on query performance:
- The rule of thumb is to run as many small queriers as possible. Unoptimized queries can easily require 10x the suggested querier resources in every tier. Horizontal autoscaling is the most cost-effective way to meet the demand.
- Use this [blog post](https://grafana.com/blog/2023/12/28/the-concise-guide-to-loki-how-to-get-the-most-out-of-your-query-performance/) to adopt best practices for optimized query performance.
- Parallel-querier and related components can initially be sized the same as queriers, depending on how heavily Loki rules are used.
- Large Loki clusters benefit from a disk-based caching solution, memcached-extstore. See the detailed [blog post](https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/) and read more about [memcached/nvm-caching here](https://memcached.org/blog/nvm-caching/).
- If your cluster ingests less than 30TB/day (~1PB/month), we do not recommend configuring memcached-extstore. The additional operational complexity does not justify the savings.
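The tier boundaries and the memcached-extstore note can be expressed as a small lookup. This JavaScript sketch assumes ingestion is given in GB/day with decimal units (1 TB = 1000 GB) and takes the boundaries from the tier names (3TB/day and 30TB/day). The function `sizingTier` and its tier labels are hypothetical and only illustrate the thresholds described in this section.

```javascript
// Hypothetical helper: picks a sizing tier from expected ingestion.
// Assumes GB/day input and decimal units (1 TB = 1000 GB).
function sizingTier(ingestGBPerDay) {
  const tbPerDay = ingestGBPerDay / 1000;
  let tier;
  if (tbPerDay < 3) {
    tier = "<3TB/day";
  } else if (tbPerDay < 30) {
    tier = "3-30TB/day";
  } else {
    // The largest documented tier is ~30TB/day (~1PB/month); much larger
    // volumes fall outside the tables in this guide.
    tier = "~30TB/day";
  }
  // Below 30TB/day the guide does not recommend memcached-extstore.
  const considerExtstore = tbPerDay >= 30;
  return { tbPerDay, tier, considerExtstore };
}

console.log(sizingTier(500).tier);               // <3TB/day
console.log(sizingTier(12000).tier);             // 3-30TB/day
console.log(sizingTier(45000).considerExtstore); // true
```

Treating exactly 3TB/day and exactly 30TB/day as belonging to the higher tier is an arbitrary choice for the sketch; the guide itself only gives approximate ranges.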

These are the node types we suggest from various cloud providers. Refer to the provider documentation for the relevant specs.
<div id="app">
<label>Node Type<i class="fa fa-question" v-on:mouseover="help='node'" v-on:mouseleave="help=null"></i></label>
<select name="node-type" v-model="node">
<option v-for="node of nodes">{{ node }}</option>
</select>
<label>Ingest<i class="fa fa-question" v-on:mouseover="help='ingest'" v-on:mouseleave="help=null"></i></label>
<div style="display: flex;">
<input style="padding-right:4.5em;" v-model="ingestInGB" name="ingest" placeholder="Desired ingest in GB/day" type="number" max="1048576" min="0"/>
<span style="margin: auto auto auto -4em;">GB/day</span>
</div>
<label>Log retention period<i class="fa fa-question" v-on:mouseover="help='retention'" v-on:mouseleave="help=null"></i></label>
<div style="display: flex;">
<input style="padding-right:4.5em;" v-model="retention" name="retention" placeholder="Desired retention period in days" type="number" min="0"/>
<span style="margin: auto auto auto -4em;">days</span>
</div>
<label>Query performance<i class="fa fa-question" v-on:mouseover="help='queryperf'" v-on:mouseleave="help=null"></i></label>
<div id="queryperf" style="display: inline-flex;">
<label for="basic">
<input type="radio" id="basic" value="Basic" v-model="queryperf"/>Basic
</label>
<label for="super">
<input type="radio" id="super" value="Super" v-model="queryperf"/>Super
</label>
</div>

<div v-if="clusterSize">
<table>
<tr>
<th>Read Replicas</th>
<th>Write Replicas</th>
<th>Nodes</th>
<th>Cores</th>
<th>Memory</th>
</tr>
<tr>
<td>{{ clusterSize.TotalReadReplicas }}</td>
<td>{{ clusterSize.TotalWriteReplicas }}</td>
<td>{{ clusterSize.TotalNodes}}</td>
<td>{{ clusterSize.TotalCoresRequest}}</td>
<td>{{ clusterSize.TotalMemoryRequest}} GB</td>
</tr>
</table>
</div>

<a v-bind:href="helmURL" class="primary-button">Generate and download values file</a>
<label>Node Type<i class="fa fa-question" v-on:mouseover="help='node'" v-on:mouseleave="help=null"></i></label>
<select name="node-type" v-model="node">
<option v-for="node of nodes">{{ node }}</option>
</select><br>
</div>

{{< tabs >}}
{{< tab-content name="Less than 100TB/month (3TB/day)" >}}
| Component | CPU Request (cores) | Memory Request (Gi) | Base Replicas | Total CPU Req (cores) | Total Mem Req (Gi) |
|-----------------|-----|-----|---------------|-----|-----|
| Ingester | 2 | 4 | 6 | 12 | 24 |
| Distributor | 2 | 0.5 | 4 | 8 | 2 |
| Index gateway | 0.5 | 2 | 4 | 2 | 8 |
| Querier | 1 | 1 | 10 | 10 | 10 |
| Query-frontend | 1 | 2 | 2 | 2 | 4 |
| Query-scheduler | 1 | 0.5 | 2 | 2 | 1 |
| Compactor | 2 | 10 | 1 (Singleton) | 2 | 10 |
{{< /tab-content >}}
{{< tab-content name="100TB to 1PB/month (3-30TB/day)" >}}
| Component | CPU Request (cores) | Memory Request (Gi) | Base Replicas | Total CPU Req (cores) | Total Mem Req (Gi) |
|-----------------|-----|-----|---------------|-----|-----|
| Ingester | 2 | 6 | 90 | 180 | 540 |
| Distributor | 2 | 1 | 40 | 80 | 40 |
| Index gateway | 0.5 | 4 | 10 | 5 | 40 |
| Querier | 1.5 | 2 | 100 | 150 | 200 |
| Query-frontend | 1 | 2 | 8 | 8 | 16 |
| Query-scheduler | 1 | 0.5 | 2 | 2 | 1 |
| Compactor | 6 | 20 | 1 (Singleton) | 6 | 20 |
{{< /tab-content >}}
{{< tab-content name="~1PB/month (30TB/day)" >}}
| Component | CPU Request (cores) | Memory Request (Gi) | Base Replicas | Total CPU Req (cores) | Total Mem Req (Gi) |
|-----------------|-----|-----|---------------|-----|-----|
| Ingester | 4 | 8 | 150 | 600 | 1200 |
| Distributor | 2 | 1 | 100 | 200 | 100 |
| Index gateway | 1 | 4 | 20 | 20 | 80 |
| Querier | 1.5 | 3 | 250 | 375 | 750 |
| Query-frontend | 1 | 4 | 16 | 16 | 64 |
| Query-scheduler | 2 | 0.5 | 2 | 4 | 1 |
| Compactor | 6 | 40 | 1 (Singleton) | 6 | 40 |
{{< /tab-content >}}
{{< /tabs >}}
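Each total column is the per-replica request multiplied by the base replica count. The JavaScript sketch below recomputes the totals for the smallest tier and sums them into cluster-wide requests; for example, the ingester row gives 2 × 6 = 12 CPU and 4 × 6 = 24 Gi. The data layout and the `clusterTotals` name are hypothetical, and the sums cover only the components listed in the table.

```javascript
// Rows copied from the "Less than 100TB/month (3TB/day)" tier:
// [component, CPU request (cores), memory request (Gi), base replicas]
const smallTier = [
  ["Ingester", 2, 4, 6],
  ["Distributor", 2, 0.5, 4],
  ["Index gateway", 0.5, 2, 4],
  ["Querier", 1, 1, 10],
  ["Query-frontend", 1, 2, 2],
  ["Query-scheduler", 1, 0.5, 2],
  ["Compactor", 2, 10, 1],
];

// Hypothetical helper: per-component totals are request × replicas;
// the cluster total is the sum over all listed components.
function clusterTotals(rows) {
  const components = rows.map(([name, cpu, memGi, replicas]) => ({
    name,
    cpu: cpu * replicas,
    memGi: memGi * replicas,
  }));
  const cpu = components.reduce((acc, c) => acc + c.cpu, 0);
  const memGi = components.reduce((acc, c) => acc + c.memGi, 0);
  return { components, cpu, memGi };
}

const result = clusterTotals(smallTier);
console.log(`Total: ${result.cpu} CPU, ${result.memGi} Gi`); // Total: 38 CPU, 59 Gi
```

The same function applies unchanged to the rows of the other two tiers.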

<blockquote v-if="help">
<span v-if="help === 'ingest'">
Defines the log volume in gigabytes, i.e. 1e+9 bytes, expected to be ingested each day.
</span>
<span v-else-if="help === 'node'">
Defines the node type of the Kubernetes cluster. Is a vendor or type
missing? If so, add it to <code>pkg/sizing/node.go</code>.
Review comment: Just looking at how this renders: I wonder if we move this span up and change the wording
</span>
<span v-else-if="help === 'retention'">
Defines how long the ingested logs should be kept.
</span>
<span v-else-if="help === 'queryperf'">
Defines the expected query performance. Basic is sized for a max query throughput of around 3GB/s. Super aims for 25% more throughput.
</span>
</blockquote>
</div>

<script src="https://unpkg.com/vue@3/dist/vue.global.prod.js"></script>
<style>

@@ -94,18 +89,10 @@ This tool helps to generate a Helm Charts `values.yaml` file based on specified
padding-left: 8px;
}

#app #queryperf label {
padding: 1em;
text-align: center;
}

#app #queryperf label input {
display: block;
}

#app a {
padding: .5em;

}
}
</style>

@@ -118,11 +105,7 @@ createApp({
return {
nodes: ["Loading..."],
node: "Loading...",
bytesDayIngest: null,
retention: null,
queryperf: 'Basic',
help: null,
clusterSize: null
}
},

@@ -159,20 +142,10 @@ createApp({
const url = `${API_URL}/nodes`
this.nodes = await (await fetch(url,{mode: 'cors'})).json()
},
async calculateClusterSize() {
if (this.node == 'Loading...' || this.bytesDayIngest== null || this.retention == null) {
return
}
const url = `${API_URL}/cluster?${this.queryString}`
this.clusterSize = await (await fetch(url,{mode: 'cors'})).json()
}
},

watch: {
node: 'calculateClusterSize',
bytesDayIngest: 'calculateClusterSize',
retention: 'calculateClusterSize',
queryperf: 'calculateClusterSize'
}
}).mount('#app')
</script>
Review comment: I can't select the right line as a comment but lines 2 - 4 will need to be removed so it appears in the docs