Commit

Merge branch 'main' into HANAonKVM-CSL
awolfatsuse authored Nov 14, 2023
2 parents 11a48e6 + a3874bf commit 87e5b6d
Showing 45 changed files with 1,623 additions and 544 deletions.
2 changes: 1 addition & 1 deletion DC-SBP-SLES4SAP-sap-infra-monitoring
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ ADOC_TYPE="article"

ADOC_POST="yes"

ADOC_ATTRIBUTES="--attribute docdate=2022-02-15"
ADOC_ATTRIBUTES="--attribute docdate=2023-09-29"

# stylesheets
STYLEROOT=/usr/share/xml/docbook/stylesheet/sbp
16 changes: 10 additions & 6 deletions adoc/SAP-S4HA10-setup-simplemount-sle15.adoc
@@ -625,7 +625,7 @@ For the ERS and ASCS instances, edit the instance profiles
profile directory _/usr/sap/{mySid}/SYS/profile/_.

Tell the `{sapStartSrv}` service to load the HA script connector library and to
use the connector `{s4sClConnector3}`. On the other hand, please make sure the
use the connector `{s4sClConnector3}`. On the other hand, make sure the
feature _Autostart_ is *not* used.

[subs="attributes"]
@@ -993,8 +993,10 @@ primitive rsc_sap_{mySID}_{myInstAscs} SAPInstance \
================================================
The shown SAPInstance monitor timeout is a trade-off between fast recovery of
the ASCS vs. resilience against sporadic temporary NFS issues. You may slightly
increase it to fit your infrastructure.
See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7) and ocf_suse_SAPStartSrv(7).
increase it to fit your infrastructure. Consult your storage or NFS server
documentation for appropriate timeout values.
See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7), ocf_suse_SAPStartSrv(7)
and nfs(5).

.ASCS group
================================================
@@ -1046,8 +1048,10 @@ primitive rsc_sap_{mySID}_{myInstErs} SAPInstance \
================================================
The shown SAPInstance monitor timeout is a trade-off between fast recovery of
the ERS vs. resilience against sporadic temporary NFS issues. You may slightly
increase it to fit your infrastructure.
See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7) and ocf_suse_SAPStartSrv(7).
increase it to fit your infrastructure. Consult your storage or NFS server
documentation for appropriate timeout values.
See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7), ocf_suse_SAPStartSrv(7)
and nfs(5).

.ERS group
================================================
@@ -2237,7 +2241,7 @@ Find below the Corosync configuration for one corosync ring. Ideally two rings w
[subs="specialchars,attributes"]
----
{my2nd1}:~ # cat /etc/corosync/corosync.conf
# Read the corosync.conf.5 manual page
# Please read the corosync.conf.5 manual page
totem {
version: 2
secauth: on
16 changes: 16 additions & 0 deletions adoc/SAP-S4HA10-setupguide-sle15.adoc
@@ -993,6 +993,14 @@ Verify the SBD cluster configuration and if needed, modify them as described.

First, configure the resources for the file system, IP address and the {sap}
instance. You need to adapt the parameters for your specific environment.
The shown file system and SAPInstance monitor timeouts are a trade-off between
fast recovery vs. resilience against sporadic temporary NFS issues. You may
slightly increase them to fit your infrastructure.
The SAPInstance timeout needs to be higher than the file system timeout.
Consult your storage or NFS server documentation for appropriate
timeout values.
See also manual pages ocf_heartbeat_Filesystem(7), ocf_heartbeat_SAPInstance(7)
and nfs(5).
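
The ordering rule for the two timeouts can be checked before loading the configuration. The following is only a minimal sketch: `sap_timeout_above_fs` is a hypothetical helper name, and the numbers in the usage comment are placeholders, not recommended values.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the SAPInstance monitor timeout
# is strictly higher than the file system monitor timeout.
# Both arguments are plain integers in seconds.
sap_timeout_above_fs() {
    fs_timeout="$1"
    sap_timeout="$2"
    [ "$sap_timeout" -gt "$fs_timeout" ]
}

# Example with placeholder values (not recommendations):
#   sap_timeout_above_fs 40 60 && echo "timeout ordering OK"
```

Take the real values from your own primitives and from your storage or NFS server documentation.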

.ASCS primitive
================================================
@@ -1042,6 +1050,14 @@ As user _root_, type the following command:

Second, configure the resources for the file system, IP address and the {sap}
instance. You need to adapt the parameters for your specific environment.
The shown file system and SAPInstance monitor timeouts are a trade-off between
fast recovery versus resilience against sporadic temporary NFS issues. You may
slightly increase them to fit your infrastructure.
The SAPInstance timeout needs to be higher than the file system timeout.
Consult your storage or NFS server documentation for appropriate
timeout values.
See also manual pages ocf_heartbeat_Filesystem(7), ocf_heartbeat_SAPInstance(7)
and nfs(5).

The specific parameter _IS_ERS=true_ must only be set for the ERS instance.

9 changes: 5 additions & 4 deletions adoc/SLES4SAP-hana-sr-guide-perfopt-15-aws.adoc
@@ -1864,12 +1864,13 @@ _crm-saphana.txt_, and load it with the command:
.Typical Resource Agent parameter settings for different scenarios
[width="99%",cols="52%,16%,16%,16%",options="header",]
|============================================================
|Parameter |Performance Optimized |Cost Optimized |Multi-Tier
|PREFER_SITE_TAKEOVER |true |false |false / true
|AUTOMATED_REGISTER |false / true |false / true |false
|DUPLICATE_PRIMARY_TIMEOUT |7200 |7200 |7200
|Parameter |Performance Optimized |Cost Optimized |Multi-Tier |Multi-Target
|PREFER_SITE_TAKEOVER |true |false |false / true |false / true
|AUTOMATED_REGISTER |false / true |false / true |false |true / false
|DUPLICATE_PRIMARY_TIMEOUT |7200 |7200 |7200 |7200
|============================================================


// TODO PRIO1: Check if all parameters in special DUPLICATE_PRIMARY_TIMEOUT
// are explained well

79 changes: 79 additions & 0 deletions adoc/SLES4SAP-sap-infra-monitoring-alertmanager.adoc
@@ -0,0 +1,79 @@
// Alertmanager adoc file
// Please use the following line to implement each tagged content to the main document:
// include::SLES4SAP-sap-infra-monitoring-alertmanager.adoc[tag=alert-XXXXX]

// Alertmanager general
# tag::alert-general[]
===== Alertmanager

The https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager] handles alerts sent by client applications such as the Prometheus or Loki server.
It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email or PagerDuty. It also takes care of
silencing and inhibition of alerts.
# end::alert-general[]


// Alertmanager Implementing
# tag::alert-impl[]
=== Alertmanager
The Alertmanager package can be found in the PackageHub repository.
The repository needs to be activated via the SUSEConnect command first, unless you have activated it in the previous steps already.


[source]
----
SUSEConnect --product PackageHub/15.3/x86_64
----

Alertmanager can then be installed via the `zypper` command:
[subs="attributes,specialchars,verbatim,quotes"]
----
zypper in golang-github-prometheus-alertmanager
----


Notifications can be sent to different receivers. A receiver can be an email address, a chat system, a webhook, and more.
(For a complete list, see the https://prometheus.io/docs/alerting/latest/configuration/#receiver[Alertmanager documentation].) +


The example configuration below is using email for notification (receiver). +


Edit the Alertmanager configuration file `/etc/alertmanager/config.yml` like below: +

[subs="attributes,specialchars,verbatim,quotes"]
----
global:
  resolve_timeout: 5m
  smtp_smarthost: '<mailserver>'
  smtp_from: '<mail-address>'
  smtp_auth_username: '<username>'
  smtp_auth_password: '<passwd>'
  smtp_require_tls: true
route:
  group_by: ['...']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - send_resolved: true
    to: '<target mail-address>'
    from: '<mail-address>'
    headers:
      From: <mail-address>
      Subject: '{{ template "email.default.subject" . }}'
    html: '{{ template "email.default.html" . }}'
----
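
Before starting the service, it can help to validate the edited file. The following is a minimal sketch, assuming the `amtool` utility is installed together with Alertmanager (an assumption, not something the package description above states). The function name `check_alertmanager_config` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: validate an Alertmanager configuration file.
# Assumption: the amtool binary is available in PATH.
check_alertmanager_config() {
    config_file="${1:-/etc/alertmanager/config.yml}"

    # Fail early if the file is missing or unreadable.
    if [ ! -r "$config_file" ]; then
        echo "Cannot read $config_file" >&2
        return 1
    fi

    # Without amtool the syntax check cannot run.
    if ! command -v amtool >/dev/null 2>&1; then
        echo "amtool not found, skipping syntax check" >&2
        return 2
    fi

    # amtool check-config parses the file and reports errors.
    amtool check-config "$config_file"
}

# Example usage:
#   check_alertmanager_config /etc/alertmanager/config.yml
```

A non-zero return value means the file was not validated, either because it could not be read, because `amtool` is missing, or because the check itself failed.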


Start and enable the Alertmanager service:
[subs="attributes,specialchars,verbatim,quotes"]
----
systemctl enable --now prometheus-alertmanager.service
----

# end::alert-impl[]
106 changes: 106 additions & 0 deletions adoc/SLES4SAP-sap-infra-monitoring-collectd.adoc
@@ -0,0 +1,106 @@
// Collectd adoc file
// Please use the following line to implement each tagged content to the main document:
// include::SLES4SAP-sap-infra-monitoring-collectd.adoc[tag=collectd-XXXXX]

// Collectd general
# tag::collectd-general[]

===== `collectd` - System information collection daemon
https://collectd.org/[`collectd`] is a small daemon which collects system information periodically and provides mechanisms to store and monitor the values in a variety of ways.

# end::collectd-general[]


// Collectd implementing
# tag::collectd-impl[]

=== `collectd`

The `collectd` packages can be installed from the SUSE repositories as well. For the example at hand, we have used a newer version from the openSUSE repository.

Create a file `/etc/zypp/repos.d/server_monitoring.repo` and add the following content to it:
[subs="attributes,specialchars,verbatim,quotes"]
.Content for /etc/zypp/repos.d/server_monitoring.repo
----
[server_monitoring]
name=Server Monitoring Software (SLE_15_SP3)
type=rpm-md
baseurl=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/
gpgcheck=1
gpgkey=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/repodata/repomd.xml.key
enabled=1
----

Afterward, refresh the repository metadata and install `collectd` and its plugins.

[subs="attributes,specialchars,verbatim,quotes"]
----
# zypper ref
# zypper in collectd collectd-plugins-all
----

Now `collectd` must be adapted to collect the information you want and to export it in the format you need.
For example, when looking for network latency, use the ping plugin and expose the data in a Prometheus format.

[subs="attributes,specialchars,verbatim,quotes"]
.Configuration of collectd in /etc/collectd.conf (excerpts)
----
...
LoadPlugin ping
...
<Plugin ping>
Host "10.162.63.254"
Interval 1.0
Timeout 0.9
TTL 255
# SourceAddress "1.2.3.4"
# AddressFamily "any"
Device "eth0"
MaxMissed -1
</Plugin>
...
LoadPlugin write_prometheus
...
<Plugin write_prometheus>
Port "9103"
</Plugin>
...
----

Uncomment the `LoadPlugin` line and check the `<Plugin ping>` section in the file.

Modify the `systemd` unit so that `collectd` works as expected. First, create a copy of the system-provided service file.
[subs="attributes,specialchars,verbatim,quotes"]
----
# cp /usr/lib/systemd/system/collectd.service /etc/systemd/system/collectd.service
----

Second, adapt this local copy.
Add the required `CapabilityBoundingSet` parameters in our local copy `/etc/systemd/system/collectd.service`.
[subs="attributes,specialchars,verbatim,quotes"]
----
...
# Here's a (incomplete) list of the plugins known capability requirements:
# ping CAP_NET_RAW
CapabilityBoundingSet=CAP_NET_RAW
...
----

Activate the changes and start the `collectd` service.
[subs="attributes,specialchars,verbatim,quotes"]
----
# systemctl daemon-reload
# systemctl enable --now collectd
----

All `collectd` metrics are accessible at port 9103.

With a quick test, you can see if the metrics can be scraped.
[subs="attributes,specialchars,verbatim,quotes"]
----
# curl localhost:9103/metrics
----
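
To confirm that the ping data in particular is exported, the output can be filtered. This is a sketch under one assumption: the metric names that `write_prometheus` produces for the ping plugin start with `collectd_ping`. Verify this against your own output. The function name `filter_ping_metrics` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only ping-related samples from Prometheus
# text-format input read on stdin. Comment lines (# HELP, # TYPE) are
# dropped because they do not start with the metric name.
# Assumption: ping metric names start with "collectd_ping".
filter_ping_metrics() {
    grep -E '^collectd_ping'
}

# Example usage against the local exporter (requires a running collectd):
#   curl -s localhost:9103/metrics | filter_ping_metrics
```

If the filter prints nothing, check that the `LoadPlugin ping` line is active and that the configured host is reachable.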
// The official project on GitHub: https://github.com/collectd/collectd/


# end::collectd-impl[]
2 changes: 1 addition & 1 deletion adoc/SLES4SAP-sap-infra-monitoring-docinfo.xml
@@ -72,7 +72,7 @@
<abstract>
<para>
This guide provides detailed information about how to install and customize
SUSE Linux Enterprise Server for SAP Applications to monitor hardware related metrics to provide insights that can help increase uptime of critical SAP applications.
SUSE Linux Enterprise Server for SAP Applications to monitor hardware-related metrics to provide insights that can help increase uptime of critical SAP applications.
It is based on SUSE Linux Enterprise Server for SAP Applications 15 SP3.
The concept however can also be used starting with SUSE Linux Enterprise Server for SAP Applications 15 SP1.
</para>
84 changes: 84 additions & 0 deletions adoc/SLES4SAP-sap-infra-monitoring-grafana.adoc
@@ -0,0 +1,84 @@
// Grafana adoc file
// Please use the following line to implement each tagged content to the main document:
// include::SLES4SAP-sap-infra-monitoring-grafana.adoc[tag=grafana-XXXXX]

// Grafana general
# tag::grafana-general[]

===== Grafana

https://grafana.com/oss/grafana/[Grafana] is an open source visualization and analytics platform.
Grafana's plug-in architecture allows interaction with a variety of data sources without creating data copies.
Its graphical browser-based user interface visualizes the data through highly customizable views, providing an interactive diagnostic workspace.

Grafana can display metrics data from Prometheus and log data from Loki side-by-side, correlating events from log files with metrics.
This can provide helpful insights when trying to identify the cause of an issue.
Also, Grafana can trigger alerts based on metrics or log entries, and thus help identify potential issues early.

# end::grafana-general[]


// Grafana implementing
# tag::grafana-impl[]

=== Grafana

The Grafana RPM packages can be found in the PackageHub repository.
The repository has to be activated via the `SUSEConnect` command first, unless you have activated it in the previous steps already.
----
# SUSEConnect --product PackageHub/15.3/x86_64
----

Grafana can then be installed via the `zypper` command:
----
# zypper in grafana
----


Start and enable the Grafana server service:
----
# systemctl enable --now grafana-server.service
----
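
Before opening the browser, a quick request can show whether the server answers. This is a minimal sketch, assuming Grafana listens on its default HTTP port 3000 and serves the `/api/health` endpoint without TLS. Host and port are placeholders to adapt, and `grafana_health_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build the health-check URL for a Grafana instance.
# Assumptions: default port 3000 and plain HTTP; adapt both as needed.
grafana_health_url() {
    host="${1:-localhost}"
    port="${2:-3000}"
    printf 'http://%s:%s/api/health\n' "$host" "$port"
}

# Example usage (requires a running Grafana server):
#   curl -s "$(grafana_health_url)"
```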


Now connect from a browser to your Grafana instance and log in:

image::sap-infra-monitoring-grafana-login.png[Grafana Login page,scaledwidth=80%,title="Grafana welcome page"]

==== Grafana data sources
After logging in, add the data sources. On the right-hand side, there is a wheel icon where a new data source can be added.

image::sap-infra-monitoring-grafana-datasource-add.png[Grafana add a new data source,scaledwidth=80%,title="Adding a new Grafana data source"]

Add a data source for the Prometheus service.

.Prometheus example
image::sap-infra-monitoring-grafana-data-prometheus.png[Prometheus data source,scaledwidth=80%,title="Grafana data source for Prometheus DB"]

Also add a data source for Loki.

.Loki example
image::sap-infra-monitoring-grafana-data-loki.png[Loki data source,scaledwidth=80%,title="Grafana data source for LOKI DB"]

Now Grafana can access both the metrics stored in Prometheus and the log data collected by Loki, to visualize them.

==== Grafana dashboards

Dashboards are how Grafana presents information to the user.
Prepared dashboards can be downloaded from https://grafana.com/dashboards, or imported using the Grafana ID.

.Grafana dashboard import
image::sap-infra-monitoring-grafana-dashboards.png[Dashboard overview,scaledwidth=80%,title="Grafana dashboard import option"]

The dashboards can also be created from scratch. Information from all data sources can be merged into one dashboard.

image::sap-infra-monitoring-grafana-dashboard-new.png[Dashboard create a new dashboard,scaledwidth=80%,title="Build your own dashboard"]

==== Putting it all together
The picture below shows a dashboard displaying detailed information about the SAP HANA cluster, orchestrated by *pacemaker*.

.Dashboard example for SAP HANA
image::sap-infra-monitoring-grafana-hana-cluster.png[SUSE HANA cluster dashboard example,scaledwidth=80%,title="SUSE cluster exporter dashboard"]


# end::grafana-impl[]