Merge branch 'main' into HANAonKVM-CSL

SUSE · Nov 14, 2023 · 87e5b6d
2 parents 11a48e6 + a3874bf
commit 87e5b6d

Showing 45 changed files with 1,623 additions and 544 deletions.
diff --git a/DC-SBP-SLES4SAP-sap-infra-monitoring b/DC-SBP-SLES4SAP-sap-infra-monitoring
@@ -4,7 +4,7 @@ ADOC_TYPE="article"
 
 ADOC_POST="yes"
 
-ADOC_ATTRIBUTES="--attribute docdate=2022-02-15"
+ADOC_ATTRIBUTES="--attribute docdate=2023-09-29"
 
 # stylesheets
 STYLEROOT=/usr/share/xml/docbook/stylesheet/sbp

diff --git a/adoc/SAP-S4HA10-setup-simplemount-sle15.adoc b/adoc/SAP-S4HA10-setup-simplemount-sle15.adoc
@@ -625,7 +625,7 @@ For the ERS and ASCS instances, edit the instance profiles
 profile directory _/usr/sap/{mySid}/SYS/profile/_.
 
 Tell the `{sapStartSrv}` service to load the HA script connector library and to
-use the connector `{s4sClConnector3}`. On the other hand, please make sure the
+use the connector `{s4sClConnector3}`. On the other hand, make sure the
 feature _Autostart_ is *not* used.
 
 [subs="attributes"]
@@ -993,8 +993,10 @@ primitive rsc_sap_{mySID}_{myInstAscs} SAPInstance \
 ================================================
 The shown SAPInstance monitor timeout is a trade-off between fast recovery of
 the ASCS vs. resilience against sporadic temporary NFS issues. You may slightly
-increase it to fit your infrastructure.
-See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7) and ocf_suse_SAPStartSrv(7).
+increase it to fit your infrastructure. Consult your storage or NFS server
+documentation for appropriate timeout values.
+See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7), ocf_suse_SAPStartSrv(7)
+and nfs(5).
 
 .ASCS group
 ================================================
@@ -1046,8 +1048,10 @@ primitive rsc_sap_{mySID}_{myInstErs} SAPInstance \
 ================================================
 The shown SAPInstance monitor timeout is a trade-off between fast recovery of
 the ERS vs. resilience against sporadic temporary NFS issues. You may slightly
-increase it to fit your infrastructure.
-See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7) and ocf_suse_SAPStartSrv(7).
+increase it to fit your infrastructure. Consult your storage or NFS server
+documentation for appropriate timeout values.
+See also manual pages ocf_heartbeat_SAPInstance(7), ocf_heartbeat_IPAddr2(7), ocf_suse_SAPStartSrv(7)
+and nfs(5).
 
 .ERS group
 ================================================
@@ -2237,7 +2241,7 @@ Find below the Corosync configuration for one corosync ring. Ideally two rings w
 [subs="specialchars,attributes"]
 ----
 {my2nd1}:~ # cat /etc/corosync/corosync.conf
-# Read the corosync.conf.5 manual page
+# Please read the corosync.conf.5 manual page
 totem {
     version: 2
     secauth: on

diff --git a/adoc/SAP-S4HA10-setupguide-sle15.adoc b/adoc/SAP-S4HA10-setupguide-sle15.adoc
@@ -993,6 +993,14 @@ Verify the SBD cluster configuration and if needed, modify them as described.
 
 First, configure the resources for the file system, IP address and the {sap}
 instance. You need to adapt the parameters for your specific environment.
+The shown file system and SAPInstance monitor timeouts are a trade-off between
+fast recovery vs. resilience against sporadic temporary NFS issues. You may
+slightly increase them to fit your infrastructure.
+The SAPInstance timeout needs to be higher than the file system timeout.
+Consult your storage or NFS server documentation for appropriate
+timeout values.
+See also manual pages ocf_heartbeat_Filesystem(7), ocf_heartbeat_SAPInstance(7)
+and nfs(5).
 
 .ASCS primitive
 ================================================
@@ -1042,6 +1050,14 @@ As user _root_, type the following command:
 
 Second, configure the resources for the file system, IP address and the {sap}
 instance. You need to adapt the parameters for your specific environment.
+The shown file system and SAPInstance monitor timeouts are a trade-off between
+fast recovery versus resilience against sporadic temporary NFS issues. You may
+slightly increase them to fit your infrastructure.
+The SAPInstance timeout needs to be higher than the file system timeout.
+Consult your storage or NFS server documentation for appropriate
+timeout values.
+See also manual pages ocf_heartbeat_Filesystem(7), ocf_heartbeat_SAPInstance(7)
+and nfs(5).
 
 The specific parameter _IS_ERS=true_ must only be set for the ERS instance.
 

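Note on the timeout rule added in the two hunks above: the new text requires the SAPInstance monitor timeout to be higher than the file system timeout. The following is a minimal Python sketch of such a check, assuming crm-style primitive definitions with timeouts given in seconds. The resource names, the NFS export, and the values 40 and 60 are hypothetical placeholders, not values taken from the guide.

```python
import re

# Matches "op monitor ... timeout=NN" on one line of a crm-style primitive.
MONITOR_TIMEOUT = re.compile(r"op\s+monitor\b[^\\\n]*?\btimeout=(\d+)")

# Hypothetical sample definitions; names and values are placeholders.
FS_PRIMITIVE = """primitive rsc_fs_example Filesystem \\
    params device="nfs.example:/export" directory="/mnt/example" fstype=nfs \\
    op monitor interval=20 timeout=40"""
SAP_PRIMITIVE = """primitive rsc_sap_example SAPInstance \\
    op monitor interval=11 timeout=60"""


def monitor_timeout(primitive_text):
    """Return the monitor timeout in seconds, or None if none is set."""
    match = MONITOR_TIMEOUT.search(primitive_text)
    return int(match.group(1)) if match else None


def timeouts_ordered(filesystem_primitive, sapinstance_primitive):
    """Return True if the SAPInstance monitor timeout exceeds the Filesystem one."""
    fs_timeout = monitor_timeout(filesystem_primitive)
    sap_timeout = monitor_timeout(sapinstance_primitive)
    if fs_timeout is None or sap_timeout is None:
        raise ValueError("monitor timeout missing in one of the primitives")
    return sap_timeout > fs_timeout
```

Calling `timeouts_ordered(FS_PRIMITIVE, SAP_PRIMITIVE)` on the placeholder definitions compares 60 against 40. The sketch does not handle unit suffixes other than plain seconds.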
diff --git a/adoc/SLES4SAP-hana-sr-guide-perfopt-15-aws.adoc b/adoc/SLES4SAP-hana-sr-guide-perfopt-15-aws.adoc
@@ -1864,12 +1864,13 @@ _crm-saphana.txt_, and load it with the command:
 .Typical Resource Agent parameter settings for different scenarios
 [width="99%",cols="52%,16%,16%,16%",options="header",]
 |============================================================
-|Parameter |Performance Optimized |Cost Optimized |Multi-Tier
-|PREFER_SITE_TAKEOVER |true |false |false / true
-|AUTOMATED_REGISTER |false / true |false / true |false
-|DUPLICATE_PRIMARY_TIMEOUT |7200 |7200 |7200
+|Parameter |Performance Optimized |Cost Optimized |Multi-Tier |Multi-Target
+|PREFER_SITE_TAKEOVER |true |false |false / true |false / true
+|AUTOMATED_REGISTER |false / true |false / true |false |true / false
+|DUPLICATE_PRIMARY_TIMEOUT |7200 |7200 |7200 |7200
 |============================================================
 
+
 // TODO PRIO1: Check if all parameters in special DUPLICATE_PRIMARY_TIMEOUT
 // are explained well
 

diff --git a/adoc/SLES4SAP-sap-infra-monitoring-alertmanager.adoc b/adoc/SLES4SAP-sap-infra-monitoring-alertmanager.adoc
@@ -0,0 +1,79 @@
+// Alertmanager adoc file
+// Use the following line to include each tagged section in the main document:
+// include::SLES4SAP-sap-infra-monitoring-alertmanager.adoc[tag=alert-XXXXX]
+
+// Alertmanager general
+# tag::alert-general[]
+===== Alertmanager
+
+The https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager] handles alerts sent by client applications such as the Prometheus or Loki server.
+It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email or PagerDuty. It also takes care of
+silencing and inhibition of alerts.
+# end::alert-general[]
+
+
+// Alertmanager Implementing
+# tag::alert-impl[]
+=== Alertmanager
+The Alertmanager package can be found in the PackageHub repository.
+The repository needs to be activated via the SUSEConnect command first, unless you have activated it in the previous steps already.
+
+
+[source]
+----
+SUSEConnect --product PackageHub/15.3/x86_64
+----
+
+Alertmanager can then be installed via the `zypper` command:
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+zypper in golang-github-prometheus-alertmanager
+----
+
+
+Notifications can be sent to different receivers. A receiver can be an email address, a chat system, a webhook, and more.
+For a complete list, see the https://prometheus.io/docs/alerting/latest/configuration/#receiver[Alertmanager documentation]. +
+
+
+The example configuration below is using email for notification (receiver). +
+
+
+Edit the Alertmanager configuration file `/etc/alertmanager/config.yml` as shown below: +
+
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+global:
+  resolve_timeout: 5m
+  smtp_smarthost: '<mailserver>'
+  smtp_from: '<mail-address>'
+  smtp_auth_username: '<username>'
+  smtp_auth_password: '<passwd>'
+  smtp_require_tls: true
+
+route:
+  group_by: ['...']
+  group_wait: 10s
+  group_interval: 5m
+  repeat_interval: 4h
+  receiver: 'email'
+
+receivers:
+  - name: 'email'
+    email_configs:
+      - send_resolved: true
+        to: '<target mail-address>'
+        from: '<mail-address>'
+        headers:
+          From: <mail-address>
+          Subject: '{{ template "email.default.subject" . }}'
+        html: '{{ template "email.default.html" . }}'
+----
+
+
+Start and enable the Alertmanager service:
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+systemctl enable --now prometheus-alertmanager.service
+----
+
+# end::alert-impl[]
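
Note on the Alertmanager section above: after the service is started, the email route can be exercised by posting a test alert. The following is a minimal Python sketch, assuming Alertmanager listens on its default port 9093 on the local host and that the v2 HTTP API is reachable without authentication. The alert name, severity label, and summary text are hypothetical.

```python
import json
import urllib.request


def build_test_alert(alertname="TestAlert", severity="info"):
    """Return a one-element alert list in the shape the v2 alerts endpoint accepts."""
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "Manual test alert"},
    }]


def build_request(alerts, base_url="http://localhost:9093"):
    """Build (but do not send) the POST request for the alert list."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alerts(alerts, base_url="http://localhost:9093"):
    """Send the alerts to Alertmanager and return the HTTP status code."""
    with urllib.request.urlopen(build_request(alerts, base_url), timeout=10) as response:
        return response.status
```

Calling `send_alerts(build_test_alert())` against a running instance should result in a notification to the configured `email` receiver once `group_wait` has elapsed, provided the SMTP settings are valid.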
diff --git a/adoc/SLES4SAP-sap-infra-monitoring-collectd.adoc b/adoc/SLES4SAP-sap-infra-monitoring-collectd.adoc
@@ -0,0 +1,106 @@
+// Collectd adoc file
+// Use the following line to include each tagged section in the main document:
+// include::SLES4SAP-sap-infra-monitoring-collectd.adoc[tag=collectd-XXXXX]
+
+// Collectd general
+# tag::collectd-general[]
+
+===== `collectd` - System information collection daemon
+https://collectd.org/[`collectd`] is a small daemon which collects system information periodically and provides mechanisms to store and monitor the values in a variety of ways.
+
+# end::collectd-general[]
+
+
+// Collectd implementing
+# tag::collectd-impl[]
+
+=== `collectd`
+
+The `collectd` packages can be installed from the SUSE repositories as well. For the example at hand, we have used a newer version from the openSUSE repository.
+
+Create a file `/etc/zypp/repos.d/server_monitoring.repo` and add the following content to it:
+[subs="attributes,specialchars,verbatim,quotes"]
+.Content for /etc/zypp/repos.d/server_monitoring.repo
+----
+[server_monitoring]
+name=Server Monitoring Software (SLE_15_SP3)
+type=rpm-md
+baseurl=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/
+gpgcheck=1
+gpgkey=https://download.opensuse.org/repositories/server:/monitoring/SLE_15_SP3/repodata/repomd.xml.key
+enabled=1
+----
+
+Afterward, refresh the repository metadata and install `collectd` and its plugins.
+
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+# zypper ref
+# zypper in collectd collectd-plugins-all
+----
+
+Now `collectd` must be adapted to collect the information you want to get and export it in the format you need.
+For example, when looking for network latency, use the ping plugin and expose the data in a Prometheus format.
+
+[subs="attributes,specialchars,verbatim,quotes"]
+.Configuration of collectd in /etc/collectd.conf (excerpts)
+----
+...
+LoadPlugin ping
+...
+<Plugin ping>
+        Host "10.162.63.254"
+        Interval 1.0
+        Timeout 0.9
+        TTL 255
+#       SourceAddress "1.2.3.4"
+#       AddressFamily "any"
+        Device "eth0"
+        MaxMissed -1
+</Plugin>
+...
+LoadPlugin write_prometheus
+...
+<Plugin write_prometheus>
+        Port "9103"
+</Plugin>
+...
+----
+
+Uncomment the `LoadPlugin` line and check the `<Plugin ping>` section in the file.
+
+Modify the `systemd` unit so that `collectd` works as expected. First, create a copy of the system-provided service file.
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+# cp /usr/lib/systemd/system/collectd.service /etc/systemd/system/collectd.service
+----
+
+Second, adapt this local copy.
+Add the required `CapabilityBoundingSet` parameters to the local copy `/etc/systemd/system/collectd.service`.
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+...
+# Here's a (incomplete) list of the plugins known capability requirements:
+#   ping            CAP_NET_RAW
+CapabilityBoundingSet=CAP_NET_RAW
+...
+----
+
+Activate the changes and start the `collectd` service.
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+# systemctl daemon-reload
+# systemctl enable --now collectd
+----
+
+All `collectd` metrics are accessible at port 9103.
+
+With a quick test, you can see if the metrics can be scraped.
+[subs="attributes,specialchars,verbatim,quotes"]
+----
+# curl localhost:9103/metrics
+----
+// The official project on GitHub: https://github.com/collectd/collectd/
+
+
+# end::collectd-impl[]
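
Note on the `curl localhost:9103/metrics` test above: the endpoint returns the Prometheus text exposition format. The following is a minimal Python sketch of how such output could be parsed to check that ping samples arrive. The metric name and label names in `SAMPLE` are hypothetical and may differ from what your `collectd` version exposes.

```python
import re

SAMPLE_LINE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)"   # metric name
    r"(?:\{(.*)\})?"                 # optional label set
    r"\s+(\S+)"                      # sample value
    r"(?:\s+-?\d+)?$"                # optional timestamp in milliseconds
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical sample output; names and values are placeholders.
SAMPLE = """\
# HELP collectd_ping example help text
# TYPE collectd_ping gauge
collectd_ping{ping="10.162.63.254",instance="host1"} 0.42 1700000000000
"""


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples.

    Comment lines, blank lines, and lines that do not match are skipped.
    Escape sequences inside label values are kept as they appear.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Feeding the body returned by the `curl` command to `parse_metrics` gives a list that can be filtered by metric name, for example to confirm that the configured ping host is being measured.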
diff --git a/adoc/SLES4SAP-sap-infra-monitoring-docinfo.xml b/adoc/SLES4SAP-sap-infra-monitoring-docinfo.xml
@@ -72,7 +72,7 @@
 <abstract>
         <para>
         This guide provides detailed information about how to install and customize
-        SUSE Linux Enterprise Server for SAP Applications to monitor hardware related metrics to provide insights that can help increase uptime of critical SAP applications.
+        SUSE Linux Enterprise Server for SAP Applications to monitor hardware-related metrics to provide insights that can help increase uptime of critical SAP applications.
         It is based on SUSE Linux Enterprise Server for SAP Applications 15 SP3.
         The concept however can also be used starting with SUSE Linux Enterprise Server for SAP Applications 15 SP1.
         </para>

diff --git a/adoc/SLES4SAP-sap-infra-monitoring-grafana.adoc b/adoc/SLES4SAP-sap-infra-monitoring-grafana.adoc
@@ -0,0 +1,84 @@
+// Grafana adoc file
+// Use the following line to include each tagged section in the main document:
+// include::SLES4SAP-sap-infra-monitoring-grafana.adoc[tag=grafana-XXXXX]
+
+// Grafana general
+# tag::grafana-general[]
+
+===== Grafana
+
+https://grafana.com/oss/grafana/[Grafana] is an open source visualization and analytics platform.
+Grafana's plug-in architecture allows interaction with a variety of data sources without creating data copies.
+Its graphical browser-based user interface visualizes the data through highly customizable views, providing an interactive diagnostic workspace.
+
+Grafana can display metrics data from Prometheus and log data from Loki side-by-side, correlating events from log files with metrics.
+This can provide helpful insights when trying to identify the cause for an issue.
+Also, Grafana can trigger alerts based on metrics or log entries, and thus help identify potential issues early.
+
+# end::grafana-general[]
+
+
+// Grafana implementing
+# tag::grafana-impl[]
+
+=== Grafana
+
+The Grafana RPM packages can be found in the PackageHub repository.
+The repository has to be activated via the `SUSEConnect` command first, unless you have activated it in the previous steps already.
+----
+# SUSEConnect --product PackageHub/15.3/x86_64
+----
+
+Grafana can then be installed via the `zypper` command:
+----
+# zypper in grafana
+----
+
+
+Start and enable the Grafana server service:
+----
+# systemctl enable --now grafana-server.service
+----
+
+
+Now connect from a browser to your Grafana instance and log in:
+
+image::sap-infra-monitoring-grafana-login.png[Grafana Login page,scaledwidth=80%,title="Grafana welcome page"]
+
+==== Grafana data sources
+After the login, a data source must be added. On the right-hand side, there is a wheel icon where a new data source can be added.
+
+image::sap-infra-monitoring-grafana-datasource-add.png[Grafana add a new data source,scaledwidth=80%,title="Adding a new Grafana data source"]
+
+Add a data source for the Prometheus service.
+
+.Prometheus example
+image::sap-infra-monitoring-grafana-data-prometheus.png[Prometheus data source,scaledwidth=80%,title="Grafana data source for Prometheus DB"]
+
+Also add a data source for Loki.
+
+.Loki example
+image::sap-infra-monitoring-grafana-data-loki.png[Loki data source,scaledwidth=80%,title="Grafana data source for Loki DB"]
+
+Now Grafana can access both the metrics stored in Prometheus and the log data collected by Loki, to visualize them.
+
+==== Grafana dashboards
+
+Dashboards are how Grafana presents information to the user.
+Prepared dashboards can be downloaded from https://grafana.com/dashboards, or imported using the Grafana ID.
+
+.Grafana dashboard import
+image::sap-infra-monitoring-grafana-dashboards.png[Dashboard overview,scaledwidth=80%,title="Grafana dashboard import option"]
+
+The dashboards can also be created from scratch. Information from all data sources can be merged into one dashboard.
+
+image::sap-infra-monitoring-grafana-dashboard-new.png[Dashboard create a new dashboard,scaledwidth=80%,title="Build your own dashboard"]
+
+==== Putting it all together
+The picture below shows a dashboard displaying detailed information about the SAP HANA cluster, orchestrated by *pacemaker*.
+
+.Dashboard example for SAP HANA
+image::sap-infra-monitoring-grafana-hana-cluster.png[SUSE HANA cluster dashboard example,scaledwidth=80%,title="SUSE cluster exporter dashboard"]
+
+
+# end::grafana-impl[]
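
Note on the Grafana data source steps above: the same two data sources can also be created through the Grafana HTTP API instead of the web interface. The following is a minimal Python sketch, assuming Grafana on port 3000, Prometheus on port 9090, Loki on port 3100, and basic authentication with an administrative user. All host names, ports, and credentials shown are assumptions, not values from the guide.

```python
import base64
import json
import urllib.request


def datasource_payload(name, ds_type, url):
    """Return the JSON body for creating one data source in proxy access mode."""
    return {"name": name, "type": ds_type, "url": url, "access": "proxy"}


def build_request(payload, grafana_url, user, password):
    """Build (but do not send) the POST request that creates the data source."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Assumed default endpoints; adapt them to your environment.
PAYLOADS = [
    datasource_payload("Prometheus", "prometheus", "http://localhost:9090"),
    datasource_payload("Loki", "loki", "http://localhost:3100"),
]
```

Each request can be sent with `urllib.request.urlopen(build_request(payload, "http://localhost:3000", user, password))`. The sketch only builds the requests, so it can be inspected without a running Grafana instance.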