To keep your clusters and applications healthy and driving your organizational productivity forward, you need to stay informed of events occurring in your clusters and projects, both planned and unplanned. When an event occurs, your alert is triggered, and you are sent a notification. You can then, if necessary, follow up with corrective actions.
Notifiers and alerts are built on top of the Prometheus Alertmanager. Leveraging these tools, Rancher can notify cluster owners and project owners of events they need to address.
Before you can receive alerts, you must configure one or more notifiers in Rancher.
When you create a cluster, some alert rules are predefined. You can receive these alerts if you configure a notifier for them.
For details about what triggers the predefined alerts, refer to the documentation on default alerts.
Some examples of alert events are:
You can set an urgency level for each alert. This urgency appears in the notification you receive, helping you to prioritize your response actions. For example, if you have an alert configured to inform you of a routine deployment, no action is required. These alerts can be assigned a low priority level. However, if a deployment fails, it can critically impact your organization, and you need to react quickly. Assign these alerts a high priority level.
The scope for alerts can be set at either the cluster level or project level.
At the cluster level, Rancher monitors components in your Kubernetes cluster, and sends you alerts related to:
As a cluster owner, you can configure Rancher to send you alerts for cluster events.
Prerequisite: Before you can receive cluster alerts, you must add a notifier.
From the Global view, navigate to the cluster that you want to configure cluster alerts for. Select Tools > Alerts. Then click Add Alert Group.
Enter a Name for the alert group that describes its purpose. You can group alert rules by purpose.
Based on the type of alert you want to create, complete one of the instruction subsets below.
This alert type monitors for events that affect one of the Kubernetes master components, regardless of the node it occurs on.
Select the System Services option, and then select an option from the drop-down, such as etcd.
Select the urgency level of the alert. The options are:
Configure advanced options. By default, the options below apply to all alert rules within the group. You can disable these advanced options when configuring a specific rule.
This alert type monitors for specific events that are thrown from a resource type.
Choose the type of resource event that triggers an alert. The options are:
From the Choose a Resource drop-down, select the resource type that should trigger the alert.
Select the urgency level of the alert.
Info: Least urgent
Select the urgency level of the alert by considering factors such as how often the event occurs or its importance. For example:
If you set a normal alert for pods, you’re likely to receive alerts often, and individual pods usually self-heal, so select an urgency of Info.
If you set a warning alert for StatefulSets, it’s very likely to impact operations, so select an urgency of Critical.
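The pairing of event type, resource type, and urgency described above can be sketched in Python. This is an illustration only, not Rancher's implementation: the `EventAlertRule` class and the `matches` and `urgencies_for` helpers are hypothetical names. The only fields assumed from Kubernetes are the Event object's `type` (`Normal` or `Warning`) and `involvedObject.kind`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAlertRule:
    """Hypothetical stand-in for a resource event alert rule."""
    event_type: str      # "Normal" or "Warning", as in the Kubernetes Event `type` field
    resource_kind: str   # for example "Pod" or "StatefulSet"
    urgency: str         # urgency chosen for this rule


def matches(rule, event):
    """Return True when the event type and the involved resource kind match the rule."""
    return (
        event.get("type") == rule.event_type
        and event.get("involvedObject", {}).get("kind") == rule.resource_kind
    )


def urgencies_for(event, rules):
    """Collect the urgency of every rule that the event triggers."""
    return [rule.urgency for rule in rules if matches(rule, event)]


# The two example rules from the text above.
rules = [
    EventAlertRule("Normal", "Pod", "Info"),
    EventAlertRule("Warning", "StatefulSet", "Critical"),
]

event = {"type": "Warning", "involvedObject": {"kind": "StatefulSet", "name": "db"}}
print(urgencies_for(event, rules))  # ['Critical']
```

A warning event on a pod matches neither rule in this sketch, so it would produce no notification.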
This alert type monitors for events that occur on a specific node.
Select the Node option, and then make a selection from the Choose a Node drop-down.
Choose an event to trigger the alert.
This alert type monitors for events that occur on any node marked with a label. For more information, see the Kubernetes documentation for Labels.
Select the Node Selector option, and then click Add Selector to enter a key-value pair for a label. This label should be applied to one or more of your nodes. Add as many selectors as you’d like.
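To make the selector behavior concrete, here is a minimal Python sketch of label matching. It assumes that a node must carry every key-value pair in the selector, as with Kubernetes equality-based label selectors. The helper names and the node data are hypothetical.

```python
def node_matches(node_labels, selector):
    """Return True when the node carries every key-value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def select_nodes(nodes, selector):
    """Return the sorted names of all nodes whose labels satisfy the selector."""
    return sorted(
        name for name, labels in nodes.items() if node_matches(labels, selector)
    )


# Hypothetical nodes and their labels.
nodes = {
    "node-a": {"role": "worker", "zone": "east"},
    "node-b": {"role": "worker", "zone": "west"},
    "node-c": {"role": "etcd", "zone": "east"},
}

print(select_nodes(nodes, {"role": "worker"}))                  # ['node-a', 'node-b']
print(select_nodes(nodes, {"role": "worker", "zone": "east"}))  # ['node-a']
```

Under this assumption, each added selector narrows the set of nodes covered by the alert.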
This alert type monitors for overload by querying a Prometheus expression. It is available after you enable monitoring.
Enter or select an Expression. The drop-down shows the original metrics from Prometheus, including:
Choose a Comparison.
Enter a Threshold. The alert triggers when the value of the expression crosses the threshold.
Select a duration. The alert triggers when the expression value stays past the threshold for longer than the configured duration.
sum(node_load5) / count(node_cpu_seconds_total{mode="system"})
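The expression above divides the summed 5-minute load average by the number of `node_cpu_seconds_total{mode="system"}` series. That metric has one series per CPU core, so the result is the load per core. The Python sketch below reproduces that arithmetic and shows one way the comparison, threshold, and duration settings could combine. It is a simplified model, not Rancher's or Prometheus's actual evaluation code. The comparison names, sample data, and function names are assumptions for illustration.

```python
import operator

# Illustrative comparison names; the labels in the Rancher UI may differ.
COMPARISONS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal": operator.eq,
    "not equal": operator.ne,
}


def load_per_core(node_load5, cpu_series_count):
    """Mirror sum(node_load5) / count(node_cpu_seconds_total{mode="system"})."""
    return sum(node_load5) / cpu_series_count


def fires(samples, comparison, threshold, duration_seconds):
    """Return True once the condition has held without interruption for the duration.

    samples is a time-ordered list of (timestamp_seconds, value) pairs.
    """
    compare = COMPARISONS[comparison]
    breach_start = None
    for timestamp, value in samples:
        if compare(value, threshold):
            if breach_start is None:
                breach_start = timestamp  # the condition just became true
            if timestamp - breach_start >= duration_seconds:
                return True
        else:
            breach_start = None  # the condition cleared, so the timer resets
    return False


# Two nodes with 5-minute loads of 3.0 and 5.0, and 8 CPU cores in total.
print(load_per_core([3.0, 5.0], 8))  # 1.0

sustained = [(0, 0.5), (60, 1.2), (120, 1.3), (180, 1.4)]
interrupted = [(0, 1.2), (60, 0.9), (120, 1.2), (180, 1.3)]
print(fires(sustained, "greater than", 1.0, 120))    # True
print(fires(interrupted, "greater than", 1.0, 120))  # False
```

The second series crosses the threshold twice but never stays above it for 120 seconds, so a rule with that duration does not fire on a brief spike.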
Continue adding more alert rules to the group.
Finally, choose the notifiers to send the alerts to.
Result: Your alert is configured. A notification is sent when the alert is triggered.
After you set up cluster alerts, you can manage each alert object. To manage alerts, browse to the cluster containing the alerts that you want to manage, and then select Tools > Alerts. You can: