Monitoring with Grafana and Prometheus
This project deploys the kube-prometheus-stack Helm chart, which includes Grafana, Prometheus, Alertmanager, and a suite of pre-configured dashboards.
Access Grafana
Via ingress
Open https://grafana.your-domain.com and log in as admin with the shared admin password
(set during Bootstrap the Cluster).
Via port-forward
grafana.sh
# Or manually:
kubectl -n monitoring port-forward sts/grafana-prometheus 3000
# Open http://localhost:3000
Default dashboards
The kube-prometheus-stack includes many dashboards out of the box:
| Dashboard | What it shows |
|---|---|
| Kubernetes / Compute Resources / Cluster | Cluster-wide CPU, memory, network |
| Kubernetes / Compute Resources / Namespace | Per-namespace resource usage |
| Kubernetes / Compute Resources / Pod | Per-pod CPU and memory |
| Node Exporter / Nodes | Node-level system metrics (CPU, memory, disk, network) |
| CoreDNS | DNS query rates and errors |
| etcd | etcd cluster health and performance |
Navigate to Dashboards in the Grafana sidebar to browse all available dashboards.
How Prometheus scrapes metrics
Prometheus discovers scrape targets via ServiceMonitor resources. The kube-prometheus-stack automatically creates ServiceMonitors for core Kubernetes components.
Additional services can be monitored by creating their own ServiceMonitor. For example,
Longhorn has serviceMonitor.enabled: true in its Helm values, which creates a
ServiceMonitor for Longhorn metrics.
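For a service whose chart has no such toggle, a ServiceMonitor can be written by hand. The manifest below is a minimal sketch, not taken from this project: the name my-app, its namespace, the Service label, and the port name metrics are all hypothetical. The release label is also an assumption. With the chart's default selector settings, Prometheus only picks up ServiceMonitors labelled with the Helm release name, so that value must match however the stack was installed here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                       # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumption: must match the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - my-app                       # hypothetical namespace where the Service lives
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app # hypothetical label on the target Service
  endpoints:
    - port: metrics                  # name of the Service port that exposes metrics
      path: /metrics
      interval: 30s
```

Once applied, the new target should show up under Status > Targets in the Prometheus UI if the labels and port name match.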
Data retention
Prometheus stores metrics in a Longhorn persistent volume (40Gi by default,
configured in kubernetes-services/templates/grafana.yaml). Metrics are retained for
10 days, which is the kube-prometheus-stack default.
To change retention, add the following to the Prometheus Helm values:
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: 35GB
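The size limit only helps if it stays below the capacity of the volume, so it is worth reading the two settings side by side. The fragment below is a sketch of how they might sit together in the values; the storageSpec block and the longhorn storage class name are assumptions about what kubernetes-services/templates/grafana.yaml contains, not a copy of it.

```yaml
prometheus:
  prometheusSpec:
    retention: 30d          # drop samples older than 30 days
    retentionSize: 35GB     # or earlier, once the TSDB grows past this size
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn   # assumption: Longhorn's default StorageClass name
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 40Gi            # keep this larger than retentionSize
```

Whichever limit is reached first takes effect, so raising retention to 30d without growing the volume may still leave fewer than 30 days of data.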
Alerting
Alertmanager is deployed as part of the stack. By default, alerts are only visible in the Alertmanager UI (accessible via port-forward):
kubectl -n monitoring port-forward svc/alertmanager-operated 9093
# Open http://localhost:9093
To configure alert notifications (email, Slack, PagerDuty), add an Alertmanager config to the Helm values. See the kube-prometheus-stack documentation for details.
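As a starting point, a Slack receiver might look like the sketch below, placed under the chart's alertmanager.config key. The receiver name slack-notifications, the webhook URL, and the channel are placeholders, not values from this project. The "null" receiver and the Watchdog route are restated on purpose: Helm replaces lists rather than merging them, so omitting them could leave the chart's default route pointing at a receiver that no longer exists.

```yaml
alertmanager:
  config:
    route:
      receiver: slack-notifications      # hypothetical default receiver
      group_by: ["namespace", "alertname"]
      routes:
        # Silence the always-firing Watchdog alert
        - receiver: "null"
          matchers:
            - alertname = "Watchdog"
    receivers:
      - name: "null"
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook URL
            channel: "#alerts"                                     # placeholder channel
            send_resolved: true
```

Because the webhook URL is a secret, it is probably better kept out of version control, for example by supplying it through a Kubernetes Secret instead of plain Helm values.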