Prometheus on Raspberry Pi
Monitoring and alerting can be complicated to set up. In this article I show how to prepare a full-fledged Prometheus and Grafana monitoring station with alerting (to Discord) on a Raspberry Pi running Raspbian.
We are going to set up the following components:
- Prometheus: to collect metrics and alert
- Prometheus alertmanager: to manage alerts
- Alertmanager Discord: to alert into a Discord channel
- Prometheus node exporter: to be able to monitor the monitoring station itself
- Grafana: to display monitoring dashboards
- Nginx: to access all these over a single domain and TLS
Prerequisites
1. You will need to set the hostname of your Raspberry Pi and make it resolvable on your local network. I will refer to this hostname as monitoring-pi in this guide.
$ sudo hostnamectl set-hostname monitoring-pi
$ echo "127.0.0.1 monitoring-pi" | sudo tee -a /etc/hosts
2. You will need to create a Discord server and create a webhook on a channel inside this server (Edit channel, Integrations, New webhook, Copy Webhook URL). I will refer to this URL as {{ DISCORD_URL }} in this guide; a quick way to test it follows right after this list.
3. I assume you will do this installation as the default pi user; of course, you can use a different one.
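You can verify the webhook right away by posting a test message to it; Discord webhooks accept a simple JSON payload with a content field:
$ curl -X POST -H "Content-Type: application/json" -d '{"content": "Hello from monitoring-pi"}' {{ DISCORD_URL }}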
Packages
Unfortunately these components come from different sources, so we need to grab things from here and there.
Let’s start with the easy ones:
$ sudo apt install -y nginx-extras ssl-cert git prometheus-node-exporter
(the ssl-cert package gives us simple self-signed certificates)
prometheus-node-exporter pulls in exim4 (?!); it makes sense to stop and disable it immediately:
$ sudo systemctl stop exim4; sudo systemctl disable exim4
Go ahead with Grafana:
$ wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
$ echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
$ sudo apt-get update
$ sudo apt-get install -y grafana
$ sudo systemctl enable grafana-server
$ sudo systemctl start grafana-server
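Grafana exposes a small health endpoint, so you can confirm it came up properly:
$ curl -s http://localhost:3000/api/health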
Get Go, so that we can build the remaining part of the stack:
$ wget https://golang.org/dl/go1.16.2.linux-armv6l.tar.gz
$ sudo rm -rf /usr/local/go; sudo tar -C /usr/local -xzf go1.16.2.linux-armv6l.tar.gz
$ sudo ln -s /usr/local/go/bin/go /usr/local/bin
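A quick sanity check that the ARM build of the toolchain runs:
$ go version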
Build the remaining components:
$ go get github.com/benjojo/alertmanager-discord
$ go get github.com/prometheus/alertmanager/cmd/...
$ GO111MODULE=on go get github.com/prometheus/prometheus/cmd/...
This will take a while. Once done, copy everything over to /usr/local/bin:
$ sudo cp ~/go/bin/{alertmanager-discord,alertmanager,prometheus} /usr/local/bin/
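A quick check that the freshly built binaries actually run on the Pi:
$ prometheus --version
$ alertmanager --version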
After this step you can delete your ~/go directory; it contains mostly build dependencies (and that is a lot!). Consider copying promtool and amtool from ~/go/bin to /usr/local/bin first, though; they come in handy later for validating configuration files.
Let’s create storage for Prometheus and Alertmanager:
$ sudo mkdir -p /var/lib/prometheus/{alertmanager,data}
$ sudo chown -R pi /var/lib/prometheus
BTW Raspbian has an alertmanager package, but it doesn’t contain the web interface, hence we build it from source.
Systemd
We have all the software on the machine; let’s configure the prometheus, alertmanager and alertmanager-discord systemd units.
$ sudo systemctl edit --force --full prometheus.service

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=pi
Restart=on-failure
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/data \
  --web.external-url=https://monitoring-pi/prometheus \
  --web.route-prefix=/

[Install]
WantedBy=multi-user.target

$ sudo systemctl edit --force --full prometheus-alertmanager.service

[Unit]
Description=Alertmanager for Prometheus
Documentation=https://prometheus.io/docs/alerting/alertmanager/

[Service]
Restart=always
User=pi
ExecStart=/usr/local/bin/alertmanager --storage.path=/var/lib/prometheus/alertmanager/ --config.file=/etc/prometheus/alertmanager.yml --web.external-url=https://monitoring-pi/alertmanager --web.route-prefix=/ --cluster.listen-address=127.0.0.1:9094
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

$ sudo systemctl edit --force --full alertmanager-discord.service

[Unit]
Description=Alertmanager Discord
Documentation=https://github.com/benjojo/alertmanager-discord
After=network-online.target

[Service]
Environment="LISTEN_ADDRESS=127.0.0.1:19094"
Environment="DISCORD_WEBHOOK={{ DISCORD_URL }}"
User=pi
Restart=on-failure
ExecStart=/usr/local/bin/alertmanager-discord

[Install]
WantedBy=multi-user.target
Let’s enable and start the services:
$ sudo systemctl enable prometheus.service; sudo systemctl start prometheus.service
$ sudo systemctl enable prometheus-alertmanager.service; sudo systemctl start prometheus-alertmanager.service
$ sudo systemctl enable alertmanager-discord.service; sudo systemctl start alertmanager-discord.service
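If the units started cleanly, both daemons answer on their built-in health endpoints:
$ curl -s http://localhost:9090/-/healthy
$ curl -s http://localhost:9093/-/healthy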
Configure Prometheus and Alertmanager
$ sudo nano /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s # let's be a bit more aggressive than the default
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /etc/prometheus/alert.rules.yml

scrape_configs:
  # let's monitor prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # let's monitor the monitoring station
  - job_name: 'monitoring-pi'
    static_configs:
      - targets: ['monitoring-pi:9100']
  # here you can add your further systems to monitor
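Here is what such a further system would look like: assuming another machine on your network runs prometheus-node-exporter, a hypothetical extra job (living-room-pi stands in for your own host) goes under scrape_configs:
  - job_name: 'living-room-pi'
    static_configs:
      - targets: ['living-room-pi:9100']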
Setting up rules is a delicate task; I found a very good reference on grep.to and came up with this:
$ vi /etc/prometheus/alert.rules.yml

groups:
- name: node_exporter_alerts
  rules:
  - alert: Node down
    expr: up{job="monitoring-pi"} == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      title: Node {{ $labels.instance }} is down
      description: Failed to scrape {{ $labels.job }} on {{ $labels.instance }} for more than 2 minutes. Node seems down.
  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Host out of memory (instance {{ $labels.instance }})
      description: Node memory is filling up (< 10% left)\n VALUE = {{ $value }}

I included these two rules only for reference; the full file can be found here: https://gist.github.com/krisek/62a98e2645af5dce169a7b506e999cd8#file-alert-rules-yml
Note #1: grep.to provides an amazing list of alerts, but in the end I had to change the alert annotations, as they include the $labels variable, which wasn’t accepted by my Prometheus.
Note #2: You will need to adapt these alerts to your needs. This is actually the most complicated part of setting up a proper monitoring solution, but don’t be afraid: “Adopt, adapt and improve” (motto of the round table).
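promtool was built into ~/go/bin alongside prometheus; if you copied it to /usr/local/bin as suggested above, you can lint the rule file before reloading anything:
$ promtool check rules /etc/prometheus/alert.rules.yml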
Let’s proceed with the alertmanager configuration:
$ vi /etc/prometheus/alertmanager.yml

global:

route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'discord_webhook'
  routes:
    - receiver: "discord_webhook"
      group_wait: 10s
      match_re:
        severity: critical|warning
      continue: true

receivers:
  - name: 'discord_webhook'
    webhook_configs:
      - url: http://localhost:19094
  - name: 'alert-team'
    slack_configs:
      - channel: "#webhook-channel"
        text: "summary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
This will forward all warning and critical alerts to our Discord channel.
Finally, let’s reload services:
$ sudo systemctl restart prometheus.service
$ sudo systemctl restart prometheus-alertmanager.service
We are almost there; what remains is to configure NGINX and Grafana.
BTW: if everything went well, you should now receive a notification about a failed service, smartd. At least on my vanilla Raspbian installation this service fails to start, and that raises an alarm… adopt, adapt and improve.
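You can also push a synthetic alert straight into Alertmanager to exercise the whole pipeline down to Discord. TestAlert here is just a made-up label set, and the call uses Alertmanager’s v1 HTTP API, which the version built above still serves; once group_wait elapses, a message should show up in the channel:
$ curl -XPOST http://localhost:9093/api/v1/alerts -H "Content-Type: application/json" \
    -d '[{"labels": {"alertname": "TestAlert", "severity": "warning"}, "annotations": {"summary": "Test alert fired by hand"}}]'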
NGINX configuration
We’ll use NGINX as a reverse proxy to terminate SSL, i.e. to enable HTTPS access to our monitoring services. This can be accomplished with a very simple configuration:
$ vi /etc/nginx/sites-available/default

...
server {
    #listen 80 default_server;
    #listen [::]:80 default_server;

    # SSL configuration
    #
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    ...
    include snippets/snakeoil.conf;
    ...
    server_name _;

    location /grafana {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /prometheus {
        rewrite /prometheus/(.*) /$1 break;
        proxy_pass http://localhost:9090;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /alertmanager {
        rewrite /alertmanager/(.*) /$1 break;
        proxy_pass http://localhost:9093;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
...
The complete file can be found at: https://gist.github.com/krisek/65442ba99eb773e9acbd0597d0a5f859
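A quick syntax check before reloading never hurts:
$ sudo nginx -t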
NGINX configuration needs to be reloaded now:
$ sudo systemctl reload nginx
If everything went well, you can point your browser to https://monitoring-pi/prometheus
and https://monitoring-pi/alertmanager
and check the status of your monitoring system.
Grafana configuration
To get Grafana working behind a reverse proxy, a few parameters need to be changed in /etc/grafana/grafana.ini:
[server]
# Protocol (http, https, h2, socket)
;protocol = http

# The ip address to bind to, empty will bind to all interfaces
http_addr = 127.0.0.1

# The public facing domain name used to access grafana from a browser
domain = monitoring-pi

# The full public facing url you use in browser, used for redirects and emails
# If you use reverse proxy and sub path specify full url (with sub path)
root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana/

# Serve Grafana from subpath specified in `root_url` setting. By default it is set to `false` for compatibility reasons.
serve_from_sub_path = true
Now let’s restart Grafana as well:
$ sudo systemctl restart grafana-server
Once this is done, point your browser to https://monitoring-pi/grafana
and if everything went well, Grafana should load.
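Before importing dashboards, Grafana needs Prometheus as a data source. You can add it in the UI (Configuration, Data sources) or script it against Grafana’s HTTP API; a sketch, assuming the default admin:admin credentials and our self-signed certificate (hence the -k):
$ curl -sk -u admin:admin -X POST https://monitoring-pi/grafana/api/datasources \
    -H "Content-Type: application/json" \
    -d '{"name": "Prometheus", "type": "prometheus", "url": "http://localhost:9090", "access": "proxy", "isDefault": true}'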
Change the default password and import a dashboard. I found this one: https://grafana.com/grafana/dashboards/7675; after a little adjustment of the variables (Dashboard settings/Variables/host):
label_values(up{job="monitoring-pi"}, instance)
it perfectly serves my basic needs.
Good luck with the implementation!