Prometheus

From ArchWiki

Prometheus is an open-source metrics collection and processing tool. It consists primarily of a time series database and a query language to access and process the metrics it stores. Metrics are exposed by separate services (exporters), which the Prometheus server scrapes over HTTP. It provides only a minimal web UI out of the box. To get a functional dashboard system, third-party tools like Grafana can be used.

Installation

Install the prometheus package. After that you can enable and start prometheus.service and access the application via HTTP on port 9090 by default.

The default configuration monitors the prometheus process itself, but not much beyond that. To perform system monitoring, install prometheus-node-exporter, which exposes metrics about the local system. Enable and start prometheus-node-exporter.service; it listens on port 9100 by default. Once the service is running, Prometheus still has to be configured to scrape the exporter periodically, otherwise no data is collected. Do this by following the steps in Adding metrics below.

Warning: The default configuration of prometheus listens on *:9090 and prometheus-node-exporter listens on *:9100, so make sure to change the configuration or enable the relevant firewall rules. See also the Prometheus security model.

Configuration

Prometheus is configured through YAML files, the main one being /etc/prometheus/prometheus.yml.
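As a rough orientation, the main file is organized into a few top-level sections. The following is a minimal sketch, not the packaged default: the interval values are assumptions chosen for illustration.

```yaml
# /etc/prometheus/prometheus.yml (illustrative sketch, values are assumptions)
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting/recording rules are evaluated

scrape_configs:             # where metrics are pulled from
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

After editing the file, it can be validated with promtool check config /etc/prometheus/prometheus.yml, assuming promtool is installed alongside the server, before restarting prometheus.service.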

Adding metrics

You can add new places to scrape metrics from by adding them to the scrape_configs array. To add the local node exporter as a source, next to the prometheus process itself, the configuration would look like this:

 scrape_configs:
   - job_name: 'prometheus'
     static_configs:
       - targets: ['localhost:9090']
   - job_name: 'node'
     static_configs:
       - targets: ['localhost:9100']
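A job can list several targets and carry its own settings. The sketch below extends the node job with a second machine; the host name server2.example.com and the env label are hypothetical and only illustrate the syntax.

```yaml
scrape_configs:
  - job_name: 'node'
    scrape_interval: 30s                 # overrides the global interval for this job only
    static_configs:
      - targets:
          - 'localhost:9100'
          - 'server2.example.com:9100'   # hypothetical remote host running node exporter
        labels:
          env: 'home'                    # hypothetical label attached to both targets
```

Every scraped series automatically receives job and instance labels, so the two machines can be told apart in queries without extra configuration.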

Exporters

The Arch Linux repositories contain a subset of the available exporters.

The exporters are implemented as services. For example, to run the node exporter, enable and start prometheus-node-exporter.service.

Using the UI

Prometheus comes with a very limited web UI to verify configuration, query and graph metrics. You can reach it at http://localhost:9090 by default. You can find an in-depth explanation of Prometheus' query language in the Prometheus documentation.

Alerting

Alertmanager delivers notifications for alerts that Prometheus raises. The conditions that trigger an alert are configured in /etc/prometheus/alert.rules.yml, while routing and delivery of the resulting notifications are configured in /etc/alertmanager/alertmanager.yml. Alertmanager supports various notification channels such as email, Slack, and more. To configure email alerts, add the following snippet to /etc/alertmanager/alertmanager.yml:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:25'
  smtp_from: 'alertmanager@example.com'
route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-1
receivers:
  - name: 'team-1'
    email_configs:
      - to: 'admin@example.com'

For Prometheus to send alerts to Alertmanager, include the following snippet in /etc/prometheus/prometheus.yml:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
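Prometheus only evaluates rule files that are listed under rule_files in the same configuration file. A minimal sketch, assuming the rules live at the path used on this page:

```yaml
rule_files:
  - /etc/prometheus/alert.rules.yml   # path assumed from the Alerting section above
```

Restart prometheus.service after changing this setting so that the rules are loaded.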

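To configure an alert for when a systemd unit fails, add the following snippet to /etc/prometheus/alert.rules.yml. The node_systemd_unit_state metric is only available when the node exporter's systemd collector is enabled (it is disabled by default). For more rules, read the alerting rules documentation.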

groups:
- name: systemd_unit
  interval: 15s
  rules:
  - alert: systemd_unit_failed
    expr: |
      node_systemd_unit_state{state="failed"} > 0
    for: 3m
    labels:
      severity: critical
    annotations:
      description: 'Instance {{ $labels.instance }}: Service {{ $labels.name }} failed'
      summary: 'Systemd unit failed'
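Another common rule fires when a scrape target stops responding. The sketch below is written as a complete rule file built on the up metric that Prometheus records for every target; the group name, alert name and the five-minute threshold are arbitrary choices for illustration. When combining it with the rule above, place both groups under a single groups key.

```yaml
groups:
- name: availability              # hypothetical group name
  rules:
  - alert: target_down            # hypothetical alert name
    expr: up == 0                 # up is 0 when the last scrape of a target failed
    for: 5m                       # assumed threshold before the alert fires
    labels:
      severity: critical
    annotations:
      summary: 'Scrape target down'
      description: 'Target {{ $labels.instance }} of job {{ $labels.job }} has been unreachable for 5 minutes'
```

Rule files can be checked with promtool check rules /etc/prometheus/alert.rules.yml, assuming promtool is installed.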

Tips and tricks

Telegraf instead of exporters

Telegraf can be used instead of multiple exporters when combined with its Prometheus output plugin. This consolidates metrics collection into a single binary and offers more flexible configuration than the standard Prometheus exporters.
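On the Prometheus side, Telegraf is scraped like any other exporter. The following sketch assumes Telegraf's prometheus_client output is enabled and left on its default listen port 9273; the job name is a hypothetical choice, and the port should be adjusted if the plugin is configured differently.

```yaml
scrape_configs:
  - job_name: 'telegraf'                # hypothetical job name
    static_configs:
      - targets: ['localhost:9273']     # assumed default port of Telegraf's prometheus_client output
```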

See also