Prometheus

From ArchWiki
Jump to navigation Jump to search

Prometheus is an open-source metrics collection and processing tool. It consists primarily of a timeseries database and a query language to access and process the metrics it stores. Separate services perform metric exposure, from which the Prometheus server can pull. It provides a very minimal web UI out of the box. To get a functional dashboard system, third-party tools like Grafana can be used.

Installation

Install the prometheus or prometheus-binAUR package. After that you can Enable and start the prometheus service and access the application via HTTP on port 9090 by default.

The default configuration monitors the prometheus process itself, but not much beyond that. To perform system monitoring, you can install prometheus-node-exporter or prometheus-node-exporter-binAUR, which performs metric scraping from the local system. You can start and enable the prometheus-node-exporter service. It will open port 9100 by default.

Configuration

The Prometheus configuration is done through YAML files, the main one being located at /etc/prometheus/prometheus.yml.

Adding metric

You can add new places to scrape metrics from by adding them to the scrape_configs array. To add the local node exporter as a source, next to the prometheus process itself, the configuration would look like this:

 scrape_configs:
   - job_name: 'prometheus'
     static_configs:
       - targets: ['localhost:9090']
   - job_name: 'localhost'
     static_configs:
       - targets: ['localhost:9100']

Exporters

The Arch Linux repository contains a subset of the [available exporters https://prometheus.io/docs/instrumenting/exporters/]:

Using the UI

Prometheus comes with a very limited web UI to verify configuration, query and graph metrics. You can reach it at http://localhost:9090 by default. You can find an in-depth explanation of Prometheus' query language in the Prometheus docs.

Alerting

alertmanager can send out custom alerts when certain conditions are met configured in /etc/prometheus/alert.rules.yml and what alert to send out is configured in /etc/alertmanager/alertmanager.yml. Alertmanager supports various ways to notify users such as email, slack, and more. To configure email alerts add the following snippet:

 global:
   resolve_timeout: 5m
   smtp_smarthost: 'smtp.example.com:25'
   smtp_from: 'alertmanager@example.com'
 route:
   group_by: ['instance', 'severity']
   group_wait: 30s
   group_interval: 5m
   repeat_interval: 3h
   receiver: team-1
 receivers:
   - name: 'team-1'
     email_configs:
       - to: 'admin@example.com'

For prometheus to send alerts to alertmanager include the following snippet in /etc/prometheus/prometheus.yml:

 alerting:
   alertmanagers:
   - static_configs:
     - targets:
        - localhost:9093

To configure an alert for when a systemd unit fails add the following snippet to /etc/prometheus/alert.rules.yml. For more rules read the alerting rules documentation.

 - name: systemd_unit
   interval: 15s
   rules:
   - alert: systemd_unit_failed
     expr: |
       node_systemd_unit_state{state="failed"} > 0
     for: 3m
     labels:
       severity: critical
     annotations:
       description: 'Instance Template:$labels.instance: Service Template:$labels.name failed'
       summary: 'Systemd unit failed'

See also