Prometheus

From ArchWiki

Prometheus is an open-source metrics collection and processing tool. It consists primarily of a time series database and a query language to access and process the metrics it stores. Metrics are exposed by separate services (exporters), from which the Prometheus server pulls them. It provides only a minimal web UI out of the box. For a functional dashboard system, third-party tools like Grafana can be used.

Installation

Install the prometheus or prometheus-bin (AUR) package. Then enable and start the prometheus service. The application is reachable via HTTP on port 9090 by default.

The default configuration monitors the prometheus process itself, but not much beyond that. To monitor the system, install prometheus-node-exporter or prometheus-node-exporter-bin (AUR), which exposes metrics about the local system. Start and enable the prometheus-node-exporter service; it listens on port 9100 by default. Once the service is running, Prometheus must be configured to scrape the exporter periodically, otherwise no data is collected. Do this by adding the exporter as a scrape target, as described in the Configuration section below.

Warning: By default, prometheus listens on *:9090, that is, on all network interfaces. Change the listen address or add firewall rules to restrict access.
Warning: By default, prometheus-node-exporter listens on *:9100, that is, on all network interfaces. Change the listen address or add firewall rules to restrict access.

Configuration

Prometheus is configured through YAML files, the main one being /etc/prometheus/prometheus.yml.

Adding metric

You can add new sources to scrape metrics from by adding them to the scrape_configs array. To add the local node exporter as a source alongside the prometheus process itself, the configuration would look like this:

 scrape_configs:
   - job_name: 'prometheus'
     static_configs:
       - targets: ['localhost:9090']
   - job_name: 'localhost'
     static_configs:
       - targets: ['localhost:9100']
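
How often targets are scraped is controlled by scrape_interval, which can be set globally and overridden per job. The following is a sketch rather than a recommended setup: the interval values and the 'node' job name are illustrative assumptions.

 global:
   scrape_interval: 1m          # default for every job unless overridden
 scrape_configs:
   - job_name: 'prometheus'
     static_configs:
       - targets: ['localhost:9090']
   - job_name: 'node'           # hypothetical job name for the node exporter
     scrape_interval: 15s       # per-job override of the global value
     static_configs:
       - targets: ['localhost:9100']

The file can be validated with promtool check config /etc/prometheus/prometheus.yml before restarting the prometheus service. Afterwards, the Targets page of the web UI shows whether each target is being scraped successfully.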

Exporters

The Arch Linux repository contains a subset of the available exporters:

Using the UI

Prometheus comes with a very limited web UI to verify configuration, query and graph metrics. You can reach it at http://localhost:9090 by default. You can find an in-depth explanation of Prometheus' query language in the Prometheus docs.

Alerting

Alertmanager sends out notifications for alerts raised by Prometheus. The conditions that trigger an alert are configured in /etc/prometheus/alert.rules.yml, while the notifications to send out are configured in /etc/alertmanager/alertmanager.yml. Alertmanager supports various ways to notify users, such as email, Slack, and more. To configure email alerts, add the following snippet to /etc/alertmanager/alertmanager.yml:

 global:
   resolve_timeout: 5m
   smtp_smarthost: 'smtp.example.com:25'
   smtp_from: 'alertmanager@example.com'
 route:
   group_by: ['instance', 'severity']
   group_wait: 30s
   group_interval: 5m
   repeat_interval: 3h
   receiver: team-1
 receivers:
   - name: 'team-1'
     email_configs:
       - to: 'admin@example.com'

For Prometheus to send alerts to Alertmanager, include the following snippet in /etc/prometheus/prometheus.yml:

 alerting:
   alertmanagers:
   - static_configs:
     - targets:
        - localhost:9093

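To configure an alert for when a systemd unit fails, add the following snippet to /etc/prometheus/alert.rules.yml. The node_systemd_unit_state metric comes from the node exporter's systemd collector, which is disabled by default and is enabled with the --collector.systemd flag. For more rules, read the alerting rules documentation.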

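 groups:
   - name: systemd_unit
     interval: 15s
     rules:
       - alert: systemd_unit_failed
         # Matches every unit the node exporter reports in the "failed" state
         expr: |
           node_systemd_unit_state{state="failed"} > 0
         # The condition must hold for 3 minutes before the alert fires
         for: 3m
         labels:
           severity: critical
         annotations:
           description: 'Instance {{ $labels.instance }}: Service {{ $labels.name }} failed'
           summary: 'Systemd unit failed'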
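
Prometheus only evaluates rule files that are listed in its main configuration. A minimal sketch of the corresponding entry in /etc/prometheus/prometheus.yml, assuming the rules are stored at the path used above:

 rule_files:
   - /etc/prometheus/alert.rules.yml

The rule file can be checked with promtool check rules /etc/prometheus/alert.rules.yml before restarting the prometheus service.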

Tips & Tricks

Telegraf instead of exporters

Telegraf can replace multiple exporters when used with its Prometheus output plugin. This consolidates metrics collection into a single binary and offers more flexible configuration than the standard Prometheus exporters.
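
On the Prometheus side, Telegraf is scraped like any other target. The following sketch of a scrape job assumes that Telegraf's prometheus_client output is enabled and left on its default listen port, 9273. The job name is an arbitrary choice.

 scrape_configs:
   - job_name: 'telegraf'            # arbitrary job name
     static_configs:
       - targets: ['localhost:9273'] # default port of the prometheus_client output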

See also