Monitoring Server Health with Prometheus




Compared with MetricBeat from the ES (Elastic) stack, Prometheus is more lightweight, has lower hardware requirements, and is easier to set up.

1. Installing Prometheus

  1. Download the tar.gz package from the official site and extract it.
  2. Move it to /usr/local/:

mv prometheus-2.3.2.linux-amd64 /usr/local/prometheus

  3. Configure Prometheus:
vim prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Alertmanager configuration
# Alertmanager is installed separately from Prometheus. When an alert threshold is reached,
# Prometheus sends a request to Alertmanager, and Alertmanager decides how to handle it.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Paths to alerting rule files
rule_files:
  # - "first_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    static_configs:
    - targets: ['localhost:9090']

  # The job below is our own addition. Add it after node_exporter is installed and running,
  # then restart the prometheus service.
  - job_name: 'linux'

    static_configs:
    - targets: ['172.17.0.1:9100'] # Even when testing locally, use the machine's IP rather than localhost.
  4. Create a prometheus user:

groupadd prometheus
useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus

  5. Register Prometheus as a systemd service:
vim /etc/systemd/system/prometheus.service

[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure
[Install]
WantedBy=multi-user.target

Then run:

systemctl enable prometheus
systemctl start prometheus

Check that Prometheus is running:

smartkeyerror@Zero:/etc/systemd/system$ systemctl status prometheus
● prometheus.service - prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
   Active: active (running) since 五 2018-07-20 22:26:24 CST; 11h ago
 Main PID: 3704 (prometheus)
    Tasks: 16
   Memory: 62.8M
      CPU: 58.576s
   CGroup: /system.slice/prometheus.service
           └─3704 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
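
Besides systemctl, Prometheus itself reports whether each scrape target is reachable through its /api/v1/targets HTTP endpoint. The Python sketch below summarizes that response. It assumes Prometheus listens on localhost:9090 as configured above. The function names summarize_targets and fetch_targets are made up for illustration, and SAMPLE is a hand-written payload in the shape of the API response, not real output.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Map each active target's scrape URL to its health string."""
    active = payload.get("data", {}).get("activeTargets", [])
    return {target["scrapeUrl"]: target["health"] for target in active}


def fetch_targets(base_url="http://localhost:9090"):
    """Ask a running Prometheus (assumed address) for its target list."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample, trimmed to the fields used above.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"scrapeUrl": "http://172.17.0.1:9100/metrics", "health": "down"},
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    for url, health in summarize_targets(SAMPLE).items():
        print(url, health)
```

Against a live server, summarize_targets(fetch_targets()) would give the same kind of mapping. A target shown as "down" usually means the exporter is not running yet or the address in prometheus.yml is wrong.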

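2. Installing node_exporter

  1. Download the tar.gz package from the official site and extract it. Here it is moved into the Prometheus directory to keep things easy to manage:

mv node_exporter-0.16.0.linux-amd64 /usr/local/prometheus/node_exporter

  2. Likewise, create a systemd service for it: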
vim /etc/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

Then run:

systemctl enable node_exporter
systemctl start node_exporter
systemctl status node_exporter # check that node_exporter is running
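
Once node_exporter is running, it serves plain-text metrics at port 9100 under /metrics, and Prometheus reads them from there. To give a feel for that format, here is a minimal Python sketch that parses a few lines of it. The parser is deliberately simplified (it assumes no trailing timestamps), the name parse_metrics is hypothetical, and SAMPLE is hand-written with made-up values rather than captured from a real host.

```python
def parse_metrics(text):
    """Parse Prometheus text-format lines into {series: value}.

    Simplified: skips comment lines and assumes each sample line ends
    with the value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hand-written sample with made-up values.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 41234.56
node_cpu_seconds_total{cpu="0",mode="user"} 1021.5
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

In practice you would rarely parse this yourself, since Prometheus does it on every scrape. The sketch is only meant to show what the exporter hands over.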

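3. Installing Grafana

  1. Install Grafana by following the steps on the official site; it is straightforward.
  2. Start it: systemctl start grafana-server
  3. Open the web UI at localhost:3000. The default username and password are both admin.
  4. Add a data source.

On the data source page, click Dashboards and import the entries listed there.
Once that is done, import a dashboard.

You can import a dashboard by its URL or its ID. Browse https://grafana.com/dashboards for one that fits your needs. Dashboard 1860 is recommended, and 6287 is also decent.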

4. Security

  1. Install the Apache htpasswd utility:

sudo apt-get install apache2-utils

  2. Generate a password file in a location of your choice:

htpasswd -c /etc/nginx/sites-enabled/.htpasswd nodeExporter
New password:
Re-type new password:
Adding password for user nodeExporter # this line indicates success

  3. Put Nginx in front of the exporter port and require authentication:
server {
    listen 19100;
    location / {
        proxy_pass http://localhost:9100/;
        # auth_basic is required; without it no authentication takes place
        auth_basic "Prometheus";
        # The password file can be generated with: htpasswd -c <path>/<file> <username>
        # The login name is that username, not the auth_basic string.
        auth_basic_user_file /etc/nginx/sites-enabled/.htpasswd;
    }
}
  4. Update the prometheus.yml configuration file:

  - job_name: 'linux'

    static_configs:
    - targets: ['172.17.0.1:19100'] # change the port and add the basic_auth block
    basic_auth:
      username: nodeExporter
      password: '123456'

  5. Adjust the firewall so that port 9100 rejects access from outside.
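
To check the protected port by hand, a client has to send the same credentials that Prometheus sends through basic_auth. The Python sketch below shows how that header is built and how a request could be made with it. The helper names basic_auth_header and fetch_metrics are hypothetical, and the address, port, username, and password are simply the example values used above.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_metrics(url, username, password):
    """Fetch a password-protected metrics page (needs a reachable server)."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Example credentials from the configuration above.
    print(basic_auth_header("nodeExporter", "123456"))
```

Calling fetch_metrics("http://172.17.0.1:19100/metrics", "nodeExporter", "123456") should return the metrics text if Nginx is configured as above. With wrong credentials, urlopen raises an HTTPError with status 401. Note that Basic auth only encodes the credentials; it does not encrypt them, so this setup is best kept on a trusted network or combined with TLS.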

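5. Multi-machine deployment

Before deploying to multiple machines, one point needs to be clear: node_exporter does not produce data on its own; it is only a carrier of data.
That is, after node_exporter starts, it does not collect server information on its own initiative unless Prometheus requests data from port 9100. This leads to the following layout:
Install Prometheus + Grafana on one host, and install only node_exporter on the remaining servers.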
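
With this layout, the only change on the Prometheus host is a longer targets list in the 'linux' job. As a rough illustration, the Python sketch below renders such a job entry from a list of hosts. The function name render_scrape_job and the 10.0.0.x addresses are placeholders invented for the example, and the output is plain text to paste under scrape_configs, not something Prometheus generates itself.

```python
def render_scrape_job(job_name, hosts, port=9100):
    """Render a static scrape job entry for prometheus.yml as text."""
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return "\n".join(
        [
            f"  - job_name: '{job_name}'",
            "    static_configs:",
            f"    - targets: [{targets}]",
        ]
    )


if __name__ == "__main__":
    # Placeholder addresses; replace with the servers running node_exporter.
    print(render_scrape_job("linux", ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

If the exporters sit behind the Nginx proxy described in the security section, pass the proxy port instead of 9100 and add the basic_auth block to the rendered job by hand.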

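5.1 Installing with Docker

Because node_exporter has to be installed on many hosts, one way to avoid repetitive work is to package node_exporter and nginx into a single image and then simply docker pull it on each host.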

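5.2 Automated installation with Ansible

Setting this up with Docker is fairly cumbersome, so here Ansible is used to install and configure the servers in batch.
First, recall the single-machine deployment steps:

  1. Create a directory
  2. Fetch the node_exporter tar.gz package with wget and extract it
  3. Create the prometheus group and user
  4. Register node_exporter as a systemd service
  5. Generate a password and add a server block to nginx
  6. Start node_exporter

Package these steps into a playbook and run it on every host that needs the installation.
First prepare an nginx configuration file named node_exporter.conf and a systemd unit file with the following contents: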

# node_exporter.conf
server {
    listen 10029;
    location / {
        proxy_pass http://localhost:9100/;
        auth_basic "Prometheus";
        auth_basic_user_file /home/node_exporter/node_exporter.passwd;
    }
}

# node_exporter.service

[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/home/node_exporter/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

The main Ansible task file:

# Download and unpack node_exporter, then rename the extracted directory so the
# binary ends up at the path used by ExecStart in node_exporter.service.
- name: get package
  shell: mkdir -p /home/node_exporter; cd /home/node_exporter/; wget https://github.com/prometheus/node_exporter/releases/download/v0.16.0/node_exporter-0.16.0.linux-amd64.tar.gz; tar -zxvf node_exporter-0.16.0.linux-amd64.tar.gz; mv node_exporter-0.16.0.linux-amd64 node_exporter

- name: add group
  shell: groupadd prometheus; useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus

- name: copy service.conf
  copy: src=/home/smartkeyerror/ansible_node/node_exporter.service dest=/etc/systemd/system/

- name: copy nginx.conf
  copy: src=/home/smartkeyerror/ansible_node/node_exporter.conf dest=/etc/nginx/conf.d/

- name: install htpasswd
  shell: apt-get install -y apache2-utils

# A task may call only one module, so copying the password file is its own task.
- name: copy password file
  copy: src=/home/smartkeyerror/ansible_node/node_exporter.passwd dest=/home/node_exporter/

- name: restart nginx and start node_exporter
  shell: nginx -s reload ; service node_exporter start