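If you need to monitor network conditions between two hosts, or between a host and an external service, try Blackbox Exporter, described in this article.
1. Installing Blackbox
1.1 Deploying on a host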
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.0/blackbox_exporter-0.21.0.linux-amd64.tar.gz
tar -xzvf blackbox_exporter-0.21.0.linux-amd64.tar.gz
mv blackbox_exporter-0.21.0.linux-amd64/blackbox_exporter /usr/bin/
mkdir -p /etc/prometheus
mv blackbox_exporter-0.21.0.linux-amd64/blackbox.yml /etc/prometheus/

rm -rf blackbox_exporter-0.21.0.linux-amd64*

vim /usr/lib/systemd/system/blackbox_exporter.service
Add the following content:
[Unit]
Description=blackbox_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/bin/blackbox_exporter \
    --config.file=/etc/prometheus/blackbox.yml
[Install]
WantedBy=multi-user.target
systemctl enable blackbox_exporter
systemctl start blackbox_exporter

systemctl status blackbox_exporter
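Once systemd reports the unit as active, a quick smoke test can confirm that the exporter actually answers HTTP requests. The sketch below assumes curl is installed and that the exporter listens on its default port 9115; the function name `exporter_up` and the 5-second timeout are illustrative choices, not part of blackbox_exporter.

```shell
# exporter_up HOST [PORT]
# Succeeds (exit status 0) if the exporter's own /metrics page answers
# without an HTTP error. PORT defaults to 9115.
exporter_up() {
    host="$1"
    port="${2:-9115}"
    # -f: treat HTTP errors as failures, -sS: quiet but still report errors,
    # -o /dev/null: discard the response body
    curl -fsS -o /dev/null --max-time 5 "http://${host}:${port}/metrics"
}
```

For example, `exporter_up 127.0.0.1 && echo "blackbox_exporter is reachable"` could be run on the host right after starting the service.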
1.2 Deploying on a cluster
Installing on Kubernetes is much simpler: just apply the YAML below.
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-config
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        http:
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        http:
          method: POST
      tcp_connect:
        prober: tcp
      pop3s_banner:
        prober: tcp
        tcp:
          query_response:
            - expect: "^+OK"
          tls: true
          tls_config:
            insecure_skip_verify: false
      grpc:
        prober: grpc
        grpc:
          tls: true
          preferred_ip_protocol: "ip4"
      grpc_plain:
        prober: grpc
        grpc:
          tls: false
          service: "service1"
      ssh_banner:
        prober: tcp
        tcp:
          query_response:
            - expect: "^SSH-2.0-"
            - send: "SSH-2.0-blackbox-ssh-check"
      irc_banner:
        prober: tcp
        tcp:
          query_response:
            - send: "NICK prober"
            - send: "USER prober prober prober :prober"
            - expect: "PING :([^ ]+)"
              send: "PONG ${1}"
            - expect: "^:[^ ]+ 001"
      icmp:
        prober: icmp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: blackbox
  template:
    metadata:
      labels:
        app: blackbox
    spec:
      hostAliases:
        - ip: "0.0.0.0"
          hostnames:
            - "dev.chenshaowen.com"
      containers:
        - image: prom/blackbox-exporter:v0.21.1
          name: blackbox
          args:
            - --config.file=/etc/blackbox_exporter/blackbox.yml
            - --log.level=error
          ports:
            - containerPort: 9115
          volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
      volumes:
        - name: config
          configMap:
            name: blackbox-config
---
apiVersion: v1
kind: Service
metadata:
  name: blackbox
  namespace: kube-system
spec:
  selector:
    app: blackbox
  ports:
    - port: 9115
      targetPort: 9115
  type: NodePort
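To roll the manifest out and find the port to probe, something like the following could be used. It is only a sketch: it assumes kubectl is already configured for the target cluster and that the manifest above was saved as blackbox.yaml (a file name chosen here for illustration). The helper names `deploy_blackbox` and `blackbox_nodeport` are hypothetical as well.

```shell
# deploy_blackbox [MANIFEST]
# Apply the manifest, then wait until the Deployment has finished rolling out.
# MANIFEST defaults to blackbox.yaml, an assumed file name.
deploy_blackbox() {
    manifest="${1:-blackbox.yaml}"
    kubectl apply -f "$manifest" &&
        kubectl -n kube-system rollout status deployment/blackbox --timeout=120s
}

# blackbox_nodeport
# Print the NodePort that Kubernetes assigned to the blackbox Service.
blackbox_nodeport() {
    kubectl -n kube-system get service blackbox \
        -o jsonpath='{.spec.ports[0].nodePort}'
}
```

Because the Service is of type NodePort, the exporter should then be reachable from outside the cluster at any node IP on the printed port.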
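2. Testing outbound connectivity from the host
blackbox_exporter exposes its service on local port 9115 by default.
In practice, simply by constructing a URL we can test connectivity from the host to any service. Here is an example. Only the target and module parameters are required, where target is the destination to probe and module is the module used for the probe:
Open http://BLACKBOX_HOST_IP:9115/probe?target=google.com&module=http_2xx in a browser, replacing BLACKBOX_HOST_IP with the host's IP.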
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.030818323
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.353982702
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.08580119300000001
probe_http_duration_seconds{phase="processing"} 0.201979714
probe_http_duration_seconds{phase="resolve"} 0.060847821999999996
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0.003931112
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 3
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 0
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 13645
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 4.032438981e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
The comment lines describe each metric in detail: DNS lookup time, response details, protocol, and so on.
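When checking a probe from a terminal rather than a browser, it can be handy to pull a single value out of this output. The following is a minimal sketch using awk; the function name `metric_value` is made up for this example, and it assumes label values contain no spaces, which holds for the output shown above.

```shell
# metric_value NAME
# Reads Prometheus text-format output on stdin and prints the value of the
# first sample whose metric name (ignoring labels) equals NAME.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                  # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)   # drop a {label="..."} suffix, if any
            if (metric == name) { print $NF; exit }
        }'
}
```

A possible use, with BLACKBOX_HOST_IP replaced as before: `curl -s "http://BLACKBOX_HOST_IP:9115/probe?target=google.com&module=http_2xx" | metric_value probe_success`.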
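3. Prometheus configuration
Next, we can have Prometheus scrape the blackbox metrics on a schedule. Below is the relevant configuration; replace BLACKBOX_HOST_IP with the IP address of the host where blackbox_exporter is deployed.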
scrape_configs:
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['github.com']
        labels:
          url: 'github.com'
      - targets: ['dl-cdn.alpinelinux.org']
        labels:
          url: 'dl-cdn.alpinelinux.org'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: BLACKBOX_HOST_IP:9115
  - job_name: 'blackbox_icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['github.com']
        labels:
          url: 'github.com'
      - targets: ['dl-cdn.alpinelinux.org']
        labels:
          url: 'dl-cdn.alpinelinux.org'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: BLACKBOX_HOST_IP:9115
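The relabel rules copy each target's address into the `target` URL parameter and then point `__address__` at the exporter, so Prometheus ends up requesting the exporter's /probe endpoint once per target. The sketch below prints those effective URLs, which can help when reproducing a scrape by hand. The function names are illustrative, and no URL encoding is done, so it only suits plain host names like the ones in this configuration.

```shell
# scrape_url EXPORTER MODULE TARGET
# Print the URL that Prometheus effectively requests for one target:
# metrics_path (/probe) plus the module and target parameters.
scrape_url() {
    printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# for_each_target EXPORTER MODULE TARGET...
# Print one effective scrape URL per listed target.
for_each_target() {
    exporter="$1"; module="$2"; shift 2
    for target in "$@"; do
        scrape_url "$exporter" "$module" "$target"
    done
}
```

For instance, `for_each_target BLACKBOX_HOST_IP:9115 icmp github.com dl-cdn.alpinelinux.org` lists the two URLs behind the blackbox_icmp job.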
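4. Configuring the Grafana dashboard
This setup uses the dashboard at https://grafana.com/grafana/dashboards/13587.
With a few minor adjustments, the final result looks like this:
[Figure: screenshot of the adjusted Grafana dashboard]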
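5. Problems you may encounter
5.1 probe_http_status_code is always 0
blackbox_exporter prefers the IPv6 stack by default, so IPv4 has to be specified manually.
Edit the configuration file /etc/prometheus/blackbox.yml to select the IPv4 stack.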
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"
Then restart the service:
systemctl restart blackbox_exporter
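After the restart, one way to check whether the change took effect is to look at the status code together with the IP protocol the probe actually used. The following is a rough sketch; `probe_summary` is a name invented for this example, and it simply reads /probe output on stdin.

```shell
# probe_summary
# Reads /probe output on stdin and prints the HTTP status code and the IP
# protocol version reported by the probe on one line.
probe_summary() {
    awk '
        $1 == "probe_http_status_code" { status = $2 }
        $1 == "probe_ip_protocol"      { proto  = $2 }
        END { printf "status=%s ip_protocol=%s\n", status, proto }'
}
```

A typical call might be `curl -s "http://BLACKBOX_HOST_IP:9115/probe?target=google.com&module=http_2xx" | probe_summary`. If the status is still 0, appending `&debug=true` to the probe URL makes the exporter return its probe log as well, which usually shows where the request failed.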
5.2 Grafana cannot find grafana-piechart-panel
grafana-cli plugins install grafana-piechart-panel

vim /usr/local/grafana/conf/defaults.ini
[plugin.piechart]
path = /var/lib/grafana/plugins/grafana-piechart-panel

systemctl restart grafana-server
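To confirm that the plugin is actually installed before reloading the dashboard, the installed-plugin list can be checked. This is a small sketch that assumes grafana-cli is on the PATH and uses the same plugin directory as the install command above; the function name `piechart_installed` is hypothetical.

```shell
# piechart_installed
# Succeeds if grafana-cli lists the pie chart panel among installed plugins.
piechart_installed() {
    grafana-cli plugins ls 2>/dev/null | grep -q 'grafana-piechart-panel'
}
```

For example: `piechart_installed && echo "pie chart panel is installed"`.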