Prometheus Host (node) Monitoring

news/2024/6/18 19:01:03 Tags: linux, centos, operations

9 Prometheus node monitoring

# Install node_exporter
[root@promethues ~]# tar zxvf node_exporter-1.2.2.linux-amd64.tar.gz -C /usr/local/
node_exporter-1.2.2.linux-amd64/
node_exporter-1.2.2.linux-amd64/LICENSE
node_exporter-1.2.2.linux-amd64/NOTICE
node_exporter-1.2.2.linux-amd64/node_exporter
[root@promethues ~]# ln -sv /usr/local/node_exporter-1.2.2.linux-amd64 /usr/local/node_exporter
‘/usr/local/node_exporter’ -> ‘/usr/local/node_exporter-1.2.2.linux-amd64’
[root@promethues ~]# 

# Write the systemd unit file
[root@promethues ~]# cat /usr/lib/systemd/system/node_exporter.service
[Unit] 
Description=Prometheus node_exporter 

[Service] 
User=nobody 
ExecStart=/usr/local/node_exporter/node_exporter --log.level=error 
ExecStop=/usr/bin/killall node_exporter 

[Install] 
WantedBy=default.target

# Start the service and enable it at boot
[root@promethues ~]# systemctl enable --now node_exporter
Created symlink from /etc/systemd/system/default.target.wants/node_exporter.service to /usr/lib/systemd/system/node_exporter.service.

# Verify that node_exporter is running
[root@promethues ~]# systemctl status node_exporter
● node_exporter.service - Prometheus node_exporter
   Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2021-10-31 16:08:25 CST; 5s ago
 Main PID: 2254 (node_exporter)
    Tasks: 3
   Memory: 8.4M
   CGroup: /system.slice/node_exporter.service
           └─2254 /usr/local/node_exporter/node_exporter --log.level=error

Oct 31 16:08:25 promethues systemd[1]: Started Prometheus node_exporter.
[root@promethues ~]# 
[root@promethues ~]# ps -ef|grep node_exporter
nobody    2254     1  0 16:08 ?        00:00:00 /usr/local/node_exporter/node_exporter --log.level=error
root      2259  2013  0 16:10 pts/2    00:00:00 grep --color=auto node_exporter
[root@promethues ~]#
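
A running process does not by itself show that metrics are being served. The following is a minimal sketch of an extra check, assuming the exporter listens on its default port 9100 and that `curl` and `awk` are installed. `count_samples` is a hypothetical helper name, not part of node_exporter.

```shell
# count_samples METRIC
# Read Prometheus text-format metrics on stdin and print how many sample
# lines belong to METRIC. Comment lines (# HELP / # TYPE) are skipped.
count_samples() {
  awk -v m="$1" '
    /^#/ { next }
    {
      name = $0
      sub(/[{ ].*/, "", name)   # drop the label set and the value
      if (name == m) n++
    }
    END { print n + 0 }
  '
}

# Usage against a live exporter (assumes the default port 9100):
#   curl -s http://localhost:9100/metrics | count_samples node_cpu_seconds_total
```

A non-zero count suggests the exporter is exposing CPU metrics; the exact number depends on the number of CPUs and modes on the host.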

Add the host as a scrape target

[root@promethues ~]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "host_monitor"
    static_configs:
      - targets: ["localhost:9100"]   # added: host monitoring on port 9100

# Check the configuration syntax
[root@promethues ~]# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml 
Checking /usr/local/prometheus/prometheus.yml
  SUCCESS: 0 rule files found

[root@promethues ~]# 
# Restart Prometheus
[root@promethues ~]# systemctl restart prometheus

Check whether the host is being monitored
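
One way to check from the command line is the targets endpoint of the Prometheus HTTP API. The sketch below is a crude text match, assuming Prometheus listens on localhost:9090 as in the configuration above and returns compact JSON. `summarize_health` is a hypothetical helper, and a real JSON parser such as jq would be more robust.

```shell
# summarize_health
# Read the JSON body of /api/v1/targets on stdin and count the active
# targets by their "health" field (up, down, unknown).
summarize_health() {
  tr ',' '\n' | awk '
    /"health":"up"/      { up++ }
    /"health":"down"/    { down++ }
    /"health":"unknown"/ { unknown++ }
    END { printf "up=%d down=%d unknown=%d\n", up, down, unknown }
  '
}

# Usage against a live server (assumes the address from prometheus.yml):
#   curl -s http://localhost:9090/api/v1/targets | summarize_health
```

The counts cover every active target, so with the two jobs defined here a healthy setup would be expected to report two targets as up.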

9.1 Query by metric name

node_cpu_seconds_total

9.2 Query with a label

node_cpu_seconds_total{instance="localhost:9100"}

9.3 Query with multiple labels

node_cpu_seconds_total{instance="localhost:9100",mode="system"}
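
The selectors in 9.1 to 9.3 all follow one pattern: a metric name, optionally followed by comma-separated `label="value"` matchers in braces. As an illustration, here is a small sketch that assembles such a selector. `build_selector` is a hypothetical helper, and it does not escape quotes or backslashes inside label values.

```shell
# build_selector METRIC [key=value ...]
# Print a PromQL instant-vector selector with equality matchers.
build_selector() {
  metric="$1"
  shift
  labels=""
  for pair in "$@"; do
    key="${pair%%=*}"      # text before the first '='
    value="${pair#*=}"     # text after the first '='
    labels="${labels:+$labels,}$key=\"$value\""
  done
  if [ -n "$labels" ]; then
    printf '%s{%s}\n' "$metric" "$labels"
  else
    printf '%s\n' "$metric"
  fi
}

# Usage with the HTTP API (assumes Prometheus on localhost:9090):
#   curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(build_selector node_cpu_seconds_total instance=localhost:9100 mode=system)"
```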

9.4 Calculate CPU usage

100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)
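
To see what this expression computes, the arithmetic for a single CPU can be reproduced by hand. `irate` uses the last two samples in the range, so the idle fraction is the increase of the idle counter divided by the seconds between those samples. The sketch below uses made-up sample values, and `cpu_usage_pct` is a hypothetical helper.

```shell
# cpu_usage_pct IDLE_PREV IDLE_LAST SECONDS
# Mirrors 100 - irate(idle) * 100 for one CPU, given two consecutive samples
# of the idle counter and the time between them in seconds.
cpu_usage_pct() {
  awk -v a="$1" -v b="$2" -v dt="$3" \
    'BEGIN { printf "%.1f\n", 100 - (b - a) / dt * 100 }'
}

# Made-up values: the idle counter grew by 13.5 s over a 15 s scrape interval,
# so the CPU was idle 90% of the time.
cpu_usage_pct 1000.0 1013.5 15   # prints 10.0
```

In the real query, `avg(...) by (instance)` then averages the idle rate across all CPUs of each host before the subtraction.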

9.5 Calculate memory usage

100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100
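
On Linux, node_exporter derives these `node_memory_*_bytes` metrics from `/proc/meminfo`, so the result can be cross-checked on the host itself. The following is a rough sketch of the same formula applied directly to that file. `mem_usage_pct` is a hypothetical helper, and the values in the file are in kB, which cancels out in the ratio.

```shell
# mem_usage_pct
# Read /proc/meminfo-formatted text on stdin and print
# 100 - (MemFree + Buffers + Cached) / MemTotal * 100.
mem_usage_pct() {
  awk '
    $1 == "MemTotal:" { total = $2 }
    $1 == "MemFree:"  { free = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "Cached:"   { cached = $2 }
    END {
      if (total > 0)
        printf "%.1f\n", 100 - (free + buffers + cached) / total * 100
    }
  '
}

# Usage on the monitored host:
#   mem_usage_pct < /proc/meminfo
```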

9.6 Calculate disk usage

(node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_free_bytes{fstype=~"xfs|ext4"}) / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100
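
As a quick sanity check on the arithmetic, disk usage as a percentage is used space over total space, that is (size - free) / size * 100, and 100 minus that value is the percentage still free. A sketch with made-up numbers follows; `disk_usage_pct` is a hypothetical helper.

```shell
# disk_usage_pct SIZE_BYTES FREE_BYTES
# Print the used share of a filesystem as a percentage.
disk_usage_pct() {
  awk -v size="$1" -v free="$2" 'BEGIN {
    if (size <= 0) exit 1                      # avoid division by zero
    printf "%.1f\n", (size - free) / size * 100
  }'
}

# Made-up values: a 100 GiB filesystem with 25 GiB free.
disk_usage_pct 107374182400 26843545600   # prints 75.0
```

`df` computes its percentage against the space available to unprivileged users, which corresponds to `node_filesystem_avail_bytes`, so its figure can differ slightly from one based on `node_filesystem_free_bytes`.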


