Linux | Installing Prometheus + Grafana, a Powerful System and Network Monitoring Stack, from Binaries on CentOS (the Show-Off Tool Has Arrived)

Posted: 2019-08-01 02:20:59

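A brief introduction to Prometheus:

Prometheus is written in Go and was inspired by Borgmon, Google's internal monitoring system. I won't go into its history here. The short version is that it's great and easy to use.

Prometheus is hosted by the Cloud Native Computing Foundation (CNCF), which Google helped found under the Linux Foundation, and it was the second project accepted into the CNCF. Its open-source community is very active (you can see this in the huge number of plugins and exporters). It was also the second project to graduate from the CNCF, after Kubernetes. Like Kubernetes (and etcd), it is written in Go, so hooking a Kubernetes cluster up to Prometheus takes very little effort. In other words, Prometheus is the natural companion in the cloud-native world, and there is basically no second choice.

On the development side

Prometheus has official client libraries for Go, Java, Python, and Ruby, and third-party open-source clients exist for other languages. With a client library you can instrument your core business flows, such as placing an order or adding an item to the shopping cart.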
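To make "instrumenting core business flows" a bit more concrete, here is a minimal Python sketch that uses only the standard library. It implements a labeled counter and renders it in the Prometheus text exposition format that the server scrapes. The metric name shop_events_total and the flow label are hypothetical, and label values are not escaped. In a real service you would use an official client library (such as prometheus_client for Python) instead of hand-rolling this.

```python
from collections import defaultdict
from threading import Lock


class LabeledCounter:
    """A monotonically increasing counter with one label (illustration only; no label escaping)."""

    def __init__(self, name: str, help_text: str, label: str) -> None:
        self.name = name
        self.help_text = help_text
        self.label = label
        self._totals = defaultdict(float)  # label value -> running total
        self._lock = Lock()  # request handlers may increment concurrently

    def inc(self, label_value: str, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only go up")
        with self._lock:
            self._totals[label_value] += amount

    def expose(self) -> str:
        """Render the current totals in the Prometheus text exposition format."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            for value, total in sorted(self._totals.items()):
                lines.append(f'{self.name}{{{self.label}="{value}"}} {total}')
        return "\n".join(lines) + "\n"


# Hypothetical business metric covering the order and shopping-cart flows.
shop_events = LabeledCounter("shop_events_total", "Business events by flow.", "flow")
shop_events.inc("place_order")
shop_events.inc("add_to_cart", 3)
print(shop_events.expose())
```

A counter only ever goes up. That is why the Prometheus side computes rates from it (for example orders per second) rather than reading the raw value.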

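At the application level: application monitoring

Mainstream applications can have their core metrics collected through official or third-party exporters, for example Redis, MySQL, MongoDB, Nginx, HAProxy, and Kubernetes clusters.

At the system level: system monitoring

Beyond common software, Prometheus also has system-level and network-level exporters for monitoring servers and networks.

Integrating other monitoring systems

Through various exporters, Prometheus can pull in and collect data from other monitoring systems, such as AWS CloudWatch, JMX, and Pingdom.

Prometheus does have weaknesses, though. Its data visualization is fairly weak, and that is where Grafana makes its grand entrance.

A brief introduction to Grafana:

Grafana is an open-source program for visualizing large volumes of measurement data. It offers a powerful and elegant way to create, share, and explore data, and its dashboards display data from your various metric data sources.

Grafana is an open-source metrics analytics platform with rich dashboards and graph editing. Unlike Kibana, it focuses on time-series charts, and it supports many data sources, such as Graphite, InfluxDB, Elasticsearch, MySQL, Kubernetes, and Zabbix.

Judging by that description, it is mostly suited to operations metrics.

Grafana actually started out as a fork of Kibana 3. It has its own user and permission management, which Kibana lacks. Kibana is tightly coupled to Elasticsearch and supports the full ES query syntax, which makes it better for multi-dimensional analysis and querying. Grafana is better for presentation, and its graphs look much nicer than Kibana's.

In other words, Grafana can generally be built into an operations platform. Display alone, however, does not help operations much. It also needs alerting or interactive querying, plus better metrics from a performance or usability standpoint, before it gets chosen. Its dashboard templates are also useful as references.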

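I.

Prometheus architecture

At its core, Prometheus is a monitoring system built around a time-series database written in Go, with client libraries for many languages. Because its focus is collecting and storing data, its weak point is visualization, and that is why Grafana enters the picture.

TSDB overview

A TSDB (Time Series Database) can be thought of simply as software optimized for handling time-series data, in which the data points are indexed by time.

Characteristics of a time-series database

Most operations are writes.

Writes are almost always appends, and most of the time data arrives already sorted by time.

Writes rarely touch data from long ago, and updates are rare. In most cases data is written to the database within seconds or minutes of being collected.

Deletes usually remove whole blocks: you pick a starting point in history and delete the blocks that follow. Deleting a single point in time, or scattered random times, is rare.

The data set is large, usually larger than memory. A query typically selects only a small, irregular portion of it, so caching helps very little.

Reads are typically sequential, in ascending or descending time order.

Highly concurrent reads are common.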
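The write and read pattern above can be illustrated with a toy in-memory series in Python. Appends are cheap because samples arrive in time order. A range query is a binary search plus one sequential slice. Deletion drops a whole leading block. This sketches the access pattern only, under that in-order assumption. It is not how the Prometheus TSDB is implemented (that uses compressed chunks, a write-ahead log, and on-disk blocks).

```python
import bisect
from typing import List, Tuple


class SeriesBuffer:
    """Toy single series: append-mostly writes, time-indexed range reads (illustration only)."""

    def __init__(self) -> None:
        self.timestamps: List[int] = []  # milliseconds, always sorted
        self.values: List[float] = []

    def append(self, ts_ms: int, value: float) -> None:
        # Samples normally arrive in time order, so the common case is a plain append.
        if self.timestamps and ts_ms <= self.timestamps[-1]:
            raise ValueError("out-of-order sample rejected")
        self.timestamps.append(ts_ms)
        self.values.append(value)

    def query_range(self, start_ms: int, end_ms: int) -> List[Tuple[int, float]]:
        """Samples with start_ms <= ts < end_ms, read as one sequential slice."""
        lo = bisect.bisect_left(self.timestamps, start_ms)
        hi = bisect.bisect_left(self.timestamps, end_ms)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def drop_before(self, cutoff_ms: int) -> None:
        """Delete a whole leading block of old data, the way retention does."""
        idx = bisect.bisect_left(self.timestamps, cutoff_ms)
        del self.timestamps[:idx]
        del self.values[:idx]


buf = SeriesBuffer()
for i in range(10):
    buf.append(1_000 * i, float(i))  # one sample per second
print(buf.query_range(3_000, 6_000))  # [(3000, 3.0), (4000, 4.0), (5000, 5.0)]
buf.drop_before(5_000)
print(buf.timestamps[0])  # 5000
```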

Common time-series databases

InfluxDB, RRDtool, Graphite, OpenTSDB, kdb+, Druid, KairosDB, Prometheus

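The Prometheus ecosystem

The Prometheus ecosystem consists of several components, some of which are optional. Most of them are written in Go, which makes them easy to build and deploy.

1. Prometheus Server

Scrapes and stores the data, and provides the PromQL query language.

2. Client SDKs

Official client libraries exist for Go, Java, Scala, Python, and Ruby. There are also many third-party libraries, covering Node.js, PHP, Erlang, and others.

3. Push Gateway

An intermediate gateway that short-lived jobs actively push their metrics to. The Prometheus server then scrapes the gateway like any other target.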
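A Pushgateway accepts pushes over plain HTTP: the metrics go in the text format, to /metrics/job/<job> plus optional label/value path segments. That means a push from a short-lived job can be sketched with the Python standard library alone. Everything here is hypothetical: the gateway address 192.168.217.24:9091 (9091 is the Pushgateway's default port), the job name, and the metric. This article does not install a Pushgateway, and the request below is only built, not sent.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_push_request(gateway: str, job: str, body: str,
                       grouping: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a Pushgateway PUT for one job's group of metrics."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for label, value in (grouping or {}).items():
        path += "/" + urllib.parse.quote(label, safe="") + "/" + urllib.parse.quote(value, safe="")
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces all metrics previously pushed under this grouping key
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical short-lived job: a nightly backup reporting when it last succeeded.
payload = (
    "# TYPE backup_last_success_timestamp_seconds gauge\n"
    "backup_last_success_timestamp_seconds 1668528000\n"
)
req = build_push_request("http://192.168.217.24:9091", "nightly_backup", payload, {"instance": "db1"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=5) would actually send it.
```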

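4. PromDash

A dashboard built with Rails for visualizing metric data. It has since been deprecated in favor of Grafana.

5. Exporter

"Exporter" is the umbrella term for Prometheus's data-collection components. An exporter gathers data from its target and converts it into a format Prometheus supports. Unlike traditional collection agents, it does not send data to a central server. Instead, it waits for the central server to come and scrape it.

Prometheus offers many kinds of exporters for collecting the running state of different services. Current categories include databases, hardware, message middleware, storage systems, HTTP servers, JMX, and more.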
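The pull model is easiest to see with a toy exporter. An exporter just answers HTTP GET /metrics with text, and it is the Prometheus server that opens the connection. The following stdlib-only Python sketch serves one made-up gauge (toy_exporter_uptime_seconds) on a random local port, then performs the same GET that a scrape would. Real exporters such as node_exporter do the same thing on fixed ports like 9100.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()


def render_metrics() -> str:
    # Hypothetical metric: seconds since this toy exporter started.
    uptime = time.time() - START
    return (
        "# HELP toy_exporter_uptime_seconds Seconds since the toy exporter started.\n"
        "# TYPE toy_exporter_uptime_seconds gauge\n"
        f"toy_exporter_uptime_seconds {uptime:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port; a real exporter listens on a fixed one.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# This GET is what the Prometheus server does every scrape_interval.
url = f"http://127.0.0.1:{server.server_address[1]}/metrics"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any proxy settings
scraped = opener.open(url, timeout=5).read().decode("utf-8")
print(scraped)
server.shutdown()
server.server_close()
```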

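6. Alertmanager

The alert manager. It receives alerts fired by the Prometheus server and handles sending the notifications.

7. prometheus_cli

A command-line tool.

8. Other helper tools

Various exporters that convert data from tools such as HAProxy, StatsD, and Graphite into the Prometheus format.

The architecture diagram is shown below: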

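II.

Ways to deploy Prometheus

1. Binary deployment
2. Docker deployment
3. Deployment inside a Kubernetes cluster

This article uses the binary deployment:

On 192.168.217.24, install the Prometheus server plus the node_exporter node metrics collector.

On 192.168.217.23, install the MySQL collector mysqld_exporter plus the node_exporter node metrics collector (MySQL runs on the .23 server).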

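III.

Installing the Prometheus server

Download page: Download | Prometheus

My machine is amd64, so I picked linux-amd64. For the version, I chose the long-term support (LTS) stable release 2.37.2. Upload the package to server .24 and extract it.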

tar zxf prometheus-2.37.2.linux-amd64.tar.gz
mv prometheus-2.37.2.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus

[root@node4 prometheus]# ll
total 56
drwxr-xr-x. 2 root root        38 May  8 console_libraries  # libraries used by the web console
drwxr-xr-x. 2 root root       173 May  8 consoles           # web console pages
drwxr-xr-x. 6 root root       126 Nov 15 22:12 data         # time-series database data
-rw-r--r--. 1 root root     11357 Apr 21 LICENSE            # license
-rw-r--r--. 1 root root      3773 Apr 21 NOTICE             # notices
-rwxr-xr-x  1 3434 3434 109691493 Nov  4 19:09 prometheus   # main program, the executable
-rw-r--r--. 1 root root      1148 Nov 15 21:09 prometheus.yml  # main Prometheus configuration file
-rwxr-xr-x. 1 root root  97394322 Apr 21 promtool           # admin tool: inspect the TSDB, test alerting rule files, and more

For example, to inspect the time-series database:

[root@node4 prometheus]# ./promtool tsdb list
BLOCK ULID                  MIN TIME       MAX TIME       DURATION    NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
01GHXKCBSD5JWBT1YTDZE3ME78  1668503659419  1668506400000  45m40.581s  162609       1212        1212        494354
01GHXP04ETXXVQ5W28XXGZDEC4  1668506400000  1668513600000  2h0m0s      591840       4896        1308        1609719
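The MIN TIME and MAX TIME columns are Unix timestamps in milliseconds. Here is a small Python sketch that converts the two rows above to UTC and recomputes the DURATION column. The second block covers exactly 2h, which matches Prometheus's default block range. The first block is shorter, probably because the server started partway through a 2h window.

```python
from datetime import datetime, timedelta, timezone

# The two blocks printed by `./promtool tsdb list` above, copied verbatim.
TSDB_LIST = """\
01GHXKCBSD5JWBT1YTDZE3ME78 1668503659419 1668506400000 45m40.581s 162609 1212 1212 494354
01GHXP04ETXXVQ5W28XXGZDEC4 1668506400000 1668513600000 2h0m0s 591840 4896 1308 1609719
"""


def ms_to_utc(ms: int) -> datetime:
    """MIN TIME / MAX TIME are Unix timestamps in milliseconds."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


spans = {}
for row in TSDB_LIST.splitlines():
    ulid, min_ms, max_ms, reported = row.split()[:4]
    spans[ulid] = timedelta(milliseconds=int(max_ms) - int(min_ms))
    print(ulid, ms_to_utc(int(min_ms)).isoformat(), "->", ms_to_utc(int(max_ms)).isoformat(),
          "span", spans[ulid], "(promtool says " + reported + ")")
```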

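Prometheus can of course run in the foreground: just run ./prometheus. But then every start and stop ties up a shell, which isn't very friendly, so let's give it a systemd service unit. Two details matter here. systemd does not allow a comment at the end of a line, because a trailing "# ..." would be passed to Prometheus as extra arguments. Also, WorkingDirectory= makes the default relative data/ directory resolve inside /usr/local/prometheus instead of under /.

cat >/etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/

[Service]
# Adjust the paths below to match your installation.
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090

[Install]
WantedBy=multi-user.target
EOF

systemctl enable prometheus && systemctl start prometheus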

Check the service status. A green "active (running)" means it started normally; otherwise you need to troubleshoot. The log should contain "TSDB started" and the web-ready message "Server is ready to receive web requests.":

[root@node4 prometheus]# systemctl status prometheus
● prometheus.service
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed -11-16 11:09:20 CST; 20min ago
 Main PID: 3925 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─3925 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090

Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.098Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.098Z caller=main.go:996 level=info msg="TSDB started"
Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.098Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.234Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=135.316399ms db_storage=1.16…µs
Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.235Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Nov 16 11:09:21 node4 prometheus[3925]: ts=-11-16T03:09:21.236Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Nov 16 11:09:27 node4 prometheus[3925]: ts=-11-16T03:09:27.708Z caller=compact.go:510 level=info component=tsdb msg="write block resulted in empty block" mint=1668528000000 maxt=1668535200000 duration=36.455932ms
Nov 16 11:09:27 node4 prometheus[3925]: ts=-11-16T03:09:27.713Z caller=head.go:842 level=info component=tsdb msg="Head GC completed" duration=3.84614ms
Nov 16 11:09:27 node4 prometheus[3925]: ts=-11-16T03:09:27.714Z caller=checkpoint.go:97 level=info component=tsdb msg="Creating checkpoint" from_segment=0 to_segment=1 mint=1668535200000
Nov 16 11:09:27 node4 prometheus[3925]: ts=-11-16T03:09:27.760Z caller=head.go:1011 level=info component=tsdb msg="WAL checkpoint complete" first=0 last=1 duration=46.524627ms
Hint: Some lines were ellipsized, use -l to show in full.
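Besides systemctl status, Prometheus exposes a /-/ready endpoint. It returns HTTP 200 once the server can serve traffic and answers 503 while it is still starting up. Below is a minimal Python sketch of a readiness poll, assuming the address used in this article. The opener parameter exists only so the function can be tested without a live server.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout_s=30.0, interval_s=1.0, opener=urllib.request.urlopen):
    """Poll Prometheus's /-/ready endpoint until it returns HTTP 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + "/-/ready"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: connection refused while
            # starting, or a 503 while the TSDB is still replaying its write-ahead log.
            pass
        time.sleep(interval_s)
    return False


# Usage on the Prometheus host from this article:
# print(wait_until_ready("http://192.168.217.24:9090"))
```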

At this point, change the configuration file (prometheus.yml) to the following:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"] # this host's IP + port; nothing else needs changing

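Open a browser at the address from the last line, http://192.168.217.24:9090, and go to Status > Targets:

The state is a green UP. You can also click the endpoint link to view the raw metrics: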
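The UP/DOWN information on the Targets page is also available from the HTTP API at /api/v1/targets. The following Python sketch maps each active target's scrapeUrl to its health. The SAMPLE response is a trimmed, hand-written example shaped like that API's output, not something captured from this server, so check the fields against your own response.

```python
import json


def target_health(api_json: str) -> dict:
    """Map each active target's scrapeUrl to its health ("up", "down" or "unknown")."""
    doc = json.loads(api_json)
    if doc.get("status") != "success":
        raise RuntimeError(f"API error: {doc.get('error')}")
    return {t["scrapeUrl"]: t["health"] for t in doc["data"]["activeTargets"]}


# Trimmed, illustrative response shaped like GET http://192.168.217.24:9090/api/v1/targets
SAMPLE = """{"status": "success", "data": {"activeTargets": [
  {"labels": {"job": "prometheus", "instance": "192.168.217.24:9090"},
   "scrapeUrl": "http://192.168.217.24:9090/metrics", "health": "up", "lastError": ""}
], "droppedTargets": []}}"""

health = target_health(SAMPLE)
down = [url for url, state in health.items() if state != "up"]
print(health, "down:", down)

# Against the live server you would fetch the JSON first, e.g.:
# import urllib.request
# live = urllib.request.urlopen("http://192.168.217.24:9090/api/v1/targets", timeout=5).read().decode()
# print(target_health(live))
```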

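OK, the Prometheus server is installed.

IV.

Installing and configuring node_exporter

node_exporter is essentially a client-side collector. It collects basic data such as CPU and memory from Unix-like operating systems. The full list of what it can collect is in its help output:

Collectors such as cpu, edac, and ipvs are enabled by default. Some are not, for example ntp (the time-server collector), but the defaults already cover about 99% of basic needs.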

[root@node4 prometheus]# node_exporter --help
(excerpt: the collector flags)
      ...test fixtures to use for wifi collector metrics
  --collector.arp  Enable the arp collector (default: enabled).
  --collector.bcache  Enable the bcache collector (default: enabled).
  --collector.bonding  Enable the bonding collector (default: enabled).
  --collector.btrfs  Enable the btrfs collector (default: enabled).
  --collector.buddyinfo  Enable the buddyinfo collector (default: disabled).
  --collector.cgroups  Enable the cgroups collector (default: disabled).
  --collector.conntrack  Enable the conntrack collector (default: enabled).
  --collector.cpu  Enable the cpu collector (default: enabled).
  --collector.cpufreq  Enable the cpufreq collector (default: enabled).
  --collector.diskstats  Enable the diskstats collector (default: enabled).
  --collector.dmi  Enable the dmi collector (default: enabled).
  --collector.drbd  Enable the drbd collector (default: disabled).
  --collector.drm  Enable the drm collector (default: disabled).
  --collector.edac  Enable the edac collector (default: enabled).
  --collector.entropy  Enable the entropy collector (default: enabled).
  --collector.ethtool  Enable the ethtool collector (default: disabled).
  --collector.fibrechannel  Enable the fibrechannel collector (default: enabled).
  --collector.filefd  Enable the filefd collector (default: enabled).
  --collector.filesystem  Enable the filesystem collector (default: enabled).
  --collector.hwmon  Enable the hwmon collector (default: enabled).
  --collector.infiniband  Enable the infiniband collector (default: enabled).
  --collector.interrupts  Enable the interrupts collector (default: disabled).
  --collector.ipvs  Enable the ipvs collector (default: enabled).
  --collector.ksmd  Enable the ksmd collector (default: disabled).
  --collector.lnstat  Enable the lnstat collector (default: disabled).
  --collector.loadavg  Enable the loadavg collector (default: enabled).
  --collector.logind  Enable the logind collector (default: disabled).
  --collector.mdadm  Enable the mdadm collector (default: enabled).
  --collector.meminfo  Enable the meminfo collector (default: enabled).
  --collector.meminfo_numa  Enable the meminfo_numa collector (default: disabled).
  --collector.mountstats  Enable the mountstats collector (default: disabled).
  --collector.netclass  Enable the netclass collector (default: enabled).
  --collector.netdev  Enable the netdev collector (default: enabled).
  --collector.netstat  Enable the netstat collector (default: enabled).
  --collector.network_route  Enable the network_route collector (default: disabled).
  --collector.nfs  Enable the nfs collector (default: enabled).
  --collector.nfsd  Enable the nfsd collector (default: enabled).
  --collector.ntp  Enable the ntp collector (default: disabled).
  --collector.nvme  Enable the nvme collector (default: enabled).
  --collector.os  Enable the os collector (default: enabled).
  --collector.perf  Enable the perf collector (default: disabled).
  --collector.powersupplyclass  Enable the powersupplyclass collector (default: enabled).
  --collector.pressure  Enable the pressure collector (default: enabled).
  --collector.processes  Enable the processes collector (default: disabled).
  --collector.qdisc  Enable the qdisc collector (default: disabled).
  --collector.rapl  Enable the rapl collector (default: enabled).
  --collector.runit  Enable the runit collector (default: disabled).
  --collector.schedstat  Enable the schedstat collector (default: enabled).
  --collector.selinux  Enable the selinux collector (default: enabled).
  --collector.slabinfo  Enable the slabinfo collector (default: disabled).
  --collector.sockstat  Enable the sockstat collector (default: enabled).
  --collector.softnet  Enable the softnet collector (default: enabled).
  --collector.stat  Enable the stat collector (default: enabled).
  --collector.supervisord  Enable the supervisord collector (default: disabled).
  --collector.sysctl  Enable the sysctl collector (default: disabled).
  --collector.systemd  Enable the systemd collector (default: disabled).
  --collector.tapestats  Enable the tapestats collector (default: enabled).
  --collector.tcpstat  Enable the tcpstat collector (default: disabled).
  --collector.textfile  Enable the textfile collector (default: enabled).
  --collector.thermal_zone  Enable the thermal_zone collector (default: enabled).
  --collector.time  Enable the time collector (default: enabled).
  --collector.timex  Enable the timex collector (default: enabled).
  --collector.udp_queues  Enable the udp_queues collector (default: enabled).
  --collector.uname  Enable the uname collector (default: enabled).
  --collector.vmstat  Enable the vmstat collector (default: enabled).
  --collector.wifi  Enable the wifi collector (default: disabled).
  --collector.xfs  Enable the xfs collector (default: enabled).
  --collector.zfs  Enable the zfs collector (default: enabled).
  --collector.zoneinfo  Enable the zoneinfo collector (default: disabled).

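Because this collector is written in Go and ships as a single executable, installing it is simple. Upload node_exporter-1.4.0.linux-amd64.tar.gz to the server, extract it, and put the executable in a directory on your PATH.

tar zxf node_exporter-1.4.0.linux-amd64.tar.gz
mv node_exporter-1.4.0.linux-amd64/node_exporter /usr/local/bin/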

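Same approach as before: manage it with a systemd service unit.

One more note: the custom collector selection mentioned above is configured right here in the unit file, as flags on the ExecStart line. This example uses the defaults, so no collector flags are written.

cat >/etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter Monitoring System
Documentation=https://github.com/prometheus/node_exporter

[Service]
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target
EOF

systemctl enable node_exporter && systemctl start node_exporter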
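node_exporter turns collectors on with --collector.<name> and turns default collectors off with --no-collector.<name>. As an illustration, this Python helper builds the ExecStart= line for a custom selection. Enabling ntp and systemd and disabling zfs is an arbitrary example, not a recommendation, and the resulting line still has to be pasted into the unit file (followed by systemctl daemon-reload and a restart).

```python
import shlex


def node_exporter_cmd(listen=":9100", enable=(), disable=(), binary="/usr/local/bin/node_exporter"):
    """Build an ExecStart= line that turns optional collectors on and default ones off."""
    args = [binary, f"--web.listen-address={listen}"]
    args += [f"--collector.{name}" for name in enable]      # collectors that default to disabled
    args += [f"--no-collector.{name}" for name in disable]  # collectors that default to enabled
    # shlex.quote leaves these simple tokens unquoted; systemd also understands its quoting.
    return "ExecStart=" + " ".join(shlex.quote(a) for a in args)


print(node_exporter_cmd(enable=["ntp", "systemd"], disable=["zfs"]))
```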

Check the service status; green means healthy. The last two lines are systemd warnings that the unit file contained unknown keys, namely misspelled Description/Documentation keys. With the keys spelled correctly, the warnings do not appear.

[root@node4 ~]# systemctl status node_exporter
● node_exporter.service
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed -11-16 12:27:46 CST; 1min 23s ago
 Main PID: 7519 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─7519 /usr/local/bin/node_exporter --web.listen-address=:9100

Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=timex
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=udp_queues
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=uname
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=vmstat
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=xfs
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=zfs
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Nov 16 12:27:46 node4 node_exporter[7519]: ts=-11-16T04:27:46.961Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
Nov 16 12:28:10 node4 systemd[1]: [/etc/systemd/system/node_exporter.service:2] Unknown lvalue 'Descriptinotallow' in section 'Unit'
Nov 16 12:28:10 node4 systemd[1]: [/etc/systemd/system/node_exporter.service:3] Unknown lvalue 'Documentatinotallow' in section 'Unit'

The node collector is now working. One last push remains: feeding the data it collects into Prometheus. To do that, edit the Prometheus configuration file and add targets:

(Likewise, install and deploy it the same way on the .23 server and start the node_exporter service there.)

[root@node4 ~]# cat /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter Monitoring System
Documentation=https://github.com/prometheus/node_exporter

[Service]
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target

[root@node4 ~]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"]

  - job_name: "server"
    static_configs:
      - targets: ["192.168.217.24:9100"]
      - targets: ["192.168.217.23:9100"]
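Once there are more hosts, a hand-written targets list becomes error-prone because YAML is indentation-sensitive. Here is a small Python sketch that renders the scrape_configs section from a dict, assuming you paste the output into prometheus.yml. It puts both server targets into one list, which Prometheus scrapes the same way as the two separate targets entries above. Checking the final file with ./promtool check config prometheus.yml is still a good idea.

```python
def scrape_configs(jobs: dict) -> str:
    """Render a scrape_configs: section with one static_configs entry per job (no PyYAML needed)."""
    out = ["scrape_configs:"]
    for job, targets in jobs.items():
        out.append(f'  - job_name: "{job}"')
        out.append("    static_configs:")  # aligned with job_name inside the list item
        out.append("      - targets: [" + ", ".join(f'"{t}"' for t in targets) + "]")
    return "\n".join(out) + "\n"


rendered = scrape_configs({
    "prometheus": ["192.168.217.24:9090"],
    "server": ["192.168.217.24:9100", "192.168.217.23:9100"],
})
print(rendered)
```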

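Restart the Prometheus server (systemctl restart prometheus; running ./promtool check config prometheus.yml beforehand catches YAML mistakes). The browser now shows two additional targets: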
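What Prometheus now scrapes from port 9100 is plain text. One of the metrics, node_cpu_seconds_total, counts CPU seconds per cpu and mode. The Python sketch below parses a simplified subset of that text format and computes a CPU busy percentage from two scrapes. That is roughly what the commonly used PromQL expression 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) computes over a window. The sample values are made up, for one CPU.

```python
import re

# Simplified matchers for lines like: node_cpu_seconds_total{cpu="0",mode="idle"} 1012
SAMPLE_LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Yield (name, labels, value) for each sample line, skipping # HELP/# TYPE comments."""
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        m = SAMPLE_LINE.match(raw)
        if m:
            labels = dict(LABEL_PAIR.findall(m.group("labels") or ""))
            yield m.group("name"), labels, float(m.group("value"))


def cpu_busy_percent(before, after):
    """Percentage of CPU time not spent idle between two scrapes of node_cpu_seconds_total."""
    def idle_and_total(text):
        idle = total = 0.0
        for name, labels, value in parse_exposition(text):
            if name == "node_cpu_seconds_total":
                total += value
                if labels.get("mode") == "idle":
                    idle += value
        return idle, total

    idle0, total0 = idle_and_total(before)
    idle1, total1 = idle_and_total(after)
    if total1 <= total0:
        raise ValueError("the second scrape must be later than the first")
    return 100.0 * (1 - (idle1 - idle0) / (total1 - total0))


# Made-up counters for one CPU, scraped 15 s apart (the scrape_interval in prometheus.yml).
BEFORE = '''node_cpu_seconds_total{cpu="0",mode="idle"} 1000
node_cpu_seconds_total{cpu="0",mode="user"} 200
node_cpu_seconds_total{cpu="0",mode="system"} 100
'''
AFTER = '''node_cpu_seconds_total{cpu="0",mode="idle"} 1012
node_cpu_seconds_total{cpu="0",mode="user"} 202
node_cpu_seconds_total{cpu="0",mode="system"} 101
'''
print(round(cpu_busy_percent(BEFORE, AFTER), 1))  # 12 of 15 seconds idle -> 20.0
```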

V.

Installing and configuring the MySQL collector (run on 192.168.217.23)

Extract the package and move it under /usr/local/ with a new name:

tar zxf mysqld_exporter-0.14.0.linux-amd64.tar.gz
mv mysqld_exporter-0.14.0.linux-amd64 /usr/local/mysqld_exporter

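Create a dedicated database user. The connection limit goes on CREATE USER, because MySQL 8.0 no longer accepts it in GRANT ... WITH MAX_USER_CONNECTIONS:

CREATE USER 'exporter'@'%' IDENTIFIED BY '123456' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
FLUSH PRIVILEGES;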

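Write the credentials file that mysqld_exporter will use to connect to MySQL:

Set MySQL's port and password. My MySQL does not use the default port; it listens on 3311, and it is installed on 192.168.217.23.

cat >/usr/local/mysqld_exporter/my.cnf <<EOF
[client]
host = 192.168.217.23
port = 3311
user = exporter
password = 123456
[mysqladmin]
host = 192.168.217.23
port = 3311
user = exporter
password = 123456
EOF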
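The credentials file uses ordinary INI syntax, and mysqld_exporter reads its [client] section. Here is a quick Python check that the [client] part parses and that the port really is 3311 rather than MySQL's default 3306. The user:password@tcp(host:port)/ string it prints follows the Go MySQL driver's DSN convention and is only for eyeballing, with the password masked.

```python
import configparser

# The [client] part of the credentials file written above.
MY_CNF = """\
[client]
host = 192.168.217.23
port = 3311
user = exporter
password = 123456
"""

cfg = configparser.ConfigParser()
cfg.read_string(MY_CNF)  # for the real file: cfg.read("/usr/local/mysqld_exporter/my.cnf")
client = cfg["client"]
dsn = f'{client["user"]}:***@tcp({client["host"]}:{client.getint("port")})/'
print(dsn)
```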

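Write the systemd service unit:

cat >/usr/lib/systemd/system/mysqld-exporter.service <<EOF
[Unit]
Description=mysqld_exporter

[Service]
User=exporter
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
  --config.my-cnf=/usr/local/mysqld_exporter/my.cnf \
  --web.listen-address=:9104 \
  --collect.slave_status \
  --collect.binlog_size \
  --collect.info_schema.processlist \
  --collect.info_schema.innodb_metrics \
  --collect.engine_innodb_status \
  --collect.perf_schema.file_events \
  --collect.perf_schema.replication_group_member_stats
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

User= names an operating-system account, which is separate from the MySQL user created above. That account must exist (for example, create it with useradd -r -s /sbin/nologin exporter) and must be able to read my.cnf. Then start the service:

systemctl daemon-reload && systemctl enable --now mysqld-exporter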

All of the flags above come from mysqld_exporter's help output. If you're curious, compare the help below with the flags used:

[root@node3 ~]# mysqld_exporter --help
usage: mysqld_exporter [<flags>]

Flags:
  -h, --help  Show context-sensitive help (also try --help-long and --help-man).
  --exporter.lock_wait_timeout=2  Set a lock_wait_timeout (in seconds) on the connection to avoid long metadata locking.
  --exporter.log_slow_filter  Add a log_slow_filter to avoid slow query logging of scrapes. NOTE: Not supported by Oracle MySQL.
  --collect.heartbeat.database="heartbeat"  Database from where to collect heartbeat data
  --collect.heartbeat.table="heartbeat"  Table from where to collect heartbeat data
  --collect.heartbeat.utc  Use UTC for timestamps of the current server (`pt-heartbeat` is called with `--utc`)
  --collect.info_schema.processlist.min_time=0  Minimum time a thread must be in each state to be counted
  --collect.info_schema.processlist.processes_by_user  Enable collecting the number of processes by user
  --collect.info_schema.processlist.processes_by_host  Enable collecting the number of processes by host
  --collect.info_schema.tables.databases="*"  The list of databases to collect table stats for, or '*' for all
  --collect.mysql.user.privileges  Enable collecting user privileges from mysql.user
  --collect.perf_schema.eventsstatements.limit=250  Limit the number of events statements digests by response time
  --collect.perf_schema.eventsstatements.timelimit=86400  Limit how old the 'last_seen' events statements can be, in seconds
  --collect.perf_schema.eventsstatements.digest_text_limit=120  Maximum length of the normalized statement text
  --collect.perf_schema.file_instances.filter=".*"  RegEx file_name filter for performance_schema.file_summary_by_instance
  --collect.perf_schema.file_instances.remove_prefix="/var/lib/mysql/"  Remove path prefix in performance_schema.file_summary_by_instance
  --collect.perf_schema.memory_events.remove_prefix="memory/"  Remove instrument prefix in performance_schema.memory_summary_global_by_event_name
  --web.config.file=""  [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication.
  --web.listen-address=":9104"  Address to listen on for web interface and telemetry.
  --web.telemetry-path="/metrics"  Path under which to expose metrics.
  --timeout-offset=0.25  Offset to subtract from timeout in seconds.
  --config.my-cnf="/root/.my.cnf"  Path to .my.cnf file to read MySQL credentials from.
  --tls.insecure-skip-verify  Ignore certificate and server verification when using a tls connection.
  --collect.global_variables  Collect from SHOW GLOBAL VARIABLES
  --collect.slave_status  Collect from SHOW SLAVE STATUS
  --collect.info_schema.processlist  Collect current thread state counts from the information_schema.processlist
  --collect.mysql.user  Collect data from mysql.user
  --collect.info_schema.tables  Collect metrics from information_schema.tables
  --collect.info_schema.innodb_tablespaces  Collect metrics from information_schema.innodb_sys_tablespaces
  --collect.info_schema.innodb_metrics  Collect metrics from information_schema.innodb_metrics
  --collect.global_status  Collect from SHOW GLOBAL STATUS
  --collect.binlog_size  Collect the current size of all registered binlog files
  --collect.perf_schema.tableiowaits  Collect metrics from performance_schema.table_io_waits_summary_by_table
  --collect.perf_schema.indexiowaits  Collect metrics from performance_schema.table_io_waits_summary_by_index_usage
  --collect.perf_schema.tablelocks  Collect metrics from performance_schema.table_lock_waits_summary_by_table
  --collect.perf_schema.eventsstatements  Collect metrics from performance_schema.events_statements_summary_by_digest
  --collect.perf_schema.eventsstatementssum  Collect metrics of grand sums from performance_schema.events_statements_summary_by_digest
  --collect.perf_schema.eventswaits  Collect metrics from performance_schema.events_waits_summary_global_by_event_name
  --collect.auto_increment.columns  Collect auto_increment columns and max values from information_schema
  --collect.perf_schema.file_instances  Collect metrics from performance_schema.file_summary_by_instance
  --collect.perf_schema.memory_events  Collect metrics from performance_schema.memory_summary_global_by_event_name
  --collect.perf_schema.replication_group_members  Collect metrics from performance_schema.replication_group_members
  --collect.perf_schema.replication_group_member_stats  Collect metrics from performance_schema.replication_group_member_stats
  --collect.perf_schema.replication_applier_status_by_worker  Collect metrics from performance_schema.replication_applier_status_by_worker
  --collect.info_schema.userstats  If running with userstat=1, set to true to collect user statistics
  --collect.info_schema.clientstats  If running with userstat=1, set to true to collect client statistics
  --collect.perf_schema.file_events  Collect metrics from performance_schema.file_summary_by_event_name
  --collect.info_schema.schemastats  If running with userstat=1, set to true to collect schema statistics
  --collect.info_schema.innodb_cmp  Collect metrics from information_schema.innodb_cmp
  --collect.info_schema.innodb_cmpmem  Collect metrics from information_schema.innodb_cmpmem
  --collect.info_schema.query_response_time  Collect query response time distribution if query_response_time_stats is ON.
  --collect.engine_tokudb_status  Collect from SHOW ENGINE TOKUDB STATUS
  --collect.engine_innodb_status  Collect from SHOW ENGINE INNODB STATUS
  --collect.heartbeat  Collect from heartbeat
  --collect.info_schema.tablestats  If running with userstat=1, set to true to collect table statistics
  --collect.info_schema.replica_host  Collect metrics from information_schema.replica_host_status
  --collect.slave_hosts  Scrape information from 'SHOW SLAVE HOSTS'
  --log.level=info  Only log messages with the given severity or above. One of: [debug, info, warn, error]
  --log.format=logfmt  Output format of log messages. One of: [logfmt, json]
  --version  Show application version.

Check the ports:

[root@node3 ~]# netstat -antup | grep 3311
tcp        0      0 192.168.217.23:59276    192.168.217.23:3311     TIME_WAIT   -
tcp        0      0 192.168.217.23:59278    192.168.217.23:3311     TIME_WAIT   -
tcp        0      0 192.168.217.23:59274    192.168.217.23:3311     TIME_WAIT   -
tcp        0      0 192.168.217.23:59270    192.168.217.23:3311     TIME_WAIT   -
tcp        0      0 192.168.217.23:59272    192.168.217.23:3311     TIME_WAIT   -
tcp6       0      0 :::3311                 :::*                    LISTEN      2859/mysqld
tcp6       0      0 192.168.217.23:3311     192.168.217.23:59270    TIME_WAIT   -

[root@node3 ~]# netstat -antup | grep 9104
tcp6       0      0 :::9104                 :::*                    LISTEN      7041/mysqld_exporte
tcp6       0      0 192.168.217.23:9104     192.168.217.24:60422    ESTABLISHED 7041/mysqld_exporte
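netstat on node3 only proves that the ports are listening locally. What Prometheus needs is to reach them from 192.168.217.24, and a firewall in between would not show up in netstat. The following is a small Python sketch of a TCP reachability check you could run from the Prometheus host. The host/port pairs in the usage comment are the ones from this article.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, filtered (timeout), or DNS failure
        return False


# Run on 192.168.217.24 to check MySQL, mysqld_exporter and node_exporter on .23:
# for host, port in [("192.168.217.23", 3311), ("192.168.217.23", 9104), ("192.168.217.23", 9100)]:
#     print(host, port, port_open(host, port))
```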

Hooking the MySQL collector into Prometheus:

As before, edit the Prometheus configuration file and add a target:

[root@node4 ~]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"]

  - job_name: "server"
    static_configs:
      - targets: ["192.168.217.24:9100"]
      - targets: ["192.168.217.23:9100"]

  - job_name: "mysqld"
    static_configs:
      - targets: ["192.168.217.23:9104"]

Restart the Prometheus server, open the browser again, and check Targets. A new green UP entry means the integration succeeded:

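VI.

Deploying Grafana (on 192.168.217.24)

Download page: Download Grafana | Grafana Labs

Once it is installed with yum and the grafana-server service is started (for example with systemctl enable --now grafana-server), Grafana is ready to use. Open 192.168.217.24:3000 in a browser to log in. The initial username/password is admin/admin, and after logging in you will be asked to change the initial password, so just follow the prompt. (Remember the new password!)

After logging in, integrate Prometheus by choosing Data sources:

Click Settings next to it and set the URL to the Prometheus server, http://192.168.217.24:9090.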
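If you would rather script this step than click through it, Grafana also has an HTTP API in which POST /api/datasources creates a data source. Below is a Python sketch that builds that request with basic auth. The password placeholder is hypothetical, and the field set is a minimal one that I expect to be sufficient, so compare it with the Grafana API documentation for your version. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def grafana_add_prometheus(grafana, user, password, prom_url):
    """Build (not send) the Grafana HTTP API call that creates a Prometheus data source."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prom_url,
        "access": "proxy",  # Grafana's backend, not the browser, queries Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


req = grafana_add_prometheus("http://192.168.217.24:3000", "admin", "<your-new-password>",
                             "http://192.168.217.24:9090")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=10) would perform the call.
```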

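Dashboard templates are usually JSON files, and the official site provides plenty of them at: Dashboards | Grafana Labs

For example, the node exporter dashboard template on the front page:

Click the Import button shown in the image above to import the file:

Likewise, the mysqld_exporter collector needs a JSON file to generate its dashboard. Just search the official site, and generally pick a highly rated, widely used one, for example:

MySQL Overview | Grafana Labs. This template, ID 7362, has been downloaded more than 300,000 times, which suggests it is fairly reliable.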