NAME
collectl - Collects data that describes the current system status.
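In short: it collects data on the current state of the system and displays it.
Collectl is a system metrics collection tool. It can run either as a daemon or interactively, and it collects data from a wide range of subsystems. It includes a Graphite interface, so the data can easily be passed to Graphite for storage.
Here is the official introduction: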
There are a number of times in which you find yourself needing
performance data. These can include benchmarking, monitoring a
system's general health or trying to determine what your system was
doing at some time in the past. Sometimes you just want to know
what the system is doing right now. Depending on what you're doing,
you often end up using different tools, each designed for that
specific situation. Unlike most monitoring tools that either focus
on a small set of statistics, format their output in only one way,
run either interactively or as a daemon but not both, collectl tries
to do it all. You can choose to monitor any of a broad set of
subsystems which currently include buddyinfo, cpu, disk, inodes,
infiniband, lustre, memory, network, nfs, processes, quadrics,
slabs, sockets and tcp.
Download: /projects/collectl/files/
Installation is simple, so I won't dwell on it. Install it from the RPM package or from source.
Usage
collectl has three modes of operation:
1. Interactive Mode: This is the default. In this mode data is read from /proc and passed on for analysis and display.
2. Record Mode: reads data from the live system and writes it to a file or displays it on the terminal.
   Syntax: collectl [-f file] [options]
3. Playback Mode: reads data from one or more raw data files and displays it on the terminal.
   Syntax: collectl -p file1 [file2 ...] [options]
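The three modes differ only in the switches passed to collectl. Below is a minimal sketch of a hypothetical helper, `collectl_argv`, that builds the argument list for each mode. It uses only switches shown in this post (-f, -p, -s) and accepts one -f target or one -p pattern. Returning an argv list rather than a shell string means a wildcard given to -p reaches collectl unexpanded. That is what the "be sure to quote if wild carded" advice in `collectl --help` is about. The paths in the examples are made up.

```python
from typing import List


def collectl_argv(mode: str, target: str = "", subsys: str = "") -> List[str]:
    """Build an argv list for one of collectl's three modes (hypothetical helper)."""
    argv = ["collectl"]
    if mode == "record":
        # Record mode: -f names the directory/file to write to.
        if not target:
            raise ValueError("record mode needs a -f directory or file")
        argv += ["-f", target]
    elif mode == "playback":
        # Playback mode: -p names the raw file(s). Without a shell in between,
        # a wildcard such as "*.raw.gz" is passed to collectl literally.
        if not target:
            raise ValueError("playback mode needs a -p file or pattern")
        argv += ["-p", target]
    elif mode != "interactive":
        raise ValueError(f"unknown mode: {mode!r}")
    if subsys:
        argv.append("-s" + subsys)  # e.g. "-sdm"
    return argv


print(collectl_argv("record", "/var/log/collectl", "cdn"))
```

To actually run it you would pass the list to `subprocess.run(...)`. That step assumes collectl is installed and on PATH.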
Of all the monitoring tools available, collectl probably supports the widest range of performance data. The subsystems it can monitor are:
SUMMARY SUBSYSTEMS: condensed output.
b - buddy info (memory fragmentation)
c - CPU
d - Disk
f - NFS V3 Data
i - Inode and File System
j - Interrupts
l - Lustre
m - Memory
n - Networks
s - Sockets
t - TCP
x - Interconnect
y - Slabs (system object caches)
DETAIL SUBSYSTEMS: more detailed output.
C - CPU
D - Disk
E - Environmental data (fan, power, temp) via ipmitool
F - NFS Data
J - Interrupts
L - Lustre OST detail OR client Filesystem detail
M - Memory node data, which is also known as numa data
N - Networks
T - 65 TCP counters only available in plot format
X - Interconnect
Y - Slabs (system object caches)
Z - Processes
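These subsystems are selected with the -s switch, e.g. collectl -ss (sockets). The same letters apply in interactive, record and playback modes. All of the examples below run in interactive mode.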
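A -s value is just a string of the letters listed above. Lower case selects a summary subsystem and upper case selects a detail subsystem. The sketch below uses a hypothetical `describe` function to decode such a value. It covers only the letters listed in this post. Newer collectl versions may support more, and `collectl --showsubsys` prints the real list. The +/- prefixes covered later in this post are not handled here.

```python
from typing import List, Tuple

# Letters copied from the SUMMARY and DETAIL lists above (names shortened).
SUMMARY = {
    "b": "buddy info", "c": "CPU", "d": "Disk", "f": "NFS V3 Data",
    "i": "Inode and File System", "j": "Interrupts", "l": "Lustre",
    "m": "Memory", "n": "Networks", "s": "Sockets", "t": "TCP",
    "x": "Interconnect", "y": "Slabs",
}
DETAIL = {
    "C": "CPU", "D": "Disk", "E": "Environmental data", "F": "NFS Data",
    "J": "Interrupts", "L": "Lustre", "M": "Memory node (NUMA)",
    "N": "Networks", "T": "TCP counters", "X": "Interconnect",
    "Y": "Slabs", "Z": "Processes",
}


def describe(subsys: str) -> List[Tuple[str, str, str]]:
    """Explain each letter of a -s value, e.g. "dm" from `collectl -sdm`."""
    out = []
    for letter in subsys:
        if letter in SUMMARY:
            out.append((letter, "summary", SUMMARY[letter]))
        elif letter in DETAIL:
            out.append((letter, "detail", DETAIL[letter]))
        else:
            raise ValueError(f"letter {letter!r} is not in the lists above")
    return out


for row in describe("dm"):
    print(row)
```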
Commonly used options:
Run with no arguments, collectl displays the following:
[root@twexdb1 qzhijun]# collectl
waiting for 1 second sample...
#
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
   0   0  1032    439      0      0      0      0      2     23      6     21
   0   0  1049    345      8     16    265     10      0      3      1      6
   0   0  1074    229      0      0      0      0      3     25      6     23
   0   0  1091    226      0      0      0      0      2     19      3     16
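So the default display covers CPU, disks and network in condensed form. This matches the default subsystem set cdn reported by collectl --help.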
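In brief mode each sample is one whitespace-separated line under a single "#..." header line. The wrapping in the transcript above is a copy-and-paste artifact. A hypothetical `parse_brief` helper can therefore pair values with column names, as in the sketch below. Values are kept as strings because memory columns carry suffixes such as "116M".

```python
from typing import Dict


def parse_brief(header: str, row: str) -> Dict[str, str]:
    """Pair one brief-mode data row with the names in its header line (sketch)."""
    names = header.lstrip("#").split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


HEADER = "#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut"
sample = parse_brief(HEADER, "0 0 1032 439 0 0 0 0 2 23 6 21")
print(sample["inter"], sample["PktOut"])  # 1032 21
```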
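-s: select subsystems
1. Showing selected summary subsystems
Examples (each run was stopped with Ctrl-C, which is when collectl prints "Ouch!"):
1) Show only brief CPU information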
[root@twexdb1 qzhijun]# collectl -sc
waiting for 1 second sample...
#
#cpu sys inter  ctxsw
   0   0  1099    342
   0   0  1060    355
   0   0  1115    266
   0   0  1032    147
Ouch!
2) Show brief memory and disk information together
[root@twexdb1 qzhijun]# collectl -sdm
waiting for 1 second sample...
#
#Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes
 118M 270M   5G   5G 223M   1G      0      0    264      8
 118M 270M   5G   5G 223M   1G      0      0      0      0
 118M 270M   5G   5G 223M   1G      0      0     52     10
 119M 270M   5G   5G 223M   1G      8     16   1157     52
 119M 270M   5G   5G 223M   1G      0      0      0      0
Ouch!
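The -s switch can also add subsystems to, or remove them from, the default display (what collectl shows with no arguments) by prefixing the letters with + or -.
3) Add memory information: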
[root@twexdb1 qzhijun]# collectl -s+m
waiting for 1 second sample...
#
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
   0   0  2348   1851 116M 270M   5G   5G 223M   1G      0      0      0      0      2     22      4     19
   1   0  3513   3354 116M 270M   5G   5G 223M   1G      0      0    316     18     78    777    120    701
   0   0  1108    304 116M 270M   5G   5G 223M   1G      8     16      1      1    142   1605    184   1368
   0   0  1151    683 115M 270M   5G   5G 223M   1G      0      0     28      4      9     65     31     60
Ouch!
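4) Add memory and network information (network is already in the default set, so the columns are the same as with -s+m):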
[root@twexdb1 qzhijun]# collectl -s+mn
waiting for 1 second sample...
#
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
   0   0  1032    554 116M 270M   5G   5G 224M   1G      0      0    352      9      4     40     11     35
   0   0  1032    180 116M 270M   5G   5G 224M   1G      0      0      0      0      1     11      2     12
   0   0  1026    174 116M 270M   5G   5G 224M   1G      8     16      1      1      1      4      1      6
   0   0  1032    177 116M 270M   5G   5G 224M   1G      0      0      0      0      1      4      1      7
Ouch!
5) Remove CPU information from the default display:
[root@twexdb1 qzhijun]# collectl -s-c
waiting for 1 second sample...
#
#KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
      8     16      1      1     29    278     52    230
      0      0      0      0     50    556     69    463
      0      0     20      3      6     49     14     46
      0      0   1516     81     74    675    235    603
      8     16    337      8      2     18      8     21
      0      0      0      0      1      4      1      6
Ouch!
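The +/- behaviour in examples 3) to 5) can be modelled with a hypothetical `resolve_subsys` function, using the default set cdn from `collectl --help`. This is a sketch, not collectl's own code. It handles only a single leading + or -, as in the examples. It returns the selected letters, not the display order: the -s+m output above shows the CPU, memory, disk and network columns in that order.

```python
DEFAULT_SUBSYS = "cdn"  # [default=cdn] according to `collectl --help`


def resolve_subsys(spec: str, default: str = DEFAULT_SUBSYS) -> str:
    """Return the subsystem letters a -s value selects (sketch).

    Assumption: only a single leading + or - is modelled.
    """
    if spec.startswith("+"):
        result = default
        for letter in spec[1:]:
            if letter not in result:  # "+mn": n is already in the default
                result += letter
        return result
    if spec.startswith("-"):
        return "".join(ch for ch in default if ch not in spec[1:])
    return spec  # a plain list such as "dm" replaces the default entirely


for spec in ["+m", "+mn", "-c", "dm"]:
    print(spec, "->", resolve_subsys(spec))
```

The "+mn" case returns the same set as "+m", which matches the identical columns in examples 3) and 4).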
2. Showing selected detail subsystems:
[root@twexdb1 qzhijun]# collectl -sD
waiting for 1 second sample...
# DISK STATISTICS (/sec)
#                                                                                  Pct
#Name KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
c0d0       0      0    0    0       0      0    0    0       0     0     0      0    0
sda        8      0   16    1       0      0    1    1       0     1     0      0    0
sdb        0      0    0    0      44      5    6    7       7     2     0      0    0
sdc        0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0       8      0   16    1       0      0    1    1       0     1     0      0    0
dm-1       0      0    0    0      44      0   11    4       4     4     0      0    0
dm-2       0      0    0    0       0      0    0    0       0     0     0      0    0
dm-3       0      0    0    0       0      0    0    0       0     0     0      0    0
c0d0       0      0    0    0       0      0    0    0       0     0     0      0    0
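In the -sD header the KBytes/Merged/IOs/Size group appears twice. The sketch below assumes the first group is reads and the second is writes, so it renames them with R/W prefixes of its own invention. It also assumes every numeric field is an integer, as in the sample. The hypothetical `active_devices` helper then picks out the devices that did any I/O in the interval.

```python
from typing import Dict, List, Union

# Column names from the "#Name ..." header; R/W prefixes are an assumption.
DISK_COLS = ["Name",
             "RKBytes", "RMerged", "RIOs", "RSize",
             "WKBytes", "WMerged", "WIOs", "WSize",
             "RWSize", "QLen", "Wait", "SvcTim", "Util"]


def parse_disk_row(row: str) -> Dict[str, Union[str, int]]:
    """Turn one -sD data row into a dict keyed by DISK_COLS."""
    fields = row.split()
    if len(fields) != len(DISK_COLS):
        raise ValueError(f"expected {len(DISK_COLS)} fields, got {len(fields)}")
    rec: Dict[str, Union[str, int]] = {"Name": fields[0]}
    for col, val in zip(DISK_COLS[1:], fields[1:]):
        rec[col] = int(val)
    return rec


def active_devices(rows: List[str]) -> List[str]:
    """Names of devices that completed any read or write I/O in the interval."""
    active = []
    for row in rows:
        rec = parse_disk_row(row)
        if rec["RIOs"] or rec["WIOs"]:
            active.append(str(rec["Name"]))
    return active


rows = [
    "c0d0 0 0 0 0 0 0 0 0 0 0 0 0 0",
    "sda 8 0 16 1 0 0 1 1 0 1 0 0 0",
    "sdb 0 0 0 0 44 5 6 7 7 2 0 0 0",
    "dm-1 0 0 0 0 44 0 11 4 4 4 0 0 0",
]
print(active_devices(rows))
```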
You can also restrict the output to particular disks with --dskfilt:
[root@twexdb1 qzhijun]# collectl -sD --dskfilt sdb
waiting for 1 second sample...
# DISK STATISTICS (/sec)
#                                                                                  Pct
#Name KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdb        0      0    0    0       0      0    0    0       0     0     0      0    0
sdb        0      0    0    0       0      0    0    0       0     0     0      0    0
sdb        0      0    0    0       0      0    0    0       0     0     0      0    0
sdb        0      0    0    0       0      0    0    0       0     0     0      0    0
sdb        0      0    0    0       0      0    0    0       0     0     0      0    0
Ouch!
To monitor a particular process:
[root@twexdb1 qzhijun]# collectl -sZ --procfilt Cmysql --procopts c
waiting for 60 second sample...
# PROCESS SUMMARY (counters are /sec)
#  PID User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command
  6839 root     18     1    0 S   10M    1M  3  0.00  0.00   0  00:00.09    0    0 /bin/sh
  7002 mysql    14  6839  300 S    2G    1G 15  0.18  3.96   6 728:25:39    0    0 /usr/local/mysql/bin/mysqld
Ouch!
--procfilt Process Filters (the type letter is written directly before the value, as in --procfilt Cmysql above)
   c - substring of the command being executed, as read from /proc/pid/stat. Note that this can
       actually be a perl expression, so if you want a command that ends in a particular string,
       all you need to do is append a $ to the end of the string. Otherwise it would match any
       command containing that string.
   C - any command that starts with the specified string
   f - full path of the command, including arguments, as read from /proc/pid/cmdline. Like the
       c modifier, this too can be a perl expression.
   p - pid
   P - parent pid
   u - any process owned by this user's UID, or with a UID in the range specified by uxxx-yyy
   U - any process owned by this username
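The filter types above can be modelled as a small matching function. This is a sketch, not collectl's implementation: `Proc` and `matches` are hypothetical names, and Python's `re` stands in for the perl expressions collectl accepts for the c and f types. The test process below, including its UID of 27, is made up.

```python
import re
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    uid: int
    user: str
    comm: str     # command name, as in /proc/pid/stat
    cmdline: str  # full command with arguments, as in /proc/pid/cmdline


def matches(proc: Proc, filt: str) -> bool:
    """Model of one --procfilt term: a type letter followed by its value."""
    kind, value = filt[0], filt[1:]
    if kind == "c":
        return re.search(value, proc.comm) is not None  # regex, "$" anchors the end
    if kind == "C":
        return proc.comm.startswith(value)
    if kind == "f":
        return re.search(value, proc.cmdline) is not None
    if kind == "p":
        return proc.pid == int(value)
    if kind == "P":
        return proc.ppid == int(value)
    if kind == "u":
        lo, _, hi = value.partition("-")  # "27" or a range such as "0-99"
        return int(lo) <= proc.uid <= int(hi or lo)
    if kind == "U":
        return proc.user == value
    raise ValueError(f"unknown filter type {kind!r}")


mysqld = Proc(7002, 6839, 27, "mysql", "mysqld", "/usr/local/mysql/bin/mysqld")
print(matches(mysqld, "Cmysql"), matches(mysqld, "cmysql$"))  # True False
```

The second call shows why the $ matters. "mysql$" only matches command names ending in "mysql", so "mysqld" is rejected.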
--top displays continuously updated output in the style of the Linux top tool.
For example:
collectl -sCj --top
--iosize: show the average I/O size (adds a Size column)
Timestamp options:
-oT  show the time
-oD  show the date and time
-oDm show the date and time with milliseconds
-i sets the sampling interval, in seconds:
[root@twexdb1 qzhijun]# collectl -sm -i 2
waiting for 2 second sample...
#
#Free Buff Cach Inac Slab  Map
 120M 276M   5G   5G 224M   1G
 120M 276M   5G   5G 224M   1G
 120M 276M   5G   5G 224M   1G
 120M 276M   5G   5G 224M   1G
 121M 276M   5G   5G 224M   1G
 121M 276M   5G   5G 223M   1G
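Example:
Sample system data every 1/4 second (with date, time and milliseconds, plus average I/O sizes) and save it to a log file: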
collectl -i.25 -oDm --iosize > testPerf.log
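The command above runs until it is interrupted, so testPerf.log keeps growing. The -c switch listed in `collectl --help` stops collectl after a given number of samples. The count needed for a fixed time window is the duration divided by the -i interval. Below is a minimal sketch of that arithmetic as a hypothetical `sample_count` function. It rounds down when the window is not a whole multiple of the interval.

```python
from fractions import Fraction


def sample_count(duration_s: float, interval_s: float) -> int:
    """Samples to pass to -c so a run at -i interval_s covers duration_s (rounded down)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Going through str() keeps values like 0.25 or 0.3 exact, avoiding float error.
    return int(Fraction(str(duration_s)) / Fraction(str(interval_s)))


print(sample_count(3600, 0.25))  # 14400
```

With that value, collectl -i.25 -c 14400 -oDm --iosize > testPerf.log should stop by itself after roughly one hour.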
collectl can also send data to a remote host. See the man page (man collectl) for details.
[root@twexdb1 qzhijun]# collectl --help
This is a subset of the most common switches and even the descriptions are
abbreviated.  To see all type 'collectl -x', to get started just type 'collectl'
usage: collectl [switches]
  -c, --count      count      collect this number of samples and exit
  -f, --filename   file       name of directory/file to write to
  -i, --interval   int        collection interval in seconds [default=1]
  -o, --options    options    misc formatting options, --showoptions for all
                                d|D - include date in output
                                  T - include time in output
                                  z - turn off compression of plot files
  -p, --playback   file       playback results from 'file' (be sure to quote
                              if wild carded) or the shell might mess it up
  -P, --plot                  generate output in 'plot' format
  -s, --subsys     subsys     specify one or more subsystems [default=cdn]
      --verbose               display output in verbose format (automatically
                              selected when brief doesn't make sense)
Various types of help
  -h, --help                  print this text
  -v, --version               print version
  -V, --showdefs              print operational defaults
  -x, --helpextend            extended help, more details descriptions too
  -X, --helpall               shows all help concatenated together
      --showoptions           show all the options
      --showsubsys            show all the subsystems
      --showsubopts           show all subsystem specific options
      --showtopopts           show --top options
      --showheader            show file header that 'would be' generated
      --showcolheaders        show column headers that 'would be' generated
      --showslabaliases       for SLUB allocator, show non-root aliases
      --showrootslabs         same as --showslabaliases but use 'root' names