
Linux I/O Flush Explained

Date: 2021-08-11 23:29:02


Editor's note

This article covers flushing I/O on Linux. After reading it you should understand what fsync(), fdatasync(), sync(), O_DIRECT, O_SYNC, REQ_PREFLUSH, and REQ_FUA each do and how they differ.


1

What are fsync(), fdatasync(), and sync()?

First of all, they are system calls.

1.1 fsync

The fsync(int fd) system call flushes all buffered data and metadata associated with the open file descriptor fd to disk (non-volatile storage).

fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

1.2 fdatasync

fdatasync(int fd) is similar to fsync, but it does not flush metadata unless that metadata affects later reads of the data. For example, a change to the file's modification time is not flushed, whereas a change to the file size affects later reads of the file and is flushed along with the data. As a result, fdatasync generally performs better than fsync.

fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush. The aim of fdatasync() is to reduce disk activity for applications that do not require all metadata to be synchronized with the disk.

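1.3 sync

The sync(void) system call flushes all kernel buffers that contain updated file contents (data blocks, pointer blocks, metadata, and so on) to disk.

Flush file system buffers, force changed blocks to disk, update the super block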
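The difference between the two per-file calls can be sketched from user space. The following is a minimal Python illustration, not taken from the original article: durable_write and demo are hypothetical helper names, and the sketch assumes a Unix-like system where os.fsync is available (os.fdatasync is used only if the platform provides it). os.sync() also exists but flushes everything system-wide, so it is only mentioned in a comment.

```python
import os
import tempfile


def durable_write(path: str, data: bytes, metadata: bool = True) -> None:
    """Write data to path and block until the kernel reports it flushed.

    metadata=True calls fsync(); metadata=False calls fdatasync(), which may
    skip metadata (such as mtime) that is not needed to read the data back.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        if metadata or not hasattr(os, "fdatasync"):
            os.fsync(fd)
        else:
            os.fdatasync(fd)
        # os.sync() would instead flush all dirty buffers system-wide.
    finally:
        os.close(fd)


def demo() -> bytes:
    """Exercise both variants on a temporary file and return its content."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dat")
        durable_write(path, b"hello", metadata=True)         # fsync path
        durable_write(path, b"hello again", metadata=False)  # fdatasync path
        with open(path, "rb") as f:
            return f.read()
```

Both variants return only after the flush call completes; which one is appropriate depends on whether the application needs timestamps and similar metadata to be durable as well.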

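2

What are O_DIRECT, O_SYNC, REQ_PREFLUSH, and REQ_FUA?

They are all flags. Their end effect may be the same, but they live at different layers: O_DIRECT and O_SYNC are flags passed to the open system call, while REQ_PREFLUSH and REQ_FUA are flags on a kernel bio. To understand them you need to know about two caches:

One is in system memory: the buff/cache figure reported by free -h;

The other is the cache built into the disk itself, i.e. the drive's volatile write cache (referred to below as the disk cache).

A simplified flow of one write I/O to disk: [figure not preserved]

It can be compared with the Linux storage stack diagram: [figure not preserved]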

2.1 O_DIRECT

O_DIRECT means that I/O bypasses the system cache (the kernel page cache), which may reduce I/O performance. It tries to transfer data synchronously, but it does not guarantee data safety.

Note: from here on, "data safety" always means that the data has been written to the disk's non-volatile storage.

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

The dd command shows the difference between O_DIRECT and non-O_DIRECT writes clearly. Watch the buff/cache column: it grows from 440M to 1.5G after the buffered run and stays essentially flat (337M to 341M) with oflag=direct.

    # Clear caches:
    # echo 3 > /proc/sys/vm/drop_caches
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:            62G        1.1G         61G        9.2M        440M         60G
    Swap:           31G          0B         31G

    # dd without direct
    # dd if=/dev/zero of=/dev/bcache0 bs=1M count=1024
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 27.8166 s, 38.6 MB/s
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:            62G        1.0G         60G        105M        1.5G         60G
    Swap:           31G          0B         31G

    # echo 3 > /proc/sys/vm/drop_caches
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:            62G        626M         61G        137M        337M         61G
    Swap:           31G          0B         31G

    # dd with direct
    # dd if=/dev/zero of=/dev/bcache0 bs=1M count=1024 oflag=direct
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 2.72088 s, 395 MB/s
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:            62G        628M         61G        137M        341M         61G
    Swap:           31G          0B         31G

2.2 O_SYNC

O_SYNC is the synchronous I/O flag. It guarantees that data is safely written to non-volatile storage.

Write operations on the file will complete according to the requirements of synchronized I/O file integrity completion
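How the two open flags are combined from user space can be sketched as follows. This is a hedged Python illustration rather than anything from the original article: write_block, _pwrite_all, demo, and the BLOCK size of 4096 are all assumptions made for the example. O_DIRECT needs an aligned buffer, so the sketch uses an anonymous mmap (which is page-aligned), and it falls back to a buffered write when the platform or filesystem rejects O_DIRECT.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment and transfer size; real devices may differ
O_DIRECT = getattr(os, "O_DIRECT", 0)  # 0 where the platform lacks O_DIRECT


def _pwrite_all(path: str, flags: int, buf) -> None:
    """Open path with flags and write the whole buffer at offset 0."""
    fd = os.open(path, flags, 0o644)
    try:
        if os.pwrite(fd, buf, 0) != len(buf):
            raise OSError("short write")
    finally:
        os.close(fd)


def write_block(path: str, payload: bytes, direct: bool = False,
                sync: bool = False) -> str:
    """Write one zero-padded BLOCK; return "direct" or "buffered"."""
    if len(payload) > BLOCK:
        raise ValueError("payload larger than one block")
    flags = os.O_WRONLY | os.O_CREAT
    if sync:
        flags |= os.O_SYNC  # each write returns only after it is flushed
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf[:len(payload)] = payload
        if direct and O_DIRECT:
            try:
                _pwrite_all(path, flags | O_DIRECT, buf)  # bypass page cache
                return "direct"
            except OSError:
                pass  # e.g. filesystem without O_DIRECT support: fall back
        _pwrite_all(path, flags, buf)
        return "buffered"
    finally:
        buf.close()


def demo():
    """Write one block to a temporary file; return (mode, file content)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.dat")
        mode = write_block(path, b"hello", direct=True, sync=True)
        with open(path, "rb") as f:
            return mode, f.read()
```

Passing both direct=True and sync=True mirrors the man page's advice that O_SYNC must be used in addition to O_DIRECT when synchronous I/O is required.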

2.3 REQ_PREFLUSH

REQ_PREFLUSH is a bio request flag. It means that before this I/O starts, all I/O that completed before it must already be on non-volatile storage. As I understand it, REQ_PREFLUSH only ensures that previously completed I/O has reached the non-volatile medium; the flagged I/O itself may land only in the disk's volatile cache, so its own safety is not guaranteed.

REQ_PREFLUSH can also be set on an empty bio, which simply flushes the data held in the disk's volatile cache.

Explicit cache flushes

The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure the volatile cache of the storage device has been flushed before the actual I/O operation is started. This explicitly guarantees that previously completed write requests are on non-volatile storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be set on an otherwise empty bio structure, which causes only an explicit cache flush without any dependent I/O. It is recommend to use the blkdev_issue_flush() helper for a pure cache flush.

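2.4 REQ_FUA

REQ_FUA is a bio request flag. It means the request returns only after its data has been safely written to non-volatile storage.

Forced Unit Access

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure that I/O completion for this request is only signaled after the data has been committed to non-volatile storage.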

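3

Experimental verification

I rebuilt the bcache kernel module so that it prints bio->bi_opf. bi_opf holds the bio's operation and flags: the request operation (REQ_OP_*) OR-ed with request flags (REQ_*). It determines how the I/O behaves.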

3.1 Request flags

From the Linux source, include/linux/blk_types.h:

    enum req_opf {
        /* read sectors from the device */
        REQ_OP_READ             = 0,
        /* write sectors to the device */
        REQ_OP_WRITE            = 1,
        /* flush the volatile write cache */
        REQ_OP_FLUSH            = 2,
        /* discard sectors */
        REQ_OP_DISCARD          = 3,
        /* securely erase sectors */
        REQ_OP_SECURE_ERASE     = 5,
        /* reset a zone write pointer */
        REQ_OP_ZONE_RESET       = 6,
        /* write the same sector many times */
        REQ_OP_WRITE_SAME       = 7,
        /* reset all the zone present on the device */
        REQ_OP_ZONE_RESET_ALL   = 8,
        /* write the zero filled sector many times */
        REQ_OP_WRITE_ZEROES     = 9,

        /* SCSI passthrough using struct scsi_request */
        REQ_OP_SCSI_IN          = 32,
        REQ_OP_SCSI_OUT         = 33,
        /* Driver private requests */
        REQ_OP_DRV_IN           = 34,
        REQ_OP_DRV_OUT          = 35,

        REQ_OP_LAST,
    };

    enum req_flag_bits {
        __REQ_FAILFAST_DEV =        /* 8 no driver retries of device errors */
            REQ_OP_BITS,
        __REQ_FAILFAST_TRANSPORT,   /* 9 no driver retries of transport errors */
        __REQ_FAILFAST_DRIVER,      /* 10 no driver retries of driver errors */
        __REQ_SYNC,                 /* 11 request is sync (sync write or read) */
        __REQ_META,                 /* 12 metadata io request */
        __REQ_PRIO,                 /* 13 boost priority in cfq */
        __REQ_NOMERGE,              /* 14 don't touch this for merging */
        __REQ_IDLE,                 /* 15 anticipate more IO after this one */
        __REQ_INTEGRITY,            /* 16 I/O includes block integrity payload */
        __REQ_FUA,                  /* 17 forced unit access */
        __REQ_PREFLUSH,             /* 18 request for cache flush */
        __REQ_RAHEAD,               /* 19 read ahead, can fail anytime */
        __REQ_BACKGROUND,           /* 20 background IO */
        __REQ_NOWAIT,               /* 21 Don't wait if request will block */
        __REQ_NOWAIT_INLINE,        /* 22 Return would-block error inline */
        .....
    }

3.2 bio->bi_opf lookup table

This table is used in the tests below. Feel free to skip it for now and come back when something is unclear.
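The decimal bi_opf values that appear in the kernel log can be decoded mechanically from the enum in 3.1. The sketch below is an illustration added here, not the author's tool: decode_bi_opf is a hypothetical helper, and the bit positions are copied from the enum quoted above, so they are an assumption that holds only for that kernel version (the layout differs between kernel releases). Bits not listed in the table are ignored.

```python
REQ_OP_BITS = 8                       # low 8 bits hold the operation
REQ_OP_MASK = (1 << REQ_OP_BITS) - 1

REQ_OPS = {
    0: "REQ_OP_READ",
    1: "REQ_OP_WRITE",
    2: "REQ_OP_FLUSH",
    3: "REQ_OP_DISCARD",
}

# Bit positions as annotated in the enum quoted in section 3.1.
REQ_FLAG_BITS = {
    8: "REQ_FAILFAST_DEV",
    9: "REQ_FAILFAST_TRANSPORT",
    10: "REQ_FAILFAST_DRIVER",
    11: "REQ_SYNC",
    12: "REQ_META",
    13: "REQ_PRIO",
    14: "REQ_NOMERGE",
    15: "REQ_IDLE",
    16: "REQ_INTEGRITY",
    17: "REQ_FUA",
    18: "REQ_PREFLUSH",
    19: "REQ_RAHEAD",
    20: "REQ_BACKGROUND",
    21: "REQ_NOWAIT",
    22: "REQ_NOWAIT_INLINE",
}


def decode_bi_opf(bi_opf: int) -> str:
    """Return the operation and flag names encoded in a bi_opf value."""
    op = bi_opf & REQ_OP_MASK
    names = [REQ_OPS.get(op, f"REQ_OP_{op}")]
    for bit in sorted(REQ_FLAG_BITS):
        if bi_opf & (1 << bit):
            names.append(REQ_FLAG_BITS[bit])
    return " | ".join(names)


if __name__ == "__main__":
    for value in (2049, 34817, 165889, 264193):
        print(value, "=", decode_bi_opf(value))
```

For example, 34817 = 32768 + 2048 + 1, i.e. bit 15 (REQ_IDLE), bit 11 (REQ_SYNC), and operation 1 (REQ_OP_WRITE).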

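3.3 Tests

The tests run dd directly against the block device /dev/bcache0. I have confirmed from the dd source code that oflag=direct opens the file with O_DIRECT, oflag=sync opens it with O_SYNC, conv=fdatasync issues one fdatasync(fd) when dd finishes, and conv=fsync issues one fsync(fd) when dd finishes.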

direct

    # dd if=/dev/zero of=/dev/bcache0 oflag=direct bs=8k count=1
    // messages
    kernel: bcache: cached_dev_make_request() bi_opf 34817, size 8192

bi_opf 34817, size 8192: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE. This is a synchronous write, and the flags show that data safety is not guaranteed (neither REQ_FUA nor REQ_PREFLUSH is set).

direct & sync

    # dd if=/dev/zero of=/dev/bcache0 oflag=direct,sync bs=8k count=1
    kernel: bcache: cached_dev_make_request() bi_opf 165889, size 8192
    kernel: bcache: cached_dev_make_request() bi_opf 264193, size 0

bi_opf 165889, size 8192: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE | REQ_FUA. This is a synchronous write request, and the I/O is written straight to the disk's non-volatile storage.

bi_opf 264193, size 0: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH with size = 0. This flushes the disk's volatile cache so that all previously written I/O reaches non-volatile storage. It shows why O_SYNC can guarantee data safety.

without direct

    # dd if=/dev/zero of=/dev/bcache0 bs=8k count=1
    kernel: bcache: cached_dev_make_request() bi_opf 2049, size 4096
    kernel: bcache: cached_dev_make_request() bi_opf 2049, size 4096

bi_opf 2049, size 4096: bi_opf = REQ_OP_WRITE | REQ_SYNC, a synchronous write request. The original 8k I/O was split in the page cache into two 4k I/Os before being written down. Data safety is not guaranteed.

direct & fdatasync

    # dd if=/dev/zero of=/dev/bcache0 oflag=direct conv=fdatasync bs=8k count=1
    kernel: bcache: cached_dev_make_request() bi_opf 34817, size 8192
    kernel: bcache: cached_dev_make_request() bi_opf 264193, size 0

bi_opf 34817, size 8192: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE, a synchronous write request that by itself does not guarantee data safety.

bi_opf 264193, size 0: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH, a request to flush the disk's volatile cache. This is the bio issued by fdatasync().

direct & sync & fdatasync

    # dd if=/dev/zero of=/dev/bcache0 oflag=direct,sync conv=fdatasync bs=8k count=1
    kernel: bcache: cached_dev_make_request() bi_opf 165889, size 8192
    kernel: bcache: cached_dev_make_request() bi_opf 264193, size 0
    kernel: bcache: cached_dev_make_request() bi_opf 264193, size 0

bi_opf 165889, size 8192: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE | REQ_FUA. This is a synchronous write request, and the I/O is written straight to the disk's non-volatile storage.

bi_opf 264193, size 0: bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH with size = 0. This flushes the disk's volatile cache so that all previously written I/O reaches non-volatile storage. Combined with the analysis above, these three bios are one write I/O and two flush I/Os, the flushes being triggered by O_SYNC and by fdatasync respectively.

direct & fsync

    # dd if=/dev/zero of=/dev/bcache0 oflag=direct conv=fsync bs=8k count=1
    kernel: bcache: cached_dev_make_request() bi_opf 34817, size 8192
    kernel: bcache: cached_dev_make_request() bi_opf 264193, size 0

The result is the same as direct + fdatasync. Presumably this is because the target is a block device: there is no metadata, or the metadata did not change, so fdatasync and fsync produce the same bios.
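The reasoning applied to each test can be condensed into one rule: a data bio is treated as safe if it carries REQ_FUA or if a later REQ_PREFLUSH bio flushes the disk cache after it. The sketch below replays the (bi_opf, size) pairs logged in the tests above through that rule. It is an illustration under assumptions: data_made_durable and OBSERVED are names invented here, the bit positions follow the enum in 3.1, and the log order is assumed to reflect the order in which the writes completed.

```python
REQ_FUA = 1 << 17       # bit positions per the enum quoted in section 3.1
REQ_PREFLUSH = 1 << 18

# (bi_opf, size) pairs copied from the kernel log lines in section 3.3.
OBSERVED = {
    "oflag=direct": [(34817, 8192)],
    "oflag=direct,sync": [(165889, 8192), (264193, 0)],
    "buffered (no oflag)": [(2049, 4096), (2049, 4096)],
    "oflag=direct conv=fdatasync": [(34817, 8192), (264193, 0)],
    "oflag=direct,sync conv=fdatasync": [
        (165889, 8192), (264193, 0), (264193, 0),
    ],
    "oflag=direct conv=fsync": [(34817, 8192), (264193, 0)],
}


def data_made_durable(bios) -> bool:
    """True if no data bio is left sitting only in the disk's volatile cache.

    A REQ_PREFLUSH bio covers writes that completed before it; a data bio
    without REQ_FUA stays "pending" until such a flush follows it.
    """
    pending = False
    for bi_opf, size in bios:
        if bi_opf & REQ_PREFLUSH:
            pending = False          # earlier writes are now on stable media
        if size > 0 and not (bi_opf & REQ_FUA):
            pending = True           # this write may only be in the disk cache
    return not pending


if __name__ == "__main__":
    for options, bios in OBSERVED.items():
        print(f"{options}: durable={data_made_durable(bios)}")
```

Under this rule only the plain oflag=direct run and the buffered run end with data that may still sit in the disk cache, which matches the per-test conclusions above.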

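4

Bonus

4.1 How to turn off the disk cache

Check the current state of the disk's write cache:

    # hdparm -W /dev/sda
    /dev/sda:
     write-caching =  1 (on)

Turn the disk's write cache off:

    # hdparm -W 0 /dev/sda

Turn the disk's write cache on:

    # hdparm -W 1 /dev/sda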
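The kernel's own view of a device's write cache can also be read without hdparm. On reasonably recent kernels the block layer exposes /sys/block/<device>/queue/write_cache, which reads either "write back" or "write through". The sketch below is an addition for illustration: write_cache_mode, has_volatile_write_cache, and demo are hypothetical names, and the sysfs_root parameter exists only so the example can run against a fake directory tree. Note that this file reflects what the kernel believes, which may not always match the drive-level setting that hdparm reports.

```python
import os
import tempfile


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the kernel's cache mode string for a block device."""
    path = os.path.join(sysfs_root, device, "queue", "write_cache")
    with open(path) as f:
        return f.read().strip()


def has_volatile_write_cache(device: str,
                             sysfs_root: str = "/sys/block") -> bool:
    """True if the kernel treats the device as having a write-back cache."""
    return write_cache_mode(device, sysfs_root) == "write back"


def demo() -> dict:
    """Run the helpers against a fake sysfs tree instead of real hardware."""
    with tempfile.TemporaryDirectory() as root:
        for dev, mode in (("sda", "write back"), ("sdb", "write through")):
            os.makedirs(os.path.join(root, dev, "queue"))
            with open(os.path.join(root, dev, "queue", "write_cache"), "w") as f:
                f.write(mode + "\n")
        return {dev: has_volatile_write_cache(dev, root)
                for dev in ("sda", "sdb")}
```

On a real system, has_volatile_write_cache("sda") would read the live sysfs file; when it returns False the kernel sees no volatile cache to flush on that device.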

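If you have suggestions or questions, feel free to leave a comment.

360云计算

A technical sharing account run by the 360 cloud platform team, covering databases, big data, microservices, containers, AIOps, IoT, and other areas, drawing on hands-on production experience.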
