sync is a standard system call in the Unix operating system, which commits all data from the kernel filesystem buffers to non-volatile storage, i.e., data which has been scheduled for writing via low-level I/O system calls. Higher-level I/O layers such as stdio may maintain separate buffers of their own.
As a function in C, the sync() call is typically declared as void sync(void) in <unistd.h>. The system call is also available through a command-line utility of the same name, and through similarly named functions in other languages such as Perl and Node.js (in the fs module).
The related system call fsync() commits just the buffered data relating to a specified file descriptor.[1] fdatasync() is also available to write out just the changes made to the data in the file, and not necessarily the file's related metadata.[2]
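The relationship between user-space buffers, fsync() and fdatasync() can be illustrated with a minimal Python sketch. It assumes a Unix-like system; the helper names write_durably, flush_everything and demo are hypothetical and chosen only for this illustration. Python's os.fsync(), os.fdatasync() and os.sync() are thin wrappers around the corresponding system calls, and the file object's own buffer plays the role that stdio buffers play in C.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes, data_only: bool = False) -> None:
    """Write payload to path and ask the kernel to commit it to storage."""
    with open(path, "wb") as f:
        f.write(payload)   # lands in Python's user-space buffer first
        f.flush()          # hand the bytes to the kernel (comparable to fflush)
        if data_only and hasattr(os, "fdatasync"):
            # File data only; metadata not needed to read it back may be skipped.
            os.fdatasync(f.fileno())
        else:
            # Buffered data for this one file descriptor.
            os.fsync(f.fileno())


def flush_everything() -> None:
    """System-wide equivalent of sync(); may take a while on a busy machine."""
    if hasattr(os, "sync"):
        os.sync()


def demo() -> bytes:
    # Hypothetical scratch file used only for this illustration.
    path = os.path.join(tempfile.mkdtemp(), "example.dat")
    write_durably(path, b"hello\n")
    write_durably(path, b"hello\n", data_only=True)
    with open(path, "rb") as f:
        return f.read()
```

Calling fsync() without first flushing the higher-level buffer would sync only what the kernel has already received, which is why the flush comes first in this sketch.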
Some Unix systems run a kind of flush or update daemon, which calls the sync function on a regular basis. On some systems, the cron daemon does this, and on Linux it was handled by the pdflush daemon, which was replaced by a new implementation and finally removed from the Linux kernel in 2012.[3] Buffers are also flushed when filesystems are unmounted or remounted read-only,[4] for example prior to system shutdown.
Some applications, such as LibreOffice, also call the sync function at regular intervals to save recovery information.
Database use
To provide proper durability, databases need to use some form of sync to make sure that written information has reached non-volatile storage rather than sitting in a memory-based write cache that would be lost if power failed. PostgreSQL, for example, may use a variety of sync calls, including fsync() and fdatasync(),[5] to make commits durable.[6] Unfortunately, for any single client writing a series of records, a rotating hard drive can only commit once per rotation, which makes for at best a few hundred such commits per second.[7] Turning off the fsync requirement can therefore greatly improve commit performance, but at the expense of potentially introducing database corruption after a crash.
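The "once per rotation" limit can be turned into a rough upper bound with simple arithmetic: a drive spinning at a given number of revolutions per minute completes that number divided by 60 rotations per second. The sketch below assumes exactly one serial commit per rotation, as described above; the spindle speeds are common illustrative values rather than figures from the cited source, and the function name is hypothetical.

```python
def max_commits_per_second(rpm: float) -> float:
    """Upper bound on serial commits per second at one commit per rotation."""
    return rpm / 60.0


if __name__ == "__main__":
    # Illustrative spindle speeds (assumed, not taken from the cited source).
    for rpm in (5400, 7200, 15000):
        print(f"{rpm} RPM: at most {max_commits_per_second(rpm):.0f} commits/s")
```

Under these assumptions a 7200 RPM drive tops out at 120 commits per second and a 15000 RPM drive at 250, consistent with the "few hundred" ceiling.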
Databases also employ transaction log files (typically much smaller than the main data files) that record recent changes, so that those changes can be reliably redone after a crash; the main data files can then be synced less often.
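The idea of syncing a small log on every commit and redoing it after a crash can be sketched as follows. This is a toy illustration under stated assumptions, not how PostgreSQL or any particular database implements its log: the TinyLog class, the replay function and the JSON-lines record format are all hypothetical.

```python
import json
import os
import tempfile


class TinyLog:
    """Hypothetical append-only log: every commit is synced before returning."""

    def __init__(self, log_path: str) -> None:
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
        self.fd = os.open(log_path, flags, 0o600)

    def commit(self, record: dict) -> None:
        view = memoryview((json.dumps(record) + "\n").encode("utf-8"))
        while view:                      # os.write may write fewer bytes than asked
            view = view[os.write(self.fd, view):]
        os.fsync(self.fd)                # the commit is durable once this returns

    def close(self) -> None:
        os.close(self.fd)


def replay(log_path: str) -> dict:
    """Redo logged changes in order to rebuild state after a crash."""
    state = {}
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break                    # torn final record: that commit never completed
            state[rec["key"]] = rec["value"]
    return state


def demo() -> dict:
    path = os.path.join(tempfile.mkdtemp(), "changes.log")
    log = TinyLog(path)
    log.commit({"key": "a", "value": 1})
    log.commit({"key": "b", "value": 2})
    log.commit({"key": "a", "value": 3})
    log.close()
    return replay(path)
```

Because only the small log is synced on each commit, the larger data files in such a design can be written back lazily and rebuilt from the log if a crash intervenes.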
Error reporting and checking
To avoid data loss, the return value of fsync() should be checked. When I/O operations are buffered by the library or the kernel, errors may not be reported at the time of the write() system call or the fflush() call, since the data may have been written only to the in-memory page cache and not yet to non-volatile storage. Errors from writes are instead often reported during calls to fsync(), msync() or close().[8] Prior to 2018, Linux's fsync() failed to report error status under certain circumstances;[9][10] a change in behavior was proposed on 23 April 2018.[11]
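A minimal sketch of such checking is shown below. In Python a failing fsync() or close() raises OSError rather than returning -1 as in C, so the check takes the form of exception handling; the function names save_checked and demo are hypothetical, and the sketch makes no attempt to recover from a failure beyond reporting it.

```python
import os
import tempfile


def save_checked(path: str, payload: bytes) -> bool:
    """Return True only if open, write, fsync and close all succeeded."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError as exc:
        print(f"open failed: {exc}")
        return False
    ok = True
    try:
        view = memoryview(payload)
        while view:                       # handle short writes
            view = view[os.write(fd, view):]
        os.fsync(fd)                      # deferred write errors may surface here
    except OSError as exc:
        print(f"write or fsync failed: {exc}")
        ok = False
    finally:
        try:
            os.close(fd)                  # close can also report a deferred error
        except OSError as exc:
            print(f"close failed: {exc}")
            ok = False
    return ok


def demo() -> tuple:
    base = tempfile.mkdtemp()
    good = save_checked(os.path.join(base, "ok.dat"), b"data")
    bad = save_checked(os.path.join(base, "missing-dir", "x.dat"), b"data")
    return good, bad
```

A caller that receives False from this sketch should treat the data as not safely stored, since a successful write() alone says nothing about whether the bytes reached the device.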
Performance controversies
Hardware-level cache semantics
Hard disks may default to using their own volatile write cache to buffer writes, which greatly improves performance while introducing a potential for lost writes if they "lie" about when the data has finished writing.[12] Tools such as hdparm -F will instruct the HDD controller to flush the on-drive write cache buffer. The performance impact of turning caching off is so large that even the normally conservative FreeBSD community rejected disabling write caching by default in FreeBSD 4.3.[13]
In SCSI and in SATA with Native Command Queuing (but not in plain ATA, even with TCQ) the host can specify whether it wants to be notified of completion when the data hits the disk's platters or when it hits the disk's buffer (on-board cache). Assuming a correct hardware implementation, this feature allows the disk's on-board cache to be used while guaranteeing correct semantics for system calls like fsync.[14] This hardware feature is called Force Unit Access (FUA) and it allows consistency with less overhead than flushing the entire cache as done for ATA (or SATA non-NCQ) disks.[15] Although Linux enabled NCQ around 2007, it did not enable SATA/NCQ FUA until 2012, citing lack of support in the early drives.[16][17]
Use in applications
Firefox 3.0, released in 2008, introduced fsync system calls that were found to degrade its performance; the call was introduced in order to guarantee the integrity of the embedded SQLite database.[18]
More specifically, the user interface thread calls SQLite for every new page visited and the library calls fsync. In ext3's data=ordered mode, which was a fairly common setup at the time, calling fsync means that the entire filesystem's write cache is flushed, which can take a long time if the cache contains a lot of other pending operations (e.g. in the middle of copying a large file).[19]
This event prompted a lot of finger-pointing against the use of fsync, with claims of performance problems, unnecessary HDD spinups, and applications needing only atomicity and not durability.[20] Linux Foundation chief technical officer Theodore Ts'o provided an analysis of the problem. He argues that there is no need to "fear fsync" when it is used properly, and emphasizes that fsync is the only POSIX way to request that data be written to non-volatile storage. He answers the three criticisms of fsync in turn: (1) the performance impact should be minimal in a typical use case (downloading a file while browsing the web), fsync does not create extra I/O (it only waits for it), and Firefox was wrong to make so many disk writes when browsing a web page anyway; (2) spinup could be avoided by a tweak of the existing laptop_mode; and (3) "atomicity not durability" is a bad security compromise compared to spawning an I/O thread to run fsync.[20]
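The suggestion of running fsync on a separate I/O thread, so that the calling thread is not blocked while the data is committed, can be sketched as follows. This is an illustration of the general idea only, not Firefox's or SQLite's actual code; the names io_pool, save_async and demo are hypothetical.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical single dedicated I/O thread.
io_pool = ThreadPoolExecutor(max_workers=1)


def save_async(path: str, payload: bytes) -> Future:
    """Write on the calling thread, then run the blocking fsync elsewhere."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    view = memoryview(payload)
    while view:                          # handle short writes
        view = view[os.write(fd, view):]

    def sync_and_close() -> None:
        try:
            os.fsync(fd)                 # may block for a long time
        finally:
            os.close(fd)

    return io_pool.submit(sync_and_close)


def demo() -> bytes:
    path = os.path.join(tempfile.mkdtemp(), "history.dat")
    future = save_async(path, b"history entry\n")
    # The caller is free to do other work here while fsync runs.
    future.result()                      # re-raises any OSError from fsync
    with open(path, "rb") as f:
        return f.read()
```

The data should be considered durable only after the returned future completes without error, so durability is kept while the wait is moved off the calling thread.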
Stewart Smith happened to have made a presentation on the use, misuse, and underuse of fsync in applications just the year before (2007). Although his talk was mostly about practices detrimental to data integrity (and ways to take care of both speed and safety),[21] the "eatmydata" tool for disabling all sync-type calls for a program ended up seeing wider distribution.[22] It is mainly used in "throwaway" tasks where data loss is acceptable,[23] for example in a temporary environment to speed up package installs.[24]
References
- ^ fsync specification
- ^ fdatasync specification
- ^ "R.I.P. Pdflush [LWN.net]".
- ^ "mount - Does umount calls sync to complete any pending writes". Unix & Linux Stack Exchange. Retrieved 2021-05-02.
- ^ Vondra, Tomas (2 February 2019). "PostgreSQL vs. fsync". Osuosl Org. Archived from the original (mp4) on 10 February 2019. Retrieved 10 February 2019.
- ^ PostgreSQL Reliability and the Write-Ahead Log
- ^ Tuning PostgreSQL WAL Synchronization Archived 2009-11-25 at the Wayback Machine
- ^ "Ensuring data reaches disk [LWN.net]".
- ^ "PostgreSQL's fsync() surprise [LWN.net]".
- ^ "Improved block-layer error handling [LWN.net]".
- ^ "Always report a writeback error once - Patchwork". Archived from the original on 2018-05-04. Retrieved 2018-05-03.
- ^ Write-Cache Enabled?
- ^ FreeBSD Handbook — Tuning Disks
- ^ Marshall Kirk McKusick. "Disks from the Perspective of a File System - ACM Queue". Queue.acm.org. Retrieved 2014-01-11.
- ^ Gregory Smith (2010). PostgreSQL 9.0: High Performance. Packt Publishing Ltd. p. 78. ISBN 978-1-84951-031-8.
- ^ "Enabling FUA for SATA drives (Was Re: [RFC][PATCH] libata: Enable SATA disk fua detection on default) (Linux SCSI)".
- ^ "Linux-Kernel Archive: [PATCH RFC] libata: FUA updates".
- ^ "Shaver » fsyncers and curveballs". Archived from the original on 2012-12-09. Retrieved 2009-10-15.
- ^ Mike Shaver: "On some rather common Linux configurations, especially using the ext3 filesystem in the 'data=ordered' mode, calling fsync doesn't just flush out the data for the file it's called on, but rather on all the buffered data for that filesystem." in "Delayed allocation and the zero-length file problem".
- ^ "Don't fear the fsync!".
- ^ Smith, Stewart (2007). eat my data: how everybody gets file IO wrong (PDF). linux.conf.au.
- ^ "eatmydata (1) transparently disable fsync() and other data-to-disk synchronization calls".
- ^ "libeatmydata - disable fsync and SAVE!". www.flamingspork.com.
- ^ "Package: apt-eatmydata (1) - Disable fsync and friends for APT's dpkg calls". packages.debian.org.