fflush() works on a FILE*; it just flushes the internal buffers of the FILE* in your application out to the OS.
fsync() works at a lower level: it tells the OS to flush its buffers to the physical media.
OSs heavily cache data you write to a file. If the OS forced every write to hit the drive, things would be very slow. fsync (among other things) allows you to control when the data should hit the drive.
Furthermore, fsync/commit works on a file descriptor. It has no knowledge of a FILE* and can't flush its buffers. A FILE* lives in your application; file descriptors typically live in the OS kernel.
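To make the layering concrete, here is a minimal sketch (POSIX assumed; the file name is a placeholder and error handling is mostly omitted):

```c
#include <stdio.h>
#include <unistd.h>   /* fsync (POSIX) */

int main(void)
{
    FILE *fp = fopen("data.txt", "w");   /* "data.txt" is a placeholder */
    if (fp == NULL)
        return 1;

    fwrite("hello\n", 1, 6, fp);   /* data sits in the FILE* buffer, inside your process */
    fflush(fp);                    /* FILE* buffer -> OS (kernel page cache) */
    fsync(fileno(fp));             /* kernel buffers -> physical media */

    fclose(fp);
    return 0;
}
```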
The standard C function fflush() and the POSIX system call fsync() are conceptually somewhat similar. fflush() operates on C file streams (FILE objects), and is therefore portable.
fsync() operates on POSIX file descriptors.
Both cause buffered data to be sent to a destination.
On a POSIX system, each C file stream has an associated file descriptor, and all the operations on a C file stream will be implemented by delegating, when necessary, to POSIX system calls that operate on the file descriptor.
One might think that a call to fflush on a POSIX system would cause a write of any data in the buffer of the file stream, followed by a call of fsync() for the file descriptor of that file stream. So on a POSIX system there would be no need to follow a call to fflush with a call to fsync(fileno(fp)). But is that the case: is there a call to fsync from fflush?
No, calling fflush on a POSIX system does not imply that fsync will be called.
The C standard for fflush says (emphasis added) it
causes any unwritten data for [the] stream to be delivered to the host environment *to be written* to the file
Saying that the data is to be written, rather than that it is written, implies that further buffering by the host environment is permitted. For a POSIX environment, that buffering by the "host environment" could include the internal buffering that fsync flushes. So a close reading of the C standard suggests that the standard does not require the POSIX implementation to call fsync.
The POSIX standard description of fflush does not declare, as an extension of the C semantics, that fsync is called.
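So if you need the data on stable storage, you have to call fsync yourself after fflush. A minimal sketch of that sequence with error checking (flush_and_sync is a made-up name):

```c
#include <stdio.h>
#include <unistd.h>

/* Drain the stream's user-space buffer into the kernel, then ask the
   kernel to push its buffers for this file to the device. */
int flush_and_sync(FILE *fp)
{
    if (fflush(fp) != 0)          /* FILE* buffer -> kernel */
        return -1;
    if (fsync(fileno(fp)) != 0)   /* kernel buffers -> device */
        return -1;
    return 0;
}
```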
1. As you correctly concluded from your research, fflush synchronizes the user-space buffered data with the kernel-level cache (since it works with FILE objects that reside at user level and are invisible to the kernel), whereas fsync or sync (working directly with file descriptors) synchronize the kernel's cached data with the device. However, the latter comes without a guarantee that the data has actually been written to the storage device, as these usually come with their own caches as well. I would expect the same to hold for msync called with the MS_SYNC flag.
Relatedly, I find the distinction between synchronized and synchronous operations very useful when talking about the topic. Here's how Robert Love puts it succinctly:
A synchronous write operation does not return until the written data is—at least—stored in the kernel’s buffer cache. [...] A synchronized operation is more restrictive and safer than a merely synchronous operation. A synchronized write operation flushes the data to disk, ensuring that the on-disk data is always synchronized vis-à-vis the corresponding kernel buffers.
With that in mind, you can call open with the O_SYNC flag (together with a flag that opens the file with write permission) to enforce synchronized write operations. Again, as you correctly assumed, this will only work with a write-through disk caching policy, which effectively amounts to disabling the disk's write cache.
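For illustration, a minimal sketch of such an open (the file name and mode are placeholders):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* O_SYNC requests synchronized I/O: "log.txt" and 0644 are placeholders. */
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd == -1)
        return 1;

    /* With O_SYNC, write() returns only after the data has been
       written through the kernel cache (synchronized I/O). */
    write(fd, "event\n", 6);

    close(fd);
    return 0;
}
```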
You can read this answer about how to disable disk caching on Linux. Be sure to also check this website, which covers SCSI-based in addition to ATA-based devices (for the different types of disks, see this page on Microsoft SQL Server 2005, last updated Apr 19, 2018).
Speaking of which, it is very informative to read about how the issue is dealt with on Windows machines:
To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.
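For illustration, a minimal sketch of that call (the file name is a placeholder):

```c
#include <windows.h>

int main(void)
{
    /* "data.bin" is a placeholder. Note that FILE_FLAG_NO_BUFFERING also
       requires sector-aligned buffers, offsets, and transfer sizes for the
       actual ReadFile/WriteFile calls; that detail is omitted here. */
    HANDLE h = CreateFileA("data.bin",
                           GENERIC_WRITE,
                           0,              /* no sharing */
                           NULL,           /* default security */
                           CREATE_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* ... WriteFile with sector-aligned buffers goes here ... */

    CloseHandle(h);
    return 0;
}
```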
Apparently, this is how Microsoft SQL Server 2005 family ensures data integrity:
All versions of SQL Server open the log and data files using the Win32 CreateFile function. The dwFlagsAndAttributes member includes the FILE_FLAG_WRITE_THROUGH option when opened by SQL Server. [...] This option instructs the system to write through any intermediate cache and go directly to disk. The system can still cache write operations, but cannot lazily flush them.
I'm saying this is informative in particular because of this blog post from 2012 showing that some SATA disks ignore the FILE_FLAG_WRITE_THROUGH! I don't know what the current state of affairs is, but it seems that in order to ensure that writing to a disk is truly synchronized, you need to:
- Disable disk caching using your device drivers.
- Make sure that the specific device you're using supports write-through/no-caching policy.
However, if you're looking for a guarantee of data integrity you could just buy a disk with its own battery-based power supply that goes beyond capacitors (which are usually only enough to complete the ongoing write operations). As put in the conclusion of the blog article mentioned above:
Bottom-line, use Enterprise-Class disks for your data and transaction log files. [...] Actually, the situation is not as dramatic as it seems. Many RAID controllers have battery-backed cache and do not need to honor the write-through requirement.
2. To (partially) answer the second question, this is from the sync(2) man page:
According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.)
This would imply that fsync and sync work differently; note, however, that they're both declared in unistd.h, which suggests some consistency between them. In any case, I would follow Robert Love, who does not recommend using the sync syscall when writing your own code:
The only real use for sync() is in the implementation of the sync utility. Applications should use fsync() and fdatasync() to commit to disk the data of only the requisite file descriptors. Note that sync() may take several minutes or longer to complete on a busy system.
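Following that advice, a minimal sketch of committing a single file with fdatasync (the file name is a placeholder):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* "journal.log" is a placeholder. */
    int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return 1;

    write(fd, "record\n", 7);

    /* fdatasync flushes the file's data plus only the metadata needed
       to retrieve it, which can be cheaper than fsync's full flush. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
```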
"I don't have any solution, but certainly admire the problem."
From all I read in your good references, there is no standard. The standard ends somewhere in the kernel. The kernel controls the device driver, and the device driver (possibly supplied by the disk manufacturer) controls the disk through an API (the device has a small computer on board). The manufacturer may have added capacitors or a battery with just enough power to flush its device buffers in case of power failure, or may not have. The device may provide a sync function, but whether this truly syncs (flushes) the device buffers is not known (device dependent). So unless you select and install a device according to your specifications (and verify those specs), you are never sure.
fwrite writes to an internal buffer first, and then sometimes (at fflush or fclose, or when the buffer is full) calls the OS function write.
The OS also does some buffering, and writes to the device might get delayed.
fsync ensures that the OS writes its buffers to the device.
In your open-write-close case you don't need to fsync. The OS knows which parts of the file are not yet written to the device, so if a second process wants to read the file, the OS will serve the content from memory rather than read it from the device.
Of course when thinking about power outage it might (depending on the circumstances) be a good idea to fsync to be sure that the file content is written to the device (which as Andrew points out, does not necessarily mean that the content is written to disc, because the device itself might do buffering).
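To illustrate the point about reading back un-synced data, here is a small sketch (a reopen in the same process stands in for the second process; the file name is a placeholder):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Write and close without any fsync. */
    FILE *out = fopen("note.txt", "w");
    if (out == NULL)
        return 1;
    fputs("hello\n", out);
    fclose(out);    /* flushes the FILE* buffer to the kernel; no fsync involved */

    /* Any reader that comes along now is served the current content
       from the kernel's cache, even if it never reached the device. */
    char buf[16] = {0};
    FILE *in = fopen("note.txt", "r");
    if (in == NULL)
        return 1;
    fgets(buf, sizeof buf, in);
    fclose(in);

    return strcmp(buf, "hello\n") != 0;   /* exit 0: the reader saw the data */
}
```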
Up to now I didn't notice that fflush (which is called upon fclose) doesn't write to the file system, but only to an intermediate buffer. Could it be that the time between 3) and 4) is too short and the change from 2) is not yet written to disc, so that when I reopen with 4) I get a truncated file which, when it is closed again, leads to permanent loss of that data?
No. A system that behaved that way would be unusable.
Should I use fsync() in that case after each file write?
No, that will just slow things down.
What do I have to consider for power outages? It is not unlikely that the data corruption is related to power loss.
Use a filesystem that's resistant to such corruption. Possibly even consider using a safer modification algorithm such as writing out a new version of the file with a different name, syncing, and then renaming it on top of the existing file.
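A minimal sketch of that safer pattern, assuming POSIX (replace_file and the path arguments are made-up names, and error paths are abbreviated):

```c
#include <stdio.h>
#include <unistd.h>

/* Write a complete new version to a temporary file, force it to the
   device, and only then atomically swap it into place. */
int replace_file(const char *path, const char *tmp_path,
                 const char *data, size_t len)
{
    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;

    if (fwrite(data, 1, len, fp) != len ||
        fflush(fp) != 0 ||             /* FILE* buffer -> kernel */
        fsync(fileno(fp)) != 0) {      /* kernel buffers -> device */
        fclose(fp);
        remove(tmp_path);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* rename() replaces the old version atomically: after a crash you
       see either the old file or the complete new one, never a mix. */
    return rename(tmp_path, path);
}
```

For the rename itself to be durable across a crash, you may additionally need to fsync the containing directory.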