The meaning of atomic in write()

Verdi March

28 Mar 2006

05:39

Hi,

The Open Group specification says that write() is atomic as long as the number of bytes is not larger than PIPE_BUF. In the following program, sometimes only one process successfully writes to the file. I thought that fprintf also uses write() underneath, so the file should contain strings from both processes. Any insight?

    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        int i;
        FILE *f;

        printf("bufsiz=%d pipe_buf=%d\n", BUFSIZ, PIPE_BUF);
        f = fopen( "test.txt" , "w" );
        /* f = stdout; */
        fprintf( f, "Parent is process %d\n", getpid() );
        assert(fork() >= 0);
        for (i=1; i<4; i++)
            fprintf( f, " %8d: %d\n", getpid(), i );
        fclose( f );
        return 0;
    }

Here are some results. The first correctly contains strings from both processes, but the second only contains strings from one process:

    cincai@verdimar:/tmp> ./a.out; cat haha.txt
    bufsiz=8192 pipe_buf=4096
    Parent is process 25831
    25832: 1
    25832: 2
    25832: 3
    bufsiz=8192 pipe_buf=4096
    Parent is process 25831
    25831: 1
    25831: 2
    25831: 3
    cincai@verdimar:/tmp> ./a.out ;cat haha.txt
    bufsiz=8192 pipe_buf=4096
    Parent is process 25836
    25837: 1
    25837: 2
    25837: 3


Steve Graegert

28 Mar

06:58

New subject: [suse-programming-e] The meaning of atomic in write()

On 3/28/06, Verdi March wrote:

...

Hi,

The Open Group specification says that write() is atomic as long as the number of bytes is not larger than PIPE_BUF.

Yes, it does, but it is only specified for writes to pipes or FIFOs. In the case of files, you'll need to use some kind of file/record locking.

\Steve
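To make Steve's point concrete, here is a minimal sketch (assuming a POSIX system; `pipe_atomicity_check`, `writer`, `RECLEN` and `NRECS` are hypothetical names) in which two children write fixed-size records into one pipe. Each record is 512 bytes, the smallest PIPE_BUF that POSIX allows (`_POSIX_PIPE_BUF`), so every write() should arrive as one unit and the reader should never see a record that mixes 'A' and 'B' bytes. The PIPE_BUF guarantee is about pipes and FIFOs, which is the case sketched here.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECLEN 512   /* = _POSIX_PIPE_BUF, the smallest PIPE_BUF POSIX allows */
#define NRECS  200   /* records per writer */

/* Child body: write NRECS records of RECLEN bytes, all the same character. */
static void writer(int fd, char c)
{
    char rec[RECLEN];
    memset(rec, c, RECLEN - 1);
    rec[RECLEN - 1] = '\n';
    for (int i = 0; i < NRECS; i++)
        if (write(fd, rec, RECLEN) != RECLEN)
            _exit(1);
    _exit(0);
}

/* Two writers share one pipe. Returns the number of mixed records the
   reader saw (expected 0), or -1 on error. */
int pipe_atomicity_check(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    for (int k = 0; k < 2; k++) {
        pid_t pid = fork();
        if (pid < 0) {
            close(p[0]);
            close(p[1]);
            return -1;
        }
        if (pid == 0) {
            close(p[0]);
            writer(p[1], k == 0 ? 'A' : 'B');
        }
    }
    close(p[1]);                 /* only the writers keep a write end, so EOF arrives */

    size_t total = 2 * (size_t)NRECS * RECLEN, got = 0;
    char *all = malloc(total);
    if (all == NULL) {
        close(p[0]);             /* writers then fail with EPIPE and exit */
        return -1;
    }
    ssize_t n;
    while (got < total && (n = read(p[0], all + got, total - got)) > 0)
        got += (size_t)n;
    close(p[0]);
    while (wait(NULL) > 0)       /* reap both writers */
        ;

    int bad = 0;
    for (size_t off = 0; off + RECLEN <= got; off += RECLEN) {
        for (size_t j = 1; j < RECLEN; j++) {
            char want = (j == RECLEN - 1) ? '\n' : all[off];
            if (all[off + j] != want) {
                bad++;
                break;
            }
        }
    }
    free(all);
    return got == total ? bad : -1;
}
```

Because every write is a whole record and a pipe preserves byte order, each RECLEN-sized chunk the reader collects should line up with exactly one record.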

Anders Johansson

18:04


On Tue, 2006-03-28 at 07:39 +0200, Verdi March wrote:

...

Hi,

The Open Group specification says that write() is atomic as long as the number of bytes is not larger than PIPE_BUF. In the following program, sometimes only one process successfully writes to the file. I thought that fprintf also uses write() underneath, so the file should contain strings from both processes. Any insight?

Sorry for sending the reply off-list. I'll repeat it, for the benefit of the group:

The meaning of "atomic" is that you won't get a task switch in the middle of the call; it is guaranteed to complete once it's started, before another process gets to call it.

The reason you only get output from one process at times is that one of the processes closes the file before the other one gets to write to it. You need to have a procedure in place for making sure that all files have finished writing before you call fclose.

If you had checked the return value of fprintf, you would have seen errors because the process is trying to write to a file that's closed. Always check return values. It's just good programming.
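A minimal sketch of the return-value checking Anders recommends, assuming POSIX stdio (`write_lines_checked` is a hypothetical name). Because fprintf() normally only fills the user-space buffer, the check that matters most here is the one on fclose(), which is where the buffered write(2) actually happens.

```c
#include <stdio.h>
#include <unistd.h>

/* Writes three lines and reports the first failure; 0 on success, -1 on error. */
int write_lines_checked(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror("fopen");
        return -1;
    }
    for (int i = 1; i < 4; i++) {
        /* Usually this only copies into the stdio buffer, so it rarely fails */
        if (fprintf(f, " %8d: %d\n", (int)getpid(), i) < 0) {
            perror("fprintf");
            fclose(f);
            return -1;
        }
    }
    /* fclose() flushes the buffer with write(2); I/O errors often show up only here */
    if (fclose(f) == EOF) {
        perror("fclose");
        return -1;
    }
    return 0;
}
```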

Verdi March

29 Mar

07:08


Hi,

...

--- Original Message ---
From: Anders Johansson
To: suse-programming-e@suse.com
Subject: Re: [suse-programming-e] The meaning of atomic in write()
Date: Tue, 28 Mar 2006 20:04:06 +0200

The reason you only get output from one process at times is that one of the processes closes the file before the other one gets to write to it. You need to have a procedure in place for making sure that all files have finished writing before you call fclose

...

From my understanding, the second (child) process inherits its parent's file descriptor f. Therefore, both f in parent and child point to the same open-file structure (according to a UNIX book). I thought that the kernel reclaims an open-file structure only when it's not pointed to anymore by any descriptor?
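Verdi's reading can be checked directly. A minimal sketch, assuming a POSIX system (`offset_seen_by_parent` is a hypothetical name): the child writes three bytes through the descriptor it inherited, and the parent, which wrote nothing itself, then asks for its own offset with lseek(). If both descriptors refer to one open file description, as Verdi describes, the parent should see offset 3.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes 3 bytes via the inherited fd; returns the offset the parent
   then sees on its own copy of the fd, or -1 on error. */
off_t offset_seen_by_parent(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        ssize_t n = write(fd, "abc", 3);    /* advances the shared offset */
        _exit(n == 3 ? 0 : 1);
    }
    waitpid(pid, NULL, 0);
    off_t off = lseek(fd, 0, SEEK_CUR);     /* the parent itself never wrote */
    close(fd);
    return off;
}
```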

...

If you had checked the return value of fprintf, you would have seen errors because the process is trying to write to a file that's closed. Always check return values. It's just good programming

Each call to fprintf will always succeed because it sends the string to a buffer allocated by the C library (which defaults to 8KB on my system). Because the string is less than 8KB, fprintf() does not result in a write(). The write() is issued only during fclose(), and I've confirmed this with strace. I've modified the program to assert(fclose(f) == 0) (and assert(fprintf(...) >= 0), just to make sure). But still, I get the same result: even if only one process successfully writes, no assertion is violated.

In the beginning I thought that this would not happen due to write() atomicity, just as Anders said. That is, the second write() continues writing from the offset after the first write(), resulting in test.txt containing strings from both processes. If test.txt contains only strings of one process, this implies that write() is not atomic, because the two writes "overlap" (somehow both start writing from the beginning of the file).

Earlier, Steve mentioned that the atomicity is valid only for FIFOs or pipes. But I vaguely remember from a newsgroup thread that the atomicity also holds for files?

I realize that the "text-book" solution is to lock the file before writing. Initially this program was just meant to show the effect of user-level I/O buffering by the C library. But now I have become more curious about this concurrent file access.

-- Regards, Verdi
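One more effect of the user-level buffering Verdi describes may be worth sketching (assuming POSIX; `count_duplicated_lines` and `LINE` are hypothetical names): data that fprintf() puts into the buffer before fork() is copied into the child, so both processes flush it at fclose(). That may be why "Parent is process 25831" shows up twice in the first output of the original post. Assuming both final write(2) calls land at distinct offsets (the race discussed in this thread could lose one), the function returns 2.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define LINE "written once, before fork()\n"

/* Buffers one line, forks, lets both processes fclose() their copy of the
   FILE, and returns how many times the line ended up in the file. */
int count_duplicated_lines(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%s", LINE);          /* sits in the stdio buffer, no write(2) yet */

    pid_t pid = fork();              /* the child gets a copy of that buffer */
    if (pid < 0) {
        fclose(f);
        return -1;
    }
    fclose(f);                       /* each process flushes its own copy */
    if (pid == 0)
        _exit(0);                    /* _exit(): no other stdio buffers flushed twice */
    waitpid(pid, NULL, 0);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[128];
    int n = 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        if (strcmp(buf, LINE) == 0)
            n++;
    fclose(f);
    return n;
}
```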

Anders Johansson

16:28


On Wed, 2006-03-29 at 09:08 +0200, Verdi March wrote:

...

From my understanding, the second (child) process inherits its parent's file descriptor f. Therefore, both f in parent and child point to the same open-file structure (according to a UNIX book). I thought that the kernel reclaims an open-file structure only when it's not pointed to anymore by any descriptor?

Well, you are closing it explicitly, but you are right, I am an idiot. The child gets a new copy of the fd which is almost independent of the original, so it stays open. This is also why you get the effect you see, because both processes will have a local file location pointer, so both will start at position 0, the second one overwriting the first.

...

In the beginning I thought that this would not happen due to write() atomicity, just as Anders said. That is, the second write() continues writing from the offset after the first write(), resulting in test.txt containing strings from both processes.

No, that's not what I said. Atomic means "uninterrupted". It means each call to write() will be allowed to complete before another call gets access to it.

However, what I missed in my hazy thinking was that the file descriptor is copied, not shared, so each process has its own idea of where to write in the file. File locking or record locking won't help in this case; you need some other way of letting the processes cooperate, so they know where in the file to write.
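One way to let the processes "know where in the file to write", as Anders puts it, is to agree on disjoint regions up front and use pwrite(2), which takes an explicit offset and leaves the shared file position alone. This is a minimal sketch under an assumed fixed 16-byte slot per process; `SLOT` and `write_in_own_slots` are hypothetical names.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLOT 16   /* assumption: each process owns one fixed 16-byte region */

/* Parent writes slot 0, child writes slot 1. pwrite() takes an explicit
   offset, so the shared file position is never consulted or moved.
   Returns the final file size, or -1 on error. */
off_t write_in_own_slots(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    char rec[SLOT];
    memset(rec, pid == 0 ? 'c' : 'p', SLOT - 1);
    rec[SLOT - 1] = '\n';
    off_t where = (pid == 0) ? SLOT : 0;
    ssize_t n = pwrite(fd, rec, SLOT, where);
    if (pid == 0)
        _exit(n == SLOT ? 0 : 1);

    int status = 0;
    waitpid(pid, &status, 0);
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    if (n != SLOT || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return size;
}
```

The cost of this design is that the layout must be fixed in advance; variable-length output needs a lock or O_APPEND instead.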

Verdi March

30 Mar

06:41


Hi,

...

--- Original Message ---
From: Anders Johansson
To: suse-programming-e@suse.com
Subject: Re: [suse-programming-e] The meaning of atomic in write()
Date: Wed, 29 Mar 2006 18:28:23 +0200

No, that's not what I said. Atomic means "uninterrupted". It means each call to write() will be allowed to complete before another call gets access to it.

However, what I missed in my hazy thinking was that the file descriptor is copied, not shared, so each process has its own idea of where to write in the file. File locking or record locking won't help in this case; you need some other way of letting the processes cooperate, so they know where in the file to write.

I think you're right that "atomic" does not cover the location to start writing. I changed my program a little bit:

    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        int i;
        FILE *f;

        f = fopen( "test.txt" , "w" );
        if (fork() > 0) {
            fprintf(f, "bufsiz=%d pipe_buf=%d\n", BUFSIZ, PIPE_BUF);
            fprintf(f, "Parent is process %d\n", getpid());
        }
        for (i=1; i<4; i++)
            fprintf( f, " %8d: %d\n", getpid(), i );
        fclose(f);
        return 0;
    }

so that the parent and child have different output lengths. One possible result is when the child process (27747) is the last process to write(), but it writes from the beginning of the file, overwriting the first 45 bytes:

    27747: 1
    27747: 2
    27747: 3
    7746
    27746: 1
    27746: 2
    27746: 3

-- Regards, Verdi

Manfred Hollstein

31 Mar

12:37


Hi there, On Thu, 30 Mar 2006, 08:41:52 +0200, Verdi March wrote:

...

Hi,

...
--- Original Message ---
From: Anders Johansson
To: suse-programming-e@suse.com
Subject: Re: [suse-programming-e] The meaning of atomic in write()
Date: Wed, 29 Mar 2006 18:28:23 +0200

No, that's not what I said. Atomic means "uninterrupted". It means each call to write() will be allowed to complete before another call gets access to it.

However, what I missed in my hazy thinking was that the file descriptor is copied, not shared, so each process has its own idea of where to write in the file. File locking or record locking won't help in this case; you need some other way of letting the processes cooperate, so they know where in the file to write.

I think you're right, that "atomic" does not cover the location to start writing. If I changed my program a little bit: [...]

I'm afraid everyone in this thread expects something that *buffered I/O* cannot provide. Opening a file using "fopen" creates a FILE structure with some buffer attached to it - unless the file to be opened appears to be an interactive device. If you really want to use the high-level (3) routines (fopen, fprintf, fread, fwrite, ...) _and_ want to ensure that no buffering inside a library is getting in your way, you need to, at least, call "setbuf (fp, NULL)" to ensure *no* buffering will happen. BUT, it will not change anything with respect to HOW the underlying kernel will deal with (non-synchronous) writes going to the file - that's what you've seen.

IF you want to change this, you need to take care there's a proper protocol for dealing with synchronous I/O used by your application; did you try something like

    fd = open ("path", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    fp = fdopen (fd, "w");

(proper checking of return codes assumed!)? Still, according to the "open(2)" manual page:

    O_DIRECT
        Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of a read(2) or write(2), data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment to 512-byte boundaries suffices. A semantically similar interface for block devices is described in raw(8).

There is absolutely NO guarantee for any data smaller than 512 bytes to appear in the file in the same order you might have intended. If you really need this, there is no other way than to "do their own caching".
Getting back to the original poster's question, "fprintf" is using "write" at _some_ point in time, exactly when it'll do depends on something like the associated buffer's size... HTH, cheers. l8er manfred

Jerry Feldman

13:00


On Friday 31 March 2006 7:37 am, Manfred Hollstein wrote:

...

I'm afraid everyone in this thread expects something that *buffered I/O* cannot provide.

Not everyone :-)

Opening a file using "fopen" creates a FILE structure with some buffer attached to it - unless the file to be opened appears to be an interactive device. If you really want to use the high-level (3) routines (fopen, fprintf, fread, fwrite, ...) _and_ want to ensure that no buffering inside a library is getting in your way, you need to, at least, call "setbuf (fp, NULL)" to ensure *no* buffering will happen. BUT, it will not change anything with respect to HOW the underlying kernel will deal with (non-synchronous) writes going to the file - that's what you've seen. IF you want to change this, you need to take care there's a proper protocol for dealing with synchronous I/O used by your application; did you try something like

    fd = open ("path", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    fp = fdopen (fd, "w");

(proper checking of return codes assumed!)?

Still, according to the "open(2)" manual page:

    O_DIRECT
        Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of a read(2) or write(2), data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment to 512-byte boundaries suffices. A semantically similar interface for block devices is described in raw(8).

There is absolutely NO guarantee for any data smaller than 512 bytes to appear in the file in the same order you might have intended. If you really need this, there is no other way than to "do their own caching".

Getting back to the original poster's question, "fprintf" is using "write" at _some_ point in time, exactly when it'll do depends on something like the associated buffer's size...

I agree pretty much with your statement. The OP was opening a file for writing, then performing a fork where both the parent and child were writing to the file simultaneously. I took the program that the OP submitted and ran it on SuSE 10 (32-bit) and on RHEL 4 (64-bit IA64). Both had the same successful result without the overwriting that the OP experienced (in both cases, a 2.6 kernel).

The issues that come into play are that the buffers for streamIO (e.g. FILE) are user-space buffers that are flushed based on a number of rules. The flush might occur in the fprintf, when the user calls fflush(3), or when the user closes the file. But there are also kernel buffers, and some data can remain in kernel buffers for a short time before it is physically written. This is a function of the driver and of when the sync(2) system call commits the buffers.

-- Jerry Feldman Boston Linux and Unix user group http://www.blu.org PGP key id:C5061EA9 PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
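Jerry's two buffering layers map onto two calls. A minimal sketch, assuming POSIX (the helper name `flush_to_disk` is hypothetical): fflush(3) hands the stdio buffer to the kernel with write(2), and fsync(2) then asks the kernel to commit that file's data to the device, rather than waiting for a later sync(2). Neither call decides the order in which two processes' writes reach the file.

```c
#include <stdio.h>
#include <unistd.h>

/* Push data through both layers: stdio's user-space buffer (fflush) and
   the kernel's cached copy of the file (fsync). 0 on success, -1 on error. */
int flush_to_disk(FILE *f)
{
    if (fflush(f) == EOF)            /* user space -> kernel: issues write(2) */
        return -1;
    if (fsync(fileno(f)) < 0)        /* kernel -> storage device */
        return -1;
    return 0;
}
```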

Anders Johansson

18:46


On Fri, 2006-03-31 at 08:00 -0500, Jerry Feldman wrote:

...

I agree pretty much with your statement. The OP was opening a file for writing, then performing a fork where both the parent and child were writing to the file simultaneously. I took the program that the OP submitted and ran it on SuSE 10 (32-bit), and ran it on RHEL 4 (64-bit IA64). Both had the same successful result without the overwriting that the OP experienced. (In both cases, 2.6 kernel).

Are you implying that the file location pointer is shared across a dup() call in these systems?

Jerry Feldman

19:23


On Friday 31 March 2006 1:46 pm, Anders Johansson wrote:

...

On Fri, 2006-03-31 at 08:00 -0500, Jerry Feldman wrote:

...
I agree pretty much with your statement. The OP was opening a file for writing, then performing a fork where both the parent and child were writing to the file simultaneously. I took the program that the OP submitted and ran it on SuSE 10 (32-bit), and ran it on RHEL 4 (64-bit IA64). Both had the same successful result without the overwriting that the OP experienced. (In both cases, 2.6 kernel).

Are you implying that the file location pointer is shared across a dup() call in these systems?

No. A fork(2) clones the parent's environment. Actually, they share the same memory, but the pages are marked copy-on-write, which means that when there is a change, the page is copied before it is dirtied, so it behaves as if the copy had been made.

What I am implying is that after the fork(2) occurs, and both processes are writing to the same file, data integrity is maintained. I need to qualify this a bit more, in that the write(2) system call is atomic, but the two processes' data may be interleaved depending on the amount of data. This is the result I get on both SuSE 10 (32-bit single processor) and a dual-processor IA64 (HP Integrity 1620):

    [gaf@cedar C]$ cat test.txt
    28483: 1
    28483: 2
    28483: 3
    bufsiz=8192 pipe_buf=4096
    Parent is process 28482
    28482: 1
    28482: 2
    28482: 3

In the case of dup(2), you are simply making a copy of a file descriptor. The file descriptors all point to the same open file description, so that:

    fd2 = dup(fd1);
    write(fd1, buf, count);   // writes to the current location, and bumps the location pointer by count
    write(fd2, buf, count);   // writes at the updated location pointer set previously

In this case:

    fd1 = open(...);
    pid = fork();
    if (pid == 0) {           // child
        write(fd1, ...);      // updates location
        ...                   // close and exit
    } else if (pid > 0) {     // parent
        write(fd1, ...);      // updates location
        ...                   // close, wait, exit
    } else {                  // error (pid == -1)
        // report error and exit
    }

In the above code, there is a race condition in that either the parent or the child writes first, so the second data written will be written to a location beyond where the first data was written. Whether the parent or child writes first depends on a number of things.
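Jerry's dup(2) fragment, made runnable as a minimal sketch (assuming POSIX; `offset_after_dup_writes` is a hypothetical name). The second write continues where the first one stopped, so the offset read back through fd1 should be 5.

```c
#include <fcntl.h>
#include <unistd.h>

/* Writes "abc" through fd1 and "de" through a dup of it; returns the
   resulting file offset as seen through fd1, or -1 on error. */
off_t offset_after_dup_writes(const char *path)
{
    int fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd1 < 0)
        return -1;
    int fd2 = dup(fd1);                   /* same open file description as fd1 */
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    off_t off = -1;
    if (write(fd1, "abc", 3) == 3 &&      /* offset 0 -> 3 */
        write(fd2, "de", 2) == 2)         /* continues at 3 -> 5 */
        off = lseek(fd1, 0, SEEK_CUR);    /* fd1 sees the advance made via fd2 */
    close(fd2);
    close(fd1);
    return off;
}
```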

Verdi March

1 Apr

09:33


Hi, On Saturday 01 April 2006 03:23, Jerry Feldman wrote:

...

In the above code, there is a race condition in that either the parent or the child writes first, so the second data written will be written to a location beyond where the first data was written. Whether the parent or child writes first depends on a number of things.

Yep, that's the lesson I learned. Initially I expected that 'atomic' covers both the location pointer and the 'no interleaving of bytes up to a certain size'. It turns out that the first is not covered. And the occurrence of the race condition varies among platforms. I used a shell script that repeatedly executes the program. The shell script stops when the race condition occurs. On SUSE 9.3 and FC4 (64-bit Opteron), the race condition can occur pretty fast, while on SunOS it does not occur (but maybe it's because I didn't run the script long enough).

-- Regards, Verdi

Verdi March

09:50


On Saturday 01 April 2006 17:33, Verdi March wrote:

...

I used a shell script that repeatedly executes the program. The shell script stops when the race condition occurs.

To be specific, it stops when the file contains "overlapping" output.

-- Regards, Verdi

Jerry Feldman

13:56


On Sat, 1 Apr 2006 17:33:00 +0800 Verdi March wrote:

...

Yep, that's the lesson I learned. Initially I expected that 'atomic' covers both the location pointer and the 'no interleaving of bytes up to a certain size'. It turns out that the first is not covered.

And the occurrence of the race condition varies among platforms. I used a shell script that repeatedly executes the program. The shell script stops when the race condition occurs. On SUSE 9.3 and FC4 (64-bit Opteron), the race condition can occur pretty fast, while on SunOS it does not occur (but maybe it's because I didn't run the script long enough).

The location pointer should be atomic. I did not run a recent stress test on the 2.6 kernel. One of the problems is that you used streamIO, where the buffering is both in user space and underneath in kernel space. The system calls open(2), dup(2), write(2) and close(2) are atomic. And as I mentioned, they refer to the same single kernel open file structure. But the streamIO functions are library functions and are not guaranteed to be atomic. There are a number of methods in Linux to guarantee atomicity. You can use file locking: fcntl(2), flock(2), lockf(3):

    child:
        flock(2)      // set the lock
        fprintf(3)
        fflush(3)
        flock(2)      // release the lock
        ...

The parent does the same thing. You still have a race condition as to whose data is going to be written first, but that is an application decision.

Another way is to write your own function using the vsprintf(3) function. I know that using a fixed-size buffer here is unsafe, but I'm using it for the example. In the function below, by using write(2) you are bypassing the stream's file structure (and its own location pointer).

    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int myprintf(FILE *stream, const char *fmt, ...)
    {
        va_list lp;
        int rc;
        ssize_t wrc;
        char buf[SOME_SIZE];

        va_start(lp, fmt);
        rc = vsprintf(buf, fmt, lp);   /* move stuff into buf */
        va_end(lp);
        if (rc < 0)                    /* check rc to make sure vsprintf succeeded */
            return -1;
        wrc = write(fileno(stream), buf, strlen(buf));
        if (wrc < 0)                   /* check wrc */
            return -1;
        return 0;
    }
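For comparison, here is a bounded variant of Jerry's idea, a sketch rather than his code: `fd_printf` is a hypothetical name and the 1 KiB record limit is an assumption. It uses vsnprintf(3) instead of vsprintf(3), so an over-long record is rejected instead of overflowing the buffer, and it loops on short writes (in which case the record is split across several write(2) calls). POSIX.1-2008 also provides dprintf(3), which formats straight to a file descriptor. Going through write(2) alone does not decide where in the file each record lands.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format into a local buffer, then hand the whole
   record to write(2), retrying on short writes and EINTR.
   Returns the record length, or -1 on error or truncation. */
int fd_printf(int fd, const char *fmt, ...)
{
    char buf[1024];                  /* assumption: records fit in 1 KiB */
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, ap);   /* bounded, unlike vsprintf */
    va_end(ap);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                   /* formatting error or record would be truncated */

    size_t done = 0;
    while (done < (size_t)len) {
        ssize_t n = write(fd, buf + done, (size_t)len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return len;
}
```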

Verdi March

4 Apr

15:49


Hi, On Saturday 01 April 2006 21:56, Jerry Feldman wrote:

...

The location pointer should be atomic. I did not run a recent stress test on the 2.6 kernel. One of the problems is that you used streamIO, where the buffering is both in user space and underneath in kernel space. The system calls, open(2), dup(2), write(2), close(2) are atomic. And as I mentioned, they refer to the same single kernel open file structure. But, the streamIO functions are library functions and are not guaranteed to be atomic.

[deleted]

...

Another way is to write your own function using the vsprintf(3) function. I know that using a fixed size buffer here is unsafe, but I'm using it for the example. In the function, below, by using write(2) you are bypassing the stream's file structure (and its own location pointer).

I tried out your suggestion to bypass C stdio. I purposely opened a file using O_WRONLY|O_CREAT|O_TRUNC only (no O_APPEND or O_DIRECT), then did the write without locking. As a result, I can still get the race condition where both processes write from the beginning of the file. It looks like one needs either O_APPEND or application-level handling to guarantee that the race condition won't happen.

-- Regards, Verdi
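A minimal sketch of the O_APPEND route, assuming a local POSIX file system (`append_from_two_processes` is a hypothetical name). With O_APPEND the kernel moves the offset to the end of the file and writes as one step, so parent and child should never overwrite each other and the file should end up with 2*n lines. The Linux open(2) manual page warns that O_APPEND may lead to corrupted files on NFS when more than one process appends at once.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child each append n lines through a shared O_APPEND descriptor.
   Returns the number of lines found in the file afterwards, or -1 on error. */
int append_from_two_processes(const char *path, int n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    char buf[64];
    for (int i = 0; i < n; i++) {
        int len = snprintf(buf, sizeof buf, "%ld: %d\n", (long)getpid(), i);
        /* O_APPEND: "seek to end" and the write happen as one step */
        if (write(fd, buf, (size_t)len) != len)
            break;
    }
    close(fd);
    if (pid == 0)
        _exit(0);
    waitpid(pid, NULL, 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int lines = 0, c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```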

Jerry Feldman

17:01


On Tuesday 04 April 2006 11:49 am, Verdi March wrote:

...

Hi,

On Saturday 01 April 2006 21:56, Jerry Feldman wrote:

...
The location pointer should be atomic. I did not run a recent stress test on the 2.6 kernel. One of the problems is that you used streamIO, where the buffering is both in user space and underneath in kernel space. The system calls, open(2), dup(2), write(2), close(2) are atomic. And as I mentioned, they refer to the same single kernel open file structure. But, the streamIO functions are library functions and are not guaranteed to be atomic.

[deleted]

...
Another way is to write your own function using the vsprintf(3) function. I know that using a fixed size buffer here is unsafe, but I'm using it for the example. In the function, below, by using write(2) you are bypassing the stream's file structure (and its own location pointer).

I tried out your suggestion to bypass C stdio. I purposely opened a file using O_WRONLY|O_CREAT|O_TRUNC only (no O_APPEND or O_DIRECT), then did the write without locking. As a result, I can still get the race condition where both processes write from the beginning of the file.

You should not get that in any case.

    char buf[SOME_SIZE];
    fd = open(filename, O_WRONLY|O_CREAT|O_TRUNC, 0644);   // create the empty file
    if ((pid = fork()) == 0) {                              // child
        sprintf(buf, "child pid is %d\n", getpid());
        write(fd, buf, strlen(buf));
    } else if (pid > 0) {                                   // parent
        sprintf(buf, "Parent pid is %d\n", getpid());
        write(fd, buf, strlen(buf));
    } else {
        assert...
    }
    // both
    for (i = 0; i < 4; i++) {
        sprintf(buf, "%d: Iteration %d\n", getpid(), i);
        write(fd, buf, strlen(buf));
    }
    close(fd);
    exit(0);

The file descriptor points to the same file structure, so the location pointers should be correct. The race condition is simply about who writes first. If you want, you can either post your code or send me the code at gaf@hp.com

...

It looks like one needs either O_APPEND or application-level handling to guarantee that the race condition won't happen.


Verdi March

5 Apr

14:39


Hi Jerry, On Wednesday 05 April 2006 01:01, Jerry Feldman wrote:

...

You should not get that in any case.

[deleted]

...

The file descriptor points to the same file structure, so the location pointers should be correct. The race condition is simply about who writes first. If you want, you can either post your code or send me the code at gaf@hp.com

I attach the C program, some sample outputs I've collected, and two shell scripts for stress-testing (one for detecting child-overwrites-parent, the other for parent-overwrites-child). Both shell scripts expect a.out in the current directory. The parent-overwrites-child happens more frequently than child-overwrites-parent. In fact, on an Opteron machine (Fedora), only parent-overwrites-child occurs, but not child-overwrites-parent (though I've waited long enough). -- Regards, Verdi

Jerry Feldman

6 Apr

15:46


Thanks. I will take a close look at your code and try it on SuSE 10 (single 32-bit) and one of our HP Integrity servers (either RHEL 4 or SLES9) in the lab. On Wednesday 05 April 2006 10:39 am, Verdi March wrote:

...

The parent-overwrites-child happens more frequently than child-overwrites-parent. In fact, on an Opteron machine (Fedora), only parent-overwrites-child occurs, but not child-overwrites-parent (though I've waited long enough).


Jerry Feldman

7 Apr

23:58


I found a document from a reliable source that states that write(2) is NOT atomic: http://marc.theaimsgroup.com/?l=linux-kernel&m=107375454908544

"There are file descriptors that have atomicity guarantees (pipes), but regular files do not." Linus Torvalds.

Also note that I used flock(2) as a quick check, but I had forgotten that flock(2) does not lock on NFS. It works quite well on local disks.
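A minimal flock(2) sketch, assuming a local file system per Jerry's NFS caveat (`flock_append` is a hypothetical name). One detail matters for the fork() case in this thread: the flock(2) manual page says these locks belong to the open file description, so a parent and child using the same inherited descriptor share one lock and do not exclude each other. The sketch therefore opens the file separately in every call.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Append one line under an exclusive flock(2). The file is opened here on
   purpose: a descriptor inherited across fork() shares its flock() lock
   with the parent, so it would not keep the parent out. */
int flock_append(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (flock(fd, LOCK_EX) == 0) {                  /* blocks until we own the lock */
        size_t len = strlen(line);
        if (lseek(fd, 0, SEEK_END) >= 0 &&          /* find the end while locked */
            write(fd, line, len) == (ssize_t)len)
            rc = 0;
        flock(fd, LOCK_UN);
    }
    close(fd);
    return rc;
}
```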

Verdi March

8 Apr

16:51


Hi, On Saturday 08 April 2006 07:58, Jerry Feldman wrote:

...

I found a document from a reliable source that states that write(2) is NOT atomic. http://marc.theaimsgroup.com/?l=linux-kernel&m=107375454908544 "There are file descriptors that have atomicity guarantees (pipes), but regular files do not". Linus Torvalds.

Thanks, this should clarify everything. -- Regards, Verdi

Jerry Feldman

19:52


On Sun, 9 Apr 2006 00:51:37 +0800 Verdi March wrote:

...

Hi,

On Saturday 08 April 2006 07:58, Jerry Feldman wrote:

...
I found a document from a reliable source that states that write(2) is NOT atomic. http://marc.theaimsgroup.com/?l=linux-kernel&m=107375454908544 "There are file descriptors that have atomicity guarantees (pipes), but regular files do not". Linus Torvalds.

Thanks, this should clarify everything.

I suggest that you use the lockf(3) or fcntl(2) functions to lock and unlock regions of a file before you write. flock(2) will work fine on local drives but not on NFS file systems.
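A minimal sketch of the fcntl(2) record locking Jerry suggests, assuming POSIX (`lock_whole_file` and `fcntl_append` are hypothetical names). Unlike flock() locks, these belong to the process and are not inherited across fork(), so a parent and child using the same descriptor still exclude each other. Two cautions from the fcntl(2) manual page: closing any descriptor for the file drops all of the process's locks on it, and on Linux lockf(3) is a layer over the same locks.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Set or clear a POSIX record lock on the whole file. l_len = 0 means
   "through end of file, however large it grows"; F_SETLKW waits for it. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;            /* F_WRLCK to lock, F_UNLCK to unlock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Append one record while holding a write lock, then release it.
   fd must be open for writing. */
int fcntl_append(int fd, const char *rec)
{
    if (lock_whole_file(fd, F_WRLCK) < 0)
        return -1;
    size_t len = strlen(rec);
    int rc = (lseek(fd, 0, SEEK_END) >= 0 &&
              write(fd, rec, len) == (ssize_t)len) ? 0 : -1;
    lock_whole_file(fd, F_UNLCK);
    return rc;
}
```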

Jerry Feldman

29 Mar

16:33


On Wednesday 29 March 2006 2:08 am, Verdi March wrote:

...

From my understanding, the second (child) process inherits its parent's file descriptor f. Therefore, both f in parent and child point to the same open-file structure (according to a UNIX book). I thought that the kernel reclaims an open-file structure only when it's not pointed to anymore by any descriptor?

The way it works: open(2) opens a file and returns a file descriptor. The use count on that file is incremented. fork(2) replicates the file descriptors such that the parent and child now have their own, and the use count is incremented. At some point, the unlink(2) system call is called. This removes the file name from the specified directory and decrements the use count. Assuming the file only had a single hard link, the use count is now 2. The parent closes the file, and the use count is decremented. The child can still write. The child closes the file, the use count becomes zero, and the OS physically deletes the file. Note that the system calls are atomic.

-- Jerry Feldman Linux Expertise Center (PTAC-MA/TX) Hewlett-Packard Co. 200 Forest Street MRO1-3/K12 Marlborough, MA 01752-3081 508-467-4315 (http://www.testdrive.hp.com)

Anders Johansson

16:38


On Wed, 2006-03-29 at 11:33 -0500, Jerry Feldman wrote:

...

The way it works: open(2) opens a file and returns a file descriptor. The use count on that file is incremented. fork(2) replicates the file descriptors such that the parent and child now have their own, and the use count is incremented. At some point, the unlink(2) system call is called.

No, the syscall is close(), not unlink().

What confused me initially was that I fooled myself into thinking parent and child shared a single fd, which got closed. Obviously this is not the case.

Jerry Feldman

16:52


On Wednesday 29 March 2006 11:38 am, Anders Johansson wrote:

...

On Wed, 2006-03-29 at 11:33 -0500, Jerry Feldman wrote:

...
The way it works: open(2) opens a file and returns a file descriptor. The use count on that file is incremented. fork(2) replicates the file descriptors such that the parent and child now have their own, and the use count is incremented. At some point, the unlink(2) system call is called.

No, the syscall is close(), not unlink()

What confused me initially was that I fooled myself into thinking parent and child shared a single fd, which got closed. Obviously this is not the case.

I added the unlink(2) to the mix to show that you can also write to (or read from) a deleted file.
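Jerry's point about deleted files, as a minimal sketch (assuming POSIX; `write_read_after_unlink` is a hypothetical name): once the file's only name is removed, the open descriptor still works for writing and reading, and the storage is released only when the last descriptor is closed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create a file, remove its only name, then keep using the descriptor.
   Returns the number of bytes read back, or -1 on error. */
int write_read_after_unlink(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    unlink(path);                      /* the name is gone; the open file is not */

    const char msg[] = "still here\n";
    char back[sizeof msg];
    int rc = -1;
    if (write(fd, msg, sizeof msg - 1) == (ssize_t)(sizeof msg - 1) &&
        lseek(fd, 0, SEEK_SET) == 0)
        rc = (int)read(fd, back, sizeof back);
    close(fd);                         /* last reference: the storage is freed now */
    return rc;
}
```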


22 comments

6 participants

participants (6)

Anders Johansson
Jerry Feldman
Jerry Feldman
Manfred Hollstein
Steve Graegert
Verdi March