COMS 4995 Advanced Systems Programming


File I/O system calls


// Need to specify mode if file is being created...
int open(const char *path, int oflag, mode_t mode);
// ...otherwise, mode argument is omitted.
int open(const char *path, int oflag);

Creates an entry in the process’s file descriptor table and returns a file descriptor (the index of the entry in the table). The entry stores the current offset into the file, the open options, and other metadata.

Example (taken from man open in Linux):

#include <fcntl.h>

int fd;
mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char *pathname = "/tmp/file";

fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode);

Another example – creating a lock file for synchronization between processes (e.g. a multi-process web server):

fd = open("/var/run/", O_WRONLY | O_CREAT | O_EXCL, 0644);

Note the use of octal notation – 0644 (rw-r--r--) corresponds to S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH.
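A minimal sketch of the lock-file idea, assuming the lock is represented purely by the file's existence. The helper names try_lock_file() and unlock_file() and the caller-supplied path are hypothetical, not part of any standard API:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take a lock by creating `path` exclusively (hypothetical helper).
 * Returns  1 if this caller created the file (lock acquired),
 *          0 if the file already existed (someone else holds the lock),
 *         -1 on any other error (errno is set by open()).
 */
int try_lock_file(const char *path)
{
    assert(path != NULL);

    /* O_CREAT | O_EXCL makes "check that it is absent, then create" one atomic step. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;

    close(fd);  /* the file's existence is the lock; the descriptor is not needed */
    return 1;
}

/* Release the lock by removing the file. Returns 0 on success, -1 on error. */
int unlock_file(const char *path)
{
    assert(path != NULL);
    return unlink(path);
}
```

Without O_EXCL, two processes could both "create" the file and both believe they hold the lock; with it, exactly one open() succeeds and the others fail with EEXIST.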


Redundant: creat(path, mode) is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode)

(And it should have been called create, according to Ken Thompson.)


int close(int fildes);

Deletes the file descriptor table entry at index fildes.


off_t lseek(int fildes, off_t offset, int whence);

Note that lseek() doesn’t actually perform any file I/O – it only updates the current offset stored in the file table entry.
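One way to see this, as a sketch: seek far past the end of a freshly truncated file and check that the file size is still zero. The function names and the scratch path passed by the caller are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current file offset without moving it, or -1 on error. */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/*
 * Create an empty file at `path`, seek to offset 4096, and return the file
 * size as reported by fstat(). Returns -1 on error. The file is removed
 * before returning.
 */
off_t size_after_seek_past_end(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    off_t size = -1;
    struct stat st;

    /* Only the offset in the file table entry changes here; nothing is written. */
    if (lseek(fd, 4096, SEEK_SET) == 4096 &&
        current_offset(fd) == 4096 &&
        fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);
    unlink(path);
    return size;
}
```

The offset moves to 4096, but the size stays 0 until something is actually written at that position.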


ssize_t read(int fildes, void *buf, size_t nbyte);

Returns the number of bytes read, 0 at end of file, or -1 on error. The number of bytes read may be less than the requested nbyte, so always check the return value.
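A common way to cope with short reads is to loop until the requested count arrives or end of file is reached. The helper below is a sketch; read_full() is a hypothetical name, and retrying on EINTR is one reasonable policy rather than the only one:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read up to `nbyte` bytes into `buf`, retrying after short reads.
 * Returns the number of bytes read (less than nbyte only if end of file
 * was reached first), or -1 on error.
 */
ssize_t read_full(int fd, void *buf, size_t nbyte)
{
    assert(buf != NULL);

    char *p = buf;
    size_t total = 0;

    while (total < nbyte) {
        ssize_t n = read(fd, p + total, nbyte - total);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;     /* short read: keep going */
    }
    return (ssize_t)total;
}
```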

read() may block forever on a “slow” read from pipes, FIFOs (aka named pipes), sockets, or a terminal (keyboard).

For sockets, read(socket, buf, nbyte) is equivalent to recv(socket, buf, nbyte, 0):

ssize_t recv(int socket, void *buffer, size_t length, int flags);


ssize_t write(int fildes, const void *buf, size_t nbyte);

Returns the number of bytes written, or -1 on error. The number of bytes written may be less than the requested nbyte (e.g. when the disk fills up), so check the return value.
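The usual response to a short write is to retry with the remaining bytes. A sketch, with write_all() as a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Write exactly `nbyte` bytes from `buf`, retrying after short writes.
 * Returns nbyte on success, or -1 on error (some bytes may already have
 * been written in that case).
 */
ssize_t write_all(int fd, const void *buf, size_t nbyte)
{
    assert(buf != NULL);

    const char *p = buf;
    size_t total = 0;

    while (total < nbyte) {
        ssize_t n = write(fd, p + total, nbyte - total);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;     /* short write: send the rest */
    }
    return (ssize_t)total;
}
```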

write() may block forever on a “slow” write into pipes, FIFOs, or sockets (e.g. see discussion on blocking in man 7 pipe).

For sockets, write(socket, buf, nbyte) is equivalent to send(socket, buf, nbyte, 0):

ssize_t send(int socket, const void *buffer, size_t length, int flags);
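Because the plain file I/O calls and the socket calls with flags set to 0 are interchangeable, they can be mixed on the same connection. The sketch below uses a local socketpair() as a stand-in for a real connection; write_then_recv() is a hypothetical name:

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `len` bytes with write() on one end of a socket pair and receive
 * them with recv(..., 0) on the other end.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t write_then_recv(const void *msg, size_t len, void *out, size_t outlen)
{
    assert(msg != NULL && out != NULL);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t n = -1;
    if (write(sv[0], msg, len) == (ssize_t)len)   /* same effect as send(sv[0], msg, len, 0) */
        n = recv(sv[1], out, outlen, 0);          /* same effect as read(sv[1], out, outlen) */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```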

Atomicity (from APUE 3.12): given an operation with multiple steps, either all steps are performed (on success) or none are performed (on failure). It is not possible to observe only a subset of the steps performed.

If the file was opened with the O_APPEND flag, the file offset is set to the end of the file prior to each write, and the repositioning and the write happen as a single atomic operation.
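The effect of O_APPEND can be observed by rewinding before a write: with the flag the data still lands at the end of the file, without it the data overwrites the beginning. A sketch, where the function name and the scratch path are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write "AAAA", rewind to offset 0, write "BB", and return the final file
 * size. `extra_flags` is OR'ed into the open() flags (pass O_APPEND or 0).
 * Returns -1 on error. The file is removed before returning.
 */
off_t size_after_rewind_and_write(const char *path, int extra_flags)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    if (fd == -1)
        return -1;

    off_t size = -1;
    struct stat st;

    if (write(fd, "AAAA", 4) == 4 &&
        lseek(fd, 0, SEEK_SET) == 0 &&   /* ignored by the next write if O_APPEND is set */
        write(fd, "BB", 2) == 2 &&
        fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);
    unlink(path);
    return size;
}
```

With O_APPEND the result is 6 (the second write was appended); with no extra flag it is 4 (the second write overwrote the first two bytes).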

C Standard I/O Library

Wrapping the File I/O API

FILE *fopen(const char *pathname, const char *mode); // open()
int fclose(FILE *stream);  // close()
int fseek(FILE *stream, long offset, int whence);  // lseek()
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);  // read()
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);  // write()

Note that FILE *stream replaces int fd – standard I/O is based on streams.

FILE * is an “opaque object” – not meant to be inspected, just passed into API.


Goal: reduce the number of read()/write() syscalls made while performing stream operations, by buffering data in user space.

Trace syscall invocations via strace to see how many read()/write() calls a program actually makes.
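The buffering can also be observed from inside a program. The sketch below writes ten single bytes to a fully buffered stream and checks the on-disk size before and after fflush(). The names buffering_demo() and size_on_disk() are hypothetical, and the explicit setvbuf() call is there only to make the buffering mode unambiguous:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of the file at `path` in bytes, or -1 on error. */
static long size_on_disk(const char *path)
{
    struct stat st;
    return (stat(path, &st) == -1) ? -1L : (long)st.st_size;
}

/*
 * Write 10 bytes one at a time through a fully buffered stream and record
 * the file size before and after fflush(). Returns 0 on success, -1 on
 * error. The file is removed before returning.
 */
int buffering_demo(const char *path, long *before_flush, long *after_flush)
{
    assert(path != NULL && before_flush != NULL && after_flush != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Full buffering with a library-allocated buffer; must precede any I/O. */
    int ok = (setvbuf(fp, NULL, _IOFBF, BUFSIZ) == 0);

    for (int i = 0; ok && i < 10; i++)
        ok = (fputc('x', fp) != EOF);   /* goes into the user-space buffer */

    if (ok) {
        *before_flush = size_on_disk(path);
        ok = (fflush(fp) == 0);         /* the buffered bytes reach the kernel here */
    }
    if (ok)
        *after_flush = size_on_disk(path);

    fclose(fp);
    unlink(path);
    return ok ? 0 : -1;
}
```

The size is 0 before the flush and 10 after it. Running a program that calls this under `strace -e trace=write` should show the ten bytes going out in a single write() rather than ten separate ones.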

File sharing

  1. Kernel data structures for open files

    Figure 3.7, APUE

    Three key data structures:

    • Per-process file descriptor table
    • File table entry: one instance of an opened file (possibly shared, see later examples)
    • Inode (kernel representation of file)
  2. Two independent processes with the same file open

    Figure 3.8, APUE

    (e.g. fork-then-open.c)

    Independent file table entries, but they point to the same inode

  3. Kernel data structures after dup(1)

    Figure 3.9, APUE

    New file descriptor table slot points to the same file table entry.

  4. Sharing of open files between parent and child after fork

    Figure 8.2, APUE

    (e.g. open-then-fork.c)

    Similar to dup() – the child’s file descriptor table is constructed as if each descriptor were dup()’d, so its entries reference the parent’s file table entries.
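The three sharing cases above can be told apart by watching the file offset, since the offset lives in the file table entry. In the sketch below, every function reads 3 bytes through one descriptor and then reports the offset seen through another. All names are hypothetical, and case 2 opens the file twice within a single process, which likewise yields two independent file table entries:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create `path` containing "abcdef" and return a read-only descriptor, or -1. */
static int open_sample(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, "abcdef", 6);
    close(fd);
    return (n == 6) ? open(path, O_RDONLY) : -1;
}

/* Shared tail: read 3 bytes via fd1, report the offset of fd2, clean up. */
static off_t read_then_peek(int fd1, int fd2, const char *path)
{
    char buf[3];
    off_t off = -1;
    if (fd1 != -1 && fd2 != -1 && read(fd1, buf, 3) == 3)
        off = lseek(fd2, 0, SEEK_CUR);
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    unlink(path);
    return off;
}

/* Case 2: two separate open() calls, so two file table entries. */
off_t offset_via_second_open(const char *path)
{
    assert(path != NULL);
    int fd1 = open_sample(path);
    int fd2 = open(path, O_RDONLY);
    return read_then_peek(fd1, fd2, path);
}

/* Case 3: dup() makes a second descriptor for the same file table entry. */
off_t offset_via_dup(const char *path)
{
    assert(path != NULL);
    int fd1 = open_sample(path);
    int fd2 = (fd1 != -1) ? dup(fd1) : -1;
    return read_then_peek(fd1, fd2, path);
}

/* Case 4: the child reads after fork(); the parent then checks its own offset. */
off_t offset_after_child_read(const char *path)
{
    assert(path != NULL);
    char buf[3];
    off_t off = -1;
    int status;
    int fd = open_sample(path);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
        _exit(read(fd, buf, 3) == 3 ? 0 : 1);   /* child: advance the shared offset */

    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        off = lseek(fd, 0, SEEK_CUR);

    close(fd);
    unlink(path);
    return off;
}
```

The second open() still sees offset 0, while the dup()'d descriptor and the parent after fork() both see offset 3.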

Last updated: 2024-02-04