w5 – Operating System (2025 Fall)

Worksheet 5

Video Lecture

dup() v.s dup2()

In the lecture, we close stdout first and then call dup(). dup() returns the lowest-numbered file descriptor that is unused in the calling process, which is now 1. As a result, both writing to fd2 and calling printf (whose buffered output is eventually written with write(1, "cat 2\n", 6)) go to cats.txt.

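#include <fcntl.h>   // open()
#include <stdio.h>   // printf()
#include <unistd.h>  // close(), dup(), write()

int main() {
    int fd = open("cats.txt", O_RDWR | O_CREAT, 0644); // O_CREAT needs a mode argument
    close(1); // close stdout
    
    int fd2 = dup(fd); // lowest unused descriptor, so fd2 = 1
    write(fd2, "cat 1\n", 6); // fd2 refers to cats.txt, same as fd
    printf("cat 2\n"); // stdout is fd 1, so this also goes to cats.txt
    
    return 0;
}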
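The two messages end up one after the other in cats.txt instead of the second overwriting the first. That is because dup() does not open the file again: the new descriptor refers to the same open file description, and the file offset lives there, not in the per-process descriptor table. A minimal sketch to observe this, assuming a POSIX system; `shared_offset_demo` is a hypothetical helper name:

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write once through fd and once through dup(fd),
// then report the file offset as seen through the ORIGINAL fd.
// Returns -1 on any error.
off_t shared_offset_demo(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    int fd2 = dup(fd);                      // new fd, same open file description
    if (fd2 < 0) { close(fd); return -1; }

    off_t pos = -1;
    if (write(fd, "cat 1\n", 6) == 6 &&     // shared offset: 0 -> 6
        write(fd2, "cat 2\n", 6) == 6)      // continues at 6 -> 12
        pos = lseek(fd, 0, SEEK_CUR);       // fd sees the offset fd2 advanced
    close(fd2);
    close(fd);
    return pos;
}
```

Calling shared_offset_demo("cats.txt") should return 12 and leave "cat 1\ncat 2\n" in the file. If you instead opened the file twice with two separate open() calls, each descriptor would get its own open file description (and its own offset starting at 0), so the second write would overwrite the first.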

Have you wondered how this differs from dup2(fd, 1)? From man 2 dup:

The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor.

It’s fine if you don’t understand this yet; we will cover signals and multithreading later in the semester.
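For comparison, here is a sketch of the same redirection written with dup2(), which closes descriptor 1 and makes it a copy of fd in one atomic step, so nothing can grab descriptor 1 in between. It also keeps a saved copy of the original stdout so it can be restored afterward. The helper names `redirect_stdout_to` and `restore_stdout` are hypothetical, and error handling is minimal.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: point stdout (fd 1) at `path`.
// Returns a saved copy of the old stdout, or -1 on error.
int redirect_stdout_to(const char *path) {
    fflush(stdout);                           // keep earlier buffered output out of the file
    int saved = dup(STDOUT_FILENO);           // remember where fd 1 pointed before
    if (saved < 0) return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { close(saved); return -1; }
    if (dup2(fd, STDOUT_FILENO) < 0) {        // close(1) + dup(fd) as one atomic step
        close(fd); close(saved); return -1;
    }
    close(fd);                                // fd 1 now holds the reference to the file
    return saved;
}

// Hypothetical helper: undo redirect_stdout_to().
void restore_stdout(int saved) {
    fflush(stdout);                           // push buffered printf output into the file first
    dup2(saved, STDOUT_FILENO);               // fd 1 points back to the original target
    close(saved);
}
```

Typical use: `int s = redirect_stdout_to("cats.txt"); printf("cat 2\n"); restore_stdout(s);`. The fflush() calls matter because printf() buffers in user space; without them, output could land on the wrong side of the redirection.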

Readings

Hands-on Lab

Bonus Challenge

The textbook offers us a challenge:

Measuring the cost of a context switch is a little trickier. The lmbench benchmark does so by running two processes on a single CPU, and setting up two UNIX pipes between them; a pipe is just one of many ways processes in a UNIX system can communicate with one another. The first process then issues a write to the first pipe, and waits for a read on the second; upon seeing the first process waiting for something to read from the second pipe, the OS puts the first process in the blocked state, and switches to the other process, which reads from the first pipe and then writes to the second. When the second process tries to read from the first pipe again, it blocks, and thus the back-and-forth cycle of communication continues. By measuring the cost of communicating like this repeatedly, lmbench can make a good estimate of the cost of a context switch. You can try to re-create something similar here, using pipes, or perhaps some other communication mechanism such as UNIX sockets.
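A minimal sketch of the two-pipe ping-pong described above, assuming Linux/POSIX. The parent writes one byte into pipe `a` and blocks reading pipe `b`; the child echoes the byte back. When both processes share one CPU, each round trip forces two context switches, so the estimate is the round-trip time divided by two. The function name and the one-byte message are our own choices, not lmbench's. The timing also includes the cost of the read()/write() system calls themselves, so treat the result as an upper bound rather than a pure switch cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: bounce one byte between parent and child `iters`
// times over two pipes. Returns average ns per switch (upper bound,
// assuming both processes run on one CPU), or -1.0 on error.
double pingpong_ns_per_switch(int iters) {
    int a[2], b[2];                        // a: parent -> child, b: child -> parent
    if (iters <= 0 || pipe(a) < 0) return -1.0;
    if (pipe(b) < 0) { close(a[0]); close(a[1]); return -1.0; }

    pid_t pid = fork();
    if (pid < 0) {
        close(a[0]); close(a[1]); close(b[0]); close(b[1]);
        return -1.0;
    }
    if (pid == 0) {                        // child: echo every byte back
        close(a[1]); close(b[0]);          // drop the ends the child never uses
        char c;
        while (read(a[0], &c, 1) == 1)     // blocks until the parent writes
            if (write(b[1], &c, 1) != 1) break;
        _exit(0);                          // read() returns 0 once the parent closes a[1]
    }

    close(a[0]); close(b[1]);              // parent never uses these ends
    char c = 'x';
    int ok = 1;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters && ok; i++)  // read() blocks until the child answers
        ok = write(a[1], &c, 1) == 1 && read(b[0], &c, 1) == 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(a[1]);                           // child sees EOF and exits
    close(b[0]);
    waitpid(pid, NULL, 0);
    if (!ok) return -1.0;

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (2.0 * iters);             // one round trip = two switches
}
```

On a multi-core machine, pin the parent to a single CPU before calling this; the child created by fork() inherits the parent's CPU affinity mask. Without pinning, the two processes may run in parallel on different CPUs, and you mostly measure cross-CPU wakeup latency instead.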

One difficulty in measuring context-switch cost arises in systems with more than one CPU; what you need to do on such a system is ensure that your context-switching processes are located on the same processor. Fortunately, most operating systems have calls to bind a process to a particular processor; on Linux, for example, the sched_setaffinity() call is what you’re looking for. By ensuring both processes are on the same processor, you are making sure to measure the cost of the OS stopping one process and restoring another on the same CPU.
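On Linux, pinning the calling process to one CPU looks roughly like the sketch below; `pin_to_cpu` and `current_cpu` are hypothetical helper names. sched_setaffinity(), sched_getcpu(), and the CPU_ZERO/CPU_SET macros are Linux/glibc-specific and require _GNU_SOURCE to be defined before any header is included. Calling it before fork() is enough for a two-process benchmark, because a forked child inherits its parent's affinity mask.

```c
#define _GNU_SOURCE
#include <sched.h>

// Hypothetical helper: restrict the calling process to CPU `cpu`.
// Returns 0 on success, -1 on error (errno set by sched_setaffinity).
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);                                  // start with an empty mask
    CPU_SET(cpu, &set);                              // allow only `cpu`
    return sched_setaffinity(0, sizeof(set), &set);  // pid 0 = calling process
}

// Hypothetical helper: the CPU this process is running on right now.
int current_cpu(void) {
    return sched_getcpu();
}
```

Picking the CPU you are already on (for example `pin_to_cpu(current_cpu())`) avoids failing with EINVAL inside containers or VMs where CPU 0 is not in the allowed set.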

Try running lmbench in a VM with only one CPU, or use sched_setaffinity() to keep the processes on a single CPU. What context-switch time do you measure on your system?

Learning Goals

I understand how to redirect I/O using dup2().
I understand how the shell uses pipes (|) to connect commands (e.g., ls | wc) and how a pipeline is actually implemented using the pipe() and dup2() system calls.
I understand why pipeline execution can be faster than sequential execution.
I understand why unused pipe file descriptors should be closed.
I understand named pipes (FIFOs) and how they differ from ordinary pipes.
I understand the difference between a per-process file descriptor table and the system-wide open file and i-node tables.
I understand when a process will receive a SIGPIPE signal.
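To make the pipeline goals concrete, here is a sketch of roughly what a shell does for `ls | wc -l`: create a pipe, fork two children, use dup2() to wire the write end to the first child's stdout and the read end to the second child's stdin, then close every unused end. That last step matters: if any process (including the parent) keeps the write end open, the reader never sees end-of-file and waits forever. The helper name `run_pipeline` is hypothetical, and real shells do much more (job control, process groups, error reporting).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `left | right` (NULL-terminated argv arrays).
// Returns the exit status of `right`, or -1 on error.
int run_pipeline(char *const left[], char *const right[]) {
    int fd[2];                           // fd[0]: read end, fd[1]: write end
    if (pipe(fd) < 0) return -1;

    pid_t p1 = fork();
    if (p1 < 0) return -1;
    if (p1 == 0) {                       // writer: stdout -> pipe
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]); close(fd[1]);      // originals are no longer needed
        execvp(left[0], left);
        _exit(127);                      // only reached if exec failed
    }

    pid_t p2 = fork();
    if (p2 < 0) return -1;
    if (p2 == 0) {                       // reader: pipe -> stdin
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]); close(fd[1]);      // keeping fd[1] open would block EOF
        execvp(right[0], right);
        _exit(127);
    }

    close(fd[0]); close(fd[1]);          // parent must close both ends too
    int status;
    waitpid(p1, NULL, 0);
    if (waitpid(p2, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `char *l[] = {"ls", NULL}; char *r[] = {"wc", "-l", NULL}; run_pipeline(l, r);` prints the line count. Both children run concurrently, which is why a pipeline can beat running the commands one after another. If the reader exits early, the writer's next write() to the pipe, which now has no readers, delivers SIGPIPE.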