How does cat work?

Nini is always fascinated by how cat🐱 works, not only because he is himself a cat, but also because it is beautiful. See cat.c in coreutils.

cat’s job is simple: read from a file and write its contents to standard output.

However, the actual implementation in coreutils is optimized to handle different use cases efficiently. Depending on which options you use and what kinds of files you are reading or writing, cat dynamically chooses between three main copying strategies:

  • copy_cat(): a kernel-assisted, zero-copy strategy for speed.
  • simple_cat(): a straightforward read/write loop for ordinary copies.
  • cat(): interprets options like -n, -v, etc.

simple_cat()

This function performs the most direct kind of copying:

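while (true) {
  ssize_t n_read = read(input_desc, buf, bufsize);
  if (n_read < 0) return false;   // Read error
  if (n_read == 0) break;         // End of file
  // A single write() may be short; production code retries until all bytes are out.
  if (write(STDOUT_FILENO, buf, n_read) != n_read) return false;  // Write error
}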

It reads a chunk of bytes from the input file descriptor and writes that chunk to standard output, repeating until the input ends.
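To make the loop concrete, here is a minimal, self-contained sketch of the same read/write pattern. It is not the coreutils source: the names copy_loop and write_all are made up for this example, and the short-write and EINTR handling is just one reasonable way to harden the loop, assuming POSIX read() and write().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, retrying on short writes and EINTR.
   Returns true on success, false on error (errno is left set by write). */
bool write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return false;
        }
        buf += w;
        n -= (size_t)w;
    }
    return true;
}

/* Copy everything from in_fd to out_fd through a caller-supplied buffer.
   Hypothetical helper for illustration; returns false on any I/O error. */
bool copy_loop(int in_fd, int out_fd, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    for (;;) {
        ssize_t n_read = read(in_fd, buf, bufsize);
        if (n_read < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return false;
        }
        if (n_read == 0)
            return true;            /* end of input */
        if (!write_all(out_fd, buf, (size_t)n_read))
            return false;
    }
}
```

Calling copy_loop(fd, STDOUT_FILENO, buf, sizeof buf) on an open file behaves like a bare-bones cat with no options.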

cat()

When you use options like -n, cat switches to this more complex version.

It first reads data into a buffer, then scans it character by character, keeping track of newlines and line counts. For -n, it also writes the line number to stdout.

cat() is expected to be a lot slower than the other two, because examining every byte of the buffer in user space costs far more than passing whole chunks along untouched. (Maybe you can verify this using a large file? Remember to clear the page cache before each run.)
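The scan described in this section can be sketched roughly as follows. This is a simplified illustration, not the real cat(): struct numberer and number_lines are hypothetical names, only -n is handled, and the output goes into a caller-provided buffer instead of stdout. The "%6lu\t" prefix mirrors what cat -n prints (a right-aligned number followed by a tab).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* State carried across buffers, so a line may span two reads. */
struct numberer {
    unsigned long next_line;  /* number to print for the next line */
    bool at_line_start;       /* true if the next byte begins a new line */
};

/* Copy n bytes from in to out, prefixing each line with its number.
   Returns the number of bytes placed in out (not NUL-terminated), or
   (size_t)-1 if out is too small. That error convention is an
   assumption of this sketch. */
size_t number_lines(struct numberer *st, const char *in, size_t n,
                    char *out, size_t outcap)
{
    assert(st != NULL && in != NULL && out != NULL);
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (st->at_line_start) {
            /* Emit the prefix before the first byte of each line. */
            int len = snprintf(out + used, outcap - used,
                               "%6lu\t", st->next_line);
            if (len < 0 || (size_t)len >= outcap - used)
                return (size_t)-1;
            used += (size_t)len;
            st->next_line++;
            st->at_line_start = false;
        }
        if (used >= outcap)
            return (size_t)-1;
        out[used++] = in[i];
        if (in[i] == '\n')
            st->at_line_start = true;   /* next byte starts a new line */
    }
    return used;
}
```

Every input byte passes through the inner branch checks, which is exactly the per-byte work that the plain copy loop avoids.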

copy_cat()

This is the most optimized method available on modern Linux systems. It uses the copy_file_range system call:

copy_file_range(input_desc, NULL, STDOUT_FILENO, NULL, copy_max, 0);

This allows the kernel to move data directly from the input file to the output file without copying it into user space.

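It can only be used when both input and output are regular files, typically on the same filesystem (the exact rule depends on the kernel version). When no formatting options are given and both ends are regular files, cat tries copy_file_range first. If it fails, cat falls back to simple_cat() automatically.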
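Here is a hedged sketch of the try-then-fall-back idea, assuming Linux with glibc 2.27 or newer (where the copy_file_range wrapper is declared under _GNU_SOURCE). The function name try_copy_file_range, the 1 GiB chunk size, and the exact list of errno values treated as "not supported here" are choices made for this example, not a statement of what coreutils does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to copy all of in_fd to out_fd inside the kernel.
   Returns  1 if some data was copied and end of input was reached,
            0 if nothing usable happened and the caller should fall
              back to a read/write loop,
           -1 on a hard error (errno is set). */
int try_copy_file_range(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    const size_t chunk = (size_t)1 << 30;   /* ask for up to 1 GiB per call */
    int copied_any = 0;

    for (;;) {
        /* NULL offsets mean "use and advance each descriptor's file position". */
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, chunk, 0);
        if (n == 0)
            return copied_any;              /* end of input */
        if (n < 0) {
            /* Errors that suggest the call simply does not apply here. */
            if (errno == ENOSYS || errno == EINVAL || errno == EXDEV
                || errno == EBADF || errno == ENOTSUP
                || errno == EOPNOTSUPP || errno == EPERM
                || errno == ETXTBSY)
                return 0;
            return -1;
        }
        copied_any = 1;
    }
}
```

A caller would run the ordinary read/write loop whenever this returns 0. Note that an empty input also yields 0, which is harmless: the fallback loop then sees end of file immediately.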

Tip

cat always writes to standard output, but the shell can redirect standard output to a regular file!

Choosing one

The main function of cat follows this logic:

if (any formatting options enabled)
    use cat()
else if (copy_file_range works)
    use copy_cat()
else
    use simple_cat()
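The same decision can be written as a small C function. This is only a sketch: enum strategy, is_regular_fd, and choose_strategy are hypothetical names, and "copy_file_range works" is approximated here by the regular-file precondition checked with fstat(). A real program would still have to fall back to the read/write loop if the system call fails at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

enum strategy { USE_CAT, USE_COPY_CAT, USE_SIMPLE_CAT };

/* True if fd refers to a regular file (not a pipe, terminal, or device). */
bool is_regular_fd(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISREG(st.st_mode);
}

/* Pick a copying strategy from the options and the two descriptors. */
enum strategy choose_strategy(bool formatting, int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    if (formatting)
        return USE_CAT;             /* -n, -v, ... need a per-byte scan */
    if (is_regular_fd(in_fd) && is_regular_fd(out_fd))
        return USE_COPY_CAT;        /* worth trying copy_file_range */
    return USE_SIMPLE_CAT;          /* plain read/write loop */
}
```

Passing STDOUT_FILENO as out_fd shows why shell redirection matters: with `> B` the descriptor is a regular file and USE_COPY_CAT is chosen, while on a terminal or pipe the result is USE_SIMPLE_CAT.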

Try it

Try comparing the code path taken by cat /etc/passwd versus cat -n /etc/passwd in GDB. Set breakpoints and observe which function is used.

You must have cat compiled with debug symbols. The default cat on your system likely doesn’t have them.

I have compiled coreutils with debug symbols for you. You can run the first cell of this Colab notebook. It will download the compiled coreutils into the current directory. In the bottom left of the screen, you can open a terminal and use ls to view the coreutils directory.

ls -al coreutils/src/cat
gdb --args coreutils/src/cat -n /etc/passwd 
# In gdb:
# (gdb) break cat
# (gdb) break simple_cat
# (gdb) break copy_cat
# (gdb) run

Try running with and without formatting options (like -n, -T, etc.) to see how the code path changes.


cat shouting

Now, observe cat’s behavior using strace:

Scenario 1: virtual → stdout

strace -f -e openat,read,write,copy_file_range sh -c "cat /proc/meminfo"

Expected: only read and write.

Scenario 2: virtual → regular file

strace -f -e openat,read,write,copy_file_range sh -c "cat /proc/meminfo > B"

Expected: copy_file_range is attempted, fails, then falls back to read/write.

Scenario 3: regular → regular (same filesystem)

echo "hello" > A
strace -f -e openat,read,write,copy_file_range sh -c "cat A > B"

Expected: successful copy_file_range if supported.
