Static vs Dynamic Linking

One morning at Datadog, a client cat from a small startup called Panda the cat bear asked for help.
Their Datadog service had stopped logging data after they updated their servers.

Nini was calm as always: “This sounds like a linking problem. Let’s investigate.”

Niko: “A LinkedIn problem?”

“No, Niko. Linking is how we combine your program with libraries.
Sometimes the library code is copied into the program itself;
other times, the program looks for the library when it starts running.”

The client added:
> “We just compiled our tool lz4cat, but it fails to run. It says it cannot find liblz4.so!”

Your task today is to play the role of the engineer debugging the client’s problem.
You will build lz4cat on Alpine Linux in three different ways: static, dynamic, and dynamic with RPATH. lz4cat uses the LZ4 library to decompress its input and writes the decompressed content to stdout.

By the end of this lab, you should be able to answer:
- What happens at build time vs run time?
- Why do some binaries run without extra settings, while others need LD_LIBRARY_PATH?
- How do static and dynamic linking affect size and memory use?


You do not need to edit any code in this lab. We provide a Makefile, which also creates a sample compressed file for testing decompression. Just run the steps and answer the quiz questions.

1. Hands-On

  1. Run ./setup.sh to install LZ4 into the directory /os.
  2. Build a static lz4cat and run it.
  3. Build a dynamic lz4cat, watch it fail at run time, then fix it with LD_LIBRARY_PATH.
  4. Build a dynamic+RPATH lz4cat that runs without environment variables.
  5. Compare binaries and inspect metadata (ldd, readelf -d).

Setup

Download the LZ4 compression library, build the library, and install it to /os.

./setup.sh

Search for the following variables in the Makefile:

  • LIBRARY_PATH: Build-time search path that the linker uses to resolve -lfoo to libfoo.a or libfoo.so. No effect at run time.
  • LD_LIBRARY_PATH: Run-time search path that the dynamic loader uses to locate the needed .so files when the program starts. Ignored by static binaries.
  • RPATH (-Wl,-rpath,<dir>): A path baked into the ELF at build time that the loader searches at run time (no environment variables needed).

Create a sample compressed file:

make sample

Static

make build_static, then make run_static

Verify it doesn’t depend on any other library: ldd ./b/lz4cat_static

Dynamic

make build_dyn, then make run_dyn1 and make run_dyn2.

The dynamic binary builds successfully, but it fails to start until the dynamic loader can find liblz4.so. You fix this with LD_LIBRARY_PATH.

RPATH

make build_dyn_rpath, then make run_rpath

The RPATH hardcodes /os/lib into the ELF file, so the binary runs without any environment variable.

  • make inspect (see sizes, ldd, readelf -d)

Concept Check

Q1. Which environment variable does the linker read at build time to find a library passed as -lXXX?

Q2. Which environment variable is read at run time by the dynamic loader to find .so files?

Q3. Which one hardcodes the library path into the binary, so that we do not depend on any environment variable at run time?

Q4. Your dynamic binary failed before you exported LD_LIBRARY_PATH. What failed?

Q5. Assume a library libX whose code is 3 MiB. Ignore data and ASLR effects for this exercise. There are two different executables, A and B: 1) each program’s own code is 1 MiB; 2) each runs as 1 process on the same machine; 3) both use libX.

(a) Static linking — what is the total disk footprint?

(b) Dynamic linking — what is the total disk footprint?

Q6. Same scenario, but now 50 processes of A and 50 processes of B run concurrently. Which linking mode uses less RAM for libX’s code pages (text)?

Q7. If two different statically linked executables contain byte-identical copies of libX’s code, the kernel will naturally share the same memory pages across the two executables.

2. Shared Libraries

Visit these Alpine package pages and examine the “Required by” list on each.

Q8. Which library is the most widely required of the four?

Q9. Which of the following correctly describe the purpose of ncurses?

Q10. “Required by” on those pages reports…

Q11. What is the best reason why libgcc shows up as a dependency so often?


3. Readings

In the lecture, we discussed how shared libraries can amortize memory consumption, because multiple processes can map the same .so and share its text segment. However, what if the library is niche and few processes need it? These articles offer a different perspective.

  1. One-binary deployment perspective (cross-compiling for different architectures, and easily copying a single binary between machines): link 1
  2. Size perspective (when, and when not, dynamic linking actually wins on binary size): link 2
  3. Security perspective: It is commonly believed that dynamic linking makes security patching easier, because you update a shared library once and every application using it is protected. Read this article to learn about the security risk that LD_PRELOAD makes possible: link 3 (Other fun things you can do)

Q12. According to article 2, if a library is used by only one program on the entire machine, it is more cost-effective to use:

Q13. A C program compiled on Ubuntu 24.04 and linked only to libc will run on Ubuntu 18.04, because both Ubuntu 24.04 and 18.04 ship with libc.

Q14. You’re the developer of a single command-line tool, fzf. On your GitHub repo, what’s the safest way to ship your program so it can run on multiple architectures?

Q15. According to article 3, what are some possible things you can do with LD_PRELOAD?