fork() / exec()

Why did UNIX choose a two-step interface (`fork` then `exec`) instead of a single API for process creation?
The biggest reason is that, between `fork` and `exec`, the parent can adjust many aspects of the child process's execution environment:

- scheduling priority (`nice`)
- resource limits (`rlimit`)
- open files (`dup2`)
- permissions (`umask`)
- working directory (`chdir`)
- user ID (`setuid`)
- signal handling
A classic example: a management process runs as root. It forks a child and drops the child's privileges from root to the `nobody` user; the child then `exec()`s the target binary with minimal security risk. If process creation were a single API, you would instead need to populate a very large struct of options.
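That root-to-`nobody` pattern can be sketched with Python's `os` module (a minimal illustration; the helper name `spawn_unprivileged` and the `id -un` target are mine, not from the text):

```python
import os
import pwd

def spawn_unprivileged(argv):
    """Fork a child, drop its privileges to 'nobody' when running as root,
    then exec the target binary. Returns the child's exit status."""
    pid = os.fork()
    if pid == 0:                          # child
        try:
            if os.getuid() == 0:          # only root may switch users
                nobody = pwd.getpwnam("nobody")
                os.setgid(nobody.pw_gid)  # drop group first, while still root
                os.setuid(nobody.pw_uid)
            os.execvp(argv[0], argv)      # replaces the child on success
        finally:
            os._exit(127)                 # exec failed: die, don't run parent code
    _, status = os.waitpid(pid, 0)        # parent waits for the child
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    spawn_unprivileged(["id", "-un"])     # prints the user the child ran as
```

Run as an ordinary user it simply forks and execs; run as root it switches the child to `nobody` first.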
When we `exec()` a program, we can also pass inputs into it. There are two ways:

- command-line arguments (via `argv`)
- environment variables (via `envp`)
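The raw `execve` interface makes both channels explicit. A small sketch (the helper name and the choice of `/usr/bin/env`, a program that simply prints its environment, are mine):

```python
import os

def run_with_env(path, argv, envp):
    """Fork, execve `path` with an explicit argv and envp,
    and return the child's stdout as a string."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child
        try:
            os.dup2(w, 1)                 # stdout -> pipe
            os.close(r)
            os.execve(path, argv, envp)   # replaces the child on success
        finally:
            os._exit(127)
    os.close(w)
    with os.fdopen(r) as pipe:
        out = pipe.read()
    os.waitpid(pid, 0)
    return out

if __name__ == "__main__":
    # argv carries command-line arguments; envp carries environment variables
    print(run_with_env("/usr/bin/env", ["env"], {"GREETING": "hello"}), end="")
```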
Read OSTEP-proc-5 to find out more!

Environment Variables

Have you ever run an AI tool like this one that needs an API key to talk to OpenAI? It is a bad idea to just write the key in the code:

```python
# inside my python code ...
api_key = "sk-ABCDEF123456"
```

Why? Because if you commit your code to GitHub, people can see your API key 🙃.

It is also a bad idea to pass the API key as a command-line argument:

```shell
$ python3 llama.py api="sk-ABCDEF123456"
```

Why? Because if you run the program on a public workstation, anyone on the same workstation can see your secret by looking at `top` or `ps`.

Instead, the best practice is to pass the API key, or any secret information, as an environment variable:

```python
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
```

Before we run the code, we set the environment variable by pasting in the API key we get from OpenAI:

```shell
export OPENAI_API_KEY=sk-ABCDEF123456
```
Environment variables let us pass context to the code we're running without hard-coding it.
Try it on GitHub Codespaces
Let's play with an example script that queries the Taipei YouBike 2.0 real-time API. Open a previous GitHub Codespace and run the following in the terminal to install some packages:
```shell
sudo apt update
sudo apt install -y jq curl
```
Save this script as `ubike.sh`:
```bash
#!/usr/bin/env bash
#
# Simple helper for Taipei YouBike 2.0 real-time data
#   • If $STATION is unset/empty : list all station names (sna)
#   • If $STATION is set         : print all fields for the matching sna

DATA_URL="https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json"
json=$(curl -s "$DATA_URL")

if [[ -z "${STATION:-}" ]]; then
    echo "Available stations:"
    echo "$json" | jq -r '.[] | .sna' | sort -u
else
    echo "Details for station: $STATION"
    echo "$json" | jq -r --arg sna "$STATION" '
        .[]
        | select(.sna == $sna)
        | to_entries[]
        | "\(.key)=\(.value)"
    '
fi
```
Make it executable:

```shell
chmod +x ubike.sh
```
Run it:

```shell
# List every station name
./ubike.sh          # STATION is empty

# Inspect one specific station (Chinese names work fine)
export STATION="YouBike2.0_捷運科技大樓站"
./ubike.sh
```
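The same switch-on-an-environment-variable pattern, in a few lines of Python (a sketch with made-up demo data, mirroring how `ubike.sh` reads `$STATION`):

```python
import os

def report(stations, station=None):
    """With no station given, list all names; otherwise show every
    field of the matching station (like ubike.sh's two modes)."""
    if not station:
        return sorted({s["sna"] for s in stations})
    return [f"{k}={v}" for s in stations
            if s["sna"] == station
            for k, v in s.items()]

if __name__ == "__main__":
    demo = [{"sna": "StationA", "available_rent_bikes": 3},
            {"sna": "StationB", "available_rent_bikes": 7}]
    # Behavior switches on the environment, never on argv
    print("\n".join(report(demo, os.environ.get("STATION"))))
```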
As you can see, we just changed the behavior of `ubike.sh` without passing an argument.

Real-World Example: Starting a PostgreSQL Container

Docker container images often don't change once they are built, meaning that their contents (binaries, scripts, configurations) are fixed. Environment variables help us inject dynamic behavior, such as API keys or passwords, into a container at runtime.

The following command launches a PostgreSQL database container. Note the `-e` flags: they inject environment variables into the container (more customizable variables here):

```shell
docker run --name some-postgres \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_PASSWORD=mypassword \
  -e POSTGRES_DB=mydatabase \
  -d postgres
```

Under the hood, Docker calls `execve()` to start the database process. The database process reads values like `POSTGRES_USER` from the environment.

Question: Why not just use command-line arguments, like `docker run postgres --user=myuser --password=mypassword`?
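One way to convince yourself: anyone on the machine can read another process's arguments through `/proc/<pid>/cmdline` (exactly what `ps` and `top` display), while `/proc/<pid>/environ` is readable only by the process's owner. A small sketch (the helper name is mine):

```python
def read_cmdline(pid="self"):
    """Return a process's argv the way `ps` sees it.
    /proc/<pid>/cmdline is world-readable; argv strings are NUL-separated."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode() for arg in raw.split(b"\0") if arg]

if __name__ == "__main__":
    # Any secret typed on our own command line shows up right here
    print(read_cmdline())
```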
Running Programs in the Background: Daemonize
You ssh into a Linux server, start a long-running program (say `./my_model_training`), and then your network drops or you log out. You probably know that the program will get killed…
How to keep it alive? We usually use `tmux` and detach. But the classic UNIX way is to daemonize: turn your program into a background service that is no longer tied to your terminal.

How PTT Did It (The Old Days)

In the 1990s, PTT was run at 杜奕瑾's dormitory, manually from a terminal. To keep the program running, PTT called a `daemonize()` function from `logind`, which can handle thousands of log-ins per second. That's why even if the admin logged out, PTT kept running. The `daemonize()` function works by redirecting the process's standard streams to `/dev/null` or log files.

Here's the specific code that redirects `stderr` to the logfile. A daemon has no screen, so we send errors into a file:

```c
if (logfile) {
    if ((fd = OpenCreate(logfile, O_WRONLY | O_APPEND)) < 0) {
        perror("Can't open logfile");
        exit(1);
    }
    if (fd != 2) {
        dup2(fd, 2);
        close(fd);
    }
}
```

Here's the code that sends `stdin` and `stdout` to a black hole:

```c
if ((fd = open("/dev/null", O_RDWR)) < 0) {
    perror("Can't open /dev/null");
    exit(1);
}
dup2(fd, 0);
dup2(fd, 1);
if (!logfile)
    dup2(fd, 2);
```

This is equivalent to running this in a shell:

```shell
$ ./ptt </dev/null >/dev/null 2>>"$logfile"
```

Today: systemd Instead of DIY

On modern Linux servers, you rarely write `daemonize()` yourself. Instead, you write a systemd service unit:

```ini
[Service]
ExecStart=/usr/local/bin/myapp
Environment="API_KEY=sk-XXX"
Restart=always
```

Systemd then starts `myapp` for you, injects the environment, and restarts it whenever it exits (`Restart=always`). Did you see systemd being the ancestor of all the processes in the system? The processes it supervises, like databases, web servers, or messaging systems, run for months, listening on a network port and responding to requests. If they ever crash, they must be restarted immediately to keep the system available.