Parallelizing Rainbow Table Lookup with Pipes

Niko: My mom is 70 and forgets things, but she never forgets me and my birthday. So she uses niko0107 as the password for every site. Works for her.

Nini: That’s cute. Do you know how websites actually store your password?

Niko: They keep it secret, right? Plain text in their disk?

Nini: No. They hash it with a cryptographic function. When you type a password, the site runs it through a hash function and stores only the hash. The hash can be very long. In practice, it is not possible to find out the plaintext password without quantum computer.

Niko: So if someone steals the hashes, they still can’t read the passwords?

Nini: They can try. If lots of people reuse simple passwords like niko0107, an attacker can compute hashes for many candidate passwords and compare them to the stolen hashes. If one matches, they know the plaintext. That’s why leaks are dangerous: the attacker gets a list of hashes and then tries candidates until something matches. They call this Rainbow table

Niko: So they just try every password?

Nini: Yes. If the attacker has the leaked hashes, they can use a password list or generate guesses, hash each guess, and compare. If you reuse your password on other sites (e.g., Gmail, Yahoo), they can try those sites with the recovered password.

Niko: Oh mao.

Nini: Let’s do a hand-on lab to get a feeling. First you will learn how hashing and checking work. Then you will see how pipes let the two stages run at the same time so the check can start before hashing finishes.

Websites never store your password in plaintext. Instead, when you register or log in:

  1. Your plaintext password (like admin) is run through a cryptographic hash function.
  2. Only the hash (like 21232f297a57a5a743894a0e4a801fc3) is stored in the website’s backend database.
  3. This way, even the site’s administrator cannot see your actual password (only the hash).

But if the database is ever leaked or hacked, attackers can try to reverse the hashes.

They do this using a rainbow table: a huge file of common passwords. According to an iThome news (2021):

來自臺灣的威脅情報研究人員 StillAzureH 近日向 Have I Been Pwned(HIBP)舉報,已於駭客論壇上發現來自臺灣入口網站蕃薯藤(yam.com)的外洩資料…… 該資料庫內含超過 1,300 萬筆用戶資料,其中包含使用者名稱、姓名、生日、電話、地址,與 未加鹽的 MD5 雜湊密碼

Because these passwords are hashed without salt using just MD5, attacker can easily recreate user’s plaintext password easily.

Rainbow Table Lookup with Pipes

In this lab, you’ll learn how pipes can accelerate computation by running two stages concurrently:

  • Stage 1 (hash): reads a password list and outputs MD5 hashes
  • Stage 2 (check): compares the MD5 with a target hash

We’ll run them:

  • Sequentially with crack-seq (hash then check). crack-seq runs hash to calculate the MD5 of each password in the password file, and write the output into hashed.txt. When hash finishes, crack-seq runs check that reads from hashed.txt and compares all the MD5 hashes in hashed.txt with a target hash. crack-seq has to wait for its child process to finish before it can fork a second child process.

  • In parallel with crack-pipeline (hash and check in pipeline). crack-pipeline doesn’t write to a file first. It doesn’t wait for the first child to finish. It forks the two child processes together, and connect the two processes with a pipe. The first child hash writes directly to stdout line by line, and the second child check can read line by line as it is emitted from the pipe and process them.

open in GitHub Codespaces.

️ Setup

  1. Run the setup script: download the password list, clone the MD5 library, and compile the programs.
./setup.sh

Test the MD5 tool

Hash the password “admin” with MD5:

./md5sum admin

Expected output:

21232f297a57a5a743894a0e4a801fc3

Run Sequential Version

Use time to measure wall-clock time:

time ./crack-seq 21232f297a57a5a743894a0e4a801fc3

Example output:

admin is found in passwords-10000.txt

real    0m0.038s
user    0m0.024s
sys     0m0.004s

Run Pipeline Version

Run the hash and check stage in parallel through a pipe:

time ./crack-pipeline 21232f297a57a5a743894a0e4a801fc3

Example output:

admin is found in passwords-10000.txt

real    0m0.012s
user    0m0.005s
sys     0m0.006s

Try a hash not in the list

time ./crack-pipeline 21232f297a57a5a743894a0e4a801fb3

Output:

21232f297a57a5a743894a0e4a801fb3 is not found (password file = passwords-10000.txt)

real    0m0.026s
user    0m0.021s
sys     0m0.005s

Try a larger password file

Use the PASSWORD_FILE environment variable to change the password file:

time PASSWORD_FILE=passwords-1000000.txt ./crack-pipeline 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-1000000.txt

real    0m0.012s
user    0m0.007s
sys     0m0.004s

And run sequential for comparison:

time PASSWORD_FILE=passwords-1000000.txt ./crack-seq 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-1000000.txt

real    0m1.217s
user    0m0.937s
sys     0m0.092s

Reflections

In the real world, attackers don’t just parallelize two processes. They may run thousands of processes across many machines. To protect our password, website owner uses salted hashes and strong hashing algorithms (SHA-256, not MD5!). To protect ourselves, we enable two-factor authentication and use password manager to assign a long, unique password for each website.

Back to top