Parallelizing Rainbow Table Lookup with Pipes
Niko: My mom is 70 and forgets things, but she never forgets me and my birthday. So she uses niko0107
as the password for every site. Works for her.
Nini: That’s cute. Do you know how websites actually store your password?
Niko: They keep it secret, right? Plain text in their disk?
Nini: No. They hash it with a cryptographic function. When you type a password, the site runs it through a hash function and stores only the hash. The hash can be very long. In practice, it is not possible to find out the plaintext password without quantum computer.
Niko: So if someone steals the hashes, they still can’t read the passwords?
Nini: They can try. If lots of people reuse simple passwords like niko0107
, an attacker can compute hashes for many candidate passwords and compare them to the stolen hashes. If one matches, they know the plaintext. That’s why leaks are dangerous: the attacker gets a list of hashes and then tries candidates until something matches. They call this Rainbow table
Niko: So they just try every password?
Nini: Yes. If the attacker has the leaked hashes, they can use a password list or generate guesses, hash each guess, and compare. If you reuse your password on other sites (e.g., Gmail, Yahoo), they can try those sites with the recovered password.
Niko: Oh mao.
Nini: Let’s do a hand-on lab to get a feeling. First you will learn how hashing and checking work. Then you will see how pipes let the two stages run at the same time so the check can start before hashing finishes.
Websites never store your password in plaintext. Instead, when you register or log in:
- Your plaintext password (like
admin
) is run through a cryptographic hash function. - Only the hash (like
21232f297a57a5a743894a0e4a801fc3
) is stored in the website’s backend database. - This way, even the site’s administrator cannot see your actual password (only the hash).
But if the database is ever leaked or hacked, attackers can try to reverse the hashes.
They do this using a rainbow table: a huge file of common passwords. According to an iThome news (2021):
來自臺灣的威脅情報研究人員 StillAzureH 近日向 Have I Been Pwned(HIBP)舉報,已於駭客論壇上發現來自臺灣入口網站蕃薯藤(yam.com)的外洩資料…… 該資料庫內含超過 1,300 萬筆用戶資料,其中包含使用者名稱、姓名、生日、電話、地址,與 未加鹽的 MD5 雜湊密碼。
Because these passwords are hashed without salt using just MD5, attacker can easily recreate user’s plaintext password easily.
Rainbow Table Lookup with Pipes
In this lab, you’ll learn how pipes can accelerate computation by running two stages concurrently:
- Stage 1 (
hash
): reads a password list and outputs MD5 hashes - Stage 2 (
check
): compares the MD5 with a target hash
We’ll run them:
Sequentially with
crack-seq
(hash then check).crack-seq
runshash
to calculate the MD5 of each password in the password file, and write the output intohashed.txt
. Whenhash
finishes,crack-seq
runscheck
that reads fromhashed.txt
and compares all the MD5 hashes inhashed.txt
with a target hash.crack-seq
has to wait for its child process to finish before it can fork a second child process.In parallel with
crack-pipeline
(hash and check in pipeline).crack-pipeline
doesn’t write to a file first. It doesn’t wait for the first child to finish. It forks the two child processes together, and connect the two processes with apipe
. The first childhash
writes directly tostdout
line by line, and the second childcheck
can read line by line as it is emitted from the pipe and process them.
Hash the password “admin” with MD5: Expected output: Use Example output: Run the hash and check stage in parallel through a pipe: Example output: Output: Use the And run sequential for comparison: In the real world, attackers don’t just parallelize two processes. They may run thousands of processes across many machines. To protect our password, website owner uses salted hashes and strong hashing algorithms (SHA-256, not MD5!). To protect ourselves, we enable two-factor authentication and use password manager to assign a long, unique password for each website..
️ Setup
./setup.sh
Test the MD5 tool
./md5sum admin
21232f297a57a5a743894a0e4a801fc3
Run Sequential Version
time
to measure wall-clock time:time ./crack-seq 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-10000.txt
real 0m0.038s
user 0m0.024s
sys 0m0.004s
Run Pipeline Version
time ./crack-pipeline 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-10000.txt
real 0m0.012s
user 0m0.005s
sys 0m0.006s
Try a hash not in the list
time ./crack-pipeline 21232f297a57a5a743894a0e4a801fb3
21232f297a57a5a743894a0e4a801fb3 is not found (password file = passwords-10000.txt)
real 0m0.026s
user 0m0.021s
sys 0m0.005s
Try a larger password file
PASSWORD_FILE
environment variable to change the password file:time PASSWORD_FILE=passwords-1000000.txt ./crack-pipeline 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-1000000.txt
real 0m0.012s
user 0m0.007s
sys 0m0.004s
time PASSWORD_FILE=passwords-1000000.txt ./crack-seq 21232f297a57a5a743894a0e4a801fc3
admin is found in passwords-1000000.txt
real 0m1.217s
user 0m0.937s
sys 0m0.092s
Reflections