Message Digest 5 or MD5 in short, is a cryptographic hashing algorithm and a one-way function that takes a message of arbitrary length and squishes them to produce a 128-bit digest. These MD5 hashes are represented using 32 hexadecimal characters – for example, MD5 digest of “Hello” is 09f7e02f1290be211da707a266f153b3. Cryptographic hashes are used to verify the integrity of files, store secret information like passwords in databases or convert data to fixed-length output.
In a Windows environment, you can get the MD5 hash digest of a file using the Get-FileHash command via the command line.
Get-FileHash <file_path> -Algorithm MD5 |
Let us now see how MD5 hash or any other cryptographic hashes are used to verify the file integrities. Very often when you are downloading any file from the internet, you see the website owners displaying hash values of the packages you download. This is to make sure that the data in transit is not compromised when it reaches your end. If in case it is compromised, you can identify that from the hash value that both the packages are different.
Another use case of the MD5 hash algorithm is storing passwords. With the rise in data breaches exposing clear text passwords of users, developers all over the world adapted to store sensitive information like passwords in hashed format rather than plain text so that even if the passwords get breached, it is difficult to obtain clear text password just by looking at the hash.
Here’s a simple code in python 3 that takes a string as input and gives the HASH value of the string.
import hashlib password = input(“enter the secret value”) print(hashlib.md5(password.encode(‘utf-8’)).hexdigest()) |
For a given password admin@123 , the above program produces 4ba5961e53c25141805b00806740d069 hash.
Let us now see how to decrypt MD5 hashes, If you have until here reading the blog, you would have understood that MD5 hashing algorithm or any other hashing algorithms are one-way Hashing algorithms and are irreversible which means that we can get the hash of any message by passing it to MD5 algorithm but the vice versa of getting the message if given a hash is not possible. Hence the name irreversible and one-way hash functions. So the whole concept of decrypting a hash does not abide to such hashing algorithms, such a concept will only abide by cryptographic algorithms which take a secret alongside the message and encrypt the content such as 3DES, AES algorithms.
Having said that, there are few hacks using which we can get the clear text password from the hash.
- Brute force attack
- Dictionary attack
- Rainbow table attack
Brute force attack uses a trial-and-error method to generate all combinations of strings. These strings are then passed to the MD5 algorithm to get the hash values. Now the hash to decrypt is looked at in the generated hashes, if found then we already know the plain text produced that hash. This way, even though we were not able to get the password from hash, we use the trial-and-error method to get the matching hash and trace it back to its plaintext.
The whole concept of using strong passwords is to protect against such brute force attacks. Based on the length and complexity of the password, it can take anywhere from a few seconds to many years to crack the password. If your password is let’s say ‘12345 ‘ It would take a second for a computer program to reach that string combination given a set of numbers. Whereas If your password follows a strong password policy like the use of both upper-case and lower-case letters, the inclusion of one or more numerical digits and special characters, such as @, #, $ there are obviously more number of combinations possible which takes years to crack the hash.
Here’s a python program that uses itertools library in python to generate hashes with different combinations of strings and prints out the plain text if the desired hash is matched.
from itertools import chain, product import hashlib import string def bruteforce(charset, maxlength): return (”.join(ch) for ch in chain.from_iterable(product(charset, repeat=i) for i in range(1, maxlength + 1))) passwordhash = input(“enter the hash to crack: “) for plaintext in bruteforce(string.ascii_lowercase, 10): bruteforcedhash = hashlib.md5(plaintext.encode(‘utf-8’)).hexdigest() if bruteforcedhash == passwordhash: print(“plain text of the given hash is”,plaintext) break |
Output:
>>> enter the hash to crack: 47bce5c74f589f4867dbd57e9ca9f808 >>> plain text of the given hash is aaa |
Dictionary attack is a type of brute force attack technique for defeating a cipher by trying to determine its decryption key or passphrase by trying thousands or millions of likely possibilities, such as words in a dictionary or previously used passwords instead of random string combinations.
Most commonly used passwords are zipped and are available online. Attackers use such a wordlist to crack the password hashes. Hence it’s recommended to not use the same password repeatedly or not to use default and easy passwords like admin@123 etc. Using a brute force method for a password like admin@123 takes years to crack as the program has to iterate through a set of numbers, special characters and characters whereas the same password hash can easily be cracked with help of a wordlist containing frequently used passwords or dictionary attack.
This python program below iterates through a words list instead of creating random word combinations and will try to find out the plaintext of the given hash.
import hashlib import string passwordhash = input(“enter the hash to crack: “) with open(“words.txt”,”r”) as fileobject: content = fileobject.readlines() for plaintext in content: wordhash = hashlib.md5(plaintext.encode(‘utf-8’)).hexdigest() if wordhash == passwordhash: print(“plain text of the given hash is “,plaintext) break |
Output:
>>> enter the hash to crack: 040173afc2e9520e65a1773779691d3e >>> plain text of the given hash is passw0rd! |
Rainbow table attack
None of the brute force or dictionary attacks stores the hashes of the words tried for future use, but a particular attack called Rainbow table attack stores hashes generated and keeps its database increasing to quickly crack any new hash. This helps in saving the computation power to generate new hashes. While storing all the hashes takes a lot of memory, the rainbow table attack forms a hybrid way of using a reduction function to store hashes in order to save both computation and memory usages.
While a hash function maps a plaintext to a hash, reduce function in Rainbow table maps hashes to plaintexts. Please note that it is not an inverse hash function. If the set of plaintexts is defined as [0-9]{6} ({6} specifies that the length of the plain text should be 6).
Let us consider a hash of a string MD5(123456) -> “e10adc3949ba59abbe56e057f20f883e”. For this hash, the reduction function fn_reduction() can be as simple as taking the first six characters from the hash i.e fn_reduction(“e10adc3949ba59abbe56e057f20f883e”) -> e10adc. Hence we are able to successfully map some plain text from the hash.This is the purpose of the reduction function.
Both the hash and reduction functions are one-way functions mapping either of the sets to each other, Rainbow tables are the chains of these hash and reduction functions which start at plaintext, hashes it and maps the hash, then uses reduction function to get the plain text, then again hashes the plain text to map to a different hash, and this process goes on until a threshold number of chain links are formed for each plain text. If an attacker has a rainbow table with the hash for the password “passw0rd!,” any user that uses that password will have the same hash, so that password can easily be cracked. Since most people use common passwords or reuse passwords, it makes cracking easy by using rainbow tables.
After generating many chains, rainbow table might look like this:
123456 -> e10adc3949ba59abbe56e057f20f883e
dafd2e -> db76bf5d3ab154f473a153ae7f5e0645
[……]
b6e002 -> 018891fbf4e2250899d5f5011035db17
After generating a couple of hashes for random plaintexts, we now have to search for our required hash in any of the above chains.
The algorithm goes like this:
- Look for the hash in the list of generated hashes that we just did, if we are able to find it, break it out of the loop
- If we cannot find the hash, then reduce the hash to get a plaintext, lets call it plain_text_1 and search for the Hash of plain_text_1, i.e MD5(plain_text_1)
- Keep iterating until at some point we come across the hash in the list of final hashes
- If any chain consists of the hash, then the chain that formed that hash will contain the original hash which means we can trace it back to its plain text
In this way, we can add intelligence to the algorithm where it only stores hashes that are linked to the required plaintext thus saving huge memory and processor resources.
But what are the preventive methods for these attacks? Thankfully there’s a concept called “Salting” If you are a developer having past experience of working with any secret tokens or passwords, you would have come across this technique. So before storing the hash of the password directly into the database, developers will add an extra string that is unique to each user which makes the hash nowhere related to the original password.
There are few guidelines while implementing this salting technique such as Not using very short and predictable salts, Using a random salt that really randomizes the hash from the actual password instead of constant salt being appended to the end or beginning. Likewise adding the user’s details like username, email as salt is too predictable too.
Finally, I would like to end this blog by saying that the MD5 hash is compromised and is a vulnerable hashing algorithm. Hashes generated from MD5 are prone to Hash collisions, which means by carefully tweaking a few bits of data before feeding to the algorithm and with the help of recent advancements in the processors field which lead to faster computing powers, researchers were able to produce the same hash for two different images. You can read more about the research in this article.