
Don't be a target: Password checking and hash functions

Sep 14, 2024

Checking passwords is always a critical task, and it is important to understand the mechanisms behind it to build a proper solution. Understanding how common attacks are performed will also help you identify bad practices.

All examples are written in Python, but the same principles apply to any programming language.

Why should you never store a password?

There is an important premise that you should never store a password in a database or any other type of storage. Even a password vault should never be used to store a password used only for authentication.

The main reason is to limit the damage of a leak. If the database gets compromised, the attacker does not get everybody's password, because the passwords were never stored there.

But if you are not going to save the password in a database, how can you compare it in the future? For these scenarios, you should use hash functions.

What is a hash?

A "hash" can be seen as a fingerprint or identifier of a set of data. Every "hash" created by the same "hash function" has the same fixed length, no matter how large the input is.

Example of how to use a simple hash function:

from hashlib import md5

# The hashlib.md5() input must be in bytes, that's
# why we use the `b` before the quotes
my_text_bytes = b"my very long string with a lot of useless information"

# Generate the hash object
hash_object = md5(my_text_bytes) 

# Now you can:

# Get the hexadecimal representation of the hash
print(hash_object.hexdigest())

# Get the binary representation of the hash
print(hash_object.digest())

hash-example.py
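To make the "fixed length" point concrete, here is a small sketch that hashes inputs of very different sizes and compares the output lengths. The `inputs` list and the variable names are only illustrative.

```python
from hashlib import md5, sha256

# Three inputs of very different sizes: empty, 1 byte, and a few kilobytes
inputs = [
    b"",
    b"a",
    b"my very long string with a lot of useless information" * 100,
]

for data in inputs:
    # Show the input size next to the size of each digest, in bytes
    print(len(data), len(md5(data).digest()), len(sha256(data).digest()))

# Collect the distinct digest lengths produced by each function
md5_lengths = {len(md5(data).digest()) for data in inputs}
sha256_lengths = {len(sha256(data).digest()) for data in inputs}
print(md5_lengths, sha256_lengths)
```

Each function should report a single digest length (16 bytes for MD5, 32 bytes for SHA-256) regardless of the input size.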

If you do not understand what an encoding is or how number bases work in mathematics, I recommend learning how computers store data, plus a bit about compression.

Also learn about Base64, one of the most widely used encodings in computing:
https://en.wikipedia.org/wiki/Base64
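If encodings are new to you, this short sketch may help: it takes the same four raw bytes and writes them as hexadecimal and as Base64 text. The byte values are arbitrary and chosen only for illustration.

```python
import base64

# Four arbitrary raw bytes (values 0, 15, 16 and 255)
raw = bytes([0, 15, 16, 255])

# Hexadecimal: two text characters per byte
hex_text = raw.hex()
print(hex_text)  # 000f10ff

# Base64: four text characters for every three bytes (with padding)
b64_text = base64.b64encode(raw).decode("ascii")
print(b64_text)

# Both encodings are reversible, unlike a hash
print(bytes.fromhex(hex_text) == raw)
print(base64.b64decode(b64_text) == raw)
```

The important difference from a hash is that an encoding can always be decoded back to the original bytes.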

The bad hash for security

Every hash function has an inherent collision factor: the probability that two or more inputs produce the same output.

There are three concepts important to consider:

  • Any hash function has collisions. Whenever you digest arbitrary-length input into a fixed-length output, infinitely many inputs map to each output.
  • A well-distributed probability of collision, where it is impractical to find two inputs with the same output, usually makes the function a better hash function for security.
  • For passwords, good distribution is not enough on its own: the function should also be deliberately costly for a computer to calculate.
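The first point above can be seen in practice by shrinking the output on purpose. This is only a sketch: `tiny_hash` is a made-up 16-bit hash (the first two bytes of SHA-256), not something you would use for real.

```python
from hashlib import sha256
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """Hypothetical 16-bit hash: keep only the first 2 bytes of SHA-256."""
    return sha256(data).digest()[:2]


def find_collision():
    """Hash "0", "1", "2", ... until two inputs share the same tiny hash."""
    seen = {}
    for i in count():
        message = str(i).encode("utf-8")
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message


first, second, shared = find_collision()
print(first, second, shared.hex())
```

With only 65,536 possible outputs, a collision is guaranteed after at most 65,537 inputs, and usually shows up much sooner. Real hash functions have the same property, just with an output space so large that finding a collision should be impractical.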

With those concepts in mind, we can conclude that:

  • If resistance to deliberate attacks is not a concern for your use case (for example, a checksum against accidental corruption), any hash function is suitable. In this scenario, you may want very fast hash algorithms such as MD5 or SHA-256.
  • For password storage, you want to sacrifice performance on purpose. Examples of suitable algorithms include PBKDF2-HMAC, bcrypt, scrypt, and Argon2. BLAKE2b and BLAKE3 are strong general-purpose hashes, but they are designed to be fast, so they are not a good fit for passwords on their own.

So, why is it important for a hash function in a security application to be well distributed and costly to compute? The simple answer is to make brute-force attacks impractical.
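Here is a rough sketch of what a brute-force (dictionary) attack on a fast, unsalted hash looks like, followed by a quick timing comparison against PBKDF2-HMAC. The leaked hash, the five-word `wordlist`, the `demo-salt` value, and the iteration count are all made up for the demonstration.

```python
from hashlib import md5, pbkdf2_hmac
from time import perf_counter

# Hypothetical leaked, unsalted MD5 hash of a weak password
leaked = md5(b"sunshine").hexdigest()

# A tiny stand-in for the huge wordlists real attackers use
wordlist = ["123456", "password", "qwerty", "sunshine", "letmein"]


def crack(target_hex, candidates):
    """Hash every candidate and return the one that matches, if any."""
    for candidate in candidates:
        if md5(candidate.encode("utf-8")).hexdigest() == target_hex:
            return candidate
    return None


recovered = crack(leaked, wordlist)
print(recovered)  # sunshine


def seconds_per_guess(hash_one, guesses=5):
    """Average time taken to hash a single guess."""
    start = perf_counter()
    for i in range(guesses):
        hash_one(str(i).encode("utf-8"))
    return (perf_counter() - start) / guesses


fast = seconds_per_guess(lambda g: md5(g).digest())
slow = seconds_per_guess(
    lambda g: pbkdf2_hmac("sha256", g, b"demo-salt", 100_000)
)
print(f"md5: {fast:.8f}s per guess, pbkdf2: {slow:.8f}s per guess")
```

The exact timings depend on your machine, but the slow function should cost the attacker several orders of magnitude more work per guess, while a legitimate login only pays that cost once.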

If you want the long answer, I have another video from Computerphile for you:

The great hash for security

Now that you know what's wrong, let's create an example of a good hash implementation for passwords.

Important note: security solutions often become obsolete in the computing world. If you are reading this far in the future, assume that my solution is outdated and check what your programming language or framework recommends.

First, let's create our salt. A salt is a unique random value, used for each derived key. This prevents attackers from reusing pre-computed tables ("rainbow tables") of hash values that they might have built for common passwords.

openssl rand -base64 32

create-salt.sh
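If you prefer to stay in Python, a salt can also be generated with the standard `secrets` module. The sketch below additionally shows why a pre-computed table stops being useful: the same password hashed with two different salts gives two unrelated results. The `new_salt` helper is hypothetical, and the iteration count is kept low only so the demo runs quickly.

```python
import secrets
from hashlib import pbkdf2_hmac


def new_salt(size: int = 32) -> bytes:
    """Return `size` cryptographically strong random bytes."""
    return secrets.token_bytes(size)


password = b"nothing-is-true;everything-is-permitted"

salt_a = new_salt()
salt_b = new_salt()

# Low iteration count for demonstration purposes only
hash_a = pbkdf2_hmac("sha256", password, salt_a, 10_000)
hash_b = pbkdf2_hmac("sha256", password, salt_b, 10_000)

# Same password, different salts, different hashes
print(hash_a.hex())
print(hash_b.hex())
print(hash_a != hash_b)
```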

This is a Python example of how to hash a password:

from hashlib import pbkdf2_hmac

our_pass = "nothing-is-true;everything-is-permitted"
our_pass_bytes = our_pass.encode('utf-8')

# A unique random salt is used for each derived key. This can be a value
# you pull from a vault during the application startup
salt = "deA2aT7aT315Yq2fYlQkipeqHH5l+IJreV/5BYFFTCs=".encode('utf-8')

# The iteration number will consume CPU time but you should use
# a big number like this one or bigger.
our_app_iters = 500000

# Calculate the hash
dk = pbkdf2_hmac('sha256', our_pass_bytes, salt, our_app_iters)

# Print the derived key in hex format
print(dk.hex())

# It will print:
# bd7921ecad9d7802ac20702915026f1ca39238d9073bfa10a2015f1600b106f6

pbkdf2_hmac_example.py

Check the official documentation for more details:

hashlib — Secure hashes and message digests

The example above uses the PBKDF2-HMAC (Password-Based Key Derivation Function 2 - Hash-based Message Authentication Code) function. That is a very long name for a function that makes brute-force attempts to discover the original passwords impractical by making every guess expensive.

The important thing to consider is:

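  • Never save a secret value like the vault-held one above in the same database as your password hashes. If your database gets leaked, nobody will be able to easily recalculate the hashes using brute force. Note that many systems call a separately stored secret like this a "pepper" and also keep a random per-user salt next to each hash.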
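The check itself can be sketched as follows: hash the candidate password with the same salt and iteration count, then compare the result with the stored hash. The `hash_password` and `check_password` helpers are hypothetical names, and the sketch assumes you fetch the salt and the stored hash from wherever you keep them. `hmac.compare_digest` is used so the comparison takes the same time whether or not the values match.

```python
import hmac
import secrets
from hashlib import pbkdf2_hmac

ITERATIONS = 500_000  # same iteration count as the example above


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive the hash that gets stored instead of the password."""
    return pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def check_password(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the candidate and compare it with the stored hash."""
    candidate_hash = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_hash, stored_hash)


# Registration: only the derived hash is stored, never the password
salt = secrets.token_bytes(32)
stored = hash_password("nothing-is-true;everything-is-permitted", salt)

# Login attempts
ok = check_password("nothing-is-true;everything-is-permitted", salt, stored)
bad = check_password("wrong-guess", salt, stored)
print(ok, bad)  # True False
```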

Now you have everything you need to implement a password check system! 😄

See ya and don't be a target.


Luiz Costa

I am a senior software engineer at Red Hat / Ansible. I love automation tools, games, and coffee. I am also an active contributor to open-source projects on GitHub.