Don't be a target: Password checking and hash functions
Checking passwords is always a critical task, and it is important to understand the mechanisms behind it to build a proper solution. Understanding how common attacks are performed will also help you identify bad practices.
All examples are written in Python, but the same principles apply to any programming language.
Why should you never store a password?
An important premise is that you should never store a password in a database or any other type of storage. Even a password vault should not be used to store a password that is only needed for authentication.
The main reason is to limit the damage of a leak: if the database gets compromised, you will not leak everybody's password.
But if you are not going to save the password in a database, how can you compare it in the future? For this scenario, you should use hash functions.
What is a hash?
A "hash" can be seen as a fingerprint, or identifier, of a set of data. Every hash produced by the same hash function has the same fixed length, regardless of the size of the input.
Example of how to use a simple hash function:
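A minimal sketch with Python's standard hashlib module (SHA-256 is chosen here only as an illustration): the same input always produces the same digest, and the digest length stays fixed however long the input is.

```python
import hashlib

# Hash functions work on bytes, so the text is encoded first (UTF-8 here).
short_digest = hashlib.sha256("hi".encode("utf-8")).hexdigest()
long_digest = hashlib.sha256(("a much longer input " * 1000).encode("utf-8")).hexdigest()

# Same input, same output: the hash acts as a fingerprint of the data.
repeat_digest = hashlib.sha256("hi".encode("utf-8")).hexdigest()

print(short_digest)
print(len(short_digest), len(long_digest))  # 64 64 (hex characters, i.e. 32 bytes each)
print(short_digest == repeat_digest)        # True
```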
If you are not familiar with encodings or with how numeric bases work in mathematics, I recommend learning how computers store data, and a bit about compression.
Also learn about Base64, one of the most widely used encodings in computing:
https://en.wikipedia.org/wiki/Base64
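To connect encodings and bases to hashing, the sketch below (using only the standard library; the sample text is arbitrary) shows a string encoded to bytes, hashed, and the raw 32-byte digest represented both in base 16 (hex) and in Base64.

```python
import base64
import hashlib

text = "password123"

# str -> bytes: an encoding (UTF-8) turns characters into bytes.
data = text.encode("utf-8")

# The raw SHA-256 digest is 32 bytes, most of which are not printable characters.
raw_digest = hashlib.sha256(data).digest()

# Two common ways to represent those bytes as text:
hex_text = raw_digest.hex()                               # base 16: 2 characters per byte
b64_text = base64.b64encode(raw_digest).decode("ascii")  # base 64: 4 characters per 3 bytes, padded with "="

print(len(raw_digest), len(hex_text), len(b64_text))  # 32 64 44

# Both representations decode back to exactly the same bytes.
assert bytes.fromhex(hex_text) == raw_digest
assert base64.b64decode(b64_text) == raw_digest
```

Base64 is more compact than hex for the same bytes, which is why it is often used to store binary values such as digests in text fields.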
The bad hash for security
Every hash function has an inherent collision factor: the probability that two or more inputs produce the same output.
There are three important concepts to consider:
- Any hash function will have collisions. Whenever you digest inputs of arbitrary length into a fixed-length output, there are infinitely many possible inputs but only finitely many outputs, so infinitely many collisions exist.
- A well-distributed probability of collision usually makes the function a better hash function for security.
- Hash functions designed for passwords are also deliberately costly for a computer to calculate, so that each guess is slow.
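To see why collisions are unavoidable, the sketch below truncates SHA-256 to its first 2 bytes, an artificially tiny 16-bit output chosen only for this demonstration. With only 65,536 possible outputs, the pigeonhole principle guarantees a collision within 65,537 distinct inputs, and in practice one usually appears after a few hundred (the birthday effect).

```python
import hashlib


def tiny_hash(data: bytes) -> bytes:
    """SHA-256 truncated to 2 bytes: a deliberately weak 16-bit hash for demonstration."""
    return hashlib.sha256(data).digest()[:2]


def find_collision() -> tuple[bytes, bytes]:
    """Hash distinct inputs until two of them share the same 16-bit output."""
    seen: dict[bytes, bytes] = {}
    i = 0
    while True:
        candidate = f"input-{i}".encode("utf-8")
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
        i += 1


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

Real hash functions have much larger outputs (256 bits for SHA-256), so collisions still exist but are far too rare to find by searching like this.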
With those concepts in mind, we can conclude that:
- If an uneven collision distribution is not a problem for your use case, any hash function is suitable. In this scenario, you may want very fast hash algorithms like MD5 or SHA-256.
- For security applications, a well-distributed probability of collision is crucial. BLAKE2b and BLAKE3 are good general-purpose cryptographic hashes, but for passwords you should deliberately sacrifice performance and use a slow key derivation function such as PBKDF2-HMAC.
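In Python, MD5, SHA-256 and BLAKE2b are all available in the standard hashlib module, and PBKDF2-HMAC is available as `hashlib.pbkdf2_hmac`. BLAKE3 is not in the standard library (it can be installed from PyPI, for example as the third-party `blake3` package, which this sketch does not use). The salt size and iteration count below are placeholder values for illustration.

```python
import hashlib
import os

data = b"correct horse battery staple"

# Fast, general-purpose hashes from the standard library.
md5_digest = hashlib.md5(data).digest()          # 16 bytes; broken for security, fine for checksums
sha256_digest = hashlib.sha256(data).digest()    # 32 bytes
blake2b_digest = hashlib.blake2b(data).digest()  # 64 bytes by default

# A deliberately slow key derivation function, intended for passwords.
salt = os.urandom(16)  # placeholder salt size for this illustration
pbkdf2_digest = hashlib.pbkdf2_hmac("sha256", data, salt, 100_000)  # 32 bytes (defaults to the hash size)

for name, digest in [("md5", md5_digest), ("sha256", sha256_digest),
                     ("blake2b", blake2b_digest), ("pbkdf2_hmac", pbkdf2_digest)]:
    print(f"{name:12} {len(digest):3} bytes")
```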
So, why is it important for a hash function used in security applications to have a well-distributed probability of collision? The short answer: to defend against brute-force attacks.
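To make the brute-force risk concrete, here is a sketch of a dictionary attack against an unsalted SHA-256 password hash. The leaked hash and the wordlist are made up for illustration; real attackers use lists with millions of common passwords and hardware that computes billions of fast hashes per second.

```python
import hashlib
from typing import Optional


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked value: an unsalted SHA-256 hash taken from a breached database.
leaked_hash = sha256_hex("sunshine")

# Hypothetical wordlist of common passwords.
wordlist = ["123456", "password", "qwerty", "letmein", "sunshine", "dragon"]


def dictionary_attack(target: str, candidates: list[str]) -> Optional[str]:
    """Hash each candidate and compare it to the target hash."""
    for guess in candidates:
        if sha256_hex(guess) == target:
            return guess
    return None


print(dictionary_attack(leaked_hash, wordlist))  # sunshine
```

Because the same password always yields the same unsalted hash, an attacker can even precompute these hashes once and reuse them against every leaked database. Salts and slow functions exist to take those shortcuts away.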
If you want the long answer, Computerphile has a video that explains it in depth.
The great hash for security
Now that you know what to avoid, let's create an example of a good password hashing implementation.
Important note: security solutions often become obsolete in the computing world. If you are reading this later in the future, assume that my solution may be outdated and check what your programming language or framework recommends.
First, let's create our salt. A salt is a unique random value generated for each derived key. It prevents attackers from reusing precomputed tables of hash values ("rainbow tables") that they might have built for common passwords.
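A minimal sketch of generating a salt with Python's `secrets` module, which is intended for cryptographically secure randomness. The 16-byte salt size and the 100,000 iterations are assumptions chosen for illustration. Because every salt is different, the same password produces different derived keys, so a table precomputed for one salt is useless for another.

```python
import hashlib
import secrets


def new_salt(size: int = 16) -> bytes:
    """Return a fresh, cryptographically secure random salt (16 bytes by default)."""
    return secrets.token_bytes(size)


password = "sunshine".encode("utf-8")
salt_a = new_salt()
salt_b = new_salt()

# Same password, different salts: the derived keys do not match.
key_a = hashlib.pbkdf2_hmac("sha256", password, salt_a, 100_000)
key_b = hashlib.pbkdf2_hmac("sha256", password, salt_b, 100_000)

print(salt_a.hex())
print(key_a == key_b)  # False, because the two salts differ
```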
This is a Python example of how to hash a password:
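A minimal sketch using `hashlib.pbkdf2_hmac` from the standard library. The function name `hash_password`, the 16-byte salt and the 600,000 iterations are assumptions for illustration; check current guidance for the iteration count to use on your hardware.

```python
import hashlib
import secrets

ITERATIONS = 600_000  # assumed work factor; tune it to current recommendations and your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a key from the password with PBKDF2-HMAC-SHA256 and a fresh random salt.

    Returns (salt, key). Both are needed later to verify the password.
    """
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac(
        "sha256",                  # underlying hash used by HMAC
        password.encode("utf-8"),  # the password, as bytes
        salt,                      # unique per password
        ITERATIONS,                # number of iterations: makes each guess expensive
    )
    return salt, key


salt, key = hash_password("correct horse battery staple")
print(salt.hex(), key.hex())
```

Only the salt and the derived key are kept; the password itself is never stored.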
Check the official hashlib documentation, listed in the References, for more details.
This approach uses PBKDF2-HMAC (Password-Based Key Derivation Function 2 with a Hash-based Message Authentication Code). Despite the long name, it is a key derivation function that applies HMAC many times over (the iteration count), which makes brute-force attacks to recover the original passwords impractically slow.
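The sketch below measures how the cost of PBKDF2-HMAC grows with the iteration count. The password, fixed salt and iteration values are demo-only assumptions, and the absolute times depend on your machine; the point is that the work grows roughly linearly with the iteration count, and an attacker pays that cost for every single guess.

```python
import hashlib
import time


def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken to derive one key with the given iteration count."""
    start = time.perf_counter()
    # Demo-only password and fixed salt; real code uses a fresh random salt per password.
    hashlib.pbkdf2_hmac("sha256", b"sunshine", b"demo-salt-only", iterations)
    return time.perf_counter() - start


timings = {n: time_pbkdf2(n) for n in (1, 10_000, 100_000, 600_000)}
for n, seconds in timings.items():
    print(f"{n:>7} iterations: {seconds * 1000:8.2f} ms")
```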
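The important thing to consider is:
- The salt does not need to be secret. It is normally stored next to the password hash, because you need it to verify the password later. Brute force stays impractical because every password has its own salt and each guess is slow to compute. If you want an extra secret that a database leak alone does not reveal, keep a separate key (often called a "pepper") outside the database.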
Now you have everything you need to implement a password check system! 😄
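A minimal sketch of both sides of a password check, assuming the salt and derived key from hashing time are available when the user logs in (the function names are hypothetical). It recomputes the key from the submitted password and compares it with `hmac.compare_digest`, a constant-time comparison that avoids leaking timing information.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; must match the value used when hashing


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) for a new password."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key for the submitted password and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, key))  # True
print(check_password("Tr0ub4dor&3", salt, key))                   # False
```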
See ya and don't be a target.
References
- Key derivation - Secure hashes and message digests (Python documentation): https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac
- Rainbow Tables - Web Development - Udacity: https://www.youtube.com/watch?v=SOV0AeHuHaQ