How sites (hopefully) store our passwords so that they aren’t exposed in data leaks

When we register for an online service, whether it’s an email account, an e-shop or a forum, we usually have to choose a password to log in with. One might assume that the administrators of the site can then simply look in their database and immediately see which password we chose. Fortunately, there are methods that prevent this, so that even the owners of the site and its database don’t know the password. Whether a site actually uses them, though, always comes down to how trustworthy that site is.

So how should a service store the combination of our login name and password so that everything is as secure as possible?

Why online services do not store passwords in their original form

Storing passwords in their original form seems like the easiest option, and a website that doesn’t bother with security does exactly that. But a little extra work is also in the interest of sites that accept registrations. If a data leak occurs, which can easily happen, a site that failed to secure its data sufficiently could be in major trouble.

So what should proper security look like, at least in principle? Admittedly, we can’t always verify this, but we can sleep a little easier knowing that sites (at least usually) don’t store our passwords in their original form, which is sometimes called plain text.

Passwords are stored in encrypted form

Let’s start a little broadly. The basis of storing passwords safely is the so-called hash function. We give it some input (here the password, but in principle any text), it transforms that data with a mathematical formula, and we get an output that bears no resemblance whatsoever to the original password. In practice the output is a seemingly random string of characters, which we call a hash. Strictly speaking this is hashing rather than encryption, because there is no key that turns the output back into the input, although in everyday usage the result is often loosely called encrypted.

  • Example of creating a simple hash: The input to our hash function must be a prime number, and our formula is to take the square root of this prime number, pick out the 5th to 10th decimal places, and write them backwards. If we take a calculator and compute the square root of the prime number 3, we get 1.7320508075 … The 5th to 10th decimal places are 508075, and since our fancy function also reverses them, we get 570805, which is our final hash.
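The toy function from the example can be written down in a few lines of Python. This is only a sketch of the illustration above, not a real hash function; the name `toy_hash` is our own, and we assume the decimals are read off without rounding.

```python
from decimal import Decimal, localcontext


def toy_hash(prime: int) -> str:
    """Toy hash from the example: the 5th-10th decimals of sqrt(prime), reversed."""
    with localcontext() as ctx:
        ctx.prec = 50  # far more digits than the ten decimals we need
        root = Decimal(prime).sqrt()
        decimals = str(root).split(".")[1]  # digits after the decimal point
    # Index 4..9 is the 5th to 10th decimal place; [::-1] writes it backwards.
    return decimals[4:10][::-1]


print(toy_hash(3))  # 570805
```

The input is assumed to be a prime, so the square root always has decimals; a perfect square such as 4 would break this sketch.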

The principle is that a hash function should convert input to output relatively easily, while making it (ideally) impossible to deduce the original input from the output. If we only knew the output 715780, it would be quite difficult to figure out that the original input was the prime number 9973. Although we know exactly how the hash is derived, finding the original prime in practice means trying one candidate after another until one produces the right result.
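The "try and try" approach can be sketched as a simple search. The helper names `toy_hash`, `is_prime` and `invert_toy_hash` are our own, and the search limit of 10,000 is an arbitrary assumption; the point is only that the attacker has nothing better than running the function forwards on every candidate.

```python
from decimal import Decimal, localcontext


def toy_hash(prime: int) -> str:
    """Toy hash from the example: the 5th-10th decimals of sqrt(prime), reversed."""
    with localcontext() as ctx:
        ctx.prec = 50
        decimals = str(Decimal(prime).sqrt()).split(".")[1]
    return decimals[4:10][::-1]


def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def invert_toy_hash(target: str, limit: int) -> list[int]:
    """Return every prime below `limit` whose toy hash equals `target`."""
    return [p for p in range(2, limit) if is_prime(p) and toy_hash(p) == target]


target = toy_hash(9973)  # pretend the attacker only sees this output
print(invert_toy_hash(target, 10_000))  # candidate inputs, found purely by trial
```

The search may return more than one prime, because different inputs can share the same six digits.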

With the computing power of today’s computers, this would be a matter of moments, but we are more concerned with the principle. The mathematical functions that actually convert a password into a hash are considerably more complicated and can handle arbitrary input, not just prime numbers as in the example. Online hash calculators let you try out different hashing methods and see what the resulting hashes look like. Another important property of a hash is its unpredictability: changing a single character in the input should produce a completely different-looking output.
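To see this unpredictability with a real hash function, here is a small sketch using SHA-256 from Python’s standard `hashlib` module. The two sample passwords and the helper name `sha256_hex` are our own choices for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("password1")
b = sha256_hex("password2")  # differs from the first input in one character

print(a)
print(b)

# Count the positions at which the two hashes disagree.
differing = sum(x != y for x, y in zip(a, b))
print(f"{differing} of {len(a)} hex characters differ")
```

Although the inputs differ in a single character, the two outputs have nothing visibly in common.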

What was all that for?

We have now shown how to encrypt some text into a hash, at least in principle. When storing passwords, this is absolutely crucial. This is because if we create an account somewhere with a password, that password will be encrypted into a form from which it is impossible to retrieve the original password.

Then, whenever we log in, the password we enter is encrypted with an identical mathematical formula and compared to the encrypted version stored in the database. If the result is the same, i.e. the password is the same as the one entered during registration, then the login is successful.

Thus, the password never needs to be stored in the database in its original form. It is enough to store the resulting hash and to know which function produced it. Even with knowledge of the function, the password cannot be derived backwards. The advantage of storing passwords this way is that in a data leak, the attacker learns only the resulting hash, not the password itself. Cracking the hash should be at least as difficult as guessing the password itself, so it’s not really worth it for attackers to try. Another advantage is that even the site owners don’t know our password and can’t find it out from their database.
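The registration and login flow described above can be sketched as follows. The in-memory dictionary `users` is a hypothetical stand-in for the site’s database, the function names are our own, and a single bare SHA-256 is used only to keep the principle visible; it is a simplification, not a recommendation for real systems.

```python
import hashlib
import hmac

users: dict[str, str] = {}  # hypothetical stand-in for the site's database


def hash_password(password: str) -> str:
    """Simplified: one bare SHA-256, just to show the principle."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(login: str, password: str) -> None:
    users[login] = hash_password(password)  # only the hash is stored


def check_login(login: str, password: str) -> bool:
    stored = users.get(login)
    if stored is None:
        return False
    # Hash the entered password the same way and compare with the stored hash.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "correct horse")
print(check_login("alice", "correct horse"))  # True
print(check_login("alice", "wrong guess"))    # False
print("correct horse" in users.values())      # False: no plain text is stored
```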

Hashing alone may not protect against a dictionary attack, so how about salting the passwords?

We have covered dictionary attacks in a separate article. In a nutshell, a dictionary attack is a type of brute-force attack, in which the attacker tries to find our password by trying one guess after another. It is a much more sophisticated variant, though, because it doesn’t try passwords completely at random but draws on knowledge and typical habits. The attacker, or rather his software, tries common words as passwords, along with information he knows about the holder of the account he wants to crack.

The problem with hashing is that some users choose the same passwords. And if we apply the same hashing method to the same password, the same hash comes out. So an attacker who gains access to the password database can look for users with identical hashes, because those users have the same password. The dictionary attack is also insidious in that the attacker can use the same hashing method to precompute the hashes of the most commonly used passwords, and then simply compare his list of known hashes with the hashes he finds in the database. Simply put, hashing alone won’t prevent an attacker from figuring out the most commonly used passwords.
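A minimal sketch of this weakness, assuming the site used unsalted SHA-256: the `leaked` table, the user names and the short password list are all made up for illustration.

```python
import hashlib


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table of login -> unsalted hash.
leaked = {
    "alice": sha256_hex("123456"),
    "bob": sha256_hex("S7#kq!vT2x"),
    "carol": sha256_hex("123456"),
}

# The attacker hashes a list of common passwords once, ahead of time ...
common_passwords = ["password", "123456", "qwerty", "letmein"]
lookup = {sha256_hex(p): p for p in common_passwords}

# ... and then only has to compare hashes, without guessing per account.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': '123456', 'carol': '123456'}

# Identical passwords are visible even without the list: the hashes match.
print(leaked["alice"] == leaked["carol"])  # True
```

Only the uncommon password survives; the two users with the popular one are exposed at once.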

Now comes the next level of security, called salted hashing. This method works by appending a string of characters, the salt, to each password before hashing. The salt does not have to be secret; some sites use the user’s login name for it, for example, although a randomly generated string is the safer choice, since login names are predictable and tend to repeat across sites.

What matters is that the salt is different for each user. In hashing, even a single changed character produces a completely different hash, so adding something to the password before hashing makes the resulting hash completely different and unpredictable. Even users with identical passwords will have them stored under different hashes. For a legitimate login, the only extra step is that the user’s salt is added to the entered password before it is hashed and compared with the stored hash.
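Salting can be sketched with Python’s standard library as follows. Note one substitution on our part: instead of appending the salt and applying a single bare hash, the sketch uses `hashlib.pbkdf2_hmac`, which takes the salt as a parameter and repeats the hashing many times. The iteration count, the 16-byte salt length and the function names are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted hash via PBKDF2-HMAC-SHA256 (our substitution for a bare hash)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def new_record(password: str) -> tuple[bytes, bytes]:
    """Create what the site would store: a fresh random salt and the hash."""
    salt = secrets.token_bytes(16)  # different for every user
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Login check: re-hash the entered password with the user's own salt."""
    return hmac.compare_digest(stored, hash_password(password, salt))


# Two users pick the same password but get different salts.
salt_a, hash_a = new_record("123456")
salt_b, hash_b = new_record("123456")

print(hash_a != hash_b)                  # True: same password, different hashes
print(verify("123456", salt_a, hash_a))  # True
print(verify("123456", salt_b, hash_a))  # False: wrong salt for this record
```

The salt is stored next to the hash in plain form; its job is to be unique per user, not to be secret.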

Using a salt thus protects better against dictionary attacks: a list of hashes precomputed for common passwords no longer matches anything in the database, and the attacker can’t easily tell which users share a password. Of course, he can add each user’s salt to the known passwords and hash them again, but he has to repeat that for every account, which is a lot of extra work. Although this is not bulletproof protection, an attacker is likely to be at least discouraged from trying to crack the passwords, and will have a much harder time finding the hashes of even the most widely used ones.

We will never properly know the security of websites

Beware, though, that as users we never actually see how websites work under the hood. Whether a service hashes its passwords or not, we have little way of knowing. The second problem is that even hashing methods can be broken over time. What looks impossible today may one day be solved mathematically in a way that makes it easy for attackers to recover the original password from a hash.

Hashing is simply another layer of security for our accounts and passwords, but it is not all-powerful: passwords can still leak in some other way, and the hashes in question may be cracked over time. Thus, if we learn that a service we use has leaked data, we should change our password on that service, along with any similar passwords we use elsewhere, even if the leaked password was only in hashed form.
