SHA stands for Secure Hash Algorithm. There are four members of the SHA family (SHA-0, SHA-1, SHA-2 and SHA-3), some with sub-variants (SHA-256, etc.). As the name implies, these are algorithms for securely hashing data.
What is hashing?
Hashing is an algorithm that applies a mathematical function to a set of data, and condenses it to a fixed size. For a given data input, the resulting output is known as a hash.
Cryptographic hashing algorithms need to provide deterministic, effectively unique, and irreversible hashes. Deterministic means that the same input gives us the same output every time. Unique means that it should be computationally infeasible to find two inputs that give us the same output. Because the output has a fixed size and the inputs do not, collisions must exist in principle, but a secure algorithm makes them impractical to find. Lastly, it needs to be irreversible, meaning that if someone had the hash, they would not be able to reproduce the original data. In other words, hashing is a one-way function. This keeps the original data secret.
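To make these properties concrete, here is a minimal sketch using Python's standard hashlib module with SHA-256. The helper name sha256_hex and the sample inputs are made up for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"hello")
again = sha256_hex(b"hello")
changed = sha256_hex(b"hellp")              # one character different
long_input = sha256_hex(b"hello" * 10_000)  # a much larger input

print(short == again)                # True: same input, same digest
print(short == changed)              # False: a tiny change alters the digest
print(len(short), len(long_input))   # 64 64: fixed size regardless of input length
```

Note that the code can only show that these particular inputs differ. Collision resistance and irreversibility are properties of the algorithm's design, not something a short script can prove.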
What is hashing used for?
What might one use hashing for? A number of things…
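Hopefully you aren’t storing your passwords in plaintext, and are instead (salting and) hashing them, and storing the hashes in the database. Then, when a user enters their password, you hash it with the same salt and compare the result to the stored hash to see if their password is valid. One caveat: the SHA functions are designed to be fast, which helps an attacker guess passwords quickly, so password storage usually relies on a deliberately slow scheme built for the purpose, such as PBKDF2, bcrypt, scrypt or Argon2.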
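As a rough sketch of the salt-and-hash flow, the example below uses PBKDF2 with SHA-256 from Python's hashlib. The function names, the 16-byte salt, and the iteration count are assumptions chosen for illustration, not recommendations from any particular standard.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected_digest)

# Store salt and digest at sign-up, then check them at login.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt is stored alongside the digest. It is not secret, but it means two users with the same password end up with different stored hashes.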
Additionally, you could use hashing to verify files or messages. You can hash a file or message before and after transmission to be sure that there was no tampering.
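A minimal sketch of that before-and-after check, assuming SHA-256 and a throwaway temporary file standing in for the transmitted file. The helper name sha256_of_file is hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Create a small file to stand in for the real one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    path = tmp.name

before = sha256_of_file(path)
unchanged = sha256_of_file(path)

# Simulate tampering by appending a single byte.
with open(path, "ab") as f:
    f.write(b"!")
after_tampering = sha256_of_file(path)
os.remove(path)

print(before == unchanged)        # True
print(before == after_tampering)  # False
```

This only detects tampering if the reference hash itself reaches you through a channel the attacker cannot modify.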
Hashes are also sometimes used to prove work, like in some cryptocurrency systems.
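The idea behind proving work can be sketched as a toy puzzle: keep trying nonces until the hash of the data plus the nonce starts with a given number of zeros. This is a simplified illustration, not the scheme of any specific cryptocurrency. The function name, payload, and difficulty are made up.

```python
import hashlib
from itertools import count

def find_nonce(data: bytes, difficulty: int) -> int:
    """Return the first nonce whose SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce

payload = b"block payload"
nonce = find_nonce(payload, difficulty=4)

# Finding the nonce takes many attempts, but checking it takes a single hash.
proof = hashlib.sha256(payload + str(nonce).encode()).hexdigest()
print(nonce, proof)
```

Each extra zero of difficulty multiplies the expected number of attempts by 16, while verification stays a single hash.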
Hashing can be found in a number of protocols as well, like TLS, PGP, SSH, and so on.
SHA-0 was first published in 1993, but had a relatively short life, as a security vulnerability was discovered, and it fell out of use by 1995 (when it was replaced by SHA-1). Starting in 1998 and continuing on through 2008, a number of full- or near-collisions were found. As mentioned earlier, it should be infeasible to find two inputs with the same hash. A collision means that two different inputs produce the same output. No bueno.
SHA-1 was published in 1995, designed by the NSA. It was considered secure for about 10 years; since 2005 it has been considered breakable by an attacker with sufficient computing power and money. Most organizations had moved on to SHA-2 by 2010, and many browsers finally stopped supporting SHA-1 SSL certs in 2017.
For a given input, the SHA-1 algorithm produces a 160-bit hash, or message digest, as its output.
In 2017, CWI and Google announced they found a SHA-1 collision.
Another NSA-designed family of algorithms, SHA-2 was introduced in 2001 to replace SHA-0 and SHA-1.
It is a family of six functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. Their digests (hash outputs) vary in length, which can be determined from the name: 224, 256, 384, 512, 224 and 256 bits, respectively.
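The digest sizes can be checked with Python's hashlib, which reports digest_size in bytes. This sketch covers only the four variants hashlib always provides. Whether SHA-512/224 and SHA-512/256 are available depends on the underlying OpenSSL build, so they are left out here.

```python
import hashlib

sizes_in_bits = {}
for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, b"example")
    sizes_in_bits[name] = h.digest_size * 8  # digest_size is in bytes
    print(name, sizes_in_bits[name], "bits,", len(h.hexdigest()), "hex characters")
```

Each hex character encodes 4 bits, so a 256-bit digest prints as 64 hex characters.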
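SHA-2 is vulnerable to a length extension attack, most directly in SHA-256 and SHA-512. An attacker who knows the hash of a message and the length of that message, but not necessarily its content (for example, when it begins with a secret key), can compute a valid hash for the original message with additional data appended, and that hash still passes inspection. The truncated variants are more resistant, since their digests do not reveal the full internal state.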
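Length extension matters mainly when a hash is used to authenticate a message by hashing a secret key followed by the message. A common way to avoid the problem is HMAC, which Python provides in the standard hmac module. The key and message below are hypothetical placeholders.

```python
import hashlib
import hmac

key = b"hypothetical-shared-secret"
message = b"amount=100&to=alice"

# HMAC mixes the key in twice, so appending to the message cannot produce a valid tag.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def is_authentic(key, message, tag):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(is_authentic(key, message, tag))                   # True
print(is_authentic(key, message + b"&to=mallory", tag))  # False
```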
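Finally, SHA-3 is another family of hashes (SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, SHAKE256). It was standardized in 2015 and is based on the Keccak algorithm, created by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche (so, not the NSA). The two SHAKE functions are extendable-output functions, meaning the caller chooses the output length.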
Because SHA-3 has a different mathematical basis (“sponge construction”) than SHA-2 (Merkle-Damgård), it isn’t susceptible to length extension attacks. Unlike SHA-2, SHA-3 wasn’t created to replace earlier versions.
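Python's hashlib also includes the SHA-3 family, so the difference between the fixed-length functions and the SHAKE functions is easy to see. The input and the output lengths requested below are arbitrary choices for illustration.

```python
import hashlib

data = b"example"

# Fixed-length SHA-3: the digest size is part of the function name.
fixed = hashlib.sha3_256(data).hexdigest()

# SHAKE: the caller chooses the output length (given here in bytes).
short_xof = hashlib.shake_128(data).hexdigest(16)
long_xof = hashlib.shake_128(data).hexdigest(32)

print(len(fixed))                      # 64 hex characters = 256 bits
print(len(short_xof), len(long_xof))   # 32 64
print(long_xof.startswith(short_xof))  # True: longer output extends the shorter one
```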
Try it out for yourself! On a Unix command line, you can use shasum to test out different SHA variants. You can input a file, or use standard input. The -a option denotes the algorithm. More info here.
echo "The old SHA-1 is dead, long live the new SHA-2" | shasum -a 1 outputs
echo "The old SHA-1 is dead, long live the new SHA-2" | shasum -a 256 outputs
As expected, they’re unique, and are also different lengths according to the SHA type. Neat!
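For comparison, roughly the same check can be done in Python with hashlib. One assumption to watch: echo normally appends a newline to its output, so the bytes being hashed need a trailing "\n" for the digests to line up with what shasum prints.

```python
import hashlib

# echo appends a newline, so it is included here to hash the same bytes.
text = b"The old SHA-1 is dead, long live the new SHA-2\n"

sha1_hex = hashlib.sha1(text).hexdigest()
sha256_hex = hashlib.sha256(text).hexdigest()

print(sha1_hex)    # 40 hex characters (160 bits)
print(sha256_hex)  # 64 hex characters (256 bits)
```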