Danni-Tech

Making the Complicated Simple

Key Points from This Video on Hashing:

  • What is Hashing?
    Hashing is the process of running data through a mathematical algorithm to produce a fixed-size numeric value that represents the original data. This value is often called a fingerprint, digest, or simply a hash.

  • Simple Example of Hashing:

    • The word “friend” is hashed by replacing each letter with its alphabetical position (f=6, r=18, i=9, e=5, n=14, d=4) and adding the positions together, which gives a hash of 56.
    • Even a small change (e.g., removing the “n” to form “fried”) produces a different hash value, 42, demonstrating how hashes detect data changes.
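The letter-position example can be sketched in a few lines of Python. The function name `letter_sum_hash` is made up for illustration, and the video does not say how characters other than letters are treated, so this sketch simply ignores them.

```python
def letter_sum_hash(word: str) -> int:
    """Toy hash: add up the alphabetical positions (a=1 ... z=26) of the letters."""
    total = 0
    for ch in word.lower():
        if "a" <= ch <= "z":  # assumption: anything that is not a basic Latin letter is skipped
            total += ord(ch) - ord("a") + 1
    return total


print(letter_sum_hash("friend"))  # 56
print(letter_sum_hash("fried"))   # 42
print(letter_sum_hash("finder"))  # 56, because an anagram adds up to the same total
```

The last line shows why this toy scheme is only a teaching aid: rearranging the letters leaves the hash unchanged.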
  • Purpose of Hashing:
    Hashing is commonly used to verify data integrity by detecting changes in the original data.
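As a minimal sketch of an integrity check, the digest of the original data is recorded once and later compared with the digest of whatever was received. The message contents and the helper name `sha512_hex` are illustrative assumptions, not taken from the video.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a hexadecimal string."""
    return hashlib.sha512(data).hexdigest()


original = b"transfer 100 to alice"
recorded_digest = sha512_hex(original)  # stored or published alongside the data

received_ok = b"transfer 100 to alice"
received_bad = b"transfer 900 to alice"  # a single character was altered

unchanged = sha512_hex(received_ok) == recorded_digest
tampered = sha512_hex(received_bad) != recorded_digest

print(unchanged)  # True
print(tampered)   # True
```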

  • Collisions in Hashing:

    • A collision occurs when two different inputs produce the same hash value.
    • Collisions are unavoidable because hash functions generate a fixed-size digest: an n-bit digest allows only 2^n possible outputs, while the number of possible inputs is unlimited.
  • Example of a Collision:

    • Using a simple 2-bit hashing algorithm (SimpleHash2), there are only 4 possible outputs: 00, 01, 10, 11.
    • With more inputs than outputs, a collision is guaranteed: as soon as five different messages are hashed, at least two of them must share one of the 4 possible hashes.
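The 2-bit case is small enough to try out directly. The video does not describe how SimpleHash2 works internally, so the `simple_hash2` function below is a hypothetical stand-in that keeps the lowest 2 bits of the sum of the message bytes. Any function with only 4 possible outputs would behave the same way once five distinct messages are hashed.

```python
def simple_hash2(message: str) -> str:
    """Hypothetical 2-bit hash: lowest 2 bits of the sum of the UTF-8 bytes."""
    total = sum(message.encode("utf-8"))
    return format(total & 0b11, "02b")


seen = {}        # digest -> first message that produced it
collisions = []  # (earlier message, later message, shared digest)

for message in ["a", "b", "c", "d", "e"]:
    digest = simple_hash2(message)
    if digest in seen:
        collisions.append((seen[digest], message, digest))
    else:
        seen[digest] = message

print(collisions)  # [('a', 'e', '01')]
```

The first four messages happen to fill all four slots, so the fifth has nowhere left to go.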
  • Common Hashing Algorithms:

    • MD5 – 128-bit digest
    • SHA-1 – 160-bit digest
    • SHA-224 – 224-bit digest
    • SHA-384 – 384-bit digest
    • SHA-512 – 512-bit digest
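The digest sizes listed above can be checked with Python's standard `hashlib` module, which reports each algorithm's `digest_size` in bytes. This is a small sketch that assumes all five algorithms are enabled in the local Python build (MD5 may be disabled on some hardened systems).

```python
import hashlib

algorithms = ["md5", "sha1", "sha224", "sha384", "sha512"]

# digest_size is given in bytes, so multiply by 8 to get bits
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in algorithms}

for name, bits in digest_bits.items():
    print(f"{name}: {bits}-bit digest")
```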
  • Demonstration Using Linux (Ubuntu via WSL):

    • The echo command is used to send data to standard output:
      echo "friend"
    • That output can be piped into a hashing utility such as sha224sum:
      echo "friend" | sha224sum
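    • By default, echo appends a newline character, which becomes part of the hashed data and changes the digest. To hash only the word itself, suppress the newline with the -n flag: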
      echo -n "friend" | sha224sum
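The effect of the trailing newline can also be reproduced outside the shell. The sketch below uses Python's `hashlib` and assumes that `echo "friend"` emits the six letters followed by a single `\n` byte, which is the usual behaviour in bash on Ubuntu.

```python
import hashlib

# Roughly what `echo "friend" | sha224sum` hashes: the word plus a newline
with_newline = hashlib.sha224(b"friend\n").hexdigest()

# Roughly what `echo -n "friend" | sha224sum` hashes: the word alone
without_newline = hashlib.sha224(b"friend").hexdigest()

print(with_newline)
print(without_newline)
print(with_newline == without_newline)  # False
```

Note that sha224sum prints the digest followed by a file name column (`-` for standard input), whereas `hexdigest()` returns only the digest.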
  • Key Observations from the Demo:

    1. Hashes are irreversible – The digest is produced by a one-way function, so there is no practical way to compute the original message from the hash alone.
    2. Small changes produce drastically different hashes – Even a tiny modification alters the entire digest.
    3. Fixed-length output – Regardless of the input size, the hash output is always a fixed length.
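Observations 2 and 3 can be checked with a short sketch using SHA-224 from Python's `hashlib`. The helper name `sha224_bytes` and the sample inputs are illustrative choices. The code counts how many bits differ between the digests of “friend” and “fried”, then compares digest lengths for a very short and a very long input.

```python
import hashlib


def sha224_bytes(data: bytes) -> bytes:
    """Return the raw 28-byte SHA-224 digest of data."""
    return hashlib.sha224(data).digest()


# Observation 2: a one-letter change alters bits throughout the digest
a = sha224_bytes(b"friend")
b = sha224_bytes(b"fried")
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
print(f"{differing_bits} of {len(a) * 8} bits differ")

# Observation 3: the digest length does not depend on the input size
short_len = len(sha224_bytes(b"f"))
long_len = len(sha224_bytes(b"friend" * 100_000))
print(short_len, long_len)  # 28 28
```

For a well-behaved hash, the number of differing bits is typically close to half of the digest length.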