Hash Functions¶

What is a Hash Function?¶

A cryptographic hash function takes an input of arbitrary length and produces a fixed-size output (the hash or digest).

flowchart LR
    A(["Input\n(any size)"]) --> B["Hash\nFunction H(x)"]
    B --> C(["Output\n(fixed size)"])
    A2(["'Hello'"]) --> B2["H(x)"]
    B2 --> C2(["a591a6d4..."])
    A3(["'Hello World'"])  --> B3["H(x)"]
    B3 --> C3(["7f83b165..."])

    style A fill:#7c4dff,color:#fff
    style C fill:#00897b,color:#fff
    style A2 fill:#7c4dff,color:#fff
    style C2 fill:#00897b,color:#fff
    style A3 fill:#7c4dff,color:#fff
    style C3 fill:#00897b,color:#fff

Examples: - MD5: 128-bit output - SHA-1: 160-bit output - SHA-256: 256-bit output - SHA-3: Variable output

Properties of Cryptographic Hash Functions¶

1. Deterministic¶

Same input always produces same output.

\[H(\text{"Hello"}) = \text{"same hash every time"}\]

2. Fast to Compute¶

Should be efficient to compute hash for any input.

3. Pre-Image Resistance (One-Way)¶

Given hash \(h\), it should be computationally infeasible to find any message \(m\) such that:

\[H(m) = h\]

Analogy: Easy to scramble an egg, impossible to unscramble it.

4. Second Pre-Image Resistance (Weak Collision Resistance)¶

Given message \(m_1\), it should be infeasible to find a different message \(m_2\) such that:

\[H(m_1) = H(m_2)\]

Prevents: Attacker finding alternative message with same hash.

5. Collision Resistance (Strong Collision Resistance)¶

It should be infeasible to find any two different messages \(m_1\) and \(m_2\) such that:

\[H(m_1) = H(m_2)\]

Prevents: Attacker creating two messages with same hash.

6. Avalanche Effect¶

Changing one bit in the input should change approximately half the output bits.

Example:

H("Hello")  = a591a6d4...
H("Hella")  = c3d5e7f9...  (completely different!)

Birthday Paradox & Collision Attacks¶

The Birthday Problem¶

In a room of 23 people, there's a 50% chance two share a birthday.

Why? We're looking for any collision, not a specific match.

Birthday Attack on Hash Functions¶

For an \(n\)-bit hash: - Brute force pre-image: \(2^n\) attempts - Find collision: \(2^{n/2}\) attempts (much easier!)

Example: SHA-256 (256 bits) - Pre-image attack: \(2^{256}\) attempts - Collision attack: \(2^{128}\) attempts

Implication: Hash must be twice as long as desired security level.

Common Hash Functions¶

MD5 (Message Digest 5)¶

Output: 128 bits

BROKEN — Do not use!

Collisions found in 2004. Can generate collisions in seconds.

Still used for: Checksums (non-security), legacy systems

SHA-1 (Secure Hash Algorithm 1)¶

Output: 160 bits

DEPRECATED — Phase out!

Practical collision demonstrated in 2017 (Google's SHAttered attack).

Transition: Move to SHA-2 or SHA-3

SHA-2 Family¶

Variants: - SHA-224: 224 bits - SHA-256: 256 bits (most common) - SHA-384: 384 bits - SHA-512: 512 bits

SECURE — Widely used

Structure: Merkle-Damgård construction

Applications: - Bitcoin/blockchain - TLS/SSL certificates - Digital signatures - Password hashing (with salt)

SHA-3 (Keccak)¶

Output: Variable (224, 256, 384, 512 bits)

SECURE — Newest standard

Structure: Sponge construction (different from SHA-2)

Note: SHA-3 is not a replacement for SHA-2, it's an alternative.

Hash Function Structure¶

Merkle-Damgård Construction (SHA-1, SHA-2)¶

flowchart LR
    M["Message"] --> P["Padding\n+ length"]
    P --> B1["Block 1"]
    P --> B2["Block 2"]
    P --> B3["Block 3"]

    IV(["IV"]) --> C1["Compress"]
    B1 --> C1
    C1 --> C2["Compress"]
    B2 --> C2
    C2 --> C3["Compress"]
    B3 --> C3
    C3 --> H(["Hash"])

    style IV fill:#7c4dff,color:#fff
    style H fill:#00897b,color:#fff

Vulnerability: Length extension attacks

Sponge Construction (SHA-3)¶

flowchart LR
    subgraph Absorbing
        B1["Block 1"] --> A1["Absorb\n+ Permute"]
        B2["Block 2"] --> A2["Absorb\n+ Permute"]
        B3["Block 3"] --> A3["Absorb\n+ Permute"]
        A1 --> A2 --> A3
    end
    subgraph Squeezing
        A3 --> S1["Squeeze"]
        S1 --> H(["Hash Output"])
    end

    style H fill:#00897b,color:#fff

Advantage: No length extension attacks

Applications of Hash Functions¶

1. Data Integrity¶

Verify files haven't been modified.

Example: Download verification

# Publisher provides: file.zip + SHA-256 hash
$ sha256sum file.zip
a591a6d4...  file.zip

# Compare with published hash

2. Password Storage¶

NEVER store passwords in plaintext!

Store \(H(\text{password})\) instead.

Login verification: 1. User enters password 2. Compute \(H(\text{password})\) 3. Compare with stored hash

Problem: Rainbow tables (pre-computed hashes)

Solution: Add salt (random value)

\[H(\text{password} || \text{salt})\]

3. Digital Signatures¶

Sign the hash of a message, not the message itself.

\[\text{Signature} = \text{Sign}(H(\text{message}))\]

Why? - Faster (hash is small) - RSA can only sign limited data - Ensures integrity

4. Proof of Work (Blockchain)¶

Find input such that hash has specific properties.

Bitcoin mining: Find nonce such that:

\[H(\text{block} || \text{nonce}) < \text{target}\]

Must start with certain number of zeros.

5. Commitment Schemes¶

Commit to a value without revealing it.

Process: 1. Commit: Publish \(H(\text{value} || \text{nonce})\) 2. Reveal: Show value and nonce 3. Verify: Others compute hash to confirm

Message Authentication Codes (MAC)¶

What is a MAC?¶

A keyed hash - ensures message came from someone with the secret key.

\[\text{MAC}_K(m) = \text{tag}\]

Properties: - Requires secret key - Provides authentication and integrity - Does NOT provide confidentiality

HMAC (Hash-based MAC)¶

flowchart LR
    K["Secret Key K"] --> IP["K ⊕ ipad"]
    K --> OP["K ⊕ opad"]
    M["Message m"] --> IP
    IP --> H1["Hash"]
    H1 --> OP
    OP --> H2["Hash"]
    H2 --> T(["HMAC Tag"])

    style T fill:#00897b,color:#fff

\[\text{HMAC}_K(m) = H((K \oplus \text{opad}) \| H((K \oplus \text{ipad}) \| m))\]

Why two hashes? - Prevents length extension attacks - Stronger security proof

Password Hashing¶

BAD: Plain Hash¶

# DON'T DO THIS!
password_hash = sha256(password)

Problems: - Rainbow tables - Same password → same hash - Too fast (GPUs can try billions/second)

BETTER: Hash + Salt¶

salt = random_bytes(16)
password_hash = sha256(password + salt)
# Store: salt + password_hash

Improvements: - Each user has different salt - Rainbow tables won't work - Still too fast

BEST: Key Derivation Functions (KDF)¶

Use functions designed for password hashing:

bcrypt: - Adjustable work factor - Built-in salt - Slow by design

scrypt: - Memory-hard (GPU-resistant) - Configurable parameters

Argon2: - Winner of Password Hashing Competition - Memory-hard - Side-channel resistant - Recommended for new systems

import argon2

hasher = argon2.PasswordHasher()
hash = hasher.hash("my_password")
# Verify
hasher.verify(hash, "my_password")  # Returns True

Attacks on Hash Functions¶

1. Brute Force¶

Try random inputs until hash matches.

Defense: Use long hash (256+ bits)

2. Birthday Attack¶

Find any collision (easier than pre-image).

Complexity: \(O(2^{n/2})\) for \(n\)-bit hash

Defense: Hash must be 2× desired security level

3. Rainbow Tables¶

Pre-computed hash tables.

Defense: Use salts (makes tables impractical)

4. Length Extension Attack¶

For Merkle-Damgård hashes, given \(H(m)\), can compute \(H(m || m')\) without knowing \(m\).

Vulnerable: MD5, SHA-1, SHA-2

Defense: Use HMAC or SHA-3

Choosing a Hash Function¶

For Integrity/Checksums:¶

SHA-256 (industry standard)
SHA-3 (alternative)

For Digital Signatures:¶

SHA-256 or SHA-512
Match hash strength to signature key strength

For Password Hashing:¶

Argon2 (best)
bcrypt (good)
scrypt (good)
NEVER use MD5, SHA-1, or plain SHA-256

For Blockchain/PoW:¶

SHA-256 (Bitcoin)
Others use various algorithms

Key Takeaways¶

Hash functions are one-way, fixed-size outputs
Three main properties: pre-image resistance, second pre-image resistance, collision resistance
MD5 and SHA-1 are broken - don't use for security
SHA-256 is the standard for most applications
Birthday attack makes collisions easier (\(2^{n/2}\) vs \(2^n\))
Use salts for password storage
Use KDFs (Argon2, bcrypt) not plain hashes for passwords
HMAC provides authenticated integrity