Docker & Containerization - Packaging Code So It Runs the Same Everywhere // Megha Bose

“Works on my machine.”

The phrase that ended careers. The phrase that made release day a negotiation between developers and operations. The phrase that, more than any other in software engineering, represents the gap between writing code and running it reliably.

Solomon Hykes demoed Docker at PyCon 2013 with five minutes of live coding and a container that started in seconds. By the end of that year, GitHub was flooded with Dockerfiles. By 2016, “containerize your app” was standard engineering advice. By 2020, if you weren’t using containers in production, you were explaining why.

The “works on my machine” problem is a packaging problem. An application depends on a specific Python version, a specific OpenSSL build, a specific set of shared libraries, environment variables set just so. The developer’s machine has those things. The CI server has different versions. Production has a third configuration. The application works somewhere and fails somewhere else, and debugging the difference is miserable.

Docker’s answer: package the application with everything it needs. Not just the code - the runtime, the libraries, the configuration. The image is the artifact that ships, not the source code.

What Containers Actually Are

The common explanation is wrong: containers are not lightweight VMs. A VM virtualizes hardware - it runs a complete operating system, including a kernel, on top of a hypervisor. Each VM is isolated at the hardware level.

A container shares the host operating system’s kernel. There is no second kernel. Isolation comes from two Linux kernel features that had existed for years before Docker made them accessible:

Namespaces partition kernel resources so that processes inside a namespace see only what belongs to that namespace. There are several namespace types, each isolating a different resource:

PID namespace: the container’s process tree starts at PID 1, independent of the host’s PID space. The container’s init process has no visibility into host processes.
Network namespace: the container has its own network interfaces, routing table, and port space. Port 8080 in the container is not port 8080 on the host until explicitly mapped.
Mount namespace: the container has its own filesystem view. The host’s filesystem is not visible inside the container unless explicitly mounted.
User namespace: user IDs inside the container map to different user IDs on the host. Container root (UID 0) can map to an unprivileged user on the host.
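
The namespaces a process belongs to are visible on Linux as symlinks under /proc/<pid>/ns, each with a target such as pid:[4026531836]. The following is a minimal sketch, assuming a Linux host, that reads those links and compares two processes. The function names are illustrative and not part of any Docker API.

```python
import os
from pathlib import Path


def read_namespaces(ns_dir: str = "/proc/self/ns") -> dict[str, str]:
    """Map namespace type (pid, net, mnt, user, ...) to its identifier.

    Each entry in /proc/<pid>/ns is a symlink whose target looks like
    'pid:[4026531836]'. Two processes share a namespace exactly when
    their targets for that type are equal.
    """
    namespaces = {}
    for entry in sorted(Path(ns_dir).iterdir()):
        if entry.is_symlink():
            namespaces[entry.name] = os.readlink(entry)
    return namespaces


def shared_namespaces(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return the namespace types that two processes have in common."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])
```

Calling read_namespaces() on the host and read_namespaces("/proc/<pid>/ns") for a containerized process (which may require root) and passing both to shared_namespaces shows which resources the container still shares with the host.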

cgroups (control groups) limit and account for resource usage. A cgroup can enforce that a container uses at most 2 CPU cores and 512MB of RAM. CPU use beyond the quota is throttled; a container that exceeds its memory limit has processes killed by the kernel’s out-of-memory killer. This prevents one container from starving others on the same host.
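
On hosts using cgroup v2, these limits are exposed as plain text files: cpu.max holds a quota and period in microseconds, and memory.max holds a byte count, with the literal max meaning unlimited. A minimal sketch of reading them, assuming cgroup v2 and a process that sees its own cgroup at /sys/fs/cgroup (as is typical inside a container); the helper names are illustrative:

```python
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max ('<quota> <period>', in microseconds).

    Returns the limit in CPU cores, or None when the quota is 'max'.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text: str) -> Optional[int]:
    """Parse cgroup v2 memory.max: a byte count, or 'max' for unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_limits(cgroup_dir: str = "/sys/fs/cgroup") -> dict:
    """Read the CPU and memory limits of the cgroup at cgroup_dir."""
    root = Path(cgroup_dir)
    return {
        "cpu_cores": parse_cpu_max((root / "cpu.max").read_text()),
        "memory_bytes": parse_memory_max((root / "memory.max").read_text()),
    }
```

A container limited to 2 cores and 512MB would report a cpu.max of 200000 100000 and a memory.max of 536870912.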

Together, namespaces and cgroups are the two kernel primitives that make containers work. Docker (and other runtimes) are orchestration layers that make it easy to create and manage namespace/cgroup configurations. The kernel does the actual isolation.

This distinction matters: containers are not as isolated as VMs. The host kernel surface is exposed to all containers. A kernel vulnerability can be exploited from inside a container to affect other containers or the host. Container escapes are real, documented, and have appeared in CVEs. This is not hypothetical.

The History That Made Docker Possible

1979: chroot. The chroot syscall changes the apparent root directory for a process, isolating its filesystem view. It appeared in Version 7 Unix and was later used to build and test software in a clean environment. It’s the conceptual ancestor of containers, though it provides no resource isolation and is not a true security boundary.

2002: Linux namespaces. The Linux kernel gains mount namespace support, followed over the next decade by UTS, IPC, PID, network, and user namespaces. These are the fundamental building blocks.

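2007: cgroups. Google engineers (Paul Menage and Rohit Seth) contribute cgroups to the Linux kernel. The work began in 2006 under the name “process containers” and was renamed “control groups” to avoid confusion with other uses of the word “container”; it was merged for kernel 2.6.24. Google had been isolating its internal workloads with similar technology for years. cgroups let you limit, account for, and isolate resource usage per process group.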

2008: LXC (Linux Containers) combines namespaces and cgroups into a container format. It’s usable but requires significant manual configuration.

2013: Docker. Docker packages LXC (later replaced with its own runtime, libcontainer) into a developer-friendly tool. The critical innovations: the Dockerfile (a reproducible, version-controlled recipe for building an image), the layered image format (enabling fast incremental builds and efficient storage), and Docker Hub (a public registry making it trivial to share and distribute images). Docker made containers accessible to people who had never heard of namespaces.

2015: OCI. The Open Container Initiative standardizes the container image format and runtime interface. Docker donates its image format and its runtime, runc, as the starting point. Now any OCI-compliant runtime can run any OCI-compliant image. Docker is no longer the runtime - it’s a runtime.

Image Layers and Union Filesystems

A Docker image is not a monolithic blob. It’s a stack of read-only layers. Each Dockerfile instruction that modifies the filesystem (RUN, COPY, ADD) creates a new layer. The final image is the union of all layers, presented as a single filesystem via a union filesystem driver (OverlayFS is the modern default on Linux).

When a container runs, a thin writable layer is added on top of the read-only image layers. Writes go into the writable layer (copy-on-write). The underlying image layers are never modified.
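
The lookup order of a union filesystem can be modeled in a few lines. This is a toy sketch only: it uses Python’s ChainMap with dictionaries standing in for layers, the file paths and contents are made up, and it does not model deletions (which OverlayFS handles with whiteout entries).

```python
from collections import ChainMap

# Read-only image layers, modeled as path -> content dictionaries.
base_layer = {"/etc/os-release": "debian", "/usr/bin/python3": "<binary>"}
app_layer = {"/app/main.py": "print('v1')"}

# A container's view: its own writable layer first, then the image
# layers from top to bottom. Lookups stop at the first layer with the path.
writable_a = {}
container_a = ChainMap(writable_a, app_layer, base_layer)

# A second container shares the same image layers but gets its own
# writable layer.
writable_b = {}
container_b = ChainMap(writable_b, app_layer, base_layer)

# ChainMap sends writes to the first mapping only, which mirrors how
# a container's changes land in its writable layer.
container_a["/app/main.py"] = "print('patched')"

print(container_a["/app/main.py"])  # patched copy from writable_a
print(container_b["/app/main.py"])  # untouched copy from app_layer
```

The image layers are never mutated, which is why many containers can safely share them.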

Why this matters: layers are cached and shared. If you build an image with a Python base layer, that layer lives once on disk - ten images built on the same Python base all reference the same layer. Rebuilds that haven’t changed a layer use the cached version from disk. A Dockerfile that copies source code (which changes frequently) should put those COPY instructions late, after package installation (which changes infrequently), to maximize cache hits.

The layer order in a Dockerfile has real performance consequences:

# Bad: cache-busting order
COPY . /app
RUN pip install -r /app/requirements.txt

# Good: stable dependencies first
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app

In the bad version, any source code change invalidates the pip install layer. In the good version, pip install is only re-run when requirements.txt changes.
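
Why the order matters can be seen in a simplified model of the build cache. The sketch below assumes each layer’s cache key is a hash of its parent’s key, the instruction text, and a fingerprint of any files the instruction copies; Docker’s real cache logic has more detail, and the function names here are illustrative.

```python
import hashlib


def layer_keys(steps):
    """Compute one cache key per build step.

    Each step is (instruction, content_fingerprint). Every key chains
    the parent key, so a change in one step changes all later keys.
    """
    keys, parent = [], ""
    for instruction, fingerprint in steps:
        raw = f"{parent}|{instruction}|{fingerprint}".encode()
        parent = hashlib.sha256(raw).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(before, after):
    """Return the instructions whose cache key differs between two builds."""
    pairs = zip(after, layer_keys(before), layer_keys(after))
    return [step[0] for step, old, new in pairs if old != new]


def bad_order(src, reqs):
    # COPY . /app includes both the source and requirements.txt.
    return [
        ("COPY . /app", src + reqs),
        ("RUN pip install -r /app/requirements.txt", ""),
    ]


def good_order(src, reqs):
    return [
        ("COPY requirements.txt /app/", reqs),
        ("RUN pip install -r /app/requirements.txt", ""),
        ("COPY . /app", src + reqs),
    ]


# Simulate a source-only change (requirements unchanged).
print(rebuilt_steps(bad_order("src-v1", "reqs-v1"), bad_order("src-v2", "reqs-v1")))
print(rebuilt_steps(good_order("src-v1", "reqs-v1"), good_order("src-v2", "reqs-v1")))
```

In this model the first call reports both steps as rebuilt, while the second reports only the final COPY.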

Multi-Stage Builds

A compiled language (Go, Java, Rust) needs a compiler to build but not to run. A naive Dockerfile installs the compiler, compiles the binary, and ships an image containing both - resulting in images that are hundreds of megabytes larger than necessary, with build tools that increase the attack surface.

Multi-stage builds solve this. You write multiple FROM instructions in one Dockerfile. The final stage copies only the artifacts it needs from earlier stages:

FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o server .

FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

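The builder stage has the Go toolchain, all the source code, and the module cache. The final stage has only the compiled binary and a minimal base image (distroless - no shell, no package manager, no utilities). The static variant used here does not even include libc, only essentials such as CA certificates and timezone data, which is why the build sets CGO_ENABLED=0 to produce a statically linked binary. The resulting image is typically under 20MB instead of hundreds.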

Container Registries

An image built on a developer’s laptop needs to get to the production cluster. Container registries serve this function: push an image to the registry, pull it from any machine with network access.

Docker Hub is the default public registry. Public repositories are free; private repositories beyond a small free allowance require a paid tier. Docker Hub rate-limits pulls for unauthenticated and free-tier users. When those limits took effect in November 2020, pipelines pulling base images from Docker Hub hit them, causing widespread CI/CD failures. It was the wake-up call for organizations to mirror their base images.

AWS ECR (Elastic Container Registry) integrates with IAM - push and pull permissions are IAM policies. Pulls within AWS are governed by ECR’s own service quotas rather than Docker Hub’s pull limits. ECR’s VPC endpoints let EC2 instances and ECS tasks pull images without traversing the public internet. If you’re running on AWS, ECR is the natural choice.

Artifact Registry is Google’s equivalent and the successor to the older Google Container Registry (GCR). GitHub Container Registry (GHCR) lets you host images alongside your source code, with access controlled by GitHub permissions.
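
Which registry an image comes from is encoded in its name. The sketch below approximates the conventions Docker uses to expand a short name: a first path component containing a dot or colon (or equal to localhost) is treated as a registry host, anything else defaults to Docker Hub, and single-word names get a library/ prefix. It is a simplified illustration, not the reference parser, and it does not validate characters.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, tag, digest."""
    name, _, digest = ref.partition("@")

    # A tag is whatever follows the last ':' if that colon comes after
    # the last '/', so a registry port is not mistaken for a tag.
    tag = None
    last_colon = name.rfind(":")
    if last_colon > name.rfind("/"):
        name, tag = name[:last_colon], name[last_colon + 1:]

    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
        if "/" not in repository:
            repository = "library/" + repository

    if tag is None and not digest:
        tag = "latest"
    return ImageRef(registry, repository, tag, digest or None)


print(parse_image_ref("python:3.12"))
print(parse_image_ref("gcr.io/distroless/static-debian12"))
```

Pinning by digest rather than tag is the stricter option, since a tag can later be pointed at a different image.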

Image scanning is a critical CI/CD step. Trivy (open source) and Snyk scan images for known CVEs in base image packages and application dependencies. An unscanned image might be running with a three-year-old version of OpenSSL with twenty known vulnerabilities. Most organizations enforce image scanning as a CI gate.
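
A CI gate of this kind can be a small script that reads the scanner’s JSON report and fails the build above a severity threshold. The sketch below assumes a report shaped like Trivy’s JSON output (a top-level Results list whose entries carry a Vulnerabilities list with VulnerabilityID, PkgName, and Severity fields); treat the field names as an assumption to check against the scanner version in use. The vulnerability IDs in the sample are placeholders, not real CVEs.

```python
SEVERITY_RANK = {"UNKNOWN": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def blocking_findings(report: dict, threshold: str = "HIGH") -> list:
    """Return (id, package, severity) for findings at or above threshold."""
    floor = SEVERITY_RANK[threshold]
    findings = []
    for result in report.get("Results") or []:
        # Targets with no findings may omit the list or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if SEVERITY_RANK.get(severity, 0) >= floor:
                findings.append(
                    (vuln.get("VulnerabilityID", "?"), vuln.get("PkgName", "?"), severity)
                )
    return findings


def gate(report: dict, threshold: str = "HIGH") -> int:
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    findings = blocking_findings(report, threshold)
    for vuln_id, package, severity in findings:
        print(f"BLOCKED {severity}: {vuln_id} in {package}")
    return 1 if findings else 0


# Placeholder report; the IDs below are invented for illustration.
sample_report = {
    "Results": [
        {
            "Target": "app-image",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib", "Severity": "LOW"},
            ],
        },
        {"Target": "clean-layer", "Vulnerabilities": None},
    ]
}
```

In a pipeline, the script would load the report with json.load and pass gate’s return value to sys.exit.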

The Security Reality

Containers share the host kernel. This is the fact that makes container security different from VM security.

A virtual machine’s guest kernel is isolated from the host kernel by the hypervisor. A kernel exploit inside a VM affects only that VM (and possibly the hypervisor, but hypervisor exploits are separately analyzed). A kernel exploit from inside a container affects the host kernel - and potentially all other containers on the same host.

This is why container escapes are possible. CVE-2019-5736 allowed a process running as root in a container to overwrite the host’s runc binary through mishandling of the /proc/self/exe file descriptor during container exec. CVE-2020-15257 let containers running with host networking reach the containerd shim API and escalate to the host. These are not theoretical; they are CVEs with proof-of-concept exploits.

gVisor is Google’s response. gVisor interposes a user-space kernel (written in Go) between the container and the host kernel. Container syscalls go to gVisor, which reimplements the Linux syscall interface in user space, forwarding only a limited set to the host kernel. This dramatically reduces the host kernel attack surface. Google uses gVisor in products such as GKE Sandbox and the first-generation Cloud Run execution environment.

Firecracker is Amazon’s response for their Lambda and Fargate products. Firecracker is a microVM manager: lightweight VMs that boot in as little as 125ms with less than 5MB of memory overhead each. Lambda functions actually run in Firecracker microVMs, not in containers - you get VM-level kernel isolation with near-container startup speed. This is the isolation model for untrusted workloads.

Docker’s Ecosystem Complications

Docker Desktop changed its licensing terms in August 2021, with enforcement starting at the end of January 2022: free for personal use and small businesses, paid subscription required for larger ones (companies with more than 250 employees or more than $10M in annual revenue). This triggered significant evaluation of alternatives.

Podman is the primary alternative. Podman is daemonless - it doesn’t run a background daemon process (Docker, by default, requires dockerd running as root). Each podman command runs directly as the calling user. Run by an unprivileged user, Podman is rootless: container processes run without root privileges on the host even if they’re root inside the container. Its CLI is largely compatible with Docker’s (many organizations alias docker=podman). Red Hat distributes Podman as their preferred container tool.

containerd is the underlying runtime that Docker uses (Docker is a high-level tool that calls containerd). Kubernetes deprecated its Docker integration (dockershim) in v1.20 (December 2020) and removed it in v1.24 (May 2022) in favor of direct integration with runtimes such as containerd. The Kubernetes ecosystem moved on; Docker’s high-level tooling is still useful for development but is no longer in the container runtime critical path for production clusters.

OCI standardization means none of this is vendor lock-in. An OCI-compliant image built by Docker runs on containerd, runs on Podman, runs on Kubernetes. The ecosystem is standardized at the image format level.

The Path to Kubernetes

A single container on a single host is straightforward. Production means many containers across many hosts, with requirements that no single container tool handles:

Scheduling containers onto hosts with available capacity
Replacing failed containers automatically
Rolling updates without downtime
Service discovery (containers need to find each other without hardcoded IPs)
Load balancing across container replicas
Secret management (environment variables with credentials, injected securely)
Persistent storage (databases need disks that survive container restarts)

This is container orchestration, and Kubernetes is the answer the industry converged on after a brief period of competition (Docker Swarm, Mesos, Nomad). AWS ECS is the AWS-native alternative - simpler to operate, tightly integrated with ALB, IAM, and ECR, less powerful than Kubernetes. ECS is a good choice for organizations that want managed container infrastructure without Kubernetes' learning curve.
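
The first two requirements in the list above, scheduling onto hosts with capacity and replacing failed containers, come down to a reconciliation loop: compare the desired state with the observed state and act on the difference. The sketch below is a toy model of that idea, with capacity reduced to a container count per host and made-up container names. It is not how Kubernetes or ECS is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                       # max containers; stand-in for CPU/RAM
    running: list = field(default_factory=list)


def reconcile(hosts: list, desired: int, failed: set) -> list:
    """Drive the cluster toward `desired` healthy replicas.

    Removes failed containers, then starts replacements on the host
    with the most free capacity. Returns a log of the actions taken.
    """
    actions = []
    for host in hosts:
        for cid in [c for c in host.running if c in failed]:
            host.running.remove(cid)
            actions.append(f"remove {cid} from {host.name}")

    existing = {c for h in hosts for c in h.running}
    next_id = 0
    while len(existing) < desired:
        target = max(hosts, key=lambda h: h.capacity - len(h.running))
        if target.capacity - len(target.running) <= 0:
            actions.append("unschedulable: no capacity")
            break
        # Pick a container name not in use and not just removed.
        while f"web-{next_id}" in existing or f"web-{next_id}" in failed:
            next_id += 1
        cid = f"web-{next_id}"
        target.running.append(cid)
        existing.add(cid)
        actions.append(f"start {cid} on {target.name}")
    return actions
```

An orchestrator runs a loop like this continuously, so a crashed container or a lost host is just another difference to reconcile.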

EKS (Elastic Kubernetes Service) runs Kubernetes on AWS. GKE (Google Kubernetes Engine) is the Google Cloud equivalent - GKE is often regarded as the most mature managed Kubernetes offering, given that Google created Kubernetes, drawing on a decade of running its internal cluster manager, Borg.

Multi-Region Container Deployments

Running containers globally introduces registry and image pull latency. ECR is regional - an image in us-east-1 requires replication to eu-west-1 to avoid cross-region pull latency. ECR supports replication rules that automatically copy images to specified regions on push.
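
Because each ECR registry is regional, the region is part of the image URI, and a deployment should pull from the copy in its own region when one exists. The sketch below builds URIs in ECR’s account.dkr.ecr.region.amazonaws.com form and a replication configuration shaped like the one ECR’s API accepts (a rules list containing destinations with region and registryId); verify that shape against current AWS documentation before use. The account ID and repository name are placeholders.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build the URI of an image in a regional ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def replication_config(account_id: str, destination_regions: list) -> dict:
    """Build a replication configuration that copies pushes to other regions."""
    destinations = [
        {"region": region, "registryId": account_id}
        for region in destination_regions
    ]
    return {"rules": [{"destinations": destinations}]}


def pull_uri(account_id, repository, tag, deploy_region, home_region, replicated):
    """Prefer the in-region copy; fall back to the home region otherwise."""
    in_region = deploy_region == home_region or deploy_region in replicated
    region = deploy_region if in_region else home_region
    return ecr_image_uri(account_id, region, repository, tag)


# Placeholder account ID and repository for illustration.
print(pull_uri("123456789012", "api", "v2", "eu-west-1", "us-east-1", ["eu-west-1"]))
```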

Container image scanning also becomes more complex: a vulnerability discovered in a base image requires rebuilding the derived image and rolling the fix out to every region. CI/CD pipelines need to coordinate multi-region deployments - rolling out a new image version to us-east-1 first, validating, then deploying to eu-west-1 and ap-southeast-1. AWS CodeDeploy and GitHub Actions with multi-region deployment steps handle this, but the orchestration is non-trivial.
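
The deploy, validate, then continue pattern can be sketched as a loop that stops at the first region that fails validation, leaving later regions untouched. The deploy and validate callables below are hypothetical hooks that a real pipeline would back with its own tooling.

```python
from typing import Callable


def staged_rollout(
    image: str,
    regions: list,
    deploy: Callable[[str, str], None],
    validate: Callable[[str], bool],
) -> dict:
    """Deploy region by region, halting at the first failed validation.

    Returns a status per region: 'deployed', 'failed', or 'pending'
    (never attempted because an earlier region failed).
    """
    status = {region: "pending" for region in regions}
    for region in regions:
        deploy(region, image)
        if not validate(region):
            status[region] = "failed"
            break
        status[region] = "deployed"
    return status
```

Ordering the regions so that the first one carries the least risk limits how many users a bad image can reach.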

Data sovereignty for containerized workloads: containers are often stateless and region-agnostic, but the storage they connect to (RDS, S3, DynamoDB) is regional. A container in an EU region that sends EU user data to a us-east-1 database can violate data residency requirements. Containerization doesn’t abstract away geography; it makes it easier to run in multiple regions, but the data architecture must be region-aware.
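
A deployment pipeline can at least flag obvious mismatches before they ship. The sketch below is a rough check that compares the region of each service with the regions of the data stores it connects to. It assumes a made-up configuration layout and a coarse mapping from AWS region prefix to jurisdiction; real residency rules are a legal question that a prefix lookup cannot settle.

```python
# Coarse, illustrative mapping from region prefix to jurisdiction.
JURISDICTION = {"eu": "EU", "us": "US", "ap": "APAC"}


def jurisdiction(region: str) -> str:
    """Map a region name such as 'eu-west-1' to a jurisdiction label."""
    return JURISDICTION.get(region.split("-")[0], "UNKNOWN")


def residency_violations(services: list) -> list:
    """List service-to-store links that cross a jurisdiction boundary."""
    violations = []
    for service in services:
        home = jurisdiction(service["region"])
        for store in service["data_stores"]:
            if jurisdiction(store["region"]) != home:
                violations.append(
                    f"{service['name']} ({service['region']}) -> "
                    f"{store['name']} ({store['region']})"
                )
    return violations


# Hypothetical deployment description.
services = [
    {
        "name": "eu-api",
        "region": "eu-west-1",
        "data_stores": [
            {"name": "users-db", "region": "us-east-1"},
            {"name": "eu-cache", "region": "eu-west-1"},
        ],
    }
]
print(residency_violations(services))
```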

Future Outlook

WebAssembly (WASM) containers are the most interesting near-term development. WASM bytecode is more portable than container images (no architecture-specific layers), starts faster than containers (microseconds vs seconds), and has a smaller attack surface (a deliberately constrained instruction set). WASI (WebAssembly System Interface) is the standard for running WASM outside the browser, with filesystem and network access. Projects like Spin (Fermyon) and WasmEdge let you deploy WASM components as microservices. Solomon Hykes (Docker’s creator) wrote in 2019 that if WASM and WASI had existed in 2008, there would have been no need to create Docker.

Confidential containers encrypt the container’s memory with hardware-based trusted execution environments (Intel TDX, AMD SEV). The host cannot read the container’s memory - not even the cloud provider’s operator can see what’s running. This matters for healthcare and financial workloads where even the infrastructure operator must not have access to data. AWS Nitro Enclaves and Azure confidential containers are early offerings in this space.

The container model has won for cloud-native deployment. The questions that remain are about isolation guarantees (containers vs microVMs vs confidential containers) and packaging formats (OCI containers vs WASM). The trajectory is toward smaller, faster, more isolated execution units - not away from the fundamental insight that packaging an application with its dependencies is the right abstraction.

Summary

Dimension	Containers	VMs	Firecracker microVMs	WASM
Startup time	Seconds	Minutes	~125ms	Microseconds
Kernel isolation	Shared host kernel	Full isolation	Full isolation	Sandboxed (no kernel access)
Image size	MB-range	GB-range	MB-range (custom kernel)	KB-range
Container escape risk	Real	Low	Low	Very low
Portability	OCI standard (high)	Architecture-dependent	Linux hosts with KVM	Architecture-independent
Best for	Web services, microservices	Legacy apps, untrusted workloads	Serverless functions (Lambda)	Edge compute, plugins
