Docker & Containerization
Overview
“It works on my machine” is one of the most common - and most expensive - phrases in software engineering. It usually means an application depends on a specific OS version, library, or environment variable that differs between the developer’s laptop, the CI server, and production. Containers solve this by packaging the application together with everything it needs to run.
Docker is the most widely used containerization platform. A Docker container is a lightweight, isolated process that shares the host OS kernel but has its own filesystem, network stack, and process namespace. This makes containers far faster to start than virtual machines and nearly as portable.
Containers vs Virtual Machines
Virtual machines virtualize hardware: each VM includes a full OS kernel, which takes significant resources and minutes to boot. Containers share the host kernel and use two Linux kernel features for isolation:
- Namespaces: provide isolation for process IDs, network interfaces, mount points, user IDs, and more - each container sees only its own namespace
- cgroups (control groups): limit and account for CPU, memory, disk I/O, and network resources - preventing one container from starving others
The result is containers that start in seconds, use tens of megabytes rather than gigabytes of overhead, and can run many per host without conflict.
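The cgroup limits described above are exposed directly as flags on docker run. A quick sketch (the container name and limit values are illustrative):

```shell
# Cap the container at half a CPU core and 256 MB of RAM (enforced via cgroups)
docker run -d --cpus="0.5" --memory="256m" --name limited-nginx nginx:alpine

# Inspect the limits Docker recorded for the container
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' limited-nginx
```

If the process exceeds its memory limit, the kernel's OOM killer terminates it rather than letting it starve other containers on the host.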
Docker Architecture
Docker uses a client-server model:
- Docker daemon (dockerd): the background process that manages images, containers, networks, and volumes
- Docker client (docker CLI): sends commands to the daemon via a REST API
- Images: read-only templates built from a Dockerfile, stored in layers
- Containers: running instances of images - writable layers stacked on top of the image layers
- Registry: a repository of images (Docker Hub is the default public registry; you can run private registries like AWS ECR or GitHub Container Registry)
Dockerfiles and Layers
A Dockerfile defines how to build an image step by step. Each instruction (RUN, COPY, ADD) creates a new layer. Layers are cached: if a layer and all its inputs are unchanged, Docker reuses the cached layer, making rebuilds fast.
Key instructions:
- FROM: base image - always start with the smallest appropriate image
- RUN: execute a shell command (install packages, compile code)
- COPY: copy files from the host into the image
- ENV: set environment variables
- EXPOSE: document which port the container listens on (does not publish it)
- CMD: default command to run - can be overridden at runtime
- ENTRYPOINT: fixed executable for the container - CMD becomes default arguments
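Layer caching rewards ordering instructions from least to most frequently changed. A minimal sketch for a hypothetical Python service (file names are illustrative):

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Copy only the dependency manifest first: this layer stays cached until
# requirements.txt changes, so code edits don't re-trigger pip install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often, so it goes in a later layer
COPY . .

EXPOSE 8000
CMD ["python", "app.py"]
```

Editing app.py invalidates only the final COPY layer; the expensive dependency-install layer is reused from cache.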
Image Best Practices
Use minimal base images. Alpine Linux is a popular choice at ~5MB. Prefer distroless images (no shell, no package manager) for production to reduce the attack surface. Avoid ubuntu:latest when python:3.12-slim exists.
Use multi-stage builds. Separate the build environment (compilers, build tools, dev dependencies) from the runtime image. The final stage copies only the compiled artifact from the build stage, keeping the production image small and free of build tools.
Run as a non-root user. By default, processes inside a container run as root, which is a security risk: if the container is compromised, the attacker starts with root privileges inside it, magnifying the impact of any kernel exploit or writable volume mount. Add a RUN useradd -r appuser and USER appuser before CMD.
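Sketched as a Dockerfile fragment (the user name is illustrative):

```dockerfile
# Create a system account with no home directory and no login shell
RUN useradd -r -s /usr/sbin/nologin appuser

# Every instruction after this line, including CMD, runs as appuser
USER appuser

CMD ["python", "app.py"]
```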
Docker Compose
docker-compose (or docker compose in newer versions) defines multi-container applications in a single YAML file. You specify services, their images or build contexts, environment variables, ports, networks, and volumes. A single docker compose up starts the entire stack.
Networks in Compose are created automatically - services can reach each other by service name (DNS resolution within the Docker network). This replaces hard-coded IP addresses with stable hostnames.
Volumes and Bind Mounts
Containers are ephemeral - their writable layer disappears when the container is removed. For persistent data:
- Volumes: managed by Docker, stored in Docker’s data directory, survive container removal, portable between containers
- Bind mounts: mount a host directory directly into the container - useful for development (live code reload), but couples the container to the host filesystem layout
For production databases and stateful services, use named volumes. For development iteration, bind mounts enable rapid feedback without rebuilding images.
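The two approaches side by side, as docker run commands (the image name my-dev-image is illustrative):

```shell
# Named volume: Docker manages the storage location; data survives removal
# of the container and can be attached to a replacement container later
docker volume create pgdata
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16-alpine

# Bind mount: map the current host directory into the container, so code
# edits on the host are visible immediately (useful for live reload)
docker run -d --name devweb -v "$(pwd)":/app -p 8000:8000 my-dev-image
```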
Networking
Docker provides several network drivers:
- Bridge (default): containers on the same bridge network can communicate; traffic to the host is NAT’d
- Host: container shares the host’s network stack directly - no isolation, but useful for performance-sensitive workloads
- Overlay: multi-host networking used by Docker Swarm for container-to-container communication across machines (Kubernetes solves the same problem with its own CNI network plugins)
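Name-based service discovery works on any user-defined bridge network, not just in Compose. A sketch:

```shell
# Create a user-defined bridge network; containers on it resolve
# each other by container name via Docker's embedded DNS
docker network create appnet
docker run -d --name redis --network appnet redis:7-alpine

# A second container on the same network reaches "redis" by name
docker run --rm --network appnet redis:7-alpine redis-cli -h redis ping
```

The final command prints PONG, confirming that the hostname redis resolved to the first container's IP.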
Container Security
Containers are not a security boundary equivalent to VMs. Hardening steps include:
- Read-only filesystem: use --read-only to prevent the container from writing to its own filesystem (mount specific writable directories as tmpfs or volumes)
- seccomp and AppArmor: kernel security profiles that restrict which system calls and operations containers can perform
- Rootless containers: run the Docker daemon itself without root privileges - Podman does this by default
- Scan images: tools like Trivy or Grype scan images for known CVEs in base image packages
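Several of these hardening steps can be combined in a single docker run invocation (the image name my-hardened-image is illustrative):

```shell
# Immutable root filesystem, in-memory writable /tmp, all Linux
# capabilities dropped, and no setuid privilege escalation
docker run -d \
  --read-only \
  --tmpfs /tmp \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  my-hardened-image

# Scan the image for known CVEs before deploying (Trivy syntax)
trivy image my-hardened-image
```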
Docker vs containerd vs Podman
Docker uses containerd as its low-level runtime. Kubernetes deprecated and later removed its Docker-specific integration (the dockershim) in favor of talking to containerd directly through the Container Runtime Interface (CRI). Podman is a daemonless alternative that runs containers as the calling user (rootless by default), is compatible with Docker commands, and is increasingly preferred in security-conscious environments.
Examples
Multi-stage build for a Python app. Stage 1 (builder): install build dependencies with pip install --user, compile any C extensions. Stage 2 (runtime): start from python:3.12-slim, copy only the installed packages from /root/.local in the builder stage, copy the application code, set USER to a non-root user, and set the CMD. The resulting image contains no pip, no compiler, and no development dependencies.
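The recipe above can be sketched as a Dockerfile. The entrypoint, user name, and path handling are illustrative assumptions; pip's --user flag installs into /root/.local in the builder stage, which the runtime stage copies wholesale:

```dockerfile
# Stage 1: full image with compilers, used only to build dependencies
FROM python:3.12 AS builder
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: slim runtime with no pip cache, compiler, or dev dependencies
FROM python:3.12-slim
COPY --from=builder /root/.local /home/appuser/.local
COPY . /app
WORKDIR /app

RUN useradd -r appuser && chown -R appuser /home/appuser/.local
USER appuser
# Make the copied packages visible to the non-root user's interpreter
ENV PATH=/home/appuser/.local/bin:$PATH \
    PYTHONPATH=/home/appuser/.local/lib/python3.12/site-packages

CMD ["python", "app.py"]
```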
docker-compose for web + database + Redis. Define three services: web (build from local Dockerfile, expose port 8000, depends on db and redis), db (use postgres:16-alpine, set env vars for credentials, mount a named volume for data), redis (use redis:7-alpine). All three join the same default network and can reach each other by service name.
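The stack described above, sketched as a compose file (credentials and the web service's build context are illustrative):

```yaml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: appdb
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine

volumes:
  pgdata:
```

All three services join the default network automatically, so web reaches the database at hostname db on port 5432 and the cache at hostname redis on port 6379.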
Health check configuration. Add a HEALTHCHECK instruction to your Dockerfile: HEALTHCHECK --interval=30s --timeout=3s CMD curl -f http://localhost:8000/health || exit 1. Docker will mark the container unhealthy if the check fails, and docker-compose can wait on it via depends_on with condition: service_healthy before starting dependent services. Note that Kubernetes ignores Docker's HEALTHCHECK entirely and relies on its own liveness and readiness probes for the same purpose.
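In Compose, a health check can gate startup ordering. A sketch using Postgres's bundled pg_isready utility (service names follow the earlier example):

```yaml
services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
```

With this configuration, docker compose up delays starting web until the database's health check has passed, rather than racing against its initialization.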