Reproducible Environments - Making Sure It Runs on Someone Else's Machine
The Production Bug That Wasn’t
Six months after shipping a machine learning model, a colleague opens a bug report: the model’s outputs have changed. The code hasn’t been touched. The data hasn’t changed. But the numbers are different.
You trace the issue. Somewhere in the dependency tree, numpy updated from 1.24.3 to 1.26.0. A transitive dependency - one you didn’t directly depend on, one you probably didn’t even know was in your stack - pulled in the new version. NumPy’s random number generation behavior changed subtly between those versions. The model’s results are now unreproducible against the original baseline.
This is not a pathological edge case. It happens constantly, in every team that doesn’t actively manage dependencies. The Python packaging ecosystem’s default behavior is to install the latest compatible version of everything, every time. “Compatible” means many things; “latest” means time-dependent; and the combination means that pip install my-project next week produces a different environment than pip install my-project today.
Reproducible environments are the solution. The philosophy is simple: every dependency, direct and transitive, pinned to an exact version, stored in version control, installed identically everywhere.
A Brief History of Python Packaging
Understanding how Python packaging evolved explains why it’s still fragmented and why the “right” answer changes every few years.
pip arrived in 2008 as a replacement for easy_install. PyPI already existed, but pip made installing from it a single, predictable command and added things easy_install lacked, such as uninstalling packages and requirements files. A major improvement at the time. But pip’s resolver, for most of its history, was not a real dependency resolver - it installed things greedily and hoped for the best. Conflicts were your problem to discover.
virtualenv (2007, standardized as venv in Python 3.3) solved the isolation problem: each project gets its own Python installation, isolated from system Python and from other projects. This was essential but orthogonal to reproducibility.
requirements.txt became the lingua franca for pinning: run pip freeze > requirements.txt, commit the file, have others pip install -r requirements.txt. This worked, mostly. The problems were subtle: pip freeze captures everything in your environment, including tools you installed interactively, and it provides no signal about which packages are direct dependencies vs. transitive ones.
pipenv (2017) tried to solve this by introducing a Pipfile for direct dependencies and Pipfile.lock for the full pinned tree. It was the right idea. The execution was troubled - a slow resolver, confusing behavior, and long gaps between releases.
Poetry (2018) got the design right: a clean pyproject.toml for direct dependencies, poetry.lock for the full pinned tree, and a proper dependency resolver. Poetry became the community favorite for library authors and is still widely used.
uv (2024, from Astral) is written in Rust and is roughly 10-100x faster than pip at dependency resolution. It is rapidly becoming the default recommendation for new projects. Its lockfile format, uv.lock, pins the complete dependency graph deterministically. It also replaces virtualenv, pip, pip-tools, and pipx with a single tool.
The fragmentation is real and frustrating. Ask five experienced Python engineers what toolchain you should use, and you’ll get five confident but different answers. The ecosystem has never agreed on a single standard, and each generation of tools has tried to paper over the limitations of the previous one.
The Dependency Resolution Problem
When you install package A (which requires B>=1.0) and package C (which requires B<2.0), a resolver must find a version of B that satisfies both constraints. With two packages that is easy; across a whole dependency graph the problem is as hard as Boolean satisfiability - NP-complete in the worst case.
pip’s original resolver was greedy: it honored the first constraint it encountered for B and could leave you with an installed set that violated a later one, sometimes with nothing more than a warning. Modern pip uses a backtracking resolver (the default since pip 20.3) that actually walks the constraint graph, but it’s slow for complex graphs and still occasionally produces suboptimal solutions.
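The backtracking idea can be sketched in a few lines. This is a toy under heavy simplifying assumptions, not pip's implementation: versions are integer tuples, each constraint is a plain predicate, and `INDEX` is a hypothetical in-memory package index invented for the example.

```python
from typing import Callable, Optional

Version = tuple[int, ...]
Constraint = Callable[[Version], bool]

# Hypothetical in-memory index: package -> version -> {dependency: constraint}.
INDEX: dict[str, dict[Version, dict[str, Constraint]]] = {
    "A": {(1, 0): {"B": lambda v: v >= (1, 0)}},
    "C": {(1, 0): {"B": lambda v: v < (2, 0)}},
    "B": {(1, 0): {}, (1, 5): {}, (2, 0): {}},
}


def resolve(
    pending: list[str],
    pinned: dict[str, Version],
    constraints: dict[str, list[Constraint]],
) -> Optional[dict[str, Version]]:
    """Return a consistent {package: version} mapping, or None if none exists."""
    if not pending:
        return pinned
    name, rest = pending[0], pending[1:]
    if name in pinned:
        # Already chosen; its constraints were checked when they were added.
        return resolve(rest, pinned, constraints)
    # Try the newest candidate first and fall back to older ones on conflict.
    for version in sorted(INDEX[name], reverse=True):
        if not all(check(version) for check in constraints.get(name, [])):
            continue
        deps = INDEX[name][version]
        # Reject this candidate if it contradicts a package we already pinned.
        if any(dep in pinned and not check(pinned[dep]) for dep, check in deps.items()):
            continue
        merged = {pkg: list(checks) for pkg, checks in constraints.items()}
        for dep, check in deps.items():
            merged.setdefault(dep, []).append(check)
        result = resolve(rest + list(deps), {**pinned, name: version}, merged)
        if result is not None:
            return result
    return None  # every candidate failed, so the caller backtracks


# Asking for B first makes the resolver pick B 2.0, hit C's constraint, and back off.
solution = resolve(["B", "A", "C"], {}, {})
print(solution)
```

A greedy resolver would stop at B 2.0 and leave C broken. The backtracking version abandons that choice once C's constraint fails and settles on B 1.5.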
Poetry uses a custom resolver that produces a lockfile recording exactly what should be installed. The lockfile includes every transitive dependency, pinned to its exact version, along with checksums for the distribution files. Installing from poetry.lock on any machine produces the same set of package versions, verified against those checksums (modulo platform differences for compiled packages, where a different wheel may be selected).
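To make the checksum step concrete, here is a minimal sketch of verifying a downloaded file against a recorded hash. It assumes the hash is stored as a `sha256:<hex>` string; the function names and the stand-in file are illustrative, not part of any tool's API.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lock(path: Path, recorded: str) -> bool:
    """Compare a file against a 'sha256:<hex>' entry (assumed format)."""
    algorithm, _, expected = recorded.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return sha256_of(path) == expected


# Demo with a stand-in file; a real check would point at a downloaded wheel.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example-1.0-py3-none-any.whl"
    artifact.write_bytes(b"pretend wheel contents")
    recorded = "sha256:" + hashlib.sha256(b"pretend wheel contents").hexdigest()
    intact = matches_lock(artifact, recorded)
    artifact.write_bytes(b"tampered contents")
    tampered_ok = matches_lock(artifact, recorded)
```

The point of recording hashes is that a pinned version number alone does not prove you received the same file; the digest does.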
uv uses a resolver derived from PubGrub, a modern dependency resolution algorithm that produces human-readable error messages when resolution fails - a notable improvement over pip’s cryptic conflict errors.
conda uses a SAT solver (originally its classic pycosat-based solver, now libmamba, which is built on libsolv). SAT solvers guarantee a globally consistent solution if one exists, making conda’s resolution more reliable for complex scientific stacks. The cost is solver time: conda environments with many packages can take minutes to solve.
Virtual Environments: Isolation Without Reproducibility
A virtual environment creates an isolated Python installation for a project. Installing packages inside it doesn’t affect the system Python or other virtual environments.
python -m venv .venv
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
pip install numpy pandas
Virtual environments solve the isolation problem - packages from Project A don’t interfere with Project B. But they don’t solve reproducibility. Two developers who create a virtual environment from the same unpinned list of direct dependencies on different days may get different versions of transitive dependencies.
The resolution: lockfiles. You pin every dependency (not just the direct ones) in a lockfile that is committed to version control. The lockfile is the source of truth. Everyone installs from the lockfile, on every machine, in CI, in Docker, always.
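One way to check that a machine really matches the lockfile is to compare pinned versions with what is installed. The sketch below assumes a simple `name==version` pin format (as in a pip-style requirements file); `parse_pins`, `find_drift`, and the sample data are hypothetical helpers for illustration.

```python
import re
from importlib import metadata
from typing import Optional


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Extract {name: version} from 'name==version' lines, skipping the rest."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        name, sep, rest = line.partition("==")
        tokens = rest.split(";")[0].split()  # drop markers and trailing tokens
        if sep and name.strip() and tokens:
            pins[normalize(name.strip())] = tokens[0]
    return pins


def installed_versions() -> dict[str, str]:
    """Versions of every distribution visible to the running interpreter."""
    return {
        normalize(dist.metadata["Name"]): dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    }


def find_drift(
    pins: dict[str, str], installed: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each mismatched package to (pinned, installed-or-None)."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


SAMPLE_LOCK = """\
numpy==1.24.3
# a comment line
requests==2.31.0 ; python_version >= '3.8'
"""
pins = parse_pins(SAMPLE_LOCK)
# Stand-in for installed_versions() so the example does not depend on your machine.
drift = find_drift(pins, {"numpy": "1.26.0", "requests": "2.31.0"})
```

In CI, calling `find_drift(pins, installed_versions())` and failing on a non-empty result would catch the numpy scenario from the opening story before it ships.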
requirements.txt: What It Gets Wrong
pip freeze > requirements.txt is the classic approach, and it has some significant problems.
First, it captures the entire current environment, including tools you installed manually that aren’t actual dependencies of your project. A requirements.txt that lists ipython==8.12.0 because you ran pip install ipython interactively is misleading.
Second, it conflates direct dependencies with transitive ones. When another developer reads your requirements.txt, there’s no signal about which packages are actually your code’s direct dependencies versus packages those dependencies pulled in. Maintenance is harder: when you want to update a direct dependency, you can’t easily tell which other entries are safe to change alongside it.
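A rough way to recover the missing signal is to look for packages that nothing else in the environment requires. The sketch below works on a hypothetical, hand-written dependency graph; in a real environment you could build such a graph from `importlib.metadata.requires`. It is a heuristic only, since a direct dependency can also be required by another package.

```python
def top_level_packages(graph: dict[str, set[str]]) -> set[str]:
    """Return packages that no other package in the graph depends on."""
    required_by_something = set().union(*graph.values())
    return set(graph) - required_by_something


# Hypothetical environment: package -> packages it requires.
GRAPH = {
    "requests": {"urllib3", "idna", "certifi", "charset-normalizer"},
    "urllib3": set(),
    "idna": set(),
    "certifi": set(),
    "charset-normalizer": set(),
    "ipython": {"traitlets"},
    "traitlets": set(),
}

roots = top_level_packages(GRAPH)
print(sorted(roots))
```

Note that ipython shows up alongside requests: the graph can tell you what was installed on purpose, but not whether your code actually needs it.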
Third, requirements.txt has no built-in mechanism for separating production dependencies from development dependencies (test runners, linters, type checkers). Separating them requires multiple files with -r other-file.txt includes - workable but messy.
pip-tools is a middle ground that addresses some of this. You write a requirements.in with just your direct dependencies (unpinned), run pip-compile to generate a fully pinned requirements.txt, and commit both. The .in file is readable and maintainable; the generated .txt file is what gets installed.
Poetry and pyproject.toml
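pyproject.toml is the modern standard for Python project configuration. PEP 518 introduced the file for declaring build requirements, PEP 517 defined the build-backend interface, and PEP 621 later standardized project metadata in the [project] table. For most projects it replaces setup.py and setup.cfg. All serious modern tools use it.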
Poetry’s model is clean: pyproject.toml declares direct dependencies with version constraints; poetry.lock pins the full resolution to exact versions and checksums. You commit both. On any machine:
poetry install # installs exactly what's in poetry.lock
poetry add requests # adds to pyproject.toml, updates poetry.lock
poetry update requests # upgrades to the latest version the pyproject.toml constraint allows, updates poetry.lock
The separation of concerns is correct: pyproject.toml is human-authored and expresses intent; poetry.lock is machine-generated and expresses the resolution. Never edit poetry.lock by hand.
Poetry also handles development dependency groups cleanly:
[tool.poetry.dependencies]
python = "^3.11"
numpy = "^1.26"
torch = "^2.2"
[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
ruff = "^0.4"
mypy = "^1.8"
Installing in production: poetry install --without dev. This keeps dev tools out of production Docker images.
uv: The Fastest Path Forward
uv is the most significant development in Python packaging since pip. Written in Rust by the team behind Ruff, it resolves dependencies roughly 10-100x faster than pip and produces a lockfile by default.
# Create a new project
uv init my-project
cd my-project
# Add dependencies - writes to pyproject.toml, generates uv.lock
uv add numpy torch scikit-learn
uv add --dev pytest ruff mypy
# Install exactly what's in the lockfile
uv sync
# Run a script in the project environment
uv run python train.py
# Update a dependency
uv lock --upgrade-package numpy
uv.lock is committed to version control. uv sync on any machine installs exactly those versions. The resolver is fast enough that uv sync in CI completes in seconds rather than minutes.
uv also replaces pyenv for Python version management: uv python install 3.11 downloads and installs a specific Python version without touching the system.
The main risk: uv is young (2024). Its lockfile format is not yet stable across major versions. For projects that need long-term stability guarantees, Poetry’s more established ecosystem may be preferable. For new projects where you can track the tooling, uv is the right choice.
conda: When pip Is Not Enough
For scientific computing and ML, pip alone is insufficient. numpy, scipy, PyTorch, and their dependencies link against native libraries - BLAS, LAPACK, CUDA, cuDNN - that live outside Python’s package management. pip install numpy installs the Python package, but which BLAS implementation is linked? What CUDA version?
conda manages both Python packages and native libraries in a single coherent environment. conda install pytorch -c pytorch installs PyTorch along with the correct CUDA and cuDNN versions, resolved together.
conda create -n my-project python=3.11
conda activate my-project
conda install numpy scipy pytorch -c pytorch
conda env export > environment.yml
environment.yml captures the full environment including native library versions. This is essential when the CUDA version must match exactly - a mismatch produces mysterious errors or silently incorrect computations.
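It can help to log what the running process actually sees, so a mismatch shows up in the logs rather than in the results. The following is a minimal sketch: `native_stack_report` is a hypothetical helper, and it only imports torch if the package is installed.

```python
import importlib
from importlib import metadata
from typing import Optional


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def native_stack_report() -> dict[str, object]:
    """Collect versions relevant to numerical reproducibility."""
    report: dict[str, object] = {
        "numpy": installed_version("numpy"),
        "torch": installed_version("torch"),
        "cuda_build": None,       # CUDA version torch was built against
        "cuda_available": None,   # whether a usable GPU runtime was found
    }
    if report["torch"] is not None:
        try:
            torch = importlib.import_module("torch")
            report["cuda_build"] = torch.version.cuda
            report["cuda_available"] = torch.cuda.is_available()
        except (ImportError, OSError):
            pass  # installed but not importable here; leave the fields as None
    return report


report = native_stack_report()
print(report)
```

Writing this report next to every training run or benchmark gives you something to diff when two machines disagree.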
The downsides are real. conda environments are large (several gigabytes). The solver is slower than modern pip alternatives. And conda’s history with licensing is complicated: in 2020, Anaconda changed its terms of service to require paid licenses for commercial use of the defaults channel by organizations with more than 200 employees. This surprised many teams who had been using conda freely for years. The conda-forge community channel remains free and is a sensible default channel for new conda installs.
Docker: The Full Stack
Python-level pinning captures packages. It does not capture the operating system, system libraries, the Python interpreter itself, or environment variables. Docker does.
A Dockerfile pins everything:
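FROM python:3.11.9-slim-bookworm AS base
WORKDIR /app
COPY pyproject.toml uv.lock ./
# Install dependencies only; the project source is copied in the next step
RUN pip install uv && uv sync --frozen --no-dev --no-install-project
COPY src/ ./src/
# uv sync installs into /app/.venv, so put that interpreter first on PATH
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "src.serve"]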
The --frozen flag tells uv to install exactly what uv.lock records, without re-resolving or updating the lockfile. It does not check that the lockfile is still in sync with pyproject.toml; if you want the build to fail explicitly when the two have diverged, use uv sync --locked instead. Either way, the production build never silently resolves to different versions.
Multi-stage builds keep images small:
FROM python:3.11.9-slim-bookworm AS builder
WORKDIR /build
RUN pip install uv
COPY pyproject.toml uv.lock ./
RUN uv export --frozen --no-dev --no-emit-project -o requirements.txt
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.11.9-slim-bookworm AS runtime
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY src/ ./src/
ENV PATH="/root/.local/bin:$PATH"
CMD ["python", "-m", "src.serve"]
Build-only tooling (here uv, plus any compilers you add for packages that lack prebuilt wheels) stays in the builder stage; the runtime stage has none of it. The final image is smaller and has a smaller attack surface.
In the context of CI/CD - Deploying Early, Deploying Often, Deploying Without Fear, “same image everywhere” is significantly more reproducible than virtual environments. The CI pipeline, staging environment, and production all run the exact same Docker image. That removes most of the “works in CI but not in prod” class of failure, because the only things left to differ are runtime configuration and the host.
Lambda Layers and Serverless
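AWS Lambda limits a zipped deployment package to 50 MB for direct upload and 250 MB unzipped, and the unzipped limit includes any layers. A PyTorch-based inference function easily exceeds this. For moderately sized dependencies, the usual answer is Lambda Layers: separately versioned packages that are extracted into the Lambda execution environment (under /opt) at runtime, so heavy libraries are packaged once rather than with every function.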
A common pattern: build a Lambda Layer for your heavy dependencies (numpy, scipy, scikit-learn) once, publish it as a versioned layer, and reference that layer ARN in your function definition. The function package contains only your application code. Layer pinning by ARN is reliable - a layer version is immutable.
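Building the layer archive can be scripted. The sketch below assumes the Python layer layout in which packages sit under a top-level `python/` directory inside the zip, and it checks the result against the 250 MB unzipped limit quoted above. `build_layer` and the demo package are hypothetical, and publishing the layer is left out.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # unzipped limit, shared with the function


def build_layer(site_packages: Path, out_zip: Path) -> int:
    """Zip site_packages under 'python/' and return the uncompressed size."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                arcname = "python/" + path.relative_to(site_packages).as_posix()
                zf.write(path, arcname)
    with zipfile.ZipFile(out_zip) as zf:
        total = sum(info.file_size for info in zf.infolist())
    if total > MAX_UNZIPPED_BYTES:
        raise ValueError(f"layer is {total} bytes unzipped, over the limit")
    return total


# Demo with a tiny stand-in package instead of real dependencies.
with tempfile.TemporaryDirectory() as tmp:
    packages = Path(tmp) / "site-packages"
    (packages / "demo_pkg").mkdir(parents=True)
    (packages / "demo_pkg" / "__init__.py").write_text("VALUE = 1\n")
    layer_zip = Path(tmp) / "layer.zip"
    layer_size = build_layer(packages, layer_zip)
    with zipfile.ZipFile(layer_zip) as zf:
        layer_names = zf.namelist()
```

Failing the build when the size check trips is cheaper than discovering the limit at deploy time.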
For very large models, the zip-based limits make Lambda impractical regardless of how you split packages into layers. Lambda also accepts container images of up to 10 GB; beyond that, or when you need GPUs, ECS or EKS with container-based deployment is the correct solution - and with container images as the artifact, Docker’s full reproducibility applies.
The Lockfile Maintenance Problem
Lockfiles solve reproducibility but introduce a maintenance burden: security updates require explicit dependency bumps. If cryptography releases a security patch, you must update your lockfile to pick it up. It doesn’t happen automatically.
This is the correct behavior from a reproducibility standpoint - you never want dependencies changing without explicit action. But it means:
- You need automated tooling to notify you of security vulnerabilities in pinned dependencies (Dependabot, pip-audit in CI, GitHub’s dependency review).
- You need a discipline of regularly upgrading dependencies rather than ignoring them until they cause problems.
- “I’ll update dependencies later” accumulates debt. Teams that haven’t updated for a year face a waterfall of breaking changes when they finally do.
There is no perfect answer. The choice is between the known risk of unpinned dependencies (your environment changes without you knowing) and the known cost of pinned dependencies (you must actively maintain them). The correct answer is pinned dependencies with automated security scanning and a regular upgrade cadence.
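The scanning half of that answer boils down to comparing pins against known fixed versions. The sketch below uses made-up advisory data and a simplified numeric version comparison purely to show the shape of the check; in practice a tool such as pip-audit does this against a real vulnerability database.

```python
def as_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def vulnerable_pins(
    pins: dict[str, str], advisories: list[tuple[str, str]]
) -> list[str]:
    """Return a message for each pin older than the advisory's fixed version."""
    findings = []
    for package, fixed_in in advisories:
        pinned = pins.get(package)
        if pinned is not None and as_tuple(pinned) < as_tuple(fixed_in):
            findings.append(f"{package} {pinned} is affected; upgrade to >= {fixed_in}")
    return findings


# Hypothetical pins and advisories, invented for this example.
PINS = {"example-crypto": "41.0.0", "example-http": "2.31.0"}
ADVISORIES = [("example-crypto", "41.0.6"), ("example-http", "2.31.0")]

findings = vulnerable_pins(PINS, ADVISORIES)
print(findings)
```

Run on a schedule, a check like this turns "a security patch was released" into a failing job rather than something you learn about months later.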
Future: uv as the Universal Python Tool
The trajectory is fairly clear. uv has absorbed the functionality of pip, venv, pip-tools, pipx, and pyenv into a single tool with a coherent design. The remaining question is whether it absorbs Poetry’s lockfile and package publication workflows, which it has begun to do as of 2024.
Nix represents the most radical approach to reproducibility: a purely functional package manager where every package’s inputs are hashed, and any package build is a deterministic function of its inputs. Nix can capture not just Python packages but system libraries, compilers, and the kernel interface. Nix is powerful and used heavily in some ML infrastructure teams, but its learning curve is steep and its documentation has historically been poor.
Hermit and similar tools attempt to extend per-project pinning to multiple languages - if your project uses Python, Go, and Node.js, you want a single pinned toolchain and a single installation command. Hermit installs and pins tool binaries (interpreters, compilers, CLIs) rather than managing language-level packages, which is a different model from Python’s virtual environments but useful for polyglot repositories.
Summary
| Tool | Solves | Doesn’t Solve | When to Use |
|---|---|---|---|
| venv / virtualenv | Isolation between projects | Reproducibility across machines | Always; baseline |
| requirements.txt (pip freeze) | Basic pinning | Transitive dep clarity; tooling separation | Legacy projects |
| pip-tools | Pinning with input/output separation | Speed; native deps | Minimalist teams |
| Poetry | Full dependency management with lockfile | Speed; native deps | Library authors; established teams |
| uv | Everything pip + venv + pip-tools do, faster | Stability (still maturing) | New projects |
| conda | Native deps (CUDA, BLAS) + Python packages | Speed; commercial licensing concerns | Scientific ML with GPU deps |
| Docker | Full environment including OS | Image size for large deps | Production deployments; CI |
| Lambda Layers | Large deps in serverless | Very large models | Serverless ML inference |