Debugging & Profiling - Finding What's Wrong and Where It's Slow // Megha Bose

On September 9, 1947, Grace Hopper’s team at Harvard found a moth jammed inside relay 70 of the Mark II computer. They taped it into their logbook with the note “First actual case of bug being found.” The word was already in use - Thomas Edison used it decades earlier - but the moth made it stick.

What is striking about the story is not the moth. It is the logbook. Hopper’s team kept systematic records: the failure, the investigation, the cause, the fix. The moth was found because they were looking for it methodically, not because they got lucky.

Seventy-five years later, the central problem is the same. You believe the code does one thing. The code does something else. Debugging is the process of finding where those two diverge and why. The tools have changed enormously; the craft has not.

The Foundation: Knowing Your System

The single biggest factor in how fast you debug is not which tools you use. It is how deeply you understand what you are debugging.

This is not vague advice about “understanding code.” It has a specific, concrete meaning: knowing where your system can fail.

Every system has a finite set of places where things go wrong. In a typical web backend: request parsing fails on malformed or missing fields; authentication fails when tokens expire or permissions are checked incorrectly; business logic breaks on edge cases the developer did not anticipate; database queries return null or nothing when the data does not match assumptions; external API calls time out or return unexpected formats; caches return stale data after a write; concurrent requests interfere with each other. These are the failure modes of the system. A programmer who deeply knows the system has an internal map of these failure points - not as an abstract list but as specific components with histories. “This payment service has timed out before under high load.” “The date comparison in the order query has had timezone issues in the past.” “Whenever authentication-related errors appear, it is usually the token refresh logic, not the token validation.”

When a bug appears, this map acts as a filter. Instead of treating 10,000 lines of code as equally suspicious, you immediately have a ranked shortlist. The symptom narrows it further. You check the two most likely candidates. You find the bug in minutes instead of hours.

The second thing deep knowledge gives you is pattern recognition. When you debug the same system over time, you accumulate a library of bug signatures - what a given type of failure looks like, where it tends to come from, what confirms or rules it out. The first time a certain error appears in a new system, you have to trace it carefully. The third time you see a symptom with the same shape, you recognize it. “This looks like a cache invalidation issue” or “this is the same class of concurrency bug we had in March.” You are not solving a fresh problem; you are matching a pattern. That is why experienced engineers on a codebase they know well can debug something in ten minutes that would take a new person three hours - not because they are smarter but because they are doing a lookup, not a search.

Programmers without this depth start from the entry point. They sprinkle print statements from the top, step through tangentially related code, and try changes to see what happens. This works, eventually. It is slow, and it does not compound - each debugging session starts from scratch. The debugging itself does not build toward anything.

The practical implication is something you can act on: when you encounter a codebase, spend time mapping its failure modes before you ever need to debug it. Where does a request enter? What transforms it at each stage? Where could something be null, empty, stale, or the wrong type? Where has the code been fragile historically - what do past bugs in the git history tell you? That map is your debugging infrastructure. The investment is invisible until you need it, and then it pays back immediately.

Backtracking: Start From the Symptom, Work to the Root

Given a failure, the instinct of most beginners is to trace forwards: start from the entry point, step through the code in order, and hope to stumble across something wrong. This is slow because most of the code is irrelevant to the bug, and you have no guide for which path to follow.

Backtracking is the opposite approach: start from the failure - the thing you can observe - and ask what must have been true one step earlier for this failure to occur. Then ask what must have been true one step before that. Follow the chain of causality backward until you reach something you can verify or fix. The failure is your entry point, not the beginning of the code.

A concrete example. A web endpoint returns a wrong total for a user’s order history.

The total is computed by sum(order.amount for order in orders). Check the value of orders - it is a short list, missing several entries.
Where does orders come from? A database query filtered by user ID and a date range. Log what the query actually executes. The date range looks wrong - the end date is yesterday rather than today.
Where does the end date come from? It is constructed from the request parameter, passed through a utility function that normalizes the input. Inspect what that function returns for today’s date. It returns yesterday.
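Why? The function converts the input to UTC before doing date arithmetic. The server is in UTC+5:30. At 2 AM local time it is still 8:30 PM on the previous day in UTC, so “today” in UTC is yesterday. The bug only appears for requests made between midnight and 5:30 AM local time.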

Each step backward asked a single question: what is the actual value here, and how was it produced? The final answer - a timezone-naive date conversion in a UTC-offset environment - is a specific, fixable cause. You reached it without reading any code that was not directly on the path from symptom to root.
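The last step of that trace can be reproduced in a few lines. This is a minimal sketch, not the actual utility function: the names end_date_buggy and end_date_fixed are hypothetical, and the fixed +5:30 offset stands in for the server's timezone.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed server timezone from the example: UTC+5:30.
SERVER_TZ = timezone(timedelta(hours=5, minutes=30))

def end_date_buggy(local_now: datetime) -> date:
    # Hypothetical normalizer: converts to UTC *before* taking the date.
    return local_now.astimezone(timezone.utc).date()

def end_date_fixed(local_now: datetime) -> date:
    # Take the date in the user's own timezone.
    return local_now.date()

# A request arriving shortly after local midnight.
early = datetime(2024, 3, 10, 2, 0, tzinfo=SERVER_TZ)
print(end_date_buggy(early))   # 2024-03-09
print(end_date_fixed(early))   # 2024-03-10
```

Printing both values side by side is the "what is the actual value here?" question from the trace, asked at the layer where the date is produced.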

The reason backtracking and system knowledge work so well together is that each backward step is not truly open-ended. When you ask “where could orders be wrong?”, you are not picking randomly from all possible answers. If you know the system, you know the failure modes at that layer. Queries in this codebase have had date range issues before. The cache sometimes returns stale results after a write. The ORM occasionally silently drops filters when given unexpected input. Your knowledge gives you a ranked list of hypotheses for each backward step, instead of an unbounded search.

Backtracking without system knowledge is methodical but slow - you have to verify every possible cause at each layer. Backtracking with system knowledge is fast - you have priors. The knowledge tells you which links in the chain are the fragile ones, so you can jump directly to checking those first instead of following the chain one step at a time.

The flip side is equally important: when a backward trace takes a long time and each step feels opaque, that is information. It means your model of the system at that layer is incomplete. The debugging session is also a learning session. Once you find the root, you now understand something about this system that you did not understand before. File that knowledge away. The next bug with a similar signature will take a fraction of the time.

Be Explicit About Your Hypothesis

The most common debugging mistake is jumping to trying a fix before forming a clear hypothesis about the cause. A guess that happens to work leaves you not knowing why it worked, which means the same class of bug - or a cousin of it - comes back later. A fix that follows from understanding is durable.

Before you change anything, write down (literally, or in your head with discipline) a specific, falsifiable hypothesis:

Not: “something is wrong with the database”
But: “the query returns null when the user has no past orders, and the code does not handle that case - it calls .total() on None”

The hypothesis should be specific enough that you can design a small test to confirm or refute it. Add one assertion or one print statement targeted at that specific claim. If the hypothesis is correct, you will know immediately. If it is wrong, you have learned something and the search space has narrowed.
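As a sketch of what "one assertion targeted at that claim" might look like for the hypothesis above: OrderSummary, fetch_summary, and the dict-backed table are hypothetical stand-ins for the real query layer.

```python
class OrderSummary:
    """Hypothetical stand-in for whatever the real query returns."""

    def __init__(self, amounts):
        self.amounts = amounts

    def total(self):
        return sum(self.amounts)

def fetch_summary(user_id, table):
    # Hypothetical query: returns None for users with no past orders.
    return table.get(user_id)

def order_total(user_id, table):
    summary = fetch_summary(user_id, table)
    # The single targeted check: does the query hand back None here?
    assert summary is not None, f"no summary for user {user_id!r}"
    return summary.total()

table = {"u1": OrderSummary([10, 15])}
print(order_total("u1", table))   # 25
```

Calling order_total with a user ID that is missing from table trips the assertion, which confirms the hypothesis. If the assertion never fires on the failing input, the hypothesis is wrong and you have ruled out one cause.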

One change at a time. When you change multiple things simultaneously and the bug disappears, you do not know which change fixed it, whether the fix is correct or merely masking a deeper problem, or whether the two changes together introduced a new subtle bug. This is not caution for its own sake - it is how you maintain the ability to understand what you are doing.

The instinct to “just try something” is understandable when you have been staring at a problem for an hour. It almost never helps and often makes things worse. The better response to being stuck is to step back and re-examine your assumptions - which of the things you believe about this system might be wrong?

Reading Errors: The Message Tells You More Than You Think

Most programmers skim error messages. This is leaving information on the table.

Error messages are usually precise. “KeyError: ‘user_id’” does not mean “something failed” - it means a dictionary access with the key 'user_id' failed, at a specific line, in a specific function. The chain of information is right there.

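For Python tracebacks, read bottom-up. The last line is what happened; the lines above are the chain of calls that led there. The frame immediately above the last line is where the exception was raised. If that frame is inside a library, keep reading upward to the first frame in your own code - that is almost always where your code did something wrong.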

Traceback (most recent call last):
  File "main.py", line 10, in <module>
    result = process(data)
  File "main.py", line 5, in process
    return data["key"] * 2
KeyError: 'key'

Bottom up: KeyError: 'key' - a key access failed. One frame up: data["key"] in process. That is the location. Now the question becomes: why does data not have "key"? Is data the wrong object? Did an earlier step fail to populate it? Was it deleted? The traceback gives you location; backtracking gives you cause.
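The same bottom-up reading can be done in code, which helps when you need to log where an exception came from. A minimal sketch using the standard traceback module; the misspelled key in the input is invented to trigger the error.

```python
import traceback

def process(data):
    return data["key"] * 2

try:
    process({"kee": 21})          # misspelled key, so "key" is missing
except KeyError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1]             # innermost frame: where the error was raised
    missing_key = exc.args[0]     # the key that was not found
    print(f"KeyError for {missing_key!r} in {last.name}(), line {last.lineno}")
```

This gives you the location. The question of why data lacks the key still has to be answered by backtracking.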

Common Python exceptions and what they are actually telling you:

KeyError: a dict access used a key that does not exist. The data arriving at that point does not have the shape you expected.
AttributeError: 'NoneType' object has no attribute 'X': you called .X on None. Hunt the function that was supposed to return something but returned None instead - that is your real bug.
TypeError: unsupported operand type(s): you applied an operation to the wrong type. Often a str where int was expected, or None where a value was expected.
IndexError: list access out of bounds. An off-by-one error, or processing a shorter collection than assumed.
RecursionError: a recursive function has no reachable base case, or the input is deep enough to exceed Python's recursion limit (1000 frames by default).
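The 'NoneType' case is worth seeing once, because the line that raises is rarely the line that is wrong. In this sketch, find_user is a hypothetical lookup that falls off the end of the function when nothing matches.

```python
def find_user(users, user_id):
    for user in users:
        if user["id"] == user_id:
            return user
    # No match: the function ends here and implicitly returns None.

user = find_user([{"id": 1, "name": "Ada"}], 2)

try:
    user.get("name")              # raises here...
except AttributeError as exc:
    message = str(exc)
    print(message)                # ...but the real bug is in find_user
```

The traceback points at the .get call, while the fix belongs in find_user or in the caller's handling of "not found".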

Choosing Your Tool

Print debugging is the oldest technique and is still useful. The advantages are that it requires nothing - no setup, no special knowledge - and leaves a trace you can read after the fact. The limitation is that it is static: you have to predict which values you want to see before you run, and if you guessed wrong, you run again.

The debugger (Python’s pdb, VS Code’s integrated debugger, gdb for C/C++) lets you pause execution and inspect anything interactively. You can check variables you did not think to print, navigate up and down the call stack, set conditional breakpoints, and modify values at runtime to test hypotheses without restarting. From the command line, the simplest way in is a breakpoint() call (Python 3.7+):

def process(items):
    total = 0
    for item in items:
        breakpoint()    # drops into interactive pdb here
        total += item
    return total

At the pdb prompt, n steps to the next line, s steps into a function, p expr prints any expression, w shows the call stack, q quits. VS Code wraps this in a graphical interface; the underlying mechanism is identical.
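Pausing on every iteration gets tedious in a long loop. One way to get a conditional breakpoint without any debugger configuration is to guard the breakpoint() call with the condition you care about. A small sketch, assuming the suspicious case is a non-numeric item:

```python
def process(items):
    total = 0
    for index, item in enumerate(items):
        if not isinstance(item, (int, float)):
            # Pause only for the suspicious case; index and item are
            # available for inspection at the pdb prompt.
            breakpoint()
        total += item
    return total

print(process([1, 2, 3]))   # 6 - clean input never triggers the pause
```

With input such as [1, "2", 3], execution stops at the second item, just before the addition that would raise TypeError.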

The practical rule: use print debugging for simple “what value does X have here” questions. Switch to a debugger when you have run more than two or three print-run-read cycles without finding the bug, or when the bug involves complex branching where you would need to predict many possible paths.

Binary search on the codebase works when you have no idea where a bug is. Add an assertion in the middle of the code: if it fails, the bug is before that point; if it passes, the bug is after. Bisect until you isolate a single expression. For commit-level regressions, git bisect does this automatically:

git bisect start
git bisect bad HEAD           # current commit is broken
git bisect good v1.2.0        # this release was fine
git bisect run pytest tests/test_regression.py   # automated bisection
git bisect reset              # return to the original HEAD when done

Git will binary-search through the commit history and identify the exact commit that introduced the regression - without you having to read each commit.
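The idea behind the search fits in a few lines. This is an illustrative sketch, not git's implementation: it assumes a linear history in which the first commit is known good, the last is known bad, and every commit after the regression stays bad. The names first_bad and is_bad are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True."""
    lo, hi = 0, len(commits) - 1   # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # regression is after mid
    return commits[hi]

commits = ["a", "b", "c", "d", "e", "f", "g", "h"]
checked = []

def is_bad(commit):
    checked.append(commit)
    return commit >= "f"           # pretend the regression landed in "f"

print(first_bad(commits, is_bad))  # f
print(len(checked))                # 3 test runs instead of 8
```

Each test run halves the remaining range, so a thousand commits need about ten runs.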

Time-travel debugging (rr on Linux, IntelliTrace in Visual Studio) records a complete execution history and lets you step backwards through code. This is valuable for bugs that are hard to reproduce - heisenbugs that disappear when you add print statements, race conditions, one-in-a-thousand crashes. You know what the final wrong state is; rr lets you ask “how did we get here?” by rewinding.

The Rubber Duck: Why Narrating Works

There is a technique - often treated as a joke, consistently effective - called rubber duck debugging. You explain your code, out loud, line by line, to a rubber duck, a colleague, or a wall. The explanation itself frequently reveals the bug.

The duck is not helping. What happens is that narrating forces you to externalize your mental model and compare it, step by step, to what the code actually says. Bugs persist because your brain fills in gaps when you read your own code silently - you read what you intended, not what is written. When you narrate explicitly (“and then I take the index and add one, because the array is zero-indexed and I want the second element…”), you are forced to verify each claim as you say it.

The moment of recognition often sounds like: “…and then I call this function which should return the user object, but actually wait, this function returns the user ID, not the user object, so when I then call .name on the return value…”

If you have been staring at a bug for more than thirty minutes, stop and explain the relevant code out loud. This is not a fallback for when the real techniques fail. It is one of the most reliable ways to break out of a stuck state.

Common Bug Patterns

Off-by-one errors. The most frequent bug in code that handles sequences. Involves < vs <=, zero-indexing confusion, or loop bounds. When something works for n elements but fails at n+1, or works generally but fails on the first or last element, look here first.

# Bug: IndexError on the last iteration
for i in range(len(arr)):
    if arr[i] > arr[i + 1]:   # fails when i = len(arr) - 1
        arr[i], arr[i+1] = arr[i+1], arr[i]

# Fix
for i in range(len(arr) - 1):
    if arr[i] > arr[i + 1]:
        arr[i], arr[i+1] = arr[i+1], arr[i]

Mutable default arguments. Python evaluates default arguments once at function definition time, not at each call. This is one of the most surprising Python-specific bugs.

# Bug: all calls share the same list object
def append_to(item, result=[]):
    result.append(item)
    return result

append_to(1)    # [1]
append_to(2)    # [1, 2] - not [2]

# Fix: use None as sentinel
def append_to(item, result=None):
    if result is None:
        result = []
    result.append(item)
    return result

Reference vs copy. Assigning a list does not copy it. Both names point to the same object.

a = [1, 2, 3]
b = a           # b is another name for the same list
b.append(4)
print(a)        # [1, 2, 3, 4] - a was modified through b

b = a[:]        # shallow copy - fixes this case
b = a.copy()    # also shallow copy
import copy; b = copy.deepcopy(a)   # deep copy for nested structures

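Mutation during iteration. Modifying a list while iterating over it does not raise an error in Python. The iterator simply advances by index, so any element that shifts into an already-visited position is silently skipped. Dictionaries and sets are stricter: changing their size during iteration raises RuntimeError.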

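# Bug: the element right after each removed one is skipped
items = [1, 2, 2, 3, 4, 4]
for item in items:
    if item % 2 == 0:
        items.remove(item)   # shifts later elements left under the iterator
print(items)                 # [1, 2, 3, 4] - two even numbers survived

# Fix: build a new list instead of mutating the one being iterated
items = [1, 2, 2, 3, 4, 4]
items = [item for item in items if item % 2 != 0]
print(items)                 # [1, 3]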
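Dictionaries behave differently from lists here: instead of silently skipping entries, CPython raises an error as soon as the size changes mid-loop. A short sketch of the failure and the usual workaround of iterating over a snapshot of the keys (the counts data is invented for illustration):

```python
counts = {"a": 1, "b": 0, "c": 2}
error = None
try:
    for key in counts:
        if counts[key] == 0:
            del counts[key]       # changes the dict's size mid-iteration
except RuntimeError as exc:
    error = str(exc)
print(error)

# Iterating over a snapshot of the keys makes the deletion safe.
counts = {"a": 1, "b": 0, "c": 2}
for key in list(counts):
    if counts[key] == 0:
        del counts[key]
print(counts)                     # {'a': 1, 'c': 2}
```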

Profiling: Measure Before You Optimize

“Premature optimization is the root of all evil.” Knuth said that in 1974. It still needs to be said because the instinct to optimize the thing that feels slow is nearly universal and nearly always wrong.

The 80/20 rule applies reliably to performance: 80% of execution time is typically spent in 20% of the code. Optimizing outside that 20% is effort that produces no measurable speedup. Programmers are consistently bad at predicting where their code spends time. The function that looked expensive because it had complex logic might run in two milliseconds; the innocent-looking loop that calls a library function might account for 90% of runtime.

Measure first. Optimize only what measurement tells you to optimize. Re-measure after to confirm improvement.

cProfile is Python’s built-in profiler. It instruments every function call and reports total time, per-call time, and call counts:

import cProfile
import pstats

with cProfile.Profile() as pr:   # context-manager form needs Python 3.8+
    your_function_here()

stats = pstats.Stats(pr)
stats.sort_stats("cumulative")
stats.print_stats(10)   # top 10 functions by cumulative time

Or without modifying code:

python -m cProfile -s cumtime my_script.py

The output tells you which function consumed the most total time. Start there - not at the function that looks slow.
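To see what that looks like end to end, here is a self-contained sketch with a deliberately slow helper. The functions slow_lookup and count_hits are invented for the demonstration; the report is captured into a string so it can be printed or searched.

```python
import cProfile
import io
import pstats

def slow_lookup(needle, haystack):
    return needle in haystack          # linear scan, because haystack is a list

def count_hits(needles, haystack):
    return sum(1 for n in needles if slow_lookup(n, haystack))

haystack = list(range(2000))
needles = list(range(0, 4000, 2))      # 2000 needles, half of them present

with cProfile.Profile() as pr:         # Python 3.8+
    hits = count_hits(needles, haystack)

buffer = io.StringIO()
pstats.Stats(pr, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
print(hits)                            # 1000
```

In the report, compare the tottime column (time inside the function itself) with cumtime (time including callees). Here almost all of the time lands in slow_lookup, which suggests changing the data structure, for example to a set, rather than tuning the loop around it.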

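line_profiler goes one level deeper: time per individual line within a function. Install with pip install line-profiler. Decorate the hot function with @profile and run kernprof -l -v script.py. The decorator is not imported anywhere; kernprof injects it at run time, so running the same file with plain python fails with NameError until you remove the decorator. The per-line output tells you whether the bottleneck is the data structure lookup, the string formatting, or the actual computation.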

@profile
def process_data(items):
    result = []
    for item in items:
        result.append(item * 2)
    return result

The workflow: cProfile to find the slow function. line_profiler to find the slow line within that function. Fix only that line. Re-run cProfile to confirm the fix actually improved things and did not simply move the bottleneck elsewhere.

Observability in Production

Development debugging tools require you to be present - you attach a debugger, you run the program interactively. Production systems are different. You cannot pause a live server mid-request. The bug might only occur for 0.1% of users, or only under specific load conditions, or only on certain data. The system must tell you what is happening through instrumentation built into it.

Structured logging is print debugging at scale. Instead of freeform strings, log machine-readable JSON with consistent fields. This makes logs searchable across millions of events.

import json
import logging
import time

logger = logging.getLogger(__name__)

def process_order(order_id: str, user_id: str):
    start = time.perf_counter()
    logger.info(json.dumps({
        "event": "order_started",
        "order_id": order_id,
        "user_id": user_id,
    }))
    # ... processing ...
    elapsed = (time.perf_counter() - start) * 1000   # milliseconds
    logger.info(json.dumps({
        "event": "order_complete",
        "order_id": order_id,
        "duration_ms": round(elapsed, 1),
    }))

When something fails in production, you search for the order_id in your log aggregator and see every event that touched that order - a distributed version of a call stack.
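A log aggregator does this search for you, but the operation itself is simple enough to sketch. The log lines and the events_for helper below are invented for illustration; the field names follow the logging example above.

```python
import json

log_lines = [
    '{"event": "order_started", "order_id": "A1", "user_id": "u7"}',
    '{"event": "order_started", "order_id": "B2", "user_id": "u9"}',
    '{"event": "payment_failed", "order_id": "A1", "reason": "timeout"}',
    '{"event": "order_complete", "order_id": "B2", "duration_ms": 41}',
]

def events_for(order_id, lines):
    """Return the event names recorded for one order, in log order."""
    events = (json.loads(line) for line in lines)
    return [e["event"] for e in events if e.get("order_id") == order_id]

print(events_for("A1", log_lines))   # ['order_started', 'payment_failed']
```

This only works because every line carries the same order_id field. With freeform strings you would be writing a regular expression per message format.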

Distributed tracing (OpenTelemetry, Jaeger) extends this across multiple services. A single user request might pass through ten services; distributed tracing attaches a trace ID at entry and propagates it through every downstream call, so you can reconstruct the full path and see exactly where time was spent or where an error was introduced.
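The propagation idea can be sketched without any tracing library. This is not the OpenTelemetry API: the header name x-trace-id, the three service functions, and the in-memory log are all hypothetical, and plain function calls stand in for network requests.

```python
import uuid

TRACE_HEADER = "x-trace-id"   # hypothetical header name for this sketch

def inventory_service(headers, log):
    log.append({"service": "inventory", "trace_id": headers[TRACE_HEADER]})

def payment_service(headers, log):
    log.append({"service": "payment", "trace_id": headers[TRACE_HEADER]})

def gateway(headers, log):
    # Reuse an incoming trace ID if there is one, otherwise start a new trace.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    log.append({"service": "gateway", "trace_id": trace_id})
    outgoing = {TRACE_HEADER: trace_id}   # passed on every downstream call
    inventory_service(outgoing, log)
    payment_service(outgoing, log)
    return trace_id

log = []
trace_id = gateway({}, log)
path = [entry["service"] for entry in log if entry["trace_id"] == trace_id]
print(path)   # ['gateway', 'inventory', 'payment']
```

Real tracing systems also record timing and parent-child relationships for each hop. The core mechanism is still one ID created at the edge and forwarded everywhere.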

The deeper point: systems that are observable by design are fundamentally easier to debug than systems that rely on attaching a debugger after the fact. Logging and tracing answer “what happened?” for events you did not predict would fail. The debugger can only answer “what is happening now?” for a specific run you are present for. Building observability in from the beginning is not extra work - it is a different kind of debugging infrastructure that works at a scale no debugger can.

Practical Patterns That Actually Help

Blame the most recent change first. If something was working yesterday and is not working today, the bug is almost certainly in what changed between then and now. Before doing anything else, run git diff or check your recent edits. This sounds too simple to mention, but the number of times a programmer spends an hour debugging something that was introduced two commits ago - and would have been visible in 30 seconds of git diff - is embarrassingly high.

Find the minimal reproducing case. When you have a failing input, try to reduce it: remove parts of the input one by one and check whether the bug persists. Stop when removing anything more makes the bug disappear. This smallest-possible failing case is valuable in two ways. First, the reduction process itself often reveals the bug - by the time you have stripped the input down to three fields, it is usually obvious which one is causing the problem. Second, a minimal reproducing case is what you need to write a regression test, ask for help, or file a bug report.
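The reduction loop can be automated when the input is structured. A minimal sketch, assuming the failing input is a dict and that you can wrap the failure in a yes/no check; price, crashes, and minimize are hypothetical names.

```python
def price(order):
    # Hypothetical function under test: breaks when amount is None.
    return order.get("amount", 0) * order.get("qty", 1)

def crashes(order):
    try:
        price(order)
        return False
    except TypeError:
        return True

def minimize(record, fails):
    """Drop fields one at a time for as long as the failure persists."""
    current = dict(record)
    changed = True
    while changed:
        changed = False
        for key in list(current):
            trial = {k: v for k, v in current.items() if k != key}
            if fails(trial):
                current = trial      # field was irrelevant; keep it out
                changed = True
    return current

failing = {"id": 7, "amount": None, "qty": 3, "note": "gift"}
print(minimize(failing, crashes))    # {'amount': None}
```

The result names the culprit directly: an explicit None for amount, which the default value in .get does not protect against.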

Write down three things you believe are true, then verify each one. When you are stuck, pick the three assumptions your mental model is most dependent on: “the function always receives a non-empty list,” “the API always returns a string,” “the database column is UTC.” Verify each one explicitly. Bugs routinely live in assumptions everyone thought were obviously true. Saying “I know that’s not the problem” is often exactly wrong.
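Those three example assumptions translate directly into three assertions. A sketch, with verify_assumptions as a hypothetical helper you would call at the point where the values enter the suspect code:

```python
from datetime import datetime, timedelta, timezone

def verify_assumptions(items, api_value, created_at):
    assert len(items) > 0, "assumed a non-empty list"
    assert isinstance(api_value, str), (
        f"assumed str, got {type(api_value).__name__}"
    )
    # A naive datetime has utcoffset() of None, so it fails this check too.
    assert created_at.utcoffset() == timedelta(0), "assumed a UTC timestamp"

# Passes silently: all three assumptions hold.
verify_assumptions([1], "ok", datetime(2024, 3, 10, tzinfo=timezone.utc))
```

Passing a naive timestamp such as datetime(2024, 3, 10) raises AssertionError with the message "assumed a UTC timestamp", which is the kind of obviously-true assumption that turns out to be false.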

Read the library source, not just the docs. Library documentation is written by humans and is frequently incomplete, outdated, or subtly wrong about edge case behavior. When you are spending more than fifteen minutes wondering “does this function do X or Y in this case?” - just read the source. For modules written in Python, python -c "import library; import inspect; print(inspect.getfile(library))" gives you the path. Built-in modules have no file, and compiled extensions point to a binary, so for those you need the project's source repository. Five minutes of reading the actual implementation eliminates an entire class of uncertainty that no amount of re-reading the docs will resolve.
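You can also pull up the source of a single function without leaving the interpreter. A small sketch using the standard inspect module, with json.dumps as an arbitrary example of a function implemented in Python:

```python
import inspect
import json

print(inspect.getsourcefile(json))        # where the module lives on this machine

source = inspect.getsource(json.dumps)    # full text of the function
print(source.splitlines()[0])             # its def line
print(len(source.splitlines()), "lines to read")
```

inspect.getsource only works when the source is available as Python text. For functions implemented in C it raises TypeError.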

Check for environment differences when “it works on my machine.” When code behaves differently between your laptop and a server, the bug is almost always an environment difference, not a logic error. Things to enumerate: Python version (list the exact minor version), timezone (servers often run UTC; dev machines often do not), locale and encoding, installed package versions (pip freeze), environment variables, whether a dependency is mocked vs real. Isolate the environment, not just the code.
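One way to make that enumeration repeatable is a small snapshot function you run on both machines and then diff. A sketch using only the standard library; environment_snapshot and the package names passed to it are placeholders for whatever your project depends on.

```python
import locale
import platform
import time
from importlib import metadata   # Python 3.8+

def environment_snapshot(packages=()):
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None          # not installed on this machine
    return {
        "python": platform.python_version(),
        "timezone": time.tzname,
        "encoding": locale.getpreferredencoding(False),
        "packages": versions,
    }

snapshot = environment_snapshot(["pip", "surely-not-installed-xyz"])
for key, value in snapshot.items():
    print(f"{key}: {value}")
```

Environment variables are left out on purpose, since dumping them wholesale tends to leak secrets into logs. Add the specific ones you suspect.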

Slow down when you are certain. The most dangerous words in debugging are “I know what this does.” The certainty is often the blind spot. When you have been staring at code for a while and you find yourself saying “there is no way the bug is in this function,” that function deserves a closer look. Your confidence is generated by your mental model, and your mental model is what is wrong. Step through the code you “know” anyway - the act of reading it slowly often reveals the discrepancy.

Let the test tell you what to look for. When a test fails, read the assertion message precisely. “Expected 5, got 4” tells you the off-by-one error is one too few, not one too many - which narrows the search to things that would undercount. “Expected list, got None” tells you a function returned None when it should have returned a value - backtrack to that function. The test is telling you exactly what went wrong; most programmers read the assertion message as “test failed” and start guessing, when the message already contains a hypothesis.

Use a second pair of eyes after 30 minutes stuck. Pair debugging is underrated. Not because your colleague is more capable, but because they do not have your incorrect mental model baked in. You have been staring at the code convinced that one part is fine and another part is suspicious. They walk in without that prior. They read the “fine” part and immediately ask “wait, why does this function return None here?” The value is not their expertise - it is their fresh model. If no colleague is available, writing out the problem to ask for help online often produces the same effect (you answer your own question while writing it).

Summary

| Technique | Best For | When to Use |
| --- | --- | --- |
| Know the system | All debugging | Always - this is the foundation |
| Backtracking | Finding root cause | Start here; follow the chain from symptom backward |
| Print debugging | Simple value inspection | Quick questions, one or two rounds |
| `pdb` / breakpoint | Complex control flow | When you have run 2-3 print cycles without finding it |
| Binary search / `git bisect` | Unknown location, regressions | When you have no hypothesis about where the bug is |
| Rubber duck | Breaking a stuck state | After 30 minutes of not making progress |
| `cProfile` | Finding the slow function | Always profile before optimizing |
| `line_profiler` | Finding the slow line | After cProfile identifies the hot function |
| Structured logging | Production debugging | Build in from the start, not as an afterthought |

The tools are learnable in an afternoon. The discipline - forming precise hypotheses, backtracking systematically, building deep knowledge of what you are working with, measuring before optimizing - takes longer and compounds over time. The best debuggers are not the ones who know the most tools. They are the ones who know their system well enough that finding a bug feels less like a search and more like confirming something they almost already knew.
