Git & Version Control - History You Can Rewrite, Merges You Can't Avoid // Megha Bose

Helpful context:

The Unix Philosophy - One Tool, One Job, Compose Everything

It is 11 PM. You have been working on a feature for a week. You are refactoring a core module, halfway through, everything broken. And then your colleague tells you there is a critical production bug in the code you were about to touch. You need to patch production right now. Your half-broken work is all over the working directory.

If you do not know Git well, this is a crisis. If you do, it is a five-second stash and checkout. The work is safe, the branch is clean, the hotfix ships, you come back to exactly where you left off.

Git as time machine. Git as insurance policy. Git as the thing that lets you work fearlessly precisely because you can always go back.

The Origin Story: BitKeeper Drama and Two Weeks of Fury

In 2002, the Linux kernel project had a problem. It was the world’s largest collaborative software project - thousands of contributors, millions of lines of code - and they were managing patches by email and hand-merging. Larry McVoy offered them BitKeeper, his proprietary distributed version control system, for free.

It worked well. For three years, the Linux kernel used BitKeeper. Then in 2005, a developer named Andrew Tridgell wrote a tool to reverse-engineer the BitKeeper protocol. McVoy revoked the free license.

Linus Torvalds was furious. He was also without a version control system for the Linux kernel. His response was to write one himself.

In two weeks.

The result was Git. Linus’s design goals, as he later described them, were blunt: he wanted to be nothing like CVS or SVN (centralized, slow, bad branching). He wanted it to be distributed, fast, and to guarantee data integrity. He wanted operations that felt instant rather than requiring a server. He wanted branching to be trivially cheap.

Git was initially quite rough - command-line only, undocumented, and written on the assumption that its users were kernel developers. Junio Hamano took over maintenance within months, and Git has been a collaborative open-source project since 2005. It is now the dominant version control system in the world, and GitHub, built almost entirely on top of it, was acquired by Microsoft for $7.5 billion.

The Object Model: Git is a Content-Addressable Filesystem

Here is the key insight that makes every Git command make sense: Git is not primarily a version control system. It is a content-addressable key-value store with a version control interface on top.

Every piece of data Git stores - every file, every directory, every commit - is stored as an object identified by the SHA-1 hash of its contents plus a short type-and-size header (newer Git versions can optionally use SHA-256 instead). The hash is both the name and the verification mechanism. If two files have the same content, they share the same object. If any bit in any object is corrupted, its hash no longer matches.

Git has exactly four object types:

Blob: The raw contents of a file. No filename, no permissions - just bytes. Two files with identical content share one blob.

Tree: A directory. A list of entries: (permissions, type, sha1, name). A tree entry points to either a blob (file) or another tree (subdirectory).

Commit: A snapshot. Points to a root tree (the state of your project at that moment), zero or more parent commits, author, committer, timestamp, and message. The commit is identified by the SHA-1 of all that data - which includes the parent commit’s SHA-1, which includes its parent’s, and so on. This chain of hashes makes history tamper-evident.

Tag: A named pointer to another object (usually a commit), with a message. Used for marking releases.

# See the objects Git stores
git cat-file -t HEAD           # commit
git cat-file -p HEAD           # show commit content
git cat-file -p HEAD^{tree}    # show root tree
git cat-file -p HEAD~1         # show parent commit
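The claim that identical content shares one object can be checked directly. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the file names a.txt and b.txt are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"    # throwaway repository
git init -q

printf 'hello\n' > a.txt
printf 'hello\n' > b.txt           # same bytes, different name

# hash-object prints the ID Git assigns to the content; -w also stores the object
id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w b.txt)
echo "$id_a"
echo "$id_b"                       # same ID: the filename is not part of a blob

git cat-file -t "$id_a"            # blob
git cat-file -p "$id_a"            # hello
```

The filename only appears later, in the tree object that points at this blob.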

A branch is just a file in .git/refs/heads/ containing a single SHA-1. main is a file containing the hash of the latest commit on main. Branching in Git is creating a 41-byte file. Switching branches is updating a pointer and updating the working tree.

The HEAD file at .git/HEAD contains either a ref (ref: refs/heads/main) or a bare SHA-1 (detached HEAD state). That is the entirety of Git’s branching mechanism.
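To see those files for yourself, here is a small sketch. It assumes a fresh throwaway repository using Git's default file-based ref storage; in an older or heavily used repository a ref may live in .git/packed-refs instead of its own file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "first commit"

head_file=$(cat .git/HEAD)
tip=$(cat .git/refs/heads/main)
echo "$head_file"                        # ref: refs/heads/main
echo "$tip"                              # the commit ID of the branch tip
git rev-parse HEAD                       # the same ID, resolved through HEAD

# Detaching HEAD replaces the symbolic ref with a bare commit ID
git checkout -q --detach
head_after_detach=$(cat .git/HEAD)
echo "$head_after_detach"
```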

The Three Zones

Git’s workflow has three places your files can be:

Working tree: Your files on disk, as you see them in your editor. Changes here are unstaged until you explicitly add them (a brand-new file that Git has never seen is untracked).

Staging area (index): The proposed next commit. git add copies the current content of a file from the working tree into the index. The index is not a set of diffs - it is a full snapshot of every tracked file, as the next commit would record it. git status compares the working tree to the index (unstaged changes) and the index to HEAD (staged changes).

Repository: The .git/ directory containing all objects and refs. git commit takes the current index, creates a tree object, creates a commit object pointing to that tree with the current HEAD commit as its parent, and advances the current branch (the ref HEAD points to) to the new commit.

working tree  →   staging area   →   repository
   (edit)      (git add / git rm)    (git commit)
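The steps that git commit performs can be reproduced by hand with plumbing commands. This is a rough sketch of the idea in a throwaway repository, not a description of how git commit is implemented internally; file.txt is a made-up name.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"                # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main          # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt                               # working tree -> index

tree=$(git write-tree)                         # index -> tree object
commit=$(git commit-tree "$tree" -m "first")   # tree + metadata -> commit object (no parent yet)
git update-ref refs/heads/main "$commit"       # advance the branch that HEAD points to

git log --oneline                              # the hand-built commit is the tip of main
git status --short                             # prints nothing: working tree, index and HEAD agree
```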

Understanding the index explains why git add exists at all (why not just git commit *?): it lets you commit a partial set of changes, staging only what belongs in this commit while leaving other work in progress. git add -p (patch mode) lets you stage individual hunks within a file - the most powerful workflow for keeping commits clean.
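A partial commit looks like this in practice. The sketch below assumes a throwaway repository with two made-up files; it stages whole files rather than hunks, because git add -p is interactive.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt && git commit -q -m "base"

echo "two" >> a.txt                        # edit both files
echo "two" >> b.txt
git add a.txt                              # stage only a.txt

staged=$(git diff --cached --name-only)    # index vs HEAD
unstaged=$(git diff --name-only)           # working tree vs index
echo "staged:   $staged"                   # a.txt
echo "unstaged: $unstaged"                 # b.txt

git commit -q -m "change a only"
git status --short                         # b.txt is still modified and unstaged
```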

Branches: The Pointer Model

Every branch is a named pointer to a commit. Checking out a branch points HEAD at it, and the checked-out branch pointer advances with every new commit.

git branch feature/login       # create branch (just creates a pointer file)
git switch feature/login       # move HEAD to this branch
git switch -c feature/login    # create and switch in one command (modern)

When you commit on feature/login, Git:

Creates a new commit object with HEAD’s commit as its parent
Updates .git/refs/heads/feature/login to point to the new commit
HEAD still points to feature/login, which now points to the new commit

main is not special. It is just a branch that convention says you merge into. You could name your main branch anything.
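The pointer movement described above can be watched with git rev-parse. A minimal sketch, assuming a throwaway repository and empty commits so that only the refs change:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

git commit -q --allow-empty -m "base"
base=$(git rev-parse main)

git switch -q -c feature/login           # new pointer, same commit
git commit -q --allow-empty -m "login work"

git rev-parse main                       # unchanged: still the base commit
git rev-parse feature/login              # moved to the new commit
git rev-parse 'feature/login^'           # parent of the new commit is the base
git symbolic-ref HEAD                    # refs/heads/feature/login
```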

Merge vs Rebase: A Genuine Debate

Both merge and rebase integrate changes from one branch into another. They produce different histories.

Merge creates a new commit with two parents - the tips of both branches. The true history is preserved: you can see exactly when work was started, when it diverged, and when it was integrated. The commit graph has branch and merge points.

Rebase replays your commits on top of another branch, as if you had started your work there. The commits get new SHA-1s (different parent, different content). The history appears linear - no branch point, no merge commit, just a sequence of commits.

git merge feature/login        # run on main: preserve true history; creates a merge commit once the branches have diverged
git rebase main                # run on feature/login: replay feature commits on top of current main

The rebase vs merge debate is real and ongoing. Linear history advocates argue that merge commits make git log unreadable - a graph full of merge commits hides the actual progression of work. True history advocates argue that rewriting history is dishonest: you lose the information about when things actually happened and what the working state was at each point.

The rule that matters: Never rebase commits that have been pushed to a shared branch. Rebase rewrites SHA-1s, which means anyone who has pulled those commits now has a divergent history. If you force-push rebased commits to main, every other developer gets a history conflict on their next pull. This causes real damage to real teams. Rebase locally, on your own feature branch, before merging. Never rebase main.
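To make the difference concrete, the sketch below integrates the same feature branch both ways in a throwaway repository. The branch name merged and the file names are made up for the demonstration; the merge is done on a scratch branch so that main stays untouched for the rebase.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"

git switch -q -c feature
echo feat > feature.txt && git add feature.txt && git commit -q -m "feature work"
old_feature=$(git rev-parse feature)

git switch -q main
echo main > main.txt && git add main.txt && git commit -q -m "main work"

# Option 1: merge. The new commit has two parents.
git switch -q -c merged main
git merge -q --no-edit feature
git rev-list --parents -n 1 merged       # three IDs: the merge commit and its two parents

# Option 2: rebase. The feature commit is recreated on top of main.
git switch -q feature
git rebase -q main
new_feature=$(git rev-parse feature)
git log --oneline --graph feature        # a straight line, no merge commit
```

The feature commit has a different ID after the rebase, which is exactly why rebasing commits that others have already pulled causes trouble.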

Content Addressing Explains Everything

Once you understand that Git is a content-addressed store, many Git behaviors that seem mysterious become obvious:

git diff between commits: Compare the trees. Object hashes that appear in both trees are unchanged files (same content, same hash). New hashes are new or modified files.

git cherry-pick: Take a commit’s diff (computed by comparing it to its parent) and apply it to the current HEAD as a new commit. The new commit has a new SHA-1 because its parent is different, but the change is the same.

git stash: Creates two commits (one for the index, one for the working tree, plus a third if you include untracked files) and points refs/stash at the newest stash; older stashes live in that ref’s reflog. “Stashing” is committing without advancing the branch.

git reflog: Every time HEAD moves (commit, checkout, rebase, reset), Git appends an entry to the reflog - a log of where HEAD has been. The reflog is your safety net. Even after a git reset --hard that seems to delete commits, those commits still exist as unreachable objects. By default, reflog entries are kept for 90 days, or 30 days for entries no longer reachable from the current branch tip, which is the case for commits you have reset away. The reflog lets you find them.

git reflog                      # see every HEAD position
git reset --hard HEAD@{3}       # jump to 3 positions ago
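Going back to the stash entry in the list above, the idea that a stash is a commit that does not advance the branch can be checked in a few lines. A minimal sketch, assuming a throwaway repository with one made-up file:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"           # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "base"
tip=$(git rev-parse HEAD)

echo two >> file.txt                      # uncommitted work
git stash -q

stash_type=$(git cat-file -t refs/stash)
first_parent=$(git rev-parse 'refs/stash^1')
echo "$stash_type"                        # commit
git rev-list --parents -n 1 refs/stash    # the stash commit followed by its parents
git rev-parse HEAD                        # unchanged: the branch did not advance
git status --short                        # prints nothing: the working tree is clean

git stash pop -q                          # reapply the work and drop the stash entry
```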

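Almost nothing in Git is permanent. A git reset --hard HEAD~3 feels destructive, but the commits still exist - they are just not pointed to by any branch. git reflog finds them; git checkout <sha1> lets you inspect one in detached HEAD state, and git branch <name> <sha1> gives it a branch again. The objects are only truly deleted by git gc after the reflog retention period expires. The exception is work that was never committed or staged: changes discarded by --hard were never stored as objects, so they are gone.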
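A recovery along these lines might look as follows. This is a sketch in a throwaway repository; the branch name rescued is made up, and empty commits stand in for real work.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

for n in 1 2 3; do git commit -q --allow-empty -m "commit $n"; done
lost=$(git rev-parse HEAD)               # remember the tip for comparison

git reset -q --hard HEAD~2               # "delete" the last two commits
git log --oneline                        # only commit 1 is reachable from main

git reflog -n 3                          # the old tip is still recorded
git cat-file -t "$lost"                  # commit: the object was never removed

git branch rescued "HEAD@{1}"            # HEAD@{1} is where HEAD was before the reset
git log --oneline rescued                # all three commits again
```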

Rewriting History: When It Is and Is Not Safe

Interactive rebase (git rebase -i) lets you rewrite unpushed history:

git rebase -i HEAD~5   # reword, edit, squash, drop any of the last 5 commits

The editor presents:

pick a1b2c3d feat: add user authentication
pick e4f5a6b fix: correct password hash comparison
pick 9d8e7f6 chore: remove debug print statements

Change pick to squash to merge a commit into its predecessor. Change to reword to edit a commit message. Change to drop to delete a commit entirely. These operations are safe on your local branch that no one else has pulled.
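The same edit can be scripted, which makes it easy to try in a throwaway repository. The sketch below is only a stand-in for editing the list by hand: it sets GIT_SEQUENCE_EDITOR to a sed command that rewrites the second line of the todo list, and it uses fixup (squash that keeps only the earlier commit's message) so that no message editor opens. The file auth.txt is made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo a > auth.txt && git add auth.txt && git commit -q -m "feat: add user authentication"
echo b >> auth.txt && git commit -q -am "fix: correct password hash comparison"
echo c >> auth.txt && git commit -q -am "chore: remove debug print statements"

# The todo list for HEAD~2 has two lines: the fix commit, then the chore commit.
# Turning line 2 from pick into fixup folds the chore commit into the fix commit.
GIT_SEQUENCE_EDITOR='sed -i.bak -e "2s/^pick/fixup/"' git rebase -q -i HEAD~2

git log --oneline                        # two commits remain: feat, then fix
cat auth.txt                             # content is unchanged: a, b, c
```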

The moment you git push, history is shared. Rewriting it creates divergence. The safe way to undo a pushed commit is git revert, which is not a rewrite at all: it creates a new commit that undoes a previous commit’s changes, preserving the history of both the original commit and the revert.
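A revert in miniature, as a sketch in a throwaway repository with a made-up file app.txt:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo good > app.txt && git add app.txt && git commit -q -m "good change"
echo bad >> app.txt && git commit -q -am "bad change"
bad=$(git rev-parse HEAD)

git revert --no-edit "$bad" >/dev/null   # new commit that applies the inverse diff

cat app.txt                              # good
git log --oneline                        # three commits: nothing was removed
git merge-base --is-ancestor "$bad" HEAD && echo "original commit is still in history"
```

Because the revert only adds a commit, anyone who already pulled the bad commit can pull the fix without conflicts in history.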

GitOps tools (Argo CD, Flux) treat the Git history as the source of truth for infrastructure state. Every commit to a GitOps repo represents a desired cluster state. Rewriting that history means the desired state history is inconsistent, which breaks the audit trail that makes GitOps valuable. Immutable history is a design requirement, not a preference, in GitOps environments.

The Porcelain/Plumbing Split (and Why Git’s UX Is Hard)

Git distinguishes between plumbing commands (low-level, stable, scripting-friendly: cat-file, hash-object, update-ref, read-tree, write-tree) and porcelain commands (high-level, user-facing, unstable across versions: commit, checkout, branch, merge).

The plumbing commands have been stable for most of Git’s history. They output parseable text and have predictable behavior. Shell scripts that use plumbing commands continue to work across Git versions.

The porcelain commands are what users actually use, and they are famously inconsistent. git checkout does three different things: switch branches, restore working tree files, and checkout a specific commit (entering detached HEAD). This was so confusing that Git 2.23 (2019) introduced git switch and git restore to split those responsibilities.

git reset has three entirely different behaviors depending on --soft, --mixed, and --hard. The flag names do not hint at what they do. This is why people still confuse them.
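One way to keep the three modes apart is to run each of them against the same commit and look at git status afterwards. A sketch in a throwaway repository with a made-up file f.txt:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo v1 > f.txt && git add f.txt && git commit -q -m "v1"
echo v2 > f.txt && git commit -q -am "v2"
v2=$(git rev-parse HEAD)

git reset -q --soft HEAD~1               # moves the branch only
soft=$(git status --short)
echo "soft:  '$soft'"                    # 'M  f.txt': the v2 change is still staged

git reset -q --hard "$v2"                # back to v2 for the next experiment
git reset -q --mixed HEAD~1              # moves the branch and resets the index
mixed=$(git status --short)
echo "mixed: '$mixed'"                   # ' M f.txt': the v2 change is unstaged

git reset -q --hard "$v2"
git reset -q --hard HEAD~1               # moves the branch, resets index and working tree
hard=$(git status --short)
echo "hard:  '$hard'"                    # empty: the v2 change is gone from disk
cat f.txt                                # v1
```

Read as a ladder: --soft touches the branch, --mixed also touches the index, and --hard also touches the working tree.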

The inconsistency is a historical artifact: Git grew organically, with commands added to meet immediate needs, with interface standardization a distant second priority. Linus’s two-week sprint optimized for correctness and speed, not UX.

Alternatives that take UX seriously:

Jujutsu (jj) is a new VCS from Google that uses Git’s object model but redesigns the interface from scratch. Working changes are a real commit (you are always on a commit, never on “uncommitted” changes). Anonymous branches make experimental work natural. The conflict state is explicit in the history rather than a mid-operation state. Jujutsu is seeing real adoption among developers who are fluent in Git’s model and frustrated with its interface.

Pijul takes a different approach at the data model level - instead of storing snapshots and computing diffs, it stores patches as the fundamental unit. Merging becomes commutative: merging patch A then patch B gives the same result as merging patch B then patch A, which eliminates entire classes of merge conflicts. The tradeoff: a different mental model that requires learning.

Neither will replace Git in the next five years - the ecosystem lock-in (GitHub, GitLab, every CI/CD system) is too deep. But they represent genuine intellectual progress on the version control problem.

Git as Infrastructure

GitHub and GitLab built their CI/CD systems on a simple observation: a commit is a unit of work. Every commit to main can trigger a test suite. Every PR can be reviewed before merging. Every tag can trigger a release pipeline.

GitHub Actions runs workflow YAML files stored in the repository itself, triggered by git events (push, PR, tag). The workflow is versioned alongside the code it tests. This is the Unix philosophy applied to CI: small composable steps (checkout, test, build, deploy), piped through a YAML-defined pipeline.

GitOps extends this to infrastructure: the desired state of your Kubernetes cluster is stored in Git. Argo CD or Flux watches for commits and reconciles the cluster to match the desired state. A git commit is a deployment. git revert is a rollback. The audit log is git log. Every change to production is a code review.

This works because of the properties that Linus designed in: content-addressed integrity (the commit SHA-1 is a tamper-evident identifier of the exact code deployed), immutable history (you cannot pretend a deployment did not happen), and distributed design (the full history is on every developer’s machine, not just a central server).

Concept	The Mental Model
Git object store	Content-addressed key-value store; hash = identity
Blob	File contents (no name, no metadata)
Tree	Directory listing (names → hashes)
Commit	Snapshot + parent + metadata; chains form tamper-evident history
Branch	A file containing one SHA-1; creation is free, deletion is safe
Index / staging area	Proposed next commit; `git add` populates it
Rebase	Replay commits with new parents; rewrites SHA-1s; never on shared branches
Reflog	Every HEAD position, kept 90 days by default (30 for unreachable entries); the true safety net
GitOps	Git commit as deployment unit; `git log` as audit trail

Read Next:

CI/CD - Deploying Early, Deploying Often, Deploying Without Fear