Machine Learning
- Building Sharded Transformers - When a Model Doesn't Fit on One Machine Mar 2022 - 12 min read
- KV Caching - Never Recompute What You've Already Attended To Mar 2022 - 11 min read
- Direct Preference Optimization - Alignment Without the Reinforcement Loop Mar 2022 - 9 min read
- RLHF - Aligning Models by Learning What Humans Prefer Mar 2022 - 12 min read
- LoRA & Quantization - Fine-Tuning at a Fraction of the Cost Mar 2022 - 11 min read
- Transfer Learning & Fine-Tuning - Borrowing Knowledge From a Related Problem Mar 2022 - 12 min read
- Mechanistic Interpretability - What Is the Model Actually Computing? Mar 2022 - 11 min read
- Efficient Transformers - Attention Without the Quadratic Cost Mar 2022 - 13 min read
- Scaling Laws - More Compute, More Data, More Predictably Better Mar 2022 - 10 min read
- Vision & Multimodal ML - From Pixels to Cross-Modal Understanding Mar 2022 - 11 min read
- Tokenization - Breaking Language Into Pieces a Model Can Learn From Mar 2022 - 10 min read
- Evaluating LLMs - Why Benchmarks Are Harder Than They Look Mar 2022 - 9 min read
- Transformers From First Principles - Why Attention Changed Everything Mar 2022 - 11 min read
- Speech & Audio ML - Teaching Machines to Hear Mar 2022 - 15 min read
- Positional Encodings - Teaching Attention Where Things Are in a Sequence Mar 2022 - 12 min read
- Attention Mechanisms - Not All Tokens Are Created Equal Mar 2022 - 14 min read
- Sequence Modeling & Language Models - Predicting One Token at a Time Mar 2022 - 10 min read
- Recurrent Neural Networks - Memory Hidden in the Hidden State Mar 2022 - 11 min read
- Word Embeddings - Meaning as Position in Space Mar 2022 - 10 min read
- BatchNorm & LayerNorm - Keeping Activations From Exploding or Vanishing Mar 2022 - 12 min read
- Bias, Variance & Overfitting - The Three-Way Tradeoff You Can't Escape Mar 2022 - 10 min read
- Objective Functions & Loss Design - What You Optimize Is What You Get Mar 2022 - 11 min read
- Autodiff - Derivatives Without Doing the Algebra by Hand Mar 2022 - 13 min read
- Gradient Descent - Follow the Slope Until the Ground Levels Off Mar 2022 - 12 min read
- Neural Networks & Perceptrons - Function Approximation, Layer by Layer Mar 2022 - 10 min read