Overview

The Unix philosophy is deceptively simple: write programs that do one thing and do it well, write programs that work together, and write programs that handle text streams. Coined by Doug McIlroy in the 1970s, this design principle gave rise to the command-line tools that still power data pipelines, DevOps workflows, and production debugging today. The shell is not just a way to launch programs - it is a programming environment where small tools are composed into powerful ad-hoc workflows with almost no ceremony.

  • Problem it solves: Monolithic applications that do everything become hard to automate, test, and extend. Unix tools allow ad-hoc data transformation and automation without writing a full program.
  • Alternatives: Python scripting for more complex logic; SQL for structured relational data; purpose-built ETL frameworks (Airflow, dbt) at scale.
  • Pros: Universally available on Linux/macOS; composable; zero deployment overhead; excellent for one-off data exploration and log analysis.
  • Cons: Plain-text orientation makes structured data (JSON, Parquet) awkward; pipelines lack type safety; error handling across pipes is fragile.
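
To make the composition point concrete before diving in - each stage below is a tiny tool doing one job, and the pipe glues them into an ad-hoc report (a throwaway sketch, relying only on the standard /etc/passwd layout):

# ten most common login shells on this machine
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head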

Essential Commands

The filesystem commands you use every day:

ls -la              # list all files with permissions, size, timestamps
pwd                 # print working directory
cd /var/log         # change directory
cp src.txt dst.txt  # copy a file
mv old.txt new.txt  # move or rename
rm -rf build/       # remove directory recursively (no undo)
mkdir -p a/b/c      # create nested directories

For reading files - including ones too large to open comfortably in an editor:

cat file.txt        # dump entire file
less file.txt       # paginated reader (q to quit, / to search)
head -n 20 file.txt # first 20 lines
tail -f app.log     # follow a file as it grows (great for logs)

Pipes and Redirection

The pipe operator | feeds the stdout of one command into the stdin of the next. Redirection operators control where output goes:

# redirect stdout to a file (overwrite)
ls -la > listing.txt

# append stdout
echo "new entry" >> log.txt

# redirect stderr to a file
python script.py 2> errors.log

# discard stderr entirely
ffmpeg -i input.mp4 output.mp4 2>/dev/null

# pipe: count lines in a directory listing
ls -la | wc -l
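
Two more redirection idioms come up constantly (the commands shown are illustrative; the operators are standard shell):

# send stdout and stderr to the same file
make > build.log 2>&1

# pipe stderr along with stdout into the next command
python script.py 2>&1 | grep -i error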

Text Processing

grep, sed, and awk are the workhorses of text processing.

grep searches for patterns using regular expressions:

grep -n "ERROR" app.log            # show line numbers
grep -r "TODO" src/                # recursive search in directory
grep -v "DEBUG" app.log            # invert match (exclude DEBUG lines)
grep -E "4[0-9]{2}|5[0-9]{2}" access.log  # extended regex: 4xx or 5xx

sed is a stream editor - it reads line by line and applies substitutions:

# replace first occurrence per line
sed 's/foo/bar/' file.txt

# replace all occurrences per line
sed 's/foo/bar/g' file.txt

# edit in place (macOS needs -i '')
sed -i 's/localhost/prod.example.com/g' config.txt

awk treats each line as fields separated by whitespace (or a delimiter):

# print the 3rd column of a space-separated file
awk '{print $3}' data.txt

# sum the 5th column
awk '{sum += $5} END {print sum}' numbers.txt

# filter rows where field 2 is greater than 100
awk '$2 > 100 {print $0}' data.txt
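
The default separator is whitespace; -F sets a different one, which is handy for CSV-like files (data.csv is illustrative, and note this naive approach does not handle quoted commas):

# print the 2nd field of a comma-separated file
awk -F',' '{print $2}' data.csv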

Combining them: sort, uniq, and wc:

# count unique IP addresses in an access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
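
When you only need the total rather than the per-item breakdown, wc -l gives a single number (same access.log as above):

# count distinct IP addresses
awk '{print $1}' access.log | sort -u | wc -l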

File Permissions

Every file has an owner, a group, and a permission triplet for owner/group/other:

-rwxr-xr-- 1 megha staff 4096 Sep 15 12:00 deploy.sh

The nine permission characters mean: owner can read/write/execute, group can read/execute, others can only read. chmod changes permissions, either numerically - each digit is the sum of read=4, write=2, execute=1, so 755 means rwx for the owner and r-x for group and others - or symbolically:

chmod 755 deploy.sh     # rwxr-xr-x - executable by all
chmod +x script.sh      # add execute bit
chmod go-w config.yml   # remove write from group and others
chown megha:staff file  # change owner and group

Process Management

ps aux              # list all running processes
top                 # live process viewer (q to quit)
kill -9 1234        # force-kill PID 1234
kill -TERM 1234     # ask process to terminate gracefully (TERM is kill's default signal)

# background jobs
long_command &      # run in background
jobs                # list background jobs
fg %1               # bring job 1 to foreground
bg %1               # resume stopped job in background
nohup ./server.sh & # keep running after terminal closes
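
When you don't know the PID, pgrep and pkill (available on Linux and macOS) match processes by name; -f matches against the full command line (server.sh is illustrative):

pgrep -f server.sh        # print PIDs of matching processes
pkill -TERM -f server.sh  # send SIGTERM to every match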

Environment Variables

echo $PATH          # colon-separated list of directories searched for commands
export MY_VAR=hello # make variable available to child processes
printenv            # list all environment variables

Put persistent exports in ~/.bashrc or ~/.zshrc. A clean $PATH matters - programs are found in the first matching directory, left to right.
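
A typical pattern, assuming a personal ~/bin directory, is to prepend it and then confirm what a name actually resolves to:

# in ~/.bashrc or ~/.zshrc
export PATH="$HOME/bin:$PATH"

# show which executable the shell will run for a name
command -v python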

Shell Scripting Basics

#!/usr/bin/env bash
# shebang tells the OS which interpreter to use

set -euo pipefail   # exit on error, unset var, or pipe failure

NAME="world"
echo "Hello, $NAME"

for file in *.log; do
    [ -e "$file" ] || continue   # guard: skip the literal pattern if no .log files match
    echo "Processing $file"
    gzip "$file"
done

if [ -f config.yml ]; then
    echo "Config exists"
else
    echo "Config missing, exiting" >&2
    exit 1
fi
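
Two constructs round out the basics: command substitution captures a command's output, and conditionals can test a command's exit status directly (app.log is illustrative):

#!/usr/bin/env bash
set -euo pipefail

today=$(date +%F)                  # command substitution: capture stdout in a variable
echo "Report for $today"

if grep -q "ERROR" app.log; then   # -q: no output, just an exit status
    echo "errors found in app.log" >&2
fi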

Remote and File Transfer

ssh user@host.example.com          # connect to remote host
ssh -L 5432:localhost:5432 user@host  # SSH tunnel: forward local port to remote
scp file.txt user@host:/tmp/       # copy file to remote
rsync -avz src/ user@host:/dest/   # sync directory (efficient, resumable; trailing / on src/ copies its contents, not the directory itself)
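
ssh also runs one-off commands without opening an interactive session, and the output composes with local pipes (same host name as above):

# check disk usage on the remote host, filter locally
ssh user@host.example.com 'df -h' | grep -v tmpfs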

find and xargs

find searches the filesystem; xargs converts stdin lines into command arguments:

# find all Python files modified in the last day
find . -name "*.py" -mtime -1

# delete all .pyc files
find . -name "*.pyc" -delete

# run a command on each found file (handles spaces in filenames)
find . -name "*.log" -print0 | xargs -0 gzip

Examples

Log analysis one-liner - top 10 slowest requests:

# nginx log format: IP - - [date] "method url" status bytes response_time
awk '$NF > 1.0 {print $NF, $7}' access.log \
  | sort -rn \
  | head -10

Batch rename files - replace spaces with underscores:

#!/usr/bin/env bash
set -euo pipefail

for f in *\ *; do
    [ -e "$f" ] || continue   # guard: skip the literal pattern if no filenames contain spaces
    new="${f// /_}"
    mv "$f" "$new"
    echo "Renamed: $f -> $new"
done

Count error types in a log file:

grep "ERROR" app.log \
  | awk '{print $5}' \
  | sort \
  | uniq -c \
  | sort -rn

The command line rewards investment. Once you can fluently compose awk, grep, sed, sort, and xargs, you can explore a new dataset or debug a production system without waiting for a specialised tool to be built. That leverage compounds over a career.

