Command Line & Unix Philosophy
Overview
The Unix philosophy is deceptively simple: write programs that do one thing and do it well, write programs that work together, and write programs that handle text streams. Coined by Doug McIlroy in the 1970s, this design principle gave rise to the command-line tools that still power data pipelines, DevOps workflows, and production debugging today. The shell is not just a way to launch programs - it is a programming environment where small tools are composed into powerful ad-hoc workflows with almost no ceremony.
- Problem it solves: Monolithic applications that do everything become hard to automate, test, and extend. Unix tools allow ad-hoc data transformation and automation without writing a full program.
- Alternatives: Python scripting for more complex logic; SQL for structured relational data; purpose-built ETL frameworks (Airflow, dbt) at scale.
- Pros: Universally available on Linux/macOS; composable; zero deployment overhead; excellent for one-off data exploration and log analysis.
- Cons: Plain-text orientation makes structured data (JSON, Parquet) awkward; pipelines lack type safety; error handling across pipes is fragile.
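The philosophy is easiest to see in a pipeline. A minimal sketch: each stage does exactly one job, and the pipe composes them into a word-frequency counter.

```shell
# Count word frequencies: tr splits words onto lines, sort groups them,
# uniq -c counts adjacent duplicates, sort -rn ranks by count.
printf 'to be or not to be\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

No single tool here knows anything about word counting; the behaviour emerges entirely from composition.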
Essential Commands
The filesystem commands you use every day:
ls -la # list all files with permissions, size, timestamps
pwd # print working directory
cd /var/log # change directory
cp src.txt dst.txt # copy a file
mv old.txt new.txt # move or rename
rm -rf build/ # remove directory recursively (no undo)
mkdir -p a/b/c # create nested directories
For reading files without loading them fully into memory:
cat file.txt # dump entire file
less file.txt # paginated reader (q to quit, / to search)
head -n 20 file.txt # first 20 lines
tail -f app.log # follow a file as it grows (great for logs)
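head and tail also compose: a small sketch for pulling out an arbitrary line range (the filename nums.txt is just an illustration).

```shell
# Extract lines 5 through 8: head keeps the first 8 lines,
# tail then keeps the last 4 of those.
seq 1 20 > nums.txt
head -n 8 nums.txt | tail -n 4    # prints 5, 6, 7, 8, one per line
```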
Pipes and Redirection
The pipe operator | feeds the stdout of one command into the stdin of the next. Redirection operators control where output goes:
# redirect stdout to a file (overwrite)
ls -la > listing.txt
# append stdout
echo "new entry" >> log.txt
# redirect stderr to a file
python script.py 2> errors.log
# discard stderr entirely
ffmpeg -i input.mp4 output.mp4 2>/dev/null
# pipe: count lines in a directory listing
ls -la | wc -l
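A pattern worth memorising is capturing stdout and stderr together. Order matters: 2>&1 duplicates stderr onto wherever stdout currently points, so the file redirection must come first (combined.log is an illustrative name).

```shell
# stdout goes to the file first, then stderr is pointed at the same place
ls /etc /nonexistent > combined.log 2>&1 || true
# the reverse order (2>&1 > combined.log) would leave stderr on the terminal
grep nonexistent combined.log   # the error message landed in the file
```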
Text Processing
grep, sed, and awk are the workhorses of text processing.
grep searches for patterns using regular expressions:
grep -n "ERROR" app.log # show line numbers
grep -r "TODO" src/ # recursive search in directory
grep -v "DEBUG" app.log # invert match (exclude DEBUG lines)
grep -E "4[0-9]{2}|5[0-9]{2}" access.log # extended regex: 4xx or 5xx
sed is a stream editor - it reads input line by line and applies editing commands, most commonly substitutions:
# replace first occurrence per line
sed 's/foo/bar/' file.txt
# replace all occurrences per line
sed 's/foo/bar/g' file.txt
# edit in place (macOS needs -i '')
sed -i 's/localhost/prod.example.com/g' config.txt
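Substitution is only one sed command. An address - a line number, a range, or a /pattern/ - selects which lines a command applies to; with -n suppressing the default output, p turns sed into a line extractor. A sketch:

```shell
# -n suppresses automatic printing; p prints only the addressed lines
seq 1 10 | sed -n '3,5p'                          # lines 3 through 5
printf 'a\nERROR boom\nb\n' | sed -n '/ERROR/p'   # print matching lines, like grep
```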
awk treats each line as a record of fields separated by whitespace (or a delimiter set with -F):
# print the 3rd column of a space-separated file
awk '{print $3}' data.txt
# sum the 5th column
awk '{sum += $5} END {print sum}' numbers.txt
# filter rows where field 2 is greater than 100
awk '$2 > 100 {print $0}' data.txt
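Setting the delimiter with -F makes awk serviceable for simple comma-separated data (a sketch only - CSV with quoted commas needs a real parser):

```shell
# Print each name from column 1 and sum the numeric second column
printf 'alice,3\nbob,7\n' \
  | awk -F, '{sum += $2; print $1} END {print "total:", sum}'
```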
Combining them with sort and uniq (uniq only collapses adjacent duplicates, so sort must come first):
# count unique IP addresses in an access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -n 20
File Permissions
Every file has an owner, a group, and a permission triplet for owner/group/other:
-rwxr-xr-- 1 megha staff 4096 Sep 15 12:00 deploy.sh
The nine permission characters mean: owner can read/write/execute, group can read/execute, others can only read. chmod changes permissions (numeric or symbolic):
chmod 755 deploy.sh # rwxr-xr-x - executable by all
chmod +x script.sh # add execute bit
chmod go-w config.yml # remove write from group and others
chown megha:staff file # change owner and group
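The numeric form is octal arithmetic, one digit per triplet: r=4, w=2, x=1. So 640 means rw- for the owner, r-- for the group, --- for others. A quick check (demo.txt is an illustrative name):

```shell
# Set 640 and read the permission string back from ls
touch demo.txt
chmod 640 demo.txt     # 6 = 4+2 (rw-), 4 = r--, 0 = ---
ls -l demo.txt         # first column shows -rw-r-----
```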
Process Management
ps aux # list all running processes
top # live process viewer (q to quit)
kill -TERM 1234 # ask process to terminate gracefully (TERM is the default)
kill -9 1234 # force-kill: SIGKILL cannot be caught or ignored (last resort)
# background jobs
long_command & # run in background
jobs # list background jobs
fg %1 # bring job 1 to foreground
bg %1 # resume stopped job in background
nohup ./server.sh & # keep running after terminal closes
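Two small tools for scripting around processes: $! holds the PID of the most recent background command, and kill -0 sends no signal at all - it merely tests whether the process exists. A sketch:

```shell
sleep 10 &              # start a throwaway background process
pid=$!                  # $! = PID of the last background command
kill -0 "$pid"          # signal 0: succeeds if the process is alive
kill -TERM "$pid"       # ask it to exit
wait "$pid" || true     # reap it; wait reports the signal as a nonzero status
kill -0 "$pid" 2>/dev/null || echo "process is gone"
```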
Environment Variables
echo $PATH # colon-separated list of directories searched for commands
export MY_VAR=hello # make variable available to child processes
printenv # list all environment variables
Put persistent exports in ~/.bashrc or ~/.zshrc. A clean $PATH matters - programs are found in the first matching directory, left to right.
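Because lookup is left to right, prepending a directory makes its commands shadow everything else. A sketch (~/bin is a conventional but arbitrary choice):

```shell
# Prepend ~/bin; export makes the change visible to child processes
export PATH="$HOME/bin:$PATH"
# command -v shows which file a name now resolves to
command -v ls
```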
Shell Scripting Basics
#!/usr/bin/env bash
# shebang tells the OS which interpreter to use
set -euo pipefail # exit on error, unset var, or pipe failure
NAME="world"
echo "Hello, $NAME"
for file in *.log; do
  echo "Processing $file"
  gzip "$file"
done

if [ -f config.yml ]; then
  echo "Config exists"
else
  echo "Config missing, exiting" >&2
  exit 1
fi
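Two constructs worth adding early: functions, and "$@" for iterating over a script's arguments with spaces preserved. A minimal sketch (greet is a hypothetical name):

```shell
#!/usr/bin/env bash
set -euo pipefail

greet() {
  local name="$1"       # local keeps the variable out of the global scope
  echo "Hello, $name"
}

# "$@" expands each argument as its own word, spaces intact
for arg in "$@"; do
  greet "$arg"
done
```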
Remote and File Transfer
ssh user@host.example.com # connect to remote host
ssh -L 5432:localhost:5432 user@host # tunnel: connections to local port 5432 go to port 5432 on the remote host
scp file.txt user@host:/tmp/ # copy file to remote
rsync -avz src/ user@host:/dest/ # sync directory (efficient, resumable)
find and xargs
find searches the filesystem; xargs converts stdin lines into command arguments:
# find all Python files modified in the last day
find . -name "*.py" -mtime -1
# delete all .pyc files
find . -name "*.pyc" -delete
# run a command on each found file (handles spaces in filenames)
find . -name "*.log" -print0 | xargs -0 gzip
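xargs can also substitute input into the middle of a command: -I names a placeholder, one input line per invocation (the file: prefix is just illustrative):

```shell
# Each input line replaces {} wherever it appears in the command
printf 'a\nb\nc\n' | xargs -I {} echo "file: {}"
```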
Examples
Log analysis one-liner - top 10 slowest requests:
# nginx log format: IP - - [date] "method url" status bytes response_time
awk '$NF > 1.0 {print $NF, $7}' access.log \
  | sort -rn \
  | head -n 10
Batch rename files - replace spaces with underscores:
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob        # without this, a non-matching glob stays literal and mv fails
for f in *\ *; do
  new="${f// /_}"        # parameter expansion: replace every space with _
  mv "$f" "$new"
  echo "Renamed: $f -> $new"
done
Count error types in a log file:
grep "ERROR" app.log \
| awk '{print $5}' \
| sort \
| uniq -c \
| sort -rn
The command line rewards investment. Once you can fluently compose awk, grep, sed, sort, and xargs, you can explore a new dataset or debug a production system without waiting for a specialised tool to be built. That leverage compounds over a career.
Read Next: