Linux Essentials
Refresher on essential Linux commands and concepts every DevOps engineer needs. File system, processes, networking, and shell scripting.
Why Linux for DevOps?
Almost every AWS service runs on Linux under the hood. EC2 instances, containers, Lambda runtimes - they're all Linux. This module is a practical refresher of the commands and concepts you use daily as a DevOps engineer.
Filesystem
Navigate, search, and manipulate files. Understand permissions, ownership, and the directory hierarchy.
Processes
Manage running processes, background jobs, signals, and resource monitoring.
Networking
Debug connectivity, inspect ports, configure firewall rules, and troubleshoot DNS.
Shell Scripting
Write automation scripts using variables, loops, conditionals, and functions.
systemd
Manage services, view logs, and set up applications to start on boot.
Users & Permissions
Manage users, groups, sudo access, and file permissions (chmod, chown).
Filesystem Essentials
Directory Hierarchy
/          Root of everything
├── /etc   Configuration files (nginx.conf, ssh/sshd_config)
├── /var   Variable data (logs in /var/log, app data)
├── /home  User home directories
├── /tmp   Temporary files (auto-cleaned)
├── /opt   Optional/third-party software
├── /usr   User programs and utilities
├── /proc  Virtual filesystem - live kernel/process info
└── /dev   Device files

Essential Commands
# Navigation & listing
ls -lah # List all files with human-readable sizes
cd /var/log # Change directory
pwd # Print working directory
# Finding files
find /var/log -name "*.log" -mtime -7 # Files modified in last 7 days
find / -type f -size +100M # Files larger than 100MB
locate nginx.conf # Fast search (uses db, run updatedb first)
# Viewing & searching content
cat /etc/os-release # View file contents
less /var/log/syslog # Paginated viewing (q to quit)
head -50 /var/log/syslog # First 50 lines
tail -f /var/log/syslog # Live follow - essential for debugging!
grep -r "error" /var/log/ # Recursive search in directory
grep -i "timeout" app.log | wc -l # Count matching lines
# File operations
cp -r source/ dest/ # Copy directory recursively
mv old.txt new.txt # Move/rename
rm -rf temp/ # Remove recursively (careful!)
ln -s /opt/app/current /app # Symbolic link

DevOps daily driver: tail -f is arguably the most-used command during debugging. Combine with grep: tail -f /var/log/app.log | grep ERROR
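The `-mtime` filter is easy to get backwards, so it is worth verifying on throwaway files. A minimal sketch, assuming GNU `touch` (for `-d`) and arbitrary paths under /tmp:

```shell
# Sketch: verify the find -mtime filter on backdated throwaway files.
mkdir -p /tmp/demo_logs
touch -d '10 days ago' /tmp/demo_logs/old.log   # backdate the mtime (GNU touch)
touch /tmp/demo_logs/new.log                    # mtime = now

# -mtime +7 matches files modified MORE than 7 days ago: only old.log
find /tmp/demo_logs -name '*.log' -mtime +7
```

Swap `-print` for `-delete` (or pipe to `xargs rm -f`) once the match list looks right.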
Permissions & Ownership
# Understanding permission strings
# -rwxr-xr-- 1 ec2-user deploy 4096 Mar 14 10:00 deploy.sh
#  └┬┘└┬┘└┬┘
#   │  │  └── Others: read only (r--)
#   │  └───── Group: read + execute (r-x)
#   └──────── Owner: read + write + execute (rwx)
# Numeric permissions
chmod 755 deploy.sh # rwxr-xr-x (owner: all, group/others: read+execute)
chmod 644 config.yml # rw-r--r-- (owner: read+write, group/others: read)
chmod 600 .env # rw------- (owner only - for secrets!)
chmod +x script.sh # Add execute permission
# Ownership
chown ec2-user:deploy app/ # Change owner and group
chown -R www-data:www-data /var/www # Recursive ownership change
# sudo - run as root
sudo systemctl restart nginx
sudo -u postgres psql # Run as specific user
sudo su - # Switch to root shell

Rule of thumb: Application code = 755, config files = 644, secrets/keys = 600. Never use 777 in production.
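You can audit for the 777 anti-pattern programmatically by testing the others-write bit of the numeric mode. A small sketch, assuming bash and GNU `stat`; `world_writable` is a hypothetical helper name, not a standard command:

```shell
# Hypothetical helper: succeed if "others" have write permission on a file.
world_writable() {
  local mode
  mode=$(stat -c '%a' "$1")   # numeric mode, e.g. 644 (GNU stat)
  (( (8#$mode & 2) != 0 ))    # bash octal arithmetic: test the o+w bit
}

touch /tmp/demo.conf
chmod 666 /tmp/demo.conf
world_writable /tmp/demo.conf && echo "fix me: /tmp/demo.conf"
```

For a whole tree, `find /path -type f -perm -o+w` does the same check without a helper.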
Process Management
# Viewing processes
ps aux # All running processes
ps aux | grep node # Find specific process
top # Live process monitor (q to quit)
htop # Better version of top (install: yum install htop)
# Process control
kill 12345 # Graceful stop (SIGTERM)
kill -9 12345 # Force kill (SIGKILL) - last resort
killall node # Kill all processes by name
nohup node server.js & # Run in background, survives logout
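The TERM-then-KILL escalation above is worth scripting so you never reach for `kill -9` first. A minimal sketch; `safe_kill` is a hypothetical helper, not a standard command:

```shell
# Hypothetical helper: try SIGTERM first, escalate to SIGKILL only if needed.
safe_kill() {
  local pid=$1
  kill "$pid" 2>/dev/null || return 0        # no such process: nothing to do
  for _ in 1 2 3 4 5; do                     # give it up to ~5s to exit cleanly
    kill -0 "$pid" 2>/dev/null || return 0   # kill -0 = "is it still alive?"
    sleep 1
  done
  kill -9 "$pid" 2>/dev/null                 # last resort
}

sleep 300 &        # demo victim
safe_kill "$!"
```

The same pattern is what systemd does on `systemctl stop`: SIGTERM, a grace period, then SIGKILL.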
# Background jobs
command & # Run in background
jobs # List background jobs
fg %1 # Bring job 1 to foreground
Ctrl+Z # Suspend current process
bg # Resume suspended process in background
# Resource monitoring
free -h # Memory usage
df -h # Disk usage per mount
du -sh /var/log/* # Size of each item in directory
uptime # System uptime and load averages
lsof -i :3000 # What process is using port 3000?

Demo: Track Down a Memory-Hungry Process
A common DevOps scenario - find what's eating all the memory:
# Sort processes by memory usage (top 10)
ps aux --sort=-%mem | head -10
# Or use top in batch mode
top -b -n1 | head -20
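As a sketch of what `ps` and `top` compute, you can read VmRSS straight out of /proc; `biggest_rss` below is a hypothetical helper, not a standard tool:

```shell
# Hypothetical helper: scan /proc and print "PID RSS_kB" for the process with
# the largest resident set (VmRSS) - the same data ps/top read under the hood.
biggest_rss() {
  local best_pid=0 best_rss=0 pid rss f
  for f in /proc/[0-9]*/status; do
    pid=${f#/proc/}; pid=${pid%/status}
    rss=$(awk '/^VmRSS:/ {print $2}' "$f" 2>/dev/null)
    [ -n "$rss" ] || continue              # kernel threads have no VmRSS
    if [ "$rss" -gt "$best_rss" ]; then
      best_rss=$rss
      best_pid=$pid
    fi
  done
  echo "$best_pid $best_rss"
}

biggest_rss    # prints the worst offender's PID and RSS in kB
```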
# Check if OOM killer has struck
dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"

Networking
# Network interfaces & IPs
ip addr show # Modern way (replaces ifconfig)
ip route show # Routing table
hostname -I # Quick: show all IPs
# Connectivity testing
ping -c 4 google.com # ICMP ping (4 packets)
traceroute google.com # Trace packet route
curl -I https://example.com # HTTP headers only
curl -v https://example.com # Verbose โ shows TLS handshake
wget -O /tmp/file.zip URL # Download file
# Port & connection inspection
ss -tlnp # List listening TCP ports (replaces netstat)
ss -tlnp | grep 3000 # Is anything listening on port 3000?
lsof -i :80 # What process owns port 80?
nc -zv hostname 5432 # Test if PostgreSQL port is open
# DNS
dig example.com # DNS lookup (detailed)
nslookup example.com # Simple DNS lookup
cat /etc/resolv.conf # DNS resolver config
# Firewall (Amazon Linux / CentOS)
sudo iptables -L -n # List firewall rules
sudo firewall-cmd --list-all # firewalld (if installed)

EC2 debugging essentials: When an EC2 instance can't reach RDS, check in this order: (1) Security Group rules, (2) nc -zv rds-endpoint 5432, (3) Route table, (4) NACL rules.
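On minimal hosts without nc, the same TCP check can be done with bash's built-in /dev/tcp pseudo-device. A sketch; `port_open` is a hypothetical wrapper, not a standard command:

```shell
# Hypothetical wrapper: succeed if a TCP connect to host:port works within 3s.
# Uses bash's /dev/tcp pseudo-device - no nc required.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

port_open example.com 443 && echo "reachable" || echo "blocked or down"
```

/dev/tcp is a bash feature, not a real device file, so this only works when the shell is bash.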
Demo: Debug a Connection Issue
# Scenario: App can't connect to RDS on port 5432
# 1. Can we resolve the hostname?
dig aws-sandbox-db.xxxxx.us-east-1.rds.amazonaws.com
# 2. Can we reach the port?
nc -zv aws-sandbox-db.xxxxx.us-east-1.rds.amazonaws.com 5432
# Success: Connection to ... port 5432 [tcp/postgresql] succeeded!
# Failure: nc: connect to ... port 5432 (tcp) failed: Connection timed out
# 3. Check local listening ports
ss -tlnp | grep 5432
# 4. Check Security Group (from AWS CLI)
aws ec2 describe-security-groups --group-ids sg-xxxxx \
--query 'SecurityGroups[*].IpPermissions'

systemd & Service Management
systemd manages services (daemons) on modern Linux. It's how you start, stop, enable, and monitor services like nginx, PostgreSQL, or your app.
# Service control
sudo systemctl start nginx # Start service
sudo systemctl stop nginx # Stop service
sudo systemctl restart nginx # Restart
sudo systemctl reload nginx # Reload config (no downtime)
sudo systemctl status nginx # Check status + recent logs
# Boot behavior
sudo systemctl enable nginx # Start on boot
sudo systemctl disable nginx # Don't start on boot
systemctl is-enabled nginx # Check if enabled
# Viewing logs (journalctl)
journalctl -u nginx # All logs for nginx service
journalctl -u nginx --since "1 hour ago" # Recent logs
journalctl -u nginx -f # Live follow (like tail -f)
journalctl -p err # Only errors across all services

Creating a Custom Service
Run your Node.js app as a systemd service (alternative to pm2). Save the unit file as /etc/systemd/system/aws-sandbox.service:
[Unit]
Description=AWS Sandbox App
After=network.target
[Service]
Type=simple
User=ec2-user
WorkingDirectory=/home/ec2-user/aws-sandbox-app
ExecStart=/usr/bin/node server.js
Restart=on-failure
RestartSec=10
Environment=NODE_ENV=production
EnvironmentFile=/home/ec2-user/.env
[Install]
WantedBy=multi-user.target

# Register and start
sudo systemctl daemon-reload
sudo systemctl enable aws-sandbox
sudo systemctl start aws-sandbox
# Check it's running
sudo systemctl status aws-sandbox
curl http://localhost:3000/health

Shell Scripting Refresher
#!/bin/bash
# Script: Check deployment health across multiple instances
set -euo pipefail # Exit on error, undefined vars, pipe failures
# --- Variables ---
APP_NAME="aws-sandbox"
HEALTH_ENDPOINT="/health"
TIMEOUT=5
MAX_RETRIES=3
# --- Colors for output ---
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No color
# --- Functions ---
check_health() {
local host=$1
local url="http://${host}:3000${HEALTH_ENDPOINT}"
for i in $(seq 1 $MAX_RETRIES); do
status=$(curl -s -o /dev/null -w "%{http_code}" --max-time $TIMEOUT "$url" 2>/dev/null || echo "000")
if [ "$status" = "200" ]; then
echo -e " ${GREEN}✓ $host - healthy${NC}"
return 0
fi
echo " Retry $i/$MAX_RETRIES for $host (got: $status)"
sleep 2
done
echo -e " ${RED}✗ $host - UNHEALTHY${NC}"
return 1
}
# --- Main ---
echo "Checking $APP_NAME deployment health..."
echo "========================================="
INSTANCES=("10.0.10.5" "10.0.11.8" "10.0.10.12")
FAILED=0
for instance in "${INSTANCES[@]}"; do
if ! check_health "$instance"; then
FAILED=$((FAILED + 1))
fi
done
echo "========================================="
if [ $FAILED -gt 0 ]; then
echo -e "${RED}$FAILED instance(s) unhealthy!${NC}"
exit 1
else
echo -e "${GREEN}All instances healthy!${NC}"
fi

Scripting Cheat Sheet
| Pattern | Syntax | Example |
|---|---|---|
| Variable | VAR="value" | echo $VAR |
| If/else | if [ cond ]; then ... fi | if [ -f file ]; then ... |
| For loop | for i in list; do ... done | for f in *.log; do ... |
| While loop | while [ cond ]; do ... done | while read line; do ... |
| Function | func() { ... } | deploy() { npm install; } |
| Command sub | $(command) | DATE=$(date +%Y%m%d) |
| Exit on error | set -e | Always use in scripts |
| Pipe | cmd1 \| cmd2 | cat log \| grep ERR \| wc -l |
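The while-loop row in the table deserves a concrete shape, since reading a file line by line is how most fleet scripts start. A sketch with an arbitrary file path:

```shell
# Iterate over a file line by line - the canonical `while read` shape.
# IFS= and -r preserve leading whitespace and backslashes in each line.
printf '%s\n' web-1 web-2 db-1 > /tmp/hosts.txt

while IFS= read -r host; do
  echo "would check: $host"
done < /tmp/hosts.txt
```

Feeding the loop via `< file` (rather than `cat file | while ...`) keeps variables set inside the loop visible afterwards.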
Text Processing (awk, sed, jq)
DevOps engineers transform and extract data constantly. These are your power tools:
# awk - column extraction and processing
ps aux | awk '{print $1, $2, $11}' # Print user, PID, command
df -h | awk 'NR>1 {print $5, $6}' # Disk usage % and mount
awk -F: '{print $1}' /etc/passwd # List all usernames
# sed - find and replace in files
sed -i 's/old-value/new-value/g' config.yml # Replace in-place
sed -n '10,20p' file.txt # Print lines 10-20
sed '/^#/d' config.yml # Remove comment lines
# jq - parse JSON (essential for AWS CLI output)
aws ec2 describe-instances | jq '.Reservations[].Instances[] | {
id: .InstanceId,
state: .State.Name,
ip: .PrivateIpAddress
}'
# xargs - pass stdin as arguments
find /var/log -name "*.log" -mtime +30 | xargs rm -f # Delete old logs
cat servers.txt | xargs -I {} ssh {} "uptime" # Run on each server

jq is indispensable for working with AWS CLI output. Install it on Amazon Linux: sudo yum install jq
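The `df`/awk one-liner above can be rehearsed on captured output rather than a live system, which makes the field numbering obvious. The sample lines below are made up:

```shell
# Replay the df/awk example on captured (made-up) output: for every line after
# the header (NR>1), print field 5 (use%) and field 6 (mount point).
df_sample='Filesystem Size Used Avail Use% Mounted
/dev/xvda1 8G 3.1G 4.9G 39% /
tmpfs 3.9G 0 3.9G 0% /dev/shm'

echo "$df_sample" | awk 'NR>1 {print $5, $6}'
```

The same replay trick works for any awk recipe: save one real sample of the command's output and iterate on the awk program against it.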
Quick Reference Card
| Task | Command |
|---|---|
| Check disk space | df -h |
| Check memory | free -h |
| Follow logs live | tail -f /var/log/app.log |
| Find large files | find / -size +100M -type f |
| Who's using port 80? | lsof -i :80 |
| Test TCP connection | nc -zv host port |
| Service status | systemctl status service |
| View recent errors | journalctl -p err --since "1h ago" |
| Kill by name | killall process-name |
| Create user | sudo useradd -m -s /bin/bash user |
| Parse JSON output | command \| jq '.key' |
| Search in files | grep -r "pattern" /path/ |
Key Takeaways
- tail -f, grep, and jq are your daily debugging trio
- systemctl + journalctl replace old-style init/service/syslog
- Always use set -euo pipefail at the top of bash scripts
- Permission model: 600 for secrets, 644 for configs, 755 for scripts
- ss -tlnp is the modern replacement for netstat -tlnp
- Know awk, sed, and jq - they save hours of manual work