Module 8

Linux Essentials

A refresher on the essential Linux commands and concepts every DevOps engineer needs: the filesystem, processes, networking, and shell scripting.

Bash, Filesystem, Processes, Networking, systemd

Why Linux for DevOps?

Almost every AWS service runs on Linux under the hood. Most EC2 instances, containers, and Lambda runtimes are Linux-based. This module is a practical refresher of the commands and concepts you use daily as a DevOps engineer.

📂 Filesystem

Navigate, search, and manipulate files. Understand permissions, ownership, and the directory hierarchy.

⚙️ Processes

Manage running processes, background jobs, signals, and resource monitoring.

🌐 Networking

Debug connectivity, inspect ports, configure firewall rules, and troubleshoot DNS.

📝 Shell Scripting

Write automation scripts using variables, loops, conditionals, and functions.

🔧 systemd

Manage services, view logs, and set up applications to start on boot.

🔐 Users & Permissions

Manage users, groups, sudo access, and file permissions (chmod, chown).


Filesystem Essentials

Directory Hierarchy

text
/             Root of everything
├── /etc      Configuration files (nginx.conf, ssh/sshd_config)
├── /var      Variable data (logs in /var/log, app data)
├── /home     User home directories
├── /tmp      Temporary files (auto-cleaned)
├── /opt      Optional/third-party software
├── /usr      User programs and utilities
├── /proc     Virtual filesystem: live kernel/process info
└── /dev      Device files

Essential Commands

bash
# Navigation & listing
ls -lah              # List all files with human-readable sizes
cd /var/log          # Change directory
pwd                  # Print working directory

# Finding files
find /var/log -name "*.log" -mtime -7     # Files modified in last 7 days
find / -type f -size +100M                # Files larger than 100MB
locate nginx.conf                         # Fast search (uses db, run updatedb first)

# Viewing & searching content
cat /etc/os-release           # View file contents
less /var/log/syslog          # Paginated viewing (q to quit)
head -50 /var/log/syslog      # First 50 lines
tail -f /var/log/syslog       # Live follow; essential for debugging!
grep -r "error" /var/log/     # Recursive search in directory
grep -i "timeout" app.log | wc -l  # Count matching lines

# File operations
cp -r source/ dest/           # Copy directory recursively
mv old.txt new.txt            # Move/rename
rm -rf temp/                  # Remove recursively (careful!)
ln -s /opt/app/current /app   # Symbolic link
💡 Tip

DevOps daily driver: tail -f is arguably the most-used command during debugging. Combine with grep: tail -f /var/log/app.log | grep ERROR
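
When the filtered stream feeds yet another command, grep's output buffering can hold lines back. The sketch below assumes a grep that accepts `--line-buffered` (GNU and BSD grep both do); the function name `number_errors` is hypothetical and only illustrates the pattern.

```bash
# Hypothetical helper: read log lines on stdin, keep only ERROR lines,
# and prefix each with a running count. --line-buffered makes grep flush
# after every line so matches show up immediately in a live pipeline.
number_errors() {
  grep --line-buffered 'ERROR' | awk '{ printf "%d: %s\n", NR, $0; fflush() }'
}

# Usage (the log path is illustrative):
#   tail -f /var/log/app.log | number_errors
```

Reading from stdin keeps the helper usable with both `tail -f` and a finished log file (`number_errors < app.log`).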


Permissions & Ownership

bash
# Understanding permission strings
# -rwxr-xr-- 1 ec2-user deploy 4096 Mar 14 10:00 deploy.sh
#  └┬┘└┬┘└┬┘
#   │  │  └── Others: read only (r--)
#   │  └───── Group:  read + execute (r-x)
#   └──────── Owner:  read + write + execute (rwx)

# Numeric permissions
chmod 755 deploy.sh    # rwxr-xr-x  (owner: all, group/others: read+execute)
chmod 644 config.yml   # rw-r--r--  (owner: read+write, group/others: read)
chmod 600 .env         # rw-------  (owner only; for secrets!)
chmod +x script.sh     # Add execute permission

# Ownership
chown ec2-user:deploy app/          # Change owner and group
chown -R www-data:www-data /var/www # Recursive ownership change

# sudo: run as root
sudo systemctl restart nginx
sudo -u postgres psql               # Run as specific user
sudo su -                            # Switch to root shell
📘 Key Concept

Rule of thumb: Application code = 755, config files = 644, secrets/keys = 600. Never use 777 in production.
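
To see the rule of thumb in action, the following sketch applies the three modes in a throwaway directory and reads them back. It assumes GNU coreutils (`stat -c`) and GNU findutils, as found on Amazon Linux and most distributions; the file names are placeholders.

```bash
# Work in a temporary directory so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/deploy.sh" "$demo_dir/config.yml" "$demo_dir/.env"

chmod 755 "$demo_dir/deploy.sh"    # executable code
chmod 644 "$demo_dir/config.yml"   # readable config
chmod 600 "$demo_dir/.env"         # secrets: owner only

# Print the octal mode and name of each file.
stat -c '%a %n' "$demo_dir/deploy.sh" "$demo_dir/config.yml" "$demo_dir/.env"

# Audit: list any world-writable files (prints nothing for this demo).
find "$demo_dir" -type f -perm -0002
```

The same `find ... -perm -0002` audit can be pointed at an application directory to spot files that drifted toward 777. Remove the demo directory with `rm -rf "$demo_dir"` when done.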


Process Management

bash
# Viewing processes
ps aux                     # All running processes
ps aux | grep node         # Find specific process
top                        # Live process monitor (q to quit)
htop                       # Better version of top (install: yum install htop)

# Process control
kill 12345                 # Graceful stop (SIGTERM)
kill -9 12345              # Force kill (SIGKILL); last resort
killall node               # Kill all processes by name
nohup node server.js &     # Run in background, survives logout

# Background jobs
command &                  # Run in background
jobs                       # List background jobs
fg %1                      # Bring job 1 to foreground
# Press Ctrl+Z             # Suspend the foreground process (a keystroke, not a command)
bg                         # Resume suspended process in background

# Resource monitoring
free -h                    # Memory usage
df -h                      # Disk usage per mount
du -sh /var/log/*          # Size of each item in directory
uptime                     # System uptime and load averages
lsof -i :3000             # What process is using port 3000?
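
The difference between `kill` and `kill -9` matters because a process can catch SIGTERM and clean up, while SIGKILL cannot be caught. A minimal sketch of a script that handles SIGTERM follows; the `worker` function and its message are made up for illustration.

```bash
# Hypothetical long-running worker that shuts down cleanly on SIGTERM.
worker() {
  # Run the cleanup and exit when `kill <pid>` (SIGTERM) arrives.
  trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
  while true; do
    sleep 1   # stand-in for real work
  done
}

# Usage:
#   worker &        # start in the background
#   kill $!         # graceful stop: the trap runs
#   kill -9 <pid>   # SIGKILL cannot be trapped, so no cleanup happens
```

Bash runs the trap once the current foreground command (here `sleep 1`) returns, so shutdown can lag by up to one loop iteration.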
🧪

Demo: Track Down a Memory-Hungry Process

A common DevOps scenario is finding out what is eating all the memory:

bash
# Sort processes by memory usage (header row + top 10)
ps aux --sort=-%mem | head -11

# Or use top in batch mode
top -b -n1 | head -20

# Check if OOM killer has struck
dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"
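
For repeated use, the memory check can be wrapped in a small function. This is a sketch that assumes the procps-ng `ps` shipped with mainstream distributions (it relies on `--sort`); the name `top_mem` is hypothetical.

```bash
# Hypothetical helper: show the N processes using the most memory.
top_mem() {
  local count=${1:-5}
  # count + 1 keeps the header row in addition to N process rows.
  ps -eo pid,user,%mem,rss,comm --sort=-%mem | head -n "$((count + 1))"
}

top_mem 5
```

The `rss` column is resident memory in KiB, which is often easier to compare than the percentage alone.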

Networking

bash
# Network interfaces & IPs
ip addr show               # Modern way (replaces ifconfig)
ip route show              # Routing table
hostname -I                # Quick: show all IPs

# Connectivity testing
ping -c 4 google.com       # ICMP ping (4 packets)
traceroute google.com      # Trace packet route
curl -I https://example.com  # HTTP headers only
curl -v https://example.com  # Verbose; shows TLS handshake
wget -O /tmp/file.zip URL    # Download file

# Port & connection inspection
ss -tlnp                   # List listening TCP ports (replaces netstat)
ss -tlnp | grep 3000       # Is anything listening on port 3000?
lsof -i :80               # What process owns port 80?
nc -zv hostname 5432       # Test if PostgreSQL port is open

# DNS
dig example.com            # DNS lookup (detailed)
nslookup example.com       # Simple DNS lookup
cat /etc/resolv.conf       # DNS resolver config

# Firewall (Amazon Linux / CentOS)
sudo iptables -L -n        # List firewall rules
sudo firewall-cmd --list-all  # firewalld (if installed)
💡 Tip

EC2 debugging essentials: When an EC2 instance can't reach RDS, check in this order: (1) Security Group rules, (2) nc -zv rds-endpoint 5432, (3) Route table, (4) NACL rules.

🧪

Demo: Debug a Connection Issue

bash
# Scenario: App can't connect to RDS on port 5432

# 1. Can we resolve the hostname?
dig aws-sandbox-db.xxxxx.us-east-1.rds.amazonaws.com

# 2. Can we reach the port?
nc -zv aws-sandbox-db.xxxxx.us-east-1.rds.amazonaws.com 5432
# Success: Connection to ... port 5432 [tcp/postgresql] succeeded!
# Failure: nc: connect to ... port 5432 (tcp) failed: Connection timed out

# 3. Check local listening ports
ss -tlnp | grep 5432

# 4. Check Security Group (from AWS CLI)
aws ec2 describe-security-groups --group-ids sg-xxxxx \
  --query 'SecurityGroups[*].IpPermissions'
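
On minimal images `nc` may not be installed. As a fallback sketch, the port test in step 2 can be done with bash's built-in `/dev/tcp` redirection, bounded by coreutils `timeout`. The function name `check_port` and its default of 3 seconds are assumptions for illustration.

```bash
# Hypothetical fallback for hosts without nc: test a TCP port using
# bash's /dev/tcp redirection, giving up after a few seconds.
check_port() {
  local host=$1 port=$2 secs=${3:-3}
  # host and port are passed as positional arguments so they are never
  # re-parsed as shell code inside the child bash.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "unreachable: $host:$port (refused, filtered, or timed out)"
    return 1
  fi
}

# Usage: check_port <rds-endpoint> 5432
```

As a rough guide, an immediate refusal usually means nothing is listening on that port, while a timeout more often points to a Security Group, NACL, or routing problem.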

systemd & Service Management

systemd manages services (daemons) on modern Linux. It's how you start, stop, enable, and monitor services like nginx, PostgreSQL, or your app.

bash
# Service control
sudo systemctl start nginx      # Start service
sudo systemctl stop nginx       # Stop service
sudo systemctl restart nginx    # Restart
sudo systemctl reload nginx     # Reload config (no downtime)
sudo systemctl status nginx     # Check status + recent logs

# Boot behavior
sudo systemctl enable nginx     # Start on boot
sudo systemctl disable nginx    # Don't start on boot
systemctl is-enabled nginx      # Check if enabled

# Viewing logs (journalctl)
journalctl -u nginx             # All logs for nginx service
journalctl -u nginx --since "1 hour ago"  # Recent logs
journalctl -u nginx -f          # Live follow (like tail -f)
journalctl -p err               # Only errors across all services

Creating a Custom Service

Run your Node.js app as a systemd service (alternative to pm2):

ini (/etc/systemd/system/aws-sandbox.service)
[Unit]
Description=AWS Sandbox App
After=network.target

[Service]
Type=simple
User=ec2-user
WorkingDirectory=/home/ec2-user/aws-sandbox-app
ExecStart=/usr/bin/node server.js
Restart=on-failure
RestartSec=10
Environment=NODE_ENV=production
EnvironmentFile=/home/ec2-user/.env

[Install]
WantedBy=multi-user.target
bash
# Register and start
sudo systemctl daemon-reload
sudo systemctl enable aws-sandbox
sudo systemctl start aws-sandbox

# Check it's running
sudo systemctl status aws-sandbox
curl http://localhost:3000/health
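
If several apps need near-identical unit files, the file can be generated instead of hand-edited. The sketch below is one possible approach, not part of the original setup: `render_unit` is a hypothetical function that prints a unit file with the same directives as the example above, minus `EnvironmentFile`.

```bash
# Hypothetical generator: print a unit file for a Node.js app.
# Arguments: service description, run-as user, app directory.
render_unit() {
  local description=$1 user=$2 workdir=$3
  cat <<EOF
[Unit]
Description=${description}
After=network.target

[Service]
Type=simple
User=${user}
WorkingDirectory=${workdir}
ExecStart=/usr/bin/node server.js
Restart=on-failure
RestartSec=10
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
EOF
}

render_unit "AWS Sandbox App" ec2-user /home/ec2-user/aws-sandbox-app
```

To install the result, pipe it through `sudo tee /etc/systemd/system/aws-sandbox.service` and then run `sudo systemctl daemon-reload`.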

Shell Scripting Refresher

bash (deploy-check.sh)
#!/bin/bash
# Script: Check deployment health across multiple instances
set -euo pipefail   # Exit on error, undefined vars, pipe failures

# --- Variables ---
APP_NAME="aws-sandbox"
HEALTH_ENDPOINT="/health"
TIMEOUT=5
MAX_RETRIES=3

# --- Colors for output ---
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m'  # No color

# --- Functions ---
check_health() {
  local host=$1
  local url="http://${host}:3000${HEALTH_ENDPOINT}"

  local i status
  for i in $(seq 1 "$MAX_RETRIES"); do
    # On a failed connection curl already prints 000 via -w, so only the
    # non-zero exit needs swallowing (appending "000" would yield "000000").
    status=$(curl -s -o /dev/null -w "%{http_code}" --max-time "$TIMEOUT" "$url" 2>/dev/null || true)

    if [ "$status" = "200" ]; then
      echo -e "  ${GREEN}✓ $host: healthy${NC}"
      return 0
    fi
    echo "  Retry $i/$MAX_RETRIES for $host (got: $status)"
    sleep 2
  done

  echo -e "  ${RED}✗ $host: UNHEALTHY${NC}"
  return 1
}

# --- Main ---
echo "Checking $APP_NAME deployment health..."
echo "========================================="

INSTANCES=("10.0.10.5" "10.0.11.8" "10.0.10.12")
FAILED=0

for instance in "${INSTANCES[@]}"; do
  if ! check_health "$instance"; then
    FAILED=$((FAILED + 1))
  fi
done

echo "========================================="
if [ "$FAILED" -gt 0 ]; then
  echo -e "${RED}$FAILED instance(s) unhealthy!${NC}"
  exit 1
else
  echo -e "${GREEN}All instances healthy!${NC}"
fi

Scripting Cheat Sheet

Pattern        Syntax                          Example
Variable       VAR="value"                     echo $VAR
If/else        if [ cond ]; then ... fi        if [ -f file ]; then ...
For loop       for i in list; do ... done      for f in *.log; do ...
While loop     while [ cond ]; do ... done     while read line; do ...
Function       func() { ... }                  deploy() { npm install; }
Command sub    $(command)                      DATE=$(date +%Y%m%d)
Exit on error  set -e                          Always use in scripts
Pipe           cmd1 | cmd2                     cat log | grep ERR | wc -l
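
Most of the patterns in the cheat sheet fit into a few small functions. The sketch below combines them; all three function names are invented for this example.

```bash
# Function + command substitution: build a dated backup name.
backup_name() {
  local file=$1
  echo "${file}.$(date +%Y%m%d).bak"
}

# Variable + for loop + if: count the regular files among the arguments.
count_files() {
  local n=0 f
  for f in "$@"; do
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# While loop + pipe: number each non-empty line read from stdin.
number_lines() {
  local i=0 line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    i=$((i + 1))
    echo "$i: $line"
  done
}

# Usage:
#   backup_name nginx.conf
#   count_files /etc/hosts /no/such/file
#   printf 'a\n\nb\n' | number_lines
```

Using `IFS= read -r` keeps leading whitespace and backslashes intact, which the plain `while read line` form in the table does not.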

Text Processing (awk, sed, jq)

DevOps engineers transform and extract data constantly. These are your power tools:

bash
# awk: column extraction and processing
ps aux | awk '{print $1, $2, $11}'         # Print user, PID, command
df -h | awk 'NR>1 {print $5, $6}'         # Disk usage % and mount
awk -F: '{print $1}' /etc/passwd           # List all usernames

# sed: find and replace in files
sed -i 's/old-value/new-value/g' config.yml   # Replace in-place
sed -n '10,20p' file.txt                      # Print lines 10-20
sed '/^#/d' config.yml                        # Print without comment lines (add -i to edit the file)

# jq: parse JSON (essential for AWS CLI output)
aws ec2 describe-instances | jq '.Reservations[].Instances[] | {
  id: .InstanceId,
  state: .State.Name,
  ip: .PrivateIpAddress
}'

# xargs: pass stdin as arguments
find /var/log -name "*.log" -mtime +30 -print0 | xargs -0 rm -f   # Delete old logs (null-delimited, safe for odd filenames)
cat servers.txt | xargs -I {} ssh {} "uptime"           # Run on each server
💡 Tip

jq is indispensable for working with AWS CLI output. Install it on Amazon Linux: sudo yum install jq
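
The awk column-extraction pattern shown above extends naturally to simple alerting. As a sketch, the hypothetical `disk_alert` function below reads `df -P` output on stdin and prints the mounts at or above a usage threshold. It assumes mount points contain no spaces.

```bash
# Hypothetical helper: read `df -P`-style output on stdin and print mounts
# whose usage is at or above a threshold percentage (default 80).
disk_alert() {
  local threshold=${1:-80}
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                       # "91%" -> "91"
    if (use + 0 >= limit + 0) print $6, $5  # mount point, usage
  }'
}

# Usage: df -P | disk_alert 90
```

`df -P` is used rather than `df -h` because the POSIX format keeps each filesystem on a single line, which makes column positions predictable.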


Quick Reference Card

Task                  Command
Check disk space      df -h
Check memory          free -h
Follow logs live      tail -f /var/log/app.log
Find large files      find / -size +100M -type f
Who's using port 80?  lsof -i :80
Test TCP connection   nc -zv host port
Service status        systemctl status service
View recent errors    journalctl -p err --since "1h ago"
Kill by name          killall process-name
Create user           sudo useradd -m -s /bin/bash user
Parse JSON output     command | jq '.key'
Search in files       grep -r "pattern" /path/

Key Takeaways

  • tail -f, grep, and jq are your daily debugging trio
  • systemctl + journalctl replace old-style init/service/syslog
  • Always use set -euo pipefail at the top of bash scripts
  • Permission model: 600 for secrets, 644 for configs, 755 for scripts
  • ss -tlnp is the modern replacement for netstat -tlnp
  • Know awk, sed, and jq: they save hours of manual work