Chapter 20: Bash Pattern Scan (awk)

What is awk? (super simple first)

awk is named after its creators (Aho, Weinberger, Kernighan). It's a text-processing language that ships with virtually every Linux/Unix system, for scanning patterns and performing actions on structured text (like CSV, logs, tables).

Think of it as:

  • grep on steroids (finds patterns + does calculations/changes)
  • A mini programming language that reads line-by-line, splits into fields, and lets you print, calculate, filter, summarize

Most people use awk to:

  • Extract columns from files
  • Sum/average/count numbers
  • Process logs (find errors, count IPs)
  • Format reports
  • Replace text more flexibly than sed
  • One-liners that save hours in scripts

Basic Structure (memorize this!)

awk 'pattern { action }' file
  • pattern → when to run the action (optional)
  • action → what to do (in { } curly braces)
  • If no pattern → action runs on every line
  • If no action → default is { print $0 } (print whole line)
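A minimal sketch of those three forms in action (using a throwaway two-line file):

```shell
printf 'apple 3\nbanana 7\n' > fruit.txt

awk '/apple/' fruit.txt              # pattern only -> default action prints the matching line
awk '{print $2}' fruit.txt           # action only -> runs on every line
awk '/banana/ {print $2}' fruit.txt  # pattern + action -> prints 7
```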

Fields: awk automatically splits each line into $1, $2, $3… using spaces/tabs as separator (default)

  • $0 = whole line
  • $1 = first field
  • $NF = last field
  • NF = number of fields in line
  • NR = current record/line number
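A quick one-liner that exercises all of these variables at once:

```shell
printf 'one two three\nfour five\n' > demo.txt

# Print the line number, field count, and first/last fields of each line
awk '{print NR": fields="NF", first="$1", last="$NF}' demo.txt
# 1: fields=3, first=one, last=three
# 2: fields=2, first=four, last=five
```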

1. Create a test file right now (copy-paste)


(Comma-separated – common in real life)
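The original sample isn't shown here, so this is a plausible stand-in: a small CSV with columns id, name, department, role, salary (that layout is an assumption, chosen to match the later examples that filter on column 5 and search for "Developer"):

```shell
# Hypothetical test data; columns: id,name,department,role,salary
cat > employees.txt <<'EOF'
1,Alice,Engineering,Developer,120000
2,Bob,Sales,Manager,95000
3,Carol,Engineering,Developer,110000
4,Dave,HR,Analyst,70000
5,Eve,Engineering,Lead,130000
EOF
```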

2. Super basic examples (try these now!)


-F ',' = change the Field Separator to a comma (very important!)
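For example, with a tiny inline CSV (a stand-in, since the chapter's test file contents aren't reproduced here):

```shell
printf 'Alice,Developer,120000\nBob,Manager,95000\n' > sample.csv

awk -F ',' '{print $1}' sample.csv      # first column only
awk -F ',' '{print $1, $3}' sample.csv  # name and salary, space-separated
```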

3. Most useful built-in variables

Variable   Meaning                        Example use
$0         Whole line                     print $0
$1..$n     Individual fields              print $1, $3
$NF        Last field                     print $NF (last column)
NF         Number of fields in line       if (NF > 5) print "Too many fields"
NR         Line number (record number)    print NR ": " $0
FNR        Line number in current file    Useful when processing multiple files
FS         Input field separator          -F ':' or BEGIN {FS=","}
OFS        Output field separator         BEGIN {OFS="|"}
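One OFS gotcha worth a demo: awk only rejoins fields with OFS when the record is rebuilt, so the no-op assignment $1=$1 is the usual trick to force it:

```shell
printf 'a,b,c\n' |
  awk -F ',' 'BEGIN {OFS="|"} {$1=$1; print}'
# a|b|c  -- without $1=$1 the line would print unchanged
```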

4. Patterns (when to run action)

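The common pattern types, sketched against a small stand-in CSV (regex match, field comparison, line-number test, and a combined condition):

```shell
printf 'Alice,Developer,120000\nBob,Manager,95000\nCarol,Developer,110000\n' > sample.csv

awk -F ',' '/Developer/' sample.csv                 # regex pattern
awk -F ',' '$3 > 100000' sample.csv                 # comparison on a field
awk 'NR == 2' sample.csv                            # line-number pattern
awk -F ',' '/Developer/ && $3 > 115000' sample.csv  # combined with &&
```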

5. Calculations & Summaries (where awk shines)


BEGIN {} → runs before the first line
END {} → runs after the last line (perfect for totals)
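A minimal sum-and-average sketch showing both blocks:

```shell
printf '10\n20\n30\n' |
  awk 'BEGIN {print "values:"}      # header, printed before any input
       {sum += $1; n++}             # runs per line: accumulate
       END {print "total", sum, "avg", sum/n}'   # footer with the results
# values:
# total 60 avg 20
```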

6. Real-world one-liners you will use daily

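Two typical log one-liners, run against a hypothetical access log (real log formats vary; the field layout here, IP first and status last, is an assumption):

```shell
# Fake access-log lines: IP, method, path, status
cat > access.log <<'EOF'
10.0.0.1 GET /index 200
10.0.0.2 GET /missing 404
10.0.0.1 GET /about 200
EOF

# Count requests per IP, busiest first
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log | sort -rn

# Show only error responses (status in the last field)
awk '$NF >= 400' access.log
```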

7. Quick cheat-sheet table

Goal                          Command example (try it!)
Print column 2 (comma sep)    awk -F ',' '{print $2}' file.csv
Print columns 1 & 3           awk -F ',' '{print $1, $3}' file.csv
Sum column 5                  awk -F ',' '{sum+=$5} END {print sum}' file.csv
Average column 5              awk -F ',' '{s+=$5;n++} END {print s/n}' file.csv
Filter salary > 100000        awk -F ',' '$5 > 100000' file.csv
Count matching lines          awk -F ',' '/Developer/ {c++} END {print c}' file.csv
Change separator to pipe      awk -F ',' 'BEGIN {OFS="|"} {$1=$1; print}' file.csv
Process multiple files        awk '{print FILENAME ":" $0}' file1.txt file2.txt
Only lines with 5 fields      awk -F ',' 'NF == 5' data.txt
Top 5 most frequent words     awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(w in count) print count[w], w}' file.txt | sort -rn | head -5

8. Pro tips from daily use

  • Always use -F for CSV/colon/tab files
  • Quote the program '…' to avoid the shell interpreting $
  • Use BEGIN for headers, END for footers/totals
  • awk is fast even on GB files
  • Combine with sort | uniq -c | sort -rn for top-N reports
  • For very complex logic → write full .awk script:

Save as sum.awk, chmod +x sum.awk, then run ./sum.awk -F ',' employees.txt
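The script itself isn't reproduced here; a minimal sketch consistent with that usage (summing the salary column, which is assumed to be field 5) would be:

```shell
cat > sum.awk <<'EOF'
#!/usr/bin/awk -f
# Sum field 5 (salary) of every line; print the total at the end
{ sum += $5 }
END { print "Total:", sum }
EOF
chmod +x sum.awk
```

The shebang `#!/usr/bin/awk -f` is what lets the script run directly while still accepting options like -F ',' on the command line.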

Now open your terminal and try these 3:

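The original three commands aren't shown here; based on the cheat sheet above, a reasonable starter set (with stand-in data so it runs as-is; the column layout is assumed) would be:

```shell
# Stand-in data; columns: id,name,dept,role,salary
printf '1,Alice,Eng,Developer,120000\n2,Bob,Sales,Manager,95000\n' > employees.txt

awk -F ',' '{print $2}' employees.txt                   # print the name column
awk -F ',' '$5 > 100000' employees.txt                  # filter by salary
awk -F ',' '{sum += $5} END {print sum}' employees.txt  # total salaries
```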

Tell me what you see! Or ask:

  • “How to find duplicate lines with awk?”
  • “How to process JSON-like logs?”
  • “Best awk for access.log analysis?”

We’ll build exact commands together! 😄
