Chapter 49: XPath Tutorial
This XPath tutorial is written as if I were your personal teacher, sitting next to you with a whiteboard, a cup of tea, and unlimited patience.
We will go slowly, step by step, from zero knowledge to being able to write real, useful XPath expressions confidently. Every concept comes with:
- clear explanation
- visual tree drawings (in text)
- many small → medium → realistic examples
- common mistakes beginners make
- “try this yourself” exercises
- real-world context (where you actually meet this)
Let’s begin.
Lesson 0 – What is XPath really? (5-minute honest explanation)
XPath = XML Path Language. It is a query language for selecting parts of an XML (or HTML) document.
Think of XPath as the GPS/navigation system for XML trees.
You give it a path/description like:
“/library/book[1]/title” → “go to the root → find library → find the first book → take its title”
And it gives you back the matching node(s).
Why do people use XPath?
- Select nodes precisely (XPath can do things plain CSS selectors cannot, such as matching on text content or walking up to an ancestor)
- Filter by conditions ([@price > 1000])
- Navigate up/down/sideways (parent::, following-sibling::, ancestor::)
- Used everywhere: JavaScript, Python (lxml), Java, C#, XSLT, browser dev tools, XML editors, SOAP, web scraping, test automation…
Two most important facts right now:
- XPath starts counting from 1 (not 0)
- XPath is case-sensitive (like XML itself)
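Both facts are easy to see for yourself. Here is a minimal sketch in Python, assuming the third-party lxml package is installed; the tiny two-book document is invented just for this check and is not part of the lesson's sample.

```python
from lxml import etree  # third-party package: pip install lxml

# A tiny stand-in document, invented for this illustration.
doc = etree.fromstring(
    "<library>"
    "<book><title>First</title></book>"
    "<book><title>Second</title></book>"
    "</library>"
)

# Positions start at 1, so book[1] is the first <book>.
first_title = doc.xpath("/library/book[1]/title/text()")
print(first_title)  # ['First']

# There is no position 0, so book[0] matches nothing.
zeroth = doc.xpath("/library/book[0]")
print(zeroth)  # []

# Names are case-sensitive: <Book> is not <book>.
wrong_case = doc.xpath("/library/Book")
print(wrong_case)  # []
```

Notice that a wrong index or wrong case does not raise an error. You simply get an empty result, which is why these two mistakes are so easy to miss.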
Lesson 1 – The absolute simplest XPath expressions
Let’s use this small but realistic XML:
<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
  <magazine id="m1">
    <title>India Today</title>
    <issue>July 2025</issue>
  </magazine>
</library>
1.1 Absolute path (starts from root)
/library/book/title → selects both book <title> elements
/library/book[1]/title → selects only the first one → “Atomic Habits”
/library/magazine/title → “India Today”
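Important note: / means a direct child relationship, so /library/book/title matches a <title> only when it sits directly inside a <book> that sits directly inside <library>. Spaces or newlines between tags do not change which elements a path selects.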
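If you want to run the absolute paths above, here is one possible sketch using Python with lxml (assumed to be installed). The helper name texts is made up for this example; root.xpath() is the lxml call that evaluates the expression.

```python
from lxml import etree  # third-party package: pip install lxml

SAMPLE = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
  <magazine id="m1">
    <title>India Today</title>
    <issue>July 2025</issue>
  </magazine>
</library>"""

root = etree.fromstring(SAMPLE)


def texts(expr):
    """Hypothetical helper: text of every element the expression selects."""
    return [el.text for el in root.xpath(expr)]


book_titles = texts("/library/book/title")
first_title = texts("/library/book[1]/title")
magazine_title = texts("/library/magazine/title")
# <title> is a grandchild of <library>, not a direct child, so this is empty.
skipped_level = texts("/library/title")

print(book_titles)     # ['Atomic Habits', 'Rich Dad Poor Dad']
print(first_title)     # ['Atomic Habits']
print(magazine_title)  # ['India Today']
print(skipped_level)   # []
```

The last line is the one to remember: with /, you cannot skip a level.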
1.2 Relative path (starts from current context)
If you are already on <book>, then:
title → the <title> child
price/@currency → “INR”
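"Already on <book>" means that a <book> element is the context node. A minimal sketch of that idea with lxml (assumed installed): first pick a book, then evaluate the relative expressions from it. The sample is shortened to the two books to keep the block small.

```python
from lxml import etree  # third-party package: pip install lxml

SAMPLE = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
</library>"""

root = etree.fromstring(SAMPLE)

# Choose the context nodes with absolute paths.
first_book = root.xpath("/library/book[1]")[0]
second_book = root.xpath("/library/book[2]")[0]

# The same relative expression gives a different answer per context node.
first_title = first_book.xpath("title")[0].text
second_title = second_book.xpath("title")[0].text
print(first_title)   # Atomic Habits
print(second_title)  # Rich Dad Poor Dad

# Selecting an attribute returns its value as a string in lxml.
currency = second_book.xpath("price/@currency")
print(currency)      # ['INR']
```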
Lesson 2 – The most important symbols & shortcuts
| Symbol | Meaning | Example XPath | What it selects |
|---|---|---|---|
| / | direct child | /library/book | All <book> that are direct children of <library> |
| // | descendant (any level) | //title | All <title> anywhere in document |
| . | current node | ./price | <price> child of current context |
| .. | parent node | ../author | <author> child of the current node's parent (a sibling when the current node is, say, <title>) |
| @ | attribute | @id or //@year | The id attribute of the current node / all year attributes anywhere |
| * | any element | /library/* | All direct children of <library> |
| [] | predicate / condition | //book[@year="2018"] | Books published in 2018 |
| position() | current position in list | (//book)[position()=2] | Second book in document order |
| last() | last item in list | (//book)[last()] | Last book |
| text() | select text nodes | //title/text() | Text content of all titles |
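Here is a sketch that runs several rows of the table against the sample, again with lxml (assumed installed). The second, shelf-based document is invented for this example; it exists only to show why the parentheses in (//book)[2] matter once books live under more than one parent.

```python
from lxml import etree  # third-party package: pip install lxml

SAMPLE = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
  <magazine id="m1">
    <title>India Today</title>
    <issue>July 2025</issue>
  </magazine>
</library>"""

root = etree.fromstring(SAMPLE)

title_count = len(root.xpath("//title"))                  # // : any level
child_tags = [el.tag for el in root.xpath("/library/*")]  # * : any element
years = root.xpath("//@year")                             # @ : attributes
from_2018 = root.xpath('//book[@year="2018"]/@id')        # [] : predicate
last_book = root.xpath("(//book)[last()]/@id")            # last()

# .. : go up from <price> to its parent, then down to <author>.
price = root.xpath("//book[1]/price")[0]
author_via_parent = price.xpath("../author/text()")

print(title_count)        # 3
print(child_tags)         # ['book', 'book', 'magazine']
print(years)              # ['2018', '1997']
print(from_2018)          # ['b1']
print(last_book)          # ['b2']
print(author_via_parent)  # ['James Clear']

# Invented document: two shelves, two books each.
shelves = etree.fromstring(
    "<library>"
    "<shelf><book id='a1'/><book id='a2'/></shelf>"
    "<shelf><book id='b1'/><book id='b2'/></shelf>"
    "</library>"
)
second_per_parent = shelves.xpath("//book[2]/@id")  # 2nd book under each shelf
second_overall = shelves.xpath("(//book)[2]/@id")   # 2nd book in the document
print(second_per_parent)  # ['a2', 'b2']
print(second_overall)     # ['a2']
```

On the lesson's sample both forms return the same book, because every <book> shares one parent. The difference only shows up with nesting, which is exactly when it bites.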
Lesson 3 – Most useful real-world examples (try them!)
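3.1 Select all titles
//title
//title/text()
3.2 Select books from 2018 or later
//book[@year >= 2018]
3.3 Select books cheaper than 400 INR
//book[price[@currency="INR"] and number(price) < 400]
3.4 Select books that are in stock
//book[stock > 0]
//book[stock != "0"]
Note: the sample XML has no <stock> element, so both expressions return nothing there. They assume each book carries something like <stock>5</stock>.
3.5 Select the second book
(//book)[2]
//book[2] ← different meaning!
Important difference:
//book[2] → all <book> that are the 2nd <book> child of their parent
(//book)[2] → the 2nd <book> in the whole document
3.6 Select books that have a price in INR
//book[price/@currency = "INR"]
3.7 Select books whose title starts with “A”
//book[starts-with(title, "A")]
3.8 Select books whose title contains “Dad”
//book[contains(title, "Dad")]
3.9 Select the last book
//book[last()]
Note: this has the same trap as 3.5. //book[last()] selects the last <book> child of each parent; use (//book)[last()] for the last book in the whole document.
3.10 Select books that are not in stock
//book[not(inStock = "true")]
//book[inStock != "true"]
Note: these two are not equivalent. The first also matches books that have no <inStock> element at all; the second matches only books that have an <inStock> element whose value is not "true". The sample XML has no <inStock> element, so assume something like <inStock>true</inStock>.
Lesson 4 – Axes (moving in fancy directions)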
Axes let you move in non-straight directions.
Most useful ones:
| Axis name | Shortcut | Meaning | Example XPath | What it selects |
|---|---|---|---|---|
| child | (default) | direct children | book/author | <author> children of <book> |
| descendant | descendant:: | all descendants of the current node | /library/descendant::title | All titles below <library> |
| descendant-or-self | // (short for /descendant-or-self::node()/) | current node + descendants | .//price | Prices from current node down |
| parent | .. | parent node | price/.. | The parent of price (usually <book>) |
| ancestor | ancestor:: | all ancestors | price/ancestor::book | All <book> ancestors of price |
| following-sibling | following-sibling:: | siblings after current node | title/following-sibling::author | Author after title |
| preceding-sibling | preceding-sibling:: | siblings before current node | author/preceding-sibling::title | Title before author |
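The axes in the table can be tried from a concrete starting node. A minimal sketch with lxml (assumed installed), using the first book of the sample as the starting point; the variable names are only for illustration.

```python
from lxml import etree  # third-party package: pip install lxml

SAMPLE = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
</library>"""

root = etree.fromstring(SAMPLE)
book = root.xpath("/library/book[1]")[0]
title = book.xpath("title")[0]
author = book.xpath("author")[0]
price = book.xpath("price")[0]

# child:: is the default axis, so child::author equals plain author.
explicit_child = book.xpath("child::author/text()")
# ancestor:: walks upward from <price> to the enclosing <book>.
owning_book = price.xpath("ancestor::book/@id")
# Sibling axes move sideways under the same parent.
after_title = title.xpath("following-sibling::author/text()")
before_author = author.xpath("preceding-sibling::title/text()")
# .. is the parent; .// searches from the current node downward.
parent_tag = price.xpath("..")[0].tag
prices_below = root.xpath(".//price/text()")

print(explicit_child)  # ['James Clear']
print(owning_book)     # ['b1']
print(after_title)     # ['James Clear']
print(before_author)   # ['Atomic Habits']
print(parent_tag)      # book
print(prices_below)    # ['499', '349']
```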
Lesson 5 – Try yourself exercises (do these!)
- Select all prices that are in INR
- Select the title of the second book
- Select books that cost less than 400
- Select books published after 2000
- Select all author names
- Select the last book in the document
- Select books that do NOT have a price in USD
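One way to check your answers is to run them against the sample and look at which ids come back. Below is a sketch of such a harness with lxml (assumed installed); the helper ids is invented for this example, and the expressions are taken from Lesson 3 rather than being the exercise solutions.

```python
from lxml import etree  # third-party package: pip install lxml

SAMPLE = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
  <book id="b2" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">349</price>
  </book>
  <magazine id="m1">
    <title>India Today</title>
    <issue>July 2025</issue>
  </magazine>
</library>"""

root = etree.fromstring(SAMPLE)


def ids(expr):
    """Hypothetical helper: the id attribute of every element selected."""
    return [el.get("id") for el in root.xpath(expr)]


recent = ids("//book[@year >= 2018]")
cheap = ids('//book[price[@currency="INR"] and number(price) < 400]')
starts_a = ids('//book[starts-with(title, "A")]')
has_dad = ids('//book[contains(title, "Dad")]')
# Functions such as count() return a number instead of a node list.
book_count = root.xpath("count(//book)")

print(recent)      # ['b1']
print(cheap)       # ['b2']
print(starts_a)    # ['b1']
print(has_dad)     # ['b2']
print(book_count)  # 2.0

# The sample has no <inStock> element, which exposes a classic trap:
# not(x = v) is true when x is missing, while x != v is false.
with_not = ids('//book[not(inStock = "true")]')
with_neq = ids('//book[inStock != "true"]')
print(with_not)    # ['b1', 'b2']
print(with_neq)    # []
```

Keep the last pair in mind for the final exercise: "does NOT have" is usually expressed with not(...), because != only compares values that actually exist.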
Lesson 6 – Real-world context (where you actually meet XPath)
- Browser DevTools – $x("//button")
- JavaScript – document.evaluate()
- Python lxml/etree – root.xpath()
- Java – XPathFactory
- XSLT – <xsl:template match="…">
- Web scraping – Selenium, Scrapy
- SOAP / REST APIs that return XML
- Automated testing (Selenium; Cypress through an XPath plugin)
Would you like to continue with one of these next?
- XPath with namespaces (very common in real XML)
- XPath functions (contains, starts-with, normalize-space, count, sum…)
- XPath axes in detail (ancestor, following, preceding…)
- XPath predicates advanced (position, last, not, and/or)
- XPath in JavaScript (document.evaluate)
- XPath vs CSS selectors – when to use which
- Real-world examples: RSS feed, SOAP, e-invoice, Android manifest
Just tell me which direction feels most useful or interesting for you right now! 😊
