Chapter 49: XPath Tutorial

This XPath tutorial is written as if I were your personal teacher, sitting next to you with a whiteboard, a cup of tea, and unlimited patience.

We will go slowly, step by step, from zero knowledge to being able to write real, useful XPath expressions confidently. Every concept comes with:

  • clear explanation
  • visual tree drawings (in text)
  • many small → medium → realistic examples
  • common mistakes beginners make
  • “try this yourself” exercises
  • real-world context (where you actually meet this)

Let’s begin.

Lesson 0 – What is XPath really? (5-minute honest explanation)

XPath = XML Path Language. It is a query language for selecting parts (nodes) of an XML (or HTML) document.

Think of XPath as the GPS/navigation system for XML trees.

You give it a path/description like:

“/library/book[1]/title” → “go to the root → find library → find the first book → take its title”

And it gives you back the matching node(s).

Why do people use XPath?

  • Select nodes precisely (it can do things CSS selectors cannot, such as matching elements by their text content)
  • Filter by conditions ([@price > 1000])
  • Navigate up/down/sideways (parent::, following-sibling::, ancestor::)
  • Used everywhere: JavaScript, Python (lxml), Java, C#, XSLT, browser DevTools, XML editors, SOAP, web scraping, test automation…

The two most important facts right now:

  1. XPath starts counting from 1 (not 0)
  2. XPath is case-sensitive (like XML itself)
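Both facts are easy to check for yourself. The sketch below uses Python with the third-party lxml package (pip install lxml), which Lesson 6 also mentions; the two-book document is made up purely for this demo.

```python
# Assumes the third-party lxml package is installed: pip install lxml
from lxml import etree

# A tiny made-up document, only for this demo.
root = etree.fromstring(
    "<library>"
    "<book><title>First</title></book>"
    "<book><title>Second</title></book>"
    "</library>"
)

# Fact 1: counting starts at 1, so [1] is the FIRST book.
first = root.xpath("/library/book[1]/title/text()")
print(first)                             # ['First']

# [0] is not an error; it just matches nothing, because no node has position 0.
print(root.xpath("/library/book[0]"))    # []

# Fact 2: names are case-sensitive, so <Book> is a different name from <book>.
print(root.xpath("/library/Book"))       # []
print(len(root.xpath("/library/book")))  # 2
```

The silent empty result for [0] is the classic beginner trap: nothing warns you, you just get no matches.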

Lesson 1 – The absolute simplest XPath expressions

Let’s use this small but realistic XML:

XML

1.1 Absolute path (starts from root)

xpath
/library/book/title

→ selects both <title> elements

xpath
/library/book[1]/title

→ selects only the first one → “Atomic Habits”

xpath
/library/magazine/title

→ “India Today”

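Important note: each / in /library/book/title is one direct parent-to-child step. Spaces and newlines between tags (indentation) do not change which elements such a path matches, although the parser may keep them as whitespace-only text nodes that show up if you select text() or node().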
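The lesson's XML listing is not reproduced here, so this sketch uses a stand-in document with assumed contents: the titles "Atomic Habits" and "India Today" come from the text, while the second book, the years and the prices are invented. It runs the three absolute paths with lxml (third-party) on deliberately indented XML, to show that the indentation does not change which elements match.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Stand-in for the lesson's XML (assumed contents, indented on purpose).
XML = """<library>
  <book year="2018">
    <title>Atomic Habits</title>
    <price currency="INR">499</price>
  </book>
  <book year="1997">
    <title>Rich Dad Poor Dad</title>
    <price currency="INR">350</price>
  </book>
  <magazine>
    <title>India Today</title>
  </magazine>
</library>"""
root = etree.fromstring(XML)

# Each "/" is one parent -> child step, starting from the document root.
book_titles = [t.text for t in root.xpath("/library/book/title")]
first_title = root.xpath("/library/book[1]/title")[0].text
magazine_title = root.xpath("/library/magazine/title")[0].text

print(book_titles)     # ['Atomic Habits', 'Rich Dad Poor Dad']
print(first_title)     # Atomic Habits
print(magazine_title)  # India Today
```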

1.2 Relative path (starts from current context)

If you are already on <book>, then:

xpath
title

→ the <title> child

xpath
price/@currency

→ “INR”
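In code, "being on <book>" means calling the query on that element, so it becomes the context node. A minimal sketch with lxml (third-party), using a one-book stand-in document with assumed values:

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# One-book stand-in document (assumed values).
root = etree.fromstring(
    '<library><book year="2018"><title>Atomic Habits</title>'
    '<price currency="INR">499</price></book></library>'
)

# Make the <book> element the context node...
book = root.xpath("/library/book")[0]

# ...then relative paths (no leading "/") are resolved from that book.
title = book.xpath("title")[0].text
currency = book.xpath("price/@currency")[0]

print(title)     # Atomic Habits
print(currency)  # INR

# A leading "/" always jumps back to the document root,
# even when the query is run on the book element.
print(len(book.xpath("/library")))  # 1
```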

Lesson 2 – The most important symbols & shortcuts

Symbol | Meaning | Example XPath | What it selects
/ | direct child | /library/book | All <book> elements that are direct children of <library>
// | descendant (any level) | //title | All <title> elements anywhere in the document
. | current node | ./price | The <price> children of the current node
.. | parent node | ../author | The <author> children of the parent (usually siblings of the current node)
@ | attribute | @id or //@year | The id attribute of the current node / all year attributes anywhere
* | any element | /library/* | All direct child elements of <library>
[] | predicate / condition | //book[@year="2018"] | Books published in 2018
position() | current position in list | (//book)[position()=2] | Second book in document order
last() | last item in list | (//book)[last()] | Last book in document order
text() | select text nodes | //title/text() | Text content of all titles
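The sketch below exercises most rows of the table with lxml (a third-party Python package). The document and its id values are invented for the demo; the context node for the relative rows is the first <book>.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Made-up document for the demo.
root = etree.fromstring("""<library>
  <book id="b1" year="2018"><title>Atomic Habits</title><author>James Clear</author></book>
  <book id="b2" year="1997"><title>Rich Dad Poor Dad</title><author>Robert Kiyosaki</author></book>
  <magazine><title>India Today</title></magazine>
</library>""")

book = root.xpath("/library/book[1]")[0]   # context node for the relative rows
title = book.xpath("title")[0]

print(book.xpath("@id"))                   # ['b1']  (attribute of the context node)
print(book.xpath("./title/text()"))        # ['Atomic Habits']
print(title.xpath("../author/text()"))     # ['James Clear']  (up to <book>, down to <author>)
print(root.xpath("//@year"))               # ['2018', '1997']  (every year attribute)
print([e.tag for e in root.xpath("/library/*")])          # ['book', 'book', 'magazine']
print(root.xpath('//book[@year="2018"]/title/text()'))    # ['Atomic Habits']
print(root.xpath("(//book)[position()=2]/title/text()"))  # ['Rich Dad Poor Dad']
print(root.xpath("(//book)[last()]/@id"))                 # ['b2']
```

Note the plain ASCII quotes inside the predicate: curly quotes copied from a web page are a common reason an otherwise correct expression fails to parse.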

Lesson 3 – Most useful real-world examples (try them!)

3.1 Select all titles

xpath
//title
//title/text()
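With lxml (a third-party Python package), the two forms return different kinds of results: //title gives element objects, //title/text() gives strings. Keep in mind that // also reaches the magazine's title. A sketch on an invented document:

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Made-up document: two books and one magazine.
root = etree.fromstring(
    "<library>"
    "<book><title>Atomic Habits</title></book>"
    "<book><title>Rich Dad Poor Dad</title></book>"
    "<magazine><title>India Today</title></magazine>"
    "</library>"
)

elements = root.xpath("//title")                # element objects
texts = root.xpath("//title/text()")            # their text content, as strings
book_texts = root.xpath("//book/title/text()")  # only titles inside <book>

print([e.tag for e in elements])  # ['title', 'title', 'title']
print(texts)                      # ['Atomic Habits', 'Rich Dad Poor Dad', 'India Today']
print(book_texts)                 # ['Atomic Habits', 'Rich Dad Poor Dad']
```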

3.2 Select books from 2018 or later

xpath
//book[@year >= 2018]

3.3 Select books cheaper than 400 INR

xpath
//book[price[@currency="INR"] and number(price) < 400]

3.4 Select books that are in stock

xpath
//book[stock > 0]
//book[stock != "0"]
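Here are the filters from 3.2 to 3.4 run with lxml (third-party Python package) against an invented three-book document. The third book, priced in USD, is an assumption added so the currency check in 3.3 has something to exclude, and titles() is a hypothetical helper just for printing.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Invented document; "Sample USD Book" exists only to test the currency filter.
root = etree.fromstring("""<library>
  <book year="2018"><title>Atomic Habits</title>
    <price currency="INR">499</price><stock>12</stock></book>
  <book year="1997"><title>Rich Dad Poor Dad</title>
    <price currency="INR">350</price><stock>0</stock></book>
  <book year="2020"><title>Sample USD Book</title>
    <price currency="USD">15</price><stock>3</stock></book>
</library>""")

def titles(expr):
    """Hypothetical helper: title text of every <book> matched by expr."""
    return [b.findtext("title") for b in root.xpath(expr)]

# 3.2: >= compares the year attribute as a number
print(titles("//book[@year >= 2018]"))  # ['Atomic Habits', 'Sample USD Book']

# 3.3: both conditions must hold, so the cheap USD book is excluded
print(titles('//book[price[@currency="INR"] and number(price) < 400]'))  # ['Rich Dad Poor Dad']

# 3.4: both spellings agree here because every book has a numeric <stock>
print(titles("//book[stock > 0]"))      # ['Atomic Habits', 'Sample USD Book']
print(titles('//book[stock != "0"]'))   # ['Atomic Habits', 'Sample USD Book']
```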

3.5 Select the second book

xpath
(//book)[2]
//book[2] ← different meaning!

Important difference:

xpath
//book[2] → every <book> that is the 2nd <book> child of its own parent
(//book)[2] → the 2nd <book> in the whole document (document order)
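The difference only shows up when books have more than one parent. This sketch uses lxml (third-party) and a hypothetical layout with three <shelf> elements; texts() is a hypothetical printing helper.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Hypothetical layout: books spread over three <shelf> parents.
root = etree.fromstring("""<library>
  <shelf><book>A1</book><book>A2</book></shelf>
  <shelf><book>B1</book><book>B2</book></shelf>
  <shelf><label>C</label><book>C1</book></shelf>
</library>""")

def texts(expr):
    """Hypothetical helper: text of every element matched by expr."""
    return [b.text for b in root.xpath(expr)]

# //book[2]: the 2nd <book> child of EACH parent. C1 is the 2nd child of its
# shelf, but only the 1st <book>, so it is not selected.
print(texts("//book[2]"))          # ['A2', 'B2']

# (//book)[2]: gather every book in document order first, then take the 2nd.
print(texts("(//book)[2]"))        # ['A2']

# last() follows the same rule.
print(texts("//book[last()]"))     # ['A2', 'B2', 'C1']
print(texts("(//book)[last()]"))   # ['C1']
```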

3.6 Select books that have a price in INR

xpath
//book[price/@currency = "INR"]

3.7 Select books whose title starts with “A”

xpath
//book[starts-with(title, "A")]

3.8 Select books whose title contains “Dad”

xpath
//book[contains(title, "Dad")]
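A quick check of 3.7 and 3.8 with lxml (third-party Python package) on an invented two-book document. The last line shows that these string functions are case-sensitive too. titles() is a hypothetical helper.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Made-up two-book document.
root = etree.fromstring(
    "<library>"
    "<book><title>Atomic Habits</title></book>"
    "<book><title>Rich Dad Poor Dad</title></book>"
    "</library>"
)

def titles(expr):
    """Hypothetical helper: title text of every <book> matched by expr."""
    return [b.findtext("title") for b in root.xpath(expr)]

print(titles('//book[starts-with(title, "A")]'))  # ['Atomic Habits']
print(titles('//book[contains(title, "Dad")]'))   # ['Rich Dad Poor Dad']

# Case matters: lowercase "dad" does not match "Dad".
print(titles('//book[contains(title, "dad")]'))   # []
```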

3.9 Select the last book

xpath
(//book)[last()]
//book[last()] ← last <book> of each parent (same result only when all books share one parent)

3.10 Select books that are not in stock

xpath
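//book[not(inStock = "true")]
//book[inStock != "true"]

Careful: these two are not identical. For a <book> with no <inStock> child, the not(...) form selects it, while the != form does not, because comparing an empty node-set with a string is always false.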
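This sketch runs both 3.10 expressions with lxml (third-party Python package) against a made-up document that includes one book with no <inStock> element at all, so the two results can be compared side by side. titles() is a hypothetical helper.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Made-up document; the third book deliberately has no <inStock>.
root = etree.fromstring(
    "<library>"
    "<book><title>In stock</title><inStock>true</inStock></book>"
    "<book><title>Sold out</title><inStock>false</inStock></book>"
    "<book><title>Unknown</title></book>"
    "</library>"
)

def titles(expr):
    """Hypothetical helper: title text of every <book> matched by expr."""
    return [b.findtext("title") for b in root.xpath(expr)]

# not(...) is true whenever the inner comparison is false,
# including when there is no <inStock> to compare.
print(titles('//book[not(inStock = "true")]'))  # ['Sold out', 'Unknown']

# != needs at least one <inStock> node whose value differs;
# with no such node the comparison is simply false.
print(titles('//book[inStock != "true"]'))      # ['Sold out']
```

For "not in stock" you usually want the not(...) form, since a missing flag rarely means "available".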

Lesson 4 – Axes (moving in fancy directions)

Axes let you move in directions other than straight down: up to parents and ancestors, or sideways to siblings.

Most useful ones:

Axis name | Shortcut | Meaning | Example XPath | What it selects
child | (default, no prefix needed) | direct children | book/author | <author> children of <book>
descendant | (none) | all descendants (children, grandchildren, …) | /library/descendant::title | All <title> elements below <library>
descendant-or-self | // (short for /descendant-or-self::node()/) | current node plus all its descendants | .//price | All <price> descendants of the current node
parent | .. | parent node | price/.. | The parent of <price> (usually <book>)
ancestor | (none) | all ancestors (parent, grandparent, …) | price/ancestor::book | All <book> ancestors of <price>
following-sibling | (none) | siblings after the current node | title/following-sibling::author | <author> siblings that come after <title>
preceding-sibling | (none) | siblings before the current node | author/preceding-sibling::title | <title> siblings that come before <author>
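Here is each axis from the table run with lxml (third-party Python package) on a made-up one-book document, starting from the <price>, <title> and <author> elements as context nodes.

```python
# Assumes the third-party lxml package: pip install lxml
from lxml import etree

# Made-up one-book document.
root = etree.fromstring("""<library>
  <book year="2018">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
</library>""")

price = root.xpath("//price")[0]
title = root.xpath("//title")[0]
author = root.xpath("//author")[0]

print(price.xpath("..")[0].tag)                          # book
print(price.xpath("count(ancestor::*)"))                 # 2.0  (book and library)
print(len(price.xpath("ancestor::book")))                # 1
print(title.xpath("following-sibling::author/text()"))   # ['James Clear']
print(author.xpath("preceding-sibling::title/text()"))   # ['Atomic Habits']
print([e.tag for e in root.xpath("descendant::*")])      # ['book', 'title', 'author', 'price']
print(len(root.xpath(".//price")))                       # 1
```

count() returns a number rather than a node list, which lxml hands back as a Python float.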

Lesson 5 – Try-it-yourself exercises (do these!)

  1. Select all prices that are in INR
  2. Select the title of the second book
  3. Select books that cost less than 400
  4. Select books published after 2000
  5. Select all author names
  6. Select the last product in the document
  7. Select books that do NOT have a price in USD

Lesson 6 – Real-world context (where you actually meet XPath)

  • Browser DevTools – $x("//button")
  • JavaScript – document.evaluate()
  • Python lxml/etree – root.xpath()
  • Java – XPathFactory
  • XSLT – <xsl:template match="…">
  • Web scraping – Selenium, Scrapy
  • SOAP / REST APIs that return XML
  • Automated testing (Cypress, Selenium)

Would you like to continue with one of these next?

  • XPath with namespaces (very common in real XML)
  • XPath functions (contains, starts-with, normalize-space, count, sum…)
  • XPath axes in detail (ancestor, following, preceding…)
  • XPath predicates advanced (position, last, not, and/or)
  • XPath in JavaScript (document.evaluate)
  • XPath vs CSS selectors – when to use which
  • Real-world examples: RSS feed, SOAP, e-invoice, Android manifest

Just tell me which direction feels most useful or interesting for you right now! 😊
