Chapter 52: XPath Syntax

What is XPath really? (5-minute honest explanation)

XPath = XML Path Language. It is a query language for selecting nodes (elements, attributes, text, comments…) from an XML or HTML document.

Think of XPath as the address/navigation system for XML trees.

You write a path/description like:

xpath
/library/book[1]/title

→ “start from the root → go inside library → take the first book → take its title”

And XPath returns the matching node(s).

Why do people use XPath?

  • Select nodes very precisely (often more powerful than CSS selectors)
  • Filter by conditions ([@price > 1000])
  • Navigate in any direction (parent::, following-sibling::, ancestor::)
  • Used everywhere:
    • Browser DevTools ($x())
    • JavaScript (document.evaluate())
    • Python (lxml; the standard-library xml.etree.ElementTree supports only a limited XPath subset)
    • Java, C#, PHP
    • XSLT templates
    • Web scraping (Selenium, Scrapy)
    • SOAP / REST APIs that return XML
    • Automated testing

The two most important facts for now (memorize them):

  1. XPath starts counting from 1 (not 0 like arrays in most programming languages)
  2. XPath is case-sensitive (exactly like XML tag names)
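Both facts can be checked quickly with Python's standard-library xml.etree.ElementTree, which understands a limited subset of XPath. This is a minimal sketch. The chapter's sample XML is not reproduced here, so the two-book document below is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book sample (a stand-in, not the chapter's original XML).
SAMPLE = """<library>
  <book><title>Atomic Habits</title></book>
  <book><title>Rich Dad Poor Dad</title></book>
</library>"""

root = ET.fromstring(SAMPLE)  # root is the <library> element

# Fact 1: XPath positions start at 1, Python list indexes start at 0.
first_by_xpath = root.find("book[1]/title").text
first_by_python = root.findall("book")[0].find("title").text

# Fact 2: names are case-sensitive, so "Book" matches no <book> element.
wrong_case = root.find("Book")

print(first_by_xpath)   # Atomic Habits
print(first_by_python)  # Atomic Habits
print(wrong_case)       # None
```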

Lesson 1 – The absolute simplest XPath expressions

Let’s use this small but realistic XML:

XML

1.1 Absolute paths (start from root with /)

xpath
/library/book/title

→ selects both <title> elements → returns two nodes

xpath
/library/book[1]/title

→ selects only the first book's <title> (text “Atomic Habits”)

xpath
/library/magazine/title

→ “India Today”

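Important note: /library/book/title means a direct-child relationship at every step. Spaces and newlines between tags do not change which elements these paths select. They can only show up as whitespace-only text nodes when you select text() or node().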
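The same paths can be tried with xml.etree.ElementTree. This sketch uses a hypothetical sample, because the chapter's own XML is not shown here. ElementTree refuses absolute paths on an element, so after parsing, <library> is the starting point and /library/book/title is written as book/title. The indentation in the sample is deliberate: it does not change which elements are selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the indentation and newlines are intentional.
SAMPLE = """<library>
  <book>
    <title>Atomic Habits</title>
  </book>
  <book>
    <title>Rich Dad Poor Dad</title>
  </book>
  <magazine>
    <title>India Today</title>
  </magazine>
</library>"""

root = ET.fromstring(SAMPLE)  # root is <library>, so paths start below it

# /library/book/title -> every <title> that is a direct child of a <book>
book_titles = [t.text for t in root.findall("book/title")]

# /library/book[1]/title -> only the first book's <title>
first_title = root.find("book[1]/title").text

# /library/magazine/title
magazine_title = root.find("magazine/title").text

print(book_titles)     # ['Atomic Habits', 'Rich Dad Poor Dad']
print(first_title)     # Atomic Habits
print(magazine_title)  # India Today
```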

1.2 Relative paths (start from current context)

If you are already on a <book> node, then:

xpath
title

→ the <title> child

xpath
price/@currency

→ the currency attribute of <price> (value “INR”)
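Here is a minimal ElementTree sketch of evaluating relative paths from a <book> context node. The sample and its price value are hypothetical. ElementTree's XPath subset cannot return attribute nodes, so price/@currency is split into a path step plus a .get() call. A full XPath engine such as lxml would return the attribute directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical <book>; the price value is made up for illustration.
SAMPLE = """<library>
  <book year="2018">
    <title>Atomic Habits</title>
    <price currency="INR">499</price>
  </book>
</library>"""

root = ET.fromstring(SAMPLE)
book = root.find("book")  # treat this <book> as the current context node

# Relative path "title": the <title> child of the context node
title = book.find("title").text

# "price/@currency": select <price> with a path, then read the attribute
currency = book.find("price").get("currency")

print(title)     # Atomic Habits
print(currency)  # INR
```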

Lesson 2 – The most important symbols & shortcuts

Symbol | Meaning | Example XPath | What it selects
--- | --- | --- | ---
/ | direct child | /library/book | All <book> elements that are direct children of <library>
// | any descendant (any level) | //title | All <title> elements anywhere in the document
. | current node | ./price | <price> children of the current node
.. | parent node | ../author | <author> children of the parent, i.e. <author> siblings of the current node
@ | attribute | @id, //@year | The id attribute of the current node; all year attributes anywhere
* | any element | /library/* | All direct child elements of <library>
[] | predicate / condition | //book[@year="2018"] | Books published in 2018
position() | current position in the list | (//book)[position()=2] | Second book in document order
last() | last position in the list | (//book)[last()] | Last book in document order
text() | text nodes | //title/text() | The text nodes inside every <title>
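Several rows of this table can be tried with xml.etree.ElementTree. This sketch uses a hypothetical sample (the ids are made up) and only the parts of XPath that ElementTree supports. Where ElementTree has no equivalent, such as attribute selection with //@year, the comment says so and the step is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; ids are invented for illustration.
SAMPLE = """<library>
  <book id="b1" year="2018"><title>Atomic Habits</title></book>
  <book id="b2" year="1997"><title>Rich Dad Poor Dad</title></book>
  <magazine id="m1"><title>India Today</title></magazine>
</library>"""

root = ET.fromstring(SAMPLE)  # root is <library>

# //title -> ".//title" (ElementTree needs the leading "." on an element)
all_titles = [t.text for t in root.findall(".//title")]

# /library/* -> every direct child element of <library>
children = [child.tag for child in root.findall("*")]

# //book[@year="2018"] -> attribute-equality predicate
books_2018 = [b.get("id") for b in root.findall(".//book[@year='2018']")]

# (//book)[last()] -> ElementTree's [last()] counts per parent, which gives
# the same answer here because every <book> has the same parent
last_book_id = root.find("book[last()]").get("id")

# //title/.. -> the parent of each <title> (ElementTree supports "..")
title_parents = [p.tag for p in root.findall(".//title/..")]

# //@year -> ElementTree paths cannot return attributes, so collect them in Python
years = [e.get("year") for e in root.iter() if e.get("year") is not None]

print(all_titles)  # ['Atomic Habits', 'Rich Dad Poor Dad', 'India Today']
print(years)       # ['2018', '1997']
```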

Lesson 3 – Most useful real-world examples (try them!)

3.1 Select all titles

xpath
//title
//title/text()

3.2 Select books from 2018 or later

xpath
//book[@year >= 2018]

3.3 Select books cheaper than 400 INR

xpath
//book[price[@currency="INR"] and number(price) < 400]

3.4 Select books that are in stock

xpath
//book[stock > 0]
//book[stock != "0"]
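ElementTree's XPath subset has no numeric comparisons and no number() function. This sketch therefore uses simple paths to reach each <book> and does the comparisons in Python. It assumes a hypothetical sample with made-up prices and stock counts. The code mirrors what the predicates in 3.2 to 3.4 mean; it is not how a full XPath engine evaluates them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: prices and stock counts are made up.
SAMPLE = """<library>
  <book year="2018"><title>Atomic Habits</title><price currency="INR">499</price><stock>12</stock></book>
  <book year="1997"><title>Rich Dad Poor Dad</title><price currency="INR">350</price><stock>0</stock></book>
</library>"""

root = ET.fromstring(SAMPLE)
books = root.findall("book")

# //book[@year >= 2018]: compare the attribute as a number
recent = [b.findtext("title") for b in books if int(b.get("year")) >= 2018]

# //book[price[@currency="INR"] and number(price) < 400]
cheap_inr = [
    b.findtext("title")
    for b in books
    if b.find("price[@currency='INR']") is not None  # an INR <price> exists
    and float(b.findtext("price")) < 400             # number() of the first <price>
]

# //book[stock > 0]
in_stock = [b.findtext("title") for b in books if float(b.findtext("stock")) > 0]

print(recent)     # ['Atomic Habits']
print(cheap_inr)  # ['Rich Dad Poor Dad']
print(in_stock)   # ['Atomic Habits']
```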

3.5 Select the second book

xpath
(//book)[2]
//book[2] ← different meaning!

Important difference:

xpath
//book[2]    → every <book> that is the 2nd <book> child of its own parent
(//book)[2]  → the 2nd <book> in the whole document, in document order
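The difference only shows up when <book> elements have different parents. This sketch uses a hypothetical two-shelf catalog and xml.etree.ElementTree, whose positional predicates are also counted per parent. ElementTree does not accept parentheses, so (//book)[2] is expressed as "collect all books, then index the Python list", which is 0-based.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: books live on two different shelves (two parents).
SAMPLE = """<catalog>
  <shelf><book>A1</book><book>A2</book></shelf>
  <shelf><book>B1</book><book>B2</book><book>B3</book></shelf>
</catalog>"""

root = ET.fromstring(SAMPLE)

# //book[2]: every <book> that is the 2nd <book> child of its own parent
second_per_parent = [b.text for b in root.findall(".//book[2]")]

# (//book)[2]: all books in document order, then the 2nd one (index 1)
all_books = root.findall(".//book")
second_overall = all_books[1].text

# The same split applies to last(): per parent vs. whole document
last_per_parent = [b.text for b in root.findall(".//book[last()]")]
last_overall = all_books[-1].text

print(second_per_parent)  # ['A2', 'B2']
print(second_overall)     # A2
print(last_per_parent)    # ['A2', 'B3']
print(last_overall)       # B3
```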

3.6 Select books that have a price in INR

xpath
//book[price/@currency = "INR"]
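ElementTree predicates cannot contain a path like price/@currency. The same set of books can still be reached by walking down to a matching <price> and back up with "..", which returns each parent only once. A minimal sketch with a hypothetical three-book sample:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one INR price, one USD price, one book without a price.
SAMPLE = """<library>
  <book><title>Atomic Habits</title><price currency="INR">499</price></book>
  <book><title>Imported Edition</title><price currency="USD">15</price></book>
  <book><title>Unpriced Draft</title></book>
</library>"""

root = ET.fromstring(SAMPLE)

# //book[price/@currency = "INR"] rewritten as:
# go down to an INR <price>, then step back up to its <book>
inr_books = [
    b.findtext("title")
    for b in root.findall(".//book/price[@currency='INR']/..")
]

print(inr_books)  # ['Atomic Habits']
```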

3.7 Select books whose title starts with “A”

xpath
//book[starts-with(title, "A")]

3.8 Select books whose title contains “Dad”

xpath
//book[contains(title, "Dad")]
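xml.etree.ElementTree has no starts-with() or contains(). This sketch, using a hypothetical sample, reads each title in Python and applies str.startswith and the in operator. Both are case-sensitive, like their XPath counterparts. The helper title_of() is hypothetical, written only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample (a stand-in for the chapter's XML).
SAMPLE = """<library>
  <book><title>Atomic Habits</title></book>
  <book><title>Rich Dad Poor Dad</title></book>
</library>"""

root = ET.fromstring(SAMPLE)

def title_of(book):
    # For these flat titles, .text equals the XPath string value.
    # A missing <title> becomes "", like string() of an empty node-set.
    return book.findtext("title", default="")

# //book[starts-with(title, "A")]
starts_with_a = [title_of(b) for b in root.iter("book") if title_of(b).startswith("A")]

# //book[contains(title, "Dad")]
contains_dad = [title_of(b) for b in root.iter("book") if "Dad" in title_of(b)]

print(starts_with_a)  # ['Atomic Habits']
print(contains_dad)   # ['Rich Dad Poor Dad']
```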

3.9 Select the last book

xpath
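(//book)[last()]
//book[last()] ← the last <book> child of each parent (the same thing only when every <book> shares one parent)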

3.10 Select books that are not in stock

xpath
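//book[not(inStock = "true")]   ← also matches books that have no <inStock> at all
//book[inStock != "true"]        ← only books that have an <inStock> whose value is not "true"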

Lesson 4 – Axes (moving in fancy directions)

Axes let you move in directions other than straight down to children: up to parents and ancestors, or sideways to siblings.

Most useful ones:

Axis name | Abbreviation | Meaning | Example XPath | What it selects
--- | --- | --- | --- | ---
child | (default, no prefix) | direct children | book/author | <author> children of <book>
descendant | (none) | all descendants | /descendant::title | All titles anywhere (same result as //title)
descendant-or-self | // (short for /descendant-or-self::node()/) | current node + descendants | .//price | <price> elements at any depth below the current node
parent | .. | parent node | price/.. | The parent of <price> (usually <book>)
ancestor | (none) | all ancestors | price/ancestor::book | <book> ancestors of <price>
following-sibling | (none) | siblings after the current node | title/following-sibling::author | <author> siblings after <title>
preceding-sibling | (none) | siblings before the current node | author/preceding-sibling::title | <title> siblings before <author>
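Among these axes, ElementTree supports only the parent step (..). For the others, this sketch builds a child-to-parent dictionary and defines hypothetical helpers ancestors(), following_siblings() and preceding_siblings(). They illustrate what each axis selects and are not part of any library. The sample is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one book.
SAMPLE = """<library>
  <book>
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
</library>"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not know their parent, so map child -> parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """ancestor:: axis, nearest ancestor first."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

def following_siblings(elem, tag):
    """following-sibling::tag, in document order."""
    siblings = list(parent_of[elem])
    return [s for s in siblings[siblings.index(elem) + 1:] if s.tag == tag]

def preceding_siblings(elem, tag):
    """preceding-sibling::tag, nearest first (how XPath counts positions on it)."""
    siblings = list(parent_of[elem])
    return [s for s in reversed(siblings[:siblings.index(elem)]) if s.tag == tag]

price = root.find("book/price")
title = root.find("book/title")
author = root.find("book/author")

price_parent = root.find("book/price/..").tag                       # price/..
price_book_ancestors = [a.tag for a in ancestors(price) if a.tag == "book"]
after_title = [e.text for e in following_siblings(title, "author")]
before_author = [e.text for e in preceding_siblings(author, "title")]

print(price_parent)   # book
print(after_title)    # ['James Clear']
print(before_author)  # ['Atomic Habits']
```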

Lesson 5 – Try-it-yourself exercises (do these!)

  1. Select all prices that are in INR
  2. Select the title of the second book
  3. Select books that cost less than 400
  4. Select books published after 2000
  5. Select all author names
  6. Select the last product in the document
  7. Select books that do NOT have a price in USD

Lesson 6 – Real-world context (where you actually meet XPath)

  • Browser DevTools – $x("//button")
  • JavaScript – document.evaluate()
  • Python lxml – root.xpath()
  • Java – XPathFactory
  • XSLT – <xsl:template match="…">
  • Web scraping – Selenium, Scrapy
  • SOAP / REST APIs that return XML
  • Automated testing (Selenium; Cypress only through a plugin)

Would you like to continue with one of these next?

  • XPath with namespaces (very common in real XML)
  • XPath functions (contains, starts-with, normalize-space, count, sum…)
  • XPath axes in detail (ancestor, following, preceding…)
  • XPath predicates advanced (position, last, not, and/or)
  • XPath in JavaScript (document.evaluate)
  • XPath vs CSS selectors – when to use which
  • Real-world examples: RSS feed, SOAP, e-invoice, Android manifest

