Chapter 51: XPath Nodes
XPath Nodes, written as if I’m your patient teacher sitting next to you.
We’ll go slowly, step by step, with many drawings, analogies, real examples, common mistakes, and exercises.
Lesson 1 – What exactly is a “node” in XPath?
In XPath, everything in an XML document is a node.
A node is the smallest unit XPath can select or work with.
Real-life analogy: Imagine the XML document is a big family photo album. Each page, each photo, each caption, each sticky note, each photo corner label: every single thing is a node.
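XPath 1.0 defines 7 node types: root (document), element, attribute, text, namespace, processing instruction, and comment. The table below lists the ones you will meet most often, numbered with the matching DOM nodeType constants. CDATA sections are included because the DOM counts them separately, although XPath treats their content as ordinary text. Namespace nodes are rarely selected directly and are left out.
| node-type number (DOM) | Node type name | What it is in plain English | Example in XML | How XPath selects it (typical) |
|---|---|---|---|---|
| 1 | element | Any tag: <book>, <price>, <author>… | <book id="101"> | //book, //price |
| 2 | attribute | name="value" pairs inside opening tags | id="101", currency="INR" | @id, //@currency, price/@currency |
| 3 | text | Plain readable text between tags | Atomic Habits, 499.00 | //title/text(), text() |
| 4 | CDATA section | Text that should NOT be parsed as markup | <![CDATA[<b>not a tag</b>]]> | //text() (XPath has no separate CDATA type; the content is part of a text node) |
| 7 | processing-instruction | Instructions such as <?xml-stylesheet … ?> or <?php … ?> (the <?xml … ?> declaration is not one) | <?xml-stylesheet type="text/xsl" href="style.xsl"?> | //processing-instruction() |
| 8 | comment | <!-- comment --> | <!-- TODO: update price --> | //comment() |
| 9 | document | The entire document (the invisible root) | The whole XML file | / (note that /* selects the root element, not the document node) |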
Most important fact for beginners: 90–95% of the time you will only care about 3 types:
- element nodes (type 1)
- attribute nodes (type 2)
- text nodes (type 3)
Lesson 2 – Visualizing nodes in a real XML document
Let’s take this small but realistic XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <library owner="Alice">

      <!-- Popular books section -->

      <book id="b1" year="2018" lang="en">
        <title>Atomic Habits</title>
        <author>James Clear</author>
        <price currency="INR">499</price>
      </book>

    </library>
Here’s how XPath sees the nodes (simplified drawing):
    Document (node-type 9)   ← invisible root
    │
    └── Element: library (1)   owner="Alice"
        │
        ├── Comment (8): <!-- Popular books section -->
        │
        └── Element: book (1)   id="b1" year="2018" lang="en"
            │
            ├── Element: title (1)
            │   └── Text (3): "Atomic Habits"
            │
            ├── Element: author (1)
            │   └── Text (3): "James Clear"
            │
            └── Element: price (1)   currency="INR"
                └── Text (3): "499"

The <?xml … ?> declaration on the first line is not shown because it is not a node in the XPath data model (it is not a processing instruction either). Whitespace-only text nodes are left out to keep the drawing simple.
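If you want to check a node tree like this yourself, here is a minimal sketch using Python's standard-library `xml.dom.minidom`. It is a DOM parser, not an XPath engine, but it reports the same nodeType numbers used in this lesson. The helper name `walk` is made up for this example, and whitespace-only text nodes are skipped on purpose.

```python
from xml.dom import minidom

XML = """<?xml version="1.0" encoding="UTF-8"?>
<library owner="Alice">
  <!-- Popular books section -->
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>James Clear</author>
    <price currency="INR">499</price>
  </book>
</library>"""


def walk(node, depth=0, out=None):
    """Collect (depth, nodeType, label) for each node, skipping whitespace-only text."""
    if out is None:
        out = []
    is_text = node.nodeType == node.TEXT_NODE
    if is_text and not node.data.strip():
        return out  # indentation between tags, not interesting here
    label = node.data if is_text else node.nodeName
    out.append((depth, node.nodeType, label))
    for child in node.childNodes:  # attributes are not child nodes
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString(XML)
nodes = walk(doc)
for depth, node_type, label in nodes:
    print("  " * depth + f"{label} (nodeType {node_type})")
```

Notice two things when you run it: attributes never appear as children of their element, and there is no node for the `<?xml … ?>` declaration.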
Lesson 3 – How to select different node types in XPath
3.1 Selecting element nodes (most common)
//book → all <book> elements
/library/book → <book> that are direct children of <library>
book → <book> elements from current context
3.2 Selecting attribute nodes
//@id → every id attribute anywhere
//book/@id → id attribute of every book
//price/@currency → currency attribute of every price
3.3 Selecting text nodes
//title/text() → all text nodes directly inside any <title>
Important difference:
//title → the <title> elements
//title/text() → the text nodes inside the <title> elements
3.4 Selecting comment nodes
//comment() → all comment nodes anywhere
3.5 Selecting processing instructions
//processing-instruction() → all processing-instruction nodes anywhere
Lesson 4 – Very practical examples (copy-paste & try)
Example 1 – Get all book titles (two ways)
//book/title
//book/title/text()
Both give you the titles, but the first returns element nodes and the second returns text nodes.
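To see why the element/text distinction matters, here is a small sketch with Python's standard-library `xml.dom.minidom`. It imitates two XPath ideas by hand: the string-value of an element (all the text inside it, including text in child elements) and its direct text-node children (what `text()` selects). The helper names `string_value` and `direct_text_nodes` and the `<em>` markup are invented for illustration.

```python
from xml.dom import minidom


def string_value(node):
    """Approximate the XPath string-value: all descendant text in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    if node.nodeType in (node.ELEMENT_NODE, node.DOCUMENT_NODE):
        return "".join(string_value(child) for child in node.childNodes)
    return ""  # comments and processing instructions contribute nothing


def direct_text_nodes(element):
    """Roughly what element/text() selects: text nodes that are direct children."""
    return [
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ]


plain = minidom.parseString("<title>Atomic Habits</title>").documentElement
mixed = minidom.parseString("<title>Atomic <em>Habits</em></title>").documentElement

print(string_value(plain), direct_text_nodes(plain))
print(string_value(mixed), direct_text_nodes(mixed))
```

For the plain title the two views agree. For the title with nested markup, the string-value is still the full title, while the only direct text node is the part before `<em>`.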
Example 2 – Get all prices in INR
//price[@currency = "INR"]
//price[@currency = "INR"]/text()
First → the <price> elements
Second → the text inside them (“499”, “349”, etc.)
Example 3 – Get the title of the first book
/library/book[1]/title
/library/book[1]/title/text()
Example 4 – Get all attributes named “id”
//@id
Example 5 – Get books that are not in stock
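//book[inStock = "false"]
//book[inStock/text() = "false"]
Both work when each <book> has an <inStock> child that contains only text. The first compares the string-value of <inStock>; the second looks only at its text-node children.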
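If you would like to try these examples without a browser, one option is Python's standard-library `xml.etree.ElementTree`. Be aware that it implements only a subset of XPath: there is no `text()` step and no `@attr` step, so this sketch reads `.text` and `.get()` instead. The second book and the `<inStock>` elements are invented here so that every example has something to match.

```python
import xml.etree.ElementTree as ET

# Sample data: book b2 and the <inStock> elements are hypothetical additions.
XML = """<library owner="Alice">
  <book id="b1" year="2018" lang="en">
    <title>Atomic Habits</title>
    <price currency="INR">499</price>
    <inStock>true</inStock>
  </book>
  <book id="b2">
    <title>Example Book</title>
    <price currency="INR">349</price>
    <inStock>false</inStock>
  </book>
</library>"""

root = ET.fromstring(XML)  # root is the <library> element

# Example 1: //book/title gives elements; .text stands in for /text()
title_elements = root.findall(".//book/title")
title_texts = [title.text for title in title_elements]

# Example 2: //price[@currency = "INR"]
inr_prices = root.findall(".//price[@currency='INR']")

# Example 3: /library/book[1]/title, written relative to <library>
first_title = root.find("book[1]/title")

# Example 4: //@id has no ElementTree equivalent, so collect the values by hand
ids = [el.get("id") for el in root.iter() if "id" in el.attrib]

# Example 5: //book[inStock = "false"]
out_of_stock = root.findall(".//book[inStock='false']")

print(title_texts)
print([price.text for price in inr_prices])
print(first_title.text)
print(ids)
print([book.get("id") for book in out_of_stock])
```

A full XPath 1.0 engine (for example the one behind `$x()` in browser DevTools) accepts the expressions exactly as written in the examples; the workarounds above are only needed because of ElementTree's limited syntax.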
Lesson 5 – Common beginner mistakes & how to fix them
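Mistake 1: Thinking //title gives you the text
//title → returns <title> element nodes, not strings
//title/text() → returns the text nodes inside them
In a comparison the difference is smaller than you might expect, because XPath 1.0 compares an element by its string-value (all the text inside it):
//title = "Atomic Habits" ← true if any <title> has that string-value
//title/text() = "Atomic Habits" ← true if any single text node directly inside a <title> equals it
The two differ only when <title> contains nested markup or several text nodes.
Mistake 2: Forgetting that whitespace creates text nodes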
    <book>
      <title>Atomic Habits</title>
    </book>
→ There is a whitespace-only text node (newline + spaces) before <title>, and another one after </title>.
So book/text() will return those whitespace nodes, not the title.
Fix: Use book/title/text() instead.
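You can make those invisible whitespace nodes visible with a few lines of Python. This sketch uses the standard-library `xml.dom.minidom`, which keeps whitespace-only text nodes by default; some other parsers can be configured to drop them, so treat the exact result as specific to this setup.

```python
from xml.dom import minidom

doc = minidom.parseString("<book>\n  <title>Atomic Habits</title>\n</book>")
book = doc.documentElement

# nodeType 3 = text, 1 = element: the indentation shows up as text nodes
child_types = [child.nodeType for child in book.childNodes]

# Roughly what book/text() would select: only whitespace
text_children = [
    child.data for child in book.childNodes if child.nodeType == child.TEXT_NODE
]

# Roughly what book/title/text() would select: the real title
title = book.getElementsByTagName("title")[0]
title_text = title.firstChild.data

print(child_types)
print([repr(text) for text in text_children])
print(title_text)
```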
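Mistake 3: Using //text() when you want element text
//text() = "Atomic Habits" ← only a true/false answer, and it is true if any text node anywhere (title, author, anything) equals the string
Better:
//title[text() = "Atomic Habits"] ← selects the <title> elements that have this text
Lesson 6 – Try yourself exercises (do these!)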
- Select all prices (both element and text node versions)
- Select all books that have price > 400
- Select the title of the book with id="b2"
- Select all attributes named “currency”
- Select all comments in the document
- Select the text inside the magazine title
- Select books that are not in stock
Lesson 7 – Real-world context (where you actually use node-type aware XPath)
- Browser DevTools → $x("//title/text()")
- Web scraping → selecting text nodes only to avoid tags
- XSLT → matching text nodes vs elements
- SOAP / web services → extracting values from very nested XML
- Automated testing → checking exact text content without markup
- Data extraction → getting clean prices, names, dates from XML feeds
Would you like to continue with one of these next?
- XPath with namespaces (very common in real XML)
- Advanced node tests (text(), comment(), processing-instruction())
- XPath functions that work with nodes (name(), local-name(), lang(), normalize-space())
- XPath axes in detail (ancestor, following-sibling, preceding…)
- Real-world examples — RSS, SOAP envelope, e-invoice, Android manifest
- XPath vs CSS selectors – when to use which for node selection
Just tell me which direction feels most useful or interesting for you right now! 😊
