Chapter 13: XML XPath
This chapter covers XML and XPath step by step, as if I were sitting next to you like a patient teacher who wants you to really understand both the XML structure and how to find things inside it using XPath.
We will go slowly: first refresh XML structure → then understand what XPath is → learn the most useful expressions → see many real examples → practice common patterns → finish with tips & pitfalls.
1. Quick Reminder: Why we need XPath
XML documents are hierarchical trees (like folders inside folders).
Example simple XML we will use a lot:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<library>

  <book category="fiction" year="2018" lang="en">
    <title>Atomic Habits</title>
    <author>
      <first>James</first>
      <last>Clear</last>
    </author>
    <price currency="INR">499.00</price>
    <stock>45</stock>
  </book>

  <book category="self-help" year="1997" lang="en">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">350.00</price>
    <stock>12</stock>
  </book>

  <book category="fiction" year="2003" lang="en">
    <title>The Alchemist</title>
    <author>Paulo Coelho</author>
    <price currency="USD">12.99</price>
    <stock>0</stock>
  </book>

  <magazine>
    <title>India Today</title>
    <issue>July 2025</issue>
    <price currency="INR">120.00</price>
  </magazine>

</library>
```
Now imagine you want to answer questions like:
- Give me all book titles
- Find the price of “Atomic Habits”
- Get all books published after 2000
- Find books that are out of stock
- Get the last name of the author of the first book
XPath is the standard language created exactly to answer these kinds of questions easily.
2. What is XPath? (The clearest explanation)
XPath = XML Path Language
It is a query language for selecting nodes (elements, attributes, text…) from an XML document.
Think of it like:
- File path on your computer: /home/user/documents/report.pdf
- URL path: /products/electronics/phones/iphone-15
XPath is the same idea — but for navigating inside XML trees.
Two important styles:
- Absolute path → starts from the root → /library/book/title
- Relative path → starts from current position → book/title
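To make the absolute/relative distinction concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree on a trimmed copy of our library XML (titles only; the trimming and the variable names are mine, purely for illustration). One assumption to keep in mind: ElementTree implements only a small subset of XPath and always evaluates a path relative to an element, so the absolute form /library/book/title is written here as book/title starting from the <library> root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter's library XML (titles only, for illustration).
LIBRARY_XML = """
<library>
  <book><title>Atomic Habits</title></book>
  <book><title>Rich Dad Poor Dad</title></book>
  <book><title>The Alchemist</title></book>
  <magazine><title>India Today</title></magazine>
</library>
""".strip()

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element

# Current position = <library>: "book/title" walks child by child.
titles_from_root = [t.text for t in root.findall("book/title")]

# Current position = one <book>: the same step "title" now means that book's title.
first_book = root.find("book")
title_of_first = first_book.find("title").text

# ".//title" searches every level below the current position,
# so it also picks up the magazine's title.
all_titles = [t.text for t in root.findall(".//title")]

print(titles_from_root)  # ['Atomic Habits', 'Rich Dad Poor Dad', 'The Alchemist']
print(title_of_first)    # Atomic Habits
print(all_titles)        # the three book titles plus 'India Today'
```

The point to notice is that the same relative expression gives different results depending on where you are standing in the tree.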
3. Most Important XPath Concepts & Symbols
| Symbol / Expression | Meaning | Example | What it selects |
|---|---|---|---|
| / | Child separator (direct child) | /library/book | All <book> directly under <library> |
| // | Descendant-or-self (any level) | //title | All <title> elements anywhere |
| . | Current node | ./price | <price> child of current node |
| .. | Parent node | ../author | <author> child of the parent node (a sibling of the current node) |
| @ | Attribute | @category or //@year | The category attribute of the current node / every year attribute in the document |
| * | Any element | //book/* | All direct children of any <book> |
| [] | Predicate (condition) | //book[@category='fiction'] | Books with category="fiction" |
| position() | Position in list | (//book)[position()=1] | First <book> in document order |
| last() | Last item | (//book)[last()] | Last <book> |
| text() | Select text content | //title/text() | Text inside all <title> elements |
| starts-with() | String starts with… | //title[starts-with(., 'The ')] | Titles starting with "The " |
| contains() | String contains… | //book[contains(author, 'Coelho')] | Books where author contains "Coelho" |
| = , != , < , > | Comparison operators | //book[@year > 2000] | Books newer than 2000 |
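
A few of the symbols from the table can be tried out with Python's standard-library xml.etree.ElementTree. This is only a sketch under some assumptions: the XML is a trimmed copy of our library document, the variable names are invented for illustration, and ElementTree supports just a subset of XPath (paths are relative to an element, so //x is written .//x, and functions such as contains() or comparisons such as > are not available).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter's library XML (only the parts used below).
LIBRARY_XML = """
<library>
  <book category="fiction" year="2018">
    <title>Atomic Habits</title>
    <price currency="INR">499.00</price>
  </book>
  <book category="self-help" year="1997">
    <title>Rich Dad Poor Dad</title>
    <price currency="INR">350.00</price>
  </book>
  <book category="fiction" year="2003">
    <title>The Alchemist</title>
    <price currency="USD">12.99</price>
  </book>
  <magazine>
    <title>India Today</title>
    <price currency="INR">120.00</price>
  </magazine>
</library>
""".strip()

root = ET.fromstring(LIBRARY_XML)

# [] with @ : keep only the books whose category attribute is 'fiction'
fiction = [b.findtext("title") for b in root.findall("book[@category='fiction']")]

# last() and * : every direct child element of the last <book>
last_book_children = [child.tag for child in root.findall("book[last()]/*")]

# .. : step from a matching <title> up to its parent <book>
parent = root.find(".//title[.='The Alchemist']/..")
parent_year = parent.get("year")

# * with [@attr] : any element, at any depth, that has a currency attribute
with_currency = [(e.tag, e.get("currency")) for e in root.findall(".//*[@currency]")]

print(fiction)             # ['Atomic Habits', 'The Alchemist']
print(last_book_children)  # ['title', 'price']
print(parent_year)         # 2003
print(with_currency)
```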
4. Step-by-Step XPath Examples (using the library XML)
| What you want to select | XPath expression | Result (what it finds) |
|---|---|---|
| All book titles | //book/title | Atomic Habits, Rich Dad Poor Dad, The Alchemist (//title would also return the magazine title India Today) |
| Title of the first book | (//book/title)[1] | Atomic Habits |
| All prices | //price | 499.00, 350.00, 12.99, 120.00 |
| Prices in INR | //price[@currency='INR'] | 499.00, 350.00, 120.00 |
| All books that are fiction | //book[@category='fiction'] | Atomic Habits + The Alchemist |
| Books published after 2000 | //book[@year > 2000] | Atomic Habits (2018) + The Alchemist (2003) |
| Books that are out of stock | //book[stock=0] | The Alchemist |
| Author’s last name of first book | //book[1]/author/last | Clear |
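| All author names (simple & complex) | //author | The three <author> elements, with string values James Clear, Robert Kiyosaki, Paulo Coelho. Note that //author/text() returns only text placed directly inside <author>, and //author/* returns only the child elements <first> and <last>. |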
| All elements that have a currency attribute | //*[@currency] | All <price> elements |
| Second book’s title | (//book)[2]/title | Rich Dad Poor Dad |
| Books with price less than 400 INR | //book[price[@currency='INR' and . < 400]] | Rich Dad Poor Dad |
| Magazines (any child of <library> that is not a book) | /library/*[not(self::book)] | <magazine> element |
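
If you want to check some of these results yourself, the sketch below does so with Python's standard-library xml.etree.ElementTree. Please treat it as an illustration under stated assumptions: the XML is a trimmed copy of our library document, the variable names are hypothetical, and because ElementTree supports only a subset of XPath, the numeric comparisons (@year > 2000, price < 400) are done in plain Python instead of inside the expression.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter's library XML (books only).
LIBRARY_XML = """
<library>
  <book category="fiction" year="2018">
    <title>Atomic Habits</title>
    <author><first>James</first><last>Clear</last></author>
    <price currency="INR">499.00</price>
    <stock>45</stock>
  </book>
  <book category="self-help" year="1997">
    <title>Rich Dad Poor Dad</title>
    <author>Robert Kiyosaki</author>
    <price currency="INR">350.00</price>
    <stock>12</stock>
  </book>
  <book category="fiction" year="2003">
    <title>The Alchemist</title>
    <author>Paulo Coelho</author>
    <price currency="USD">12.99</price>
    <stock>0</stock>
  </book>
</library>
""".strip()

root = ET.fromstring(LIBRARY_XML)
books = root.findall("book")

# Out of stock: ElementTree compares text, so the value is written as the string '0'.
out_of_stock = [b.findtext("title") for b in root.findall("book[stock='0']")]

# Author's last name of the first book.
first_author_last = root.findtext("book[1]/author/last")

# Published after 2000: select the books, then compare the year in Python.
after_2000 = [b.findtext("title") for b in books if int(b.get("year")) > 2000]

# Priced under 400 INR: check the currency attribute and the numeric value in Python.
cheap_inr = [
    b.findtext("title")
    for b in books
    if b.find("price").get("currency") == "INR" and float(b.findtext("price")) < 400
]

print(out_of_stock)       # ['The Alchemist']
print(first_author_last)  # Clear
print(after_2000)         # ['Atomic Habits', 'The Alchemist']
print(cheap_inr)          # ['Rich Dad Poor Dad']
```

A full XPath 1.0 engine can evaluate the comparisons inside the expression itself; the Python-side filtering here is only a workaround for the standard library's smaller feature set.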
5. Real-World Style Examples (very common patterns)
Pattern 1: Find products by category and price range
```xpath
//product[@category='electronics' and price > 1000 and price < 5000]
```
Pattern 2: Get all items from orders of a specific customer
```xpath
//order[customer/name='Samarth Jain']//item/name
```
Pattern 3: Find elements with specific text (case-sensitive)
```xpath
//title[. = 'Atomic Habits']
```
Pattern 4: Find elements containing certain text (partial match)
```xpath
//title[contains(., 'Dad')]
```
Pattern 5: Select attribute values only
```xpath
//book/@category
```
Pattern 6: Count something
```xpath
count(//book[@year > 2000])
```
This returns 2 for our library XML (Atomic Habits from 2018 and The Alchemist from 2003).
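Patterns 3 to 6 can also be tried against the library data with nothing but Python's standard library. This is a rough sketch, not a reference implementation: xml.etree.ElementTree understands the exact-text predicate directly, but it has no contains(), no count() and no attribute-node selection, so those three patterns are emulated in Python. The trimmed XML and all variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the chapter's library XML (titles and attributes only).
LIBRARY_XML = """
<library>
  <book category="fiction" year="2018"><title>Atomic Habits</title></book>
  <book category="self-help" year="1997"><title>Rich Dad Poor Dad</title></book>
  <book category="fiction" year="2003"><title>The Alchemist</title></book>
  <magazine><title>India Today</title></magazine>
</library>
""".strip()

root = ET.fromstring(LIBRARY_XML)

# Pattern 3: exact text match, supported directly by ElementTree.
exact = root.findall(".//title[.='Atomic Habits']")

# Pattern 4: contains(., 'Dad') emulated with Python's "in" operator.
partial = [t.text for t in root.iter("title") if "Dad" in t.text]

# Pattern 5: attribute values, read from each matched element with .get().
categories = [b.get("category") for b in root.findall("book")]

# Pattern 6: count of books newer than 2000, computed in Python.
newer_count = sum(1 for b in root.findall("book") if int(b.get("year")) > 2000)

print([t.text for t in exact])  # ['Atomic Habits']
print(partial)                  # ['Rich Dad Poor Dad']
print(categories)               # ['fiction', 'self-help', 'fiction']
print(newer_count)              # 2
```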
6. How XPath is used in real code (very quick examples)
JavaScript (browser)
```javascript
// xmlString is assumed to hold the library XML shown earlier.
const xmlDoc = new DOMParser().parseFromString(xmlString, "application/xml");

// ORDERED_NODE_ITERATOR_TYPE returns the matches in document order.
const titles = xmlDoc.evaluate(
  "//book/title",
  xmlDoc,
  null,
  XPathResult.ORDERED_NODE_ITERATOR_TYPE,
  null
);

let titleNode = titles.iterateNext();
while (titleNode) {
  console.log(titleNode.textContent);
  titleNode = titles.iterateNext();
}
```
Python (lxml – very popular)
```python
from lxml import etree

tree = etree.parse("library.xml")
root = tree.getroot()

# All book titles
for title in root.xpath("//book/title"):
    print(title.text)

# Books after 2000
for book in root.xpath("//book[@year > 2000]"):
    print(book.find("title").text)
```
Java (very common in enterprise)
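```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;

// doc is an org.w3c.dom.Document that has already been parsed;
// evaluate() throws XPathExpressionException if the expression is invalid.
XPath xpath = XPathFactory.newInstance().newXPath();

NodeList books = (NodeList) xpath.evaluate(
    "//book[@category='fiction']",
    doc,
    XPathConstants.NODESET
);
```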
Quick Summary – XPath Cheat Sheet (keep this handy)
| Goal | Typical XPath |
|---|---|
| All something | //something |
| Something inside specific parent | /root/parent/something |
| By attribute | //tag[@attr='value'] |
| By attribute value comparison | //tag[@price > 500] |
| Text equals | //tag[. = 'exact text'] |
| Text contains | //tag[contains(., 'part')] |
| First / last item | (//tag)[1] or (//tag)[last()] |
| Position | (//tag)[position() = 2] |
| Has attribute | //*[@attr] |
| Children of current | ./child or .//child (any level) |
Would you like to continue with one of these next?
- More advanced XPath (functions: normalize-space(), string-length(), not(), or, and…)
- XPath with namespaces (very common in real SOAP, UBL, Android…)
- How to use XPath in different languages (Java, Python, C#, JavaScript…)
- Common mistakes people make with XPath
- XPath vs CSS Selectors – when to use which
- Real-world examples from e-invoice, SOAP, Android manifest
Just tell me what feels most useful or interesting for you right now! 😊
