Chapter 17: XML Validator

This chapter walks through XML validation patiently, step by step, the way a teacher sitting next to you would: what the concepts mean, how they differ, and how people actually validate XML in real projects in 2025–2026.

We will go slowly and clearly:

  • What does “validation” actually mean?
  • Why do we need it?
  • The two main kinds of validation
  • Tools & methods people use today
  • Many realistic examples
  • Common mistakes & how to avoid them

Let’s start.

1. What Does “XML Validation” Really Mean?

Validation = checking whether an XML document follows the rules that were defined for it.

There are two completely different levels of correctness in XML:

| Level   | Name            | What it checks | Required? | Error message example |
|---------|-----------------|----------------|-----------|-----------------------|
| Level 1 | Well-formedness | Syntax rules (tags closed, nested properly, quotes, etc.) | Mandatory | “Missing closing tag”, “Attribute not quoted” |
| Level 2 | Validity        | Whether the content follows a specific structure (allowed elements, order, data types, required fields…) | Optional | “Element <price> is missing required attribute ‘currency’” |

Most important sentence you should remember:

Every XML document must be well-formed. Only some XML documents are also valid.

Analogy:

  • Well-formed = the book has correct grammar, punctuation, chapters are properly numbered
  • Valid = the book also follows the official style guide of the publisher (e.g. every chapter must have exactly 3 subsections, no footnotes allowed in chapter 1, prices must be in INR format…)
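To make Level 1 concrete, here is a minimal sketch using only Python's standard library. The helper name `is_well_formed` and the sample snippets are made up for illustration. It answers only the well-formedness question; it says nothing about validity, because no rules have been supplied.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (Level 1 only, no schema involved)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, not taken from a real system.
good = "<price currency='INR'>499.00</price>"
unclosed = "<price currency='INR'>499.00"          # missing closing tag
unquoted = "<price currency=INR>499.00</price>"    # attribute value not quoted
no_currency = "<price>499.00</price>"              # fine syntactically

print(is_well_formed(good))         # True
print(is_well_formed(unclosed))     # False
print(is_well_formed(unquoted))     # False
print(is_well_formed(no_currency))  # True
```

Notice the last case: a `<price>` without `currency` is still well-formed. Whether it is valid depends entirely on the rules (Level 2) that a DTD or schema defines.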

2. The Two Main Ways to Define “Valid XML”

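| Method           | File name / format | What it is | Still widely used in 2025–2026? | Main users today |
|------------------|--------------------|------------|---------------------------------|------------------|
| DTD              | .dtd               | Document Type Definition: old, simple, limited | Yes, but declining | Legacy systems, older document formats, very old configs |
| XML Schema (XSD) | .xsd               | XML Schema Definition: modern, powerful | Very widely used | SOAP/WSDL, e-invoice, finance, government, healthcare, EDI |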

Reality in 2025–2026:

  • DTD → still exists in many old systems, but almost nobody chooses it for new work
  • XSD (XML Schema) → the dominant standard when validation is needed

3. First Example – What “Valid” vs “Invalid” Looks Like

XML document (very simple invoice)

XML

Now imagine we have this schema rule (simplified XSD):

  • <invoice> must have exactly these children in this order:
    1. <number> (required, string)
    2. <date> (required, xs:date format)
    3. <customer> (required, must have attribute gstin)
    4. <items> (required, at least 1 <item>)
    5. <total> (required, decimal with max 2 decimal places)

Valid version (follows all rules)

XML

Invalid versions (will be rejected by validator):

XML
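To see what a validator is doing with the five rules above, here is a hand-written sketch of the same checks in plain Python. It is deliberately not a real XSD validator: the function name `check_invoice`, the sample values (including the GSTIN), and the simplified date and decimal patterns are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

EXPECTED_CHILDREN = ["number", "date", "customer", "items", "total"]


def check_invoice(xml_text):
    """Return a list of rule violations; an empty list means all rules hold."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    if root.tag != "invoice":
        return [f"root element is <{root.tag}>, expected <invoice>"]

    # Rules 1-5 (presence and order): the children must match exactly.
    tags = [child.tag for child in root]
    if tags != EXPECTED_CHILDREN:
        # The remaining checks rely on this structure, so stop early.
        return [f"children are {tags}, expected {EXPECTED_CHILDREN}"]

    _number, date_el, customer, items, total = list(root)
    problems = []

    # Rule 2: simplified xs:date check (YYYY-MM-DD, no time zone suffix).
    date_text = (date_el.text or "").strip()
    try:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_text):
            raise ValueError(date_text)
        date.fromisoformat(date_text)  # rejects impossible dates like 2025-02-30
    except ValueError:
        problems.append(f"<date> value {date_text!r} is not a valid YYYY-MM-DD date")

    # Rule 3: <customer> needs a gstin attribute.
    if "gstin" not in customer.attrib:
        problems.append("<customer> is missing required attribute 'gstin'")

    # Rule 4: at least one <item> inside <items>.
    if not items.findall("item"):
        problems.append("<items> must contain at least one <item>")

    # Rule 5: simplified decimal check, at most 2 decimal places.
    total_text = (total.text or "").strip()
    if not re.fullmatch(r"\d+(\.\d{1,2})?", total_text):
        problems.append(f"<total> value {total_text!r} has more than 2 decimal places or is not a number")

    return problems


VALID_INVOICE = """<invoice>
  <number>INV-001</number>
  <date>2025-01-15</date>
  <customer gstin="29ABCDE1234F1Z5">Example Traders</customer>
  <items>
    <item><description>Keyboard</description><quantity>1</quantity><price>1499.00</price></item>
  </items>
  <total>1499.00</total>
</invoice>"""

INVALID_INVOICE = """<invoice>
  <number>INV-001</number>
  <date>15-01-2025</date>
  <customer>Example Traders</customer>
  <items/>
  <total>1499.999</total>
</invoice>"""

print(check_invoice(VALID_INVOICE))  # []
for problem in check_invoice(INVALID_INVOICE):
    print(problem)
```

In real projects you would not write these checks by hand. You describe the rules once in a DTD or XSD and let a validating parser apply them, which is exactly what the next sections are about.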

4. Very Simple DTD Example (you should at least see it once)

invoice.dtd

dtd
<!ELEMENT invoice (number, date, customer, items, total)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT customer (#PCDATA)>
<!ATTLIST customer gstin CDATA #REQUIRED>
<!ELEMENT items (item+)>
<!ELEMENT item (description, quantity, price)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT total (#PCDATA)>

XML that references this DTD

XML

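Problems with DTD (why people mostly moved away):

  • No data types (element content is just text; attributes only get a handful of types such as CDATA, ID, NMTOKEN)
  • No namespace support
  • Very weak constraints (no value ranges, no lengths, only coarse occurrence counts via ?, * and +)
  • No regex or patterns
  • Security issues (external DTDs and entity declarations open the door to XXE and entity-expansion attacks)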
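The "no data types" problem is easy to see in practice. The sketch below assumes the third-party lxml package is installed (`pip install lxml`) and loads the same DTD rules from a string instead of from invoice.dtd. The two sample documents are made up for illustration.

```python
import io

from lxml import etree  # third-party: pip install lxml

INVOICE_DTD = """\
<!ELEMENT invoice (number, date, customer, items, total)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT customer (#PCDATA)>
<!ATTLIST customer gstin CDATA #REQUIRED>
<!ELEMENT items (item+)>
<!ELEMENT item (description, quantity, price)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT total (#PCDATA)>
"""

dtd = etree.DTD(io.StringIO(INVOICE_DTD))

# Right structure, but the values are nonsense (hypothetical sample).
nonsense_values = etree.fromstring("""\
<invoice>
  <number>INV-001</number>
  <date>sometime last week</date>
  <customer gstin="anything">Example Traders</customer>
  <items>
    <item><description>Keyboard</description><quantity>a few</quantity><price>cheap</price></item>
  </items>
  <total>lots</total>
</invoice>""")

# Same structure, but <customer> has no gstin attribute.
missing_attribute = etree.fromstring("""\
<invoice>
  <number>INV-001</number>
  <date>2025-01-15</date>
  <customer>Example Traders</customer>
  <items>
    <item><description>Keyboard</description><quantity>1</quantity><price>1499.00</price></item>
  </items>
  <total>1499.00</total>
</invoice>""")

print(dtd.validate(nonsense_values))    # True: the DTD cannot check dates or numbers
print(dtd.validate(missing_attribute))  # False: a required attribute is missing
for error in dtd.error_log:
    print(error.message)
```

So a DTD can enforce structure (which elements, in which order, which attributes), but a date like "sometime last week" passes without complaint. That gap is the main reason XSD took over.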

5. Modern Way – XML Schema (XSD) – Realistic Example

Very simplified invoice.xsd

XML

XML that references this schema

XML

6. Tools People Actually Use to Validate XML in 2025–2026

| Tool / Environment        | Best for                     | How to use (quick command) |
|---------------------------|------------------------------|----------------------------|
| xmllint (libxml2)         | Command line, scripts, CI/CD | xmllint --noout --schema invoice.xsd document.xml |
| Oxygen XML Editor         | Developers, daily work       | Open file → right-click → Validate → XML Schema |
| XMLSpy                    | Enterprise, complex schemas  | Professional GUI, very powerful |
| Java (JAXP)               | Enterprise applications      | SchemaFactory + Validator (javax.xml.validation) |
| Python (lxml / xmlschema) | Scripting, automation        | etree.XMLSchema in lxml, or the xmlschema package |
| Online validators         | Quick checks                 | freeformatter.com, xmlvalidation.com, liquid-technologies.com |
| BaseX, eXist-db           | XML databases                | Built-in validation |
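As one example from the list above, here is a minimal sketch of XSD validation in Python with lxml (assumed to be installed via `pip install lxml`). The schema is my own simplified reading of the invoice rules from section 3, not an official schema. The type name `Money`, the helper `validate`, and the sample documents are all hypothetical.

```python
from lxml import etree  # third-party: pip install lxml

INVOICE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Decimal with at most 2 digits after the point -->
  <xs:simpleType name="Money">
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="date" type="xs:date"/>
        <xs:element name="customer">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="gstin" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="items">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="description" type="xs:string"/>
                    <xs:element name="quantity" type="xs:positiveInteger"/>
                    <xs:element name="price" type="Money"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="total" type="Money"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = etree.XMLSchema(etree.fromstring(INVOICE_XSD))


def validate(xml_text):
    """Return (is_valid, messages) for one document against the schema."""
    doc = etree.fromstring(xml_text)  # raises XMLSyntaxError if not well-formed
    is_valid = schema.validate(doc)
    messages = [f"line {e.line}: {e.message}" for e in schema.error_log]
    return is_valid, messages


VALID_DOC = """<invoice>
  <number>INV-001</number>
  <date>2025-01-15</date>
  <customer gstin="29ABCDE1234F1Z5">Example Traders</customer>
  <items>
    <item><description>Keyboard</description><quantity>1</quantity><price>1499.00</price></item>
  </items>
  <total>1499.00</total>
</invoice>"""

INVALID_DOC = """<invoice>
  <number>INV-001</number>
  <date>15-01-2025</date>
  <customer>Example Traders</customer>
  <items>
    <item><description>Keyboard</description><quantity>1</quantity><price>1499.00</price></item>
  </items>
  <total>1499.999</total>
</invoice>"""

print(validate(VALID_DOC)[0])  # True
ok, messages = validate(INVALID_DOC)
print(ok)                      # False
for message in messages:
    print(message)
```

The same schema file would work unchanged with xmllint or Java's `SchemaFactory`; only the few lines that load the schema and call the validator differ between tools.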

Quick Summary – XML Validation in One Page

  • Well-formed = correct syntax → always required
  • Valid = correct structure & content according to rules → optional but important in serious systems
  • DTD → old, simple, still exists in legacy code
  • XSD (XML Schema) → modern standard, very powerful, dominant today
  • Validation is not automatic — you must explicitly tell the parser to validate
  • Real-world usage: e-invoice, financial messages, healthcare (HL7 CDA), government data exchange, configuration files
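The point that validation is not automatic is worth seeing once. In this small sketch, Python's standard `xml.etree.ElementTree` parser (which is non-validating) reads a made-up document that carries its own DTD rules and plainly breaks them, yet parsing succeeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD says <invoice> contains
# <number> then <total>, but the body contains something else entirely.
DOC = """<!DOCTYPE invoice [
  <!ELEMENT invoice (number, total)>
  <!ELEMENT number (#PCDATA)>
  <!ELEMENT total (#PCDATA)>
]>
<invoice>
  <surprise>not declared anywhere</surprise>
</invoice>"""

root = ET.fromstring(DOC)  # no error: only well-formedness is checked
child_tags = [child.tag for child in root]
print(child_tags)  # ['surprise']
```

If you need the rules enforced, you have to use a validating tool and ask for validation explicitly, as the tools in section 6 do.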

Topics worth going deeper into next:

  • Writing a realistic XSD from scratch (step by step)
  • How to validate with namespaces (a very common source of errors)
  • Validation in code (Java, Python, JavaScript examples)
  • Differences between XSD 1.0 and 1.1 (assertions, etc.)
  • Common validation error messages and how to fix them
  • How GST e-invoice validation really works in India
