Chapter 17: XML Validator
XML Validation, explained patiently and step by step, as if I were sitting next to you. The goal is for you to really understand the concepts, see the differences, and know how people actually validate XML in real projects in 2025–2026.
We will go slowly and clearly:
- What does “validation” actually mean?
- Why do we need it?
- The two main kinds of validation
- Tools & methods people use today
- Many realistic examples
- Common mistakes & how to avoid them
Let’s start.
1. What Does “XML Validation” Really Mean?
Validation = checking whether an XML document follows the rules that were defined for it.
There are two completely different levels of correctness in XML:
| Level | Name | What it checks | Required? | Error message example |
|---|---|---|---|---|
| Level 1 | Well-formedness | Syntax rules (tags closed, nested properly, quotes, etc.) | Mandatory | “Missing closing tag”, “Attribute not quoted” |
| Level 2 | Validity | Whether the content follows a specific structure (allowed elements, order, data types, required fields…) | Optional | “Element <price> is missing required attribute ‘currency’” |
Most important sentence you should remember:
Every XML document must be well-formed. Only some XML documents are also valid.
Analogy:
- Well-formed = the book has correct grammar, punctuation, chapters are properly numbered
- Valid = the book also follows the official style guide of the publisher (e.g. every chapter must have exactly 3 subsections, no footnotes allowed in chapter 1, prices must be in INR format…)
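To make Level 1 concrete, here is a minimal sketch of a well-formedness check using only Python's standard library. The function name `well_formedness_error` and the sample documents are made up for illustration. Note that the last sample parses without complaint: a non-validating parser knows nothing about which content is allowed, only about syntax.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical sample documents, keyed by what they illustrate.
samples = {
    "properly nested": "<book><title>XML Basics</title></book>",
    "unclosed <title>": "<book><title>XML Basics</book>",
    "unquoted attribute": "<book id=17/>",
    # Well-formed, even though a schema could reject "cheap" as a price.
    "odd content": "<book><price>cheap</price></book>",
}

for label, text in samples.items():
    error = well_formedness_error(text)
    status = "well-formed" if error is None else f"NOT well-formed ({error})"
    print(f"{label}: {status}")
```

Checking validity (Level 2) needs a schema and a validating tool on top of this.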
2. The Two Main Ways to Define “Valid XML”
| Method | File name / format | What it is | Still widely used in 2025–2026? | Main users today |
|---|---|---|---|---|
| DTD | .dtd | Document Type Definition – old, simple, limited | Yes, but declining | Legacy systems, older document formats, very old configs |
| XML Schema (XSD) | .xsd | XML Schema Definition – modern, powerful | Very widely used | e-invoice, finance, government, healthcare, EDI |
Reality in 2025–2026:
- DTD → still exists in many old systems, but almost nobody chooses it for new work
- XSD (XML Schema) → the dominant standard when validation is needed
3. First Example – What “Valid” vs “Invalid” Looks Like
XML document (very simple invoice)
```xml
<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <number>INV-2025-0789</number>
  <date>2025-07-25</date>
  <customer>GSTIN:29AABCM9876Q1Z5</customer>
  <items>
    <item>
      <description>Laptop Dell XPS</description>
      <quantity>1</quantity>
      <price>145000.00</price>
    </item>
  </items>
  <total>145000.00</total>
</invoice>
```
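Now imagine we have these schema rules (a simplified description of what an XSD would enforce):
- <invoice> must have exactly these children, in this order:
  - <number> (required, string)
  - <date> (required, xs:date format)
  - <customer> (required, must have attribute gstin)
  - <items> (required, at least 1 <item>)
  - <total> (required, decimal with max 2 decimal places)

The document above is well-formed, but it does not satisfy these rules yet: <customer> carries the GSTIN as text instead of in a gstin attribute.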
Valid version (follows all rules)
```xml
<invoice>
  <number>INV-2025-0789</number>
  <date>2025-07-25</date>
  <customer gstin="29AABCM9876Q1Z5">Creative Minds Academy</customer>
  <items>
    <item>
      <description>Laptop Dell XPS</description>
      <quantity>1</quantity>
      <price currency="INR">145000.00</price>
    </item>
  </items>
  <total currency="INR">145000.00</total>
</invoice>
```
Invalid versions (will be rejected by validator):
```xml
<!-- Missing required <date> -->
<invoice>
  <number>INV-2025-0789</number>
  <customer gstin="29AABCM9876Q1Z5">ABC Corp</customer>
  ...
</invoice>

<!-- Wrong order -->
<invoice>
  <date>2025-07-25</date>
  <number>INV-2025-0789</number>  <!-- wrong order -->
  ...
</invoice>

<!-- Invalid data type -->
<date>25-07-2025</date>  <!-- not xs:date format -->

<!-- Missing required attribute -->
<customer>Creative Minds</customer>  <!-- gstin attribute required -->
```
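To see what these rules amount to, the sketch below checks them by hand with Python's standard library. This is not how a real schema validator works (that is what the XSD is for) and it is deliberately simplified: for instance, real xs:date also allows an optional time zone. The name `check_invoice` and the sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

EXPECTED_CHILDREN = ["number", "date", "customer", "items", "total"]


def _is_iso_date(text):
    """Simplified xs:date check: YYYY-MM-DD and a real calendar date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def check_invoice(xml_text):
    """Return a list of rule violations; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []

    # Rule: exactly these children, in this order.
    actual = [child.tag for child in root]
    if actual != EXPECTED_CHILDREN:
        problems.append(f"expected children {EXPECTED_CHILDREN}, found {actual}")

    # Rule: <date> must look like an xs:date.
    date_el = root.find("date")
    if date_el is not None and not _is_iso_date((date_el.text or "").strip()):
        problems.append(f"<date> is not in YYYY-MM-DD format: {date_el.text!r}")

    # Rule: <customer> must carry a gstin attribute.
    customer = root.find("customer")
    if customer is not None and "gstin" not in customer.attrib:
        problems.append("<customer> is missing required attribute 'gstin'")

    # Rule: <items> needs at least one <item>.
    items = root.find("items")
    if items is not None and items.find("item") is None:
        problems.append("<items> must contain at least one <item>")

    # Rule: <total> is a decimal with at most 2 decimal places.
    total = root.find("total")
    if total is not None and not re.fullmatch(r"\d+(\.\d{1,2})?", (total.text or "").strip()):
        problems.append(f"<total> is not a decimal with max 2 places: {total.text!r}")

    return problems


valid = """<invoice>
  <number>INV-2025-0789</number>
  <date>2025-07-25</date>
  <customer gstin="29AABCM9876Q1Z5">Creative Minds Academy</customer>
  <items><item><description>Laptop Dell XPS</description>
    <quantity>1</quantity><price currency="INR">145000.00</price></item></items>
  <total currency="INR">145000.00</total>
</invoice>"""

# Derive the invalid variants from the valid document.
missing_date = valid.replace("<date>2025-07-25</date>", "")
wrong_date = valid.replace("2025-07-25", "25-07-2025")
missing_gstin = valid.replace(' gstin="29AABCM9876Q1Z5"', "")

for name, doc in [("valid", valid), ("missing date", missing_date),
                  ("wrong date", wrong_date), ("missing gstin", missing_gstin)]:
    print(name, "->", check_invoice(doc) or "OK")
```

Writing such checks by hand quickly becomes tedious and error-prone, which is exactly why schema languages exist.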
4. Very Simple DTD Example (you should at least see it once)
invoice.dtd
```dtd
<!ELEMENT invoice (number, date, customer, items, total)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT customer (#PCDATA)>
<!ATTLIST customer gstin CDATA #REQUIRED>
<!ELEMENT items (item+)>
<!ELEMENT item (description, quantity, price)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT total (#PCDATA)>
```
XML that references this DTD
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE invoice SYSTEM "invoice.dtd">
<invoice>
  ...
</invoice>
```
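Problems with DTD (why people mostly moved away):
- No real data types (element content is plain text; attributes get only a handful of types such as CDATA, ID, and NMTOKEN)
- No namespace support
- Very weak constraints (no numeric ranges, no length limits)
- No regex or patterns
- Security issues (DTD processing is the entry point for XXE and entity-expansion attacks such as “billion laughs”)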
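Because of the security point, one common defensive policy for untrusted input is to refuse any document that carries a DOCTYPE at all. The sketch below shows one way to do that with the standard library's expat bindings. It is an illustration under that assumption, not a complete hardening recipe, and the names `DoctypeNotAllowed` and `parse_rejecting_doctype` are made up. This policy would also reject the DTD-referencing example above, so it only suits inputs that are not expected to use a DTD.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when a document contains a <!DOCTYPE ...> declaration."""


def parse_rejecting_doctype(xml_text):
    """Parse xml_text for well-formedness, but abort at the first DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees the start of a DOCTYPE,
        # before any entity declared inside it is processed.
        raise DoctypeNotAllowed(f"DOCTYPE '{name}' is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True = this is the final chunk of input


def accepts(xml_text):
    """Return True if the document parses and has no DOCTYPE."""
    try:
        parse_rejecting_doctype(xml_text)
    except DoctypeNotAllowed:
        return False
    return True


plain = "<invoice><number>INV-2025-0789</number></invoice>"
external_dtd = '<!DOCTYPE invoice SYSTEM "invoice.dtd"><invoice/>'
internal_entity = '<!DOCTYPE lol [<!ENTITY a "aaaa">]><lol>&a;</lol>'

print(accepts(plain))            # document without a DOCTYPE
print(accepts(external_dtd))     # references an external DTD
print(accepts(internal_entity))  # declares an entity inline
```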
5. Modern Way – XML Schema (XSD) – Realistic Example
Very simplified invoice.xsd
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="date" type="xs:date"/>

        <xs:element name="customer">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="gstin" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>

        <xs:element name="items">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="description" type="xs:string"/>
                    <xs:element name="quantity" type="xs:positiveInteger"/>
                    <xs:element name="price">
                      <xs:complexType>
                        <xs:simpleContent>
                          <xs:extension base="xs:decimal">
                            <xs:attribute name="currency" type="xs:string" use="required"/>
                          </xs:extension>
                        </xs:simpleContent>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>

        <xs:element name="total">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:decimal">
                <xs:attribute name="currency" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>

      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```
XML that references this schema
```xml
<?xml version="1.0" encoding="UTF-8"?>
<invoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="invoice.xsd">
  <number>INV-2025-0789</number>
  <date>2025-07-25</date>
  <customer gstin="29AABCM9876Q1Z5">Creative Minds Academy</customer>
  ...
</invoice>
```
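It helps to realise that xsi:noNamespaceSchemaLocation is just an ordinary namespaced attribute that acts as a hint. A parser that is not asked to validate simply reports it like any other attribute. The small sketch below reads the hint with Python's standard library; the function name `schema_hint` and the shortened sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the xsi: prefix used in the instance document.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hint(xml_text):
    """Return the value of xsi:noNamespaceSchemaLocation on the root, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced attributes as "{namespace-uri}local-name".
    return root.get(f"{{{XSI}}}noNamespaceSchemaLocation")


with_hint = """<invoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="invoice.xsd">
  <number>INV-2025-0789</number>
</invoice>"""

without_hint = "<invoice><number>INV-2025-0789</number></invoice>"

print(schema_hint(with_hint))
print(schema_hint(without_hint))
```

Many validation setups ignore this hint and pass the schema to the validator explicitly, so that a document cannot choose the rules it is checked against.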
6. Tools People Actually Use to Validate XML in 2025–2026
| Tool / Environment | Best for | How to use (quick command) |
|---|---|---|
| xmllint (libxml2) | Command line, scripts, CI/CD | xmllint --noout --schema invoice.xsd document.xml |
| Oxygen XML Editor | Developers, daily work | Open file → right-click → Validate → XML Schema |
| XMLSpy | Enterprise, complex schemas | Professional GUI, very powerful |
| Java (JAXP) | Enterprise applications | SchemaFactory + Validator |
| Python (lxml, xmlschema) | Scripting, automation | lxml (etree.XMLSchema) or the xmlschema library |
| Online validators | Quick checks | freeformatter.com, xmlvalidation.com, liquid-technologies.com |
| BaseX, eXist-db | XML databases | Built-in validation |
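As an illustration of the Python row, here is a minimal sketch of XSD validation with lxml. It assumes the third-party lxml package is installed (`pip install lxml`). To stay short it uses a cut-down, hypothetical schema with only <number> and <date>, not the full invoice.xsd from section 5; the helper name `validation_errors` is also made up.

```python
from lxml import etree

# Cut-down schema for illustration only: <invoice> with <number> then <date>.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="date" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile the schema once; it can then validate many documents.
schema = etree.XMLSchema(etree.fromstring(XSD))


def validation_errors(xml_bytes):
    """Return a list of validation messages; an empty list means valid."""
    doc = etree.fromstring(xml_bytes)  # raises XMLSyntaxError if not well-formed
    if schema.validate(doc):
        return []
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]


good = b"<invoice><number>INV-2025-0789</number><date>2025-07-25</date></invoice>"
bad_date = b"<invoice><number>INV-2025-0789</number><date>25-07-2025</date></invoice>"
bad_order = b"<invoice><date>2025-07-25</date><number>INV-2025-0789</number></invoice>"

for name, doc in [("good", good), ("bad date", bad_date), ("bad order", bad_order)]:
    print(name, "->", validation_errors(doc) or "valid")
```

In a real project you would typically load the schema from its file with `etree.parse("invoice.xsd")` instead of an inline string.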
Quick Summary – XML Validation in One Page
- Well-formed = correct syntax → always required
- Valid = correct structure & content according to rules → optional but important in serious systems
- DTD → old, simple, still exists in legacy code
- XSD (XML Schema) → modern standard, very powerful, dominant today
- Validation is not automatic: most parsers only check well-formedness unless you explicitly tell them to validate
- Real-world usage: e-invoice, financial messages, healthcare (HL7 CDA), government data exchange, configuration files
Would you like to go deeper into any of these next?
- Writing a realistic XSD from scratch (step by step)
- How to validate with namespaces (very common issue)
- Validation in code (Java, Python, JavaScript examples)
- Difference between XSD 1.0 vs 1.1 (assertions, etc.)
- Common validation error messages and how to fix them
- How GST e-invoice validation really works in India
Just tell me what interests you most right now! 😊
