Chapter 18: XML DTD
1. What is a DTD? (The clearest explanation)
DTD = Document Type Definition
A DTD is a set of rules that describes:
- Which elements are allowed in the XML document
- In what order they must appear
- Which attributes each element can have
- Whether elements/attributes are required or optional
- What kind of content each element can contain (text, other elements, both…)
Think of a DTD as a very strict blueprint or building code that says:
“If you want to build a house (XML document) in this style, then every house must have exactly one front door (<root>), at least two windows (<child>), no swimming pool on the roof, and the door must have a number attribute…”
Key point:
- A document that follows all DTD rules → valid
- A document that follows syntax rules but not DTD rules → well-formed but invalid
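The well-formed half of that distinction can be checked with Python's standard library alone. The sketch below is only an illustration: `is_well_formed` is a hypothetical helper name, and `xml.etree.ElementTree` is a non-validating parser, so it checks syntax but never applies DTD rules.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (syntax only, no DTD rules)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<student><name>Priyanka</name></student>"
bad = "<student><name>Priyanka</student>"  # <name> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A document that passes this check may still be invalid. Deciding validity needs a validating parser that reads the DTD.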
2. Two Ways to Use a DTD
| Way | Syntax in XML file | When people use it |
|---|---|---|
| Internal DTD | Inside the XML file itself (inside <!DOCTYPE … >) | Quick tests, small documents, learning |
| External DTD | Separate .dtd file referenced with SYSTEM or PUBLIC | Real projects, reusable rules, company/gov standards |
Of the two, the external form is the more common today, although DTD use in general is declining.
3. Very First Realistic Example – Internal DTD
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE student [
  <!ELEMENT student (rollno, name, class, section, marks)>
  <!ELEMENT rollno (#PCDATA)>
  <!-- An element name may be declared only once; this one declaration of name serves both student and subject -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT class (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
  <!ELEMENT marks (subject+)>
  <!ELEMENT subject (name, score)>
  <!ELEMENT score (#PCDATA)>
]>
<student>
  <rollno>101</rollno>
  <name>Priyanka Reddy</name>
  <class>XI</class>
  <section>A</section>
  <marks>
    <subject>
      <name>Maths</name>
      <score>94</score>
    </subject>
    <subject>
      <name>Physics</name>
      <score>88</score>
    </subject>
  </marks>
</student>
What this DTD says:
- Root element must be <student>
- <student> must contain exactly these children, in this order: rollno, name, class, section, marks
- rollno, name, class, section and score contain only text (#PCDATA)
- <marks> must contain one or more (+) <subject> elements
- Each <subject> must contain exactly one <name> followed by one <score>
Invalid examples (validator would reject these):
<!-- Missing <section> -->
<student>
  <rollno>101</rollno>
  <name>Priyanka</name>
  <class>XI</class>
  <marks>...</marks>
</student>

<!-- Wrong order -->
<student>
  <name>...</name>
  <rollno>...</rollno> <!-- wrong order -->
</student>

<!-- <marks> has no <subject> -->
<marks></marks> <!-- + means at least one -->

<!-- Extra element not allowed -->
<student>
  ...
  <phone>9876543210</phone> <!-- not declared in DTD -->
</student>
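One way to see why a validator rejects these documents is to treat a content model as a regular expression over the names of an element's children. The sketch below does that with Python's standard library. It is a simplified illustration, not a real validator: `model_to_regex` and `children_match` are hypothetical helper names, and only element-only models built from `,`, `|`, `?`, `*` and `+` are handled (no #PCDATA, EMPTY or ANY).

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD element-content model into a regex over child names."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[()|,?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)  # same meaning in DTD and regex syntax
        else:
            # Each child name is followed by a space so names cannot run together.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(element: ET.Element, model: str) -> bool:
    """Check the direct children of element against the content model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None

STUDENT_MODEL = "(rollno, name, class, section, marks)"
valid = ET.fromstring("<student><rollno/><name/><class/><section/><marks/></student>")
missing = ET.fromstring("<student><rollno/><name/><class/><marks/></student>")
swapped = ET.fromstring("<student><name/><rollno/><class/><section/><marks/></student>")

print(children_match(valid, STUDENT_MODEL))    # True
print(children_match(missing, STUDENT_MODEL))  # False: <section> is missing
print(children_match(swapped, STUDENT_MODEL))  # False: wrong order
print(children_match(ET.fromstring("<marks/>"), "(subject+)"))  # False: + needs one
```

The check looks at one element at a time. A real validator applies the matching declaration to every element in the document and also checks text content and attributes.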
4. Most Important DTD Building Blocks (Cheat Sheet)
| Declaration | Meaning / Example | Most common usage |
|---|---|---|
| <!ELEMENT name (child1, child2)> | Element with exactly these children in order | Strict structure |
| <!ELEMENT name (child1 \| child2)> | Element can have either one or the other | Choice between alternatives |
| <!ELEMENT name (child*)> | Zero or more children | Optional repeating items |
| <!ELEMENT name (child+)> | One or more children | Required repeating items |
| <!ELEMENT name (#PCDATA)> | Only text content | Leaf elements |
| <!ELEMENT name EMPTY> | No content allowed | Empty tags (<br/>) |
| <!ELEMENT name ANY> | Anything allowed (very loose) | Transitional / debugging |
| <!ATTLIST element attr CDATA #REQUIRED> | Attribute is mandatory | Required fields |
| <!ATTLIST element attr CDATA #IMPLIED> | Attribute is optional | Optional fields |
| <!ATTLIST element attr (yes \| no) "yes"> | Enumeration – fixed list of values, with "yes" as the default | Controlled values |
| <!ATTLIST element attr ID #REQUIRED> | Unique identifier | Linking with IDREF |
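The attribute rows of the table can be made concrete with a small checker. The sketch below is an illustration under stated assumptions: the `ATTLIST` dictionary is a hypothetical in-memory stand-in for parsed <!ATTLIST> declarations (a real validator reads them from the DTD), `check_attributes` is a made-up helper, and only #REQUIRED, #IMPLIED, enumerations with a default, ID uniqueness and IDREF resolution are covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ATTLIST declarations:
# (element, attribute) -> (type, default).
# type is "CDATA", "ID", "IDREF", or a tuple of allowed values;
# default is "#REQUIRED", "#IMPLIED", or a literal default value.
ATTLIST = {
    ("person", "id"): ("ID", "#REQUIRED"),
    ("employee", "manager"): ("IDREF", "#IMPLIED"),
    ("order", "status"): (("pending", "processing", "shipped", "delivered"), "pending"),
}

def check_attributes(root: ET.Element) -> list:
    """Return a list of attribute-rule violations found under root."""
    errors, ids, idrefs = [], set(), []
    for elem in root.iter():
        for (el_name, attr), (attr_type, default) in ATTLIST.items():
            if el_name != elem.tag:
                continue
            value = elem.get(attr)
            if value is None:
                if default == "#REQUIRED":
                    errors.append(f"<{elem.tag}> is missing required attribute '{attr}'")
                    continue
                if default == "#IMPLIED":
                    continue  # optional and absent: nothing to check
                value = default  # a literal default applies when the attribute is absent
            if isinstance(attr_type, tuple) and value not in attr_type:
                errors.append(f"<{elem.tag}> {attr}='{value}' is not an allowed value")
            elif attr_type == "ID":
                if value in ids:
                    errors.append(f"duplicate ID '{value}'")
                ids.add(value)
            elif attr_type == "IDREF":
                idrefs.append(value)
    # IDREFs are resolved last because an ID may appear later in the document.
    errors += [f"IDREF '{ref}' does not match any ID" for ref in idrefs if ref not in ids]
    return errors

doc = ET.fromstring(
    "<company>"
    "<person id='p1'/><person id='p1'/>"
    "<employee manager='p9'/>"
    "<order status='lost'/><order/>"
    "</company>"
)
errors = check_attributes(doc)
for message in errors:
    print(message)
```

Run on this sample, the checker reports the repeated ID, the status value outside the enumeration, and the IDREF that points at no ID. The second <order/> passes because the default value fills in for the missing attribute.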
5. Real-World Style Example – External DTD (what you see in serious files)
invoice.dtd
<!ELEMENT invoice (header, seller, buyer, items, totals)>
<!ATTLIST invoice
  number CDATA #REQUIRED
  date CDATA #REQUIRED
>
<!ELEMENT header (invoiceNumber, issueDate, dueDate)>
<!ELEMENT invoiceNumber (#PCDATA)>
<!ELEMENT issueDate (#PCDATA)>
<!ELEMENT dueDate (#PCDATA)>
<!ELEMENT seller (name, gstin, address?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT gstin (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT buyer (name, gstin)>
<!ELEMENT items (item+)>
<!ELEMENT item (description, quantity, rate, amount)>
<!ATTLIST item line CDATA #REQUIRED>
<!ELEMENT description (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT rate (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT totals (subtotal, tax, grandTotal)>
<!ELEMENT subtotal (#PCDATA)>
<!ELEMENT tax (#PCDATA)>
<!ELEMENT grandTotal (#PCDATA)>
XML file that uses it
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE invoice SYSTEM "invoice.dtd">
<invoice number="INV/2025/0789" date="2025-07-25">
  <header>
    <invoiceNumber>INV/2025/0789</invoiceNumber>
    <issueDate>2025-07-25</issueDate>
    <dueDate>2025-08-10</dueDate>
  </header>
  ...
</invoice>
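To see how a parser reads the <!DOCTYPE> line, the sketch below uses the standard-library `xml.parsers.expat` module and its `StartDoctypeDeclHandler` callback. `read_doctype` is a hypothetical helper name. Expat is non-validating and, as configured here, does not open invoice.dtd; it only reports which DTD the document points to.

```python
import xml.parsers.expat

def read_doctype(xml_text: str):
    """Return the DOCTYPE details reported by the parser, or None if absent."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.update(
            root=name,
            system_id=system_id,
            public_id=public_id,
            internal_subset=bool(has_internal_subset),
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return found or None

external = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE invoice SYSTEM "invoice.dtd">\n'
    '<invoice number="INV/2025/0789" date="2025-07-25"/>'
)
internal = "<!DOCTYPE student [<!ELEMENT student (#PCDATA)>]><student>x</student>"

info = read_doctype(external)
print(info["root"], info["system_id"], info["internal_subset"])
# invoice invoice.dtd False
print(read_doctype(internal)["internal_subset"])
# True
```

The system identifier is a plain relative or absolute URI, so a validating tool resolves invoice.dtd relative to the location of the XML file.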
6. Very Common DTD Patterns You Will See
- ID / IDREF for internal linking
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST employee manager IDREF #IMPLIED>
- Enumeration for controlled values
<!ATTLIST order status (pending | processing | shipped | delivered) "pending">
- Mixed content (text + elements)
<!ELEMENT description (#PCDATA | bold | italic)*>
7. Honest Reality in 2025–2026
| Situation | What actually happens today |
|---|---|
| New projects needing validation | Use XSD (XML Schema) |
| Very old systems / legacy document formats | Many still use DTD (SOAP is not one of them: SOAP messages may not contain a DOCTYPE declaration) |
| Configuration files (web.xml, etc.) | Older versions reference an external DTD; newer ones use XSD or no validation |
| Learning XML | DTD is still taught first because it’s simpler |
| Government / enterprise standards | Mostly XSD now (GST e-invoice, ISO 20022, HL7, etc.) |
Bottom line:
DTD is legacy technology — you should understand it, but almost never choose it for new work.
