Chapter 2: Introduction to XML
This is an introduction to XML, written the way a patient teacher would explain it to a beginner (or to someone who has forgotten the basics and wants to understand them properly).
I will go slowly, use lots of everyday analogies, show many small and medium examples, point out common beginner mistakes, and explain why things are done the way they are.
Let’s begin.
What is XML really? (The simplest honest explanation)
XML = eXtensible Markup Language
It is not a programming language. It is not a database. It is not a replacement for JSON (although people compare them a lot).
XML is a very strict, very readable way to label and organize information so that both humans and computers can understand the meaning and structure of the data.
Think of XML as:
A librarian who is extremely pedantic about putting clear labels on every shelf, box, and item inside the library.
Example everyone understands:
```xml
<person>
  <name>Ananya Mehta</name>
  <age>24</age>
  <city>Hyderabad</city>
  <isStudent>true</isStudent>
</person>
```
Everyone who reads this immediately understands:
- This is information about one person
- We know her name, age, city, and whether she is a student
The labels (<name>, <age>, etc.) are telling us what each piece of data means.
That is the whole philosophy of XML.
Why was XML created? (Very short history – just the important part)
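XML was designed at the W3C between 1996 and 1998 (XML 1.0 became a W3C Recommendation in February 1998). At that time, data was exchanged in many different formats: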
- CSV (comma separated values) → very flat, no hierarchy
- Custom text formats (very hard to read and error-prone)
- Binary formats (not human readable at all)
Problem: Companies and organizations could not easily exchange data with each other.
So XML was invented with one main goal:
“Let’s create a format that is human-readable, self-describing, hierarchical, and extensible — so anyone can understand it even years later.”
It became extremely popular between ~2000–2015 and is still very widely used today (even if JSON became more fashionable for new APIs).
The absolute most important rules of XML (you must remember these)
| # | Rule | Why it matters | Wrong example | Correct example |
|---|---|---|---|---|
| 1 | Exactly one root element | One element must contain everything else in the document | <book>..</book><pen>..</pen> | <library><book>..</book><pen>..</pen></library> |
| 2 | Every opening tag must have a closing tag | Otherwise the parser cannot tell where the element ends | <name>Riya | <name>Riya</name> |
| 3 | Tags must be properly nested (no crossing) | Like Russian dolls: inner ones close first | <b><i>text</b></i> | <b><i>text</i></b> |
| 4 | Tag names are case-sensitive | <Person> ≠ <person> | <Person>..</person> | <Person>..</Person> |
| 5 | Attribute values must be in quotes | Otherwise the parser rejects the document | id=101 | id="101" or id='101' |
| 6 | Special characters in text must be escaped | < and & would be read as markup; > is escaped by convention | Price < 500 & discount > 10% | Price &lt; 500 &amp; discount &gt; 10% |
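If you remember only these 6 rules, most of your XML will be well-formed, meaning a parser can read it without errors. That is the first level of being correct; checking the document against a schema (being valid) is a separate, second level.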
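To see the six rules in action, here is a minimal sketch in Python that feeds one wrong and one correct snippet per rule to the standard-library parser `xml.etree.ElementTree`. The helper name `is_well_formed` and the snippets are illustrative assumptions loosely based on the table above, not part of any official tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One (wrong, correct) pair per rule; the snippets are made up for illustration.
examples = {
    "1 one root": ("<book/><pen/>", "<library><book/><pen/></library>"),
    "2 closing tag": ("<name>Riya", "<name>Riya</name>"),
    "3 nesting": ("<b><i>text</b></i>", "<b><i>text</i></b>"),
    "4 case": ("<Person>x</person>", "<Person>x</Person>"),
    "5 quotes": ("<item id=101/>", '<item id="101"/>'),
    "6 escaping": (
        "<p>Price < 500 & discount > 10%</p>",
        "<p>Price &lt; 500 &amp; discount &gt; 10%</p>",
    ),
}

# Collect the parser's verdict for each pair and print a small report.
results = {
    rule: (is_well_formed(wrong), is_well_formed(correct))
    for rule, (wrong, correct) in examples.items()
}

for rule, (wrong_ok, correct_ok) in results.items():
    print(f"{rule:15} wrong accepted: {wrong_ok}   correct accepted: {correct_ok}")
```

Every "wrong" snippet is rejected and every "correct" one is accepted. Note that this only tests well-formedness; it says nothing about whether the data makes sense for your application.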
First real example – very simple but complete
```xml
<?xml version="1.0" encoding="UTF-8"?>
<student rollNumber="A102">
  <fullName>Priyanka Reddy</fullName>
  <class>10</class>
  <section>A</section>
  <marks>
    <english>88</english>
    <maths>94</maths>
    <science>91</science>
    <social>85</social>
  </marks>
  <attendancePercentage>96.4</attendancePercentage>
  <isActive>true</isActive>
</student>
```
What we learn from this small document:
- <?xml … ?> → declaration (optional but very good habit)
- <student> → root element (only one allowed)
- rollNumber="A102" → attribute
- <marks> → parent element (contains children)
- <english>88</english> → element with text content
- Numbers, booleans, strings — everything is stored as text between tags
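The last point surprises many beginners, so here is a small sketch of what "everything is text" means in practice. It parses a trimmed copy of the student document with Python's standard `xml.etree.ElementTree` and converts the values by hand. The conversion choices (`int`, `float`, comparing with `"true"`) are assumptions made for this example; XML itself does not say what type a value has.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the student document (declaration and a few fields omitted).
student_xml = """<student rollNumber="A102">
  <fullName>Priyanka Reddy</fullName>
  <marks>
    <english>88</english>
    <maths>94</maths>
    <science>91</science>
    <social>85</social>
  </marks>
  <attendancePercentage>96.4</attendancePercentage>
  <isActive>true</isActive>
</student>"""

student = ET.fromstring(student_xml)

# The parser hands back plain strings, even for values that look like numbers.
raw_maths = student.findtext("marks/maths")
print(type(raw_maths).__name__, raw_maths)  # str 94

# Converting to real types is the reader's job.
marks = {subject.tag: int(subject.text) for subject in student.find("marks")}
attendance = float(student.findtext("attendancePercentage"))
is_active = student.findtext("isActive") == "true"
total_marks = sum(marks.values())

print(marks, attendance, is_active, total_marks)
```

This is one reason schemas exist: they let both sides agree that, for example, maths must always be an integer.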
Elements vs Attributes – The golden question
This is the question almost every beginner asks:
Should I put this information in an element or as an attribute?
Here’s the most practical modern advice (2024–2026 style):
| Use elements when … | Use attributes when … |
|---|---|
| The value can be long or multi-line | The value is short, fixed, identifier-like |
| You think you might want to add sub-elements later | It feels like metadata or property |
| It is the main content or noun | It is additional information about the element |
| Examples: name, address, description, comment, price | Examples: id, code, date, version, status, type |
Very common real-world pattern (this pattern is used in millions of systems):
```xml
<product id="P-784512" category="electronics" inStock="true" discountPercent="15">
  <name>Noise ColorFit Ultra 3 Smartwatch</name>
  <brand>Noise</brand>
  <price>3499.00</price>
  <color>Jet Black</color>
  <description>1.96" AMOLED display, Bluetooth calling, 100+ sports modes</description>
  <images>
    <image url="https://example.com/img1.jpg" primary="true"/>
    <image url="https://example.com/img2.jpg"/>
  </images>
</product>
```
See the logic?
- id, category, inStock, discountPercent → attributes (short, descriptive)
- name, price, description, images → elements (more important or structured)
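The difference also shows up when a program reads the document. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the product: attributes come back as flat key/value pairs through `.get()`, while child elements are reached by path and can repeat or nest. The `warranty` attribute is hypothetical and does not exist in the document; it is only there to show what happens when an attribute is missing.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Shortened copy of the product document from the example above.
product_xml = """<product id="P-784512" category="electronics" inStock="true" discountPercent="15">
  <name>Noise ColorFit Ultra 3 Smartwatch</name>
  <price>3499.00</price>
  <images>
    <image url="https://example.com/img1.jpg" primary="true"/>
    <image url="https://example.com/img2.jpg"/>
  </images>
</product>"""

product = ET.fromstring(product_xml)

# Attributes: one value per name, read directly from the element.
product_id = product.get("id")
discount_percent = int(product.get("discountPercent"))
warranty = product.get("warranty", "not specified")  # hypothetical, absent attribute

# Child elements: reached by path; they can repeat and carry their own attributes.
name = product.findtext("name")
price = Decimal(product.findtext("price"))
image_urls = [image.get("url") for image in product.iter("image")]
primary_urls = [
    image.get("url") for image in product.iter("image") if image.get("primary") == "true"
]

print(product_id, discount_percent, warranty)
print(name, price)
print(image_urls, primary_urls)
```

Notice that `images` could only be modelled as elements: an element cannot carry two attributes with the same name, but it can have any number of `image` children.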
Empty elements (two correct ways)
Both are 100% valid:
```xml
<!-- Style 1 – short form (most popular today) -->
<image src="profile.jpg" alt="User photo"/>

<!-- Style 2 – long form -->
<image src="profile.jpg" alt="User photo"></image>
```
Most modern XML uses the short form <tag/> when there is no content.
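If you want to convince yourself that the two styles really mean the same thing, a quick check like the following may help. It is a sketch using Python's standard `xml.etree.ElementTree`; the helper `describe` is a made-up name for this example.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<image src="profile.jpg" alt="User photo"/>')
long_form = ET.fromstring('<image src="profile.jpg" alt="User photo"></image>')


def describe(element):
    """Summarize an element as (tag, attributes, text, number of children)."""
    return (element.tag, dict(element.attrib), element.text, len(element))


# Both styles produce the same parsed element.
same = describe(short_form) == describe(long_form)
print(same)  # True

# When written back out, this library uses the short form for empty elements.
print(ET.tostring(long_form, encoding="unicode"))
```

So the choice between the two is purely about readability; a parser does not care.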
Very common special characters you must escape
| You want to write … | You must write in XML … |
|---|---|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " (inside a double-quoted attribute) | &quot; |
| ' (inside a single-quoted attribute) | &apos; |
Real example:
```xml
<message>
  Price is &lt; ₹500 &amp; discount &gt; 20%
  He said: &quot;This is amazing!&quot;
</message>
```
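In real programs you rarely type these escapes by hand. As a sketch of how a library can do it for you, the example below uses two parts of Python's standard library: `xml.sax.saxutils.escape` for escaping a string directly, and `xml.etree.ElementTree`, which escapes text automatically when it writes a document. The sample strings are taken from the message above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Price is < ₹500 & discount > 20%"

# escape() replaces &, < and > by default.
escaped = escape(raw)
print(escaped)  # Price is &lt; ₹500 &amp; discount &gt; 20%

# Extra replacements (here the double quote) can be passed as a dictionary.
quoted = escape('He said: "This is amazing!"', {'"': "&quot;"})
print(quoted)  # He said: &quot;This is amazing!&quot;

# ElementTree escapes for you: assign the raw text and serialize.
message = ET.Element("message")
message.text = raw
serialized = ET.tostring(message, encoding="unicode")
print(serialized)

# Parsing the result gives back the original, unescaped text.
round_trip = ET.fromstring(serialized).text
print(round_trip == raw)  # True
```

The takeaway: build XML with a library whenever you can, and the escaping rules take care of themselves.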
One more complete example – something people actually use
Small online order fragment
```xml
<?xml version="1.0" encoding="UTF-8"?>
<order orderId="ORD-20250621-4781" placedOn="2025-06-21" status="processing">
  <customer id="CUST-9382">
    <name>Samarth Jain</name>
    <email>samarth.jain.94@gmail.com</email>
    <phone>+919876543210</phone>
  </customer>

  <shippingAddress type="home">
    <flat>503</flat>
    <building>Sunrise Apartments</building>
    <street>Kondapur Main Road</street>
    <city>Hyderabad</city>
    <state>Telangana</state>
    <pincode>500084</pincode>
  </shippingAddress>

  <items count="2">
    <item>
      <sku>WSH-BLK-M</sku>
      <name>Black Hoodie - Medium</name>
      <quantity>1</quantity>
      <unitPrice>1299.00</unitPrice>
    </item>
    <item>
      <sku>TS-WHT-L</sku>
      <name>White T-Shirt - Large</name>
      <quantity>2</quantity>
      <unitPrice>499.00</unitPrice>
    </item>
  </items>

  <totalAmount currency="INR">2297.00</totalAmount>
</order>
```
This kind of structure is extremely common in:
- E-commerce
- Billing / invoicing
- ERP systems
- Government data exchange
- Many older (and some modern) APIs
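To give a feel for how a receiving system might consume the order fragment shown earlier, here is a hedged sketch that recomputes the total from the line items and compares it with the declared `totalAmount` and the `count` attribute. It uses Python's standard `xml.etree.ElementTree` and `decimal.Decimal` on a shortened copy of the order; the consistency check itself is an assumption about what a receiver might want, not something XML requires.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Shortened copy of the order (customer and address omitted).
order_xml = """<order orderId="ORD-20250621-4781" placedOn="2025-06-21" status="processing">
  <items count="2">
    <item>
      <sku>WSH-BLK-M</sku>
      <quantity>1</quantity>
      <unitPrice>1299.00</unitPrice>
    </item>
    <item>
      <sku>TS-WHT-L</sku>
      <quantity>2</quantity>
      <unitPrice>499.00</unitPrice>
    </item>
  </items>
  <totalAmount currency="INR">2297.00</totalAmount>
</order>"""

order = ET.fromstring(order_xml)
items = order.findall("items/item")

# Decimal avoids floating-point rounding surprises with money.
computed_total = sum(
    (int(item.findtext("quantity")) * Decimal(item.findtext("unitPrice")) for item in items),
    Decimal("0"),
)
declared_total = Decimal(order.findtext("totalAmount"))
declared_count = int(order.find("items").get("count"))
currency = order.find("totalAmount").get("currency")

print(f"items: {len(items)} (declared {declared_count})")
print(f"total: {computed_total} {currency} (declared {declared_total})")
print("consistent:", computed_total == declared_total and len(items) == declared_count)
```

Here 1 × 1299.00 plus 2 × 499.00 gives 2297.00, which matches the declared total, so the document is internally consistent.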
Quick summary – What you should remember today
- XML = structured, labeled, hierarchical data
- One root element only
- Properly nested and closed tags
- Case-sensitive names
- Quoted attributes
- Escape < > & when they are normal text
- Use elements for important/content data
- Use attributes for metadata / identifiers
Where would you like to go next?
- More examples from real life (invoice, resume, configuration, RSS feed…)
- Well-formed vs valid XML (big difference)
- Attributes vs child elements – more decision practice
- CDATA – when you need to put HTML, code, JSON inside XML
- Comments, processing instructions, and XML declaration details
- Naming conventions people use in serious projects
- First look at XML Namespaces (why they exist)
Tell me what feels most interesting or useful for you right now and we’ll continue from there. 😊
