Chapter 2: Introduction to XML

This is an introduction to XML, written the way a patient teacher would explain it to a beginner (or to someone who has forgotten the basics and wants to understand them properly).

I will go slowly, use lots of everyday analogies, show many small and medium examples, point out common beginner mistakes, and explain why things are done the way they are.

Let’s begin.

What is XML really? (The simplest honest explanation)

XML = eXtensible Markup Language

It is not a programming language. It is not a database. It is not a replacement for JSON (although people compare them a lot).

XML is a very strict, very readable way to label and organize information so that both humans and computers can understand the meaning and structure of the data.

Think of XML as:

A librarian who is extremely pedantic about putting clear labels on every shelf, box, and item inside the library.

Example everyone understands:

XML
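A minimal sketch of such a record, wrapped in Python's standard xml.etree.ElementTree so it can be parsed and inspected. The tag names and the values (Riya, 21, Pune) are illustrative assumptions, chosen only to match the description that follows.

```python
import xml.etree.ElementTree as ET

# Hypothetical person record; tag names and values are illustrative.
PERSON_XML = """
<person>
  <name>Riya</name>
  <age>21</age>
  <city>Pune</city>
  <isStudent>true</isStudent>
</person>
"""

person = ET.fromstring(PERSON_XML)

# Each child tag is a label for the piece of data it wraps.
for child in person:
    print(f"{child.tag}: {child.text}")
```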

Everyone who reads this immediately understands:

  • This is information about one person
  • We know her name, age, city, and whether she is a student

The labels (<name>, <age>, etc.) are telling us what each piece of data means.

That is the whole philosophy of XML.

Why was XML created? (Very short history – just the important part)

In the mid-1990s, before XML 1.0 became a W3C Recommendation in 1998, people exchanged data in many different formats:

  • CSV (comma separated values) → very flat, no hierarchy
  • Custom text formats (very hard to read and error-prone)
  • Binary formats (not human readable at all)

Problem: Companies and organizations could not easily exchange data with each other.

So XML was invented with one main goal:

“Let’s create a format that is human-readable, self-describing, hierarchical, and extensible — so anyone can understand it even years later.”

It became extremely popular between ~2000–2015 and is still very widely used today (even if JSON became more fashionable for new APIs).

The absolute most important rules of XML (you must remember these)

| # | Rule | Why it matters | Wrong example | Correct example |
|---|------|----------------|---------------|-----------------|
| 1 | Exactly one root element | An XML document must have exactly one top-level element that contains everything else | <book>..</book><pen>..</pen> | <library><book>..</book></library> |
| 2 | Every opening tag must have a closing tag | Otherwise the structure breaks | <name>Riya | <name>Riya</name> |
| 3 | Tags must be properly nested (no crossing) | Like Russian dolls: inner ones close first | <b><i>text</b></i> | <b><i>text</i></b> |
| 4 | Tag names are case-sensitive | <Person> ≠ <person> | <Person>..</person> | <Person>..</Person> |
| 5 | Attribute values must be in quotes | Otherwise the parser rejects the document | id=101 | id="101" or id='101' |
| 6 | Special characters in text must be escaped | <, >, & have special meaning | Price < 500 & discount > 10% | Price &lt; 500 &amp; discount &gt; 10% |

If you remember only these 6 rules, most of your XML will be well-formed (which is the first level of being correct).
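One way to see these rules in action is to hand each wrong and correct example to a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the wrapper elements <item> and <p> are illustrative assumptions added so that each snippet is a complete document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# (rule, wrong example, corrected example), following the six rules above.
EXAMPLES = [
    ("one root", "<book>a</book><pen>b</pen>", "<library><book>a</book></library>"),
    ("closing tag", "<name>Riya", "<name>Riya</name>"),
    ("nesting", "<b><i>text</b></i>", "<b><i>text</i></b>"),
    ("case", "<Person>x</person>", "<Person>x</Person>"),
    ("quoted attribute", "<item id=101/>", '<item id="101"/>'),
    (
        "escaping",
        "<p>Price < 500 & discount > 10%</p>",
        "<p>Price &lt; 500 &amp; discount &gt; 10%</p>",
    ),
]

for rule, wrong, correct in EXAMPLES:
    print(f"{rule}: wrong={is_well_formed(wrong)}, correct={is_well_formed(correct)}")
```

Every "wrong" snippet is rejected with a ParseError and every "correct" one is accepted, which is what "well-formed" means in practice: a conforming parser can read the document at all.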

First real example – very simple but complete

XML
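A sketch of what such a document might look like, parsed with Python's standard xml.etree.ElementTree. Only <student>, rollNumber="A102", <marks>, and <english>88</english> are named in the notes that follow; the <name> and <maths> elements and their values are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical student record; <name> and <maths> are assumed for illustration.
STUDENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<student rollNumber="A102">
  <name>Riya</name>
  <marks>
    <english>88</english>
    <maths>92</maths>
  </marks>
</student>
"""

student = ET.fromstring(STUDENT_XML)

roll_number = student.get("rollNumber")        # attribute value, always a string
english = student.find("marks/english").text   # element text, also a string

# Everything in XML is text, so numbers must be converted explicitly.
total = sum(int(subject.text) for subject in student.find("marks"))
print(roll_number, english, total)
```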

What we learn from this small document:

  • <?xml … ?> → declaration (optional but very good habit)
  • <student> → root element (only one allowed)
  • rollNumber="A102" → attribute
  • <marks> → parent element (contains children)
  • <english>88</english> → element with text content
  • Numbers, booleans, strings — everything is stored as text between tags

Elements vs Attributes – The golden question

This is the question almost every beginner asks:

Should I put this information in an element or as an attribute?

Here’s the most practical modern advice (2024–2026 style):

| Use elements when … | Use attributes when … |
|---------------------|-----------------------|
| The value can be long or multi-line | The value is short, fixed, identifier-like |
| You think you might want to add sub-elements later | It feels like metadata or a property |
| It is the main content or noun | It is additional information about the element |
| Examples: name, address, description, comment, price | Examples: id, code, date, version, status, type |

Very common real-world pattern (this pattern is used in millions of systems):

XML
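A sketch of this pattern, assuming a hypothetical product record. The attribute and element names follow the breakdown given just after this example; all values, and the <image> child elements, are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical product; short metadata as attributes, content as elements.
PRODUCT_XML = """
<product id="P1001" category="electronics" inStock="true" discountPercent="10">
  <name>Wireless Mouse</name>
  <price>799.00</price>
  <description>Compact mouse with a USB receiver.</description>
  <images>
    <image>mouse-front.jpg</image>
    <image>mouse-side.jpg</image>
  </images>
</product>
"""

product = ET.fromstring(PRODUCT_XML)

# Attributes arrive as a flat dict of strings.
print(product.attrib)

# Elements can hold text or further structure (here, a list of images).
print(product.findtext("name"))
print([image.text for image in product.find("images")])
```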

See the logic?

  • id, category, inStock, discountPercent → attributes (short, descriptive)
  • name, price, description, images → elements (more important or structured)

Empty elements (two correct ways)

Both are 100% valid:

XML
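The two forms are an explicit open/close pair and the self-closing shorthand. A small sketch, using a hypothetical <note> element and Python's standard xml.etree.ElementTree, suggests that a parser treats them identically:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<note></note>")   # open tag followed by close tag
short_form = ET.fromstring("<note/>")        # self-closing shorthand

# Both parse to an element with no text and no children.
print(long_form.text, len(long_form))
print(short_form.text, len(short_form))

# When writing XML back out, ElementTree uses the short form by default.
print(ET.tostring(long_form, encoding="unicode"))
```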

Most modern XML uses the short form <tag/> when there is no content.

Very common special characters you must escape

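| You want to write … | You must write in XML … |
|---------------------|-------------------------|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " (inside a double-quoted attribute) | &quot; |
| ' (inside a single-quoted attribute) | &apos; |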

Real example:

XML
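A sketch of escaping in practice, assuming a hypothetical <offer> element with a title attribute. It uses escape from Python's standard xml.sax.saxutils module to produce the entities, then parses the result to confirm the original text comes back unchanged.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Price < 500 & discount > 10%"
raw_title = 'The "Summer" sale'

# escape() handles &, < and >; the extra mapping also escapes double quotes
# so the value is safe inside a double-quoted attribute.
title_attr = escape(raw_title, {'"': "&quot;"})
xml_text = f'<offer title="{title_attr}">{escape(raw_text)}</offer>'
print(xml_text)

# The parser turns the entities back into the original characters.
offer = ET.fromstring(xml_text)
print(offer.get("title"))
print(offer.text)
```

In real code it is usually safer to let a library build the document than to paste strings together by hand, since the library applies these escapes automatically.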

One more complete example – something people actually use

Small online order fragment

XML
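A sketch of what a small order fragment could look like. Every name and value here (order id, SKUs, prices, the customer) is hypothetical; the point is only the shape: identifiers and short metadata as attributes, content as nested elements.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order; all identifiers and amounts are invented for illustration.
ORDER_XML = """
<order id="ORD-1001" date="2025-01-15" status="paid">
  <customer id="C-501">
    <name>Riya</name>
    <city>Pune</city>
  </customer>
  <items>
    <item sku="MOU-01" quantity="2">
      <name>Wireless Mouse</name>
      <unitPrice>799.00</unitPrice>
    </item>
    <item sku="KEY-07" quantity="1">
      <name>Keyboard</name>
      <unitPrice>1499.00</unitPrice>
    </item>
  </items>
</order>
"""

order = ET.fromstring(ORDER_XML)

# Quantities and prices are text in XML, so convert them before doing arithmetic.
total = sum(
    int(item.get("quantity")) * Decimal(item.findtext("unitPrice"))
    for item in order.iter("item")
)
print(order.get("id"), order.findtext("customer/name"), total)
```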

This kind of structure is extremely common in:

  • E-commerce
  • Billing / invoicing
  • ERP systems
  • Government data exchange
  • Many older (and some modern) APIs

Quick summary – What you should remember today

  1. XML = structured, labeled, hierarchical data
  2. One root element only
  3. Properly nested and closed tags
  4. Case-sensitive names
  5. Quoted attributes
  6. Escape < > & when they are normal text
  7. Use elements for important/content data
  8. Use attributes for metadata / identifiers

Where would you like to go next?

  • More examples from real life (invoice, resume, configuration, RSS feed…)
  • Well-formed vs valid XML (big difference)
  • Attributes vs child elements – more decision practice
  • CDATA – when you need to put HTML, code, JSON inside XML
  • Comments, processing instructions, and XML declaration details
  • Naming conventions people use in serious projects
  • First look at XML Namespaces (why they exist)

Tell me what feels most interesting or useful for you right now and we’ll continue from there. 😊
