What XML is and how it is structured
XML (eXtensible Markup Language) is a text format for encoding structured data in a way both humans and machines can read. Unlike HTML, which has a fixed vocabulary of elements for web pages, XML lets you define your own element names for any domain. A medical record system might use <Patient> and <Diagnosis>; a financial feed might use <Trade> and <Settlement>. The grammar rules are universal; the vocabulary is application-specific.
Every XML document has a tree structure. A single root element contains all other elements; each element can contain child elements, text content, or both. Attributes provide metadata about an element (<book isbn="978-0-13-110362-7">). The optional XML declaration at the top (<?xml version="1.0" encoding="UTF-8"?>) states the XML version and character encoding. CDATA sections (<![CDATA[...]]>) allow raw text containing characters that would otherwise need escaping, such as HTML embedded inside XML.
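As a rough illustration of the tree model, the sketch below parses a small document with Python's standard xml.etree.ElementTree module and walks it depth-first. The <library>, <title>, and <note> element names and the sample content are made up for this example; only the isbn attribute comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book isbn="978-0-13-110362-7">
    <title>Example Title</title>
    <note><![CDATA[Use <b>bold</b> & <i>italic</i> freely here.]]></note>
  </book>
</library>"""

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(DOC.encode("utf-8"))


def describe(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} attrs={element.attrib} text={text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


book = root.find("book")        # first <book> child of the root
isbn = book.get("isbn")         # attribute lookup
note = book.findtext("note")    # CDATA content arrives as ordinary text

print("\n".join(describe(root)))
print("isbn:", isbn)
print("note:", note)
```

Note that the parser hands back the CDATA content as plain text: the markup inside it is never interpreted as child elements.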
Well-formed vs. valid XML
Well-formed XML satisfies the W3C grammar: there is exactly one root element, every opening tag has a matching closing tag (or is self-closing), tags are properly nested (never overlapping), attribute values are quoted, and reserved characters use their escape sequences. A conforming parser must reject any XML that is not well-formed. Valid XML is well-formed XML that additionally conforms to a schema, either a DTD (Document Type Definition) or an XSD (XML Schema Definition). Validity is optional in general, but many data exchange standards require it in order to guarantee field types and element order.
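A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser: it raises ParseError on anything that breaks the grammar. The is_well_formed helper and the sample strings are hypothetical names for this example. This covers well-formedness only; ElementTree does not validate against a DTD or XSD, so schema validation would need a separate library (lxml is one option).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


# Illustrative inputs, one per rule mentioned above.
samples = {
    "properly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "unquoted attribute": "<a id=1/>",
    "bare ampersand": "<a>fish & chips</a>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    status = "ok" if ok else f"rejected ({error})"
    print(f"{label}: {status}")
```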
| Character | Must be written as | Context |
|---|---|---|
| & | &amp; | Anywhere in text or attribute value |
| < | &lt; | Anywhere in text content or attribute value |
| > | &gt; | In text content (required after ]]) |
| " | &quot; | Inside double-quoted attribute values |
| ' | &apos; | Inside single-quoted attribute values |
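The escaping rules in the table can be applied programmatically rather than by hand. The sketch below assumes Python's standard xml.sax.saxutils helpers: escape handles &, <, and > in text, and quoteattr picks a quote style and escapes an attribute value. The <item> element, the label attribute, and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = 'if a < b && b > c then "ok"'
label = 'Tom & "Jerry"'

# escape() replaces &, < and > ; quotes are left alone in text content.
escaped_text = escape(raw_text)

# quoteattr() returns the value already wrapped in quotes, escaping as needed.
quoted_label = quoteattr(label)

document = f"<item label={quoted_label}>{escaped_text}</item>"
print(document)

# Round trip: the parser turns the escape sequences back into the raw characters.
parsed = ET.fromstring(document)
print(parsed.get("label"))
print(parsed.text)
```

Most XML libraries escape automatically when you build a tree and serialize it, so manual escaping mainly matters when XML is assembled from strings, as it is here.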
Where XML is used today
XML is often described as "legacy", but it underpins large portions of enterprise software. Microsoft Office formats (.docx, .xlsx, .pptx) are ZIP archives containing XML files: the document content, styles, and relationships are all XML. SOAP web services, still common in banking, insurance, government, and SAP ecosystems, exchange XML envelopes. RSS and Atom feeds are XML; most podcast clients and news readers consume them directly. SVG graphics are XML, making them editable in a text editor and indexable by search engines. In healthcare, HL7 v3 messages and CDA documents use XML (HL7 v2 is primarily a pipe-delimited format); in finance, FpML and XBRL define financial data in XML.
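To make the "ZIP archive containing XML" point concrete, here is a small sketch using Python's standard zipfile module. The xml_parts helper is a hypothetical name, and the archive is a stand-in built in memory rather than a real Office file; pointing the function at an actual .docx path should work the same way, though real files will report namespace-qualified root tags.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xml_parts(source):
    """Map each XML member of a ZIP archive to the tag of its root element."""
    roots = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Assumption: XML parts are recognisable by extension.
            if not name.endswith((".xml", ".rels")):
                continue
            with archive.open(name) as member:
                roots[name] = ET.parse(member).getroot().tag
    return roots


# Stand-in archive (not a real document): one XML part and one binary part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<document><body/></document>")
    archive.writestr("media/image1.png", b"not xml")
buffer.seek(0)

print(xml_parts(buffer))
```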
Namespaces and common parsing patterns
XML namespaces (xmlns:prefix="URI") prevent name collisions when documents from different vocabularies are combined. A SOAP envelope, for example, binds the soap: prefix to a namespace for envelope-level elements and uses a separate namespace for the application payload. Namespace URIs are identifiers; they do not need to resolve to a webpage. When inspecting XML from an API or export tool, copy the raw response, paste it here, and click Format to restore indentation before reading nested namespace-qualified elements.
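A common parsing pattern is to look elements up by namespace URI rather than by the prefix the sender happened to choose. The sketch below assumes Python's standard xml.etree.ElementTree and the SOAP 1.1 envelope namespace; the payload namespace (http://example.com/orders) and the GetOrder and OrderId elements are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/orders"                   # hypothetical payload namespace

ENVELOPE = f"""<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:ord="{APP_NS}">
  <soap:Body>
    <ord:GetOrder>
      <ord:OrderId>42</ord:OrderId>
    </ord:GetOrder>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(ENVELOPE)

# The prefixes in this map are local to the query; only the URIs must match
# the document, so "s" and "o" work even though the XML uses "soap" and "ord".
ns = {"s": SOAP_NS, "o": APP_NS}
order_id = root.findtext("s:Body/o:GetOrder/o:OrderId", namespaces=ns)

print(root.tag)   # ElementTree stores tags as {namespace-uri}localname
print(order_id)
```

Matching on the URI means the lookup keeps working if a different sender uses other prefixes for the same namespaces.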
Related tools: JSON formatter for REST API responses, YAML formatter for configuration files. See CI/CD integration guide for XML linting in build pipelines.