HHG to the eXtensible Markup Language in PDS4

From The SBN Wiki
Revision as of 14:15, 19 November 2012 by Raugh (talk | contribs) (Creation - Safety Save)

XML stands for the eXtensible Markup Language, developed and published as an official Recommendation of the World Wide Web Consortium ("W3C"). W3C Recommendations play the same role as standards from other international bodies, such as ISO and IEEE.

What It Is

XML is a syntax standard intended for use primarily in text files. It provides extremely generic rules for identifying markup within text - that is, tags that indicate something about how the enclosed content should be processed.

If you're familiar with HTML and its tags, then an XML document will look familiar. But unlike HTML, XML is strict about matching case in tag names, closing every element, and nesting tags in the right order. Most HTML is not well-formed XML.
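
As a rough illustration of that strictness, the sketch below feeds two versions of the same snippet to the XML parser in Python's standard library (xml.etree.ElementTree). The helper name is_well_formed and both sample strings are made up for this example; they are not part of PDS4 or of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Lenient HTML habits: <br> is never closed, and </P> does not match <p>.
html_style = "<p>Hello<br>world</P>"

# The same content written to XML rules.
xml_style = "<p>Hello<br/>world</p>"

print("HTML-style snippet well-formed?", is_well_formed(html_style))
print("XML-style snippet well-formed?", is_well_formed(xml_style))
```

A web browser would typically render the first snippet without complaint; an XML parser stops at the first violation instead of guessing what was meant.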

What It Is Not

XML is not a language itself. It is not code, nor is it a processor. It does not even define the actual markup - it only defines a standard way to differentiate between content and markup.
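
To make the point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The tag names, the attribute, and the values are all invented for this example; the parser has no idea what they mean and only reports the structure it finds.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: none of these names are defined by the XML standard.
doc = (
    "<observation>"
    "<target>Ceres</target>"
    "<exposure units='s'>30</exposure>"
    "</observation>"
)

root = ET.fromstring(doc)

# The parser separates markup (tags, attributes) from content (text),
# without attaching any meaning to either.
print("root element:", root.tag)
for child in root:
    print("  tag:", child.tag, "| attributes:", child.attrib, "| content:", child.text)
```

Giving those names a meaning is the job of whoever defines the markup vocabulary, not of XML itself.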

Basic Requirements

  • An XML document should start with an XML declaration, which looks like a processing instruction:
      <?xml version="1.0" encoding="UTF-8"?>
This tells the processing software that this is an XML document, that it follows version 1.0 of the XML standard, and that the character encoding used in the document is UTF-8 (an encoding of Unicode). The XML 1.0 standard itself makes the declaration optional, but when it is present it must be the very first thing in the file.
  • An XML element consists of an opening tag, perhaps some content, and a closing tag.
  • An opening tag has the syntax <tag-name>, where tag-name is composed of letters, digits, and other printing characters as defined by the XML standard. The name must begin with a letter or an underscore (a colon is also allowed, but is conventionally reserved for namespace prefixes). It may not contain whitespace of any kind.
  • An opening tag may also have attributes following the tag name. Attributes have the form att-name="att-value". Attribute values must always be quoted, though you can use either single or double quotes.
  • A closing tag has the form </tag-name>. Closing tags never have attributes, even if the opening tag did.
  • If an element has no content, the opening and closing tags can be combined using the shorthand notation <tag-name/>.
  • Tags may be nested but they may not be interleaved. <bold><italic>Hello, world.</italic></bold> is valid; <bold><italic>Hello, world.</bold></italic> is not.
  • Note that the XML standard does not define tag names - it only limits the character set and defines how to tell tags from content.
  • An XML comment begins with <!-- and ends with -->. Everything in between is to be ignored by processors. Comments can start anywhere outside a tag where whitespace would be valid; they can extend over line breaks, but they cannot be nested, and the two-hyphen sequence -- may not appear inside a comment.
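
The rules in the list above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name check and all of the sample tag and attribute names are assumptions made up for illustration. Each sample is either accepted or rejected by the parser according to the rule named in its label.

```python
import xml.etree.ElementTree as ET


def check(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Label -> sample document. All element and attribute names are invented.
samples = {
    "declaration first": '<?xml version="1.0" encoding="UTF-8"?><root/>',
    "properly nested": "<bold><italic>Hello, world.</italic></bold>",
    "interleaved tags": "<bold><italic>Hello, world.</bold></italic>",
    "single and double quotes": "<item id=\"a1\" kind='demo'>text</item>",
    "unquoted attribute": "<item id=a1>text</item>",
    "empty-element shorthand": "<root><empty/></root>",
    "attribute on closing tag": "<item id='a1'>text</item id='a1'>",
    "name starting with a digit": "<1abc>text</1abc>",
    "multi-line comment": "<root><!-- ignored\nacross lines --><a/></root>",
    "nested comment": "<root><!-- outer <!-- inner --> --></root>",
}

for label, text in samples.items():
    verdict = "accepted" if check(text) else "rejected"
    print(f"{label}: {verdict}")
```

Note that a check like this only tests the syntax rules. Whether the tag names make sense for a particular application is a separate question that the XML standard does not address.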