User:Paul Wormer/scratchbook

The rules for XHTML 1.0 (eXtensible HyperText Markup Language version 1.0) were formally established on 26 January 2000. The XHTML language is very similar to HTML 4.01, whose standard had been established about a month earlier, on 24 December 1999. The two languages share their vocabulary: elements, attributes, and character entities are identical, and there are only a few minor syntactic differences (see the next section). The meaning (semantics) of the terms in the two vocabularies is also exactly the same. Both languages are intended mainly for composing documents for the World Wide Web. They offer not only the same markup facilities for typesetting and layout, but also the same hyperlinks, which let readers jump with one mouse click to other places within a document and to other documents on the Web. Like HTML 4.01, XHTML 1.0 can call scripts (pieces of computer code, often written in JavaScript, sometimes in VBScript) that operate on objects of the Document Object Model (DOM). The DOM sees a document as a tree-like structure of objects that have methods for traversing, augmenting, and modifying the document, and these methods are invoked by the scripts embedded in HTML and XHTML pages.
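In a browser these DOM methods are called from JavaScript, but Python's standard xml.dom.minidom module is a minimal implementation of the same W3C DOM interfaces (getElementsByTagName, createElement, appendChild, and so on). The sketch below uses minidom only to illustrate the three kinds of operation mentioned above, traversal, augmentation, and modification, on a small made-up XHTML fragment; it is not how a browser executes page scripts.

```python
from xml.dom import minidom

# A small illustrative XHTML fragment (not taken from the text).
SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>First</p><p>Second</p></body></html>"
)

doc = minidom.parseString(SOURCE)

# Traversal: walk the tree and collect the text of every <p> element.
texts = [p.firstChild.data for p in doc.getElementsByTagName("p")]

# Augmentation: create a new <p> element and append it to <body>.
body = doc.getElementsByTagName("body")[0]
new_p = doc.createElement("p")
new_p.appendChild(doc.createTextNode("Third"))
body.appendChild(new_p)

# Modification: change the text of the first paragraph in place.
doc.getElementsByTagName("p")[0].firstChild.data = "First (edited)"

print(texts)
print(doc.documentElement.toxml())
```

The JavaScript calls in a browser carry the same names (document.getElementsByTagName, document.createElement, appendChild), because both follow the W3C DOM interfaces.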

HTML 4 is formally defined as an application of SGML (Standard Generalized Markup Language, ISO 8879). The SGML standard is extensive and complex; a simplified subset of it, named XML 1.0, appeared in 1998. XHTML 1.0 is a reformulation of HTML 4 as an application of XML 1.0. The syntactic differences between HTML 4.01 and XHTML 1.0 all stem from the fact that XML requires a somewhat stricter syntax than SGML.

A later version, XHTML 1.1, is not widely used. Ongoing development in the field is directed towards a new HTML standard, provisionally called HTML5.

Differences between XHTML 1.0 and HTML 4.01
As noted above, the syntactic differences between HTML 4 and XHTML 1.0 arise because the former is an SGML application and the latter an XML application. The main differences are:


 * XHTML documents must be well-formed, a concept introduced by XML. Essentially this means that all elements must have closing tags (as described below) and that all elements must nest properly. Incorrect XHTML, with overlapping elements, is: <p>here is an emphasized <em>paragraph.</p></em> Correct is: <p>here is an emphasized <em>paragraph</em>.</p>
 * Many HTML-based user agents (browsers, etc.) can handle such overlapping nesting, but from the point of view of XML the nesting in the incorrect example is malformed.
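
Well-formedness can be checked mechanically by any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser; the helper is_well_formed is a hypothetical name introduced here. It tests only well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Overlapping elements: </p> arrives while <em> is still open.
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"
# Properly nested elements: <em> is closed inside <p>.
nested = "<p>here is an emphasized <em>paragraph</em>.</p>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```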


 * In HTML, element and attribute names are case-insensitive, whereas XHTML requires them to be in lower case (XML is case-sensitive, so <P> and <p> would be different elements).
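
The effect of case sensitivity shows up in the same kind of sketch (again Python's xml.etree.ElementTree; the helper parses is hypothetical). An XML parser rejects a start tag P closed by an end tag p, and it treats P and p as two distinct element names, of which only the lower-case one is an XHTML element.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Acceptable to an HTML browser, but the start and end tag names differ in XML.
mixed_case = "<P>Mixed-case tags</p>"

upper = ET.fromstring("<P>Upper</P>")  # well-formed XML, but not an XHTML element
lower = ET.fromstring("<p>Lower</p>")  # the XHTML paragraph element

print(parses(mixed_case))    # False
print(upper.tag, lower.tag)  # P p
```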


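 * For non-empty elements, end tags are required. Incorrect is: <p>here is a paragraph.<p>here is another paragraph.
 * Correct is: <p>here is a paragraph.</p><p>here is another paragraph.</p>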
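
The end-tag rule can be tried out with a short sketch using Python's xml.etree.ElementTree. Both snippets are wrapped in a div element (an assumption of this sketch), because an XML document must have a single root element.

```python
import xml.etree.ElementTree as ET

# HTML 4 lets </p> be omitted; XML does not.
unterminated = "<div><p>here is a paragraph.<p>here is another paragraph.</div>"
terminated = "<div><p>here is a paragraph.</p><p>here is another paragraph.</p></div>"

try:
    ET.fromstring(unterminated)
    unterminated_ok = True
except ET.ParseError as err:
    unterminated_ok = False
    print("rejected:", err)

# With explicit end tags the parser finds two sibling paragraphs.
root = ET.fromstring(terminated)
print([child.text for child in root])
```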


 * Attribute values must always be quoted. Incorrect is: <td rowspan=3>
 * Correct are <td rowspan="3"> and <td rowspan='3'>.
 * Both types of quotes (' and ") are allowed; they are completely equivalent.
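
A quick check of the quoting rule, sketched with Python's xml.etree.ElementTree (the helper parses is hypothetical): the unquoted value is rejected, and single and double quotes yield exactly the same attribute.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

unquoted = "<td rowspan=3>cell</td>"
double = ET.fromstring('<td rowspan="3">cell</td>')
single = ET.fromstring("<td rowspan='3'>cell</td>")

print(parses(unquoted))                # False
print(double.attrib == single.attrib)  # True: both are {'rowspan': '3'}
```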


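 * Attribute minimization is forbidden: attributes such as compact and checked must be written with a value, which by convention repeats the attribute name. Correct: <dl compact="compact">. Incorrect: <dl compact>.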
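
The same kind of sketch (Python's xml.etree.ElementTree, hypothetical helper parses) shows why minimization fails in XML: the parser expects an equals sign and a quoted value after every attribute name.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

minimized = "<dl compact><dt>term</dt><dd>definition</dd></dl>"
full = '<dl compact="compact"><dt>term</dt><dd>definition</dd></dl>'

print(parses(minimized))                   # False: attribute without a value
print(ET.fromstring(full).get("compact"))  # compact
```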


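 * Empty elements (elements without content) must either have an end tag or the start tag must end with />. Both <br></br> and <br/> are correct; for compatibility with older HTML browsers the form <br /> (with a space before the slash) is usually preferred.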
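
To an XML parser the spellings of an empty element are interchangeable. A sketch with Python's xml.etree.ElementTree: the three forms below produce the same empty br element, ElementTree itself writes it back as <br />, and an unterminated <br> inside a paragraph is rejected.

```python
import xml.etree.ElementTree as ET

forms = ["<br></br>", "<br/>", "<br />"]
elements = [ET.fromstring(f) for f in forms]

# Each form yields a "br" element with no text and no children.
print({(e.tag, e.text, len(e)) for e in elements})  # {('br', None, 0)}

# ElementTree serializes empty elements with a space before "/>".
print(ET.tostring(elements[0], encoding="unicode"))  # <br />

# An HTML-style <br> without termination is not well-formed XML.
try:
    ET.fromstring("<p>line one<br>line two</p>")
    unterminated_ok = True
except ET.ParseError:
    unterminated_ok = False
print(unterminated_ok)  # False
```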


 * The characters < and & are treated as the start of markup, so they cannot be used literally in their own meaning. The entity references &lt; and &amp; are translated by the XML processor into < and & respectively, and are not seen as markup. For example, write &amp; in contexts like href="http://www.example.com/search?lang=en&amp;q=xhtml". If it is inconvenient to use these entity references (for instance because there are many of them inside a script), the text containing the characters (for instance the script source) can be enclosed in a CDATA section: <script type="text/javascript"><![CDATA[ ... ]]></script>
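
A sketch of both techniques with Python's standard library, assuming a hypothetical example.com URL: a bare & in an attribute value is rejected, &amp; is turned back into & by the parser, xml.sax.saxutils.escape produces the entity references, and a CDATA section lets < and & appear unescaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical URL with a bare "&": not well-formed XML.
raw = '<a href="http://www.example.com/search?lang=en&q=xhtml">search</a>'
# The same link with "&" written as the entity reference &amp;.
escaped = '<a href="http://www.example.com/search?lang=en&amp;q=xhtml">search</a>'

try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# The parser translates &amp; back into a plain "&".
href = ET.fromstring(escaped).get("href")

# saxutils.escape replaces &, < and > by their entity references.
escaped_text = escape("if (a < b && c)")

# Inside a CDATA section, < and & need no escaping.
script = ET.fromstring(
    '<script type="text/javascript">'
    "<![CDATA[ if (a < b && c) { x = 1; } ]]>"
    "</script>"
)

print(raw_ok)                # False
print(href)                  # http://www.example.com/search?lang=en&q=xhtml
print(escaped_text)          # if (a &lt; b &amp;&amp; c)
print(script.text.strip())   # if (a < b && c) { x = 1; }
```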


 * In HTML the name attribute of many elements can serve as the target of a hyperlink (i.e., as a "fragment identifier"). XHTML 1.0 documents must use the id attribute, instead of name, when defining fragment identifiers on the following elements: a, applet, form, frame, iframe, img, and map. For backward compatibility one often assigns the same fragment identifier to both id and name, as in <a id="intro" name="intro">.
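
A rough sketch of how the id rule might be checked, using Python's xml.etree.ElementTree on a made-up page. The helpers name_without_id and find_fragment are hypothetical: the first lists elements from the set above whose name is not matched by an equal id, the second resolves a fragment identifier by id only, as an XML processor would.

```python
import xml.etree.ElementTree as ET

# Elements covered by the XHTML 1.0 id/name rule for fragment identifiers.
FRAGMENT_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}

def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{http://...}'."""
    return tag.rsplit("}", 1)[-1]

def name_without_id(root: ET.Element) -> list:
    """Hypothetical check: (element, name) pairs whose name has no equal id."""
    problems = []
    for elem in root.iter():
        name = elem.get("name")
        if local_name(elem.tag) in FRAGMENT_ELEMENTS and name is not None:
            if elem.get("id") != name:
                problems.append((local_name(elem.tag), name))
    return problems

def find_fragment(root: ET.Element, fragment: str):
    """Hypothetical helper: the element whose id equals the fragment, or None."""
    for elem in root.iter():
        if elem.get("id") == fragment:
            return elem
    return None

page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p><a id="intro" name="intro">Introduction</a></p>'
    '<p><a name="legacy">Old-style anchor</a></p>'
    "</body></html>"
)

print(name_without_id(page))              # [('a', 'legacy')]
print(find_fragment(page, "intro").text)  # Introduction
print(find_fragment(page, "legacy"))      # None
```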


 * SGML and XML both permit references to characters by hexadecimal value. In SGML (and HTML 4) these references could be written as &#XNN;, &#xNN;, &#Xnn;, or &#xnn;. In XML (and XHTML 1.0) documents, the lower-case x (i.e. &#xnn; or &#xNN;) is mandatory, while the hexadecimal digits themselves may be in either case. For example, &#XAE;, &#xAE;, &#Xae;, and &#xae; are all valid HTML 4 for the registered sign ®, but only &#xAE; and &#xae; are valid XHTML.
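
The four spellings can be fed to an XML parser directly. A sketch with Python's xml.etree.ElementTree (decode_char_ref is a hypothetical helper); the decimal form &#174; is included for comparison. Code points are printed instead of the character itself to keep the output ASCII.

```python
import xml.etree.ElementTree as ET

def decode_char_ref(ref: str):
    """Hypothetical helper: parse a reference inside <p>; None if rejected."""
    try:
        return ET.fromstring(f"<p>{ref}</p>").text
    except ET.ParseError:
        return None

for ref in ["&#XAE;", "&#xAE;", "&#Xae;", "&#xae;", "&#174;"]:
    ch = decode_char_ref(ref)
    print(ref, "->", "rejected" if ch is None else f"U+{ord(ch):04X}")

# Expected: the two upper-case-X forms are rejected; the others give U+00AE.
```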

Valid XHTML
A valid XHTML document is well-formed (syntactically correct) and conforms to a DTD (Document Type Definition). The World Wide Web Consortium (W3C) made available three DTDs for XHTML 1.0: strict, transitional, and frameset. The transitional DTD defines the same elements and attributes as the strict DTD, plus the elements and attributes that are marked "deprecated" in the HTML 4.01 standard. Most deprecated features dictate appearance and presentation (layout, fonts, etc.), a task that is meant to be taken over by CSS (Cascading Style Sheets). The frameset DTD applies to documents that use frames.

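A valid XHTML document is associated with a DTD, identified by a public identifier and a URL (which by definition is unique), through a <!DOCTYPE declaration. For instance, for a strict document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">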

where the string html refers to the root element of the document. The <!DOCTYPE declaration comes before the root element.
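
Python's standard library has no validating XML parser, but xml.dom.minidom does record the DOCTYPE declaration. The sketch below, assuming a small made-up strict document, reads the name, public identifier, and system identifier (the DTD URL) from the declaration and checks that the name matches the root element. It does not validate the document against the DTD.

```python
from xml.dom import minidom

# A small hypothetical XHTML 1.0 Strict document.
STRICT_DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body></html>"
)

# minidom keeps the DOCTYPE as a DocumentType node; no validation happens.
doc = minidom.parseString(STRICT_DOC)
doctype = doc.doctype

print(doctype.name)      # html
print(doctype.publicId)  # -//W3C//DTD XHTML 1.0 Strict//EN
print(doctype.systemId)  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

# The name in the DOCTYPE declaration should match the root element.
print(doctype.name == doc.documentElement.tagName)  # True
```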

XML supports namespaces, defined in the W3C recommendation "Namespaces in XML", a companion to XML 1.0. The elements and attributes of a document can belong to one or more different namespaces. A valid XHTML document must explicitly declare the XHTML namespace, even though it is the default namespace of the document. A namespace is declared by the xmlns attribute, which must be an attribute of the root element if the declaration is to cover the whole document. Hence a valid XHTML document must have a root element of the form:

<html xmlns="http://www.w3.org/1999/xhtml">

where the URL serves as the name of the (default) XHTML namespace. Optionally, the natural language of the document may be specified by further attributes of html, for instance xml:lang="en" lang="en".
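
How a namespace-aware XML parser sees such a root element can be sketched with Python's xml.etree.ElementTree, which writes qualified names as {namespace-URI}local-name. The helper is_xhtml_root is hypothetical. Note that xml:lang belongs to the fixed XML namespace, while the unprefixed lang attribute belongs to no namespace at all.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # fixed namespace of the xml: prefix

root = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<head><title>Example</title></head><body/></html>"
)

# The default namespace qualifies every element name in the document.
print(root.tag)                       # {http://www.w3.org/1999/xhtml}html
print(root.get(f"{{{XML_NS}}}lang"))  # en  (the xml:lang attribute)
print(root.get("lang"))               # en  (the unprefixed lang attribute)

def is_xhtml_root(elem: ET.Element) -> bool:
    """Hypothetical check: the root is <html> in the XHTML namespace."""
    return elem.tag == f"{{{XHTML_NS}}}html"

no_ns = ET.fromstring("<html><body/></html>")
print(is_xhtml_root(root), is_xhtml_root(no_ns))  # True False
```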

The validity of a document that is accessible from the Internet can be checked with the W3C Markup Validation Service.