XHTML™ 2.0

W3C Working Draft 22 July 2004

This version:
http://www.w3.org/TR/2004/WD-xhtml2-20040722
Latest version:
http://www.w3.org/TR/xhtml2
Previous version:
http://www.w3.org/TR/2003/WD-xhtml2-20030506
Diff-marked version:
xhtml2-diff.html
Editors:
Jonny Axelsson, Opera Software
Beth Epperson, Netscape/AOL
Masayasu Ishikawa, W3C
Shane McCarron, Applied Testing and Technology
Ann Navarro, WebGeek, Inc.
Steven Pemberton, CWI (HTML Working Group Chair)

This document is also available in these non-normative formats: Single XHTML file, PostScript version, PDF version, ZIP archive, and Gzip'd TAR archive.


Abstract

XHTML 2 is a general-purpose markup language designed for representing documents for a wide range of purposes across the World Wide Web. To this end it does not attempt to be all things to all people by supplying every possible markup idiom; instead it supplies a generally useful set of elements.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is the sixth public Working Draft of this specification. It should in no way be considered stable, and should not be normatively referenced for any purposes whatsoever. This version includes an early implementation of XHTML 2.0 in RELAX NG [RELAXNG], but does not include the implementations in DTD or XML Schema form. Those will be included in subsequent versions, once the content of this language stabilizes. This version also does not address the issues revolving around the use of [XLINK] by XHTML 2.0. Those issues are being worked on independently of the evolution of this specification. Those issues should, of course, be resolved as quickly as possible, and the resolution will be reflected in a future draft. Finally, the Working Group is actively working to resolve many of the issues that have been submitted by the public, but not all of them have been addressed yet. If your particular issue has not yet been addressed, please be patient - there are many issues, and some are more complex than others.

Formal issues and error reports on this specification shall be submitted to www-html-editor@w3.org (archive). It is inappropriate to send discussion email to this address. Public discussion may take place on www-html@w3.org (archive). To subscribe send an email to www-html-request@w3.org with the word subscribe in the subject line.

This document has been produced by the W3C HTML Working Group (members only) as part of the W3C HTML Activity. The goals of the HTML Working Group are discussed in the HTML Working Group charter.

This document was produced under the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress."

1. Introduction

This section is informative.

1.1. What is XHTML 2?

XHTML 2 is a general-purpose markup language designed for representing documents for a wide range of purposes across the World Wide Web. To this end it does not attempt to be all things to all people by supplying every possible markup idiom; instead it supplies a generally useful set of elements, with the possibility of extension using the class attribute on the span and div elements in combination with stylesheets, and attributes from the metadata attributes collection.

1.1.1. Design Aims

In designing XHTML 2, a number of design aims were kept in mind to help direct the design. These included:

  • As generic XML as possible: if a facility exists in XML, try to use that rather than duplicating it.
  • Less presentation, more structure: use stylesheets for defining presentation.
  • More usability: within the constraints of XML, try to make the language easy to write, and make the resulting documents easy to use.
  • More accessibility: some call it 'designing for our future selves' – the design should be as inclusive as possible.
  • Better internationalization: since it is a World Wide Web.
  • More device independence: new devices coming online, such as telephones, PDAs, tablets, televisions and so on mean that it is imperative to have a design that allows you to author once and render in different ways on different devices, rather than authoring new versions of the document for each type of device.
  • Less scripting: achieving functionality through scripting is difficult for the author and restricts the type of user agent you can use to view the document. We have tried to identify current typical usage, and include those usages in markup.

1.1.2. Backwards compatibility

Because earlier versions of HTML were special-purpose languages, it was necessary to ensure a level of backwards compatibility with new versions so that new documents would still be usable in older browsers. However, thanks to XML and stylesheets, such strict element-wise backwards compatibility is no longer necessary, since an XML-based browser (which at the time of writing means more than 95% of browsers in use) can process new markup languages without having to be updated. Much of XHTML 2 works already in existing browsers; much, but not all: just as when forms and tables were added to HTML, and people had to wait for new versions of browsers before being able to use the new facilities, some parts of XHTML 2, such as XForms and XML Events, still require user agents that understand that functionality.

1.1.3. XHTML 2 and Presentation

The very first version of HTML was designed to represent the structure of a document, not its presentation. Even though presentation-oriented elements were later added to the language by browser manufacturers, HTML is at heart a document structuring language. XHTML 2 takes HTML back to these roots, by removing all presentation elements, and subordinating all presentation to stylesheets. This gives greater flexibility, greater accessibility, more device independence, and more powerful presentation possibilities, since stylesheets can do more than the presentational elements of HTML ever did.

1.1.4. XHTML 2 and Linking

The original versions of HTML relied upon built-in knowledge on the part of User Agents and other document processors. While much of this knowledge had to do with presentation (see above), the bulk of the remainder had to do with the relationships between documents — so called "linking".

A variety of W3C and other efforts, most notably [XLINK], attempted to create a grammar for defining the characteristics of linking. Unfortunately, these grammars all fall short of the requirements of XHTML. While the community continues in its efforts to create a comprehensive grammar, the HTML Working Group has defined a grammar that is sufficient for its needs. The linking constructs in this document are described in terms of this grammar, [HLINK].

Note that by describing the linking characteristics using [HLINK], this document is NOT requiring that implementations support the HLINK constructs or grammar, just as it does not require implementations support [CSS2] even though some presentation aspects of this document are defined using CSS. Instead, the document relies upon this grammar as a formalism for ensuring that the characteristics of inter-document "links" can be described in a consistent manner, and thus implemented consistently.

1.2. Major Differences with XHTML 1

XHTML 2 is designed to be recognizable to the HTML and XHTML 1 author, while correcting errors and insufficiencies identified in earlier versions of the HTML family, and taking the opportunity to make improvements.

The most visible changes are the following:

  • More structuring possibilities:
    • Sections and headings: in previous versions of HTML a document's structure had to be inferred from the various levels of headings in the document; this was particularly a problem when authors misused the heading elements for visual effects. XHTML 2 lets you explicitly mark up the document structure with the section element, and its related header element h.
    • Separators: in previous versions of HTML, the hr element was used to separate sections of a text from each other. In retrospect, the name hr (for horizontal rule) was badly chosen, because an hr was neither necessarily horizontal (in vertical text it was vertical), nor necessarily a rule (books often use other typographical methods to represent separators, such as a line of three asterisks, and stylesheets can be used to give you this freedom). In order to emphasize its structuring nature, and to make it clearer that it has no essential directionality, hr has been renamed separator.
    • Line breaks: in previous versions of HTML, the br element was used to add micro-structure to text, essentially breaking a piece of text into several 'lines'. This micro-structure is now made explicit in XHTML 2 with the l element, which encloses the text to be broken. Amongst other advantages, this gives more presentational opportunities, such as the ability to automatically number lines, or to color alternate lines differently.
    • Paragraph structure: in earlier versions of HTML, a p element could only contain simple text. It has been improved to bring it closer to what people perceive as a paragraph, now being allowed to include such things as lists and tables.
  • Navigation lists: Part of the design of XHTML 2 has been to observe existing use of HTML and identify what is perceived as missing, for instance by use of scripting to achieve ends not supported directly in HTML. One obvious component of very many HTML pages is the 'navigation list', consisting of a collection of links to other parts of the site, presented vertically, horizontally, or as a drop-down menu. To support this type of usage, XHTML 2 introduces the navigation list element nl, which codifies such parts of documents, and allows different presentational idioms to be applied. An additional advantage is for assistive technologies, which can allow the user to skip such elements.
  • Images: the HTML img element has many shortcomings: it only allows you to specify a single resource for an image, rather than offering the fallback opportunities of the object element; the only fallback option it gives is the alt text, which can only be plain text, and not marked up in any way; the longdesc attribute which allows you to provide a long description of the image is difficult to author and seldom supported.

    XHTML 2 takes a completely different approach, by taking the premise that all images have a long description and treating the image and the text as equivalents. In XHTML 2 any element may have a src attribute, which specifies a resource (such as an image) to load instead of the element. If the resource is unavailable (because of network failure, because it is of a type that the browser can't handle, or because images have been turned off) then the element is used instead. Essentially the longdesc has been moved into the document, though this behavior also mimics the fallback behavior of the object element. (To achieve the tooltip effect that some browsers gave with the alt attribute, use the title attribute, as in HTML4.)

  • Type: in HTML4, the type attribute when referring to an external resource was purely a hint to the user agent. In XHTML 2 it is no longer a hint, but specifies the type(s) of resource the user agent must accept.
  • Tables: the content model of tables has been cleaned up and simplified, while still allowing the same functionality.
  • Bi-directional text: rather than use an explicit element to describe bi-directional override, new values have been added to the dir attribute that allow bi-directional override on any element.
  • Edit: rather than use explicit ins and del elements to mark changes in a document, an attribute edit may be used on any element for the same purpose.
  • Linking: In HTML 3, only a elements could be the source and target of hyperlinks. In HTML 4 and XHTML 1, any element could be the target of a hyperlink, but still only a elements could be the source. In XHTML 2 any element can now also be the source of a hyperlink, since href and its associated attributes may now appear on any element. So for instance, instead of <li><a href="home.html">Home</a></li>, you can now write <li href="home.html">Home</li>. Even though this means that the a element is now strictly-speaking unnecessary, it has been retained.
  • Metadata: the meta and link elements have been generalized, and their relationship to RDF [RDF] described. Furthermore, the attributes on these two elements can be more generally applied across the language.
  • Events: event handling in HTML was restricted in several ways: since the event names were hard-wired in the language (such as onclick), the only way to add new events was to change the language; many of the events (such as click) were device-specific, rather than referring to the intent of the event (such as activating a link); you could only handle an event in one scripting language — it was not possible to supply event handlers in the same document for several different scripting languages.

    XHTML 2 uses XML Events [XMLEVENTS] to specify event handling, giving greater freedom in the ability to handle events.

  • Forms: HTML Forms were introduced in 1993, before the advent of the e-commerce revolution. Now with more than a decade of experience with their use, they have been thoroughly overhauled and updated to meet the needs of modern forms, in the shape of XForms [XFORMS], which are an integral part of XHTML 2.
  • Ownership where due: since HTML 4 was a standalone application, it defined many things which no longer need to be defined now that it is an XML application. For instance the definitions of whitespace are given by XML for input, and CSS for output; similarly, the definition of the values of the media attribute is relegated to the relevant stylesheet language.
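
As an informal illustration of two of the changes listed above (explicit structure through section and h, and href being allowed on any element), the following Python sketch parses a small XHTML 2 fragment with the standard-library xml.etree.ElementTree module. The sample markup and the function names outline and link_sources are assumptions made for this illustration; they are not part of this specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI for XHTML 2.0 as given in this draft.
XHTML2 = "http://www.w3.org/2002/06/xhtml2"

# Hypothetical sample markup, written for this sketch only.
SAMPLE = """
<body xmlns="http://www.w3.org/2002/06/xhtml2">
  <section>
    <h>Events</h>
    <section>
      <h>Introduction</h>
      <p>See the <span href="details.html">details</span>.</p>
    </section>
  </section>
  <ul>
    <li href="home.html">Home</li>
  </ul>
</body>
"""


def outline(element, depth=0):
    """Return (level, heading text) pairs; level is the section nesting depth."""
    found = []
    for child in element:
        if child.tag == f"{{{XHTML2}}}section":
            # A nested section makes its headings one level deeper.
            found.extend(outline(child, depth + 1))
        elif child.tag == f"{{{XHTML2}}}h":
            found.append((depth, child.text))
    return found


def link_sources(element):
    """Return (local element name, href) for every element carrying href."""
    return [(el.tag.split("}")[1], el.get("href"))
            for el in element.iter() if el.get("href") is not None]


root = ET.fromstring(SAMPLE)
print(outline(root))
print(link_sources(root))
```

The heading level is derived purely from how deeply the h element's section is nested, so no numbered heading elements are needed to recover the document structure.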

1.3. What are the XHTML 2 Modules?

XHTML 2 is a member of the XHTML Family of markup languages. It is an XHTML Host Language as defined in XHTML Modularization. As such, it is made up of a set of XHTML Modules that together describe the elements and attributes of the language, and their content model. XHTML 2 updates many of the modules defined in XHTML Modularization 1.0 [XHTMLMOD], and includes the updated versions of all those modules and their semantics. XHTML 2 also uses modules from Ruby [RUBY], XML Events [XMLEVENTS], and XForms [XFORMS].

The modules defined in this specification are largely extensions of the modules defined in XHTML Modularization 1.0. This specification also defines the semantics of the modules it includes. This means that, unlike earlier versions of XHTML that relied upon the semantics defined in HTML 4 [HTML4], all of the semantics for XHTML 2 are defined either in this specification or in the specifications that it normatively references.

Even though the XHTML 2 modules are defined in this specification, they are available for use in other XHTML family markup languages. Over time, it is possible that the modules defined in this specification will migrate into the XHTML Modularization specification.

2. Terms and Definitions

This section is normative.

While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.

abstract module
a unit of document type specification corresponding to a distinct type of content, corresponding to a markup construct reflecting this distinct type.
content model
the declared markup structure allowed within instances of an element type. XML 1.0 differentiates two types: elements containing only element content (no character data) and mixed content (elements that may contain character data optionally interspersed with child elements). The latter are characterized by a content specification beginning with the "#PCDATA" string (denoting character data).
deprecated
a feature marked as deprecated is in the process of being removed from this recommendation. Portable applications should not use features marked as deprecated.
document model
the effective structure and constraints of a given document type. The document model constitutes the abstract representation of the physical or semantic structures of a class of documents.
document type
a class of documents sharing a common abstract structure. The ISO 8879 [SGML] definition is as follows: "a class of documents having similar characteristics; for example, journal, article, technical manual, or memo. (4.102)"
document type definition (DTD)
a formal, machine-readable expression of the XML structure and syntax rules to which a document instance of a specific document type must conform; the schema type used in XML 1.0 to validate conformance of a document instance to its declared document type. The same markup model may be expressed by a variety of DTDs.
driver
a generally short file used to declare and instantiate the modules of a DTD. A good rule of thumb is that a DTD driver contains no markup declarations that comprise any part of the document model itself.
element
an instance of an element type.
element type
the definition of an element, that is, a container for a distinct semantic class of document content.
entity
an entity is a logical or physical storage unit containing document content. Entities may be composed of parseable XML markup or character data, or unparsed (i.e., non-XML, possibly non-textual) content. Entity content may be either defined entirely within the document entity ("internal entities") or external to the document entity ("external entities"). In parsed entities, the replacement text may include references to other entities.
entity reference
a mnemonic string used as a reference to the content of a declared entity (e.g., "&amp;" for "&", "&lt;" for "<", "&copy;" for "©".)
facilities
Facilities are elements, attributes, and the semantics associated with those elements and attributes.
generic identifier
the name identifying the element type of an element. Also, element type name.
hybrid document
A hybrid document is a document that uses more than one XML namespace. Hybrid documents may be defined as documents that contain elements or attributes from hybrid document types.
instantiate
to replace an entity reference with an instance of its declared content.
markup declaration
a syntactical construct within a DTD declaring an entity or defining a markup structure. Within XML DTDs, there are four specific types: entity declaration defines the binding between a mnemonic symbol and its replacement content; element declaration constrains which element types may occur as descendants within an element (see also content model); attribute definition list declaration defines the set of attributes for a given element type, and may also establish type constraints and default values; notation declaration defines the binding between a notation name and an external identifier referencing the format of an unparsed entity.
markup model
the markup vocabulary (i.e., the gamut of element and attribute names, notations, etc.) and grammar (i.e., the prescribed use of that vocabulary) as defined by a document type definition (i.e., a schema). The markup model is the concrete representation in markup syntax of the document model, and may be defined with varying levels of strict conformity. The same document model may be expressed by a variety of markup models.
module
an abstract unit within a document model expressed as a DTD fragment, used to consolidate markup declarations to increase the flexibility, modifiability, reuse and understanding of specific logical or semantic structures.
modularization
an implementation of a modularization model; the process of composing or de-composing a DTD by dividing its markup declarations into units or groups to support specific goals. Modules may or may not exist as separate file entities (i.e., the physical and logical structures of a DTD may mirror each other, but there is no such requirement).
modularization model
the abstract design of the document type definition (DTD) in support of the modularization goals, such as reuse, extensibility, expressiveness, ease of documentation, code size, consistency and intuitiveness of use. It is important to note that a modularization model is only orthogonally related to the document model it describes, so that two very different modularization models may describe the same document type.
parameter entity
an entity whose scope of use is within the document prolog (i.e., the external subset/DTD or internal subset). Parameter entities are disallowed within the document instance.
parent document type
A parent document type of a hybrid document is the document type of the root element.
tag
descriptive markup delimiting the start and end (including its generic identifier and any attributes) of an element.

3. Conformance Definition

This section is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3.1. Document Conformance

In this document, the use of the word 'schema' refers to any definition of the syntax of XHTML 2, regardless of the definition language used.

3.1.1. Strictly Conforming Documents

A strictly conforming XHTML 2.0 document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all the following criteria:

  1. The document must conform to the constraints expressed in the schemas in Appendix B - XHTML 2.0 RELAX NG Definition, Appendix D - XHTML 2.0 Schema and Appendix F - XHTML 2.0 Document Type Definition.

  2. The local part of the root element of the document must be html.

  3. The start tag of the root element of the document must explicitly contain an xmlns declaration for the XHTML 2.0 namespace [XMLNS]. The namespace URI for XHTML 2.0 is defined to be http://www.w3.org/2002/06/xhtml2.

    The start tag must also contain an xsi:schemaLocation attribute. The schema location for XHTML 2.0 is defined to be TBD.

    An example root element might look like:

    <html xmlns="http://www.w3.org/2002/06/xhtml2" 
          xml:lang="en"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2 TBD"
    >
    
  4. There must be a DOCTYPE declaration in the document prior to the root element. If present, the public identifier included in the DOCTYPE declaration must reference the DTD found in Appendix F using its Public Identifier. The system identifier may be modified appropriately.

    <!DOCTYPE
     html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
     "TBD">
    

Here is an example of an XHTML 2.0 document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
    "TBD">
<html xmlns="http://www.w3.org/2002/06/xhtml2" 
      xml:lang="en"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2 TBD"
>
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>

Note that in this example, the XML declaration is included. An XML declaration like the one above is not required in all XML documents. XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.
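
Some of the criteria above can be checked mechanically. The following Python sketch, which uses the standard-library xml.etree.ElementTree module, tests criteria 2 and 3 (local part of the root element, namespace, and presence of xsi:schemaLocation). The function name check_root is an assumption made for this illustration. The sketch does not validate against any schema, does not inspect the DOCTYPE declaration, and cannot tell whether the xmlns declaration was written explicitly on the start tag.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def check_root(document: bytes) -> list:
    """Return a list of problems found on the root element (empty if none)."""
    root = ET.fromstring(document)  # raises ParseError if not well-formed
    problems = []
    # ElementTree spells a qualified name as "{namespace}local".
    namespace, _, local = root.tag.rpartition("}")
    if local != "html":
        problems.append("local part of root element is not 'html'")
    if namespace.lstrip("{") != XHTML2_NS:
        problems.append("root element is not in the XHTML 2.0 namespace")
    if root.get(f"{{{XSI_NS}}}schemaLocation") is None:
        problems.append("xsi:schemaLocation attribute is missing")
    return problems


# The example document from this section ("TBD" is kept as in the draft).
GOOD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN" "TBD">
<html xmlns="http://www.w3.org/2002/06/xhtml2" xml:lang="en"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2 TBD">
  <head><title>Virtual Library</title></head>
  <body><p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>
</html>"""

# A well-formed document that lacks the namespace and schema location.
BAD = b"<html><body/></html>"

print(check_root(GOOD))
print(check_root(BAD))
```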

3.2. XHTML Family User Agent Conformance

A conforming user agent must meet all of the following criteria:

  1. The user agent must parse and evaluate an XHTML 2 document for well-formedness. If the user agent claims to be a validating user agent, it must also validate documents against a referenced schema according to [XML].
  2. When the user agent claims to support facilities defined within this specification or required by this specification through normative reference, it must do so in ways consistent with the facilities' definition.
  3. When a user agent processes an XHTML 2 document as generic XML, it shall only recognize attributes of type ID (e.g., the id attribute on most XHTML 2 elements) as fragment identifiers.
  4. If a user agent encounters an element it does not recognize, it must continue to process the content of that element.
  5. If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification (i.e., the attribute and its value).
  6. If a user agent encounters an attribute value it doesn't recognize, it must use the default attribute value.
  7. If a user agent encounters an entity reference (other than one of the predefined entities) for which it has processed no declaration (which could happen if the declaration is in the external subset which the user agent hasn't read), the entity reference should be rendered as the characters (starting with the ampersand and ending with the semicolon) that make up the entity reference.
  8. When rendering content, user agents that encounter characters or character entity references that are recognized but not renderable should display the document in such a way that it is obvious to the user that normal rendering has not taken place.
  9. White space is handled according to the rules of [XML]. All elements preserve whitespace.

    The user agent must use the definition from CSS for processing white space characters [CSS3-TEXT].

  10. In the absence of a stylesheet (including in user agents that do not process stylesheets), visual presentation should be as if the user agent used the CSS stylesheet specified in Appendix H.
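
The error-handling criteria 5, 6 and 7 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute table KNOWN_ATTRIBUTES, its values and defaults, and the function names are hypothetical, and a real user agent would apply these rules inside its XML parser rather than on raw text.

```python
import re

# Hypothetical attribute table for this sketch: name -> (allowed values, default).
KNOWN_ATTRIBUTES = {
    "dir": ({"ltr", "rtl"}, "ltr"),
}


def effective_attributes(specified: dict) -> dict:
    """Apply criteria 5 and 6 to the attributes specified on one element."""
    result = {}
    for name, value in specified.items():
        if name not in KNOWN_ATTRIBUTES:
            continue  # criterion 5: ignore the whole attribute specification
        allowed, default = KNOWN_ATTRIBUTES[name]
        # criterion 6: an unrecognized value falls back to the default value
        result[name] = value if value in allowed else default
    return result


# The five entities predefined by XML 1.0.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}


def render_entity_references(text: str, declared: dict) -> str:
    """Criterion 7: keep references without a processed declaration as literal text."""
    def replace(match):
        name = match.group(1)
        if name in PREDEFINED:
            return PREDEFINED[name]
        # An undeclared reference is rendered as its own characters, "&name;".
        return declared.get(name, match.group(0))
    return re.sub(r"&([A-Za-z_][\w.-]*);", replace, text)


print(effective_attributes({"dir": "sideways", "blink": "yes"}))
print(render_entity_references("A &lt; B &copy; &foo;", {"copy": "\u00a9"}))
```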

4. The XHTML 2.0 Document Type

This section is normative.

The XHTML 2.0 document type is a fully functional document type with rich semantics. It is a collection of XHTML-conforming modules (most of which are defined in this specification). The Modules and their elements are listed here for information purposes, but the definitions in their base documents should be considered authoritative. In the on-line version of this document, the module names in the list below link into the definitions of the modules within the relevant version of the authoritative specification.

Document Module
body, head, html, title
Structural Module
address, blockcode, blockquote, div, h, h1, h2, h3, h4, h5, h6, p, pre, section, separator
Text Module
abbr, cite, code, dfn, em, kbd, l, quote, samp, span, strong, sub, sup, var
Hypertext Module
a
List Module
dl, dt, dd, label, nl, ol, ul, li
Core Attributes Module
class, id, and title attributes
Hypertext Attributes Module
href, hreftype, cite, target, rel, rev, access, nextfocus, prevfocus, and xml:base attributes
Internationalization Attribute Module
xml:lang attribute
Bi-directional Text Module
dir attribute
Edit Attributes Module
edit, and datetime attributes
Embedding Attributes Module
src, and type attributes
Image Map Attributes Module
usemap, ismap, shape, and coords attributes
Metainformation Attributes Module
about, content, datatype, property, rel, resource, restype, and rev attributes
Metainformation Module
meta, link
Object Module
object, param, standby
Scripting Module
noscript, script
Style Attribute Module
style attribute
Stylesheet Module
style element
Tables Module
caption, col, colgroup, summary, table, tbody, td, tfoot, th, thead, tr

XHTML 2.0 also uses the following externally defined modules:

Ruby Annotation Module [RUBY]
ruby, rbc, rtc, rb, rt, rp
XForms Module [XFORMS]
action, alert, bind, case, choices, copy, delete, dispatch, extension, filename, group, help, hint, input, insert, instance, item, itemset, label, load, mediatype, message, model, output, range, rebuild, recalculate, refresh, repeat, reset, revalidate, secret, select, select1, send, setfocus, setindex, setvalue, submission, submit, switch, textarea, toggle, trigger, upload, and value elements, and repeat-model, repeat-bind, repeat-nodeset, repeat-startindex, and repeat-number attributes
XML Events Module [XMLEVENTS]
listener element, and defaultAction, event, handler, observer, phase, propagate, and target attributes in the [XMLEVENTS] namespace

An implementation of this document type as a RELAX NG grammar is defined in Appendix B, as an XML Schema in Appendix D, and as a DTD in Appendix F.

4.1. Issues

Identifying XHTML version in absence of DTDs PR #7336
State: Suspended
Resolution: Defer
User: None

Notes:
BAE F2F: for the present DTDs are required for entity resolution. This is a tricky issue, and the working group needs to resolve it quickly. We are asking for input from the Hypertext Coordination Group and others in our quest to sort it out.

5. Module Definition Conventions

This section is normative.

This document defines a variety of XHTML modules and the semantics of those modules. This section describes the conventions used in those module definitions.

5.1. Module Structure

Each module in this document is structured in the following way:

  • An abstract definition of the module's elements, attributes, and content models, as appropriate.
  • A sub-section for each element in the module. These sub-sections contain the following components:
    • A brief description of the element,
    • A definition of each attribute or attribute collection usable with the element, and
    • A detailed description of the behavior of the element, if appropriate.

    Note that attributes are fully defined only the first time they are used in each module. After that, only a brief description of the attribute is provided, along with a link back to the primary definition.

5.2. Abstract Module Definitions

An abstract module is a definition of an XHTML module using prose text and some informal markup conventions. While such a definition is not generally useful in the machine processing of document types, it is critical in helping people understand what is contained in a module. This section defines the way in which XHTML abstract modules are defined. An XHTML-conforming module is not required to provide an abstract module definition. However, anyone developing an XHTML module is encouraged to provide an abstraction to ease the use of that module.

5.3. Syntactic Conventions

The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions. These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.

element name
When an element is included in a content model, its explicit name will be listed.
content set
Some modules define lists of explicit element names called content sets. When a content set is included in a content model, its name will be listed.
expr ?
Zero or one instances of expr are permitted.
expr +
One or more instances of expr are required.
expr *
Zero or more instances of expr are permitted.
a , b
Expression a is required, followed by expression b.
a | b
Either expression a or expression b is required.
a - b
Expression a is permitted, omitting elements in expression b.
parentheses
When an expression is contained within parentheses, evaluation of any subexpressions within the parentheses takes place before evaluation of expressions outside of the parentheses (starting at the deepest level of nesting first).
extending pre-defined elements
In some instances, a module adds attributes to an element. In these instances, the element name is followed by an ampersand (&).
defining required attributes
When an element requires the definition of an attribute, that attribute name is followed by an asterisk (*).
defining the type of attribute values
When a module defines the type of an attribute value, it does so by listing the type in parentheses after the attribute name.
defining the legal values of attributes
When a module defines the legal values for an attribute, it does so by listing the explicit legal values (enclosed in quotation marks), separated by vertical bars (|), inside of parentheses following the attribute name. If the attribute has a default value, that value is followed by an asterisk (*). If the attribute has a fixed value, the attribute name is followed by an equals sign (=) and the fixed value enclosed in quotation marks.
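
The following non-normative sketch illustrates how the sequence, choice, and repetition operators above compose. It translates an abstract content model into a Python regular expression over a list of child names. The helper names model_to_regex and matches are illustrative assumptions and not part of this specification; the sketch does not handle the "a - b" exclusion operator or the attribute notation.

```python
import re

# Tokens: element or content-set names, plus the operators ( ) , | ? * +
_TOKEN = re.compile(r"[A-Za-z_][\w:.]*|[(),|?*+]")

def model_to_regex(model):
    """Translate an abstract content model into a regular expression.

    Hypothetical helper: each name becomes a group matching that name
    followed by one space, so a child list can be matched as a string.
    """
    parts = []
    for token in _TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return "".join(parts)

def matches(model, children):
    """Return True if the list of child names satisfies the model."""
    text = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), text) is not None

print(matches("head, body", ["head", "body"]))   # True
print(matches("head, body", ["body", "head"]))   # False
print(matches("(PCDATA | Text)*", []))           # True
```

Content set names such as Text are treated here as opaque tokens; a fuller checker would first expand each content set into the element names it lists.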

5.4. Content Types

Abstract module definitions define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.

5.5. Attribute Types

In some instances, it is necessary to define the types of attribute values or the explicit set of permitted values for attributes. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the abstract modules:

Attribute Type Definition
CDATA Character data
ID A document-unique identifier
IDREF A reference to a document-unique identifier
IDREFS A space-separated list of references to document-unique identifiers
NMTOKEN A name composed of only name tokens as defined in XML 1.0 [XML].
NMTOKENS One or more white space separated NMTOKEN values
NUMBER Sequence consisting only of digits ([0-9])
PCDATA Processed character data

In addition to these pre-defined data types, XHTML Modularization defines the following data types and their semantics (as appropriate):

Data type Description
Charset A character encoding, as per [RFC2045].
ContentTypes A list of media ranges with optional accept parameters, as defined in section 14.1 of [RFC2616] as the field value of the Accept request header.
Coordinates Comma-separated list of Lengths used in defining areas.
Datetime Date and time information, as defined by the type dateTime in [XMLSCHEMA].
HrefTarget Name used as destination for results of certain actions, with legal values as defined by NMTOKEN.
LanguageCode A language code, as per [RFC3066].
LanguageCodes A comma-separated list of language ranges with optional accept parameters, as defined in section 14.5 of [RFC2616] as the field value of the Accept-Language request header.
Length The value may be either in pixels or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
LocationPath A location path as defined in [XPATH].
MediaDesc A comma-separated list of media descriptors as described by [CSS2]. The default is all.
Number One or more digits
QName An [XMLNS]-qualified name. See QName for a formal definition.
Text Arbitrary textual data, likely meant to be human-readable.
URI A Uniform Resource Identifier Reference, as defined by the type anyURI in [XMLSCHEMA].
URIs A space-separated list of URIs as defined above.
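
As a non-normative illustration, a few of the simpler types above can be checked with regular expressions. This is a sketch under stated assumptions: the NMTOKEN pattern is an ASCII-only approximation of the XML 1.0 production, the lexical form assumed for Length (digits with an optional "%") is not spelled out in this section, and all function names are hypothetical.

```python
import re

NUMBER = re.compile(r"[0-9]+")  # one or more digits
# ASCII-only approximation of an XML 1.0 name token (assumption).
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")
# Assumed lexical form of a Length: digits, optionally followed by "%".
LENGTH = re.compile(r"([0-9]+)(%?)")

def is_number(value):
    """True if value consists only of digits."""
    return NUMBER.fullmatch(value) is not None

def is_nmtokens(value):
    """True if value is one or more white-space-separated NMTOKENs."""
    tokens = value.split()
    return bool(tokens) and all(NMTOKEN.fullmatch(t) for t in tokens)

def parse_length(value):
    """Return (amount, unit), where unit is 'px' or '%'."""
    match = LENGTH.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a Length: %r" % value)
    return int(match.group(1)), "%" if match.group(2) else "px"

def parse_coordinates(value):
    """Split a comma-separated Coordinates value into Lengths."""
    return [parse_length(part) for part in value.split(",")]

print(parse_length("50%"))              # (50, '%')
print(parse_coordinates("0,0,50%,100"))
```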

6. XHTML Attribute Collections

This section is normative.

Many of the modules in this document define the required attributes for their elements. The elements in those modules may also reference zero or more attribute collections. Attribute collections are defined in their own modules, but the meta collection "Common" is defined in this section. The table below summarizes the attribute collections available.

Collection Module Description
Core Core Attributes Module Basic attributes used to identify and classify elements and their content.
I18N Internationalization Attribute Module Attribute to identify the language of an element and its contents.
Bi-directional Bi-directional Text Collection Attributes used to manage bi-directional text.
Edit Edit Attributes Module Attributes used to annotate when and how an element's content was edited.
Embedding Embedding Attributes Module Attributes used to embed content from other resources within the current element.
Events XML Events Module Attributes that allow associating of events and event processing with an element and its contents.
Hypertext Hypertext Attributes Module Attributes that designate characteristics of links within and among documents.
Map Image Map Attributes Module Attributes for defining and referencing client-side image maps.
Metainformation Metainformation Attributes Attributes that allow associating of elements with metainformation about those elements.
Style Style Attribute Module An attribute for associating style information with an element and its contents.
Common Attribute Collections Module A meta-collection of all the other collections, including the Core, I18N, Events, Edit, Embedding, Map, Metainformation, Style, Bi-directional, and Hypertext attribute collections.

Implementation: RELAX NG

Each of the attributes defined in an XHTML attribute collection is available when its corresponding module is included in an XHTML Host Language or an XHTML Integration Set (see [XHTMLMOD]). In such a situation, the attributes are available for use on elements that are NOT in the XHTML namespace when they are referenced using their namespace-qualified identifier (e.g., xhtml:id). The semantics of the attributes remain the same regardless of whether they are referenced using their qualified identifier or not. If both the qualified and non-qualified identifier for an attribute are used on the same XHTML namespace element, the behavior is unspecified.
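
The following non-normative sketch shows how a processor might read a namespace-qualified XHTML attribute from an element outside the XHTML namespace. It uses Python's standard xml.etree.ElementTree module; the "ex" vocabulary and its namespace URI are hypothetical and serve only as an example.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

# The "ex" vocabulary and its namespace URI are hypothetical.
DOC = """\
<ex:note xmlns:ex="http://example.com/ns/notes"
         xmlns:xhtml="http://www.w3.org/2002/06/xhtml2"
         xhtml:id="n1">Remember the milk</ex:note>
"""

note = ET.fromstring(DOC)
# ElementTree exposes qualified attributes in "{namespace}local" form.
print(note.get("{%s}id" % XHTML2_NS))  # n1
# The unqualified attribute is a different name and is absent here.
print(note.get("id"))                  # None
```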

6.1. Issues

xml:id PR #7442
State: Suspended
Resolution: Defer
User: None

Notes:
If xml:id becomes a stable document in time for use in this document, we will migrate to its use.

7. XHTML Document Module

This section is normative.

The Document Module defines the major structural elements for XHTML. These elements effectively act as the basis for the content model of many XHTML family document types. The elements and attributes included in this module are:

Elements Attributes Minimal Content Model
html Common, version (CDATA), xmlns (URI = "http://www.w3.org/2002/06/xhtml2"), xsi:schemaLocation (URIs = "http://www.w3.org/2002/06/xhtml2 TBD") head, body
head Common title
title Common (PCDATA | Text)*
body Common ( Heading | Structural | List)*

This module is the basic structural definition for XHTML content. The html element acts as the root element for all XHTML Family Document Types.

Note that the value of the xmlns declaration is defined to be "http://www.w3.org/2002/06/xhtml2". Also note that because the xmlns declaration is treated specially by XML namespace-aware parsers [XMLNS], it is legal to have it present as an attribute of each element. However, any time the xmlns declaration is used in the context of an XHTML module, whether with a prefix or not, the value of the declaration shall be the XHTML namespace defined here.
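
As a non-normative illustration, the sketch below checks that a document's root is the html element in the namespace given above and that its children follow the minimal content model "head, body". The function name check_document and the wording of its messages are assumptions made for this example; it is not a conformance test.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

def check_document(source):
    """Return a list of problems found in the document structure."""
    root = ET.fromstring(source)
    problems = []
    if root.tag != "{%s}html" % XHTML2_NS:
        problems.append("root element is not html in the XHTML 2 namespace")
    children = [child.tag for child in root]
    expected = ["{%s}head" % XHTML2_NS, "{%s}body" % XHTML2_NS]
    if children != expected:
        problems.append("html must contain head followed by body")
    return problems

GOOD = """\
<html xmlns="http://www.w3.org/2002/06/xhtml2">
  <head><title>My Life</title></head>
  <body><p>A paragraph</p></body>
</html>
"""
print(check_document(GOOD))  # []
# Changing the namespace name makes both checks fail.
print(check_document(GOOD.replace("2002/06/xhtml2", "1999/xhtml")))
```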

Implementation: RELAX NG

7.1. The html element

After the document type declaration, the remainder of an XHTML document is contained by the html element.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext
version = CDATA
The value of this attribute specifies which XHTML Family document type governs the current document. The format of this attribute value is unspecified.

Need a normative definition for the version attribute

The version attribute needs a machine processable format so that document processors can reliably determine that the document is an XHTML Family conforming document.
xsi:schemaLocation = URIs
This attribute allows the specification of a location where an XML Schema [XMLSCHEMA] for the document can be found. The syntax of this attribute is defined in xsi_schemaLocation. The behavior of this attribute in XHTML documents is defined in Strictly Conforming Documents.
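
A non-normative sketch of reading this attribute follows. It relies on the XML Schema convention that the value is a white-space-separated list of namespace name and location pairs. The schema URL used is a placeholder, since this draft still lists the location as "TBD", and the function name schema_locations is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(root):
    """Map each namespace name to its schema location hint."""
    value = root.get("{%s}schemaLocation" % XSI_NS, "")
    items = value.split()
    if len(items) % 2:
        raise ValueError("schemaLocation must contain URI pairs")
    return dict(zip(items[0::2], items[1::2]))

# The schema URL below is a placeholder; this draft leaves it as "TBD".
DOC = """\
<html xmlns="http://www.w3.org/2002/06/xhtml2"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2
                          http://example.com/xhtml2.xsd">
  <head><title>My Life</title></head>
  <body><p>A paragraph</p></body>
</html>
"""
print(schema_locations(ET.fromstring(DOC)))
```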

7.2. The head element

The head element contains information about the current document, such as its title, that is not considered document content. The default presentation of the head is not to display it; however, that can be overridden with a stylesheet for special-purpose use. User agents may, however, make information in the head available to users through other mechanisms.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext
Example:
<head>
    <title>My Life</title>
</head>

7.3. The title element

Every XHTML document must have a title element in the head section.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

The title element is used to identify the document. Since documents are often consulted out of context, authors should provide context-rich titles. Thus, instead of a title such as "Introduction", which doesn't provide much contextual background, authors should supply a title such as "Introduction to Medieval Bee-Keeping" instead.

For reasons of accessibility, user agents must always make the content of the title element available to users. The mechanism for doing so depends on the user agent (e.g., as a caption, spoken).

Example:
<title>A study of population dynamics</title>

7.4. The body element

The body of a document contains the document's content. The content may be processed by a user agent in a variety of ways. For example, a visual browser may present it as text, images, colors, and graphics; an audio user agent may speak the same content; and a search engine may create an index prioritized according to where text appears.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext
Example:
<body id="theBody">
    <p>A paragraph</p>
</body>

8. XHTML Structural Module

This section is normative.

This module defines all of the basic text container elements, attributes, and their content models that are structural in nature.

Element Attributes Minimal Content Model
address Common (PCDATA | Text)*
blockcode Common (PCDATA | Text | Heading | Structural | List)*
blockquote Common (PCDATA | Text | Heading | Structural | List)*
div Common (PCDATA | Flow)*
h Common (PCDATA | Text)*
h1 Common (PCDATA | Text)*
h2 Common (PCDATA | Text)*
h3 Common (PCDATA | Text)*
h4 Common (PCDATA | Text)*
h5 Common (PCDATA | Text)*
h6 Common (PCDATA | Text)*
p Common (PCDATA | Text | List | blockcode | blockquote | pre )*
pre Common (PCDATA | Text)*
section Common (PCDATA | Flow)*
separator Common EMPTY

The content model for this module defines some content sets:

Heading
h | h1 | h2 | h3 | h4 | h5 | h6
Structural
address | blockcode | blockquote | div | p | pre | section | separator |
Flow
Heading | Structural | Text

Implementation: RELAX NG

8.1. The address element

The address element may be used by authors to supply contact information for a document or a major part of a document such as a form. This element often appears at the beginning or end of a document.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext
Example:
<address href="mailto:webmaster@example.net">Webmaster</address>

8.2. The blockcode element

This element indicates that its contents are a block of "code" (see the code element). This element is similar to the pre element, in that whitespace in the enclosed text has semantic relevance. The whitespace should normally be included in visual renderings of the content.

Non-visual user agents are not required to respect extra white space in the content of a blockcode element.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

The following example shows a code fragment:

<blockcode class="Perl">
sub squareFn {
    my $var = shift;

    return $var * $var ;
}
</blockcode>

Here is how this might be rendered:

sub squareFn {
    my $var = shift;

    return $var * $var ;
}

8.3. The blockquote element

This element designates a block of quoted text.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.

<blockquote cite="http://www.example.com/tolkien/twotowers.html">
<p>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</p>
</blockquote>

8.4. The div element

The div element, in conjunction with the id and class attributes, offers a generic mechanism for adding extra structure to documents. This element defines no presentational idioms on the content. Thus, authors may use this element in conjunction with style sheets, the xml:lang attribute, etc., to tailor XHTML to their own needs and tastes.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

For example, suppose you wish to make a presentation in XHTML, where each slide is enclosed in a separate element. You could use a div element, with a class of slide:

<body>
    <h>The meaning of life</h>
    <p>By Huntington B. Snark</p>
    <div class="slide">
        <h>What do I mean by "life"?</h>
        <p>....</p>
    </div>
    <div class="slide">
        <h>What do I mean by "mean"?</h>
        ...
    </div>
    ...
</body>

8.5. The heading elements

A heading element briefly describes the topic of the section it introduces. Heading information may be used by user agents, for example, to construct a table of contents for a document automatically.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

There are two styles of headings in XHTML: the numbered versions h1, h2 etc., and the structured version h, which is used in combination with the section element.

There are six levels of numbered headings in XHTML, with h1 as the most important and h6 as the least. Visual presentations can render more important headings in larger fonts than less important ones.

Structured headings use the single h element, in combination with the section element to indicate the structure of the document, and the nesting of the sections indicates the importance of the heading. The heading for the section is the one that is a child of the section element.

For example:

<body>
<h>This is a top level heading</h>
<p>....</p>
<section>
    <p>....</p>
    <h>This is a second-level heading</h>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
</section>
<section>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
    <section>
        <h>This is a third-level heading</h>
        <p>....</p>
    </section>
</section>
</body>

The visual representation of these levels can be distinguished in a style sheet:

h {font-family: sans-serif; font-weight: bold; font-size: 200%}
section h {font-size: 150%} /* A second-level heading */
section section h {font-size: 120%} /* A third-level heading */

etc.
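
The relationship between section nesting and heading level can also be computed directly. The following non-normative sketch assumes that the level of an h element is one more than the number of section elements enclosing it; the function name heading_levels is hypothetical, and the namespace declaration is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

def heading_levels(element, depth=1):
    """Yield (level, text) for each h element, in document order.

    The level is one more than the number of enclosing section elements.
    """
    for child in element:
        if child.tag == "h":
            yield depth, child.text
        elif child.tag == "section":
            yield from heading_levels(child, depth + 1)
        else:
            yield from heading_levels(child, depth)

# Namespace declaration omitted to keep the sketch short.
DOC = """\
<body>
  <h>Top</h>
  <section>
    <h>Second</h>
    <section>
      <h>Third</h>
    </section>
  </section>
</body>
"""
print(list(heading_levels(ET.fromstring(DOC))))
# [(1, 'Top'), (2, 'Second'), (3, 'Third')]
```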

Numbered sections and references
XHTML does not itself cause section numbers to be generated from headings. Style sheet languages such as CSS, however, allow authors to control the generation of section numbers.

Skipping heading levels is considered bad practice. The series h1 h2 h1 is acceptable, while h1 h3 h1 is not, since the heading level h2 has been skipped.
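
An authoring tool could flag skipped levels along the following lines. This non-normative sketch assumes, based on the two series above, that a heading may be at most one level deeper than the heading before it and that returning to any shallower level is acceptable. The function name skipped_levels is hypothetical.

```python
def skipped_levels(levels):
    """Return (previous, current) pairs where a heading level was skipped.

    A heading may go at most one level deeper than the heading before it;
    returning to any shallower level is always allowed.
    """
    skips = []
    previous = None
    for level in levels:
        if previous is not None and level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips

print(skipped_levels([1, 2, 1]))  # []
print(skipped_levels([1, 3, 1]))  # [(1, 3)]
```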

8.6. The p element

The p element represents a paragraph.

In comparison with earlier versions of HTML, where a paragraph could only contain inline text, XHTML2's paragraphs represent the conceptual idea of a paragraph, and so may contain lists, blockquotes, pre's and tables as well as inline text. They may not, however, contain directly nested p elements.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext
Example:
<p>Payment options include:
<ul>
<li>cash</li>
<li>credit card</li>
<li>luncheon vouchers.</li>
</ul>
</p>

8.7. The pre element

The pre element indicates that whitespace in the enclosed text has semantic relevance, and will normally be included in visual renderings of the content.

Note that all elements in the XHTML family preserve their whitespace in the document, which is only removed on rendering, via a stylesheet, according to the rules of CSS [CSS]. This means that in principle any elements may preserve or collapse whitespace on rendering, under control of a stylesheet. Also note that there is no normative requirement that the <pre> element be rendered in a monospace font (although this is the default rendering), nor that text wrapping be disabled.

Non-visual user agents are not required to respect extra white space in the content of a pre element.

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding,