The structure of the XML edition of the British National Corpus is described by a single XML schema, which is expressed in three different schema languages: the traditional DTD language which XML inherits from SGML; the more recently defined ISO schema language known as RELAX NG; and the schema language defined by the W3C. The three schema files are all generated from the same TEI-conformant XML source file, which is also used to generate the present documentation.
This section of the document contains the TEI-conformant reference specification for all components of the BNC schema. These include definitions for attribute classes, model classes, and macro patterns as well as definitions for elements and their associated attributes and possible value lists. A full description of these concepts and how they are used to define and document XML encoding schemes is given by the TEI Guidelines (in particular, in chapter TD); the following summary provides only basic information about them.
When several elements in a schema share attributes of the same name, with
values drawn from a common set, they are considered to form an
attribute class. The members of such a class can then
all reference the same class definition rather than each repeating the
same information. In the BNC, for example, the elements
and half a dozen others, all have the same attribute
rend which takes a coded value taken from the same short list of
possibilities. Rather than repeat this definition half a dozen times
therefore, the relevant elements are all said to be members of a class
att.rendered, which is defined independently of those
elements (but includes a list of its members). In the same way,
<mw> elements, as members of the
att.c5coded class, share the same definition for the
possible CLAWS5 codes specified by their c5 attribute. Note
however that the element
<c>, although it has an attribute
c5, is not a member of this class because the possible
values for this attribute on this element are entirely different.
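The idea of an attribute class can be sketched in a few lines of Python. This is only an illustration, not part of the BNC schema or of any TEI tool: the AttributeClass type and its allows method are hypothetical names, the three CLAWS5 codes are a small illustrative subset rather than the full value list, and only <mw> is listed as a member because it is the only member named in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeClass:
    """A named group of attribute definitions shared by several elements."""

    name: str
    # Maps each attribute name to the set of values it may take.
    attributes: dict
    # Names of the elements that inherit these attribute definitions.
    members: set = field(default_factory=set)

    def allows(self, element: str, attribute: str, value: str) -> bool:
        """True if `element` is a member and `value` is permitted for `attribute`."""
        if element not in self.members:
            return False
        return value in self.attributes.get(attribute, frozenset())


# Assumption: an illustrative subset of CLAWS5 codes, not the real value list.
att_c5coded = AttributeClass(
    name="att.c5coded",
    attributes={"c5": frozenset({"NN1", "VVB", "AJ0"})},
    members={"mw"},
)

# <mw> takes its c5 definition from the class.
mw_ok = att_c5coded.allows("mw", "c5", "NN1")

# <c> also has a c5 attribute, but it is not a member of the class, so the
# class definition says nothing about it; its values are defined separately.
c_ok = att_c5coded.allows("c", "c5", "NN1")
```

The definition is stated once, on the class; adding an element to members is all that is needed for it to share the same attribute and value list.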
In any reasonably large schema, and particularly one derived from
the TEI model, several elements are likely to have very similar
content models, since it will often be the case that at a given point
in the document hierarchy any one of several possible elements will be
permissible. The specific subset of elements (
<c> and a few others) which may appear within an
<s> element in the BNC, is different from the subsets of elements
which may appear within a
However, there are several elements which can appear in the same
places as a
<p>. Following TEI practice, we call
the set of elements which can appear together (in sequence or
alternation) at a specific place in the document hierarchy a
model class. For example, since
<sp> are all permitted as immediate components of a
<div> element, we define a class model.divPart, of which these
six elements are all members. Wherever convenient, content models are
defined in terms of these model classes.
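As a rough sketch of how a content model defined in terms of model classes might be resolved, consider the following Python fragment. It is an illustration under stated assumptions, not the actual schema: the dictionaries and the permitted_children function are hypothetical, only <sp> is taken from the text above as a member of model.divPart, and the inclusion of <p> is an assumption made to keep the example non-trivial.

```python
# Model classes: class name -> names of the elements that belong to it.
# Assumption: the real model.divPart has six members; only "sp" is taken
# from the text, and "p" is included here purely for illustration.
MODEL_CLASSES = {
    "model.divPart": {"sp", "p"},
}

# Content models: element name -> list of references. A reference is either
# the name of a model class or the name of a single element.
CONTENT_MODELS = {
    "div": ["model.divPart"],
}


def permitted_children(element):
    """Expand an element's content model into the set of permitted child names."""
    allowed = set()
    for ref in CONTENT_MODELS.get(element, []):
        # A reference that is not a known class is treated as an element name.
        allowed |= MODEL_CLASSES.get(ref, {ref})
    return allowed


div_children = permitted_children("div")
```

Because <div> refers to the class rather than to its members, changing the membership of model.divPart changes what <div> may contain without touching the content model of <div> itself.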
As noted above, this usage of model classes is a distinctive and pervasive feature of the TEI encoding scheme. Because the BNC derives from the TEI scheme, it uses the same names and (as far as is practicable) the same model classes throughout. Although this introduces an occasionally redundant degree of indirection in the resulting schema, it also makes clearer the relationship between the components defined for the BNC and their origins in the TEI scheme.
Finally, we define here a few macros for commonly encountered content models. These are also taken from the TEI encoding scheme, though in a few cases with different meanings. In the TEI, for example, the macro macro.phraseSeq is defined as a mixture of various ‘phrase level’ elements and plain text; in the BNC scheme, it has been redefined as plain text only. The places where this macro is referenced, however, are unchanged; in this respect, therefore, the BNC schema is a proper subset of the full TEI schema.
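The effect of redefining a macro while leaving its references untouched can be sketched as follows. This is a simplified illustration only: the expand function is hypothetical, "#text" stands for plain text, and "phrase-level-elements" is a placeholder for the phrase level elements of the TEI definition, which is not reproduced here.

```python
def expand(model, macros):
    """Replace every macro reference in a content model by its definition."""
    expanded = []
    for item in model:
        if item in macros:
            expanded.extend(expand(macros[item], macros))
        else:
            expanded.append(item)
    return expanded


# Assumption: a deliberately simplified stand-in for the TEI definition.
TEI_LIKE_MACROS = {"macro.phraseSeq": ["#text", "phrase-level-elements"]}

# The BNC redefinition described above: plain text only.
BNC_MACROS = {"macro.phraseSeq": ["#text"]}

# The content model of some (hypothetical) element is the same in both
# schemes: it simply refers to the macro by name.
CONTENT_MODEL = ["macro.phraseSeq"]

tei_allowed = set(expand(CONTENT_MODEL, TEI_LIKE_MACROS))
bnc_allowed = set(expand(CONTENT_MODEL, BNC_MACROS))

# Everything the BNC definition permits is also permitted by the TEI-like one.
is_proper_subset = bnc_allowed < tei_allowed
```

Only the macro definition differs between the two dictionaries; the content model that references it is identical, which is why the narrower definition yields a subset of what the broader one allows.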
The remainder of this section lists in alphabetical order all of the attribute classes, model classes, elements, and macros defined for the BNC encoding scheme, using a method of display similar to that of the full TEI Guidelines. For each component, we give a brief description and also a usage example. Note that many of the elements listed here appear only in the corpus header rather than in the texts, and may thus be safely disregarded by applications which operate on the texts alone or in isolation.