Compatibility issues
The first version of the BNC was released slightly in advance of the publication of the TEI's definitive Recommendations, and over a year before publication of the Corpus Encoding Standard. Although all three standards have much in common (in particular, CDIF - Corpus Document Interchange Format - the BNC's own initial DTD, was influential in the design of the other two), they are not identical. Several elements are named differently, and some, more significantly, have different content models or attributes.
In the present release of the Corpus, considerable effort has been made to improve compatibility of the BNC DTD with TEI and with CES, while retaining as far as possible a degree of compatibility with CDIF. The objective was to ensure that a document which conformed to the BNC's DTD would also conform to either of the other two standards, rather than to ensure that any CES or TEI conformant document would also be BNC conformant. This necessarily involved some modification of the original tagging of the corpus, which is detailed in this section.
Differences between the BNC DTD and TEI
The present version of the BNC document type declaration (DTD) can be expressed as a set of extensions to the standard TEI DTD, using the extension mechanism recommended by that standard. Full details of the procedure are given in chapter 3 of the TEI Guidelines. Essentially, the procedure requires the definition of two ‘extension’ files, called here bncMods.ent and bncMods.dtd, the former containing definitions of parameter entities needed for this set of extensions, and the latter containing the actual SGML element and attribute definitions which make up the required modifications. Copies of these files are included in the present release, along with the DTD derived from them. The present section describes their content informally.
Together with the standard TEI DTD, these files can be compiled into a single-file version of the DTD, in which all parameter entity references have been resolved and any redundant declarations removed, using software such as the TEI PizzaChef.
The extensions serve the following purposes:
- to exclude from the DTD a large number of standard TEI elements which are not actually used in the BNC DTD;
- to provide alternative names for some standard TEI elements;
- to exclude from the TEI DTD some elements which are redefined, either with a stricter content model, or with differing attribute lists, in the BNC DTD;
- to specify the location within the TEI class system of some elements not defined in the TEI DTD.
The following is a complete list of standard TEI elements excluded in this way: <ab>, <abbr>, <add>, <affiliation>, <alt>, <altGrp>, <anchor>, <argument>, <authority>, <back>, <biblFull>, <birth>, <broadcast>, <byline>, <cb>, <channel>, <cit>, <cl>, <constitution>, <correction>, <dateline>, <dateRange>, <del>, <derivation>, <distinct>, <div0>, <div5>, <div6>, <div7>, <divGen>, <docAuthor>, <docDate>, <docEdition>, <docImprint>, <docTitle>, <domain>, <education>, <emph>, <epigraph>, <equipment>, <expan>, <factuality>, <firstLang>, <foreign>, <front>, <fsdDecl>, <funder>, <gloss>, <group>, <headLabel>, <headItem>, <hyphenation>, <index>, <interp>, <interpGrp>, <interpretation>, <join>, <joinGrp>, <kinesic>, <link>, <linkGrp>, <m>, <measure>, <meeting>, <metDecl>, <mentioned>, <milestone>, <normalization>, <notesStmt>, <num>, <opener>, <orig>, <personGrp>, <phr>, <postBox>, <postCode>, <preparedness>, <principal>, <purpose>, <q>, <quotation>, <rendition>, <residence>, <rs>, <reg>, <scriptStmt>, <seg>, <segmentation>, <series>, <seriesStmt>, <signed>, <soCalled>, <socecStatus>, <span>, <spanGrp>, <sponsor>, <state>, <stdVals>, <step>, <street>, <symbol>, <textDesc>, <time>, <timeRange>, <titlePage>, <titlePart>, <trailer>, <variantEncoding>, <when>, <writing>, <xptr>, <xref>.
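By way of illustration, the exclusion of an element under the TEI extension mechanism is understood to take the form of a parameter entity, named after the element, whose value is IGNORE. The following Python fragment is a minimal sketch which generates declarations of that form for a list of element names. The function name ignore_declarations is invented for this example, and the exact form of the declarations is an assumption: the extension files supplied with the release remain the authoritative source.

```python
def ignore_declarations(element_names):
    """Return one parameter entity declaration per element name.

    Assumption: an element is excluded by declaring a parameter
    entity of the same name with the value IGNORE.
    """
    return "\n".join(
        "<!ENTITY % {} 'IGNORE'>".format(name) for name in element_names
    )


# A few names taken from the list of excluded elements above.
print(ignore_declarations(["ab", "abbr", "add"]))
```

Because every declaration has the same shape, a list as long as the one above is more reliably generated than typed by hand.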
The first extension file also contains IGNORE declarations like those above, which have the effect of removing the existing definitions for 22 TEI elements which are to be redefined. The redefinitions are provided in the second of the two BNC extension files, bncExtn.dtd, along with definitions for some new elements not otherwise available. Their effects are summarized in the following table.
The extension mechanism and general conformance issues relating to the use of the TEI are discussed in detail in chapters 28 and 29 of the TEI Guidelines and are not considered further here. For an explanation of the mechanisms used above, the detailed presentation of the general organization of the TEI DTD provided in chapter 3 of the Guidelines may also be helpful.
Differences between the BNC DTD and CDIF
This section lists significant differences between the current BNC DTD and CDIF 1.0. It lists elements whose names have been changed, elements whose attributes have changed, and elements whose content has been changed in such a way that CDIF-conformant files will not parse against the new DTD.
- <avail> is now <availability>
- <biblscop> is now <biblScope>
- <biblstr> is now <biblStruct>
- <bibnote> is now <note>
- <clasdecl> is now <classDecl>
- <corr> is now an <item> within <encodingDesc>
- <editdecl> is now <editorialDecl>
- <ednstmt> is now <editionStmt>
- <encdesc> is now <encodingDesc>
- <header> is now <teiHeader>
- <hyph> is now an <item> within <encodingDesc>
- <partics> is now <particDesc>
- <profdesc> is now <profileDesc>
- <projdesc> is now <projectDesc>
- <pubstmt> is now <publicationStmt>
- <quot> is now an <item> within <encodingDesc>
- <rec> is now <recording>
- <recstmt> is now <recordingStmt>
- <reg> is now <corr>
- <relation> is now <particLinks>
- <revdesc> is now <revisionDesc>
- <segm> is now an <item> within <encodingDesc>
- <settdesc> is now <settingDesc>
- <srcdesc> is now <sourceDesc>
- <titstmt> is now <titleStmt>
- <txtclass> is now <textClass>
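The simple renamings in this list lend themselves to mechanical conversion. The following Python fragment is a minimal sketch only and is not part of the BNC release. The names RENAMES, RESTRUCTURED and rename_tags are invented for illustration, and the sketch assumes that every start-tag and end-tag is written out in full, with no SGML tag minimization. The four elements which become an <item> within <encodingDesc> require restructuring rather than renaming, and are deliberately left untouched.

```python
import re

# Simple renamings from the list above (CDIF name -> current name).
RENAMES = {
    "avail": "availability",
    "biblscop": "biblScope",
    "biblstr": "biblStruct",
    "bibnote": "note",
    "clasdecl": "classDecl",
    "editdecl": "editorialDecl",
    "ednstmt": "editionStmt",
    "encdesc": "encodingDesc",
    "header": "teiHeader",
    "partics": "particDesc",
    "profdesc": "profileDesc",
    "projdesc": "projectDesc",
    "pubstmt": "publicationStmt",
    "rec": "recording",
    "recstmt": "recordingStmt",
    "reg": "corr",
    "relation": "particLinks",
    "revdesc": "revisionDesc",
    "settdesc": "settingDesc",
    "srcdesc": "sourceDesc",
    "titstmt": "titleStmt",
    "txtclass": "textClass",
}

# Elements which become an <item> within <encodingDesc>: not handled here.
RESTRUCTURED = {"corr", "hyph", "quot", "segm"}

# Matches the opening of a start-tag or end-tag and captures the element name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.\-]*)")


def rename_tags(text):
    """Rename CDIF element names in a single pass over the text."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        # Names are looked up in lower case; unknown names are kept as found.
        return "<" + slash + RENAMES.get(name.lower(), name)
    return TAG.sub(swap, text)


print(rename_tags("<header><titstmt>x</titstmt></header>"))
# prints <teiHeader><titleStmt>x</titleStmt></teiHeader>
```

The substitution is made in a single pass so that an element renamed to <corr> is not processed a second time. Since a CDIF <corr> is left unchanged by this sketch, it would become indistinguishable from a renamed <reg>; any real conversion would therefore need to deal with the elements in RESTRUCTURED first.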
The following elements have significantly different attributes:
- <activity> has acquired the attribute spont, formerly present on its parent <setting>
- <sic>, <gap>, and <corr> have all acquired attributes resp (rather than ed)
- <corr> and <sic> no longer have a cause attribute
- <gap> has acquired the attribute reason (in place of cause)
- the complete attribute has been removed from <text> and <stext>
- <w> has a different set of values for its type attribute
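Some of these attribute changes can be expressed as a small lookup table. The following Python fragment is a sketch under stated assumptions, not a description of any conversion actually performed on the corpus. The names ATTR_RENAMES, ATTR_REMOVED and update_attributes are invented for illustration, element names are those of the current DTD, and attribute values are assumed to carry over unchanged, which the list above does not confirm. The transfer of spont from <setting> to <activity> and the revised values of type on <w> are not covered.

```python
# (element, old attribute) -> new attribute name
ATTR_RENAMES = {
    ("sic", "ed"): "resp",
    ("gap", "ed"): "resp",
    ("corr", "ed"): "resp",
    ("gap", "cause"): "reason",
}

# (element, attribute) pairs which no longer exist
ATTR_REMOVED = {
    ("corr", "cause"),
    ("sic", "cause"),
    ("text", "complete"),
    ("stext", "complete"),
}


def update_attributes(element, attrs):
    """Return a copy of attrs with renamed and removed attributes applied."""
    updated = {}
    for name, value in attrs.items():
        key = (element, name)
        if key in ATTR_REMOVED:
            continue  # attribute dropped from the current DTD
        # Assumption: the value is kept as it stands under the new name.
        updated[ATTR_RENAMES.get(key, name)] = value
    return updated


print(update_attributes("gap", {"cause": "illegible", "ed": "OUP"}))
```

The attribute values in the usage line are placeholders chosen for the example and are not taken from the corpus.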