Formal Specification of the BNC XML schema
The structure of the XML edition of the British National Corpus is described by means of a single XML schema, which is however expressed in three different schema languages: the traditional DTD language which XML inherits from SGML; the more recently defined ISO schema language known as RELAX NG; and the W3C-defined schema language. The three schema files are all generated from the same TEI-conformant XML source file, which is also used to generate the present documentation.
This section of the document contains the TEI-conformant reference specification for all components of the BNC schema. These include definitions for attribute classes, model classes, and macro patterns as well as definitions for elements and their associated attributes and possible value lists. A full description of these concepts and how they are used to define and document XML encoding schemes is given by the TEI Guidelines (in particular, in chapter TD); the following summary provides only basic information about them.
When several elements in a schema share attributes of the same name, with values drawn from a common set, they are considered to form an attribute class. The members of such a class can then all reference the same class definition rather than each repeating the same information. In the BNC, for example, the elements <bibl>, <corr>, <div>, <head>, <hi>, and half a dozen others all have the same attribute rend, which takes a coded value taken from the same short list of possibilities. Rather than repeat this definition half a dozen times, therefore, the relevant elements are all said to be members of a class att.rendered, which is defined independently of those elements (but includes a list of its members). In the same way, the <w> and <mw> elements, as members of the att.c5coded class, share the same definition for the possible CLAWS5 codes specified by their c5 attribute. Note however that the element <c>, although it has an attribute c5, is not a member of this class because the possible values for this attribute on this element are entirely different.
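As a rough illustration of how an application might exploit the att.c5coded class, the following Python sketch collects the c5 values of <w> and <mw> elements while leaving <c> alone. The sample sentence, the function name c5_codes, the n attribute on <s>, and the PUN value shown on <c> are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sentence; real BNC texts may carry further attributes.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0">The </w>'
    '<w c5="NN1">goose </w>'
    '<w c5="VVD">returned</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)

# Members of att.c5coded as stated in the text: <w> and <mw> only.
C5CODED_MEMBERS = {"w", "mw"}

def c5_codes(sentence_xml):
    """Return (element name, c5 value) pairs for att.c5coded members."""
    root = ET.fromstring(sentence_xml)
    return [
        (el.tag, el.get("c5"))
        for el in root.iter()
        if el.tag in C5CODED_MEMBERS and el.get("c5") is not None
    ]

codes = c5_codes(SAMPLE)
```

Because <c> is excluded by name rather than by inspecting its c5 value, the sketch mirrors the class membership described above instead of guessing from attribute content.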
In any reasonably large schema, and particularly one derived from the TEI model, several elements are likely to have very similar content models, since it will often be the case that at a given point in the document hierarchy any one of several possible elements will be permissible. The specific subset of elements (<w>, <mw>, <c> and a few others) which may appear within an <s> element in the BNC is different from the subsets of elements which may appear within a <p> or <div> element. However, there are several elements which can appear in the same places as a <p>. Following TEI practice, we call the set of elements which can appear together (in sequence or alternation) at a specific place in the document hierarchy a model class. For example, since <l>, <lg>, <list>, <p>, <quote>, and <sp> are all permitted as immediate components of a <div> element, we define a class model.divPart, of which these six elements are all members. Wherever convenient, content models are defined in terms of these model classes.
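The idea of a model class can be sketched in Python as a simple set of element names against which the immediate children of a <div> are checked. This is a deliberately simplified illustration: the function name unexpected_children and the sample fragment are hypothetical, and a real content model for <div> admits further elements (a leading <head>, for instance) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Members of model.divPart as listed in the text above.
MODEL_DIV_PART = {"l", "lg", "list", "p", "quote", "sp"}

def unexpected_children(div_xml, allowed=MODEL_DIV_PART):
    """Return names of immediate children of a <div> not in the allowed set."""
    div = ET.fromstring(div_xml)
    return [child.tag for child in div if child.tag not in allowed]

# Hypothetical fragment: <w> is not a member of model.divPart.
FRAGMENT = "<div><p>text</p><quote>q</quote><w>stray</w></div>"
stray = unexpected_children(FRAGMENT)
```

Defining the set once and referring to it wherever a <div>-like container is checked corresponds to the way content models reference a class rather than enumerating its members.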
As noted above, this usage of model classes is a distinctive and pervasive feature of the TEI encoding scheme. Because the BNC derives from the TEI scheme, it uses the same names and (as far as is practicable) the same model classes throughout. Although this introduces an occasionally redundant degree of indirection in the resulting schema, it also makes clearer the relationship between the components defined for the BNC and their origins in the TEI scheme.
Finally, we define here a few macros for commonly encountered content models. These are also taken from the TEI encoding scheme, though in a few cases with different meanings. In the TEI, for example, the macro macro.phraseSeq is defined as a mixture of various ‘phrase level’ elements and plain text; in the BNC scheme, it has been redefined as plain text only. The places where this macro is referenced, however, are unchanged; in this respect, therefore, the BNC schema is a proper subset of the full TEI schema.
The remainder of this section lists in alphabetical order all of the attribute classes, model classes, elements, and macros defined for the BNC encoding scheme, using a method of display similar to that of the full TEI Guidelines. For each component, we give a brief description and also a usage example. Note that many of the elements listed here appear only in the corpus header rather than in the texts, and may thus be safely disregarded by applications which operate on the texts alone or in isolation.
Classes defined
Class att.ascribed
provides attributes for elements representing speech or action that can be ascribed to a specific individual.
Class att.authorialIntervention
provides attributes describing the nature of an authorial intervention.
hand
- signifies the hand of the agent which made the addition or performed the deletion.
status
- may be used to indicate faulty deletions, e.g. strikeouts which include too much or too little text, or erroneous additions, e.g. an insertion which duplicates some of the text already present. Sample values include:
- duplicate
- (all of the text indicated as an addition duplicates some text that is in the original, whether the duplication is word-for-word or less exact.)
- duplicate-partial
- (part of the text indicated as an addition duplicates some text that is in the original)
- excessStart
- (some text at the beginning of the deletion is marked as deleted even though it clearly should not be deleted.)
- excessEnd
- (some text at the end of the deletion is marked as deleted even though it clearly should not be deleted.)
- shortStart
- (some text at the beginning of the deletion is not marked as deleted even though it clearly should be.)
- shortEnd
- (some text at the end of the deletion is not marked as deleted even though it clearly should be.)
- unremarkable
- (the deletion is not faulty.)
type
- classifies the type of addition or deletion using any convenient typology.
Class att.c5coded
elements which carry a CLAWS 5 Part of speech code
c5
- supplies the CLAWS 5 code associated with this word. Legal values are:
- AJ0
- Adjective (general or positive) (e.g. good, old, beautiful)
- AJC
- Comparative adjective (e.g. better, older)
- AJS
- Superlative adjective (e.g. best, oldest)
- AT0
- Article (e.g. the, a, an, no)
- AV0
- General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
- AVP
- Adverb particle (e.g. up, off, out)
- AVQ
- Wh-adverb (e.g. when, where, how, why, wherever)
- CJC
- Coordinating conjunction (e.g. and, or, but)
- CJS
- Subordinating conjunction (e.g. although, when)
- CJT
- The subordinating conjunction that
- CRD
- Cardinal number (e.g. one, 3, fifty-five, 3609)
- DPS
- Possessive determiner-pronoun (e.g. your, their, his)
- DT0
- General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
- DTQ
- Wh-determiner-pronoun (e.g. which, what, whose, whichever)
- EX0
- Existential there, i.e. there occurring in the there is ... or there are ... construction
- ITJ
- Interjection or other isolate (e.g. oh, yes, mhm, wow)
- NN0
- Common noun, neutral for number (e.g. aircraft, data, committee)
- NN1
- Singular common noun (e.g. pencil, goose, time, revelation)
- NN2
- Plural common noun (e.g. pencils, geese, times, revelations)
- NP0
- Proper noun (e.g. London, Michael, Mars, IBM)
- ORD
- Ordinal numeral (e.g. first, sixth, 77th, last)
- PNI
- Indefinite pronoun (e.g. none, everything, one [as pronoun], nobody)
- PNP
- Personal pronoun (e.g. I, you, them, ours)
- PNQ
- Wh-pronoun (e.g. who, whoever, whom)
- PNX
- Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
- POS
- The possessive or genitive marker 's or '
- PRF
- The preposition of
- PRP
- Preposition (except for of) (e.g. about, at, in, on, on behalf of, with)
- TO0
- Infinitive marker to
- UNC
- Unclassified items which are not appropriately considered as items of the English lexicon.
- VBB
- The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be [subjunctive or imperative]
- VBD
- The past tense forms of the verb BE: was and were
- VBG
- The -ing form of the verb BE: being
- VBI
- The infinitive form of the verb BE: be
- VBN
- The past participle form of the verb BE: been
- VBZ
- The -s form of the verb BE: is, 's
- VDB
- The finite base form of the verb DO: do
- VDD
- The past tense form of the verb DO: did
- VDG
- The -ing form of the verb DO: doing
- VDI
- The infinitive form of the verb DO: do
- VDN
- The past participle form of the verb DO: done
- VDZ
- The -s form of the verb DO: does, 's
- VHB
- The finite base form of the verb HAVE: have, 've
- VHD
- The past tense form of the verb HAVE: had, 'd
- VHG
- The -ing form of the verb HAVE: having
- VHI
- The infinitive form of the verb HAVE: have
- VHN
- The past participle form of the verb HAVE: had
- VHZ
- The -s form of the verb HAVE: has, 's
- VM0
- Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
- VVB
- The finite base form of lexical verbs (e.g. forget, send, live, return) [Including the imperative and present subjunctive]
- VVD
- The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
- VVG
- The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
- VVI
- The infinitive form of lexical verbs (e.g. forget, send, live, return)
- VVN
- The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
- VVZ
- The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
- XX0
- The negative particle not or n't
- ZZ0
- Alphabetical symbols (e.g. A, a, B, b, c, d)
- AJ0-AV0
- Probably AJ0 (adjective), but maybe AV0 (adverb)
- AJ0-NN1
- Probably AJ0 (adjective), but maybe NN1 (singular noun)
- AJ0-VVD
- Probably AJ0 (adjective), but maybe VVD (verb past tense)
- AJ0-VVG
- Probably AJ0 (adjective), but maybe VVG (-ing verb)
- AJ0-VVN
- Probably AJ0 (adjective), but maybe VVN (verb past participle)
- AV0-AJ0
- Probably AV0 (adverb), but maybe AJ0 (adjective)
- AVP-PRP
- Probably AVP (adverb particle), but maybe PRP (preposition)
- AVQ-CJS
- Probably AVQ (wh- adverb), but maybe CJS (subordinating conjunction)
- CJS-AVQ
- Probably CJS (subordinating conjunction), but maybe AVQ (wh- adverb)
- CJS-PRP
- Probably CJS (subordinating conjunction), but maybe PRP (preposition)
- CJT-DT0
- Probably CJT ("that" as conjunction), but maybe DT0 (determiner)
- CRD-PNI
- Probably CRD (number), but maybe PNI (indefinite pronoun)
- DT0-CJT
- Probably DT0 (determiner), but maybe CJT ("that" as conjunction)
- NN1-AJ0
- Probably NN1 (singular noun), but maybe AJ0 (adjective)
- NN1-NP0
- Probably NN1 (singular noun), but maybe NP0 (proper noun)
- NN1-VVB
- Probably NN1 (singular noun), but maybe VVB (verb)
- NN1-VVG
- Probably NN1 (singular noun), but maybe VVG (-ing verb)
- NN2-VVZ
- Probably NN2 (plural noun), but maybe VVZ (-s verb)
- NP0-NN1
- Probably NP0 (proper noun), but maybe NN1 (singular noun)
- PNI-CRD
- Probably PNI (indefinite pronoun), but maybe CRD (number)
- PRP-AVP
- Probably PRP (preposition), but maybe AVP (adverb particle)
- PRP-CJS
- Probably PRP (preposition), but maybe CJS (subordinating conjunction)
- VVB-NN1
- Probably VVB (verb), but maybe NN1 (singular noun)
- VVD-AJ0
- Probably VVD (verb past tense), but maybe AJ0 (adjective)
- VVD-VVN
- Probably VVD (verb past tense), but maybe VVN (verb past participle)
- VVG-AJ0
- Probably VVG (-ing verb), but maybe AJ0 (adjective)
- VVG-NN1
- Probably VVG (-ing verb), but maybe NN1 (singular noun)
- VVN-AJ0
- Probably VVN (verb past participle), but maybe AJ0 (adjective)
- VVN-VVD
- Probably VVN (verb past participle), but maybe VVD (verb past tense)
- VVZ-NN2
- Probably VVZ (-s verb), but maybe NN2 (plural noun)
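The hyphenated values at the end of this list pair a more likely tag with a less likely one. A minimal Python sketch of how software might separate the two is given below; the function name split_c5 is hypothetical, and the sketch assumes that a hyphen never occurs inside a simple code, which holds for every value listed above.

```python
def split_c5(code):
    """Split a c5 value into (most likely tag, alternative tag or None)."""
    first, sep, second = code.partition("-")
    return (first, second if sep else None)

primary, alternative = split_c5("AJ0-NN1")
```

An application that needs only one tag per word could simply keep the first member of the pair, accepting the tagger's preferred reading.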
Class att.datePart
(attributes for temporal expression) attributes for component elements of temporal expressions involving dates and time
value
- supplies the value of a date or time in a standard form.
- Example
Examples of W3C date, time, and date & time formats.
<date value="1945-10-24">24 Oct 45</date>
<date value="1996-09-24T07:25Z">September 24th, 1996 at 3:25 in the morning</date>
<time value="1999-01-04T20:42-05:00">Jan 4 1999 at 8 pm</time>
<time value="14:12:38">fourteen twelve and 38 seconds</time>
<date value="1962-10">October of 1962</date>
<date value="--06-12">June 12th</date>
<date value="---01">the first of each month</date>
<date value="--08">August</date>
<date value="2006">MMVI</date>
- Example
Examples of time formats with reduced precision.
<date value="2006-05-18T10:03+09:00">a few minutes after ten in the morning on Thu 18 May</date>
<time value="03:00">3 A.M.</time>
<time value="12">around noon</time>
Software intended for use with W3C XML Schema datatypes may be unable to properly process times expressed with reduced precision.
dur
- (duration) indicates the length of this element in time.
Note: In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.
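The caution about reduced-precision times given for the value attribute can be made concrete with a small Python sketch that tests whether a value has the full hh:mm:ss lexical form of a W3C XML Schema time. The pattern and the function name is_full_precision_time are illustrative assumptions; range checks on hours, minutes, and seconds are deliberately left out.

```python
import re

# Lexical form of a W3C XML Schema time: hh:mm:ss, optional fractional
# seconds, optional time zone (Z, +hh:mm or -hh:mm). Field ranges are
# not checked in this sketch.
XSD_TIME = re.compile(r"\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?")

def is_full_precision_time(value):
    """Return True if value has the complete hh:mm:ss form."""
    return XSD_TIME.fullmatch(value) is not None

results = {v: is_full_precision_time(v) for v in ("14:12:38", "03:00", "12")}
```

Values such as 03:00 and 12, taken from the examples above, fail the test and would need separate handling by software that relies on the W3C datatype.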
Class att.identifiable
the class of elements which describe other elements by means of their generic identifiers
Note: The values * and name() are used for ident as well.
Members: attDef attributePolicy elementPolicy gi ident valItem valList valSource xairaItem
Class att.interpLike
provides attributes for elements which represent a formal analysis or interpretation.
Class att.personal
(attributes for components of personal names) common attributes for those elements which form part of a personal name.
type
- provides more culture-, linguistic-, or application-specific information used to categorize this name component.
full
- indicates whether the name component is given in full, as an abbreviation or simply as an initial. Legal values are:
sort
- specifies the sort order of the name component in relation to others within the personal name.
Class att.rendered
the class of elements whose rendition has been recorded intermittently in the BNC
Members: bibl corr div head hi item l label list p quote stage
Class att.spanning
provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it.
Note: The span is defined as running in document order from the start of the content of the pointing element (if any) to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element.
Class att.tableDecoration
provides attributes used to decorate rows or cells of a table.
Class att.uniqueId
the class of elements which carry an identifier which is unique across the whole corpus.
Class model.assertLike
the class of elements concerning which assertions are made, for example as parts of a biographical element.
Class: model.personPart
Members: model.persStateLike [age dialect occupation persName persNote ]
Class model.biblLike
groups elements containing a bibliographic description.
Class: model.inter: model.common
Members: bibl
Class model.castItemPart
elements used within an entry in a cast list, such as dramatic role or actor's name.
Class model.complexVal
(complex values) groups elements which express complex feature values in feature structures.
Class model.dateLike
(dates and date ranges) groups elements containing date specifications.
Class: model.pPart.data: model.recordingPart
Note: This class allows certain content models to allow either a single date or a date-range element.
Members: date
Class model.datePart
(temporal expression) groups component elements of temporal expressions involving dates and time.
Class model.divPart
groups elements which can occur between, but not within, paragraphs and other chunks.
Note: This element class does not include members of the inter class, which can appear either within or between chunks. Unlike elements of that class, chunks cannot occur within chunks.
Class model.divPart.spoken
groups those elements which appear at the component level in spoken texts only.
Class model.divWrapper
(top-of-div elements) groups elements which can occur at the start of any division class element.
Members: head
Class model.divWrapper.bottom
(Bottom-of-division elements) groups elements which can occur at the end of a text division; for example, trailer, byline, etc.
Class model.editorialDeclPart
groups elements which may be used inside editorialDecl and appear multiple times
Class model.encodingPart
groups elements which may be used inside encodingDesc and appear multiple times
Members: classDecl editorialDecl projectDesc refsDecl samplingDecl tagsDecl xairaSpecification
Class model.frontPart.drama
groups elements which appear at the level of divisions within front or back matter of performance texts only.
Class model.gLike
groups elements which are interspersed with normal text, representing non-Unicode items.
Class model.global
(global inclusions) groups empty elements which may appear at any point within a TEI text.
Members: model.global.edit [gap ] model.milestoneLike [pb ]
Class model.global.edit
groups empty elements which perform a specifically editorial function, for example by indicating the start of a span of text added, deleted, or missing in a source.
Class: model.global
Note: Members of this class can appear anywhere within a document, between or within components or phrases.
Members: gap
Class model.glossLike
groups elements which provide an alternative name, explanation, or description for any markup construct.
Members: desc
Class model.headerPart
groups elements which may be used inside teiHeader and appear multiple times
Members: encodingDesc profileDesc
Class model.hiLike
groups phrase-level elements related to highlighting.
Class: model.phrase
Members: hi
Class model.inter
Members: model.biblLike [bibl ] model.listLike [list ] model.noteLike model.oddRef model.qLike [lg quote ] model.stageLike [stage ]
Class model.listLike
groups all list-like elements.
Class: model.inter: model.common
Members: list
Class model.milestoneLike
(reference system elements) groups milestone-style elements used to represent reference systems
Class: model.global
Members: pb
Class model.nameLike
(names of people, places, or organizations, or referring strings) groups those elements which name or refer to a person, place (man-made or geographic), or organization
Class: model.addrPart: model.pPart.data
Note: A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.
Members: model.nameLike.agent [name ]
Class model.nameLike.agent
groups elements which contain names of individuals or corporate bodies.
Class: model.nameLike
Note: This class is used in the content model of elements which reference names of people or organizations.
Members: name
Class model.noteLike
groups all note-like elements.
Class: model.inter: model.common
Class model.oddRef
(ODD reference class) groups elements which reference declarations in some markup language in ODD documents.
Class: model.common: model.inter
Class model.pLike
The class of elements which are paragraphs for the purpose of interchange.
Members: p
Class model.pLike.front
(Front matter chunk elements) groups elements which can occur as direct constituents of front matter, when a full title page is not given.
Class model.pPart.data
groups phrase-level elements containing names, dates, numbers, measures, and similar data.
Class: model.phrase
Members: address model.dateLike [date ] model.nameLike [model.nameLike.agent ]
Class model.pPart.edit
groups phrase-level elements for simple editorial correction and transcription.
Class: model.phrase
Class model.persNamePart
(components of personal names) groups those elements which form part of a personal name.
Class model.persStateLike
the class of elements describing changeable characteristics of a person which have a definite duration, for example occupation, residence, name... These characteristics of an individual are typically a consequence of their own action or that of others.
Class: model.assertLike
Members: age dialect occupation persName persNote
Class model.personLike
the class of elements used to provide information about people and their relationships.
Note: This class is referenced in the header module, but is not populated unless the namesdates module is loaded.
Class model.personPart
groups elements which describe characteristics of the people referenced by a text, or participating in a language interaction.
Note: This class is used to define the content model for the <person> and <personGrp> elements.
Members: model.assertLike [model.persStateLike ]
Class model.phrase
Members: model.hiLike [hi ] model.pPart.data [address model.dateLike model.nameLike ] model.pPart.edit [corr unclear ] model.ptrLike [align ] model.segLike [c mw s w ]
Class model.physDescPart
specialised descriptive elements constituting the physical description of a manuscript or similar written source.
Class model.placeNamePart
(place name components) groups those elements which form part of a place name.
Class model.profileDescPart
groups elements which may be used inside profileDesc and appear multiple times
Members: langUsage particDesc settingDesc textClass
Class model.ptrLike
groups elements used for purposes of location and reference
Class: model.phrase
Members: align
Class model.publicationStmtPart
(publication statement elements) groups the children of publicationStmt
Members: address availability date distributor idno pubPlace publisher
Class model.qLike
groups elements related to highlighting which can appear either within or between chunk-level elements.
Class: model.inter: model.common
Class model.quoteLike
(quote and similar elements) groups elements used to directly contain quotations.
Class model.recordingPart
(dates and date ranges) groups elements used to describe details of an audio or video recording
Members: model.dateLike [date ]
Class model.respLike
groups elements which are used to indicate intellectual responsibility, for example within a bibliographic element.
Class: model.biblPart: model.msItemPart
Class model.settingPart
elements used to describe the setting of a linguistic interaction.
Class model.singleVal
(atomic values) group elements used to represent atomic feature values in feature structures.
Class model.sourceDescPart
groups elements which may be used inside sourceDesc and appear multiple times
Members: recordingStmt
Class model.stageLike
Class: model.divPart.stage: model.inter
Members: stage
Elements defined
<activity>
(activity) contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.
Class: model.settingPart
<address>
contains a postal or other address, for example of a publisher, an organization, or an individual.
Class: model.pPart.data: model.publicationStmtPart
<age>
specifies the age in years of a recorded participant at the time of the recording in which they participate.
Class: model.persStateLike
<align>
marks a temporal alignment point within transcribed speech
Class: model.ptrLike
<attDef>
(attribute definition) provides the definition for a single attribute.
Class: att.identifiable
<attList>
contains documentation for all the attributes associated with this element, as a series of attDef elements.
<attributePolicy>
specifies the indexing policy to be used for one or more attributes.
Class: att.identifiable
ident
- identifies the attribute to which the indexing policy applies
- att.identifiable.attribute.ns
<author>
in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.
Class: model.respLike
<availability>
supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.
Class: model.publicationStmtPart
<bibl>
(bibliographic citation) contains any bibliographic reference, occurring either within the header of a written corpus text in which case it has a fixed substructure, or within the body of a corpus text, in which case it contains only s elements.
Class: att.rendered: model.biblLike
<bncDoc>
contains a distinct document within the corpus, either spoken or written.
Class: att.uniqueId
<c>
(character) contains a significant punctuation mark as identified by the CLAWS tagger.
Class: model.segLike: att.segLike
Note: Character data. Should only contain a single character or an entity that represents a single character.
<catDesc>
(category description) provides a description for one category within the text taxonomies provided in the corpus header.
<catRef>
(category reference) provides a list of codes identifying the categories to which this text has been assigned, each code referencing a category element declared in the corpus header.
<category>
(category) defines a single category within a taxonomy of texts.
Class: att.uniqueId
<change>
summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers.
Class: att.ascribed
date
- supplies the date of the change in standard form, i.e. yyyy-mm-dd.
- att.ascribed.attribute.who
Note: Changes should be recorded in a consistent order, for example with the most recent first.
<classCode>
(classCode) contains the classification code used for this text in some standard classification system.
<classDecl>
(classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
Class: model.encodingPart
<collate>
supplies any additional ICU-conformant collating rules to be used when sorting words in the corpus.
Note: The format for collating rules is defined at http://icu.sourceforge.net/userguide/Collate_Customization.html
<corr>
(correction) contains the correct form of a passage apparently erroneous in the copy text.
Class: att.rendered: att.editLike: model.pPart.edit
sic
- contains verbatim text which has been corrected, or an empty string if the correction consists of an addition.
- att.rendered.attribute.rend
resp
- a code identifying the agency responsible for making the correction.
<creation>
contains information about the creation of a text.
<date>
contains a date in any format.
Class: model.dateLike: model.publicationStmtPart
<defaultVal>
specifies the default declared value for an attribute.
<desc>
(description) supplies explanatory text associated with a category or other component defined in the corpus header
Class: model.glossLike: att.translatable
Note: TEI convention requires that this be expressed as a finite clause, beginning with an active verb.
<dialect>
contains an informal description of the regional variety of English used by a participant in a spoken text.
Class: model.persStateLike
<distributor>
supplies the name of a person or other agency responsible for the distribution of a text.
Class: model.biblPart: model.publicationStmtPart
<div>
(text division) contains a subdivision of the front, body, or back of a text.
Class: att.rendered
n
- for a spoken text, identifies the tape corresponding to this division.
decls
- for a spoken text, identifies the declarations (for setting, recording etc.) in the header which apply to this division.
level
- specifies the hierarchic level of this division as a number between 1 (outermost or largest division) and 4 (innermost or smallest).
type
- identifies the type or function of the division (for a written text). Values are:
- advertisement
- advertisement section or insert
- appendix
- appendix
- article
- single article in a journal
- blurb
- any kind of promotional front matter
- cartoon
- cartoon
- chapter
- chapter of a novel etc.
- column
- newspaper column, regular feature etc.
- compo
- composite material
- contents
- table of contents
- front
- any kind of front matter
- leaflet
- free-standing leaflet or pamphlet
- paper
- an academic paper in a collection
- part
- subdivision of a chapter
- recipe
- separate recipe in a cookbook
- section
- any subdivision
- sidebar
- sidebar or displayed paragraph e.g. in a news story
- story
- distinct story in a periodical or collection
- subsection
- smaller subdivision of any kind
- att.rendered.attribute.rend
Note: any sequence of low-level structural elements, possibly grouped into lower subdivisions.
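The constraints stated for the level and type attributes of <div> lend themselves to a simple check. The Python sketch below is an illustration only: the function name div_problems and the dictionary representation of attributes are assumptions, and the type values are copied from the list given for written texts.

```python
# Values for the type attribute of <div>, copied from the list above.
DIV_TYPES = {
    "advertisement", "appendix", "article", "blurb", "cartoon", "chapter",
    "column", "compo", "contents", "front", "leaflet", "paper", "part",
    "recipe", "section", "sidebar", "story", "subsection",
}

def div_problems(attrs):
    """Return a list of human-readable problems for one <div>'s attributes."""
    problems = []
    level = attrs.get("level")
    if level is not None and level not in {"1", "2", "3", "4"}:
        problems.append("level must be a number between 1 and 4: %r" % level)
    div_type = attrs.get("type")
    if div_type is not None and div_type not in DIV_TYPES:
        problems.append("unknown type: %r" % div_type)
    return problems

ok = div_problems({"level": "2", "type": "chapter"})
bad = div_problems({"level": "5", "type": "poem"})
```

Absent attributes are not reported, since nothing in the description above says that either attribute is required.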
<edition>
(Edition) describes the particularities of one edition of a text.
<editionStmt>
(edition statement) groups information relating to one edition of a text.
<editor>
(editor) secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
Class: model.respLike
<editorialDecl>
(editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
Class: model.encodingPart: att.declarable
Note: This element is supplied in the BNC corpus header only
<elementPolicy>
specifies the xaira indexing policy to be used for one or more elements.
Class: att.identifiable
<encodingDesc>
(Encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
Class: model.headerPart
<event>
(Event) any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.
Class: model.divPart.spoken: att.timed: att.ascribed
desc
- provides a brief description of the event
- att.timed.attribute.dur
<extent>
specifies the approximate size of the text, in orthographic words, w elements, and s elements
<fileDesc>
(File Description) contains a full bibliographic description of an electronic file.
Note: The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.
<gap>
(omitted material) indicates a point where material has been omitted from the transcription.
Class: model.global.edit: att.editLike
desc
- briefly describes the material which has been omitted.
reason
- gives further details of the reason for omission.
- att.editLike.attribute.resp
<gi>
(generic identifier) contains the name (generic identifier) of an element.
Class: att.identifiable
<head>
(heading) contains any type of heading, for example the title of a section or a poem.
Class: att.rendered: model.divWrapper
type
- Legal values are:
- att.rendered.attribute.rend
Note: The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.
<hi>
(highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
Class: att.rendered: model.hiLike
<ident>
contains an identifier or name for an object of some kind in a formal language
Class: att.identifiable
Note: In running prose, this element may be used for any kind of identifier in any formal language.
<idno>
(identifying number) supplies an identifying code for a text.
Class: model.biblPart: model.publicationStmtPart
<imprint>
groups information relating to the publication or distribution of a bibliographic item.
<item>
contains one component of a list.
Class: att.rendered
<joinTo>
supplies a list of element names carrying an attribute which has been specified with the xaira "joinTo" indexing policy.
<keywords>
(Keywords) contains a list of keywords or phrases identifying the topic or nature of a text.
<l>
(verse line) contains a single, possibly incomplete, line of verse.
Class: att.rendered: model.divPart: model.lLike
<label>
contains the label associated with an item in a list; in glossaries, marks the term being defined.
Class: att.rendered
<labelGen>
specifies the label to be generated for the parent reference.
<langUsage>
(language usage) describes the languages, sublanguages, registers, dialects etc. represented within a text.
Class: model.profileDescPart: att.declarable
<language>
characterizes a single language or sublanguage used within a text.
Note: Particularly for sublanguages, an informal prose characterization should be supplied as content for the element.
<lg>
(line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
Class: model.qLike: model.divPart
Note: contains verse lines or nested line groups only, possibly prefixed by a heading.
<list>
contains any sequence of items organized as a list.
Class: att.rendered: model.listLike: model.divPart
<locale>
(locale) contains a brief informal description of the nature of a place, for example a room, a restaurant, a park bench, etc.
Class: model.settingPart
<mw>
contains a multi-word unit as identified by CLAWS, that is, a sequence of individual tokens which function as a single unit and can be given a single part of speech code.
Class: model.segLike: att.c5coded
Note: In CLAWS output the components of a <mw> are given ‘ditto’ tags inherited from the parent <mw>. In BNC they have been given the same code as elsewhere in the corpus.
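As an illustration of this structure, the following Python sketch collects each multi-word unit together with the codes carried by its component words. The sample fragment, its words, and its attribute values are hypothetical, made up for this example rather than taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the words and codes are illustrative only.
SAMPLE = (
    '<s>'
    '<mw c5="AV0">'
    '<w c5="PRP" hw="of">of </w><w c5="NN1" hw="course">course</w>'
    '</mw>'
    '<c c5="PUN">.</c>'
    '</s>'
)

def multiword_units(s_elem):
    """Return (text, c5 of the unit, c5 codes of its <w> children) per <mw>."""
    units = []
    for mw in s_elem.iter("mw"):
        text = "".join(mw.itertext()).strip()
        component_codes = [w.get("c5") for w in mw.findall("w")]
        units.append((text, mw.get("c5"), component_codes))
    return units

units = multiword_units(ET.fromstring(SAMPLE))
print(units)
```

The unit as a whole carries one code on the <mw> element, while each component <w> keeps its own code.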
<name>
(name, proper noun) contains a proper noun or noun phrase.
Class: model.nameLike.agent: att.naming
<namespace>
supplies the formal name of the namespace to which the elements documented by its children belong.
Note: This element is not used in the current release of the BNC: all elements belong to the empty namespace.
<note>
contains a note or annotation.
Class: model.divPart: att.placement
<occupation>
contains an informal description of a person's trade, profession or occupation.
Class: model.persStateLike
<p>
(paragraph) marks paragraphs in prose.
Class: att.rendered: model.pLike: model.divPart
type
- indicates how the paragraph is displayed. Values are:
- att.rendered.attribute.rend
<para>
contains descriptive text appearing within components of a TEI header
<particDesc>
(participation description) describes the identifiable speakers, voices, or other participants in a linguistic interaction.
Class: model.profileDescPart: att.declarable
<pause>
a pause either between or within utterances.
Class: model.divPart.spoken: att.timed
<pb>
(page break) marks the boundary between one page of a text and the next in a standard reference system.
Class: model.milestoneLike
<persName>
(personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forenames, surnames, honorifics, added names, etc.
Class: model.persStateLike: model.nameLike.agent
<persNote>
contains any additional information supplied about a participant in a spoken text
Class: model.persStateLike
<person>
provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source.
Class: att.uniqueId
ageGroup
- specifies the age group to which the participant belongs. Values are:
dialect
- specifies the dialect or accent of a participant's speech, as identified by the respondent. Values are:
- CAN
- Canadian
- NONE
- No accent recorded
- XDE
- German
- XEA
- East Anglian
- XFR
- French
- XHC
- Home Counties
- XHM
- Humberside
- XIR
- Irish
- XIS
- Indian subcontinent
- XLC
- Lancashire
- XLO
- London
- XMC
- Central Midlands
- XMD
- Merseyside
- XME
- North-east Midlands
- XMI
- Midlands
- XMS
- South Midlands
- XMW
- North-west Midlands
- XNC
- Central Northern England
- XNE
- North-east England
- XNO
- Northern England
- XOT
- Other or unidentifiable
- XSD
- Scottish
- XSL
- Lower south-west England
- XSS
- Central south-west England
- XSU
- Upper south-west England
- XUR
- European
- XUS
- American (US)
- XWA
- Welsh
- XWE
- West Indian
firstLang
- specifies the country of origin of the participant, as identified by the respondent. Legal values are:
n
- internal identifier
educ
- specifies the age at which the participant ceased full-time education. Legal values are:
soc
- specifies the social class of the participant. Legal values are:
sex
- specifies the sex of the participant. Legal values are:
role
- describes the relationship or role of this participant with respect to the respondent.
- att.uniqueId.attribute.xmlid
Note: May contain either a prose description organized as paragraphs, or a sequence of more specific demographic elements drawn from the model.personPart class.
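The dialect codes listed above can be expanded mechanically when a <person> element is read. The following Python sketch shows one way to do this; the sample element, its identifier, and its role value are hypothetical, and only the code table is taken from the list above.

```python
import xml.etree.ElementTree as ET

# Code table copied from the value list for the dialect attribute.
DIALECTS = {
    "CAN": "Canadian", "NONE": "No accent recorded", "XDE": "German",
    "XEA": "East Anglian", "XFR": "French", "XHC": "Home Counties",
    "XHM": "Humberside", "XIR": "Irish", "XIS": "Indian subcontinent",
    "XLC": "Lancashire", "XLO": "London", "XMC": "Central Midlands",
    "XMD": "Merseyside", "XME": "North-east Midlands", "XMI": "Midlands",
    "XMS": "South Midlands", "XMW": "North-west Midlands",
    "XNC": "Central Northern England", "XNE": "North-east England",
    "XNO": "Northern England", "XOT": "Other or unidentifiable",
    "XSD": "Scottish", "XSL": "Lower south-west England",
    "XSS": "Central south-west England", "XSU": "Upper south-west England",
    "XUR": "European", "XUS": "American (US)", "XWA": "Welsh",
    "XWE": "West Indian",
}

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical participant record, for illustration only.
SAMPLE = '<person xml:id="PS001" dialect="XLO" role="self"/>'

def describe_person(person):
    """Return the identifier, expanded dialect gloss, and role of a <person>."""
    return {
        "id": person.get(XML_ID),
        "dialect": DIALECTS.get(person.get("dialect")),  # None if absent or unknown
        "role": person.get("role"),
    }

info = describe_person(ET.fromstring(SAMPLE))
print(info)
```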
<placeName>
(place name) contains an absolute or relative place name.
Class: model.settingPart
<pp>
supplies page numbers for a bibliographic citation.
<profileDesc>
(text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
Class: model.headerPart
<projectDesc>
(project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
Class: model.encodingPart: att.declarable
<pubPlace>
contains the name of the place where a bibliographic item was published.
Class: att.naming: model.imprintPart: model.publicationStmtPart
<publicationStmt>
(publication statement) groups information concerning the publication or distribution of an electronic or other text.
<publisher>
provides the name of the organization responsible for the publication or distribution of a bibliographic item.
Class: model.imprintPart: model.publicationStmtPart
<quote>
(quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
Class: model.qLike: model.divPart: att.rendered
Note: Any bibliographic source or reference provided for the quotation may be included within the quote element.
<recording>
(recording event) details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.
Class: att.uniqueId
date
- date of the recording in standardized form.
n
- tape number.
time
- time of day the recording was made.
type
- kind of recording. Values are:
dur
- duration of the recording in minutes.
- att.uniqueId.attribute.xmlid
<recordingStmt>
(recording statement) describes a set of recordings used in transcription of a spoken text.
Class: model.sourceDescPart
<refsDecl>
(references declaration) provides documentation for the reference system applicable to the corpus.
Class: model.encodingPart: att.declarable
<resp>
contains a phrase describing the nature of a person's intellectual responsibility.
<respStmt>
(statement of responsibility) supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
<revisionDesc>
(revision description) summarizes the revision history for a file.
Note: Record changes with most recent changes at the top of the list.
<s>
(s-unit) contains a sentence-like division of a text.
Class: model.segLike
<samplingDecl>
(sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
Class: model.encodingPart: att.declarable
<setting>
(setting) describes one particular setting in which a language interaction takes place.
Class: att.uniqueId: att.ascribed
n
- an internal identifier for a setting
- att.uniqueId.attribute.xmlid
- att.ascribed.attribute.who
Note: If the who attribute is not supplied, the setting is assumed to be that of all participants in the language interaction.
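The defaulting rule in the note above can be sketched in Python as follows. The sketch assumes that the who attribute holds a whitespace-separated list of participant identifiers; the identifiers and setting contents shown are hypothetical.

```python
import xml.etree.ElementTree as ET

def participants_for(setting, all_participants):
    """Return the participants to whom a <setting> applies.

    Assumption: who is a whitespace-separated list of identifiers.
    When who is absent, the setting applies to every participant.
    """
    who = setting.get("who")
    if who is None:
        return list(all_participants)
    return who.split()

# Hypothetical data, for illustration only.
everyone = ["PS001", "PS002", "PS003"]
shared = ET.fromstring(
    '<setting xml:id="SET1"><locale>kitchen</locale></setting>')
private = ET.fromstring(
    '<setting xml:id="SET2" who="PS002 PS003"><locale>office</locale></setting>')

print(participants_for(shared, everyone))
print(participants_for(private, everyone))
```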
<settingDesc>
(setting description) describes the setting or settings within which a language interaction takes place, either as a prose description or as a series of setting elements.
Class: model.profileDescPart: att.declarable
<shift>
(Shift) marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.
Class: model.divPart.spoken
<sourceDesc>
supplies a description of the source text(s) from which an electronic text was derived or generated.
<sp>
(speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text.
Class: model.divPart: att.ascribed
<speaker>
A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
Note: In the BNC, used only for speaker labels in dramatic texts or Hansard.
<stage>
(stage direction) contains any kind of stage direction within a dramatic text or fragment.
Class: att.rendered: model.stageLike
<stext>
contains a single spoken text, i.e. a transcription or collection of transcriptions from a single source.
<tagUsage>
(tagUsage) supplies information about the usage of a specific element within a text.
<tagsDecl>
(tagging declaration) provides information about the XML elements actually used within a BNC text
Class: model.encodingPart
<taxonomy>
(taxonomy) defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
Class: att.uniqueId
<teiHeader>
(TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
<term>
contains a word or phrase used to describe the topic or nature of a text.
<textClass>
(text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
Class: model.profileDescPart: att.declarable
<title>
contains the full title of a work of any kind.
<titleStmt>
(title statement) groups information about the title of a work and those responsible for its intellectual content.
<tokenize>
supplies any additional ICU-conformant rules to be used when tokenization is performed by xaira rather than by explicit XML markup.
<trunc>
contains one or more truncated words in transcribed speech.
Class: model.divPart.spoken
<u>
(utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
Class: att.ascribed: model.divPart.spoken
Note: In the BNC, each change of speaker is marked by a new <u> element.
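Because each change of speaker opens a new <u> element, per-speaker statistics can be gathered by walking the utterances in order. The Python sketch below counts <w> elements per speaker; the sample text, speaker identifiers, and codes are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical spoken text, for illustration only.
SAMPLE = (
    '<stext>'
    '<u who="PS001"><s><w c5="ITJ" hw="hello">Hello</w></s></u>'
    '<u who="PS002"><s><w c5="ITJ" hw="hi">Hi </w>'
    '<w c5="AV0" hw="there">there</w></s></u>'
    '<u who="PS001"><s><w c5="ITJ" hw="bye">Bye</w></s></u>'
    '</stext>'
)

def words_per_speaker(stext):
    """Count <w> elements inside each speaker's <u> elements."""
    counts = Counter()
    for u in stext.iter("u"):
        counts[u.get("who")] += sum(1 for _ in u.iter("w"))
    return counts

counts = words_per_speaker(ET.fromstring(SAMPLE))
print(counts)
```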
<unclear>
contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.
Class: att.timed: model.pPart.edit
<valItem>
(value definition) contains a single value and gloss pair for an attribute.
Class: att.identifiable
<valList>
(value list) contains one or more valItem elements defining possible values for an attribute.
Class: att.identifiable
copyOf
- supplies the identifier of a previously-defined value list to be used at this point
type
- specifies the extensibility of the list of attribute values specified. Legal values are:
- att.identifiable.attribute.ident
- att.identifiable.attribute.ns
<valSource>
specifies where the xaira indexer is to find a value.
Class: att.identifiable
<vocal>
(Vocalized semi-lexical) any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
Class: model.divPart.spoken: att.timed: att.ascribed
desc
- provides a brief description of the vocal event
- att.timed.attribute.dur
- att.ascribed.attribute.who
<w>
(word) represents a grammatical (not necessarily orthographic) word.
Class: att.c5coded: model.segLike
pos
- supplies a simplified part-of-speech code. Legal values are:
hw
- specifies the headword under which this lexical unit is conventionally grouped, where known.
- att.c5coded.attribute.c5
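A minimal Python sketch of reading these attributes from <w> elements follows. The sample sentence and its codes are hypothetical; because hw is supplied only where known, the sketch returns None when it is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence, for illustration only; the last word has no hw.
SAMPLE = (
    '<s>'
    '<w c5="AT0" hw="the">The </w>'
    '<w c5="NN2" hw="cat">cats </w>'
    '<w c5="VVD">slept</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)

def word_records(s_elem):
    """Return (form, headword or None, c5 code) for each <w> in a sentence."""
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("c5"))
        for w in s_elem.iter("w")
    ]

records = word_records(ET.fromstring(SAMPLE))
print(records)
```

The <c> element is skipped deliberately: it is not a <w>, and its c5 values are drawn from a different list.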
<wtext>
contains a single written text.
<xairaItem>
provides data needed to define one part of a xaira specification.
Class: att.identifiable
<xairaList>
contains a list of xaira parameters of a particular type
<xairaSpecification>
specifies additional information needed by xaira.
Class: model.encodingPart
<bnc>
(TEI corpus) contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text.
Note: Must contain one TEI header for the corpus, and a series of <TEI> elements, one for each text. This element is mandatory when applicable.
Macros defined
Macro data.count
defines the range of attribute values used for a non-negative integer value used as a count
Macro data.enumerated
defines the range of attribute values expressed as a single word or token taken from a list of documented possibilities
Note: Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated element specification. If the value contains whitespace, it must be normalised: neither leading or trailing sequences of whitespace characters nor internal sequences of more than one whitespace character are allowed.
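The whitespace rule in the note above can be checked with a short Python function. This is a sketch of the stated rule only; the function name is hypothetical and it does not check the value against any value list.

```python
import re

# Matches leading whitespace, trailing whitespace, or two adjacent
# whitespace characters anywhere in the value.
_BAD_WHITESPACE = re.compile(r"^\s|\s$|\s\s")

def is_normalised(value):
    """True if value has no leading, trailing, or repeated whitespace."""
    return _BAD_WHITESPACE.search(value) is None

print(is_normalised("North-east England"))
print(is_normalised("North-east  England"))
```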
Macro data.language
defines the range of attribute values used to identify a particular combination of human language and writing system
Macro data.name
defines the range of attribute values expressed as an XML name or identifier
Note: Attributes using this datatype must contain a single word which follows the rules defining a legal XML name: for example they cannot include whitespace or begin with digits.
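The two constraints mentioned in the note can be approximated in Python as shown below. This sketch is restricted to ASCII characters and is therefore narrower than the full XML definition of a name, which also admits many non-ASCII letters; the function name is hypothetical.

```python
import re

# ASCII-only approximation: a letter, underscore, or colon first,
# then letters, digits, underscores, colons, full stops, or hyphens.
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def looks_like_xml_name(value):
    """True if value is a legal XML name within the ASCII range."""
    return _ASCII_NAME.fullmatch(value) is not None

print(looks_like_xml_name("firstLang"))
print(looks_like_xml_name("5c"))
```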
Macro data.namespace
(an XML namespace) defines the range of attribute values used to indicate XML namespaces as defined by the W3C Namespaces in XML technical recommendation
Note: The range of syntactically valid values is defined by RFC 2396 Uniform Resource Identifier (URI) Reference
Macro data.pointer
defines the range of attribute values used to provide a single pointer to any other resource, either within the current document or elsewhere
Note: The range of syntactically valid values is defined by RFC 2396 Uniform Resource Identifier (URI) Reference
Macro data.pointers
defines the range of attribute values used to provide a list of pointers to other resources, either within the current document or elsewhere
Note: A white-space delimited list of values, defined by the datatype data.pointer
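A value of this datatype can be split into its individual pointers, and each pointer into a resource part and a fragment identifier, as in the following Python sketch. The pointer values are hypothetical.

```python
from urllib.parse import urldefrag

def split_pointers(value):
    """Split a data.pointers value into (resource, fragment) pairs.

    An empty resource means the pointer refers to the current document.
    """
    return [tuple(urldefrag(pointer)) for pointer in value.split()]

# Hypothetical value: one local pointer and one to another document.
pairs = split_pointers("#PS001  other.xml#s12")
print(pairs)
```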
Macro data.temporal
defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them
Note: A normalized form of temporal expression conforming to the W3C XML Schema Part 2: Datatypes Second Edition, except that times may be expressed with reduced precision (i.e., to the minute or the hour). Software intended for use with W3C XML Schema datatypes may be unable to properly process times expressed with reduced precision. If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
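Software that reads such values needs to allow for the reduced-precision forms. The Python sketch below tries a small set of formats in turn and reports which one matched. It is an approximation under stated assumptions: it ignores time zone indicators, covers only the forms listed in FORMATS, and is more lenient about digit counts than the W3C datatypes; the labels and function name are hypothetical.

```python
from datetime import datetime

# Formats are tried in order, most specific first.
FORMATS = [
    ("dateTime", "%Y-%m-%dT%H:%M:%S"),
    ("date", "%Y-%m-%d"),
    ("time", "%H:%M:%S"),
    ("time to the minute", "%H:%M"),
    ("time to the hour", "%H"),
]

def classify_temporal(value):
    """Return the label of the first format that parses value, or None."""
    for label, fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return label
    return None

for sample in ("1992-01-10", "10:30", "10", "yesterday"):
    print(sample, classify_temporal(sample))
```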
Macro data.word
defines the range of attribute values expressed as a single word or token
Note: Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
Macro macro.fileDescPart
(file description elements) groups elements which occur inside fileDesc and biblFull
Macro macro.phraseSeq
(phrase sequence) defines a sequence of character data and phrase-level elements.
Macro mix.spoken
(mixed-base spoken-text components) contains a string used in constructing the definition of macro.component used in the mixed base tag set.