[bnc] Formal Specification of the BNC XML schema - Users Reference Guide for the British National Corpus (XML Edition)

Formal Specification of the BNC XML schema

The structure of the XML edition of the British National Corpus is described by a single XML schema, which is expressed in three different schema languages: the traditional DTD language which XML inherits from SGML; the more recently defined ISO schema language known as RELAX NG; and the W3C-defined schema language, W3C XML Schema. The three schema files are all generated from the same TEI-conformant XML source file, which is also used to generate the present documentation.

This section of the document contains the TEI-conformant reference specification for all components of the BNC schema. These include definitions for attribute classes, model classes, and macro patterns, as well as definitions for elements and their associated attributes and possible value lists. A full description of these concepts and how they are used to define and document XML encoding schemes is given by the TEI Guidelines (in particular, in chapter TD); the following summary provides only basic information about them.

When several elements in a schema share attributes of the same name, with values drawn from a common set, they are considered to form an attribute class. The members of such a class can then all reference the same class definition rather than each repeating the same information. In the BNC, for example, the elements <bibl>, <corr>, <div>, <head>, <hi>, and half a dozen others all have the same attribute rend, which takes a coded value from the same short list of possibilities. Rather than repeat this definition for each element, the relevant elements are all said to be members of a class att.rendered, which is defined independently of those elements (but includes a list of its members). In the same way, the <w> and <mw> elements, as members of the att.c5coded class, share the same definition for the possible CLAWS5 codes specified by their c5 attribute. Note however that the element <c>, although it has an attribute c5, is not a member of this class, because the possible values for this attribute on this element are entirely different.
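
As a rough illustration of how an attribute class centralizes a definition, the sketch below keeps the legal rend codes in one place and lets every member element consult it. The codes and member names are copied from the att.rendered entry later in this section; the dictionary layout and the function name check_rend are assumptions made for this example and are not part of the BNC schema or of any BNC software.

```python
# One shared definition of the rend attribute, as in the att.rendered class.
# Codes and members are copied from the class entry; the layout is hypothetical.
ATT_RENDERED = {
    "members": {"bibl", "corr", "div", "head", "hi", "item", "l",
                "label", "list", "p", "quote", "stage"},
    "rend": {
        "bo": "bold weight font", "bx": "boxed", "hi": "superscript",
        "ib": "italic and bold", "ih": "italic superscript",
        "il": "italic subscript", "it": "italic font",
        "iu": "italic and underlined", "lo": "subscript",
        "qc": "centre-aligned", "ro": "roman within italic",
        "st": "strike-out", "ub": "bold underlined", "ul": "underlined",
        "xx": "crossed-out",
    },
}


def check_rend(element_name, rend_value):
    """Return True if rend_value is legal on element_name (hypothetical helper)."""
    if element_name not in ATT_RENDERED["members"]:
        return False  # not a member, so this class defines no rend for it
    return rend_value in ATT_RENDERED["rend"]
```

Because the value list lives only in the class definition, adding a code or a member element means changing one place rather than every element declaration.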

In any reasonably large schema, and particularly one derived from the TEI model, several elements are likely to have very similar content models, since it will often be the case that at a given point in the document hierarchy any one of several possible elements will be permissible. The specific subset of elements (<w>, <mw>, <c> and a few others) which may appear within an <s> element in the BNC is different from the subsets of elements which may appear within a <p> or <div> element. However, there are several elements which can appear in the same places as a <p>. Following TEI practice, we call the set of elements which can appear together (in sequence or alternation) at a specific place in the document hierarchy a model class. For example, since <l>, <lg>, <list>, <p>, <quote>, and <sp> are all permitted as immediate components of a <div> element, we define a class model.divPart, of which these six elements are all members. Wherever convenient, content models are defined in terms of these model classes.

As noted above, this usage of model classes is a distinctive and pervasive feature of the TEI encoding scheme. Because the BNC derives from the TEI scheme, it uses the same names and (as far as is practicable) the same model classes throughout. Although this introduces an occasionally redundant degree of indirection in the resulting schema, it also makes clearer the relationship between the components defined for the BNC and their origins in the TEI scheme.
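
The indirection introduced by model classes can be pictured as a small lookup that is expanded recursively. The sketch below is only an illustration: the membership lists are copied from the model.global, model.global.edit, model.milestoneLike and model.divPart entries in this section, while the table MODEL_CLASSES and the function expand are hypothetical names invented for the example.

```python
# Membership lists copied from the class entries in this section.
# Names beginning "model." are classes; everything else is an element.
MODEL_CLASSES = {
    "model.global": ["model.global.edit", "model.milestoneLike"],
    "model.global.edit": ["gap"],
    "model.milestoneLike": ["pb"],
    "model.divPart": ["l", "lg", "list", "note", "p", "quote", "sp"],
}


def expand(class_name, classes=MODEL_CLASSES):
    """Return the sorted element names reachable from class_name (hypothetical helper)."""
    elements = set()
    for member in classes.get(class_name, []):
        if member.startswith("model."):
            elements.update(expand(member, classes))  # nested class: recurse
        else:
            elements.add(member)
    return sorted(elements)
```

Under these assumptions, expand("model.global") resolves the two nested classes down to the elements gap and pb, which is the same flattening a schema processor performs when it turns class references into a concrete content model.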

Finally, we define here a few macros for commonly encountered content models. These are also taken from the TEI encoding scheme, though in a few cases with different meanings. In the TEI, for example, the macro macro.phraseSeq is defined as a mixture of various ‘phrase level’ elements and plain text; in the BNC scheme, it has been redefined as plain text only. The places where this macro is referenced, however, are unchanged; in this respect, therefore, the BNC schema is a proper subset of the full TEI schema.

The remainder of this section lists in alphabetical order all of the attribute classes, model classes, elements, and macros defined for the BNC encoding scheme, using a method of display similar to that of the full TEI Guidelines. For each component, we give a brief description and also a usage example. Note that many of the elements listed here appear only in the corpus header rather than in the texts, and may thus be safely disregarded by applications which operate on the texts alone or in isolation.

Classes defined

Class att.ascribed

provides attributes for elements representing speech or action that can be ascribed to a specific individual.

Attributes: In addition to global attributes

who: indicates the person, or group of people, to whom the element content is ascribed.

Class: (none)

Members: change event setting sp u vocal

Module: tei

Class att.authorialIntervention

provides attributes describing the nature of an authorial intervention.

Attributes: In addition to global attributes

hand

signifies the hand of the agent which made the addition or performed the deletion.

status

may be used to indicate faulty deletions, e.g. strikeouts which include too much or too little text, or erroneous additions, e.g., an insertion which duplicates some of the text already present. Sample values include:

duplicate: (all of the text indicated as an addition duplicates some text that is in the original, whether the duplication is word-for-word or less exact.)
duplicate-partial: (part of the text indicated as an addition duplicates some text that is in the original)
excessStart: (some text at the beginning of the deletion is marked as deleted even though it clearly should not be deleted.)
excessEnd: (some text at the end of the deletion is marked as deleted even though it clearly should not be deleted.)
shortStart: (some text at the beginning of the deletion is not marked as deleted even though it clearly should be.)
shortEnd: (some text at the end of the deletion is not marked as deleted even though it clearly should be.)
unremarkable: (the deletion is not faulty.)

type

classifies the type of addition or deletion using any convenient typology.

Class: (none)

Members:

Module: tei

Class att.c5coded

elements which carry a CLAWS 5 Part of speech code

Attributes: In addition to global attributes

c5

supplies the CLAWS 5 code associated with this word. Legal values are:

AJ0: Adjective (general or positive) (e.g. good, old, beautiful)
AJC: Comparative adjective (e.g. better, older)
AJS: Superlative adjective (e.g. best, oldest)
AT0: Article (e.g. the, a, an, no)
AV0: General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
AVP: Adverb particle (e.g. up, off, out)
AVQ: Wh-adverb (e.g. when, where, how, why, wherever)
CJC: Coordinating conjunction (e.g. and, or, but)
CJS: Subordinating conjunction (e.g. although, when)
CJT: The subordinating conjunction that
CRD: Cardinal number (e.g. one, 3, fifty-five, 3609)
DPS: Possessive determiner-pronoun (e.g. your, their, his)
DT0: General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
DTQ: Wh-determiner-pronoun (e.g. which, what, whose, whichever)
EX0: Existential there, i.e. there occurring in the there is ... or there are ... construction
ITJ: Interjection or other isolate (e.g. oh, yes, mhm, wow)
NN0: Common noun, neutral for number (e.g. aircraft, data, committee)
NN1: Singular common noun (e.g. pencil, goose, time, revelation)
NN2: Plural common noun (e.g. pencils, geese, times, revelations)
NP0: Proper noun (e.g. London, Michael, Mars, IBM)
ORD: Ordinal numeral (e.g. first, sixth, 77th, last)
PNI: Indefinite pronoun (e.g. none, everything, one [as pronoun], nobody)
PNP: Personal pronoun (e.g. I, you, them, ours)
PNQ: Wh-pronoun (e.g. who, whoever, whom)
PNX: Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
POS: The possessive or genitive marker 's or '
PRF: The preposition of
PRP: Preposition (except for of) (e.g. about, at, in, on, on behalf of, with)
TO0: Infinitive marker to
UNC: Unclassified items which are not appropriately considered as items of the English lexicon.
VBB: The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be [subjunctive or imperative]
VBD: The past tense forms of the verb BE: was and were
VBG: The -ing form of the verb BE: being
VBI: The infinitive form of the verb BE: be
VBN: The past participle form of the verb BE: been
VBZ: The -s form of the verb BE: is, 's
VDB: The finite base form of the verb DO: do
VDD: The past tense form of the verb DO: did
VDG: The -ing form of the verb DO: doing
VDI: The infinitive form of the verb DO: do
VDN: The past participle form of the verb DO: done
VDZ: The -s form of the verb DO: does, 's
VHB: The finite base form of the verb HAVE: have, 've
VHD: The past tense form of the verb HAVE: had, 'd
VHG: The -ing form of the verb HAVE: having
VHI: The infinitive form of the verb HAVE: have
VHN: The past participle form of the verb HAVE: had
VHZ: The -s form of the verb HAVE: has, 's
VM0: Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
VVB: The finite base form of lexical verbs (e.g. forget, send, live, return) [Including the imperative and present subjunctive]
VVD: The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
VVG: The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
VVI: The infinitive form of lexical verbs (e.g. forget, send, live, return)
VVN: The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
VVZ: The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
XX0: The negative particle not or n't
ZZ0: Alphabetical symbols (e.g. A, a, B, b, c, d)
AJ0-AV0: Probably AJ0 (adjective), but maybe AV0 (adverb)
AJ0-NN1: Probably AJ0 (adjective), but maybe NN1 (singular noun)
AJ0-VVD: Probably AJ0 (adjective), but maybe VVD (verb past tense)
AJ0-VVG: Probably AJ0 (adjective), but maybe VVG (-ing verb)
AJ0-VVN: Probably AJ0 (adjective), but maybe VVN (verb past participle)
AV0-AJ0: Probably AV0 (adverb), but maybe AJ0 (adjective)
AVP-PRP: Probably AVP (adverb particle), but maybe PRP (preposition)
AVQ-CJS: Probably AVQ (wh- adverb), but maybe CJS (subordinating conjunction)
CJS-AVQ: Probably CJS (subordinating conjunction), but maybe AVQ (wh- adverb)
CJS-PRP: Probably CJS (subordinating conjunction), but maybe PRP (preposition)
CJT-DT0: Probably CJT ("that" as conjunction), but maybe DT0 (determiner)
CRD-PNI: Probably CRD (number), but maybe PNI (indefinite pronoun)
DT0-CJT: Probably DT0 (determiner), but maybe CJT ("that" as conjunction)
NN1-AJ0: Probably NN1 (singular noun), but maybe AJ0 (adjective)
NN1-NP0: Probably NN1 (singular noun), but maybe NP0 (proper noun)
NN1-VVB: Probably NN1 (singular noun), but maybe VVB (verb)
NN1-VVG: Probably NN1 (singular noun), but maybe VVG (-ing verb)
NN2-VVZ: Probably NN2 (plural noun), but maybe VVZ (-s verb)
NP0-NN1: Probably NP0 (proper noun), but maybe NN1 (singular noun)
PNI-CRD: Probably PNI (indefinite pronoun), but maybe CRD (number)
PRP-AVP: Probably PRP (preposition), but maybe AVP (adverb particle)
PRP-CJS: Probably PRP (preposition), but maybe CJS (subordinating conjunction)
VVB-NN1: Probably VVB (verb), but maybe NN1 (singular noun)
VVD-AJ0: Probably VVD (verb past tense), but maybe AJ0 (adjective)
VVD-VVN: Probably VVD (verb past tense), but maybe VVN (verb past participle)
VVG-AJ0: Probably VVG (-ing verb), but maybe AJ0 (adjective)
VVG-NN1: Probably VVG (-ing verb), but maybe NN1 (singular noun)
VVN-AJ0: Probably VVN (verb past participle), but maybe AJ0 (adjective)
VVN-VVD: Probably VVN (verb past participle), but maybe VVD (verb past tense)
VVZ-NN2: Probably VVZ (-s verb), but maybe NN2 (plural noun)

Class: (none)

Members: mw w

Module: module-from-bncxml
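
The hyphenated values at the end of the c5 list pair a more probable tag with a less probable one. Software that filters on part of speech has to decide whether the second tag should count as a match. The sketch below shows one possible way to handle this; the function names split_c5 and matches, and the include_alternative parameter, are hypothetical and chosen only for this illustration.

```python
def split_c5(code):
    """Split a c5 value into (preferred_tag, alternative_tag_or_None)."""
    first, sep, second = code.partition("-")
    return (first, second if sep else None)


def matches(code, tag, include_alternative=False):
    """True if tag is the preferred reading of code, or optionally the alternative."""
    preferred, alternative = split_c5(code)
    if tag == preferred:
        return True
    return include_alternative and tag == alternative
```

With the default setting, a word tagged AJ0-NN1 is treated as an adjective only; passing include_alternative=True also lets it match a search for NN1, trading precision for recall.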

Class att.datePart

(attributes for temporal expression) attributes for component elements of temporal expressions involving dates and time

Attributes: In addition to global attributes

value

supplies the value of a date or time in a standard form.

Example: Examples of W3C date, time, and date & time formats.
<date value="1945-10-24">24 Oct 45</date>
<date value="1996-09-24T07:25Z">September 24th, 1996 at 3:25 in the morning</date>
<time value="1999-01-04T20:42-05:00">Jan 4 1999 at 8 pm</time>
<time value="14:12:38">fourteen twelve and 38 seconds</time>
<date value="1962-10">October of 1962</date>
<date value="--06-12">June 12th</date>
<date value="---01">the first of each month</date>
<date value="--08">August</date>
<date value="2006">MMVI</date>

Example: Examples of time formats with reduced precision.
<date value="2006-05-18T10:03+09:00">a few minutes after ten in the morning on Thu 18 May</date>
<time value="03:00">3 A.M.</time>
<time value="12">around noon</time>

Software intended for use with W3C XML Schema datatypes may be unable to properly process times expressed with reduced precision.

Example: A usage example of <date>.
This list begins in the year 1632, more precisely on Trinity Sunday, i.e. the Sunday after Pentecost, in that year the <date calendar="Julian" value="1632-06-06">27th of May (old style)</date>.

Example: A usage example of <time>.
He likes to be punctual. I said <q> <time value="12">around noon</time> </q>, and he showed up at <time value="12:00:00">12 O'clock</time> on the dot.

dur

(duration) indicates the length of this element in time.

Example: Examples of W3C durations.
<distance dur="PT45M">forty-five minutes</distance>
<distance dur="P1DT12H">a day and a half</distance>
<distance dur="P7D">a week</distance>
<distance dur="PT0.02S">20 ms</distance>

Note: In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.

Class: (none)

Members:

Module: tei
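
The dur values shown in the att.datePart examples use the W3C duration notation. As a minimal sketch of how an application might read them, the code below converts the day, hour, minute and second designators into a Python timedelta. The helper name parse_duration is hypothetical, and the sketch deliberately rejects year and month designators, whose length is not fixed.

```python
import re
from datetime import timedelta

# Covers only the day/hour/minute/second designators used in the examples above;
# years and months are deliberately left out because their length varies.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value):
    """Convert a dur value such as 'P1DT12H' into a timedelta (hypothetical helper)."""
    match = _DURATION.match(value)
    # Reject strings with no components, or a trailing T with no time part.
    if match is None or value in ("P", "PT") or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: float(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)
```

For instance, parse_duration("P1DT12H") yields a timedelta of one day and twelve hours, matching the "a day and a half" example.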

Class att.editLike

Attributes: In addition to global attributes

resp: indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.

Class: (none)

Members: corr gap

Module: tei

Class att.identifiable

the class of elements which describe other elements by means of their generic identifiers

Attributes: In addition to global attributes

ident: supplies an element's generic identifier, or one of the codes * (meaning all elements), or name() meaning that the name of the referenced element is to be used rather than its value.
ns: supplies the namespace within which the generic identifier is to be found.

Note: The values * and name() are used for ident as well.

Class: (none)

Members: attDef attributePolicy elementPolicy gi ident valItem valList valSource xairaItem

Module: module-from-bncxml

Class att.interpLike

provides attributes for elements which represent a formal analysis or interpretation.

Attributes: In addition to global attributes

resp

indicates who is responsible for the interpretation.

type

indicates what kind of phenomenon is being noted in the passage. Sample values include:

image: (identifies an image in the passage. )
character: (identifies a character associated with the passage. )
theme: (identifies a theme in the passage. )
allusion: (identifies an allusion to another text. )

inst

points to instances of the analysis or interpretation represented by the current element.

Class: (none)

Members:

Module: tei

Class att.personal

(attributes for components of personal names) common attributes for those elements which form part of a personal name.

Attributes: In addition to global attributes

type

provides more culture-, linguistic-, or application-specific information used to categorize this name component.

full

indicates whether the name component is given in full, as an abbreviation or simply as an initial. Legal values are:

yes: (the name component is spelled out in full.)
abb: (the name component is given in an abbreviated form.)
init: (the name component is indicated only by one initial.)

sort

specifies the sort order of the name component in relation to others within the personal name.

Class: (none)

Members:

Module: namesdates

Class att.rendered

the class of elements whose rendition has been recorded intermittently in the BNC

Attributes: In addition to global attributes

rend

a code briefly characterising the way the element content was originally presented. Legal values are:

bo: bold weight font
bx: boxed
hi: superscript
ib: italic and bold
ih: italic superscript
il: italic subscript
it: italic font
iu: italic and underlined
lo: subscript
qc: centre-aligned
ro: roman within italic
st: strike-out
ub: bold underlined
ul: underlined
xx: crossed-out

Class: (none)

Members: bibl corr div head hi item l label list p quote stage

Module: module-from-bncxml

Class att.spanning

provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it.

Attributes: In addition to global attributes

spanTo: indicates the end of a span initiated by the element bearing this attribute.

Note: The span is defined as running in document order from the start of the content of the pointing element (if any) to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element.

Class: (none)

Members:

Module: tei

Class att.tableDecoration

provides attributes used to decorate rows or cells of a table.

Attributes: In addition to global attributes

role

indicates the kind of information held in this cell or in each cell of this row. Suggested values include:

label: (labelling or descriptive information only.)
data: (data values.)

rows

indicates the number of rows occupied by this cell or row.

cols

indicates the number of columns occupied by this cell or row.

Class: (none)

Members:

Module: tei

Class att.timed

Attributes: In addition to global attributes

dur: (duration) indicates the duration of the element in minutes.

Class: (none)

Members: event pause unclear vocal

Module: tei

Class att.typed

Attributes: In addition to global attributes

type: characterizes the element in some sense, using any convenient classification scheme or typology.

Class: (none)

Members:

Module: tei

Class att.uniqueId

the class of elements which carry an identifier which is unique across the whole corpus.

Attributes: In addition to global attributes

xml:id: provides the unique identifier for this element.

Class: (none)

Members: bncDoc category person recording setting taxonomy

Module: module-from-bncxml

Class model.assertLike

the class of elements concerning which assertions are made, for example as parts of a biographical element.

Class: model.personPart

Members: model.persStateLike [age dialect occupation persName persNote ]

Module: tei

Class model.biblLike

groups elements containing a bibliographic description.

Class: model.inter: model.common

Members: bibl

Module: tei

Class model.blockLike

groups segmenting elements.

Class: (none)

Members:

Module: tei

Class model.castItemPart

elements used within an entry in a cast list, such as dramatic role or actor's name.

Class:

Members:

Module: tei

Class model.catDescPart

groups elements which may be used inside catDesc and appear multiple times

Class: (none)

Members:

Module: tei

Class model.complexVal

(complex values) groups elements which express complex feature values in feature structures.

Class: model.featureVal

Members:

Module: tei

Class model.dateLike

(dates and date ranges) groups elements containing date specifications.

Note: This class allows certain content models to allow either a single date or a date-range element.

Class: model.pPart.data: model.recordingPart

Members: date

Module: tei

Class model.datePart

(temporal expression) groups component elements of temporal expressions involving dates and time.

Class: (none)

Members:

Module: tei

Class model.divPart

groups elements which can occur between, but not within, paragraphs and other chunks.

Note: This element class does not include members of the inter class, which can appear either within or between chunks. Unlike elements of that class, chunks cannot occur within chunks.

Class: model.common

Members: l lg list note p quote sp

Module: tei

Class model.divPart.spoken

groups those elements which appear at the component level in spoken texts only.

Class: (none)

Members: event pause shift trunc u vocal

Module: spoken

Class model.divWrapper

(top-of-div elements) groups elements which can occur at the start of any division class element.

Class: (none)

Members: head

Module: tei

Class model.divWrapper.bottom

(Bottom-of-division elements) groups elements which can occur at the end of a text division; for example, trailer, byline, etc.

Class: (none)

Members:

Module: tei

Class model.editorialDeclPart

groups elements which may be used inside editorialDecl and appear multiple times

Class: (none)

Members:

Module: header

Class model.encodingPart

groups elements which may be used inside encodingDesc and appear multiple times

Class: (none)

Members: classDecl editorialDecl projectDesc refsDecl samplingDecl tagsDecl xairaSpecification

Module: header

Class model.frontPart.drama

groups elements which appear at the level of divisions within front or back matter of performance texts only.

Class: model.frontPart

Members:

Module: tei

Class model.gLike

groups elements which are interspersed with normal text, representing non-Unicode items.

Class: (none)

Members:

Module: tei

Class model.global

(global inclusions ) groups empty elements which may appear at any point within a TEI text.

Class: (none)

Members: model.global.edit [gap ] model.milestoneLike [pb ]

Module: tei

Class model.global.edit

groups empty elements which perform a specifically editorial function, for example by indicating the start of a span of text added, deleted, or missing in a source.

Note: Members of this class can appear anywhere within a document, between or within components or phrases.

Class: model.global

Members: gap

Module: tei

Class model.glossLike

groups elements which provide an alternative name, explanation, or description for any markup construct.

Class: (none)

Members: desc

Module: tei

Class model.headerPart

groups elements which may be used inside teiHeader and appear multiple times

Class: (none)

Members: encodingDesc profileDesc

Module: header

Class model.hiLike

groups phrase-level elements related to highlighting.

Class: model.phrase

Members: hi

Module: tei

Class model.imprintPart

groups the bibliographic elements which occur inside imprints.

Class: model.biblPart

Members: pubPlace publisher

Module: tei

Class model.inter

Attributes: Global attributes only

Class: (none)

Members: model.biblLike [bibl ] model.listLike [list ] model.noteLike model.oddRef model.qLike [lg quote ] model.stageLike [stage ]

Module: tei

Class model.lLike

groups elements representing metrical components such as verse lines.

Class:

Members: l

Module: tei

Class model.listLike

groups all list-like elements.

Class: model.inter: model.common

Members: list

Module: tei

Class model.milestoneLike

(reference system elements) groups milestone-style elements used to represent reference systems

Class: model.global

Members: pb

Module: tei

Class model.nameLike

(names of people, places, or organizations, or referring strings) groups those elements which name or refer to a person, place (man-made or geographic), or organization

Note: A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Class: model.addrPart: model.pPart.data

Members: model.nameLike.agent [name ]

Module: tei

Class model.nameLike.agent

groups elements which contain names of individuals or corporate bodies.

Note: This class is used in the content model of elements which reference names of people or organizations.

Class: model.nameLike

Members: name

Module: tei

Class model.noteLike

groups all note-like elements.

Class: model.inter: model.common

Members:

Module: tei

Class model.oddRef

(ODD reference class) groups elements which reference declarations in some markup language in ODD documents.

Class: model.common: model.inter

Members:

Module: tei

Class model.pLike

The class of elements which are paragraphs for the purpose of interchange.

Class: (none)

Members: p

Module: tei

Class model.pLike.front

(Front matter chunk elements) groups elements which can occur as direct constituents of front matter, when a full title page is not given.

Class: (none)

Members:

Module: tei

Class model.pPart.data

groups phrase-level elements containing names, dates, numbers, measures, and similar data.

Class: model.phrase

Members: address model.dateLike [date ] model.nameLike [model.nameLike.agent ]

Module: tei

Class model.pPart.edit

groups phrase-level elements for simple editorial correction and transcription.

Class: model.phrase

Members: corr unclear

Module: tei

Class model.persNamePart

(components of personal names) groups those elements which form part of a personal name.

Class: (none)

Members:

Module: namesdates

Class model.persStateLike

the class of elements describing changeable characteristics of a person which have a definite duration, for example occupation, residence, name... These characteristics of an individual are typically a consequence of their own action or that of others.

Class: model.assertLike

Members: age dialect occupation persName persNote

Module: tei

Class model.personLike

the class of elements used to provide information about people and their relationships.

Note: This class is referenced in the header module, but is not populated unless the namesdates module is loaded.

Class: (none)

Members:

Module: tei

Class model.personPart

groups elements which describe characteristics of the people referenced by a text, or participating in a language interaction.

Note: This class is used to define the content model for the <person> and <personGrp> elements.

Class: (none)

Members: model.assertLike [model.persStateLike ]

Module: tei

Class model.phrase

Attributes: Global attributes only

Class: (none)

Members: model.hiLike [hi ] model.pPart.data [address model.dateLike model.nameLike ] model.pPart.edit [corr unclear ] model.ptrLike [align ] model.segLike [c mw s w ]

Module: tei

Class model.physDescPart

specialised descriptive elements constituting the physical description of a manuscript or similar written source.

Class:

Members:

Module: tei

Class model.placeNamePart

(place name components) groups those elements which form part of a place name.

Class: (none)

Members:

Module: tei

Class model.profileDescPart

groups elements which may be used inside profileDesc and appear multiple times

Class: (none)

Members: langUsage particDesc settingDesc textClass

Module: header

Class model.ptrLike

groups elements used for purposes of location and reference

Class: model.phrase

Members: align

Module: tei

Class model.publicationStmtPart

(publication statement elements) groups the children of publicationStmt

Class: (none)

Members: address availability date distributor idno pubPlace publisher

Module: tei

Class model.qLike

groups elements related to highlighting which can appear either within or between chunk-level elements.

Class: model.inter: model.common

Members: lg quote

Module: tei

Class model.quoteLike

(quote and similar elements) groups elements used to directly contain quotations.

Class:

Members:

Module: tei

Class model.recordingPart

(dates and date ranges) groups elements used to describe details of an audio or video recording

Class: (none)

Members: model.dateLike [date ]

Module: tei

Class model.respLike

groups elements which are used to indicate intellectual responsibility, for example within a bibliographic element.

Class: model.biblPart: model.msItemPart

Members: author editor

Module: tei

Class model.segLike

Attributes: Global attributes only

Class: model.phrase

Members: c mw s w

Module: tei

Class model.settingPart

elements used to describe the setting of a linguistic interaction.

Class:

Members: activity locale placeName

Module: tei

Class model.singleVal

(atomic values) groups elements used to represent atomic feature values in feature structures.

Class: model.featureVal

Members:

Module: tei

Class model.sourceDescPart

groups elements which may be used inside sourceDesc and appear multiple times

Class: (none)

Members: recordingStmt

Module: header

Class model.stageLike

Attributes: Global attributes and those inherited from [model.divPart.stage ]

Class: model.divPart.stage: model.inter

Members: stage

Module: tei

Class model.textDescPart

elements used to categorise a text for example in terms of its situational parameters.

Class:

Members:

Module: tei

Class model.titlepagePart

(Title page elements) groups those elements which can occur as direct constituents of a title page (docTitle, docAuth, docImprint, epigraph, etc.)

Class: (none)

Members:

Module: tei

Elements defined

<activity>

(activity) contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.

Class: model.settingPart

Declaration

element activity { attribute spont { text }?, macro.phraseSeq }

Attributes: In addition to global attributes

spont

level of spontaneity. Values are:

H: high
M: medium
L: low
X: not applicable or unknown

Example

<activity>driving</activity>

Module: corpus

<address>

contains a postal or other address, for example of a publisher, an organization, or an individual.

Class: model.pPart.data: model.publicationStmtPart

Declaration

element address { macro.phraseSeq }

Attributes: Global attributes only

Example

<address>natcorp@oucs.ox.ac.uk</address>

Example

<address>13 Banbury Road, Oxford OX2 6NN,UK</address>

Module: core

<age>

specifies the age in years of a recorded participant at the time of the recording in which they participate.

Class: model.persStateLike

Declaration

element age { macro.phraseSeq }

Attributes: Global attributes only

Example

Module: namesdates

<align>

marks a temporal alignment point within transcribed speech

Class: model.ptrLike

Declaration

element align { attribute with { data.pointer }, empty }

Attributes: In addition to global attributes

Example

<s n="12">
 <w c5="VVB" hw="tell" pos="VERB">tell </w>
 <w c5="NP0" hw="billy" pos="SUBST">Billy </w>
 <align with="KSWLC001"/>
 <w c5="DT0" hw="that" pos="ADJ">that </w>
</s>
<s n="13">
 <align with="KSWLC001"/>
 <w c5="ITJ" hw="hello" pos="INTERJ">Hello </w>
 <w c5="AV0" hw="now" pos="ADV">now </w>
 <w c5="VM0" hw="can" pos="VERB">can </w>
 <w c5="PNP" hw="you" pos="PRON">you </w>
 <w c5="VVI" hw="hear" pos="VERB">hear </w>
 <w c5="PNP" hw="i" pos="PRON">me</w>
</s>

Module: module-from-bncxml
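
An application reading markup like the <align> example above might walk the children of each <s> element and carry the most recent alignment point forward onto the words that follow it. The sketch below does this with Python's standard xml.etree.ElementTree module. The fragment is adapted from the first <s> of the example; the function name tokens_with_alignment is hypothetical, and attaching each word to the preceding <align> is an interpretive assumption made for this illustration, not something the schema prescribes.

```python
import xml.etree.ElementTree as ET

# Fragment adapted from the <align> example above (first <s> only).
SAMPLE = (
    '<s n="12">'
    '<w c5="VVB" hw="tell" pos="VERB">tell </w>'
    '<w c5="NP0" hw="billy" pos="SUBST">Billy </w>'
    '<align with="KSWLC001"/>'
    '<w c5="DT0" hw="that" pos="ADJ">that </w>'
    '</s>'
)


def tokens_with_alignment(s_xml):
    """Yield (word, c5, align_id) for each <w>; align_id is the most recent <align> seen."""
    current = None
    for child in ET.fromstring(s_xml):
        if child.tag == "align":
            current = child.get("with")  # remember the alignment point
        elif child.tag == "w":
            yield (child.text or "").strip(), child.get("c5"), current


for word, c5, align_id in tokens_with_alignment(SAMPLE):
    print(word, c5, align_id)
```

Words that precede the first alignment point in a sentence are reported with None, so a caller can tell them apart from words that follow one.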

<attDef>

(attribute definition) provides the definition for a single attribute.

Class: att.identifiable

Declaration

element attDef { att.identifiable.attributes, att.identifiable.attribute.ident, ( desc*, valList? ) }

Attributes: Global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident

Module: tagdocs

<attList>

contains documentation for all the attributes associated with this element, as a series of attDef elements.

Declaration

element attList { attDef+ }

Attributes: Global attributes only

Module: tagdocs

<attributePolicy>

specifies the indexing policy to be used for one or more attributes.

Class: att.identifiable

Declaration

element attributePolicy { att.identifiable.attributes, attribute ident { data.name }?, att.identifiable.attribute.ns, attribute type { "none" | "jointo" | "joinfrom" | "taxonomy" }?, ( nameList?, joinTo? ) }

Attributes: In addition to global attributes and those inherited from [att.identifiable ]

ident: identifies the attribute to which the indexing policy applies
att.identifiable.attribute.ns

Module: module-from-bncxml

<author>

in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.

Class: model.respLike

Declaration

element author { attribute domicile { text }?, attribute n { text }?, attribute born { text }?, macro.phraseSeq }

Attributes: In addition to global attributes

domicile: main country of residence where known
n: internal identifier
born: year of birth where known

Example

<author n="AubreC1" domicile="Britain">Aubrey, Crispin</author>

Module: core

<availability>

supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.

Class: model.publicationStmtPart

Declaration

element availability { ( text | para )* }

Attributes: Global attributes only

Example

<availability> This material is protected by international copyright laws and may not be copied or redistributed in any way. Consult the BNC Web Site at http://www.natcorp.ox.ac.uk for full licencing and distribution conditions.</availability>

Module: header

<bibl>

(bibliographic citation) contains any bibliographic reference, occurring either within the header of a written corpus text, in which case it has a fixed substructure, or within the body of a corpus text, in which case it contains only s elements.

Class: att.rendered: model.biblLike

Declaration

element bibl { att.rendered.attributes, att.rendered.attribute.rend, ( s+ | ( title+, ( editor | author )*, imprint, pp? ) ) }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<bibl> <title>British intelligence services in action. </title> <author n="LindsK1" born="1924">Lindsay, Kennedy</author> <imprint n="DUNROD1"> <publisher>Dunrod Press</publisher> <pubPlace>Dundalk, Ireland</pubPlace> <date value="1980">1980</date> </imprint> <pp>74-176</pp> </bibl>

Module: core

<bncDoc>

contains a distinct document within the corpus, either spoken or written.

Class: att.uniqueId

Declaration

element bncDoc { att.uniqueId.attributes, att.uniqueId.attribute.xmlid, ( teiHeader, ( wtext | stext ) ) }

Attributes: Global attributes and those inherited from [att.uniqueId ]

att.uniqueId.attribute.xmlid

Module: module-from-bncxml

<c>

(character) contains a significant punctuation mark as identified by the CLAWS tagger.

Class: model.segLike: att.segLike

Declaration

element c { model.segLike.attributes, attribute c5 { "PUN" | "PUL" | "PUR" | "PUQ" }, text }

Attributes: In addition to global attributes and those inherited from [att.segLike ]

c5

the CLAWS 5 code associated with this punctuation mark. Legal values are:

PUN: any separating punctuation mark
PUL: opening round or square parenthesis
PUR: closing round or square parenthesis
PUQ: any quotation mark

Example

Note: Character data. Should only contain a single character or an entity that represents a single character.

Module: analysis
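
Because the c5 attribute on <c> is restricted to the four values listed above, a simple check can flag punctuation elements that carry any other code. This is an illustrative sketch using Python's standard xml.etree.ElementTree; the function name invalid_c_elements and the sample fragment (including its deliberately wrong second <c>) are hypothetical.

```python
import xml.etree.ElementTree as ET

# The four legal values taken from the declaration of <c> above.
LEGAL_C5_FOR_C = {"PUN", "PUL", "PUR", "PUQ"}


def invalid_c_elements(root):
    """Return (c5, text) pairs for <c> elements whose c5 is missing or not legal."""
    return [
        (c.get("c5"), c.text)
        for c in root.iter("c")
        if c.get("c5") not in LEGAL_C5_FOR_C
    ]


# Hypothetical fragment: the second <c> carries a word-class code by mistake.
SAMPLE = (
    '<s n="1"><w c5="NN1" hw="actor" pos="SUBST">Actor</w>'
    '<c c5="PUN">?</c><c c5="NN1">!</c></s>'
)

bad = invalid_c_elements(ET.fromstring(SAMPLE))
print(bad)
```

A schema-aware validator would report the same problem; the sketch is only meant to show how narrow the value list for this element is.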

<catDesc>

(category description) provides a description for one category within the text taxonomies provided in the corpus header.

Declaration

element catDesc { macro.phraseSeq }

Attributes: Global attributes only

Example

<category xml:id="ACPROSE"> <catDesc>Academic prose</catDesc> </category>

Module: header

<catRef>

(category reference) provides a list of codes identifying the categories to which this text has been assigned, each code referencing a category element declared in the corpus header.

Declaration

element catRef { attribute targets { data.pointers }, empty }

Attributes: In addition to global attributes

targets: identifies the categories concerned

Example

Module: header

<category>

(category) defines a single category within a taxonomy of texts.

Class: att.uniqueId

Declaration

element category { att.uniqueId.attributes, att.uniqueId.attribute.xmlid, catDesc }

Attributes: Global attributes and those inherited from [att.uniqueId ]

att.uniqueId.attribute.xmlid

Example

<category xml:id="FICTION"> <catDesc>Fiction and verse</catDesc> </category>

Module: header

<change>

summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers.

Class: att.ascribed

Declaration

element change { att.ascribed.attributes, attribute date { data.temporal }?, att.ascribed.attribute.who, macro.phraseSeq }

Attributes: In addition to global attributes and those inherited from [att.ascribed ]

date: supplies the date of the change in standard form, i.e. yyyy-mm-dd.
att.ascribed.attribute.who

Example

<change date="2006-10-21" who="#OUCS">Tag usage updated for BNC-XML</change>

Note: Changes should be recorded in a consistent order, for example with the most recent first.

Module: header
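
The note above recommends a consistent order such as most recent first. As a sketch of how that could be checked or produced, the snippet below sorts <change> elements by their date attribute, assuming every date is a full yyyy-mm-dd value as described. The <revisionDesc> wrapper, the first sample entry, and the function name changes_newest_first are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The wrapper element and the first <change> are hypothetical; the second
# <change> is the example given above.
SAMPLE = """<revisionDesc>
<change date="2000-10-03" who="#OUCS">Hypothetical earlier entry</change>
<change date="2006-10-21" who="#OUCS">Tag usage updated for BNC-XML</change>
</revisionDesc>"""


def changes_newest_first(root):
    """Return (date, who, text) tuples for dated <change> elements, newest first."""
    changes = [
        (date.fromisoformat(c.get("date")), c.get("who"), c.text)
        for c in root.iter("change")
        if c.get("date")  # the date attribute is optional in the declaration
    ]
    return sorted(changes, key=lambda item: item[0], reverse=True)


ordered = changes_newest_first(ET.fromstring(SAMPLE))
for when, who, text in ordered:
    print(when.isoformat(), who, text)
```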

<classCode>

(classCode) contains the classification code used for this text in some standard classification system.

Declaration

element classCode { attribute scheme { data.pointer }, macro.phraseSeq }

Attributes: In addition to global attributes

Example

Module: header

<classDecl>

(classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.

Class: model.encodingPart

Declaration

element classDecl { taxonomy+ }

Attributes: Global attributes only

Module: header

<collate>

supplies any additional ICU-conformant collating rules to be used when sorting words in the corpus.

Declaration

element collate { text }

Attributes: Global attributes only

Note: The format for collating rules is defined at http://icu.sourceforge.net/userguide/Collate_Customization.html

Module: module-from-bncxml

<corr>

(correction) contains the correct form of a passage apparently erroneous in the copy text.

Class: att.rendered: att.editLike: model.pPart.edit

Declaration

element corr { att.rendered.attributes, att.editLike.attributes, attribute sic { text }?, att.rendered.attribute.rend, attribute resp { data.pointer }?, ( w | c | mw | gap )* }

Attributes: In addition to global attributes and those inherited from [att.rendered att.editLike ]

sic: contains verbatim text which has been corrected, or an empty string if the correction consists of an addition.
att.rendered.attribute.rend
resp: a code identifying the agency responsible for making the correction.

Example

<corr sic="existant"> <w c5="AJ0" hw="existent" pos="ADJ">existent </w> </corr>

Module: core
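
To illustrate how the sic attribute and the element content relate, the sketch below returns both readings of a <corr>: the original form recorded in sic and the corrected form held in the child elements. The function name readings is hypothetical; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# The example given above, written on one line.
SAMPLE = (
    '<corr sic="existant">'
    '<w c5="AJ0" hw="existent" pos="ADJ">existent </w>'
    "</corr>"
)


def readings(corr):
    """Return (original, corrected) for a <corr> element.

    original is None when no sic attribute is present, and an empty
    string when the correction consists of an addition.
    """
    corrected = "".join(corr.itertext()).strip()
    return corr.get("sic"), corrected


original, corrected = readings(ET.fromstring(SAMPLE))
print(original, "->", corrected)
```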

<creation>

contains information about the creation of a text.

Declaration

element creation { attribute date { data.temporal }?, macro.phraseSeq }

Attributes: In addition to global attributes

date: supplies the year of original composition, if known; or 0000-00-00 if the date is unknown.

Example

<creation date="0000-00-00"> Origination/creation date not known </creation>

Example

<creation date="1986"> Original publisher: A &amp; C Black (Publishers) Ltd, London </creation>

Module: header
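
A sketch of how a consumer might read the date attribute of <creation>, treating the all-zero value shown in the first example as "unknown". It assumes that the first four characters of any other value are the year; the function name creation_year and the element content in the second sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def creation_year(creation):
    """Return the year as an int, or None if the date is absent or all zeros."""
    value = creation.get("date")
    if value is None or value.startswith("0000"):
        return None
    return int(value[:4])


unknown = ET.fromstring(
    '<creation date="0000-00-00">Origination/creation date not known</creation>'
)
# Hypothetical content; only the date attribute matters here.
known = ET.fromstring('<creation date="1986">Hypothetical note</creation>')

print(creation_year(unknown), creation_year(known))
```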

<date>

contains a date in any format.

Class: model.dateLike: model.publicationStmtPart

Declaration

element date { attribute value { data.temporal }?, macro.phraseSeq }

Attributes: In addition to global attributes

value: supplies a standardized representation of the date

Example

Example

Module: core

<defaultVal>

specifies the default declared value for an attribute.

Declaration

element defaultVal { text }

Attributes: Global attributes only

Example

<defaultVal>#IMPLIED</defaultVal>

Note: any legal declared value or TEI-defined keyword

Module: tagdocs

<desc>

(description) supplies explanatory text associated with a category or other component defined in the corpus header

Class: model.glossLike: att.translatable

Declaration

element desc { macro.phraseSeq }

Attributes: Global attributes and those inherited from [att.translatable ]

Example

<desc>contains a brief description of the purpose and application for an element, attribute, attribute value, class, or entity.</desc>

Note: TEI convention requires that this be expressed as a finite clause, beginning with an active verb.

Module: core

<dialect>

contains an informal description of the regional variety of English used by a participant in a spoken text.

Class: model.persStateLike

Declaration

element dialect { macro.phraseSeq }

Attributes: Global attributes only

Example

<dialect>Home Counties</dialect>

Module: module-from-bncxml

<distributor>

supplies the name of a person or other agency responsible for the distribution of a text.

Class: model.biblPart: model.publicationStmtPart

Declaration

element distributor { macro.phraseSeq }

Attributes: Global attributes and those inherited from [model.biblPart ]

Example

<distributor>Distributed under licence by Oxford University Computing Services on behalf of the BNC Consortium.</distributor>

Module: header

<div>

(text division) contains a subdivision of the front, body, or back of a text.

Class: att.rendered

Declaration

element div { att.rendered.attributes, attribute n { text }?, attribute decls { data.pointers }?, attribute level { data.count }?, attribute type { text }?, att.rendered.attribute.rend, ( ( model.divWrapper | model.global )*, ( ( ( model.divPart ), ( model.divPart | model.global )* ) | ( ( model.divPart.spoken ), ( model.divPart.spoken | model.global )* ) )?, div* ) }

Attributes: In addition to global attributes and those inherited from [att.rendered ]

n

for a spoken text, identifies the tape corresponding to this division.

decls

for a spoken text, identifies the declarations (for setting, recording etc.) in the header which apply to this division.

level

specifies the hierarchic level of this division as a number between 1 (outermost or largest division) and 4 (innermost or smallest).

type

identifies the type or function of the division (for a written text). Values are:

advertisement: advertisement section or insert
appendix: appendix
article: single article in a journal
blurb: any kind of promotional front matter
cartoon: cartoon
chapter: chapter of a novel etc.
column: newspaper column, regular feature etc.
compo: composite material
contents: table of contents
front: any kind of front matter
leaflet: free-standing leaflet or pamphlet
paper: an academic paper in a collection
part: subdivision of a chapter
recipe: separate recipe in a cookbook
section: any subdivision
sidebar: sidebar or displayed paragraph e.g. in a news story
story: distinct story in a periodical or collection
subsection: smaller subdivision of any kind

att.rendered.attribute.rend

Example

<div level="1" n="1" type="chapter"> <head rend="it"> <s n="1"> <w c5="AV0" hw="so" pos="ADV">So </w> <w c5="PNP" hw="you" pos="PRON">you </w> <w c5="VVB" hw="want" pos="VERB">want </w> <w c5="TO0" hw="to" pos="PREP">to </w> <w c5="VBI" hw="be" pos="VERB">be </w> <w c5="AT0" hw="an" pos="ART">an </w> <w c5="NN1" hw="actor" pos="SUBST">Actor</w> <c c5="PUN">?</c> </s> </head> <s n="2"> <w c5="PNI" hw="everyone" pos="PRON">Everyone </w> <w c5="PNQ" hw="who" pos="PRON">who </w> <w c5="VVZ" hw="want" pos="VERB">wants ... </w> </s> </div>

Note: any sequence of low-level structural elements, possibly grouped into lower subdivisions.

Module: textstructure
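
Since <div> elements nest and carry level and type attributes, an outline of a text can be produced by walking the hierarchy. The following is a minimal sketch under the assumption that divisions appear only as direct children of <wtext> or of another <div>; the sample structure and the function name outline are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton of a written text: one chapter with two sections,
# the second of which has a subsection. All content is omitted.
SAMPLE = """<wtext>
<div level="1" n="1" type="chapter">
  <div level="2" type="section"/>
  <div level="2" type="section"><div level="3" type="subsection"/></div>
</div>
</wtext>"""


def outline(elem, depth=0):
    """Yield (nesting depth, level attribute, type attribute) per <div>, in document order."""
    for child in elem.findall("div"):
        yield depth + 1, child.get("level"), child.get("type")
        yield from outline(child, depth + 1)


root = ET.fromstring(SAMPLE)
for depth, level, kind in outline(root):
    print("  " * (depth - 1) + f"{kind} (level={level})")
```

Comparing the computed nesting depth with the level attribute is one way to spot divisions whose declared level does not match their position.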

<edition>

(Edition) describes the particularities of one edition of a text.

Class: model.biblPart

Declaration

element edition { attribute n { data.count }?, macro.phraseSeq }

Attributes: In addition to global attributes and those inherited from [model.biblPart ]

n: supplies an identifying number for the edition

Example

<editionStmt> <edition>BNC XML Edition, December 2006</edition> </editionStmt>

Module: header

<editionStmt>

(edition statement) groups information relating to one edition of a text.

Declaration

element editionStmt { edition }

Attributes: Global attributes only

Example

<editionStmt> <edition>BNC XML Edition, December 2006</edition> </editionStmt>

Module: header

<editor>

(editor) secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.

Class: model.respLike

Declaration

element editor { attribute n { text }?, macro.phraseSeq }

Attributes: In addition to global attributes

n: supplies a number for the editor where multiple editors are specified for a single text

Example

<editor n="2">Boileau, John</editor>

Module: core

<editorialDecl>

(editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.

Class: model.encodingPart: att.declarable

Declaration

element editorialDecl { ( text | para )* }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<editorialDecl> <para>Material included in the BNC was produced by several different agencies ...</para> </editorialDecl>

Note: This element is supplied in the BNC corpus header only

Module: header

<elementPolicy>

specifies the xaira indexing policy to be used for one or more elements.

Class: att.identifiable

Declaration

element elementPolicy { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, attribute type { "none" | "children" | "content" | "markup" }?, nameList? }

Attributes: In addition to global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Module: module-from-bncxml

<encodingDesc>

(Encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.

Class: model.headerPart

Declaration

element encodingDesc { model.encodingPart* }

Attributes: Global attributes only

Example

<encodingDesc> <projectDesc> <para>The British National Corpus (BNC) Consortium was formed in 1990...</para> </projectDesc> <samplingDecl> <para>Definitive information on the sampling policies... </para> </samplingDecl> <editorialDecl> <para>Material included in the BNC was produced by several different agencies ...</para> </editorialDecl> <refsDecl> <para>Canonical references to the BNC should ...</para> </refsDecl> <classDecl> <taxonomy xml:id="DLee"> <desc>David Lee's register and domain classification</desc> </taxonomy>... </classDecl> <xairaSpecification> ...</xairaSpecification> </encodingDesc>

Note: Used in corpus header only

Module: header

<event>

(Event) any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.

Class: model.divPart.spoken: att.timed: att.ascribed

Declaration

element event { att.timed.attributes, att.ascribed.attributes, attribute desc { text }?, att.timed.attribute.dur, empty }

Attributes: In addition to global attributes and those inherited from [att.timed att.ascribed ]

desc: provides a brief description of the event
att.timed.attribute.dur

Example

Module: spoken

<extent>

specifies the approximate size of the text, in orthographic words, w elements, and s elements

Class: model.biblPart

Declaration

element extent { macro.phraseSeq }

Attributes: Global attributes and those inherited from [model.biblPart ]

Example

<extent>432434 tokens; 432859 w-units; 26215 s-units</extent>

Module: header
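
The content of <extent> is free text, but if it follows the pattern of the example above, the three counts can be recovered with a regular expression. This is a sketch that assumes exactly that wording ("tokens", "w-units", "s-units"); the names EXTENT_RE and parse_extent are hypothetical.

```python
import re

# Pattern modelled on the single example above; other wordings are not handled.
EXTENT_RE = re.compile(r"(\d+)\s+tokens;\s*(\d+)\s+w-units;\s*(\d+)\s+s-units")


def parse_extent(text):
    """Return the three counts as a dict, or None if the text does not match."""
    match = EXTENT_RE.search(text)
    if match is None:
        return None
    tokens, w_units, s_units = (int(g) for g in match.groups())
    return {"tokens": tokens, "w_units": w_units, "s_units": s_units}


extent = parse_extent("432434 tokens; 432859 w-units; 26215 s-units")
print(extent)
```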

<fileDesc>

(File Description) contains a full bibliographic description of an electronic file.

Declaration

element fileDesc { macro.fileDescPart, sourceDesc+ }

Attributes: Global attributes only

Note: The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.

Module: header

<gap>

(omitted material) indicates a point where material has been omitted from the transcription.

Class: model.global.edit: att.editLike

Declaration

element gap { att.editLike.attributes, attribute desc { text }?, attribute reason { text }?, att.editLike.attribute.resp, empty }

Attributes: In addition to global attributes and those inherited from [att.editLike ]

desc: briefly describes the material which has been omitted.
reason: gives further details of the reason for omission.
att.editLike.attribute.resp

Example

Module: core

<gi>

(generic identifier) contains the name (generic identifier) of an element.

Class: att.identifiable

Declaration

element gi { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, text }

Attributes: Global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Module: tagdocs

<head>

(heading) contains any type of heading, for example the title of a section or a poem.

Class: att.rendered: model.divWrapper

Declaration

element head { att.rendered.attributes, attribute type { "MAIN" | "SUB" | "BYLINE" }?, att.rendered.attribute.rend, ( s | gap | pb )+ }

Attributes: In addition to global attributes and those inherited from [att.rendered ]

type

Legal values are:

MAIN: a major heading.
SUB: any sub-heading.
BYLINE: a sub-heading providing the name of a journalist or other source of a newspaper report.

att.rendered.attribute.rend

Example

<head rend="ub" type="MAIN"> <s n="93"> <w c5="VDB" hw="do" pos="VERB">Do </w> <w c5="PNP" hw="i" pos="PRON">I </w> <w c5="VVI" hw="need" pos="VERB">need </w> <w c5="DT0" hw="any" pos="ADJ">any </w> <w c5="NN1" hw="training" pos="SUBST">training</w> <c c5="PUN">?</c> </s> </head>

Note: The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.

Module: core

<hi>

(highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.

Class: att.rendered: model.hiLike

Declaration

element hi { att.rendered.attributes, att.rendered.attribute.rend, macro.paraContent }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<exemplum> <egXML> <s n="2211"> <hi rend="it"> <w c5="NN1" hw="apple" pos="SUBST">Apple </w> </hi> <w c5="VBZ" hw="be" pos="VERB">is </w> <w c5="PRP" hw="to" pos="PREP">to </w> <hi rend="it"> <w c5="NN0" hw="fruit" pos="SUBST">fruit </w> </hi> <w c5="CJS-PRP" hw="as" pos="CONJ">as </w> <hi rend="it"> <w c5="NN1" hw="dog" pos="SUBST">dog </w> </hi> <w c5="VBZ" hw="be" pos="VERB">is </w> <w c5="PRP" hw="to" pos="PREP">to </w> <hi rend="it"> <w c5="ZZ0" hw="x" pos="SUBST">X </w> </hi> <c c5="PUN">.</c> </s> </egXML> </exemplum>

Module: core

<ident>

contains an identifier or name for an object of some kind in a formal language

Class: att.identifiable

Declaration

element ident { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, text }

Attributes: Global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Note: In running prose, this element may be used for any kind of identifier in any formal language.

Module: tagdocs

<idno>

(identifying number) supplies an identifying code for a text.

Class: model.biblPart: model.publicationStmtPart

Declaration

element idno { attribute type { data.enumerated }?, text }

Attributes: In addition to global attributes and those inherited from [model.biblPart ]

type: categorizes the code number used.

Example

Module: header

<imprint>

groups information relating to the publication or distribution of a bibliographic item.

Declaration

element imprint { attribute n { text }?, ( pubPlace | publisher | date | pp )* }

Attributes: In addition to global attributes

n: internal identifier

Example

<imprint n="JOHNMU1"> <publisher>John Murray (Publishers) Ltd</publisher> <pubPlace>London</pubPlace> <date value="1989">1989</date> </imprint>

Module: core

<item>

contains one component of a list.

Class: att.rendered

Declaration

element item { att.rendered.attributes, att.rendered.attribute.rend, ( model.pLike | model.qLike | model.listLike | s | model.global )+ }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<list> <item> <s n="516"> <w c5="VVB" hw="substitute" pos="VERB">Substitute </w> <w c5="AJ0-NN1" hw="plain" pos="ADJ">plain </w> <w c5="NN2" hw="biscuit" pos="SUBST">biscuits </w> <w c5="PRP" hw="for" pos="PREP">for </w> <w c5="AJ0-VVN" hw="filled" pos="ADJ">filled </w> <w c5="CJC" hw="or" pos="CONJ">or </w> <w c5="AJ0" hw="chocolate-covered" pos="ADJ">chocolate-covered </w> <w c5="NN2" hw="one" pos="SUBST">ones</w>...</s> </item> <item> <s n="517"> <w c5="VVB" hw="try" pos="VERB">Try </w> <w c5="VVG" hw="eat" pos="VERB">eating </w> <w c5="AT0" hw="a" pos="ART">a </w> <w c5="AJ0" hw="small" pos="ADJ">small </w> <w c5="NN1" hw="amount" pos="SUBST">amount </w>...</s> </item> </list>

Module: core

<joinTo>

supplies a list of element names carrying an attribute which has been specified with the xaira "joinTo" indexing policy.

Declaration

element joinTo { gi+ }

Attributes: Global attributes only

Module: module-from-bncxml

<keywords>

(Keywords) contains a list of keywords or phrases identifying the topic or nature of a text.

Declaration

element keywords { attribute scheme { data.pointer }?, term+ }

Attributes: In addition to global attributes

scheme: identifies the controlled vocabulary within which the set of keywords concerned is defined.

Example

<keywords scheme="COPAC"> <term>Fluid dynamics</term> <term> Fluids. Dynamics</term> </keywords> <keywords/>

Module: header

<l>

(verse line) contains a single, possibly incomplete, line of verse.

Class: att.rendered: model.divPart: model.lLike

Declaration

element l { att.rendered.attributes, att.rendered.attribute.rend, ( s | gap | pb )+ }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<l> <s n="3287"> <w c5="ORD" hw="next" pos="ADJ">Next </w> <w c5="NN1" hw="day" pos="SUBST">Day </w> <w c5="PRP" hw="at" pos="PREP">at </w> <w c5="CRD" hw="six" pos="ADJ">Six </w> <w c5="CJS" hw="before" pos="CONJ">before </w> <w c5="AT0" hw="the" pos="ART">the </w> <w c5="NN1" hw="gate" pos="SUBST">Gate </w> <w c5="VVZ" hw="appear" pos="VERB">appears</w> <c c5="PUN">,</c> </s> </l> <l> <s n="3288"> <w c5="AT0" hw="the" pos="ART">The </w> <w c5="NN1" hw="wretch" pos="SUBST">Wretch </w> <w c5="VVN" hw="divide" pos="VERB">divided </w> <w c5="PRP" hw="by" pos="PREP">by </w> <w c5="DPS" hw="he" pos="PRON">his </w> <w c5="NN2" hw="hope" pos="SUBST">Hopes </w> <w c5="CJC" hw="and" pos="CONJ">and </w> <w c5="NN2-VVZ" hw="fear" pos="SUBST">Fears</w> <c c5="PUN">.</c> </s> </l>

Module: core

<label>

contains the label associated with an item in a list; in glossaries, marks the term being defined.

Class: att.rendered

Declaration

element label { att.rendered.attributes, att.rendered.attribute.rend, ( s | gap | pb )+ }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<label> <s n="8176"> <w c5="NN1-VVB" hw="amount" pos="SUBST">Amount</w> <c c5="PUN">:</c> </s> </label> <item> <s n="8177"> <w c5="CRD" hw="52153" pos="ADJ">52153 </w> <w c5="NN2" hw="pound" pos="SUBST">Pounds</w> </s> </item> <label> <s n="8178"> <w c5="NN1-VVB" hw="date" pos="SUBST">Date </w> <w c5="NN1" hw="award" pos="SUBST">Award </w> <w c5="VVD" hw="begin" pos="VERB">Began</w> <c c5="PUN">:</c> </s> </label> <item> <s n="8179"> <w c5="CRD" hw="01" pos="ADJ">01 </w> <w c5="NP0" hw="january" pos="SUBST">January </w> <w c5="CRD" hw="1992" pos="ADJ">1992</w> </s> </item>

Module: core

<labelGen>

specifies the label to be generated for the parent reference.

Declaration

element labelGen { attribute change { "onStart" | "within" }?, text }

Attributes: In addition to global attributes

Module: module-from-bncxml

<langUsage>

(language usage) describes the languages, sublanguages, registers, dialects etc. represented within a text.

Class: model.profileDescPart: att.declarable

Declaration

element langUsage { language+ }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<langUsage> <language ident="en-GB">The language of the British National Corpus is modern British English. ...</language> </langUsage>

Note: Appears only in the corpus header.

Module: header

<language>

characterizes a single language or sublanguage used within a text.

Declaration

element language { attribute ident { data.language }, macro.phraseSeq }

Attributes: In addition to global attributes

ident: Supplies a language code constructed as defined in RFC 3066 (or its successor) which is used to identify the language documented by this element, and which is referenced by the global xml:lang attribute.

Example

<language ident="en-GB">The language of the British National Corpus is modern British English. ...</language>

Note: Particularly for sublanguages, an informal prose characterization should be supplied as content for the element.

Module: header

<lg>

(line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.

Class: model.qLike: model.divPart

Declaration

element lg { ( model.divWrapper | model.global )*, ( model.lLike | lg ), ( model.lLike | lg | model.global )*, model.divWrapper.bottom* }

Attributes: Global attributes only

Example

<lg> <l> <s n="463"> <w c5="AV0" hw="too" pos="ADV">Too </w> <w c5="AJ0-VVD" hw="jellied" pos="ADJ">jellied</w> <c c5="PUN">, </c> <w c5="AJ0" hw="viscous" pos="ADJ">viscous</w> <c c5="PUN">, </c> <w c5="VVG" hw="float" pos="VERB">floating </w> <w c5="AT0" hw="a" pos="ART">a </w> <w c5="NN1" hw="condition" pos="SUBST">condition</w> </s> </l> <l> <s n="464"> <w c5="TO0" hw="to" pos="PREP">to </w> <w c5="VVI" hw="inspire" pos="VERB">inspire </w> <w c5="DT0" hw="more" pos="ADJ">more </w> <w c5="NN1" hw="action" pos="SUBST">action </w> <w c5="CJS" hw="than" pos="CONJ">than </w> <w c5="AT0" hw="a" pos="ART">a </w> <w c5="NN1" hw="sigh" pos="SUBST">sigh </w> <c c5="PUN">—</c> </s> </l>...</lg>

Note: contains verse lines or nested line groups only, possibly prefixed by a heading.

Module: core

<list>

contains any sequence of items organized as a list.

Class: att.rendered: model.listLike: model.divPart

Declaration

element list { att.rendered.attributes, att.rendered.attribute.rend, ( ( model.divWrapper | model.global )*, ( ( item, model.global* )+ | ( label, model.global*, item, model.global* )+ ), model.divWrapper.bottom* ) }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Module: core

<locale>

(locale) contains a brief informal description of the nature of a place, for example a room, a restaurant, a park bench etc.

Class: model.settingPart

Declaration

element locale { macro.phraseSeq }

Attributes: Global attributes only

Example

<locale>a fashionable restaurant</locale>

Module: corpus

<mw>

contains a multi-word unit as identified by CLAWS, that is, a sequence of individual tokens which function as a single unit and can be given a single part of speech code.

Class: model.segLike: att.c5coded

Declaration

element mw { model.segLike.attributes, att.c5coded.attributes, att.c5coded.attribute.c5, w+ }

Attributes: Global attributes and those inherited from [att.c5coded ]

att.c5coded.attribute.c5

Example

<mw c5="PRP"> <w c5="PRP" hw="in" pos="PREP">in </w> <w c5="NN1" hw="response" pos="SUBST">response </w> <w c5="PRP" hw="to" pos="PREP">to </w> </mw>

Note: In CLAWS output the components of a <mw> are given ‘ditto’ tags inherited from the parent <mw>. In BNC they have been given the same code as elsewhere in the corpus.

Module: module-from-bncxml
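
As the note above explains, a <mw> carries one code for the whole unit while each component <w> keeps its own code. The sketch below lists both side by side; the function name multiword_units is hypothetical, and the trailing <w> in the sample is added only to show that words outside the unit are ignored.

```python
import xml.etree.ElementTree as ET

# The example above, placed inside a hypothetical sentence.
SAMPLE = (
    '<s n="1"><mw c5="PRP">'
    '<w c5="PRP" hw="in" pos="PREP">in </w>'
    '<w c5="NN1" hw="response" pos="SUBST">response </w>'
    '<w c5="PRP" hw="to" pos="PREP">to </w>'
    '</mw><w c5="AT0" hw="the" pos="ART">the </w></s>'
)


def multiword_units(root):
    """Return (text, unit code, component codes) for each <mw> element."""
    units = []
    for mw in root.iter("mw"):
        words = mw.findall("w")
        text = " ".join((w.text or "").strip() for w in words)
        units.append((text, mw.get("c5"), [w.get("c5") for w in words]))
    return units


units = multiword_units(ET.fromstring(SAMPLE))
print(units)
```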

<name>

(name, proper noun) contains a proper noun or noun phrase.

Class: model.nameLike.agent: att.naming

Declaration

element name { macro.phraseSeq }

Attributes: Global attributes and those inherited from [att.naming ]

Example

<name>Longman </name>

Note: This element is used only in the header.

Module: core

<nameList>

supplies a list of element names or attribute identifiers

Declaration

element nameList { ( gi | ident )+ }

Attributes: Global attributes only

Module: module-from-bncxml

<namespace>

supplies the formal name of the namespace to which the elements documented by its children belong.

Declaration

element namespace { attribute name { data.namespace }, tagUsage+ }

Attributes: In addition to global attributes

name: the full formal name of the namespace concerned.

Note: This element is not used in the current release of the BNC: all elements belong to the empty namespace.

Module: header

<note>

contains a note or annotation.

Class: model.divPart: att.placement

Declaration

element note { attribute place { text }?, attribute n { text }?, s+ }

Attributes: In addition to global attributes and those inherited from [att.placement ]

place

Values are:

FOOT: footnote
SIDE: side note
END: endnote

n

internal identifier

Example

<note place="SIDE"> <s n="477"> <w c5="AT0" hw="the" pos="ART">The </w> <w c5="AJ0-NN1" hw="short" pos="ADJ">short </w> <w c5="VBZ" hw="be" pos="VERB">is </w> <w c5="AT0" hw="a" pos="ART">a </w> <w c5="NN1" hw="film" pos="SUBST">film </w> <w c5="PRP" hw="about" pos="PREP">about </w> <w c5="NN1-VVG" hw="sailing" pos="SUBST">sailing</w> <c c5="PUN">.</c> </s>...</note>

Module: core

<occupation>

contains an informal description of a person's trade, profession or occupation.

Class: model.persStateLike

Declaration

element occupation { macro.phraseSeq }

Attributes: Global attributes only

Example

<occupation>student</occupation>

Module: namesdates

<p>

(paragraph) marks paragraphs in prose.

Class: att.rendered: model.pLike: model.divPart

Declaration

element p { att.rendered.attributes, attribute type { text }?, att.rendered.attribute.rend, macro.paraContent }

Attributes: In addition to global attributes and those inherited from [att.rendered ]

type

indicates how the paragraph is displayed. Values are:

caption: the paragraph is displayed as a caption
caption:byline: the displayed paragraph contains a byline
caption:display: the paragraph is displayed as a floating caption
caption:attached: the paragraph is displayed as an attached caption

att.rendered.attribute.rend

Example

<s n="7234"> <w c5="VVB" hw="brave" pos="VERB">BRAVE</w> <c c5="PUN">: </c> <w c5="NP0" hw="louise" pos="SUBST">Louise</w> </s>

Example

<s n="7244"> <w c5="AJ0" hw="jobless" pos="ADJ">JOBLESS </w> <w c5="NP0" hw="darren" pos="SUBST">Darren </w> <w c5="NP0" hw="st" pos="SUBST">St </w> <w c5="NP0" hw="john" pos="SUBST">John </w> <w c5="VVD" hw="gobble" pos="VERB">gobbled </w> <w c5="NN0" hw="5lb" pos="SUBST">5lb </w> <w c5="PRF" hw="of" pos="PREP">of </w> <w c5="NN2" hw="strawberry" pos="SUBST">strawberries </w> <w c5="PRP" hw="in" pos="PREP">in </w> <w c5="CRD" hw="two" pos="ADJ">two </w> <w c5="NN2" hw="pint" pos="SUBST">pints </w> <w c5="PRF" hw="of" pos="PREP">of </w> <w c5="AJ0" hw="chilli-flavoured" pos="ADJ">chilli-flavoured </w> <w c5="NN1" hw="gravy" pos="SUBST">gravy </w> <w c5="TO0" hw="to" pos="PREP">to </w> <w c5="VVI" hw="raise" pos="VERB">raise </w> <w c5="NN0" hw="£450" pos="UNC">£450 </w> <w c5="PRP" hw="for" pos="PREP">for </w> <w c5="NN1" hw="charity" pos="SUBST">charity </w> <w c5="PRP" hw="at" pos="PREP">at </w> <w c5="NP0" hw="henley" pos="SUBST">Henley</w> <c c5="PUN">, </c> <w c5="NP0" hw="oxon" pos="SUBST">Oxon</w> <c c5="PUN">.</c> </s>

Module: core

<para>

contains descriptive text appearing within components of a TEI header

Declaration

element para { ( text | hi | list )* }

Attributes: Global attributes only

Example

<para>For information, the conditions of the Standard License Agreement are as follows:</para>

Module: module-from-bncxml

<particDesc>

(participation description) describes the identifiable speakers, voices, or other participants in a linguistic interaction.

Class: model.profileDescPart: att.declarable

Declaration

element particDesc { attribute n { text }?, person+ }

Attributes: In addition to global attributes and those inherited from [att.declarable ]

n: internal identifier

Example

<particDesc n="C114"> <person ageGroup="Ag4" xml:id="PS1US" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X"> <age>45</age> <persName>Terry</persName> <occupation>british rail employee</occupation> </person>...</particDesc>

Module: corpus

<pause>

a pause either between or within utterances.

Class: model.divPart.spoken: att.timed

Declaration

element pause { att.timed.attributes, att.timed.attribute.dur, empty }

Attributes: Global attributes and those inherited from [att.timed ]

att.timed.attribute.dur

Example

<s n="199"> <w c5="UNC" hw="erm" pos="UNC">Erm </w> <pause dur="10"/> <w c5="AV0" hw="right" pos="ADV">right </w> <w c5="AV0" hw="now" pos="ADV">now</w> <c c5="PUN">, </c>...</s>

Module: spoken

<pb>

(page break) marks the boundary between one page of a text and the next in a standard reference system.

Class: model.milestoneLike

Declaration

element pb { attribute n { text }?, empty }

Attributes: In addition to global attributes

n: gives the number of the page beginning here

Example

Module: core

<persName>

(personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forenames, surnames, honorifics, added names, etc.

Class: model.persStateLike: model.nameLikeAgent

Declaration

element persName { macro.phraseSeq }

Attributes: Global attributes and those inherited from [model.nameLikeAgent ]

Example

<persName>Norman</persName>

Module: namesdates

<persNote>

contains any additional information supplied about a participant in a spoken text

Class: model.persStateLike

Declaration

element persNote { macro.phraseSeq }

Attributes: Global attributes only

Example

<person ageGroup="X"> <persNote>May well be an actor portraying a Davidian</persNote> </person>

Module: module-from-bncxml

<person>

provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source.

Class: att.uniqueId

Declaration

element person { att.uniqueId.attributes, attribute ageGroup { text }?, attribute dialect { text }?, attribute firstLang { "XX-XXX" | "DE-DEU" | "FR-FRA" | "EN-GBR" | "EN-USA" | "XX-IND" }?, attribute n { text }?, attribute educ { "Ed0" | "Ed1" | "Ed4" | "X" }?, attribute soc { "AB" | "C1" | "C2" | "DE" | "UU" }?, attribute sex { "m" | "f" | "u" }?, attribute role { text }?, att.uniqueId.attribute.xmlid, ( model.pLike+ | model.personPart* ) }

Attributes: In addition to global attributes and those inherited from [att.uniqueId ]

ageGroup

specifies the age group to which the participant belongs. Values are:

Ag0: Under 15 years
Ag1: 15 to 24 years
Ag2: 25 to 34 years
Ag3: 35 to 44 years
Ag4: 45 to 59 years
Ag5: Over 59 years
X: Unknown

dialect

specifies the dialect or accent of a participant's speech, as identified by the respondent. Values are:

CAN: Canadian
NONE: No accent recorded
XDE: German
XEA: East Anglian
XFR: French
XHC: Home Counties
XHM: Humberside
XIR: Irish
XIS: Indian subcontinent
XLC: Lancashire
XLO: London
XMC: Central Midlands
XMD: Merseyside
XME: North-east Midlands
XMI: Midlands
XMS: South Midlands
XMW: North-west Midlands
XNC: Central Northern England
XNE: North-east England
XNO: Northern England
XOT: Other or unidentifiable
XSD: Scottish
XSL: Lower south-west England
XSS: Central south-west England
XSU: Upper south-west England
XUR: European
XUS: American (US)
XWA: Welsh
XWE: West Indian

firstLang

specifies the country of origin of the participant, as identified by the respondent. Legal values are:

XX-XXX: Unknown
DE-DEU: German
FR-FRA: French
EN-GBR: British English
EN-USA: North American English
XX-IND: Unknown Indian language

n

internal identifier

educ

specifies the age at which the participant ceased full-time education. Legal values are:

Ed0: Still in education
Ed1: Left school aged 14 or under
Ed4: Education continued until age 19 or over
X: Unknown

soc

specifies the social class of the participant. Legal values are:

AB: Higher management: administrative or professional
C1: Lower management: supervisory or clerical
C2: Skilled manual
DE: Semi-skilled or unskilled
UU: Social class unknown

sex

specifies the sex of the participant. Legal values are:

m: male
f: female
u: unknown

role

describes the relationship or role of this participant with respect to the respondent.

att.uniqueId.attribute.xmlid

Example

<person ageGroup="Ag4" xml:id="PS1V0" role="unspecified" sex="f" soc="UU" dialect="NONE" educ="X"> <age>55</age> <persName>Nola</persName> <occupation>british rail employee</occupation> </person>

Note: May contain either a prose description organized as paragraphs, or a sequence of more specific demographic elements drawn from the model.personPart class.

Module: namesdates
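
The coded values documented for the attributes of <person> can be expanded into readable labels when a header is processed. The following Python sketch is illustrative only and is not part of the BNC distribution: the helper name describe_person and the lookup tables are assumptions of this example (the tables simply copy the value lists given for ageGroup, sex, and soc), and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace to which the predefined xml: prefix is bound (needed for xml:id).
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Lookup tables copied from the value lists documented for <person>.
AGE_GROUP = {
    "Ag0": "Under 15 years", "Ag1": "15 to 24 years", "Ag2": "25 to 34 years",
    "Ag3": "35 to 44 years", "Ag4": "45 to 59 years", "Ag5": "Over 59 years",
    "X": "Unknown",
}
SEX = {"m": "male", "f": "female", "u": "unknown"}
SOC = {
    "AB": "Higher management: administrative or professional",
    "C1": "Lower management: supervisory or clerical",
    "C2": "Skilled manual",
    "DE": "Semi-skilled or unskilled",
    "UU": "Social class unknown",
}


def describe_person(person_xml):
    """Return readable labels for one <person> element (hypothetical helper)."""
    person = ET.fromstring(person_xml)
    return {
        "id": person.get(XML_NS + "id"),
        # All of these attributes are optional, so fall back to a placeholder.
        "ageGroup": AGE_GROUP.get(person.get("ageGroup"), "not recorded"),
        "sex": SEX.get(person.get("sex"), "not recorded"),
        "soc": SOC.get(person.get("soc"), "not recorded"),
        "name": person.findtext("persName"),
    }


# Sample taken from the <particDesc> example in this reference section.
sample = (
    '<person ageGroup="Ag4" xml:id="PS1US" role="unspecified" sex="m" '
    'soc="UU" dialect="NONE" educ="X"> <age>45</age> '
    '<persName>Terry</persName> '
    '<occupation>british rail employee</occupation> </person>'
)
info = describe_person(sample)
print(info)
```

Because every attribute other than xml:id is declared optional, the sketch substitutes a placeholder rather than failing when a code is absent or not in the table.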

<placeName>

(place name) contains an absolute or relative place name.

Class: model.settingPart

Declaration

element placeName { macro.phraseSeq }

Attributes: Global attributes only

Example

<placeName>North Yorkshire: York </placeName>

Module: namesdates

<pp>

supplies page numbers for a bibliographic citation.

Class: model.biblPart

Declaration

element pp { macro.phraseSeq }

Attributes: Global attributes and those inherited from [model.biblPart ]

Example

<bibl> <title>Misfortunes of Nigel. </title> ... <pp>67-173</pp> </bibl>

Module: module-from-bncxml

<profileDesc>

(text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.

Class: model.headerPart

Declaration

element profileDesc { creation?, model.profileDescPart* }

Attributes: Global attributes only

Example

<profileDesc> <creation date="1992"/> <textClass> <catRef targets="WRI ALLTIM3 ALLAVA2 ALLTYP5 WRIAAG0 WRIAD0 WRIASE0 WRIATY2 WRIAUD3 WRIDOM5 WRILEV3 WRIMED3 WRIPP5 WRISAM0 WRISTA0 WRITAS0"/> <classCode scheme="DLEE">W hansard</classCode> <keywords> <term> Parliamentary debates </term> </keywords> </textClass> </profileDesc>

Module: header

<projectDesc>

(project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.

Class: model.encodingPart: att.declarable

Declaration

element projectDesc { para+ }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<projectDesc> <para>The British National Corpus (BNC) Consortium was formed in 1990, and started work in 1991 on the three-year task of producing a hundred-million word corpus of modern British English for use in commercial and academic research. The first edition was published in 1994.</para> ...</projectDesc>

Module: header

<pubPlace>

contains the name of the place where a bibliographic item was published.

Class: att.naming: model.imprintPart: model.publicationStmtPart

Declaration

element pubPlace { macro.phraseSeq }

Attributes: Global attributes and those inherited from [att.naming ]

Module: core

<publicationStmt>

(publication statement) groups information concerning the publication or distribution of an electronic or other text.

Declaration

element publicationStmt { model.pLike+ | model.publicationStmtPart+ }

Attributes: Global attributes only

Example

<publicationStmt> <distributor> <availability> This material is protected by international copyright laws and may not be copied or redistributed in any way. Consult the BNC Web Site at http://www.natcorp.ox.ac.uk for full licencing and distribution conditions.</availability> <idno type="bnc">HHV</idno> <idno type="old"> HansrA </idno> </distributor> </publicationStmt>

Module: header

<publisher>

provides the name of the organization responsible for the publication or distribution of a bibliographic item.

Class: model.imprintPart: model.publicationStmtPart

Declaration

element publisher { macro.phraseSeq }

Attributes: Global attributes only

Example

<imprint> <pubPlace>Oxford</pubPlace> <publisher>Clarendon Press</publisher> <date>1987</date> </imprint>

Module: core

<quote>

(quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.

Class: model.qLike: model.divPart: att.rendered

Declaration

element quote { att.rendered.attributes, att.rendered.attribute.rend, ( bibl?, model.divPart+, bibl? ) }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<quote> <s n="1426"> <w c5="NN1-NP0" hw="thrift" pos="SUBST">Thrift </w> <w c5="VHZ" hw="have" pos="VERB">has </w> <w c5="VVN" hw="go" pos="VERB">gone </w> <mw c5="PRP"> <w c5="AVP" hw="out" pos="ADV">out </w> <w c5="PRF" hw="of" pos="PREP">of </w> </mw> <w c5="NN1" hw="fashion" pos="SUBST">fashion</w> <c c5="PUN">.</c> </s> </quote>

Note: Any bibliographic source or reference provided for the quotation may be included within the quote element.

Module: core

<recording>

(recording event) details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.

Class: att.uniqueId

Declaration

element recording { att.uniqueId.attributes, attribute date { data.temporal }?, attribute n { text }?, attribute time { text }?, attribute type { text }?, attribute dur { data.count }?, att.uniqueId.attribute.xmlid, macro.phraseSeq }

Attributes: In addition to global attributes and those inherited from [att.uniqueId ]

date

date of the recording in standardized form.

n

tape number.

time

time of day the recording was made.

type

kind of recording. Values are:

dat: recording made directly to Digital Audio tape.
walkman: recording made to Walkman tape.

dur

duration of the recording in minutes.

att.uniqueId.attribute.xmlid

Example

Module: header

<recordingStmt>

(recording statement) describes a set of recordings used in transcription of a spoken text.

Class: model.sourceDescPart

Declaration

element recordingStmt { model.pLike+ | recording+ }

Attributes: Global attributes only

Module: header

<refsDecl>

(references declaration) provides documentation for the reference system applicable to the corpus.

Class: model.encodingPart: att.declarable

Declaration

element refsDecl { para+ }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<refsDecl> <para>Canonical references to the BNC should be constructed by taking the value of the n attribute of the <bncDoc> element containing the target text, and concatenating a dot separator, followed by the value of the n attribute of the target <s> element containing the material to be referenced.</para> ...</refsDecl>

Module: header
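
The rule quoted in the <refsDecl> example (the n value of the <bncDoc>, a dot, then the n value of the target <s>) can be sketched in Python as follows. This is a minimal illustration using the standard library; the helper name canonical_refs and the miniature document, including its identifier "ABC", are invented for the example and do not correspond to a real corpus text.

```python
import xml.etree.ElementTree as ET


def canonical_refs(bnc_doc_xml):
    """Return one 'document.sentence' reference per <s> (hypothetical helper)."""
    doc = ET.fromstring(bnc_doc_xml)
    doc_n = doc.get("n")
    # iter() walks the whole tree, so <s> elements at any depth are found.
    return [f"{doc_n}.{s.get('n')}" for s in doc.iter("s")]


# Invented miniature document, for illustration only.
sample = (
    '<bncDoc n="ABC"><wtext type="NEWS"><p>'
    '<s n="1"><w c5="NP0" hw="louise" pos="SUBST">Louise</w></s>'
    '<s n="2"><w c5="ITJ" hw="yes" pos="INTERJ">Yes</w></s>'
    '</p></wtext></bncDoc>'
)
refs = canonical_refs(sample)
print(refs)  # ['ABC.1', 'ABC.2']
```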

<resp>

contains a phrase describing the nature of a person's intellectual responsibility.

Declaration

element resp { macro.phraseSeq }

Attributes: Global attributes only

Example

<respStmt> <resp>compiler</resp> <name>Edward Child</name> </respStmt>

Module: core

<respStmt>

(statement of responsibility) supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.

Declaration

element respStmt { ( name | resp )+ }

Attributes: Global attributes only

Example

<respStmt> <resp>Text enrichment</resp> <name>Unit for Computer Research into the English Language, University of Lancaster</name> </respStmt>

Module: core

<revisionDesc>

(revision description) summarizes the revision history for a file.

Declaration

element revisionDesc { change+ }

Attributes: Global attributes only

Example

<revisionDesc> <change date="2006-10-21" who="#OUCS">Tag usage updated for BNC-XML</change>..</revisionDesc>

Note: Record changes with most recent changes at the top of the list.

Module: header

<s>

(s-unit) contains a sentence-like division of a text.

Class: model.segLike

Declaration

element s { model.segLike.attributes, attribute n { text }, ( model.global | model.phrase | model.divPart.spoken )+ }

Attributes: In addition to global attributes

n: sequence number

Example

Module: analysis

<samplingDecl>

(sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.

Class: model.encodingPart: att.declarable

Declaration

element samplingDecl { ( text | para )* }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<samplingDecl> <para>Definitive information on the sampling policies applied during construction of the BNC is provided in the associated documentation...</para> </samplingDecl>

Module: header

<setting>

(setting) describes one particular setting in which a language interaction takes place.

Class: att.uniqueId: att.ascribed

Declaration

element setting { att.uniqueId.attributes, att.ascribed.attributes, attribute n { text }?, att.uniqueId.attribute.xmlid, att.ascribed.attribute.who, ( date | model.settingPart )* }

Attributes: In addition to global attributes and those inherited from [att.uniqueId att.ascribed ]

n: an internal identifier for a setting
att.uniqueId.attribute.xmlid
att.ascribed.attribute.who

Example

<setting n="090910" who="PS1YR PS1YS"> <placeName>Strathclyde: Glasgow </placeName> <locale> doctor's surgery </locale> <activity> medical consultation </activity> </setting>

Note: If the who attribute is not supplied, the setting is assumed to be that of all participants in the language interaction.

Module: corpus
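
The note on <setting> implies a simple defaulting rule for the who attribute: a whitespace-separated list names the participants concerned, and an absent attribute means all of them. A minimal Python sketch of that rule follows; the helper name setting_participants and the list of all participants passed to it are assumptions of this example.

```python
import xml.etree.ElementTree as ET


def setting_participants(setting_xml, all_participants):
    """Return the participant identifiers a <setting> applies to (hypothetical helper)."""
    setting = ET.fromstring(setting_xml)
    who = setting.get("who")
    if who is None:
        # No who attribute: the setting is assumed to cover every participant.
        return list(all_participants)
    # who holds a whitespace-separated list of identifiers.
    return who.split()


# First sample copied from the <setting> example; the second omits who.
with_who = """<setting n="090910" who="PS1YR PS1YS"> <placeName>Strathclyde: Glasgow </placeName> <locale> doctor's surgery </locale> <activity> medical consultation </activity> </setting>"""
without_who = """<setting n="090910"> <placeName>Strathclyde: Glasgow </placeName> </setting>"""

everyone = ["PS1YR", "PS1YS", "PS000"]  # assumed participant list
print(setting_participants(with_who, everyone))
print(setting_participants(without_who, everyone))
```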

<settingDesc>

(setting description) describes the setting or settings within which a language interaction takes place, either as a prose description or as a series of setting elements.

Class: model.profileDescPart: att.declarable

Declaration

element settingDesc { setting+ }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<settingDesc> <setting n="104701" who="PS000 PS302 PS303 PS304 PS305 PS306 PS307 HYFPS000"> <placeName>Unknown</placeName> <activity> analysts meeting speech </activity> </setting> </settingDesc>

Module: corpus

<shift>

(Shift) marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.

Class: model.divPart.spoken

Declaration

element shift { attribute new { data.enumerated }?, empty }

Attributes: In addition to global attributes

new: specifies the new state of the paralinguistic feature specified.

Example

Module: spoken

<sourceDesc>

supplies a description of the source text(s) from which an electronic text was derived or generated.

Class: att.declarable

Declaration

element sourceDesc { bibl | recordingStmt | para+ }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<sourceDesc> <bibl> <title>The worst poverty: a history of debt and debtors. </title> <author n="BartyH1" domicile="ESussex">Barty-King, Hugh</author> <imprint n="ALANSU1"> <publisher>Alan Sutton Publishing Ltd</publisher> <pubPlace>Gloucester</pubPlace> <date value="1991">1991</date> </imprint> <pp>85-203</pp> </bibl> </sourceDesc>

Module: header

<sp>

(speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text.

Class: model.divPart: att.ascribed

Declaration

element sp { att.ascribed.attributes, att.ascribed.attribute.who, ( model.global*, ( speaker, model.global* )?, ( ( model.lLike | lg | model.pLike | model.blockLike | model.stageLike ), model.global* )+ ) }

Attributes: Global attributes and those inherited from [att.ascribed ]

att.ascribed.attribute.who

Example

<sp> <speaker> <s n="1627"> <w c5="NP0" hw="mr." pos="SUBST">Mr. </w> <w c5="NP0" hw="speaker" pos="SUBST">Speaker</w> </s> </speaker> <s n="1628"> <w c5="PNP" hw="i" pos="PRON">I </w> <w c5="VVB" hw="call" pos="VERB">call </w> <w c5="NP0" hw="mr." pos="SUBST">Mr. </w> <w c5="NP0" hw="dennis" pos="SUBST">Dennis </w> <w c5="NP0" hw="turner" pos="SUBST">Turner</w> <c c5="PUN">.</c> </s> </sp>

Module: core

<speaker>

A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.

Declaration

element speaker { ( s | gap | pb )+ }

Attributes: Global attributes only

Example

Note: In the BNC, used only for speaker labels in dramatic texts, or Hansard.

Module: core

<stage>

(stage direction) contains any kind of stage direction within a dramatic text or fragment.

Class: att.rendered: model.stageLike

Declaration

element stage { att.rendered.attributes, model.stageLike.attributes, att.rendered.attribute.rend, macro.paraContent }

Attributes: Global attributes and those inherited from [att.rendered ]

att.rendered.attribute.rend

Example

<stage> <s n="8004"> <w c5="DT0" hw="several" pos="ADJ">Several </w> <w c5="AJ0" hw="hon." pos="ADJ">Hon. </w> <w c5="NN2" hw="member" pos="SUBST">Members </w> <w c5="VVD" hw="rise" pos="VERB">rose</w> </s> </stage>

Module: core

<stext>

contains a single spoken text, i.e. a transcription or collection of transcriptions from a single source.

Declaration

element stext { attribute type { "CONVRSN" | "OTHERSP" }, ( model.divPart.spoken*, div* ) }

Attributes: In addition to global attributes

Module: module-from-bncxml

<tagUsage>

(tagUsage) supplies information about the usage of a specific element within a text.

Declaration

element tagUsage { attribute gi { data.name }, attribute occurs { data.count }?, macro.phraseSeq }

Attributes: In addition to global attributes

gi: the name (generic identifier) of the element indicated by the tag.
occurs: specifies the number of occurrences of this element within the text.

Example

Module: header
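
The gi and occurs attributes of <tagUsage> amount to a per-element frequency count for a text. As a hedged illustration of how such counts could be derived (this is not a description of how the BNC headers were actually produced), the following Python sketch counts element names in a fragment and builds one <tagUsage> element per name. The helper name tag_usage is an assumption of this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_usage(text_xml):
    """Build one <tagUsage> element per element name found (hypothetical helper)."""
    root = ET.fromstring(text_xml)
    # iter() visits the root element and all of its descendants.
    counts = Counter(el.tag for el in root.iter())
    return [
        ET.Element("tagUsage", gi=name, occurs=str(n))
        for name, n in sorted(counts.items())
    ]


# Sample sentence copied from the first example given for <p>.
sample = (
    '<s n="7234"> <w c5="VVB" hw="brave" pos="VERB">BRAVE</w> '
    '<c c5="PUN">: </c> '
    '<w c5="NP0" hw="louise" pos="SUBST">Louise</w> </s>'
)
usage = tag_usage(sample)
for item in usage:
    print(item.get("gi"), item.get("occurs"))
```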

<tagsDecl>

(tagging declaration) provides information about the XML elements actually used within a BNC text

Class: model.encodingPart

Declaration

element tagsDecl { namespace* }

Attributes: Global attributes only

Example

Module: header

<taxonomy>

(taxonomy) defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.

Class: att.uniqueId

Declaration

element taxonomy { att.uniqueId.attributes, att.uniqueId.attribute.xmlid, ( desc?, ( category+ | model.biblLike ) ) }

Attributes: Global attributes and those inherited from [att.uniqueId ]

att.uniqueId.attribute.xmlid

Example

<taxonomy xml:id="textMode"> <desc>Text mode</desc> <category xml:id="WRI"> <catDesc>Written</catDesc> </category> <category xml:id="SPO"> <catDesc>Transcribed speech</catDesc> </category> </taxonomy>

Module: header
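
A <taxonomy> is in effect a lookup table from category identifiers to descriptions, which is what makes the codes in a targets list interpretable. The Python sketch below builds that table from the example taxonomy and resolves a list of codes against it. The helper names category_labels and resolve are assumptions of this example, and codes not defined in the supplied taxonomy are reported as None rather than guessed.

```python
import xml.etree.ElementTree as ET

# Namespace to which the predefined xml: prefix is bound (needed for xml:id).
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def category_labels(taxonomy_xml):
    """Map each category's xml:id to its <catDesc> text (hypothetical helper)."""
    taxonomy = ET.fromstring(taxonomy_xml)
    return {
        cat.get(XML_NS + "id"): cat.findtext("catDesc")
        for cat in taxonomy.iter("category")
    }


def resolve(targets, labels):
    """Pair each whitespace-separated code with its label, or None if unknown."""
    return [(code, labels.get(code)) for code in targets.split()]


# Sample copied from the <taxonomy> example.
sample = (
    '<taxonomy xml:id="textMode"> <desc>Text mode</desc> '
    '<category xml:id="WRI"> <catDesc>Written</catDesc> </category> '
    '<category xml:id="SPO"> <catDesc>Transcribed speech</catDesc> </category> '
    '</taxonomy>'
)
labels = category_labels(sample)
print(labels)
print(resolve("WRI ALLTIM3", labels))
```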

<teiHeader>

(TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.

Declaration

element teiHeader { fileDesc, model.headerPart*, revisionDesc? }

Attributes: Global attributes only

Module: header

<term>

contains a word or phrase used to describe the topic or nature of a text.

Declaration

element term { macro.phraseSeq }

Attributes: Global attributes only

Example

<keywords> <term> Parliamentary debates </term> </keywords>

Note: Used to specify a single keyword or phrase

Module: core

<textClass>

(text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.

Class: model.profileDescPart: att.declarable

Declaration

element textClass { catRef, classCode*, keywords* }

Attributes: Global attributes and those inherited from [att.declarable ]

Example

<textClass> <catRef targets="ALLTIM3 ..."/> <classCode scheme="DLEE">W hansard</classCode> <keywords> <term> Parliamentary debates </term> </keywords> </textClass>

Module: header

<title>

contains the full title of a work of any kind.

Declaration

element title { attribute level { text }?, macro.phraseSeq }

Attributes: In addition to global attributes

level

indicates the bibliographic level of this title. Values are:

a: the title is an analytic title, rather than a monographic one

Example

<title>Amnesty International meeting. Sample containing about 15274 words speech recorded in public context</title>

Example

<bibl> <title>An awfully big adventure. </title> <author n="BainbB1" domicile="England">Bainbridge, B</author> <imprint n="DUCKWO1"> <publisher>Duckworth &amp; Company Ltd</publisher> <pubPlace>London</pubPlace> <date value="1990">1990</date> </imprint> <pp>49-192</pp> </bibl>

Module: core

<titleStmt>

(title statement) groups information about the title of a work and those responsible for its intellectual content.

Declaration

element titleStmt { title+, ( author | editor | respStmt )* }

Attributes: Global attributes only

Example

<titleStmt> <title> So you want to be an actor?. Sample containing about 35817 words from a book (domain: arts) </title> <respStmt> <resp> Data capture and transcription </resp> <name> Oxford University Press </name> </respStmt> </titleStmt>

Module: header

<tokenize>

supplies any additional ICU-conformant rules to be used when tokenization is performed by xaira rather than by explicit XML markup.

Declaration

element tokenize { text }

Attributes: Global attributes only

Module: module-from-bncxml

<trunc>

contains one or more truncated words in transcribed speech.

Class: model.divPart.spoken

Declaration

element trunc { ( w | mw | gap | unclear )+ }

Attributes: Global attributes only

Example

<s n="1377"> <trunc> <w c5="UNC" hw="the" pos="UNC">The </w> </trunc> <c c5="PUN">, </c> <w c5="AV0" hw="then" pos="ADV">then </w> <w c5="PNP" hw="he" pos="PRON">he </w> <trunc> <w c5="UNC" hw="bo" pos="UNC">bo </w> </trunc> <w c5="VVD" hw="bowl" pos="VERB">bowled </w> </s>

Module: module-from-bncxml

<u>

(utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.

Class: att.ascribed: model.divPart.spoken

Declaration

element u { att.ascribed.attributes, att.ascribed.attribute.who, ( text | model.gLike | model.phrase | model.divPart.spoken | model.global )* }

Attributes: Global attributes and those inherited from [att.ascribed ]

att.ascribed.attribute.who

Example

<s n="414"> <w c5="VM0" hw="shall" pos="VERB">shall </w> <w c5="PNP" hw="i" pos="PRON">I </w> <w c5="VVI" hw="get" pos="VERB">get </w> <w c5="PNP" hw="it" pos="PRON">it </w> <w c5="CJC" hw="or" pos="CONJ">or </w> <w c5="XX0" hw="not" pos="ADV">not</w> <c c5="PUN">?</c> </s> <s n="415"> <w c5="PNP" hw="i" pos="PRON">I </w> <w c5="VDB" hw="do" pos="VERB">do</w> <w c5="XX0" hw="not" pos="ADV">n't </w> <w c5="VVI" hw="know" pos="VERB">know </w> <w c5="DTQ" hw="what" pos="PRON">what </w> <w c5="TO0" hw="to" pos="PREP">to </w> <w c5="VDI" hw="do" pos="VERB">do</w> </s> <s n="416"> <w c5="ITJ" hw="yes" pos="INTERJ">Yes </w> <w c5="VVB" hw="get" pos="VERB">get </w> <w c5="PNP" hw="it" pos="PRON">it</w> </s> <s n="417"> <w c5="ITJ" hw="eh" pos="INTERJ">eh</w> <c c5="PUN">, </c> <w c5="PNP" hw="i" pos="PRON">me </w> <w c5="CJC" hw="and" pos="CONJ">and </w> <w c5="DPS" hw="you" pos="PRON">your </w> <w c5="NN1" hw="mother" pos="SUBST">mother </w> <pause/> </s>

Note: In the BNC, each change of speaker is marked by a new <u> element.

Module: spoken

<unclear>

contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.

Class: att.timed: model.pPart.edit

Declaration

element unclear { att.timed.attributes, att.timed.attribute.dur, empty }

Attributes: Global attributes and those inherited from [att.timed ]

att.timed.attribute.dur

Example

Module: core

<valItem>

(value definition) contains a single value and gloss pair for an attribute.

Class: att.identifiable

Declaration

element valItem { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, desc }

Attributes: Global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Module: tagdocs

<valList>

(value list) contains one or more valItem elements defining possible values for an attribute.

Class: att.identifiable

Declaration

element valList { att.identifiable.attributes, attribute copyOf { data.pointer }?, attribute type { "closed" | "semi" | "open" }?, att.identifiable.attribute.ident, att.identifiable.attribute.ns, valItem+ }

Attributes: In addition to global attributes and those inherited from [att.identifiable ]

copyOf

supplies the identifier of a previously-defined value list to be used at this point

type

specifies the extensibility of the list of attribute values specified. Legal values are:

closed: (only the values specified are permitted.)
semi: (all the values specified should be supported, but other values are legal and software should have appropriate fallback processing for them. )
open: (the values specified are sample values only.)

att.identifiable.attribute.ident

att.identifiable.attribute.ns

Module: tagdocs

<valSource>

specifies where the xaira indexer is to find a value.

Class: att.identifiable

Declaration

element valSource { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, attribute type { "element" | "attribute" | "pseudo" }, attribute caseFold { text }?, ( nameList?, ( defaultVal | labelGen )? ) }

Attributes: In addition to global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Module: module-from-bncxml

<vocal>

(Vocalized semi-lexical) any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.

Class: model.divPart.spoken: att.timed: att.ascribed

Declaration

element vocal { att.timed.attributes, att.ascribed.attributes, attribute desc { text }?, att.timed.attribute.dur, att.ascribed.attribute.who, empty }

Attributes: In addition to global attributes and those inherited from [att.timed att.ascribed ]

desc: provides a brief description of the vocal event
att.timed.attribute.dur
att.ascribed.attribute.who

Example

Module: spoken

<w>

(word) represents a grammatical (not necessarily orthographic) word.

Class: att.c5coded: model.segLike

Declaration

element w { att.c5coded.attributes, model.segLike.attributes, attribute pos { "ADJ" | "ADV" | "ART" | "CONJ" | "INTERJ" | "PREP" | "PRON" | "STOP" | "SUBST" | "UNC" | "VERB" }, attribute hw { text }, att.c5coded.attribute.c5, text }

Attributes: In addition to global attributes and those inherited from [att.c5coded ]

pos

supplies a simplified part-of-speech code. Legal values are:

ADJ: adjective
ADV: adverb
ART: article
CONJ: conjunction
INTERJ: interjection
PREP: preposition
PRON: pronoun
STOP: punctuation
SUBST: substantive
UNC: unclassified or non-lexical word
VERB: verb

hw

specifies the headword under which this lexical unit is conventionally grouped, where known.

att.c5coded.attribute.c5

Example

Module: analysis
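
Since each <w> carries its form as text content and its analysis in the c5, hw, and pos attributes, a sentence can be reduced to a list of tuples with a few lines of code. The following Python sketch is one way to do this with the standard library; the helper name tokens is an assumption of this example. Note that <w> elements nested inside <mw> are included, while <c> (punctuation) elements are not.

```python
import xml.etree.ElementTree as ET


def tokens(s_xml):
    """Return (form, c5, hw, pos) for every <w> in an <s> (hypothetical helper)."""
    s = ET.fromstring(s_xml)
    return [
        # Forms often carry a trailing space in the corpus, so strip it.
        ((w.text or "").strip(), w.get("c5"), w.get("hw"), w.get("pos"))
        for w in s.iter("w")  # iter() also reaches <w> nested inside <mw>
    ]


# Sample sentence copied from the <quote> example.
sample = (
    '<s n="1426"> <w c5="NN1-NP0" hw="thrift" pos="SUBST">Thrift </w> '
    '<w c5="VHZ" hw="have" pos="VERB">has </w> '
    '<w c5="VVN" hw="go" pos="VERB">gone </w> '
    '<mw c5="PRP"> <w c5="AVP" hw="out" pos="ADV">out </w> '
    '<w c5="PRF" hw="of" pos="PREP">of </w> </mw> '
    '<w c5="NN1" hw="fashion" pos="SUBST">fashion</w> '
    '<c c5="PUN">.</c> </s>'
)
toks = tokens(sample)
for tok in toks:
    print(tok)
```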

<wtext>

contains a single written text.

Declaration

element wtext { attribute type { "ACPROSE" | "FICTION" | "NEWS" | "NONAC" | "OTHERPUB" | "UNPUB" }, ( ( model.divPart | model.global )*, ( div, ( div | model.global )* )? ) }

Attributes: In addition to global attributes

Module: textstructure

<xairaItem>

provides data needed to define one part of a xaira specification.

Class: att.identifiable

Declaration

element xairaItem { att.identifiable.attributes, att.identifiable.attribute.ident, att.identifiable.attribute.ns, attribute type { "element" | "form" | "addKey" | "lemmaScheme" | "region" | "textRef" | "scopeRef" | "unitRef" | "indexPol" | "defaultLang" | "langRules" }, ( desc*, ( ( valSource, labelGen? ) | attList | nameList | elementPolicy | attributePolicy | tokenize | collate )? ) }

Attributes: In addition to global attributes and those inherited from [att.identifiable ]

att.identifiable.attribute.ident
att.identifiable.attribute.ns

Module: module-from-bncxml

<xairaList>

contains a list of xaira parameters of a particular type

Declaration

element xairaList { attribute type { "elementSpec" | "keySpec" | "regionSpec" | "lemmaSpec" | "refSpec" | "indexSpec" | "langSpec" }, xairaItem+ }

Attributes: In addition to global attributes

Module: module-from-bncxml

<xairaSpecification>

specifies additional information needed by xaira.

Class: model.encodingPart

Declaration

element xairaSpecification { xairaList+ }

Attributes: Global attributes only

Module: module-from-bncxml

<bnc>

(TEI corpus) contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text.

bnc

Declaration

element teiCorpus { teiHeader, bncDoc+ }

Attributes: Global attributes only

Example

Note: Must contain one TEI header for the corpus, and a series of <TEI> elements, one for each text. This element is mandatory when applicable.

Module: core

Macros defined

Macro data.count

defines the range of attribute values used for a non-negative integer value used as a count

Declaration

data.count = xsd:nonNegativeInteger

Note: Only non-negative integer values are permitted.

Module: tei

Macro data.enumerated

defines the range of attribute values expressed as a single word or token taken from a list of documented possibilities

Declaration

data.enumerated = token

Note: Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated element specification. If the value contains whitespace, it must be normalised: neither leading nor trailing sequences of whitespace characters, nor internal sequences of more than one whitespace character, are allowed.

Module: tei
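
The whitespace rule stated in the note on data.enumerated (no leading or trailing whitespace, and no internal runs of more than one whitespace character) can be checked mechanically. The Python sketch below is a minimal illustration; the function names are assumptions of this example, and whitespace is taken to mean the four XML whitespace characters (space, tab, carriage return, line feed).

```python
import re

# Runs of the four XML whitespace characters.
_XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalise_whitespace(value):
    """Collapse whitespace runs to one space and trim both ends."""
    return _XML_WS_RUN.sub(" ", value).strip(" ")


def is_normalised(value):
    """True if the value already satisfies the normalisation rule."""
    return value == normalise_whitespace(value)


print(normalise_whitespace("  new   state \n"))
print(is_normalised("new state"), is_normalised(" new state"))
```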

Macro data.language

defines the range of attribute values used to identify a particular combination of human language and writing system

Declaration

data.language = xsd:language

Note: The values for this attribute are language ‘tags’ as defined in RFC 3066 or its successor. Examples include

sn: Shona
zh-TW: Taiwanese
en-SL: English as spoken in Sierra Leone
pl: Polish
es-MX: Spanish as spoken in Mexico

Module: tei

Macro data.name

defines the range of attribute values expressed as an XML name or identifier

Declaration

data.name = xsd:Name

Note: Attributes using this datatype must contain a single word which follows the rules defining a legal XML name: for example they cannot include whitespace or begin with digits.

Module: tei

Macro data.namespace

(an XML namespace) defines the range of attribute values used to indicate XML namespaces as defined by the W3C Namespaces in XML technical recommendation

Declaration

data.namespace = xsd:anyURI

Note: The range of syntactically valid values is defined by RFC 2396 Uniform Resource Identifier (URI) Reference

Module: tei

Macro data.pointer

defines the range of attribute values used to provide a single pointer to any other resource, either within the current document or elsewhere

Declaration

data.pointer = xsd:anyURI

Note: The range of syntactically valid values is defined by RFC 2396 Uniform Resource Identifier (URI) Reference

Module: tei

Macro data.pointers

defines the range of attribute values used to provide a list of pointers to other resources, either within the current document or elsewhere

Declaration

data.pointers = list { data.pointer+ }

Note: A white-space delimited list of values, defined by the datatype data.pointer

Module: tei

Macro data.temporal

defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them

Declaration

data.temporal = xsd:date | xsd:gYear | xsd:gYearMonth

Note: A normalized form of temporal expression conforming to the W3C XML Schema Part 2: Datatypes Second Edition, except that times may be expressed with reduced precision (i.e., to the minute or the hour). Software intended for use with W3C XML Schema datatypes may be unable to properly process times expressed with reduced precision. If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.

Module: tei
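
The declaration of data.temporal admits three lexical forms: a full date, a year and month, or a year alone. The Python sketch below classifies a value accordingly. It is deliberately simplified and is an assumption of this example rather than a complete implementation of the W3C datatypes: it ignores time zone suffixes and negative years, and the function name classify_temporal is invented here.

```python
import re
from datetime import date

# Tried in order, most specific form first.
_PATTERNS = [
    ("date", re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")),
    ("gYearMonth", re.compile(r"([0-9]{4})-([0-9]{2})")),
    ("gYear", re.compile(r"([0-9]{4})")),
]


def classify_temporal(value):
    """Return 'date', 'gYearMonth', 'gYear', or None if the value fits none."""
    for name, pattern in _PATTERNS:
        match = pattern.fullmatch(value)
        if match is None:
            continue
        parts = [int(p) for p in match.groups()]
        try:
            # Pad a missing month or day with 1 so date() range-checks the rest.
            date(*(parts + [1, 1])[:3])
        except ValueError:
            return None
        return name
    return None


for value in ["1992", "2006-10", "2006-10-21", "2006-02-30", "21/10/2006"]:
    print(value, classify_temporal(value))
```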

Macro data.word

defines the range of attribute values expressed as a single word or token

Declaration

data.word = token { pattern = "(\p{L}|\p{N}|\p{P}|\p{S})+" }

Note: Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.

Module: tei
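
The pattern given for data.word relies on Unicode property classes (\p{L}, \p{N}, \p{P}, \p{S}), which Python's standard re module does not support. As a rough equivalent, the sketch below tests each character's Unicode general category with the standard unicodedata module. The function name is_data_word is an assumption of this example, and the check approximates the schema pattern rather than replacing a schema validator.

```python
import unicodedata


def is_data_word(value):
    """True if value is non-empty and made only of letters, numbers,
    punctuation, or symbols (general categories L*, N*, P*, S*)."""
    return bool(value) and all(
        unicodedata.category(ch)[0] in "LNPS" for ch in value
    )


for value in ["NN1-NP0", "£450", "two words", ""]:
    print(repr(value), is_data_word(value))
```

Whitespace falls in the Z (separator) or C (control) categories, so any value containing a space, tab, or line break is rejected, which matches the note that such a value cannot include whitespace.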