6 Miscellaneous code tables

This section consists of a series of tables listing the codes used to encode various aspects of the corpus.

The following code tables are provided:

6.1 lists all SGML elements used in the Sampler corpus, with a brief description and frequency count for each.
6.2 lists all SGML entities used in the corpus, with a brief description and frequency count for each.
6.3 lists all values actually used in the corpus for the type attribute on division elements (<div1>, <div2>, etc.).
6.4 lists all values used in the corpus for the r (rendition) attribute, chiefly on <hi> elements, to indicate the typographic rendering of the source.
6.5 lists all values used in the corpus for the new attribute on the <shift> element, to indicate changes in voice quality in spoken texts.
6.6 lists the codes used to identify the regional origins of participants, as specified in the <person> element in the header.
6.7 lists the codes used to identify relationships documented between participants, as specified in the <relation> element in the header.
6.8 lists the descriptive codes used for paralinguistic or non-linguistic phenomena noted by the transcribers of spoken texts and encoded as either <vocal> or <event> elements.
6.9 lists the classification codes used to categorize each text, as specified by the <catRef> element in the individual text header and defined in the <clasDecl> element in the corpus header.

A general discussion of the principles and practice underlying the CLAWS word class annotation scheme used in the BNC is given in the document A brief users' guide to the grammatical tagging of the British National Corpus. This document also includes a full list of the CLAWS7 word class codes applied to the BNC Sampler.

6.1 Elements defined by the BNC DTD

The following list gives a brief description of each SGML element used in the BNC Sampler, followed by the number of times the element occurs. Elements are listed in alphabetical order. Descriptions prefixed by "(H)" are for elements which appear only in the text headers.

<activity> (H) participants' activity during recording 297
<address> (H) postal or other address 185
<align> alignment map for synchronizing overlap points 245
<author> (H) author in bibliographic entry 51
<avail> (H) availability code for file 185
<bibNote> (H) note within a bibliographic entry 1
<bibl> loosely structured bibliographic reference 23
<biblScop> (H) page range within bibliographic entry 39
<biblStruct> (H) structured bibliographic entry 87
<bncDoc> an individual text in the BNC Sampler 184
<c> a punctuation mark 285801
<caption> a floating heading or caption 1592
<catDesc> (H) description of a category 127
<catRef> (H) category codes applicable to a text 184
<category> (H) a category-value pair 127
<change> (H) change note 963
<clasDecl> (H) description of classification scheme 1
<corr> (H) description of correction policy 3
<creation> (H) information about creation of a text 185
<date> a date 1218
<div> any subdivision of a spoken text 238
<div1> first-level subdivision of a written text 1218
<div2> second-level subdivision of a written text 1165
<div3> third-level subdivision of a written text 512
<div4> fourth-level subdivision of a written text 137
<editDecl> (H) descriptions of editorial policies 16
<ednStmt> (H) information about a particular edition 185
<encDesc> (H) encoding description 185
<event> non-verbal event within a spoken text 477
<extent> (H) size of a corpus text 185
<fileDesc> (H) documentation of an electronic text 185
<gap> a spot where part of source text has been omitted 4831
<head> any form of heading or title 3020
<header> meta-information describing a corpus text 185
<hi> typographically highlighted phrase 1738
<hyph> (H) description of hyphenation policy 2
<idno> (H) identifying number for a text 185
<imprint> (H) imprint within a bibliographic entry 71
<item> item within a list 1041
<keywords> (H) descriptive keywords for topics of a text 184
<l> line of verse 3618
<label> label of a list item 291
<langUsg> (H) description of languages used in a text 1
<list> list of items 185
<loc> synchronisation point within an alignment map 28399
<locName> (H) name of place where speech recorded 285
<locale> (H) description of a place where speech recorded 264
<monogr> (H) monographic bibliographic entry 87
<name> proper name of person, place etc. 1417
<note> note or comment of any kind 119
<p> paragraph in written text 28
<partics> (H) description of spoken text participants 99
<pause> noticeable pause in spoken text 26091
<pb> page break in written text 2633
<person> (H) information about a speaker 537
<poem> group of verse lines in a written text 161
<profDesc> (H) additional information about a text 185
<projDesc> (H) background information about BNC project 185
<ptr> link to a displaced element or to a synchronisation point 58524
<pubPlace> (H) place of publication within bibliographic entry 69
<pubStmt> (H) publication or distribution information 185
<quot> (H) description of quotation policy 2
<quote> quotation from some other work 59
<rec> (H) recording details 297
<recStmt> (H) information about an audio recording 98
<refsDecl> (H) description of reference system used 185
<reg> description of regularisation policy 261
<relation> (H) relationship between participants in a spoken text 396
<resp> (H) nature of responsibility 1346
<respStmt> (H) statement of responsibility in a bibliographic entry 1343
<revDesc> (H) revision description 185
<s> sentence-like linguistic segment 172408
<sampDecl> (H) description of sampling policy 5
<segm> (H) description of segmentation policy 2
<settDesc> (H) description of setting in which speech occurs 98
<setting> (H) an individual setting in which speech occurs 297
<shift> change in voice quality 4230
<sic> apparently erroneous transcription 516
<srcDesc> (H) description of the source for a written text 185
<stext> an individual spoken text 98
<tagUsage> (H) count for a particular tag in a text 2600
<tagsDecl> (H) list of tags used in a particular text 185
<term> (H) individual term in a list of keywords 262
<text> an individual written text 86
<titStmt> (H) title statement for a text 185
<title> (H) title within a bibliographic entry 272
<trans> (H) declaration of transcription policy 7
<trunc> truncated form in a spoken text 5566
<txtClass> (H) text classification 184
<u> utterance in a spoken text 79811
<unclear> inaudible or incomprehensible passage in a spoken text 19936
<vocal> non-verbal vocalization in a spoken text 4521
<w> word or other non-punctuation token carrying a POS code 1993554
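The frequency counts above can, in principle, be approximated by tallying start-tags. The following is a minimal sketch under stated assumptions, not the procedure used to produce the table: it assumes a text is available as an SGML string and counts every `<name` start-tag with a regular expression, ignoring end-tags, comments and marked sections. The sample markup is hypothetical.

```python
import re
from collections import Counter

# Matches an SGML start-tag such as <w NN1>, <s n=1> or <bncDoc>.
# End-tags (</s>) and comments (<!--) are skipped because the character
# after '<' must be a letter.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")

def count_elements(sgml_text: str) -> Counter:
    """Count start-tags by element name (a rough sketch, not a full SGML parser).

    SGML element names are case-insensitive by default; this sketch keeps
    names exactly as written, so a real count may need to fold case.
    """
    return Counter(match.group(1) for match in START_TAG.finditer(sgml_text))

# Hypothetical fragment in BNC-like markup.
sample = "<s n=1><w NN1>Corpus<w NN2>tables<c PUN>.</s>"
counts = count_elements(sample)
print(counts["w"], counts["c"], counts["s"])  # 2 1 1
```

Header elements marked (H) are counted as well if the header is part of the input string.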

6.2 Character entities defined by the BNC DTD

The following table gives a brief description of each character entity used within the text of the BNC Sampler. Declarations for these entities may be found either in standard entity sets or in the entity definitions supplied as part of the BNC document type definition, in the file sampents.dtd. In either case, system-specific values should be supplied for the characters described below. The number following each description gives the number of times the entity reference appears in the current version of the corpus.

aacute small a, acute accent 5
acirc small a, circumflex accent 2
agrave small a, grave accent 3
amp ampersand 262
ast asterisk 3
auml small a, dieresis or umlaut mark 10
bquo normalised begin quote mark 8049
bsol reverse solidus 1
ccaron small c, caron 1
ccedil small c, cedilla 5
deg degree sign 26
dollar dollar sign 231
Eacute capital E, acute accent 3
eacute small e, acute accent 69
egrave small e, grave accent 4
equo normalised end quote mark 8323
euml small e, dieresis or umlaut mark 1
formula mathematical formula 740
frac12 fraction one-half 163
frac14 fraction one-quarter 27
frac15 fraction one-fifth 2
frac18 fraction one-eighth 4
frac34 fraction three-quarters 27
frac38 fraction three-eighths 13
frac58 fraction five-eighths 9
frac78 fraction seven-eighths 1
ft feet indicator 13
gt greater-than sign 645
hellip ellipsis (horizontal) 919
ins inches indicator 33
iuml small i, dieresis or umlaut mark 2
lcub left curly bracket 93
lsqb left square bracket 301
lt less-than sign 640
mdash em dash 3500
ndash en dash 784
ntilde small n, tilde 2
oacute small o, acute accent 4
ocirc small o, circumflex accent 5
oelig small oe ligature 2
oslash small o, slash 110
ouml small o, dieresis or umlaut mark 19
pound pound sign 1097
quot quotation mark 2280
rcub right curly bracket 95
rehy maps to soft hyphen 16
rsqb right square bracket 301
times multiply sign 39
uacute small u, acute accent 1
uuml small u, dieresis or umlaut mark 8
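Since system-specific values must be supplied for these entities, converting a text to Unicode needs an explicit mapping. The following Python sketch covers only a subset of the table. Standard ISO entities map to their usual characters; the code points chosen for the BNC-specific entities bquo, equo, ft, ins and formula are assumptions rather than values taken from sampents.dtd, while rehy follows the table's description (soft hyphen).

```python
import re

# Partial mapping from BNC Sampler entity names to Unicode strings.
# The choices for bquo, equo, ft, ins and formula are assumptions.
ENTITY_MAP = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"',
    "eacute": "\u00e9", "Eacute": "\u00c9", "pound": "\u00a3",
    "dollar": "$", "mdash": "\u2014", "ndash": "\u2013",
    "hellip": "\u2026", "frac12": "\u00bd", "times": "\u00d7",
    "deg": "\u00b0", "oslash": "\u00f8",
    "bquo": "\u201c",        # assumed: normalised begin quote -> left double quotation mark
    "equo": "\u201d",        # assumed: normalised end quote -> right double quotation mark
    "ft": "\u2032",          # assumed: feet indicator -> prime
    "ins": "\u2033",         # assumed: inches indicator -> double prime
    "rehy": "\u00ad",        # soft hyphen, as described in the table
    "formula": "[formula]",  # assumed placeholder for a mathematical formula
}

# An entity reference: '&', a name, and an optional terminating ';'.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")

def expand_entities(text: str) -> str:
    """Replace known entity references; leave unknown ones untouched."""
    def replace(match: re.Match) -> str:
        return ENTITY_MAP.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(replace, text)

print(expand_entities("&bquo;Caf&eacute;&equo; costs &pound;5&frac12;"))
# “Café” costs £5½
```

Leaving unknown references untouched makes gaps in the mapping easy to spot in the output.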

6.3 Division types

The type attribute on each <div1>, <div2> (etc.) element of a written text may be used to supply a value characterizing the function of the corresponding subdivision. Only the following values are used in the BNC Sampler:

chapter
front
part

6.4 Rendition codes

The following codes are used in the BNC Sampler to indicate the typographic rendition of an element whose content is typographically distinct in the source. They are mostly used as values for the r attribute of the <hi> element, but may appear on any element bearing this attribute. The number following each code gives its frequency in the Sampler.

bo bold face 220
hi highlighted 7
it italic face 1531
ro roman face 2
ul underlined 31
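When converting texts for display, these codes can be turned into markup through a simple lookup. The sketch below is hypothetical: the HTML elements chosen (for example, <mark> for hi and no wrapper for ro) are assumptions and not part of the BNC encoding.

```python
import html

# Hypothetical mapping from BNC Sampler r (rendition) codes to HTML tags.
# None means "render as plain (roman) text".
RENDITION_TO_HTML = {
    "bo": "b",     # bold face
    "hi": "mark",  # highlighted (assumed rendering)
    "it": "i",     # italic face
    "ro": None,    # roman face: no extra markup
    "ul": "u",     # underlined
}

def render_hi(text: str, r: str) -> str:
    """Wrap HTML-escaped text in the tag for rendition code r.

    Unknown codes raise KeyError so that unexpected values are noticed.
    """
    escaped = html.escape(text)
    tag = RENDITION_TO_HTML[r]
    return escaped if tag is None else f"<{tag}>{escaped}</{tag}>"

print(render_hi("Sampler", "it"))  # <i>Sampler</i>
```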

6.5 Voice quality codes

Changes in voice quality in spoken texts are indicated by values of the new attribute on a <shift> element placed at the point where the speaker's voice changes. The following values are used in the BNC Sampler (frequencies are given in parentheses):

crying (18) laughing (1151) mimicking (46) mimicking refined accent (1) reading (327) screaming (12) shouting (165) sighing (31) singing (157) spelling (24) whingeing (3) whining (4) whispering (139) yawning (45)
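The value lists in this section and in section 6.8 use a run-in "value (count)" format. A minimal Python sketch for turning such a list into a frequency table follows; it assumes that no value itself contains parentheses.

```python
import re
from collections import Counter

# One "value (count)" item: a run of non-parenthesis characters followed by a
# parenthesised integer. Assumes values never contain parentheses themselves.
ITEM = re.compile(r"([^()]+?)\s*\((\d+)\)")

def parse_run_in_list(text: str) -> Counter:
    """Parse 'crying (18) laughing (1151) ...' into a Counter of value -> count."""
    counts = Counter()
    for value, count in ITEM.findall(text):
        counts[value.strip()] += int(count)
    return counts

voice = parse_run_in_list("crying (18) laughing (1151) mimicking refined accent (1)")
print(voice.most_common(1))  # [('laughing', 1151)]
print(sum(voice.values()))   # 1170
```

Multi-word values such as "mimicking refined accent" survive intact because a value ends only at the parenthesised count.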

6.6 Regional codes

A single set of codes, derived from the International Standard for language and country identification, is used to identify the regional origins, first language, and dialects of participants, as specified in the <person> element in the text header. Where such information was recorded for a speaker, one or more of the following codes appear as values of the who.flang or who.dialect attributes. All available codes are listed here; note that not all of them are actually used in the BNC Sampler:

CAN Canada
CHN China
DEU Germany
FRA France
GBR United Kingdom
IND India
IRL Ireland
USA United States
XXX Unknown
ZZG Europe
XDE accent: German
XEA accent: East Anglia
XFR accent: French
XHC accent: Home Counties
XHM accent: Humberside
XIR accent: Irish
XIS accent: Indian subcontinent
XLC accent: Lancashire
XLO accent: London
XMC accent: central Midlands
XMD accent: Merseyside
XME accent: north-east Midlands
XMI accent: Midlands
XMS accent: south Midlands
XMW accent: north-west Midlands
XNC accent: central northern England
XNE accent: north-east England
XNO accent: northern England
XOT accent: unidentifiable
XSD accent: Scottish
XSL accent: lower south-west England
XSS accent: central south-west England
XSU accent: upper south-west England
XUR accent: European
XUS accent: U.S.A.
XWA accent: Welsh
XWE accent: West Indian
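In the table above, codes described as "accent: ..." record accents, while the remaining codes name countries or regions (XXX marks an unknown value). A sketch of expanding an attribute value into descriptions, covering only a subset of the table and assuming that multiple codes in one value are separated by spaces (the separator is an assumption):

```python
# Subset of the regional code table; extend with the remaining rows as needed.
REGION_CODES = {
    "GBR": "United Kingdom", "IRL": "Ireland", "USA": "United States",
    "XXX": "Unknown", "ZZG": "Europe",
    "XLO": "accent: London", "XSD": "accent: Scottish", "XWA": "accent: Welsh",
}

def describe_codes(attribute_value: str) -> list[str]:
    """Expand a list of regional codes (space separator assumed)."""
    return [REGION_CODES.get(code, f"unlisted code {code}")
            for code in attribute_value.split()]

def is_accent_code(code: str) -> bool:
    """True for codes whose description starts with 'accent:'."""
    return REGION_CODES.get(code, "").startswith("accent:")

print(describe_codes("GBR XLO"))  # ['United Kingdom', 'accent: London']
```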

6.7 Relationship codes

Where relationships between individual participants in spoken texts can be identified, they are specified by means of the <relation> element within the text header (as discussed in section 5.3.3). The desc attribute of this element may take any of the values listed below. The number in parentheses indicates the number of times this value appears in the BNC Sampler.

acquaint acquaintance (2)
audience (1)
aunt (2)
b-i-l brother-in-law (6)
brother (16)
chairman (1)
colleagu colleague (51)
cous-i-l cousin-in-law (1)
cousin (1)
customer (1)
d-i-l daughter-in-law (4)
daughter (24)
f-i-l father-in-law (6)
father (27)
friend (36)
g-daught granddaughter (5)
g-fath grandfather (5)
g-moth grandmother (6)
g-son grandson (5)
husband (40)
intervee interviewee (1)
m-i-l mother-in-law (8)
mother (37)
neighbou neighbour (4)
nephew (2)
niece (1)
s-daught step-daughter (1)
s-father step-father (1)
server (1)
sis-i-l sister-in-law (8)
sister (15)
son (28)
speaker (1)
stranger (1)
student (2)
teacher (1)
tutor (1)
uncle (1)
wife (41)
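Several desc values above are abbreviated, and none is longer than eight characters. A sketch of expanding them for display, based on the expansions given in the table; values not in the mapping are already full words and are returned unchanged.

```python
# Abbreviated <relation> desc values and their expansions, based on the table above.
RELATION_EXPANSIONS = {
    "acquaint": "acquaintance", "b-i-l": "brother-in-law", "colleagu": "colleague",
    "cous-i-l": "cousin-in-law", "d-i-l": "daughter-in-law", "f-i-l": "father-in-law",
    "g-daught": "granddaughter", "g-fath": "grandfather", "g-moth": "grandmother",
    "g-son": "grandson", "intervee": "interviewee", "m-i-l": "mother-in-law",
    "neighbou": "neighbour", "s-daught": "step-daughter", "s-father": "step-father",
    "sis-i-l": "sister-in-law",
}

def expand_relation(desc: str) -> str:
    """Return the full form of a relation desc value; unabbreviated values pass through."""
    return RELATION_EXPANSIONS.get(desc, desc)

print(expand_relation("colleagu"), expand_relation("wife"))  # colleague wife
```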

6.8 Paralinguistic phenomena

In addition to the <pause>, <shift>, <trunc>, and <unclear> elements, a variety of paralinguistic phenomena are marked up in the transcriptions which form the spoken part of the BNC Sampler. The <vocal> element is used to mark a variety of non-linguistic or semi-linguistic sounds made by one or more speakers in the transcriptions; the <event> element is used to mark up other occurrences which seemed of importance to the transcribers when making sense of the spoken interaction. We list here the various annotations used for the latter two categories throughout the BNC Sampler.

6.8.1 Vocals

The following lists in alphabetic order all the different values specified on the TYPE attribute for the <vocal> element in the BNC Sampler, together with the frequency of occurrence for each different value.

baby talk (18) belch (18) buzzing sound (1) clapping (82) clears throat (139) clicks tongue (1) cough (451) crying (30) gasp (2) giggle (2) gurgle (7) hiccup (4) howl (1) humming (18) imitates aeroplane (1) imitates banjo (1) imitates bringing up phlegm (4) imitates cat licking (1) imitates clearing throat (1) imitates vomiting (3) imitating engine revving (1) kiss (5) kissing (1) kissing sound (2) laugh (3438) laughing (1) licking sound (1) mimicking cat spitting (1) mimicking gorilla noises (2) mimicking microphone noises (1) mimicking shaving noise (1) noise for paws (1) panting (1) purring noises (1) raspberry (7) scream (14) sigh (78) singing (7) sneeze (27) sniff (35) sound effect (8) sound effects (1) sound of biting (1) spitting sound (1) squeak (1) sucking noises (2) sucking then purring noises (1) tt (1) tut (27) whine (1) whistling (44) yawn (22) yelping sound (1)

6.8.2 Event descriptions

The following lists in alphabetic order all the different values specified on the DESC attribute for the <event> element in the BNC Sampler, together with the frequency of occurrence for each different value.

Background to following (1) Band (1) Band music (1) Break in enquiry (1) Break in recording (1) Children screaming (1) Dramatic music (1) Dramatic music. (1) End of first tape (2) End of recording (3) End of side (3) Engine noises (3) General chatter (1) Gives her a kiss (1) Gunfire, celebration. (1) Horse racing on the radio (1) Intense gunfire (1) Loud rustling next to microphone (1) Mr Bean record on telly I think (1) Music (3) Music and singing (1) Music and song (1) Pen on paper (3) Plane noises (1) Portuguese speech (31) Reading title of book (1) Rock music (2) Sound of burning and dramatic music (1) Sounds of intense automatic weapons fire. (1) Spanish (1) Tape ends (1) Telephone being dialled, overlaps next part of speech. (1) Theme music leading to triumphant climax (1) Theme music, reprise, to end of job (1) Theme music. Engine noise (1) advert (1) advertisements (7) advertisements and travel news (2) another gap in tape (1) applause (9) baby crying (5) baby screaming (2) baby squealing (1) baby talking (8) background to following (1) banging (2) banging noises (1) barking (2) bell ringing (1) bell rings (2) bibs hooter (1) birds singing/whistling (1) birthdays etcetera (1) blowing kisses (2) blowing nose (1) blows nose (2) boxing on television (1) boys fighting (1) break in recording (6) break in recording while watching video (1) break in tape (5) calling from outside (1) car hooter (1) cat miaows (1) chairs being moved (1) cheering (1) cheering and shouting (1) clapping (24) classroom chatter (11) classroom chatter - barely audible speech (1) clicking computer (4) clicking of computer (1) clock chiming (4) closing music (2) cough (4) dog barking (14) dog barks (4) dog sick noise again (1) dogs barking (2) door opening (1) doorbell (1) doorbell ringing (1) doorbell rings (1) dramatic music (1) drilling noise (1) drums (1) duck noise (1) eating (1) eating dinner (1) end of first side of tape and start of second (1) end of job (1) end of recording (5) end 
of side (1) end of side of tape (1) end of side one of first tape (1) end of side one of tape (1) end of side one of tape. second side starts part way into tape. (1) end of tape (1) end of tape side two (1) engine noises (1) everyone claps (1) football on television (1) football on television again - changed channels (1) general background chatter continues, but foreground conversation has paused (1) general hubbub as people move forward (1) getting into car (1) going into kitchen (1) hammering something (1) happy children (1) hums tune (1) in another room (1) in background throughout following text (1) in canteen/dining room - very noisy (1) in other room (1) intake of breath (1) interruption for radio commercials (2) introduction music (1) jet passes overhead (1) key sounds (1) knock on door (1) knocking on door (3) laugh (1) laughter (5) loud music (2) loud music playing (1) loud music playing on television (1) loud television (1) machinery noises (2) makes dog whining noise (1) makes growling noise (1) makes noise as if being sick (1) making sound of a plane (1) market place noises (1) microphone hissing and conversation very quiet (1) mimicking (1) mimicking crying (1) mimicking dog barking (1) mumble (1) mumbles (1) murmuring from the floor (1) music (5) music of Waltzing Matilda playing (5) music on in background (1) music on loud (1) music on loud again (1) music playing (1) music playing on television (1) music playing with interruption for radio commercials (1) music very loud (1) news bulletin (2) noise (2) noise like the dog being sick (1) noise of dog biscuits being tipped out (1) now moved to another class? (1) paper crinkling noises (1) phone bleeping (2) phone ringing (1) phone rings (3) piano sound (1) plate stacking (1) playing (1) playing piano (1) playing with baby (1) printer noises (1) printer sounds (2) puking noise. 
(1) radio on (1) reading computer screen (1) reading from board (1) reading from book/text (1) reading from classroom board (1) reading from hospital form (2) reading from newspaper cutting (1) reading from package (1) reading from receipt (1) reading title of book (1) reads horoscopes (1) recording ends (1) repetitive banging (1) scraping something (1) screaming in background (1) shooting sound on video game (1) shuts door (1) shutting door (1) singing (4) singing along to record (1) singing in background (3) singing to music (2) smacking lips together (1) snooker on the telly (1) sombre music (1) something falls (1) something smashes (1) sounds of burning. (1) sounds of sporadic fire (1) sounds of violence and gunfire (1) speaking French (1) speaking Spanish (2) spells surname (1) spitting noise (1) sports news (1) students' voices in background (6) sucks teeth (3) talking in kitchen away from microphone (1) talking in other room (1) talking to dog (1) talking to the cat (1) talking with mouth full (3) tape ends (6) tape ends side one and starts side two (1) tape playing (1) tape recording (1) tape stops and starts (2) taps table (1) telephone conversation ends (14) telephone conversation starts (13) telephone ringing (5) telephone rings (1) television comes on (1) television loud (1) television on (5) television on - horse racing (1) television turned over to football commentary (1) television very loud (2) telly on loud (1) theme music to end of recording (1) throws dice (1) tickling little girl (2) too much banging (1) traffic (1) traffic news and advertisements (1) travel news (2) travel news and news bulletin (1) travel news and weather (1) unable to hear conversation for some time (1) very loud television - football (1) video film for 135 seconds (1) video playing (1) walking along corridor - just a lot of chatter (1) watching football on television (1) watching football on television and talking quietly in background (2) weather and travel news (1) 
whispering to dog (1) with microphone (3) woof (1)
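Because the event descriptions are free text, the same phenomenon can appear under several forms, for example "Dramatic music", "Dramatic music." and "dramatic music". When counting events it may help to merge such variants. A minimal sketch, assuming that case and a trailing full stop are not significant (a normalisation choice made here, not part of the BNC encoding):

```python
from collections import Counter

def normalise_description(desc: str) -> str:
    """Lower-case, trim whitespace and drop one trailing full stop (assumed insignificant)."""
    desc = desc.strip().lower()
    return desc[:-1].rstrip() if desc.endswith(".") else desc

def merge_variants(counts: dict[str, int]) -> Counter:
    """Sum counts for descriptions that normalise to the same string."""
    merged = Counter()
    for desc, n in counts.items():
        merged[normalise_description(desc)] += n
    return merged

events = {"Dramatic music": 1, "Dramatic music.": 1, "dramatic music": 1,
          "Music": 3, "music": 5}
merged = merge_variants(events)
print(merged["dramatic music"], merged["music"])  # 3 8
```

Merging is lossy; keep the original descriptions if the distinctions made by individual transcribers matter.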

6.9 Text classification codes

This section lists all of the classification codes that may appear within the <catRef> element in the header of each text. Not all of the values defined here are actually used within the BNC Sampler. For information on how many texts, words, and sentences are included within a given classification, see the additional statistical tables.

The following table shows codes which can be used to classify any kind of text according to its availability or its type. (For the BNC Sampler, all texts are freely available worldwide.)

allAva Text availability
allAva1 free, world: Freely available worldwide
allAva2 restricted, world: Available worldwide
allAva3 restricted, Not-NA: Not available in North America
allAva4 restricted, Not-US: Not available in U.S.A.
allAva5 restricted, EU: Not available outside the European Union
allAva6 restricted, Not-USP: Not available in U.S.A. & Philippines
allAva7 restricted, Not-NAP: Not available in North America & Philippines
allTyp Text type
allTyp1 Spoken demographic
allTyp2 Spoken context-governed
allTyp3 Written books and periodicals
allTyp4 Written-to-be-spoken
allTyp5 Written miscellaneous

The following table lists the classification codes which can be specified for spoken texts (either demographic or context-governed) only. Note that the classifications for demographically sampled texts apply to the respondent only, not necessarily to all speakers transcribed.

scgDom Domain for context-governed material
scgDom1 Educational
scgDom2 Business
scgDom3 Institutional
scgDom4 Leisure
sdeAge Age band for demographic respondent
sdeAge1 0-14
sdeAge2 15-24
sdeAge3 25-34
sdeAge4 35-44
sdeAge5 45-59
sdeAge6 60+
sdeCla Social class for demographic respondent
sdeCla1 AB
sdeCla2 C1
sdeCla3 C2
sdeCla4 DE
sdeSex Sex of demographic respondent
sdeSex1 Male
sdeSex2 Female
spoLog Interaction type
spoLog1 Monologue
spoLog2 Dialogue
spoReg Region where text captured
spoReg1 South
spoReg2 Midlands
spoReg3 North

The following table lists all classification codes which may be specified for any written text.

wbpSel Books & periodicals: selection method
wbpSel1 Selective
wbpSel2 Random
wmiPub Miscellaneous materials: publication status
wmiPub1 Published
wmiPub2 Unpublished
wriAAg Author age band
wriAAg1 0-14
wriAAg2 15-24
wriAAg3 25-34
wriAAg4 35-44
wriAAg5 45-59
wriAAg6 60+
wriADo Author domicile
wriAD036 Australia
wriAD124 Canada
wriAD250 France
wriAD276 Germany
wriAD372 Ireland
wriAD380 Italy
wriAD422 Lebanon
wriAD492 Monaco
wriAD554 New Zealand
wriAD620 Portugal
wriAD702 Singapore
wriAD756 Switzerland
wriAD826 United Kingdom
wriAD840 United States
wriAD920 UK North (north of Mersey-Humber line)
wriAD921 UK Midlands (north of Bristol Channel-Wash line)
wriAD922 UK South (south of Bristol Channel-Wash line)
wriASe Sex of author
wriASe1 Male
wriASe2 Female
wriASe3 Mixed
wriASe4 Unknown
wriATy Type of author
wriATy1 Corporate
wriATy2 Multiple
wriATy3 Sole
wriATy4 Unknown
wriAud Intended age of audience
wriAud1 Child
wriAud2 Teenager
wriAud3 Adult
wriAud4 Any
wriDom Domain
wriDom1 Imaginative
wriDom2 Informative: natural & pure science
wriDom3 Informative: applied science
wriDom4 Informative: social science
wriDom5 Informative: world affairs
wriDom6 Informative: commerce & finance
wriDom7 Informative: arts
wriDom8 Informative: belief & thought
wriDom9 Informative: leisure
wriLev Circulation level
wriLev1 Low
wriLev2 Medium
wriLev3 High
wriMed Medium
wriMed1 Book
wriMed2 Periodical
wriMed3 Miscellaneous published
wriMed4 Miscellaneous unpublished
wriMed5 To-be-spoken
wriPPl Place of publication
wriPP372 Ireland
wriPP826 United Kingdom
wriPP840 United States
wriPP920 UK North (north of Mersey-Humber line)
wriPP921 UK Midlands (north of Bristol Channel-Wash line)
wriPP922 UK South (south of Bristol Channel-Wash line)
wriSam Type of sample
wriSam1 Whole text
wriSam2 Beginning sample
wriSam3 Middle sample
wriSam4 End sample
wriSam5 Composite
wriSta Reception status
wriSta1 Low
wriSta2 Medium
wriSta3 High
wriTas Target audience sex
wriTas1 Male
wriTas2 Female
wriTas3 Mixed
wriTas4 Unknown
wriTim Time period
wriTim1 1960-1974
wriTim2 1975-1993

For each classification listed above, the absence of information may be indicated either by the absence of any code, or by the presence of a code ending in 0 instead of one of the values listed. For example, written texts for which the type of author is unknown may be indicated either by the absence of any value beginning wriATy or by the presence of the specific value wriATy0.
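Each code combines a category prefix with a numeric value, and a value of zero marks missing information. A sketch of splitting codes into category and value, assuming the codes are supplied as a whitespace-separated string with one value per category (the exact attribute syntax of <catRef> is not shown in this section). The domicile and place-of-publication values use a shortened prefix plus a three-digit country number (wriAD826, wriPP826) rather than the category names wriADo and wriPPl, so the sketch maps those prefixes back explicitly; treating an all-zero number as "absent" for them is an assumption.

```python
import re
from typing import Optional

# A classification code: a letter prefix followed by digits, e.g. wriATy3, wriAD826.
CODE = re.compile(r"^([A-Za-z]+)(\d+)$")

# Value codes for these two categories use a shortened prefix (see the table).
PREFIX_TO_CATEGORY = {"wriAD": "wriADo", "wriPP": "wriPPl"}

def parse_class_codes(codes: str) -> dict[str, Optional[str]]:
    """Map each category to its value digits; None means 'information absent'."""
    result: dict[str, Optional[str]] = {}
    for code in codes.split():
        match = CODE.match(code)
        if match is None:
            raise ValueError(f"not a classification code: {code!r}")
        prefix, digits = match.groups()
        category = PREFIX_TO_CATEGORY.get(prefix, prefix)
        # Digits are kept as a string so that country numbers like 036 keep their zero.
        result[category] = None if int(digits) == 0 else digits
    return result

print(parse_class_codes("allTyp3 wriATy0 wriAD826"))
# {'allTyp': '3', 'wriATy': None, 'wriADo': '826'}
```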
