6 Miscellaneous code tables

This section consists of a series of tables listing a number of codes used in encoding various aspects of the corpus.

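The following code tables are provided:

Elements defined by the BNC DTD (section 6.1)
Character entities defined by the BNC DTD (section 6.2)
Division types (section 6.3)
Rendition codes (section 6.4)
Voice quality codes (section 6.5)
Regional codes (section 6.6)
Relationship codes (section 6.7)
Paralinguistic phenomena (section 6.8)
Text classification codes (section 6.9)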

A general discussion of the principles and practice underlying the CLAWS word class annotation scheme used in the BNC is provided by the document A brief users' guide to the grammatical tagging of the British National Corpus. This also includes a full list of the CLAWS7 word class codes applied to the BNC Sampler.

6.1 Elements defined by the BNC DTD

The following list gives a brief description of each SGML element used in the BNC Sampler. Elements are listed in alphabetical order. Descriptions prefixed by "(H)" are for elements which appear only in the text headers. The number at the end of each entry is the number of times the element occurs in the corpus.

<activity>  (H) participants' activity during recording  297
<address>   (H) postal or other address  185
<align>     alignment map for synchronizing overlap points  245
<author>    (H) author in bibliographic entry  51
<avail>     (H) availability code for file  185
<bibNote>   (H) note within a bibliographic entry  1
<bibl>      loosely structured bibliographic reference  23
<biblScop>  (H) page range within bibliographic entry  39
<biblStr>   (H) structured bibliographic entry  87
<bncDoc>    an individual text in the BNC Sampler  184
<c>         a punctuation mark  285801
<caption>   a floating heading or caption  1592
<catDesc>   (H) description of a category  127
<catRef>    (H) category codes applicable to a text  184
<category>  (H) a category-value pair  127
<change>    (H) change note  963
<clasDecl>  (H) description of classification scheme  1
<corr>      (H) description of correction policy  3
<creation>  (H) information about creation of a text  185
<date>      a date  1218
<div>       any subdivision of a spoken text  238
<div1>      first-level subdivision of a written text  1218
<div2>      second-level subdivision of a written text  1165
<div3>      third-level subdivision of a written text  512
<div4>      fourth-level subdivision of a written text  137
<editDecl>  (H) descriptions of editorial policies  16
<ednStmt>   (H) information about a particular edition  185
<encDesc>   (H) encoding description  185
<event>     non-verbal event within a spoken text  477
<extent>    (H) size of a corpus text  185
<fileDesc>  (H) documentation of an electronic text  185
<gap>       a spot where part of source text has been omitted  4831
<head>      any form of heading or title  3020
<header>    meta-information describing a corpus text  185
<hi>        typographically highlighted phrase  1738
<hyph>      (H) description of hyphenation policy  2
<idno>      (H) identifying number for a text  185
<imprint>   (H) imprint within a bibliographic entry  71
<item>      item within a list  1041
<keywords>  (H) descriptive keywords for topics of a text  184
<l>         line of verse  3618
<label>     label of a list item  291
<langUsg>   (H) description of languages used in a text  1
<list>      list of items  185
<loc>       synchronisation point within an alignment map  28399
<locName>   (H) name of place where speech recorded  285
<locale>    (H) description of a place where speech recorded  264
<monogr>    (H) monographic bibliographic entry  87
<name>      proper name of person, place etc.  1417
<note>      note or comment of any kind  119
<p>         paragraph in written text  28
<partics>   (H) description of spoken text participants  99
<pause>     noticeable pause in spoken text  26091
<pb>        page break in written text  2633
<person>    (H) information about a speaker  537
<poem>      group of verse lines in a written text  161
<profDesc>  (H) additional information about a text  185
<projDesc>  (H) background information about BNC project  185
<ptr>       link to a displaced element or to synchronisation point  58524
<pubPlace>  (H) place of publication within bibliographic entry  69
<pubStmt>   (H) publication or distribution information  185
<quot>      (H) description of quotation policy  2
<quote>     quotation from some other work  59
<rec>       (H) recording details  297
<recStmt>   (H) information about an audio recording  98
<refsDecl>  (H) description of reference system used  185
<reg>       description of regularisation policy  261
<relation>  (H) relationship between participants in a spoken text  396
<resp>      (H) nature of responsibility  1346
<respStmt>  (H) statement of responsibility in a bibliographic entry  1343
<revDesc>   (H) revision description  185
<s>         sentence-like linguistic segment  172408
<sampDecl>  (H) description of sampling policy  5
<segm>      (H) description of segmentation policy  2
<settDesc>  (H) description of setting in which speech occurs  98
<setting>   (H) an individual setting in which speech occurs  297
<shift>     change in voice quality  4230
<sic>       apparently erroneous transcription  516
<srcDesc>   (H) description of the source for a written text  185
<stext>     an individual spoken text  98
<tagUsage>  (H) count for a particular tag in a text  2600
<tagsDecl>  (H) list of tags used in a particular text  185
<term>      (H) individual term in a list of keywords  262
<text>      an individual written text  86
<titStmt>   (H) title statement for a text  185
<title>     (H) title within a bibliographic entry  272
<trans>     (H) declaration of transcription policy  7
<trunc>     truncated form in a spoken text  5566
<txtClass>  (H) text classification  184
<u>         utterance in a spoken text  79811
<unclear>   inaudible or incomprehensible passage in a spoken text  19936
<vocal>     non-verbal vocalization in a spoken text  4521
<w>         word or other non-punctuation token carrying a POS code  1993554
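
Counts like those in the list above can in principle be obtained by tallying start-tags in the corpus files. The following is a minimal sketch in Python, not the procedure actually used to produce the figures; the function name count_elements and the sample fragment are assumptions made for illustration, and the fragment is not quoted from the corpus.

```python
import re
from collections import Counter

# Matches the name of a start-tag such as <w ...> or <s>. End-tags such as
# </s> are skipped because "/" cannot begin an element name.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")


def count_elements(sgml_text):
    """Return a Counter mapping element names to the number of start-tags."""
    return Counter(START_TAG.findall(sgml_text))


# Hypothetical fragment for illustration only; not quoted from the corpus.
sample = "<s><w AT>The <w NN1>cat <w VVD>sat<c YSTP>.</s>"
element_counts = count_elements(sample)

for name, count in sorted(element_counts.items()):
    print(name, count)
```

The sketch assumes that a literal less-than sign in the text is always encoded as an entity reference, so that every "<" followed by a letter begins a tag.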

6.2 Character entities defined by the BNC DTD

The following table gives a brief description of each character entity used within the text of the BNC Sampler. Declarations for these entities may be found either in standard entity sets or in the entity definitions supplied as part of the BNC document type definition, in the file sampents.dtd. In either case, system-specific values should be supplied for the characters described below. The number given for each entity is the number of times a reference to it appears in the current version of the corpus.

aacute   small a, acute accent  5
acirc    small a, circumflex accent  2
agrave   small a, grave accent  3
amp      ampersand  262
ast      asterisk  3
auml     small a, dieresis or umlaut mark  10
bquo     normalised begin quote mark  8049
bsol     reverse solidus  1
ccaron   small c, caron  1
ccedil   small c, cedilla  5
deg      degree sign  26
dollar   dollar sign  231
Eacute   capital E, acute accent  3
eacute   small e, acute accent  69
egrave   small e, grave accent  4
equo     normalised end quote mark  8323
euml     small e, dieresis or umlaut mark  1
formula  mathematical formula  740
frac12   fraction one-half  163
frac14   fraction one-quarter  27
frac15   fraction one-fifth  2
frac18   fraction one-eighth  4
frac34   fraction three-quarters  27
frac38   fraction three-eighths  13
frac58   fraction five-eighths  9
frac78   fraction seven-eighths  1
ft       feet indicator  13
gt       greater-than sign  645
hellip   ellipsis (horizontal)  919
ins      inches indicator  33
iuml     small i, dieresis or umlaut mark  2
lcub     left curly bracket  93
lsqb     left square bracket  301
lt       less-than sign  640
mdash    em dash  3500
ndash    en dash  784
ntilde   small n, tilde  2
oacute   small o, acute accent  4
ocirc    small o, circumflex accent  5
oelig    small oe ligature  2
oslash   small o, slash  110
ouml     small o, dieresis or umlaut mark  19
pound    pound sign  1097
quot     quotation mark  2280
rcub     right curly bracket  95
rehy     maps to soft hyphen  16
rsqb     right square bracket  301
times    multiply sign  39
uacute   small u, acute accent  1
uuml     small u, dieresis or umlaut mark  8
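
One way of supplying system-specific values for these entities is a simple lookup table. The sketch below, in Python, maps a handful of the entity names above to Unicode characters and leaves any unlisted reference untouched. The names ENTITY_TO_CHAR and resolve_entities are hypothetical, and the curly quotation marks chosen for bquo and equo are an assumption, since the table describes them only as normalised quote marks.

```python
import re

# Partial mapping from entity names to Unicode characters. Extend as needed.
ENTITY_TO_CHAR = {
    "aacute": "\u00e1",
    "eacute": "\u00e9",
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "pound": "\u00a3",
    "mdash": "\u2014",
    "ndash": "\u2013",
    "hellip": "\u2026",
    "frac12": "\u00bd",
    "rehy": "\u00ad",   # soft hyphen, as stated in the table
    "bquo": "\u201c",   # assumption: left double quotation mark
    "equo": "\u201d",   # assumption: right double quotation mark
}

# An entity reference is "&", a name, and a closing ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text):
    """Replace known entity references; leave unknown ones unchanged."""
    def substitute(match):
        return ENTITY_TO_CHAR.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)


print(resolve_entities("caf&eacute; &pound;5 &formula;"))
```

References with no single-character equivalent, such as formula, are deliberately left as they are so that a later processing step can decide how to handle them.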

6.3 Division types

The type attribute on each <div1>, <div2> (etc) element of a written text may be used to supply a value which characterizes the function of the corresponding subdivision in some way. Only the following values are used in the BNC Sampler:

6.4 Rendition codes

The following codes are used in the BNC Sampler to indicate the kind of typographic rendition associated with an element where this is typographically distinct in some way. These codes are mostly used as values for the r attribute of the <hi> element, but may be used on any element bearing this attribute.

bo  bold face  220
hi  highlighted  7
it  italic face  1531
ro  roman face  2
ul  underlined  31

6.5 Voice quality codes

Changes in voice quality in spoken texts are indicated by values for the new attribute on a <shift> element, placed at the point where the speaker's voice changes. The following values are used in the BNC Sampler (frequencies are given in parentheses):

crying (18) laughing (1151) mimicking (46) mimicking refined accent (1) reading (327) screaming (12) shouting (165) sighing (31) singing (157) spelling (24) whingeing (3) whining (4) whispering (139) yawning (45)
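
Several lists in this section, including the one above, give each value followed by its frequency in parentheses. A minimal Python sketch for turning such a list into a lookup table follows; the function name parse_frequency_list is hypothetical, and the sketch assumes that labels never themselves contain a number in parentheses.

```python
import re

# One entry is a label followed by a count in parentheses,
# for example "laughing (1151)". Labels may contain spaces.
ENTRY = re.compile(r"\s*(.+?)\s*\((\d+)\)")


def parse_frequency_list(text):
    """Return a dict mapping each label to its integer frequency."""
    return {m.group(1): int(m.group(2)) for m in ENTRY.finditer(text)}


voice_quality = parse_frequency_list(
    "crying (18) laughing (1151) mimicking (46) mimicking refined accent (1) "
    "reading (327) screaming (12) shouting (165) sighing (31) singing (157) "
    "spelling (24) whingeing (3) whining (4) whispering (139) yawning (45)"
)

# List the values from most to least frequent.
for label, count in sorted(voice_quality.items(), key=lambda item: -item[1]):
    print(label, count)
```

Because the result is a dictionary, a label that occurred twice in a list would keep only its last count; none of the lists in this section repeats a label exactly.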

6.6 Regional codes

A single set of codes, derived from the International Standard for language and country identification, is used to identify the regional origins, first language, and dialects of participants, as specified in the <person> element in the text header. Where such information was recorded for a speaker, one or more of the following codes appear as values for the who.flang or who.dialect attributes. All available codes are listed here; not all of them are actually used in the BNC Sampler:

CAN  Canada
CHN  China
DEU  Germany
FRA  France
GBR  United Kingdom
IND  India
IRL  Ireland
USA  United States
XXX  Unknown
ZZG  Europe
XDE  accent: German
XEA  accent: East Anglia
XFR  accent: French
XHC  accent: Home Counties
XHM  accent: Humberside
XIR  accent: Irish
XIS  accent: Indian subcontinent
XLC  accent: Lancashire
XLO  accent: London
XMC  accent: central Midlands
XMD  accent: Merseyside
XME  accent: north-east Midlands
XMI  accent: Midlands
XMS  accent: south Midlands
XMW  accent: north-west Midlands
XNC  accent: central northern England
XNE  accent: north-east England
XNO  accent: northern England
XOT  accent: unidentifiable
XSD  accent: Scottish
XSL  accent: lower south-west England
XSS  accent: central south-west England
XSU  accent: upper south-west England
XUR  accent: European
XUS  accent: U.S.A.
XWA  accent: Welsh
XWE  accent: West Indian

6.7 Relationship codes

Where relationships between individual participants in spoken texts can be identified, they will be specified by means of the <relation> element within the text header (as discussed in section 5.3.3). The desc attribute of this element may take any of the values listed below. The number in parentheses indicates the number of times this value appears in the BNC Sampler.

acquaint acquaintance (2)
audience (1)
aunt (2)
b-i-l brother-in-law (6)
brother (16)
chairman (1)
colleagu colleague (51)
cous-i-l cousin-in-law (1)
cousin (1)
customer (1)
d-i-l daughter-in-law (4)
daughter (24)
f-i-l father-in-law (6)
father (27)
friend (36)
g-daught granddaughter (5)
g-fath grandfather (5)
g-moth grandmother (6)
g-son grandson (5)
husband (40)
intervee interviewee (1)
m-i-l mother-in-law (8)
mother (37)
neighbou neighbour (4)
nephew (2)
niece (1)
s-daught step-daughter (1)
s-father step-father (1)
server (1)
sis-i-l sister-in-law (8)
sister (15)
son (28)
speaker (1)
stranger (1)
student (2)
teacher (1)
tutor (1)
uncle (1)
wife (41)

6.8 Paralinguistic phenomena

In addition to the <pause>, <shift>, <trunc>, and <unclear> elements, a variety of paralinguistic phenomena are marked up in the transcriptions which form the spoken part of the BNC Sampler. The <vocal> element is used to mark a variety of non-linguistic or semi-linguistic sounds made by one or more speakers in the transcriptions; the <event> element is used to mark up other occurrences which seemed of importance to the transcribers when making sense of the spoken interaction. We list here the various annotations used for the latter two categories throughout the BNC Sampler.

6.8.1 Vocals

The following lists in alphabetic order all the different values specified on the TYPE attribute for the <vocal> element in the BNC Sampler, together with the frequency of occurrence for each different value.

baby talk (18) belch (18) buzzing sound (1) clapping (82) clears throat (139) clicks tongue (1) cough (451) crying (30) gasp (2) giggle (2) gurgle (7) hiccup (4) howl (1) humming (18) imitates aeroplane (1) imitates banjo (1) imitates bringing up phlegm (4) imitates cat licking (1) imitates clearing throat (1) imitates vomiting (3) imitating engine revving (1) kiss (5) kissing (1) kissing sound (2) laugh (3438) laughing (1) licking sound (1) mimicking cat spitting (1) mimicking gorilla noises (2) mimicking microphone noises (1) mimicking shaving noise (1) noise for paws (1) panting (1) purring noises (1) raspberry (7) scream (14) sigh (78) singing (7) sneeze (27) sniff (35) sound effect (8) sound effects (1) sound of biting (1) spitting sound (1) squeak (1) sucking noises (2) sucking then purring noises (1) tt (1) tut (27) whine (1) whistling (44) yawn (22) yelping sound (1)

6.8.2 Event descriptions

The following lists in alphabetic order all the different values specified on the DESC attribute for the <event> element in the BNC Sampler, together with the frequency of occurrence for each different value.

Background to following (1) Band (1) Band music (1) Break in enquiry (1) Break in recording (1) Children screaming (1) Dramatic music (1) Dramatic music. (1) End of first tape (2) End of recording (3) End of side (3) Engine noises (3) General chatter (1) Gives her a kiss (1) Gunfire, celebration. (1) Horse racing on the radio (1) Intense gunfire (1) Loud rustling next to microphone (1) Mr Bean record on telly I think (1) Music (3) Music and singing (1) Music and song (1) Pen on paper (3) Plane noises (1) Portuguese speech (31) Reading title of book (1) Rock music (2) Sound of burning and dramatic music (1) Sounds of intense automatic weapons fire. (1) Spanish (1) Tape ends (1) Telephone being dialled, overlaps next part of speech. (1) Theme music leading to triumphant climax (1) Theme music, reprise, to end of job (1) Theme music. Engine noise (1) advert (1) advertisements (7) advertisements and travel news (2) another gap in tape (1) applause (9) baby crying (5) baby screaming (2) baby squealing (1) baby talking (8) background to following (1) banging (2) banging noises (1) barking (2) bell ringing (1) bell rings (2) bibs hooter (1) birds singing/whistling (1) birthdays etcetera (1) blowing kisses (2) blowing nose (1) blows nose (2) boxing on television (1) boys fighting (1) break in recording (6) break in recording while watching video (1) break in tape (5) calling from outside (1) car hooter (1) cat miaows (1) chairs being moved (1) cheering (1) cheering and shouting (1) clapping (24) classroom chatter (11) classroom chatter - barely audible speech (1) clicking computer (4) clicking of computer (1) clock chiming (4) closing music (2) cough (4) dog barking (14) dog barks (4) dog sick noise again (1) dogs barking (2) door opening (1) doorbell (1) doorbell ringing (1) doorbell rings (1) dramatic music (1) drilling noise (1) drums (1) duck noise (1) eating (1) eating dinner (1) end of first side of tape and start of second (1) end of job (1) end of recording (5) end of side (1) end of side of tape (1) end of side one of first tape (1) end of side one of tape (1) end of side one of tape. second side starts part way into tape. (1) end of tape (1) end of tape side two (1) engine noises (1) everyone claps (1) football on television (1) football on television again - changed channels (1) general background chatter continues, but foreground conversation has paused (1) general hubbub as people move forward (1) getting into car (1) going into kitchen (1) hammering something (1) happy children (1) hums tune (1) in another room (1) in background throughout following text (1) in canteen/dining room - very noisy (1) in other room (1) intake of breath (1) interruption for radio commercials (2) introduction music (1) jet passes overhead (1) key sounds (1) knock on door (1) knocking on door (3) laugh (1) laughter (5) loud music (2) loud music playing (1) loud music playing on television (1) loud television (1) machinery noises (2) makes dog whining noise (1) makes growling noise (1) makes noise as if being sick (1) making sound of a plane (1) market place noises (1) microphone hissing and conversation very quiet (1) mimicking (1) mimicking crying (1) mimicking dog barking (1) mumble (1) mumbles (1) murmuring from the floor (1) music (5) music of Waltzing Matilda playing (5) music on in background (1) music on loud (1) music on loud again (1) music playing (1) music playing on television (1) music playing with interruption for radio commercials (1) music very loud (1) news bulletin (2) noise (2) noise like the dog being sick (1) noise of dog biscuits being tipped out (1) now moved to another class? (1) paper crinkling noises (1) phone bleeping (2) phone ringing (1) phone rings (3) piano sound (1) plate stacking (1) playing (1) playing piano (1) playing with baby (1) printer noises (1) printer sounds (2) puking noise. (1) radio on (1) reading computer screen (1) reading from board (1) reading from book/text (1) reading from classroom board (1) reading from hospital form (2) reading from newspaper cutting (1) reading from package (1) reading from receipt (1) reading title of book (1) reads horoscopes (1) recording ends (1) repetitive banging (1) scraping something (1) screaming in background (1) shooting sound on video game (1) shuts door (1) shutting door (1) singing (4) singing along to record (1) singing in background (3) singing to music (2) smacking lips together (1) snooker on the telly (1) sombre music (1) something falls (1) something smashes (1) sounds of burning. (1) sounds of sporadic fire (1) sounds of violence and gunfire (1) speaking French (1) speaking Spanish (2) spells surname (1) spitting noise (1) sports news (1) students' voices in background (6) sucks teeth (3) talking in kitchen away from microphone (1) talking in other room (1) talking to dog (1) talking to the cat (1) talking with mouth full (3) tape ends (6) tape ends side one and starts side two (1) tape playing (1) tape recording (1) tape stops and starts (2) taps table (1) telephone conversation ends (14) telephone conversation starts (13) telephone ringing (5) telephone rings (1) television comes on (1) television loud (1) television on (5) television on - horse racing (1) television turned over to football commentary (1) television very loud (2) telly on loud (1) theme music to end of recording (1) throws dice (1) tickling little girl (2) too much banging (1) traffic (1) traffic news and advertisements (1) travel news (2) travel news and news bulletin (1) travel news and weather (1) unable to hear conversation for some time (1) very loud television - football (1) video film for 135 seconds (1) video playing (1) walking along corridor - just a lot of chatter (1) watching football on television (1) watching football on television and talking quietly in background (2) weather and travel news (1) whispering to dog (1) with microphone (3) woof (1)

6.9 Text classification codes

This section lists all of the classification codes that may appear within the <catRef> element in the header of each text. Not all of the values defined here are actually used within the BNC Sampler. For information on how many texts, words, and sentences are included within a given classification, see the additional statistical tables.

The following table shows codes which can be used to classify all kinds of text, according to their availability or their type. (For the BNC Sampler, all texts are freely available worldwide.)

allAva    Text availability
allAva1   free, world: Freely available worldwide
allAva2   restricted, world: Available worldwide
allAva3   restricted, Not-NA: Not available in North America
allAva4   restricted, Not-US: Not available in U.S.A.
allAva5   restricted, EU: Not available outside the European Union
allAva6   restricted, Not-USP: Not available in U.S.A. & Philippines
allAva7   restricted, Not-NAP: Not available in North America & Philippines
allTyp    Text type
allTyp1   Spoken demographic
allTyp2   Spoken context-governed
allTyp3   Written books and periodicals
allTyp4   Written-to-be-spoken
allTyp5   Written miscellaneous

The following table lists the classification codes which can be specified for spoken texts (either demographic or context-governed) only. Note that the classifications for demographically sampled texts apply to the respondent only, not necessarily to all speakers transcribed.

scgDom    Domain for context-governed material
scgDom1   Educational
scgDom2   Business
scgDom3   Institutional
scgDom4   Leisure
sdeAge    Age band for demographic respondent
sdeAge1   0-14
sdeAge2   15-24
sdeAge3   25-34
sdeAge4   35-44
sdeAge5   45-59
sdeAge6   60+
sdeCla    Social class for demographic respondent
sdeCla1   AB
sdeCla2   C1
sdeCla3   C2
sdeCla4   DE
sdeSex    Sex of demographic respondent
sdeSex1   Male
sdeSex2   Female
spoLog    Interaction type
spoLog1   Monologue
spoLog2   Dialogue
spoReg    Region where text captured
spoReg1   South
spoReg2   Midlands
spoReg3   North

The following table lists all classification codes which may be specified for any written text.

wbpSel    Books & periodicals: selection method
wbpSel1   Selective
wbpSel2   Random
wmiPub    Miscellaneous materials: publication status
wmiPub1   Published
wmiPub2   Unpublished
wriAAg    Author age band
wriAAg1   0-14
wriAAg2   15-24
wriAAg3   25-34
wriAAg4   35-44
wriAAg5   45-59
wriAAg6   60+
wriADo    Author domicile
wriAD036  Australia
wriAD124  Canada
wriAD250  France
wriAD276  Germany
wriAD372  Ireland
wriAD380  Italy
wriAD422  Lebanon
wriAD492  Monaco
wriAD554  New Zealand
wriAD620  Portugal
wriAD702  Singapore
wriAD756  Switzerland
wriAD826  United Kingdom
wriAD840  United States
wriAD920  UK North (north of Mersey-Humber line)
wriAD921  UK Midlands (north of Bristol Channel-Wash line)
wriAD922  UK South (south of Bristol Channel-Wash line)
wriASe    Sex of author
wriASe1   Male
wriASe2   Female
wriASe3   Mixed
wriASe4   Unknown
wriATy    Type of author
wriATy1   Corporate
wriATy2   Multiple
wriATy3   Sole
wriATy4   Unknown
wriAud    Intended age of audience
wriAud1   Child
wriAud2   Teenager
wriAud3   Adult
wriAud4   Any
wriDom    Domain
wriDom1   Imaginative
wriDom2   Informative: natural & pure science
wriDom3   Informative: applied science
wriDom4   Informative: social science
wriDom5   Informative: world affairs
wriDom6   Informative: commerce & finance
wriDom7   Informative: arts
wriDom8   Informative: belief & thought
wriDom9   Informative: leisure
wriLev    Circulation level
wriLev1   Low
wriLev2   Medium
wriLev3   High
wriMed    Medium
wriMed1   Book
wriMed2   Periodical
wriMed3   Miscellaneous published
wriMed4   Miscellaneous unpublished
wriMed5   To-be-spoken
wriPPl    Place of publication
wriPP372  Ireland
wriPP826  United Kingdom
wriPP840  United States
wriPP920  UK North (north of Mersey-Humber line)
wriPP921  UK Midlands (north of Bristol Channel-Wash line)
wriPP922  UK South (south of Bristol Channel-Wash line)
wriSam    Type of sample
wriSam1   Whole text
wriSam2   Beginning sample
wriSam3   Middle sample
wriSam4   End sample
wriSam5   Composite
wriSta    Reception status
wriSta1   Low
wriSta2   Medium
wriSta3   High
wriTas    Target audience sex
wriTas1   Male
wriTas2   Female
wriTas3   Mixed
wriTas4   Unknown
wriTim    Time period
wriTim1   1960-1974
wriTim2   1975-1993

For each classification listed above, the absence of information may be indicated either by the absence of any code, or by the presence of a code ending with a zero instead of a number. For example, written texts for which type of author is unknown may be indicated either by the absence of any value beginning wriATy or by the presence of the specific value wriATy0.
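
The rule just described can be sketched in Python as follows. The sketch assumes that the codes applying to a text have already been extracted from its <catRef> element as a list of strings; how they are stored there is not covered here. The function names split_code and classification_value are hypothetical.

```python
import re

# A classification code is a run of letters followed by a run of digits,
# for example "wriDom1" or "wriAD036".
CODE_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_code(code):
    """Split a code into its letter prefix and its numeric value."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError("not a classification code: " + repr(code))
    return match.group(1), match.group(2)


def classification_value(codes, category):
    """Return the value recorded for a category, or None if unknown.

    Both ways of marking missing information give None: no code for
    the category at all, or a code whose numeric part is a single zero.
    """
    for code in codes:
        prefix, value = split_code(code)
        if prefix == category:
            return None if value == "0" else value
    return None


# Hypothetical set of codes for one written text.
codes = ["allTyp3", "allAva1", "wriATy0", "wriDom1"]
print(classification_value(codes, "wriDom"))
print(classification_value(codes, "wriATy"))
print(classification_value(codes, "wriASe"))
```

Note that for author domicile and place of publication the letter prefix obtained this way is wriAD or wriPP, not the six-letter category names wriADo and wriPPl shown in the table, because those codes carry a three-digit value.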

