Miscellaneous code tables

This section consists of a series of tables identifying a number of codes used in various aspects of the corpus and its encoding.

The following code tables are provided:
  • Elements defined by the BNC DTD: all SGML elements used in the corpus, with a brief description of each
  • Character entities defined by the BNC DTD: all SGML entities used in the corpus, with a brief description of each
  • Division types: all values actually used in the corpus for the type attribute on division elements (<div1>, <div2> etc.)
  • Rendition codes: all values used in the corpus for the r (rendition) attribute, chiefly on <hi> elements, to indicate typographic rendering of the source
  • Voice quality codes: all values used in the corpus for the new attribute on the <shift> element, to indicate changes in voice quality for spoken texts
  • Regional codes: the codes used to identify regional origins of participants, as specified in the <person> element in the header
  • Relationship codes: the codes used to identify relationships documented between participants, as specified in the <relation> element in the header
  • Word class codes: all part of speech codes in the C5 tagset, used to specify the linguistic category for all <w> and <c> elements

In addition, the list of ‘non-orthographic words’ recognized by the CLAWS system (i.e. multiword items and clitics), which appeared in this section in the first edition of this document, is now available in the accompanying Manual to accompany The British National Corpus (Version 2) with Improved Word-class Tagging by Geoffrey Leech and Nicholas Smith.

The list of text classification codes present in the first edition of this document is now included only as part of the presentation of the corpus header in section ??.

Elements defined by the BNC DTD

The following list gives a brief description of each element defined in the BNC document type definition (DTD). Elements are listed in approximately alphabetical order. Descriptions prefixed by ‘(H)’ are for elements which appear only in the text headers. Counts, given in parentheses, are supplied for elements occurring within texts.
<activity>
(H) participants' activity during recording
<address>
(H) postal or other address
<align>
alignment map for synchronizing overlap points (3461)
<analytic>
(H) analytic bibliographic entry
<author>
(H) author in bibliographic entry
<avail>
(H) availability code for file
<bibl>
loosely structured bibliographic reference (1037)
<biblScope>
(H) page range within bibliographic entry
<biblStruct>
(H) structured bibliographic entry
<bnc>
the BNC itself
<bncDoc>
an individual text in the BNC
<body>
the body of a written text in the BNC (3136)
<c>
a punctuation mark (13620069)
<caption>
a floating heading or caption (89935)
<catDesc>
(H) description of a category
<category>
(H) a category-value pair
<catRef>
(H) category codes applicable to a text
<change>
(H) change note
<classDecl>
(H) description of classification scheme
<corr>
editorial correction (8323)
<creation>
(H) information about creation of a text
<date>
(H) a date
<classCode>
(H) externally-defined classification code for a text
<div>
any subdivision of a spoken text (3779)
<div1>
first-level subdivision of a written text (84777)
<div2>
second-level subdivision of a written text (72697)
<div3>
third-level subdivision of a written text (38122)
<div4>
fourth-level subdivision of a written text (12506)
<editorialDecl>
(H) descriptions of editorial policies
<edition>
(H) edition in a bibliographic entry
<editionStmt>
(H) information about a particular edition
<encodingDesc>
(H) encoding description
<event>
non-verbal event within a spoken text (6565)
<extent>
(H) size of a corpus text
<fileDesc>
(H) documentation of an electronic text
<gap>
a spot where part of source text has been omitted (95959)
<head>
any form of heading or title (222876)
<hi>
typographically highlighted phrase (210927)
<idno>
(H) identifying number for a text
<imprint>
(H) imprint within a bibliographic entry
<item>
item within a list (117207)
<keyWords>
(H) descriptive keywords for topics of a text
<l>
line of verse (51559)
<label>
label of a list item (65664)
<langUsage>
(H) description of languages used in a text
<lb>
line break in printed source (169)
<lg>
group of verse lines (1)
<list>
list of items (19758)
<loc>
synchronisation point within an alignment map (244975)
<locale>
(H) description of a place where speech recorded
<monogr>
(H) monographic bibliographic entry
<name>
(H) name of place where speech recorded
<note>
note or comment of any kind (17206)
<p>
paragraph in written text (1515002)
<particDesc>
(H) description of spoken text participants
<pause>
noticeable pause in spoken text (217916)
<pb>
page break in written text (153642)
<person>
(H) information about a speaker
<poem>
group of verse lines in a written text (3048)
<profileDesc>
(H) additional information about a text
<projectDesc>
(H) background information about BNC project
<ptr>
link to a displaced element or to synchronisation point (578248)
<publicationStmt>
(H) publication or distribution information
<pubPlace>
(H) place of publication within bibliographic entry
<quote>
quotation from some other work (15221)
<recording>
(H) information about a single recording
<recordingStmt>
(H) information about the recordings from which a transcript was made
<refsDecl>
(H) description of reference system used
<reg>
an editorial regularization (8363)
<relation>
(H) relationship between participants in a spoken text
<resp>
(H) nature of responsibility
<respStmt>
(H) statement of responsibility in a bibliographic entry
<revisionDesc>
(H) revision description
<s>
sentence-like linguistic segment (6053093)
<salute>
salutation or greeting (444)
<samplingDecl>
(H) description of sampling policy
<settingDesc>
(H) description of the settings in which speech occurs
<setting>
(H) an individual setting in which speech occurs
<shift>
change in voice quality (36216)
<sic>
apparently erroneous transcription (7797)
<sp>
speech in a written text (29858)
<spkr>
speaker of a speech in a written text (23708)
<sourceDesc>
(H) description of the source for a text
<stage>
stage direction in a written text (508)
<stext>
an individual spoken text (918)
<tagsDecl>
(H) list of tags used in a particular text
<tagUsage>
(H) count for a particular tag in a text
<teiHeader>
meta-information describing a corpus text
<term>
(H) individual term in a list of keywords
<text>
an individual written text (3136)
<title>
(H) title within a bibliographic entry
<titleStmt>
(H) title statement for a text
<trans>
(H) declaration of transcription policy
<trunc>
truncated form in a spoken text (52724)
<textClass>
(H) text classification
<u>
utterance in a spoken text (775799)
<unclear>
inaudible or incomprehensible passage in a spoken text (204239)
<vocal>
non-verbal vocalization in a spoken text (44286)
<w>
POS-tagged lexical item (97619934)

Character entities defined by the BNC DTD

The following table gives a brief description of each character entity used within the text of the BNC. Declarations for these entities may be found either in standard entity sets or in the entity definitions supplied as part of the BNC document type definition, in the file BNCents.dtd. In either case, system-specific values should be supplied for the characters described below. The count column gives the number of times each entity reference appears in the current version of the corpus.

Table 1. Character entities used in the BNC
entity    description  count
Aacute    capital A, acute accent  51
aacute    small a, acute accent  2145
abreve    small a, breve  4
Acirc     capital A, circumflex accent  8
acirc     small a, circumflex accent  772
acute     acute accent  3
AElig     capital AE diphthong (ligature)  324
aelig     small ae diphthong (ligature)  232
agr       small alpha, Greek  1237
Agrave    capital A, grave accent  43
agrave    small a, grave accent  878
Amacr     capital A, macron  20
amacr     small a, macron  581
amp       ampersand  18931
aogon     small a, ogonek  2
ape       approximate, equals  5
Aring     capital A, ring  141
aring     small a, ring  65
ast       asterisk  919
atilde    small a, tilde  244
Auml      capital A, dieresis or umlaut mark  6
auml      small a, dieresis or umlaut mark  967
Bgr       capital Beta, Greek  186
bgr       small beta, Greek  1122
bquo      normalized begin quote mark  771009
bsol      reverse solidus  226
bull      round bullet, filled  2150
cacute    small c, acute accent  186
Ccaron    capital C, caron  31
ccaron    small c, caron  143
Ccedil    capital C, cedilla  17
ccedil    small c, cedilla  1320
ccirc     small c, circumflex accent  2
cent      cent sign  3
check     tick, check mark  13
cir       circle, open  21
circ      circumflex accent  100
commat    commercial at  189
copy      copyright sign  65
darr      downward arrow  17
dash      hyphen (true graphic)  1
dcaron    small d, caron  1
deg       degree sign  4067
Dgr       capital Delta, Greek  251
dgr       small delta, Greek  151
die       dieresis  13
divide    divide sign  58
dollar    dollar sign  24552
dstrok    small d, stroke  18
dtrif     dn tri, filled  2
Eacute    capital E, acute accent  269
eacute    small e, acute accent  16078
Ecaron    capital E, caron  2
ecaron    small e, caron  67
Ecirc     capital E, circumflex accent  2
ecirc     small e, circumflex accent  718
eegr      small eta, Greek  48
Egr       capital Epsilon, Greek  170
egr       small epsilon, Greek  207
Egrave    capital E, grave accent  14
egrave    small e, grave accent  2577
emacr     small e, macron  4
eogon     small e, ogonek  6
equals    equals sign  3
equo      normalized end quote mark  752621
eth       small eth, Icelandic  4
Euml      capital E, dieresis or umlaut mark  15
euml      small e, dieresis or umlaut mark  482
flat      musical flat  154
formula   mathematical formula  6466
frac12    fraction one-half  2795
frac13    fraction one-third  68
frac14    fraction one-quarter  575
frac15    fraction one-fifth  20
frac16    fraction one-sixth  6
frac17    fraction one-seventh  2
frac18    fraction one-eighth  60
frac19    fraction one-ninth  1
frac23    fraction two-thirds  50
frac25    fraction two-fifths  9
frac34    fraction three-quarters  325
frac35    fraction three-fifths  5
frac38    fraction three-eighths  52
frac45    fraction four-fifths  5
frac47    fraction four-sevenths  1
frac56    fraction five-sixths  1
frac58    fraction five-eighths  33
frac78    fraction seven-eighths  7
ft        feet indicator  630
ge        greater-than-or-equal  18
Ggr       capital Gamma, Greek  33
ggr       small gamma, Greek  499
grave     grave accent  2
Gt        dbl greater-than sign  8
gt        greater-than sign  1102
half      fraction one-half  74
hearts    heart suit symbol  1
hellip    ellipsis (horizontal)  77286
horbar    horizontal bar  2
hstrok    small h, stroke  2
Iacute    capital I, acute accent  3
iacute    small i, acute accent  1278
Icirc     capital I, circumflex accent  21
icirc     small i, circumflex accent  225
iexcl     inverted exclamation mark  21
igr       small iota, Greek  2
igrave    small i, grave accent  39
imacr     small i, macron  10
infin     infinity  15
ins       inches indicator  2306
iquest    inverted question mark  11
Iuml      capital I, dieresis or umlaut mark  1
iuml      small i, dieresis or umlaut mark  507
kgr       small kappa, Greek  29
khgr      small chi, Greek  300
Lacute    capital L, acute accent  2
lacute    small l, acute accent  2
larr      leftward arrow  1
lcub      left curly bracket  345
le        less-than-or-equal  23
lgr       small lambda, Greek  104
lowbar    low line  134
lsqb      left square bracket  34752
Lstrok    capital L, stroke  3
lstrok    small l, stroke  25
Lt        double less-than sign  3
lt        less-than sign  2295
mdash     em dash  275695
Mgr       capital Mu, Greek  1
mgr       small mu, Greek  376
micro     micro sign  1487
middot    middle dot  253
nacute    small n, acute accent  21
natur     music natural  6
ncaron    small n, caron  27
ncedil    small n, cedilla  2
ndash     en dash  43489
ngr       small nu, Greek  88
Ntilde    capital N, tilde  4
ntilde    small n, tilde  771
num       number sign  138
Oacute    capital O, acute accent  17
oacute    small o, acute accent  1328
Ocirc     capital O, circumflex accent  7
ocirc     small o, circumflex accent  754
OElig     capital OE ligature  2
oelig     small oe ligature  55
Ogr       capital Omicron, Greek  11
ogr       small omicron, Greek  52
ograve    small o, grave accent  73
OHgr      capital Omega, Greek  1
ohgr      small omega, Greek  23
ohm       ohm sign  15
omacr     small o, macron  4
Oslash    capital O, slash  16
oslash    small o, slash  304
Otilde    capital O, tilde  1
otilde    small o, tilde  4
Ouml      capital O, dieresis or umlaut mark  344
ouml      small o, dieresis or umlaut mark  1284
percnt    percent sign  144
Pgr       capital Pi, Greek  37
pgr       small pi, Greek  95
PHgr      capital Phi, Greek  9
phgr      small phi, Greek  107
plus      plus sign  198
plusmn    plus-or-minus sign  122
pound     pound sign  71698
Prime     double prime or second  59
prime     prime or minute  128
PSgr      capital Psi, Greek  15
psgr      small psi, Greek  16
quot      quotation mark  142126
racute    small r, acute accent  3
radic     surd = radical (square root)  9
rarr      rightward arrow  182
Rcaron    capital R, caron  1
rcaron    small r, caron  103
rcub      right curly bracket  343
reg       registered sign  11
rehy      maps to soft hyphen  3905
rgr       small rho, Greek  81
rsqb      right square bracket  34807
Sacute    capital S, acute accent  13
sacute    small s, acute accent  24
Scaron    capital S, caron  85
scaron    small s, caron  257
Scedil    capital S, cedilla  7
scedil    small s, cedilla  465
scirc     small s, circumflex accent  14
sect      section sign  52
Sgr       capital Sigma, Greek  13
sgr       small sigma, Greek  150
sharp     musical sharp  93
shilling  British shilling  228
sim       similar  68
sol       solidus  355
sup1      superscript one  2
sup2      superscript two  45
sup3      superscript three  10
szlig     small sharp s, German (sz ligature)  19
tcaron    small t, caron  1
tcedil    small t, cedilla  26
tgr       small tau, Greek  68
THgr      capital Theta, Greek  13
thgr      small theta, Greek  193
THORN     capital THORN, Icelandic  15
thorn     small thorn, Icelandic  13
times     multiply sign  2300
trade     trade mark sign  12
Uacute    capital U, acute accent  7
uacute    small u, acute accent  326
Ucirc     capital U, circumflex accent  1
ucirc     small u, circumflex accent  107
Ugr       capital Upsilon, Greek  1
ugr       small upsilon, Greek  3
ugrave    small u, grave accent  38
umacr     small u, macron  3
uml       umlaut mark  3
uring     small u, ring  9
Uuml      capital U, dieresis or umlaut mark  25
uuml      small u, dieresis or umlaut mark  2246
verbar    vertical bar  269
wcirc     small w, circumflex accent  2
xgr       small xi, Greek  8
yacute    small y, acute accent  138
Ycirc     capital Y, circumflex accent  1
ycirc     small y, circumflex accent  6
yen       yen sign  120
Yuml      capital Y, dieresis or umlaut mark  2
yuml      small y, dieresis or umlaut mark  47
zacute    small z, acute accent  3
Zcaron    capital Z, caron  9
zcaron    small z, caron  67
zdot      small z, dot above  1
Zgr       capital Zeta, Greek  3
zgr       small zeta, Greek  33
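
As a rough illustration of what supplying system-specific values for these entities might look like, the following Python sketch maps a handful of the entity names from Table 1 to Unicode characters and substitutes them in a string. The function name expand_entities, the choice of Unicode as the target, and in particular the rendering of bquo and equo as typographic double quotation marks are assumptions made for this example; they are not part of the BNC distribution.

```python
import re

# Illustrative subset of Table 1. The Unicode values are assumptions
# chosen for this sketch, not definitions taken from BNCents.dtd.
ENTITY_TO_CHAR = {
    "eacute": "\u00e9",  # small e, acute accent
    "pound": "\u00a3",   # pound sign
    "mdash": "\u2014",   # em dash
    "hellip": "\u2026",  # ellipsis (horizontal)
    "amp": "&",          # ampersand
    "bquo": "\u201c",    # normalized begin quote mark (assumed rendering)
    "equo": "\u201d",    # normalized end quote mark (assumed rendering)
}

# An entity reference of the form &name;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text):
    """Replace known entity references; leave unknown ones untouched."""
    def substitute(match):
        return ENTITY_TO_CHAR.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)


print(expand_entities("&bquo;Caf&eacute;&equo; costs &pound;5 &unknown;"))
```

References that are not in the mapping are passed through unchanged, so the table can be extended incrementally without losing information.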

Division types

The type attribute on each <div1>, <div2> etc. element of a written text may be used to supply a value which characterizes the function of the corresponding subdivision in some way. The following table lists the values used, with their frequencies in the current version of the corpus given in parentheses:
Table 2. Division types
advertisement (6)   appendix (1)        article (2)
blurb (1)           body text (3)       brochure (1)
cartoon (2)         chapter (1018)      column (162)
commentary (1)      competition (1)     compo (11)
contents (2)        element (1)         fact sheet (1)
front (115)         headlines (2)       insert (2)
introduction (1)    item (12)           leaflet (7)
magazine (1)        paper (11)          paragraph (1)
part (85)           poster (1)          recipe (61)
section (59)        short story (8)     sidebar (4)
story (178)         strip cartoon (1)   sub (2)
subsection (109)    title page (1)      title page obverse (1)
toc (1)             u (209985)

Rendition codes

The following codes are used to indicate the kind of typographic rendition associated with an element which is typographically distinct in some way. These codes are mostly used as values for the rend attribute of the <hi> element, but may be used on any element bearing this attribute.

bo
bold face
bx
boxed
it
italic font
ro
roman font
hi
superscript
lo
subscript
qr
right aligned
qc
centred
qt
quoted
sc
small caps
st
struck out
ul
underlined

More than one value from the above list may occasionally be specified for a single element. In this case, the values are separated by spaces.
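
The space-separated convention can be sketched in Python as follows. The dictionary simply restates the list above; the function name describe_rendition and the handling of unrecognised codes are assumptions made for illustration only.

```python
# Rendition codes as listed above.
REND_CODES = {
    "bo": "bold face",
    "bx": "boxed",
    "it": "italic font",
    "ro": "roman font",
    "hi": "superscript",
    "lo": "subscript",
    "qr": "right aligned",
    "qc": "centred",
    "qt": "quoted",
    "sc": "small caps",
    "st": "struck out",
    "ul": "underlined",
}


def describe_rendition(value):
    """Expand a rendition attribute value into a list of descriptions.

    Several codes may be given, separated by spaces. Codes not in the
    list above are reported rather than silently dropped.
    """
    return [REND_CODES.get(code, "unknown code: " + code)
            for code in value.split()]


print(describe_rendition("bo it"))
```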

Voice quality codes

Changes in voice quality in spoken texts are indicated by values for the new attribute on a <shift> element, at the point where the speaker's voice changes. The following list summarises the values used in the present version of the corpus:
  • cheering
  • crying
  • eating
  • giggling
  • humming
  • humming the stripper's song
  • imitates woman's voice
  • imitating a monkey
  • imitating a sexy woman's voice
  • imitating Chinese voice
  • imitating drunken voice
  • imitating Italian accent
  • imitating man's voice
  • imitating posh voice
  • imitating woman's voice
  • in a boyish voice
  • in the distance
  • laughing
  • laughing+reading
  • laughing+shouting
  • mimicking
  • mimicking American accent
  • mimicking American accent from Wayne's World
  • mimicking an upper class person
  • mimicking baby voice
  • mimicking Birmingham accent
  • mimicking Chinese speaking
  • mimicking Cilla Black's accent
  • mimicking crying
  • mimicking deep voice
  • mimicking Donald Duck
  • mimicking finance lady
  • mimicking Geordie accent
  • mimicking German accent
  • mimicking girlie voice
  • mimicking Henry Cooper
  • mimicking Jamaican accent
  • mimicking Manchester accent
  • mimicking mentally handicapped
  • mimicking northern accent
  • mimicking Pakistani accent
  • mimicking refined accent
  • mimicking Scottish accent
  • mimicking stupid man's voice
  • mimicking Swedish accent
  • mimicking telephone voice
  • mimicking the German accent
  • mimicking whining
  • mimicking witch
  • mimicking Yorkshire accent
  • mimicking+screaming
  • moaning
  • mumbling
  • muttering
  • on telephone
  • praying
  • quoting
  • raising voice
  • rapping
  • reading
  • reading+laughing
  • reading+shouting
  • reading+whispering
  • screaming
  • shouting
  • shouting+laughing
  • shouting+spelling
  • sighing
  • singing
  • singing+laughing
  • singing+mimicking
  • singing+shouting
  • singing+whispering
  • singing+yawning
  • speaking as if mentally handicapped
  • speaking dramatically
  • speaking with mouth full
  • spelling
  • talking with mouth full
  • whingeing
  • whining
  • whispering
  • whispering+laughing
  • yawning
  • yawning+reading
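
A minimal Python sketch of collecting these values from a transcribed utterance is given below. It assumes that the attribute value may appear either unquoted (for single-word values) or in double quotes, as SGML permits; the sample markup, the regular expression, and the function name voice_shifts are all hypothetical and are not taken from the corpus or its software.

```python
import re

# Matches a <shift> start-tag, with an optional new attribute whose
# value is either double-quoted or an unquoted single token.
SHIFT_TAG = re.compile(
    r'<shift(?:\s+new\s*=\s*(?:"([^"]*)"|([^\s>]+)))?\s*>'
)


def voice_shifts(sgml):
    """Return the new values of all <shift> tags, in document order.

    A <shift> tag carrying no new attribute yields None.
    """
    values = []
    for match in SHIFT_TAG.finditer(sgml):
        values.append(match.group(1) or match.group(2))
    return values


# Hypothetical fragment, for illustration only.
sample = '<shift new=laughing>oh no <shift new="imitating posh voice">really <shift>'
print(voice_shifts(sample))
```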

Regional codes

The codes used to mark places of origin, regions, and dialects in the TEI Header are all derived from the same set of ISO 3-letter codes. The codes used are listed here:

CAN
Canada
CHN
China
DEU
Germany
FRA
France
GBR
United Kingdom
IND
India
IRL
Ireland
USA
United States
XXX
Unknown
ZZG
Europe
XDE
accent: German
XEA
accent: East Anglia
XFR
accent: French
XHC
accent: Home Counties
XHM
accent: Humberside
XIR
accent: Irish
XIS
accent: Indian subcontinent
XLC
accent: Lancashire
XLO
accent: London
XMC
accent: central Midlands
XMD
accent: Merseyside
XME
accent: north-east Midlands
XMI
accent: Midlands
XMS
accent: south Midlands
XMW
accent: north-west Midlands
XNC
accent: central northern England
XNE
accent: north-east England
XNO
accent: northern England
XOT
accent: other or unidentifiable
XSD
accent: Scottish
XSL
accent: lower south-west England
XSS
accent: central south-west England
XSU
accent: upper south-west England
XUR
accent: European
XUS
accent: U.S.A.
XWA
accent: Welsh
XWE
accent: West Indian

Relationship codes

Where relationships between individual participants in spoken texts can be identified, they are specified by means of the <relation> element within the text header (as discussed in section ??). The type attribute of this element may take any of the values listed below. Where a value is abbreviated, its expansion is given. The number in parentheses indicates the number of times this value appears in the current version of the corpus.

acquaint
acquaintance (6)
audience
(4)
aunt
(8)
aunt-i-l
aunt-in-law (1)
b-friend
boyfriend (5)
b-i-l
brother-in-law (13)
b-sitter
baby sitter (2)
brother
(53)
chairman
(8)
child
(2)
church-m
church member (1)
cl-m-i-l
common law mother-in-law (1)
client
(1)
colleagu
colleague (123)
cous-i-l
cousin-in-law (1)
cousin
(7)
customer
(3)
d-i-l
daughter-in-law (11)
daughter
(84)
doctor
(77)
employee
(4)
employer
(9)
f-i-l
father-in-law (16)
father
(73)
fiance
(1)
fiancee
(2)
friend
(123)
g-aunt
great-aunt (1)
g-daught
granddaughter (15)
g-fath
grandfather (11)
g-friend
girlfriend (5)
g-moth
grandmother (21)
g-niece
great-niece (1)
g-son
grandson (17)
gg-daugh
great-granddaughter (1)
gg-moth
great-grandmother (1)
hairdres
hairdresser (1)
host
(1)
housekee
housekeeper (1)
husband
(103)
intervee
interviewee (42)
lecturer
(4)
m-i-l
mother-in-law (21)
mother
(117)
neighbou
neighbour (13)
neph-i-l
nephew-in-law (1)
nephew
(7)
niece
(9)
parent
(5)
patient
(76)
s-daught
step-daughter (1)
s-father
step-father (1)
secretar
secretary (3)
server
(2)
sib-i-l
sibling-in-law (1)
sibling
(1)
sis-i-l
sister-in-law (12)
sister
(48)
son
(71)
son-i-l
son-in-law (18)
speaker
(9)
stranger
(13)
student
(31)
teacher
(26)
trainee
(1)
trainer
(2)
tutor
(4)
uncle
(6)
visitor
(2)
wife
(104)

Text and genre classification codes

Texts are classified in several different ways in the BNC, as described in section ??. Each text carries a number of text classification codes, specified as a string of values on the target attribute of its <catRef> element. Possible values for these codes and their significance are listed in the corpus header (see ??). These values are also used in the BNC indexing files described in section ??, and distribution tables showing the number of texts, words, and sentences classified under most of them are given above in section ??.

One of the codes listed below is also supplied for each text as the content of a <classCode> element in its text header, as an alternative way of characterising each text. Full details of the analysis scheme used and its rationale are provided in an article by David Lee (Genres, registers, text types and styles: clarifying the concepts and navigating a path through the BNC Jungle, to be published in Language Learning and Technology, vol 5 no 3, September 2001) who has also generously agreed to make some of the results of this work available with the current release of the BNC.

Table 3. Genre classification for spoken texts
code  texts
S_brdcast_discussn  54
S_brdcast_documentary  10
S_brdcast_news  12
S_classroom  59
S_consult  128
S_conv  153
S_courtroom  13
S_demonstratn  6
S_interview  13
S_interview_oral_history  119
S_lect_commerce  3
S_lect_humanities_arts  4
S_lect_nat_science  4
S_lect_polit_law_edu  7
S_lect_soc_science  13
S_meeting  132
S_parliament  6
S_pub_debate  16
S_sermon  16
S_speech_scripted  26
S_speech_unscripted  51
S_sportslive  4
S_tutorial  18
S_unclassified  44

Table 4. Genre classification for written texts
code  texts
W_ac_humanities_arts  87
W_ac_medicine  24
W_ac_nat_science  43
W_ac_polit_law_edu  187
W_ac_soc_science  142
W_ac_tech_engin  23
W_admin  12
W_advert  60
W_biography  100
W_commerce  112
W_email  7
W_essay_school  7
W_essay_univ  4
W_fict_drama  2
W_fict_poetry  31
W_fict_prose  485
W_hansard  4
W_institut_doc  43
W_instructional  15
W_letters_personal  6
W_letters_prof  11
W_misc  501
W_news_script  32
W_newsp_brdsht_nat_arts  51
W_newsp_brdsht_nat_commerce  44
W_newsp_brdsht_nat_editorial  12
W_newsp_brdsht_nat_misc  95
W_newsp_brdsht_nat_report  49
W_newsp_brdsht_nat_science  29
W_newsp_brdsht_nat_social  36
W_newsp_brdsht_nat_sports  24
W_newsp_other_arts  15
W_newsp_other_commerce  17
W_newsp_other_report  27
W_newsp_other_reportage  12
W_newsp_other_science  23
W_newsp_other_social  37
W_newsp_other_sports  9
W_newsp_tabloid  6
W_non_ac_humanities_arts  116
W_non_ac_medicine  17
W_non_ac_nat_science  62
W_non_ac_polit_law_edu  93
W_non_ac_soc_science  128
W_non_ac_tech_engin  123
W_pop_lore  211
W_religion  35
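
In Tables 3 and 4 every code for a spoken text begins with S_ and every code for a written text begins with W_. The Python sketch below relies on that observation to split the content of a <classCode> element into a medium and a genre label. The function name split_genre_code is hypothetical, and treating the prefix as a reliable indicator of medium is an inference from the two tables rather than a documented rule.

```python
# Prefixes observed in Tables 3 and 4.
MEDIUM_BY_PREFIX = {"S": "spoken", "W": "written"}


def split_genre_code(code):
    """Split a genre code such as W_fict_prose into (medium, genre).

    Raises ValueError if the code does not start with S_ or W_.
    """
    prefix, _, rest = code.partition("_")
    medium = MEDIUM_BY_PREFIX.get(prefix)
    if medium is None or not rest:
        raise ValueError("not a recognised genre code: " + code)
    return medium, rest


print(split_genre_code("S_lect_nat_science"))
print(split_genre_code("W_fict_prose"))
```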

Word class codes

A full discussion of the principles and practice underlying the CLAWS word class annotation scheme used in the BNC is provided by the document Manual to accompany The British National Corpus (Version 2) with Improved Word-class Tagging, which is distributed with the BNC World Edition in HTML format.

For convenience, a brief list of the codes used by this scheme extracted from that manual is also provided here.
Tag Description
AJ0 Adjective (general or positive) (e.g. good, old, beautiful)
AJC Comparative adjective (e.g. better, older)
AJS Superlative adjective (e.g. best, oldest)
AT0 Article (e.g. the, a, an, no)
AV0 General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
AVP Adverb particle (e.g. up, off, out)
AVQ Wh-adverb (e.g. when, where, how, why, wherever)
CJC Coordinating conjunction (e.g. and, or, but)
CJS Subordinating conjunction (e.g. although, when)
CJT The subordinating conjunction that
CRD Cardinal number (e.g. one, 3, fifty-five, 3609)
DPS Possessive determiner-pronoun (e.g. your, their, his)
DT0 General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
DTQ Wh-determiner-pronoun (e.g. which, what, whose, whichever)
EX0 Existential there, i.e. there occurring in the there is ... or there are ... construction
ITJ Interjection or other isolate (e.g. oh, yes, mhm, wow)
NN0 Common noun, neutral for number (e.g. aircraft, data, committee)
NN1 Singular common noun (e.g. pencil, goose, time, revelation)
NN2 Plural common noun (e.g. pencils, geese, times, revelations)
NP0 Proper noun (e.g. London, Michael, Mars, IBM)
ORD Ordinal numeral (e.g. first, sixth, 77th, last)
PNI Indefinite pronoun (e.g. none, everything, one [as pronoun], nobody)
PNP Personal pronoun (e.g. I, you, them, ours)
PNQ Wh-pronoun (e.g. who, whoever, whom)
PNX Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
POS The possessive or genitive marker 's or '
PRF The preposition of
PRP Preposition (except for of) (e.g. about, at, in, on, on behalf of, with)
PUL Punctuation: left bracket - i.e. ( or [
PUN Punctuation: general separating mark - i.e. . , ! , : ; - or ?
PUQ Punctuation: quotation mark - i.e. ' or "
PUR Punctuation: right bracket - i.e. ) or ]
TO0 Infinitive marker to
UNC Unclassified items which are not appropriately considered as items of the English lexicon.
VBB The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be [subjunctive or imperative]
VBD The past tense forms of the verb BE: was and were
VBG The -ing form of the verb BE: being
VBI The infinitive form of the verb BE: be
VBN The past participle form of the verb BE: been
VBZ The -s form of the verb BE: is, 's
VDB The finite base form of the verb DO: do
VDD The past tense form of the verb DO: did
VDG The -ing form of the verb DO: doing
VDI The infinitive form of the verb DO: do
VDN The past participle form of the verb DO: done
VDZ The -s form of the verb DO: does, 's
VHB The finite base form of the verb HAVE: have, 've
VHD The past tense form of the verb HAVE: had, 'd
VHG The -ing form of the verb HAVE: having
VHI The infinitive form of the verb HAVE: have
VHN The past participle form of the verb HAVE: had
VHZ The -s form of the verb HAVE: has, 's
VM0 Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
VVB The finite base form of lexical verbs (e.g. forget, send, live, return) [Including the imperative and present subjunctive]
VVD The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
VVG The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
VVI The infinitive form of lexical verbs (e.g. forget, send, live, return)
VVN The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
VVZ The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
XX0 The negative particle not or n't
ZZ0 Alphabetical symbols (e.g. A, a, B, b, c, d)

In addition to the basic 57 codes tabulated above, the BNC World Edition uses thirty ‘portmanteau’ or ‘ambiguity’ tags. These are applied wherever the probabilities assigned by the CLAWS automatic tagger to its first and second choice tags were considered too low for reliable disambiguation. So, for example, the ambiguity tag AJ0-AV0 indicates that the choice between adjective (AJ0) and adverb (AV0) is left open, although the tagger has a preference for an adjective reading. The mirror tag, AV0-AJ0, again shows adjective-adverb ambiguity, but this time the more likely reading is the adverb.

The following table lists the ambiguity codes used in BNC World:
Ambiguity tag Ambiguous between More probable tag
AJ0-AV0 AJ0 or AV0 AJ0
AJ0-NN1 AJ0 or NN1 AJ0
AJ0-VVD AJ0 or VVD AJ0
AJ0-VVG AJ0 or VVG AJ0
AJ0-VVN AJ0 or VVN AJ0
AV0-AJ0 AV0 or AJ0 AV0
AVP-PRP AVP or PRP AVP
AVQ-CJS AVQ or CJS AVQ
CJS-AVQ CJS or AVQ CJS
CJS-PRP CJS or PRP CJS
CJT-DT0 CJT or DT0 CJT
CRD-PNI CRD or PNI CRD
DT0-CJT DT0 or CJT DT0
NN1-AJ0 NN1 or AJ0 NN1
NN1-NP0 NN1 or NP0 NN1
NN1-VVB NN1 or VVB NN1
NN1-VVG NN1 or VVG NN1
NN2-VVZ NN2 or VVZ NN2
NP0-NN1 NP0 or NN1 NP0
PNI-CRD PNI or CRD PNI
PRP-AVP PRP or AVP PRP
PRP-CJS PRP or CJS PRP
VVB-NN1 VVB or NN1 VVB
VVD-AJ0 VVD or AJ0 VVD
VVD-VVN VVD or VVN VVD
VVG-AJ0 VVG or AJ0 VVG
VVG-NN1 VVG or NN1 VVG
VVN-AJ0 VVN or AJ0 VVN
VVN-VVD VVN or VVD VVN
VVZ-NN2 VVZ or NN2 VVZ
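
Since the more probable reading always comes first in an ambiguity tag, software reading the corpus can recover both readings by splitting the tag at the hyphen. The following Python sketch illustrates this; the function name split_ambiguity_tag and the convention of returning None for the second reading of an unambiguous tag are assumptions made for the example.

```python
def split_ambiguity_tag(tag):
    """Split a word class tag into (more probable, less probable).

    For a basic tag such as NN1 the second element is None; for an
    ambiguity tag such as AJ0-AV0 the first element is the reading
    preferred by the tagger.
    """
    first, separator, second = tag.partition("-")
    if not separator:
        return first, None
    return first, second


print(split_ambiguity_tag("AJ0-AV0"))
print(split_ambiguity_tag("NN1"))
```

A program that cannot handle ambiguity could, under this convention, simply keep the first element of the pair.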
