
Miscellaneous tables

This section consists of a series of supplementary tables listing the values used for some open or semi-open value lists, and documenting other aspects of the corpus and its encoding not covered by the reference information in section bnctags.

The following code tables are provided:
  • XML tag usage by text type gives a breakdown of XML tag usage by text type
  • Voice quality codes lists the most frequent values used in the corpus for the new attribute on the <shift> element, to indicate changes in voice quality for spoken texts
  • Gap descriptions lists the most frequent values used in the corpus for the desc attribute on the <gap> element, to describe material not transcribed in spoken texts
  • Event descriptions lists the most frequent values used in the corpus for the desc attribute on the <event> element, which describes non-linguistic events noted by the transcriber of a spoken text
  • Speaker relationships lists the codes used to identify the roles or relationships of participants, as specified in the role attribute on <person>
  • Text and genre classification codes lists the text-type codes making up David Lee's text classification system as applied to the BNC.
  • Contracted forms and multiwords lists all the multiword items identified by the CLAWS system and tagged as <mw> elements, together with the C5 wordclass tag assigned to each of their constituent parts
  • Simplified Wordclass Tags lists the mapping between simple POS code and CLAWS5 wordclass tags

XML tag usage by text type

Each of the 4055 texts in the BNC is categorized broadly by type (written fiction, written academic prose, spoken demographic, etc.). This table lists the usage of the various XML elements documented in this manual within the corpus, both in total and in each of the different text types. Note that elements which appear only in corpus or text headers are excluded.

In each row, the element name and its total count are followed by one percentage and (count) pair for each text type in which the element occurs; text types in which it does not occur are omitted from the row.

Total ACPROSE CONVRSN FICTION NEWS NONAC OTHERPUB OTHERSP UNPUB
align 407023 66.96% (272552) 33.03% (134471)
bibl 1036 17.85% (185) 10.90% (113) 55.59% (576) 15.54% (161) 0.09% (1)
c 13614363 14.55% (1981729) 5.03% (684858) 23.65% (3220541) 8.68% (1182536) 22.31% (3038629) 16.64% (2266554) 5.23% (712415) 3.87% (527101)
corr 17000 11.42% (1943) 0.02% (5) 7.86% (1337) 11.07% (1882) 28.34% (4819) 28.94% (4921) 0.18% (31) 12.12% (2062)
div 210145 12.04% (25308) 1.73% (3640) 3.10% (6518) 18.31% (38484) 20.35% (42778) 33.35% (70090) 0.07% (155) 11.02% (23172)
event 6943 36.85% (2559) 63.14% (4384)
gap 65159 21.16% (13790) 7.67% (4998) 0.35% (232) 1.62% (1060) 14.64% (9542) 16.74% (10911) 28.99% (18895) 8.79% (5731)
head 222085 10.71% (23797) 2.62% (5836) 22.11% (49108) 21.74% (48288) 33.44% (74283) 9.35% (20773)
hi 210508 27.84% (58613) 12.50% (26315) 0.14% (302) 31.23% (65758) 25.28% (53236) 2.98% (6284)
item 117237 27.82% (32621) 0.74% (870) 2.23% (2621) 22.93% (26893) 30.82% (36139) 15.43% (18093)
l 51310 2.59% (1333) 71.39% (36631) 0.17% (89) 13.59% (6974) 8.62% (4426) 3.61% (1857)
label 65697 43.83% (28799) 0.65% (430) 1.66% (1093) 21.27% (13976) 21.96% (14428) 10.61% (6971)
lg 3040 7.23% (220) 54.53% (1658) 0.23% (7) 21.71% (660) 11.71% (356) 4.57% (139)
list 19758 20.72% (4095) 0.71% (142) 1.63% (323) 26.41% (5220) 31.75% (6274) 18.74% (3704)
mw 792599 19.55% (155017) 3.24% (25742) 16.83% (133469) 7.74% (61366) 25.39% (201249) 16.73% (132634) 6.38% (50644) 4.09% (32478)
note 117 45.29% (53) 0.85% (1) 0.85% (1) 5.12% (6) 47.86% (56)
p 1599693 8.78% (140550) 27.13% (434019) 17.95% (287171) 18.18% (290826) 20.35% (325612) 7.59% (121515)
pause 216354 64.98% (140589) 35.01% (75765)
pb 94620 26.16% (24760) 25.60% (24224) 0.15% (148) 31.63% (29931) 14.75% (13961) 1.68% (1596)
quote 15208 40.20% (6114) 4.58% (698) 0.03% (5) 45.66% (6945) 6.14% (934) 3.36% (512)
s 6026276 11.55% (696038) 10.13% (610558) 21.96% (1323573) 8.43% (508609) 18.83% (1135264) 16.95% (1021633) 7.09% (427523) 5.02% (303078)
shift 36053 70.90% (25564) 29.09% (10489)
sp 29112 0.21% (62) 1.28% (373) 4.76% (1386) 35.69% (10391) 58.05% (16900)
speaker 23466 0.26% (62) 1.58% (373) 5.90% (1385) 44.16% (10363) 48.08% (11283)
stage 507 1.38% (7) 10.25% (52) 5.71% (29) 82.44% (418) 0.19% (1)
trunc 52674 38.69% (20382) 61.30% (32292)
u 784483 67.02% (525789) 32.97% (258694)
unclear 203045 62.39% (126686) 37.60% (76359)
vocal 43457 63.61% (27645) 36.38% (15812)
w 98363707 16.04% (15781859) 4.30% (4233962) 16.41% (16143913) 9.56% (9412174) 24.58% (24179010) 18.26% (17970212) 6.27% (6175896) 4.54% (4466681)
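
As a worked check on the percentages in this table, the following Python sketch recomputes the share of a count in a row total. It assumes, from the figures themselves rather than from any statement in this manual, that percentages are truncated (not rounded) to two decimal places; the function name share is purely illustrative.

```python
def share(count: int, total: int) -> str:
    """Return count/total as a percentage string, truncated to two decimals.

    Truncation is an assumption inferred from the table, where the
    percentages in a row often sum to 99.99% rather than 100%.
    """
    # Work in hundredths of a percent with integer arithmetic,
    # so that no floating-point rounding can creep in.
    hundredths = count * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# The <align> row: total 407023, split into 272552 and 134471.
print(share(272552, 407023))
print(share(134471, 407023))
```

Under that assumption the two calls reproduce the 66.96% and 33.03% shown for <align>.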

Voice quality codes

Changes in voice quality in spoken texts are indicated by values for the new attribute on a <shift> element, at the point where the speaker's voice changes. In all, 156 distinct values are used, but most of them appear only infrequently. The following list gives the values which appear more than 10 times in the whole corpus:

voice quality occurrences
laughing 9268
reading 2463
singing 2045
shouting 1419
whispering 1247
yawning 363
sighing 276
mimicking 241
spelling 224
crying 108
screaming 97
whining 40
whingeing 38
praying 23
reading bible 22
reading newspaper 20
reading+laughing 15
reading book 14
on telephone 11
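
A frequency list of this kind can be produced by tallying attribute values. The sketch below is a minimal Python illustration using the standard library; the XML fragment is invented for the example and is not taken from the corpus, and the assumption that a <shift> without a new attribute should simply be skipped is ours. The same approach would apply to the desc attribute on <gap> or <event>.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment for illustration only; not corpus data.
SAMPLE = """<u>
  <s><w>oh </w><shift new="laughing"/><w>no</w><shift/></s>
  <s><shift new="whispering"/><w>come </w><w>here</w><shift/></s>
  <s><shift new="laughing"/><w>yes</w></s>
</u>"""


def voice_quality_counts(root):
    """Count the values of the new attribute on every <shift> element."""
    # A <shift> with no new attribute is skipped (assumption of this sketch).
    return Counter(
        el.get("new") for el in root.iter("shift") if el.get("new")
    )


counts = voice_quality_counts(ET.fromstring(SAMPLE))
for value, n in counts.most_common():
    print(value, n)
```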

Gap descriptions

Where material is omitted for some reason during the transcription of a text, either written or spoken, the <gap> element is used to provide a brief description of the material omitted and the reason for its exclusion. The desc attribute supplies the description, and the cause attribute explains why it was done. Over 1700 distinct descriptions are used, but most of them appear only infrequently. The following list gives the 65 values which appear more than 25 times in the whole corpus:

material omitted occurrences
name 29698
formula 12476
figure 4060
address 3914
many nonRoman characters 2835
telephone number 2173
table 1393
illustration 903
photograph 338
footnote 197
references etc. 188
date 172
list of names 171
picture 144
personal name 142
advert 141
reference 133
adverts 126
list 123
period quotation 112
name and address 101
phonetic transcription 95
list of venues 95
table of contents 92
names 91
ingredients 91
publication details 84
footnotes 83
contents omitted 72
contents 65
list of ingredients 64
diagram 60
hebrew 59
address, telephone number etc. 50
tel. no 47
cover page 47
text 46
venue, dates, times, prices etc. 46
venue 46
telephone no 44
number 41
names and addresses 41
other venues 41
venues 40
form 39
personal names 38
computer code 38
gaelic 36
email address 36
author details 35
address and telephone 35
cover omitted 35
company name 34
credits 33
caption 31
dates 30
name and phone number 29
sales details 28
address, dates, times, prices etc. 28
period quotation/verse 27
notes 27
map 26
period/overseas quotation 25
quotation 25

The cause for a gap in transcription is usually self-evident, which may be why only a small number of values is used for the cause attribute. The following four values are the most significant:

reason for omission occurrences
anonymization 31924
label 303
sampling strategy 115
repeated elsewhere 6

Event descriptions

The <event> element is used in spoken texts to mark wherever some non-linguistic but significant event is noted by a transcriber. The brief texts used to describe such events are highly varied, and there are more than 1500 different values for the desc attribute which stores them. The following list shows the 60 or so values which appear more than 10 times in the whole corpus:

event description occurrences
clapping 1134
music 397
recorded jingle 330
break in recording 310
speaking french 212
pre-recorded blurb 199
too quiet to hear 180
recording ends 177
tape change 158
phonecall starts 138
phone rings 138
phonecall ends 125
tape breaks here 120
piano music 109
paper rustling 93
tv on 80
people talking 77
applause 67
dog barks 63
tape jumps 62
dog barking 55
baby talk 55
jingle 47
banging 47
classroom chatter 47
talk in background 41
people laughing 35
advert 35
door knock 33
noise - traffic 30
playing piano 30
tape ends 28
door bell 27
noise - background 26
portuguese speech 24
writing on board 24
hits ball 23
music in background 21
speaking italian 19
television 19
baby crying 18
door closing 17
bell ring 17
singing 16
crockery noise 16
noseblow 16
telephone conversation ends 16
talking from other room 16
everyone talking 15
radio on 15
noise 15
door opening 15
introduction music 15
drilling noise 14
microphone moved 14
plane overhead 13
knocking 13
phonic 13
closing music 12
noise - train 12
cat noise 12
clicks fingers 11
speaking german 11

Speaker relationships

In demographically sampled texts, the role of each speaker with respect to the respondent is supplied by the role attribute on the <person> element. The following table lists all 79 values used in the current version of the corpus, in descending order of frequency.

role name persons
unspecified 6862
other 1454
friend 654
? 354
self 306
colleague 216
daughter 102
son 100
husband 68
wife 66
mother 64
stranger 52
neighbour 50
father 42
sister 42
brother 38
mother-in-law 22
sister-in-law 22
teacher 22
acquaintance 18
brother-in-law 18
employee 14
son-in-law 14
father-in-law 12
granddaughter 12
niece 12
chairman 10
grandson 10
daughter-in-law 8
nephew 8
aunt 6
boyfriend 6
customer 6
girlfriend 6
babysitter 4
cousin 4
fiancée 4
friend's son 4
grandmother 4
lecturer 4
son's teacher 4
aunt-in-law 2
boss 2
boyfriend's father 2
boyfriend's mother 2
brother's friend 2
brother-in-law's mother 2
child's teacher 2
cousin-in-law 2
cousin-in-law's son 2
cousin-in-law's wife 2
daughter's boyfriend 2
daughter's friend 2
employee's wife 2
friend's brother 2
friend's father 2
friend's granddaughter 2
friend's mother 2
friend's sister 2
grandmother-in-law 2
hairdresser 2
hairdresser's son 2
housekeeper 2
husband's great-niece 2
husband's niece 2
neighbour's son 2
partner 2
partner's mother 2
plumber 2
sister's friend 2
sister's friend's mother 2
sister-in-law's father 2
sister-in-law's mother 2
son's friend 2
step-father 2
stepfather 2
uncle 2
visitor 2

Text and genre classification codes

Texts are classified in several different ways in the BNC, as described in section Text classification. Each text carries a number of text classification codes, specified as a string of values on the target attribute of its <catRefs> element. Possible values for these codes and their significance are listed in the corpus header (see The BNC corpus header). These values are also used in the BNC indexing files described in section Creating a subcorpus, and distribution tables showing the number of texts, words, and sentences classified under most of them are given above in section Design of the corpus.

One of the codes listed below is also supplied for each text as the content of a <classCode> element in its text header, as an alternative way of characterising each text. A description of the analysis scheme used and its rationale is provided in an article by David Lee, ‘Genres, registers, text types and styles: clarifying the concepts and navigating a path through the BNC Jungle’ (Language Learning and Technology, vol 5 no 3, September 2001; available online at http://llt.msu.edu/vol5num3/lee/default.html). The codes used in the present version of the corpus have been updated to take note of a small number of corrections made by Lee on his web site (http://clix.to/davidlee00) since publication of the article.

Table 37. Genre classification for spoken texts
code texts
S_brdcast_discussn 54
S_brdcast_documentary 10
S_brdcast_news 12
S_classroom 59
S_consult 128
S_conv 153
S_courtroom 13
S_demonstratn 6
S_interview 13
S_interview_oral_history 119
S_lect_commerce 3
S_lect_humanities_arts 4
S_lect_nat_science 4
S_lect_polit_law_edu 7
S_lect_soc_science 13
S_meeting 132
S_parliament 6
S_pub_debate 16
S_sermon 16
S_speech_scripted 26
S_speech_unscripted 51
S_sportslive 4
S_tutorial 18
S_unclassified 44

Table 38. Genre classification for written texts
code texts
W_ac_humanities_arts 87
W_ac_medicine 24
W_ac_nat_science 43
W_ac_polit_law_edu 187
W_ac_soc_science 142
W_ac_tech_engin 23
W_admin 12
W_advert 60
W_biography 100
W_commerce 112
W_email 7
W_essay_school 7
W_essay_univ 4
W_fict_drama 2
W_fict_poetry 31
W_fict_prose 485
W_hansard 4
W_institut_doc 43
W_instructional 15
W_letters_personal 6
W_letters_prof 11
W_misc 501
W_news_script 32
W_newsp_brdsht_nat_arts 51
W_newsp_brdsht_nat_commerce 44
W_newsp_brdsht_nat_editorial 12
W_newsp_brdsht_nat_misc 95
W_newsp_brdsht_nat_report 49
W_newsp_brdsht_nat_science 29
W_newsp_brdsht_nat_social 36
W_newsp_brdsht_nat_sports 24
W_newsp_other_arts 15
W_newsp_other_commerce 17
W_newsp_other_report 27
W_newsp_other_reportage 12
W_newsp_other_science 23
W_newsp_other_social 37
W_newsp_other_sports 9
W_newsp_tabloid 6
W_non_ac_humanities_arts 116
W_non_ac_medicine 17
W_non_ac_nat_science 62
W_non_ac_polit_law_edu 93
W_non_ac_soc_science 128
W_non_ac_tech_engin 123
W_pop_lore 211
W_religion 35
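
Every code in Tables 37 and 38 begins with S_ (spoken) or W_ (written). The following Python sketch separates that prefix from the rest of the code. The function name and the returned labels are illustrative assumptions; the remainder is deliberately left unsplit, because components such as non_ac or pop_lore themselves contain underscores.

```python
def split_genre_code(code: str) -> tuple[str, str]:
    """Split a genre code into its medium and the remaining genre label.

    Assumes the S_/W_ prefix convention visible in the tables above.
    """
    mediums = {"S": "spoken", "W": "written"}
    prefix, _, rest = code.partition("_")
    if prefix not in mediums or not rest:
        raise ValueError(f"not a recognised genre code: {code!r}")
    # Keep the remainder whole: its underscores do not reliably
    # mark independent components.
    return mediums[prefix], rest


print(split_genre_code("S_conv"))
print(split_genre_code("W_newsp_brdsht_nat_arts"))
```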

Contracted forms and multiwords

The following tables summarize and document the tokenization decisions taken by the CLAWS system, where these do not coincide with normal orthographic convention.

The first list specifies common word-endings or enclitics which are regarded by CLAWS as indicating the start of a new ‘word’, although words containing them are conventionally represented as a single orthographic word.

The second list specifies some common two-, three- or four-word phrases treated by CLAWS as single tokens. These are represented in this version of the corpus by means of a <mw> element; the table gives the C5 code assigned to this element, and also the codes assigned to the distinct <w> elements constituting it.

Contracted forms

Words ending with certain character strings are treated by CLAWS as distinct words, even though they are conventionally fused together when written. For example, ‘they're’ is treated as if it were two distinct ‘words’ — they and 're. The fact that these two items are orthographically fused is evident in the XML encoding of the corpus because there is no whitespace following the string ‘they’. Some XML processors may however assume that the end of an XML element such as the <w> enclosing the string should always be treated as a word separator, and may therefore introduce unwanted extra space.

In the following table we show how contracted forms are tokenized by CLAWS. The left column shows the contracted form; the right column shows the content of the two or more <w> elements used to represent it.

orthographic form tokenization
[word]'d [word] 'd
[word]'m [word] 'm
[word]'s [word] 's
[word]'ll [word] 'll
[word]n't [word] n't
[word]'re [word] 're
[word]'ve [word] 've
'd've 'd 've
'tis 't is
'twas 't was
'twere 't were
'twould 't would
I'd've I 'd 've
ain't ai n't
aint ai nt
aintcha ai nt cha
arent are nt
c'mon c'm on
can't ca n't
cannot can not
couldnt could nt
d'ya d' ya
d'you d' you
didnt did nt
doesnt does nt
dont do nt
dunnit dun n it
dunno du n no
geroff ger off
gimme gim me
gonna gon na
gotta got ta
hadnt had nt
hasnt has nt
he'd've he 'd 've
hes he s
inne in n e
innit in n it
isnt is nt
it'd've it 'd 've
lorra lor ra
m'lud m' lud
ought'a ought 'a
oughta ought a
shan't sha n't
she'd've she 'd 've
shouldn't've should n't 've
shouldn't should n't
t'other t' other
thats that s
theres there s
they'd've they 'd 've
theyve they ve
tis t is
twas t was
twere t were
twould t would
wanna wan na
wannit wann it
wasnt was nt
we'd've we 'd 've
weve we ve
won't wo n't
wotta wott a
wouldn't've would n't 've
wouldnt would nt
you'd've you 'd 've
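
The point made above about unwanted extra space can be illustrated with a short Python sketch. The fragment is invented for the example: it assumes, as the description of ‘they're’ suggests, that any whitespace separating words is carried inside the element content, so that concatenating the content unchanged recovers the orthographic form. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Invented fragment: no whitespace follows "they", so the two <w>
# elements for "they're" are orthographically fused.
SAMPLE = "<s><w>they</w><w>'re </w><w>here</w><c>.</c></s>"


def surface_text(sentence):
    """Concatenate element content exactly as encoded."""
    return "".join(sentence.itertext()).strip()


def naive_text(sentence):
    """Treat every element boundary as a word separator (the pitfall)."""
    return " ".join(
        el.text.strip() for el in sentence.iter() if el.text and el.text.strip()
    )


s = ET.fromstring(SAMPLE)
print(surface_text(s))  # they're here.
print(naive_text(s))    # they 're here .
```

The first function preserves the fused form; the second shows the spurious spaces introduced when element ends are treated as separators.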

Multiwords

CLAWS recognizes certain sequences of orthographically distinct words as constituting a single item: examples include common prepositional phrases such as ‘in spite of’, as well as phrases from other languages such as ‘aide memoire’. In this version of the corpus, such items are explicitly tagged using an XML <mw> (for multiword) tag carrying the appropriate wordclass tag, as indicated below. Within this <mw> element however, in a departure from earlier versions of the corpus, the individual words are also tagged using <w> tags in the same way as elsewhere in the corpus.

The following table lists alphabetically all multiwords recognized in the corpus, indicating both the wordclass code or codes assigned to each multiword and the wordclass codes assigned to its constituent <w> elements. Note that these latter wordclass codes were assigned automatically during the XML conversion process and therefore should not be included in any assessment of the CLAWS error rate.

multiword mw wordclass/es constituent wordclasses
ab initio AV0 or AJ0 UNC UNC
a bit AV0 AT0 NN1
a capella AJ0 or AV0 UNC UNC
according as CJS VVG CJS
according to PRP VVG PRP
ad astra AV0 or AJ0 UNC UNC
ad hoc AV0 or AJ0 UNC UNC
ad hominem AV0 or AJ0 UNC UNC
ad infinitum AV0 UNC UNC
adjacent to PRP AJ0 PRP
ad lib AJ0 or AV0 or NN1 UNC UNC
ad nauseam AV0 or AJ0 UNC UNC
affaire de coeur NN1 UNC UNC UNC
affaire d'honneur NN1 UNC UNC
a fortiori AV0 or AJ0 UNC UNC
agent provocateur NN1 UNC UNC
agnus dei NN1 UNC UNC
a good deal AV0 AT0 AJ0 NN1
a great deal AV0 AT0 AJ0 NN1
ahead of PRP AV0 PRF
a heck of a lot AV0 AT0 NN1 PRF AT0 NN1
aide de camp NN1 UNC UNC UNC
aide memoire NN1 UNC UNC
a la PRP UNC UNC
a la carte AJ0 or AV0 UNC UNC UNC
a la mode AJ0 or AV0 UNC UNC UNC
al dente AJ0 or AV0 UNC UNC
al fresco AV0 or AJ0 UNC UNC
a little AV0 AT0 AV0/DT0
a little bit AV0 AT0 AJ0 NN1
alla breve AV0 or AJ0 or NN0 UNC UNC
all but AV0 AV0 CJS
all of a sudden AV0 DT0 PRF AT0 NN1
all right AV0 or AJ0 AV0 AV0
all the same AV0 DT0 AT0 DT0
alma mater NN1 UNC UNC
along with PRP AVP PRP
a lot AV0 AT0 NN1
alter ego NN1 UNC UNC
an' all AV0 CJC DT0
an awful lot AV0 AT0 AJ0 NN1
ancien regime NN1 UNC UNC
and so forth AV0 CJC AV0 AV0
and so on AV0 CJC AV0 AV0
anno domini AV0 or NN1 UNC UNC
annus horribilis NN1 UNC UNC
annus mirabilis NN1 UNC UNC
ante meridiem AV0 UNC UNC
any longer AV0 AV0 AV0
anything but AV0 PNI AV0
apart from PRP AV0 PRP
a posteriori AV0 or AJ0 UNC UNC
a priori AV0 or AJ0 UNC UNC
a propos PRP or AV0 UNC UNC
aqua vitae NN1 UNC UNC
art nouveau NN1 UNC UNC
as against PRP CJS PRP
as between PRP CJS PRP
as for PRP CJS PRP
as from PRP CJS PRP
aside from PRP AV0 PRP
as if CJS CJS CJS
as it were AV0 CJS PNP VBD
as long as CJS AV0 AV0 CJS
as of PRP CJS PRF
as opposed to PRP CJS VVN PRP
as regards PRP CJS VVZ
as soon as CJS AV0 AV0 CJS
as though CJS CJS CJS
asti spumante NN1 UNC UNC
as to PRP CJS PRP
as usual AV0 CJS AJ0
as well as PRP AV0 AV0 CJS
as well AV0 CJS AV0
as yet AV0 CJS AV0
at all AV0 PRP DT0
at best AV0 PRP AJS
at first AV0 PRP ORD
at large AV0 PRP AJ0
at last AV0 PRP ORD
at least AV0 PRP AV0
at length AV0 PRP NN1
at long length AV0 PRP AJ0 NN1
at most AV0 PRP DT0
at once AV0 PRP AV0
at present AV0 PRP NN1
at random AV0 PRP AJ0
at worst AV0 PRP AV0
au contraire AV0 UNC UNC
au fait AJ0 UNC UNC
auf wiedersehen ITJ UNC UNC
au pair NN1 UNC UNC
au revoir ITJ UNC UNC
aurora australis NN1 UNC UNC
aurora borealis NN1 UNC UNC
avant garde NN1 or AJ0 UNC UNC
away from PRP AV0 PRP
bar mitzvah NN1 or AJ0 UNC UNC
basso profundo NN1 UNC UNC
beau monde NN1 UNC UNC
because of PRP CJS PRF
belles lettres NN2 UNC UNC
bete noire NN1 UNC UNC
billet doux NN1 UNC UNC
bona fides NN2 UNC UNC
bona fide AJ0 UNC UNC
bon appetit ITJ UNC UNC
bon mot NN1 UNC UNC
bon vivant NN1 UNC UNC
bon viveur NN1 UNC UNC
bon voyage ITJ UNC UNC
brand new AJ0 NN1 AJ0
but for PRP CJS PRP
by and by AV0 AVP CJC AVP
by and large AV0 AVP CJC AJ0
by far AV0 PRP AV0
by means of PRP PRP NN0 PRF
by no means AV0 PRP PRP NN0
by now AV0 PRP AV0
by reason of PRP PRP NN1 PRF
by the by AV0 PRP AT0 NN1
by way of PRP PRP NN1 PRF
cafe au lait NN1 UNC UNC UNC
camera obscura NN1 UNC UNC
carte blanche NN1 UNC UNC
casus belli NN1 UNC UNC
cause celebre NN1 UNC UNC
ceteris paribus AV0 UNC UNC
chaise longue NN1 UNC UNC
charge d'affaires NN1 UNC UNC
chez moi AV0 UNC UNC
chez nous AV0 UNC UNC
chilli con carne NN1 NN1 NN1 NN1
chop suey NN1 UNC UNC
chow mein NN1 UNC UNC
clamp down NN1 VVB/VVI AVP
close to AV0 AV0/AJ0 PRP
compos mentis AJ0 UNC UNC
con brio AJ0 or AV0 UNC UNC
con fuoco AJ0 or AV0 UNC UNC
con moto AJ0 or AV0 UNC UNC
considering that CJS VVG CJT
contrary to PRP JJ PRP
cordon bleu NN1 UNC UNC
cordon sanitaire NN1 UNC UNC
corpus delicti NN1 UNC UNC
corpus juris NN1 UNC UNC
coup de grace NN1 UNC UNC UNC
coup d'etat NN1 UNC UNC
coup de theatre NN1 UNC UNC UNC
creme de la creme NN1 UNC UNC UNC UNC
creme de menthe NN1 UNC UNC UNC
cri de coeur NN1 UNC UNC UNC
croix de guerre NN0 UNC UNC UNC
cul de sac NN1 UNC UNC UNC
danse macabre NN1 UNC UNC
de facto AV0 or AJ0 UNC UNC
dei gratia AV0 UNC UNC
deja vu NN1 UNC UNC
de jure AV0 or AJ0 UNC UNC
delirium tremens NN1 UNC UNC
de luxe AJ0 UNC UNC
demi monde NN1 UNC UNC
depending on PRP VVG PRP
de profundis AV0 UNC UNC
de rigeur AJ0 UNC UNC
de trop AJ0 UNC UNC
deus ex machina NN1 UNC UNC UNC
double entendre NN1 UNC UNC
dramatis personae NN2 UNC UNC
due to PRP AJ0 PRP
each other PNX DT0 NN1
eminence grise NN1 UNC UNC
en bloc AV0 UNC UNC
en famille AV0 UNC UNC
enfants terribles NN2 UNC UNC
enfant terrible NN1 UNC UNC
en masse AV0 UNC UNC
en passant AV0 UNC UNC
en route AV0 UNC UNC
en suite AJ0 UNC UNC
entente cordiale NN1 UNC UNC
esprit de corps NN1 UNC UNC UNC
et al AV0 UNC UNC
et cetera AV0 UNC UNC
even if CJS AV0 CJS
even so AV0 AV0 AV0
even though CJS AV0 CJS
even when CJS AV0 CJS
ever so AV0 AV0 AV0
every so often AV0 AT0 AV0 AV0
ex army AJ0 PRP NN1
ex cathedra AV0 or AJ0 UNC UNC
except for PRP CJS PRP
excepting for PRP VVG PRP
except that CJS CJS CJT
ex gratia AV0 or AJ0 UNC UNC
ex libris AV0 UNC UNC
ex officio AV0 or AJ0 UNC UNC
ex parte AV0 or AJ0 UNC UNC
ex tempore AV0 or AJ0 UNC UNC
fait accompli NN1 UNC UNC
far from AV0 AV0 PRP
far off AJ0 AV0 AVP
faux amis NN2 UNC UNC
faux ami NN1 UNC UNC
faux pas NN0 UNC UNC
fed up AJ0 VVN AVP
femme fatale NN1 UNC UNC
fin de siecle NN1 UNC UNC UNC
follow up NN1 VVB/VVI AVP
force majeure NN1 UNC UNC
for certain AV0 PRP AJ0
for ever AV0 PRP AV0
for example AV0 PRP NN1
for fear of PRP PRP NN1 PRF
for good AV0 PRP AJ0
for instance AV0 PRP NN1
for keeps AV0 PRP NN2
for long AV0 PRP AV0
for once AV0 PRP AV0
for sure AV0 PRP AJ0
for the most part AV0 PRP AT0 AV0 NN1
for the time being AV0 PRP AT0 NN1 VBG
fromage frais NN1 UNC UNC
from now on AV0 PRP AV0 AVP
from time to time AV0 PRP NN1 PRP NN1
getting on for AV0 VVG AVP PRP
grande dame NN1 UNC UNC
grand prix NN1 UNC UNC
grown ups NN2 VVN NN2
grown up NN1 VVN AVP
gung ho AJ0 or AV0 UNC UNC
habeas corpus NN1 UNC UNC
half way AV0 DT0 NN1
hara kiri NN1 UNC UNC
hard up AJ0 AJ0 AVP
hasta la vista ITJ UNC UNC UNC
hasta luego ITJ UNC UNC
haute couture NN1 UNC UNC
haute cuisine NN1 UNC UNC
have nots NN2 VHB NN2
hey presto ITJ ITJ ITJ
hoi polloi NN0 UNC UNC
homo sapiens NN1 UNC UNC
hors d'oeuvres NN2 UNC UNC
hors d'oeuvre NN1 UNC UNC
hysteron proteron NN1 UNC UNC
idee fixe NN1 UNC UNC
in absentia AV0 UNC UNC
in accordance with PRP PRP NN1 PRP
in accord with PRP PRP NN1 PRP
in addition AV0 PRP NN1
in addition to PRP PRP NN1 PRP
in aid of PRP PRP NN1 PRF
in answer to PRP PRP NN1 PRP
in as much as CJS PRP AV0 DT0 CJS
inasmuch as CJS UNC CJS
in association with PRP PRP NN1 PRP
in back of PRP PRP NN1 PRF
in between PRP or AV0 AVP PRP/AV0
in brief AV0 PRP AJ0
in camera AV0 UNC UNC
in case of PRP PRP NN1 PRF
in case CJS or AV0 PRP NN1
in charge of PRP PRP NN1 PRF
in common AV0 PRP AJ0
in common with PRP PRP NN1 PRP
in comparison with PRP PRP NN1 PRP
in conjunction with PRP PRP NN1 PRP
in connection with PRP PRP NN1 PRP
in consultation with PRP PRP NN1 PRP
in contact with PRP PRP NN1 PRP
in cooperation with PRP PRP NN1 PRP
in course with PRP PRP NN1 PRP
in defence of PRP PRP NN1 PRF
in defiance of PRP PRP NN1 PRF
in excess of PRP PRP NN1 PRF
in extremis AV0 UNC UNC
in face of PRP PRP NN1 PRF
in favor of PRP PRP NN1 PRF
in favour of PRP PRP NN1 PRF
in flagrante delicto AV0 or AJ0 UNC UNC UNC
in front of PRP PRP NN1 PRF
in full AV0 PRP AJ0
in general AV0 PRP AJ0
in keeping with PRP PRP NN1 PRP
in lieu of PRP PRP UNC PRF
in light of PRP PRP NN1 PRF
in line with PRP PRP NN1 PRP
in loco parentis AV0 or AJ0 UNC UNC UNC
in medias res AV0 UNC UNC UNC
in memoriam AV0 UNC UNC
in need of PRP PRP NN1 PRF
in particular AV0 PRP AJ0
in perpetuum AV0 UNC UNC
in place of PRP PRP NN1 PRF
in possession of PRP PRP NN1 PRF
in private AV0 PRP AJ0
in proportion to PRP PRP NN1 PRP
in propria persona AV0 UNC UNC UNC
in public AV0 PRP AJ0
in pursuit of PRP PRP NN1 PRF
in quest of PRP PRP NN1 PRF
in receipt of PRP PRP NN1 PRF
in regard to PRP PRP NN1 PRP
in relation to PRP PRP NN1 PRP
in reply to PRP PRP NN1 PRP
in respect of PRP PRP NN1 PRF
in response to PRP PRP NN1 PRP
in return for PRP PRP NN1 PRP
in search of PRP PRP NN1 PRF
in short AV0 PRP AJ0
inside out AV0 or AJ0 AV0 AVP
in situ AV0 UNC UNC
in so far as CJS PRP AV0 AV0 CJS
insofar as CJS UNC CJS
in spite of PRP PRP NN1 PRF
instead of PRP AV0 PRF
in support of PRP PRP NN1 PRF
inter alia AV0 UNC UNC
in terms of PRP PRP NN2 PRF
in that CJS PRP CJT
in the light of PRP PRP AT0 NN1 PRF
in the main AV0 PRP AT0 AJ0
in the order of AV0 PRP AT0 NN1 PRF
into line with PRP PRP NN1 PRP
in toto AV0 or AJ0 UNC UNC
in touch with PRP PRP NN1 PRP
in vain AV0 PRP AJ0
in view of PRP PRP NN1 PRF
in vitro AJ0 or AV0 UNC UNC
in vivo AJ0 or AV0 UNC UNC
ipso facto AV0 UNC UNC
irrespective of PRP AJ0 PRF
je ne sais quoi NN1 UNC UNC UNC UNC
joie de vivre NN1 UNC UNC UNC
just about AV0 AV0 AV0
kind of AV0 NN1 PRF
know how NN1 VVB AVQ
kung fu NN1 UNC UNC
la dolce vita NN1 UNC UNC UNC
laissez faire NN1 UNC UNC
le mot juste NN1 UNC UNC UNC
less than AV0 AV0/DT0 CJS
let alone PRP VVB AJ0
let 's VM0 VVB PNP
lingua franca NN0 UNC UNC
lo and behold ITJ ITJ CJC VVB
loc cit AV0 UNC UNC
locum tenens NN1 UNC UNC
long-term wise AV0 AJ0 AV0
magna carta NN1 UNC UNC
magna cum laude AJ0 or AV0 UNC UNC UNC
magnum opus NN1 UNC UNC
maitre d'hotel NN1 UNC UNC
mal de mer NN1 UNC UNC UNC
matter of fact NN1 or AJ0 NN1 PRF NN1
mea culpa NN1 UNC UNC
medecins sans frontieres NN0 UNC UNC UNC
medicins sans frontieres NN0 UNC UNC UNC
menage a trois NN1 UNC UNC UNC
mezzo soprano NN1 UNC UNC
modus operandi NN1 UNC UNC
modus vivendi NN1 UNC UNC
more than AV0 AV0/DT0 CJS
more than AV0 AV0 CJS
mot juste NN1 UNC UNC
nearer to PRP AJC/AV0 PRP
nearest to PRP AJS/AV0 PRP
near to PRP AJ0/AV0 PRP
nem con AV0 UNC UNC
next to PRP ORD PRP
nigh on AV0 AV0 AVP
noblesse oblige NN1 UNC UNC
no doubt AV0 AT0 NN1
no longer AV0 AV0 AV0
no matter who PNQ AT0 NN1 PNQ
no matter whom PNQ AT0 NN1 PNQ
no matter whose DTQ AT0 NN1 DTQ
nom de guerre NN1 UNC UNC UNC
nom de plume NN1 UNC UNC UNC
non compos mentis AJ0 UNC UNC UNC
none other PNI PNI AJ0
none the less AV0 PNI AT0 AV0
none the AV0 PNI AT0
non sequitur NN1 UNC UNC
no one PNI AT0 PNI
not withstanding AV0 XX0 UNC
nouveau riche NN1 UNC UNC
nouveaux riches NN2 UNC UNC
nouvelle cuisine NN1 UNC UNC
now that CJS AV0 CJT
objet d'art NN1 UNC UNC
objets d'art NN2 UNC UNC
of course AV0 PRF NN1
off guard AV0 PRP NN1
off of PRP AVP PRF
oft times AV0 AV0 NN2
old fashioned AJ0 AJ0 VVN
on account of PRP PRP NN1 PRF
on behalf of PRP PRP NN1 PRF
on board PRP or AV0 PRP NN1
once again AV0 AV0 AV0
once and for all AV0 AV0 CJC PRP DT0
once more AV0 AV0 AV0
one another PNX CRD DT0
one 's CRD CRD POS
on the part of PRP PRP AT0 NN1 PRF
on top of PRP PRP NN1 PRF
on to PRP AVP PRP/TO0
op cit AV0 UNC UNC
other than PRP AJ0 CJS
out of PRP AVP PRF
out of date AJ0 AVP PRF NN1
out of line with PRP AVP PRF NN1 PRP
out of touch with PRP AVP PRF NN1 PRP
outside of PRP PRP PRF
over here AV0 PRP AV0
over there AV0 PRP AV0
owing to PRP VVG PRP
papier mache NN1 UNC UNC
par excellence AJ0 UNC UNC
pas de deux NN0 UNC UNC UNC
pate de foie gras NN1 UNC UNC UNC UNC
pax britannica NN1 UNC UNC
pax romana NN1 UNC UNC
per annum AV0 UNC UNC
per capita AV0 or AJ0 UNC UNC
per cent NN0 UNC UNC
per diem AV0 or AJ0 or NN1 UNC UNC
per se AV0 UNC UNC
personae non gratae NN2 UNC UNC UNC
persona non grata NN1 UNC UNC UNC
pertaining to PRP VVG PRP
petit bourgeois NN1 UNC UNC
petite bougeoisie NN1 UNC UNC
petits bourgeois NN2 UNC UNC
piece de resistance NN1 UNC UNC UNC
pied a terre NN1 UNC UNC UNC
pina colada NN1 UNC UNC
pince nez NN0 UNC UNC
poco a poco AV0 UNC UNC UNC
point blank AV0 or AJ0 NN1 AJ0
poste restante NN1 or AV0 UNC UNC
post hoc AV0 or AJ0 UNC UNC
post meridiem AV0 UNC UNC
post mortem NN1 or AJ0 UNC UNC
pot pourri NN1 UNC UNC
prima donna NN1 UNC UNC
prima facie AJ0 or AV0 UNC UNC
primus inter pares NN1 UNC AJ0 UNC
prior to PRP AJ0 PRP
pro forma NN1 UNC UNC
pro rata AV0 or AJ0 UNC UNC
pro tem AV0 UNC UNC
provided that CJS VVN CJT
providing that CJS VVG CJT
pursuant to PRP AJ0 PRP
quid pro quo NN1 UNC UNC UNC
raison d'etre NN1 UNC UNC
rather than PRP AV0 CJS
relative to PRP AJ0 PRP
rigor mortis NN1 UNC UNC
roman a clef NN1 UNC UNC UNC
save for PRP VVI PRP
save that CJS VVI CJT
savoir faire NN1 UNC UNC
savoir vivre NN1 UNC UNC
seeing as CJS VVG CJS
seeing that CJS VVG CJT
semper fidelis AJ0 UNC UNC
shish kebab NN1 UNC UNC
sine die AV0 UNC UNC
sine qua non NN1 UNC UNC UNC
sinn fein NN1 UNC UNC
so called AJ0 AV0 VVN
so long as CJS AV0 AV0 CJS
some one PNI DT0 PNI
something like AV0 PNI PRP
so much as AV0 AV0 DT0 CJS
son et lumiere NN1 UNC UNC UNC
sort of AV0 NN1 PRF
so that CJS AV0 CJT
sotto voce AV0 or AJ0 UNC UNC
spaghetti bolognese NN1 UNC UNC
spot on AV0 or AJ0 NN1 AVP
status quo NN1 UNC UNC
straight forward AJ0 AV0 AJ0
subject to PRP AJ0 PRP
sub judice AV0 or AJ0 UNC UNC
sub poena NN1 UNC UNC
subsequent to PRP AJ0 PRP
such as PRP DT0 PRP
such that CJS DT0 CJT
sui generis AJ0 UNC UNC
sui juris AJ0 UNC UNC
summa cum laude AJ0 or AV0 UNC UNC UNC
super duper AJ0 AJ0 XXX
supposing that CJS VVG PRP
table d'hote NN1 UNC UNC
tabula rasa NN1 UNC UNC
tai chi NN1 UNC UNC
tai kwan do NN1 UNC UNC UNC
terra firma NN1 UNC UNC
terra incognita NN1 UNC UNC
thanks to PRP NN2 PRP
that is AV0 DT0 VBZ
that is to say AV0 DT0 VBZ TO0 VVI
through thick and thin AV0 PRP AJ0 CJC AJ0
time and again AV0 NN1 CJC AV0
to and fro AV0 PRP CJC AV0
tour de force NN1 UNC UNC UNC
tout court AJ0 UNC UNC
tout de suite AV0 UNC UNC UNC
ultra vires AJ0 or AV0 UNC UNC
under way AV0 PRP NN1
up front AJ0 or AV0 AVP AJ0
upside down AV0 or AJ0 NN1 AVP
up to PRP or AV0 AVP PRP/TO0
up to date AJ0 AVP TO0 NN1
up to the minute AJ0 AVP PRP AT0 NN1
up until PRP AVP CJS/PRP
upward of AV0 AV0 PRF
upwards of AV0 AV0 PRF
vice versa AV0 UNC UNC
vin de table NN1 UNC UNC UNC
vis a vis PRP UNC UNC UNC
viva voce NN1 or AJ0 or AV0 UNC UNC
vol au vent NN1 UNC UNC UNC
volte face NN1 UNC UNC
vox populi NN1 UNC UNC
well off AJ0 AV0 AVP
whether or not CJS CJS CJC XX0
wiener schnitzel NN1 UNC UNC
with a view to PRP PRP AT0 NN1 PRP
with reference to PRP PRP NN1 PRP
with regard to PRP PRP NN1 PRP
with relation to PRP PRP NN1 PRP
with respect to PRP PRP NN1 PRP
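
The nesting of <w> elements inside <mw> described in this section can be read with a few lines of Python. This is a sketch under stated assumptions: the fragment is invented, and the attribute name c5 for the wordclass code is assumed for illustration rather than confirmed by this section.

```python
import xml.etree.ElementTree as ET

# Invented fragment; the c5 attribute name is an assumption.
SAMPLE = (
    '<s><w c5="AV0">even </w>'
    '<mw c5="PRP"><w c5="PRP">in </w><w c5="NN1">spite </w>'
    '<w c5="PRF">of </w></mw>'
    '<w c5="DT0">this</w></s>'
)


def multiwords(root):
    """Yield (text, mw wordclass, constituent wordclasses) per <mw>."""
    for mw in root.iter("mw"):
        parts = mw.findall("w")
        text = "".join(w.text or "" for w in parts).strip()
        yield text, mw.get("c5"), [w.get("c5") for w in parts]


found = list(multiwords(ET.fromstring(SAMPLE)))
for text, mw_class, part_classes in found:
    print(text, mw_class, " ".join(part_classes))
```

For the sample, the single result corresponds to the ‘in spite of’ row of the table: PRP for the multiword, and PRP NN1 PRF for its parts.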

Simplified Wordclass Tags

This table lists, for each of the eleven simplified wordclass tags used by the pos attribute, the corresponding CLAWS C5 tags of which the class consists.

POS value significance combines
ADJ adjective AJ0, AJC, AJS, CRD, DT0, ORD
ADV adverb AV0, AVP, AVQ, XX0
ART article AT0
CONJ conjunction CJC, CJS, CJT
INTERJ interjection ITJ
PREP preposition PRF, PRP, TO0
PRON pronoun DPS, DTQ, EX0, PNI, PNP, PNQ, PNX
STOP punctuation POS, PUL, PUN, PUQ, PUR
SUBST substantive NN0, NN1, NN2, NP0, ONE, ZZ0, NN1-NP0, NP0-NN1
UNC unclassified, uncertain, or non-lexical word UNC, AJ0-AV0, AV0-AJ0, AJ0-NN1, NN1-AJ0, AJ0-VVD, VVD-AJ0, AJ0-VVG, VVG-AJ0, AJ0-VVN, VVN-AJ0, AVP-PRP, PRP-AVP, AVQ-CJS, CJS-AVQ, CJS-PRP, PRP-CJS, CJT-DT0, DT0-CJT, CRD-PNI, PNI-CRD, NN1-VVB, VVB-NN1, NN1-VVG, VVG-NN1, NN2-VVZ, VVZ-NN2
VERB verb VBB, VBD, VBG, VBI, VBN, VBZ, VDB, VDD, VDG, VDI, VDN, VDZ, VHB, VHD, VHG, VHI, VHN, VHZ, VM0, VVB, VVD, VVG, VVI, VVN, VVZ, VVD-VVN, VVN-VVD
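
The mapping in this table can be inverted into a lookup from C5 tag to simplified tag. The Python sketch below transcribes the table; the names SIMPLIFIED and C5_TO_POS are illustrative and not part of the corpus or its software.

```python
# Simplified POS value -> the C5 tags it combines, transcribed from the table.
SIMPLIFIED = {
    "ADJ": "AJ0 AJC AJS CRD DT0 ORD",
    "ADV": "AV0 AVP AVQ XX0",
    "ART": "AT0",
    "CONJ": "CJC CJS CJT",
    "INTERJ": "ITJ",
    "PREP": "PRF PRP TO0",
    "PRON": "DPS DTQ EX0 PNI PNP PNQ PNX",
    "STOP": "POS PUL PUN PUQ PUR",
    "SUBST": "NN0 NN1 NN2 NP0 ONE ZZ0 NN1-NP0 NP0-NN1",
    "UNC": (
        "UNC AJ0-AV0 AV0-AJ0 AJ0-NN1 NN1-AJ0 AJ0-VVD VVD-AJ0 AJ0-VVG "
        "VVG-AJ0 AJ0-VVN VVN-AJ0 AVP-PRP PRP-AVP AVQ-CJS CJS-AVQ CJS-PRP "
        "PRP-CJS CJT-DT0 DT0-CJT CRD-PNI PNI-CRD NN1-VVB VVB-NN1 NN1-VVG "
        "VVG-NN1 NN2-VVZ VVZ-NN2"
    ),
    "VERB": (
        "VBB VBD VBG VBI VBN VBZ VDB VDD VDG VDI VDN VDZ VHB VHD VHG VHI "
        "VHN VHZ VM0 VVB VVD VVG VVI VVN VVZ VVD-VVN VVN-VVD"
    ),
}

# Invert: each C5 tag (including ambiguity tags) maps to exactly one class.
C5_TO_POS = {
    c5: pos for pos, tags in SIMPLIFIED.items() for c5 in tags.split()
}

print(C5_TO_POS["NN1"])      # SUBST
print(C5_TO_POS["AJ0-NN1"])  # UNC
```

Note that most ambiguity tags fall under UNC, while NN1-NP0, NP0-NN1, VVD-VVN and VVN-VVD stay within SUBST and VERB respectively, since both alternatives belong to the same simplified class.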
