XML tag usage by text type

Each of the 4049 texts in the BNC is categorized broadly by type (written fiction, written academic prose, spoken demographic, etc.). This table lists the usage within the corpus of the various XML elements documented in this manual, both in total and in each of the different text types. Each cell gives the percentage of the element's total occurrences followed by the count; -- marks a text type in which the element does not occur. Note that elements which appear only in corpus or text headers are excluded.

Tag usage by Text Type

Element | Total | Academic writing | Published fiction | News and journalism | Published non-fiction | Other published writing | Unpublished writing | Conversation | Other spoken
align | 407023 | -- | -- | -- | -- | -- | -- | 66.96% 272552 | 33.03% 134471
bibl | 1036 | 17.85% 185 | 10.90% 113 | -- | 55.59% 576 | 15.54% 161 | 0.09% 1 | -- | --
c | 13614363 | 14.55% 1981729 | 23.65% 3220541 | 8.68% 1182536 | 22.31% 3038629 | 16.64% 2266554 | 3.87% 527101 | 5.03% 684858 | 5.23% 712415
corr | 17000 | 11.42% 1943 | 7.86% 1337 | 11.07% 1882 | 28.34% 4819 | 28.94% 4921 | 12.12% 2062 | 0.02% 5 | 0.18% 31
div | 210145 | 12.04% 25308 | 3.10% 6518 | 18.31% 38484 | 20.35% 42778 | 33.35% 70090 | 11.02% 23172 | 1.73% 3640 | 0.07% 155
event | 6943 | -- | -- | -- | -- | -- | -- | 36.85% 2559 | 63.14% 4384
gap | 65159 | 21.16% 13790 | 0.35% 232 | 1.62% 1060 | 14.64% 9542 | 16.74% 10911 | 8.79% 5731 | 7.67% 4998 | 28.99% 18895
head | 222085 | 10.71% 23797 | 2.62% 5836 | 22.11% 49108 | 21.74% 48288 | 33.44% 74283 | 9.35% 20773 | -- | --
hi | 210508 | 27.84% 58613 | 12.50% 26315 | 0.14% 302 | 31.23% 65758 | 25.28% 53236 | 2.98% 6284 | -- | --
item | 117237 | 27.82% 32621 | 0.74% 870 | 2.23% 2621 | 22.93% 26893 | 30.82% 36139 | 15.43% 18093 | -- | --
l | 51310 | 2.59% 1333 | 71.39% 36631 | 0.17% 89 | 13.59% 6974 | 8.62% 4426 | 3.61% 1857 | -- | --
label | 65697 | 43.83% 28799 | 0.65% 430 | 1.66% 1093 | 21.27% 13976 | 21.96% 14428 | 10.61% 6971 | -- | --
lg | 3040 | 7.23% 220 | 54.53% 1658 | 0.23% 7 | 21.71% 660 | 11.71% 356 | 4.57% 139 | -- | --
list | 19758 | 20.72% 4095 | 0.71% 142 | 1.63% 323 | 26.41% 5220 | 31.75% 6274 | 18.74% 3704 | -- | --
mw | 792599 | 19.55% 155017 | 16.83% 133469 | 7.74% 61366 | 25.39% 201249 | 16.73% 132634 | 4.09% 32478 | 3.24% 25742 | 6.38% 50644
note | 117 | 45.29% 53 | 0.85% 1 | 0.85% 1 | 5.12% 6 | 47.86% 56 | -- | -- | --
p | 1599693 | 8.78% 140550 | 27.13% 434019 | 17.95% 287171 | 18.18% 290826 | 20.35% 325612 | 7.59% 121515 | -- | --
pause | 216354 | -- | -- | -- | -- | -- | -- | 64.98% 140589 | 35.01% 75765
pb | 94620 | 26.16% 24760 | 25.60% 24224 | 0.15% 148 | 31.63% 29931 | 14.75% 13961 | 1.68% 1596 | -- | --
quote | 15208 | 40.20% 6114 | 4.58% 698 | 0.03% 5 | 45.66% 6945 | 6.14% 934 | 3.36% 512 | -- | --
s | 6026276 | 11.55% 696038 | 21.96% 1323573 | 8.43% 508609 | 18.83% 1135264 | 16.95% 1021633 | 5.02% 303078 | 10.13% 610558 | 7.09% 427523
shift | 36053 | -- | -- | -- | -- | -- | -- | 70.90% 25564 | 29.09% 10489
sp | 29112 | 0.21% 62 | 1.28% 373 | -- | 4.76% 1386 | 35.69% 10391 | 58.05% 16900 | -- | --
speaker | 23466 | 0.26% 62 | 1.58% 373 | -- | 5.90% 1385 | 44.16% 10363 | 48.08% 11283 | -- | --
stage | 507 | 1.38% 7 | 10.25% 52 | -- | 5.71% 29 | 82.44% 418 | 0.19% 1 | -- | --
trunc | 52674 | -- | -- | -- | -- | -- | -- | 38.69% 20382 | 61.30% 32292
u | 784483 | -- | -- | -- | -- | -- | -- | 67.02% 525789 | 32.97% 258694
unclear | 203045 | -- | -- | -- | -- | -- | -- | 62.39% 126686 | 37.60% 76359
vocal | 43457 | -- | -- | -- | -- | -- | -- | 63.61% 27645 | 36.38% 15812
w | 98363707 | 16.04% 15781859 | 16.41% 16143913 | 9.56% 9412174 | 24.58% 24179010 | 18.26% 17970212 | 4.54% 4466681 | 4.30% 4233962 | 6.27% 6175896

Voice quality codes

Changes in voice quality in spoken texts are indicated by values for the new attribute on a shift element, at the point where the speaker's voice changes. 156 distinct values are used, but most of them appear only infrequently. The following list gives the values which appear more than 10 times in the whole corpus:

voice quality | number
laughing | 9268
reading | 2463
singing | 2045
shouting | 1419
whispering | 1247
yawning | 363
sighing | 276
mimicking | 241
spelling | 224
crying | 108
screaming | 97
whining | 40
whingeing | 38
praying | 23
reading bible | 22
reading newspaper | 20
reading+laughing | 15
reading book | 14
on telephone | 11

Gap descriptions

Where material is omitted for some reason during the transcription of a text, either written or spoken, the gap element is used to provide a brief description of the material omitted and the reason for its exclusion. The desc attribute supplies the description, and the cause attribute explains why the material was omitted. Over 1700 distinct descriptions are used, but most of them appear only infrequently.
The following list gives the 65 values which appear more than 25 times in the whole corpus:

material omitted | number
name | 29698
formula | 12476
figure | 4060
address | 3914
many nonRoman characters | 2835
telephone number | 2173
table | 1393
illustration | 903
photograph | 338
footnote | 197
references etc. | 188
date | 172
list of names | 171
picture | 144
personal name | 142
advert | 141
reference | 133
adverts | 126
list | 123
period quotation | 112
name and address | 101
phonetic transcription | 95
list of venues | 95
table of contents | 92
names | 91
ingredients | 91
publication details | 84
footnotes | 83
contents omitted | 72
contents | 65
list of ingredients | 64
diagram | 60
hebrew | 59
address, telephone number etc. | 50
tel. no | 47
cover page | 47
text | 46
venue, dates, times, prices etc. | 46
venue | 46
telephone no | 44
number | 41
names and addresses | 41
other venues | 41
venues | 40
form | 39
personal names | 38
computer code | 38
gaelic | 36
email address | 36
author details | 35
address and telephone | 35
cover omitted | 35
company name | 34
credits | 33
caption | 31
dates | 30
name and phone number | 29
sales details | 28
address, dates, times, prices etc. | 28
period quotation/verse | 27
notes | 27
map | 26
period/overseas quotation | 25
quotation | 25

The cause for a gap in transcription is usually self-evident, which may be why only a small number of values is used for the cause attribute. The following four values are the most significant:

reason for omission | number
anonymization | 31924
label | 303
sampling strategy | 115
repeated elsewhere | 6

Event descriptions

The event element is used in spoken texts to mark wherever some non-linguistic but significant event is noted by a transcriber. The brief descriptions used for such events vary widely, and there are more than 1500 different values for the desc attribute which stores them.
The following list shows the 60 or so values which appear more than 10 times in the whole corpus:

Event description | number
clapping | 1134
music | 397
recorded jingle | 330
break in recording | 310
speaking french | 212
pre-recorded blurb | 199
too quiet to hear | 180
recording ends | 177
tape change | 158
phonecall starts | 138
phone rings | 138
phonecall ends | 125
tape breaks here | 120
piano music | 109
paper rustling | 93
tv on | 80
people talking | 77
applause | 67
dog barks | 63
tape jumps | 62
dog barking | 55
baby talk | 55
jingle | 47
banging | 47
classroom chatter | 47
talk in background | 41
people laughing | 35
advert | 35
door knock | 33
noise - traffic | 30
playing piano | 30
tape ends | 28
door bell | 27
noise - background | 26
portuguese speech | 24
writing on board | 24
hits ball | 23
music in background | 21
speaking italian | 19
television | 19
baby crying | 18
door closing | 17
bell ring | 17
singing | 16
crockery noise | 16
noseblow | 16
telephone conversation ends | 16
talking from other room | 16
everyone talking | 15
radio on | 15
noise | 15
door opening | 15
introduction music | 15
drilling noise | 14
microphone moved | 14
plane overhead | 13
knocking | 13
phonic | 13
closing music | 12
noise - train | 12
cat noise | 12
clicks fingers | 11
speaking german | 11

Speaker relationships

In demographically sampled texts, the role of each speaker with respect to the respondent is supplied by the role attribute on the person element. The following table lists all 79 values used in the current version of the corpus in descending frequency order.
role name | persons
unspecified | 6862
other | 1454
friend | 654
? | 354
self | 306
colleague | 216
daughter | 102
son | 100
husband | 68
wife | 66
mother | 64
stranger | 52
neighbour | 50
father | 42
sister | 42
brother | 38
mother-in-law | 22
sister-in-law | 22
teacher | 22
acquaintance | 18
brother-in-law | 18
employee | 14
son-in-law | 14
father-in-law | 12
granddaughter | 12
niece | 12
chairman | 10
grandson | 10
daughter-in-law | 8
nephew | 8
aunt | 6
boyfriend | 6
customer | 6
girlfriend | 6
babysitter | 4
cousin | 4
fiancé | 4
friend's son | 4
grandmother | 4
lecturer | 4
son's teacher | 4
aunt-in-law | 2
boss | 2
boyfriend's father | 2
boyfriend's mother | 2
brother's friend | 2
brother-in-law's mother | 2
child's teacher | 2
cousin-in-law | 2
cousin-in-law's son | 2
cousin-in-law's wife | 2
daughter's boyfriend | 2
daughter's friend | 2
employee's wife | 2
friend's brother | 2
friend's father | 2
friend's granddaughter | 2
friend's mother | 2
friend's sister | 2
grandmother-in-law | 2
hairdresser | 2
hairdresser's son | 2
housekeeper | 2
husband's great-niece | 2
husband's niece | 2
neighbour's son | 2
partner | 2
partner's mother | 2
plumber | 2
sister's friend | 2
sister's friend's mother | 2
sister-in-law's father | 2
sister-in-law's mother | 2
son's friend | 2
step-father | 2
stepfather | 2
uncle | 2
visitor | 2

Text and genre classification codes

Texts are classified in several different ways in the BNC, as described in section . Each text carries a number of text classification codes, specified as a string of values on the target attribute of its catRefs element. Each code identifies one of the values in one of the 23 taxonomy elements provided in the BNC Header, corresponding with the design criteria outlined in . Possible values for these codes and brief explanations of their meanings are listed in the corpus header. Distribution tables showing the number of texts, words, and sentences classified under most of them are given above in section and elsewhere in the current section. One of the codes listed below is also supplied for each text as the content of a classCode element in its text header, as an alternative way of characterising each text.
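As a rough illustration of how the two kinds of classification described above might be read from a text header, the following Python sketch parses a small invented fragment. Several things here are assumptions rather than facts about the corpus: the wrapper element `header`, the placeholder codes `CODE_A`, `CODE_B` and `CODE_C`, and the idea that the string of values is whitespace-separated. The element and attribute names are taken from the wording of this section and are passed as parameters so that they can be changed if the actual headers differ.

```python
import xml.etree.ElementTree as ET

# Invented header fragment for illustration only. The wrapper element and
# the CODE_* values are hypothetical placeholders, not real BNC codes.
HEADER_FRAGMENT = """
<header>
  <catRefs target="CODE_A CODE_B CODE_C"/>
  <classCode>W ac:medicine</classCode>
</header>
"""


def classification(header, ref_element="catRefs", ref_attribute="target"):
    """Return (list of classification codes, genre code) for one header.

    Assumes the codes are separated by whitespace within the attribute.
    """
    ref = header.find(f".//{ref_element}")
    codes = ref.get(ref_attribute, "").split() if ref is not None else []
    class_code = header.findtext(".//classCode")
    return codes, class_code


def split_genre(class_code):
    """Split a genre code such as 'W ac:medicine' at its first space."""
    prefix, _, genre = class_code.partition(" ")
    return prefix, genre


header = ET.fromstring(HEADER_FRAGMENT)
codes, class_code = classification(header)
print(codes)
print(split_genre(class_code))
```

Keeping the element and attribute names as parameters is a deliberate choice: the sketch can then be pointed at a real header without editing the parsing logic.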
A description of the analysis scheme used and its rationale is provided in Lee 2001. The codes used in the present version of the corpus have been updated to take note of a small number of corrections made by Lee on his web site () since publication of that article.

Genre classification for spoken texts

Classification | Number of texts | W-units | % | S-units | %
S brdcast discussn | 53 | 761595 | 0.77 | 41144 | 0.68
S brdcast documentary | 10 | 41893 | 0.04 | 2369 | 0.03
S brdcast news | 12 | 263255 | 0.26 | 12454 | 0.20
S classroom | 58 | 433646 | 0.44 | 51355 | 0.85
S consult | 128 | 139320 | 0.14 | 20698 | 0.34
S conv | 153 | 4233955 | 4.30 | 610557 | 10.13
S courtroom | 13 | 129067 | 0.13 | 6366 | 0.10
S demonstratn | 6 | 32062 | 0.03 | 2175 | 0.03
S interview | 13 | 125096 | 0.12 | 11809 | 0.19
S interview oral history | 119 | 822489 | 0.83 | 57831 | 0.95
S lect commerce | 3 | 15233 | 0.01 | 406 | 0.00
S lect humanities arts | 4 | 51510 | 0.05 | 2639 | 0.04
S lect nat science | 4 | 22938 | 0.02 | 1019 | 0.01
S lect polit law edu | 7 | 51407 | 0.05 | 1670 | 0.02
S lect soc science | 13 | 162030 | 0.16 | 8136 | 0.13
S meeting | 132 | 1391207 | 1.41 | 103266 | 1.71
S parliament | 6 | 97289 | 0.09 | 2609 | 0.04
S pub debate | 16 | 287062 | 0.29 | 13347 | 0.22
S sermon | 16 | 82775 | 0.08 | 3345 | 0.05
S speech scripted | 25 | 193020 | 0.19 | 9571 | 0.15
S speech unscripted | 51 | 469492 | 0.47 | 33121 | 0.54
S sportslive | 4 | 33630 | 0.03 | 1867 | 0.03
S tutorial | 18 | 144783 | 0.14 | 8814 | 0.14
S unclassified | 44 | 425097 | 0.43 | 31512 | 0.52
W ac:humanities arts | 87 | 3358167 | 3.41 | 130167 | 2.15
W ac:medicine | 24 | 1435608 | 1.45 | 66811 | 1.10
W ac:nat science | 43 | 1122939 | 1.14 | 51203 | 0.84
W ac:polit law edu | 186 | 4703304 | 4.78 | 190220 | 3.15
W ac:soc science | 142 | 4785423 | 4.8 | 307998 | 5.12
W ac:tech engin | 23 | 692141 | 0.70 | 34982 | 0.58
W admin | 12 | 222803 | 0.22 | 14045 | 0.23
W advert | 59 | 553625 | 0.56 | 42147 | 0.69
W biography | 100 | 3556688 | 3.61 | 172615 | 2.86
W commerce | 112 | 3807494 | 3.87 | 187127 | 3.10
W email | 7 | 214022 | 0.21 | 17411 | 0.28
W essay school | 7 | 147736 | 0.15 | 7871 | 0.13
W essay univ | 3 | 56273 | 0.05 | 2905 | 0.04
W fict drama | 2 | 46094 | 0.04 | 4932 | 0.08
W fict poetry | 30 | 223682 | 0.22 | 38137 | 0.63
W fict prose | 431 | 16033647 | 16.30 | 1293211 | 21.45
W hansard | 4 | 1168362 | 1.18 | 63234 | 1.04
W institut doc | 43 | 552124 | 0.56 | 30159 | 0.50
W instructional | 15 | 440548 | 0.44 | 27875 | 0.46
W letters personal | 6 | 52915 | 0.05 | 2583 | 0.04
W letters prof | 11 | 66591 | 0.06 | 4800 | 0.07
W misc | 502 | 9237504 | 9.39 | 521286 | 8.65
W news script | 32 | 1248609 | 1.26 | 102937 | 1.70
W newsp brdsht nat: arts | 51 | 352137 | 0.35 | 16991 | 0.28
W newsp brdsht nat: commerce | 44 | 430075 | 0.43 | 21103 | 0.35
W newsp brdsht nat: editorial | 12 | 102718 | 0.10 | 5150 | 0.08
W newsp brdsht nat: misc | 95 | 1040943 | 1.05 | 51455 | 0.85
W newsp brdsht nat: report | 49 | 668613 | 0.67 | 32079 | 0.53
W newsp brdsht nat: science | 29 | 65880 | 0.06 | 3390 | 0.05
W newsp brdsht nat: social | 36 | 82605 | 0.08 | 4388 | 0.07
W newsp brdsht nat: sports | 24 | 300033 | 0.30 | 14679 | 0.24
W newsp other: arts | 15 | 240877 | 0.24 | 13005 | 0.21
W newsp other: commerce | 17 | 419996 | 0.42 | 20506 | 0.34
W newsp other: report | 39 | 2735074 | 2.78 | 150676 | 2.50
W newsp other: science | 23 | 55319 | 0.05 | 2798 | 0.04
W newsp other: social | 37 | 1151490 | 1.17 | 63846 | 1.05
W newsp other: sports | 9 | 1033352 | 1.05 | 56802 | 0.94
W newsp tabloid | 6 | 733066 | 0.74 | 51744 | 0.85
W nonAc: humanities arts | 110 | 3744321 | 3.80 | 155839 | 2.58
W nonAc: medicine | 17 | 504610 | 0.51 | 27156 | 0.45
W nonAc: nat science | 62 | 2533635 | 2.57 | 120610 | 2.00
W nonAc: polit law edu | 93 | 4521040 | 4.59 | 208785 | 3.46
W nonAc: soc science | 123 | 3708033 | 3.76 | 183588 | 3.04
W nonAc: tech engin | 123 | 1220026 | 1.24 | 52750 | 0.87
W pop lore | 211 | 7450814 | 7.57 | 425448 | 7.05
W religion | 35 | 1132976 | 1.15 | 60334 | 1.00

Contracted forms and multiwords

The following tables summarize and document the tokenization decisions taken by the CLAWS system, where these do not coincide with normal orthographic convention. The first list specifies common word-endings or enclitics which are regarded by CLAWS as indicating the start of a new word, although words containing them are conventionally represented as a single orthographic word. The second list specifies some common two, three or four word phrases treated by CLAWS as single tokens. These are represented in this version of the corpus by means of a mw element; the table gives the C5 code assigned to this element, and also the codes assigned to the distinct w elements constituting it.

Contracted forms

Words ending with certain character strings are treated by CLAWS as two or more distinct words, even though they are conventionally fused together when written.
For example, they're is treated as if it were two distinct words: they and 're. The fact that these two items are orthographically fused is evident in the XML encoding of the corpus because there is no whitespace following the string they. Some XML processors may however assume that the end of an XML element such as the w enclosing the string should always be treated as a word separator, and may therefore introduce unwanted extra space. In the following table we show how contracted forms are tokenized by CLAWS. The left column shows the contracted form; the right column shows the content of the two or more w elements used to represent it.

orthographic form | tokenization
[word]'d | [word] 'd
[word]'m | [word] 'm
[word]'s | [word] 's
[word]'ll | [word] 'll
[word]n't | [word] n't
[word]'re | [word] 're
[word]'v | [word] 'v
[word]'d've | [word] 'd 've
'tis | 't is
'twas | 't was
'twere | 't were
'twould | 't would
ain't | ai n't
aint | ai nt
aintcha | ai nt cha
arent | are nt
c'mon | c'm on
can't | ca n't
cannot | can not
couldnt | could nt
d'ya | d' ya
d'you | d' you
didnt | did nt
doesnt | does nt
dont | do nt
dunnit | dun n it
dunno | du n no
geroff | ger off
gimme | gim me
gonna | gon na
gotta | got ta
hadnt | had nt
hasnt | has nt
hes | he s
inne | in n e
innit | in n it
isnt | is nt
lorra | lor ra
m'lud | m' lud
ought'a | ough t 'a
oughta | ought a
shan't | sha n't
shouldn't've | should n't 've
shouldn't | should n't
t'other | t' other
thats | that s
theres | there s
theyve | they ve
tis | t is
twas | t was
twere | t were
twould | t would
wanna | wan na
wannit | wann it
wasnt | was nt
weve | we ve
won't | wo n't
wotta | wott a
wouldn't've | would n't 've
wouldnt | would nt

Multiwords

CLAWS recognizes certain sequences of orthographically distinct words as constituting a single item: examples include common prepositional phrases such as in spite of, as well as phrases from other languages such as aide memoire. In this version of the corpus, such items are explicitly tagged using an XML mw (for multiword) tag carrying the appropriate wordclass tag, as indicated below.
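The whitespace point made above for contracted forms, and the wrapping of multiwords in an mw element, can be illustrated with a short Python sketch. The sample sentence is invented for the purpose, and the attribute name `c5` used to carry the wordclass code is an assumption; only the element names w, mw, c and s come from this manual. The sketch contrasts concatenating the encoded text unchanged with joining the w contents by spaces, which is the behaviour the text warns against.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence; the c5 attribute name is an assumption.
# Note that there is no whitespace after "they", so the two w elements
# for "they're" stay orthographically fused.
SAMPLE = (
    "<s>"
    "<w c5='PNP'>they</w><w c5='VBB'>'re </w>"
    "<mw c5='PRP'><w c5='PRP'>in </w><w c5='NN1'>spite </w><w c5='PRF'>of </w></mw>"
    "<w c5='AT0'>the </w><w c5='NN1'>rain</w><c c5='PUN'>.</c>"
    "</s>"
)


def surface_text(elem):
    """Concatenate all text exactly as encoded, adding no separators."""
    return "".join(elem.itertext())


def tokens(elem):
    """Return the content of every w element, including those inside mw."""
    return [(w.text or "").strip() for w in elem.iter("w")]


def multiwords(elem):
    """Return (phrase, wordclass code) pairs for each mw element."""
    return [("".join(mw.itertext()).strip(), mw.get("c5")) for mw in elem.iter("mw")]


sentence = ET.fromstring(SAMPLE)
print(surface_text(sentence))       # text with the original spacing preserved
print(" ".join(tokens(sentence)))   # naive join: a space appears inside they're
print(multiwords(sentence))
```

Relying on the whitespace stored inside each element, rather than on element boundaries, is what keeps contracted forms intact when the running text is rebuilt.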
Within this mw element however, in a departure from earlier versions of the corpus, the individual words are also tagged using w tags in the same way as elsewhere in the corpus. The following table lists all multiwords recognized in the corpus alphabetically, indicating both the wordclass codes assigned to it, and also the wordclass codes assigned to its constituent w elements. Note that these latter wordclass codes were assigned automatically during the XML conversion process and therefore should not be included in any assessment of the CLAWS error rate. multiwordmw wordclass/esconstituent wordclasses ab initioAV0 or AJ0UNC UNC a bitAV0AT0 NN1 a capellaAJ0 or AV0UNC UNC according asCJSVVG CJS according toPRPVVG PRP ad astraAV0 or AJ0UNC UNC ad hocAV0 or AJ0UNC UNC ad hominemAV0 or AJ0UNC UNC ad infinitumAV0UNC UNC adjacent toPRPAJ0 PRP ad libAJ0 or AV0 or NN1UNC UNC ad nauseamAV0 or AJ0UNC UNC affaire de coeurNN1UNC UNC UNC affaire d'honneurNN1UNC UNC a fortioriAV0 or AJ0UNC UNC agent provocateurNN1UNC UNC agnus deiNN1UNC UNC a good dealAV0AT0 AJ0 NN1 a great dealAV0AT0 AJ0 NN1 ahead ofPRPAV0 PRF a heck of a lotAV0AT0 NN1 PRF AT0 NN1 aide de campNN1UNC UNC UNC aide memoireNN1UNC UNC a laPRPUNC UNC a la carteAJ0 or AV0UNC UNC UNC a la modeAJ0 or AV0UNC UNC UNC al denteAJ0 or AV0UNC UNC al frescoAV0 or AJ0UNC UNC a littleAV0AT0 AV0/DT0 a little bitAV0AT0 AJ0 NN1 alla breveAV0 or AJ0 or NN0UNC UNC all butAV0AV0 CJS all of a suddenAV0DT0 PRF AT0 NN1 all rightAV0 or AJ0AV0 AV0 all the sameAV0DT0 AT0 DT0 alma materNN1UNC UNC along withPRPAVP PRP a lotAV0AT0 NN1 alter egoNN1UNC UNC an' allAV0CJC DT0 an awful lotAV0AT0 AJ0 NN1 ancien regimeNN1UNC UNC and so forthAV0CJC AV0 AV0 and so onAV0CJC AV0 AV0 anno dominiAV0 or NN1UNC UNC annus horribilisNN1UNC UNC annus mirabilisNN1UNC UNC ante meridiemAV0UNC UNC any longerAV0AV0 AV0 anything butAV0PNI AV0 apart fromPRPAV0 PRP a posterioriAV0 or AJ0UNC UNC a prioriAV0 or AJ0UNC UNC a proposPRP or AV0UNC UNC aqua vitaeNN1UNC UNC 
art nouveauNN1UNC UNC as againstPRPCJS PRP as betweenPRPCJS PRP as forPRPCJS PRP as fromPRPCJS PRP aside fromPRPAV0 PRP as ifCJSCJS CJS as it wereAV0CJS PNP VBD as long asCJSAV0 AV0 CJS as ofPRPCJS PRF as opposed toPRPCJS VVN PRP as regardsPRPCJS VVZ as soon asCJSAV0 AV0 CJS as thoughCJSCJS CJS asti spumanteNN1UNC UNC as toPRPCJS PRP as usualAV0CJS AJ0 as well asPRPAV0 AV0 CJS as wellAV0CJS AV0 as yetAV0CJS AV0 at allAV0PRP DT0 at bestAV0PRP AJS at firstAV0PRP ORD at largeAV0PRP AJ0 at lastAV0PRP ORD at leastAV0PRP AV0 at lengthAV0PRP NN1 at long lengthAV0PRP AJ0 NN1 at mostAV0PRP DT0 at onceAV0PRP AV0 at presentAV0PRP NN1 at randomAV0PRP AJ0 at worstAV0PRP AV0 au contraireAV0UNC UNC au faitAJ0UNC UNC auf wiedersehenITJUNC UNC au pairNN1UNC UNC au revoirITJUNC UNC aurora australisNN1UNC UNC aurora borealisNN1UNC UNC avant gardeNN1 or AJ0UNC UNC away fromPRPAV0 PRP bar mitzvahNN1 or AJ0UNC UNC basso profundoNN1UNC UNC beau mondeNN1UNC UNC because ofPRPCJS PRF belles lettresNN2UNC UNC bete noireNN1UNC UNC billet douxNN1UNC UNC bona fidesNN2UNC UNC bona fideAJ0UNC UNC bon appetitITJUNC UNC bon motNN1UNC UNC bon vivantNN1UNC UNC bon viveurNN1UNC UNC bon voyageITJUNC UNC brand newAJ0NN1 AJ0 but forPRPCJS PRP by and byAV0AVP CJC AVP by and largeAV0AVP CJC AJ0 by farAV0PRP AV0 by means ofPRPPRP NN0 PRF by no meansAV0PRP PRP NN0 by nowAV0PRP AV0 by reason ofPRPPRP NN1 PRF by the byAV0PRP AT0 NN1 by way ofPRPPRP NN1 PRF cafe au laitNN1UNC UNC UNC camera obscuraNN1UNC UNC carte blancheNN1UNC UNC casus belliNN1UNC UNC cause celebreNN1UNC UNC ceteris paribusAV0UNC UNC chaise longueNN1UNC UNC charge d'affairesNN1UNC UNC chez moiAV0UNC UNC chez nousAV0UNC UNC chilli con carneNN1NN1 NN1 NN1 chop sueyNN1UNC UNC chow meinNN1UNC UNC clamp downNN1VVB/VVI AVP close toAV0AV0/AJ0 PRP compos mentisAJ0UNC UNC con brioAJ0 or AV0UNC UNC con fuocoAJ0 or AV0UNC UNC con motoAJ0 or AV0UNC UNC considering thatCJSVVG CJT contrary toPRPJJ PRP cordon bleuNN1UNC UNC cordon sanitaireNN1UNC UNC corpus 
delictiNN1UNC UNC corpus jurisNN1UNC UNC coup de graceNN1UNC UNC UNC coup d'etatNN1UNC UNC coup de theatreNN1UNC UNC UNC creme de la cremeNN1UNC UNC UNC UNC creme de mentheNN1UNC UNC UNC cri de coeurNN1UNC UNC UNC croix de guerreNN0UNC UNC UNC cul de sacNN1UNC UNC UNC danse macabreNN1UNC UNC de factoAV0 or AJ0UNC UNC dei gratiaAV0UNC UNC deja vuNN1UNC UNC de jureAV0 or AJ0UNC UNC delirium tremensNN1UNC UNC de luxeAJ0UNC UNC demi mondeNN1UNC UNC depending onPRPVVG PRP de profundisAV0UNC UNC de rigeurAJ0UNC UNC de tropAJ0UNC UNC deus ex machinaNN1UNC UNC UNC double entendreNN1UNC UNC dramatis personaeNN2UNC UNC due toPRPAJ0 PRP each otherPNXDT0 NN1 eminence griseNN1UNC UNC en blocAV0UNC UNC en familleAV0UNC UNC enfants terriblesNN2UNC UNC enfant terribleNN1UNC UNC en masseAV0UNC UNC en passantAV0UNC UNC en routeAV0UNC UNC en suiteAJ0UNC UNC entente cordialeNN1UNC UNC esprit de corpsNN1UNC UNC UNC et alAV0UNC UNC et ceteraAV0UNC UNC even ifCJSAV0 CJS even soAV0AV0 AV0 even thoughCJSAV0 CJS even whenCJSAV0 CJS ever soAV0AV0 AV0 every so oftenAV0AT0 AV0 AV0 ex armyAJ0PRP NN1 ex cathedraAV0 or AJ0UNC UNC except forPRPCJS PRP excepting forPRPVVG PRP except thatCJSCJS CJT ex gratiaAV0 or AJ0UNC UNC ex librisAV0UNC UNC ex officioAV0 or AJ0UNC UNC ex parteAV0 or AJ0UNC UNC ex temporeAV0 or AJ0UNC UNC fait accompliNN1UNC UNC far fromAV0AV0 PRP far offAJ0AV0 AVP faux amisNN2UNC UNC faux amiNN1UNC UNC faux pasNN0UNC UNC fed upAJ0VVN AVP femme fataleNN1UNC UNC fin de siecleNN1UNC UNC UNC follow upNN1VVB/VVI AVP force majeureNN1UNC UNC for certainAV0PRP AJ0 for everAV0PRP AV0 for exampleAV0PRP NN1 for fear ofPRPPRP NN1 PRF for goodAV0PRP AJ0 for instanceAV0PRP NN1 for keepsAV0PRP NN2 for longAV0PRP AV0 for onceAV0PRP AV0 for sureAV0PRP AJ0 for the most partAV0PRP AT0 AV0 NN1 for the time beingAV0PRP AT0 NN1 VBG fromage fraisNN1UNC UNC from now onAV0PRP AV0 AVP from time to timeAV0PRP NN1 PRP NN1 getting on forAV0VVG AVP PRP grande dameNN1UNC UNC grand prixNN1UNC UNC grown 
upsNN2VVN NN2 grown upNN1VVN AVP gung hoAJ0 or AV0UNC UNC habeas corpusNN1UNC UNC half wayAV0DT0 NN1 hara kiriNN1UNC UNC hard upAJ0AJ0 AVP hasta la vistaITJUNC UNC UNC hasta luegoITJUNC UNC haute coutureNN1UNC UNC haute cuisineNN1UNC UNC have notsNN2VHB NN2 hey prestoITJITJ ITJ hoi polloiNN0UNC UNC homo sapiensNN1UNC UNC hors d'oeuvresNN2UNC UNC hors d'oeuvreNN1UNC UNC hysteron proteronNN1UNC UNC idee fixeNN1UNC UNC in absentiaAV0UNC UNC in accordance withPRPPRP NN1 PRP in accord withPRPPRP NN1 PRP in additionAV0PRP NN1 in addition toPRPPRP NN1 PRP in aid ofPRPPRP NN1 PRF in answer toPRPPRP NN1 PRP in as much asCJSPRP AV0 DT0 CJS inasmuch asCJSUNC CJS in association withPRPPRP NN1 PRP in back ofPRPPRP NN1 PRF in betweenPRP or AV0AVP PRP/AV0 in briefAV0PRP AJ0 in cameraAV0UNC UNC in case ofPRPPRP NN1 PRF in caseCJS or AV0PRP NN1 in charge ofPRPPRP NN1 PRF in commonAV0PRP AJ0 in common withPRPPRP NN1 PRP in comparison withPRPPRP NN1 PRP in conjunction withPRPPRP NN1 PRP in connection withPRPPRP NN1 PRP in consultation withPRPPRP NN1 PRP in contact withPRPPRP NN1 PRP in cooperation withPRPPRP NN1 PRP in course withPRPPRP NN1 PRP in defence ofPRPPRP NN1 PRF in defiance ofPRPPRP NN1 PRF in excess ofPRPPRP NN1 PRF in extremisAV0UNC UNC in face ofPRPPRP NN1 PRF in favor ofPRPPRP NN1 PRF in favour ofPRPPRP NN1 PRF in flagrante delictoAV0 or AJ0UNC UNC UNC in front ofPRPPRP NN1 PRF in fullAV0PRP AJ0 in generalAV0PRP AJ0 in keeping withPRPPRP NN1 PRP in lieu ofPRPPRP UNC PRF in light ofPRPPRP NN1 PRF in line withPRPPRP NN1 PRP in loco parentisAV0 or AJ0UNC UNC UNC in medias resAV0UNC UNC UNC in memoriamAV0UNC UNC in need ofPRPPRP NN1 PRF in particularAV0PRP AJ0 in perpetuumAV0UNC UNC in place ofPRPPRP NN1 PRF in possession ofPRPPRP NN1 PRF in privateAV0PRP AJ0 in proportion toPRPPRP NN1 PRP in propria personaAV0UNC UNC UNC in publicAV0PRP AJ0 in pursuit ofPRPPRP NN1 PRF in quest ofPRPPRP NN1 PRF in receipt ofPRPPRP NN1 PRF in regard toPRPPRP NN1 PRP in relation toPRPPRP NN1 
PRP in reply toPRPPRP NN1 PRP in respect ofPRPPRP NN1 PRF in response toPRPPRP NN1 PRP in return forPRPPRP NN1 PRP in search ofPRPPRP NN1 PRF in shortAV0PRP AJ0 inside outAV0 or AJ0AV0 AVP in situAV0UNC UNC in so far asCJSPRP AV0 AV0 CJS insofar asCJSUNC CJS in spite ofPRPPRP NN1 PRF instead ofPRPAV0 PRF in support ofPRPPRP NN1 PRF inter aliaAV0UNC UNC in terms ofPRPPRP NN2 PRF in thatCJSPRP CJT in the light ofPRPPRP AT0 NN1 PRF in the mainAV0PRP AT0 AJ0 in the order ofAV0PRP AT0 NN1 PRF into line withPRPPRP NN1 PRP in totoAV0 or AJ0UNC UNC in touch withPRPPRP NN1 PRP in vainAV0PRP AJ0 in view ofPRPPRP NN1 PRF in vitroAJ0 or AV0UNC UNC in vivoAJ0 or AV0UNC UNC ipso factoAV0UNC UNC irrespective ofPRPAJ0 PRF je ne sais quoiNN1UNC UNC UNC UNC joie de vivreNN1UNC UNC UNC just aboutAV0AV0 AV0 just aboutAV0AV0 AV0 kind ofAV0NN1 PRF know howNN1VVB AVQ kung fuNN1UNC UNC la dolce vitaNN1UNC UNC UNC laissez faireNN1UNC UNC le mot justeNN1UNC UNC UNC less thanAV0AV0/DT0 CJS let alonePRPVVB AJ0 let 'sVM0VVB PNP lingua francaNN0UNC UNC lo and beholdITJITJ CJC VVB loc citAV0UNC UNC locum tenensNN1UNC UNC long-term wiseAV0AJ0 AV0 magna cartaNN1UNC UNC magna cum laudeAJ0 or AV0UNC UNC UNC magnum opusNN1UNC UNC maitre d'hotelNN1UNC UNC mal de merNN1UNC UNC UNC matter of factNN1 or AJ0NN1 PRF NN1 mea culpaNN1UNC UNC medecins sans frontieresNN0UNC UNC UNC medicins sans frontieresNN0UNC UNC UNC menage a troisNN1UNC UNC UNC mezzo sopranoNN1UNC UNC modus operandiNN1UNC UNC modus vivendiNN1UNC UNC more thanAV0AV0/DT0 CJS more thanAV0AV0 CJS mot justeNN1UNC UNC nearer toPRPAJC/AV0 PRP nearest toPRPAJS/AV0 PRP near toPRPAJ0/AV0 PRP nem conAV0UNC UNC next toPRPORD PRP nigh onAV0AV0 AVP noblesse obligeNN1UNC UNC no doubtAV0AT0 NN1 no longerAV0AV0 AV0 no matter whoPNQAT0 NN1 PNQ no matter whomPNQAT0 NN1 PNQ no matter whoseDTQAT0 NN1 DTQ nom de guerreNN1UNC UNC UNC nom de plumeNN1UNC UNC UNC non compos mentisAJ0UNC UNC UNC none otherPNIPNI AJ0 none the lessAV0PNI AT0 AV0 none theAV0PNI AT0 non 
sequiturNN1UNC UNC no onePNIAT0 PNI not withstandingAV0XX0 UNC nouveau richeNN1UNC UNC nouveaux richesNN2UNC UNC nouvelle cuisineNN1UNC UNC now thatCJSAV0 CJT objet d'artNN1UNC UNC objets d'artNN2UNC UNC of courseAV0PRF NN1 off guardAV0PRP NN1 off ofPRPAVP PRF oft timesAV0AV0 NN2 old fashionedAJ0AJ0 VVN on account ofPRPPRP NN1 PRF on behalf ofPRPPRP NN1 PRF on boardPRP or AV0PRP NN1 once againAV0AV0 AV0 once and for allAV0AV0 CJC PRP DT0 once moreAV0AV0 AV0 one anotherPNXCRD DT0 one 'sCRDCRD POS on the part ofPRPPRP AT0 NN1 PRF on top ofPRPPRP NN1 PRF on toPRPAVP PRP/TO0 op citAV0UNC UNC other thanPRPAJ0 CJS out ofPRPAVP PRF out of dateAJ0AVP PRF NN1 out of line withPRPAVP PRF NN1 PRP out of touch withPRPAVP PRF NN1 PRP outside ofPRPPRP PRF over hereAV0PRP AV0 over thereAV0PRP AV0 owing toPRPVVG PRP papier macheNN1UNC UNC par excellenceAJ0UNC UNC pas de deuxNN0UNC UNC UNC pate de foie grasNN1UNC UNC UNC UNC pax britannicaNN1UNC UNC pax romanaNN1UNC UNC per annumAV0UNC UNC per capitaAV0 or AJ0UNC UNC per centNN0UNC UNC per diemAV0 or AJ0 or NN1UNC UNC per seAV0UNC UNC personae non grataeNN2UNC UNC UNC persona non grataNN1UNC UNC UNC pertaining toPRPVVG PRP petit bourgeoisNN1UNC UNC petite bougeoisieNN1UNC UNC petits bourgeoisNN2UNC UNC piece de resistanceNN1UNC UNC UNC pied a terreNN1UNC UNC UNC pina coladaNN1UNC UNC pince nezNN0UNC UNC poco a pocoAV0UNC UNC UNC point blankAV0 or AJ0NN1 AJ0 poste restanteNN1 or AV0UNC UNC post hocAV0 or AJ0UNC UNC post meridiemAV0UNC UNC post mortemNN1 or AJ0UNC UNC pot pourriNN1UNC UNC prima donnaNN1UNC UNC prima facieAJ0 or AV0UNC UNC primus inter paresNN1UNC AJ0 UNC prior toPRPAJ0 PRP pro formaNN1UNC UNC pro rataAV0 or AJ0UNC UNC pro temAV0UNC UNC provided thatCJSVVN CJT providing thatCJSVVG CJT pursuant toPRPAJ0 PRP quid pro quoNN1UNC UNC UNC raison d'etreNN1UNC UNC rather thanPRPAV0 CJS relative toPRPAJ0 PRP rigor mortisNN1UNC UNC roman a clefNN1UNC UNC UNC save forPRPVVI PRP save thatCJSVVI CJT savoir faireNN1UNC UNC savoir 
vivreNN1UNC UNC seeing asCJSVVG CJS seeing thatCJSVVG CJT semper fidelisAJ0UNC UNC shish kebabNN1UNC UNC sine dieAV0UNC UNC sine qua nonNN1UNC UNC UNC sinn feinNN1UNC UNC so calledAJ0AV0 VVN so long asCJSAV0 AV0 CJS some onePNIDT0 PNI something likeAV0PNI PRP so much asAV0AV0 DT0 CJS son et lumiereNN1UNC UNC UNC sort ofAV0NN1 PRF so thatCJSAV0 CJT sotto voceAV0 or AJ0UNC UNC spaghetti bologneseNN1UNC UNC spot onAV0 or AJ0NN1 AVP status quoNN1UNC UNC straight forwardAJ0AV0 AJ0 subject toPRPAJ0 PRP sub judiceAV0 or AJ0UNC UNC sub poenaNN1UNC UNC subsequent toPRPAJ0 PRP such asPRPDT0 PRP such thatCJSDT0 CJT sui generisAJ0UNC UNC sui jurisAJ0UNC UNC summa cum laudeAJ0 or AV0UNC UNC UNC super duperAJ0AJ0 XXX supposing thatCJSVVG PRP table d'hoteNN1UNC UNC tabula rasaNN1UNC UNC tai chiNN1UNC UNC tai kwan doNN1UNC UNC UNC terra firmaNN1UNC UNC terra incognitaNN1UNC UNC thanks toPRPNN2 PRP that isAV0DT0 VBZ that is to sayAV0DT0 VBZ TO0 VVI through thick and thinAV0PRP AJ0 CJC AJ0 time and againAV0NN1 CJC AV0 to and froAV0PRP CJC AV0 tour de forceNN1UNC UNC UNC tout courtAJ0UNC UNC tout de suiteAV0UNC UNC UNC ultra viresAJ0 or AV0UNC UNC under wayAV0PRP NN1 up frontAJ0 or AV0AVP AJ0 upside downAV0 or AJ0NN1 AVP up toPRP or AV0AVP PRP/TO0 up to dateAJ0AVP TO0 NN1 up to the minuteAJ0AVP PRP AT0 NN1 up untilPRPAVP CJS/PRP upward ofAV0AV0 PRF upwards ofAV0AV0 PRF vice versaAV0UNC UNC vin de tableNN1UNC UNC UNC vis a visPRPUNC UNC UNC viva voceNN1 or AJ0 or AV0UNC UNC vol au ventNN1UNC UNC UNC volte faceNN1UNC UNC vox populiNN1UNC UNC well offAJ0AV0 AVP whether or notCJSCJS CJC XX0 wiener schnitzelNN1UNC UNC with a view toPRPPRP AT0 NN1 PRP with reference toPRPPRP NN1 PRP with regard toPRPPRP NN1 PRP with relation toPRPPRP NN1 PRP with respect toPRPPRP NN1 PRP Simplified Wordclass Tags This table lists, for each of the twelve simplified wordclass tags used by the pos attribute, the corresponding CLAWS C5 tags of which the class consists. 
POS value | significance | combines
ADJ | adjective | AJ0, AJC, AJS, CRD, DT0, ORD
ADV | adverb | AV0, AVP, AVQ, XX0
ART | article | AT0
CONJ | conjunction | CJC, CJS, CJT
INTERJ | interjection | ITJ
PREP | preposition | PRF, PRP, TO0
PRON | pronoun | DPS, DTQ, EX0, PNI, PNP, PNQ, PNX
STOP | punctuation | POS, PUL, PUN, PUQ, PUR
SUBST | substantive | NN0, NN1, NN2, NP0, ONE, ZZ0, NN1-NP0, NP0-NN1
UNC | unclassified, uncertain, or non-lexical word | UNC, AJ0-AV0, AV0-AJ0, AJ0-NN1, NN1-AJ0, AJ0-VVD, VVD-AJ0, AJ0-VVG, VVG-AJ0, AJ0-VVN, VVN-AJ0, AVP-PRP, PRP-AVP, AVQ-CJS, CJS-AVQ, CJS-PRP, PRP-CJS, CJT-DT0, DT0-CJT, CRD-PNI, PNI-CRD, NN1-VVB, VVB-NN1, NN1-VVG, VVG-NN1, NN2-VVZ, VVZ-NN2
VERB | verb | VBB, VBD, VBG, VBI, VBN, VBZ, VDB, VDD, VDG, VDI, VDN, VDZ, VHB, VHD, VHG, VHI, VHN, VHZ, VM0, VVB, VVD, VVG, VVI, VVN, VVZ, VVD-VVN, VVN-VVD
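The correspondence in this table can be turned into a lookup from C5 tag to simplified wordclass. The following Python sketch transcribes the table as given here; the names `SIMPLIFIED_POS`, `C5_TO_POS` and `simplify` are illustrative and are not part of the corpus or of any distributed tool.

```python
# Simplified wordclass values and the C5 tags each combines,
# transcribed from the table above.
SIMPLIFIED_POS = {
    "ADJ": ["AJ0", "AJC", "AJS", "CRD", "DT0", "ORD"],
    "ADV": ["AV0", "AVP", "AVQ", "XX0"],
    "ART": ["AT0"],
    "CONJ": ["CJC", "CJS", "CJT"],
    "INTERJ": ["ITJ"],
    "PREP": ["PRF", "PRP", "TO0"],
    "PRON": ["DPS", "DTQ", "EX0", "PNI", "PNP", "PNQ", "PNX"],
    "STOP": ["POS", "PUL", "PUN", "PUQ", "PUR"],
    "SUBST": ["NN0", "NN1", "NN2", "NP0", "ONE", "ZZ0", "NN1-NP0", "NP0-NN1"],
    "UNC": [
        "UNC", "AJ0-AV0", "AV0-AJ0", "AJ0-NN1", "NN1-AJ0", "AJ0-VVD",
        "VVD-AJ0", "AJ0-VVG", "VVG-AJ0", "AJ0-VVN", "VVN-AJ0", "AVP-PRP",
        "PRP-AVP", "AVQ-CJS", "CJS-AVQ", "CJS-PRP", "PRP-CJS", "CJT-DT0",
        "DT0-CJT", "CRD-PNI", "PNI-CRD", "NN1-VVB", "VVB-NN1", "NN1-VVG",
        "VVG-NN1", "NN2-VVZ", "VVZ-NN2",
    ],
    "VERB": [
        "VBB", "VBD", "VBG", "VBI", "VBN", "VBZ", "VDB", "VDD", "VDG",
        "VDI", "VDN", "VDZ", "VHB", "VHD", "VHG", "VHI", "VHN", "VHZ",
        "VM0", "VVB", "VVD", "VVG", "VVI", "VVN", "VVZ", "VVD-VVN", "VVN-VVD",
    ],
}

# Inverted index: each C5 tag (including ambiguity tags) maps to one value.
C5_TO_POS = {c5: pos for pos, tags in SIMPLIFIED_POS.items() for c5 in tags}


def simplify(c5_tag):
    """Return the simplified wordclass for a C5 tag (KeyError if unknown)."""
    return C5_TO_POS[c5_tag]


print(simplify("NN1"), simplify("AJ0-NN1"), simplify("VVD-VVN"))
```

As the table shows, most ambiguity tags such as AJ0-NN1 fall under UNC, whereas NN1-NP0 and VVD-VVN stay within SUBST and VERB because both alternatives belong to the same simplified class.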