BNC2 POS-tagging Manual

GUIDELINES TO WORDCLASS TAGGING

[ Related documents: Introduction to the Manual | Automatic tagging of the BNC | Error rates | Acknowledgments]


CONTENTS

  1. Preliminaries: The Tagset, Ambiguity Tags and Appearance of tags and citations in the guide
  2. INTRODUCTION TO WORD CLASSES

  • DISAMBIGUATION GUIDE, by TAG PAIR
  • DISAMBIGUATION GUIDE, by WORD
  • Features of spoken corpus tagging

    SECTION 1. PRELIMINARIES

    THE BNC BASIC TAGSET

    (also known as the "C5" tagset; followed by Ambiguity tag list)

    Tag   Description
    AJ0   Adjective (general or positive) (e.g. good, old, beautiful)
    AJC   Comparative adjective (e.g. better, older)
    AJS   Superlative adjective (e.g. best, oldest)
    AT0   Article (e.g. the, a, an, no)
    AV0   General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
    AVP   Adverb particle (e.g. up, off, out)
    AVQ   Wh-adverb (e.g. when, where, how, why, wherever)
    CJC   Coordinating conjunction (e.g. and, or, but)
    CJS   Subordinating conjunction (e.g. although, when)
    CJT   The subordinating conjunction that
    CRD   Cardinal number (e.g. one, 3, fifty-five, 3609)
    DPS   Possessive determiner-pronoun (e.g. your, their, his)
    DT0   General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
    DTQ   Wh-determiner-pronoun (e.g. which, what, whose, whichever)
    EX0   Existential there, i.e. there occurring in the there is ... or there are ... construction
    ITJ   Interjection or other isolate (e.g. oh, yes, mhm, wow)
    NN0   Common noun, neutral for number (e.g. aircraft, data, committee)
    NN1   Singular common noun (e.g. pencil, goose, time, revelation)
    NN2   Plural common noun (e.g. pencils, geese, times, revelations)
    NP0   Proper noun (e.g. London, Michael, Mars, IBM)
    ORD   Ordinal numeral (e.g. first, sixth, 77th, last)
    PNI   Indefinite pronoun (e.g. none, everything, one [as pronoun], nobody)
    PNP   Personal pronoun (e.g. I, you, them, ours)
    PNQ   Wh-pronoun (e.g. who, whoever, whom)
    PNX   Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
    POS   The possessive or genitive marker 's or '
    PRF   The preposition of
    PRP   Preposition (except for of) (e.g. about, at, in, on, on behalf of, with)
    PUL   Punctuation: left bracket - i.e. ( or [
    PUN   Punctuation: general separating mark - i.e. . , ! , : ; - or ?
    PUQ   Punctuation: quotation mark - i.e. ' or "
    PUR   Punctuation: right bracket - i.e. ) or ]
    TO0   Infinitive marker to
    UNC   Unclassified items which are not appropriately considered as items of the English lexicon.
    VBB   The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be [subjunctive or imperative]
    VBD   The past tense forms of the verb BE: was and were
    VBG   The -ing form of the verb BE: being
    VBI   The infinitive form of the verb BE: be
    VBN   The past participle form of the verb BE: been
    VBZ   The -s form of the verb BE: is, 's
    VDB   The finite base form of the verb DO: do
    VDD   The past tense form of the verb DO: did
    VDG   The -ing form of the verb DO: doing
    VDI   The infinitive form of the verb DO: do
    VDN   The past participle form of the verb DO: done
    VDZ   The -s form of the verb DO: does, 's
    VHB   The finite base form of the verb HAVE: have, 've
    VHD   The past tense form of the verb HAVE: had, 'd
    VHG   The -ing form of the verb HAVE: having
    VHI   The infinitive form of the verb HAVE: have
    VHN   The past participle form of the verb HAVE: had
    VHZ   The -s form of the verb HAVE: has, 's
    VM0   Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
    VVB   The finite base form of lexical verbs (e.g. forget, send, live, return) [Including the imperative and present subjunctive]
    VVD   The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
    VVG   The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
    VVI   The infinitive form of lexical verbs (e.g. forget, send, live, return)
    VVN   The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
    VVZ   The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
    XX0   The negative particle not or n't
    ZZ0   Alphabetical symbols (e.g. A, a, B, b, c, d)

    Total number of wordclass tags in the BNC basic tagset = 57, plus 4 punctuation tags

    2. Ambiguity Tag list

    In addition, there are 30 "Ambiguity Tags". These are applied wherever the probabilities assigned by the CLAWS automatic tagger to its first and second choice tags were considered too low for reliable disambiguation. So, for example, the ambiguity tag AJ0-AV0 indicates that the choice between adjective (AJ0) and adverb (AV0) is left open, although the tagger has a preference for an adjective reading. The mirror tag, AV0-AJ0, again shows adjective-adverb ambiguity, but this time the more likely reading is the adverb.

    Ambiguity tag     Ambiguous between     More probable tag
    AJ0-AV0           AJ0 or AV0            AJ0
    AJ0-NN1           AJ0 or NN1            AJ0
    AJ0-VVD           AJ0 or VVD            AJ0
    AJ0-VVG           AJ0 or VVG            AJ0
    AJ0-VVN           AJ0 or VVN            AJ0
    AV0-AJ0           AV0 or AJ0            AV0
    AVP-PRP           AVP or PRP            AVP
    AVQ-CJS           AVQ or CJS            AVQ
    CJS-AVQ           CJS or AVQ            CJS
    CJS-PRP           CJS or PRP            CJS
    CJT-DT0           CJT or DT0            CJT
    CRD-PNI           CRD or PNI            CRD
    DT0-CJT           DT0 or CJT            DT0
    NN1-AJ0           NN1 or AJ0            NN1
    NN1-NP0           NN1 or NP0            NN1
    NN1-VVB           NN1 or VVB            NN1
    NN1-VVG           NN1 or VVG            NN1
    NN2-VVZ           NN2 or VVZ            NN2
    NP0-NN1           NP0 or NN1            NP0
    PNI-CRD           PNI or CRD            PNI
    PRP-AVP           PRP or AVP            PRP
    PRP-CJS           PRP or CJS            PRP
    VVB-NN1           VVB or NN1            VVB
    VVD-AJ0           VVD or AJ0            VVD
    VVD-VVN           VVD or VVN            VVD
    VVG-AJ0           VVG or AJ0            VVG
    VVG-NN1           VVG or NN1            VVG
    VVN-AJ0           VVN or AJ0            VVN
    VVN-VVD           VVN or VVD            VVN
    VVZ-NN2           VVZ or NN2            VVZ

    Total number of wordclass tags including punctuation and ambiguity tags = 91.
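As a rough illustration of how an ambiguity tag can be read, the Python sketch below splits a tag into its more probable and less probable readings and builds the mirror tag. The function names are hypothetical; they are not part of CLAWS or of any BNC software.

```python
def split_ambiguity_tag(tag):
    """Return (more_probable, less_probable); less_probable is None for a basic tag."""
    first, sep, second = tag.partition("-")
    return (first, second) if sep else (first, None)


def mirror_tag(tag):
    """Swap the two readings of an ambiguity tag, e.g. AJ0-AV0 -> AV0-AJ0."""
    first, second = split_ambiguity_tag(tag)
    if second is None:
        raise ValueError(f"{tag!r} is not an ambiguity tag")
    return f"{second}-{first}"


print(split_ambiguity_tag("AJ0-AV0"))
print(mirror_tag("AJ0-AV0"))
```

Because the first tag in the pair is always the preferred reading, a user who wants a single tag per word could simply keep the first element of the pair, at the cost of inheriting the tagger's uncertain choice.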

    Contents | Tagset | Nouns | Verbs | Adjectives | Adverbs | Articles/Determiners/Pronouns | Prepositions | Conjunctions | Numerals | Miscellaneous | Disambiguate tags | Disambiguate words


    Appearance of wordclass tags and citations

    To illustrate BNC wordclass tagging, we will show text examples in a format similar to the SGML contained in the corpus. The underlying grammatical tags and other structural markup (for example, paragraph and pause markers) are generally invisible when using concordancing software such as XAIRA, BNCWeb, and WordSmith Tools.

    Each orthographic word in the corpus generally has its own wordclass tag, which appears in the form

    <w TAG>word

    followed by a single space. There are important exceptions in the case of contracted forms and multiword sequences. Spacing may also vary around punctuation tags (eg <c PUN>. ). The following excerpt from a file in the spoken part of the corpus illustrates the general format:

    <s n=1000> <w PNP>I <w VVB>mean <w DTQ>what <w PRP>about <w AV0>apparently <w PNP>we <w VVB>eat <w DT0>more <w NN1>chocolate <w CJS>than <w DT0>any <w AJ0>other <w NN1>country<c PUN>.

    For examples in this guide, we will retain just the POS-tag of the word (or words) in question. Under subordinating conjunctions, for instance, the citation above has been reduced to:

    ...apparently we eat more chocolate <w CJS>than any other country
    [G3U.1000]
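The format described above can be read with a short regular expression. The following Python sketch is only an illustration: it assumes that every element looks like <w TAG>text or <c TAG>text, as in the excerpt, and the names ELEMENT and parse_tagged are invented for this example. The sample line is a shortened version of the excerpt.

```python
import re

# One element: <w TAG> or <c TAG>, followed by its text up to the next "<".
ELEMENT = re.compile(r"<([wc]) ([A-Z0-9-]+)>([^<]*)")


def parse_tagged(line):
    """Return a list of (kind, tag, text) triples from a tagged line."""
    return [(kind, tag, text.strip()) for kind, tag, text in ELEMENT.findall(line)]


sample = ("<s n=1000> <w PNP>I <w VVB>mean <w DTQ>what <w PRP>about "
          "<w NN1>country<c PUN>.")
for kind, tag, text in parse_tagged(sample):
    print(kind, tag, text)
```

The sentence marker <s n=1000> is skipped because the pattern only accepts w and c elements. The tag pattern allows a hyphen so that ambiguity tags such as AJ0-AV0 are also matched.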

    Citations: Examples taken from the BNC have the filename and line number appended in square brackets; [G3U.1000] means line 1000 of file G3U. In the Disambiguation Guide (sections 3 and 4), we also cite cases where the POS-tagging in the corpus does not match the tag given in the citation, because the corpus tag is either an error or an ambiguity tag. This is to give an idea of the contexts in which the resolution of ambiguities has been less reliable. In such cases we list the tag found in the corpus next to the file reference, marked with an asterisk. For example, under well in Section 4 we give the ideal tag as VVB, but the actual tag as AV0:

    Tears <w VVB>well up in my eyes.
    [BN3.5 *AV0]

    Note also that we occasionally use invented examples, rather than corpus citations, especially where a contrast between categories is being made.
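The citation convention can be unpacked mechanically. The sketch below is a hypothetical helper, not part of any BNC tool; it assumes a reference of the form [FILE.LINE], optionally followed by an asterisk and the tag actually found in the corpus.

```python
import re

# [FILE.LINE] with an optional "*TAG" giving the tag found in the corpus.
CITATION = re.compile(r"\[\s*([A-Z0-9]+)\.(\d+)\s*(?:\*([A-Z0-9-]+))?\s*\]")


def parse_citation(ref):
    """Split a citation into file name, line number and (optional) corpus tag."""
    match = CITATION.fullmatch(ref.strip())
    if match is None:
        raise ValueError(f"not a citation: {ref!r}")
    file_id, line, corpus_tag = match.groups()
    return {"file": file_id, "line": int(line), "corpus_tag": corpus_tag}


print(parse_citation("[G3U.1000]"))
print(parse_citation("[BN3.5 *AV0]"))
```

A corpus_tag of None means that the tag shown in the citation is also the tag found in the corpus.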

    Appearance and tagging of contracted forms

    Contracted forms -- including enclitics, eg he's, she'll, negatives eg don't and can't, and 'fused words', eg wanna and gimme -- are broken down by the tagger into their component parts, with each part being assigned its own tag. No spaces are introduced in POS-tagged contracted words:

    could've = <w VM0>could<w VHI>'ve
    doesn't = <w VDZ>does<w XX0>n't
    dunno = <w VDB>du<w XX0>n<w VVI>no
    wanna = <w VVB>wan<w TO0>na --or-- <w VVB>wan<w AT0>na
    gimme = <w VVB>gim<w PNP>me

    This procedure sometimes results in strange-looking word divisions, particularly with the fused words. However, they do provide a ready means of comparison with the full forms, such as <w VVB>want <w TO0>to and <w VVB>give <w PNP>me.
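Since no space is introduced inside a contracted word, the original orthographic words can be recovered by grouping parts that follow one another without whitespace. The Python sketch below illustrates this under the assumption that the input uses only <w TAG> and <c TAG> elements; the function names and the sample sentence are invented for the example.

```python
import re

ELEMENT = re.compile(r"<([wc]) ([A-Z0-9-]+)>([^<]*)")


def orthographic_words(tagged):
    """Group tagged parts into orthographic words (lists of (tag, text) pairs)."""
    words, current = [], []
    for kind, tag, text in ELEMENT.findall(tagged):
        if kind == "c":                 # punctuation always ends the current word
            if current:
                words.append(current)
                current = []
            continue
        current.append((tag, text.strip()))
        if text[-1:].isspace():         # a following space closes the word
            words.append(current)
            current = []
    if current:                         # flush the final word
        words.append(current)
    return words


def surface(word):
    """Join the parts of one orthographic word back together."""
    return "".join(text for _, text in word)


sample = "<w PNP>I <w VM0>could<w VHI>'ve <w VVN>gone<c PUN>."
print([surface(word) for word in orthographic_words(sample)])
```

Grouping the parts rather than discarding the tags keeps both views available: the surface form could've and its analysis as VM0 plus VHI.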

    [ View list of contracted forms and associated tags ]

    Note that in the case of ain't it has been tricky to resolve the tag of the first part ( ai ) satisfactorily. Therefore in all contexts we have tagged this as an unclassified word, followed by the negative particle. Eg

    <w UNC>Ai<w XX0>n't got yours yet
    [KCT.1282]


    Appearance and tagging of multiwords

    The term `multiwords' denotes multiple-word combinations which function as one wordclass - for example, a complex preposition, an adverbial, or a foreign expression naturalised into English as a compound noun. To clarify which words form part of the multiword sequence, we highlight them in the guide in bold typeface, although in the BNC they will appear in normal typeface.

    <w AV0>of course (adverb)
    <w PRP>according to (preposition)
    <w NN1>persona non grata ('naturalised' compound noun)
    <w CJS>except that (conjunction)

    Because they function as one unit, only one tag is assigned to the multiword (at the beginning), and not to the individual component parts (contrast contracted forms, above). The spaces between the parts of the multiword are retained (again, compare contracted forms).
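Because a multiword keeps its internal spaces under a single tag, it can be spotted as a tagged element whose text contains a space. The sketch below assumes fully tagged text in which every word carries its own tag (unlike the reduced citations used in this guide); the function name and the sample sentence are invented.

```python
import re

ELEMENT = re.compile(r"<w ([A-Z0-9-]+)>([^<]*)")


def multiwords(tagged):
    """Return (tag, text) pairs whose text spans more than one word."""
    found = []
    for tag, text in ELEMENT.findall(tagged):
        text = text.strip()
        if " " in text:                 # internal space: one tag covers several words
            found.append((tag, text))
    return found


sample = "<w PNP>He <w VVD>left <w PRP>according to <w NN1>plan"
print(multiwords(sample))
```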

    [ View list of multiword forms and their associated tags ]

    Note that some multiwords can represent different categories according to context, e.g. in between in:

    The stage <w PRP>in between the original negative and the dupe is called an interpositive
    [FB8.295]

    The truth lies somewhere <w AV0>in between
    [ABK.2834]

    Moreover, sometimes it is more appropriate to tag a word combination as consisting of ordinary words than as a multiword sequence, as in the case of but for below:

    <w CJC>But <w PRP>for years now darkness has been growing
    [F99.2027]

    cf. which they would not have done <w PRP>but for the presence of the police.
    [H81.766]


    Words joined by the slash character

    Words which are joined together by a slash ( / ) but no whitespace, such as and/or, are not split up in tagged versions of the text.

    Examples

    A title <w CJC>and/or an author's name
    [H0S.358]

    You should be a graduate in <w AJ0>Electrical/Electronic Engineering, Physics , Mathematics , Computing or a related discipline .
    [CJU.1049]

    A time-space matrix for each <w UNC>rural/social/age group.
    [FR2.346]


    SECTION 2. INTRODUCTION TO WORD CLASSES



    NOUNS

    Basic tags:
    NN1, NN2, NN0 = common nouns
    NP0 = proper nouns

    Ambiguity tags:
    NN1-NP0, NN1-VVB, NN1-VVG, NN2-VVZ
    NP0-NN1, VVB-NN1, VVG-NN1, VVZ-NN2

    Common nouns

    Singular common nouns are tagged NN1, while plurals take NN2:

    A <w NN1>child.
    Several <w NN2>children
    An <w NN1>air of <w NN1>distinction
    Fifteen <w NN2>miles away

    Nouns that are 'neutral for number' take a tag ending in zero: NN0. They include nouns such as fish, which is morphologically invariant for number, and nouns such as government, which can take either a singular or a plural verb.

    Now the <w NN0>government is considering new warnings on steroids ...
    [K24.3057]

    ... the <w NN0>Government are putting people's lives in jeopardy.
    [A7W.518]

    I caught a <w NN0>fish.
    [KBW.316]

    I had caught four <w NN0>fish with hardly any effort
    [B0P.1387]

    We make no special distinction between common nouns that can be mass (or 'non-count') nouns (eg water, cheese), and other common nouns. All are tagged NN1 when singular and NN2 when plural:

    <w NN1>Cheese is a protein of high biological value.
    [ABB.1950]

    three <w NN2>cheeses.
    [CH6.7834]

    A <w NN1>car glistens in the <w NN1>distance.
    [HH0.1035]

    Three <w NN2>cars, two <w NN2>lorries and a <w NN1>motorbike!
    [CHR.290]

    Abbreviations

    In general we try to tag abbreviations for common nouns (and other word classes) as if they were written as full forms. Abbreviations for measurement nouns are generally tagged NN0 as they are invariant for number.

    Crewe are top of <w NN1>div 3 by 8 points
    [J1C.961]
    (where div = division)

    1 <w NN0>km
    400 <w NN0>km
    (km = 'kilometre' or 'kilometres')

    1 <w NN0>oz.
    6 <w NN0>oz
    (oz = 'ounce' or 'ounces')

    Numeral nouns

    Nouns such as hundred, hundreds, dozens, gross, are all tagged as numbers, CRD, rather than nouns.

     

    Proper nouns

    The tag NP0 ideally should denote any kind of proper noun, but in practice the open-endedness of naming expressions makes it difficult to capture all possible types consistently. We have confined its coverage mainly to personal and geographical names, and even within these, somewhat arbitrary borderlines have had to be drawn. Users of version 1 of the corpus should be aware of a few small but important changes in BNC2.

    (a) Personal names

    <w NP0>Sally
    <w NP0>Joe <w NP0>Bloggs
    <w NP0>Madame <w NP0>Pompadour
    <w NP0>Leonardo <w NP0>da <w NP0>Vinci

    (b) Geographical names

    <w NP0>London
    <w NP0>Lake <w NP0>Tanganyika
    <w NP0>New <w NP0>York

    (c) Also: days of the week; months of the year

    <w NP0>April
    <w NP0>Sunday

    Notes

    1. The distinction between singular and plural proper nouns is not indicated in the tagset, plural proper nouns being a comparative rarity.

      <w NP0>John <w NP0>Smith.
      All of the <w NP0>Smiths.

    2. Multiwords. As the examples in (a) and (b) above show, proper nouns are not processed as multiwords (even though there may be good linguistic reasons for doing so). Each word in such a sequence gets its own tag.

    3. Initials in names

      A person's initials preceding a surname are tagged NP0, just as the surname itself. The choice whether to use a space and/or full-stop between initials (eg J.F. or J. F. or J F or JF) is determined in the original source text; the tagged version follows the same format.

      John F. Kennedy = <w NP0>John <w NP0>F. <w NP0>Kennedy

      J. F. Kennedy = <w NP0>J. <w NP0>F. <w NP0>Kennedy

      J.F. Kennedy = <w NP0>J.F. <w NP0>Kennedy

      IMPORTANT NOTE: In the spoken part of the BNC, however, the components of names -- and, in fact, most words -- that are spelt aloud as individual letters, such as I B M, and J R in J R Hartley, are not tagged NP0 but ZZ0 (letter of the alphabet). See below

    4. Nouns of style

      Preceding a proper noun, or sequence of proper nouns, style (or title) nouns with initial capitals are tagged NP0.


      <w NP0>Pastor <w NP0>Tokes
      <w NP0>Chairman <w NP0>Mao
      <w NP0>Sub-Lieutenant <w NP0>R <w NP0>C <w NP0>V <w NP0>Wynn
      <w NP0>Sister <w NP0>Wendy

      Contrast: You remember your <w NN1>sister <w NP0>Wendy... [HGJ.800]
      where Wendy is in apposition to a common noun sister, in lowercase letters.

    5. Geographical names

      For names of towns, streets, countries and states, seas, oceans, lakes, rivers, mountains and other geographical placenames, the general rule is to tag as NP0. (If the word the precedes, it is tagged AT0, as normally.)

      <w NP0>East <w NP0>Timor
      <w NP0>South <w NP0>Carolina
      <w NP0>Baker <w NP0>Street
      <w NP0>West <w NP0>Harbour <w NP0>Lane
      <w AT0>the <w NP0>United <w NP0>States
      <w AT0>the <w NP0>United <w NP0>Kingdom
      <w AT0>the <w NP0>Baltic
      <w AT0>the <w NP0>Indian <w NP0>Ocean
      <w NP0>Mount <w NP0>St <w NP0>Helens
      <w AT0>the <w NP0>Alps

      Ordinary (non-NP0) tags are applied to more verbose (especially political) descriptions of placenames, or those that are not typically marked on maps. (As above, the preceding word the is optional.)

      <w AJ0>Latin <w NP0>America
      <w AJ0>Western <w NP0>Europe
      <w AT0>the <w AJ0>Western <w NN1>Region
      <w AT0>the <w AJ0>Soviet <w NN1>Union
      <w AT0>the <w NN0>People<w POS>'s <w NN1>Republic <w PRF>of <w NP0>China
      <w AT0>the <w AJ0>Dominican <w NN1>Republic
      <w AT0>the <w NN1>Sultanate <w PRF>of <w NP0>Oman

      The examples show a little arbitrariness in application, for example with United States counting, and Soviet Union not counting as proper nouns. (Also: <w AT0>the <w AJ0>ex-Soviet <w NN1>Union [KJS.28])

      NB. Multiple-word names containing a compass point, ie. those beginning North, South, East, West, North East, South-west etc. nearly always become NP0, whereas those with Northern, Southern, Eastern, Western follow the non-NP0 pattern. Rare exceptions are:

      <w NP0>Northern <w NP0>Ireland
      <w NP0>Western <w NP0>Samoa
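The compass-point rule of thumb can be written down as a small heuristic. The Python sketch below is only an approximation of the guideline: the text says the rule holds "nearly always", the AJ0 result for Northern, Southern, Eastern and Western is inferred from the examples above, and the function name is hypothetical.

```python
import re

# North, South, East, West, plus compounds such as South-west or Northeast.
COMPASS = re.compile(r"(North|South|East|West)(-?[Ee]ast|-?[Ww]est)?")
DERIVED = {"Northern", "Southern", "Eastern", "Western"}
EXCEPTIONS = {("Northern", "Ireland"), ("Western", "Samoa")}


def first_word_tag(name):
    """Guess the tag of the first word of a multiple-word placename."""
    words = name.split()
    if tuple(words[:2]) in EXCEPTIONS:
        return "NP0"                    # the rare exceptions listed in the guide
    if words[0] in DERIVED:
        return "AJ0"                    # non-NP0 pattern, as in Western Europe
    if COMPASS.fullmatch(words[0]):
        return "NP0"                    # compass point, as in South Carolina
    return None                         # the rule says nothing about other names


for name in ("South Carolina", "Western Europe", "Northern Ireland"):
    print(name, first_word_tag(name))
```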

    6. Non-personal and non-geographical names

      -- including eg names of organisations, sports teams, commercial products (incl newspapers), shops, restaurants, horses, ships etc.

      When such names consist of ordinary words (common nouns, adjectives etc.), they receive ordinary tags (NN1, AJ0 etc.)

      Where a word as part of a name is an existing NP0 (typically a personal or geographical name), or a specially-coined name, it is tagged NP0. Examples:

      1. Organisations, sports teams etc.
        Ordinary tags                                                           Tagged NP0
        <w NN1>Cable <w CJC>and <w NN1>Wireless                                 <w NP0>Procter <w CJC>and <w NP0>Gamble
        <w NN1>Acorn <w NN1>Marketing <w AJ0>Limited                            <w NP0>Minolta; <w NP0>IBM; <w NP0>NATO
        <w NP0>Wolverhampton <w NN2>Wanderers ( <w NN1>football <w NN1>club )   <w NP0>Tottenham <w NP0>Hotspur ( <w NN1>football <w NN1>club )
        <w AT0>The <w NP0>Chicago <w NN2>Bears                                  <w NP0>Spartak <w NP0>Moscow
        <w NN1>World <w NN1>Health <w NN1>Organisation                          <w NP0>Oxfam

        There is a slight inconsistency here, in that acronyms of organisation names (WHO, NATO, IBM etc.) take NP0, whereas the expanded forms of these names take regular tags.

      2. Products (including newspapers and magazines).
        Ordinary tags                                     Tagged NP0
        <w NN2>Windows <w NN1>software                    <w NP0>Weetabix
        <w NP0>Lancashire <w NN1>Evening <w NN1>Post      <w NP0>Mars <w NN2>bars
        <w NN1>Time <w NN1>Magazine                       <w NP0>Scotchgard
        <w AT0>The <w NN1>Reader<w POS>'s <w NN1>Digest   <w NP0>Perrier <w NN1>water

        Company names may sometimes be used to represent product names; in such cases the same tags apply. For example:

        John drives a <w NP0>Volkswagen <w NN1>Golf.

        John drives a <w NP0>Volkswagen.

      3. Shops, pubs, restaurants, hotels, horses, ships etc.
        Ordinary tags                                 Tagged NP0
        <w NN1>Body <w NN1>Shop                       <w NP0>Mothercare
        <w AT0>The <w AJ0>Grand <w NN1>Theatre        <w NP0>Sainsburys <w NN1>supermarket
        <w AT0>The <w NN1>King<w POS>'s <w NN2>Arms   <w AT0>The <w NP0>Ritz
        <w AJ0>Red <w NN1>Rum                         <w NP0>Aldaniti
        <w AT0>The <w NN1>Bounty                      <w AT0>The <w NP0>Titanic

        Here again NP0 is reserved for parts of names that are specially coined, or derived from existing personal/geographical proper nouns.

    7. Changes in NP0 assignment since BNC1

      In the first release of the BNC, the use of NP0 tags applied a little more widely. The geographical category tagged NP0 used to include names of buildings and other institutions. Names of newspapers and magazines used to be treated separately from other products and tagged NP0.

      NOTE THAT IN BNC2 BOTH THESE TYPES NOW TAKE ORDINARY (non-NP0) TAGS:

      • Buildings and institutions

        BNC1:
        <w NP0>Blackpool <w NP0>Tower
        <w NP0>Prospect <w NP0>Theatre <w NN0>Company
        <w NP0>Austro-Hungarian <w NP0>Empire

        BNC2:
        <w NP0>Blackpool <w NN1>Tower
        [B22.1633]
        <w NN1>Prospect <w NN1>Theatre <w NN1>Company
        [A06.1962]
        <w AJ0>Austro-Hungarian <w NN1>Empire
        [G3B.617]

      • Newspapers and magazines

        BNC1:
        <w AT0>the <w NP0>Daily <w NP0>Mail
        <w NP0>Railway <w NP0>Gazette

        BNC2:
        <w AT0>the <w AJ0>Daily <w NN1>Mail
        [D95.334]
        <w NN1>Railway <w NN1>Gazette
        [HWM.1860]



    VERBS

    Basic tags:
    VBB VBD VBG VBI VBN VBZ = forms of be
    VDB VDD VDG VDI VDN VDZ = forms of do
    VHB VHD VHG VHI VHN VHZ = forms of have
    VM0 = modal verbs
    VVB VVD VVG VVI VVN VVZ = lexical verbs

    Ambiguity tags:
    VVB-NN1 VVD-VVN VVD-AJ0 VVG-AJ0 VVG-NN1 VVZ-NN2 = verb more probable
    NN1-VVB VVN-VVD AJ0-VVD AJ0-VVG NN1-VVG NN2-VVZ = verb less probable

    1. Inflection is marked by the third character in the tag:

      --B base form finite

      --D past tense

      --Z 3rd person sing present

      --N past participle

      --I infinitive

      --G present participle

    2. All forms of BE, HAVE and DO receive tags beginning VB-, VH- and VD- respectively.
      Auxiliary and main uses of these verbs are not distinguished:

      she <w VBZ>is playing her best tennis for six years.
      [CH3.1383]

      she <w VBZ>is just a star.
      [CH3.6940]

      John <w VHZ>has built a set of bookshelves.
      [C9X.121]

      John <w VHZ>has great courage.
      [CA9.1941]

      We <w VDD>did<w XX0>n't see anybody.
      [KB2.702]

      They <w VDB>do nice work.
      [ANY.514]

      Note the variant form of have in non-standard English:

      they shouldn't <w VHI>of left it the last minute
      [KD8.7289]

      That could <w VHI>of been 'bout us
      [B38.322]
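The structure of the verb tags described above (second character for the verb, third character for the inflection) lends itself to a simple lookup. The Python sketch below is illustrative only; the dictionaries restate the guide's own lists, and the function name is hypothetical.

```python
VERB_CLASS = {"B": "BE", "D": "DO", "H": "HAVE", "V": "lexical verb"}
INFLECTION = {
    "B": "finite base form",
    "D": "past tense",
    "Z": "3rd person singular present",
    "N": "past participle",
    "I": "infinitive",
    "G": "present participle (-ing form)",
}


def describe_verb_tag(tag):
    """Describe a verb tag such as VBZ or VVN; return None for non-verb tags."""
    if tag == "VM0":                    # modals carry no inflection marking
        return ("modal", None)
    if len(tag) != 3 or tag[0] != "V":
        return None
    verb = VERB_CLASS.get(tag[1])
    inflection = INFLECTION.get(tag[2])
    if verb is None or inflection is None:
        return None
    return (verb, inflection)


for tag in ("VBZ", "VHI", "VVN", "VM0", "NN1"):
    print(tag, describe_verb_tag(tag))
```

Note that the lookup cannot say whether BE, HAVE or DO is used as an auxiliary or as a main verb, since the tagset itself does not record that distinction.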

    3. Lexical verbs

      Tags beginning VV- apply to all other (lexical) verbs.

      She <w VVZ>travels in every Saturday morning.
      [KRH.4013]

      The young kids <w VVB>want to <w VVI>dance and have fun
      [CHA.1600]

      I <w VVD>thought he <w VVD>looked a sad sort of a boy.
      [CDY.2831]

      ...after <w VVG>running out of coal, the crew were <w VVN>forced to <w VVI>burn timber and resin
      [HPS.270]

    4. Modals

      All modals are tagged VM0. We do not differentiate between so-called past and present forms:

      We <w VM0>can go there.

      We <w VM0>could go there.

      We <w VM0>used <w TO0>to go there every year.

      The form let's is treated as one verb:

      <w VM0>Let's <w VVI>go!
      [A61.1443]

    5. Contracted forms (can't, won't, gimme, dunno etc) are split into their component parts, which are tagged individually.

      <w VBB>Are<w XX0>n't you coming?
      [A0R.2215]

      I <w VDB>du<w XX0>n<w VVI>no
      [KR0.23]

      It is not always clear if and where they should be divided. Please refer to the list of contracted forms. (See also above on appearance of contracted forms)

    6. Note, in addition, that no special tags apply for the following:

    See further - Disambiguation Guide
    Section 3 Adjective vs. Participle (AJ0 vs. VVG and AJ0 vs. VVN)
    Section 4 's



    ADJECTIVES

    Basic tags:
    AJ0 AJC AJS
    Ambiguity tags:
    AJ0-NN1, AJ0-VVG, AJ0-VVN = adjective more probable
    NN1-AJ0, VVG-AJ0, VVN-AJ0 = adjective less probable

    1. General adjectives (AJ0)

      AJ0 is the general tag for adjectives. It subsumes:

    2. Comparative adjectives receive the tag AJC; superlatives take AJS.

      A <w AJC>faster car.

      The <w AJS>best in its class.

    Ambiguities frequently arise between adjectives and other wordclasses, in particular adverbs, nouns and participles.

    See further - Disambiguation Guide
    Section 3 ADJECTIVE vs. PARTICIPLE (AJ0 vs. VVG, AJ0 vs. VVN)
    Section 3 ADJECTIVE vs. NOUN (AJ0 vs. NN1)
    Section 3 ADJECTIVE vs. ADVERB (AJ0 vs. AV0, AJC vs. AV0)
    Section 4 well, right



    ADVERBS

    Basic tags:
    AV0, AVQ, AVP
    Ambiguity tags:
    AV0-AJ0 = adverb more probable; AJ0-AV0 = adverb less probable

    1. AV0 is the default tag for adverbs. It incorporates a very mixed bag, including
    2. Note that adverbs, unlike adjectives, are not tagged as positive, comparative, or superlative. This is because of the relative rarity of comparative and superlative adverbs.
    3. Ordinal-type adverbs are treated separately with the ORD tag
    4. Prepositional Adverb (also known as "Adverbial Particle") AVP - see Prepositions
    5. Interrogative and relative wh-adverbs (when, where, how, why, wherever)

      The same tag, AVQ, is applied to these adverbs, whether the word occurs in interrogative or relative use.

      "<w AVQ>When do your courses start?"
      [A0F.3117]

      "...if you let me know <w AVQ>when the police are called in."
      [BMU.2291]

      Yet <w AVQ>why is that so?
      [CR7.3089]

    See further - Disambiguation Guide
    Section 3 ADVERB vs. ADJECTIVE (AV0 vs. AJ0, AV0 vs. AJC )
    Section 3 DETERMINER-PRONOUN vs. ADVERB
    Section 3 ADVERB vs. PREPOSITION
    Section 4 about, as, but, like, little, much, no, right, so, well, when



    ARTICLES, DETERMINERS & PRONOUNS

    Basic tags:
    AT0 = article
    DPS DT0 DTQ = determiner-pronoun
    PNP PNI PNQ PNX = pronoun only
    Ambiguity tags: none

    1. Articles, tagged AT0

      Articles are defined here as determiner words which typically begin a noun phrase, but which cannot occur as the head of a noun phrase. Examples: a/an, the, no and every.

      Have <w AT0>a break

      <w AT0>Every year

      There's <w AT0>no time

    2. Determiner-Pronoun: DT0

      Recognising that there is a high degree of formal and functional overlap between determiners and pronouns, we have conflated under the D-- heading words that are capable of either function, such as that, few, both, another. Examples:

      at <w DT0>all times of the day
      [A7P.1196]

      free secondary education for <w DT0>all
      [ECB.1610]

      <w DT0>Few diseases are incurable
      [GV1.1130]

      for the benefit of the <w DT0>few
      [HHX.10188]

      DTQ is the wh- (interrogative) determiner-pronoun (and also relative pronoun - see below). Which and what are always tagged DTQ:

      <w DTQ>Which country do you live in?
      [A7N.979]

      And she didn't say <w DTQ>which?
      [KCF.352]

      <w DTQ>What time is it?
      [A0N.406]

      DPS is the prenominal possessive pronoun (my, your, etc). Eg

      <w DPS>my hat

      Compare the nominal use:

      That is your way. This is <w PNP>mine
      [A0N.726-7]

      [ View list of Determiner-Pronoun tagged words and compounds. ]

    3. `Pronoun-only' words

      Tags beginning P-- indicate pronouns which do not share the determiner function, for example I, it, anyone. Pronouns are differentiated according to whether they are:

      [ View list of Pronoun-only words and compounds ]

    4. Relative pronouns

      Which as a relative (or interrogative) pronoun is grouped with the other determiner-pronouns, and tagged DTQ.

      Give 4 details <w DTQ>which should appear on an order form
      [HBP.417]

      Meanwhile, that as a relative clause complementizer is treated with that as a complement clause complementizer, and tagged CJT

      I got some currants <w CJT>that are left over
      [KST.3734]

      this girl <w CJT>that Claire knows
      [KC7.1101]

      He dismissed reports <w CJT>that his party was divided over tactics
      [A28.11]

      We both knew <w CJT>that enough was enough.
      [FEX.268]

      Note, however, that that takes the tag DT0 when it functions as a demonstrative pronoun or determiner:

      Look at <w DT0>that bear!
      [KP8.1547]

      I guess I was sad about <w DT0>that.
      [BMM.239]

    For D-- tagged words, the main source of ambiguity is between determiners and adverbs. See Disambiguation Guide
    Section 3: DT0 vs. AV0 (illustrated by more and less) and
    Section 4: much; no; that



    PREPOSITIONS AND PREPOSITIONAL ADVERBS

    Basic tags:
    PRP PRF AVP
    Ambiguity tags:
    PRP-AVP = Prep more probable; AVP-PRP = Prep less probable

    1. Prepositions

      Most prepositions are tagged PRP, including a large number of complex prepositions (shown in bold here). Examples:

      <w PRP>at the Pompidou Centre <w PRP>in Paris
      [A04.325]

      I use humour <w PRP>as a protection
      [FBL.363]

      Heard <w PRP>about this have you?
      [KE6.9557]

      <w PRP>According to ancient tradition, ...
      [A04.784]

      Many disputes are dealt with by bodies <w PRP>other than courts.
      [F9B.4]

      Nice walls and a big sky to look <w PRP>at.
      [A25.122]

    2. The preposition of is assigned a special tag PRF because of its frequency and its almost exclusively postnominal function. Examples:

      a couple <w PRF>of cans <w PRF>of Coke
      [AJN.283]

      DNA consists <w PRF>of a string <w PRF>of four kinds <w PRF>of bases
      [AE7.107]

      NB. Numerous multiwords contain of, eg in front of, in light of, by means of, etc.

      [ View list of Preposition-tagged words and compounds ]

    3. Prepositional adverbs/particles

      Preposition-type words which have no complement are tagged AVP. Typical uses of AVP are in phrasal verb constructions, or when it functions as a place adjunct, e.g.

      We gave <w AVP>up after two hours.
      [KSV.1029]

      there were a lot of horses <w AVP>around.
      [HR7.3105]

      The following is a list of possible AVP words:

      'bout      about      along      around      back      by      down      in      off
      on         out        over       round       through   thru    to        under   up

      Of the above list, all except back allow also a prepositional reading. Thus there are many instances of ambiguity between PRP and AVP. See further - Disambiguation Guide:

    Section 3 Preposition vs. Prepositional Adverb vs. Locative Adverb (PRP vs. AVP vs. AV0)
    Section 4 but, about
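
    The list of possible AVP words, and the remark that only back lacks a prepositional reading, can be written as a small lookup. The sketch below is an illustration under stated assumptions: the names AVP_WORDS and candidate_tags are hypothetical, and only the PRP/AVP contrast is modelled (other readings, such as AV0 for about or TO0 for to, are ignored).

```python
# Word list copied from the manual; 'back' is the only item with no
# prepositional reading.
AVP_WORDS = {
    "'bout", "about", "along", "around", "back", "by", "down", "in", "off",
    "on", "out", "over", "round", "through", "thru", "to", "under", "up",
}


def candidate_tags(word):
    """Return the PRP/AVP readings the manual allows for a listed word.

    Words outside the list return an empty set, meaning "not covered
    by this sketch" (they may still be ordinary prepositions).
    """
    word = word.lower()
    if word not in AVP_WORDS:
        return set()
    return {"AVP"} if word == "back" else {"AVP", "PRP"}
```

    For example, candidate_tags("back") gives {"AVP"}, whereas candidate_tags("up") gives both AVP and PRP, which is exactly the situation in which an ambiguity tag may be needed.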



    CONJUNCTIONS

    Basic tags: CJC CJS CJT
    Ambiguity tags: CJS-PRP PRP-CJS

    The tag CJC (and, or, but, nor) denotes coordinators.

      Fish <w CJC>and chips

      James laughed <w CJC>and spilled wine.
      [A0N.136]

      She was paralysed <w CJC>but she could still feel the pain.
      [FLY.536]

    CJS denotes subordinators (e.g. although, when).

    CJT applies to that-clauses, introducing reported speech and thought, and also relative clauses:

      Historians knew <w CJT>that this was nonsense.
      [G3C.363]

      China announced <w CJT>that it was ending martial law in the Tibetan capital Lhasa .
      [KRU.95]

      The problem <w CJT>that he was having was <w CJT>that she was his legal wife 's sister
      [HE3.210]

      [ View list of Conjunction words and compounds ]

    See further - Disambiguation Guide
    Section 4 as, so, that



    NUMERALS

    Basic tags:
    CRD ORD
    Ambiguity tags:
    CRD-PNI, PNI-CRD

    Cardinal numbers, numeral nouns, fractions and so on take the tag CRD, whether they are written as words or numerals, and whether functioning nominally or prenominally. Examples:

      <w CRD>5 out of <w CRD>10
      [CGM.525]

      <w CRD>one striking feature of the years <w CRD>1929-31
      [A6G.134]

      his <w ORD>first innings, when he scored <w CRD>forty-two, with <w CRD>seven <w CRD>fours

      <w CRD>Hundreds of people audition each year
      [K1S.2241]

      About a <w CRD>dozen.
      [H2U.5182]

    Ordinal numbers are assigned ORD in all syntactic positions, including adverbial positions, as in

      We only came <w ORD>fourth in the county championship <w ORD>last year
      [EDT.1629]

      NOTE: ORD is also assigned to the less overtly numeric words like next and last, even in clear adverbial, adjectival or nominal contexts. This is because next and last function like ordinals both syntactically and semantically.

    Currency and measurement expressions

    Measurement expressions, consisting of a number and a unit of measurement of some kind (together as one word), are assigned a noun tag, usually NN0 (neutral for number) or NN2 (plural):

      <w NN0>6kg

      <w NN0>&pound;600

      <w NN0>12.5%

      <w NN2>12&ins; ( = 12 inches)

    Other mixtures of numeric and alphabetic characters are assigned UNC (formulaic) tags:

      Figure <w UNC>2b
      [FTC.248]

      Serial no. <w UNC>S835508
      [C9H.2284]

      <w UNC>A4 sheet of paper
      [CN4.296]

      Mark drove home along the <w UNC>M1
      [AC2.2210]
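
    The rules for tokens containing digits can be approximated with a few regular expressions. The following Python sketch is an illustration only, not the tagger's actual procedure. The unit inventory UNITS and the currency prefixes are assumptions chosen to cover the examples above, and plural units tagged NN2 are deliberately not handled.

```python
import re

# Hypothetical, deliberately tiny inventories; the manual does not list them.
UNITS = ("kg", "%")
CURRENCY_PREFIXES = ("&pound;", "£", "$")

NUMBER = r"\d+(?:\.\d+)?"  # 600, 12.5


def tag_numeric_token(token):
    """Return CRD, ORD, NN0 or UNC for a token containing a digit."""
    if not any(ch.isdigit() for ch in token):
        raise ValueError("token contains no digit")
    # Ordinals written with digits, e.g. 77th.
    if re.fullmatch(r"\d+(?:st|nd|rd|th)", token):
        return "ORD"
    # Plain numerals and simple ranges, e.g. 3609, 1929-31.
    if re.fullmatch(rf"{NUMBER}(?:-\d+)?", token):
        return "CRD"
    # Currency written as one word with the amount, e.g. &pound;600.
    for prefix in CURRENCY_PREFIXES:
        if token.startswith(prefix) and re.fullmatch(NUMBER, token[len(prefix):]):
            return "NN0"
    # Number plus unit written as one word, e.g. 6kg, 12.5%.
    for unit in UNITS:
        if token.endswith(unit) and re.fullmatch(NUMBER, token[:-len(unit)]):
            return "NN0"
    # Any other mixture of digits and letters is treated as formulaic.
    return "UNC"
```

    With these assumptions, 6kg, &pound;600 and 12.5% come out as NN0, while 2b, S835508, A4 and M1 fall through to UNC.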

    The main ambiguity in this category is between one functioning as a cardinal number (CRD) and as a pronoun (PNI).



    MISCELLANEOUS OTHER TAGS

    The following tags are included here: EX0 | ITJ | POS | TO0 | XX0 | ZZ0 | UNC

    EX0 = existential there

      In its existential use there does not carry any real meaning: it merely states that something exists or existed. It occurs at the beginning of a clause and is usually followed by the verb be and an indefinite noun phrase; for example

      <w EX0>There was a long long pause in which nothing at all happened
      [H80.3991]

      Waiter! Waiter! <w EX0>There's an awful film on my soup!
      [CHR.657-9]

      <w EX0>There appears to be little alternative
      [ECE.2139]

      Compare this with there when it has a clear locative meaning ('in/to that place'):

      Don't stand <w AV0>there grinning like a stuck pig
      [C85.1553]

    ITJ = interjection.

      <w ITJ>Hello, Nell.

      <w ITJ>Oi - come here!

      <w ITJ>Yes , <w AV0>please do

      <w ITJ>No <w XX0>not <w AV0>yet

      For the distinction between ITJ and the unclassified tag, UNC, see section 3: INTERJECTION vs. UNCLASSIFIED.

    POS = genitive morpheme 's (singular) or ' (plural after an s), e.g.

      <w NN1>teacher<w POS>'s pet
      <w NN2>teachers<w POS>' pet

      Note the lack of space between the noun and the following POS, as 's is tokenized in the same way whether it represents a genitive or a contracted verb. See further on tagging of 's in Section 4.
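
    Because each tag is attached directly to the wordform that follows it, the citation notation used in this guide can be read mechanically. The sketch below is an assumption-laden illustration of that: the names TAGGED_WORD and tagged_words are hypothetical, a "word" is taken to be everything up to the next space or the next tag, and multiword units (e.g. according to) and trailing punctuation are not treated specially.

```python
import re

# One tagged word: '<w TAG>' followed by everything up to the next
# whitespace character or the next '<'.
TAGGED_WORD = re.compile(r"<w ([A-Z0-9-]+)>([^\s<]+)")


def tagged_words(citation):
    """Return (tag, wordform) pairs for the tagged words in a citation."""
    return [(m.group(1), m.group(2)) for m in TAGGED_WORD.finditer(citation)]


print(tagged_words("<w NN1>teacher<w POS>'s pet"))
# [('NN1', 'teacher'), ('POS', "'s")]
```

    Untagged words in a citation (here pet) are simply skipped, since the guide tags only the words under discussion.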

    TO0 = the infinitive marker. This includes elliptical uses.

      "Do you want <w TO0>to talk about it?"
      [EFG.1935]

      In the summer holidays I can , I can get up early if I want <w TO0>to .
      [KPG.4204]

      Note the morphological variation of to in the following colloquial forms:

      We <w VVN>got<w TO0>ta go

      We <w VVB>wan<w TO0>na stay.

    UNC is the tag for unclassified words. It is applied in contexts where no other wordclass tag seems appropriate.

      For the distinction between UNC and ITJ see section 3, INTERJECTION vs. UNCLASSIFIED.
      See also features of spoken corpus tagging.

    XX0 is the tag for the negative particle not, and also for its contracted or fused form, e.g.

      Brown <w VDD>did<w XX0>n't see it that way.
      [A6W.338]

      no, that is <w XX0>not correct.
      [JK0.257]

    ZZ0 = letter of the alphabet: A, X, x, p, r

      ZZ0 vs. NP0 vs. CRD.
      ZZ0 is the default tag for a single letter of the alphabet.

      If, however, the letter clearly represents a separate word, or an abbreviation of a separate word, we have tried to assign the appropriate POS-tag for the full form of that word, rather than ZZ0.

      Examples:



    DISAMBIGUATION GUIDE

    The following is a guide to the resolution of the most common tagging ambiguities. It states the principles by which we have drawn the line between the "correct" and the "incorrect" assignment of a tag in particular contexts (as applied in the report on tagging error rates). Note that in the next section and in section 4, Disambiguation by word, we also cite examples where the POS-tagging in the corpus is less reliable and does not match the tag given in the citation. In such cases we append the actual corpus tag, marked with an asterisk, to the file reference. E.g. under Adjective vs. Adverb (next section), the preferred tag for long is AV0, but the actual tag is the ambiguity tag AV0-AJ0:

    You're not supposed to keep medicine that <w AV0>long.
    [H8Y.1976 *AV0-AJ0]
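
    The bracketed file references, including the optional asterisked corpus tag, follow a regular enough pattern to be parsed automatically. The Python sketch below is illustrative only. The names Reference and parse_reference are hypothetical, and reading the part after the dot as a sentence number (or range) is an assumption. A word label before the asterisk, as in [ARF.183 no: *AT0], is accepted and ignored.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    file_id: str               # e.g. "H8Y"
    sentence: str              # e.g. "1976", or a range such as "657-9"
    corpus_tag: Optional[str]  # tag actually found in the corpus, if cited


REFERENCE = re.compile(
    r"\[(\w+)\.([\d-]+)(?:\s+(?:\w+:\s*)?\*([A-Z0-9-]+))?\]"
)


def parse_reference(text):
    """Parse a reference such as '[H8Y.1976 *AV0-AJ0]'."""
    match = REFERENCE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a file reference: {text!r}")
    return Reference(*match.groups())


print(parse_reference("[H8Y.1976 *AV0-AJ0]"))
# Reference(file_id='H8Y', sentence='1976', corpus_tag='AV0-AJ0')
```

    A reference without an asterisk yields corpus_tag=None, meaning that the corpus tag agrees with the tag shown in the citation.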

    Note also that in this section we use a number of invented examples (in addition to corpus citations) to clarify the distinction between categories.

    SECTION 3. DISAMBIGUATION BY TAG PAIR

    Contents | Tagset | Intro to Wordclasses | Adjective vs. Adverb | Adjective vs. Noun | Determiner-Pronoun vs. Adverb | Adjective vs. Participle | Prep vs. Prep Adverb | Interjection vs. Unclassified | Disambiguate words

    ADJECTIVE vs. ADVERB

    After a verb or an object, there is sometimes a difficult choice between AJ0 and AV0, or between AJC and AV0. e.g.:

    We arrived <w AJ0>tired, but <w AJ0>safe
    [CCP.530]

    Here, both tired and safe are AJ0. The main test is to see whether one can express the relation between these words and their logical subjects using the verb be: We arrived tired, but safe implies 'We were tired but safe'. The word tagged AJ0 refers to a property of a noun, rather than to a property of an event or situation. Contrast:

    Peter sang out <w AV0>loud and <w AV0>clear.

    This sentence does not imply that Peter was loud and clear, but is more or less equivalent to Peter sang out loudly and clearly. It means that his singing was loud and clear.

    It follows that when, in colloquial English, a word which we normally expect to be an adjective is used as an adverb, we should tag it AV0; e.g:

    You did <w AV0>great though.
    [HH0.3248 *AV0-AJ0]

    Here is another pair of examples, where the AJ0/AV0 word follows an object:

    everyone below 25 grew their hair too <w AJ0>long.
    [ARP.592 *AV0-AJ0]
    (i.e. 'their hair was too long'.)

    Try not to keep her too <w AV0>long.
    [FAB.3620 *AV0-AJ0]
    (i.e. NOT 'she will be too long.')

    Also note the similar distinction between AJC and AV0:

    We can make this piece <w AJC>higher if you want to.
    [BNG.2270]

    You should aim <w AV0>higher
    [ACN.985 *AJC]

    and between AJS and AV0:

    Delgard thought it <w AJS>best to leave the subject alone.
    [FS4.1559]

    BUT: I liked the cartoons <w AV0>best.
    [CAM.194]



    ADJECTIVE vs. NOUN

    There are many words in English which can be tagged either adjective (AJ0) or noun (NN1). Colour words like black, white and red are fairly consistent in allowing the two tags, and may be used to illustrate the difference. In attributive (premodifying) or predicative (complementing) positions without further modification these words are normally adjectives: a <w AJ0>white screen, The screen is <w AJ0>white. When the word is the head of a noun phrase, on the other hand, it is a noun: <w NN1>Red is my favourite colour. They painted the wall a brilliant <w NN1>white.

    Sometimes a word cannot be used predicatively as an adjective, but can occur attributively in a way which suggests adjectival use. For example, past and present are adjectives in

    (1) All <w AJ0>past and <w AJ0>present employees of the branch are invited.
    [K99.217]

    We do not find present, etc. being used as predicative adjectives, however:

    (2) *These needs are past, present, and future.

    (Note that present can be used as a predicative adjective meaning the opposite of 'absent'; but this meaning is not comparable to the temporal meanings of past, present and future above.)

    Contrast (1) above with cases where past, present etc. are heads of noun phrases, e.g. following the definite article, and are clearly nouns:

    "You're living in the <w NN1>past."
    [HGS.1045]

    "I don't even want to think about the <w NN1>future."
    [JY4.2864]

    The only reason for treating past and present in example (1) above as adjectives is that they have an institutionalized meaning as modifiers, which is rather different from the meaning they have as nouns. Further examples of this type are words such as model in model behaviour, giant in a giant caterpillar and vintage in vintage cars.

    Words ending in -ing are a particular problem: when they premodify a noun, they can be tagged either NN1 (noun) or AJ0 (adjective). Contrast:

    new <w NN1>spending plans
    [CEN.5922]

    a <w AJ0>working mother
    [ED4.153]

    his <w NN1>reading ability
    [CFV.1903]

    in the <w AJ0>coming weeks
    [HKU.1341]

    The guideline is as follows. If X-ing + Noun is equivalent in meaning to 'Noun who/which X-es / X-ed / BE + X-ing', then X-ing is an adjective (AJ0). That is, a word ending in -ing is an adjective when the noun it premodifies is its notional subject. E.g.:

    two <w AJ0>smiling children ('two children who are/were smiling')
    [HTT.743]

    In other cases, X-ing is generally a noun (NN1). In such cases, it is often possible to paraphrase X-ing + Noun by a more explicit phrase in which X-ing is clearly a noun:

    new <w NN1>spending plans ('new plans for spending')

    his <w NN1>reading ability ('his ability in reading')

    Further examples:

    a <w AJ0>mating animal
    [G08.2142]

    the <w NN1>mating game
    [ECG.336 *AJ0-NN1]

    a <w AJ0>falling rate of unemployment
    [KR2.2129]

    <w NN1>slimming tablets.
    [KCA.941 *NN1-VVG]



    DETERMINER-PRONOUN vs. ADVERB (DT0 vs. AV0)

    More and less can be assigned to either of the tags DT0 or AV0. The difference between them is that DT0 is for noun-phrase-like (and determiner-like) uses of the word in question, whereas AV0 is for adverbial uses. The two can be hard to distinguish, particularly after a verb:

    (a) You should relax <w AV0>more.

    (b) You should spend <w DT0>more.

    Since relax is an intransitive verb in (a), more cannot be a noun phrase following it. Instead, more can be paraphrased roughly as 'to a greater extent' or 'to a greater degree'. On the other hand, spend in (b) is a transitive verb, and so more is a determiner-pronoun form following it. As confirmation of this, note that sentence (b) could be turned into a passive with more as subject: More should be spent.... There are unfortunately some verbs for which the distinction is less clear than in the above examples, e.g.:

    You should eat more. You should read more. You should smoke less.

    In these cases, the verb may be used transitively or intransitively with almost identical meanings, so that the syntactic structures of the immediate and/or surrounding context are the only clues as to which is the case:

    Do you smoke? (Intransitive)

    How many do you smoke in a week? (Transitive)

    Contrast (c) and (d) below:

    (c) At the moment we have 23 fixtures per season. Personally, I would rather play <w DT0>more.

    (d) You should work less and play <w AV0>more.

    (In (d) the adverb more has roughly the meaning of 'more often'.)

    Note. The automatic disambiguation of determiners and adverbs is not reliable, because transitivity has not been encoded in the tagger. Sentences like (c) and (d), where more follows the verb at the end of a sentence, are invariably tagged AV0.
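
    To show what encoding transitivity might look like, here is a minimal Python sketch. It is not how the BNC tagger works. The two lexicons contain only the verbs used in examples (a) and (b), and all names are hypothetical. For verbs that can be used either way, the function declines to decide, since only the wider context can settle the matter.

```python
# Tiny illustrative lexicons taken from examples (a) and (b); a real
# transitivity lexicon would be far larger. Both names are hypothetical.
TRANSITIVE_ONLY = {"spend"}
INTRANSITIVE_ONLY = {"relax"}


def tag_more_after_verb(verb):
    """Tag 'more'/'less' directly following a verb, or return None.

    None means the verb can be used transitively or intransitively
    (eat, read, smoke, play ...), so the lexicon cannot decide.
    """
    verb = verb.lower()
    if verb in TRANSITIVE_ONLY:
        return "DT0"   # 'more' fills the object slot
    if verb in INTRANSITIVE_ONLY:
        return "AV0"   # no object slot, so 'more' is adverbial
    return None
```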



    ADJECTIVE vs. PARTICIPLE (AJ0 vs. VVG and AJ0 vs. VVN)

    Another area of borderline cases is the tagging of words as adjectives (AJ0) or as participles (VVG or VVN).

    1. In both cases, the word can be an AJ0. One test is to see whether a degree adverb like very can be inserted in front of the word: e.g. in We were very surprised, surprised is an AJ0.

    2. Another test, having the opposite effect, is to see whether there is an agent by-phrase following the word in -ed or -en. If so it is a VVN: e.g. We were <w VVN>surprised by pirates. Even where it is not present, the possibility of adding the by-phrase, without changing the meaning of the word, is evidence in favour of VVN. (However, this criterion can clash with the preceding one - since it occasionally happens that an -ed word is both preceded by an adverb like very AND followed by a by-phrase: E.g. I was so irritated by his behaviour that I put the phone down. When these do occur, we give preference to AJ0.)

    3. A third test is negative: to see whether the word in question can be placed before a noun. e.g.:

      The effect is <w AJ0>lasting (compare a <w AJ0>lasting effect).

      The door is <w AJ0>locked (compare the <w AJ0>locked door.)

      This shows that lasting or locked can easily be (but need not be) an AJ0. If the word could not be placed (with the same meaning) before the noun, this would be evidence that the word is a participle.

    4. Even though an -ing word is normally a VVG after the verb be, it is generally treated as an AJ0 before a noun:

      The man was <w VVG>dying.
      [HTM.1494 *VVG-AJ0]

      BUT: the <w AJ0>dying man.
      [FSH.606]

    5. However, when the -ing or -ed forms part of a premodifying phrase, the VVG or VVN tag is preferred:

      an <w NN1>interest <w VVG>earning account

      a <w NN1>hypothesis <w VVN>driven approach

      In these examples the NN1+VVG/VVN sequence has the character of a premodifying adjective compound. We can therefore imagine the two words bracketed together forming an adjective: an [AJ0 interest-earning AJ0] account. But within the adjective, the VVG and VVN tags retain their verbal character, with the initial noun acting as object of the verb (cf. the account earns interest).

      The same applies when the premodifying compound phrase is noun-like:

      a [ <w NN1>shanty <w VVG>singing ] competition
      [K4W.2952]

    6. If the verb be can be replaced by another verb such as seem or become, without changing the meaning of the following AJ0 / VVN word, this is a strong indication that the construction is not properly a passive, and that the word is an AJ0:

      The building was <w AJ0>infested with cockroaches

      (cf.: The building seemed/became infested with cockroaches)

    7. A further test, applicable to 'event' verbs, is that the AJ0 refers to a 'resultant state', whereas the VVN refers to an event:

      Bill was <w AJ0>married. (i.e. he was not single)

      Bill was <w VVN>married to Sarah on the 15th May. (i.e. the actual event)

      This is a manifestation of the general semantic character of adjectives (which typically refer to states or qualities) and verbs (which typically refer to events or actions).

      However, this criterion is not definitive, as VVG and VVN can also sometimes refer to states, when the meaning of the verb is stative:

      She is not <w VVN>disturbed by that sort of threat.

      The tourists were <w VVG>standing around a map of the city.

    8. Finally, here is a test which clearly identifies an -ing form as a verb.

      A verb can take following complements such as a noun phrase, an adjective or an adverbial. These cannot follow the same word when it is an adjective. E.g.:

      Are you <w VVG>expecting someone?

      The arithmetic is <w VVG>looking good.

      <w VVG>Turning suddenly, she ran for the safety of the car

      Contrast:

      His manner was <w AJ0>insulting.

      where insulting could not normally be followed by an object:

      * insulting us.
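
    Tests 1 and 2 above, together with the stated preference for AJ0 when they clash, can be summarised as a small decision function. This Python sketch is only an illustration: the function name is hypothetical, and both arguments are judgements supplied by a human annotator, not something the code works out for itself.

```python
def tag_ed_form(accepts_degree_adverb, has_agent_by_phrase):
    """Apply tests 1 and 2 to an -ed/-en word after the verb 'be'.

    accepts_degree_adverb: 'very' (or similar) can precede the word.
    has_agent_by_phrase:   an agent by-phrase follows or could be added.
    Returns "AJ0", "VVN", or None when neither test gives evidence.
    """
    if accepts_degree_adverb:
        # Covers the clash case too: AJ0 is preferred when both hold.
        return "AJ0"
    if has_agent_by_phrase:
        return "VVN"
    return None
```

    When neither test applies, the remaining tests in the list (position before a noun, replacement of be, state vs. event) still have to be consulted.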



    PREPOSITION vs. PREPOSITIONAL ADVERB vs. GENERAL ADVERB
    (PRP vs. AV0, and PRP vs. AVP)

    This kind of ambiguity occurs frequently, particularly in spoken texts. Compare:

    (a) She ran <w PRP>down the hill.

    (b) She ran <w AVP>down her best friends.

    In (a), down is a preposition, because:

    In (b), down is an adverbial particle because:

    Notice that the syntactic distinction between (for example) down as an adverbial particle and down as a preposition is independent of the semantic distinction between locative and non-locative interpretations of down.

    When the verb is simply followed by down or out, etc., without a following noun phrase, it is normally an AVP:

    Income tax is coming <w AVP>down.

    The decorations are put <w AVP>up on Christmas Eve.

    However, it is important to recognize 'stranded' prepositions, which have been deprived of the company of their noun phrase, the prepositional complement, because it has been fronted or omitted through ellipsis (e.g. in relative clauses, with passives, in questions, etc.):

    This is the hill (which) she ran <w PRP>down.
    (Cf. This is the hill down which she ran.)

    The poor were looked down <w PRP>on by the rich.
    (Here on is the stranded preposition)

    Which car did she arrive <w PRP>in?

    The same tests apply to words which are tagged either as prepositions or as general adverbs (AV0), such as across, past and behind.

    Note, additionally, the use of about as a degree adverb.



    INTERJECTION vs. UNCLASSIFIED (ITJ vs. UNC)

    The borderline between interjections or exclamatory particles (tagged ITJ) and unclassified 'noise' words (tagged UNC) is drawn as follows:

    ITJ is used for 'institutionalized' interjections or discourse particles such as

    good-bye, oh no, oops, hallelujah, whoa, wow

    Well, right and like functioning as discourse markers are tagged AV0.

    UNC is used in contexts where no other wordclass tag seems appropriate:



    SECTION 4 : DISAMBIGUATION BY WORD

    In this section we discuss some common words which belong to more than one word class, and are among the most problematic for disambiguation. As in section 3, if the tag stated in the example differs from the actual tag in the corpus, we append the latter to the file reference number in the next line. Eg *AV0 in

    Tears <w VVB>well up in my eyes.
    [BN3.5 *AV0]

    The words covered are:

    apostrophe 'S | ABOUT | AS | BUT | HOME | LIKE | LITTLE | MUCH | MORE | NO | ONE | RIGHT | SO | THAT | THEN | TO | WELL | WHEN | WORTH |

    Contents | Tagset | Intro to Wordclasses | Disambiguate tag pairs


    apostrophe 'S

    Choice of tags: VBZ VHZ VDZ POS
    [in fused words: VM0 ZZ0 CRD ]

    In the BNC the apostrophe 's is generally tagged as a separate wordform (that is <w TAG>'s ), attached without a space to the immediately preceding word.

    1. Contracted forms. When 's represents a shortened form of is, has or (rarely) does, it has the appropriate verb tag. Occasionally, for example with auxiliaries followed by past participles, it is difficult to determine what the full form of the verb should be. Examples:

      <w DT0>That<w VBZ>'s perfect is that one... (= That is...)
      [KCX.1254]

      <w PNP>She<w VHZ>'s got tickets. (= She has...)
      [KPV.6481]

      well, <w DTQ>what<w VDZ>'s he do?, is he a plumber? (= What does...)
      [KD6.310]

    2. Genitives

      <w NP0>Britain<w POS>'s small businesses
      [HMH.67]

      After <w AV0>today<w POS>'s announcement
      [K6F.39]

    3. However, when 's acts as a marker of the -s plural, or as part of the verb form let's, it is part of a single word, and is not assigned its own tag. E.g.:

      success in the three <w ZZ0>R's
      [EVY.59]

      in the <w CRD>1980's
      [HJ1.22024]

      <w VM0>Let's <w VVI>go.
      [A61.1443]

      Note that let's is not considered a contraction of let us, but is treated as a single 'verbal particle', tagged VM0, on the grounds that it is closely analogous to modal auxiliaries.
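
    For the cases where 's is a separate wordform, the choice of tag depends only on what the 's stands for. The following Python sketch makes that mapping explicit. It is an illustration with hypothetical names, and it assumes the annotator has already decided which full form, if any, the 's abbreviates. The fused cases (R's, 1980's, Let's) are outside its scope, because there 's receives no tag of its own.

```python
# Full verb form -> tag for the contracted 's.
CONTRACTION_TAGS = {"is": "VBZ", "has": "VHZ", "does": "VDZ"}


def tag_apostrophe_s(full_form=None):
    """Return the tag for a separately tokenized 's.

    full_form: the verb that 's abbreviates ("is", "has" or "does"),
               or None when 's is the genitive marker.
    """
    if full_form is None:
        return "POS"
    return CONTRACTION_TAGS[full_form.lower()]
```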



    ABOUT

    Choice of tags: PRP, AV0 and AVP



    AS

    Choice of tags: PRP, AV0 and CJS (also multiword tags)

    1. Comparative constructions:

      As is a degree adverb (AV0) when it occurs before an adjective, adverb or determiner (and sometimes other words) in phrases of the type as X as Y, or simply as X (where the comparative clause or phrase as Y is omitted but understood):

      I go to see them <w AV0>as often as I can .
      [AC7.1192]

      and they employ ninety people, twice <w AV0>as many as last year.
      [K1C.3542]

      And every bit <w AV0>as good.
      [EEW.1132 *CJS]

      In the first and second examples above, the second as introduces a comparative construction which expresses 'equal comparison', as contrasted with the unequal comparison of more X than Y. When as is a word introducing such a comparative construction, it is tagged CJS:

      Capitalism is not <w AV0>as good <w CJS>as it claims.
      [CFT.2051]

      Linked together, they can crunch numbers <w AV0>as fast <w CJS>as any mainframe.
      [CRB.271]

      She will deposit <w AV0>as many <w CJS>as a dozen eggs there.
      [F9F.424]

      Notice that as in this comparative use is tagged CJS whether or not it introduces a clause. Often it introduces a noun phrase. In the following example, it introduces an adjective:

      always reply <w AV0>as quickly <w CJS>as possible.
      [C9R.989]

    2. Introducing other clauses:

      The tag CJS is also used when introducing other subordinate clauses, such as adverbial clauses of time or reason:

      Mr Phelps arrived just <w CJS>as I was leaving.
      [G1K.1685]

      <w CJS>As you've gone to so much trouble , it would seem discourteous to refuse
      [G1K.1685]

    3. Preposition:

      The tag PRP is used for as functioning clearly as a preposition:

      Consider it <w PRP>as a kind of insurance
      [AD0.1641]

      <w PRP>As head of information, Christina will lead a team of four TEC staff...
      [BM4.2830]

      Usually the meaning is related to the equative meaning of the verb be. However, the guideline restricts PRP to cases where as is followed by a noun phrase or nominal, as is normal for prepositions. Where as is followed by an adjective or a past participle clause, it is tagged CJS, even though it may retain the equative type of meaning:

      We regard these results <w CJS>as encouraging.
      [B1G.184]

      I very much hope that you will in fact support the motion <w CJS>as originally intended.
      [KGX.93]

    4. Multiwords:

      As is part of many multiwords which get tagged with a single tag: e.g. as soon as, such as, in so far as, as long as, as well as. The sequence as well as, for example, is tagged as a preposition (PRP) in such examples as

      Sometimes <w PRP>as well as going this way we actually need to go in this was too.
      [G5N.31]

      Note that this is different from the multiword adverb as well (meaning also); it is also different from the sequence as well as when it is tagged as three separate words, e.g. in:

      She's <w AV0>as <w AJ0>well <w CJS>as can be expected.
      [F9X.2095]



    BUT

    Choice of tags: CJC, CJS, PRP, AV0
    The coordinating conjunction CJC is overwhelmingly the most common use of but.

    1. Adverb:

      But is an adverb when its meaning is similar to 'only':

      She can spare you <w AV0>but a few minutes
      [CCD.82 *CJC]

      There is <w AV0>but one penalty.
      [ALS.185 *CJC]

    2. Subordinating conjunction or preposition:

      But is either a conjunction (CJS) or a preposition (PRP) if it has the meaning of 'except (for)', 'other than' or 'apart from'. CJS is used when it introduces a clause, and PRP is used when it introduces a phrase:

      ...mediocre albums that do nothing <w CJS>but take up shelf space
      [C9M.1014]

      I couldn't help <w CJS>but notice.
      [JY0.5323 *CJC]

      I always feel they are open meetings in everything <w PRP>but name.
      [HJ4.5520]

      No one had guessed she was anything <w PRP>but a boy.
      [C85.517]

    3. Coordinating conjunction:

      Otherwise but is a coordinating conjunction, tagged CJC, linking units of the same kind (e.g. clauses or adjective/adverb phrases). Its function is to express contrastive or 'adversative' meaning:

      God and minds do exist , <w CJC>but materially so .
      [ABM.1260]

      And that's it for another week <w CJC>but don't forget the late news at eleven thirty.
      [J1M.2520]

      Hares ( <w CJC>but not rabbits ) are particularly vulnerable...
      [B72.892]

    4. Multiwords

      Note also multiwords such as but for (PRP):

      The fare increases would have been bigger <w PRP>but for the governments last minute intervention.
      [K6D.124]
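
    The four-way choice for but can be summarised as a decision on two judgements: what but means, and whether it introduces a clause. The Python sketch below is an illustration only. The function name and the meaning labels 'only', 'except' and 'contrast' are hypothetical shorthand for the glosses given above, and the multiword but for is not handled.

```python
def tag_but(meaning, introduces_clause=False):
    """Choose a tag for 'but' from an annotator's judgements.

    meaning: "only"     -> adverb reading ('but a few minutes')
             "except"   -> 'except (for)', 'other than', 'apart from'
             "contrast" -> ordinary adversative coordinator
    introduces_clause: only consulted for the "except" reading.
    """
    if meaning == "only":
        return "AV0"
    if meaning == "except":
        return "CJS" if introduces_clause else "PRP"
    if meaning == "contrast":
        return "CJC"
    raise ValueError(f"unknown meaning label: {meaning!r}")
```

    For instance, nothing but take up shelf space corresponds to tag_but("except", introduces_clause=True), and everything but name to tag_but("except").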




    HOME

    Choice of tags: AV0 and NN1

    As a locative adverb, home has no determiner or article preceding:

    We stayed <w AV0>home.
    [FAP.313]

    This place is my <w NN1>home.
    [AMB.1805]




    LIKE

    Choice of tags: PRP AV0 CJS VVB VVI NN1 AJ0

    1. Discoursal function:

      In speech, when like has a discoursal function as a 'hedge', we tag it AV0:

      well she says <w AV0>like, I won't be a minute
      [KCY.1518]

      I'm driving along, you know <w AV0>like <trunc> wha </trunc> when you're in the car by yourself and everything's turning over in your head
      [KBU.1096]

    2. Other functions:

      Like very frequently occurs as a preposition or as a verb. The noun and adjective uses are fairly rare:

      ...but I <w VVB>like Monday best.
      [FU4.1089]

      He didn't look <w PRP>like a goodie.
      [H0M.1353]

      ... fuel, weapons, ground crew and the <w NN1>like.
      [J1N.105 *AJ0-NN1]

      Churchill and Eden were not of <w AJ0>like minds...
      [ACH.1299]




    LITTLE

    Choice of tags: AJ0, DT0, AV0, (also multiwords)

    1. Adjective:

      The meaning of little (AJ0) is the opposite of big:

      Bless their dear <w AJ0>little faces.
      [HRB.722]

      <w AJ0>Little green shoots of recovery are stirring.
      [CEL.968]

    2. Determiner-pronoun:

      The meaning of little (DT0) is 'not much':

      I have <w DT0>little to say.
      [G1Y.1137]

      ...there was <w DT0>little food left.
      [FSJ.721]

    3. Adverb:

      As an adverb (AV0), too, little has the meaning 'not much':

      I care very <w AV0>little about petty-minded, selfish "rules".
      [B0P.211]

    4. A little

      Note that a little can also be a multiword adverb (AV0):

      They are all <w AV0>a little drunk.
      [G0F.2118]

      However, the quantifier a little meaning 'a small amount' is not tagged as a multiword but as AT0 + DT0:

      You couldn't let me have <w AT0>a <w DT0>little milk?
      [GUM.1661]

      [See DETERMINER-PRONOUN vs. ADVERB ]




    MUCH

    Choice of tags: DT0 AV0

    1. Determiner-pronoun:

      <w DT0>Much of this work has to be done on the spot.
      [C8R.24]

      I've spent too <w DT0>much money.
      [KPV.6261]

    2. Adverb:

      Thanks very <w AV0>much.
      [A73.5]

      I didn't sleep <w AV0>much last night
      [ALH.1495]

      See also DETERMINER-PRONOUN vs. ADVERB




    MORE and LESS

    Choice of tags: DT0 AV0




    NO

    Choice of tags: AT0 NN1 AV0 ITJ

    1. Article

      <w AT0>No <w NN1>problem.
      [H4H.227]

    2. Noun

      As a noun, no is usually an abbreviation for number:

      quoting <w NN1>Ref <w NN1>No <w UNC>BCE90/10/4(NS)
      [CJU.673]

    3. Adverb

      but the matter was taken <w AV0>no <w AV0>further.
      [ARF.183 no: *AT0]

      To put it <w AV0>no <w AV0>more <w AV0>strongly, it has not been proved beyond doubt that....
      [EW7.125]

    4. Interjection:

      No is tagged as an interjection (ITJ) where it functions as the opposite of Yes.

      "...See how easy my job can be?"
      "Frankly, <w ITJ>no".
      [HR4.2329]



    ONE

    Choice of tags: PNI, CRD

    1. Numeral:

      The clearest cases of CRD are:

      In a quantifying noun phrase, typically allowing the substitution of another numerical expression (e.g. one chip contrasts with two chips) or of the digit 1 (1 chip):

      Can I have <w CRD>one chip, please?
      [KDB.1417]

      So are there criticisms? Just <w CRD>one.
      [CG2.1490]

      ... <w CRD>one in five sufferers never tells their partners.
      [CF5.8 *PNI]

      Orford Ness is <w CRD>one of Britain's most unusual coastal features.
      [CF8.86]

      In such noun phrases, one functions like a determiner-pronoun such as some.

    2. Indefinite Pronoun:

      The clearest cases of PNI are:

      (a) As a substitute form, standing for an understood noun or noun phrase:

      The channel was not a broad <w PNI>one
      [AEA.1461]

      In this use, one has a plural form ones.

      (b) As a generic personal pronoun, meaning 'people in general':

      And I think <w PNI>one might go on to argue that far from saving labour it creates it.
      [J17.1915]

    Note that the reliability of the ambiguity tag PNI-CRD (in which the pronoun is rated more likely) is somewhat low. See Error rates, table 2.
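
    The note above implies that the first tag listed in an ambiguity tag is the one rated more likely. A minimal sketch of splitting such a tag, assuming that ordering holds for every hyphenated tag (the function name is hypothetical):

```python
def split_ambiguity_tag(tag):
    """Split a tag such as 'PNI-CRD' into (preferred, alternative).

    A plain tag such as 'CRD' is returned as (tag, None).
    """
    first, sep, second = tag.partition("-")
    return (first, second if sep else None)


print(split_ambiguity_tag("PNI-CRD"))
```

    Since the reliability of PNI-CRD is described as somewhat low, a user of the corpus might prefer to treat both parts as candidates rather than rely on the first.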



    RIGHT

    Choice of tags: AV0 AJ0 VVB VVI NN1

    As both an adverb (AV0) and an adjective (AJ0), right means the opposite of 'wrong' and also the opposite of 'left'. As a noun, it generally means 'entitlement': e.g. I have a <w NN1>right to know. Uses of right as a verb are very rare.

    Less obvious points:

    1. Discoursal function:

      As a discourse marker, right is tagged AV0:

      <w AV0>Right, how you doing there?
      [KBL.4671]

      <w AV0>Right, er, members, any questions to <pause> the speakers?
      [F7V.139]

    2. Degree adverb (intensifier):

      In dialectal usage, right can be an intensifier, and is tagged AV0:

      it's a ... it's a <w AV0>right soft carpet.
      [KB2.1242-4]



    SO

    Choice of tags: AV0 CJS

    1. In most cases so is tagged as an adverb (AV0):

      "<w AV0>So this is where you work..."
      [H8M.2964]

      Right, <w AV0>so what's fifty three per cent as a decimal?
      [JP4.354]

      They waited but nothing happened <w AV0>so they made a fuss.
      [FU1.2484]

    2. As a pro-form meaning 'thus' or standing for a clause/predicate, so is tagged AV0:

      <w AV0>So say I and <w AV0>so say the folk.
      [G11.230]

      "Yes, I think <w AV0>so."
      [CCM.151]

    3. As a degree adverb or intensifier, so is tagged AV0:

      tough and long lasting -- that's why they're <w AV0>so popular.
      [BN4.940]

      ... there are <w AV0>so many lonely people in hospitals
      [FPS.2227]

    4. Introducing purpose clauses, so is tagged CJS (subordinating conjunction):

      Drink your tea <w CJS>so they can have your cup.
      [KB2.1767]

    5. Note that so is frequently part of a multiword: so that, so far, so as to, (in) so far as, etc. See the list of multiwords.



    THAT

    Choice of tags: DT0 CJT AV0

    1. As a demonstrative (pronoun or determiner), that is tagged DT0:

      <w DT0>That<w VBZ>'s my coat yeah.
      [KBS.1310]

      he's getting hooked on the taste of vaseline, <w DT0>that dog.
      [KCL.197]

    2. As a clause-initiating conjunction, that is tagged CJT:

      This applies to that as a complementizer:

      Many experts claim <w CJT>that it is good for your growing baby, too.
      [G2T.1091]

      and also to that as a relativizer (introducing a relative clause):

      A ship <w CJT>that never enters harbour.
      [BPA.1326]

      This is different from the more traditional analysis which treats that introducing a relative clause as a relative pronoun.

    3. As a degree adverb (intensifier):

      It wasn't all <w AV0>that bad.
      [KPP.322]

    4. In multiwords: That occurs commonly in multiwords such as so that, in that, in order that.




    THEN

    Choice of tags: AV0 AJ0

    In all functions except clear adjectival usage (AJ0, usually following the), then receives the tag AV0:

    And <w AV0>then she spoke.
    [H8T.2675]

    "Come on, <w AV0>then."
    [K8V.1722]

    Mr Willi Brandt, the <w AJ0>then Mayor of West Berlin.
    [A87.84]

    ...the <w AJ0>then state governor, who wasn't <w AV0>then Bill Clinton
    [JSM.131]
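
    The rule for then can be approximated in a few lines. This is only a heuristic sketch: it takes "directly follows the" as the sole cue for AJ0, whereas the guideline says adjectival then usually, not always, follows the. The function name and the plain whitespace tokenization are assumptions.

```python
def tag_then(tokens):
    """Return (index, tag) for every 'then' in a list of word tokens.

    Heuristic only: 'then' directly after 'the' is taken as AJ0,
    everything else as AV0.
    """
    result = []
    for i, token in enumerate(tokens):
        if token.lower() != "then":
            continue
        follows_the = i > 0 and tokens[i - 1].lower() == "the"
        result.append((i, "AJ0" if follows_the else "AV0"))
    return result


sentence = "the then state governor , who wasn't then Bill Clinton"
print(tag_then(sentence.split()))
```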




    TO

    Choice of tags: TO0 PRP AVP

    1. Infinitive marker (TO0):

      Note elliptical uses of the pre-infinitival to, especially in informal spoken texts:

      In the summer holidays, I can, I can get up early if I want <w TO0>to.
      [KPG.4204]

      Note also the common colloquial spelling of want to, got to, and going to as fused words:

      wanna = <w VVB>wan<w TO0>na
      gotta = <w VVN>got<w TO0>ta
      gonna = <w VVG>gon<w TO0>na

    2. Preposition (PRP):

      Prepositions are normally followed by a noun phrase or nominal clause. Where the preposition is 'stranded' (i.e. where the noun phrase associated with the preposition has been moved or ellipted) it can be confused with an adverbial particle:

      That 's the school that Terry goes <w PRP>to.
      [KB8.2443]

      ...what you're entitled <w PRP>to by law is money back
      [FUT.360]

      "Where <w PRP>to?"
      "<w AT0>The moon."
      [FNW.240-1]

    3. The adverbial particle to is extremely rare: it occurs in come to meaning 'regain consciousness'.
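
    The fused spellings listed under the infinitive marker (wanna, gotta, gonna) are each split into a verb part and a TO0 part. A small lookup sketch of that split, using only the three forms given in this guide; the function names are hypothetical and other fused spellings are not covered:

```python
# Split points taken from the guide's three examples; other fused
# spellings are deliberately not covered here.
FUSED_FORMS = {
    "wanna": [("VVB", "wan"), ("TO0", "na")],
    "gotta": [("VVN", "got"), ("TO0", "ta")],
    "gonna": [("VVG", "gon"), ("TO0", "na")],
}


def split_fused(word):
    """Return the (tag, part) pairs for a fused form, or None if unknown."""
    return FUSED_FORMS.get(word.lower())


def to_markup(word):
    """Render a fused form in the guide's <w TAG> notation."""
    parts = split_fused(word)
    if parts is None:
        return word
    return "".join(f"<w {tag}>{part}" for tag, part in parts)


print(to_markup("gonna"))
```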




    WELL

    Choice of tags: AV0 VVB VVI AJ0 NN1

    1. By far the most common function for well is as an adverb: e.g. She's playing <w AV0>well.

    2. Discoursal function:

      When well has the function of a discourse marker, it is treated as an adverb (AV0):

      Oh <w AV0>well. That'll be the finish.
      [FX6.196-7]

      I bet he doesn't get up till about, <w AV0>well, it's eleven now.
      [KBL.3808]

    3. Degree adverb:

      Well is tagged AV0, too, where it has an intensifying function: e.g.

      It was dark outside and <w AV0>well past your bedtime.
      [ASS.898]

    4. Adjective (AJ0):

      Well is tagged as an adjective where it means 'in good health':

      You don't look <w AJ0>well.
      [HPR.107]

    5. As a verb, well is very rare, and occurs in the phrasal verb well up. NB. This use has not been accurately tagged in the corpus:

      Tears <w VVB>well up in my eyes.
      [BN3.5 *AV0]




    WHEN

    Choice of tags: AVQ CJS

    When can introduce three types of clauses: an adverbial clause, a nominal clause, or a relative clause. Where it introduces an adverbial clause, it is tagged CJS. Otherwise it is tagged AVQ. The AVQ tag is also used for when introducing a question. Examples:

    1. Adverbial clause:

      <w CJS>When I got back to my flat, I decided to ring Toby.
      [CS4.1265]

      the crowd left quietly <w CJS>when the police arrived.
      [APP.1017]
      (when = at the time at which)

      If you smoke <w CJS>when you're pregnant...
      [A0J.1600]
      (when = whenever)


      Note that when is also a conjunction (CJS) in abbreviated adverbial clauses which lack a subject and finite verb, such as when in doubt, when ready, when completed.

    2. Nominal clause

      I can't remember <w AVQ>when we last had a frost.
      [KBF.11728]

      "Do you remember <w AVQ>when we used to go with Daddy in the boat on Saturdays?"
      [A6N.2022]

      You never know <w AVQ>when the next big story will break.
      [HJ6.101]
      (when = at what time)

      Before an infinitive, when is also tagged AVQ:

      Otto knew <w AVQ>when to change the subject.
      [FAT.1606]

      Also when the rest of the infinitive clause is understood: Tell me <w AVQ>when.

    3. Relative clause

      in the year <w AVQ>when I was born (when = in which)

      the moment <w AVQ>when he arrived (when = at which)

      Note that when can often be omitted in relative clauses: the moment he arrived.

    4. Direct questions

      <w AVQ>When did you find out?
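
    The four contexts above reduce to a simple mapping: only the adverbial clause takes CJS. The sketch below assumes the context has already been classified by some earlier step; the context labels and the function name are hypothetical and are not part of the C5 tagset.

```python
# Clause-type labels are hypothetical names for the four contexts the
# guide distinguishes; they are not part of the C5 tagset.
WHEN_TAG_BY_CONTEXT = {
    "adverbial_clause": "CJS",
    "nominal_clause": "AVQ",   # includes 'when' before an infinitive
    "relative_clause": "AVQ",
    "direct_question": "AVQ",
}


def tag_when(context):
    """Map a pre-classified context of 'when' to its tag."""
    try:
        return WHEN_TAG_BY_CONTEXT[context]
    except KeyError:
        raise ValueError(f"unknown context: {context!r}") from None


print(tag_when("adverbial_clause"))
```

    The hard part in practice is the classification itself, which this sketch leaves out.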




    WHERE

    Choice of tags: AVQ CJS

    Where is like when in that it can be a wh- adverb (AVQ) or a subordinating conjunction (CJS). However, with where the CJS tag is much less likely than the AVQ tag. Examples:

    1. In adverbial clauses (CJS):

      ...to hit him <w CJS>where it hurts.
      [CEN.2816]

    2. In other contexts it is tagged AVQ.




    WORTH

    Choice of tags: PRP NN1

    1. Preposition

      Worth is tagged PRP where it could answer the question 'How much is ... worth?' or 'What is ... worth?'

      these pictures are <w PRP>worth a small fortune.
      [FNT.1060]

      That makes him <w PRP>worth about $60m.
      [CT3.479]

      'Darling, it's not <w PRP>worth getting upset.
      [HH9.2310]

      Worth also occurs as a 'stranded preposition' in questions used to elicit such responses, and in some other common constructions:

      how much d'ya think it's <w PRP>worth?
      [KCX.1344]

      share prices say nothing about what a company is <w PRP>worth.
      [A9U.305 *NN1]

      Please go ahead and push Grapevine for all you are <w PRP>worth.
      [AP1.575]

    2. Noun

      Worth is tagged NN1 when it is an obvious noun (meaning 'value'). Typically this occurs following expressions of quantity, whether or not the quantity is expressed by a possessive or genitive (e.g. its, 's).

      Baker showed his <w NN1>worth for Ipswich in the 20th minute
      [CF9.103]

      hundreds of pounds' <w NN1>worth of damage.
      [A0H.15]

      £2,500 <w NN1>WORTH OF PRIZES
      [ECJ.1147]
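
    The two uses of worth can be told apart fairly often from the neighbouring words alone. The following is a rough sketch, not the tagger's actual method: it assumes that a following of or a preceding possessive determiner signals the noun, and treats everything else as a preposition. The function name and the word list are assumptions for illustration.

```python
# Possessive determiners (DPS) that can introduce the noun use.
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def tag_worth(tokens, i):
    """Guess the tag of tokens[i], which is assumed to be 'worth'."""
    prev_word = tokens[i - 1].lower() if i > 0 else ""
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    # Noun cues: "his worth", "pounds' worth of damage", "WORTH OF PRIZES".
    if next_word == "of" or prev_word in POSSESSIVES:
        return "NN1"
    # Otherwise treat it as a preposition, including the stranded cases.
    return "PRP"


print(tag_worth("these pictures are worth a small fortune".split(), 3))
```

    The stranded cases such as what a company is worth fall through to PRP here, which matches the guideline.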



    SECTION 5: Features of spoken corpus tagging

    The spoken and written texts of the BNC have been tagged in the same way, except that the following phenomena occur almost entirely in the spoken part of the corpus.

    Footnotes

    1 In BNC version 1, the quantifier a little meaning 'a small amount' was sometimes (but not reliably) tagged as a multiword DT0. See multiword list for differences in multiword tag assignment from the earlier tagging of the corpus.

    2 In our experience, human analysts too sometimes have difficulty resolving ambiguities such as these, especially when using the plain orthographic transcriptions of the BNC, and with no direct access to the original sound recordings.


    Date: 17 March 2000