
Written texts

Divisions of written texts

Written texts exhibit a rich variety of structural forms. Some have very little organization above the level of the paragraph; others may have a complex hierarchy of parts, sections, chapters, etc. Novels are divided into chapters, newspapers into sections, reference works into articles, and so forth. In the BNC, all such structural divisions are represented uniformly by means of the <div> element.
  • <div> (text division) contains a subdivision of the front, body, or back of a text.
    n: for a spoken text, identifies the tape corresponding to this division.
    level: specifies the hierarchic level of this division as a number between 1 (outermost or largest division) and 4 (innermost or smallest).
    type: identifies the type or function of the division (for a written text).
The n attribute is sometimes used to supply the identifying name or number that the text itself gives to a division, for example a chapter number, as in the following example:
 <div type="chapter" n="three" level="1">...</div>

More often, however, chapter names or numbers will appear within the text, tagged using the <head> element discussed in section Headings and captions below.

The value of the type attribute characterises the function of the textual division, according to an informal taxonomy. Where a value is supplied for one division at a given level, it may be assumed to apply to all subsequent divisions at the same level until the end of the enclosing element, even though it is not always explicitly repeated.
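
The inheritance rule for type can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not part of any BNC tooling: the function name effective_types and the fragment (including the type value "part") are hypothetical, and the rule is applied literally, so an inherited value stops at the end of the enclosing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged fragment: words and sentences are omitted, and the
# type value "part" is invented for illustration.
SAMPLE = """<wtext type="FICTION">
  <div level="1" type="part">
    <div level="2" type="chapter"/>
    <div level="2"/>
  </div>
  <div level="1">
    <div level="2"/>
  </div>
</wtext>"""


def effective_types(parent, result=None):
    """Return (level, type) for every <div> below parent, in document order.

    A <div> without a type attribute takes the type of the nearest preceding
    sibling <div> that has one; the value never carries over into a new
    enclosing element, so the first <div> inside each parent starts afresh.
    """
    if result is None:
        result = []
    current = None  # last type seen among this parent's child divisions
    for child in parent:
        if child.tag != "div":
            continue
        current = child.get("type", current)
        result.append((child.get("level"), current))
        effective_types(child, result)
    return result


print(effective_types(ET.fromstring(SAMPLE)))
```

In the output, the last level-2 division has no type, because it is the first division inside a new enclosing element.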

Where <div> elements are nested, for example where the chapters of a novel are grouped into parts, each of which may have its own title or number, the level attribute indicates the depth of nesting. This is not strictly necessary (an XML-aware processor can recover this information from the nesting itself), but it has been added for the convenience of users of previous versions of the corpus, in which the level was explicitly coded into the name of the element (<div1>, <div2>, etc.).
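
Because the nesting already encodes the depth, the level attribute can be cross-checked against it. A minimal Python sketch, assuming the standard xml.etree.ElementTree module and a hypothetical skeleton of the text ANY example that follows (words, sentences and paragraphs omitted); div_depths and level_mismatches are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Skeleton of the text ANY example: only the <div> elements are kept.
SAMPLE = """<wtext type="FICTION">
  <div level="1" n="1">
    <div level="2" n="1"/>
  </div>
</wtext>"""


def div_depths(elem, depth=0):
    """Yield (div, depth) pairs in document order, where depth is the number of
    <div> elements enclosing the division, itself included (1 = outermost)."""
    for child in elem:
        if child.tag == "div":
            yield child, depth + 1
            yield from div_depths(child, depth + 1)
        else:
            yield from div_depths(child, depth)


def level_mismatches(root):
    """Return the divisions whose level attribute disagrees with their depth."""
    return [div for div, depth in div_depths(root)
            if div.get("level") is not None and int(div.get("level")) != depth]


root = ET.fromstring(SAMPLE)
print([(d.get("n"), depth) for d, depth in div_depths(root)])
print(level_mismatches(root))  # an empty list means the level attributes agree
```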

In text ANY, for example, each chapter of the original novel corresponds with a <div level="2">, because the work contains groups of chapters, each of which begins with a page containing just a date. The opening of the text is therefore encoded as follows:
 <wtext type="FICTION">
   <div level="1" n="1">
     <s n="1">
       <w c5="NP0" hw="monday" pos="SUBST">Monday</w>
       <c c5="PUN">, </c>
       <w c5="NP0" hw="january" pos="SUBST">January </w>
       <w c5="ORD" hw="13th" pos="ADJ">13th</w>
       <c c5="PUN">, </c>
       <w c5="CRD" hw="1986" pos="ADJ">1986</w>
       <c c5="PUN">.</c>
     </s>
     <div level="2" n="1">
       <p>
         <s n="2">
           <w c5="NP0" hw="victor" pos="SUBST">Victor </w>
           <w c5="NP0" hw="wilcox" pos="SUBST">Wilcox </w>
           <w c5="VVZ" hw="lie" pos="VERB">lies </w>
           <w c5="AJ0" hw="awake" pos="ADJ">awake</w>
           ...
         </s>
       </p>
     </div>
   </div>
   ...
 </wtext>

Note however that in some texts initial sentences (like ‘Monday, January 13th, 1986’ above) may have been misplaced, so that they appear at the start of an inner <div> rather than the start of its parent.

A sequence of paragraph-level elements of arbitrary length may precede the first structural subdivision at any level, and a text may have no structural divisions within it at all. Note that prefatory or appended matter not forming part of a text will not generally be captured: the TEI elements <front> and <back> are not used.

Paragraph-level elements and chunks

Written texts may be organized into structural units containing more than one <s> element and smaller than any of the divisions discussed in section Divisions of written texts above. The most commonly found such element is the <p> (paragraph):
  • <p> (paragraph) marks paragraphs in prose.
    indicates how the paragraph is displayed. Values are:
    the paragraph is displayed as a caption
    the displayed paragraph contains a byline
    the paragraph is displayed as a floating caption
    the paragraph is displayed as an attached caption
    (In addition to global attributes and those inherited from att.rendered.)
Several other elements may, however, appear directly within <div> or <wtext> elements, rather than nested within some other element such as a paragraph. These elements are listed below:
  • <head> (heading) contains any type of heading, for example the title of a section or a poem.
    a code briefly characterising the way the element content was originally presented.
  • <quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
  • <sp> (speech) contains an individual speech in a performance text, or a passage presented as such in a prose or verse text.
  • <lg> (line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
  • <list> contains any sequence of items organized as a list.
  • <note> contains a note or annotation.
    internal identifier
  • <bibl> (bibliographic citation) contains any bibliographic reference, occurring either within the header of a written corpus text, in which case it has a fixed substructure, or within the body of a corpus text, in which case it contains only <s> elements.

Each of these elements contains one or more <s> elements, in some cases enclosed by an intermediate element (for example, the <l> elements within an <lg>, or the <item> elements within a <list>). They are used chiefly to indicate the function of sections of the text, as indicated in the list above.

The following sections provide examples for the use of each of these elements.

Headings and captions

One or more <head> elements of specified types may appear in sequence at the start of any <div> element, or at the start of a <list> or <lg>, as in the following example:
 <div level="1" n="1">
   <head type="MAIN">
     <s n="1">
       <w c5="NN1" hw="ageism" pos="SUBST">AGEISM</w>
     </s>
   </head>
   <head type="SUB">
     <s n="2">
       <w c5="AT0" hw="the" pos="ART">THE </w>
       <w c5="NN1" hw="foundation" pos="SUBST">FOUNDATION </w>
       <w c5="PRF" hw="of" pos="PREP">OF </w>
       <w c5="NN1" hw="age" pos="SUBST">AGE </w>
       <w c5="NN1" hw="discrimination" pos="SUBST">DISCRIMINATION</w>
     </s>
   </head>
   <head type="BYLINE">
     <s n="3">
       <w c5="NP0" hw="steve" pos="SUBST">STEVE </w>
       <w c5="NP0-NN1" hw="scrutton" pos="SUBST">SCRUTTON</w>
     </s>
   </head>
   ...
 </div>

As shown above, the type attribute is used to distinguish more exactly the function of a heading.
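
As an illustration of how the type attribute can be used, the following Python sketch (standard xml.etree.ElementTree; the function names are hypothetical) lists the headings at the start of a division together with their types. It relies on an assumption drawn from the examples: each <w> or <c> token already includes any space that follows it, so tokens are concatenated without a separator.

```python
import xml.etree.ElementTree as ET

# Abridged from the AGEISM example above; the c5, hw and pos attributes are dropped.
SAMPLE = """<div level="1" n="1">
  <head type="MAIN"><s n="1"><w>AGEISM</w></s></head>
  <head type="SUB"><s n="2"><w>THE </w><w>FOUNDATION </w><w>OF </w><w>AGE </w><w>DISCRIMINATION</w></s></head>
  <head type="BYLINE"><s n="3"><w>STEVE </w><w>SCRUTTON</w></s></head>
  <p/>
</div>"""


def text_of(elem):
    """Join the content of the <w> and <c> tokens below elem. Each token is
    assumed to carry its own trailing space, so no separator is added;
    whitespace between tags (element tails) is ignored."""
    return "".join(t.text or "" for t in elem.iter() if t.tag in ("w", "c")).strip()


def headings(div):
    """Return (type, text) for the <head> elements at the start of a <div>."""
    result = []
    for child in div:
        if child.tag != "head":
            break  # a <head> can only occur at the start of a division
        result.append((child.get("type"), text_of(child)))
    return result


for kind, text in headings(ET.fromstring(SAMPLE)):
    print(kind, text)
```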

Note that, in the BNC, captions or headings which ‘float’ within the text, that is, which appear elsewhere than at the very beginning of the section they name, are not encoded as <head> elements. A <head> element can appear only at the start of a text division and is logically associated with it (for example, chapter titles or newspaper headlines). Paragraphs which provide heading or captioning information but are logically independent of their position within a textual division (for example, captions attached to pictures or figures, or ‘pull-quotes’ embedded within the text) are represented in the same way as any other paragraph of text, using the <p> element, but with the value caption in their type attribute.

In the following example, the <head> element is followed by a number of captions introducing particular parts of a magazine story:
 <div level="1">
   <head>
     <s n="40">
       <w c5="NN2" hw="trousers" pos="SUBST">TROUSERS </w>
       <w c5="VVB-NN1" hw="suit" pos="VERB">SUIT</w>
     </s>
   </head>
   <p type="caption">
     <s n="41">
       <w c5="EX0" hw="there" pos="PRON">There </w>
       <w c5="VBZ" hw="be" pos="VERB">is </w>
       <w c5="PNI" hw="nothing" pos="PRON">nothing </w>
       <w c5="AJ0" hw="masculine" pos="ADJ">masculine </w>
       <w c5="PRP" hw="about" pos="PREP">about </w>
       <w c5="DT0" hw="these" pos="ADJ">these </w>
       <w c5="AJ0" hw="new" pos="ADJ">new </w>
       <w c5="NN1" hw="trousers" pos="SUBST">trouser </w>
       <w c5="VVZ-NN2" hw="suit" pos="VERB">suits </w>
       <w c5="PRP" hw="in" pos="PREP">in </w>
       <w c5="NN1" hw="summer" pos="SUBST">summer</w>
       <w c5="POS" hw="'s" pos="UNC">'s </w>
       <w c5="AJ0" hw="soft" pos="ADJ">soft </w>
       <w c5="NN2" hw="pastel" pos="SUBST">pastels</w>
       <c c5="PUN">.</c>
     </s>
     ...
   </p>
   ...
 </div>


Quotations

A quotation is an extract from a work other than the text itself, embedded within it, for example as an epigraph or illustration. It is marked up using the <quote> element, which may contain any combination of other chunks (for example paragraphs, poems or lists) but may not directly contain phrase-level elements. A <bibl> element giving the source of the quotation may also be contained within it.

For example:
 <quote>
   <p>
     <s n="2080">
       <w c5="DT0" hw="this" pos="ADJ">This </w>
       <w c5="NN1" hw="way" pos="SUBST">way </w>
       <w c5="PRP" hw="for" pos="PREP">for </w>
       <w c5="AT0" hw="the" pos="ART">the </w>
       <w c5="AJ0" hw="sorrowful" pos="ADJ">sorrowful </w>
       <w c5="NN1" hw="city" pos="SUBST">city</w>
       <c c5="PUN">.</c>
     </s>
     ...
     <s n="2083">
       <w c5="VVB" hw="abandon" pos="VERB">Abandon </w>
       <w c5="DT0" hw="all" pos="ADJ">all </w>
       <w c5="NN1" hw="hope" pos="SUBST">hope</w>
       <c c5="PUN">, </c>
       <w c5="PNP" hw="you" pos="PRON">you </w>
       <w c5="PNQ" hw="who" pos="PRON">who </w>
       <w c5="VVB" hw="enter" pos="VERB">enter</w>
       <c c5="PUN">…</c>
     </s>
     <bibl>
       <s n="2084">
         <w c5="NP0" hw="dante" pos="SUBST">Dante</w>
       </s>
     </bibl>
   </p>
 </quote>

Spoken paragraphs

As noted above, the <sp> element is used to mark parts of a written text which were or are intended to be spoken, for example the speeches in a dramatic text or a published interview. Such parts are generally readily identifiable by the use of such conventions as speaker prefixes (the label supplying the name of the speaker) and stage directions, for which the following additional elements are used:
  • <speaker> contains a specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
  • <stage> (stage direction) contains any kind of stage direction within a dramatic text or fragment.

The <sp> element is used only for speech which is presented as such in a written text, by contrast with the element <u> discussed in section Utterances, which is used only for speaker turns identified in a spoken text, i.e. one which has been transcribed from audio tape.

If present, a <speaker> element will appear only at the start of the <sp> element, followed by one or more <p> elements containing the actual speech.

Here is an example of a stage direction occurring within a speech:
 <sp>
   <p>
     <s n="1115">
       <w c5="CRD" hw="seven" pos="ADJ">Seven </w>
       <w c5="NN2" hw="book" pos="SUBST">books </w>
       <w c5="AT0" hw="a" pos="ART">a </w>
       <w c5="NN1" hw="week" pos="SUBST">week</w>
       <c c5="PUN">.</c>
     </s>
   </p>
   <stage rend="it">
     <s n="1119">
       <w c5="PNP" hw="he" pos="PRON">He </w>
       <w c5="VVZ" hw="dance" pos="VERB">dances</w>
     </s>
   </stage>
   <p>
     <s n="1122">
       <w c5="NN1" hw="library" pos="SUBST">Library </w>
       <w c5="NN2" hw="book" pos="SUBST">books</w>
       <c c5="PUN">.</c>
     </s>
   </p>
 </sp>
These elements appear frequently in formal written transcripts of spoken proceedings, notably in those parts of the BNC extracted from Hansard:
 <sp>
   <p>
     <s n="20468">
       <w c5="DT0" hw="that" pos="ADJ">That </w>
       <w c5="NN1" hw="millionaire" pos="SUBST">millionaire </w>
       <w c5="NN1" hw="mammy" pos="SUBST">mammy</w>
       <w c5="POS" hw="'s" pos="UNC">'s </w>
       <w c5="NN1" hw="boy" pos="SUBST">boy </w>
       <c c5="PUN">—</c>
     </s>
     <stage>
       <s n="20469">
         <w c5="NN1" hw="interruption" pos="SUBST">Interruption</w>
       </s>
     </stage>
   </p>
 </sp>
 <sp>
   <speaker>
     <s n="20470">
       <w c5="NP0" hw="mr." pos="SUBST">Mr. </w>
       <w c5="NP0" hw="speaker" pos="SUBST">Speaker</w>
     </s>
   </speaker>
   <p>
     <s n="20471">
       <w c5="NN1-VVB" hw="order" pos="SUBST">Order</w>
       <c c5="PUN">.</c>
     </s>
     <s n="20472">
       <w c5="DT0" hw="that" pos="ADJ">That </w>
       <w c5="VBZ" hw="be" pos="VERB">is </w>
       <w c5="XX0" hw="not" pos="ADV">not </w>
       <w c5="AV0" hw="wholly" pos="ADV">wholly </w>
       <w c5="AJ0" hw="unparliamentary" pos="ADJ">unparliamentary</w>
       <c c5="PUN">.</c>
     </s>
   </p>
 </sp>
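
Given this structure (an optional <speaker> followed by the speech proper, with <stage> directions either inside or beside the <p> elements), speeches can be paired with their speakers. A minimal Python sketch using the standard xml.etree.ElementTree module; the fragment is abridged from the Hansard example (token attributes and the final dash dropped, an enclosing <div> added as a single root), and speeches is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div>
  <sp><p><s n="20468"><w>That </w><w>millionaire </w><w>mammy</w><w>'s </w><w>boy </w></s>
    <stage><s n="20469"><w>Interruption</w></s></stage></p></sp>
  <sp><speaker><s n="20470"><w>Mr. </w><w>Speaker</w></s></speaker>
    <p><s n="20471"><w>Order</w><c>.</c></s></p></sp>
</div>"""


def text_of(elem):
    """Join the <w> and <c> tokens below elem (each is assumed to end with its own space)."""
    return "".join(t.text or "" for t in elem.iter() if t.tag in ("w", "c")).strip()


def speeches(root):
    """Yield (speaker, speech) for each <sp>; speaker is None when there is no
    <speaker>. Only <s> elements that are direct children of a <p> are kept,
    so stage directions (inside the <p> or beside it) are left out."""
    for sp in root.iter("sp"):
        speaker = sp.find("speaker")
        name = text_of(speaker) if speaker is not None else None
        speech = " ".join(text_of(s) for p in sp.findall("p") for s in p.findall("s"))
        yield name, speech


for name, speech in speeches(ET.fromstring(SAMPLE)):
    print(f"{name or '(no speaker)'}: {speech}")
```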


Poetry

Poetry is distinguished from prose in the BNC where it is so presented in the original, for example where fragments of verse or song appear within or between paragraphs of prose. The <l> (line) element marks each verse line; where there are several such lines, perhaps with a heading, they are grouped together using the <lg> (line group) element, and any title or heading present is marked with a <head> element.

For example:
 <lg>
   <l>
     <s n="906">
       <w c5="PNP" hw="i" pos="PRON">I </w>
       <w c5="VVB" hw="send" pos="VERB">send </w>
       <w c5="DPS" hw="i" pos="PRON">my </w>
       <w c5="NN1" hw="soul" pos="SUBST">soul </w>
       <w c5="PRP" hw="through" pos="PREP">through </w>
       <w c5="NN1" hw="time" pos="SUBST">time </w>
       <w c5="CJC" hw="and" pos="CONJ">and </w>
       <w c5="NN1-VVB" hw="space" pos="SUBST">space </w>
       <w c5="TO0" hw="to" pos="PREP">to </w>
       <w c5="VVI" hw="greet" pos="VERB">greet </w>
       <w c5="PNP" hw="you" pos="PRON">you</w>
       <c c5="PUN">.</c>
     </s>
   </l>
   <l>
     <s n="907">
       <w c5="PNP" hw="you" pos="PRON">You </w>
       <w c5="VBD" hw="be" pos="VERB">were </w>
       <w c5="AT0" hw="a" pos="ART">a </w>
       <w c5="NN1" hw="poet" pos="SUBST">poet</w>
       <c c5="PUN">.</c>
     </s>
     <s n="908">
       <w c5="PNP" hw="you" pos="PRON">You </w>
       <w c5="VM0" hw="will" pos="VERB">will </w>
       <w c5="VVI" hw="understand" pos="VERB">understand</w>
       <c c5="PUN">.</c>
     </s>
   </l>
 </lg>

Note that the <l> element is not used to mark typographic lineation. Layout information is not, in general, preserved in the BNC.
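
Since <l> marks verse lines rather than typographic lines, the verse lines of an <lg> can be recovered directly from the markup. A short Python sketch (standard xml.etree.ElementTree; verse_lines is a hypothetical name) that joins the sentences of each <l>, which may contain more than one <s>; the fragment is abridged from the verse example above, with token attributes dropped.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<lg>
  <l><s n="906"><w>I </w><w>send </w><w>my </w><w>soul </w><w>through </w><w>time </w><w>and </w><w>space </w><w>to </w><w>greet </w><w>you</w><c>.</c></s></l>
  <l><s n="907"><w>You </w><w>were </w><w>a </w><w>poet</w><c>.</c></s><s n="908"><w>You </w><w>will </w><w>understand</w><c>.</c></s></l>
</lg>"""


def text_of(elem):
    """Join the <w> and <c> tokens below elem (each is assumed to end with its own space)."""
    return "".join(t.text or "" for t in elem.iter() if t.tag in ("w", "c")).strip()


def verse_lines(lg):
    """Return one string per <l> in a line group; the <s> elements of a
    verse line are joined with a single space."""
    return [" ".join(text_of(s) for s in line_el.findall("s"))
            for line_el in lg.findall("l")]


for line in verse_lines(ET.fromstring(SAMPLE)):
    print(line)
```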


Lists

A list is a collection of distinct items flagged as such by special layout in written texts, often functioning as a single syntactic unit. Lists may appear within or between paragraphs. Where marked, lists are tagged with the <list> element, which may contain the following subelements:
  • <head> (heading) contains any type of heading, for example the title of a section or a poem.
    type: distinguishes the function of the heading. Legal values are:
    MAIN: a major heading.
    SUB: any sub-heading.
    BYLINE: a sub-heading providing the name of a journalist or other source of a newspaper report.
    (In addition to global attributes and those inherited from att.rendered.)
  • <label> contains the label associated with an item in a list; in glossaries, marks the term being defined.
  • <item> contains one component of a list.

A <list> element consists of an optional <head> element, followed by one or more <item> elements, each of which may optionally be preceded by a <label> element. The <label> holds the identifier or tag sometimes attached to a list item, for example ‘(a)’, or a word or phrase used for a similar purpose.

The <item> element may appear only inside lists. It contains the same mixture of elements as a paragraph, and may thus contain one or more nested lists. It may also contain a series of paragraphs, each marked with a <p> element.
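
The optional pairing of <label> and <item> can be handled by remembering the most recent label until the next item arrives. A minimal Python sketch, assuming the standard xml.etree.ElementTree module; the fragment is abridged from the labelled list example below (token attributes dropped, third entry omitted), and labelled_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# \u2014 is the dash character used in the original example.
SAMPLE = """<list>
  <label><s n="424"><w>1</w><c>.</c></s></label>
  <item><s n="425"><w>Surya </w><c>\u2014 </c><w>Sun </w><c>\u2014 </c><w>Creative </w><w>agent</w></s></item>
  <label><s n="426"><w>2</w><c>.</c></s></label>
  <item><p><s n="427"><w>Vayu </w><c>\u2014 </c><w>Air </w><c>\u2014 </c><w>Preserving </w><w>agent </w><pb n="43"/></s></p></item>
</list>"""


def text_of(elem):
    """Join the <w> and <c> tokens below elem; other elements such as <pb> add no text."""
    return "".join(t.text or "" for t in elem.iter() if t.tag in ("w", "c")).strip()


def labelled_items(lst):
    """Return (label, item) pairs for a <list>. The label is None when an <item>
    is not preceded by its own <label>; any <head> is ignored."""
    pairs = []
    pending = None  # text of a <label> waiting for its <item>
    for child in lst:
        if child.tag == "label":
            pending = text_of(child)
        elif child.tag == "item":
            pairs.append((pending, text_of(child)))
            pending = None
    return pairs


for label, item in labelled_items(ET.fromstring(SAMPLE)):
    print(label, item)
```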

Here is an example of a simple list:
 <list>
   <item>
     <s n="87">
       <w c5="VBZ" hw="be" pos="VERB">Is </w>
       <w c5="DPS" hw="you" pos="PRON">your </w>
       <w c5="NN1" hw="nylon" pos="SUBST">nylon </w>
       <hi rend="it">
         <w c5="NN1" hw="nightie" pos="SUBST">nightie </w>
       </hi>
       <w c5="AJ0" hw="fireproof" pos="ADJ">fireproof</w>
       <c c5="PUN">?</c>
     </s>
   </item>
   <item>
     <s n="88">
       <w c5="AT0" hw="the" pos="ART">The </w>
       <w c5="NN1" hw="hurricane" pos="SUBST">hurricane </w>
       <w c5="VBD" hw="be" pos="VERB">was </w>
       <hi rend="it">
         <w c5="AV0" hw="mighty" pos="ADV">mighty </w>
       </hi>
       <w c5="AJ0" hw="fierce" pos="ADJ">fierce</w>
       <c c5="PUN">.</c>
     </s>
   </item>
   <item>
     <s n="89">
       <w c5="VM0" hw="will" pos="VERB">Will </w>
       <w c5="PNP" hw="you" pos="PRON">you </w>
       <hi rend="it">
         <w c5="VVI" hw="mow" pos="VERB">mow </w>
       </hi>
       <w c5="AT0" hw="the" pos="ART">the </w>
       <w c5="NN1" hw="lawn" pos="SUBST">lawn</w>
       <c c5="PUN">?</c>
     </s>
   </item>
   <item>
     <s n="90">
       <w c5="VDD" hw="do" pos="VERB">Did </w>
       <w c5="PNP" hw="you" pos="PRON">you </w>
       <hi rend="it">
         <w c5="VVI" hw="know" pos="VERB">know </w>
       </hi>
       <w c5="AT0" hw="the" pos="ART">the </w>
       <w c5="NN1" hw="time" pos="SUBST">time</w>
       <c c5="PUN">?</c>
     </s>
   </item>
 </list>
Here is an example of a labelled list:
 <list>
   <label>
     <s n="424">
       <w c5="CRD" hw="1" pos="ADJ">1</w>
       <c c5="PUN">.</c>
     </s>
   </label>
   <item>
     <s n="425">
       <w c5="NN1-NP0" hw="surya" pos="SUBST">Surya </w>
       <c c5="PUN">— </c>
       <w c5="NN1" hw="sun" pos="SUBST">Sun </w>
       <c c5="PUN">— </c>
       <w c5="AJ0" hw="creative" pos="ADJ">Creative </w>
       <w c5="NN1" hw="agent" pos="SUBST">agent</w>
     </s>
   </item>
   <label>
     <s n="426">
       <w c5="CRD" hw="2" pos="ADJ">2</w>
       <c c5="PUN">.</c>
     </s>
   </label>
   <item>
     <p>
       <s n="427">
         <w c5="NN1-NP0" hw="vayu" pos="SUBST">Vayu </w>
         <c c5="PUN">— </c>
         <w c5="NN1" hw="air" pos="SUBST">Air </w>
         <c c5="PUN">— </c>
         <w c5="VVG-AJ0" hw="preserve" pos="VERB">Preserving </w>
         <w c5="NN1" hw="agent" pos="SUBST">agent </w>
         <pb n="43"/>
       </s>
     </p>
   </item>
   <label>
     <s n="428">
       <w c5="CRD" hw="3" pos="ADJ">3</w>
       <c c5="PUN">.</c>
     </s>
   </label>
   <item>
     <p>
       <s n="429">
         <w c5="NN2" hw="agni" pos="SUBST">Agni </w>
         <c c5="PUN">— </c>
         <w c5="NN1-VVB" hw="fire" pos="SUBST">Fire </w>
         <c c5="PUN">— </c>
         <w c5="AJ0" hw="destructive" pos="ADJ">Destructive </w>
         <w c5="NN1" hw="agent" pos="SUBST">agent</w>
       </s>
     </p>
   </item>
 </list>

Notes and citations

In some texts, annotations occurring in the original have been marked up using the <note> element; bibliographic citations and references are marked with the <bibl> element, described under Bibliographic references below.

Notes may contain any mixture of other chunks, including paragraphs, and appear in written texts only. A note may have been relocated to the end of the section in which it originally appeared.

For example:
 <note place="SIDE">
   <s n="477">
     <w c5="AT0" hw="the" pos="ART">The </w>
     <w c5="AJ0-NN1" hw="short" pos="ADJ">short </w>
     <w c5="VBZ" hw="be" pos="VERB">is </w>
     <w c5="AT0" hw="a" pos="ART">a </w>
     <w c5="NN1" hw="film" pos="SUBST">film </w>
     <w c5="PRP" hw="about" pos="PREP">about </w>
     <w c5="NN1-VVG" hw="sailing" pos="SUBST">sailing</w>
     <c c5="PUN">.</c>
   </s>
   ...
 </note>
 <!-- A6C 476-->
 <s n="56">
   <w c5="PNP" hw="i" pos="PRON">I </w>
   <w c5="VVB" hw="expect" pos="VERB">expect </w>
   <w c5="NN1-VVB" hw="demand" pos="SUBST">demand </w>
   <w c5="PRP" hw="for" pos="PREP">for </w>
   <w c5="DT0" hw="this" pos="ADJ">this </w>
   <w c5="NN1" hw="service" pos="SUBST">service </w>
   <w c5="TO0" hw="to" pos="PREP">to </w>
   <w c5="VVI" hw="continue" pos="VERB">continue </w>
   <w c5="TO0" hw="to" pos="PREP">to </w>
   <w c5="VVI" hw="grow" pos="VERB">grow </w>
   <w c5="PRP-AVP" hw="over" pos="PREP">over </w>
   <w c5="AT0" hw="the" pos="ART">the </w>
   <w c5="AJ0" hw="coming" pos="ADJ">coming </w>
   <w c5="NN1" hw="year" pos="SUBST">year</w>
   <c c5="PUN">.</c>
 </s>
 <note n="2">
   <s n="57">
     <w c5="NN1" hw="aids" pos="SUBST">AIDS </w>
     <w c5="NN2" hw="death" pos="SUBST">deaths</w>
     <c c5="PUN">: </c>
     <w c5="NP0" hw="april" pos="SUBST">April </w>
     <w c5="CRD" hw="1990" pos="ADJ">1990 </w>
     <c c5="PUN">— </c>
     <w c5="NP0" hw="march" pos="SUBST">March </w>
     <w c5="CRD" hw="1991" pos="ADJ">1991</w>
     <c c5="PUN">, </c>
     <w c5="NP0" hw="uk" pos="SUBST">UK </w>
     <w c5="NN1" hw="total" pos="SUBST">total </w>
     <c c5="PUL">(</c>
     <w c5="NP0" hw="cdsc" pos="SUBST">CDSC </w>
     <w c5="NN2-VVZ" hw="figure" pos="SUBST">figures </w>
     <c c5="PUN">— </c>
     <w c5="CRD" hw="584" pos="ADJ">584 </w>
     <w c5="NP0" hw="april" pos="SUBST">April </w>
     <w c5="CRD" hw="1991" pos="ADJ">1991</w>
     <c c5="PUN">.</c>
     <c c5="PUR">)</c>
   </s>
   <s n="58">
     <w c5="DPS" hw="we" pos="PRON">Our </w>
     <w c5="NN1" hw="home" pos="SUBST">Home </w>
     <w c5="NN1-VVB" hw="care" pos="SUBST">Care </w>
     <w c5="NN2" hw="team" pos="SUBST">teams </w>
     <w c5="VVD" hw="see" pos="VERB">saw </w>
     <w c5="CRD" hw="141" pos="ADJ">141 </w>
     <w c5="NN2" hw="aid" pos="SUBST">AIDS </w>
     <w c5="AJ0-VVD" hw="related" pos="ADJ">related </w>
     <w c5="NN2" hw="death" pos="SUBST">deaths </w>
     <w c5="ORD" hw="last" pos="ADJ">last </w>
     <w c5="NN1" hw="year" pos="SUBST">year</w>
   </s>
 </note>
 <p>
   <s n="59">
     <w c5="PRP" hw="in" pos="PREP">In </w>
     <w c5="NP0" hw="scotland" pos="SUBST">Scotland </w>
     <w c5="AV0" hw="rapidly" pos="ADV">rapidly </w>
     <w c5="AJ0-VVG" hw="growing" pos="ADJ">growing </w>
     <w c5="NN2" hw="number" pos="SUBST">numbers </w>
     <w c5="PRF" hw="of" pos="PREP">of </w>
     <w c5="AJ0" hw="ill" pos="ADJ">ill </w>
     <w c5="NN2" hw="man" pos="SUBST">men</w>
     <c c5="PUN">, </c>
     <w c5="NN2" hw="woman" pos="SUBST">women </w>
     <w c5="CJC" hw="and" pos="CONJ">and </w>
     <w c5="NN2" hw="child" pos="SUBST">children </w>
     <w c5="PRP" hw="at" pos="PREP">at </w>
     <w c5="NN1" hw="home" pos="SUBST">home </w>
     <w c5="VHB" hw="have" pos="VERB">have </w>
     <w c5="VVN" hw="stretch" pos="VERB">stretched </w>
     <w c5="DPS" hw="we" pos="PRON">our </w>
     <w c5="NP0" hw="edinburgh" pos="SUBST">Edinburgh </w>
     <w c5="CJC" hw="and" pos="CONJ">and </w>
     <w c5="NP0" hw="dundee" pos="SUBST">Dundee </w>
     <w c5="NN2" hw="team" pos="SUBST">teams </w>
     <w c5="PRP" hw="to" pos="PREP">to </w>
     <w c5="AT0" hw="the" pos="ART">the </w>
     <w c5="NN1" hw="limit" pos="SUBST">limit</w>
     <c c5="PUN">.</c>
   </s>
 </p>
Note the use of the n attribute to carry the original footnote number in the above example.
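
Because notes may have been moved away from the point they annotate, a reader extracting running text may want to set them aside. A minimal Python sketch using the standard xml.etree.ElementTree module, reduced to the <s> and <note> skeleton of the example above (an enclosing <div> is added as a single root); running_sentences and notes are hypothetical names.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div>
  <note place="SIDE"><s n="477"/></note>
  <s n="56"/>
  <note n="2"><s n="57"/><s n="58"/></note>
  <p><s n="59"/></p>
</div>"""


def running_sentences(elem):
    """Yield the n of each <s> in document order, skipping anything inside a <note>."""
    for child in elem:
        if child.tag == "note":
            continue
        if child.tag == "s":
            yield child.get("n")
        else:
            yield from running_sentences(child)


def notes(root):
    """Return (note number, place, sentence numbers) for every <note>."""
    return [(note.get("n"), note.get("place"), [s.get("n") for s in note.iter("s")])
            for note in root.iter("note")]


root = ET.fromstring(SAMPLE)
print(list(running_sentences(root)))  # main text only
print(notes(root))
```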

Bibliographic references

Bibliographic citations or references within running text may also be marked, using the <bibl> element; in the present version of the corpus, this has been done in some texts only.

For example:
 <quote>
   <p rend="it">
     <s n="1943">
       <w c5="NN1" hw="zombie" pos="SUBST">Zombie </w>
       <w c5="AT0" hw="no" pos="ART">no </w>
       <w c5="NN1" hw="go" pos="SUBST">go </w>
       <w c5="CJS" hw="unless" pos="CONJ">unless </w>
       <w c5="PNP" hw="you" pos="PRON">you </w>
       <w c5="VVB" hw="tell" pos="VERB">tell </w>
       <w c5="VVB-NN1" hw="im" pos="VERB">im </w>
       <w c5="TO0" hw="to" pos="PREP">to </w>
       <w c5="VVI" hw="go" pos="VERB">go</w>
     </s>
     <bibl>
       <s n="1944">
         <w c5="AT0" hw="the" pos="ART">The </w>
         <w c5="NP0" hw="communards" pos="SUBST">Communards</w>
         <c c5="PUN">.</c>
       </s>
     </bibl>
   </p>
 </quote>

Note that the <bibl> element used within corpus texts has none of the more detailed sub-elements described for it in section Structured bibliographic record: a <bibl> element appearing within a corpus text contains only <s> elements.
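
Since an embedded <bibl> contains only <s> elements, the quoted text and its source can be separated by noting whether each <s> lies inside a <bibl>. A minimal Python sketch (standard xml.etree.ElementTree; split_quote is a hypothetical name), using a fragment abridged from the Communards example above with token attributes dropped:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<quote><p rend="it">
  <s n="1943"><w>Zombie </w><w>no </w><w>go </w><w>unless </w><w>you </w><w>tell </w><w>im </w><w>to </w><w>go</w></s>
  <bibl><s n="1944"><w>The </w><w>Communards</w><c>.</c></s></bibl>
</p></quote>"""


def text_of(elem):
    """Join the <w> and <c> tokens below elem (each is assumed to end with its own space)."""
    return "".join(t.text or "" for t in elem.iter() if t.tag in ("w", "c")).strip()


def split_quote(quote):
    """Return (quoted text, source text) for a <quote>, treating every <s>
    inside a <bibl> as part of the source reference."""
    quoted, source = [], []

    def walk(elem, in_bibl):
        for child in elem:
            if child.tag == "s":
                (source if in_bibl else quoted).append(text_of(child))
            else:
                walk(child, in_bibl or child.tag == "bibl")

    walk(quote, False)
    return " ".join(quoted), " ".join(source)


print(split_quote(ET.fromstring(SAMPLE)))
```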

Phrase-level elements

Phrase-level elements are elements which cannot appear directly within a textual division, but must be contained by some other element. In practice, this usually means that they are contained within an <s> element. In addition to the <w>, <mw>, and <c> elements already discussed, only the following phrase-level elements appear within <s> elements in written texts:
  • <pb> (page break) marks the boundary between one page of a text and the next in a standard reference system.
    n: gives the number of the page beginning here.
  • <hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.

Page breaks

Wherever possible, the original pagination and page numbering of the source text has been preserved. The <pb> element is used to mark the approximate position in the text at which each new page starts, and its n attribute supplies the number of the page.
 <l>
   <s n="1403">
     <c c5="PUN">— </c>
     <w c5="CJC" hw="and" pos="CONJ">and </w>
     <w c5="NN2" hw="creditor" pos="SUBST">creditors </w>
     <w c5="VVB" hw="grow" pos="VERB">grow </w>
     <w c5="AJ0" hw="cruel" pos="ADJ">cruel</w>
     <c c5="PUN">,</c>
   </s>
 </l>
 <l>
   <pb n="75"/>
 </l>
 <l>
   <s n="1404">
     <w c5="AV0" hw="so" pos="ADV">so </w>
     <w c5="PNP" hw="he" pos="PRON">he </w>
     <w c5="VVZ" hw="bow" pos="VERB">bows </w>
     <w c5="CJC" hw="and" pos="CONJ">and </w>
     <w c5="NN2-VVZ" hw="scrape" pos="SUBST">scrapes</w>
     <c c5="PUN">,</c>
   </s>
 </l>
Where several pages have been left out of a transcription, for example because they are blank or contain illustrations only, a <pb> element may be given for each, as in this example:
 <s n="1323">
   <w c5="PNP" hw="i" pos="PRON">I </w>
   <w c5="VHB" hw="have" pos="VERB">have</w>
   <w c5="XX0" hw="not" pos="ADV">n't </w>
   <w c5="VBN" hw="be" pos="VERB">been </w>
   <w c5="PRP" hw="to" pos="PREP">to </w>
   <w c5="AT0" hw="an" pos="ART">an </w>
   <w c5="AJ0" hw="organized" pos="ADJ">organized </w>
   <w c5="NN1" hw="campsite" pos="SUBST">campsite </w>
   <w c5="PRP" hw="for" pos="PREP">for </w>
   <pb n="64"/>
   <pb n="65"/>
   <pb n="66"/>
   <w c5="AV0" hw="perhaps" pos="ADV">perhaps </w>
   <w c5="CRD" hw="fifteen" pos="ADJ">fifteen </w>
   <w c5="NN2" hw="year" pos="SUBST">years</w>
   <c c5="PUN">, </c>
   <w c5="AV0" hw="so" pos="ADV">so </w>
   <w c5="DT0" hw="all" pos="ADJ">all </w>
   <w c5="DT0" hw="this" pos="ADJ">this </w>
   <w c5="VBZ" hw="be" pos="VERB">is </w>
   <w c5="AJ0" hw="new" pos="ADJ">new </w>
   <w c5="PRP" hw="to" pos="PREP">to </w>
   <w c5="PNP" hw="i" pos="PRON">me</w>
   <c c5="PUN">.</c>
 </s>
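
Because <pb> marks where each new page starts, the page on which a word appears can be found by walking the text in document order and remembering the most recent page break. A minimal Python sketch, assuming the standard xml.etree.ElementTree module; the fragment is abridged from the campsite example above (token attributes dropped), words_by_page is a hypothetical name, and since the page in force before the first <pb> cannot be known from the fragment, it is passed in as a parameter.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<s n="1323"><w>campsite </w><w>for </w><pb n="64"/><pb n="65"/><pb n="66"/><w>perhaps </w><w>fifteen </w><w>years</w><c>, </c></s>"""


def words_by_page(root, start_page=None):
    """Pair each <w> with the n of the most recent preceding <pb>.

    Element.iter() walks the tree in document order, so a <pb> anywhere
    (inside an <s>, or directly inside an <l>) takes effect for every later
    word. Consecutive <pb> elements for omitted pages leave the last one in
    force. start_page is the page current before the fragment (None if unknown).
    """
    page = start_page
    pairs = []
    for el in root.iter():
        if el.tag == "pb":
            page = el.get("n")
        elif el.tag == "w":
            pairs.append(((el.text or "").strip(), page))
    return pairs


for word, page in words_by_page(ET.fromstring(SAMPLE)):
    print(page, word)
```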

Highlighted phrases

Typographic changes or highlighting in the original may not have been marked in the transcript at all. Where they have been, highlighted phrases and the kind of highlighting used may be recorded in one of two ways:
  • using the rend (rendition) attribute on elements for which this is defined
  • using the <hi> (highlighted) element

The former is used where the whole of the content of one of the elements <bibl>, <corr>, <div>, <head>, <item>, <l>, <label>, <list>, <p>, <quote> or <stage> is highlighted; the latter is used on all other occasions. In either case, the same set of values is available for the rend attribute, with the same significance.

It should be noted that the purpose of the rend attribute is not to provide information adequate to the needs of a typesetter, but simply to record some qualitative information about the original.

Like all other phrase-level elements, each <hi> element must be entirely contained by an <s> element. This implies that where, for example, a bolded passage contains more than one sentence, or an italicised phrase begins in one verse line and ends in another, the <hi> element must be closed at the end of the enclosing element, and then re-opened within the next.

For example, in the following four lines of verse, the first three are rendered entirely in italics, and the rend attribute is therefore specified on each of those three <l> elements. In the fourth line, only the first few words are in italics, so a <hi> element must be used within the <l> to carry this information.
 <l rend="it">
   <s n="394">
     <w c5="PNP" hw="it" pos="PRON">It </w>
     <w c5="VBD" hw="be" pos="VERB">was </w>
     <w c5="CRD" hw="one" pos="ADJ">one </w>
     <w c5="PRF" hw="of" pos="PREP">of </w>
     <w c5="AT0" hw="a" pos="ART">a </w>
     <w c5="NN0" hw="pair" pos="SUBST">pair</w>
     <c c5="PUN">.</c>
   </s>
   <s n="395">
     <w c5="DPS" hw="it" pos="PRON">Its </w>
     <w c5="AJ0" hw="precious" pos="ADJ">precious </w>
     <w c5="NN1" hw="twin" pos="SUBST">twin</w>
   </s>
 </l>
 <l rend="it">
   <s n="396">
     <w c5="VBD" hw="be" pos="VERB">was </w>
     <w c5="VVN" hw="steal" pos="VERB">stolen </w>
     <w c5="PRP" hw="by" pos="PREP">by </w>
     <w c5="AT0" hw="the" pos="ART">the </w>
     <w c5="NN2" hw="soldier" pos="SUBST">soldiers</w>
     <c c5="PUN">.</c>
   </s>
   <s n="397">
     <w c5="DT0" hw="all" pos="ADJ">All </w>
     <w c5="AT0" hw="the" pos="ART">the </w>
     <w c5="NN1" hw="time" pos="SUBST">time</w>
   </s>
 </l>
 <l rend="it">
   <s n="398">
     <w c5="DPS" hw="she" pos="PRON">her </w>
     <w c5="NN1" hw="uncle" pos="SUBST">uncle </w>
     <w c5="VVD" hw="stand" pos="VERB">stood </w>
     <w c5="AV0" hw="there" pos="ADV">there </w>
     <w c5="VVG" hw="clutch" pos="VERB">clutching </w>
     <w c5="DT0" hw="this" pos="ADJ">this </w>
     <w c5="PNI" hw="one" pos="PRON">one </w>
     <w c5="AVP-PRP" hw="in" pos="ADV">in</w>
   </s>
 </l>
 <l>
   <s n="399">
     <hi rend="it">
       <w c5="DPS" hw="he" pos="PRON">his </w>
       <w c5="AJ0" hw="big" pos="ADJ">big </w>
       <w c5="NN1" hw="fist" pos="SUBST">fist </w>
     </hi>
     <c c5="PUN">— </c>
     <w c5="AV0" hw="so" pos="ADV">so</w>
     <c c5="PUN">!</c>
   </s>
   <s n="400">
     <w c5="PNP" hw="she" pos="PRON">She </w>
     <w c5="VDZ" hw="do" pos="VERB">does </w>
     <w c5="AT0" hw="a" pos="ART">a </w>
     <w c5="AJ0" hw="little" pos="ADJ">little </w>
     <w c5="NN1" hw="mime" pos="SUBST">mime</w>
     <c c5="PUN">.</c>
   </s>
 </l>
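
A consumer of the corpus therefore has to combine both mechanisms to decide whether a word was italicised. A minimal Python sketch (standard xml.etree.ElementTree; italic_words is a hypothetical name) that passes rend="it" down from any enclosing element. It assumes "it" is the exact rend value for italics, as in the examples, and ignores other values; the enclosing <lg> in the fragment is also an assumption, since the excerpt shows only the <l> elements.

```python
import xml.etree.ElementTree as ET

# Abridged from the verse example above (token attributes dropped);
# \u2014 is the dash character used in the original.
SAMPLE = """<lg>
  <l rend="it"><s n="398"><w>her </w><w>uncle </w></s></l>
  <l><s n="399"><hi rend="it"><w>his </w><w>big </w><w>fist </w></hi><c>\u2014 </c><w>so</w><c>!</c></s></l>
</lg>"""


def italic_words(elem, inherited=False):
    """Return (word, is_italic) for every <w> below elem, in document order.
    rend="it" on any enclosing element (<l>, <p>, <hi>, ...) applies to all
    the words inside it."""
    italic = inherited or elem.get("rend") == "it"
    if elem.tag == "w":
        return [((elem.text or "").strip(), italic)]
    words = []
    for child in elem:
        words.extend(italic_words(child, italic))
    return words


for word, italic in italic_words(ET.fromstring(SAMPLE)):
    print(word, "(italic)" if italic else "")
```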
