
Making the BNC was a joint effort involving a large number of participants, both organisations and individuals. It comprised two main stages: planning (the design stage) and execution (the creation stage), as described further below.
The BNC project started with a careful planning stage where the design principles for the corpus were drawn up. These established a number of selection criteria which were then used for identifying suitable texts to be included in the corpus. In addition to the selection criteria for the written and spoken components, a large number of classification features were identified for the texts in the corpus.
Texts were selected for inclusion in the corpus according to three independent selection criteria: domain, time, and medium. Target proportions were defined for each of these criteria, as listed below.
The 10-million-word spoken corpus has two parts: a demographic part, containing transcriptions of spontaneous natural conversations made by members of the public, and a context-governed part, containing transcriptions of recordings made at specific types of meeting and event.
All the original recordings transcribed for inclusion in the BNC have been deposited at the National Sound Archive of the British Library.
A total of 124 volunteers were recruited by the British Market Research Bureau. The volunteers came from four social groupings (AB, C1, C2, and DE). They included men and women from a wide range of ages, living at 38 different locations across the UK. Recruits were chosen so that there were equal numbers of men and women, approximately equal numbers from each age group, and equal numbers from each social grouping.
Recruits used a personal stereo to record all their conversations unobtrusively over two or three days, and logged details of each conversation in a special notebook. After each conversation, those who had taken part were asked for permission to include their speech in the corpus.
Information about the participants, such as age, sex, accent, and occupation, was recorded when available.