An essential prerequisite for almost any kind of corpus building activity is to ensure that appropriate rights are negotiated to permit inclusion of written and spoken materials owned by other people and agencies.
The success of the BNC in obtaining such rights was no doubt partly a result of having good contacts with the British publishing industry, as well as being in some sense a nationally funded initiative. However, some part of its success may also be attributed to the care with which rights holders were approached, in particular the project's desire to make clear that their rights would be respected.
We reproduce here copies of the standard letters used to approach rights holders and the draft agreements we proposed to them, in the hope that these may be of use to other corpus builders.
Date Permissions Department Dear
The British National Corpus is a major collaborative venture, the goal of which is the creation of a corpus of 100 million words of contemporary spoken and written British English. A corpus is a large collection of texts held on computer and designed to provide the raw data on which an empirical study of language can be based. The participants in the project are OUP, which leads the consortium, Longman Group UK Ltd, W & R Chambers, the British Library and the universities of Oxford (OUCS) and Lancaster. The project is to run for just over three years and it started in January 1991. The work is supported by a substantial grant from the Department of Trade and Industry and the Science and Engineering Research Council, which approved the project under their joint Advanced Technology Programme. The enclosed leaflet provides further background information about the project and its aims.
We have selected a number of texts which we would like to include in the Corpus to build a broadly representative sample of modern English. The attached sheet gives formal details of the permission we are requesting.
We hope you will be able to grant us this permission and in so doing aid the development of an important national resource. If you agree to the inclusion of the texts in the British National Corpus, could you please sign the copy of the "Permissions Request" and return it in the enclosed envelope.
We have used our best endeavours to establish that you control the rights in the title in question; if you do not, we would be most grateful if you could let us know who does.
Simon Murison-Bowie BNC Project Director Direct line: 0865 267618 EMail: SMBOWIE@VAX.OX.AC.UK
Thank you very much for agreeing to take part in this project. The study is being carried out by the British Market Research Bureau, an independent market research company and a member of AMSO, the Association of Market Survey Organisations, on behalf of Longman, a major publisher of dictionaries. Longman are participating in a government funded research project to compile a national treasury of the English language. For the first time, dictionary writers and language researchers will be able to show how words are used in ordinary, everyday conversation. We are asking a large cross-section of people around the country to help with this task by recording their own conversations. These will then be transcribed on computer and built into a database which will contain several million words, and will be used for scientific study and publication by writers of dictionaries and educational materials and language researchers. The tapes and conversation details will all be completely anonymous. So no-one will know who has used the words, or whose voices are on the tapes, but together they will provide a permanent record of how the English language was spoken in the 1990s.
What we would like you to do is to record all your conversations with other people (except telephone conversations) over a period, using the personal stereo provided. You will also need to write down some details of all conversations you have in the yellow booklet provided. The notes attached explain in detail how you should go about this. Please read these notes carefully before you begin to record your conversations. The notes remind you what to do over the two to seven days of recording and tell you about changing tapes and batteries, operating the personal stereo, filling in details about conversations etc.
If you have any problems with recording or filling in the booklet, which these notes can't help you with, then you can ring either Bruce Hayward or Laura Fargher at BMRB on 081-567-3060, who will be able to help you.
As a token of our appreciation for your help in the project, and in consideration of the rights you have granted, you will receive a £25 gift voucher, which the interviewer will bring with her when she collects the materials on her second visit.
We hope you enjoy taking part in this project, which should provide a lot of important information about our language. Thank you once again for the time and effort you are putting into helping us, and good luck with your recording.
BRUCE HAYWARD SENIOR RESEARCH EXECUTIVE LONGMAN GROUP UK LTD Burnt Mill Harlow Essex CM20 2JE Tel 0279 623463 Fax 0279 431059
This is to confirm to the BNC that I agree to take part in the British National Corpus and that I give permission for all tape recordings and conversation details to be used as explained to me by the British Market Research Bureau and as confirmed in this letter, the accompanying letter, and Recording Guidelines, which I understand and accept.
I understand that all tapes and conversation details will be completely anonymous, and will be used for scientific study and publication by writers of dictionaries and educational material and language researchers. All rights of any description in recorded conversations and conversation details will belong to the British National Corpus.
Name (in block capitals)
21 May 1991
Permissions Department/Syndicates Department
We wish to include a sample of the above text in the British National Corpus, which is described in the attached leaflet.
The British National Corpus is destined to be a resource for use by researchers worldwide for many years. It is important therefore that a text is included on the basis of the granting of permission throughout the world for the duration of the legal term of copyright.
If you control world rights for the legal term of copyright and are prepared to grant us the permission we seek, then please sign the copy of this letter and return it to me.
If your control of rights is more limited we would be grateful if you would grant us such rights as you are able, ticking the appropriate box/es below and providing the further information requested:
The permission we seek is for gratis non-exclusive use throughout the world for the duration of copyright in the English language:
The terms and conditions under which we wish to include the text sample are as follows:
The text sample, along with 7-8000 other such samples, will be incorporated into the British National Corpus of 100 million words. The Corpus will be processed by computer to bring it into a common format at Oxford University Computing Services and have grammatical tags attached to each of the words by Lancaster University. Later this processed version of the Corpus may be distributed in electronic form only, at cost of duplication and distribution only, to researchers who request the data for use as a pre-competitive resource for research in speech and language technology. These researchers may be working in academic, governmental or commercial organizations.
All such recipients of the British National Corpus will enter into an End User Licence agreement, limiting the uses that may be made of the data, a copy of which is attached.
A register will be maintained of the names and addresses of those who receive a copy of the Corpus and who take responsibility for ensuring that it is used in accordance with the user agreement. The register will be available on request for inspection by you.
In preparing this request and the accompanying documents we have consulted fully with The Publishers' Association, The Society of Authors, The Writers' Guild of Great Britain and The Association of Authors' Agents, and they have agreed to lend their support to this initiative.
We hope you will be able to grant us this permission and in so doing aid the development of an important national resource. All contributions will of course be suitably acknowledged.
Ginny Frewer
Project Assistant, British National Corpus
attachments:
British National Corpus: leaflet
British National Corpus: End User Licence
British National Corpus: Some questions answered
No. It is a pre-competitive research project, funded partly by the publishers in the consortium and partly by the Government. The Government, the Universities and the British Library support the project because it is expected to make a major contribution to computerized research into the English language. The dictionary publishers invest in it because they each have a particular interest in resources and tools for studying language: meanings, usages, idioms, grammar and so on. The Corpus will help everybody involved in the language industries in understanding how the language works, and this in turn will result in better dictionaries, grammars and teaching materials and help with the development of computer systems for machine translation and other types of natural language processing. When complete, the Corpus will be made available to organizations involved in the language industries and the only charge made will be to cover the costs of distribution.
First, the End User Licence states that the Licensee "must not ... copy, publish or otherwise give to any third party access to the whole or any part of the BNC Processed Material" (Clause 2c). Secondly, we will not store any text in its entirety (see Permission Request terms 1 and 2). Nevertheless, the publishers in the consortium clearly recognize the dangers inherent in electro-copying and are as concerned as you that the Corpus should not allow the abuse of copyright texts. A text sample, by being included in the BNC, loses none of the protection of copyright law.
The End User Licence strictly controls the use of the Corpus and the text samples it contains. The right of reproduction of the individual original text samples by any means is explicitly forbidden. None of the text samples in their original form will be incorporated into any product. Quotations from text samples will be strictly limited by the fair dealing provisions of copyright law. It is the Corpus as a whole, as a window on the whole language, together with its grammatical and semantic tagging, that is of interest for language research.
The uses to which the Corpus will be put will typically include the following: the compilation of a dictionary including information about words and meanings which has been derived to some extent from observation of words in the Corpus; the development of machine translation software incorporating knowledge about English grammar derived from the Corpus; the preparation of statistical information (e.g. frequency) about words in the Corpus.
No fee is offered for the following reasons:
- the BNC is a not-for-profit research project;
- each individual text forms no more than 0.04% of the whole corpus;
- text samples are only included as representations of language in use.