What can I do with the BNC?
The BNC is a corpus: a collection of samples of real-life language, chosen to be as varied as possible in its coverage. It includes speech as well as a wide variety of written language, all drawn from the same period.
The BNC is distributed in a format that makes possible almost any kind of computer-based research on the nature of the language. Obvious application areas include lexicography, natural language processing (NLP) systems, and all branches of applied and theoretical linguistics.
The BNC material is made available under certain conditions, summarized in the BNC End User Licence (also available in PDF format). All rights in the texts are reserved. For further information, see the BNC copyright page.
What's the plural of "corpus"? In what social situations is "wicked" a term of approval? Why does it "sound wrong" to say "The good weather set in on Thursday" although "The bad weather set in on Thursday" is perfectly acceptable? If I can say "I live a stone's throw away from here", can I also say "I'm going a stone's throw away from here"?
Large language corpora can help provide answers for these kinds of questions -- if only because they encourage linguists, lexicographers, and all who work with language to ask them. The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word ought to mean, but only experience can tell us what a word is used to mean. This is why dictionary publishers, grammar writers, language teachers, and developers of natural language processing software alike have been turning to corpus evidence as a means of extending and organizing that experience.
With the development of computing technology able to store and handle massive amounts of linguistic evidence, it has become possible to base linguistic judgment on something far greater and far more varied than any one individual's personal experience or intuitions. The British National Corpus (BNC) was created in order to offer that possibility to the widest variety of researchers, scholars, teachers, and language enthusiasts.
- Reference Book Publishing: Dictionaries, grammar books, teaching materials, usage guides, thesauri. Increasingly, publishers are referring to the use they make of corpus facilities, so it is important to know how well their corpora are planned and constructed.
- Linguistic Research: Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics...
- Artificial Intelligence: Extensive data test bed for program development.
- Natural Language Processing: Taggers, parsers, natural language understanding programs, spell-checking word lists...
- English Language Teaching: Syllabus and materials design, classroom reference, independent learner research.
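
As a rough illustration of the kind of evidence-gathering described above, the sketch below counts which words appear immediately before "weather set in", echoing the question raised earlier about good and bad weather. It is a minimal sketch only: the sentences are invented for the example and are not taken from the BNC, and the function name words_before is hypothetical rather than part of any BNC tool.

```python
import re
from collections import Counter

# Invented sentences for illustration only; they are NOT drawn from the BNC.
SAMPLE_SENTENCES = [
    "The bad weather set in on Thursday.",
    "Bad weather set in before we reached the coast.",
    "Then the cold weather set in and the roads froze.",
    "The good weather lasted until Thursday.",
]


def words_before(phrase, sentences):
    """Count the word immediately preceding `phrase` across the sentences."""
    # Capture one word, then whitespace, then the literal phrase.
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(phrase) + r"\b", re.IGNORECASE
    )
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    counts = words_before("weather set in", SAMPLE_SENTENCES)
    for word, n in counts.most_common():
        print(word, n)
```

On a real corpus the same idea would normally be applied to properly tokenized text, and a handful of toy sentences like these proves nothing about actual usage; the point is only to show how a question about what "sounds wrong" can be turned into a count.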