
BNC Products

The British National Corpus (BNC) Consortium was formed in 1990, and started work in 1991 on the three-year task of producing a hundred-million-word corpus of modern British English for use in commercial and academic research. The full BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. The first edition was published in 1994. A slightly revised version, BNC World, was made available world-wide in 2001. In 2007, a third edition appeared, using XML. BNC XML Edition is the version currently distributed and supported. Two subsets of the BNC have been produced separately: BNC Sampler and BNC Baby.

The BNC corpora are distributed with a search tool called XAIRA, developed specially for the BNC, but also usable with other corpora in XML format.

BNC XML Edition

The full BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. It is annotated with word-class information (part-of-speech, simplified word class) and lemmatized. The texts also contain detailed metatextual information. It is delivered in XML format.
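As a rough illustration of what this annotation looks like in practice, the sketch below reads word-level annotation from a small XML fragment using Python's standard library. The fragment is invented for this example and is not taken from the corpus. The element and attribute names (`w` for a word, `c5` for the part-of-speech tag, `pos` for the simplified word class, `hw` for the lemma) are assumptions about the encoding and should be checked against the Reference Guide before being relied on.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment written for this example; NOT a real corpus extract.
# Element and attribute names (w, c, c5, pos, hw) are assumptions to verify
# against the Reference Guide for the BNC XML Edition.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def read_words(xml_text):
    """Return (form, part-of-speech tag, simplified class, lemma) per word."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("w"):
        form = (w.text or "").strip()  # word forms may carry trailing spaces
        rows.append((form, w.get("c5"), w.get("pos"), w.get("hw")))
    return rows


rows = read_words(SAMPLE)
for form, tag, word_class, lemma in rows:
    print(f"{form}\t{tag}\t{word_class}\t{lemma}")
```

Punctuation is held in a separate element in this sketch, so it is skipped when iterating over words. For whole corpus files, a streaming parser such as `ET.iterparse` would likely be a better fit than loading each file in one go.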

BNC XML Edition is distributed on two DVDs for installation on a stand-alone PC or on a Windows, Unix or OS X server. It is delivered with a copy of the XAIRA search program and all necessary XAIRA index files.

Full reference information about the BNC is provided in the Reference Guide for the British National Corpus (XML Edition). Information about the BNC project and the original creation of the corpus can be found on the corpus creation page. To buy a copy of the corpus, follow the links to the How to order page.

BNC Baby

BNC Baby is a subset of the BNC. It consists of four one-million-word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation. The texts have the same annotation as the full corpus (part of speech, metadata, etc.). The Reference Guide to BNC Baby [.pdf file] offers further information about this sample, such as a description of its design and of the way in which it is encoded.

The BNC Baby is in XML format and can be searched with the XAIRA program (included on the CD). It is distributed on a CD together with the BNC Sampler and an XML version of the American English Brown corpus. More information about the CD is available on the BNC Baby CD page. The CD can be ordered online.

BNC Sampler

The BNC Sampler is a subset of the full BNC. It comprises two samples, one of written and one of spoken material, of one million words each, compiled to mirror the composition of the full BNC as far as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first created at Lancaster University during the creation of the BNC. More information about the Sampler can be found in the user's reference guide for the BNC Sampler: XML Edition [.pdf file].

The BNC Sampler is in XML format and can be searched with the XAIRA program (included on the CD). It is distributed on the BNC Baby CD together with the BNC Baby and an XML version of the American English Brown corpus. To order a copy, see the How to order page.

Brown Corpus

The Brown Corpus of Standard American English was created at Brown University by W. N. Francis and H. Kucera. It contains one million words of written American English, taken from publications from 1961. The texts are all about 2,000 words long and are grouped into 15 categories. More information about the content of the corpus can be found in the Brown Corpus Manual by Francis and Kucera, available on the ICAME webpage.

This version of the Brown corpus has word-class annotation and has been converted into XML and indexed for use with the XAIRA program (included on the CD). It is distributed on the BNC Baby CD together with the BNC Baby and BNC Sampler corpora. To order a copy, see the How to order page.


Maintained by: BNC Webmaster (bnc-queries@rt.oucs.ox.ac.uk) January 2009. Lou Burnard.
© 2005, University of Oxford.