Using the corpus
Yes. The corpus text files are made available in an open format called XML which can be processed by many different software tools. You can also use scripts, or write your own software to analyse the BNC. Please note that some desktop tools might struggle to cope with a corpus of this size.
The XML markup in the BNC text files provides information about the texts, the
speakers, and much more.
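For readers who want to write their own scripts, the following is a minimal sketch in Python of one way to read a large XML file without loading it into memory all at once. It uses the standard library's streaming parser; the function name `count_tags` and the tiny in-memory sample are assumptions made for illustration, not part of the BNC distribution.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Stream through an XML source (path or file object) and count elements by tag."""
    counts = Counter()
    # iterparse yields each element when its end tag is read, so the script
    # does not need to build the complete document tree before starting work.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children once counted
    return counts


# Hypothetical stand-in document; a real corpus file would be passed by path.
sample = io.BytesIO(
    b'<s n="1"><w type="AT0" lemma="the">The </w>'
    b'<w type="NN1" lemma="corpus">corpus</w><c type="PUN">.</c></s>'
)
tag_counts = count_tags(sample)
print(tag_counts)
```

The same function could be called with the path of a corpus text file instead of the in-memory sample.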
Some tools will allow you to see the texts without the mark-up (more about mark-up). This is an example of the XML:
<s n="12"><w type="TO0" lemma="to">To
</w><w type="VVI" lemma="illustrate">illustrate
</w><w type="AT0" lemma="the">the
</w><w type="NN1" lemma="paradigm">
paradigm</w><c type="PUN"> , </c><w
type="NN1" lemma="reference"> reference </w><w type="VBZ"
lemma="be"> is </w><w type="VVN" lemma="make"> made
</w><w type="PRP" lemma="to"> to
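As a rough illustration of how a script can read this kind of mark-up, the sketch below parses a shortened sentence modelled on the example and recovers both the plain text and the annotation on each token. The sample string is an assumption: it keeps only the first few tokens and supplies the closing tags that a complete sentence would need. The helper names `plain_text` and `tokens` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sentence modelled on the example; the closing
# </c> and </s> tags and the tidy spacing are supplied for illustration.
sentence_xml = (
    '<s n="12"><w type="TO0" lemma="to">To </w>'
    '<w type="VVI" lemma="illustrate">illustrate </w>'
    '<w type="AT0" lemma="the">the </w>'
    '<w type="NN1" lemma="paradigm">paradigm</w>'
    '<c type="PUN">, </c></s>'
)


def plain_text(s_elem):
    """Join all text inside the sentence and normalise runs of whitespace."""
    return " ".join("".join(s_elem.itertext()).split())


def tokens(s_elem):
    """Return (text, type, lemma) for each <w> (word) or <c> (punctuation) element."""
    result = []
    for elem in s_elem.iter():
        if elem.tag in ("w", "c"):
            word = (elem.text or "").strip()
            # <c> elements in the example carry no lemma, so get() gives None.
            result.append((word, elem.get("type"), elem.get("lemma")))
    return result


s = ET.fromstring(sentence_xml)
text = plain_text(s)
toks = tokens(s)
print(text)
for tok in toks:
    print(tok)
```

Reading the text with `itertext()` is what "seeing the text without the mark-up" amounts to here; the `type` and `lemma` attributes remain available for any analysis that needs them.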
BNC XML Edition and BNC World are both versions of the whole British National Corpus, containing 100 million words. BNC XML Edition is the current version; BNC World is the former one. The BNC Sampler is a 2 million word subset of BNC World with equal proportions of written and spoken material. BNC Baby consists of four subsets of BNC World: one million words each of spoken conversation, written fiction, newspapers and academic prose [more about the corpora].
I would like to access the sound recordings of the spoken part of the corpus. Where are they stored, and can I get access to them?
Some of the recordings from which the spoken parts of the BNC were transcribed are now stored at the British Library Sound Archive. The AudioBNC project started in 2010 to work on aligning this sound data with the BNC transcripts. The corpus aligned with the digital audio can now be queried via an experimental BNCweb service at Lancaster University.
Please use this formula: "Data cited herein has been extracted from the British National Corpus Online service, managed by Oxford University Computing Services on behalf of the BNC Consortium. All rights in the texts cited are reserved." More information is available on the Copyright page.