New edition of BNC Baby available
We are pleased to announce that the new, third edition of the BNC Baby CD is now available. The CD contains three corpora and the latest version of the Xaira software. All three corpora are in XML format and can be used with Xaira or with any other tool that can handle XML. The corpora are:
- BNC Baby
  - Four one-million-word genre-based subsets (academic, fiction, newspaper and conversation). New: now with added lemma information and additional, simplified POS tags for each word.
- BNC Sampler
  - Two million words, 50% spoken, sampled to be similar to the full corpus. Annotated with detailed part-of-speech information that has been manually checked and edited.
- Brown corpus
  - A corpus classic. One million words of written American English from 1961.
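
Because the corpora are plain XML, they can also be read with a general-purpose XML library rather than Xaira. The following is a minimal sketch in Python, not a description of how Xaira works. It assumes that each word is marked up as a `w` element carrying its lemma in an `hw` attribute and its simplified POS tag in a `pos` attribute. Those names, and the sample sentence, are assumptions made for illustration and should be checked against the files on the CD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample sentence. The element name "w" and the attribute
# names "hw" (lemma) and "pos" (simplified POS tag) are assumptions.
SAMPLE = (
    '<s n="1">'
    '<w hw="the" pos="ART">The </w>'
    '<w hw="corpus" pos="SUBST">corpora </w>'
    '<w hw="be" pos="VERB">are </w>'
    '<w hw="small" pos="ADJ">small</w>'
    '<c>.</c>'
    '</s>'
)


def lemma_pos_counts(xml_text):
    """Count (lemma, POS tag) pairs over all word elements in xml_text."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for word in root.iter("w"):
        # Missing attributes come back as None rather than raising.
        counts[(word.get("hw"), word.get("pos"))] += 1
    return counts


counts = lemma_pos_counts(SAMPLE)
for (lemma, pos), n in sorted(counts.items()):
    print(lemma, pos, n)
```

For a real corpus file, `ET.parse(path).getroot()` could replace `ET.fromstring`. Files that declare an XML namespace would need the namespace included in the element name passed to `iter`.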