But I don't do Windows!
BNC-XML can also be installed on non-Windows platforms. The corpus itself is in a non-proprietary format, and the Xaira indexes supplied are platform-independent. These notes describe briefly how to install the corpus from the supplied media, and how to set up a Xaira server on a Linux or OS X platform. Options for installation on non-Windows platforms are also provided in the file Doc/dontDoWindows.txt on the first DVD.
The corpus files occupy 4 Gb unpacked; the index files about 5 Gb.
Read the licence
Whatever platform you intend to install the BNC on, you are still required to observe the terms of the Licence you purchased when you obtained your copy of the Corpus. The BNC data is protected by United Kingdom copyright laws, international treaty provisions, and other applicable national laws. The owners of the copyright retain all rights not expressly granted. By making use of the BNC Data on your system you indicate your acceptance of the terms and conditions relating to the British National Corpus, stated below.
The BNC Licence constitutes a legal agreement between You ("the Licensee") and the British National Corpus Consortium ("the Licensor"), as legal owner of the British National Corpus, which is a UK-registered trademark. Oxford University Computing Services ("the Distributor") has been appointed sole licensed Distributor of the British National Corpus and is authorised to collect and enforce such agreements.
The full text of the Licence Agreement is available from http://www.natcorp.ox.ac.uk/getting/licence.html. If you do not accept these terms and conditions, you must delete the Data from your system and dispose of the media.
Unpack the corpus
The corpus is supplied as a GZIPped TAR archive on the DVD called
texts.tar.gz on BNC-XML-1. We recommend unpacking it into
a directory called Texts. Assuming you have mounted this
disk at /media/cdrom, open a shell window and type
$mkdir Texts
$cd Texts
$gunzip -xvf /media/cdrom/texts.tar.gz
This should produce 10 subdirectories, A to K (there is no I, to avoid confusing it with J), each of which contains a varying number of sub-subdirectories, within which are the actual corpus files, as further described in the User Reference Guide.
Get the documentation
The directory Doc on BNC-XML-1 contains a copy of the user reference guide, which you will probably want to copy to your system as well. The directory Doc/Src contains the XML source, which will probably interest you less than the HTML produced from it which is in the directory Doc/HTML.
If you plan to process the XML source of the Corpora in any way, you'll probably want to copy the directory XML from BNC-XML-1 as well. This contains schema files, expressed in DTD, RelaxNG, and WSD, and also a folder called Scripts, which contains the sample XSLT scripts mentioned in the documentation.
Install Xaira indexes
If you want to set up a Xaira server you will need to install Xaira Indexes, and (possibly) build the Xaira server software. We don't describe the latter in much detail here: refer to the documentation on the Xaira website at http://www.xaira.org
You can create this structure in two steps. First, assuming you
created the root directory BNC-XML and have already unpacked the Texts
directory into it, unpacking the file bnc-etc.tar from the disk
BNC-XML-1 will create the Etc directory and the three XML files for
you:
$cd /BNC-XML
$tar -xf /media/cdrom/bnc-etc.tar
- with the first DVD in the drive, create the Index directory and
copy files from the first DVD:
$cd /BNC-XML $mkdir Index $cp /media/cdrom0/Index/* Index
- unmount the first DVD and mount the second DVD: it should appear as BNC-XML-2
- copy the remaining index files from the directory Index2 on the
second DVD:
$cp /media/cdrom0/Index/* Index
At the end of this process, if all goes well, your Index folder
should contain 17 files, including four big files xdblocs, xdblocs1
and
xdblocs2
copied from the second DVD.
Run XAIRA
Information about the XAIRA software system is available from the project's website at http://www.xaira.org and the software itself may be downloaded from SourceForge. We do not describe here in any detail how to install or run the Xaira software on non-Windows platforms, but the following brief notes may help you get started.
- If you are running a Debian-based system such as Ubuntu, you should be able to install xaira from binaries, using the packages maintained at http://tei.oucs.ox.ac.uk
- Alternatively, building from source is not hard: download the source code from SourceForge, configure, and make, as usual. But note that you also need to install and build ICU and xercesc (and possibly also some other libraries such as GNU readline and zlib)
- Once you have successfully built the software, you can check your
installation of BNC-XML by running the
command-line solve client within your BNC-XML directory :
cd /BNC-XML solve -p corpus_parameters.xml
expects you to type Xaira queries at the prompt and returns a stream of hits as XML: try for example fishcake to see the one occurrence of "fishcake" in the BNC. Do not try this with a high frequency word! - If you have PHP installed, you can use the Xaira library as a back end to access your copy of BNC-XML over the web. Information on how to configure PHP and demonstration scripts are included in the source release and documented on the website.
- Use the
xaira_daemon
command to start a xaira server:cd /BNC-XML xaira_daemon -p corpus_parameters.xml -l 20200
will start a server running on port 20200. Firewalls permitting, you should be able to point Xaira clients, such as those provided by the Windows release, at it.
Up: Contents Previous: Installing on PC Next: Installation Problems