Introducing XAIRA briefly

This worksheet introduces you to some of the key features of the XAIRA software for interrogating the BNC XML Edition (BNC-xml). To use the BNC XML Edition you need to install both XAIRA and the corpus on your computer. You can find instructions for doing this on the BNC XML website.

1. Getting Started

Start up the XAIRA client by clicking on the bnc-xml.xcorpus icon. Close the big Xaira splash window to access the program window.

At the top of the screen you see the usual Windows menu items (File, Edit, Texts, View, Window, and, on the far right, Help). Immediately below that you can see a row of buttons which we collectively call the Toolbar.

The bottom of the screen contains a message area and a status bar, in which XAIRA posts useful information about its current state.

2. A phrase query

Click on the Phrase query button on the toolbar (it has the letter A on it).

Type pros and cons into the box and then press Enter.

The cursor will turn briefly into an hourglass while XAIRA searches, and then the Too Many Solutions alert will appear, telling you the result of the search: there are 168 occurrences of this phrase in 142 different texts.

Using the appropriate radio buttons, specify that you would like to download the initial 100 hits, then click on OK.

The solutions to the query will now be downloaded, one by one. According to the default settings, you will see EITHER a list showing one solution per line OR a single solution displayed on its own.

The first kind of display is known as a line mode display, the second as a page mode display. You can change the display mode by clicking on the page/line mode button on the toolbar.

This button toggles between displaying solutions one at a time and displaying solutions in one-per-line format. In either mode, you can scroll through the solutions using the PgDn and PgUp keys; in line mode you can also use the up and down arrow keys.

Change to Page mode.

If you look at the status bar, you will see that it now contains additional information. Reading from left to right, you should see something like the following: BNC BNC 1:100(88) A16 1812. This indicates that the name of the corpus being searched is BNC, that the lemmatization scheme in effect is called bnc, that the currently highlighted solution is number 1 of 100 chosen from 88 different texts, and that it comes from text A16 at sentence number 1812.
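If you ever copy status bar readings into your notes, the fields described above can be separated mechanically. The following is only a minimal sketch: the function name parse_status and the field names are invented for illustration, and it assumes that the status line always has exactly the shape of the example quoted above.

```python
import re
from typing import NamedTuple


class StatusBar(NamedTuple):
    corpus: str        # name of the corpus being searched
    lemma_scheme: str  # lemmatisation scheme in effect
    current: int       # number of the highlighted solution
    downloaded: int    # number of solutions downloaded
    texts: int         # number of different texts they come from
    text_id: str       # text containing the current solution
    sentence: int      # sentence number within that text


# Assumed layout, modelled on "BNC BNC 1:100(88) A16 1812".
STATUS_RE = re.compile(r"(\S+)\s+(\S+)\s+(\d+):(\d+)\((\d+)\)\s+(\S+)\s+(\d+)")


def parse_status(line: str) -> StatusBar:
    """Split a status line into its named fields."""
    match = STATUS_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    corpus, scheme, cur, total, texts, text_id, sent = match.groups()
    return StatusBar(corpus, scheme, int(cur), int(total),
                     int(texts), text_id, int(sent))


status = parse_status("BNC BNC 1:100(88) A16 1812")
print(status.text_id, status.sentence)
```

A corpus name containing spaces would not fit this pattern, so treat the sketch as a reading aid rather than a general parser.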

Select line-mode display and press the Down arrow key a few times.

Note the dashed lines surrounding what is known as the current solution. Information about this solution is displayed on the status bar at the bottom of the screen.

You can get further information about the current solution by clicking on the Bibliographic data button on the toolbar. This will tell you exactly what text the solution comes from. Click on OK to close the Source Description popup.

Now click the right mouse button.

A menu appears from which you can choose, inter alia, to

3. Sorting solutions

Switch to Line mode and Custom or Plain format. Find the Sort button on the toolbar (it has letters A and Z with an arrow beside them), and click on it.

A dialogue box appears in which you can specify how the concordance lines should be sorted. You can indicate how many words are to be considered when sorting using the Span window.

As Sort key, select the Left radio button and specify a 3 word span. Then press the Sort button.

The lines will be sorted by the words to the left of the hit, so phrases like "weigh up the pros and cons" will be grouped together. Scroll through the lines to see what the most common phrases containing the words "pros and cons" are.

Of course, this procedure has only shown you what words occur before "pros and cons". To see what words occur after "pros and cons", you should now re-sort the solutions by the right. In general, you should always sort solutions both by the left and by the right to identify common patterns.
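To see why sorting brings recurrent phrases together, it can help to imitate the procedure on a handful of lines. The sketch below is an illustration only, not XAIRA's own sorting code: the concordance lines are invented, the helper names left_key and right_key are hypothetical, and it assumes that a left sort compares the word nearest the hit first and then works outwards.

```python
from typing import List, Tuple

# Each line is (left context, hit, right context). Invented examples.
Line = Tuple[str, str, str]

lines: List[Line] = [
    ("we must weigh up the", "pros and cons", "of each option"),
    ("a debate on the", "pros and cons", "of nuclear power"),
    ("time to weigh up the", "pros and cons", "before deciding"),
    ("she listed all the", "pros and cons", "of moving house"),
]


def left_key(line: Line, span: int = 3) -> List[str]:
    """Last `span` words before the hit, nearest word first (assumed order)."""
    words = line[0].lower().split()
    return list(reversed(words[-span:]))


def right_key(line: Line, span: int = 3) -> List[str]:
    """First `span` words after the hit."""
    return line[2].lower().split()[:span]


by_left = sorted(lines, key=left_key)
by_right = sorted(lines, key=right_key)

for left, hit, right in by_left:
    print(f"{left:>25}  [{hit}]  {right}")
```

With a span of 3, the two "weigh up the" lines end up next to each other in by_left, which is the grouping effect you should look for in the real concordance.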

What are the most common patterns with "pros and cons"? Did you know them all?

If you want, you can use the Print option to print your sorted concordance. You are advised to choose "landscape" under the Print setup options in order to get a printout showing sufficient context. If you care about the environment, never try to print more than 100 solutions.

4. Setting Preferences

Before doing further queries, let us set some Preferences for the toolbar and display of results.

Under the View menu, first select Toolbars. Check all the options except Navigation, and click on OK. Then select the View menu again, followed by Preferences. Under User Preferences, check the following: Query, Concordance, Show Flybys, and set the Initial scope to 3. The System Corpus root should be the folder where you plan to keep your corpora. Click on OK to close the User Preferences box. Then select the View menu again, followed by Corpus Preferences. Under Corpus Preferences, select BNC in View, and select textMode and textOnly as the Default partition and region respectively.

Preferences also allows you to vary the Font used to display solutions.

Your Preferences will be saved when you leave XAIRA.

5. Word Query

XAIRA maintains a list of all the distinct word forms in the corpus, together with their frequencies and part of speech codes. We refer to this list as the lexicon. You can use the Word Query command to search the lexicon in a number of different ways. The Word Query Button looks like a small white box with a vertical yellow stripe.

Click on the WordQuery button. The Word Query dialogue box will be displayed. Make sure the Controls and Unique forms boxes are checked.

Click in the empty box at top left of the dialog box, and type in the string weigh. Then click on the LookUp button (or press Enter).

A list of all the word-forms in the lexicon which begin with the letters "weigh" is displayed in alphabetical order. The other columns show the frequency and the number of different forms grouped under that entry.

Click on the entry for weigh.

The different word-forms grouped under this entry are displayed in the lower window. You will see that weigh, while generally classified as a finite verb form (VVB) or an infinitive verb form (VVI), also appears in the lexicon classified as EITHER a verb OR a noun (VVB-NN1; NN1-VVB). These are cases where the computer software performing the part-of-speech analysis was uncertain: the more probable part-of-speech is given first.
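The idea of a lexicon whose entries pair a word-form with a part-of-speech code can be pictured with a very small model. This is a sketch under stated assumptions, not a description of how XAIRA stores its lexicon: the names LEXICON, lookup and split_tag are hypothetical, and every frequency except the 388 occurrences of weigh as VVI is invented.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon: (word-form, part-of-speech code) -> frequency.
# Only the count for ("weigh", "VVI") comes from the worksheet;
# all other numbers are invented placeholders.
LEXICON: Dict[Tuple[str, str], int] = {
    ("weigh", "VVI"): 388,
    ("weigh", "VVB"): 120,
    ("weigh", "VVB-NN1"): 4,
    ("weighed", "VVD"): 300,
    ("weight", "NN1"): 5000,
    ("well", "AV0"): 90000,
}


def lookup(prefix: str) -> List[str]:
    """Distinct word-forms beginning with `prefix`, in alphabetical order."""
    return sorted({form for form, _tag in LEXICON if form.startswith(prefix)})


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split an ambiguity code into (more probable, less probable) parts."""
    first, _, second = tag.partition("-")
    return first, (second or None)


print(lookup("weigh"))
print(split_tag("VVB-NN1"))
print(split_tag("VVI"))
```

The point of split_tag is simply that the first half of a hyphenated code is the more probable reading, as described above.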

Select the VVI form (388 occurrences) and click on Query. Download a random 100 hits and scroll through the solutions. Not surprisingly, the most common word preceding the infinitive form weigh is clearly to.

6. Lemmata

You can make multiple selections from the windows in the Word Query dialog box by the usual Windows procedure of holding down the Control key while you select. So you could select "weigh", "weighed", "weighing" and "weighs" to show the occurrences of all of these. Alternatively, Word Query allows you to select a lemmatisation scheme which groups different forms under a headword: the infinitive for verbs, the singular for nouns, and the root form for adjectives.

Click on the Query Edit button on the toolbar (this looks like a tiny pencil writing on a blue-edged screen) to return to the Word Query dialogue box. Then click on the Lemmata tab in the Controls section. Select the BNC lemmatisation scheme, then click on Apply.

If you select "weigh VERB" you will see that all the different forms of the verb weigh are listed in the lower window. Altogether there are 2320 of them. Click on Query and download a random 10 solutions.
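A lemmatisation scheme can be thought of as a table that maps each word-form to a headword and a word class. The sketch below illustrates that idea only: the table LEMMA_OF and the function forms_of are hypothetical, the class label "VERB" follows the entry shown above, and the label "SUBST" for nouns is an assumed placeholder.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping: word-form -> (headword, word class).
LEMMA_OF: Dict[str, Tuple[str, str]] = {
    "weigh": ("weigh", "VERB"),
    "weighs": ("weigh", "VERB"),
    "weighed": ("weigh", "VERB"),
    "weighing": ("weigh", "VERB"),
    "weight": ("weight", "SUBST"),   # "SUBST" is an assumed label
    "weights": ("weight", "SUBST"),
}

# Invert the table so that each lemma lists its forms.
FORMS_BY_LEMMA: Dict[Tuple[str, str], List[str]] = defaultdict(list)
for form, lemma in LEMMA_OF.items():
    FORMS_BY_LEMMA[lemma].append(form)


def forms_of(headword: str, word_class: str) -> List[str]:
    """All word-forms grouped under one lemma, alphabetically."""
    return sorted(FORMS_BY_LEMMA.get((headword, word_class), []))


print(forms_of("weigh", "VERB"))
```

Selecting a lemma in the dialog box is then equivalent to selecting every form in its list at once.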

7. Collocations

What words follow the verb weigh? It's difficult to analyse more than 100 solutions by looking at them, even when you have sorted them. It would be impossible to analyse 2320 in this way. Xaira's Collocation facilities allow you to automatically detect patterns over large numbers of solutions.

Click on the Collocation button on the toolbar (it's one of the last ones in the first group, immediately to the left of the red-white-blue dice button). The Collocation dialogue box will be displayed. Make sure the Show controls box is checked.

Click on the Window tab, and set the window to Left 0, Right 3. (We want to find out what words follow forms of the verb weigh.)

Click on Calculate, and wait while XAIRA calculates the collocates of the verb lemma weigh in this window.

The list shows the collocates in order of significance: a frequent collocate which is a rare word is more significant than a frequent collocate which is a common word. You can re-sort the list alphabetically, by part-of-speech, or by frequency, by clicking at the top of the column you want.
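The difference between frequency and significance can be made concrete with a z-score calculation. The sketch below uses one common textbook formulation of the z-score for collocations; it is an assumption that XAIRA computes its figure in a comparable way, and its exact calculation may differ. The corpus size of 100 million words and the collocate frequencies are round illustrative numbers, while 2320 is the number of occurrences of the verb lemma weigh given earlier.

```python
import math


def z_score(observed: int, node_freq: int, collocate_freq: int,
            corpus_size: int, window: int) -> float:
    """z-score of a collocate within `window` words of the node.

    p        : chance that any given word in the corpus is the collocate
    expected : co-occurrences expected if words were distributed at random
    """
    p = collocate_freq / corpus_size
    expected = p * node_freq * window
    return (observed - expected) / math.sqrt(expected * (1 - p))


CORPUS_SIZE = 100_000_000  # assumed round figure
NODE_FREQ = 2320           # occurrences of the verb lemma "weigh"
WINDOW = 3                 # Left 0, Right 3

# Two hypothetical collocates that co-occur with the node equally often:
z_rare = z_score(300, NODE_FREQ, 5_000, CORPUS_SIZE, WINDOW)
z_common = z_score(300, NODE_FREQ, 3_000_000, CORPUS_SIZE, WINDOW)

print(f"rare collocate:   z = {z_rare:.1f}")
print(f"common collocate: z = {z_common:.1f}")
```

Both collocates occur the same number of times in the window, but the rare word scores far higher because so few co-occurrences would be expected by chance.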

If you select a word from the list and click on Query, you will see a concordance of the occurrences of weigh with that collocate. Try "heavily", and try to work out why it co-occurs with "weigh". What sorts of things "weigh heavily" in this concordance?

Close the "weigh heavily" solutions window and click on Collocation again to return to the list of collocates of weigh. Click on Frequency to sort the collocates by frequency instead of by significance (z-score), and then on Pos to sort them by part-of-speech. What is the most frequent adverb occurring with weigh? Which adverb has the highest z-score?

Select "up" in the list of collocates, click on Query, and see what sorts of things you can weigh up.

8. Introducing subcorpora

90% of the BNC is composed of written texts, so it is likely that the features you have just noted are those most frequently employed in writing. Are the most common features in speech any different? You can investigate this question by changing from the full corpus to the subcorpus of spoken texts.

Towards the end of the toolbar you will see a small box displaying the word Text mode. Next to it is a box which is blank, meaning that you are currently using the full corpus (all Text modes).

Click on the little arrow at the right of this box and select Transcribed speech.

You have now activated just the subcorpus of spoken texts: the Status bar should now show the current corpus as BNC:Transcribed speech.

You can return to using the full corpus by clicking on the blue cross to the right of the Class box on the toolbar.

Now redo your Word Query for the lemma weigh. If you check the Pattern box, you may have to wait a little less before the list of matching forms appears, as this will exclude those lemmas which do not correspond to the pattern weigh.

This time, the results will cover only the spoken texts, and you should see that there are only 100 occurrences of the verb lemma weigh in these. Download these hits for the verb lemma weigh and look up its collocates in the window 0,3.

You will see that up is still the most significant collocate, but you may notice other changes.

Close all the open queries once you have completed this section.

9. Introducing the Query Builder

The query builder allows you to combine different queries - for example to find occurrences of "weigh" near "up", of "hello" as a reply to "good morning", or of "in fact" as the first words of a sentence.

In this tutorial we will look for occurrences of the verb lemma "weigh" followed by "in" within a span of 5 words.

Click on the QueryBuilder button on the toolbar (it is shaped like a T).

The QueryBuilder screen appears. You use this screen to define complex queries, each component of the query being represented as a node on this screen.

On the right-hand side of the screen you define what you want to look for, as one or more content nodes. The box on the right is currently red because you have not yet specified what you want to look for.

Click in the red node and select Word from the menu that appears.

This will display the Word Query dialog box.

Enter the string "weigh", check that the BNC lemmatisation scheme has been applied, and click on LookUp.

Select the verb lemma "weigh" and click on Query.

The red content node should now be black, containing the lemma "weigh".

At the bottom of the content node, you will see there is a little branch. Click on it.

A new empty content node will be displayed beneath the first content node.

Click in the new node and select Word to show the Word Query dialog box.

In the same way as before, select the preposition lemma "in" and click on Query to insert it in the second content node.

You have now defined "weigh" followed by "in" as the content of the query. You now need to indicate where this content must be found - in this case, the maximum distance between the two words. To do this, we use the left-hand node on the Query Builder screen.

The left-hand node defines the scope of the query, that is, where the search is to be carried out. You can set the scope to a set number of words (span) or to an XML element such as <s> (a sentence), <u> (a spoken utterance) or <p> (a paragraph of written text). As you can see, Query Builder starts off with the assumption that you will search anywhere within any of the corpus texts (a <bncDoc> element).

Click in the scope node and select Span. Select the number 5 words and click on OK.

You will now see the complete query: at the foot of the window there should be a message saying Query is OK.
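The logic of a two-node query with a span scope can be imitated on a plain list of words. This is a sketch of the idea only, not of how XAIRA evaluates queries: the function find_within_span is hypothetical, the example sentences are invented, and it assumes that a span of 5 means both words must fall inside a window of 5 consecutive words (XAIRA may count the span differently).

```python
from typing import List, Set, Tuple

# Forms of the verb lemma "weigh", as listed in the Lemmata section.
WEIGH_FORMS: Set[str] = {"weigh", "weighs", "weighed", "weighing"}


def find_within_span(tokens: List[str], first: Set[str],
                     second: str, span: int = 5) -> List[Tuple[int, int]]:
    """Positions (i, j) where a `first` form is followed by `second`
    and both lie inside a window of `span` consecutive words."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() not in first:
            continue
        # Look at the next span-1 words after the node word.
        for j in range(i + 1, min(i + span, len(tokens))):
            if tokens[j].lower() == second:
                hits.append((i, j))
    return hits


near = "the boxer will weigh in at ninety kilos".split()
far = "they weigh the evidence carefully and then decide in private".split()

print(find_within_span(near, WEIGH_FORMS, "in"))
print(find_within_span(far, WEIGH_FORMS, "in"))
```

In the second sentence "in" does occur after "weigh", but too far away to count, which is exactly what the scope node controls.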

At the bottom of the Query Builder screen, click on OK to send the query to the server.

Download all the solutions, and sort them, first by the words preceding the lemma "weigh", then by the words following the lemma "in". What do you discover?

The interesting collocates are the ones to the right, where there are two main patterns, with the prepositions "at" and "with". What do these two expressions mean?

10. AddKey queries

What other prepositions go with weigh? Click on the QueryBuilder button on the toolbar, create a second Content node, and set the Scope as 5 words.

Click in the top Content node and select Edit. Then select Word, and insert a Word Query for the verb lemma "weigh".

Now we shall add a query for any AVP (adverbial particle) in the span of 5 words following "weigh".

Click in the second Content node and select Edit. Then Select AddKey. The AddKey dialogue box will be displayed.

Select c5 and check the Any box. Then click on Refresh.

A list of all the part-of-speech categories in the corpus will be displayed. Select AVP (an adverbial particle) and click on OK to insert this query in the QueryBuilder content node, then click on OK in the QueryBuilder window. Download all the solutions and sort them to see what other adverbial particles occur with "weigh". What is the meaning of "weigh down"?
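Searching for a part-of-speech code rather than a particular word can also be imitated on a small scale. The sketch below is an illustration under stated assumptions: the function particles_after is hypothetical, the tagged words are invented, the span is counted as a window of 5 consecutive words including the node, and sentence boundaries are ignored for simplicity.

```python
from collections import Counter
from typing import List, Set, Tuple

# Invented sample of (word, part-of-speech code) pairs.
TAGGED: List[Tuple[str, str]] = [
    ("it", "PNP"), ("weighed", "VVD"), ("him", "PNP"), ("down", "AVP"),
    (".", "PUN"),
    ("weigh", "VVB"), ("up", "AVP"), ("the", "AT0"), ("options", "NN2"),
]

WEIGH_FORMS: Set[str] = {"weigh", "weighs", "weighed", "weighing"}


def particles_after(tagged: List[Tuple[str, str]], node_forms: Set[str],
                    tag: str = "AVP", span: int = 5) -> Counter:
    """Count words carrying `tag` that follow a node form within the span."""
    counts: Counter = Counter()
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() in node_forms:
            # The span-1 words after the node word.
            for following, following_tag in tagged[i + 1:i + span]:
                if following_tag == tag:
                    counts[following.lower()] += 1
    return counts


print(particles_after(TAGGED, WEIGH_FORMS))
```

Counting the particles in this way gives the same kind of overview that you obtain by sorting the downloaded solutions by the right.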

11. Exercises

What can you find out about "unlikely"?

What can you find out about the verb "occur"?

What sorts of things "go with" what sorts of things?