1. Advanced Sara Tutorial

The folder C:\sara99 contains the latest beta release of SARA. It has been set up to allow you to experiment with some of the newer features of the client. You can use it to access the following corpora:

The Lampeter Corpus
The La Recherche Corpus
Some files prepared for the CEXI corpus at Forlì

2. Defining and using partitions

A partition is a way of dividing up a corpus of texts according to any kind of criteria. For example, we might classify each text in a corpus in a number of different ways: one might be as (say) ‘interesting’, ‘dull’, or ‘unread’; another as (say) satisfying some query, or not satisfying it; and another as having one or more predefined classification codes, of the type defined in a taxonomy. Each such classification would result in a different partition. This version of SARA allows us to define such partitions, and then use them to see how the results of queries are distributed across different kinds of texts.

Note that partitions always operate on whole texts; you cannot classify parts of a text in different ways. A partition also always applies to the whole corpus.
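
As a rough illustration of these two constraints (this is not SARA's internal representation), a partition can be modelled as a mapping that assigns every text in the corpus to exactly one class. The names `Partition`, `classes`, `assignment` and `validate`, and the text identifiers, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Illustrative model only: each whole text gets exactly one class."""
    name: str
    classes: list
    assignment: dict = field(default_factory=dict)  # text id -> class name

    def validate(self, corpus_text_ids):
        # A partition applies to the whole corpus: no text may be left out.
        missing = set(corpus_text_ids) - set(self.assignment)
        if missing:
            raise ValueError(f"unclassified texts: {sorted(missing)}")
        # Every assigned class must be one of the declared classes.
        unknown = set(self.assignment.values()) - set(self.classes)
        if unknown:
            raise ValueError(f"undeclared classes: {sorted(unknown)}")


corpus = {"textA", "textB", "textC"}  # hypothetical text identifiers
readability = Partition(
    name="readability",
    classes=["dull", "interesting", "unread"],
    assignment={"textA": "dull", "textB": "interesting", "textC": "dull"},
)
readability.validate(corpus)  # passes: every text has one declared class
```

Because the mapping is keyed by text identifier, a single text cannot belong to two classes of the same partition, and there is no way to classify only part of a text.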

  1. Open Sara and select either Lampeter or LR as your corpus. Choose the Toolbars option on the View Menu, and click the Subcorpora box (if necessary) to display the Partition tool bar.
  2. Click the Partition button (a red P), or select Define Partition from the Texts menu (if you cannot see the Texts menu, click on the "Lampeter" icon at bottom left of the display, or close or minimize any open Query result windows). The New Partition dialogue appears.
  3. You must supply a name (such as "readability") for your partition in the first box, and also a filename (such as "readbl.sc") in the third box. You can also type in a brief description such as "Manual assessment of readability" in the second box, if you wish. When entering the filename, use the Browse button to navigate to the current directory if necessary. Note that you must give the filename an extension of .sc.
  4. As you see, this dialogue offers three different ways of defining a partition. We will start with the first one, so make sure that the radio button next to Create an empty partition with these classes is selected.
  5. We now need to define the different classes which make up the readability partition, for example "dull", "interesting", "fascinating" etc. Click in the window immediately below the heading and type in the first term ("dull") you want to use. Press the Add button to add it to the list. Add another term ("interesting"), and so on. You can use the Delete button to remove a term from the list. For convenience, the first item in the list should be the class to which most texts in your partition will be assigned, but the order is otherwise unimportant.
  6. When you've defined three or four classes, press OK. (If the OK button cannot be selected, you probably forgot to supply a filename.) The dialogue disappears. If you now select the Texts window, you will see that a new column has appeared following the "Text" column, headed "Class". Every text has been given a default classification, namely the class specified first in your list. We can now manually reclassify the texts more appropriately.
  7. Select one or more rows in the text list, using the mouse and the shift or control keys as appropriate. Select Classify Selection from the Texts menu, and a submenu opens with all the available class names. Choose the class you want for the group of texts currently highlighted.
  8. Continue in this way till you have classified all the non-default texts, or until you get bored. If it helps to group the texts by some other column, click on any column heading to sort the rows by the values displayed in it: for example (with Lampeter) if you wish to classify all texts from the 17th century as "interesting", you might sort the display according to the year column.
  9. Remember to save your partition, by pressing the Save Partition button on the tool bar (the diskette image with a red P on it), or by choosing Save Partition from the Texts menu.
  10. Choose Partition Properties from the Texts menu, and you will see that each of your classes has been allocated a colour. You can change the colour by selecting the class name and pressing the Colour button. You can also add (but not delete) a class in this dialogue.
  11. Now choose Activate Class from the Texts menu. A window appears showing the available class codes: choose one (say "dull"). In the status bar, you will see that the corpus name is now suffixed by the class code (e.g. "Lampeter:dull"): this indicates that only the dull texts in the Lampeter corpus are now being searched. Check this by doing a search for any word (say "King"): the results will be taken only from texts with this classification.
  12. The second window on the lower button bar now also contains a list of available classifications. Choose a different classification (say "interesting") and repeat your search. A second query window now opens, with results taken only from the "interesting" texts in the corpus. (You may find it convenient to place the two query result windows side by side by choosing Tile from the Windows menu.) You can now compare the usage patterns of the word "King" in texts classified differently in your corpus.
  13. Finally, we will activate the whole partition created by the "readability" classification scheme. Choose Activate Partition from the Texts menu. A list of available partitions appears: select "readability" and press OK. Repeat your search for "King", and you will see that there are 1400 occurrences in the whole corpus, appearing in 85 texts. However, because the "readability" partition is now active, Sara will also show you a breakdown of how these hits are distributed according to your classification.
  14. Select Analysis from the Query menu. A new window opens showing at the top various statistical properties of your partition, and at the bottom a graphic display, as either a bar or a pie chart.
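
The workflow in the steps above can be summarised in a short sketch. This is a toy model, not SARA code: the text identifiers, the sample sentences, and the helpers `classify` and `search` are all hypothetical, and the search is a plain whitespace token match rather than a SARA query.

```python
# Toy corpus: text id -> content (hypothetical data).
texts = {
    "t1": "the King spoke to the people",
    "t2": "a sermon on charity",
    "t3": "the King and the King of France",
}
classes = ["dull", "interesting", "fascinating"]

# Step 6: every text starts in the first class in the list.
partition = {text_id: classes[0] for text_id in texts}


def classify(selection, class_name):
    """Steps 7-8: reassign a selection of whole texts to one class."""
    if class_name not in classes:
        raise ValueError(f"unknown class: {class_name}")
    for text_id in selection:
        partition[text_id] = class_name


def search(word, active_class=None):
    """Steps 11-12: count hits per text, optionally within one active class."""
    hits = {}
    for text_id, content in texts.items():
        if active_class is not None and partition[text_id] != active_class:
            continue  # this text lies outside the active class
        count = content.split().count(word)
        if count:
            hits[text_id] = count
    return hits


classify(["t3"], "interesting")
print(search("King"))                              # whole corpus
print(search("King", active_class="dull"))         # only "dull" texts
print(search("King", active_class="interesting"))  # only "interesting" texts
```

Putting the most common class first keeps manual work to a minimum, since only the texts that differ from the default need to be reclassified.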

The columns show, for each class (row):

Hits: the number of words matching your query found in texts allocated to the specified class
the size in words of the texts allocated to the specified class
%: hits as a percentage of words
Hit Texts: the number of texts allocated to the specified class which contain at least one occurrence of words matching your query
the number of texts allocated to the specified class

If you check the box labelled ‘Measure size of hits’, the "Hits" column (and consequently the "%" column) changes to indicate the size in words of all the texts containing at least one hit.
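
The figures in this table could be derived along the following lines. This is a sketch under stated assumptions, not SARA's implementation: the per-text records (`cls`, `words`, `hits`) are invented sample data, the key names in each result row are hypothetical, and the `measure_size` flag only imitates the ‘Measure size of hits’ option as it is described here.

```python
from collections import defaultdict

# Hypothetical per-text records: class, size in words, number of query hits.
records = [
    {"cls": "dull", "words": 1000, "hits": 4},
    {"cls": "dull", "words": 3000, "hits": 0},
    {"cls": "interesting", "words": 2000, "hits": 10},
]


def analyse(records, measure_size=False):
    """Return one row of statistics per class."""
    rows = defaultdict(
        lambda: {"hits": 0, "words": 0, "hit_texts": 0, "texts": 0}
    )
    for rec in records:
        row = rows[rec["cls"]]
        row["words"] += rec["words"]
        row["texts"] += 1
        if rec["hits"] > 0:
            row["hit_texts"] += 1
            # With measure_size, add the full size of each text containing
            # a hit instead of the number of hits themselves.
            row["hits"] += rec["words"] if measure_size else rec["hits"]
    for row in rows.values():
        # Hits as a percentage of the words in the class.
        row["percent"] = 100.0 * row["hits"] / row["words"] if row["words"] else 0.0
    return dict(rows)


for cls, row in analyse(records).items():
    print(cls, row)
```

Calling `analyse(records, measure_size=True)` on the same data shows what proportion of each class, measured in words, lies in texts that contain at least one hit.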

You can save these statistics in a file by clicking the Listing button. You can also copy the graphic to the Windows clipboard, by pressing the Copy button.

Defining a classification in this way is laborious. Saving it as a partition file means you can re-use it: if you now close SARA down, and then re-open it, you can re-activate your "readability" partition simply by clicking on the "Open Partition" button, or choosing Open Partition from the Texts menu. However, if your corpus is already marked up with classification information, you can define the whole of a partition automatically.