2. Defining and using partitions
A partition is a way of dividing up a corpus of texts
according to any criterion you choose. For example, we might classify
each text in a corpus in a number of different ways: one way might be as (say)
‘interesting’, ‘dull’, or ‘unread’; another as
satisfying some query or not satisfying it; and another as having
one or more predefined classification codes, of the type defined in a
taxonomy. Each such
classification would result in a different partition. This version
of SARA allows us to define such partitions, and then use them to see how the
results of queries are distributed across different kinds of texts.
Note that partitions always operate on whole texts; you cannot
classify parts of a text in different ways. A partition also always applies
to the whole corpus.
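To make the idea concrete, here is a minimal Python sketch of a partition as a data structure. It assumes a simple model in which the corpus is a list of text identifiers and a partition maps each identifier to exactly one class name. The `Partition` class, its methods and the text identifiers are hypothetical illustrations, not SARA's internal representation or API. Following the behaviour described later in this section, a newly created empty partition assigns every text to the first class in its list.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model: each text in the corpus belongs to exactly one class."""

    name: str
    classes: list[str]
    assignment: dict[str, str] = field(default_factory=dict)

    @classmethod
    def empty(cls, name: str, classes: list[str], text_ids: list[str]) -> "Partition":
        # Every text starts in the first class listed (the default class).
        return cls(name, list(classes), {t: classes[0] for t in text_ids})

    def classify(self, text_ids: list[str], class_name: str) -> None:
        # Reassign whole texts only; parts of a text cannot be classified separately.
        if class_name not in self.classes:
            raise ValueError(f"unknown class: {class_name}")
        missing = [t for t in text_ids if t not in self.assignment]
        if missing:
            raise KeyError(f"texts not in corpus: {missing}")
        for t in text_ids:
            self.assignment[t] = class_name

    def texts_in(self, class_name: str) -> list[str]:
        # All texts currently allocated to one class, in sorted order.
        return sorted(t for t, c in self.assignment.items() if c == class_name)


# Toy corpus of four texts (identifiers invented for illustration).
corpus = ["text01", "text02", "text03", "text04"]
p = Partition.empty("readability", ["dull", "interesting", "unread"], corpus)
p.classify(["text02", "text04"], "interesting")
print(p.texts_in("dull"))         # ['text01', 'text03']
print(p.texts_in("interesting"))  # ['text02', 'text04']
```

Because the dictionary holds one entry per text and each entry holds a single class, both rules stated above (whole texts only, whole corpus always) hold by construction once the partition is created from the full list of texts.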
-
Open SARA and select either Lampeter or LR as your
corpus. Choose the Toolbars option on the View menu, and click the
Subcorpora box (if necessary) to display the Partition tool bar.
-
Click the Partition button (a red P), or select
Define Partition from the Texts menu. (If you cannot see the Texts
menu, click on the "Lampeter" icon at the bottom left of the display, or close or minimize any open Query result windows.) The New
Partition dialogue appears.
-
You must supply a name (such as "readability") for your partition in the
first box, and also a filename (such as "readbl.sc") in the third
box. You can also type in a brief description, such as "Manual
assessment of readability", in the second box if you wish. When
entering the filename, use the Browse button to navigate to the
current directory if necessary. Note that the filename must have
the extension .sc.
-
As you can see, this dialogue offers three different ways of
defining a partition. We will start with the first one, so make
sure that the radio button next to Create an empty partition
with these classes is selected.
-
We now need to define the different classes which
make up the readability
partition, for example "dull", "interesting", "fascinating",
and so on. Click in the window immediately below the heading and type
in the first term ("dull") you want to use. Press the Add button to add
it to the list. Add another term ("interesting"), and so on. You
can use the Delete button to remove a term from the list. For
convenience, the first item in the list should be the class to which most texts
in your partition will be assigned, since every text is initially
given that class; otherwise the order is unimportant.
-
When you have defined three or four classes, press OK. (If the OK
button cannot be selected, you probably forgot to supply a
filename.) The dialogue disappears. If you now select the Texts
window, you will see that a new column headed "Class" has appeared
after the "Text" column. Every text has been given a
default classification: the class you specified first in your
list. We can now manually reclassify the texts more
appropriately.
-
Select one or more rows in the text list, using the mouse and
the shift or control keys as appropriate. Select Classify
Selection from the Texts menu, and a submenu opens with all the
available class names. Choose the class you want for the group of
texts currently highlighted.
-
Continue in this way until you have classified all the
non-default texts, or until you get bored. If it helps to group
the texts by some other column, click on any
column heading to sort the rows by the values displayed in it.
For example (with Lampeter), if you wish to classify all texts
from the 17th century as "interesting", you might sort
the display according to the year column.
-
Remember to save your partition by pressing the Save Partition
button on the tool bar (the diskette image with a red P on it),
or by choosing Save Partition from the Texts menu.
-
Choose Partition Properties from the Texts menu, and you will
see that each of your classes has been allocated a colour. You
can change the colour by selecting the class name and pressing
the Colour button. You can also add (but not delete) a class in
this dialogue.
-
Now choose Activate Class from the Texts menu. A window appears
showing the available class codes: choose one (say "dull"). In
the status bar, you will see that the corpus name is now suffixed
by the class code (e.g. "Lampeter:dull"): this indicates that
only the dull texts in the Lampeter corpus are now being
searched. Check this by doing a search for any word (say
"King"): the results will be taken only from texts with this
classification.
-
The second window on the lower button bar now also contains a
list of available classifications. Choose a different
classification (say "interesting") and repeat your search. A
second query window now opens, with results taken only from the
"interesting" texts in the corpus. (You may find it convenient to
place the two query result windows side by side by choosing Tile
from the Windows menu.) You can now compare the usage patterns of the word
"King" in texts classified differently in your corpus.
-
Finally, we will activate the whole partition created by the "readability"
classification scheme. Choose Activate Partition from the Texts
menu. A list of available partitions appears: select
"readability" and press OK. Repeat your search for
"King", and you will see that there are 1400 occurrences in the
whole corpus, appearing in 85 texts. However, because the
"readability" partition is now active, SARA will also show you a
breakdown of how these hits are distributed according to your
classification.
-
Select Analysis from the Query menu. A new window opens showing
at the top various statistical properties of your partition, and
at the bottom a graphic display, as either a bar or a pie
chart.
For each row (that is, each class), the columns show:
- Hits: the number of words matching your query found in texts allocated to that class
- Words: the size in words of the texts allocated to that class
- %: hits as a percentage of words
- Hit Texts: the number of texts allocated to that class which contain at least one occurrence of words matching your query
- Texts: the number of texts allocated to that class
If you check the box labelled ‘Measure size of hits’, the
"Hits" column (and consequently the "%" column) changes to show the total size in words of all the
texts in that class containing at least one hit.
You can save these statistics in a file by clicking the Listing
button. You can also copy the graphic to the Windows clipboard by
pressing the Copy button.
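The arithmetic behind the Analysis columns can be sketched as follows. This is a minimal Python sketch, assuming that for a given query we already know each text's size in words and its number of hits; the function name `partition_stats`, the dictionary layout and the figures are hypothetical, and SARA's own calculation may differ in detail. Restricting a search to one active class, as described earlier, corresponds to looking at a single row of this table.

```python
from collections import defaultdict


def partition_stats(assignment, words, hits, measure_size_of_hits=False):
    """Per-class figures modelled on the Analysis window (hypothetical sketch).

    assignment: text id -> class name in the active partition
    words:      text id -> size of the text in words
    hits:       text id -> number of query matches in that text (absent means 0)
    """
    rows = defaultdict(lambda: {"hits": 0, "words": 0, "hit_texts": 0, "texts": 0})
    for text, cls in assignment.items():
        row = rows[cls]
        n = hits.get(text, 0)
        row["texts"] += 1
        row["words"] += words[text]
        if n > 0:
            row["hit_texts"] += 1
            # 'Measure size of hits' counts the words in each text containing
            # at least one hit, instead of the number of hits.
            row["hits"] += words[text] if measure_size_of_hits else n
    for row in rows.values():
        # The "%" column: hits as a percentage of words.
        row["%"] = 100.0 * row["hits"] / row["words"] if row["words"] else 0.0
    return dict(rows)


# Toy figures, invented for illustration.
assignment = {"t1": "dull", "t2": "dull", "t3": "interesting"}
words = {"t1": 1000, "t2": 3000, "t3": 2000}
hits = {"t1": 4, "t3": 10}
print(partition_stats(assignment, words, hits))
```

With these toy figures, the "dull" class has 4 hits in 4,000 words (0.1%) and the "interesting" class has 10 hits in 2,000 words (0.5%), so the rate is five times higher even though the raw counts differ only by a factor of 2.5. Classes that contain no texts simply do not appear in this sketch's output.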
Defining a classification in this way is laborious. Saving it as a
partition file means you can reuse it: if you now close SARA
and then re-open it, you can re-activate your
"readability" partition simply by clicking on the Open
Partition button, or by choosing Open Partition from the Texts
menu. However, if your corpus is already marked up with
classification information, you can define a whole
partition automatically.
-
Choose Column Control from the Texts menu. The upper part of the
Text Windows Columns dialogue shows you all the available
elements in your corpus; the lower part shows you the column
headings in the Text display (other than Text and Class). This
dialogue allows you to add new columns to this display. The
column contents will be derived from either attribute values
or element content, which you select from the list in the upper
part.
-
Choose catRef from the scrollable list in the upper part of the
dialogue and click the Attribute radio button. From the list of
attributes displayed, select "decade" or "domain" and press the
Add button. You could also select socecStatus (the socio-economic
status of the author), this time clicking the Content radio button. As you
add new columns, you will see the effect in the Text window.
-
We can now define a new partition, based on one of the columns
we have just added. Click on the New Partition button to open
the New Partition dialogue, and enter a name for the new
partition ("domain") and a filename ("domain.sc") as
before. This time, however, select the second radio button,
labelled "Create a partition based on values in a column". Its
list contains all the available column headings: select
"domain" and press the OK button.
-
Look at the Class column in the Text window: each row now
contains, by default, the value supplied in the column you
nominated when you defined this partition. You can reclassify these
texts if you like, using the same techniques as before. You can now
search for texts within a specific classification, or analyse
queries across the whole partition, just as you did
before.
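A partition based on column values can be modelled in the same way: each text's class is simply the value shown for it in the chosen column. The minimal Python sketch below assumes the Texts window can be represented as a dictionary from text identifier to column values. The function `partition_from_column`, the toy values, and the treatment of texts with no value (given an empty class name) are assumptions for illustration, not documented SARA behaviour.

```python
def partition_from_column(rows, column):
    """Derive each text's class from one column of the text list (hypothetical sketch).

    rows: text id -> {column heading: value}
    Returns (classes, assignment); classes are the distinct values in first-seen order.
    """
    classes = []
    assignment = {}
    for text, columns in rows.items():
        # Assumption: a text with no value in this column gets an empty class name.
        value = columns.get(column, "")
        assignment[text] = value
        if value not in classes:
            classes.append(value)
    return classes, assignment


# Toy text list; the column values are invented for illustration.
rows = {
    "t1": {"decade": "1640", "domain": "religion"},
    "t2": {"decade": "1650", "domain": "politics"},
    "t3": {"decade": "1640", "domain": "religion"},
}
classes, assignment = partition_from_column(rows, "domain")
# Manual reclassification afterwards works exactly as before.
assignment["t3"] = "politics"
print(classes)     # ['religion', 'politics']
print(assignment)  # {'t1': 'religion', 't2': 'politics', 't3': 'politics'}
```

The column only supplies the starting classification; once the partition exists, it is an ordinary text-to-class mapping that can be edited by hand.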
In texts from which domain does the word "king" appear
most frequently? What is the socio-economic status of
authors who use the word "Holland" most frequently? (To answer
the second question, reclassify the
texts so that, for example, the various sub-classes of
"aristocracy" are all classified as "aristocracy".)
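For the second question, folding sub-classes together and then comparing classes by rate rather than by raw count can be sketched in Python as below. Comparing the "%" figures rather than the hit counts matters because classes can differ greatly in size: a large class may contain more hits simply because it contains more words. The class labels and figures are hypothetical placeholders, not the actual socecStatus values in Lampeter, and the functions `merge_classes` and `rank_by_rate` are illustrative only.

```python
def merge_classes(assignment, merge_map):
    """Return a new assignment with sub-classes folded into broader classes.

    merge_map: old class -> new class; classes not listed are kept unchanged.
    """
    return {text: merge_map.get(cls, cls) for text, cls in assignment.items()}


def rank_by_rate(assignment, words, hits):
    """Classes ordered by hits as a percentage of words, highest first."""
    totals = {}
    for text, cls in assignment.items():
        h, w = totals.get(cls, (0, 0))
        totals[cls] = (h + hits.get(text, 0), w + words[text])
    rates = {cls: 100.0 * h / w for cls, (h, w) in totals.items() if w}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sub-class labels and figures, invented for illustration.
assignment = {
    "t1": "aristocracy (peer)",
    "t2": "aristocracy (gentry)",
    "t3": "professional",
    "t4": "professional",
}
merge_map = {"aristocracy (peer)": "aristocracy", "aristocracy (gentry)": "aristocracy"}
words = {"t1": 2000, "t2": 2000, "t3": 5000, "t4": 5000}
hits = {"t1": 3, "t2": 5, "t3": 6, "t4": 4}

merged = merge_classes(assignment, merge_map)
print(rank_by_rate(merged, words, hits))  # [('aristocracy', 0.2), ('professional', 0.1)]
```

Here the "professional" class has more hits in total (10 against 8), but the merged "aristocracy" class uses the word at twice the rate, which is the comparison the question is really asking for.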