XXQ - an informal introduction

XXQ is a new query language designed for querying text corpora that have been indexed using the Xaira indexer, or any other system using a similar object model. XXQ isn’t grep: because it uses a pre-built index, it can query a collection of texts much faster than grep could. Building an index also takes time, however, so XXQ is intended for cases where the same group of texts will be queried over and over again. It isn’t XPath: it has some of the features of XPath (and, where the two overlap, it uses XPath notation), but it can only access as much information about the DOM tree as is in the index. And finally, it isn’t XQuery: it is a pattern-matching language, intended for use within the text nodes of an XML document as much as with its XML structure.

As of release 1.20 (July 2006), support for XXQ in Xaira has yet to be implemented. This document should therefore be read as an introductory design document indicating future directions; however, most of the functionality described here is already present in the Xaira system, if expressed less elegantly.

XXQ queries are patterns that may or may not match locations in the corpus. There are many features that will recall regular expressions, such as minimum and maximum repetition counts. But where the ‘atoms’ of regexp searches are characters, the atoms of XXQ searches are words and tags.

You can expect to find the same sort of limitations when searching with XXQ as you would when searching with regular expressions. There are no variables, so you cannot write an XXQ query that finds all occurrences of ‘X and X’, where we do not know what X is but simply stipulate that the same word should occur on both sides of ‘and’. To carry out more advanced searches like this, you need to use a programming language that can access the Xaira API, such as C++, Java, or PHP.
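To make the ‘X and X’ case concrete, here is a minimal sketch of the kind of check that needs a variable and therefore a general-purpose language. It is written in Python purely for illustration and does not use the Xaira API: the function name find_repeated_around and the plain list of words are assumptions, standing in for whatever word sequence a real program would retrieve from the corpus.

```python
def find_repeated_around(tokens, pivot="and"):
    """Return (position, word) pairs where the same word occurs on both
    sides of `pivot`. `tokens` is a plain list of words (hypothetical
    stand-in for words fetched from an indexed corpus)."""
    hits = []
    # Skip the first and last token: each needs a neighbour on both sides.
    for i in range(1, len(tokens) - 1):
        left, right = tokens[i - 1].lower(), tokens[i + 1].lower()
        # The 'variable' X is simply whatever word sits to the left.
        if tokens[i].lower() == pivot and left == right:
            hits.append((i - 1, tokens[i - 1]))
    return hits


tokens = "they walked on and on and talked again and again".split()
print(find_repeated_around(tokens))  # [(2, 'on'), (7, 'again')]
```

The comparison between the left and right neighbours is exactly the step that a pattern without variables cannot express, since the pattern would have to name the word in advance.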

An XXQ query is represented by an XML element <xxq>. In the examples that follow we shall not supply this. Nor shall we take time to explain what makes a valid XML document - for example, you can use XML-style comments in queries if you want. Do remember that if you need to use the special characters & and < in character data you must use XML entity notation and write &amp; and &lt; otherwise you will almost certainly get an error from the XXQ query parser.
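If queries are assembled by a program rather than typed by hand, the escaping can be left to a library. The following is a small sketch in Python, assuming only that the text to be searched for ends up as character data inside the query; the helper name escape_for_query is hypothetical, and nothing here is part of XXQ or Xaira.

```python
from xml.sax.saxutils import escape


def escape_for_query(text):
    """Escape XML special characters in `text` so that it can be placed
    safely in the character data of a query (hypothetical helper)."""
    # escape() rewrites & as &amp;, < as &lt; and > as &gt;
    return escape(text)


print(escape_for_query("AT&T"))   # AT&amp;T
print(escape_for_query("a < b"))  # a &lt; b
```

Note that the standard library function also rewrites > as &gt;, which XML does not require in character data but which is harmless.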

Sections in this document: