Trecx:Toolkit

From Bodington Wiki


Obtaining the Toolkit

The toolkit can be downloaded from the TReCX SourceForge site, http://sourceforge.net/projects/trecx/.

Contents

The following files are available for download. (Note the war files are distributed in a zip file which also includes the version as part of the file name. The other zip files mentioned below also have version numbers attached.)

  • Tracking Store
    • trecx-store-quickstart.war - a 'quickstart' Tracking Store. This contains a preconfigured in-memory database (Hypersonic) and can be deployed 'as-is' into a servlet container. It will only accept events that are published locally, and will only answer reporting queries that originate from the local machine.
    • trecx-store-jndi.war - a JNDI-enabled Tracking Store. This configuration expects to obtain its underlying datasource via a JNDI lookup. A file (/META-INF/context.xml) containing template values needs to be edited to match your local configuration. This is described below in the section entitled 'Configuring the Database'. The web.xml file must also be modified with the IP addresses of valid publishing sources and reporting applications. This is also described below.
    • trecx-store-nojndi.war - a non-JNDI enabled Tracking Store. This configuration expects to obtain its underlying datasource without going via a JNDI lookup. A file (/WEB-INF/classes/jpox.properties) containing template values needs to be edited to match your local configuration. The web.xml must be modified as above.
  • Publishing Application
    • trecx-publish.zip - this is a library to assist developers with publishing from an arbitrary (Java) source. It also contains an interactive command-line client. This needs to be configured to match your local requirements, and is described below.
  • Reporting Application
    • trecx-reporting.zip - this is a library to assist developers with writing applications that report from one or more tracking stores. It also contains an interactive command-line client. This needs to be configured to match your local requirements, and is described below.

Resources and Help

The following may be useful if you plan to use the toolkit.

Service Descriptions

TReCX follows the conventions of a Yahoo! REST-ian style API.

Publishing

URL: http://[host name]/[context name]/.

  • host name : host where the tracking store has been deployed.
  • context name : context where the tracking store has been deployed.

Events are expected to be sent as XML that validates against the TReCX schema.xsd. The XML must form the entire body of an HTTP PUT request sent to a URL of the form above.
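As a rough illustration of the publishing request described above, the following Java sketch builds the store URL and sends an event list as the whole body of an HTTP PUT. The class and method names are hypothetical and not part of the toolkit, and the Content-Type header is an assumption, since this page does not say which content type the store expects.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of the TReCX toolkit. */
final class PutEventsSketch {

    /** Builds a publishing URL of the form http://[host name]/[context name]/. */
    static String storeUrl(String hostName, String contextName) {
        return "http://" + hostName + "/" + contextName + "/";
    }

    /** Sends the event list XML as the entire body of an HTTP PUT and returns the status code. */
    static int putEvents(String storeUrl, String eventListXml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(storeUrl).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Assumption: the store accepts this content type; the page does not specify one.
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        byte[] body = eventListXml.getBytes(StandardCharsets.UTF_8);
        OutputStream out = conn.getOutputStream();
        try {
            out.write(body);
        } finally {
            out.close();
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

For a store deployed locally, the target would be something like PutEventsSketch.storeUrl("localhost:8080", "trecx-store"). Java applications would normally use the publishing library instead, which also handles batching.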

Reporting

URL: http://[host name]/[context name]/query/[method name].

  • host name : host where the tracking store has been deployed.
  • context name : context where the tracking store has been deployed.
  • method name : name of the query method you wish to invoke.

Queries are expected to be formed as an HTTP GET request. The entire response body is formed of XML that conforms to the schema.xsd. The method names and their corresponding parameters are described in the tables below. The parameters should be specified as standard request parameters (i.e. of the form ...?param1=value1&param2=value2...).

method: queryDates

Obtain the events within a specified date range.

Parameter | Value | Description
begin | string (xsd:dateTime format) | The start of the date range (inclusive). This can be omitted.
end | string (xsd:dateTime format) | The end of the date range (exclusive). This can be omitted.

method: queryUser

Obtain the events associated with a user, optionally restricted to a date range.

Parameter | Value | Description
userID | string | The ID of the user of interest.
begin | string (xsd:dateTime format) | The start of the date range (inclusive). This can be omitted.
end | string (xsd:dateTime format) | The end of the date range (exclusive). This can be omitted.

method: queryApplication

Obtain the events associated with an application, optionally restricted to a date range.

Parameter | Value | Description
applicationID | string | The ID of the application of interest.
begin | string (xsd:dateTime format) | The start of the date range (inclusive). This can be omitted.
end | string (xsd:dateTime format) | The end of the date range (exclusive). This can be omitted.
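The query URLs described above can be assembled mechanically. The sketch below is one possible way to do so in Java; the class and method names are hypothetical, and it assumes that parameters with a null value are simply left out of the request, as permitted for begin and end.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the TReCX toolkit. */
final class QueryUrlSketch {

    /** Formats a local date-time in the lexical form of xsd:dateTime (without a time zone). */
    static String xsdDateTime(LocalDateTime value) {
        return value.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    /** Builds http://[host name]/[context name]/query/[method name]?param1=value1&param2=value2 */
    static String queryUrl(String storeUrl, String methodName, Map<String, String> params) {
        StringBuilder url = new StringBuilder(storeUrl);
        if (!storeUrl.endsWith("/")) {
            url.append('/');
        }
        url.append("query/").append(methodName);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            if (param.getValue() == null) {
                continue; // optional parameter that has been omitted
            }
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order, which makes the resulting URLs easier to read in logs. Note that the colons in a dateTime value are percent-encoded as %3A.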

Using the Toolkit

There are two ways in which the toolkit can be downloaded and used:

  • a 'quickstart' installation: easy to set-up, uses an in-memory database which is not suitable for production deployment. This is for demonstration purposes only.
  • a 'proper' installation: needs an external database to be installed and a certain amount of other configuration before it can be used. This set-up would be suitable for a production installation.

System Requirements

In addition to the contents of the toolkit, the following are also needed.

  • Java 1.5.
  • a servlet container such as Tomcat 5.5.
  • an RDBMS such as PostgreSQL. (Note that Hypersonic is included with the 'quickstart' version.)

Servlet container

We would recommend that Tomcat 5.5 is used as this is the platform upon which the code has been developed.

It should be possible to use other containers such as Jetty, but this has not been tested.

Configuring the System

The system has been designed with the intention that publishers, stores and reporters could in principle all reside on different hosts. At the time of writing there is no zero-configuration war file that can be deployed on a single host in order to demonstrate all of these components at once. Before deployment, it will be necessary to specify certain properties. Configuration options such as the following need to be specified:

  • the underlying JDBC database properties.
  • the host of the store a publisher is publishing to.
  • the publishing hosts a tracking store will accept.
  • the reporting hosts a tracking store will accept.
  • the tracking stores a reporter will interrogate.

For most modules, configuration is achieved via entries in a properties file made available on the CLASSPATH. For the store module, configuration is obtained by providing suitable initialization values within the entries for the appropriate servlet filters in the web.xml file.

Configuring the Tracking Store

The tracking store only accepts events or queries from known sources.

It is also possible to enable XML validation within the tracking store. This implies an overhead, so it is generally used only as a debugging aid.

If there are problems then it may also be a good idea to enable logging. TReCX uses log4j which can be configured to output useful information. The file log4j.properties controls logging and the verbosity of output. The Apache log4j website holds a wealth of information that may prove useful.

Message Validation

The web.xml file contains a property that can be set so that all incoming messages are validated against the XML schema.

<!-- Validate the XML [true | false] (default: false). -->
<context-param>
 <param-name>trecx.validate.xml</param-name>
 <param-value>false</param-value>
</context-param>

Specifying Valid Publishers

This needs to be done by specifying the IP address of publishing hosts as init-param elements within the declaration of the 'Trecx Publish Hosts Filter' filter:

<filter>
 <filter-name>Trecx Publish Hosts Filter</filter-name>
 <filter-class>uk.ac.ox.oucs.trecx.store.impl.HostsFilter</filter-class>

 <!-- 
 List of trecx publishing hosts. (These are hosts that are permitted
 to publish / post data to the tracking store). Each host name 
 parameter must be unique and of the form 'trecx.host.??', where '??' 
 is a zero-padded number in the range 1 - 99, e.g. 01, 02, 03 ... 99. 
 By default, we allow the localhost (IP: 127.0.0.1).
 -->

 <init-param>
  <param-name>trecx.host.01</param-name>
  <param-value>127.0.0.1</param-value>
 </init-param>
</filter>

Specifying Valid Reporting Applications

This needs to be done by specifying the IP address of reporting hosts as init-param elements within the declaration of the 'Trecx Reporting Hosts Filter' filter:

<filter>
 <filter-name>Trecx Reporting Hosts Filter</filter-name>
 <filter-class>uk.ac.ox.oucs.trecx.store.impl.HostsFilter</filter-class>

 <!-- List of trecx reporting hosts. (These are hosts that are permitted
 to query the tracking store). Each host name parameter must be 
 unique and of the form 'trecx.host.??', where '??' is a zero-padded 
 number in the range 1 - 99, e.g. 01, 02, 03 ... 99. By default, we 
 allow the localhost (IP: 127.0.0.1).
 -->

 <init-param>
  <param-name>trecx.host.01</param-name>
  <param-value>127.0.0.1</param-value>
 </init-param>
</filter>
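To make the naming rule for the host parameters concrete, here is a minimal Java sketch of how an allow-list could be derived from init-param entries of the form 'trecx.host.??'. It is not the code of uk.ac.ox.oucs.trecx.store.impl.HostsFilter, whose internals are not shown on this page; the class name, method names and the plain Map standing in for the filter configuration are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the toolkit's HostsFilter. */
final class HostAllowListSketch {

    // 'trecx.host.' followed by exactly two digits, e.g. trecx.host.01
    private static final Pattern HOST_PARAM = Pattern.compile("trecx\\.host\\.\\d{2}");

    /** Collects the IP addresses of all parameters named trecx.host.01 to trecx.host.99. */
    static Set<String> allowedHosts(Map<String, String> initParams) {
        Set<String> hosts = new HashSet<String>();
        for (Map.Entry<String, String> param : initParams.entrySet()) {
            String name = param.getKey();
            // 00 is outside the documented range 1 - 99, so it is ignored.
            if (HOST_PARAM.matcher(name).matches() && !name.endsWith(".00")) {
                hosts.add(param.getValue().trim());
            }
        }
        return hosts;
    }

    /** Returns true if the remote address matches one of the configured hosts. */
    static boolean isAllowed(Map<String, String> initParams, String remoteAddr) {
        return allowedHosts(initParams).contains(remoteAddr);
    }
}
```

In a real servlet filter the map would be filled from the filter's init parameters and remoteAddr would come from the incoming request.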

Setting up the Database

The project has primarily been developed against HSQLDB and Apache Derby. However, due to the use of JPOX, any database with a suitable JDBC driver should be expected to work.

Compatibility

The tracking store uses the JPOX implementation of the Java Data Objects interface. This implies that any database from the following list can be used. (This list is correct at the time of writing, however, the latest list can be found at: http://www.jpox.org/docs/1_1/rdbms.html.)

  • MySQL
  • Oracle
  • Postgres
  • Apache Derby
  • Hypersonic (HSQL DB)
  • Informix
  • Sybase
  • MS SQLServer
  • H2
  • McKoi
  • DB2
  • Firebird
  • SAPDB / MaxDB
  • Pointbase

Please check the above link for driver compatibility.

Configuration

Note that the 'quickstart' version does not require the database to be configured; these instructions only pertain to use of the system with an external database. The corresponding war file needs to be unpacked, the relevant files edited and then the corresponding unpacked application needs to be re-packed as a war file.

The main configuration issue concerns the underlying JDBC datasource. How you configure it depends on whether or not you want to make the datasource available through JNDI. One of the key benefits of using JNDI is that it gives you database connection pooling without any further effort.

Whether you choose to configure with JNDI or without, you will need to modify the following JDBC template values to suit your local configuration. These are located in different files depending on the type of configuration you choose to go for.

  • @JDO_DRIVER_USER@ : the JDBC user.
  • @JDO_DRIVER_PASSWORD@ : the password for the above user.
  • @JDO_DRIVER_NAME@ : the class name of the JDBC driver.
  • @JDO_DRIVER_URL@ : the JDBC URL of the database connection.

No-JNDI

You will need to expand the corresponding war file (trecx-store-nojndi.war) and edit the file located at /WEB-INF/classes/jpox.properties. The example here shows a configuration for using an in-memory HSQLDB database. After replacing the token values, one could end up with something like the below:

#
# JDO properties:
#
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.hsqldb.jdbcDriver
javax.jdo.option.ConnectionURL=jdbc:hsqldb:mem:trecx-store
javax.jdo.option.ConnectionUserName=trecx
javax.jdo.option.ConnectionPassword=iamkuriosoranj

#
# JPOX properties:
#
org.jpox.autoCreateSchema=true
org.jpox.validateTables=false
org.jpox.validateConstraints=false

The other thing you need is the jar file containing the JDBC driver for your database. You need to add this to /WEB-INF/lib/ of the unpacked web-application. You then need to create a new war file including the modified properties file and JDBC driver jar.
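Before re-packing the war file it can be worth checking that no template token of the form @JDO_DRIVER_...@ has been left in the edited properties file. The following sketch is one way of doing this; the class and method names are hypothetical, and it assumes that every template token consists of upper-case letters and underscores between two @ characters, as in the four tokens listed above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the TReCX toolkit. */
final class TemplateTokenCheckSketch {

    // Assumed token shape, e.g. @JDO_DRIVER_USER@
    private static final Pattern TOKEN = Pattern.compile("@[A-Z_]+@");

    /** Returns the (sorted) property keys whose values still contain a template token. */
    static List<String> keysWithUnreplacedTokens(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Reading from a string should not fail", e);
        }
        List<String> keys = new ArrayList<String>();
        for (String key : props.stringPropertyNames()) {
            if (TOKEN.matcher(props.getProperty(key)).find()) {
                keys.add(key);
            }
        }
        Collections.sort(keys);
        return keys;
    }
}
```

An empty result suggests that all four JDBC values have been filled in; it does not, of course, prove that the values are correct.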

JNDI

You will need to expand the corresponding war file (trecx-store-jndi.war) and edit the file located at /META-INF/context.xml. The example here shows a configuration for connecting to an HSQLDB database running in server mode. After replacing the token values, one could end up with something like the below (edited to show the salient text):

<Context>
   <Resource name="jdbc/TrecxStore" auth="Container" 
       type="javax.sql.DataSource" 
       username="sa" password="" 
       driverClassName="org.hsqldb.jdbcDriver" 
       url="jdbc:hsqldb:hsql://localhost/trecx-store" 
       maxActive="8" maxIdle="4" />
</Context>    

You then need to create a new war file including the modified context.xml file. You also need to make your JDBC driver available to your servlet container (and thereby to the application). With Apache Tomcat you do this by placing the corresponding jar in $CATALINA_HOME/common/lib/. If you are using another kind of servlet container you will need to consult its documentation on how to do this.
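For readers unfamiliar with JNDI, the sketch below shows how a datasource declared as in the context.xml example is conventionally looked up from within a web application. It is not taken from the TReCX source, where the lookup is presumably delegated to JPOX; the class and method names are hypothetical, and the "java:comp/env/" prefix is the usual convention for container-managed resources rather than something stated on this page.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

/** Hypothetical illustration only; not part of the TReCX toolkit. */
final class JndiLookupSketch {

    /** Prefixes the resource name with the conventional web-application environment context. */
    static String jndiName(String resourceName) {
        return "java:comp/env/" + resourceName;
    }

    /**
     * Looks up a container-managed datasource, e.g. lookup("jdbc/TrecxStore").
     * This only succeeds when run inside a container that defines the resource.
     */
    static DataSource lookup(String resourceName) throws NamingException {
        return (DataSource) new InitialContext().lookup(jndiName(resourceName));
    }
}
```

Outside a servlet container the lookup fails with a NamingException, which can be a quick way of spotting a resource name that does not match the one in context.xml.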

Configuring the Publishing Library

You should retrieve and unzip the zip file to a suitable directory and follow the instructions below. You need to make all the contained jar files available to your application CLASSPATH.

The publish library makes use of a properties file, publish.properties, that also needs to be located on the CLASSPATH. The only required property is trecx.store.url, which specifies the endpoint for your events. A sample properties file can be found within the /etc directory of the zip distribution.

Here is an example,

# Endpoint to publish events to (required):
trecx.store.url=http://localhost:8080/trecx-store/
#
# Number of events to send in each batch (default: 250):
trecx.batch.size=250
#
# Max. time between the sending of events {ms} (default: 30000):
trecx.batch.interval=30000
#
# Specify whether or not to validate the XML that is sent (default: false) possible values [true|false]:
trecx.validate.xml=false

Unless marked as required the setting of any property is optional.
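The following Java sketch illustrates how the properties above and their documented defaults fit together. It does not show how the publish library itself reads the file; the class name and field names are hypothetical, and only the property names and default values are taken from the example.

```java
import java.util.Properties;

/** Hypothetical illustration only; not the publish library's own configuration class. */
final class PublishConfigSketch {

    final String storeUrl;        // trecx.store.url (required)
    final int batchSize;          // trecx.batch.size (default: 250)
    final long batchIntervalMs;   // trecx.batch.interval in ms (default: 30000)
    final boolean validateXml;    // trecx.validate.xml (default: false)

    PublishConfigSketch(Properties props) {
        String url = props.getProperty("trecx.store.url");
        if (url == null || url.trim().length() == 0) {
            throw new IllegalArgumentException("trecx.store.url is required");
        }
        storeUrl = url.trim();
        batchSize = Integer.parseInt(props.getProperty("trecx.batch.size", "250").trim());
        batchIntervalMs = Long.parseLong(props.getProperty("trecx.batch.interval", "30000").trim());
        validateXml = Boolean.parseBoolean(props.getProperty("trecx.validate.xml", "false").trim());
    }
}
```

A Properties object loaded from publish.properties on the CLASSPATH could be passed straight to the constructor; a missing endpoint is rejected immediately rather than at the first publish attempt.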

Configuring the Reporting Library

You should retrieve and unzip the zip file to a suitable directory and follow the instructions below.

The reporting library is configured via a file called reporting.properties, which needs to be present on the CLASSPATH. The file simply contains a list of the URLs of all the tracking stores you wish to report from. You need to add at least one entry corresponding to a tracking store. A sample properties file can be found within the /etc directory of the zip distribution. You should make a copy of this file and rename it accordingly.

Here is an example,

# Endpoints to retrieve events from (at least one is required):
trecx.store.01=http://localhost:8080/trecx-store/
trecx.store.02=http://weblearn.ox.ac.uk/trecx-store/
trecx.store.03=http://aspire.ox.ac.uk/LUSID/trecx-store/
# and so on!
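As an illustration of the file layout above, this sketch gathers every trecx.store.NN entry into an ordered list and insists on at least one entry. It is not the reporting library's own loader; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical illustration only; not the reporting library's own loader. */
final class ReportingStoresSketch {

    /** Returns the store URLs ordered by property name (trecx.store.01, trecx.store.02, ...). */
    static List<String> storeUrls(Properties props) {
        List<String> urls = new ArrayList<String>();
        // A TreeSet sorts the keys so that the zero-padded numbers come out in order.
        for (String key : new TreeSet<String>(props.stringPropertyNames())) {
            if (key.startsWith("trecx.store.")) {
                urls.add(props.getProperty(key).trim());
            }
        }
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("At least one trecx.store.NN entry is required");
        }
        return urls;
    }
}
```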

Installation / Deployment

Once the files have been downloaded and (if appropriate) modified as described above, they can be deployed.

Tracking Store

Assuming the database has been set up (or a decision made to use HSQLDB), the web.xml updated and a new WAR file created, it is simply a case of deploying the WAR file via a tool such as the Tomcat Manager. The database schema gets created on-the-fly by the underlying libraries that are used (JPOX).

Publishing Library

The publish library is primarily intended to be used in conjunction with an existing application. However in order to breed familiarity with the toolkit and for general testing purposes there is an interactive command-line client included. After setting up the CLASSPATH appropriately (which needs to include all the jars that were in the zip file) you can execute this client by typing the below:

java uk.ac.ox.oucs.trecx.publish.impl.TestClient

Reporting Library

The reporting library is primarily intended to be used as the basis for a reporting application. However in order to breed familiarity with the toolkit and for general testing purposes there is an interactive command-line client included. After setting up the CLASSPATH appropriately (which needs to include all the jars that were in the zip file) you can execute this client by typing the below:

java uk.ac.ox.oucs.trecx.reporting.impl.TestClient

Tracking

Here we describe how to add tracking to existing tools.

Allowing an Existing Tool to use the Tracking Store

In order to add tracking to an application one would augment the source code to intercept the events and then send them to a tracking store. If the application is Java-based the publisher package can be used to help with the event publishing process.

From the Publisher class, a singleton instance can be obtained whose publish() method can be used to publish events to the tracking store. The buffering and dispatching of the events is performed behind the scenes according to the configuration parameters given in the corresponding publish properties file.

There is no reason why a PHP or Python application could not send events to the TReCX tracking store (which of course is part of the whole rationale behind web services!). However libraries to assist in such applications publishing events are yet to be written.

Allowing an Existing Tracking Store to be Queried

If an e-learning application already stores its own tracking data then this data can be supplied to the reporting application so long as the e-learning application 'implements' the RESTian query interface described above. In other words, the e-learning application must be able to supply an XML document which conforms to the TReCX Event schema (https://svn.sourceforge.net/svnroot/trecx/trunk/core/schema/schema.xsd).

Events

A single action by a user does not necessarily translate to one 'Event'. Indeed, in most of the situations that we looked at, a single action gave rise to several events.

Examples

For example, consider a user who posts a message to a forum in response to an already existing thread. We need to relate the actual message to its containing thread and to the actual instance of the forum. (A tutor may set up a forum for his Noodle Making 101 course, over a term, this area may contain numerous threads all about NM101.) The above user event translates to 3 distinct system events:

  1. create forum post
  2. update forum thread
  3. update forum instance

It is up to the tool that publishes the events to construct these 3 events. The reporting application will still function if only the 'most significant' event is published (i.e. create forum post), but it will not be able to relate the event back to the containing 'object'.

Consider a further example from a course booking system. A user books on a course:

  1. create course registration
  2. update course instance

A user un-enrols from a course. This implies two events:

  1. delete course registration
  2. update course instance
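The mapping from one user action to several system events in the examples above can be written down as a small Java sketch. The SystemEvent class here is a simplified stand-in invented for illustration; it is not the JAXB-generated Event class from the toolkit, and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; SystemEvent is not the toolkit's Event class. */
final class UserActionEventsSketch {

    static final class SystemEvent {
        final String operation; // CREATE, READ, UPDATE or DELETE
        final String target;    // the object the operation applies to

        SystemEvent(String operation, String target) {
            this.operation = operation;
            this.target = target;
        }

        @Override
        public String toString() {
            return operation + " " + target;
        }
    }

    /** A reply to an existing thread touches the post, its thread and the forum instance. */
    static List<SystemEvent> replyToForumThread() {
        return Arrays.asList(
            new SystemEvent("CREATE", "forum post"),
            new SystemEvent("UPDATE", "forum thread"),
            new SystemEvent("UPDATE", "forum instance"));
    }

    /** Booking on a course creates a registration and updates the course instance. */
    static List<SystemEvent> bookOnCourse() {
        return Arrays.asList(
            new SystemEvent("CREATE", "course registration"),
            new SystemEvent("UPDATE", "course instance"));
    }

    /** Un-enrolling deletes the registration and updates the course instance. */
    static List<SystemEvent> unEnrolFromCourse() {
        return Arrays.asList(
            new SystemEvent("DELETE", "course registration"),
            new SystemEvent("UPDATE", "course instance"));
    }
}
```

Listing the most significant event first keeps the ordering consistent with the numbered lists above.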

Possible Events

We looked at the following e-learning applications and noted what possible events could be generated. When adding tracking to existing tools it pays to think quite hard about each action in the system; what events should be generated for each distinct user action?

Wiki

Events:

  • user X creates wiki area Z in wiki at <date+time>
  • user X registered with wiki at <date+time>
  • user X logged in to wiki at <date+time>
  • user X created page Y in wiki area Z at <date+time>
  • user X modified page Y in wiki area Z at <date+time>
  • user X accessed (read) page Y in wiki area Z at <date+time>
  • user X logs out of wiki at <date+time>

Course Booking system

Events:

  • user X registered at <date+time>
  • user X logged in at <date+time>
  • user X accesses (reads) page about course Y at <date+time>
  • user X registers for course Y <date+time>
  • user X un-registers for course Y <date+time>
  • user X adds new course Y at <date+time>
  • user X modifies course Y at <date+time>
  • user X deletes course Y at <date+time>
  • user X makes a comment (ie, reviews) course Y at <date+time>
  • user X logs out at <date+time>

Forum

Events:

  • user X registered at <date+time>
  • user X logged in at <date+time>
  • user X creates new forum instance Y at <date+time>
  • user X creates new forum thread T
  • user X posted message Z to thread T on forum Y at <date+time>
  • user X read message Z posted to thread T in forum Y at <date+time>
  • user X modifies message Z posted in forum Y at <date+time>
  • user X locks forum at <date+time>
  • user X unlocks forum at <date+time>
  • user X deletes forum Y at <date+time>
  • user X logs out at <date+time>

Chat

Events:

  • user X registered at <date+time>
  • user X logged in at <date+time>
  • user X creates chat room Y at <date+time>
  • user X entered chat room Y at <date+time>
  • user X posted a message in chat room Y at <date+time>
  • user X leaves chat room Y at <date+time>
  • user X logs out at <date+time>

Assessment

Events:

  • user X registered with assessment system at <date+time>
  • user X logged in at <date+time>
  • user X visited assessment Y at <date+time>
  • user X submitted answers for assessment Y at <date+time>

PDP

Events:

  • user X registered with system at <date+time>
  • user X logged in at <date+time>
  • user X visited page Y at <date+time>
  • user X created a new PDR at <date+time>
  • user X shared (granted access to) PDR P with user Y
  • user X logs out at <date+time>

Repository

Events:

  • user X registered with system at <date+time>
  • user X uploaded document Y at <date+time>
  • user X modified document Y at <date+time>
  • user X reviewed document Y at <date+time>

Annotation (Digital Repositories)

Events:

  • user X commented on resource Y at <date+time>

Reading list

Events:

  • user X registered with system at <date+time>
  • user X accessed reading list Y at <date+time>
  • reading list Y created at <date+time>
  • reading list Y modified at <date+time>

VLE

Events:

  • user X registered with system at <date+time>
  • user X logs in at <date+time>
  • user X creates resource Y at <date+time>
  • user X reads resource Y at <date+time>
  • user X updates resource Y at <date+time>
  • user X deletes resource Y at <date+time>
  • system reveals resource Y at <date+time>
  • system hides resource Y at <date+time>
  • user X logs out at <date+time>

Example Events

(The latest copy of the event schema can be found at https://svn.sourceforge.net/svnroot/trecx/trunk/core/schema/schema.xsd.)

Here is a visual representation of the abovementioned schema (correct on 10 Sep 2006).

Commentary

The types are described here as per XML Schema.

  • timestamp (xs:dateTime) the timestamp of event.
  • systemID (xs:string) identifier of system that produced event: In the short term this is likely to be a domain name, eg, weblearn.ox.ac.uk or aspire.ox.ac.uk.
  • userID - partitioned into
    • "content" is the value of the ID itself.
    • type (xs:string) the restricted enumeration of values are:
      • TRANSIENT (for anonymous (not logged in) users)
      • SYSTEM for events generated by the system (note there will not be a username in this case)
      • INSTITUTION_ID
      • UNIQUE_LEARNER_ID
      • USER_NAME
  • service (xs:string) an extensible list. However, to kick off it's likely that Wiki, Chat, Blog, Forum, etc will be initial "core" values.
  • domain (xs:string) the restricted enumeration of values are:
    • USER - an event initiated by a user
    • RESOURCE - an event pertaining to a resource, for example, a timed-release resource becomes available, or a resource has been updated
    • AUTHENTICATION - successful authentication
  • message a pertinent message which can be displayed to a human to give an idea of what the event is about. It may be the first line of a forum post or chat message, the title of a wiki page that was created or edited, and so on. The message should be capable of carrying Unicode (UTF-8) characters. Note that the message tag should carry the 'xml:lang' attribute to indicate which language is being used; this is stored in the database.
    • "content" is the message itself.
    • language (xml:lang) is an attribute which describes the language used for the message.
  • applicationID (xs:string) identifier of resource within system that produced event: a given instance of a tool within the above. Again, this will probably be a URL (or URI). URLs only work for some applications. Lots of applications don't have good URLs (JSF).
  • operation (xs:string) taking our cue from the database world, our operation types would be from the restricted enumeration of:
    • CREATE
    • READ
    • UPDATE
    • DELETE
  • servicePart (xs:string) an (optional) sub-part of the service element. Examples would include a thread or a post within a forum, post within a blog, etc.

A TRANSIENT userID is used when a temporary userId has been assigned by the e-learning system, for example, when a user has a 'session' but has not yet logged in (cf. Plone, Bodington). A non-transient userId is some sort of 'permanent' username. We do not intend to connect transient userIds to fixed userIds. Imagine the following scenario: Ming The Merciless accesses several 'open' resources in Plone as an anonymous user and then logs in (using username 'minger') and accesses several more. If Ming The Merciless's tutor does a search on the username 'minger' from within the reporting application they will only see data pertaining to resources accessed after login, all the resources visited when Ming was not logged in will be absent.

In the schema, eventList has been declared as nillable. This means that if there are no events (intended to be used as the result of a query when there are no matching events to return) the eventList can just be declared as null (or xsi:nil="true" to be precise!). A non-null eventList should thus always contain at least one event. (NOTE: publishing clients are not intended to send null eventLists; if there are no events to send, do not send anything at all!).
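A reporting client therefore has to cope with a nil eventList as well as a populated one. The following sketch shows one way of doing this with the DOM parser that ships with Java; the class and method names are hypothetical, and it assumes that the response uses the un-namespaced eventList and event element names seen in the example messages on this page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical helper for illustration only; not part of the TReCX toolkit. */
final class EventListResponseSketch {

    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns 0 for an eventList marked xsi:nil="true", otherwise the number of event elements. */
    static int countEvents(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that xsi:nil can be read by namespace
            Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            String nil = root.getAttributeNS(XSI, "nil");
            // xsd:boolean allows both "true" and "1".
            if ("true".equals(nil) || "1".equals(nil)) {
                return 0;
            }
            return root.getElementsByTagName("event").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable eventList document", e);
        }
    }
}
```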

Example messages

<?xml version="1.0" encoding="UTF-8"?>
<eventList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="../schema/schema.xsd">
   <event>
       <applicationID>/biog/year04/forum/</applicationID>
       <domain>AUTHENTICATION</domain>
       <operation>CREATE</operation>
       <service>Forum</service>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:36</timestamp>
       <userID type="INSTITUTION_ID">lxsocon</userID>
   </event>
   <event>
       <applicationID>/medsci/year01/wiki/</applicationID>
       <domain>RESOURCE</domain>
       <message xml:lang="en">Corrected spelling mistakes.</message>
       <operation>UPDATE</operation>
       <service>Wiki</service>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:37</timestamp>
       <userID type="INSTITUTION_ID">jneeskens</userID>
   </event>
   <event>
       <applicationID>/library/staff/internal/</applicationID>
       <domain>RESOURCE</domain>
       <message xml:lang="en">Removed inflammatory feedback.</message>
       <operation>DELETE</operation>
       <service>Blog</service>
       <servicePart>entry20060609d</servicePart>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:37</timestamp>
       <userID type="INSTITUTION_ID">bookworm</userID>
   </event>
</eventList>


<?xml version="1.0" encoding="UTF-8"?>
<eventList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="../schema/schema.xsd">
   <event>
       <applicationID>/biog/year04/forum_y/</applicationID>
       <domain>RESOURCE</domain>
       <message xml:lang="en">I disagree in the strongest terms.</message>
       <operation>CREATE</operation>
       <service>Forum</service>
       <servicePart>thread3.post26</servicePart>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:36</timestamp>
       <userID type="INSTITUTION_ID">bmoore</userID>
   </event>
   <event>
       <applicationID>/biog/year04/forum_y/</applicationID>
       <domain>RESOURCE</domain>
       <operation>UPDATE</operation>
       <service>Forum</service>
       <servicePart>thread3</servicePart>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:36</timestamp>
       <userID type="INSTITUTION_ID">bmoore</userID>
   </event>
   <event>
       <applicationID>/biog/year04/forum/</applicationID>
       <domain>RESOURCE</domain>
       <operation>UPDATE</operation>
       <service>Forum</service>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:29:36</timestamp>
       <userID type="INSTITUTION_ID">bmoore</userID>
   </event>
   <event>
       <applicationID>/medsci/year01/wiki/</applicationID>
       <domain>RESOURCE</domain>
       <message xml:lang="en">Corrected spelling mistakes.</message>
       <operation>UPDATE</operation>
       <service>Wiki</service>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:35:12</timestamp>
       <userID type="INSTITUTION_ID">jneeskens</userID>
   </event>
   <event>
       <applicationID>/library/staff/internal/</applicationID>
       <domain>RESOURCE</domain>
       <message xml:lang="en">Removed inflammatory feedback.</message>
       <operation>DELETE</operation>
       <service>Blog</service>
       <systemID>http://vle.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:41:21</timestamp>
       <userID type="INSTITUTION_ID">bookworm_king</userID>
   </event>
   <event>
       <applicationID>/staff/register/</applicationID>
       <domain>USER</domain>
       <message xml:lang="en">New member of staff: Diego Maradona.</message>
       <operation>CREATE</operation>
       <service>Registration</service>
       <systemID>http://library.ox.ac.uk/</systemID>
       <timestamp>2006-06-09T14:46:09</timestamp>
       <userID type="INSTITUTION_ID">bookworm_king</userID>
   </event>
</eventList>

Building Events in a Java Application

Events can be created by using the JAXB classes that were generated from the schema. Within the toolkit modules, these classes are found within the trecx-jaxb.jar.

Consider the following example taken from our test harness (note the random() method simply returns a random object of the appropriate class and the rdm.nextBoolean() is occasionally false which will ensure optional elements are sometimes absent):

Event event = new Event();
	
event.setApplicationID( random( applicationIDs ) );
event.setDomain( random( DomainType.class ));
	
if ( rdm.nextBoolean() ) { // 'message' is optional:
 Message message = new Message();
 message.setLang( "en" ); // or whatever
 message.setValue( random( messageValues ) );
 event.setMessage( message );
}
	
event.setOperation( random( OperationType.class ) );
event.setService( random( services ) );
	
if ( rdm.nextBoolean() ) { // 'servicePart' is optional:
 event.setServicePart( random( serviceParts ) );
}
	
event.setSystemID( random( systemIDs ) );
event.setTimestamp( timestamp );
	
UserID userID = new UserID();
userID.setType( random( UserIdType.class ) );
userID.setValue( random( userIdValues ) );
event.setUserID( userID );

Publishing Events

Here is an example of using the Publisher class (note that randomEvent() generates a random event using the code given above). This code has been adapted from our test harness; it sends 10000 random events to the publisher.

The publisher will dispatch the messages in accordance with its configuration parameters (see above).

Publisher publisher = Publisher.getPublisher();
for (int i = 0; i < 10000; i++) {
 publisher.publish( randomEvent());
}
publisher.shutdown();

Developing the Toolkit

Every class in the toolkit has extensive Javadoc which will be essential when developing the code further.

Getting Started

The project has been evolved from the start to use Ant as the definitive build process. All the required build procedures, such as JAXB class generation, JDO class enhancement, etc, have corresponding Ant tasks. One of the primary advantages of this is that it avoids dictating to developers which (if any) IDE they should use. However, clearly many developers like to work with an IDE as it gives them code completion, javadoc hints, etc. These days many IDEs (such as NetBeans, Eclipse, IntelliJ, etc) have excellent Ant integration. We include some instructions on how to use Eclipse in particular.

If you intend to develop the software then you will also need to ensure that the following are available:

(Note that the software has been developed using Tomcat; however, there is no reason why another container cannot be used. The only issue here is how to specify the details of the data source; different servlet containers have different ways of doing this. This is described elsewhere.)

Developing with Eclipse

We would recommend using the Eclipse IDE. You will need at least version 3.x; it is generally a good idea to use the latest release!

Installing a Subversion Client

The source code for TReCX is stored in a Subversion repository, so you will need some kind of Subversion client. You could always use the command-line client, but using an Eclipse Subversion plug-in makes things easier in many ways. There are other Subversion clients available for Eclipse (e.g. Subclipse), but we would recommend Subversive. These are the steps you need to follow:

  1. Click on 'Window > Preferences > Install/Update > Automatic Updates'
  2. Select the option to 'Automatically find new updates .....'
  3. Click on 'Apply'
  4. Click on 'OK'

Then

  1. Click on 'Help > Software Updates > Find and Install ...'
  2. Select 'Search for New Features to Install'
  3. Click on 'Next'
  4. Click on 'New Remote Site'
  5. In the pop up box supply a suitable name and paste in the URL: http://www.polarion.org/projects/subversive/download/update-site/
  6. Click 'OK'
  7. Click 'Finish'

Setting up the Repository Location

For read-only access you can connect to the repository anonymously. For developer (i.e. write) access you will need to have registered with SourceForge to obtain a username and password, and to have been added to the list of developers for the TReCX project.

You now need to set up a new repository location within Eclipse.

  1. Click on 'New > Repository Location'
  2. In the pop-up box, supply the following URL: https://svn.sourceforge.net/svnroot/trecx/trunk/
  3. Supply your Username and Password then click the 'Remember Password' box
  4. Click on 'Finish'

Downloading the Source Code

This assumes that you have successfully completed the above steps.

  1. Click on 'File > New Project'
  2. In the pop-up window, open up 'SVN' and select 'Projects from SVN'
  3. Click on 'Next'
  4. Select the TReCX repository location and click 'Next'
  5. Click on 'ROOT' then click 'Finish'
  6. Select 'Checkout project using New Project Wizard'. Make sure the 'Checkout Subdirectories' is selected.
  7. Click on 'Finish'
  8. In the pop-up 'New Project' window, select 'Java Project' and click 'Next'
  9. Now supply a meaningful project name. May we suggest you use 'TRECX'?
  10. Ensure that the JDK Compliance is set to '5.0' (which is JDK 1.5)
  11. Ensure that 'Use project folder as root ....' is selected.
  12. Click 'Finish'
  13. In the resultant pop-up, say that you do want to open the Java Perspective and click OK
  14. You should now see the project is being created.

Your Eclipse workspace should now contain all the TReCX source code. You will see a number of red 'X's alongside many of the packages.

Keen observers will note that the source tree is organised in the same way as many Apache projects, so the layout should feel familiar.

Building Using Ant

Currently there is no single target that builds everything. However, for each module, dist is the default target, and executing it will create the distributables associated with that module. By following what the build scripts do you can integrate the build process more closely into your IDE of choice, if you so wish.

Configuring Eclipse for the TReCX Project

  1. Click on 'Project > Java Build Path'
  2. Select the 'Source' tab
  3. For each of the packages, open up the tree by clicking on the little plus sign and select the src and test folders.
  4. Accept the option making the bin folder the default for output

Then

  1. Click on 'Project > Java Build Path'
  2. Select the 'Libraries' tab
  3. Click on 'Add JARS' then click on the TReCX project in the pop up window
  4. Open up ALL branches, then select all jar files on the core/lib/jaxb branch, the junit file, and all jar files in store/lib/jpox, in publish/lib/ and in reporting/lib/httpClient.
  5. Click 'OK'

Now you need to build the TReCX libraries.

  1. Open up the 'core' folder and right click on build.xml
  2. Select 'Run As ...'
  3. Select 'Ant Build ...'
  4. Select the all Ant target and click on 'Run'

This should build the TReCX jars and place them in the core/lib folder. The console should carry the immortal words BUILD SUCCESSFUL.

The TReCX jars should now be located in core/build/jars. To see them you will probably need to

  1. Click on 'File > Refresh'

Now add these libraries to the build path

  1. Click on 'Project > Properties'
  2. Select the 'Libraries' tab.
  3. Click on 'Add Jars' then navigate to core/build/jars and select all 3 jars
  4. Click 'OK'

Bingo. There should be no more compilation problems.

Database

The toolkit uses the JPOX / JDO database abstraction layer, which means that it is database agnostic. We would recommend using Hypersonic (aka HSQLDB) for development and something like Postgres or MySQL for a deployed service.

Required Software

The toolkit download and the SVN repository both hold all the software that you need to run the toolkit.

For the record, here is a list of the third party software used by the toolkit. As an aside, if you look in the svn repository you will see that each jar utilizes svn properties to maintain the library name, version and URL where it can (manually!) be found.

  • HTTPClient (v3.1-alpha)
  • JAXB RI (v2.0.2)
  • JDO (v2.0) - JDO API
  • JPOX (v1.1.2) - reference implementation of Java Data Objects (JDO).
  • BCEL (v5.2) - required by JPOX at development time.
  • LOG4J (v1.2.13) - logging library.
  • HSQLDB (v1.8.0.5) - pure Java database.

Deploying and Testing

Before deploying any of the systems, it is necessary to configure each component of the system as described above (see 'Using the Toolkit').

Each module has its own build.xml file, and closer inspection will reveal that there are dependencies between them (in particular, other modules will call targets within core). These build files drive the build process and can be used to execute and deploy the generated software artefacts.

Tracking Store

It will be necessary to set up a build.properties file before attempting to deploy the tracking store. A template called sample.build.properties (which you should copy and rename) can be found in the top-level directory of the store module. This includes parameters specifying how to connect as a manager to your Apache Tomcat installation. If you do not already have a user in the manager role, you will need to add a user with the role manager to the file $CATALINA_HOME/conf/tomcat-users.xml. The property values specifying how you connect to your underlying JDO JDBC datasource have already been described elsewhere. (You don't have to change these anyway; sensible defaults are provided.)

# Location of the jar containing the Ant tasks for Apache Tomcat:
# NOTES: The Tomcat targets operate via the web interface, rather than acting 
# directly on the file system. Update == true in the deploy task of Tomcat v5.5 
# doesn't seem to work properly (it fails to do an undeploy first), therefore
# one suggestion is to use the one in v5.0 instead.
tomcat.ant.jar=/usr/java/jakarta/tomcat-5.0.28/server/lib/catalina-ant.jar

# Tomcat (manager) properties:
tomcat.deploy.url=http://localhost/manager
tomcat.deploy.username=Fred
tomcat.deploy.password=iAmFred
tomcat.deploy.path=/trecx-store

# Override where the JAXB jars are taken from:
#jaxb.lib=/usr/java/jaxb/lib 

# Override location of your JUnit jar :
#junit.jar=/usr/java/lib/junit-4.1/junit-4.1.jar 

#
# JDO connection properties:
# (NOTE: If you are creating the 'quickstart' war you can ignore the following
# section as the values are overridden by ones contained internally.)
#
# Set the following to true if configuring the JDO datasource via JNDI:
jdo.using.jndi=false
jdo.driver.name=org.hsqldb.jdbcDriver
jdo.driver.url=jdbc:hsqldb:mem:trecx-store
jdo.driver.user=sa
jdo.driver.password=
# NOTE - the following is not required if configuring via JNDI:
jdo.driver.jar=lib/hsqldb.jar
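
Before running the deploy target it can be handy to check that the properties file defines the keys shown above. The following is a small sketch using java.util.Properties; the class name BuildPropertiesCheck is hypothetical, and treating these particular keys as required is an assumption made for illustration (the build script itself is the authority on what it needs).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class BuildPropertiesCheck {
    // Keys taken from the sample file above; treating them as required is an assumption.
    static final String[] EXPECTED_KEYS = {
        "tomcat.ant.jar", "tomcat.deploy.url", "tomcat.deploy.username",
        "tomcat.deploy.password", "tomcat.deploy.path",
        "jdo.using.jndi", "jdo.driver.name", "jdo.driver.url"
    };

    // Returns the expected keys that are absent from the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        List<String> missing = new ArrayList<String>();
        for (String key : EXPECTED_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String text = "tomcat.deploy.url=http://localhost/manager\n"
                    + "tomcat.deploy.path=/trecx-store\n"
                    + "jdo.using.jndi=false\n";
        System.out.println("Missing keys: " + missingKeys(text));
    }
}
```

In practice the text would be read from your own build.properties file rather than from an inline string.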

Once the build.properties file has been set up correctly, the tracking store can be deployed by invoking the deploy target:

ant deploy

If you are using another kind of servlet container you can always type the following instead and take the resulting war (which is created at ${build}/wars/trecx-store.war) and deploy it manually:

ant trecx-store-war

Publish Module

Once the publish.properties has been correctly set up, various associated applications can be run. The following will invoke the command-line client whereby you can, for example, specify sample event XML files to send to a store (such as that found in ../core/samples/).

ant run-test-client

The following will invoke the automatic test harness, which can create thousands of randomly generated events. This is primarily intended for load-testing the publish (and store!) modules:

ant run-test-harness

Reporting Application

Once the reporting.properties has been correctly set up, the reporting application can be run. The following will invoke the interactive command-line client whereby you can try out various queries against one or more configured stores:

ant run-test-client

Architecture

We present various UML diagrams which help to show the architecture of the toolkit.

Package Diagrams

The following diagram is a schematic representation of the TReCX Toolkit.

Image:Trecx_diagram_lxs2.png

The next diagram shows the implemented architecture using the notation of a UML2 component diagram. The interfaces are shown using ball-and-socket notation: the ball end marks a provided interface and the socket end marks a required interface. The applications are shown as "components" in the sense that they can (in principle) be swapped in and out. The diagram is intended to highlight the importance of the interfaces and the fact that a reporting application can only query applications which contain tracking stores.

Image:TReCX-Comp-UML2.jpg
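
To make the provided/required distinction concrete, here is a loose Java sketch of the idea. It is illustrative only: EventReceiver, QueryService, InMemoryStore and Reporter are hypothetical names invented for this example and do not correspond to classes in the toolkit.

```java
import java.util.ArrayList;
import java.util.List;

class ComponentSketch {
    // Hypothetical interface a tracking store provides for publishers.
    interface EventReceiver { void receive(String event); }

    // Hypothetical interface a tracking store provides for reporting applications.
    interface QueryService { int countEvents(); }

    // A store component provides both interfaces (the "ball" ends).
    static class InMemoryStore implements EventReceiver, QueryService {
        private final List<String> events = new ArrayList<String>();
        public void receive(String event) { events.add(event); }
        public int countEvents() { return events.size(); }
    }

    // A reporting component requires a QueryService (the "socket" end);
    // any component providing that interface can be plugged in.
    static class Reporter {
        private final QueryService store;
        Reporter(QueryService store) { this.store = store; }
        String summary() { return "events stored: " + store.countEvents(); }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.receive("login");
        store.receive("logout");
        System.out.println(new Reporter(store).summary());
    }
}
```

Because Reporter depends only on the interface, it can be pointed at any component that contains a store, which is the property the diagram is meant to highlight.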

Class Diagrams

Tracking Store: uk.ac.ox.oucs.trecx.store

Tracking Store package: uk.ac.ox.oucs.trecx.store.impl

Event Classes: uk.ac.ox.oucs.trecx.jaxb.j2s

Publisher Package: uk.ac.ox.oucs.trecx.publish

DateConverter class: uk.ac.ox.oucs.trecx.core.jaxb

