

Paper Proposals

68

Carson, Christie. Royal Holloway, University of London
Owen, Catherine. Performing Arts Data Service.

Designing Shakespeare: Why Is An Audio-Visual Archive Necessary?

The aim of the Designing Shakespeare project is to broaden the range of materials available for the study of Shakespeare in performance. Following on from the work initiated by The Cambridge King Lear CD-ROM: Text and Performance Archive, my continuing aim is to draw attention to the very wide range of possibilities in Shakespearean production, and thereby to show the fluidity of interpretation of these texts over time. It is also my aim to draw attention to the increasing reliance on visual literacy as a means of creating meaning on stage. It has been argued that dress became increasingly important in Shakespeare's own performance period as a result of the increasing fluidity of society at the time, but also in response to a developing interest in the processes of perception and representation in art and science generally. I would suggest that, equally, in our current period of social and cultural movement, the visual has become increasingly important in creating meaningful productions.

This project takes an innovative approach both to the kinds of materials it documents and to the ways in which these materials have been created, displayed and made available to the public through the Performing Arts Data Service at the University of Glasgow. The purpose of this paper is to introduce its audience to the approach taken in building the database, as well as to the approaches that I hope this database will inspire in terms of teaching models for the study of Shakespeare. By charting the last forty years of Shakespearean performance history, through production details, production photographs, 3D models and interviews with designers, this project hopes to demonstrate the very real changes which have taken place in theatre practice, but also in theatre spaces and actor-audience interaction, over that period. British society has changed dramatically over the past four decades, and one way to illustrate these changes is through the work of the country's leading professionals.

Working with the theatre photographer Donald Cooper and the theatre designer Chris Dyer, this project aims to document the past in a way which will preserve its processes, but also in a way which makes those processes accessible to a new generation of users working in an entirely different way. Digital technology is changing the way theatre is rehearsed, designed and staged. It is also changing the responses of the audience and the position of theatre within society. Digital technology has increasingly been incorporated into the onstage action, but it has also changed the social habits of the audience. Photography, too, is changing in response to the new digital advances. This project stands at the meeting point of the old and the new ways of working, both in the theatre and in terms of creating a record of that ephemeral event. History is a process; this archive charts not only an historical process but also the changing creative practices of theatre making and the archival practices of theatre preservation. The motivatio

The forty-year period covered by this archive helps to illustrate the fact that the position of theatre in society has changed radically, as has the position of Shakespeare in education. This project hopes, on the one hand, to look at the ways in which those changes have affected the visual representation of Shakespeare's plays over time and, on the other, to make new ways of studying Shakespeare possible. By making this database of visually focused information available to an educational audience, this project pushes academic research into the public domain and encourages feedback and participation in the ongoing research project. The aim therefore is not just access, in the sense of creating a public library, but exchange, creating a topic and a forum for discussion. The key aim of the project, therefore, is to change the way the texts of Shakespeare's plays are approached through the development of a visual familiarity with the play and its interpretation over time, to encourage users of the

75

Walsh, John. Indiana University

Free the Data:  Accessibility, Format, and Transformation Issues Related to XML-Based Humanities Resources

My paper and presentation will discuss the importance of delivering XML-based digital humanities texts in their native XML format as well as in a variety of other useful formats, and will further discuss and demonstrate the use of XSL and other tools to dynamically generate many of the possible delivery formats.

A number of recent technologies, in particular XML-related technologies such as XSL, XPath, and XQuery, provide, for both content providers and content consumers, a wealth of new ways to manipulate and query XML-based digital humanities texts and other XML-based data. One significant problem with many current electronic text delivery models is that SGML- or XML-encoded data is hidden behind a proprietary system or web interface that provides a limited number of formats to the user. The SGML or XML is searched on the server but delivered to the user as HTML or some other format such as PDF or page images in various graphic file formats. The XML data never reach the user, and the formats available to the user lack the often rich markup and detailed structure of the original SGML or XML data.

In cases of proprietary or Web interfaces, the content provider also makes decisions about how the text may be accessed and searched, and these decisions may or may not coincide with the specialized needs of the scholars and researchers using the data. A Web form is often used to construct a query, and these Web forms typically limit the structure and complexity of the query. The advantage of these systems is that they provide a relatively familiar and user-friendly Web interface that provides users with quick and simple access to the data. Content providers should continue the development of these interfaces, always striving of course to provide increased flexibility and a better user experience to the data consumer. 

However, in addition to developing new and improved Web interfaces to e-text projects, I would argue that, whenever possible given copyright and other restrictions, content providers should provide easy access to the raw XML data as well. With access to the raw XML documents, users are free to manipulate the data in any way they see fit and to apply to the data the full arsenal of XML manipulation and query tools in arbitrarily complex ways. With technologies like XPath and XQuery, and the availability of free implementations of these technologies, scholars, researchers, and general-purpose users of our collections can manipulate and transform the data themselves in a greater variety of discipline- and interest-specific ways than could likely be accommodated by a user-friendly, Web-based interface. By providing the raw XML data to users, the scholarly community may also reap the benefit of encouraging and facilitating the acquisition of greater technical skills among humanities scholars.
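
To make the argument concrete, the sketch below shows the kind of local, discipline-specific query that raw-XML access makes possible; the file name, TEI-style element names and attribute values are hypothetical and not drawn from any particular collection.

```python
# A minimal sketch of what "raw XML" access enables: running an arbitrary
# XPath query locally with lxml. The file name and TEI-style element names
# are hypothetical; substitute whatever a given collection actually provides.
from lxml import etree

doc = etree.parse("poem.xml")  # hypothetical downloaded XML document
ns = {"tei": "http://www.tei-c.org/ns/1.0"}

# Find every verse line that contains an editorial note -- the kind of
# specialised query a fixed Web form is unlikely to offer.
lines = doc.xpath('//tei:l[tei:note[@resp="#editor"]]', namespaces=ns)
for l in lines:
    print(l.get("n"), "".join(l.itertext()).strip())
```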

After elaborating on these and other issues related to the freeing of humanities XML data, which is probably of most use to the power, I will discuss and demonstrate the importance of providing access to a variety of other data formats, which can benefit all levels of users. Using XSL, one can easily transform XML data into a number of useful formats, including HTML, PDF, open eBook, PalmOS Doc, Scalable Vector Graphic, VRML, VoiceXML, and more. While some potential applications are more obvious than others, I believe all of these formats can be usefully and creatively applied to digital humanities projects. 
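
As a hedged illustration of this fan-out, the sketch below applies several XSLT stylesheets to one source document using lxml; the stylesheet and output file names are placeholders, not the author's actual tool chain.

```python
# A sketch of the XSL-based fan-out the abstract describes: one source XML
# document, several delivery formats, each produced by a different stylesheet.
# File names are placeholders, not the author's actual stylesheets.
from lxml import etree

source = etree.parse("text.xml")
for target, sheet in [("html", "to-html.xsl"),
                      ("fo",   "to-xslfo.xsl"),      # XSL-FO, later rendered to PDF
                      ("oeb",  "to-openebook.xsl")]:
    transform = etree.XSLT(etree.parse(sheet))
    result = transform(source)
    with open(f"text.{target}", "wb") as out:
        out.write(etree.tostring(result, pretty_print=True))
```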

I will close my talk with a demonstration of using XSL to transform an XML-based humanities text into a sampling of these different formats.

76

Stein, Adelheit. Fraunhofer IPSI
Keiper, Jürgen. Deutsches Filminstitut - DIF
Thiel, Ulrich. Fraunhofer IPSI

Web-based Collaboration Support in a Digital Library on European Historic Films - the COLLATE Project

Introduction

Various Web-based collaboratories have been employed since the early 1990s, mainly in the natural sciences, but in the arts and humanities there are only a few efforts, and systems with very limited functionality. Whereas the preservation and organisation of historical knowledge may be comparable, work processes in the interpretive disciplines are often different. Especially the process of compiling arguments, counterarguments, examples, definitions and references to historical source material – which is the prevailing method in the humanities – may profit from an electronic environment that improves the capacity and reach of the individual knowledge worker.

Another aspect of collaboratories is their capability to enable virtual teams to work together almost as if they were at the same location. Although many informal contacts between cultural archives constitute specific professional communities, these communities so far lack effective technology support for collaborative knowledge work. The WWW can serve both as a communication platform for such communities and as a gateway for document-centred work in digital libraries and archives.

The EU-funded project "COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material" (http://www.collate.de) started in fall 2000 (IST-1999-20882) and runs for three years. An international team of content providers, domain experts and technology providers is working together to develop a new type of collaboratory in the domain of cultural heritage. The implemented system offers access to a digital repository of historic text archive material documenting film censorship practices for several thousand European films from the 1920s and 1930s. For a subset of significant films it provides enriched context documentation, including selected press articles, film advertising material, digitised photos and some film fragments. Major film archives from Germany, Austria and the Czech Republic provide the sources and work as pilot users of the COLLATE system.

COLLATE’s Goals and Approach

Designed as a content and context based knowledge working environment for distributed user groups, the COLLATE system supports both individual work and collaboration of domain experts who are analysing, evaluating, indexing and annotating the material in the data repository.

The system provides appropriate task-based interfaces for indexing/annotation and for collaborative activities such as preparing a joint multimedia publication or assembling and creating material for a (virtual) exhibition, contributing unpublished parts of the film experts' work in the form of extended annotations and commentaries. Appropriate knowledge management tools, e.g. indexing aids and domain-specific controlled vocabularies for manual indexing, have been developed jointly by the system developers and the film domain experts, thus exploiting the benefits of participatory design. Automatic indexing of the textual and pictorial parts of a document is invoked to obtain suggestions for index terms from the system (cf. Hollfelder et al. 2000). In addition, users can rely on support from automatic layout analysis for scanned documents (cf. Semeraro et al. 2001), which allows them to annotate individual segments. Annotations are a central concept in COLLATE. As a multifunctional means of in-depth analysis, annotations can be made individually but also collaboratively, for example in the form of annotations of annotations, or the collaborative evaluation and comparison of documents (cf. Keiper et al. 2001).

The system exploits the user-generated metadata and annotations by advanced XML-based content management and retrieval methods (cf. Brocks et al. 2001). The final version of the online collaboratory will integrate cutting-edge document pre-processing and management facilities, e.g., XML-based document handling and semi-automatic segmentation, categorization and indexing of digitised text documents and pictorial material. Combining results from the manual and automatic indexing procedures, elaborate content and context-based information retrieval mechanisms can be applied.

COLLATE System Features

Collaboration support in the COLLATE working environment incorporates advanced system functions based on an explicit model of collaborative indexing and annotation (cf. Brocks et al. 2002). Through interrelated free-text annotations, users can enter into a direct or indirect discourse on the interpretation of document passages, e.g. adding information, interpretations, arguments, etc. Possible relations between annotations can be pre-defined or inferred by the system in order to represent the discourse structure. Explicit communication about the interpretation of contents and the interrelation of annotations is supported by a built-in discussion forum and, in the final system version, by an intelligent dialogue/collaboration manager.
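
As an illustration only (COLLATE's actual schema is not reproduced here), the following sketch shows how annotations that target either a document passage or another annotation, each carrying a typed relation, can represent such a discourse structure.

```python
# An illustrative sketch (not COLLATE's actual schema) of how interrelated
# annotations can carry a discourse structure: each annotation may point at a
# document passage or at another annotation, with a typed relation.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    author: str
    text: str
    target: object                 # a document passage or another Annotation
    relation: str = "comments-on"  # e.g. "elaborates", "counters", "supports"
    replies: list = field(default_factory=list)

passage = "censorship card, Prague 1931, lines 4-9"   # hypothetical passage id
a1 = Annotation("expert_A", "The cut was political, not moral.", passage)
a2 = Annotation("expert_B", "Compare the Viennese decision of 1932.", a1,
                relation="counters")
a1.replies.append(a2)              # an annotation of an annotation
```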

The COLLATE collaboratory is a multi-functional software package realised by cooperating software modules. XML is used as the uniform internal representation language for the documents in the repository and the associated metadata as well as for the implementation of the communication protocol among its system modules, which include:

  • Three document pre-processing modules for digital watermarking of the documents (copyright and integrity watermarks); intelligent, automatic document structure analysis and classification; and automatic, concept-based picture indexing and retrieval.
  • A distributed multimedia data repository comprising digitised text material, pictorial material like photos and posters and digital video fragments.
  • Tools for the representation and management of the metadata, such as an XML-based content manager (XMLCM) which incorporates an ontology manager and a retrieval engine.
  • A collaborative task manager for complex individual and collaborative tasks, such as indexing, annotation, comparison, interlinking and information retrieval, including tools for online communication and collaborative discourse between the domain experts and other system users.
  • The Web-based user interface of COLLATE, comprising several workspaces for different tasks performed by distributed user groups, allowing for different access rights and interface functions. The final interface version will be generated semi-automatically by exploiting knowledge from the underlying task model and the user-specific dialogue history.

Conclusion

The COLLATE system represents a new type of collaboratory supporting content- and concept-based work with digital document sources. Innovative task-based interfaces support professional domain experts in their individual and collaborative scholarly work, i.e. analysing, interpreting, indexing and annotating the sources. The metadata thus provided are managed by an advanced XML-based content manager and an intelligent content- and context-based retrieval system.

References

Brocks, Holger; Thiel, Ulrich; Stein, Adelheit & Dirsch-Weigand, Andrea (2001).
Customizable Retrieval Functions Based on User Tasks in the Cultural Heritage Domain. In: Constantopoulos, P. & Sølvberg, I.T. (Eds.) Research and Advanced Technology for Digital Libraries. Proceedings of the 5th European Conference, ECDL 2001. Berlin: Springer, 2001, pp. 37-48.

Brocks, Holger; Stein, Adelheit; Frommholz, Ingo; Thiel, Ulrich & Dirsch-Weigand, Andrea (2002).
How to Incorporate Collaborative Discourse in Cultural Digital Libraries. In: ECAI2002 Workshop "Semantic Authoring, Annotation & Knowledge Markup" (SAAKM '02) at the 15th European Conference on Artificial Intelligence (ECAI '02), 21-26 July 2002, Lyon, France.

Hollfelder, Sylvia; Everts, André & Thiel, Ulrich (2000).
Designing for Semantic Access: A Video Browsing System. Multimedia Tools and Applications, 11 (3), 2000: 281-293.

Keiper, Jürgen; Brocks, Holger; Dirsch-Weigand, Andrea; Stein, Adelheit & Thiel, Ulrich (2001).
COLLATE – A Web-Based Collaboratory for Content-Based Access to and Work with Digitized Cultural Material. In: Bearman, D. & Garzotto, F. (Eds.), Proceedings of the International Cultural Heritage Informatics Meeting (ICHIM '01). Milano: Politecnico di Milano, 2001, pp. 495-511.

Semeraro, Giovanni; Ferilli, Stefano; Fanizzi, Nicola & Esposito, Floriana (2001).
Document Classification and Interpretation through the Inference of Logic-Based Models. In: Constantopoulos, P. & Sølvberg, I.T. (Eds.) Research and Advanced Technology for Digital Libraries. Proceedings of the 5th European Conference, ECDL 2001. Berlin: Springer, 2001, pp. 59-70.

79

Lindsay, Kathryn. Oxford University
Warwick, Claire. SLAIS University College London

Conflict resolution and The Battle of Maldon: A Practical Study Of The Combination Of Hypertext, Usability And Literary Theories.

The proposed paper investigates the implications and reality behind the convergence of information science, literary theory and hypertext. Advocates of new technologies argue that the presentation of literary texts via the medium of hypertext has led to a major shift in the activities of reading and writing, taking the processes of the reader much closer to those of the writer. This, they believe, creates a more dynamic, active learning medium for the scholar of the literary text. This notion has led literary theorists to hail hypertext as a practical embodiment of textual theory. They have drawn parallels between the hypertext edition and the theories of Roland Barthes, Jacques Derrida and Julia Kristeva, which proposed "readerly", "decentred" and "associative" texts.

The project was undertaken in collaboration between Sheffield University's Department of Information Studies and Oxford University's Humanities Computing Unit and Faculty of English. The methodology for the project consisted of the study of a number of hypertext editions from two theoretical perspectives - textual theory and usability theory. From this, a set of guidelines was developed as to the extent to which literary theory can be incorporated within hypertext in a usable manner. These were then put into practice through the design and development of a web-based hypertext course pack to support students studying Old English at the Faculty of English in Oxford. The inclusion of an online edition of the epic poem "The Battle of Maldon" created an exciting and challenging opportunity to put the research findings into practice. The site was then subjected to a series of usability tests to determine whether it was an effective model for studying literature.

This combination of methodologies represents important original work. Scholars of English and of hypertext have worked on literary theory and its application to the hypertext medium. Information scientists have performed research into usability and human computer interaction. However the application of the two methodologies to a practical study in the humanities represents an important innovation in both research areas.

The findings of the research indicated that whilst hypertext editions may naturally adhere to certain aspects of textual theory, literary theory and technology do not converge on the computer screen in a way that is beneficial to the reader of literature. Hypertext theorists Landow, Bolter and Moulthrop have produced hypertext systems that have attempted to embody textual theory in practice through the use of advanced linking systems, authoring software and non-hierarchical structures. However, when presented to the end user they appear confusing and complicated, ignoring the basic tenet of any system design – "usability". As a result of further study of usability theory (e.g. Faulkner 2000, Morkes and Nielsen 1997, Kostelnick 1989) and design heuristics (e.g. Shneiderman 1998, Nielsen 1993, Norman 1988), it became clear that ultimately these systems serve readers poorly and in the long term fail as a vehicle for learning.

From this research it becomes evident that a new area of theoretical investigation needs to emerge to address the gulf that exists between theory and practice. This paper lays the foundations for that investigation, advocating a new partner in the convergence: usability. If simple usability guidelines and considerations can be used to determine the extent to which literary theory should be incorporated within hypertext editions, then this can be developed into a highly effective means of studying literature, with traditional and current theory working together to bring about enhanced ways of learning.

Relevance to DRH themes:

Other social sciences where these overlap significantly with the humanities:

This study aims to address the problems that often arise in interdisciplinary work, such as combining empirical methods from the domain of information science with humanities computing. It seeks to investigate how these two practical areas, concerned with the provision and evaluation of information, may be combined with theoretical methodologies in English literature.

Information analysis, design and modelling in humanities research:

The study also confronts the issue of how advocates of hypertext may propel their own theories to the extent that they efface other issues imperative to the design of hypertext systems, such as usability and Human Computer Interaction (HCI), in favour of a completely theoretical agenda (Flanders 1996; para 4). It argues that it is vital not only to propound theories, but to test them using empirical methods of information design and usability that have been developed in information science.
 

References

Barthes, R. (1977b). "The Death of the Author". Image Music Text. Heath, S. (trans). London: Fontana.

Bolter, J. D. (1991). Writing Space: The Computer, Hypertext and the History of Writing. London: Lawrence Erlbaum Associates, Inc.

Derrida, J. (1993). Dissemination. Johnson, B. (trans). London: Continuum International Publishing Group.

Kostelnick, C. (1988). "Visual Rhetoric: A Reader Orientated Approach to Graphics and Designs". Educational Technology and Research Development 38, 61-74.

Kristeva, J. (1980). Desire in Language: A Semiotic Approach to Literature and Art. Gora, T. et al. (trans). Roudiez, L. S. (ed). New York: Columbia University Press.

Landow, G. (1992). Hypertext: The Convergence of Contemporary Critical Theory and Technology. London: The Johns Hopkins University Press.

Morkes, J. and Nielsen, J. (1998). "Applying Writing Guidelines to Web Pages". [Online]. Available from http://useit.com/papers/webwriting/rewriting.html (Accessed 31.08.01).

Morkes, J. and Nielsen, J. (1997). "Concise, SCANNABLE, and Objective: How to Write for the Web". [Online]. Available from http://www.useit.com/papers/writing.html (Accessed 31.08.01).

Moulthrop, S. (1992). Victory Garden. Watertown: Eastgate Systems, Inc.

Nielsen, J. (1994). Usability Engineering. London: Academic Press, Inc.

Norman, D. (1988). The Psychology of Everyday Things. New York: Basic Books.

Shneiderman, B. (1998). Designing the User Interface (Third Edition). Reading: Addison Wesley.

81

Ore, Espen S. National Library of Norway
Vasstveit, Oddvar. National Library of Norway
Hesselberg-Wang, Nina. National Library of Norway
Witek, Wodek. National Library of Norway

The Diriks Papers
The Diriks family scrapbooks: Preserving, digitizing and presenting the past

The Norwegian Diriks family scrapbooks, a part of the manuscript collections at the National Library of Norway (NLN), comprise 13 large bound volumes (albums) and three folders of loose material covering a time span of more than 100 years, from the beginning of the 19th century to the beginning of the 20th century. The material was collected and organized by Anna Diriks (1870-1932). The contents of the scrapbooks vary and include letters and notes, newspaper clippings, and drawings (including original drawings by artists such as Max Klinger and Amedeo Modigliani).

The Diriks scrapbook albums need to be conserved. The albums are of cheap quality, dating from around 1890, and the pages have a high acid content. The material in the albums is much sought after by researchers in history, art history and other fields, and for use as illustrations in publications. For these reasons the NLN has decided to:

  • do a physical conservation of the scrap books
  • register the contents (objects) of the scrapbooks in a database
  • digitize the scrapbook contents
  • produce a virtual copy of the original albums on the web

A pilot project was carried out in 2000, and a production project based on the experiences from that pilot will end in March 2002, resulting in one album (1a) conserved, digitized and published on the web. The remaining 12 albums will be conserved and published over the next six years, depending on the available funding. According to our plans, albums 1b and 2 will be done in 2002.

This project is inter-departmental at the NLN: the Manuscript department is responsible for registering the contents (objects) of the volumes; the Conservation department does the conservation work and, in collaboration with the IT department, administers the digitization of the pages and their objects. The IT department is responsible for the database development and for the web version of the scrapbooks.

The publication of the scrapbooks on the web requires manual adjustment of the digitized objects and allows us to automate only some tasks. The object-browsing model (see below), however, allows us to place the edited digital objects in the data structure more or less automatically. The database, on the other hand, automatically serves as a basis for many purposes:

  • it documents all objects
  • it links conservation information with the objects
  • it links digital images with the individual objects
  • it is available for the web version, and so allows for searching and other kinds of retrieval than virtual browsing through album pages

What objects are there?


Figure 1, from Diriks album 1a, 6 recto

On a given album page there may be text and drawings directly on the page. In addition, there are usually one or more items adhered to the page. Such an item may itself consist of more than one page (for instance a printed or handwritten booklet, a multi-page letter or a set of newspaper clippings), and these items and pages may have other items adhered to them.

In the pilot project in 2000 a data model of the albums was set up showing a hierarchy:

  • Scrapbook album
  • Album pages
  • Objects on album pages

The work done in 2001 has shown that this model was too simple, or perhaps too confining. The 2000 work did not include conservation information in the database model, and the example volume (album 5) did not have drawings and text directly on the album pages in such a way that it was felt necessary to keep them intact or as part of the data referenced from the database, except at an abstract top level. In 2001 the database was extended to allow for conservation information. The data model was changed to allow us to treat albums and album pages as objects in the same way as the objects adhered to the pages. This made it necessary to introduce the notions of "is part of" and "holds the following objects" - something which would in any case have been needed for the original, simpler data model once registration started of objects further down the hierarchy than the page level.
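
A minimal sketch of this revised model is given below: albums, pages and adhered items are all treated as objects of one kind, linked by "is part of"/"holds" relations so that nesting can go arbitrarily deep. Class and field names are illustrative, not the NLN database schema.

```python
# A minimal sketch of the revised data model: albums, album pages and adhered
# items are all the same kind of object, linked by "is part of" / "holds"
# relations, so a clipping glued to a letter glued to a page is representable.
from dataclasses import dataclass, field

@dataclass
class DiriksObject:
    label: str                                     # e.g. "Album 1a", "page 6 recto"
    parent: "DiriksObject" = None                  # "is part of"
    children: list = field(default_factory=list)   # "holds the following objects"
    images: list = field(default_factory=list)     # linked digital images
    conservation_notes: list = field(default_factory=list)

    def add(self, child: "DiriksObject") -> "DiriksObject":
        child.parent = self
        self.children.append(child)
        return child

album = DiriksObject("Album 1a")
page = album.add(DiriksObject("page 6 recto"))
letter = page.add(DiriksObject("multi-page letter"))
letter.add(DiriksObject("newspaper clipping adhered to the letter"))
```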

Conservation, preservation and digitization

Before the conservation work, the albums and their pages undergo a first level of photographing/digitization. These images are linked with the album objects in the database and are used as a tool during conservation. The images also document the original album pages and are used as a basis for building the web version, a virtual preservation of the albums. When the objects on the pages (and the pages themselves) have been through the conservation process, they are digitized individually. These images are also linked with the database, and in various resolutions they are used as browsing objects in the web version of the albums. The high-resolution images are also intended for researchers and for publication purposes. Ideally they can be used as replacements for the originals, thus reducing the handling of the originals and so aiding the preservation of the conserved objects.

The web version

For the web presentation of the scrapbooks a restricted view of the database is available. This of course includes the registered names, places and dates as well as the listed object types. There are also additional introductory texts. Gateways to the data in the form of a time line and possibly a map are considered.

One of the important aims for the project however is to give a digital presentation of the albums such as they were before treatment since the organization itself gives information about the objects and their relationships. The albums and the organization of the collection are also objects of cultural historical value by themselves.

Since a given page may hold one or more objects, which may themselves be multi-page and hold other objects, the reader must be able to browse in two dimensions. We have designed this browsing model:


Figure 2, the model for browsing a virtual Diriks scrapbook

and this is realized in this example:


Figure 3, Pages 4 verso and 5 recto, Diriks scrapbook 1a


Figure 4, Page 5 recto, Diriks scrapbook 1a


Figure 5, Detail from page 5 recto, Diriks scrapbook 1a

How web versions of multilevel, multi-page documents are best presented for browsing is not yet known. A quick look through, for instance, the documents available in the Digital Library at the British Library (http://www.bl.uk/index.html) shows that the various projects presented there all seem to use different browsing metaphors. In the Diriks project we use a browsing metaphor close to how a reader would browse the original albums. At the same time we give the users the benefit of computer-based tools such as a database.
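
The two-dimensional browsing the model requires can be sketched as a simple navigation state: one axis leafs through the children of the object currently open, the other steps into one of those children (an adhered item with its own pages) and back out again. The class below is illustrative only, reusing the kind of nested object sketched earlier; it is not the project's implementation.

```python
# A sketch (names hypothetical) of two-dimensional browsing: turn the pages of
# the container we are inside, or step into/out of an object on the current page.
class Browser:
    def __init__(self, album):
        self.stack = [(album, 0)]            # frames of (container, open child index)

    def current(self):
        container, i = self.stack[-1]
        return container.children[i]

    def turn_page(self, step=1):             # horizontal axis: leaf through children
        container, i = self.stack[-1]
        i = max(0, min(i + step, len(container.children) - 1))
        self.stack[-1] = (container, i)

    def step_into(self):                     # vertical axis: open the current object
        child = self.current()
        if child.children:
            self.stack.append((child, 0))

    def step_out(self):                      # vertical axis: back to the containing page
        if len(self.stack) > 1:
            self.stack.pop()
```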

Results and spin offs

The NLN plans to follow the procedures described above for all the scrapbooks and the folders of loose objects in the Diriks collection. In addition this model will be used for other kinds of material. One important feature which is missing here is the searchable full text of all textual objects. These data can however easily be linked into the system once there is funding available for the transcription and encoding work that would be necessary.

At the NLN there has until now been no database system for conservation data for the use of the conservation department. Instead paper forms have been used and stored. The conservation part of the database developed for the Diriks project is organized so that it can be lifted out of this database and be established as a separate database for general use by the conservation department. This database has links to the existing registration system at the NLN, both at manuscript/book level and at the level of the individual objects wherever appropriate.

86

Mowat, Elaine. Tertiary Education Officer, SCRAN

Involve Me And I’ll Understand: Using Digital Resources To Improve Learning And Teaching In The Humanities.

With millions of pounds of public money being spent on digitisation projects, it seems reasonable to assume that such initiatives should yield significant educational benefits. Certainly students gain from being able to access rare or delicate materials and educators have a rich resource to help them illustrate their lectures.

But the potential of digital resources to support learning and teaching goes much further than the delivery of content. Learner activity is at the heart of learning, as recognised by the ancient Chinese proverb:

Tell me and I’ll forget
Show me and I may remember
Involve me and I’ll understand

and articulated by the contemporary educational theory of social constructivism. At the core of constructivism is the idea that learning takes place not just by exposure to new information but through a process of learners building their own understandings as a result of their interactions in the world. 

With long lectures and tutor-dominated seminars now hard to justify in the age of abundant information, the expectation is on the educator to facilitate students’ learning by designing engaging and challenging experiences which will allow them to transform new content into personal meaning. Dialogue, problem solving, and collaborative research tasks are all examples of the kind of interactivity that is on the agenda.

The importance of interactivity and involvement in learning will be the starting point for this paper. Drawing on some of the most interesting and influential thinking in education, it will seek to establish some basic principles about how learning takes place and how best to support it. It will then go on to examine the potential role of digital resources in this process.

Discussion will be illustrated with examples of the ways in which resources from SCRAN can be used in the Humanities. SCRAN is one of the most exciting and extensive digital resource bases for education. Over one million high-quality objects – images, sound files, video clips and text records – are available from its website at www.scran.ac.uk. Educational users can download these resources, interact with them and re-use them. The resources on SCRAN have been gathered from a wide range of cultural organisations, such as museums, galleries, media organisations and archives, and cover a wide range of subject areas. Collections include artworks from the Bridgeman Art Library and sculptures from the medieval city of Ife in Nigeria, illuminated manuscripts from the Book of Deer and bookbindings by Talwin Morris, objects from the V&A and treasures from the Museum of Antiquities, as well as early photography, mini-movies, old m

By making its resources available as individual objects rather than pre-articulated packages, SCRAN allows lecturers and learners maximum flexibility in engaging with the materials. SCRAN also provides a variety of supports to help its users select, organise, manipulate, and repurpose the data for their own use. These supports range from how-to guides, search tips, and regularly posted teaching ideas on the website, to hands-on training sessions and specialised software tools to aid the creation of individualised learning materials.

In a major new initiative, SCRAN is now working with lecturers and learning resource staff across the country to develop creative and innovative exemplars of digital design for learning and teaching. The paper will discuss some of the ideas being developed under this scheme in order to share with delegates practical advice and inspirational ideas for making the most of digital content. From warm-up discussions around print-outs in seminars to sophisticated web-based multimedia tutorials, there are many possibilities for using digital resources to enhance learning and teaching in the Humanities; or in other words to involve students and help them understand.

87

Miles, Adrian. InterMedia University of Bergen and RMIT

Searching for Contexts: Cinema and Computing 

Digital resources in humanities computing largely concentrate on critical encoding of content for access and archiving. Such projects often produce applied critical outcomes; however, it appears that such work is constrained in its applicability to other humanities communities either by a concentration on textual artefacts or by an emphasis on the quantitative analysis of data. This paper documents two recent film-based digital resource projects that offer low-scale 'tactical' encoding with applied research and learning outcomes, and models ways in which such 'middle level' resources may be relevant to a broader conception of computing in the humanities.

The SMAFE project (SMIL Meta Analysis Film Engine) is an initiative that explores the viability of broadband networked film analysis systems for the authoring and delivery of screen studies content. In addition, novel applied research and learning methodologies are being investigated within the development of the film engine and its various iterations. Each project relies upon SMIL, custom-written CGI scripts using Perl and MySQL, and a QuickTime Streaming Server.

The first project utilises the Odessa Steps sequence from Sergei Eisenstein's _Battleship Potemkin_ (1925), one of the most famous sequences in cinema history. The sequence was digitised and then each shot encoded utilising a series of metadata categories relating to film elements such as shot scale, camera direction, screen direction, and composition. A database provides access to this metadata so that queries can be made for any encoded criteria, and search results list the series of shots that meet the criteria. Users can then view each shot individually, view the shot in context (we automatically roll back 10 seconds before the shot and roll forward 10 seconds after), or elect to view all the matching shots in sequence.
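
The query-and-context behaviour described above can be sketched as follows; the shot records, field names and timecodes are invented for illustration (the real engine stores its metadata in MySQL and serves clips from a QuickTime Streaming Server).

```python
# A sketch of querying shot metadata and computing the "shot in context" window.
# Records and field names are invented for illustration.
shots = [
    {"id": 41, "start": 312.0, "end": 314.5, "scale": "close-up",
     "screen_direction": "left-to-right"},
    {"id": 42, "start": 314.5, "end": 316.0, "scale": "long shot",
     "screen_direction": "right-to-left"},
]

def find_shots(**criteria):
    """Return shots matching every encoded criterion, in sequence order."""
    return [s for s in shots
            if all(s.get(k) == v for k, v in criteria.items())]

def context_window(shot, pad=10.0):
    """Roll back 10 seconds before the shot and forward 10 seconds after it."""
    return max(0.0, shot["start"] - pad), shot["end"] + pad

for s in find_shots(scale="close-up", screen_direction="left-to-right"):
    print(s["id"], context_window(s))
```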

Eisenstein has written extensively, and in considerable detail, about the visual, thematic, and intellectual patterns of montage that he utilised in sequences such as the Odessa Steps, and the metadata utilised for encoding reflect this. As a result the system provides a valuable research resource when used in conjunction with other available material, including the sequence in its entirety, the film, and Eisenstein's writing, as well as secondary commentaries on the sequence and on Eisenstein, and other canonical essays on film montage.

However, as a theorist such as Eisenstein argued, cinematic shots only gain their particular significance from the series they are placed within - it is montage (for Eisenstein) that generates meaning. As a result this project is limited in many ways, as it is little more than a quantitative analysis engine. It is valuable when utilised in larger research contexts, and it does demonstrate the viability of the engine (keeping in mind that other film content could as easily be encoded utilising other metadata schemas), but its content remains fixed and in some ways intractable - you search for close-ups with left-to-right movement, and it finds them and lets you view them. What the shots mean individually is considerably less than what they mean in their various sequences.

The second project, "Searching", is based on John Ford's 1956 western _The Searchers_. Rather than encoding the film around a particular set of cinematic metadata, the project begins from a hermeneutic claim: that "doorways in _The Searchers_ represent liminal zones between spaces that are qualities." Doors, as they appear in the film, are encoded around a small data set (the camera is inside, outside, or between, and is looking inside, outside, or between) and still images from the film are provided. Here a search by a user yields all the stills that meet the search criteria, and clicking on any still loads the appropriate sequence from the film for viewing in its cinematic context.
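
The door encoding is small enough to sketch in full; the records and timecodes below are invented, but the two fields mirror the data set described above.

```python
# A sketch of the "Searching" encoding: each door shot carries only the camera
# position and the looking direction (inside / outside / between). Records,
# filenames and timecodes are invented for illustration.
doors = [
    {"still": "door_017.jpg", "camera": "inside",  "looking": "outside", "t": 143.2},
    {"still": "door_058.jpg", "camera": "inside",  "looking": "inside",  "t": 2901.7},
    {"still": "door_091.jpg", "camera": "between", "looking": "outside", "t": 5120.4},
]

def search(camera=None, looking=None):
    """Return stills matching the chosen camera position and direction;
    in the real interface, clicking a still loads the sequence at its timecode."""
    return [d for d in doors
            if (camera is None or d["camera"] == camera)
            and (looking is None or d["looking"] == looking)]

print(search(camera="inside", looking="inside"))
```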

Unlike the Odessa Steps engine, this project becomes a much more open analytical system. The sets of images returned by any query require significant interpretation by the user, who must contextualise them within the broad claims about doors and liminality. To do this requires some familiarity with the film, and leaves open the problem of what sorts of qualities the spaces demarcated by the doors represent. In other words, the engine sets up the possibility for exploring the hermeneutic claim, and whether or not the claim is answered by the evidence provided by the engine is determined in the interpretive work required of the user, not the system - the system simply provides the possibility for exploring a thesis.

For example, a search based on the camera being inside and looking inside reveals that virtually every such shot in the film involves one of two women, both of whom are very significant characters. This pattern, which is surprisingly poetic and consistent, is clearly significant, but it only becomes easily recognisable as a pattern because of the manner in which the engine is designed. What it means, however, is not answered by the engine; the engine productively generates new questions or hermeneutic claims.

This makes "Searching" a 'discovery engine': what is developed in the engine is a process for the unveiling of patterns of meaning, and this would seem to offer an engaged middle ground for distributed humanities content that provides both access and critical activity.

These projects are not large-scale archival or critical encoding projects but a tactical appropriation of available media and computing resources to achieve middle-level research and learning outcomes. They are desktop-based and desktop-delivered, using existing proprietary, open source, and W3C standards, and because of their small scale they do not require significant investments in design, building, or maintenance. They are creative and critical interventions into the domain of an engaged new media computing practice, and their scalability and success lie not in breadth of content but in the processes of engagement that they offer. This engagement is distributed, as it is feasible, and relatively simple, to produce numerous such engines - for example, students could encode _The Searchers_ around other sets of critical claims within the same engine, and an emergent 'hermeneutic engine' would rapidly evolve. It is such 'middle level' solutions that enable a practice-based research culture to complement the p

92

Menon, Elizabeth. Purdue University

Communicating Vessels: Digital Semiotics and Web Installation Art.

Installation art fundamentally changed the nature of artistic practice, the role of the spectator and exhibition strategies. It helped distinguish the postmodern from the modern through its compromising of the boundaries between the visual arts and other cultural forms, including performance, architecture and video. Installation has now become a preeminent postmodern practice and has changed substantively since its inception. Is Web Installation Art now the site of post-postmodern artistic production? This presentation considers new media exhibitions presented online that address the impact of technology on the practice of art, communication and the representation of the body. Artists who use web installation challenge the boundaries of art as a communicative medium. At the same time their production questions the nature of installation art in relation to its past achievements — the use of ephemeral materials to provide institutional critiques and a resistance to commodity status.

In order to interpret these web installations, a "digital semiotics" is proposed that considers how meaning occurs in digital media (including digital photos, digital prints and web art) through its social, discursive creation and its paradoxically finite and individual status. The methodological basis for this investigation is adapted from Ferdinand de Saussure: the signifier (the sound/image, the "material" with which meaning is made), the signified (the "concept" which is produced by the brain) and the sign (the combination of signifier and signified and the socially/individually produced meaning). Binaries identified by Saussure's system (arbitrary/linear, immutable/mutable, diachronic/synchronic) express the relationship between the creator and consumer of meaning and can similarly be adapted to digital media. Extension of "digital semiotics" involves application of Foucault's definitions of author and author-function, Derrida's "transcendental signifier" and Barthes' declaration of the "death of the author".

93

Schaffner, Paul. University of Michigan

Keying What You See In Early English Books

Two years ago at DRH 2000, at the very inception of the EEBO text conversion project, I outlined what at that stage seemed to be the peculiar challenges of the EEBO project, as well as the strategies that we proposed to cope with them as best we could. 
The present paper will be a progress report of sorts: a reassessment, two years into the project and a year and a half into real production, of the project's working assumptions and the strategies that were adopted in response to this project's unusual character.

The Michigan way of producing electronic texts has become increasingly well defined in recent years, and increasingly oriented toward bulk production and bulk distribution under generic systems. It seems to us the approach best suited to a production unit based in a library, since it is oriented toward producing standardized digital libraries, not individual works, however superlative and seductive they may be.
It is this approach that we have applied to EEBO, and it may be summed up as involving: (1) the application of simple and strict data capture standards; (2) a determination to distinguish only those features that are essential to retrieval, navigation, or intelligible display (and that can be described by means of typographic and other predictable visual cues); and (3) division of labor: automate what can be automated; outsource what cannot be; and reserve interpretation and quality assurance for specialist in-house staff. In the case of EEBO I would add: (4) planning for mistakes, the "principle of graceful degradation" - that is, trying to ensure that even when features fail to be recognized, they are nevertheless captured in a way that allows for useful retrieval.
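
As a purely hypothetical illustration of point (4), and not the EEBO specification itself, the sketch below shows the intent of graceful degradation: text whose structure cannot be named, or whose characters cannot be read, is still captured under a generic, searchable wrapper rather than silently dropped.

```python
# An illustrative sketch (not the EEBO capture specification) of "graceful
# degradation": unrecognized features degrade to generic, retrievable markup
# instead of being lost.
def capture(text, structure=None, legible=True):
    """Wrap keyed text so that unidentified structures and illegible passages
    are still recorded in a findable way."""
    if not legible:
        return "<GAP REASON='illegible'/>"      # placeholder, still retrievable
    tag = structure if structure else "DIV"     # unknown structure -> generic DIV
    return f"<{tag}>{text}</{tag}>"

print(capture("To the Reader", structure="HEAD"))
print(capture("An epistle whose genre the keyer cannot name"))
print(capture("", legible=False))
```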

EEBO has put all this to the test: it is a large project now (ca. 1200 books processed by conference time) and potentially a huge one; its material is as varied in format, language, type, and convention as the inventiveness of 16th- and 17th-century authors and printers could supply; and its underlying physical form (1-bit digital scans of microfilm copies of often poorly printed and preserved early books) makes legibility a constant problem. Finally, the electronic texts produced by the project have been produced, at least in part, for scholars; i.e., for an audience not necessarily forgiving of imprecision, inaccuracy, or inconsistency.

Some of the results have been predictable: for example, the scope, scale, and quality standards of the project are under constant pressure and remain subject to revision.

Other results, however, rather fly in the face of conventional wisdom about data capture, as well as rendering some of our working assumptions dubious, particularly and most interestingly as regards the role of interpretation. Interpretation, it appears, whether at the character and word level or at the structural level, is not something that should or even can be eschewed, assigned, or rigidly controlled. Instead, it is something that has to be allowed for at every point where a human mind encounters the text, whether it be that of a keyer in India typing in ecclesiastical Latin or a specialist reviewer in Oxford correcting the tagging. 

Practical large-scale data capture and the bulk conversion of books to digital text may seem odd areas on which philosophies of reading should impinge, but impinge they do. The working philosophy of most such conversion projects--of all that aspire to the production of large amounts of text via the efforts of non-specialist keyers and coders--is often summed up as "key what you see." Such a philosophy, at its extreme, regards the human keyers as little more than animate OCR software, trained to recognize certain shapes as distinct glyphs and to record them in specified ways, wholly without interest in or comprehension of the text. E-text project designs, including ours, arising from this view tend to minimize or eliminate the role of interpretation, or even to dismiss its relevance altogether.

In designing the EEBO project and creating its specifications, we had to confront this issue in acute form. The variety of format and typeface makes it difficult to provide an inventory of visual cues in advance, even while the wealth of ambiguous and irregular abbreviations, forms, and layouts makes heavy demands on the interpretive skills of the reader. On the other hand, efficient production, as well as the texts' high degree of variability with regard to intrinsic intelligibility, demand a design that assumes no interpretive ability on the part of those doing the data capture.

Neither in the original design, nor in the past year of reviewing the results, have we arrived at an easy or comprehensive way of resolving this tension. We have, however, learned to build on a somewhat more nuanced view of how non-specialist keyers and coders actually engage with the text that they are converting--not necessarily very different from the engagement enjoyed by specialists. There certainly remains some truth to the prevailing view: one cannot rely on interpretation and therefore should be very careful about writing specifications that require it. In some circumstances, indeed, the readers are to be discouraged (if possible) from attempting to understand what they are reading. On the other hand, we have come more and more to realize the frequent necessity of interpretation; the inevitability of interpretation (and even engagement), no matter how disengaged the keyers and coders are trained to be; the need more commonly to guide, influence, or provide an outlet for interpretation than to di

Designing a project like EEBO is in effect to teach hundreds or thousands of people how to read in a specially limited way: to make the best use possible of their interpretive abilities, while minimizing the effects of their interpretive inabilities. This paper will discuss the mixed success of our attempts hitherto.

100

Little, David. Wellcome Trust

Sharing History Of Medicine Metadata Using The OAI Protocol 

The principal aim of this paper would be to examine and evaluate an approach to sharing metadata relating to historical Internet resources between a selection of Resource Discovery Network (RDN) hubs aimed at different communities. It will be based upon the experiences of the MedHist history of medicine gateway, developed by the Wellcome Library, which aims to make its records available either in full or as a subset to the humanities and social science gateways, using the Open Archives Initiative (OAI) protocol for metadata exposure and harvesting.

The Wellcome Library for the History and Understanding of Medicine, in cooperation with the Biome hub, is currently developing a gateway to Internet resources in the history of medicine. The MedHist gateway, as it will be known, will be launched in July 2002 and will provide access to assessed, high-quality Internet resources relating to the history of medicine. MedHist is aimed primarily, but not exclusively, at students and academics working within the further and higher education sectors with an interest in the history of medicine. MedHist's records will be available via its own Website, the Biome Website <http://biome.ac.uk>, the RDN central Resource Finder database <http://www.rdn.ac.uk>, and via other RDN hubs that have expressed an interest in having access to them, namely the Humbul humanities hub and the Sosig social sciences gateway.

As the history of medicine is a broad and interdisciplinary subject that does not fit comfortably within any existing academic discipline, resources relevant to the history of medicine are also of use to specialists in other areas, including humanities scholars with an interest in the history of science, or in social and cultural history. For this reason, the sharing of MedHist's resources with other gateways is very important. This paper will outline the communities with a potential interest in the history of medicine in order to illustrate this point.

The essentially broad nature of the history of medicine raised an important issue for the MedHist team at the Wellcome Trust, namely where the gateway would "fit in" with similar resource discovery services within the Resource Discovery Network. It was decided that, in order to reach a larger audience, the MedHist gateway should be associated with an existing RDN service rather than operate as a completely independent service outside the RDN, and that MedHist records should also be made available to other gateways wishing to have access to them, using a suitable form of resource-sharing technology. The reasons for situating the gateway within the Biome hub will be briefly discussed.

This paper will address the technical and intellectual approaches to metadata sharing and re-presentation undertaken by those involved in the project. MedHist records will be made available to other gateways via the OAI protocol for metadata harvesting; the protocol currently used by the RDN to gather the records from individual gateways into its central Resource Finder repository. An overview of the technical processes involved will be provided and there will be a discussion of the benefits of employing OAI in this particular context over other approaches to metadata sharing, such as live cross-searching. This approach to metadata sharing will be examined in relation to current developments and strategic thinking in this area within the RDN and beyond.
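
For readers unfamiliar with the protocol, the sketch below shows the basic OAI-PMH harvesting loop (a ListRecords request plus resumptionToken handling) that underlies this kind of exchange; the endpoint URL and set name are placeholders, not MedHist's actual configuration.

```python
# A minimal sketch of OAI-PMH harvesting. The base URL and set name are
# placeholders; the verb, metadataPrefix and resumptionToken mechanics are
# part of the OAI protocol itself.
import urllib.parse, urllib.request
from xml.etree import ElementTree as ET

BASE = "http://example.org/oai"          # hypothetical OAI endpoint
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc":  "http://purl.org/dc/elements/1.1/"}

def harvest(metadata_prefix="oai_dc", set_spec=None):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        with urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(params)) as r:
            tree = ET.parse(r)
        for rec in tree.findall(".//oai:record", NS):
            yield rec
        token = tree.find(".//oai:resumptionToken", NS)
        if token is None or not (token.text or "").strip():
            break                         # no more pages of records
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for record in harvest(set_spec="medhist"):   # "medhist" is a made-up set name
    title = record.find(".//dc:title", NS)
    print(title.text if title is not None else "(no title)")
```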

Also covered will be an examination of the work undertaken by the participant hubs in exporting and importing the records, and an evaluation of the success of the procedure. The differences in the procedures employed for Sosig and Humbul will be considered. In addition, the ways in which these hubs choose to re-present the MedHist metadata for their constituent communities, and the associated intellectual effort involved, will be examined.

As boundaries between the humanities and other disciplines become more fluid, resource sharing becomes a more important issue. Although based on the experiences of a particular project, it is hoped that the points raised and discussed in the paper will be relevant to those working in the digital humanities with an interest in tools for facilitating interdisciplinary research, and to those within the community interested or involved with providing access to digital resources, resource discovery and metadata sharing.

103

 Tierney, James. University of Missouri-St. Louis

British Periodicals, 1600-1800: An Electronic Index

This presentation would consist of two parts: 1) an introductory account of the history, development, and present status of this ongoing attempt to produce an electronic subject index to pre-1800 British periodicals; and 2) a demonstration of the CD-ROM disk, containing the 69 periodical indexes currently entered into the database.

Just as modern periodicals reflect the entire scope of twentieth-century culture, so do pre-1800 British periodicals record the spectrum of seventeenth- and eighteenth-century culture. These early periodicals are gold mines of information relevant to that society's politics, religion, philosophy, the early progress of medicine and science, business and economic theory and practice, legal history, the state of education, social customs and practices, pastimes and entertainments, literature and the arts, and much else.

Yet, this immense body of information--hundreds of thousands of pages worth--remains an enormous, uncharted terrain. No tool exists to guide users through its contents. Existing catalogues like the ESTC can direct us to libraries holding copies of the periodicals, but they can't tell us what's in them. A few dozen periodicals enjoy hard-copy indexes, but this number pales in view of the 1,000+ periodicals extant from the period. In effect, every student of the period is faced with the distressing fact that there is no single, comprehensive index to the contents of these periodicals. Any scholar attempting to discover references to his/her particular interest in these periodicals must wade through the texts of the originals (or microfilm copies), page-by-page, a tedious task with regularly frustrating results because of the sheer bulk of material. It's no surprise, then, that scholars of early modern British cultural history would warmly welcome an index to the contents of the age's periodicals.

In the early 1930s, the eminent scholar/collector James M. Osborn, of Yale (whose rare book and manuscript collection would later become a major wing of the Beinecke Library at Yale) set out to remedy this regrettable situation. He hired a handful of young British scholars to index periodicals both in the British Library and in the Bodleian Library at Oxford, the two largest UK repositories of pre-1800 British periodicals. Osborn selected the list of titles to be indexed, instructed his indexers in the process, supplied preformatted 4" X 6" cards for recording data, and the project was launched.

The process required indexers to read every page of each periodical and, for each subject on a page, to fill out a preformatted card. In effect, Osborn's index was a true subject index, not merely a listing of the periodicals' tables of contents. A completed card recorded data for seven fields: subject, author, reprint data, title of periodical, date, page reference, and the library location of the copy indexed.
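Purely as an illustration of the seven-field card structure described above (and not the project's actual database schema), a record of this shape could be modelled as follows; the field names and sample values are invented.

```python
# Sketch of how one of Osborn's seven-field index cards might be represented
# digitally; the field labels and sample values below are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexCard:
    subject: str                  # the subject treated on the page
    author: Optional[str]         # author of the item, if known
    reprint_data: Optional[str]   # details of any reprinting
    periodical_title: str         # title of the periodical
    date: str                     # issue date as printed
    page_reference: str           # page(s) on which the subject appears
    library_location: str         # library holding the copy indexed

card = IndexCard(
    subject="smallpox inoculation",          # invented example content
    author=None,
    reprint_data=None,
    periodical_title="The Gentleman's Magazine",
    date="March 1752",
    page_reference="pp. 114-115",
    library_location="British Library",
)
print(card.subject, "-", card.periodical_title)
```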

When World War II broke out and the project was perforce concluded, the indexers had produced indexes for well over two hundred periodicals. In the process of shipping the cards to Yale, however, many of the indexes went to the bottom of the Atlantic, the victims of German torpedoes. For almost forty years, the remnants of Osborn's dream lay untouched in the basement of the Beinecke Library. In the late 1970s, Osborn offered the entire collection to this writer, and, shortly after Osborn's death the following year, the 80,000 index cards were transported to St. Louis, where, ever since, the collection has been lodged in heavy metal file-card cabinets in my home office.

For more than twenty years, as time and funding have allowed, I've been turning these cards into a kind of periodical index Jim Osborn would never have imagined. An initial inventory showed that the collection contained indexes to the full runs of 136 periodicals, indexes to another twenty periodicals that needed only minor additional indexing, and fragmentary indexes to another forty-nine titles. Although, from the outset, my intention had been to supplement Osborn's indexes by indexing all extant British periodicals from the age, practicality required that the first stage of the project be limited to the 156 complete and near-complete indexes.

At that point, two other phenomena making an impact on the academic world prompted major enhancements to both the content and the mode of future publication of the index. First, digitizing the index in electronic form and delivering the data to individual desktops had become a real possibility. Although the workload would be considerably increased by making the transition to electronic mode, an affirmative decision was inevitable.

Second, a new scholarly interest in publishing history had begun to give rise to the founding of new professional societies, journals, and academic programs concerned with every facet of publishing history. To meet these scholarly demands, Osborn's simple, seven-field records were elaborated into more comprehensive records that accounted for such additional data as the names of editors, the names and addresses of periodical publishers and printers, and bibliographical accounts of periodicals, including their days and frequency of publication and their prices.

Less dramatic but equally important has been the development of an elaborate processing system to verify the accuracy and thoroughness of Osborn's data. First, each Osborn card is reviewed to certify the spellings of personal names, places, events, etc., as well as to establish the accuracy of dates where applicable. Next, the cards for each individual periodical index are collated with the original text of that periodical (usually on microfilm). Where necessary, adjustments or additions are made, and issue numbers, dates, and pagination are verified. Also in this process, much of the data for those fields added to Osborn's original scheme (see above) is retrieved from the texts of the periodicals.

The project was begun on the University of Missouri mainframe computer in the early 1980s, was ported to a PC about 1990, and then passed through a succession of database programs until 1997, when it was lodged in Microsoft Access. In 1999, a user-friendly interface for searching the database was added, and the entire database (indexes to 69 periodicals in various stages of completion), together with a run-time version of Access, was written to a CD-ROM. Since then, the CD-ROM has been demonstrated at six international and national professional meetings, and in each case has been enthusiastically received. The project enjoys the guidance of an advisory board of thirteen distinguished scholars from eminent British and American universities and libraries. At present, through the support of a $30,000 grant from the Delmas Foundation (New York), the database is being heavily revised and will appear in its second version within the month.

 

105

McKnight, David. McGill University Libraries

Interactive and Interdisciplinary: Knowledge Management


Since 1997, the Digital Collections Program of the McGill University Libraries has produced twenty large and small scale scholarly digital collections of varying degrees of complexity. For the most part, the projects are based upon the Library's unique rare collections, ranging from architectural archives and literary and historical manuscripts to photographic and print collections. Central to the success of the McGill projects has been the pairing of a project curator's knowledge of a collection with the editorial and technical role the Digital Collections Program plays during the course of producing a digital project.

The production of these digital collections, from the moment of content selection to the final click of the mouse on the day of the project launch, is a complex process drawing upon several concurrent and complementary models: traditional scholarly monograph editorial practices, knowledge management, multimedia production, software and database design, and web page design.

The emerging practice of the McGill University Digital Collections Program is to ensure that its digital collections are set within a scholarly framework, so that each project meets rigorous intellectual standards, and to provide technical solutions that enable the end-user to interact with the collection in a meaningful way and from a number of perspectives. These range from the contextual apparatus created to aid interaction with the collection to the mechanisms designed to access and retrieve the primary documents. In addition, this combination is designed to meet the information needs of a range of users, from students to advanced researchers, across a number of disciplines.

For the sake of this paper, I will discuss the production process of three projects created by the Digital Collections Program:

The Moshe Safdie Hypermedia Archive (1998)
In Pursuit of Adventure: The Fur Trade in Canada and the North West Company (2000)
The McGill University Napoleon Collection (2002)

Common to each of these projects is an evolving design process and practice which takes into consideration a number of factors that have a direct bearing upon the success of each project from the perspective of a useable and sustainable digital scholarly resource. 

The balance of my paper will focus on the design process, which includes consideration of knowledge management, asset management, editorial factors, information design, database modeling, web design and resource allocation. Taken together, these represent the diverse elements involved in matching human, technical and fiscal resources to a particular project.

In my paper, I will briefly highlight the importance of each of the following components:

Knowledge Management:

Knowledge management refers to an analysis of the primary documents in relation to the overall objectives of the digital project with particular emphasis upon:

Selection
Organization of materials
Concept Mapping
Currency
Relevance
Audience
Dissemination

Asset Management:

Asset Management is the means by which the project team assesses the technical requirements and standards used in the production of images or texts of primary documents.

Primary Documents (images, texts, manuscripts, etc.)
Ancillary media (supporting images, animations, video, audio files)
Technical Standards
Archiving

Editorial Factors:

Editorial factors relate to the production of the supporting written content for the web site. 

Authorship
Content
Stylistic Guidelines
Accuracy

Information Design:

Information design pertains to the choice of the technical requirements and solutions needed to meet the project's overall objectives.

Platforms
Scripts
Standards
Programming Interactivity
Search Engines
Metadata 
Indexing
Maintenance
Updating

Data Modeling

Data modeling is the analysis of the data objects used in the project, establishing the relationships of the parts to the whole. This is the first step in database design and object programming.

Database design
Data Integrity
Performance

Web Page Design

Web page design is the process of designing the project user interface and its components.

Interface
Interactivity
Graphics 
Typography 
Scripting

Resource Allocation

This element provides the project team with a full understanding of the resources required to complete the project in a timely, cost-effective manner and with a high degree of quality assurance.

Funding
Hardware
Software
Staff

Conclusion

Unlike the creation of a print scholarly monograph, born-digital publications ideally draw upon both the knowledge of the project curator and the editorial, design and technical skills of the Digital Collections Program. With a greater understanding of the need for knowledge management techniques, solid information architecture, data modeling and web design, producers of digital collections can build sustainable, interactive and interdisciplinary digital content for a wide range of users, from those based locally to students and scholars around the world.

107

Barker, Emma. UK Data Archive
Corti, Louise. UK Data Archive 

Enhancing Access to Qualitative Data: Edwardians On-line

We are currently seeing a new culture emerging in the social sciences that encourages the re-use and secondary analysis of qualitative research data. Over the past eight years, many key datasets, including 'classic studies' dating back to the 1950s, have been acquired and catalogued by Qualidata and a prime aim of the service is to develop ways of increasing access to these resources. Although the number of digitally formatted and coded datasets is steadily increasing, the majority reside in paper and analogue audio formats. Digitization and on-line dissemination have been identified as requirements for the service and there has been progress in the provision of metadata, but to date there is no framework for preserving the content of qualitative datasets in electronic form. To address this problem, Qualidata has undertaken a 6 month pilot study to develop a methodology for increasing access to archived data, which is based upon a large collection of oral history interviews, known as "Family Life and Work Experience Before 1918". The main objective of this project is to produce a comprehensive data model in a format appropriate for interchange that will enable sophisticated on-line searching and information retrieval, and which is potentially applicable to other qualitative datasets.

In this paper we present our initial results, showing how, in addition to the structural content of the interviews, the analytical scheme used in the original analysis might be preserved electronically as a device for enhancing searching. We demonstrate how existing XML applications, such as the TEI and DDI, provide appropriate standards for marking up the content of interview data in a format that can be easily delivered on the web, but we also identify some limitations for their use with our data model. We show how a stand-off annotation approach in XML, using a set of TEI-inspired DTDs, is a straightforward solution which meets our requirements and which may be readily extended and applied to other datasets. Converting resources to machine-readable form in a fast and efficient manner is still a challenge for resource creators. Therefore, we also describe some tools and techniques that were found to be appropriate for the semi-automated conversion of typewritten texts. These are shown to have a general utility for future projects, including the conversion of datasets encoded in qualitative software packages to a standard XML format.
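The project's own DTDs are not reproduced here; the following is a minimal, hypothetical sketch of the stand-off idea – analytic codes held apart from the transcript and anchored to it by element ids – using invented element names and content.

```python
# Illustrative sketch of stand-off annotation: analytic codes are kept in a
# separate XML layer that points into the base transcript by element id.
# Element and attribute names are invented for illustration and are not the
# project's actual TEI-inspired DTDs.
import xml.etree.ElementTree as ET

# A fragment of a base transcript: each utterance carries a unique id.
transcript = ET.fromstring(
    "<transcript>"
    "<u id='u1' who='interviewer'>Where did your father work?</u>"
    "<u id='u2' who='respondent'>He was a riveter in the shipyard.</u>"
    "</transcript>"
)

# Stand-off layer: an analytical code anchored to a target utterance id.
standoff = ET.Element("annotations")
ET.SubElement(standoff, "code", target="u2", scheme="original-analysis",
              value="father's occupation")

# Resolving an annotation back to the text it describes.
by_id = {u.get("id"): u for u in transcript.iter("u")}
for code in standoff.iter("code"):
    print(code.get("value"), "->", by_id[code.get("target")].text)
```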

108

Lee, Edmund. English Heritage

Research vs Resource: competing or co-operating?

In the ideal world of digital humanities research it would be possible to see a ‘virtuous circle’ in operation as existing digital resources are identified, retrieved and drawn together with new research to create, in turn, new digital resources advancing the understanding and communication of knowledge.

In effect we can visualise a digital equivalent of the longstanding scientific hypothetico-deductive cycle. This starts with hypothesis development, followed by observation and data collection and then analysis of the information, leading on to re-examination of the original hypothesis and its further elaboration. ‘Paper-based’ research procedures, including for example peer review, publication in refereed journals and standard approaches to the citation of sources, have long been in place to support this traditional approach. These approaches can be greatly enhanced by electronic equivalents of these procedures – online journals, email discussion groups etc. But modern research now produces a raft of digital information artefacts that do not have obvious paper equivalents. Does the research community understand the requirements of resource discovery in this new environment? Digital technologies allow us to publish more than ever before, but are specialist reports contributing to the advancement of online dissemination of knowledge and understanding, or do they gather the digital equivalent of dust on carefully archived and catalogued, but not digitally accessible, filestores, CDs and DVDs? Likewise, do the systems used for resource discovery support the necessary intellectual rigour required for analysis of results? My proposal is that there is an increasing divide between the approaches of ‘research’ on the one hand and ‘resource discovery’ on the other, a divide which needs to be bridged.

This paper draws on examples from the management of data in the ‘historic environment’ sector – the nation’s rich heritage of archaeological sites and historic buildings. These are based on the author’s experience as Data Standards Supervisor with the English Heritage Data Services Unit, and as Forum Co-ordinator to the Forum on Information Standards in Heritage (FISH).

The paper will identify the strategic need for humanities researchers and those that provide access to their materials to develop a greater awareness of each other’s requirements. This will be necessary to derive the maximum benefit from publicly funded research on the one hand, and present the best available understanding through publicly funded resource discovery networks on the other. To illustrate the problems, the paper will present a morphological divide between these two communities.

With specific reference to U.K. research into the ‘historic environment’ the paper will examine differences between the information management approaches under the following headings:

  1. Scope of information collection
  2. Key information technologies employed
  3. Data modelling approaches
  4. Semantic / taxonomy systems
  5. Support for different query types
  6. Organisational context
  7. Political imperatives

The paper will then move on to discuss the role of the development of common ‘data content’ or metadata standards, and of semantic or terminology standards, in bridging this divide. Content and terminology standards will be proposed as a necessary adjunct to the purely technical standards for interoperability that are emerging. The latter include Z39.50 profiles and XML document type definitions. These emergent technologies provide the ‘medium’ for the movement of the information, but a further level of sector-specific content and terminology standards is required to convey the ‘context’ or meaning in a useful way to the end user.

Again drawing on the U.K. historic environment sector, the paper will describe current initiatives that are addressing these issues, in particular the work of the English Heritage Data Services Unit (DSU), which focuses on these areas. DSU work in hand includes:

The development of effective communication networks both within the heritage sector and with closely related government bodies, such as the Ordnance Survey;

The identification of training needs and presentation of training in information systems for EH staff;

Auditing and advisory services to heritage data managers;

The development and implementation of data standards for the historic environment sector. These include MIDAS, a data content standard for historic monuments, buildings, archaeological sites and related information resources, and INSCRIPTION, a framework of controlled terminologies for indexing and retrieval of heritage data. Development of these resources requires input from subject specialists as well as data managers. The role of the forum in developing the necessary network of specialities will be presented, and the importance of the involvement of university-based researchers will be stressed.

It is hoped that there will be discussion time available to compare the experience of the historic environment research sector with other humanities – all comments welcome!

111

Hartland-Fox, Rebecca. University of Central England

Evaluating Your Digital Information Services

Section 1: Introduction

Traditionally, libraries have been print-based. Increasingly, more services can be, and are being, delivered digitally. Students and academics can retrieve full texts and a host of other services directly at the desktop.

Higher education institutions are taking advantage of new developments for many reasons. These include collaboration and co-operation with other institutions and organisations, and improved access to a wide range of collections and resources (of particular benefit to distance learners and part-time students). Often, electronic access enables better search facilities.

One of the main influences on the uptake of digital information developments has been the funding made available. This has enabled institutions to plan, research and implement specific projects. For example, the eLib phases made £15 million available over three years. "eLib phase 3 has had an important impact on HE libraries by accelerating the uptake of new technologies in a practical, user services oriented way" (ESYS 2001). Overall co-ordination of specific projects has helped to avoid duplication of research, and the disseminated research has informed researchers and practitioners alike. The digital services are increasingly an integral part of institutional networked services.

Section 2: What’s in a name?

The terminology is confusing. The terms "digital library", "digital (or electronic) information services" and "electronic library" are used interchangeably. Some theorists also refer to "hybrid libraries", which "aim to integrate new technologies, electronic products and services already in libraries with the traditional functions of a library" (JISC phase 3 2000).

Realistically, all these terms point to the same type of development, and it is the incorporation of new developments into the existing and successful library model to which the "hybrid library" refers. This is often the aim of many institutions – using technology to enhance and improve existing services.

In the commercial sector, and increasingly in education, the term "digital asset management" has been used. "First it was document management. The next year it became knowledge management. Now it’s called content management. Next year who knows?" (Doering 2001). Digital asset management essentially refers to the technologies used to locate and retrieve digital information content. Therefore, as the systems used in HE are becoming more sophisticated, the terminology may change to reflect the business sector.

However, what’s in a name? For the purposes of this paper, I seek to address the issue of evaluation in the context of electronic information services in general. These include one or more online services and electronic resources, such as short-loan collections, exam papers, course materials, theses, student projects and book chapters. These can come in a variety of formats, including text-based, multimedia and audio-visual files.

Digital information services can include any combination of the media cited above, but they are more than a series of hypertext links or electronic journal services.

There will be more information available on current work on digital information services in HE after the responses from a survey which we are conducting next month. However, two projects with which we are working closely are UCEEL (at UCE) and UDEL (at the University of Derby). These are two useful working examples of an HEFCE-funded project (UCEEL) and a project entirely self-funded by the institution (UDEL). I intend to draw on these projects to provide examples of evaluation techniques and, more specifically, to highlight aspects of good practice. In addition, other projects around the U.K. and abroad will inform our research.

Colleagues across the sector are making progress in sharing expertise in digital information – for example, the sharing of content through the FAIR Programme (Focus on Access to Institutional Resources) – and we aim to harness this sharing of good practice across the sector.

Section 3: Evaluation and digital information services

Why evaluate?

The purpose of any evaluation is to establish whether the systems, activities, personnel or other resources are effective. Evaluation is analytical and objective, and is used for one of three reasons: to justify (for example, to show evidence of cost-effectiveness or useful features of the system), to improve (to highlight areas for improvement and suggest possible means), or to condemn (for example, to show aspects which are not effective or useful).

In the context of digital information services, evaluation is often conducted to enable improvements. Many systems are relatively new (last decade) and little work has been done on effecting change. The focus in the early stages was on content and making materials available.

Evaluation work being conducted in HE: good practice

EVALUED is an HEFCE-funded project based at the University of Central England. It has been set up to develop a transferable model for e-library evaluation in higher education. The project commenced in December 2001 and will complete in February 2004.

The project aims to contribute to the effective management and deployment of resources in the development of electronic library initiatives in the higher education sector. More specifically, we intend to provide resources to support library managers and other senior strategic managers in the higher education sector in the evaluation and planning of electronic library initiatives, and we aim to provide training in the evaluation of electronic library developments and awareness-raising activities to HE institutions and beyond.

Useful and informative examples of good practice and specific types of research will be indicated by responses from our forthcoming survey. We hope that this will illuminate our research and enable us to focus on useful dissemination of good practice. However, examples of specific research from the USA may parallel and even supplement our findings. Some are concerned with aspects of usability – for example, using real users to help identify where potential problem areas exist (such as the work on the Alexandria Digital Library, Hill et al 1997).

The crucial aspect to evaluation is planning and selecting the appropriate technique. We are currently identifying potential aspects for evaluation and effective measures and techniques to support these. Some examples will be implemented in partnership with UCEEL and UDEL and it is hoped that the results can be outlined. The elements of good practice will be drawn from case studies of projects and services which are currently being undertaken at institutions across the UK. 

Section 4: References
Doering, David (2001). Defining the DAM thing: how digital asset management works. Emedia Magazine 14(8): 28-33.

ESYS (2001). Summative evaluation of phase 3 of the eLib initiative: final report. http://www.jisc.ac.uk/progs/index.html. Accessed 8.5.01.

Hill, L., Dolin, R., Frew, J., Kemp, R.B., Larsgaard, M., Montello, D.R., Rae, M.-A. and Simpson, J. (1997). User evaluation: summary of the methodologies and results for the Alexandria Digital Library. Proceedings of the American Society for Information Science, 225-243.

JISC phase 3 eLib. http://www.jisc.ac.uk/progs/index.html#elib. Accessed 4.2.02.

UCEEL website. http://diglib.uce.ac.uk/webgate/dlib/templates/about.asp. Accessed 4.2.02.

114

Anderson, Ian. University of Glasgow

Information Seeking Behaviours in the Digital Age: UK Historians and the Search for Primary Sources

This paper presents the results of the first phase of a unique international project: Primarily History: Historians and the Search for Primary Material. This project is exploring how historians locate primary research materials in the digital age and what they are teaching their students about finding research materials. 

To date, there is very little evidence regarding the use and efficacy of archival electronic access tools. Indeed, there has never even been a study of the use and efficacy of finding aids in paper form, or of how researchers use them. While we do not yet know a great deal about how researchers navigate archives and their resources, we know even less about how they search for primary research materials and locate repositories, collections, and finding guides since the advent of the Web and developments such as EAD (Encoded Archival Description), an SGML (Standard Generalized Markup Language) DTD (Document Type Definition). As those in the cultural heritage community and beyond seek to develop further frameworks for enhancing and integrating access, such as OAIS (Open Archival Information System) and METS (Metadata Encoding and Transmission Standard), there is a pressing need to take into account users' information seeking behaviours. If in the future, as Conway suggests, we can or will only preserve what is used in

Although concentrating on historians the project's sampling techniques, methodology and preliminary findings raise questions for all humanists who seek digital or analogue material through online, print or informal methods and are responsible for teaching and mentoring students to do likewise. 

Results are presented from a survey of 800 historians researching and teaching history in UK higher education. Dr. Helen Tibbo of the School of Information and Library Science at the University of North Carolina, Chapel Hill is conducting a parallel survey of historians in the US. This paper presents results from the UK strand of the study. Historians are just one, and often not the largest, group who use archives and repositories, but in many cases they are the most respected. Many archivists and curators see scholars as their most important customers because of their published research and because they are training the next generation of researchers. Thus, historians are the focus of this study, but literary, music, theatre, film and television scholars, genealogists, or secondary students might just as easily be, and probably should be in future studies.

In recent years there have been a limited number of studies of humanists’ use of technology [6,10,12,13]. Wiberley and Jones [14,15,16] have studied a group of humanists and their information technology use over time, and Andersen [1] has looked specifically at how historians use technologies such as websites for their teaching. Some research has studied the information seeking behaviour of humanists [30], but few studies have explored how historians look for materials [11]. No one has yet explored how historians look for archival collections since the advent of electronic finding aids (in particular the EAD) [5,8,9].

It is axiomatic that all systems should be built around users rather than making users bend to the systems. However, this is predicated upon knowing a good deal about users, and this can only come from conducting extensive and rigorous user evaluations rather than relying upon anecdotal evidence and gut feelings about clientele. Archivists do not have a great track record in the area of user studies, having gathered little systematic data about users and even less about their information seeking behaviours [2,7]. There have, however, been calls for more archival use and user studies [3,4].

Initial conclusions from this study indicate the need to provide multiple pathways of access to historical research materials that match the diverse strategies that historians adopt. These include the provision of paper-based aids as well as enriched electronic forms. If the approach of providers of electronic and online finding aids can be characterised as 'build it and they will come', then it is evident that some historians have come but been disappointed. In this regard the need for both user studies and user education is evident.

Given the widespread, if variable, use of electronic and online finding aids by historians it is essential that archivists and other information providers understand how users go about locating information prior to spending or allocating resources on electronic finding aids, metadata harvesting, digitisation projects and programmes and digital library designs. 

1. Andersen, D.L. Academic historians, electronic information access technologies, and the World Wide Web: A longitudinal study of factors affecting use and barriers to that use. The Journal of the Association for History and Computing, (June 1998). 
2. Collins, K. Providing subject access to images: A study of user queries. American Archivist 61 1 (Spring 1998). 
3. Conway, P. L. Facts and frameworks: An approach to studying the users of archives. American Archivist 49 (Fall 1986).
4. Dearstyne, B. W. What is the use of archives? A challenge for the profession. American Archivist 50 (Winter 1987).
5. Kiesling, K. EAD as an archival descriptive standard. American Archivist 60 3 (Summer 1997). 
6. Massey-Burzio, V. The rush to technology: A view from the humanists. Library Trends 47 4 (Spring 1999).
7. Miller, F. Use, appraisal, and research: A case study of social history. American Archivist 49 (Fall 1986).
8. Pitti, D.V. Encoded archival description: The development of an encoding standard for archival finding aids. American Archivist 60 3 (Summer 1997).
9. Pitti, D.V. Encoded archival description. An introduction and overview. D-Lib Magazine 5 11 (November 1999).
10. Tibbo, H.R. Abstracting, Information Retrieval and the Humanities: Providing Access to Historical Literature. (ACRL Publications in Librarianship no. 48), American Library Association, Chicago, 1994.
11. Tibbo, H.R. The EPIC struggle: Subject retrieval from large bibliographic databases. American Archivist 57 (Spring 1994). 
12. Tibbo, H.R. Indexing in the humanities. Journal of the American Society for Information Science 45 (September 1994).
13. Tibbo, H. R. Information systems, services, and technologies for the humanities. In: Williams, M., ed. Annual Review of Information Science and Technology, 26 Elsevier, Amsterdam, The Netherlands, 1991.
14. Wiberley, S. Habits of humanists: Scholarly behavior and new information technologies. Library Hi Tech 9 1 (1991).
15. Wiberley, S. and Jones, W. G. Humanists revisited: A longitudinal look at the adoption of information technology. College and Research Libraries 55 (November 1994).
16. Wiberley, S. and Jones, W. G. Time and technology: A decade-long look at humanists’ use of electronic information technology. College and Research Libraries 61 5 (September 2000).

116

Bowen, David. Audata Ltd.
Hosker, Rachel. Audata Ltd.

Digital Government and Cultural Memory

Government Records are an important resource for research in the humanities. To use them for research, we must get the information from them and then interpret this information. Technology is increasingly important for both obtaining and interpreting these records. The rapid changes of technology have influenced the way we store and access our records. 

An increasing proportion of records are created and stored electronically. Access to some of these records has already been lost because we no longer have the hardware or software to view the records. Technology has moved on too fast for us. If we are to preserve access to the digital records which we are creating in ever increasing numbers we must plan for preservation as a conscious activity. This paper will discuss the human factors and technical requirements for digital preservation. 

Let us look first at some aspects of digital records: how do people use them, and what do people think about them? Digital records are easily exchanged and easily altered. You can send an e-mail to a dozen people with the click of a mouse, and each of them can pass it on to another dozen as easily. These recipients can edit the e-mail and send it on again, with their own changes added. This flexibility leads some people to think of digital records as more disposable and less valuable than paper or other physical records. 

However, the Government of the United Kingdom (led by the office of the E-Envoy) is committed to having all government processes available electronically by 2005. Other governments have similar commitments: the Government of the Netherlands intends to have 25% of its transactions in digital form by the end of 2002. These government initiatives will mean that more and more government records are born-digital. 

Digital records also have legal weight. Both companies and individuals in the UK have been convicted in court on the basis of e-mails introduced into evidence. The European Directive says that the courts and laws cannot discriminate against any contract or signature just because they are in a digital format. Therefore the growing importance of digital records is assured in law. 

Other groups are concerned for our cultural memory. Every year on the 24th of October (the date 10/24 echoing the 1,024 bytes in a kilobyte) an information service is held at the Daioh Temple of Rinzai Zen Buddhism in Kyoto, Japan. They believe that information and knowledge are being lost because electronic records are not being preserved. They see this as a cultural issue rather than a technical one. "There are many 'living' documents and software packages that are thoughtlessly discarded or erased." It is this thoughtlessness that has drawn the concern and attention of Head Priest Shokyu Ishiko.

This thoughtlessness is the key human factor in digital preservation. People who are creating records must become aware that they are creating records; they must learn how to do this reliably. People responsible for managing records must learn what is required to preserve digital records. They must be given budgets, staff, and training to achieve the levels of digital preservation necessary to preserve our cultural memory.

Since many digital records are damaged from the moment they are created, that is the first point of focus for digital preservation. Two examples will show how important it is to educate people in the record-keeping aspects of their work. 

Have you ever received an e-mail which said "my comments are below in red" (or "in bold")? Often the received e-mail is uniform in format, with no coloured type and no formatted type. That is not surprising, because the formatting is not a fixed property of the e-mail; it is a property of the combination of the e-mail and the e-mail viewer. People who don't understand this are sending e-mails which are not correctly viewed (or understood) by today's recipient, let alone the future user.

Many people (and companies) have letter templates which insert the date automatically. This was acceptable when the computer was used as a complicated typewriter and the record was always printed to paper. However, suppose you send a letter with an automated date as an e-mail attachment. You send the letter with "today's date": 8 September 2002. The person who receives your letter opens it, and it has "today's date" on it: 9 September 2002. They forward it to a colleague, who returns from a conference and opens the e-mail. The letter has "today's date" on it: 14 September 2002. In another twenty years, your papers are archived, and the letter has "today's date" on it: 25 October 2022. From its inception, this record has been badly compromised: it has never had a reliable date.
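As a small illustration of the contrast described above (not drawn from the paper itself), the first class below mimics a template whose date is re-evaluated every time the letter is rendered, while the second fixes the date at the moment of creation.

```python
# Toy contrast between a date re-evaluated at every rendering (the failure
# described above) and one captured once when the record is made.
from datetime import date

class AutoDateLetter:
    """Mimics a template whose date field shows 'today' at every opening."""
    def render(self):
        return f"Letter dated {date.today().isoformat()}"   # changes every time

class FixedDateLetter:
    """Captures the date once, at the moment of creation."""
    def __init__(self):
        self.created = date.today()
    def render(self):
        return f"Letter dated {self.created.isoformat()}"   # stable forever

letter = FixedDateLetter()
print(letter.render())   # prints the same date no matter when it is reopened
```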

Within government, questions will also be raised as to which digital records to preserve. These are policy, and personal, questions, with human (not technical) responses:

  • How will government decide which digital records to preserve?
  • Are these records needed by government itself?
  • Will these records be needed by citizens and why?
  • How will records be appraised and preserved if they serve more than one function and department?
  • How much will this cost?

Technical methods for preserving digital records can be classified as: 

  • Migration
  • Emulation
  • Open standards
  • Technology preservation

Practical results from each of these methods will be presented, along with information about their cost and technical complexity. 

We can preserve our born-digital memory, but only if we begin now.

119

Duffy, Celia. Royal Scottish Academy of Music and Drama
Barrett, Stephen. Royal Scottish Academy of Music and Drama
Marshalsay, Karen. Royal Scottish Academy of Music and Drama

Developing HOTBED - An Innovative Resource For Learning

In this paper we will describe the aims and preliminary outcomes of the Royal Scottish Academy of Music and Drama’s HOTBED project, which is funded under the JISC’s DNER initiative. The purpose of the project is to create a body of networked digitised sound resources, to investigate and build useful software tools for manipulating these audio materials (extending their mode of use beyond simple playback of sound materials) and to evaluate implementation in very specific learning and teaching contexts, the RSAMD’s BA in Scottish Music course and courses in Scottish ethnology at our partner institution the University of Edinburgh’s Department of Celtic and Scottish Studies. The evaluation will cover a variety of perspectives (take-up, nature and pattern of usage of resources, added value to learning experience, effects on teaching approaches). In particular we need to learn more about the specific needs of performa

A subsidiary goal of the paper is to share with the DRH community some of the particular issues that have arisen over the first phase of work – both positive and negative – which may have a broader relevance beyond the rather narrow field in which HOTBED operates. Indicative issues to be covered are working within and negotiating around the culture and infrastructural realities of a small institution, decisions on metadata creation, determining user needs and project management with a very small, multi-tasking team.

Relationship to the themes of the conference

Time-based media in a performing arts discipline (performing arts being relatively under-represented at DRH conferences);
Application of digital resources to teaching.

Original or innovative methods

On the implementation side, the innovative aspects of the project lie in the performer-centred approach. In addition, the HOTBED context allows students to interact with the resources in a number of different modes. Students and staff in the target user groups have a particularly intense relationship with archival resources: they not only access resources for study and performance purposes, but conduct their own fieldwork, making recordings which they then document for archival purposes. Accordingly, the project will not only digitise existing sound holdings and evaluate their use, but will also introduce digital resource creation and archiving into the curriculum and report on the results.

On the technical side, one of the innovative features of the system is the inclusion in the HOTBED audio streaming environment of newly built or adapted audio tools, such as a facility for slowing down a track while retaining its pitch (useful, for example, for performers carrying out close analysis or transcription).
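HOTBED's own tools are not described in detail here; as a present-day illustration of pitch-preserving slow-down, a library such as librosa can be used along the following lines (the file name is hypothetical, and this is not the project's implementation).

```python
# Present-day illustration of pitch-preserving slow-down of the kind the
# abstract describes; uses librosa and soundfile with a hypothetical file name.
import librosa
import soundfile as sf

y, sr = librosa.load("fieldwork_recording.wav", sr=None)  # hypothetical file
slowed = librosa.effects.time_stretch(y, rate=0.5)        # half speed, same pitch
sf.write("fieldwork_recording_half_speed.wav", slowed, sr)
```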

Value of the work to the broad humanities community 

The particular resources being digitised by the project are of a specialist nature and of most interest to a set of minority subjects. However, this project covers areas of wide relevance to the DRH and broader arts and humanities community, for example, creation and management of networked sound resources, access to digitised archival material (especially in the area of music and ethnology), specialist user needs in learning and teaching, the role of the smaller institution in the context of distributed resource provision and the role of networked provision in handing on cultural heritage.

We will also report on some obstacles encountered by the project that will be recognizable to the wider community. The nature of the sound materials in HOTBED (especially the archive fieldwork recordings) and the particular requirements of our users have made metadata creation problematic. Decisions on the degree of compromise on metadata description will be an issue familiar to many other digitization initiatives and will be part of the scope of the evaluation of HOTBED. Another familiar challenge is working with a standard scheme like Dublin Core alongside the non-standardised indexing schemes that have evolved for ethnographic fieldwork material.

Relationship to the relevant work in the field 

As far as we are aware, relatively little work has been done on implementing networked sound resources in academic contexts and even less on the effects on pedagogy. Notable exceptions include the pioneering work of the University of Indiana at Bloomington’s Variations project and, in the UK, the University of Surrey’s e-Lib Patron project. The current project extends this research in that it deals with archival field recordings, compositions and performances of traditional music, rather than commercial recordings of ‘art’ music, and is aimed primarily at performance specialists, rather than the more academic disciplines within music such as musicology or music theory. The largest collection of freely available digitized sound collections of traditional music is served by the Library of Congress under its American Memory scheme; although this collection is extremely useful to our target audience (and was used as an exemplar for User Needs Analysis work on HOTBED), its focus is on improved access to the Library's vast collections. HOTBED's focus is on implementation in specific learning and teaching contexts.

120

Garrod, Peter. University of London Computer Centre

The Schools' Census and "Digital Archaeology"

"Digital archaeology" was used by Ross and Gow as the title of their 1999 study of data recovery problems and techniques. One of the lessons drawn by their report is the dependence of digital resources on supporting technical and contextual information. This information has to be preserved if electronic records are to remain useable over time: in the authors' words, "data without relevant contextual documentation has limited value". They cite the example of the Kaderdatenspeicher, a database of party functionaries of the former German Democratic Republic which survived German re-unification, but without much of the information necessary to interpret the data. The information had to be recovered by drawing on a variety of paper and electronic sources, through the use of specialised software and by consulting former staff of the GDR archives.(1) Although justified by the historical value of the data, the reconstruction process (in the words of another writer) was "painstaking, time-consuming - and exp

"The Schools' Census and 'Digital Archaeology'" will outline how similar techniques have been employed at the UK National Digital Archive of Datasets. NDAD is operated by the University of London Computer Centre on behalf of the Public Record Office, the UK national archives, and preserves electronic data created by UK government departments and agencies. The paper potentially addresses two themes of DRH 2002: "digital libraries, archives and museums" and "other social sciences where these overlap significantly with the humanities". It will show how it is possible - given sufficient time and ingenuity - to recover information about badly documented data and thereby give meaning to data which might otherwise be difficult to interpret.

The Schools' Census is an annual survey of schools in England (formerly also including schools in Wales) which has been conducted by the UK Department for Education and Skills and its predecessors since circa 1945. Schools complete questionnaires in January of each year recording information about their pupils, teachers, classes and courses of study. Data on individual schools survives in digital form from 1975 onwards. To date, 60 datasets covering the years 1975-1989 and 1993 have been deposited in NDAD. The paper will focus on the issues which NDAD encountered when dealing with the earliest datasets: those covering 1975-1979. 

The chief problem was the lack of adequate supporting documentation. The only documentation which the department could supply was a set of data dictionary files which were transferred with the datasets. These explained the meanings of encoded values and provided field descriptions which indicated the functions of fields. However, the field descriptions were highly problematic. A number had been truncated at around 60 characters, resulting in the loss of crucial information. The department believed that this had been caused by the migration of the data between formats in the early 1990s. Another problem was that many fields had identical descriptions but contained different data. It was also common for the field descriptions to contain abbreviations and terminology for which there were no obvious explanations.

These problems meant that much of the Schools' Census data from 1975-1979 was potentially unusable. NDAD felt that the Schools' Census was a potentially important resource for historians and social scientists, and that it was worth attempting to reconstruct information about the data by drawing on external sources. Our work involved three elements:

1. We located examples of completed Schools' Census forms held in a local archive office and in individual schools. A systematic comparison of the information on the forms with the records for the schools in question in the datasets enabled us to clarify the functions of many fields, by relating them to the original survey questions.

2. We consulted the education department's annual volumes of published results for the Schools' Census. These provided useful background information, and gave us a way of validating the datasets by checking totals which we had derived from the data against the official statistics (a check of the kind sketched after this list).

3. We conducted our own searches and analysis of the data - for example, to find out if particular fields only related to certain types of schools.
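A minimal sketch of the cross-check described in point 2 might look as follows; the column names, figures and published totals are invented purely for illustration and are not drawn from the actual datasets.

```python
# Illustrative cross-check: totals derived from the dataset are compared
# against officially published figures. All values below are invented.
import pandas as pd

schools = pd.DataFrame({
    "school_type": ["primary", "primary", "secondary", "secondary"],
    "pupils_on_roll": [240, 310, 890, 1020],
})

# Totals derived from the dataset itself.
derived = schools.groupby("school_type")["pupils_on_roll"].sum()

# Totals as printed in the published annual volumes (hypothetical values).
published = pd.Series({"primary": 550, "secondary": 1910})

comparison = pd.DataFrame({"derived": derived, "published": published})
comparison["discrepancy"] = comparison["derived"] - comparison["published"]
print(comparison)   # non-zero discrepancies would flag fields needing review
```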

The full paper will describe this process of "digital archaeology" in greater detail. In so doing it will address the need which Ross and Gow have highlighted for more case studies about data loss and data recovery.(3) The paper will emphasise that while NDAD has significantly improved the value of the Schools' Census data as a resource, this has come at a high cost in terms of staff time. It will illustrate to data creators, custodians and users that it is much better to ensure that supporting documentation is created in the first place and preserved alongside data, than to attempt to recover this information after the fact.

References:

(1) Seamus Ross and Ann Gow, "Digital Archaeology: Rescuing Neglected and Damaged Data Resources: A JISC/NPO Study within the Electronic Libraries (eLib) Programme on the Preservation of Electronic Materials" (London, 1999), pp. 41-42.

(2) Mary Feeney, ed., "Digital Culture: Maximising the Nation's Investment: A Synthesis of JISC/NPO Studies on the Preservation of Electronic Materials" (London, 1999), p. 70.

(3) Ross and Gow, "Digital Archaeology", p. 44.

122

Turner, Chris. University College London
Hockey, Susan. University College London
Sexton, Anna. University College London
Yeo, Geoffrey. University College London

TEI, EAD and Integrated User Access to Archives: Towards a Generic Toolset

This paper will consider a range of issues surrounding the development of a generic toolset to bring together Encoded Archival Description (EAD) and the Text Encoding Initiative (TEI) to link online archival finding-aids to digitised transcripts and images of paper-based archival materials. At present, these two XML encoding systems are often used independently of each other and no generalised environment exists for linking them. Where links do occur in online archival finding-aids, they normally point to digitised images or transcriptions without provision for any analysis or manipulation of the source.

Some prototypes for using EAD alongside other SGML/XML-based systems have been developed in the USA, notably by the William Elliot Griffis project at Rutgers University http://www.ceth.rutgers.edu/projects/griffis/project.htm and by the Online Archive of California http://www.bampfa.berkeley.edu/moac/ . However, the main focus of these projects has been on the provision of access to specific materials, rather than the development of a generic toolset for use in connection with a wide range of archives. The LEADERS (Linking EAD to Electronically Retrievable Sources) Project team based in the School of Library, Archive and Information Studies, University College London (UCL) is working towards the provision of a generic and unified Internet-based interface for finding items within archival collections and for searching and manipulating electronic representations of those items. The LEADERS Project is funded by the Arts and Humanities Research Board. Samples from the University College London Archive and the George Orwell Papers held at UCL are being used as test-beds for research.

The paper will focus particularly on three aspects of this work:

  1. The typology of archive users and the identification of user needs and user behaviour. We are developing a categorisation model which takes account of previous work on user needs, both in the UK and internationally, but seeks to analyse the user marketplace more fully than any previously published study. It is intended that this model should provide a basis for selecting a sample of users who can be asked to supply more detailed information about their needs when using online archival finding-aids and representations of original paper records, as well as feedback on the progress of our work. By seeking feedback from a representative cross-section of users we plan to find a basis for matching the design of an interface based on the TEI and EAD to the needs of its potential users.
  2. The use of TEI markup for textual representation of archival documents. The TEI was designed for markup of primary sources in the humanities and has been widely used for literary texts; it has also been used for dictionaries, for manuscripts, and in language studies and digital library projects. The paper will consider the issues that arise from seeking to use the TEI to cover the content and structure of a range of commonly-occurring archival documents. These might include administrative or operational records created by government agencies, universities, businesses or other corporate bodies as well as personal diaries, letters and notebooks. Topics discussed will include the presence or absence of formal rules or models for document structure and the range of possible relationships between one document and another within archival collections.
  3. The interface between the TEI and EAD. While the development of EAD was based on some previous work by the designers of the TEI, it was developed primarily for use as a ‘stand-alone’ tool. The integration of the two systems raises questions about the nature of appropriate linkages between them, and the role of the TEI Header when used in conjunction with the more extensive metadata framework provided by EAD. The paper will explore these issues in the light of the ongoing work by the UCL team.
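By way of illustration only (this is not the LEADERS toolset itself), one plausible form of the linkage raised in point 3 is to resolve a digital object pointer in an EAD component to the TEI transcript it names; the sample XML and file name below are invented.

```python
# Minimal sketch of linking an EAD finding-aid component to a TEI transcript
# via a <dao> pointer. The fragment and file name are invented; a full toolset
# would parse and render the referenced TEI file rather than just report it.
import xml.etree.ElementTree as ET

ead_fragment = ET.fromstring(
    "<c level='item'>"
    "<did>"
    "<unittitle>Letter, Orwell to his publisher</unittitle>"
    "<dao href='orwell_letter_tei.xml'/>"
    "</did>"
    "</c>"
)

def linked_transcripts(component):
    """Yield (title, href) pairs for every digital object pointer in a component."""
    for did in component.iter("did"):
        title = did.findtext("unittitle")
        for dao in did.iter("dao"):
            yield title, dao.get("href")

for title, href in linked_transcripts(ead_fragment):
    print(title, "->", href)
```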

The topics to be discussed in this paper are central to a number of conference themes, notably the provision and management of access to archival sources, and the analysis and modelling of information resources used in humanities research. The paper will be of particular interest to scholars seeking to use electronic resources, to members of the humanities computing community and to archivists and other curators seeking to improve access to their holdings through the use of digital technologies.

While the work to be presented has not yet reached any final conclusions, we are exploring a number of aspects both of textual markup and of remote access provision that (as far as we are aware) have not been fully investigated in previous work in this field. The successful conclusion of the project will enable archivists to present the richness of the materials in their custody in ways that have hitherto been unavailable, and this in turn will provide scholars with new means of access to research materials and new ways of analysing and exploiting them.

125

Corrigan, Karen. Newcastle University
Beal, Joan. University of Sheffield
Moisl, Hermann. University of Newcastle
Allen, Will and Row, Charley. University of Newcastle

The Analysis and Visualization of a Socio-cultural Resource: Topographic Mapping of the Newcastle Electronic Corpus of Tyneside English

This paper addresses the related issues of: (i) the statistical analysis of linguistic corpora and (ii) intuitively accessible representation of data analysis results, with particular reference to the Newcastle Electronic Corpus of Tyneside English (NECTE) resource enhancement project. The paper relates to various DRH conference themes. For example, a crucial issue in the NECTE project has been the provision and management of access to the NECTE digital archive resource so as to accommodate a variety of research goals. Additionally, the presentation will demonstrate the topographic mapping technique which, it will be argued, aids knowledge representation in general, and visualization in particular. Topographic mapping can be innovatively applied to digital resources of this kind in order to highlight regularities in the interrelationships of certain features in the data. The presentation will focus on three main themes:
 

1. The NECTE project
The NECTE project is based on two corpora of recorded speech collected in the late 1960s and in 1994 in the Tyneside (UK) area. Its aim is to combine the two into a single digital archive to be made available in a variety of formats – digitized sound, phonetic transcription, standard orthographic transcription, and various levels of tagged text, all of which will be aligned – so as to accommodate a range of end-users in the humanities and social sciences.

2. Topographic mapping and its application to NECTE

a) Topographic mapping
This theme explains the nature of topographic mapping and the motivation for its use in this innovative context. The aim of topographic mapping is to represent relationships among data items of arbitrary dimensionality n as relative distance in some m-dimensional space, where m < n. It is used in applications where there are a large number of high-dimensional data items, and where the interrelationships of the dimensions are not immediately obvious. The data items are typically represented as a set of length-n real-valued vectors V = {v1, v2, …, vk}; these vectors are mapped to points on a 2-dimensional surface such that the degree of similarity among the vi is represented as relative distance among points on the surface.

b) Motivation and application to NECTE
Corpus analysis is often concerned to discover regularities in the interrelationships of certain features of interest in the data. Cluster analysis has been widely used for this purpose, and topographic mapping is, in fact, a variety of cluster analysis. Its chief advantage is the intuitive accessibility with which analytical results can be displayed: projection of a large, high-dimensional data set onto a two-dimensional surface gives an easily-interpretable spatial map of the data's structure.

c) Implementation of topographic mapping using the SOM architecture
There are several ways of implementing topographic mapping. NECTE adopts the self-organizing map (SOM) implementation, and this section briefly describes the SOM architecture.
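
By way of illustration, the following minimal sketch (written in Python with NumPy, and not the NECTE implementation itself) trains a small SOM and projects high-dimensional vectors onto a two-dimensional grid; the grid size, training schedule and the randomly generated "speaker" vectors are assumptions made purely for the example.

    import numpy as np

    def train_som(data, grid=(10, 10), epochs=100, lr0=0.5, sigma0=3.0, seed=0):
        """Map n-dimensional data vectors onto a 2-D grid of units."""
        rng = np.random.default_rng(seed)
        units = grid[0] * grid[1]
        weights = rng.random((units, data.shape[1]))
        # Grid coordinates of each unit, used by the neighbourhood function.
        coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
        for t in range(epochs):
            lr = lr0 * np.exp(-t / epochs)            # decaying learning rate
            sigma = sigma0 * np.exp(-t / epochs)      # shrinking neighbourhood
            for v in data:
                bmu = np.argmin(((weights - v) ** 2).sum(axis=1))   # best-matching unit
                d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)      # grid distance to BMU
                h = np.exp(-d2 / (2 * sigma ** 2))                  # neighbourhood kernel
                weights += lr * h[:, None] * (v - weights)          # pull units towards v
        return weights, coords

    def project(data, weights, coords):
        """Return the 2-D grid position of each vector's best-matching unit."""
        return np.array([coords[np.argmin(((weights - v) ** 2).sum(axis=1))] for v in data])

    # Randomly generated stand-in for per-speaker feature vectors (50 speakers,
    # 12 social/phonetic variables); similar speakers end up close on the grid.
    speakers = np.random.default_rng(1).random((50, 12))
    weights, coords = train_som(speakers)
    print(project(speakers, weights, coords))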
 

3. Topographic mapping analyses of NECTE data
Our presentation will conclude with a discussion of the nature of our data and the manner in which it can be represented using topographical mapping techniques. Our data consist of high-dimensional vectors that correlate selected social, phonetic and morpho-syntactic variables, all of which are spatially mapped to a two-dimensional surface with the aim of visualizing the interrelationships of the chosen variables. The paper will conclude with a demonstration of the technique with respect to a range of variables directly related to current linguistic research in the humanities and social sciences.

References

Borg I, Groenen, P 1997 Modern Multidimensional Scaling - Theory and Applications. Springer.
Everitt B 1993 Cluster Analysis, 3rd ed. E. Arnold. 
Haykin S 1999 Neural Networks. A Comprehensive Foundation. Prentice Hall International.
Honkela, T 1997 Self-Organizing Maps in Natural Language Processing. PhD thesis, Helsinki University of Technology, Espoo, Finland.
Jolliffe I 1986 Principal Component Analysis. Springer.
Kaski S, Honkela T, Lagus K, Kohonen T 1998 WEBSOM--self-organizing maps of document collections. Neurocomputing 21: 101-117.
Kohonen T 1995 Self-Organizing Maps, 2nd ed. Springer.
Kohonen T, Kaski S, Lagus K, Salojärvi J, Paatero V, Saarela A 2000 Self Organization of a Massive Document Collection. IEEE Transactions on Neural Networks 11(3): 574-585.
Lagus K, Honkela T, Kaski S, Kohonen T 1999 WEBSOM for textual data mining. Artificial Intelligence Review 13(5/6): 345-364.
Manning C, Schütze H 1999 Foundations of Statistical Natural Language Processing. MIT Press.
Merkl, D Text data mining. In Dale R, Moisl H, Somers H (eds), Handbook of Natural Language Processing, Dekker, pp 889-903
Pellowe J et al. 1972 A dynamic modelling of linguistic variation: the urban (Tyneside) linguistic survey. Lingua 30: 1-30.
Rojas R 1996 Neural Networks. A Systematic Introduction. Springer.
Strang B 1968 The Tyneside Linguistic Survey, Zeitschrift für Mundartforschung, Neue Folge 4: 788-94.

133

Fenton, Katherine. Lecturer, School of Informatics, UNN

Medfrench On The Web: A Generic XML System From A Specialized Old French Resource.

The paper describes how a standalone text-based teaching resource is being
updated to incorporate more media formats, more sophisticated methods of
information modeling and linking and more varied levels of access, resulting
in a system that can be used in a variety of academic situations.

MedFrench is a computer-based teaching system produced in 1991 by Brian Levy
and Alan Hindley at the University of Hull and has been widely used in
teaching medieval French literature in universities in Europe and North
America. This DOS-based application contains medieval French texts and
useful associated information, such as linguistic analysis, and cultural and
thematic annotations. A series of short texts from the twelfth and
thirteenth centuries are presented to the student in order of increasing
difficulty. As the student moves the cursor forward over the text, one word
at a time, a wide variety of information is made available. A sequence of
hypertext windows linked to every word offers aids to recognition through a
number of glossary functions: headword, root forms, modern French
equivalents, grammar and syntax and, where appropriate, a more extended
translation of the phrase in which the word appears. As the student works
through the passage, further windows open up ('pearls of wisdom') offering
more detailed lexical information and more critical editorial commentary on
the style and structure of the text and on its cultural background. There is
also a notepad facility so that students may save some of this information
and make their own comments about the text. Information about a student's
session is saved to file so that the teacher can monitor student access.

There is a lot of information in MedFrench, making a variety of exercises
possible for the teacher of medieval French language and literature.
However, in today's world of online multimedia hypertext teaching and
learning systems, it appears dated and of limited range: the interface
design now seems rather basic and the navigational methods are not
intuitive. In a networked version of the program it is difficult for the
students to save their notepad comments or print them out. The students
who might really benefit from using such a tool are turned away because of
the application's restrictions and the effort involved. Students are in any
case now used to the point-and-click technologies and the easy-on-the-eye
graphical user interface of web browsers.

A new MedFrench is needed - one which not only retains the richness of the original in terms of content but also extends it with multimedia materials and incorporates a more comprehensive range of functionalities that were not possible ten years ago. The developments in electronic publishing, corpus linguistics, online learning, and internet connectivity have suggested a variety of possible solutions to the problems of translating MedFrench to a web environment.

A pilot web version of MedFrench has been produced which shows what kinds of
functionality are possible. For example, instead of the texts being
presented in order of difficulty, the display could be switched to
chronological order for a historical approach to the texts. The associated
information could also be made accessible in a variety of ways, in terms not
just of the student reading the text in a linear way, but of the researcher
investigating a particular literary theme, historical detail or linguistic
feature, or requiring more sophisticated querying and hyperlinking
mechanisms. Also, a greater variety of materials that constitute the annotations for each text can be incorporated, such as pictures and sound recordings. With the client-server architecture of the internet and integration with existing distance learning platforms, such as Blackboard, there is a range of possibilities for student-teacher interaction.


A project group has now been established, led from the School of Information Studies at the University of Northumbria in collaboration with the original MedFrench editors from the University of Hull. The intention is to
coordinate the general upgrade of MedFrench so that the range of texts and
supplementary materials is expanded and at the same time the encoding,
transformation and access systems are developed. However, it is important
that the texts can be accessed independently from the system and not
encrypted into proprietary formats as the original MedFrench was. So a
further goal is to produce an interactive multimedia teaching system that is
generic; that is, one that can be ported to a variety of different academic
situations, not just medieval French, but those involving other languages,
and other types of texts such as modern literature and historical
documents. The texts are to be encoded in eXtensible Markup Language (XML) using the TEI Guidelines' DTDs, among the most powerful of emerging digital technologies for the searching, comparison, interlinking and long-term storage of materials. MedFrench is an opportunity to explore the possibilities that these technologies offer, particularly with regard to the enriched hypertextuality offered by XLink. Similar projects are currently being carried out in other areas of medieval literature and language study.
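
By way of illustration only, the sketch below shows the kind of word-level, TEI-style encoding from which glossary windows of the sort described above might be generated, together with a few lines of Python that read it back; the element and attribute names are illustrative assumptions rather than the project's actual schema.

    import xml.etree.ElementTree as ET

    # Illustrative TEI-style encoding of one line of Old French; the names
    # (w, lemma, ana, corresp) are assumptions for this sketch.
    sample = """
    <l n="1">
      <w lemma="chevalier" ana="noun-masc-sg" corresp="#gloss-chevalier">chevaliers</w>
      <w lemma="estre" ana="verb-pres-3sg">est</w>
    </l>
    """

    line = ET.fromstring(sample)
    for w in line.findall("w"):
        # Glossary windows like those of the original MedFrench could be built
        # from exactly this information: headword, analysis, linked commentary.
        print(w.text, "->", w.get("lemma"), w.get("ana"), w.get("corresp"))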

The development of a web version of MedFrench is, therefore, not simply a conversion of a standalone collection of texts to hypertext markup language, but also an exploration of generic issues relating to literary and linguistic data modeling, student access and interactivity, and the pedagogic value of multimedia electronic editions of humanities texts.

135

Williamson, Nigel. University of Sheffield
Smith, Carl. University of Sheffield

Cistercians in Yorkshire Project: Reconstructing the Past Using Virtual Reality

Introduction
We are all increasingly familiar with the creation of alternate virtual realities using computer-based technologies. Computer games, consoles such as the PlayStation, and films such as The Lord of the Rings and Star Wars exploit the three-dimensional modelling powers of computers to create visual representations of fantasy worlds, objects and people. As well as turning fantasy into reality, these same technologies can be used to recreate the past: to make dinosaurs walk, to meet your ancestors, to watch gladiators duel and to rebuild lost buildings. Such is the human reliance on visual information, and our acceptance of it as reality, that the power of this type of technology to shape human understanding and impart historical knowledge is immense. The potential for the use of virtual reality (VR) within archaeology and history is huge, particularly for those wishing to present their material to a lay audience. While professional archaeologists are able to visualise a whole building from a few post holes or a line of half-buried stones, most people cannot.

The Project its Aims and Context
The Cistercians in Yorkshire project is a New Opportunities Fund (NOF) funded project which aims to use VR technology to recreate scholarly accurate models of Cistercian abbeys for use by the general public as part of a freely available web-based learning package exploring the history and architecture of the Cistercian order in Britain. Virtual Reality is an especially powerful tool because the functional meaning of any site can usually be conveyed more accurately through a structured navigation of the space. The package will focus on five of the Cistercian abbeys in Yorkshire: Fountains, Rievaulx, Byland, Kirkstall and Roche. Central to the project will be three-dimensional reconstructions of the abbey churches and claustral buildings at each site; these will be accompanied by text-pages explaining the history and the social, economic and cultural significance of each foundation. Users will be able to navigate themselves round the buildings, or follow tours illustrating the daily life of the monks. The target audience is varied, including school children, "life long learners", undergraduates, and the general public. The project also aims to present historically accurate models to the academic community. The paper will explain how we are attempting to meet the requirements of all these audiences with one learning package.

The use of Virtual Reality and Virtual Environments within archaeology and in conjunction with similar sites is nothing new, although this project is, as far as we are aware, unique in several respects, most obviously its scale and its aim to produce and deliver high-quality, historically accurate models to the academic community.

This paper will start with an examination of the various ways these technologies have been applied to similar sites and subjects, and will outline the aims of the project and the importance of the subject. The paper will then briefly explain the structure and organisation of the project team before discussing in more detail the project's use of VR and the issues it faces. Finally, the paper will look at how the project will deliver the VR as part of a larger learning package.

Using the architecture as a narrative to visually depict the Cistercian way of life 
The project's objective of creating scholarly accurate models and delivering them to the general public and ‘life long learners’ within the context of a learning package raises a series of practical and theoretical issues and themes which this paper will address:
The Modelling Process: Technology and Techniques
The modelling phase of the project involves the realisation of the virtual environment (VE) through the reconstruction of plans and diagrams. The reconstruction is developed, validated, and delivered in stages. This incremental approach is designed to allow a constant evaluation of the accuracy of the source materials, highlighting discrepancies or inconsistencies as the reconstruction proceeds.
We are primarily using CAD to construct wire-frame models of the buildings. Photogrammetry is used where more substantial ruins remain, to speed up the drawing process. The completed wire-frame models are then texture-mapped and rendered in 3D Studio Max. These completed models are then converted into deliverable models in VRML, Java3D, and movies. The paper will briefly explain the modelling process's workflow and rationale.
The archive of the project is intended to be made available to the general public so that all the source files can be analysed and, where necessary, reused to provide updated or modified models. This objective is designed to ensure that the lifespan of the project and of the models themselves is significantly extended. The source files will include formats such as .DWG, .DXF, .WRL, GZIP WRL, and .MAX.

Metadata and preservation 
To ensure the optimum use of VEs, metadata standards and technical protocols are needed for describing virtual worlds in ways that reflect not only the research needs of scholars but also the interests of the general user. Such standards are currently a rapidly developing area. Without such initiatives for the comprehensive description of VEs, reconstructions can rapidly become dissociated from any academic discussion linked with their development. Metadata modelling standards will also help to ensure that each model is technically compatible with those produced by other humanities VR projects. It is the project's belief that, where possible, the information provided within the VE should allow the user to develop a fuller understanding of its creation and its development over time. The paper will detail the project's use of metadata as a practical contribution to the discussion and development of metadata standards for VEs.

Evaluation and Testing 
If the virtual environment is to be truly conducive to learning, then effective evaluation methods are crucial to the development of the VE. In attempting to develop effective evaluation methods the project has faced a series of questions: What are the requirements for acceptable performance? What exactly do verification and validation mean for VEs? What are the prerequisites for successful virtuality? The paper will examine and explain the project's attempts to answer these questions.
Delivery: problems, issues, mechanisms.
The delivery of the models to the user presents a series of problems, both technological and didactic, which the paper will explore: 
The modelling process produces a series of models which allow representations of the site to be produced at any stage in its development. As any combination of elements at any chosen scale may conceivably be of interest to architectural historians and archaeologists, we have had to decide carefully what constitutes a separate element for each building. We have also had to decide whether to divide the building into separate elements such as the east end or the north transept.
The audience for the completed site ranges from school children to the retired, and material has to be deliverable to users at home over a modem. The preconceptions of the audience, and the constraints placed on us by modem delivery, present a series of problems and choices which the paper will outline. 

On the more technical side, the paper will explore the delivery options which are open to the project, including VRML, QuickTime VR, AVI files and MPEG movies. Each of these has its own advantages and disadvantages. The paper will also outline the project's attempts to exploit the advantages and minimise the disadvantages of each by employing different techniques in specific contexts.

Reconstruction and Recreation 
Although we can reconstruct much of the fabric of the building from the fragments which remain, there are sections for which there is no surviving fabric or imagery, where we have to recreate what we believe was there. However, these models have the potential to generate a level of detail which is rarely available in the drawings or plans of the site, and there is a tendency for these visual representations to be accepted as completely accurate when in fact there are limits to our historical understanding of the site. Of course the issue of how to accurately reconstruct the past is nothing new to historians or archaeologists, and is indeed not confined to the realm of virtual reality. However, the paper will explore the techniques which are available to allow us to indicate what is recreation and what is reconstruction, while not detracting from the overall coherence of the model we are presenting.

The Learning Package
So far the paper has concentrated on the issues surrounding the creation of the Virtual Environment, and this is indeed the central aspect of our project. However, it is not an end in itself, and the virtual reconstructions form part of a larger learning package. The final part of the paper will explain how the reconstructions are used in the web-based learning package and how they are contextualised by the interpretive texts. Issues of web design will be linked to the technical constraints and options outlined above in explaining the finished package.

137

Crawford, Tim. King's College, London

'The best musick in the world'

Thomas Mace's enthusiastic description of the lute repertory, written in 1676, would have aroused some curiosity even in his own time. The lute was in fact entering a period of decline which was quite rapid in England and France, although the instrument maintained a vital presence in the musical scene of German-speaking countries for about another century. It had, of course, been among the most important instruments of the Renaissance period, both as a solo instrument and for accompanying the voice. Its virtual disappearance for most of the 18th, all of the 19th, and much of the 20th centuries was due as much to its intimate and soft-voiced character, which precluded performance in large public venues, as to its extraordinarily idiosyncratic form of notation.

This system, known as tablature, was an extremely economical, yet perfectly expressive, way of conveying to the player all that he or she needed to perform the music on the lute (essentially the same system was used for other stringed instruments as well). What it did not show was any information about the abstract musical structure of the piece; notes, for example, are indicated diagrammatically as finger-positions rather than as pitches, while the unfolding in time of their rhythmical sequence is shown by an entirely separate array of signs. While obviously meaningful and helpful to the player, lute tablature's bewildering array of ciphers is completely unintelligible to players of other instruments. Although tens of thousands of pieces of lute music in tablature are extant in printed collections and manuscripts from c.1480 to c.1790, only a very small proportion of this rich legacy of music, some of it of the very first quality, has been made available in transcription into conventional musical notation.

Modern technology can offer a solution. Lute tablature may be encoded for the computer without much difficulty (its prescriptive nature makes it in some ways ideal for the task) and a corpus of such encodings can be built and provided online; images of the original sources can be provided to enable a critical assessment of doubtful passages; music playback, either from audio recordings or from MIDI files derived from the tablature, can give a 'sonic' impression of the scores; a transcription into conventional music notation can also be provided 'on the fly'. Furthermore, the encoded scores can be used for musical analysis or database searches, or combined with the corpus metadata to provide a resource for various 'repertorial' analyses which would be impossible without the aid of computers.
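
As a hedged illustration of the first of these points, the sketch below (Python) converts a single encoded tablature chord into MIDI pitch numbers; the encoding convention, tuning and fret-letter mapping shown here are simplified assumptions for the example and are not the ECOLM encoding format.

    # Renaissance G tuning, highest course first, as MIDI note numbers.
    COURSE_TUNING = [67, 62, 57, 53, 48, 43]        # g' d' a f c G

    # French-style fret letters: 'a' = open course, 'b' = first fret, and so on
    # ('j' is traditionally not used).
    FRET_LETTERS = "abcdefghikl"

    def chord_to_midi(chord):
        """chord: list of (course_number, fret_letter), courses numbered from 1."""
        return [COURSE_TUNING[course - 1] + FRET_LETTERS.index(letter)
                for course, letter in chord]

    # 'a' on course 1 and 'c' on course 2: open g' plus e' (d' raised two frets).
    print(chord_to_midi([(1, "a"), (2, "c")]))      # -> [67, 64]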

This paper will outline the first stages of the AHRB-funded project ECOLM (Electronic Corpus of Lute Music) which has been running at King's College, London, since 1999. It will show some examples of the encoding of tablature as well as the management of source and piece metadata, including a brief discussion of the pilot implementation using XML which we hope to adopt in later phases of the project. It will also show the use on lute music of some novel techniques from the OMRAS music-information-retrieval project which demonstrate the potential power of such tools for music analysis in general.

The ECOLM project does not simply aim to provide an easily-accessible source of music to players or scholars devoted to the history of the instrument. An important feature of the encoding procedure is that it offers a ready means of data-conversion into other formats in which other musical resources might already be stored. Gradually, online music-research tools based on medium-to-large databases of music are beginning to emerge, but these are almost always of monophonic music (only one voice or instrumental part sounding at any moment). Since lute music is inherently polyphonic (or chordal), our encoding and data-processing techniques represent a significant step forward in certain respects, and since they may be applied with suitable modification to more familiar repertories such as keyboard music (which has hitherto been considered particularly problematic), they are likely to prove of benefit to the wider musicological community in allowing cross-domain searching and detailed comparison between the different repertories.

In particular, one entirely new method for identifying musical 'variations' developed jointly within the ECOLM and OMRAS projects has shown itself to be an extremely powerful tool for music-retrieval. We will show how an audio recording can even be used to recognise pieces of polyphonic music stored in a database of some 3,000 substantial musical works. While our audio-recognition technique is still a far from perfect means of capturing all the details of a performed piece, we are also investigating the possibilities of optical character recognition for printed (and even manuscript) tablatures in order to simplify and accelerate the very labour-intensive encoding procedure.

Finally, plans for future stages of the ECOLM project will be outlined, including a parallel initiative to assist with the corpus-building exercise on a much larger international scale.

138

Efron, Miles. University of North Carolina
Fenton, Serena. University of North Carolina
Jones, Paul. University of North Carolina

The Problem of Access in Contributor-Run Digital Libraries

The advent of digital libraries (DLs) inspired Utopian thinking in their early advocates. Among the virtues of these on-line collections, developers cited novel modes of presenting materials (Hauptmann et al. 1995), new ways to do research (Sperberg-McQueen 1994), and more efficient means of collection management (Arms 1995).
In other words, the idea of access was central to digital library utopianism. Insofar as they enabled computers to digest large quantities of data into meaningful forms (whether by divorcing content from material, by representing semantics in full-text databases, or by automating authentication and acquisition protocols) DLs, their proponents argued, offered users a new form and degree of access to information.

While in large part these promises were accurate, practice has shown that DLs complicate the idea of access even while furthering its cause. Access has proven to be an active pursuit; merely placing materials on-line does not make them accessible in any meaningful way. Moreover, providing access implies an a priori definition of information that binds the user. For example, information retrieval technologies depend on a collection's definition of what constitutes a document--will we represent documents by keywords, abstracts, or full text? Thus providing access is inherently a matter of representation (Furnas et al. 1987). As a type of representation, access can be understood as a form of useful exclusion; we show users what they want by hiding everything else. Thus the question of access becomes, "who does the excluding?"

Understanding access has entered a new phase in light of a recent development: the appearance of contributor-run digital libraries (Jones 2001). In contrast to traditional libraries (either physical or digital), contributor-run DLs solicit their users to perform tasks usually performed by administrators or librarians. Services such as Slashdot, Epinions, and Amazon enlist their readers to write, catalog, and review the information they provide. By loosening the reins on document representation and collection development, contributor-run DLs change the answer to the question, "who does the representing?" Whereas traditional libraries rely on administrators to represent the collection, contributor-run DLs rely on democratic data representation.

This paper describes the recent redesign of Ibiblio.org, a digital library that depends on its users to create, catalogue, and manage its collection. In particular, we describe the design and use of Ibiblio's collection management software, which uses author-generated metadata to provide access to information. We argue that allowing users to run their own library is useful, especially insofar as the practice admits a democratic, bottom-up definition of access.

Before calling itself a digital library, the service now known as Ibiblio.org lowered the barriers to user contributions. Since 1992, Ibiblio.org (previously known as MetaLab.unc.edu and Sunsite.unc.edu) has provided an on-line forum by offering web space to non-commercial projects. From its inception Ibiblio pursued a broad collection by admitting almost any non-profit project, with the idea that users, not collection administrators, would decide what was worth using. Thus Ibiblio became an early home for such sites as Project Gutenberg, Documenting the American South (an early DL in its own right), and one of the largest collections of software for the Linux operating system. This Linux archive became a high-traffic service, run almost entirely by the open source community. The archive functioned by providing a secure FTP space, to which anyone could submit software. To be included in the archive, these uploads needed to be accompanied by a small, author-generated metadata file, called a Linux Software Map (LSM) file.

Although Ibiblio functioned well as an on-line repository, access to its collections remained problematic during its early days. For instance, finding software in the Linux archive required a user to traverse a complex directory structure, guided by often gnomic subject information. Thus Ibiblio administrators began to pursue a more active type of access to the collection. This process began with the implementation of Linsearch (http://www.ibiblio.org/linsearch) in 1999. Using the LSM metadata, Linsearch provides a fielded-search mechanism for finding information in the Ibiblio Linux archives.

The effort to improve access to Ibiblio's collections was boosted by a grant from the Center for the Public Domain in 2000. The purpose of this grant was to refashion Ibiblio as a full-fledged, contributor-run digital library. While this redefinition implied a variety of changes, paramount among them was Ibiblio's improvement of user access facilities. This included expanded retrieval capabilities supported by LSM-style metadata. We also undertook improved browsing support via a new user interface. Finally, we hoped to implement trust metrics to make useful information more visible to users.

To create these services, Ibiblio developers built the Ibiblio collections index (http://www.ibiblio.org/collection/). The collections index is a relational database that allows contributors to create and edit metadata describing content they have submitted to Ibiblio. The format for this metadata is based on the Dublin Core element set (http://purl.org/dc/), and the site is compliant with the Open Archives Initiative (http://www.openarchives.org/). Thus Ibiblio's author-generated metadata is available to any search service that participates in the OAI. More immediately, these Dublin Core records provide fielded searching of collection-level metadata, styled after the Linsearch facility.
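
As an illustration of the general idea, the sketch below (Python) parses a simple author-generated "Field: value" record and maps a few of its fields onto Dublin Core elements; the field names, mapping and sample record are invented for the example and do not reproduce Ibiblio's actual LSM or collections-index code.

    # Invented mapping from simple 'Field: value' metadata to Dublin Core.
    LSM_TO_DC = {
        "Title": "dc:title",
        "Author": "dc:creator",
        "Description": "dc:description",
        "Keywords": "dc:subject",
    }

    def parse_record(text):
        """Parse 'Field: value' lines into a dictionary."""
        record = {}
        for line in text.splitlines():
            if ":" in line:
                field, value = line.split(":", 1)
                record[field.strip()] = value.strip()
        return record

    def to_dublin_core(record):
        """Keep only the fields that have a Dublin Core equivalent."""
        return {dc: record[field] for field, dc in LSM_TO_DC.items() if field in record}

    sample = """Title: Example Package
    Author: A. Contributor
    Description: A small piece of free software.
    Keywords: editor, text"""
    print(to_dublin_core(parse_record(sample)))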

Part of the metadata creation process involves authors in assigning subject headings for their collections. To facilitate this process, the collections index uses a subset of the Universal Decimal Classification (UDC). Thus authors choose headings from a structured list of possibilities. This foregrounds an important point in providing access to contributor-run DLs: access involves balancing freedom and structure. We want to utilize the collective efforts of our users by giving them free rein in creating and describing content. But in order to be useful, these efforts must be given some structure.

By virtue of the UDC classification, users may access the collections by subject browsing. The collections index has improved our administrators' access to the library as well, thanks to the software's administrative views of the data. These views allow administrators to review the work of our contributor-cataloguers and to catalogue collections whose authors have not submitted metadata records. Most importantly, the collections index moves Ibiblio closer to the model of a full-fledged contributor-run library. Before implementing the index, all collection management was performed by Ibiblio administrators. At the time of the new index's implementation, we tallied 162 indexed collections. As of the writing of this proposal, the index contains 535 collections. While this does reflect growth in the Ibiblio holdings, it also indicates a higher ratio of catalogued to non-catalogued collections in the archive.

Our experience with LSMs and the collections index suggests that author-generated metadata is consistent and of high quality. These findings are in concert with those of Greenberg et al. (2001). Removing impediments from the project of providing access to digital materials makes use of authors' subject expertise and relieves the burden on DL administrators. To expand the scope of user contributions, Ibiblio is currently pursuing the OSPrey project. OSPrey extends the opportunity for contribution by inviting not only authors, but also DL readers, to describe on-line materials. The system solicits information from any interested parties, using personalization and trust-metric techniques to filter useful comments from distracting ones. As Ibiblio matures, orienting itself as a contributor-run DL, the site's maintainers are moving away from gatekeeping roles. Instead, we are working to provide intelligently designed structures and systems that allow our users to lead each other to useful information.

References
Arms, W. (1995). Key Concepts in the Architecture of the Digital Library. D-Lib Magazine, July 1995. http://www.dlib.org/dlib/July95/07arms.html

Furnas, G. W., Landauer, T. K., Gomez, L. M., & Dumais, S. T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 30(11), 964-971. http://citeseer.nj.nec.com/furnas87vocabulary.html

Greenberg, J. et al. (2001). Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization. Journal of Digital Information. 2(2).
http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Greenberg/

Hauptmann, A. G., et al. (1995). News-on-Demand: An Application of Informedia® Technology. D-Lib Magazine, September 1995. http://www.dlib.org/dlib/september95/nod/09hauptmann1.html

Jones, P. (2001). Open(Source)ing the Doors for Contributor-run Digital Libraries. Communications of the ACM, 44(5), 45-46.

Sperberg-McQueen, C. M. (1994). Textual Criticism and the Text Encoding Initiative.
http://www.tei-c.org/Vault/XX/mla94.html

140

Cole, William. University of Georgia

The Gromboolia Project: A MOO-based, Hypertextual Literature Classroom

Introduction
This project explores the potential of MOO (Multi-user domain, Object-Oriented) as a platform for creating hypertextual online learning spaces. Specifically, it involves the creation of a large suite of MOO objects for the study and teaching of the "nonsense songs" of the British Victorian poet Edward Lear, followed by classroom testing of the space with students. I hope to demonstrate the value of MOO as a tool for teaching literature and to provide a working model for the development of similar spaces for studying other topics. More broadly, I hope to weave together strands from the still largely separate realms of hypertext theory and computer pedagogy, showing how the insights of one can enrich and expand the other.

Rationale
The use of MOOs as general educational tools is nothing new. At sites like LinguaMOO [11], Diversity University [6], and many others, instructors in a variety of disciplines use MOO to facilitate discussion, provide distance learning, explore identity issues, and promote collaboration (for an overview of the evolution of MOOs and their use as educational resources, see Haynes and Holmevik [8]). MOO is particularly attractive in language and writing instruction since it is an environment literally constructed out of language [4, 7]. However, these uses emphasize the interaction that takes place within MOO-space with little regard for the nature of the space itself. This omission can also be found in much of the scholarship on MUDs and MOOs -- for example, Sherry Turkle's work on identity construction online [12] or Julian Dibbell's account of the infamous "virtual rape" in LambdaMOO [5]. I would like to refocus attention on the underlying hypertextuality of MOOs. The typical spatial metaphor of "rooms" connected by "exits" is, at bottom, a hypertext structure of nodes and links.

Project Description
The project takes its name from Gromboolia, the collective name assigned to the imaginary settings of Lear's poems. Gromboolia is less a fully conceived alternate universe than a loose collection of place names and descriptions that recur among the poems, but it suggests connections, both obvious and subtle, between his works. MOO seems ideally suited for representing a pseudospace such as Gromboolia and the textual relationships which define it. 
The first stage of this project is underway, as I have begun construction of the central spaces and infrastructure on CowTown MOO, hosted by the Ohio State University (http://moo.cohums.ohio-state.edu/). CowTown is based on the High Wired enCore database developed by LinguaMOO [11], and uses their Xpress web interface. As currently planned, the suite will contain about eight rooms based on the places of Lear's Gromboolia, as well as rooms that will house the texts and illustrations of at least eight of Lear's poems. These latter poem-rooms make use of the generic web page object, a special form of room that allows the use of HTML markup to provide greater control and flexibility in formatting room descriptions. Using the web page object and cascading stylesheets, the typographic features of the standard edition of Lear's work [10] can be accurately reproduced.
The infrastructure of links between both types of rooms reflects the relationships among poems and places. Some of these relationships are fairly explicit (e.g., Lear's poem "The Dong with a Luminous Nose" is a retelling -- from a different viewpoint -- of his earlier poem "The Jumblies"), but I am especially interested in also representing implicit or provisional links between the objects (e.g., the connection between two differently-named places that serve similar functions within the poems). By creating variants of the standard exit object (a relatively simple task in MOO), I will be able to create visually distinct link types to express these different relationships.
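
The following sketch, written in Python rather than MOO code purely for illustration, models the kind of node-and-typed-link structure described above; the class names and link types are invented for the example.

    class Room:
        def __init__(self, name, description=""):
            self.name = name
            self.description = description
            self.exits = []                         # outgoing typed links

    class Exit:
        """A typed, directed link between two rooms."""
        def __init__(self, source, target, link_type="explicit"):
            self.source, self.target, self.link_type = source, target, link_type
            source.exits.append(self)

    dong = Room("The Dong with a Luminous Nose")
    jumblies = Room("The Jumblies")
    Exit(dong, jumblies, link_type="retelling")     # explicit textual relationship
    Exit(dong, jumblies, link_type="provisional")   # looser, interpretive link

    for e in dong.exits:
        print(f"{e.source.name} --[{e.link_type}]--> {e.target.name}")
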
Future additions to the MOO environment will include textual, contextual, and interpretive notes (which could take the form of separate spaces, of note objects placed in the room upon which they comment, or of more exotic interactive objects). Since one of the specific virtues of MOO is collaborative construction, I plan to leave some of the furnishing of Gromboolia to groups of students, whom I will bring to Gromboolia beginning later this spring. Gathering data about student interaction with and enhancement of the MOO space, and refining the space in response to that data will constitute the second phase of the project. 

Expected Contributions
The final result of this project will be a working MOO-based, hypertextual literature classroom, one that provides an engaging environment for learning about a particular literary topic and that provides models for the development of similar spaces for other topics. Besides the creation of the Gromboolia environment itself, I see this project as bridging the current gap between hypertext theory and computer pedagogy, potentially broadening the scope of both fields. To hypertext theory, it will bring the idea of hypertext as a shared space and of hypertext reading as a potentially social, rather than purely solitary, activity (Mark Bernstein provides a rare exception to this trend in his presentation of the "exotic" systems Card Shark and Thespis [2]). To the discourse on online learning environments, it will provide a MOO classroom where the subject of instruction is the teaching environment, where the structure of the knowledge to be learned or the topic to be explored shapes the space of instruction.

References
1. Aarseth, E. 1997. Cybertext: Perspectives on Ergodic Literature. Johns Hopkins University Press, Baltimore.
2. Bernstein, M. 2001. Card Shark and Thespis: exotic tools for hypertext narrative. Hypertext'01: Proceedings of the Twelfth ACM Conference on Hypertext and Hypermedia, University of Aarhus, Aarhus, Denmark, Aug. 14-18 2001. (Davis, Douglas, and Durand, eds.), 41-50.
3. Bush, V. 1945. As we may think. Atlantic Monthly 176 (Jul), 101-108.
4. Day, M., et al. 1996. CoverWeb: Pedagogies in virtual spaces: Writing classes in the MOO. Kairos: A Journal for Teachers in Webbed Environments 1.2 (Summer). <http://english.ttu.edu/kairos/1.2/coverweb/bridge.html>.
5. Dibbell, J. 1996. A rape in cyberspace; or how an evil clown, a Haitian trickster spirit, two wizards, and a cast of dozens turned a database into a society. High Noon on the Electronic Frontier: Conceptual Issues in Cyberspace. (Ludlow, ed.). MIT Press, Cambridge, MA, 375-95.
6. Diversity University. Diversity University Educational Technology Services. <http://www.du.org/>.
7. Haynes, C. 1998. Help! There's a MOO in this class! High Wired: On the Design, Use, and Theory of Educational MOOs. (Haynes and Holmevik, eds.). University of Michigan Press, Ann Arbor, 161-76.
8. Haynes, C. & Holmevik, J.R. 1998. Introduction: "From the Faraway Nearby." High Wired: On the Design, Use, and Theory of Educational MOOs. (Haynes and Holmevik, eds.). University of Michigan Press, Ann Arbor, 1-12. 
9. Landow, G.P. 1992. Hypertext: the Convergence of Contemporary Critical Theory and Technology. Johns Hopkins University Press, Baltimore.
10. Lear, E. 1951. The Complete Nonsense of Edward Lear. (Jackson, ed.). Dover, New York.
11. LinguaMOO. University of Texas at Dallas. <http://lingua.utdallas.edu/>. 
12. Turkle, S. 1995. Life on the Screen: Identity in the Age of the Internet. Simon and Schuster, New York.

143

Deegan, Marilyn. Forced Migration Online
Jessop, Martyn. CCH, King's College London
Syrri, Despina. American College of Thessaloniki

Mapping Migration in Macedonia

This project, Mapping Migration, is constructing a Geographical Information System (GIS) and web-site, hosted by King’s College London Centre for Computing in the Humanities. The partners are The Refugee Studies Centre, University of Oxford, The Centre for Computing in the Humanities, King’s College London and The Research Centre for Macedonian History and Documentation, Thessaloniki.

The project focuses on a section of the geographical region of Macedonia, more precisely the district of Kastoria: part of the Monastir Vilayet during the Ottoman Empire, named the prefecture of Kastoria and Florina when incorporated within the Greek state (1913), and then becoming the district of Kastoria following the end of the Second World War and the Greek Civil War (1948).

History

Population migrations were among the decisive factors which contributed to the dynamism of Balkan history. Ancient Greece, the Roman and Byzantine Empires, the settlement of the Slavs, their medieval states, as well as centuries of Ottoman rule, left a deep impact on the ethnic, political and cultural composition of the peninsula. The Eastern Question and the rivalry of the European powers over the Ottoman heritage drew the Balkans into modern European history, in parallel with the birth of national Balkan states in the nineteenth century. The mobility across the new borders of peoples who had lived for centuries in multi-national empires made the establishment of ethnic frontiers based on the principle of national self-determination extremely difficult. The Balkan Wars of 1912-13, and in part World War I, had their origins in nationalist aspirations to complete territorial unification. The peace settlements after 1918 established boundaries very similar to those existing today, but left substantial national minorities within the borders of neighbouring states.

Ethnic mixtures resulting from migrations blurred national affiliations in neighbouring Balkan national confines. After 1941, the Balkans became a battlefield between the Allied and Axis forces and the scene of resistance movements significant for the post-war settlement. The states followed divergent paths from 1945 to 1990, as Communist governments took power in several of them. Events since 1989 are of major significance: the transition of governments and economies towards capitalism and western-style liberal democracy, the rise of nationalism and civil conflict, and the break-up of Yugoslavia. Forced migration, crisis interventions, sovereignty, economic instability, ethnic tensions and the torn social fabric pose challenges for sustainable development policies, both human and economic. As well as illustrating how states have attempted to shape and control forced migration, this project considers how migration is challenging traditional concepts of citizenship, sovereignty, national security and identity.

Project Outline

Mapping Migration starts at the present day and is organised in layers defined by a timeline showing significant events and migrations in the period 1880 to 2000. The backbone of the presentation is a series of about thirty maps (geographical, ethnological, religious, transport infrastructure, administrative and political). Many of these maps are being generated during the analysis of data by an off-line Geographical Information System (GIS). Interactive access to content is via location hot spots on specially produced image maps; these assist users in navigating their way through the site. The hot spots will also have search options that allow access to the different categories of information within the site. 

The content is drawn from a collection of material consisting of historical documents, photographs (both contemporary and historical), postcards, short explanatory texts, demographic metadata in databases concerning about ninety communities, multimedia presentations of places, historical events and cultural phenomena, and newspapers contemporary to the period. The database material has been extracted from legacy database systems from the Museum of the Macedonian Struggle. Links to the Forced Migration Online (Oxford) and Macedonian Heritage web-sites are available, as is access to an on-line bibliography on the subject. The story of migration will also be highlighted by illustrating its impact on a particular village community. 

The GIS provides a visualisation tool to allow academics from a number of disciplines to explore, analyse, compare and summarise migration throughout the period. The project presents information that has both geographic and temporal elements, the time aspect being of particular importance.
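
As a purely illustrative sketch of combining the geographic and temporal elements, the Python fragment below selects the community records that fall within one layer of the timeline; the field names and sample data are invented and do not come from the project's databases.

    # Invented sample records; the real project draws on databases covering
    # about ninety communities.
    records = [
        {"community": "Village A", "lat": 40.52, "lon": 21.27, "year": 1923,
         "event": "population movement"},
        {"community": "Village B", "lat": 40.61, "lon": 21.40, "year": 1948,
         "event": "civil-war displacement"},
    ]

    def time_layer(records, start, end):
        """Return the records that fall within one layer of the timeline."""
        return [r for r in records if start <= r["year"] <= end]

    for r in time_layer(records, 1940, 1950):
        print(r["community"], r["lat"], r["lon"], r["event"])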

Much of the material recorded is unpublished, and its availability online provides an invaluable resource. It preserves data that was formerly only available in databases running on legacy software; this data is now available to a much wider audience. 

The project also plays a significant role in raising awareness of the fragility of forced migrants and of the risks and potential involved in reconstructing broken lives, as well as encouraging states and international organisations to develop strategies and legislation to deal with the phenomenon.

The Mapping Migration Pilot Project is one of a new style of digital projects in the humanities that make use of image, database and text based technologies. The research methodology and the research issues that arise from the use of these technologies, the technical design, the integration of different technologies, and the associated issues of access to and preservation of the digital materials provide invaluable assets for the development of a far wider project on mapping migration in south east Europe and possibly worldwide.

Presentation
It is proposed that both the web site and Geographical Information System (GIS) be presented at the conference. The presentation will discuss the intellectual and research issues of the project placing equal emphasis on historical, cultural and technical issues raised in a project of this type. The problems, their solutions and how they relate to other work in this field will be discussed.

The project fits a number of the conference themes. Its presentation through the web site is a form of digital museum presenting both archive and modern multimedia material. The GIS is a powerful visualisation tool that allows historical data to be explored and analysed through the use of maps and charts. Both the Web and GIS aspects of the project have highlighted issues concerning the design of information analysis tools and presentation media.

Bibliography
A detailed bibliography will be included in the final paper. A brief list is appended here:

The Valley of the Shadow. William G. Thomas, Director, Virginia Center for Digital History, University of Virginia
http://jefferson.village.virginia.edu/vshadow2/

Race and Place: African American Community History. Edward L. Ayers and William G. Thomas, Virginia Center for Digital History and The Carter G. Woodson Institute of African and Afro-American Studies.
http://www.vcdh.virginia.edu/afam/raceandplace/index.html

The Salem Witchcraft GIS: A Visual Re-Creation of Salem in 1692. Mike Furlough, Geospatial and Statistical Data Center, Benjamin Ray, University of Virginia
http://fisher.lib.virginia.edu/projects/salem/

A number of projects at the Library of Congress 'American Memory' site (http://memory.loc.gov/) provide useful examples of directions for the project.

146

Bia, Alejandro. Miguel de Cervantes DL
Carrasco, Rafael. C. University of Alicante
Sanchez-Quero, Manuel. Miguel de Cervantes DL

A Markup Simplification Model to Boost Productivity of XML Documents 

Abstract 
The aim of this paper is to explain the research work carried out by the Miguel de Cervantes Digital Library in developing a model to simplify and speed-up the markup process of literary texts.

Keywords 
Digital Libraries, TEI, Markup, XML, DTD, XML Schemas 

1. The model we propose:

The working model we propose consists of three levels: 

1. A complete, general, multi-purpose markup vocabulary (such as TEI), which favours the interchangeability of documents among different projects. 

2. A general DTD for our project, which is just a selected subset of the general markup vocabulary. An example of an appropriate tool to create this DTD is the TEI Pizza Chef [3]. 

3. Several simplified DTDs obtained from the general DTD by means of an automatic simplification process (one for each document category, e.g. drama, poetry, narrative or dictionaries). These DTDs are useful to validate the documents, acting at the same time as a guide for creating new XML marked-up documents. These simplified DTDs reduce the possibility of errors by allowing encoders to choose only from a restricted set of tags (the minimum necessary for each document category). Contrary to what it may seem, simplified restrictive DTDs are very easy to understand and considerably simplify the markup task. We have developed DTDprune [1] [2], an automatic DTD simplification program that serves this purpose, and we are currently working on a similar tool for Schemas [5]. 

1.1 Five levels of changes to the DTD 

Before building the automatic tools, we started thinking about the kinds of modifications we would allow to be performed on DTDs. We have established five modification levels. The first three are basically restrictions on the use of attributes and produce documents that still comply with the general DTD. The other two types of modification produce documents that do not conform to the original DTD, but can be made to comply with it by means of a simple transformation. 

1. The first of these limitations is adding a set of specific and predefined values for attributes instead of CDATA, so as to force users to select from a list instead of writing freely. This type of change is highly advisable to normalize attribute values within a digitisation project, to prevent possible human mistakes, and to force the use of predefined values needed for further processing. The resulting documents will still comply with the original DTD, which includes the more general CDATA declaration for attribute values. The DTD simplification program picks up the possible values from the set of sample files.

2. The second limitation is adding new attributes which do not exist in the original DTD. In our case, for example, we included more specific attributes for rendition instead of the rend attribute included in the TEI DTD. This change is, in fact, not so grave since, in the end, all the new attributes can easily be removed by means of a parser if full compliance with the general DTD is needed for some purpose. The DTD simplification tool also detects these new attributes. 

3. The third kind of modification consists of making compulsory (#REQUIRED in DTD jargon) some attributes which are not so in the original (#IMPLIED). In this way, the DTD obliges encoders to fill in the attribute, usually from a predefined list of legal values. This step cannot be automated since it depends on a human decision.

4. The fourth kind of modification is removing elements that are useless for our purposes. For this we either had to have a clear idea of the possible processes we would later apply to our texts (and the corresponding markup requirements) or use an incremental simplification process, based on real examples, aided by a DTD simplification program that removes the unused elements and simplifies the content models. This removal of elements makes the DTD more restrictive for the encoders and avoids, to some extent, human errors, since elements not used in the sample set can no longer be included using the simplified DTD. If the need arises for the use of some discarded element, a new sample document can be added to the reference set and the automatic simplification performed again, obtaining a new DTD with this element added. 
We have always been in favour of a simple markup scheme, since there is no point in applying marks that will serve no further purpose and that make the markup task unnecessarily complex. There are two types of element removal. One is removing elements from the whole DTD so that such elements can no longer be used in our XML texts; for instance, we have removed the unnumbered div element (unnumbered divisions) to force the use of numbered divisions. The other is removing the possibility of inserting some elements within certain others by modifying the relevant content models, for instance to disable the use of hi within certain elements. Both simplifications are performed by our program.

5. The fifth kind of modification we considered is adding completely new elements which are not included in the original TEI-lite DTD. This type of change should be avoided as much as possible, since it allows for the creation of files that do not comply with the original DTD. When our DTD simplification program finds elements in the example set of documents that are not present in the general DTD, it issues an error message. 

2. Our DTD Simplification Tool 

It is clear that doing these simplifications by hand is tedious and error-prone. Constructing a set of sample documents representative of all the types of documents we need to mark up, together with a program that simplifies the DTD automatically, makes this task much easier. On these premises we have built PruneDTD. 
Our simplification approach is to automatically select from a general DTD only those features that are used by a set of valid documents (validated against the more general DTD) and to eliminate the rest, obtaining a narrow-scope DTD which defines a subset of the original markup scheme. First, our tool builds Glushkov automata [4] by extracting the structure of the markup model from the general DTD (see figure 1). Then the XML sample files are preprocessed to extract the elements used and their nesting patterns, i.e. to extract the structure. Using the Glushkov automata that represent the regular expressions defining the possible element contents according to the general DTD, we keep track of the elements used in the sample files by marking the visited states of the automata. Finally, a simplification process takes place. This process eliminates unused elements and simplifies the right-hand sides of the element definitions, i.e. the regular expressions that define further nestings. The simplified DTD is then generated from the states and transitions that remain.
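
The following much-simplified sketch (Python, and not DTDprune itself) illustrates only the preprocessing step described above: it walks a set of sample XML documents and records which elements are used and which children occur inside which parents, the information against which a general DTD's content models could then be pruned.

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def collect_usage(paths):
        """Record element usage and parent-child nesting across sample documents."""
        used = set()
        children = defaultdict(set)          # parent element -> child elements seen
        for path in paths:
            root = ET.parse(path).getroot()
            for parent in root.iter():       # every element, including the root
                used.add(parent.tag)
                for child in parent:         # direct children only
                    children[parent.tag].add(child.tag)
        return used, children

    # Usage (file names are placeholders):
    # used, children = collect_usage(["drama1.xml", "drama2.xml"])
    # 'used' lists the elements a pruned drama DTD needs to declare, and
    # children[tag] constrains the simplified content model for each element.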

(Here goes figure 1: Architecture of the tool to simplify DTDs)

This "pruned" DTD can be used to build new documents of the same markup subclass, which in turn would still comply to the original general DTD. Needless to say that working with a simpler DTD is easier. 

3. Conclusions 

Using this automated method, the simplified DTD can be updated immediately in the event that new features are added to (or eliminated from) the sample set of XML files (modifications to files of the sample set must be made using the general DTD for validation). This process can be repeated to incrementally produce a narrow-scope DTD. In this way, we use a complex DTD as a general markup design frame from which to build a simpler working DTD that suits a specific project's markup needs. Another use of this technique is to build a one-document DTD, i.e. the minimum DTD, derived from the general DTD, with which a given XML document complies. This minimum set of rules can be included within the document itself in the DOCTYPE declaration.

4. References 

[1] A. Bia and R. C. Carrasco. Automatic DTD simplification by examples. In ACH/ALLC 2001. The Association for Computers and the Humanities, The Association for Literary and Linguistic Computing, The 2001 Joint International Conference, pages 7-9, New York University, New York City, June 2001. 

[2] A. Bia, R. C. Carrasco, and M. L. Forcada. Identifying a reduced DTD from marked up documents. In J. S. Sánchez and F. Pla, editors, Proceedings of the 
SNRFAI'2001 conference, volume II of Treballs d'informática i tecnologia, num.7, pages 385-390, Castellón de la Plana, Spain, May 2001. Publicacions 
de la Universitat Jaume I, 2001. 

[3] L. Burnard. The Pizza Chef: a TEI Tag Set Selector. http://www.hcu.ox.ac.uk/TEI/pizza.html, September 1997. (Original version 13 September 1997, updated July 1998; Version 2 released 8 Oct 1999). 

[4] P. Caron and D. Ziadi. Characterization of Glushkov automata. TCS: Theoretical Computer Science, 233:75-90, 2000. 

[5] R. C. Carrasco, A. Bia, M. L. Forcada, and P. M. Pérez-Antón. Turning DTDs into specialized tree-automata-based schemata to match a collection of marked-up documents. 2001.

147

Dunning, Alastair. Arts and Humanities Data Service Resource
Woodhouse, Susi. Arts and Humanities Data Service Resource
Guy, Damon. Slough Borough Council

Establishing National Standards: The NOF Digitisation of Learning Materials Programme

This paper takes as its subject the £50-million NOF Digitisation of Learning Materials Programme, initiated as part of the New Opportunities Fund (NOF). Commencing in August 1999, the NOF-digitise programme represents the UK's largest ever tranche of publicly-available funds to be dedicated specifically to digitisation. The funding has been distributed to 152 projects which are digitising collections and then creating learning materials across a diverse range of subjects. Often consortium-based, the projects consist of parties from community and voluntary sectors, local authorities, libraries and archives, museums, further and higher education, and the private sector.

The paper has two principal aims.

1) Firstly, to demonstrate NOF-digitise’s involvement in delivering digital resources on a national level. In the UK, the use of digital resources has tended to be restricted to those working or studying at institutions with the economic weight to initiate such projects themselves. The programme is moving to rectify this with the provision of digital resources that will support learning in its broadest sense. The programme is developing materials in our cultural, voluntary, educational and other heritage institutions, permitting them to be made more widely available and to be exploited in ways that only the flexibility of the networked environment makes possible. The scope of the materials being digitised by projects working under the NOF-digitise banner is deliberately broad, relating to the visual and performing arts, history and heritage, science and health, the environment etc. Many of these resources will have a significant impact not only on those studying humanities subjects at school, but the fu

It is important to note that the NOF-digitise programme has not been established to stand alone. It is one segment of a larger information strategy, the £170-million People's Network, which aims to provide both the infrastructure and the training necessary to allow greater access to and greater use of digital resources. The sheer size of the programme means that it also has a key role in, and influence on, other complementary programmes in both the public and commercial sectors. The paper will explore the national information architecture that is being built to house and disseminate the materials created in the NOF-digitise programme, as well as its influence on other digital initiatives.

2) The second aim of the paper concerns the technical platform built for the NOF-digitise programme. The technical standards and codes of good practice developed by information professionals in the Higher Education and academic library sectors are now being adopted on a national level. The paper will investigate the philosophy at work in developing this platform for NOF-digitise, and articulate the issues and problems that have arisen as these standards are applied in a different context and to a larger user-base.

When NOF-digitise commenced, its programme managers were acutely aware of the possibilities of loss inherent in creating digital data, as well as the general lack of experience of digitisation in those applying for project grants. Thus a technical advisory service was established in order to provide a broad range of information, and develop the technical platform from which the projects would operate. The building blocks of this platform have been the NOF-digitise Technical Standards and Guidelines. Each project has been presented with this document, detailing which standards must be adhered to, which standards should be adopted, and what standards may be taken up. 

The choice of particular standards, file formats and suggestions of good practice required much deliberation. It was acknowledged that the Technical Standards would have much influence on the parameters within which the NOF-digitise projects operated, as well as an effect on the wider digital landscape, at home and abroad. Guidelines were established not just for data capture, but also for data management, collection development and access.

Perhaps the most important aspect of the technical guidelines' philosophy has been their adherence to open standards. There were advantages in choosing established proprietary formats for both the capture and the delivery of the digitised data, not least the familiarity of many projects with certain commercial packages. But the adoption of proprietary formats would have been a risky enterprise. Without sufficient foresight in this area, previous public digitisation projects have run aground, putting at risk the very digital data that formed the core of the project. The concern for the long-term preservation of the digitised materials is an undercurrent running through all of the technical advisory service's advice. How national projects have responded to this philosophy, and to the technical advisory service as a whole, will be investigated during the paper.

148

Bia, Alejandro. Miguel de Cervantes DL
Vélez, Soledad. Miguel de Cervantes DL
Sanchez-Quero, Manuel. Miguel de Cervantes DL
Garcia, Juan Carlos. Miguel de Cervantes DL

Manuscripts of America in the Royal Spanish Collections: A Joint Digitization Project with the Library of the Royal Palace of Spain

1. Introduction
With the aim of bringing cultural content to cyberspace and spreading some little-known aspects of the history of the Americas, the Miguel de Cervantes Digital Library has embarked on a joint effort with the Library of the Royal Palace of Spain to develop the digital web publication of the Manuscripts of the Americas in the Royal Collections holdings. In this joint venture, the Library of the Royal Palace supplied its invaluable contents for digitization, and the Miguel de Cervantes DL its technology and experience as a digital publisher. The goal was to join the ancient and the new, the most precious and carefully preserved documents with the new electronic publishing technologies. The result was to make freely available to a worldwide public those otherwise unreachable treasures of the Royal Collections. We expected this effort to have an impact both on the general reader and on the specialized researcher. In both cases they would be spared the time and cost of a trip to Madrid, and the risk of being denied access. In

With a long-term vision, we set high-level quality requirements to obtain accurate digital facsimiles of the originals, suitable not only for the present state of the art of web technology but also for future, higher standards of graphic display. With an innovative spirit, we applied the best available technology in the service of culture.

2. The Library of the Royal Palace
The Library of the Royal Palace of Spain has its origins in the private libraries of the kings of the House of Bourbon. In the National Patrimony collections, which include the Library of the Royal Palace, we can find unique editions, rare books, incunabula, manuscripts, early printings, ancient maps, musical scores, drawings, engravings, rare book bindings, coins, tapestries, furniture and much more. Most of the Manuscripts of the Americas collection comprises manuscripts of the 17th and 18th centuries: kings' letters, royal credentials, royal bans and the like, all of great historical and cultural value, forming a wide and valuable testimonial record of the American colonial period.

3. Some technical details of the project
We started building this digital collection on two parallel fronts: on the one hand, cataloguing the holdings, and on the other, digitizing them. For cataloguing, we defined an XML (eXtensible Markup Language) markup scheme based on the MASTER guidelines (Burnard and Robinson, 1999), and developed our own software for the automatic transformation of catalogue records into nicely formatted HTML (HyperText Markup Language) for web publication.

The digitization of the funds was partially done by the Miguel de Cervantes DL, and partially assigned to a private company that followed our quality standards.

After digitization was done, partially automated digital image processing was applied to the top-quality, large preservation images to obtain both thumbnail images, used to create graphical indexes, and medium-size, medium-quality images suitable for web transmission and display on current screens. In this process, high-quality TIFF images were converted to lower-quality JPEG ones; the JPEG format offers a high level of compression, and therefore faster transmission, at the cost of some loss of quality.
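A minimal sketch of this derivative-generation step, using the Pillow library purely for illustration; the sizes, JPEG quality settings and file names below are assumptions, not the project's actual parameters.

    # Sketch: from a high-quality TIFF master, produce a medium-size JPEG for
    # web display and a small thumbnail for the graphical indexes.

    from PIL import Image

    def make_derivatives(master_tiff, web_jpeg, thumb_jpeg,
                         web_size=(1000, 1000), thumb_size=(150, 150)):
        with Image.open(master_tiff) as img:
            web = img.convert("RGB")
            web.thumbnail(web_size)          # fit within web_size, keep aspect ratio
            web.save(web_jpeg, "JPEG", quality=75)

            thumb = img.convert("RGB")
            thumb.thumbnail(thumb_size)
            thumb.save(thumb_jpeg, "JPEG", quality=60)

    # Hypothetical file names for one digitized page.
    make_derivatives("ms_0001.tif", "ms_0001_web.jpg", "ms_0001_thumb.jpg")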

The last stage in digital facsimile production was the automatic generation of HTML facsimile ensembles using FacsBuilder, a tool we have developed to speed up the assembly of facsimile sets for web display.

To complement this huge digital publishing effort, a great deal of care and meticulous work was devoted to the graphic design of the web site where the digital collection would be displayed.

3.1. Cataloguing
For cataloguing, we followed the MASTER guidelines (Manuscript Access through Standards for Electronic Records). MASTER is a project partially funded by the European Union with the purpose of defining a general, XML-based norm for the description of manuscripts. This working group developed a DTD (Document Type Definition) for manuscript descriptions that is compatible with the recommendations of the TEI Consortium (Text Encoding Initiative) and is expected to be included in future editions of the TEI Guidelines. In practice, the MASTER tag set is larger and more complex than the TEI Header, the TEI subset devoted to bibliographic information.

The use of the MASTER guidelines for cataloguing manuscripts offered us a wide range of possibilities through its rich markup system for authorities, entities and toponyms. It allows the librarian to develop very rich descriptions of manuscripts, while allowing the user to make both simple and highly precise searches to fulfill all kinds of requirements.

MASTER permits a multilevel bibliographical description: a general descriptor of the manuscript (the main record), and several analytic descriptors, one for each of the parts that make up the document.

3.2 Advantages of XML-Master with respect to MARC
The use of an XML format for bibliographic metadata grants compatibility and easy interchange between platforms and applications by means of simple transformations, as XML and XSLT are gaining ground as data exchange standards (Bradley, 2000). This ease of transformation to other formats plays an important role in the preservation of metadata by preventing obsolescence of the format. The plain-text nature of XML markup makes it easy to read and to process, giving independence from obscure proprietary formats.

Another advantage of using XML is the possibility of generating different output formats with different renderings by means of XSLT (XSL Transformations) (Kay, 2000). Examples of these transformations include the automatic processing of XML records for research purposes, statistical analysis, insertion into databases, complex searches, etc.
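For illustration, the sketch below applies two hypothetical XSLT stylesheets to the same record using the lxml library; the stylesheet and record file names are placeholders and do not correspond to the project's actual files.

    # Sketch: applying different XSLT stylesheets to the same XML record to
    # obtain different renderings (HTML for the web site, plain text for
    # further processing or statistics).

    from lxml import etree

    record = etree.parse("ms_record.xml")

    to_html = etree.XSLT(etree.parse("master_to_html.xsl"))
    to_text = etree.XSLT(etree.parse("master_to_text.xsl"))

    html_view = to_html(record)   # rendering for web publication
    text_view = to_text(record)   # stripped-down view for research use

    print(str(html_view))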

3.3 Advanced searches
The use of XML for digital publishing allows for complex searches based on semantic tags. For instance, we can search for the word "Aragón" only in responsibility statements and only when it is contained within proper-name tags.

Searches based on the structure of the document, which are not possible with ordinary relational database architectures, are becoming more and more common. For example, we can search for the word "Aragón" in responsibility statements, but now only when it is contained in an organization name.
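A sketch of such a structure-aware query, expressed as XPath via lxml. The element names (respStmt, persName, orgName) are TEI-style assumptions and namespaces are omitted for brevity, so the actual catalogue encoding may well differ.

    # Sketch: find "Aragón" only inside personal or organisation names that
    # occur within responsibility statements.

    from lxml import etree

    doc = etree.parse("catalogue.xml")   # hypothetical catalogue file

    person_hits = doc.xpath('//respStmt//persName[contains(., "Aragón")]')
    org_hits = doc.xpath('//respStmt//orgName[contains(., "Aragón")]')

    for hit in person_hits + org_hits:
        print(hit.tag, etree.tostring(hit, encoding="unicode"))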

3.4 Software
At the Miguel de Cervantes DL, we currently carry out research on new technologies and technological innovation concerning digital publishing (Bia and Carrasco, 2001), natural language processing (Bia and Muñoz, 2000), computational tools for linguistic research (Zaslavsky, Bia and Monostori, 2001) and web publishing technologies. For this project we have developed highly complex software, such as FacsBuilder (Bia, García and Sánchez, 2001), which saves time and dramatically reduces production costs. Starting from a sequence of numbered images and two templates, the facsimile edition is assembled automatically.

3.5 Transformation process
The original bibliographic records from the Royal Palace were sent in MARC format. We had to decode them and develop the necessary structures to keep the semantic information associated with the cataloguing data. For this purpose, we had to identify relationships and dependencies. The result was the automatic generation of MASTER records from MARC ones. However, expert revision by a librarian was necessary to ensure the quality of the resulting data.
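The following sketch shows the general shape of such a conversion using the pymarc library; the MARC-field-to-element mapping is deliberately minimal and the target element names are placeholders rather than the real MASTER mapping developed for the project.

    # Sketch: decode MARC records and emit a minimal XML catalogue.

    import xml.etree.ElementTree as ET
    from pymarc import MARCReader

    def first_subfield(record, tag, code):
        """Return the first occurrence of subfield `code` in field `tag`, or ''."""
        for field in record.get_fields(tag):
            values = field.get_subfields(code)
            if values:
                return values[0]
        return ""

    def marc_to_xml(marc_path):
        """Convert a file of MARC records into a toy XML catalogue."""
        records = ET.Element("records")
        with open(marc_path, "rb") as fh:
            for rec in MARCReader(fh):
                entry = ET.SubElement(records, "msDescription")
                ET.SubElement(entry, "title").text = first_subfield(rec, "245", "a")
                ET.SubElement(entry, "author").text = first_subfield(rec, "100", "a")
        return ET.tostring(records, encoding="unicode")

    print(marc_to_xml("royal_palace.mrc"))   # hypothetical input file

As in the project itself, an automatic pass of this kind only gets part of the way: the richer MASTER fields and any ambiguous mappings still require revision by a librarian.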

References

Bia, A., García, J.C. & Sánchez, M. (2001). A Versatile Facsimile and Transcription Service for Manuscripts and Rare Old Books at the Miguel de Cervantes Digital Library. Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, page 477, Roanoke, Virginia, USA.

Bia, A. & Carrasco, R. C. (2001). Automatic DTD simplification by examples. ACH/ALLC 2001,
Joint International Conference, pages 7–9, New York University, New York City.

Bia, A. & Muñoz, R. (2000). Aplicación de Técnicas de Extracción de Información a Bibliotecas Digitales (Applying Information Extraction Techniques to DLs). In Ferro, M. V., editor, Proceedings of the XVI Conference of the SEPLN, vol 26, pages 207–214, Univ. of Vigo, Spain.

Burnard, L. & Robinson, P. (1999). Vers un standard européen de description des manuscrits: le projet Master. In Les documents anciens, volume 3 of Document numérique, pages 151–169. Hermes Science Publications, Paris.

Zaslavsky, A., Bia, A., & Monostori, K. (2001). Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works. In Constantopoulos, P. and Solvberg, I., editors, 5th European Conference, proceedings/ECDL 2001, volume 2163 of Lecture Notes in Computer Science, pages 103–114, Darmstadt, Germany. Springer-Verlag.

151

Butler, Terry. Arts TLC
Sinclair, Stefan. MLCS, University of Alberta

TAPoR - A Canadian Text Analysis Portal for Research

TAPoR, the Text-Analysis Portal for Research, is a major new initiative in the field of humanities computing. Funding has recently been announced from the Canada Foundation for Innovation, providing about $2.6 million (Canadian) toward a total project cost of $6.78 million.

The project will establish six regional centres across Canada, which will be the anchors in a national text analysis research network. The project will provide a multi-faceted portal for researchers via the Internet, where they can access electronic text resources, and new suites of software for the conversion, adaptation, analysis, display and publication of electronic text. The undertaking is the largest ever of its kind in Canada and among the largest in the world.

The TAPoR project represents the first major initiative in Canada to exploit the Web as an intelligent medium for text. (We read "text" broadly to mean any record of human communication that can be digitally represented.) In addition to providing a framework for computing in textual and related fields, it represents an opportunity for the community of humanities computing scholars in Canada to actively participate in international fora which are developing standards, tools, and methods in this field.

McMaster University is the lead institution in this national initiative; the other partners represent leading humanities computing centres in Canada: University of Victoria (in collaboration with Malaspina University College), University of Alberta, University of Toronto, Université de Montréal (Law) and University of New Brunswick.

Our paper will briefly survey current humanities computing practice in Canada, and discuss the strategic issues facing us as we deploy the physical infrastructure for TAPoR. In the longer term, TAPoR has the potential to influence the practice of humanities scholars in Canada and beyond our shores. We will review the relevant experience from other national initiatives in the field, including the HUMBUL portal in the United Kingdom.

Project Description
TAPoR will build a unique human and computing infrastructure for text analysis across the country by establishing six regional centres to form one national text analysis research portal. This portal will be a gateway to tools for sophisticated analysis and retrieval, along with representative texts for experimentation. The local centres will include text research laboratories with best-of-breed software and full-text servers that are coordinated into a vertical portal for the study of electronic texts. Each centre will be integrated into its local research culture and, thus, some variation will exist from centre to centre.

Each local centre will have a laboratory available to the research community at the participating university. The laboratory will house networked workstations with scanners, media acquisition devices, and software. The software suite will include software to scan documents, optical character recognition (OCR) software to create electronic texts from images of documents, multimedia manipulation software for media associated with texts, software for encoding texts with structural and interpretative information, personal computer text analysis tools, and exporting tools for transferring electronic texts to the server in standard forms like XML (extensible markup language) and PDF (portable document format). The laboratories at individual sites will vary as they are integrated into the local research culture.

Each centre will run a text server for the local community and the region. This server will have ample disk space to store large, media-rich text databases and it will be configured to be easy to maintain and capable of sustaining significant traffic through load-balancing strategies. The servers will run locally developed text tools and a suite of common tools available to all users through an open research portal. For projects that are sharing electronic texts with a wider research community and need reliable access, there will be mirroring of functionality and resources between servers, thus allowing projects to scale from local to public access (as well as providing valuable backup).

A key element of TAPoR is the vertical portal that will be developed for unified access to the resources and tools. The portal will handle security, access and rights management; provide an e-commerce-like application for the management of e-texts; facilitate the interconnectivity of the six servers including the interoperability among the text databases; and provide a common global access to TAPoR that can be linked to international partners. TAPoR will make the best practices of experienced computing humanists available to the larger humanities research community through this portal. 

Anticipated Benefits
This infrastructure will provide a giant leap forward for humanities researchers and will be unique to this country. It will provide local centres across Canada with a common interface to e-text resources and a common workbench for text analysis tools. Additionally, it will be flexible enough to be customized for particular projects and training needs. This infrastructure will enable a coherent approach to the training of new humanities scholars and personnel. It will create a collaborative network of humanists working to adapt computing techniques to the study of electronic texts. It will foster the development of new methods of analysis and enhance our ability to play a more significant global role in the advancement of humanities research. 

Moreover, TAPoR will transform how research results are transferred to the private sector and embedded in products such as document and content management systems, web search engines, personalization software and intelligent agents.

The research portal will support computing humanists who use electronic texts in their scholarly work in three ways. 
Text Representation - First, we learn through the careful preparation of electronic editions by choosing what to represent and what multimedia objects (like the images of a manuscript or audio clips of an oral interview) need to be incorporated into a scholarly edition. Ours is the age that is moving our textual heritage into digital form; research in this area is of vital and immediate importance.
Text Analysis Techniques and Tools - Second, we build and adapt computer software to implement research techniques with which we can ask interesting questions of electronic texts. As a critical mass of electronic texts becomes available, we are developing new techniques for analyzing and comparing them; a simple illustration of this kind of technique follows below.
E-Text Access - Third, we develop scholarly environments based on electronic texts for research teams and the larger community to study issues through the aggregation of relevant tools, texts, and media. To answer many questions, a single text or tool does not suffice, nor do we conduct research in isolation. TAPoR will provide an effective environment for collaborative research.
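As a simple illustration of the second point, a keyword-in-context listing is the kind of basic text-analysis routine such a portal might expose. This is a generic Python example, not TAPoR software, and the input file name is hypothetical.

    # Sketch: a simple keyword-in-context (KWIC) listing over a plain-text file.

    import re

    def kwic(text, keyword, width=40):
        """Print each occurrence of keyword with `width` characters of context."""
        for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            print(f"...{text[start:end]}...")

    with open("sample_text.txt", encoding="utf-8") as fh:   # hypothetical input
        kwic(fh.read(), "liberty")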

References
Humbul web site. www.humbul.ac.uk
TAPoR information web site. huco.ualberta.ca/Tapor
UK Resource Discovery Network. www.rdn.ac.uk

154

Sosnoski, James. University of Illinois at Chicago
Carter, Bryan. Central Missouri State University

International Learning Networks: The ASCEND project

The Arts and Sciences Collaborative Exchange Network Development (ASCEND) project has two central goals: (1) to be an online network where learning collaborations between the arts and the sciences across disciplines and educational institutions are developed and tested, and (2) to provide educational content and assessments of distance education projects for teachers and learners at the various sites in the network.

The organizational structure of ASCEND is based on the Virtual Harlem prototype, a project in which engineers, computer scientists, psychologists, and New Media Studies scholars have joined with African American Studies scholars, literary critics, historians, artists, and creative writers to build a model of Harlem, N.Y. at the time of the Harlem Renaissance. As the model of Harlem in the 20s is constructed, its current electronic version is made available to teachers and students at the various sites in the Virtual Harlem network. Assessments of both the technology and the learning experience are ongoing.

At the heart of the ASCEND project are collaborative learning networks (CLNs), the prototype of which is the Virtual Harlem project, which integrates education in African American culture with the most recent advances in instructional technology and distance learning. On the one hand, it acquaints the public with one of the most astonishing periods of African American cultural heritage, the Harlem Renaissance. On the other hand, it acquaints students at several levels of the educational system, especially minority students, with advances in instructional technology, particularly with the use of virtual reality technologies. These objectives, to experiment and to educate, are integral to our conception of a Collaborative Learning Network (CLN). Persons who collaborate in the project can share their research discoveries or their study interests in the Harlem Renaissance with others in the network, thus disseminating knowledge about it and promoting continued explorations into this historical period and its u

As we envision it, a CLN—because of its complex structure—requires persons in the network to be both teachers and learners. The technical staff has to learn about the Harlem Renaissance from the non-technical staff. Similarly, the non-technical staff has to learn about the technologies of networking from the technical staff. Within this framework, everyone in the network is both teacher and learner at some level or with respect to some area of study. Even high school students who are learning about the Harlem Renaissance for the first time are encouraged to "discover" new materials relevant to the project and to impart their discoveries to others, including their teachers.

The unusual combination of disciplines in the project—African American culture, literary, historical, urban, gender, social, anthropological, artistic, graphic, dramatic studies, communication, psychology, engineering, computer science, and visualization—mandates that no one person in the network will be the master of any one perspective. At the same time, the diversity of perspectives allows each person in the network to view the subject matter and the technology from a previously unfamiliar perspective. Moreover, since the project is based on virtual reality scenarios, at the higher end of the technological spectrum, a certain excitement is continuously generated, especially when persons enter the network and view the work that has been completed. 

The Virtual Harlem collaborative learning network (which is comprised of all of the persons who are building Virtual Harlem) is based on a "system dynamics" approach to learning whose aim is to provide "a more effective basis than previously existed for understanding change and complexity" (Forrester 93). This approach focuses on acquiring "knowledge of how feedback loops, containing information flows, decision making, and action, control change in systems" (Forrester 93). It features a modeling technique using computers to develop a virtual dynamic structure that represents the real-life system being studied. It is a type of "visual thinking" (Arnheim, 69; Hyerle, 96) used, for example, in computer-generated visualizations of weather systems that allow us to understand more deeply how weather systems work. "Visual explanations" are not only commonly employed in the understanding of complex systems (Tufte, 97) but deploy a mode of analogical inference-making or "configuring" also used in understanding the c

The ASCEND project, which is an outgrowth of the Virtual Harlem project, has six central sites: UIC and SciTech in Illinois, Central Missouri State University (CMSU) in Missouri, Columbia Teachers College in New York, Alternative Educational Environments in Tucson, Arizona, and Växjö University in Växjö, Sweden. At these sites are several scientific labs: the Data Visualization Lab at CUPPA, the Electronic Visualization Lab at UIC, FermiLab in Batavia (associated with SciTech), the Visualization Lab at the University of Arizona, and Frankhom in Sweden (an "online lab" for design that features video conferencing).

Universities involved include: UIC, CMSU, Columbia Teachers College, University of Arizona, and Växjö University in Sweden. In addition, there are at least two Institutes involved (the Institute for Learning Technologies at CTC and one in the works at Växjö) as well as two small museums (SciTech in Aurora, IL and The Science Lab in Växjö, Sweden). The science departments to which members belong are: Computer Science, Engineering, Ecology, Visualization, and Communication. The humanities departments to which members belong are African American Studies, Fine Arts, and English. Community Technology Centers in New York are linked to the Institute for Learning Technologies at CTC. And, finally, two members are founders of the Unit for New Media Studies in the Department of Communication at UIC. Plans are already underway to involve other universities, institutes, and museums both in the states and abroad.

160

Moser, Dennis. Digital Imaging Faculty, Lee College

 Training The Rank & File: The Need for a Training Curriculum in Digitization of Cultural Heritage Materials

To date, most organized training in the digitization of cultural heritage materials has been aimed at managerial and directorial personnel. Examples of this are the training offered by both the Northeast Document Conservation Center (the NEDCC's "School for Scanning" workshops) and the Humanities Advanced Technology and Information Institute (HATII)/University of Glasgow-Rice University-University of North Carolina coalitions. Both are explicitly self-described as not for "training technicians" but for those who will be making decisions about digitization projects. This paper examines the growing need for technical training of the staff who will be doing the actual digitization. It concludes by describing a project funded by the Institute of Museum and Library Services comprising a collaboration between a museum, a public library, and a community college engaged in training the necessary technicians while creating a virtual collection of the three institutions.

Collection scanning has become an increasingly common activity for cultural heritage institutions, usually with the rationale of preserving collections by providing enhanced access electronically. Organizations such as the Research Libraries Group and the National Historical Publications and Records Commission (NHPRC), as well as governmental and academic institutions such as the Library of Congress and Cornell University, have a long history of involvement in developing and implementing such projects. Further, their experience and expertise have been instrumental in the development of technical standards for this work. Their prominence has been at least in part a function of the resources upon which they have been able to draw. Technical expertise has come about by accomplishing their work in-house with their own staff or by bringing in experts from outside to consult and train staff.

Smaller and less well-endowed institutions have not been able to develop the in-house pool of talent that the aforementioned groups and institutions have, and while their project managers and directors have benefited from NEDCC or HATII training (or other such opportunities), the needed technical skills are not being transferred to the technical staff at these smaller institutions. In other situations, there simply is no such technical staff available and the institutions are dependent upon knowledgeable outside vendors. National and regional library service advisors such as AMIGOS, SOLINET, OCLC, and MARAC have been instrumental in helping to provide some technical training in this area, but it is still, at best, an occasional para-professional endeavor. Some of these organizations do offer "continuing education credits" for the completion of training, but these are usually only useful at employees' performance evaluations and carry little academic value.

Through the auspices of a two-year National Leadership Grant from the Institute of Museum and Library Services, Lee College, the Sterling Public Library and the Baytown Historical Museum have undertaken a project to digitize some 5,000 objects relating to the development of the petrochemical industry in southeastern Texas, as exemplified through materials donated by the Exxon Corporation to the three institutions. Since one of the primary predecessors of what is now the ExxonMobil Corporation was the Humble Oil Company of Baytown, Texas, the materials received relate directly to that portion of its history. The materials to be digitized include significant runs of refinery newsletters, corporate publications and a significant collection of historical photographs, all documenting the growth of Humble Oil as it is intertwined with that of the city of Baytown, Texas.

One of the other goals of the grant was to develop a curriculum in the practice of digital imaging, with an eye to providing the technical skills needed to perform the work to be done. This entails teaching full-semester courses on the practical techniques of inventorying, preparing and scanning the materials in the collections. These courses are being taught to adult students, some of whom will eventually pursue additional academic training and others of whom will pursue strictly technical careers. These courses are presently included as elective courses in a certificate program, and it is hoped that they will eventually become the basis for a separate certificate program.

161

van Zundert, Joris. NIWI-KNAW, Department of History
Laloli, Henek. NIWI-KNAW, Department of History
van Horik, René. NIWI-KNAW, Department of History

SkopeoPro: Durable Access To Image Archives Based On Open Standards.

Many institutions are creating digital collections of images. Photographs, drawings, paintings, maps and other types of visual sources are converted and made available with the help of information systems. A wide variety of digitisation methods, image specifications, metadata elements and IT infrastructures is applied. This rather heterogeneous situation can lead to a range of problems regarding standardisation and (technical) interoperability, e.g.:

* In general it is difficult to exchange data between individual digital collections because a wide range of incompatible metadata elements is used.
* The majority of information systems in use is built on proprietary data formats and metadata elements, which makes them difficult to maintain and to adjust to new requirements.
* Many archives and archival institutions are not able to cope with the technical and financial demands put forward by providers of IT solutions for image description and archiving.

The proposed paper will report on the research project "SkopeoPro", carried out by the Netherlands Institute for Scientific Information Services (NIWI), which is part of the Royal Netherlands Academy of Arts and Sciences (KNAW).
The main goal of this project is to develop an information system that enables the use of sets of adjustable descriptors (called application profiles) to document and disseminate digital visual sources. The possibility of exchanging these application profiles for viewing or exchanging data is an important target within the project.
Users of the system will be able to freely create and modify descriptors for their own use. Descriptors may therefore be based on well-accepted standards and guidelines, but they may also be highly adapted and specialised to fit a particular set of descriptors in use by a given archive.
To facilitate the aforementioned interoperability and adaptive behaviour, the project is based strictly on open standards. The resulting system will therefore itself be available as open-source software.
The paper will elaborate on the problem domain covered by the project and will describe the methods and techniques used to create a solution. Finally, a discussion of further deployment of the project and its relation to other projects will be part of the paper.

Application profiles
Within the cultural heritage community it is common sense and good practice to add metadata to sources for the use and management of digital visual resources. Several metadata standards and guidelines exist to enable functions such as the administration, exploitation, digital archiving and resource discovery of digital assets. Based on the literature and our own experience, one can observe that organisations "mix and match" existing metadata element sets for local use. What happens here is that organisations create specific, adjusted application profiles to facilitate the creation of metadata.
The project "SkopeoPro" tries to develop tools for the creation, adjustment and reuse of application profiles. An application profile is a specific metadata schema that consists of elements drawn from one or more existing namespaces, combined by implementers and optimised for a particular local application (i). A namespace declares the names and definitions of vocabulary terms. The Dublin Core element set (DCMES) is an example of a namespace (ii).


By having access to a wide range of reusable and interoperable application profiles used by related institutes, an organisation does not have to go through a labour-intensive process to compile a metadata element set from scratch.
 

Interoperability
It can be observed that in most cases the metadata of digital image collections is rather specific. Alongside metadata descriptors of local importance, elements from standard namespaces are often applied. The "SkopeoPro" project enables cross-collection searching by enabling automatic and manual mapping between locally used elements and elements used in any other particular set of elements or application profile. The observation that mapping will partly have to be done manually is supported by the following observation from the Schemas-Forum project: "From their various perspectives, [...] diverse communities seem to be reaching the same conclusion: the mapping process can be automated only in part, and that manual intervention by experts is usually needed to complete (or correct) the job".
Besides cross-collection or cross-schema search possibilities, the "SkopeoPro" system should make it possible to provide a view, according to any particular profile, of data that was originally defined or created in accordance with another profile. This will enhance possibilities for different users (e.g. archives) to view and interchange data from different sources.
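A toy sketch of this mapping idea: a record described with a local application profile is re-expressed ("viewed") through Dublin Core by way of a partly manual crosswalk. All field names and the sample record are invented for illustration and are not SkopeoPro data structures.

    # Sketch: view a locally described record through Dublin Core elements.

    # Local profile element -> Dublin Core element (partial, hand-made crosswalk).
    LOCAL_TO_DC = {
        "titel": "dc:title",
        "maker": "dc:creator",
        "datering": "dc:date",
        "onderwerp": "dc:subject",
    }

    def view_as_dc(local_record):
        """Return the Dublin Core view of a record described with local elements."""
        dc_view = {}
        for field, value in local_record.items():
            dc_element = LOCAL_TO_DC.get(field)
            if dc_element:                 # unmapped fields need manual attention
                dc_view.setdefault(dc_element, []).append(value)
        return dc_view

    record = {"titel": "Gezicht op Amsterdam", "maker": "Onbekend", "datering": "1880"}
    print(view_as_dc(record))

The fields that fall outside the crosswalk are exactly the cases where, as the Schemas-Forum observation suggests, expert intervention is needed to complete or correct the mapping.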

Open standards
The "SkopeoPro" project explicitly uses only open standards. The main reason for this is that in the future new components may be added with less effort and independently from licensing and copyright issues. Open standards also imply that other projects and researcher can use results from the project more easily. Standards and open techniques used by the project at this moment are: Java, XML, XML Schema, HTML, Public Licensing.

--------
i A detailed discussion on the concept of application profiles can be found in: Heery, Rachel and Patel, Manjula, 'Application profiles: mixing and matching metadata schemas', Ariadne, 25 (2000), <http://www.ariadne.ac.uk/issue25/app-profiles/intro.html>.
ii For more information on the Dublin Core namespace, see: <http://www.dublincore.org>.

163

Macy, Laura. Macmillan Limited

From Print to Online

When the second edition of The New Grove Dictionary of Music and Musicians was published in print and online (as grovemusic.com), the editor, Stanley Sadie, was often asked in interviews whether he thought the online would usurp the print. 'Oh no,' he would answer, 'the real Grove user will always prefer to sit in an armchair and read a book.' Sadie's 'real Grove user' was not familiar to me, or to the many others who had been browsing, skimming, and occasionally carefully reading New Grove since its publication in 1980. But as he always did, Sadie captured in his answer something more than the truth about Grove. He captured its mythical status. However any individuals may have used 'The Dictionary of Music and Musicians' in the 120 years since its first appearance, 'Grove' was a Book…that one read… preferably in an armchair …preferably one of leather in a book-lined library.

Bringing that aristocratic old gent of a book into the world of hypertext and metadata without killing him off entirely (and with him all he has to offer), has preoccupied me for the past year, and I’ve learned much along the way. Every practical step in the development process carries an implicit philosophical issue: how to find the balance between the demands of a long-standing work with a century of expectation behind it and those of an online product with its own opportunities and expectations. I’m not the first to be confronted with such a task, and I won’t be the last. As it becomes increasingly evident that the online world is the ideal environment for reference content, more works with distinguished print histories join our ranks every day. It is my hope that this brief account of my experience – and that of my colleagues – developing Grove as an online product will be of use to those undertaking similar projects.

With the particular issue of a print history in mind, this paper will critically address some of the principal aspects of online development.

Content development: This has been the single most important challenge I have faced. One of the greatest opportunities offered by online delivery is that of currency. Theoretically, an online encyclopedia can have up-to-the-minute content and bibliography. The reality is tempered by the expectations of a market accustomed to the quality control possible in a ten-year long print project. How do we accelerate the editorial process to meet the expectations of an online community without compromising the integrity of a work known for its considered and thoughtful content?

Site design: A core market that is not known either for its computer savvy or its fashion consciousness demands a site design that is as clear and unencumbered as possible. As we learn more about how the site is being used, we continue to refine Grove’s design and interface.

Links: Any site on the worldwide web is partly a door to other sites. This role is a natural one for Grove, whose bibliographies have always been intended as a starting point for further research. Now that the web is a source for researchers, links to the best online sites are an obvious extension of our bibliographic policy. But Grove’s bibliographies are selective, and so must be our links. It was necessary to develop and implement an assessment system for new links and a regular review of existing ones. Early enthusiasm for adding as many links as possible has been replaced by a policy of quality over quantity.

Illustrative material: This is a complicated issue and an ongoing challenge. The print edition included many photographs, figures and musical examples for which we could not obtain online rights. But more importantly, both the types and sources of illustrative material available online are different from print. If our illustrative material online is to meet the standards set by the book, we must approach illustration from an established policy on what we illustrate and how, rather than simply making the best use of what we can find. This is easier said than done. Most complicated – and somewhat particular to Grove – is the issue of sound. Online delivery makes it possible to offer sound as well as notated musical examples. But this is the area in which we are perhaps most vulnerable to accusations of commercialism, populism or just plain bad taste. We have had to tread these waters carefully indeed.

164

Spaeth, Donald. University of Glasgow

Meanings of Possessions: A Textbased Approach to the Identities of the Middling Sort

Summary
This project uses computer analysis of probate records to study the middling sort in seventeenth-century England, and investigates the viability of tagged XML textbases for the representation and analysis of electronic documents. As complex documents, inventories provide a good test of the potential of textbases which replicate the full content of records. Detailed evidence about household interiors, often discarded, will be examined to explore whether the ways people organised objects reveal the cultural meanings which they vested in their possessions and space, and thus how they defined themselves. Were there differences within the middling sort in the organisation of interiors, and how did these change over time? Related questions about the 'rebuilding' of seventeenth-century England, the gendered use of space, and changes in productive/domestic, and public/private, space will also be considered.

Context
Recent years have witnessed renewed interest in the middle ranks of society. Early modernists have documented the seventeenth-century emergence of a 'middling sort', who, with the better and poorer sorts, formed a familiar triad. Yet, while recent studies of Bradford and Halifax have located the 'making' of the middle class in industrialising towns, others (such as Dror Wahrman) have challenged the idea that economic change created a self-identifying middle class. Most historians now agree that cultural factors - including family, sociability and consumerism - were as important to middle-class identity as material factors and the nature of work. Jürgen Habermas's identification of the emergence of a bourgeois 'public sphere', a literary arena for reasoned debate among individuals, has been influential. Yet the 'middling sort' should be understood on its own terms, and not viewed merely as a nascent middle class. We know most about its members in eighteenth-century towns, less about them before 1700 i

Probate inventories provide a rich source for understanding the cultures of the middling sort. Inventories have been intensively worked in studies of agrarian practice, and are a key source in the growing literature on the emergence of consumerism. Yet inventories provide better evidence of ownership and use of household goods than of wealth or consumption. Margaret Spufford has demonstrated how misleading their valuations can be. The search for consumerism has focused attention on the purchase of new types of goods, while other objects are ignored. Although inventories give numerous details about where goods were located in a house and their proximity to other objects, this information has received little systematic analysis. Stana Nenadic's analysis of domestic culture in eighteenth-century Scotland shows how revealing such details can be. By showing how the 'middling' organised their possessions, inventories may help us to understand how they constructed domestic and productive space. As Pierre Bourd

Inventories do not tell the full story, of course, and should be related to other sources, including wills, lists of ratepayers and parish officers, and to other evidence of material culture. Although the survival of inventories for others besides the middling sort provides opportunities for social comparison, occupational descriptions are imprecise and must be used with care. It is also possible that the locations of goods reflect the categorisations of appraisers more than those of residents. The decisions and identities of appraisers, who were most likely drawn from the same social network as the deceased, must also be studied. 

Methods
Several recent systematic studies of inventories have omitted details of household goods, because the complexities of the source are difficult to represent in a conventional database. An inventory takes the form of a hierarchy, consisting of a list of items, each containing a list of goods. Although an item may include numerous objects, it nonetheless has only a single valuation, making it difficult to know the values of single objects. Non-standardised spellings and descriptions add further complications. Some researchers have adopted a 'questionnaire' approach, making no attempt to record every object, but asking whether certain key goods were present, and recording the overall number or value of certain goods. This approach unfortunately discards most of the contents of an inventory. Because each house is treated as if it had only one room, it is impossible to study how residents organised the house. This may explain why such studies have found few differences in the consumer behaviour of social groups

One exception to the 'questionnaire' method is the pioneering work of Mark Overton, who has developed his own encoding system and software to retrieve details from inventories. While we share a source-oriented philosophy, my approach nonetheless differs from his in several ways, due to my use of XML and its transformation language. A key requirement for identifying the meanings of possessions is the ability to explore relationships between objects, whether in the same or different parts of the house, a requirement that XSLT appears to fulfil. My investigations so far indicate that XSLT can be used to retrieve information, study relationships between objects in the same or different parts (nodes) of the hierarchy (tree), and standardise or classify content. It also has the potential to link content from several digital sources. As an international standard, XML is also increasingly widely used and thus facilitates the analysis and exchange of data. Most importantly, while Overton's tagging rules were d

A case study of around 400 inventories (and associated wills) from seventeenth-century Thame will be used to assess the potential of an XML-tagged textbase. A small market town, Thame had a mixed population, permitting study of the middling sort involved in agriculture, trade or crafts. The digitisation of these records as text files, preserving the original layout and spelling, by the Thame Research Group provides a head start. Records from both diocesan records and the Prerogative Court of Canterbury are included, ensuring that wealthier inhabitants are represented. The probate records will also be linked to other databases of tax lists, poor relief records and parish officers created by the Research Group.
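As a toy illustration of the kind of structural query discussed above (which objects shared a room?), the following Python fragment uses XPath rather than the project's XSLT, and the encoding shown (inventory, room, item and object elements with their attributes) is entirely hypothetical.

    # Sketch: query a toy XML inventory for objects kept in the same room.

    from lxml import etree

    sample = """<inventory>
      <room name="hall">
        <item value="10s"><object>table</object><object>six stools</object></item>
      </room>
      <room name="chamber">
        <item value="20s"><object>featherbed</object><object>bolster</object></item>
        <item value="2s"><object>looking glass</object></item>
      </room>
    </inventory>"""

    doc = etree.fromstring(sample)

    # Which objects were kept in the same room as a featherbed?
    for room in doc.xpath('//room[.//object[contains(., "featherbed")]]'):
        co_located = [o.text for o in room.xpath('.//object')]
        print(room.get("name"), co_located)

The point of such queries is exactly the one made above: the single valuation attaches to the item, while the relationships of interest hold between objects and the rooms that contain them.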

References
Bourdieu, P. Distinction: A Social Critique of the Judgement of Taste (London, 1984).
Davidoff, L., and Hall, C. Family Fortunes: Men and Women of the English Middle Class 1780-1850 (London, 1987).
Habermas, J. The Structural Transformation of the Bourgeois Public Sphere (1989).
Kent, J. 'The rural "middling sort" in early modern England, circa 1640-1740: some economic, political and socio-cultural characteristics', Rural History 10 (1999), 19-54.
Koditschek, T. Class Formation and Urban-Industrial Society: Bradford, 1750-1850 (Cambridge, 1990).
Nenadic, S. 'Middle-rank consumers and domestic culture in Edinburgh and Glasgow 1720-1840', Past & Present 145 (1994).
Platt, C. The Great Rebuildings of Tudor and Stuart England (London, 1994).
Shammas, C. The Pre-Industrial Consumer in England and America (Oxford, 1990).
Smail, J. The Origins of Middle-Class Culture: Halifax, Yorkshire, 1660-1780 (Ithaca, 1994).
Spufford, M., 'The limitations of the probate inventory', in English Rural Society, 1500-1800, ed. J. Chartres and D. Hey (Cambridge, 1990).
Wahrman, D. Imagining the Middle Class: The Political Representation of Class in Britain, c. 1780-1840 (Cambridge, 1995).
Weatherill, L., Consumer Behaviour and Material Culture, 1660-1760 (London, 1988).
Wrightson, K. ''Sorts of people' in Tudor and Stuart England'. In J. Barry and C. Brooks, eds., The Middling Sort of People (Houndmills, 1994).
World Wide Web Consortium, XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/1999/REC-xslt-19991116 (1999).

168

Oronoz, Maite. University of the Basque Country
Aldezabal, Izaskun. University of the Basque Country
Aduriz, Itziar. University of the Basque Country
Arrieta, Bertol. University of the Basque Country
Díaz de Ilarraza, Arantza. University of the Basque Country
Atutxa, Aitziber. University of the Basque Country
Sarasola, Kepa. University of the Basque Country

The Design Of A Digital Resource To Store The Knowledge Of Linguistic Errors

0. Introduction

In this paper we present the design of a digital resource which will be used as a repository of information on linguistic errors. As a first step in the design of this database, we made a classification of possible errors. This classification is based on information contained in Basque grammars (Alberdi et al., 2001; Zubiri, 1994) and on our previous experience in representing the knowledge of language students during their learning process (Díaz de Ilarraza et al., 1997). Besides, it has been carried out in collaboration with linguists of our group (http://ixa.si.ehu.es). With the purpose of validating this classification, a questionnaire was presented to experienced Basque teachers and to proofreaders from newspapers and publishing houses. With their advice we completed a classification of possible errors. We designed a Zope interface (Zope is a framework for building web applications that lets you connect to external databases (Latteier et al., 2001)) so that linguists and experts in the subject will be able to introduce, through the Internet, any error found in a corpus (along with its corresponding information).

The importance of this database lies in the fact that the information contained in it will be used in two different, but in some sense complementary, projects: i) a robust Basque grammar corrector that would be added to the already existing spelling corrector (Agirre et al., 1992) developed by our research group and integrated into Microsoft tools, and ii) a Basque tutor for syntax correction that would improve the one existing now (Díaz de Ilarraza et al., 1998), which includes different facilities for giving adapted advice on morphological questions.

1. Classifying the errors

As mentioned, in order to make a thorough classification of the mistakes users might make, we used as a basis a set of Basque grammars and our previous experience in error classification (Maritxalar, 1999), and we followed the advice of some linguists in our group. Besides, we contrasted our classification with other work on error typology (Becker et al., 1999).

This way, we obtained a classification in which all errors were divided into five main categories:

- Spelling errors

- Morphological, syntactical or morphosyntactic errors

- Semantic errors

- Punctuation errors

- Errors due to the lack of standardisation of Basque

Each category was subcategorised so as to make a classification as detailed as possible.

2. The questionnaire

Our error classification is focused on Basque. It must be taken into account that, as the standardisation of Basque started in the late sixties, it has not yet been completed. The Basque Language Academy (Euskaltzaindia) periodically publishes rules for the standardisation of the language, but they do not cover all its aspects. So, sometimes it is difficult to decide whether a given structure may be considered standard or not.

All these characteristics made it more difficult to create a proper error classification for Basque. Therefore, as we assumed that learners of Basque and language professionals do not make the same mistakes with the same frequency, we prepared a questionnaire in which experienced Basque teachers and proofreaders were asked about two aspects. We wanted to know whether all the errors we considered were actually errors and, if this was the case, what their frequency of occurrence was in the kind of texts they usually work with. In the near future, we intend to continue implementing rules (Gojenola et al., 2000) for the detection of errors, starting with those ranked with the highest frequency in the questionnaire.

3. The error database

We carried out the design of the database with two objectives in mind: to be open and flexible enough to allow the addition of new information, and to be user-friendly (the interface for human users has to be designed as an easy-to-use tool). We designed a simple, standard database to collect errors of the different types mentioned in point 1. The database will allow anybody with permission to update it over the network.

The database is composed of four entities: error, category, text and correction. In the entity named 'error', we store, among other things, the following technical information: whether the error is automatically detectable/rectifiable and, in such a case, which tool is the most appropriate to detect/correct it. Some psycholinguistic information is stored too: the level of language knowledge at which errors occur and the level of language knowledge at which errors are expected to be corrected. This psycholinguistic information will be used to improve a computer-assisted language-learning environment developed in our group. Depending on the student's level of language knowledge, the system will give advice and will or will not correct the error.

We also specify the origin of the error (e.g. influence of Spanish) and its possible cause. For each occurrence of the error in each sentence, an attribute with a value ranging from 0 to 5 indicates to what extent we are sure that it is really an error in this context. A given word or structure might always be considered an error, or it might be considered an error only in some given contexts (e.g. "The bread ate John" might be correct in poetry).
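A sketch of how the four entities and a few of the attributes mentioned above might be laid out as a relational schema; the column names and types are illustrative only and do not reproduce the project's actual design.

    # Sketch: the error/category/text/correction entities as SQLite tables.

    import sqlite3

    conn = sqlite3.connect("errors.db")   # hypothetical database file
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS category (
        id INTEGER PRIMARY KEY,
        name TEXT                        -- e.g. 'spelling', 'morphosyntactic'
    );
    CREATE TABLE IF NOT EXISTS "text" (
        id INTEGER PRIMARY KEY,
        sentence TEXT                    -- the corpus sentence containing the error
    );
    CREATE TABLE IF NOT EXISTS error (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES category(id),
        text_id INTEGER REFERENCES "text"(id),
        auto_detectable INTEGER,         -- 0/1: detectable by current tools?
        detection_tool TEXT,             -- most appropriate tool, if any
        learner_level TEXT,              -- level at which the error appears
        correction_level TEXT,           -- level at which it should be corrected
        origin TEXT,                     -- e.g. influence of Spanish
        certainty INTEGER CHECK (certainty BETWEEN 0 AND 5)
    );
    CREATE TABLE IF NOT EXISTS correction (
        id INTEGER PRIMARY KEY,
        error_id INTEGER REFERENCES error(id),
        corrected_form TEXT
    );
    """)
    conn.commit()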

4. Conclusions and future work

This work presents how experts' knowledge can be used to improve a classification of errors and how these errors could be encoded in a database. For the future, we intend to fill this database using network technologies. With this information we will continue implementing syntactical patterns for the detection of the most frequent errors. We also intend to use the information encoded in the database to construct a grammar checker. This grammar checker will be adapted to complete an intelligent computer-assisted language-learning environment for Basque (Díaz de Ilarraza et al., 1998).

Bibliography

[Agirre E., Alegria I., Arregi X., Artola X., Díaz de Ilarraza A., Maritxalar M., Sarasola K., Urkia M.] "Xuxen: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology" Proceedings of ANLP'92, 119-125, Povo Trento, 1992.

[Alberdi X., Sarasola I.] "Euskal estilo libururantz. Gramatika, estiloa eta hiztegia" Euskal Herriko Unibertsitateko argitalpen zerbitzua. Bilbo. 2001.

[Becker M., Bredenkamp A., Crysmann B., Klein J.] "Annotation of Error Types for German News Corpus" Proceedings of the ATALA workshop on Treebanks, Paris. 1999.

[Díaz de Ilarraza A., Maritxalar A., Maritxalar M., Oronoz M.] "Integration of NLP Tools in an Intelligent Computer Assisted Language Learning Environment for Basque: IDAZKIDE" Proceedings of Natural Language Processing and Industrial Applications Moncton, Canada. 1998.

[Díaz de Ilarraza A., Maritxalar M., Oronoz M.] "Reusability of NLP tools for detecting rules and contexts when modelling language learners' knowledge" Proceedings of Recent Advances in NLP (RANLP97), 342-348. Tzigov Chark (Bulgaria). 1997.

[Gojenola K., Oronoz M.] "Corpus-Based Syntactic Error Detection Using Syntactic Patterns" NAACL-ANLP 2000 Student Research Workshop. Seattle. April 2000.

[Latteier A., Pelletier M.] "The Zope Book". New Riders. ISBN: 0735711372. July, 2001.

[Maritxalar M.] "Mugarri: Bigarren Hizkuntzako ikasleen hizkuntza ezagutza eskuratzeko sistema anitzeko ingurunea" Computer Science Faculty, UPV-EHU, Donostia, 1999.

[Zubiri I.] "Gramática didáctica del euskera" Didaktiker, S.A. Bilbo. 1994.

169

Schreibman, Susan. Maryland Institute for Technology in the Humanities

The Versioning Machine

Peter Robinson's 1996 article, "Is There a Text in These Variants?", lucidly describes the challenge of editing texts with multiple witnesses in an electronic environment. The article raises questions about texts and textuality, exploring where the "text" resides: is it to be found "in all the editions and printings: in all the various forms the text, any text, may take" (99), or is it in the imperfect transmission of the text over time? His thesis, "to consider how the new possibilities opened up by computer representation offer new ways of seeing all the various forms of any text - and, with these, the text beneath, within, or above all these various forms" (99), is a challenge current interfaces for electronic editions have not fully embraced. 

Robinson's own Hengwrt Chaucer Digital Facsimile and The Blake Archive are two excellent examples of electronic scholarly editions that do not edit "from one distance and from one distance alone" (Robinson 107), but explore the text that emerges both in and between the variants, in both the text's linguistic and bibliographic codes. Both these projects, however, utilize commercial software systems to publish their texts. In the case of The Hengwrt Chaucer Digital Facsimile, Robinson's own publishing system, Anastasia: Analytical System Tools and SGML/XML Integration Applications (http://www.sd-editions.com/anastasia/index.html), is utilized. In the case of The Blake Archive, Dynaweb (http://www.inso.com/) is used. Both these publication tools require, in the case of Anastasia, some knowled

The Versioning Machine that Jose Chua, Amit Kumar, and Susan Schreibman are developing at the Maryland Institute for Technology in the Humanities (MITH) is an open-source tool that will be freely distributed to individuals and not-for-profit organizations under a general public license. It will display XML-encoded texts that are both deeply encoded and exist in multiple witnesses. A prototype of this tool, developed by Jose Chua and Susan Schreibman at New Jersey Institute of Technology, was presented as a poster session at last year's ACH/ALLC in New York. The present project will build upon Chua and Schreibman's work to create a more robust tool which more fully addresses problems of encoding multiple witnesses and displaying them. 

To limit development time, the first iteration1 of The Versioning Machine will provide users with one publishing environment to display multiple witnesses. It will presume that users will use The Text Encoding Initiative (TEI) Document Type Definition following the Parallel Segmentation Method (TEI P4 19.2.3). It will also support the TEI's "Transcription of Primary Sources" tagset for indicating altered text, and will be based on the "Base Tagset for Prose". XSLT and CSS style sheets will also be provided to transform and style texts based on this DTD.2
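As a minimal illustration of what parallel-segmentation encoding makes possible (our own sketch, not the project's code), the reading of a single witness can be pulled out of an apparatus-encoded file by keeping only the <rdg> elements attributed to that witness. The element names follow TEI usage; the sketch assumes an un-namespaced P4-style document, and the file name and witness identifier in the final comment are hypothetical:

    import xml.etree.ElementTree as ET

    def witness_text(tei_file, wit_id):
        """Return the running text of one witness from a parallel-segmented TEI file."""
        body = ET.parse(tei_file).getroot().find(".//body")

        def render(elem):
            parts = [elem.text or ""]
            for child in elem:
                if child.tag == "app":
                    # keep only the reading(s) attributed to the requested witness
                    for rdg in child:
                        if wit_id in (rdg.get("wit") or ""):
                            parts.append(render(rdg))
                else:
                    parts.append(render(child))
                parts.append(child.tail or "")
            return "".join(parts)

        return render(body)

    # e.g. witness_text("poem.xml", "#ms1") -- file name and witness id are hypothetical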

The Versioning Machine will also display scholarly apparatus appropriate to the version of the text being displayed. Our XSLT stylesheet will reflect a robust typology of notes to indicate this apparatus. In many cases, a note may persist across several witnesses. In these cases, readers will be provided with visual cues indicating whether they have previously read a particular note (for example, unread notes may appear in yellow, and read notes in red). 

Another particularly challenging editorial issue has been the representation of the intra-textual writing process. When working with a particular manuscript draft, an editor may make an educated guess as to when additions, deletions and emendations were made (based on handwriting, ink colour, the logic of the sentences, etc). Indeed, one of the beauties of publishing facsimile editions is that no editorial statement need be made in this regard - it is left up to the reader to decide on the writing process based on an examination of the evidence. Of course, this textual ambiguity can be replicated in the digital environment by publishing high quality scanned versions of text. Unfortunately, however, encoded texts allow no such ambiguity. Representing the intra-textual revision process in an encoded edition is a particularly interesting and challenging problem. In these cases, relative ordering may ease the encoding process. We are experimenting with assigning ordering comparative identifications in XML

This tool is being designed for both research and teaching purposes. It will presume that prior to using this tool, users are familiar with the general concepts of scholarly editing, as well as the TEI (particularly its tagsets for "Transcription of Primary Sources", "Certainty and Responsibility" and "Critical Apparatus"). Documentation will be provided so that encoders can take their work to the next level: a web-deliverable publishing environment for displaying a critical edition featuring multiple witnesses of texts. 

Texts used to display the functionality of The Versioning Machine will come from several current projects at the Maryland Institute for Technology in the Humanities, including the Dickinson Electronic Archive (http://jefferson.village.virginia.edu/dickinson/) and The Thomas MacGreevy Archive (http://jefferson.village.virginia.edu/macgreevy).
 

Bibliography

Robinson, Peter. "Is There a Text in These Variants?". In The Literary Text in the Digital Age. Richard J. Finneran (ed.). Ann Arbor: The U of Michigan Press, 1996; rpt. 1999. pp. 99-116.

Sperberg-McQueen, C. M. and Lou Burnard. The Text Encoding Initiative Guidelines P4: XML Compatible Edition. 2001. http://www.tei-c.org/P4X/
1 A second phase of the project will include providing alternate DTDs based, for example, on encoding multiple witnesses as separate texts.
2 As all our code will be open source, users who have the appropriate skills may alter our style sheets, or indeed, substitute their own to reflect their own method of tagging.

170

Lundberg, Sigfrid. Lunds Universitet

St Laurentius Digital Manuscript Library: An Excursion Along The Border Between Metadata For Resource Discovery And For Resource Description

Medieval manuscripts are complicated. Each manuscript is a unique individual with its own history. Often, though not always, manuscripts contain several pieces of intellectual content, each of which may be well known in the sense that it appears in many manuscripts from the same period and may appear in printed editions today. Cataloguing medieval manuscripts therefore requires a complex descriptive metadata schema, capable of capturing the individual manuscript both as a work of art and as a unique blend of well-known intellectual content.

That is, you should be able to search for works by (say) Boethius or Virgil. But you should also be able to search for manuscripts that have been owned by (say) the monastic society of Lund cathedral, or manuscripts that were illuminated in Italy. The S:t Laurentius Digital Manuscript Library [1] is, among other things, an attempt to combine these two aspects. The collection of medieval manuscripts at Lund university library is being digitized and catalogued using the Master XML DTD [2] developed by the Master project [3].

These records may be formatted as electronic texts using XSLT and a text formatting system. In addition, the Master descriptive metadata is being transformed, again using XSLT, into a format more suitable for loading into a database.

In that database there is one record for each manuscript, that is, one record describing the actual physical object. In addition, each manuscript contains intellectual content described as manuscript items in the Master description scheme. These items may be nested, such that a manuscript item can have multiple parts and may itself be part of some other item. In the Laurentius database, these items are individual records linked to other records through isPartOf or hasPart relations.
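A sketch of the kind of transformation described above might look as follows; it assumes MASTER-style <msContents> and nested <msItem> elements, and the record identifiers and field names are purely illustrative:

    import xml.etree.ElementTree as ET

    def item_records(ms_file, ms_id):
        """Flatten nested msItem elements into one record per item, linked by
        isPartOf / hasPart relations (illustrative sketch only)."""
        records = []

        def walk(elem, parent_id):
            for n, item in enumerate(elem.findall("msItem"), start=1):
                rec_id = f"{parent_id}-{n}"
                records.append({
                    "id": rec_id,
                    "title": item.findtext("title", default=""),
                    "author": item.findtext("author", default=""),
                    "isPartOf": parent_id,
                })
                walk(item, rec_id)          # nested items become children of this record

        contents = ET.parse(ms_file).getroot().find(".//msContents")
        if contents is not None:
            walk(contents, ms_id)

        # derive the reciprocal hasPart relations from isPartOf
        children = {}
        for r in records:
            children.setdefault(r["isPartOf"], []).append(r["id"])
        for r in records:
            r["hasPart"] = children.get(r["id"], [])
        return records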

All records, at manuscript level as well as item level, are loaded into a single database which is searchable through Z39.50 [4], making it compatible (in principle) with our library OPAC. Laurentius can deliver search results in MARC.

However, the Laurentius database has a much more intricate search attribute architecture. For instance, the bib-1 attribute set [5] defines a search attribute 'name', and so does Laurentius. All the different kinds of names defined by the Master DTD are searchable through that attribute. However, Laurentius uses a hierarchical search attribute structure, such that 'name' is an aggregate of fields connected to locally defined names: historical names, names of historical persons involved in the origin of the manuscripts, persons involved in the acquisition of the object, and so on.

The same reasoning is applied to place names, dates and so forth. Even plain text (i.e., descriptive prose not bound to any particular field in the database) is entered in this hierarchical structure. This means that we can define a search attribute 'history' combining all historical data, and 'place' through which all place names can be searched, but there is also an attribute 'history-acquisition-place' for searching on this aspect of our collection.
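The hierarchical attribute structure can be pictured as a simple tree in which an aggregate attribute expands into the concrete index fields beneath it. In the sketch below, 'name', 'history', 'place' and 'history-acquisition-place' come from the description above, while the remaining field names are invented for illustration:

    # Illustrative only: 'name', 'history', 'place' and 'history-acquisition-place'
    # are taken from the description above; the other field names are invented.
    ATTRIBUTE_TREE = {
        "name":    ["name-historical", "name-origin-person", "name-acquisition-person"],
        "history": ["history-origin", "history-acquisition", "history-acquisition-place"],
        "place":   ["place-origin", "history-acquisition-place"],  # a field may feed several aggregates
    }

    def expand(attribute):
        """Expand an aggregate search attribute into the concrete index fields it covers."""
        fields = ATTRIBUTE_TREE.get(attribute)
        if fields is None:
            return [attribute]            # a leaf field: search it directly
        expanded = []
        for f in fields:
            expanded.extend(expand(f))
        return expanded

    # A query on 'place' is thus run against 'place-origin' and 'history-acquisition-place'.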

References
----------
[1] http://laurentius.lub.lu.se/
[2] http://www.hcu.ox.ac.uk/TEI/Master/Reference/DTD/masterx.dtd
[3] http://www.cta.dmu.ac.uk/projects/master/index.html
[4] http://lcweb.loc.gov/z3950/agency/
[5] ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt

172

Rae, Jan. The Open University

Value Added Video: The OU’s Video Data Capture ‘Lab’

The UK Open University uses video data capture as part of its software resource development process. A lab has been set up that enables formative evaluation and usability testing to be carried out, and ‘evidence’ of ‘participant’ actions and thoughts to be recorded so that it can be made available to ‘observers’ for a variety of purposes.

The primary aims of the video data capture ‘lab’ are twofold: first, for the space to create an encouraging, social environment in which participants can feel free to expose their views and voice their experiences of the software resource under scrutiny; second, for the video data technology to ‘capture’ both the participants’ ACTIVITY while using the resource and their ARTICULATION of the perceived strengths and weaknesses of the software from their point of view.

The OU’s Lab has separate ‘observer’ and ‘participant’ spaces. Participants in their space feel that their views are being offered freely, without duress or coercion, and that they will be accepted anonymously, which is important for preventing any sense of becoming either ‘friend’ or ‘victim’ of the investigation. Observers, in their separate area of the lab, can talk with one another without being overheard by the participants and, if necessary, impose control. Communication between participants and observers is made possible via headsets and microphones, so that both can clarify any problems or explore any opportunities within the given scenario, and so that the observers can prompt activity or stimulate Thinking-Aloud protocol feedback, all of which is recorded. Multiple video images record keyboard and mouse activity simultaneously with the screen display and the participants' working environment.

Developing roles and developing guidelines ...

Observers have a vital role to play both during the sessions and later in the process of analysing, articulating and disseminating the findings to others. Seeing how participants have received a particular element of software design can be extremely illuminating for academics, editors, librarians, software designers, graphic designers, cartographers, and producers etc, all of whom can play a role at some point in the development of software resources. The interests of these various stakeholders in both the pedagogic and the software design of the resource must be accommodated by the process and can influence how the evidence is used.

Some will observe and play an active part in the video data capture recording session as a synchronous, 'live' event. Some may, in fact, be both observer and participant, learning from a reflexive use of video evidence. Others will have the video evidence presented to them as an asynchronous event after the original recording, offering a reflective use of it for stakeholders in the design of the software resource. The value of the video evidence lies in ‘showing’ real experiences of software resources in use to the range of players who might expect to become involved in the process of software creation.

As the role of those involved in the process of video data capture has evolved, so have the guidelines followed by the growing number of groups using the technique. This paper offers these guidelines as advice on various process elements that can lead to the production of usable video evidence, including:

  • the choice of appropriate participants that will be able to provide usable evidence
  • the ordering of participants to help get the best out of recording sessions
  • how to get permission from your participants to use the data that they provide
  • issues relating to data protection.

Video data capture does not require large numbers of participants to be involved; an understanding of which data capture paradigm will best inform your needs will help you to decide how many participants a particular investigation needs. Nielsen (1994) suggests that "it is possible to run user tests without sophisticated labs, simply by bringing in some real users, giving them some typical test tasks, and asking them to think out loud while they perform the tasks", but to do so without video capture would rob the process of the evidence for any later, asynchronous use.

Sessions should be ‘scripted’, though the script can be applied pragmatically. A script gives structure to the video capture session and allows the findings from different sessions to have a degree of standardisation that makes analysis easier. As with most data collection activities it is advisable to pilot your activity; any refinement or customisation that the pilot suggests will ultimately contribute to the overall value of the video evidence. Usability testing protocols and material available on the Web, for example from Information & Design or Steve Krug, may need modification to cater for the demands of UK Copyright and Data Protection regulations.

There are pitfalls that need to be avoided when capturing participants on video. They might sound basic but nevertheless they can happen. For example, don’t let your participants do all their talking before the video capture session begins and don’t let them get too lost in the software. Don’t let the session last too long and don’t let the participants feel it is they that are being ‘tested’ - as Krug says, "... we’re testing the site, not you".

Arts & Humanities ...

The paper will include illustrations of the OU’s video data capture Lab, which is still expanding. Examples of the OU's Faculty of Arts use of the Lab for Arts & Humanities software resources will be discussed, as well as findings from the literature on the value placed on video data capture as a source of empirical data. Issues of staff development and staff training in software processes will also be discussed.

Information & Design, Usability Testing Materials, available at: http://www.infodesign.com.au/usability/usabilitytestingmaterials.html

Nielsen, J. , 1994, Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier, available at: http://www.useit.com/papers/guerrilla_hci.html

Krug, S., sample test script, available at: http://www.sensible.com/

 

Sessions

118

Owen, Catherine. Performing Arts Data Service
Kilbride, William. Archaeology Data Service

Re-creating Times, Places and Spaces

Introduction

This experimental panel session seeks to exploit the shared experience of two very different sectors - archaeology and the performing arts. While comparisons between these sectors have been made in conventional scholarship, the implications of these shared challenges have not been explored in the context of digital resources. Rather than focussing on specific applications or methodologies, this session will explore the shared experiences of these sectors as they come to terms with the digital revolution. The panel session will engender a lively exchange of views.

As curators of digital collections we develop and refine methodologies to document, display and preserve data collections. What we must also address is what role the artefacts represented in those collections play in helping the user to understand the temporal events to which they refer and how this might impact on our curatorial activities.

There are some unexpected yet theoretically informative parallels between the performing arts and archaeology, which have consequences for how these radically different sectors approach digital resources. This is most obvious in ongoing conversations between services in the AHDS, but it is of wider relevance. Both the Performing Arts Data Service (PADS) and the Archaeology Data Service (ADS) are engaged in managing collections that encourage the re-creation or re-interpretation of events, activities and spatial entities. The many, at times surprising, parallels between performance and archaeological study, and between the kinds of materials created to document these activities, have led both services to investigate how working practices overlap, and where exchanges are possible or desirable.

This session brings together performers, archaeologists and academics who both archive and re-use digital resources in those disciplines to discuss the challenge of documenting the temporal and spatial. It will discuss whether digital environments offer new ways to approach these problems.

Abstract

Performing arts materials are inescapably about activity. The most passive of resources - a musical score, a play script - are the instructions for creating and re-creating movements of the body and contain directions which delineate both space and time. As users, we may not always wish to use these instructions to experience the activity ourselves, preferring instead perhaps to subject them to other forms of analysis, but much of our research is likely to depend on an understanding of how those instructions have been interpreted in the past and how they may be interpreted in the future. Practice-based research may lead us to multiple instances of the work, both as performed and interpreted by others, but also as performed and interpreted by ourselves as we endeavour to reach conclusions.

The archaeological record, at first inspection, is anything but active: it is static and passive. Yet to understand the archaeological record it is essential to understand the different activities that constructed it. At a basic level, we seek to understand the activities and practices of past populations through their surviving material culture. This material culture is also the subject of natural and anthropogenic post-depositional activities that cloud or restrict our ability to perceive the underlying human activity with which we are primarily concerned. Thus to reconstruct the material record we need also to understand the activities that have led to its modern form. Finally, and controversially, the practices of archaeology are themselves critical to the construction of the archaeological record. Excavation is a destructive process. The interpretations made and the significances attributed in the process of fieldwork are critical to the formation of the archaeological record since the process of d

Traditional archives and their digital counterparts in archaeology and the performing arts have an underlying problem of surrogacy. How can we move between the surrogate and often static, archival records and the cultural activities they purport to represent?

This problem is compounded by extraneous issues. Whilst the entertainment industry clearly recognises the threat to traditional commercial exploitation of its talent via the medium of the web, another reason for the lack of multi-media materials available to the research community is the reluctance of performers to allow any documentation of their work, either analogue or digital, and their tendency to restrict access to any materials that are created. This reluctance stems primarily from the recognition that it is impossible to re-construct objectively the meaning of a temporal event – effectively, the camera always lies (and so does the tape recorder). And, even for those performances which do not advertise themselves as improvisatory, any attempt at documentation can only illustrate a single instance of something which has had a past life and will continue developing.

Accepting that any documentation of the performance is necessarily subjective and can only offer glimpses into the whole, the way in which archives, both digital and analogue, choose to present their holdings can influence the potential for re-use. At the PADS, a common data structure links discrete objects to ‘parent’ records which detail the performance; in some cases a further hierarchical level details the work being performed. In effect, we are creating records for conceptual entities, rather than for material resources, in order to create a meaningful environment for our objects. This is at odds with standard library practice and has led to challenges in interpreting standards for resource discovery (which necessarily allow for retrieval of tangible objects) to describe temporal events which may involve several hundred contributors and acquire multiple meaningful dates.

In archaeology, the opposite situation pertains. While performers may discuss live performance, archaeological interpretation normally pertains to periods that are outwith the common or shared experience of scholarship. The record itself is constructed in the present - and as such the relationship between the objective record and the past to which it refers is obscure - indeed even the objectivity of the record is a matter of dispute. Given the scientific grounding of archaeological practices, this question of objectivity and subjectivity has been at the core of bitter and unresolved disputes among different theoretical traditions in archaeology. Given the detailed philosophical analysis that follows from the term "data" among archaeological theorists, it seems fair to ask what the Archaeology Data Service really means by data.

But what about data which is ‘born digital’ and which is not a surrogate for an analogue object or event? What about virtual realities, collaborative digital environments and digital performance resources? At first glance such resources appear more straightforward to collect and manage because they are designed for the digital environment. But they are very often also designed with temporal elements which cannot be easily documented or reconstructed, and may depend on bespoke or platform-dependent software which poses profound challenges to the archive responsible for preservation. In such cases, the artist(s), archaeologist(s) and archivist(s) may be forced to collaborate in the selection of materials to preserve and in the selection of formats which best represent the original intention in the performance or study.

162

Osborne, John M. Dickinson College
Anderson, Ian G. University of Glasgow
Gerencser, James W. Dickinson College.

Teaching, Learning, and Digitizing
Resources in the Humanities through the Cooperation of Educators and Archivists

This proposal recommends a panel to discuss the many benefits of cooperative efforts between educators and archival resource managers to utilize classroom assignments as a means to create digital assets while simultaneously educating students about technology and its applications and about primary resources and their research value.

This panel will explore strategies and possibilities for fruitful cooperation within the Humanities in the digital arena. The example of two continuing innovative ventures, involving institutions that contrast markedly in location, goals, mission, and institutional scale, will initiate discussion on ways learning, teaching, and digital collections in the humanities may cooperate to their significant mutual benefit.

The needs of the student, teacher, and archivist differ. Students require inspiration, a sense of investment in their work, and, if possible, practical experience with some elements of their education which otherwise would remain on the theoretical plane. Instructors require meaningful and "minds on" engagement in their classes and the surety that their endeavors will not only teach existing standards but will also explore continuing possibilities in their field. Managers of archival resources need to build and refine both their digital assets and their user base, often with poor or non-existent resources outside of "top-down" driven grants. The three presentations on this panel will serve to expand this discussion. Two will reflect the "user"/academic point of view of student and teacher, while the third will continue from this and stress the benefits and challenges for the archivist.

The first panelist will describe and discuss the history and development of an entry level Historical Methods class at a small liberal arts undergraduate college in Pennsylvania. Even though the use of primary materials in undergraduate instruction in the United States appears to be uncommon, this program has a thirty-year record of archival use. The recent innovation therefore exploits emerging technological developments by adapting and reforming a lengthy pedagogical experience. The main direction this has taken is that of user production of digital collections -- with appropriate historical context and illustration -- that are then archived at the site of the original source. This adjustment has offered significant advantages and rewards as well as challenges and valuable debate on topics that the Humanities face and will continue to face in the future. This first section will discuss these complexities from the point of view of teaching and learning at the undergraduate entry level and make recommen

The second paper describes, explains and analyses an alternative model for integrating learning, teaching and digital collection development that is both scalable and transferable. This model operates within a different geographic, educational and pedagogic framework but provides the same win-win-win scenario for instructors, students and end users as that at Dickinson College. For four years the Humanities Advanced Technology and Information Institute at the University of Glasgow has offered Arts Faculty honours (3rd and 4th year) and Computing Science MSc IT students an option in digitization (2D Digitisation: Theory and Practice). A significant proportion of student assessment (60% of honours students' final mark, 40% of MSc students' final mark) includes the completion of an independent, personal digitization project. In the first two years of the course, problems of access to, and availability of, suitable source material proved a significant stumbling block for students. This produced projects that

Furthermore, the digital objects that students produced did not represent any coherent collection that could be re-used, even where the individual objects were of a high quality. These problems impeded the achievement of the course objectives, particularly that of students demonstrating their skills with high-end materials and equipment about which they had been taught. Therefore, in order to minimise the time students spent locating and accessing materials, to provide a level playing field for assessment and to better fulfill course objectives, a collaboration was sought with the Hunterian Art Gallery, Special Collections and Photographic Unit at the University of Glasgow. This has provided students with access to material of significant historical and intellectual value (15th- and 16th-century Fine Art Prints) and to professional digitization equipment. This has not only had learning benefits for students, but has enhanced and developed the teaching practice of the instructor and provided the Hunterian Art Gall

The third panelist will discuss the unique possibilities afforded to archivists who team with educators to encourage the use of primary resources while developing the online accessibility and usability of local holdings. Digital technology challenges smaller archives and manuscript repositories, which need to balance often unreasonable user expectations and demands for increased digital access against the reality of limited staff and budgets. At Dickinson College, cooperation between the History Department faculty and the College Archivist has allowed those challenges to be turned into opportunities. While history students receive the benefits of a useful education, the Archives, in turn, receives the benefits that follow from the organization, digitization, and contextualization of its holdings. The innovative approach to teaching history at Dickinson, as introduced by the first panelist, has allowed exercises, once designed to instruct students in the use and nature of primary resources, to grow int

Discussion following the presentations will stress the common opportunities, costs, and benefits drawn from these examples. The session will stimulate consideration of a widely transferable strategy for such collaborations in the Humanities between learning, teaching and the archivist which can exploit the way that "users can create users" in the digital arena.

141

Hillyard, Mathew. Public Record Office
Sanderson, Rob. Liverpool University
Fraser, Michael. Oxford University

An Emerging UK Archival Network

A virtual archives catalogue for the UK is gradually evolving. At present there are different strands, funded by different bodies, presenting different catalogues, and at different stages of development. However, all are sufficiently standards-based to make interoperability a real possibility. This session looks at three highly innovative building blocks towards one-stop searching for archives in the UK: the English strand's (Access to Archives or A2A) use of XML; Z39.50 searching of multi-level catalogues across strands; and harvesting OAI data from the strands and presenting it through one portal.

---

Digital Access to Archives through XML

Mathew Hillyard

After centuries of the paper catalogue's monopoly, accepted research methodology has perhaps inevitably come to centre on the 'whole' physical finding aid - the bound tome on the sturdy bookshelf with its complete, sequential and contextually true record of all holdings past and present. And for the researcher standing in the flesh before banks of these logically arrayed bookshelves, the wealth of knowledge within arm's reach has always been unquestionably impressive.

But things have changed. This is the era of the virtual, the fast and the pre-processed; of the remote-control and the channel-hop; the ergonomically-designed and the user-friendly. The soundbite. Edutainment.

Providing a new form of (digital) access to archives (in A2A1) meant having to challenge the traditional research methodology in several ways: our researchers are now no longer there in the flesh, yet any word on any page of those catalogues should suddenly be within finger's reach. And as if that weren't enough, users want those select words on their screens within seconds - before online patience runs out and they're clicking off and away to some other corner of cyberspace.

So instead of starting with the 'whole' finding aid, we reversed the methodology. We start with the words, or the 'key' words to be precise. Using the XML2 technologies to hand, we build an instant, couple-of-kilobytes' worth of virtual finding aid. We snapshot those keywords, first framing them in their immediate contextual surroundings, then back-filling with the trail of salient headline information which preceded them. We do of course still also offer the traditional, full text, multi-Megabyte view of the finding aid but more than 90% of our users never ask for it...
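A very rough sketch of this 'snapshot' idea, using standard EAD element names (<c> components, <did>, <unittitle>) but entirely our own code rather than the A2A system's, might locate a keyword and back-fill the headline trail above it like this:

    import xml.etree.ElementTree as ET

    def keyword_snapshot(ead_file, keyword):
        """Return tiny 'virtual finding aid' fragments for a keyword: the matching
        text framed by the headline trail of unittitles above it (sketch only)."""
        tree = ET.parse(ead_file)
        parents = {child: parent for parent in tree.iter() for child in parent}
        hits = []
        for elem in tree.iter():
            if elem.text and keyword.lower() in elem.text.lower():
                trail, node = [], elem
                while node is not None:
                    # back-fill from components (<c>, <c01>...<c12>) and <archdesc>
                    is_component = node.tag == "c" or (node.tag.startswith("c") and node.tag[1:].isdigit())
                    if node.tag == "archdesc" or is_component:
                        title = node.find("did/unittitle")
                        if title is not None and title.text:
                            trail.append(title.text.strip())
                    node = parents.get(node)
                hits.append({"context": elem.text.strip()[:200],
                             "trail": list(reversed(trail))})
        return hits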

A2A can enjoy the luxury of focusing foremost on the keyword because, as a centrally collated and homogenised database of nationally distributed catalogues, it is almost a 'virtual network' in its own right (without any of the associated interoperability issues). But since the database is built on EAD3 and XML standards, we can exploit the same technologies we have used to focus on the keyword to focus instead on an alternative exposure with 'real networks' in mind. Perhaps this should be the gist of the whole, presented and exchanged in small, mutually decipherable packets of data which aren't unduly subject to processing overheads or to the congestion of network traffic.

With conventionally accepted metadata components in mind, we can filter out the five EAD elements specifically designed for the purpose: subject, person, place, family, corporate body. Alternatively, the hierarchical nature of EAD means we can cream off and expose just the highest level (or levels) of pertinent information. Or we can combine or mix-and-match both of these approaches. Yet another option might be to generate a 'Table of Contents' type representation of the whole based around heading information only - and offer that as the searchable synopsis. We could even do all of this and add extra gobbets of peripheral information on-the-fly, such as a stamp to show the date or time of extraction.
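For the first of these options, creaming off the five access-point element types could look roughly like the following sketch, where the element-to-concept mapping (subject, persname, geogname, famname, corpname) is the usual EAD one but the code itself is only illustrative:

    import xml.etree.ElementTree as ET

    # The usual EAD access-point elements for subject, person, place, family
    # and corporate body (the mapping is standard EAD; the code is illustrative).
    ACCESS_POINTS = {"subject": "subject", "persname": "person", "geogname": "place",
                     "famname": "family", "corpname": "corporate body"}

    def exposure_record(ead_file):
        """Cream off just the access points of a finding aid for network exchange."""
        root = ET.parse(ead_file).getroot()
        record = {label: set() for label in ACCESS_POINTS.values()}
        for elem in root.iter():
            label = ACCESS_POINTS.get(elem.tag)
            if label and elem.text and elem.text.strip():
                record[label].add(elem.text.strip())
        return {label: sorted(values) for label, values in record.items()}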

Whatever the forthcoming demands of the online researcher or the emerging requirements for networked information exchange, it is the use of XML standards within A2A which affords us all this flexibility, facilitates our interoperability, and guarantees the viability of this wealth of data for the future.

1 Access to Archives: http://www.a2a.pro.gov.uk

2 XML: http://www.w3.org/XML

3 EAD: http://www.loc.gov/ead

---

 

Rob Sanderson

A Distributed Archival Network Model and Implementation: The Archives Hub Version 2

After the success of the current Archives Hub1 service, the next step is to decentralise this federated database into a truly distributed service in which all repositories are responsible for maintaining their own data. However, interruption to the service must be minimal and all functionality currently available with the federated database must be maintained.

Use of the Z39.502 protocol, an ISO standard for information retrieval, and of the EAD data format for encoding archival finding aids makes this goal of decentralisation feasible; yet, with more than 70 repositories contributing data, many questions remained to be resolved.

Using the Scan Harvesting model presented by Professor Ray Larson at JCDL 20013, it is possible to discard the previous 'broadcast search' model, where the client was forced to interact with all of the servers in the network regardless of whether they had any matching records, and move to a fundamentally more efficient system of targeted searching while still in a dynamic and distributed environment.

By implementing a more standard relational database for metadata about the archival network, along with judicious caching enabled only through the Z39.50 Explain service and the emerging ZeeRex4 standard, it is possible to further streamline the system, allowing for real-time interaction that remains within the operating parameters for the service.

This paper will discuss these issues in the Archives Hub 2, how they were resolved, and how the solutions may be repurposed for similar networks. Further investigation is being carried out into the potential for a fully international archival network partnering with the California Digital Library5 in the United States, and progress on this front will also be reported.

1Archives Hub: http://www.archiveshub.ac.uk

2Z39.50: http://www.loc.gov/z3950/agency

3JCDL: http://www.acm.org/jcdl/jcdl01

4ZeeRex: http://explain.z3950.org

5CDL: http://www.cdlib.org

---

Fraser, Michael. Oxford University

Simple access to archival descriptions using the Open Archives Initiative Protocol

The Open Archives Initiative [1] was developed in response to a perceived need to provide interoperable access to archives of scholarly e-print literature. The Initiative has since developed to encompass a range of data providers. The OAI provides a lightweight (or 'low barrier') protocol for exposing and harvesting metadata. A data-provider exposes metadata about collections or items within collections and a service-provider harvests and aggregates metadata from one or more OAI data-providers (and would normally expect to provide a range of resource discovery services).

This presentation will discuss the issues concerning the exposure and harvesting of OAI metadata derived from archival descriptions.

The Humbul Humanities Hub is a partner in a JISC-funded Resource Discovery Network (RDN) project to develop a series of subject portals within the DNER. Humbul is collaborating with the Arts & Humanities Data Service on the development of an Arts & Humanities Portal (AHP).[2] The portal will give deeper access (via cross-searching/browsing and additional services) to a range of distributed data providers, with an emphasis on providers within the DNER. One type of data which has been identified for inclusion within the portal is archival descriptions, notably from participants in the UK's emerging archival network. Access to archival descriptions via the AHP has the potential to enable the following:

· Participating archives expose their collections to new audiences;

· Users cross-search archival descriptions alongside other services available in the AHP;

· The portal is able to add some value by potentially making links between related data held or accessed by the AHP.

OAI metadata is not intended to replace existing, richer metadata schemes like EAD. Rather, it is intended as a mechanism which enables interoperable, simple resource discovery across heterogeneous resources. Likewise, the development of the AHP is not intended to replace users' direct interactions with the services included within the portal. However, OAI can offer advantages where distributed services do not support a common metadata scheme (e.g. EAD) or a common search protocol (e.g. Z39.50).

Conformance with the OAI protocol as a data provider requires support for the simple Dublin Core Metadata Element Set as a minimum. Data providers may also expose their metadata according to other established schemes alongside simple Dublin Core records (e.g. EAD, MARC). The presentation will discuss some of the work which has been undertaken to map EAD, for example, to simple Dublin Core within the context of the Open Archives Initiative, including addressing collection vs. item issues; establishing a minimum set of metadata; and producing guidelines concerning how data should be structured within any given DC element.[3]
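One plausible collection-level crosswalk of this kind, offered purely as an illustration rather than the mapping agreed by the projects above, might be sketched as follows:

    import xml.etree.ElementTree as ET

    # One plausible collection-level crosswalk; not the mapping agreed by the projects above.
    EAD_TO_DC = {
        "dc:title":       "archdesc/did/unittitle",
        "dc:identifier":  "archdesc/did/unitid",
        "dc:creator":     "archdesc/did/origination",
        "dc:date":        "archdesc/did/unitdate",
        "dc:description": "archdesc/scopecontent/p",
        "dc:subject":     "archdesc/controlaccess/subject",
    }

    def ead_to_dc(ead_file):
        """Reduce an EAD finding aid to a simple Dublin Core record (sketch only)."""
        root = ET.parse(ead_file).getroot()
        dc = {}
        for element, path in EAD_TO_DC.items():
            values = [e.text.strip() for e in root.findall(path) if e.text]
            if values:
                dc[element] = values
        return dc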

As an OAI Service Provider the AHP will harvest OAI metadata from each participating service at agreed intervals; index the resulting repositories; and provide basic search & retrieval interfaces, integrated within the cross-searching functionality of the AHP. The proposed model is based on that employed by the Resource Discovery Network to enable cross-searching of records across the hubs via the RDN's ResourceFinder. The ResourceFinder searches a database derived from data harvested from OAI repositories maintained by each of the Hubs. Harvested data is pre-processed (e.g. to normalise entity references) and indexed by Cheshire, which also provides the Web-based user interface and Z39.50 service.[4]
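The harvesting step itself is deliberately simple. A minimal sketch of an OAI-PMH harvest of oai_dc records, assuming a version 2.0 endpoint and ignoring error handling, incremental (from-date) harvesting and scheduling, might look like this; the endpoint URL in the final comment is hypothetical:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def harvest(base_url):
        """Harvest all oai_dc records from one OAI-PMH data provider (sketch only)."""
        records = []
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        while params:
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as response:
                root = ET.fromstring(response.read())
            for rec in root.iter(OAI + "record"):
                fields = {"identifier": rec.findtext(f"{OAI}header/{OAI}identifier")}
                for el in rec.iter():
                    if el.tag.startswith(DC) and el.text:
                        fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
                records.append(fields)
            # follow resumption tokens until the provider reports no more records
            token = root.findtext(f".//{OAI}resumptionToken")
            params = {"verb": "ListRecords", "resumptionToken": token} if token else None
        return records

    # e.g. harvest("http://archive.example.ac.uk/oai")   -- hypothetical endpoint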

The intention is to build a service which, whilst integrated into the AHP, can itself be re-used outside the context of the AHP (e.g. by participating archival services). Through this effort, information about the diverse and rich collections held within archives throughout the UK might be disseminated further, by separating data and presentation whilst simultaneously preserving links from data exposed in this way back to the originating archive.

References

[1]. Open Archives Initiative. http://www.openarchives.org/

[2]. RDN Subject Portal Project. http://www.portal.ac.uk/spp/

[3]. Mellon-funded OAI projects include EAD records. See Donald J. Waters. "The Metadata Harvesting Initiative of the Mellon Foundation". ARL Bimonthly Report 217 (August 2001). http://www.arl.org/newsltr/217/waters.html (especially the work being undertaken at the University of Illinois at Urbana-Champaign, http://oai.grainger.uiuc.edu/).

[4]. See Pete Cliff. "Building ResourceFinder". Ariadne 30 (December 2001). http://www.ariadne.ac.uk/issue30/rdn-oai/

123

Southall, Humphrey. University of Portsmouth
Burton, Nick. University of Portsmouth
Richardson, Gudrun. University of Portsmouth
Macgill, James. University of Leeds

Putting the Great Britain Historical GIS on the web: A Vision of Britain through Time

The Great Britain Historical GIS (Geographical Information System) has been developed since 1994 through a complex series of grants from research funding bodies. The original aim was to create a tool for researchers in economic, social and demographic history. The system contains extensive transcriptions of statistical tables in the reports of the Census since 1801, plus vital registration data (births, marriages and deaths), unemployment and other measures of economic distress. All this information concerns geographical areas, from Great Britain as a whole down to individual parishes.

Our new project, supported by the New Opportunities Fund, builds on this work but is creating a different resource for a very different kind of audience. We have to move from a system built around proprietary GIS software, which only a minority of the project’s staff know how to use, to a web-based system, following open standards as far as possible, that attracts life-long learners because it is interesting and even fun. While one of our goals is to present our statistical information in interesting ways, our content must be broadened to create a ‘sense of place’, with information on the history of every town and village in Great Britain: a true ‘vision of Britain through time’. This session explores different aspects of this transformation.

---

Towards A Cornucopia Of Place: Diversifying And Simplifying GBH

The existing GBH system has two components: a very complex geographical information system containing the changing boundaries of the counties, districts and parishes of Britain, implemented using the industry-standard ArcInfo package together with a large amount of custom code written in Arc Macro Language; and an Oracle database storing c. 30m statistical data values in over 200 separate tables.

The new project will substantially diversify our holdings. Our mapping of changing boundaries will be extended to cover Scotland, and we are also integrating digital mapping of ‘ancient’ parish boundaries, and some sub-parish units, created by another project. These boundaries provide an essential framework for statistical and other content, but do not in themselves tell users anything about past landscapes, so we are adding digital images of two complete editions of Ordnance Survey one inch to the mile maps, the First Series published over the 19th century and the New Popular edition published in the 1940s. These sheets need not only to be image scanned but also geo-referenced, relating each sheet to geographical coordinates, and ‘rubber sheeted’: the First Series maps are based on a relatively inaccurate survey, so the scans need to be stretched and compressed to fit real-world locations more accurately. New content also includes the complete texts of three descriptive gazetteers published in the late 19th

Our system must also be simplified and moved towards open standards. Firstly, the web site will be supported only by Oracle, without ArcInfo, and this means the boundary information as well as the image scanned maps and the text must be held within Oracle using specialised subsystems such as Oracle Spatial. Secondly, while we are required to support the Dublin Core we are exploring the use of a number of additional metadata standards designed specifically for spatio-temporal information. Thirdly, and linked to the creation of additional metadata, we are exploring ways to drastically reduce the total number of tables in the system. We both hope and expect that the final system will be heavily used, so performance issues are critical: while database-driven web sites are commonplace, ones based on spatially-enabled databases are less so.
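Returning to the geo-referencing described above, the basic step of relating a scanned sheet to geographical coordinates can be sketched as a least-squares affine fit from pixel positions to National Grid coordinates; full 'rubber sheeting' would use higher-order or piecewise transformations, and the ground control points below are invented:

    import numpy as np

    def fit_affine(pixel_pts, world_pts):
        """Least-squares affine fit from (col, row) pixel coordinates to (easting, northing)."""
        A = np.hstack([np.asarray(pixel_pts, float), np.ones((len(pixel_pts), 1))])
        coeffs, *_ = np.linalg.lstsq(A, np.asarray(world_pts, float), rcond=None)
        return coeffs                          # 3 x 2 matrix of transformation coefficients

    def to_world(coeffs, col, row):
        return np.array([col, row, 1.0]) @ coeffs

    # Invented ground control points: sheet corners matched to National Grid positions.
    gcps_pixel = [(0, 0), (4000, 0), (4000, 3000), (0, 3000)]
    gcps_world = [(300000, 700000), (340000, 700000), (340000, 670000), (300000, 670000)]
    coeffs = fit_affine(gcps_pixel, gcps_world)
    print(to_world(coeffs, 2000, 1500))        # centre of the sheet -> c. (320000, 685000)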

---

Constructing an on-line historical gazetteer/place-name authority file for Great Britain

Both elements of the GBH GIS project, as they currently stand, involve places: names in the census transcriptions, locations and boundary change information in the GIS. A standard name is required to link the census information to the GIS, but no formal authority constraint has been placed on the names. Instead, they represent the development of names and of the location of parishes within the shifting administrative geography of the last two hundred years in England, Wales and Scotland.

As part of our commitment to NOF, we aim to organize this data into a format which will provide a comprehensive authority list of place names in Great Britain. Given the nature of our data, we also need to present change over time of a particular unit, such as dates during which it was known by a particular name, or the changing administrative hierarchies within which it has been located. This will require the identification of variant forms of name for a single place, distinguishing places with identical names, and associating them with a geographic locator.

We are investigating the use of the Alexandria Digital Library (ADL) gazetteer standard in order to facilitate this. It defines four required elements: unique ID number, name, feature type and geographic location, but supports many additional elements, including variant names, start and end dates for names, and source information. The feature type is a difficulty, as the standard requires use of the ADL feature type thesaurus, which fails to support the administrative geography of Great Britain.
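The four required ADL elements, together with some of the optional ones mentioned above, could be represented roughly as follows; the field names are our own, the example entry is purely illustrative, and the feature-type problem noted above is simply left as a string:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class NameVariant:
        name: str
        date_start: Optional[int] = None        # year from which the name is attested
        date_end: Optional[int] = None
        source: Optional[str] = None            # e.g. a census year or gazetteer entry

    @dataclass
    class GazetteerEntry:
        # the four required ADL elements
        identifier: str                         # unique ID number
        name: str                               # standard (preferred) name
        feature_type: str                       # problematic: ADL thesaurus vs. GB admin units
        location: Tuple[float, float]           # geographic locator, e.g. a representative point
        # optional elements supported by the standard
        variants: List[NameVariant] = field(default_factory=list)
        part_of: List[str] = field(default_factory=list)   # changing administrative hierarchies

    # Illustrative entry only:
    # GazetteerEntry("GB-0001", "Portsea", "parish", (-1.09, 50.80),
    #                variants=[NameVariant("Portsea Island", source="Imperial Gazetteer")])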

As well as using the information from censuses stored within our database, we will also look to F. Youngs, Local Administrative Units of England 2 vols (Royal Historical Society, 1979 and 1991) for external verification of our name forms and for supporting information. Other sources of additional information will be linked to the system: The Imperial Gazetteer of England and Wales (Edinburgh, 1870-72); Bartholomew's Gazetteer of the British Isles (Edinburgh, 1887) and, thanks to a collaboration with the Gazetteer of Scotland project, Groome's The Ordnance Gazetteer of Scotland (Edinburgh, 1882).

___

Presenting Cultural Content Geographically On The Web

Many of the projects within the nof-digitise programme and other web content creation programmes are creating place-specific content, and a large part of the funding within nof-digitise is going to consortia titled "Sense of Place" --- but just how do we create a sense of place through a web site? Obviously, if a collection comes entirely from a single small area, some kind of sense of place may be created simply by looking at it. However, most projects cover areas larger than most people's notion of a place -- a neighbourhood or locality, but certainly not a whole county, let alone a whole region. Sense of place must therefore be partly about selection within a collection, raising issues about metadata discussed elsewhere.

However, it is less than obvious that simply providing a geographical search engine, within a single site or via some kind of programme-wide portal, is all we need: sorting through hundreds of thousands of digital images and displaying those "near" a specified point. Some very old ideas from geography say that places are defined by site and situation; by the land they sit on, and the other places that surround them. In this sense, place is about context as much as content, and we need to present our content geographically, not just provide a spatial search engine.

A major part of the new GBHGIS project is based around GeoTools, an open source toolkit for geographical visualisation (see www.geotools.org). GeoTools is a class library written in Java, so there are some restrictions on which browsers it will run within. However, it has been written very carefully to use a limited "lowest common denominator" subset of Java, and it should run on a very wide range of desktop PCs using any "version 4" or later browser. The GBHGIS site itself is being designed so that gazetteer reference enquiries, for example, will not depend on the use of GeoTools, but it is hard to imagine anyone wanting to do interactive graphic visualisation on a mobile phone.

The presentation will cover both the history and continuing development of the underlying GeoTools class library, and a number of applets developed using it. These will include applets for visualizing the statistical material within the GBHGIS, such as a system built around the 1851 Census of Religion, displaying the religious mix in any selected district as a pie chart; the new version of the TimeMap Viewer, developed by the Electronic Cultural Atlas Initiative with GeoTools as the principal foundation (see http://www.archaeology.usyd.edu.au/timemap); and some exploratory work developed in partnership with the British Library, our partners within the "Sense of Place (National)" NOF consortium. These systems enable users not just to look at maps on the web, but to zoom in on them, "pan" around to look at different areas, and to select particular content for more detailed examination.

145

Dunning, Alastair. Arts and Humanities Data Service
Woodhouse, Susi. Resource
Guy, Damon. Slough Borough Council

Establishing National Standards: the NOF Digitisation

---

The People's Network And The NOF Digitisation Of Learning Materials Programme: Building An Information Architecture

This paper considers how two of the largest publicly funded ICT programmes in the UK will together widen access to digital resources for the humanities. The paper will explore how these initiatives will provide a major contribution to the common information environment and point to how early agreement on common technical standards for content creation has enabled this.

NOF-digitise forms part of an information structure that ensures not only that Internet access is opened to a much greater range of the public, but also that the same wide user base has access to quality resources on the Internet. This structure is the £170 million People's Network that will, by the end of 2002, deliver ICT learning centres in all of the UK’s public libraries, supported by appropriately trained and qualified staff, together with access to a significant body of new online resources to support learning. NOF-digitise forms the last part of this strategy and, overall, it represents a significant statement of support by the Government for public libraries and their role in delivering and supporting lifelong learning, improving social inclusion and community capacity-building. As a result of this work, the public library community in the UK is in the midst of a sea-change in the way it develops, manages and delivers services, not least digital resources.

Three years from its inception, the NOF-digitise programme has now emerged as a coherent whole firmly based on collaborative working within consortia and partnerships of varying degrees of formality. All work is built upon an agreed range of technical standards, framed in such a way as not only to allow adoption of recent and emerging work such as that on collections description and the OAI protocol, but also to ensure optimum accessibility, sustainability and interoperability of materials.

All NOF Digitisation programme resources will be available via the People's Network and it is intended that they should also present a framework - both managerially and technically - which can be built upon by institutions in the cultural and other domains over the course of time. At the time of writing, a strategy to develop an overall architecture for the programme is in progress and will, it is hoped, be framed in such a way as to make it possible for NOF resources to sit within a common information environment and for their maximum potential to be realised.

---

The NOF-digitise Technical Advisory Service.

This paper aims to explore some of the issues raised in providing a technical advisory service for the £50 million NOF-digitise programme.

The advisory service has provided (and continues to provide) a range of resources for those creating and delivering digital content. The advisory service is based at two JISC services, the Arts and Humanities Data Service (AHDS) and the UK Office for Library Networking (UKOLN). The AHDS and UKOLN have not only an excellent combined knowledge of the processes involved in creating and delivering electronic collections, but are part of a wider sphere of services that have expert knowledge in all facets of the information technology environment. Thus the technical advisory service has been able to call on the shared experience of those working within services, departments and libraries in Higher Education, drawing in various strands of information to provide a coherent body of advice. This information is relevant not only to the various institutions working under the NOF-digitise banner, but to any organisation getting to grips with the development and dissemination of digital resources. Expertise develope

The Technical Standards and Guidelines has been the most important element of the advisory service, issuing projects with a list of accepted formats and practices for the creation, management and delivery of their resources. It has been a tricky path to forge (the wish to stipulate particular standards has needed to be tempered by the desire to allow projects flexibility), yet a worthwhile one that has managed to raise some very interesting issues. The advisory service’s preference for non-proprietary formats, while generally accepted, has caused some problems. For example, it has been difficult to convince some projects of the disadvantages of Macromedia Flash. The software’s plus points (its ability to create eye-catching animation and its relative ease of use) have disguised the difficulties in migrating graphics developed using Flash, and also disguised the fact that a significant minority of users cannot download and run Flash files. The conference paper will expand on such issues, indicating why c

The technical advisory service has also been responsible for responding to the projects’ need for workshops and information papers. This has allowed us to construct a picture of the state of technical knowledge amongst this wider user base, and so to point to where more needs to be done in terms of information dissemination.

The paper will also discuss the establishment of a NOF portal to allow for cross-searching between projects’ collections. The technical guidelines stipulate that projects must use Dublin Core-compliant resource discovery metadata. When the projects are completed, the use of DC metadata will allow the Open Archives Initiative protocol to be employed to bring the NOF-digitise collections together (and, of course, to allow other collections to be searched alongside them through the NOF portal).

---

Collaborative Working For Large Digitisation Projects

This paper highlights the work of one of the consortia, the Sense of Place South East (SoPSE), formed as a result of a grant awarded by the New Opportunities Fund. At the outset, there was little experience of the creation and delivery of digital materials within the consortium. Neither was there much individual experience of working as part of a larger technical project. Yet the consortium is now working towards developing a formidable array of digital objects and learning materials, delivered via a single website.

Taking the project as an example, the paper will articulate the steep learning curve involved in such types of public-sector digitisation projects. The project started as a humble group tackling a common goal, and initially that goal was overcoming the shared fear of not knowing how to achieve the stated project aims. Yet the first goal was attained by sharing information and confronting what was considered to be the main fear in a technology project - the technology. Through the formation of a consortium advisory board, the in-house development of technical expertise and the development of a specialised technology group, the project’s knowledge has grown steadily.

This paper will also highlight some of the issues raised in developing a website and resources that have to incorporate not only the NOF technical standards and guidelines, but also the needs of each of the various groups working in the consortium. Some of the main themes the project has had to deal with have included: the successful pooling of expertise to create an expert group to tackle web design and web technology; the utilisation of cutting-edge developments to handle a high volume of digitisation, and archiving and delivery procedures; the establishment of a keyword dictionary and common terminologies; the development of a solid technical platform that would allow smaller partners within the consortium, with few technological resources of their own, to preserve their digital collections; and the development of an information architecture model that would allow for the final delivery of a public archive of digital resources.

Some of these issues have been difficult to handle; others have come more easily.

The project started as a group of dissimilar collections with little obvious commonality except a "feeling" of a "sense of place". Through the development of collaborative specialist expertise, the establishment of links with the academic community and, in particular, the use of the resources provided via the NOF advisory service, there is now an efficient mechanism for the integration of various collections through a synergistic organisation with disparate digitisation goals but common management.

153

Proffitt, Merrilee. RLG
Smith, MacKenzie. MIT Libraries
Beaubien, Rick. UC Berkeley Library

METS: The Metadata Encoding & Transmission Standard

---

METS is a generalized metadata framework, developed to encode the structural metadata for objects within a digital library and related descriptive and administrative metadata. Those currently involved with or planning digitization will want to hear about METS, which can help to structure data for presentation and/or archiving.

Expressed using the XML schema language of the World Wide Web Consortium, METS provides for the responsible management and transfer of digital library objects by bundling and storing appropriate metadata along with the digital objects. The use of a single, flexible means of encoding can simplify both the exchange of objects between repositories and the development of software tools for search and display of those objects. Additionally, METS encoding will provide a coherent means for archiving digital objects and their metadata. The METS initiative has two major components. On the technical side, the initiative seeks to provide a single, standard mechanism for encoding all forms of metadata for digital library objects. On the organizational side, the group looks towards developing mechanisms for maintenance and further development of the format, including establishing a METS testbed and METS tools.

The first paper will provide a basic introduction to METS and will outline the objectives and progress of the METS initiative to date. The second paper will report on efforts to develop a comprehensive toolset for creating and using METS objects. The third paper will be a report from the field, giving organizational background and context for the work to be accomplished, explaining how METS fits into that work, and explaining how a comprehensive toolset may help to meet community objectives.

*************************

Introduction to METS

MacKenzie Smith

METS is a generalized metadata framework, originally developed to model the structure of items in a digital library, and, optionally, to relate descriptive, administrative, and behavioral metadata to that logical structure. METS enables the management, manipulation, and transfer of digital library objects by bundling and storing appropriate metadata along with information about the structure of the digital objects and the location of the digital files that comprise the objects.

METS is expressed using XML, which means that METS data is stored according to platform and software independent encoding standards, such as UTF-8 (Unicode). One important application of METS may be as an implementation of the Open Archival Information System (OAIS) reference model: a METS document can function as a Submission Information Package (SIP), serving as an ingest transfer syntax; as a Dissemination Information Package (DIP), serving as an export transfer syntax or driving applications that manipulate the digital objects, such as displaying them; and as an Archival Information Package (AIP) for storing and managing digital objects internally.

Background

METS had its beginnings in a project that identified metadata and complex digital object structure as an area of critical concern for digital libraries. As more and more institutions created digital images and other digital files, there was growing concern about sensible storage for the digital objects (defined as digital files plus associated metadata). It was the beginning of a serious discussion, tying together many important aspects of digital library research. The Making of America 2 (MOA2) project sponsored by the Digital Library Federation (DLF) in the early stages and funded by the National Endowment for the Humanities was the project that resulted from these discussions. New York Public Library and the libraries of Cornell, Penn State, and Stanford collaborated under the leadership of the University of California, Berkeley Library, contributing images and data towards an investigation of structural and administrative metadata for digital objects. The MOA2 SGML Document Type Definition (DTD), which was the direct predecessor of METS, was developed for the MOA2 project to encapsulate what were then seen as the required metadata elements.

The MOA2 project was completed in early 2000, the Council on Library and Information Resources (CLIR) published the group's findings, and the MOA2 DTD was circulated for assessment and discussion. While MOA2 aroused considerable interest within the library community, the MOA2 DTD was too restrictive in some respects and lacked some basic functionality, especially for time-based media such as audio and video. A meeting was held in February 2001 for the various parties interested in advancing the MOA2 DTD to the next stage. Following this meeting, METS was born.

The METS Initiative: Technical Underpinnings

The technical component of the METS initiative has completed a draft schema for the encoding format and made it publicly available for review. The METS schema tries to support the dual and sometimes competing requirements of ensuring interoperability and exchange of documents between different institutions while also allowing for significant flexibility in local practice with regards to descriptive and other metadata standards.

METS has a very simple structure with just five major components: descriptive metadata, administrative metadata, behavior metadata, a digital file inventory, and a structure map. Only the file inventory and structure map are required; a skeletal example follows the list below.

-- Descriptive metadata is optional, and a METS object can contain a Metadata Reference or a Metadata Wrapper. A Metadata Reference is a link to external descriptive metadata, and a Metadata Wrapper is for descriptive metadata that is internal to the METS object, as either Base64 encoded binary data or XML. METS does not require a particular scheme for descriptive metadata, so the implementer can choose the most appropriate descriptive scheme.

-- The administrative metadata, also optional, has four optional subcomponents for technical metadata, rights metadata, source metadata, and preservation metadata. Each of these subsections acts like the descriptive section in that the metadata can be encoded ("wrapped") within the METS document or pointed to in an external location ("referenced"). Which administrative metadata scheme is used is also at the discretion of the implementer, so only appropriate metadata need be supplied, depending on the type of digital object represented by the METS object.

-- An optional behavioral metadata section can list programs which perform operations on the digital object, such as display (or "dissemination"). These programs are linked to the part of the structure map which they will act on.

-- The file inventory allows for listing all the files associated with a digital object. Files can be grouped; some groupings might include master files, thumbnails, etc. The files may be pointed to or can be contained internally as Base64 encoded binary data.

-- The structure map forms a simple or complex tree structure that describes the relationships of the digital files (listed in the file inventory) that together comprise the digital object. The structure map permits the definition of a digital object that has either parallel or sequential modes and also allows for the coding of particular regions or zones of a digital file as part of the document.
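
A skeletal METS document, reduced to one optional descriptive section plus the two required sections, might look like the sketch below; the identifiers and locations are invented for illustration:

    <mets xmlns="http://www.loc.gov/METS/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
      <!-- optional descriptive metadata: here a reference to an external Dublin Core record -->
      <dmdSec ID="DMD1">
        <mdRef LOCTYPE="URL" MDTYPE="DC"
               xlink:href="http://www.example.org/records/item0001.xml"/>
      </dmdSec>
      <!-- required: inventory of the files that make up the digital object -->
      <fileSec>
        <fileGrp USE="master">
          <file ID="FILE1" MIMETYPE="image/tiff">
            <FLocat LOCTYPE="URL" xlink:href="http://www.example.org/images/0001.tif"/>
          </file>
        </fileGrp>
      </fileSec>
      <!-- required: structure map relating the listed files to the object's divisions -->
      <structMap>
        <div TYPE="page" LABEL="Page 1" DMDID="DMD1">
          <fptr FILEID="FILE1"/>
        </div>
      </structMap>
    </mets>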

The METS Initiative: Organization

The standard is currently maintained in the Network Development and MARC Standards Office of the Library of Congress. Having played a key role in moving this initiative forward and serving as the work coordinator, the DLF has helped to bring the METS work to the forefront. RLG has recently taken over as the new coordinator. To support the development of METS into a useful and responsive community standard, an editorial board has been formed to capture requests for changes to METS and to decide which changes to adopt and how to implement them in the XML schema. Version 1.0 of the schema is now in public release, and a review and evaluation process based on use is ongoing.

References:

METS homepage: < http://www.loc.gov/standards/mets >

MOA2 homepage: < http://sunsite.berkeley.edu/MOA2 >

DLF homepage: < http://www.diglib.org/ >

RLG homepage: < http://www.rlg.org/ >

OAIS: < http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html >

Hurley, B., Price-Wilkin, J., Proffitt, M., Besser, H. (1999) The Making of America II Testbed Project: A Digital Library Service Model, CLIR. Also available at: < http://www.clir.org/pubs/reports/pub87/contents.html >

******

METS Tools and Architecture: Deploying METS in the U.C. Berkeley Library

Rick Beaubien

The sophistication and nature of METS objects all but preclude working with the XML-encoded files in their native form. Rather, a successful deployment of METS requires an architectural framework consisting of multiple modules supported by various auxiliary tools. The parts of such an architectural framework are likely to include:

  1. Applications for gathering structural, descriptive and administrative metadata and for assembling this metadata into METS objects
  2. A searchable, METS compliant repository
  3. Viewers to allow end users to display and navigate METS objects and their associated metadata.
  4. Data rescue applications to ease the transformation of legacy digital materials into the METS encoding format.

The U.C. Berkeley Library is one of many institutions in the process of developing such an integrated digital library architecture around METS and provides a case study for METS deployment problems and strategies. The tools coming out of efforts such as Berkeley’s, once made publicly available, can facilitate the widespread adoption of METS.

Applications for gathering metadata and creating METS objects.

The U.C. Berkeley Library was the lead institution in the Making of America II project that developed the MOA2.DTD, the immediate predecessor of the METS schema. For the MoA II project, the Library developed a prototype Microsoft Access database and user interface for capturing and maintaining all of the structural, descriptive and administrative metadata required to build digital library objects. It further developed a program to query this database and assemble these structural, descriptive and administrative metadata into MOA2.DTD compliant XML documents.

This proof-of-concept prototype demonstrated that very complex MOA2 objects could be assembled automatically from a database. However, the limitations of MS Access quickly exhibited themselves to the staff testing this prototype. Some of these problems were expected, such as the limits on the size and speed of the MS Access database. However, others provided new insights on how the next generation user interface should be developed. Some examples of needs that were identified include: a more visual, intuitive interface; increased flexibility in tailoring input forms to project needs; a more sophisticated system of "defaults."

With these "lessons learned" in mind, U.C. Berkeley Library is currently developing a METS object creation and maintenance module built around an SQL Server database. This module uses a web browser interface driven by Java servlets to gather the requisite metadata. The interface communicates with a Java server that mediates between user-centric and database-centric views of the metadata, and commits the users' edits to the database. The module also includes a sophisticated project management component that allows project managers to configure their input screens based on project needs. As with the MoA II prototype, once metadata input for a project has been completed, a Java application is run to query the project database and assemble the end METS objects.

A searchable, METS compliant repository

U.C. Berkeley Library currently plans to meet the indexing and searching needs of its METS repository module by loading the METS objects into Tamino, a native XML database developed by Software AG of Germany. Tamino will index the key elements of the METS objects, particularly all levels of descriptive metadata, and provide for searching by means of an expanded version of W3C’s XQuery language. Indexing METS objects poses some special problems, because METS objects may simply point to external descriptive metadata instead of wrapping it internally. The module that prepares METS objects for loading into Tamino will therefore need to incorporate the external descriptive metadata into the Tamino indexes somehow.
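
The indexing problem can be seen in a descriptive section that merely references an external record: the fragment below (with an invented location) is all the XML database would see, so the loading module has to fetch the record the pointer names and fold its contents into the indexes:

    <dmdSec ID="DMD42">
      <!-- only this pointer, not the record it names, is stored inside the METS object -->
      <mdRef LOCTYPE="URL" MDTYPE="DC" LABEL="External descriptive record"
             xlink:href="http://www.example.org/metadata/record42.xml"/>
    </dmdSec>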

A METS repository manager written in Java will facilitate the search and delivery of METS objects from the Tamino database. To initiate a search, users will interact with a web browser interface driven by Java servlets; this interface, in turn, will communicate with the repository manager. Using the web interface, the user will select the desired category of materials s/he wishes to search and formulate an easily parameterized search. The repository manager will receive the search request and convert the search parameters into the XQuery statement expected by the Tamino database. Once the search is completed, the repository manager will deliver the results to the user interface in the form of a list of entries that are hot-linked to a viewer application capable of displaying the associated METS objects.

The implementation of the search portion of U.C. Berkeley Library’s METS architecture is still very much in exploratory stages. Further progress is awaiting enhancements to the Tamino database promised for the last quarter of this year.

Viewers to display METS objects from the repository

As part of its participation in the Making of America II project, the U.C. Berkeley Library developed a generalized Java application suite that enables end users to view and navigate complex objects encoded according to the MOA2.DTD. The Library is currently converting this viewer to work with METS objects as well. As part of this process, it is expanding the viewer to take advantage of capabilities supported by METS but not by MOA2. In particular, Berkeley is collaborating with New York University to add support for Audio and Video materials which were not accommodated under the MOA2.DTD.

Users of the METS viewer interact with a web browser interface driven by Java servlets. This interface communicates with remote (server-side) Java components, particularly the repository manager. These activate the requested objects and fulfill the users’ navigation requests.

The METS viewer allows users to navigate through the divisions of a METS object both randomly via a "table of contents" (based on the METS object’s structMap), and sequentially ("next", "previous"). The users can select among all of the available manifestations of a currently selected division of the object (for example: a low resolution image, a medium resolution image, or a TEI transcription). They can also opt to view two manifestations, such as a page image and its associated TEI transcription, side by side. The viewer then provides for coordinated page turning of such adjacently visible manifestations. Finally, the viewer provides users with links to associated descriptive and administrative metadata. All of these capabilities are directly supported by the METS encoding of the active object.
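
This behaviour is driven by the structure map: a division with several file pointers, as in the sketch below (file identifiers invented), presents each pointer as an alternative manifestation of the same page, and the viewer lets the user choose among them or place two side by side:

    <structMap TYPE="physical">
      <div TYPE="page" LABEL="Page 12">
        <fptr FILEID="IMG12.LOW"/>   <!-- low resolution image -->
        <fptr FILEID="IMG12.MED"/>   <!-- medium resolution image -->
        <fptr FILEID="TEI12"/>       <!-- TEI transcription of the same page -->
      </div>
    </structMap>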

The METS viewer directly supports all content types natively supported by a web browser (HTML, JPEG, GIF, etc.). For other content types, such as PDF and MrSID, it can be configured to invoke helper applications to display the content. Native support for some of these content types may be added in the future via Java applets.

Data rescue applications

U.C. Berkeley, like many organizations public and private, has invested heavily in the creation of digitized and born-digital materials. Many of these were created with no more thought given to their use than posting them on the Web. Beyond lacking structural metadata, many current digital materials are not linked back to their descriptive and administrative metadata. To deal with such materials Berkeley plans to define a simple, probably XML-based, data-structure "template" into which the metadata elements and content-file locations of legacy digital materials can be mapped. In conjunction with this data structure, it will develop a standard program that can generate METS objects from files conforming to the template structure.
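
A hypothetical template of this kind might simply pair the descriptive fields an institution already holds with the locations of its content files; the element names below are invented for illustration and are not Berkeley’s actual format:

    <rescueObject>
      <descriptive>
        <title>Example pamphlet</title>
        <date>1893</date>
      </descriptive>
      <!-- one entry per content file, in reading order -->
      <page n="1" image="http://www.example.org/images/0001.tif"
            transcription="http://www.example.org/tei/0001.xml"/>
      <page n="2" image="http://www.example.org/images/0002.tif"/>
    </rescueObject>

The standard program would then crosswalk such a file into a METS file inventory and structure map.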

Berkeley hopes that its approach to data rescue will also help other institutions make the transition to METS. The advantage of the approach is that organizations do not need an expert understanding of the METS schema to create METS objects. Instead, they will cross-walk their metadata and content-file locations, which they should know well, to the simple, standard mapping of the template file.

Other tools and approaches

The U.C. Berkeley Library is just one of many institutions developing a digital library architecture and associated tools around METS. The METS Editorial Board is encouraging an "open source" approach to the release of such tools, and many will probably be available for use and further development by other institutions. As these tools become available, they will be listed, with links, on the METS home page.

References:

METS home page (particularly the tools section): http://www.loc.gov/standards/mets

MOA2--background to the predecessor of METS: http://sunsite.berkeley.edu/moa2

******

Merrilee Proffitt

Marriage, Women, and the Law Meets METS: An Arranged Union or Happily Ever After?

Collection Background

The Marriage, Women, and the Law collection is the result of a collaborative endeavor by RLG's member institutions. Launched in June 1996, this ambitious project with the intriguing title "Studies in Scarlet: Marriage and Sexuality in the United States and the United Kingdom, 1815-1914" was established in order to test the proposition that a virtual collection for scholarly research can be created cooperatively, made widely available, and be responsibly maintained for future use.

With a focus on family law and domestic relations in the 19th century, the collection provides scholars throughout the world with electronic access to materials supporting research on a broad range of topics, including marriage, divorce, adultery, miscegenation, polygamy, and birth control. The content of the collection, gleaned from case reports, statutes, novels, newspapers, diaries, and letters, is designed to support scholarship in disciplines including law, history, sociology, political science, women's studies, and criminology.

The strength of this collection is the extent, quality, and cohesion of the content. Seven major research institutions in the United States and Great Britain contributed primary and secondary materials. The project participants included the New York Public Library, New York University Law Library, Harvard University Law Library, the North Carolina State Archives, the University of Pennsylvania Law Library, the Library Company of Philadelphia, Princeton University Libraries, and the University of Leeds.

Over 4,000 works make up the project. All documents in the collection are represented by digital images (over 200,000). Some documents are also represented by transcribed, encoded (TEI) texts. Some of the "digital objects" in the MWTL collection are a single image, while others are hundreds of pages long. All project participants followed common guidelines for scanning, file naming, and descriptive metadata.

The collection has been made accessible through RLG’s Eureka web interface, and is available to browse. A quick Google search reveals that the project homepage is linked to by over 150 web sites, ranging from libraries’ "selected digital collections" pages to subject-specific sites on a variety of topics (women’s history, legal history, and family law are just a few).


One could infer from the above that the project’s goals of creating a cooperatively built resource and making it widely accessible have been met. What about the objective of maintaining the collection into the future?

Enter a Multi-Purpose Suitor

Like many other institutions, RLG has been looking carefully at METS as a means of implementing an OAIS-compliant digital archive. In addition to investigating METS as a component of a digital archive, RLG was also interested in testing METS for use in a new service, RLG Cultural Materials. RLG Cultural Materials (RCM) is a service that has resulted from a collaborative effort on the part of RLG member institutions. RCM brings together primary sources in digital form and provides a means for discovery and use of authenticated, rights-cleared digital materials through an advanced and easy-to-use web interface. With over 106,000 objects currently loaded in the system from xx participants (and over a million proposed objects from over xx institutions), the RCM team has needed to be familiar with a number of different metadata standards. As many of the items anticipated for RCM are "complex digital objects," RLG anticipates that at least some of these objects will be encoded in METS. MWTL appeared to be a good test case for METS, both to encode the structural metadata so that the collection could be included with other primary source materials in RLG Cultural Materials, and to meet the Marriage, Women, and the Law project goals of properly archiving the collection.

A Good Match?

At this writing, the jury is still out, but all signs point to a happy union. We have successfully mapped out a plan for encoding the structural metadata for objects in the collection into METS, and have even developed a prototype METS viewer for RCM. Developing a digital-archive-quality METS encoding for the project is proving more challenging. The files that comprise the MWTL collection are backed up on a regular basis; of course, "backing up" files is entirely different from undertaking digital preservation actions. For MWTL to meet what is currently considered state-of-the-art in digital archiving will mean supplying or recreating a good deal of metadata that is not available. At the time that MWTL was created, there was no standard way of storing and serving the files that represent a digital object. The descriptive metadata was stored in a database, the files were stored in the UNIX file system, and a Perl script pulled the whole thing together in Eureka. And, although participants and digitization service bureaus followed project guidelines for digitization, the sort of technical metadata for imaging that one would hope to have in a digital archive is not available. Some level of digital archeology may be required to fully convert the collection (e.g. mining TIFF headers). These issues are not problems with METS, but are more representative of a first-generation, large-scale collaborative digitization project. In fact, METS is helping RLG come to terms with some of the "shortcomings" (as viewed through today’s lens) of the project. By September, we will be able to give a full report both on the completion of mapping the structural metadata and on the prospects for a fuller encoding to meet digital archiving purposes.

This project is very representative of other similar digitization projects, large and small, undertaken at institutions over the last 10 to 15 years. The existence of a toolkit to assist with METS conversion or "data rescue" would have been very helpful, had it been available at the outset of this conversion effort.

References:

Marriage, Women, and the Law homepage: < http://www.rlg.org/scarlet/ >

"Studies in Scarlet," RLG News Issue 40, Spring 1996 < http://www.rlg.org/rlgnews/news40.html >

RLG Cultural Materials homepage < http://culturalmaterials.rlg.org/ >

"Scenes from a Database: RLG Cultural Materials in Use," RLG Focus, Issue 53, December 2001 < http://www.rlg.org/r-focus/i53.html >

157

Vanhoutte, Edward. Centre for Scholarly Editing and Document Studies
Neyt, Vincent. University of Antwerp (UIA)
Debusschere, Joke. Centre for Scholarly Editing and Document Studies.

The Transcription of Modern Manuscript Texts

If it is true that transcribing and encoding the text are acts of editing which cannot be left to an encoder who is no textual scholar, or as Michael Sperberg-McQueen has noted: "a final division of labor between scholar and encoder [...] would, taken literally, absolve researchers from any responsibility for the quality, intelligence, or utility of the encodings they create–a worrisome state of affairs." (Sperberg-McQueen 1991: 36), and that encoding cannot be done objectively without imposing one's theory of the text on a machine-readable transcription, then the markup language used should enable the transcriber/encoder/editor to do what s/he wants to do. With respect to older texts, this has already been done successfully using the TEI encoding scheme, but a modern author is not a scribe. Modern manuscript material excels in the documentation of the complexity of the dynamic genetic process, for which new encoding strategies are needed. At the Centre for Scholarly Editing and Document Studies (Cent

This session will focus on the new encoding strategies developed in the respective projects, which will be introduced, and will call for a thorough revision/extension of the TEI encoding schemes.

(None of the presenters at this session is older than 30).

Reference

Sperberg-McQueen, C.M. (1991), Text in the Electronic Age: Textual Study and Text Encoding with examples from Medieval Texts. In: Literary and Linguistic Computing, 6/1 (1991), 34-46.

---

Texts And Transcriptions: Mapping Scribal Complexities Onto A Line Of Text

Lead Author

The transcription of modern manuscript material is the core activity of a couple of newly initiated electronic editing projects at the Centre for Scholarly Editing and Document Studies (Centrum voor Teksteditie en Bronnenstudie) of the Royal Academy of Dutch Language and Literature in Ghent, Belgium (Koninklijke Academie voor Nederlandse Taal- en Letterkunde). In order for these projects to result in the publication of versioning editions, representing multiple texts (Reiman 1987) - in facsimile as well as in machine-readable form, in concordances, stemmata, lists of variants, etc. - transcriptions of all the extant material have to be made in a platform independent and non-proprietary markup language which can deal with the linguistic and the bibliographic text of a work and which can guarantee maximal accessibility, longevity and intellectual integrity (Sperberg-McQueen, 1994 & 1996: 41). The encoding schemes proposed by the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQ

Although the TEI subsets for the transcription of primary source material "have not proved entirely satisfactorily" for a number of problems (Driscoll 2000), transcription and digitization guidelines for older texts can be produced on the basis of the TEI encoding scheme (Robinson 1994, and Robinson & Solopova 1993). The transcription of modern manuscript material using TEI proves to be of a more problematic nature because of at least two essential characteristics of such complex source material: namely the notions of time and of overlapping hierarchies. Since SGML (and thus XML) was devised on the assumption that a document is a logical construct containing one or more trees of elements that make up the document's content (Goldfarb 1995: 18), several scholars began to theorize about the assumption that text is an ordered hierarchy of content objects (OHCO thesis), which always nest properly and never overlap,[1] and about the difficulties attached to this claim.[2]

The TEI Guidelines propose five possible methods to handle non-nesting information,[3] but state that

"Non-nesting information poses fundamental problems for any encoding scheme, and it must be stated at the outset that no solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly). The representation of non-hierarchical information is thus necessarily a matter of choices among alternatives, of tradeoffs between various sets of different advantages and disadvantages." (chapter 31 "Multiple Hierarchies" of the TEI Guidelines)

The editor using an encoding scheme for the transmission of any feature of a modern manuscript text to a machine-readable format is essentially confronted with the dynamic concept of time, which constitutes non-hierarchical information. Whereas the simple representation of a (printed) prose text can be thought of as a logical tree of hierarchical and structural elements such as book, part, chapter, and paragraph, and an alternative tree of hierarchical and physical elements such as volume, page, column, and line–structures which can be applied to the wide majority of printed texts and medieval manuscripts–the modern manuscript shows a much more complicated web of interwoven and overlapping relationships of elements and structures.
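
The overlap problem can be shown in a few lines of encoding. If both the physical line and a deletion running across it are treated as container elements (the <line> element here is hypothetical), the result is not well formed; the usual TEI remedy is to demote one of the hierarchies to empty milestone elements such as <lb/>, which never contain anything and so never overlap:

    <!-- not well formed: the deletion crosses the line boundary -->
    <line>a passage the author <del>began to</line>
    <line>strike out</del> and then rewrote</line>

    <!-- well formed: the line break reduced to an empty milestone element -->
    <p>a passage the author <del>began to<lb/>strike out</del> and then rewrote</p>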

Modern manuscripts, as Almuth Grésillon defines them, are "manuscrits qui font partie d'une genèse textuelle attestée par plusieurs témoins successifs et qui manifestent le travail d'écriture d'un auteur." ["manuscripts which are part of a textual genesis for which many consecutive witnesses give evidence, and which are the manifestation of the author's labour of writing." (my translation).](Grésillon 1994: 244) The French school of Critique Génétique primarily deals with modern manuscripts and their primary aim is to study the avant-texte, not so much as the basis to set out editorial principles for textual representation, but as a means to understand the genesis of the literary work or as Daniel Ferrer put it: "it does not aim to reconstitute the optimal text of a work; rather, it aims to reconstitute the writing process which resulted in the work, based on surviving traces, which are primarily author's draft manuscripts" (Ferrer 1995: 143). Ther

The application of hypertext technology and the possibility of displaying digital facsimiles in establishing electronic dossiers génétiques provide the editor with a multiplicity of ways in which s/he can regroup a series of documents which are akin to each other on the basis of resemblance or difference. The experiments with proprietary software systems (Hypercard, Toolbook, Macromedia, PDF, etc.), however, are too much oriented towards display, and often do not comply with the rule of "no digitization without transcription" (Robinson 1997).

Further, the TEI solutions for the transcription of primary source material do not cater for modern manuscripts because the current (P4) and previous versions of the TEI have never addressed the encoding of the time factor in text. Since a writing process by definition takes place in time, four central complications may arise in connection with modern manuscripts and should thus be catered for in an encoding scheme for the transcription of modern primary source material. The complications are the following:

1. Its beginning and end may be hard to determine and its internal composition difficult to define (document structure vs. unit of writing): authors frequently interrupt writing, leave sentences unfinished and so on.

2. Manuscripts frequently contain items such as scriptorial pauses which have immense importance in the analysis of the genesis of a text.

3. Even non-verbal elements such as sketches, drawings, or doodles may be regarded as forming a component of the writing process for some analytical purposes.

4. Below the level of the chronological act of writing, manuscripts may be segmented into units defined by thematic, syntactic, stylistic, etc. phenomena; no clear agreement exists, however, even as to the appropriate names for such segments.

These four complications are exactly the ones the TEI Guidelines cite when trying to define the complexity of speech, emphasizing that "Unlike a written text, a speech event takes place in time." (Sperberg-McQueen and Burnard 2001: 254). This may suggest that the markup solutions employed in the transcription of speech could prove useful for the transcription of modern manuscripts, in particular the chapter in the TEI Guidelines on Linking, Segmentation, and Alignment (esp. 14.5. Synchronization).
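
A rough sketch of what borrowing that machinery might look like is given below: a timeline of writing stages to which individual revisions are synchronized. The element and attribute usage loosely follows the TEI P4 linking tag set, and the stages themselves are invented for illustration:

    <timeline id="writingStages" origin="stage1">
      <when id="stage1"/>                 <!-- first campaign of writing -->
      <when id="stage2" since="stage1"/>  <!-- later rereading and revision -->
    </timeline>

    <!-- elsewhere in the transcription -->
    <p>a first formulation
       <del synch="stage2">that was struck out</del>
       <add place="supralinear" synch="stage2">and replaced</add>
       on a later pass</p>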

This paper will deal with the practicalities underlying the production of electronic scholarly editions and will report on the results of a research stay at the Wittgenstein Archives in Bergen with an EU Research Infrastructure grant in June-July 2002.

Notes

1. Annex C of ISO 8879 introduces the optional CONCUR feature (not available in XML) which "supports multiple concurrent structural views in addition to the abstract view. It allows the user to associate element, entity, and notation declarations with particular document type names, via multiple document type declarations." (Goldfarb 1995: 89).

2. See on overlap-related problems: Barnard et al. 1988; Barnard et al. 1995; DeRose et al. 1990; Durand et al. 1996; Huitfeldt 1995; Renear et al. 1996; Sperberg-McQueen & Huitfeldt (1999); Sperberg-McQueen & Huitfeldt s.d., and chapter 31 "Multiple Hierarchies" of the TEI Guidelines.

3. The suggested methods are CONCUR, milestone elements, fragmentation of an element, virtual joins, and redundant encoding of information in multiple forms. Cf. Chapter 31 "Multiple Hierarchies" of the TEI Guidelines. The Rossetti Archive based in Virginia, and the Wittgenstein Archives at Bergen, created their own encoding systems–respectively RAD (Rossetti Archive Document) and MECS (Multi Element Code System)–out of dissatisfaction with the operationality of the options suggested by TEI. Cf. McGann 2001, 88-97, and "Text Encoding at the Wittgenstein Archives" <http://www.hit.uib.no/wab/1990-99/textencod.htm>

Literature

Centrum voor Teksteditie en Bronnenstudie. Website. <http://www.kantl.be/ctb/>

Barnard, D.T., R. Hayter, M. Karababa, G. Logan, and J. McFadden (1988). "SGML-Based Markup for Literary Texts: Two Problems and Some Solutions." In: Computers and the Humanities, 22 (1988): 265-276.

Barnard, D.T., L. Burnard, J.-P. Gaspart, L.A. Price, C.M. Sperberg-McQueen, and G.B. Varile (1995). "Hierarchical Encoding of Text: Technical Problems and SGML solutions." In: Computers and the Humanities, 29: 211-231.

DeRose, Steven J., D.G. Durand, E. Mylonas, and A. Renear (1990). "What is Text, Really?" In: Journal of Computing in Higher Education, 1 (1990): 3-26.

Driscoll, M.J. (2000). "Encoding Old Norse/Icelandic Primary Sources using TEI-Conformant SGML." In: Literary and Linguistic Computing, 15/1: 81-91.

Durand, David, Elli Mylonas, and Steve DeRose (1996). "What should markup really be? Applying theories of text to the design of markup systems." ALLC/ACH'96 Joint Conference of the ALLC and ACH, Bergen 1996.

Ferrer, Daniel (1995). "Hypertextual Representation of Literary Working Papers." in: Literary and Linguistic Computing, 10/2: 143-145.

Huitfeldt, C. (1995). "Multi-Dimensional Texts in a One Dimensional Medium." In: Computers and the Humanities, 28 (1995): 235-241.

Grésillon, Almuth (1994). Eléments de critique génétique. Lire les manuscrits modernes. Paris: Presses Universitaires de Paris.

Reiman, Donald H. (1987). Romantic Texts and Contexts. Columbia: University of Missouri Press. (Chapter 10: "'Versioning': The Presentation of Multiple Texts.", 167-180).

Renear, A., D. Durand, and E. Mylonas (1996). "Refining our Notion of What Text Really Is." In: S. Hockey and N. Ide (eds.). Research in Humanities Computing 4: Selected Papers from the 1992 ALLC/ACH Conference. Oxford: OUP. 263-280.

Robinson, P.M.W. (ed.) (1996). The Wife of Bath's Prologue on CD-ROM. Cambridge: Cambridge University Press.

Robinson, Peter M.W. (1997). "New Directions in Critical Editing." Kathryn Sutherland (ed.). Electronic Text. Investigations in Method and Theory. Oxford: Clarendon Press, 145-171.

Sperberg-McQueen, C. M. (1991). "Text in the Electronic Age: Textual Study and Text Encoding, with Examples from Medieval Texts." in: Literary and Linguistic Computing, 6/1: 34-46.

Sperberg-McQueen, C. M. (1994). "Textual Criticism and the Text Encoding Initiative." Paper presented at MLA '94, San Diego, 1994. Accessed on March 15, 2002. <http://www.tei-c.org/Vault/XX/mla94.html>

Sperberg-McQueen, C. M. (1996). "Textual Criticism and the Text Encoding Initiative." Finneran, Richard, J. (ed.) (1996). The Literary Text in the Digital Age. Ann Arbor: The University of Michigan Press, 37-61.

Sperberg-McQueen, C. M. and Lou Burnard (eds.) (1994). Guidelines for Electronic Text Encoding and Interchange. (TEI P3). Chicago and Oxford: Text Encoding Initiative.

Sperberg-McQueen, C. M. and Lou Burnard (eds.) (2001). TEI P4 Guidelines for Electronic Text Encoding and Interchange. XML-compatible edition. Oxford, Providence, Charlottesville, and Bergen: The TEI Consortium.

Sperberg-McQueen, C.M., and Claus Huitfeldt. (1999). "Concurrent Document Hierarchies in MECS and SGML." In: Literary and Linguistic Computing, 14 (1999): 29-42.

Sperberg-McQueen, C.M., and Claus Huitfeldt (s.d.). "GODDAG: A Data Structure for Overlapping Hierarchies." Accessed on March 15, 2002. <http://www.hit.uib.no/claus/goddag.html>

---

On the Advantages and Disadvantages of XSLT for Manuscript Transcription

Co-author 1

The point of departure is the electronic edition of one of Joyce's notebooks for Finnegans Wake. The 'Guiltless notebook' (47471b, so called because the first word is 'Guiltless') is about 90 pages long and contains very early draft versions of chapters 2, 3, 4, 5, 7 and 8 of Book I. Over a number of years the James Joyce Center at the University of Antwerp (UIA) has made a full transcription of this notebook using TEI-conformant XML/SGML. In a first phase this highly complex collection of textual units was approached from a teleological standpoint: the transcriptions were made on the basis of the narrative structure of the novel as it was first published years later, and encoded in order to leave open the possibility of other approaches (chronological or document-oriented) in a second phase.(1)

A first point of focus in this paper will be the transcription itself. Seamus Deane called Finnegans Wake "in an important sense, unreadable" (Deane 1992, vii) and the same applies to Joyce's notebooks. Joyce intended to write only on the recto pages, but almost immediately started using the verso pages for additions too lengthy to write in between the lines or in the margins, and soon thereafter scribbled whole additional paragraphs on the versos, all linked to each other by means of a confusing and unsystematic complex of lines and sigla. Because of the complexity of this manuscript, we've had to stretch the available tags defined in the full XML-ized TEI Document Type Definition up to and beyond their limits and decide quite randomly when to store information in elements, in attributes or as data content.(2) Specifically for encoding additions and deletions on a manuscript page, TEI offers insufficient distinctive attributes for the editor to include the information about the type of addition/deletion he

The following sample code demonstrates how hard it is to encode a clear distinction between additions Joyce made immediately (and wrote inline) or added later while rereading (and most of the time wrote above or below the line):

He had left the country by a subterranean tunnel lined shored with bedboards. An infamous private ailment (vario lo venereal) had claimed him.
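
One possible TEI rendering of the first sentence, distinguishing a substitution apparently made in the act of writing from material added above the line on a later pass, is sketched below; the reading of the manuscript and the attribute values are illustrative only, and the point stands that nothing in the standard attributes records when a change was made:

    <p>He had left the country by a subterranean tunnel
       <del rend="overstrike">lined</del>
       <add place="supralinear">shored</add>
       with bedboards.</p>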

A second point of focus will be the usability of this transcription: can this XML archive be used to automate or generate all the different visualisations of the manuscript material textual critics have come to expect in an electronic edition? Or is our encoding so dependent on the way we structured the material that it can no longer be broken down and restructured automatically? With reasonable success, I've been using XSL Transformations to this end. I've transformed the teleologically structured files (section per section) into manuscript-orientated files (page per page) through three consecutively run XSLTs. I've generated unique ids in certain tags, generated new elements from attribute values and automatically linked them to corresponding anchors in the edition.

XSLT is a very powerful tool. It has, however, been developed for and by people using it to extract information from databases. In the humanities we deal with texts. In a database-oriented document all information is stored mainly in elements; there is no advantage in 'locking away' information as an attribute value. Encoded literary texts, on the other hand, are linear and have to remain legible at all times, so any information the editor wishes to add ends up in attributes. This results in very 'heavy' tags, limiting the possibilities of XSLT a great deal. But if you take some specific factors into account while encoding, your archive can still benefit from the unmistakable power of XSLT (a short stylesheet sketch follows the list below):

1. Predefine all attribute values or make an inventory of all values used, because in XSLT you can only match exact strings, no regular expressions!(3)

2. Make tags context-independent so they don't lose all meaning and usability when you extract them from their context or restructure the data using XSLT.

e.g. an addition encoded simply with a 'facingleaf' place value, because it sits on the facing leaf of page 4, depends for its absolute location in the manuscript on the page-break tag which precedes it. If you extract all 'facingleaf' additions from their context using XSLT, the acquired data is unusable.

3. Try not to make the divisions in your document dependent on empty (milestone) tags: XSLT only copies from start tag to closing tag, and the latter is of course missing in empty tags. For instance, XSLT cannot extract everything up to a given page-break milestone in a 100-page document.
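
As an illustration of points 2 and 3, the stylesheet sketched below pulls every addition marked as sitting on a facing leaf out of its context while recording, from the nearest preceding page-break milestone, the page it faces. The element and attribute names (add with a place attribute, pb with an n attribute) follow common TEI usage rather than the project's actual markup:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <additions>
          <!-- every addition placed on a facing leaf, wherever it occurs -->
          <xsl:for-each select="//add[@place='facingleaf']">
            <!-- record the page it faces by looking back to the nearest page break -->
            <addition page="{preceding::pb[1]/@n}">
              <xsl:copy-of select="node()"/>
            </addition>
          </xsl:for-each>
        </additions>
      </xsl:template>
    </xsl:stylesheet>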

Notes

1. First presented in Van Hulle & Vanhoutte (2001).

2. A similar problem is discussed in detail by Birnbaum (2001). Birnbaum presents arguments for element-orientated encoding which are very relevant to the material I'm working on: "elements can provide types of structural control that are unavailable with attributes" (32).

3. "support for regular expressions for matching against any or all of text nodes, attribute values, attribute names, element type names" is announced in the XSLT 1.0 specification as a "Feature under Consideration for Future Versions" <http://www.w3.org/TR/xslt> and will be available in XSLT 2.0.

Literature

Birnbaum, David J. (2001), "The relationship between general and specific DTDs: criticizing TEI critical editions." in: Markup Languages: Theory & Practice 3.1 (2001): 17-53.

Deane, Seamus (1992) "Introduction", in: James Joyce, Finnegans Wake. Penguin Books, 1992.

Sperberg-McQueen, C. M. (1994), "Textual Criticism and the Text Encoding Initiative." Paper presented at MLA '94, December 1994, San Diego <http://www.tei-c.org/Vault/XX/mla94.html>.

Van Hulle, Dirk and Edward Vanhoutte (2001), The "Guiltless" Notebook. Paper presented on Genetic Joyce Studies. Antwerp: University of Antwerp (UFSIA), 30 March 2001.

XSLT 1.0 <http://www.w3.org/TR/xslt>

---

Stretching the TEI for the transcription of corpora of modern correspondence.

Co-author 2

In preparing an annotated edition of the complete correspondence between the Flemish author Stijn Streuvels (1871-1969) and his Dutch and Flemish publishers, a corpus of a couple of thousand letters needs to be transcribed and encoded in a machine-readable format. Following up on previous research (Vanhoutte 2001), thorough analyses of the TEI tag sets currently available and of the structural and 'letter-specific' features to be encoded in the corpus show that the TEI tag sets have to be stretched in order to be useful in the project. It was necessary, for instance, to develop new tags for encoding the information on the envelope, such as the postmark, which can be of importance for correctly situating the letter in the reconstruction of the dialogue between the two letter writers (Eide 2001). The most radical extension, however, is the introduction of a means to encode the complex network of different "layers" which we can distinguish in a letter. In the case of the Streuvels corpus, the text of a letter

In this paper I will comment on the advantages of applying the extended DTD to the material and the envisioned uses of the corpus in fields other than literary criticism. Further I will situate the project in the larger Digital Archive of Letters written by Flemish authors and composers in the 19th and 20th century (DALF) and will demonstrate an early version of the electronic edition.
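
A hypothetical sketch of the kind of envelope extension described above is given below; the element names and values are invented for illustration and are not DALF's actual tags:

    <envelope>
      <address>
        <name>Example addressee</name>
        <settlement>Example town</settlement>
      </address>
      <!-- a dated postmark can anchor an undated letter within the exchange -->
      <postmark place="Example town" date="1905-01-01"/>
    </envelope>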

Literature

Eide, Øyvind (2001), "Putting the dialog back together : Re-creating structure in letter publishing." Paper, ACH/ALLC01, New York, 2001. Abstract on <http://www.nyu.edu/its/humanities/ach_allc2001/papers/eide/index.html>. Accessed on 13 March 2002.

Vanhoutte, Edward (2001), "Dancing with DALF: Towards a Digital Archive of Letters written by Flemish authors and composers in the 19th and 20th century." Paper, ACH/ALLC01, New York, 2001. Abstract on <http://www.nyu.edu/its/humanities/ach_allc2001/papers/vanhoutte/>. Accessed on 13 March 2002.

159

Kornbluh, Mark. Michigan State University
Goldman, Jerry. Northwestern University
Cohen, Steve. Tufts University

Building Multimedia Digital Repositories

First, there was text. Then, there were images. Now, there is audio and visual streaming media. Digital libraries, we must remember, were born around text-based projects. Over a decade of experience with text has provided the digital community with many tools and a broad set of best practices to guide our work with structured textual documents. We know a great deal about how to work with words in the digital environment to produce, preserve, and deliver useful digital objects. While tools for searching and analyzing digital images are not as robust and well developed as those for text, best practices in digitization, preservation, and delivery of digital images are well established.

Compared to text and image, however, our experience in working with streaming media (audio and visual, individually and in combination) is in its infancy. Digitization standards are still disputed. Preservation media are not yet settled. And tools for access are in their infancy. The sheer size of streaming media files poses special challenges for digital repositories. At the same time, audio-visual materials open up new, largely untapped opportunities to link digital repositories directly to learning.

This panel is structured around three papers that emerge from three years of working on the National Gallery of the Spoken Word, a Digital Library II Initiative project. Mark Kornbluh’s paper, "Historical Voices: Reflections on Developing a Distributive Multimedia Digital Repository for Aural Resources," explains the development of the repository structure, with the adaptation of the OAIS reference model to a distributive multimedia repository and the translation of METS into a highly flexible MySQL database. Jerry Goldman’s "The Sound of Silence" explores the delivery side of the project, demonstrating the value and demands of a multimedia delivery system that can link text, image, and sound. Finally, Steve Cohen’s "Towards a Philosophy of Digital Libraries: Implications for Learning" explores the potential that multimedia digital repositories such as Historical Voices hold for the learning environment. Lorna Hughes will chair the session and comment on the papers.

---

Historical Voices: Reflections on Developing a Distributive Multimedia Digital Repository for Aural Resources

The National Gallery of the Spoken Word was conceptualized as a research project to think holistically about the full range of challenges facing the digital library community in working with spoken word aural resources. The project’s professed aims were to look at the FULL range of issues involved in digitizing, storing, searching and delivering spoken word resources in a multitude of environments to multiple audiences. The research team was thus diverse and multidisciplinary. To be frank, in part this approach reflected a strategy of good grantsmanship. Without a doubt, an additive strategy proved far more attractive to the granting agency than one that prioritized and limited. An additive grant strategy allowed, in part, for a distributive research project in which different parts of the research team focused on what interested them the most. Thus, the sound engineers could explore watermarking and large vocabulary voice recognition research largely independently from the educators looking at ho

The additive strategy, however, posed a particular challenge for those of us who were charged with designing and building the actual digital repository. We needed to build a repository that was, above all else, pliable and flexible. We needed a system architecture and metadata structure that would allow the content in the repository to be deposited, accessed, and repackaged for all conceivable (and not yet conceivable) purposes. The architecture had to facilitate distributive partnerships, both for ingestion and delivery. We were working with a medium, sound, that took up a great deal of space and processing speed, and we had to facilitate the delivery of that medium, not only by itself, but also in conjunction with text and images. We knew from the start that derivatives would play a major role in the repository. We had to allow our aural resources to be broken up and repackaged in an infinite variety of ways for our disparate audiences. As a result of these challenges, the work of conceptualizing,

OAIS

Our starting point, as with many who are designing digital repositories today, was NASA’s Open Archival Information System Reference Model (OAIS). OAIS posed the questions and set the framework that allowed us to conceptualize our repository as a distributive system. We developed an adaptation of a true OAIS model that allows a centralized search integrator to tie together a federated system of ingestion, storage, and delivery. The OAIS agreements and protocols at the heart of the system provide a standard way to integrate searches across multiple repositories without taxing the resources of any one participating repository. All OAIS-compliant partners can exchange metadata and digital objects in a predetermined fashion. Amongst our most important adaptations is the addition of a caching scheme and "reservation" system for educators and the most-used files. This system is designed to make the repository usable for low-bandwidth customers and in a restrictive educational environment. We also d

METS

Whereas OAIS provided our starting point for designing the repository, we struggled for a long time to find a metadata schema with the flexibility required by our additive vision that could meet the needs of working with streaming media. Over the past year, we engaged in discussions amongst various digital libraries to extend MOA2 to include streaming media and to develop a common metadata schema that could be used across repositories. We believe that the result, METS, the Metadata Encoding and Transmission Standard, meets our needs for standardization and flexibility in use. The METS bucket approach allows any kind of metadata to be wrapped or referenced within a METS object, therefore allowing for the seamless integration of legacy materials as well as expressing the complex relationships between different forms of metadata. This flexibility and level of modularity positions METS as an essential tool for the exchange of digital objects and their associated metadata between repositories/institution

Currently, however, the tools are not yet available to easily deliver web-based content from the raw XML. As a result, we have developed a relational database in MySQL that provides the functionality of METS and the XML Schema language as well as the ability to easily produce METS XML files in the future. The schematized database design uses the principle of name-spacing and avoids statically defined tables, yielding a database model that is highly scalable and able to integrate any number of different metadata standards. Storing information about each element facilitates workflow by allowing non-programmers to create customized ingestion and administration forms and by supporting the creation of dynamic forms.

Designing for Flexibility: Building for Preservation and Access

A physical library or archive cannot be all things for all people. The design of the physical structure and the implementation of policies to control the flow of content within the repository inevitably privilege some uses over others. A digital repository, however, can and should be much more flexible. While designed for long-term preservation and access, digital repositories can be engineered with flexibility in mind. OAIS and METS provide a framework for conceptualizing and building repositories that can be not only interoperable, but also infinitely repackaged and repurposed. Working with streaming media magnifies the challenges, as the resources necessary to manage such digital objects are larger and the tools less well developed, but the potential for developing building blocks for educational use is enormous.

---

The Sound Of Silence

Scholars have been mining spoken-word archives with great success. In the days of analog-only information, the Columbia Oral History Project provided a treasure-trove of information and served as the basis for countless studies surrounding the men and women whose memories endure through transcriptions of their experiences. The revelations of a secret taping system in the White House during the Nixon administration served as the basis for the prosecution of Nixon's closest aides and led to Nixon's resignation. Subsequent revelations provided more evidence of secret presidential recordings in the Johnson and Kennedy administrations. And for years, scholars mined the National Archives collection of Supreme Court oral arguments under substantial constraints until one chose to violate his agreement not to copy the tapes for distribution. His successful face-off with the Court opened a new chapter in access to one of America's most reclusive institutions.

Public access to these source materials remains troublesome. The National Archives -- repository for many of these audio collections -- has not eliminated barriers to access by creating web-based repositories for its spoken-word collections; others have filled this void. By making these materials accessible to a wide audience, scholars and their students have a chance to render independent assessments of scholarly work relying on these spoken-word materials.

I shall argue that scholarship relying on spoken-word archives still confronts an enormous challenge because written transcripts cannot accurately render the experience of listening to the source itself. The spoken word contains emotive and substantive information. Scholars are skilled at rendering the substantive information with considerable accuracy, but they do not systematically address emotive issues. Moreover, scholars may overlook ambient sound, which may provide clues to deeper meaning. And text transcriptions, accurate to the last syllable or pause, will fail to capture the silent pauses that may hold as much meaning as the words that precede or follow.

The purpose of my presentation will be to demonstrate the value in listening to -- and citing as authoritative -- the actual source material rather than a transcription of the source. My illustrations will come from the White House tapes of Richard Nixon and Lyndon Johnson and from the OYEZ Project Supreme Court audio archive. I will propose a system by which all spoken-word archives can be made accessible to the public and then incorporated into scholarly e-publications.

Let me provide two examples to illustrate my point. The first example comes from the leading privacy case, Griswold v. Connecticut. The issue in this case was a state statute that forbade the use of contraceptives. Toward the end of the two-hour argument, Justice Hugo Black posed a question to Thomas Emerson, the attorney advocating that the state law be overturned. The question was a challenging hypothetical that prefigured the issue of abortion, which would come to the Court seven years later. Confronted with this hypothetical, Emerson seemed caught off guard and tried to hedge until he could find some safe ground on which to plant his case. Listening to this segment, you can sense the rush of adrenaline and envision the beads of sweat as the two sides clashed. It was a dramatic moment captured with reasonable though not complete accuracy in the scholarly literature.

The second example comes from the leading abortion-rights case, Roe v. Wade. Here Sarah Weddington offered a scatter-shot proposal in favor of a constitutional right to abortion. When she concluded her remarks, her opponent, Jay Floyd, approached the bench and began his remarks with a "good old boy" story aimed at the nine men before whom he stood. His jocular effort was a failure, signaling both the serious nature of the issues and a spectacular stumble from which he never recovered. Weddington's weak arguments were met by even weaker advocacy. When the Court set the case for re-argument, it gave Weddington a chance to profit from her mistakes and to sharpen her case for abortion. And while a more seasoned advocate replaced Floyd, the issues were now more clearly focused and balanced.

The final segment of my presentation will demonstrate how it is possible to link accurate transcriptions with time code and then synchronize audio with this enhanced text. It is also possible to render selective 'clips' so as to annotate on-line audio repositories. This annotation process will overcome the deficiencies inherent in text-only renderings of spoken-word collections.
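One possible realisation of such time-code linking, sketched here purely for illustration (the presentation does not prescribe a format, and the segments, speakers and times below are placeholders rather than actual tape content), pairs transcript segments with start and end times, selects a clip by time range, and emits a WebVTT cue track that a streaming player can display in sync with the audio:

```python
# Sketch of one possible approach to time-coded transcripts and clips.
# Speakers, times and wording below are placeholders, not archival content.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str

def clip(segments, t0, t1):
    """Return the segments that overlap the requested time range."""
    return [s for s in segments if s.end > t0 and s.start < t1]

def to_timestamp(t):
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def to_webvtt(segments):
    """Render segments as a WebVTT cue list for a synchronized player."""
    lines = ["WEBVTT", ""]
    for s in segments:
        lines.append(f"{to_timestamp(s.start)} --> {to_timestamp(s.end)}")
        lines.append(f"<v {s.speaker}>{s.text}")
        lines.append("")
    return "\n".join(lines)

# Placeholder data standing in for a time-aligned oral-argument transcript.
transcript = [
    Segment(3605.2, 3611.8, "Justice", "[hypothetical question text]"),
    Segment(3611.8, 3614.1, "Counsel", "[pause, then hedged reply]"),
]

excerpt = clip(transcript, 3600.0, 3615.0)
print(to_webvtt(excerpt))
```

An annotated clip is then simply a time range plus commentary, so the citation in an e-publication can point at the audio itself rather than at a transcript page.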

---

Towards a Philosophy of Digital Libraries: Implications for Learning

"The house got smaller when the sun came out."

What does this seemingly incomprehensible sentence have to do with improving learning in higher education? It demonstrates the need to build learning environments which help students generate insight into complex, often confusing, ideas: learning environments that can return a picture of an igloo when students are faced with ideas like the one above. We are at a point when such systems may be possible, and we want to figure out how to make them work. Our approach takes advantage of the development of digital libraries, semantic networks of information and the psychology of generative learning. The genesis of this idea comes from thinking carefully about just what a digital library is, and from developing a philosophy of digital libraries.

What is a Digital Library?

Colleges and universities around the world are creating libraries of digital content. These libraries are filled with scholarship and materials for learning: pictures for art history courses, audio of historic speeches, diagrams of medical procedures, videos of physics experiments, decision simulations, and more. Each entry in the digital library can be connected to another based on any number of criteria: author's name, color, size, meaning, type of media, and so on. Connections can be subtle or transparent. Librarians may assign them, or they may be assigned by (artificially) intelligent systems that watch how the libraries are used. All of these processes go into constructing and maintaining a digital library.

Looking carefully at how digital libraries are assembled and used can help illustrate how best to think about and design them. Such inspection isolates the differences between digital libraries and traditional libraries. These may be thought of as differences in kind or differences in degree; here we focus on differences in degree, of which we consider three:

Differences in access: Clearly digital access is different from access in a traditional library. In traditional libraries immediate access requires you to be in the same space as the library. Digital libraries require that you have the technology to acquire the materials and the knowledge to use the technology.

Differences in connectedness: This connectedness, and the ability to modify the strength and nature of connections between objects, is a key difference between digital libraries and traditional libraries. In traditional libraries objects are assigned a relatively fixed space and meaning. In digital libraries, objects do not occupy a relatively fixed space, and meaning is a function of the connections throughout the digital library. In this way digital libraries resemble human minds more than traditional libraries.

Difference in organization owing to use: In traditional libraries, searching for materials does not typically influence how the materials are organized. In digital libraries, preferences of individuals and user groups as a whole can influence the organization of objects and their relation to one another.

What difference do these differences make?

Our goal is to exploit these differences and invent ways for digital libraries to provoke the kind of insight demonstrated in the opening paragraph. To date, digital libraries have been designed so the information returned will match a search question in very narrow ways. These differences suggest, however, that one powerful application of digital libraries lies in their potential to return items that offer insight into each other, as would a human mind. The key is to return items that might complement, and generate insight into, a selected object in the digital library.

An Educational Project Based on Identified Differences in Degree

We propose to exploit these differences via a theory of generative learning that has, when experimentally tested, produced superior learning results. The idea of generative learning is to juxtapose objects whose meaning, as a function of the network and the students, offers insight into each other. The goal is to build an insight-generating digital library (IGDL). Four features will distinguish this IGDL, each of which stems from the differences identified above. One, the information that describes each object in the IGDL will be based on how it relates to every other object in the library. Works by Van Gogh may be connected to audio of fireside chats. Illustrations of medical procedures can be linked to engineering plans. The meaning of a single object will depend on its relationship to all other objects. Software for charting how the IGDL is used will modify these relationships and create a network that helps the IGDL return complementary items and inspire learning. Librarians may initially assign these relationships.
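The following toy sketch, which is not the authors' system and uses invented object names and weights, illustrates the mechanism described above: objects as nodes, librarian- or usage-assigned weighted links, and a query that returns the most strongly connected (complementary) items rather than exact matches:

```python
# A toy illustration of the IGDL idea: objects are nodes, weighted links carry
# "meaning", usage strengthens links, and a query returns complementary
# neighbours rather than exact matches. All names and weights are invented.
from collections import defaultdict

class InsightGraph:
    def __init__(self):
        self.links = defaultdict(dict)   # object -> {neighbour: weight}

    def connect(self, a, b, weight=1.0):
        """Librarian- or machine-assigned link between two objects."""
        self.links[a][b] = weight
        self.links[b][a] = weight

    def record_use(self, a, b, boost=0.5):
        """Usage tracking: objects viewed together grow more strongly linked."""
        self.connect(a, b, self.links[a].get(b, 0.0) + boost)

    def complements(self, obj, k=3):
        """Return the k most strongly connected objects, i.e. the candidates
        most likely to generate insight into the selected object."""
        ranked = sorted(self.links[obj].items(), key=lambda kv: -kv[1])
        return [neighbour for neighbour, _ in ranked[:k]]

igdl = InsightGraph()
igdl.connect("van_gogh_starry_night.jpg", "fireside_chat_1933.mp3", 0.2)
igdl.connect("van_gogh_starry_night.jpg", "lecture_on_perception.txt", 0.8)
igdl.record_use("van_gogh_starry_night.jpg", "fireside_chat_1933.mp3")

print(igdl.complements("van_gogh_starry_night.jpg"))
```

In a fuller version, the ranking step would also be weighted by the user profile described next, so that the complements returned differ by student.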

A second feature that distinguishes the IGDL is its use of scholarly and personal profiles of the students who use it. Decisions about how to respond to student queries will take into account differences among students. Student aptitudes come into play here. Students with strong verbal skills (as measured by SATs) and a history of courses requiring reading and writing will likely see visual objects from the IGDL to complement text-based entries. The user profiles constitute a second platform for meaning: the predispositions of the person using the library affect the meaning and utility of an entry. We propose to use this student-network fit to catalyze ideas.

Third, a suite of networks that capture meaning and identity will connect the objects in the IGDL. We will use a range of semantic networks and descriptive data to make sure we capture a range of meaning across objects.

Finally, the IGDL will use interfaces that present complementary objects in ways that help students make connections and catalyze ideas. Ultimately, we expect that students browsing through an IGDL will have "aha" experiences similar to the one built into the opening paragraph.

165

Allard, Geneviève. National Archives of Canada
Downey, Cara. National Archives of Canada
Lechasseur, Antonio. National Archives of Canada.

Universal Accessibility and Social Cohesion: Making Records Relevant to Citizens Through Digitization

Digitization helps to increase the relevance and accessibility of archives to all citizens. In Canada, a country with a diverse geography and citizens of different ethnic and religious backgrounds, digitizing historical resources, particularly the 1901 census, helps to create a sense of social cohesion. While this is a goal for the National Archives of Canada, the strategy does change the nature of archival work and the way interaction occurs with various researchers. This panel will demonstrate the various ways in which digitization helps social cohesion, the changes to the work of the archivist, and the different access points which can be used. More specifically:

1. Geneviève Allard's paper will demonstrate how the National Archives of Canada's digitization strategy, and the 1901 census indexation project in particular, works to promote Canadian identity and increase the relevance of archives to all Canadians, in a context where social cohesion is considered a government priority.

2. Cara Downey's paper will discuss the effect of digitization on the archivist's work and the need to anticipate the numerous researchers, with a variety of backgrounds, that will view material on the internet.

3. Antonio Lechasseur will demonstrate how the Canadian Genealogy Centre uses digitization projects, with the 1901 census serving as the first experiment, to develop partnerships with communities and networks with institutions which will be advantageous for all.

Biographical sketches

Geneviève Allard began her career at the National Archives of Canada in 1999 as a reference archivist in the Reference Services Division. She then worked as an archivist in the Military, State and Justice section of the Government Records Branch. When the National Archives of Canada started developing digitization projects, she worked as a content specialist and coordinator on virtual exhibitions, and she is now a project coordinator in the On-Line Services Division. She has a Master's Degree in history from Laval University and specializes in contemporary medical and military history. Since 2001, she has been the French Editor of the Canadian Historical Association booklet series.

Cara Downey completed her Master's Degree in history at the University of Toronto, where she specialized in Canadian history. In 1999, she started working at the National Archives of Canada as an archivist working in electronic records. In 2000, she became the archivist responsible for Statistics Canada, the agency which performs the Census. In 2001, she gave a paper at the Association of Canadian Archivists conference on the appraisal of electronic records, which is to be published in the next edition of the Association's bulletin.

Antonio Lechasseur began his career in 1989 at the National Archives of Canada as an archivist with the Government Archives Division. Later, he occupied different positions as chief within that division, and also within the Records Disposition Division and the Manuscript Division. Since December 1999, he has been Director of the Researcher Services Division. He is an historian specializing in 19th and 20th century Quebec economic and social history. He co-authored a history of the Lower St. Lawrence region in 1993, Histoire du Bas-Saint-Laurent. More recently, an abridged version of this book was published, entitled Le Bas-Saint-Laurent... histoire en bref. In 1997, with Danielle Lacasse, he published a brief history of the National Archives for the Canadian Historical Association. From 1993 to 1995, he was Vice-President and then President of the Association of Canadian Archivists. Since the Fall of 2001, Antonio Lechasseur has chaired the new International Council on Archives committee on outreach and user services.

---

Universal Accessibility and Social Cohesion: Making Records Relevant to Citizens Through Digitization

Paper 1

History in the first person singular: digitising the 1901 census at the National Archives of Canada

The 1901 Census is a major component of the National Archives of Canada's digitization strategy. This paper intends to demonstrate how this digitization strategy sustains a policy of accessibility and promotes social cohesion in a context where cultural institutions need to prove their relevance to all citizens.

At the beginning of the new millennium, the government of Canada asked its departments and agencies to better serve all Canadians through the use of new technologies and by considerably enlarging the quantity of Canadian content on-line. At the National Archives of Canada, this initiative was implemented in a context where client services were deemed the first strategic priority. The Internet, because of its flexibility, its almost unlimited growth potential and its capacity to transcend geographical limitations, seemed the ideal medium to reach the diverse and widely dispersed Canadian population. An On-Line Services Division was thus created and the first building blocks of a digitization strategy were assembled.

After the tragic events of September 11th 2001, Canadian cultural institutions were faced with the additional necessity of re-examining their role in society. The Department of Canadian Heritage determined that, "more than ever" Canadians needed their cultural institutions to promote a sense of social cohesion in a society that tends to exclude certain groups or individuals. In this new context, the National Archives of Canada, as a keeper of the nation's memory, needed not only to make its material more accessible, but also to demonstrate its relevance to all Canadians.

The digitization of the 1901 Census was seen as a powerful tool to achieve those ends. The census was already available in microfilm form and endlessly requested through inter-library loan. Digitising these records would eliminate the need to borrow the reels (with the long delays often involved) and allow researchers to access the material from their own homes and to print copies of the records they deem relevant. Most importantly, digitising the 1901 census would provide on-line content that is of great interest to many communities across Canada; content that could potentially interest every Canadian. The 1901 Census was thus understood as a way to make history available to all. This was, in the words of Ian Wilson, Canada's National Archivist, history in the first person singular.

This project represents only the first step in a strategy devised to target archival material that is meaningful to a variety of Canadian communities and make it easily accessible, downloadable and printable; in other words, deeply relevant to all Canadian citizens. The census was a logical choice, as the digitization of those records had been requested by the genealogical community for the past few years. A combination of the appropriate technology and cultural context facilitated the work. As well, the creation of the Canadian Centre for Genealogy, a vast, pan-Canadian networking initiative led by the National Archives, provided the perfect setting to promote and give visibility to the project. Making the 1901 Census records available to the public relates to the provision and management of access. It is part of the larger digitization strategy of the National Archives of Canada, which will lead to the creation of digital archives. Digitization of those types of records is also an essential component of

This paper will also seek to demonstrate how digitization in general has transformed archival work at the National Archives of Canada, to the point where digitization is now driving intellectual control work and therefore forcing the National Archives to re-think its databases, its norms and its standards in innovative ways. More specifically, the digitization of the Census, because of various problems, has demanded that the National Archives provide innovative technical solutions to make the material readable and usable by its clients. Provisions also had to be made to ensure easy and timely access, as well as substantial downloading capacity, as considerable demand is expected once the material goes on-line. Finally, the digitization of census records has allowed the National Archives to develop, through the Canadian Genealogy Centre, a partnership with the community; digitised records are being used in a vast indexation project, which will ultimately serve the needs not only of the Archives, but of the

This presentation will address the following conference themes:

1. Provision and management of access - the digitization strategy of the On-Line Services Division is a key part of the future of the National Archives of Canada, and the 1901 census is an important part of this framework

2. Digital libraries, archives and museums - through the digitization of archival material, the National Archives of Canada is building, in essence, a virtual archives

3. Information analysis, design and modelling in humanities - digitization is changing archival work, and therefore changing the way archivists regard the record and manage it.

The conclusions of this presentation stem from its demonstration of how a digitization strategy can work to increase social cohesion and the relevance of records to all individuals.

The demonstrable value of this presentation to the broad humanities community stems from its discussion of how digitization expands access beyond the boundaries of traditional researchers, thus making records relevant to all Canadians.

---

Universal Accessibility and Social Cohesion: Making Records Relevant to Citizens Through Digitization

Paper 2

"Why are there no middle names?"- an Archivist's Perspective on the Digitization of the1901 Census

When an archives decides to digitize its holdings, it must bear in mind the expanded audience that will be reached. Traditionally, archival researchers know what archives are, what they are looking for and, to varying degrees, how to get it. The internet, however, potentially expands the audience for archival records to the whole world - which includes those who are totally unfamiliar with archival holdings. This paper will discuss the reasons for the choice of the 1901 census for digitization; the traditional users of the census for research and the finding aids that existed for these traditional users; the challenges faced in preparing the 'contextual help' for the 1901 digitized material (challenges that would also exist for other digital material); and some of the solutions which the National Archives chose.

Within the holdings of the National Archives, the censuses of 1825 to 1901 are incredibly popular, with an entire microfilm room in our researcher services division devoted to their ready access. Copies of the microfilm are also scattered throughout archives, universities and libraries across the country. All of these records exist on microfilm, with the years 1881 to 1901 existing only in this format. The existing paper records (1825-1871) are not available to researchers because of their delicate condition. The popularity of these records made digitization a priority, and within our holdings a choice had to be made of which year should be processed in this manner. The condition of the microfilm, language (as Canada is a bilingual country, bilingual forms are preferable, and not all years are bilingual), the number of questions asked and the format of the page were all considerations in the choice of a first project. For this part of the discussion, I would offer examples of the censuses in National Archives of Canada custody.

The primary users of the census records are genealogists - and these users have a very strict understanding of what they want. Other users include historians, economists, statisticians, sociologists and medical researchers. Given the knowledge which it is assumed these people already have, the existing finding aids consist of a location and the microfilm reel number on which the records for this location exist. Each page of the 1901 census consists of fifty names, and each location can consist of anywhere from three to one hundred pages. Researchers looking for an individual must then go on what one genealogist has termed a "fishing expedition". Visual examples of the finding aids would also be used for this part of the presentation. The internet, however, offers the opportunity to expand beyond these traditional users of the census record - with a related potential to increase reference questions beyond our capacity to respond. In order to ensure that this does not happen, it was necessary to inform

The conclusion reached from this exercise is a reiteration of the fact that the internet potentially expands the audience, and that archives, and others involved in digitization, must be prepared to respond to as many types of users as possible. Under traditional circumstances, only those with knowledge of the record viewed documents, and individuals often gathered information about what they were studying in a haphazard fashion, often by asking questions of a professional familiar with the documents. The internet, with its potentially huge volume of users, significantly decreases the ability of professionals to respond to every question. However, the internet also offers the ability to provide large quantities of information, with users choosing that which is relevant to them. This information capacity will (hopefully) respond to the majority of users' questions, and thus ultimately decrease the basic questions asked of professionals.

This work is valuable to the humanities community for the access it grants to the census and for the explanation of the information surrounding this record.

---

Universal Accessibility and Social Cohesion: Making Records Relevant to Citizens Through Digitization

Paper 3

The Canadian Genealogy Centre, a Single Window Access to Canadian Genealogical Resources

As a means of contributing to a greater sense of cohesion in Canadian society, the National Archives of Canada, along with other major partners such as the National Library of Canada and the Department of Canadian Heritage, has launched a new initiative: the Canadian Genealogy Centre. This new online initiative is devoted to the promotion of Canadian heritage through the discovery of family history. Our goal is also to promote genealogy, archives and library resources as tools for life-long learning. In other words: building the future by sharing the past.

In this era when family members are often scattered, genealogy and family history are becoming increasingly popular interests amongst Canadians. Driven by personal motivation and curiosity, many people are asking themselves: Who are my ancestors? Where did they come from? What was our family background? Only a few years ago, in order to trace their ancestors, genealogists and family historians had to spend a lot of time digging into old papers and travelling to different archival repositories, without the possibility of sharing their concerns or results with colleagues. The internet, and digitization, offer the ability to reach those scattered across Canada without access to archives and/or libraries.

Canada is a country built by peoples of many cultural origins who derive a sense of national pride and connection to each other through a shared collective memory. The emergence of the Internet has also opened the possibility of sharing information and fostering greater awareness and dialogue among Canadians by putting them more directly in contact with their heritage and with each other. With this in mind, the Canadian Genealogy Centre will provide a single window that will allow access to a pool of diverse resources like databases, finding aids, guides, links, etc, all related to the search for ancestors.

After establishing this context, the paper will discuss how the Centre will respond to the rising demand for seamless access to services, digitized collections, learning opportunities, and dynamic information-sharing technologies through the internet - one of the first such centres of its kind in Canada. Among the first of these experiments, the digitization of the 1901 census by the National Archives of Canada will be a starting point for the services provided by this new virtual tool. Among the sources currently used by genealogists and family historians, the census is one of the most often consulted, and among the censuses in National Archives of Canada custody, the 1901 census is very popular among genealogists due to the accuracy and quality of the information that it contains: name, age, birth date, country of origin, trade, education, and so on.

However, the 1901 census, like the other censuses, is not indexed by name, thus presenting a great challenge to the researcher, who must locate an ancestor by location among numerous reels of microfilm or digitized pages on the web. By providing online access to the digitized version of the 1901 census, the Canadian Genealogy Centre will begin in April 2002 a project in which the census is indexed by genealogical communities or individuals in the comfort of their own homes. The genealogical community, in collaboration with the Canadian Genealogy Centre, will establish the norms, standards and procedures to be followed by these voluntary indexers. Furthermore, a venue for partnership with the community will be created, hopefully leading to community involvement in other projects of a similar nature, which will make the improvement of the online content of Canadian archives a reality. Already, the Canadian Genealogy Centre is arousing great interest among potential partners such as provincial archives repositories,

This paper will also seek to present the different components of the Canadian Genealogy Centre web site and the projects (like research tools, an index of genealogical resources, training, on-line visits to various Canadian archival institutions, a youth centre, a discussion forum, and so on) and partnerships that are presently being developed. The paper will also outline what is in store for the Centre during the next two years. This work, in and of itself, represents an innovative solution to the issue of access.

This paper meets the conference themes of:

1. Provision and management of access - the Canadian Genealogy Centre is a single window access point, providing access to a network of genealogical resources to Canadians, and individuals across the globe

2. Digital libraries, archives and museums - the Canadian Genealogy Centre will be a single window access point for various digital collections across Canada.

3. Network technologies used to support a national community program - through the indexing project and the single window access point, the Canadian Genealogy Centre is developing different types of networks to support its program.

Despite the genealogical orientation of the Canadian Genealogy Centre, researchers will have access to a wide variety of tools - which will expand the use of this material beyond the humanities.

 

Posters

70

Hoekstra, Rik. Institute of Netherlands History.

Publishing Framework

The Institute of Netherlands History has been publishing historical sources concerning the history of the Netherlands for a century now. Publications range from archival guides to source-text publications and cover (in principle) the entire history of the Netherlands. Digital publications are an inevitable and logical extension of more traditional publishing forms.

We chose to build a web framework for our digital publications instead of publishing each publication on our website separately and leaving it entirely to the user to try to integrate the data in them. The framework is meant to provide context and coherence to the publications which they would not have as stand-alone digital publications, but not to force them into a technological straitjacket. The idea is that as the framework grows, so will the context for each publication.

On the other hand, each of our electronic publications makes its own demands in terms of accessibility and electronic form. All publications are designed to be used for research, obviously mainly historical. We use word processing, imaging (mainly of manuscripts), databases, XML and all sorts of combinations to prepare the publications. Still, at the frontend (the part our users see) this heterogeneity should not show, as only a minor part of our target groups is more than marginally interested in technology. In the future this situation is not likely to change, so from the onset technical heterogeneity was another question the framework had to address. Even more: technological heterogeneity should not prevent users from cross-searching arbitrary publications (within limits).

Out of these requirements grew the web framework implementation of the Institute of Netherlands History. It consists of a framework part, giving a helicopter or introductory view of the Institute and our research and a structured way to access our 'Projects', into which the research and publications are divided. On the other hand, the projects themselves provide a collection of sub-websites of their own that are much more specific and tailored to the research and/or publications they give access to. The projects all have a structured homepage that provides basic information about the research and links to details about paper publications, but they also provide entrance points for exploring the project further. The sub-websites of the various projects do not have a uniform structure. Even if layout and design should match those of the main framework, we strive to give each of the current projects something of an identity of their own. The reason is to make them recognizable for both its target group a

At the time of writing there are over one hundred projects available. Most of them describe traditional publications, and the number of projects with electronic publications attached is still limited. However, content is growing fast. The same is true for the framework as a whole, to which features are added constantly. It is made to be flexible, and each step contributes to making it stronger, without the necessity of making fundamental changes or rendering the construction less maintainable.

Apart from giving a short demonstration and elaborating on the considerations and choices made in designing the framework, in the paper I shall also discuss the challenges we see in the future.

77

Whitbread, Delia. University of Surrey, Roehampton.

Reconfiguring the Rose -- a Collaborative Internet Design Project

Does the visual language and media of the Sacred Art of the past still have a place in the New Age, and if it does, how can that form be reclaimed for the modern consciousness? How can new technology help in visionary ventures of the spirit? This project, called 'In the Womb of the Rose', was begun at the Royal College of Art in 1988 and is intended to be a collaborative venture made in Virtual Reality on the internet using women artists and imagery from all over the world. It uses the understanding of Sacred Geometry that was the hallmark of the great Gothic cathedrals and stained glass. Imagined as 45 feet in diameter, it was conceived in computer graphics on a scale of 1:12. There are also two full-scale sample pieces in stained glass - the Indian goddess of Life and Death, Kali, and the Babylonian goddess of Instinctual Wisdom and Fertility, Ishtar (known as Lilith in the ancient Judaic stories).

The poster would show examples of the digital artwork and contributions from other artists using the templates involved to make works from comparative religious traditions. There will be a description of the evolution of the geometry and design of rose windows historically, as well as a breakdown of the process of re-visioning that medieval tradition for digital media. This is an on-going project and can be found at <www.wombrose.co.uk>. The project is currently the basis of a practice-based PhD in the Arts, Design and Media Department of the University of Sunderland. The research involves ongoing archiving and curatorial issues of interest to anyone working in digital media.

The eventual outcome will be an archive of work from artists all over the world, a CD-ROM of the final design and featured artists, and a full-scale projection of the whole piece at 15 metres in diameter in locations worldwide.

This revolutionary use of new media not only creates a major collaborative internet artwork but also updates a medieval art form for the twenty-first century. This makes the site an important educational tool as well as a vehicle for individual artistic expression. Currently the project is in its initial stages, with nationwide publicity and promotion planned for the Autumn.

102

Whitfield, Susan. British Library

Mapping Collaborations Along the Silk Road and Beyond: The Experience of the International Dunhuang Project at The British Library

Spatial data analysis using computers has been used in industry, government and science for many years, and yet humanities scholars have barely woken up to its potential. It is, however, inevitable that it will at some point become an essential scholarly tool. At present there are no standard models or tried and tested methodologies, and therefore humanities project developers have to try to predict the course of future research needs. During the course of the next few years, as this process develops, there are bound to be changing expectations, and it is essential to approach this subject with a great deal of flexibility. There is also the need to build complex collaborations among software developers, designers, GIS specialists, geographers, web 'publishers', holders of data and scholars. And just as we cannot predict the scope and nature of the findings that will result from mapping data in the humanities, neither can we predict the scope and nature of the collaborations which will develop.

IDP was established in 1993 as a database for Dunhuang and Central Asian manuscripts worldwide, with the primary objective of reuniting the manuscripts on the WWW so that anyone with an Internet connection could have access to them. Starting with a staff of one, it is now a thriving international project, with centres in London and Beijing, partners throughout Europe, collaborators in the US, Asia and Australia, and users worldwide. How this collaboration developed, the logistics associated with coordinating such a spatially diverse project, the potential risks inherent in relying on individuals and groups outside any central control with differing agendas, and the constant need to develop yet more collaborations to enhance the data further will form the core of this paper.

Some of the collaborations will be discussed in more detail: for example, those with developers of software such as GeoTools and TimeMap; web publishers such as ECAI, the California Digital Library and the British Library; holders of material - potential digital data; and the main user groups - scholars.

The presentation will show the various incarnations of the IDP web database, trace its development as more collaborations have been forged and intellectual input has increased, and discuss plans for future growth.

 

110

Little, David. The Wellcome Trust.

Evaluating History of Medicine Internet Resources

The aim of this poster presentation is to highlight the issues and challenges involved in selecting and evaluating Internet resources relating to the history of medicine. It is based upon the experience of developing the MedHist history of medicine gateway at the Wellcome Trust, although it is hoped that the general issues raised will be pertinent to those involved in providing access to historical digital resources as well as to those developing them.

The Wellcome Library for the History and Understanding of Medicine, in co-operation with the Biome health and life sciences hub, is currently developing a gateway to high quality Internet resources in the history of medicine. The MedHist gateway is due to be launched in July 2002, and will form part of the Biome network of gateways. It will also be a service of the Resource Discovery Network (RDN). MedHist records will be available in a number of ways: via the MedHist Website, the Biome site <http://biome.ac.uk>, the RDN’s Resource Finder database <http://www.rdn.ac.uk> and also via other RDN hubs providing access to resources in related fields. The service will be primarily, but not exclusively aimed at students and researchers working within the further and higher education sectors.

The main challenge faced by researchers using the Internet is the location of suitable resources. Within any subject area there is a plethora of Websites set up by a range of creators, from university departments to individuals with a personal interest in a particular topic. It is often not difficult to locate such resources through the use of Internet search tools such as search engines and directories; on the contrary, using such tools will often return far too many results, leaving the searcher with a bewildering array of material to examine, only a tiny fraction of which will be relevant and of suitable quality. The role of the Internet gateway is to provide researchers within a particular discipline with a search tool that not only makes the location of Internet resources much easier and less time-consuming, but also provides access to only the highest quality resources.

This poster presentation aims to describe the methods that can be employed to locate and evaluate Internet resources, with particular reference to the history of medicine. The main types of tools for locating suitable Internet resources will be listed; these will be relevant to anyone involved in resource discovery within the humanities. The main evaluation criteria employed by the MedHist team will be outlined, grouped into major headings.

By doing this, the common themes that apply to the selection and evaluation of resources within any branch of the humanities and related disciplines will be demonstrated. In addition, it will also serve to illustrate the fact that there are issues specific to particular subjects that must be addressed. To help demonstrate this latter point, attention will be drawn to the challenges and problems of locating an historical gateway within a broader network of life sciences gateways (i.e. the Biome hub) which aim to provide access to very different types of information source.

This poster presentation should hopefully be of use and interest to those involved in providing access to Internet resources within the humanities and overlapping subject areas, such as the history of science and technology and the social sciences. It will fit into one of the major themes of the conference, namely "Provision and management of access".

112

Peterson, Elaine. Montana State University

Collection Development for Digital Libraries

Introduction.

Collection Development principles such as developing selection criteria or understanding usage of materials should be employed for all formats, including digital.

Collaboration.

Current discussions of digital libraries now include the concept of collaboration. As noted in the OCLC Digital Report (Jan. 2002), cooperative structures are no longer merely desirable; they are imperative. But within this milieu, two competing models have emerged. The first is one of networking individual digitized collections. Many libraries and museums are digitizing complete collections and then linking them using portals.

The other model is that of a single, integrated digital library based on multiple collections, running on a single platform with one index. This model can be viewed in various locations, one of which is the database "Images of the Indian Peoples of the Northern Great Plains" at: http://liuse.msu.montana.edu:4000/NAD/nad.home. The process of combining five physical collections into one digital library is described in "Building a Digital Library" at http://www.uidaho.edu/~mbolin/lppv3n2.htm. This alternative model has the subject specialist pre-selecting the pieces for the digital library. The library created becomes a new entity, often greater than the sum of its parts.

Both models of collaboration share similarities—best practices, preservation concerns, and even system requirements. However, the principles of cooperative Collection Development are often ignored in the development of individual digital collections linked by portals. Collection Development guidelines such as the development of selection criteria or discovering patron usage of materials are just as valuable when working with digital formats.

Who Are the Web Patrons Using Digital Libraries?

One of the overarching problems in creating a digital library is the lack of knowledge of who the library patrons on the Web will be. Despite all of our statistics and tracking of address domains, digital library users for the most part remain unknown. One does not know how most Internet collections are used. Because of that lack of knowledge, there is a common tendency to try to digitize everything. Since Internet patrons are unknown, an argument can be made for selection by a knowledgeable subject specialist before the materials are digitized and made available on the Web.

What Do Patrons Want From A Digital Library?

In November 2001 an initial survey was conducted of Web users of the Montana Natural Resources Information System library (http://nris.state.mt.us). The survey was in-depth, with interviews often taking up to an hour. Questions focused on what information was being used, for what purpose, and whether users found what they needed. A subsequent, broader Web survey is planned, but for this survey the interviews were conducted in person or by telephone. The survey administered to site users was based on a combination of nonrandom stratified and snowball sampling. The main purpose was to discern patterns of use and gather qualitative statements from selected users.

Some initial results show that patrons want:

  • Interoperability. Web users are creating their own libraries and wish to download and manipulate data/information for their own purposes.
  • Description and indexing to the item level. Broad collection-level information is not as useful when creating image databases.
  • Unique and accurate data not found elsewhere on the Web.
  • A straightforward and easy-to-navigate search engine. Savvy users manipulate the data for themselves and do not need applications layered on top of the raw data; indeed, it is hard to anticipate the imaginative, multiple uses of the digital library by Web users.

Conclusion.

We are now able to digitize anything and put it up on the Web. This is an opportune time in our short history of digital libraries to step back and create selected, meaningful digital libraries based on our professional expertise. It is also valuable to try to discover who the Internet library patrons are so that we can best serve them the information we hold.

117

Mühlberger, Günter. University of Innsbruck
Stehno, Birgit. University of Innsbruck
Retti, Gregor. University of Innsbruck

The MetaData Engine Project

METAe is an R&D project co-funded by the European Commission (5th Framework, IST Programme, area "Digital Heritage and Cultural Content"). It aims at the development of a comprehensive software package - the METADATA engine - suitable for the highly automated digitisation of monographs and serials. It will provide a basic workflow module in which all steps of the digitisation process can be carried out. The most innovative aspect of the project is the automated zoning, extraction and labelling of structural elements - such as page numbers, caption lines, titles, paragraphs, footnotes, etc. - from the digitised pages. These elements will be arranged in their logical hierarchy, i.e. in logical units such as volumes, prefaces, issues, chapters, subchapters and indices, and assembled into an XML file. As this output file will be formed according to the METS standard, it acts as an 'archival information package' (OAIS), ready for further processing and integration into digital library applications.

The encoding of document structure is an expensive and time-consuming process which up to now has had to be done manually by keying. Thus, many digitisation projects have avoided this conversion step, reducing the encoding of documents to a minimum. The approach of the METAe project is to dramatically enrich the structural encoding of printed documents by applying automated layout and document analysis.

In recent decades, the automatic re-formatting of scanned images into structured documents has become an important field of research within computer science. Most of the approaches come from research on artificial intelligence and try to develop algorithms that best perform document analysis. To demonstrate the performance of those algorithms, small documents or parts of larger documents are sufficient. Therefore, a considerable number of studies limit themselves to one-page documents such as invoices or business letters. This is also true of software companies operating in this field. For these reasons, up to now there has been no approach that can meet the needs of digital libraries. One of the most ambitious aspects of the METAe project is to fill this gap.

Even without understanding the content, a human reader is able to intuitively grasp the logical functions of books and journals while perceiving their physical elements. This ability is based on the same conventions and traditions which determine the production process of printed objects. To enable a computer to perform document analysis automatically, knowledge about these conventions has to be implemented in the recognition software. In other words: the automated capturing of structural metadata requires a recognition model that represents rules and principles allowing the extraction of logical units on the basis of their component elements, their physical characteristics and their syntactic relations. With this information encoded in a model, the physical structure of a scanned image can be mapped onto the logical one.
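By way of illustration only (this is not the METAe engine; the features, thresholds and labels are invented), the sketch below shows the kind of mapping such a recognition model encodes: physical zones described by simple layout features are labelled as logical elements and then grouped into logical units:

```python
# A toy sketch of rule-based physical-to-logical mapping. The zone features,
# thresholds and labels are invented for illustration.
import re

def label_zone(zone, body_font=10.0):
    """Assign a logical label to one physical zone using simple conventions."""
    text = zone["text"].strip()
    if zone["y"] < 0.06 and re.fullmatch(r"\d{1,4}", text):
        return "pageNumber"            # short number in the top margin
    if zone["font"] >= body_font * 1.4 and len(text) < 80:
        return "heading"               # large type, short line
    if zone["y"] > 0.90 and zone["font"] < body_font:
        return "footnote"              # small type at the foot of the page
    return "paragraph"

def group_into_units(zones):
    """Open a new logical unit (e.g. a chapter) at each heading."""
    units, current = [], None
    for z in zones:
        kind = label_zone(z)
        if kind == "heading":
            current = {"title": z["text"], "children": []}
            units.append(current)
        elif current is not None and kind in ("paragraph", "footnote"):
            current["children"].append({"type": kind, "text": z["text"]})
    return units

# Hypothetical zones for one scanned page (y is relative vertical position).
page = [
    {"y": 0.03, "font": 9.0,  "text": "217"},
    {"y": 0.10, "font": 16.0, "text": "Chapter IV"},
    {"y": 0.20, "font": 10.0, "text": "The first paragraph of the chapter ..."},
    {"y": 0.93, "font": 8.0,  "text": "1. A footnote."},
]
print(group_into_units(page))
```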

In contrast to most AI approaches, the model used in the METAe project has been generated by hand on the basis of a detailed analysis of monographs and journals. The structures found in monographs and serials since 1820 were represented by 'Augmented Transition Networks', a formalism normally used in the field of natural language parsing. Moreover, a set of layout principles for logical units was described by formal rules.

In a second step, the representation model was derived from the 'grammar of books'. In contrast to the recognition model, the representation model has to define the well-formed structures of the XML output file. As such a model can, but need not, encode the precise syntactical order of logical elements, it represents a weak version of the grammar, which can be generated by omitting information related to syntactical order and layout principles. The METAe project utilizes the METS standard for the encoding of these structures and further metadata.

The modelled rules and principles have been implemented successfully in the recognition software, and first tests of the prototype show encouraging results. During Summer 2002 a pilot installation will be set up at the University Library Innsbruck; other library partners of the project will follow in autumn. The development of the METAe engine is carried out by the German software company CCS and the University of Florence. The University of Innsbruck is responsible for the 'grammar of books' and the METS schema, i.e. for the recognition and representation models. The project tasks are completed by the development of a special OCR engine for old typefaces as well as a native XML search engine, 'Tresy', which will be developed by the Scuola Normale of Pisa.

132

McKinney, Peter J. University of Glasgow

Cross-Sectoral Solutions for Digital Resources

The Humanities Advanced Technology and Information Institute (HATII), based at the University of Glasgow, focuses strongly on the use and creation of digital resources. HATII is entering an exciting new phase, and the breadth, depth and importance of the department's activities bear witness to this. This presentation will detail the initiatives that HATII is currently pursuing in order to advance knowledge and expertise not only in the humanities sphere, but increasingly in cross-sectoral domains.

HATII is both a teaching and a research department, and on both levels it encourages and develops networks across the entire landscape of digital resources. HATII's teaching encompasses both traditional humanities computing classes and unique modules for MSc IT students, Archive students, and Theatre, Film and TV students. This year HATII introduced a new MPhil course in Digital Management and Preservation, developed in conjunction with the University Archives. These students receive training in all fields contained within the humanities stream, and beyond. Moreover, modules offered to the MSc students in Information Technology have expanded rapidly in recent years, and are further helping to blur boundaries between faculties and domains.

Another step in sharing resources and information is the set of teaching packages developed in partnership with the Archaeology Data Service. The PATOIS Project (Publications and Archives in Teaching: Online Information Sources) is creating electronic teaching resources to enable students to experiment with the technology and the way in which it is used to study archaeological data. The full series of packages has obvious pedagogical value. Exposing students to archaeological digital resources will not only enhance their own research, but will also push archaeology as a discipline into further exploiting technology. Conversely, the modules will help foster an understanding of how technology can continue to transform archaeological practices.

In terms of research, HATII has strong credentials in ensuring that it is not only the humanities disciplines that benefit from new avenues of knowledge, and it is exploiting European funding to achieve this. The department is a founding partner in ERPANET. ERPANET - the Electronic Resource Preservation and Access Network - is a three-year, European Commission funded project that will establish an expandable European consortium. In that time it will hold a number of seminars and workshops, create best practice guides, and offer a virtual clearing-house and help desk. These will help achieve the main objectives of ERPANET: to enable cross-sectoral information sharing (among memory organisations, the software industry, research institutions, government organisations, the entertainment and creative industries, and commercial sectors) in what has hitherto been a field of distinct and disparate sectors; to raise awareness of the problems of electronic preservation and access and to offer examples of best practice in this area; and to develop both an online and a physical community focussed on preservation.

This presentation will display the ways in which HATII, through its teaching initiatives and projects such as ERPANET and DIGICULT, is creating a multifarious community which will be vital to the successful creation and preservation of electronic materials. By exploring and developing multi-domain solutions for continued access to digital resources, HATII helps ensure that such resources can be created and put to extended use.

134

Beavan, Dave. University of Glasgow

Scottish Corpus of Texts and Speech

Background

Recent years have brought significant changes to the political situation in Scotland. This new political situation has been accompanied by a resurgence of interest in the languages and culture of Scotland. The present-day linguistic situation in Scotland is complex, with speakers of Scottish English, Scots, Gaelic and numerous community languages making up Scottish society. However, surprisingly little reliable information is available on a variety of language issues such as the survival of Scots, the distinguishing characteristics of Scottish English, or the use of non-indigenous languages such as Chinese and Urdu. This lack of information presents significant problems for those working in education and elsewhere.

At present there is no electronic archive specifically dedicated to the languages of Scotland. Such a resource would provide valuable material not only for language researchers, but also for those working in education, government, the creative arts, media and tourism, who have a more general interest in Scottish culture and identity. It would provide important data about English as used in Scotland, and also Scots, in its many varieties, Gaelic, and the principal community languages. It is against this background that plans for the Scottish Corpus of Texts and Speech (SCOTS) project have been developed, and work is now underway.

The SCOTS project is the first large-scale project of its kind for Scotland. It aims to build a large electronic collection of both written and spoken texts for the languages of Scotland. This is a resource which is urgently needed if we are to address the gap which presently exists in our knowledge of Scotland's languages. Initially, the focus will be primarily on the collection of Scottish English and Scots texts, but it is also planned to include Gaelic and material from non-indigenous community languages such as Punjabi, Urdu and Chinese. Thus the Scottish Corpus of Texts and Speech aims to give a full and accurate picture of the complex linguistic situation which exists in Scotland today.

Once the texts have been collected, they will be gathered together to form a large electronic archive. SCOTS will be a publicly available resource, mounted on the Internet. It is envisaged that SCOTS will allow those interested in Scotland's linguistic diversity, and in Scottish culture and identity, to investigate the languages of Scotland in new ways. It will also preserve information on these languages for future generations.

The project is being carried out at two sites. The University of Glasgow is responsible for the collection of texts and speech and the creation and maintenance of the corpus. The University of Edinburgh will develop the corpus architecture and examine various research issues in the representation of multi-modal corpora.

Proposal

We wish to attract attention to the project's goals and technologies and give the opportunity for discussion with interested delegates.

A live, interactive demonstration of the project's website will allow visitors a taster of the project's goals, including the opportunity to explore a searchable database of Scots texts, retrieve full texts, browse metadata and view multimedia items etc.

Further information will be available in the form of brochures, together with response forms for those who may have texts they are willing to contribute or who wish to request updates and additional information about the project's progress.

Over the three days some key project staff (both linguistic and technical) will be manning the demonstration and will be on hand to discuss the various elements that are brought together to make up this project, some of which are:

* Collection of texts - project publicity, gathering contacts, following up leads

* Administration - database structure, workflow, documentation

* Metadata - identifying valuable data, form design, maintaining standards

* Permissions & copyright - IPR, data protection, licences

* Corpus structure - database design (relational and XML), text encoding, interoperability

* Computing resources - hardware/software, web serving

The project is in its infancy and we welcome the opportunity to openly discuss decisions and issues and learn from the experience of attendees.
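
By way of illustration of the 'Corpus structure' element listed above, the minimal sketch below shows one way a corpus text and its descriptive metadata might be held in a relational table and searched. The table layout, field names and sample record are invented for the purposes of the example and do not reproduce the project's actual design.

# Minimal sketch: storing a corpus text with descriptive metadata and
# searching it. Field names and sample data are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document (
        id      INTEGER PRIMARY KEY,
        title   TEXT,
        variety TEXT,     -- e.g. 'Scots', 'Scottish English', 'Gaelic'
        mode    TEXT,     -- 'written' or 'spoken'
        year    INTEGER,
        body    TEXT      -- the text itself, or a pointer to an encoded XML file
    )
""")
conn.execute(
    "INSERT INTO document (title, variety, mode, year, body) VALUES (?, ?, ?, ?, ?)",
    ("Sample letter", "Scots", "written", 1999,
     "Dear freend, I wis gey gled tae hear frae ye..."),
)

# Retrieve all written Scots documents containing a given word form.
rows = conn.execute(
    "SELECT title, year FROM document "
    "WHERE variety = ? AND mode = ? AND body LIKE ?",
    ("Scots", "written", "%gled%"),
).fetchall()
print(rows)   # [('Sample letter', 1999)]

In a fuller design the texts themselves would more likely be held as encoded XML files, with the relational layer carrying the descriptive metadata used for retrieval.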

136

Rourk, Will. University of Virginia
Newman, Dave. Tibetan and Himalayan Digital Library

Specialization Within the University Library: Mapping the Cultural Heritage of Tibet

Present-day academic institutions are being challenged with new kinds of information. This challenge is being met by university libraries, which must expand their roles to manage digital information. This poster session entry will present the visualization efforts of the University of Virginia’s Digital Media Lab, an electronic center providing support to faculty and students. The Digital Media Lab provides this support through media specialists who assist faculty and students with a variety of digital content solutions. Among these specialists, the visualization specialist has the duty of providing the means of presenting ideas and concepts using rich media content in a web-based medium. Rich media encompasses a variety of digital technologies, such as digital video, dynamic graphics and animations, 3D technologies, digital audio and other technologies, that the visualization specialist must bring together to develop an enriching educational experience. A case study in rich media deployment is presented here.

The University of Virginia Digital Media Lab is currently providing assistance to the Tibetan and Himalayan Digital Library. This endeavor, led by Tibetan Studies professor David Germano, is a comprehensive collection of the diverse aspects of Tibetan and Himalayan cultural heritage. Broad areas of the collection include language, religion, geography and architecture. The role of the visualization specialist has been to present GIS information and architectural data in a web-accessible form. The goal for visualizing Tibetan and Himalayan resources starts with a map of the entire region as it relates to the Asian continent and scales down to the level of objects within an edifice. The visitor to the THDL scales the resolution of information both visually and textually, as each map presents links relevant to the current position. The regional map will provide links to county maps, which in turn will provide links to the township level, then to the neighborhood level, and on to the level of buildings and even the objects within them.
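
Purely as an illustration of such scale-dependent linking, the sketch below shows how links might be associated with each level of place and looked up for the visitor's current position; the structure, place names and file names are placeholders rather than the THDL's actual data.

# Illustrative sketch of scale-dependent map links: each named place holds
# the links relevant at that level plus its more detailed sub-places.
# Place names and link targets are placeholders.
region_map = {
    "links": ["regional_overview.html"],
    "places": {
        "Lhasa": {
            "links": ["lhasa_city_map.html"],
            "places": {
                "Barkor": {
                    "links": ["barkorMap.html"],
                    "places": {
                        "Meru Nyingba": {
                            "links": ["meru_nyingba_qtvr.html", "meru_nyingba.wrl"],
                            "places": {},
                        },
                    },
                },
            },
        },
    },
}

def links_for(path):
    """Return the links relevant to the current position, e.g. links_for(["Lhasa", "Barkor"])."""
    node = region_map
    for name in path:
        node = node["places"][name]
    return node["links"]

print(links_for(["Lhasa", "Barkor"]))   # ['barkorMap.html']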

Current progress involves the creation of vector graphic maps that interface specific information regarding Tibet’s capital city, Lhasa, and the neighborhoods within it. Provided below are links that focus on the Barkor neighborhood of Lhasa. This area was chosen for its cultural significance as a World Heritage Site and holy center of Tibetan Buddhism, which is a major force in Tibetan culture. A variety of rich media solutions are employed in this web-based presentation to immerse the visitor in the experience of visiting a Barkor building. From the vector graphic map of the Barkor, one can explore the Meru Nyingba monastery via web technologies such as QuickTime VR panoramas and objects, Virtual Reality Modeling Language (VRML) representations of actual spaces and places, an online slide show of 3D rendered images, and a digital movie animation of the 3D rendering of the monastery. These forms of rich media web content, provided by the visualization specialist, help visitors to the site gain many different perspectives on the monastery and its spaces.

Current and future development of the geographic and architectural aspects of the THDL involves the building of a web database to house a growing architectural record of Lhasa. This record is being composed from field notes and data acquired by THDL and Digital Media Lab personnel, as well as through international collaborative efforts with the Tibetan Heritage Fund, an architectural group that has carried out extensive survey and restoration work on historic buildings in Lhasa.

University libraries today must expand their expertise to accommodate the diverse nature of digital information. The role of the visualization specialist is being expanded from building visual presentations to building the means to access database information. Access can be made more enriching through the construction of rich media content. As university libraries are increasingly met with the diverse challenges of digital information, their roles adapt to provide specialists to meet these needs.

Relevant URLs

Tibet Lhasa Barkor Map
http://maewest.itc.virginia.edu/~wmr5a/tibet/barkorMap/barkorMap.html

The Digital Media Lab
http://www.lib.virginia.edu/clemons/RMC/dml.html

The Tibetan and Himalayan Digital Library
http://www.thdl.org

 

139

Rourk, Will. University of Virginia.

An Evolution of a Virtual Gallery

Present-day academic institutions are being challenged with new kinds of information. This challenge is being met by university libraries, which must expand their roles to manage digital information. This poster session entry will present the visualization efforts of the University of Virginia’s Digital Media Lab, an electronic center providing support to faculty and students. Four projects are presented here that illustrate the evolution of a metaphor for accessing information through a web-based solution. The concept of a virtual online gallery has existed for some time now. This presentation will show how the library visualization specialist shapes the virtual gallery, from the representation of an actual space to the development of a contrived space, to provide the means of displaying and acquiring digital content.

Spatial visualization of information can start with the museum gallery. An actual museum gallery presents subjects up front and singly. Just as art hanging on a wall represents an emotion or idea, so an icon within a virtual space can represent a concept or body of work. Two different gallery spaces were developed to represent the space of an actual art gallery. The first is a representation of the Bayly Art Gallery on the campus of the University of Virginia, rendered in QuickTime VR, a technology that allows the experience of rotating a panoramic image of an actual space. This web technology also offers the very important feature of being able to link to web pages with higher resolution data

(http://maewest.itc.virginia.edu/~wmr5a/DML/baylyQTVR/ourtime/ourtime2.html).

The second representation is rendered in Virtual Reality Modeling Language or VRML, a web technology that allows visitors to navigate a representation of space in a non-linear fashion and also discover spatial icons that link to higher resolution data

(http://maewest.itc.virginia.edu/~wmr5a/worx/projects/places/bayly.html). In these two cases art is being directly represented, though the way we experience it is an informational experience, made possible from remote locations through the Web. The evolution of the gallery leaps forward when the gallery objects become abstracted from art to pure ideas, as in the next phase of the virtual gallery, found in a project for a French language class.

(http://nmc.itc.virginia.edu/Efolio/Fall1997/Dreyfus/Detailcpf.CFM?MainDreyfus__IDMain=846) In this gallery the exhibition pieces are icons of web page reports on a topic in French history. The gallery space is completely contrived, yet shaped to accommodate the information that is represented in a way particular to the class assignment. The future of the web gallery is currently being explored through the use of database management of the classroom (these are prototypes; the database-connected models are password-protected and currently viewable on campus only: http://lucy.itc.virginia.edu/~wmr5a/VRML/ColdFusion_RoomUnitVR/demoRoom02/roomUnit01.html).

Further development of this work will abstract the space to represent the classroom and the students within it. A virtual space can be built from the class roster contained in a media database, where each student is represented by their own gallery space and the files held within their respective databases become the exhibition spaces. Communities can be formed through virtual VRML chat spaces for collaboration or exploration of the class project as a whole.
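
A minimal sketch of the generation step this implies is given below: a class roster (here hard-coded in place of the media database) is turned into a VRML scene in which each student is represented by a clickable box linking to a gallery page of their own. The scene layout, file names and roster are invented for illustration.

# Illustrative sketch: generate a VRML97 scene from a class roster, with one
# clickable box per student linking to that student's gallery page.
# The roster and target URLs are invented placeholders.
roster = ["alice", "bertrand", "chantal"]

def student_node(name, x):
    return f"""Anchor {{
  url "galleries/{name}.html"
  description "{name}"
  children [
    Transform {{
      translation {x} 0 0
      children [
        Shape {{
          appearance Appearance {{ material Material {{ diffuseColor 0.5 0.6 0.9 }} }}
          geometry Box {{ size 2 2 2 }}
        }}
      ]
    }}
  ]
}}"""

scene = "#VRML V2.0 utf8\n\n" + "\n".join(
    student_node(name, i * 4) for i, name in enumerate(roster)
)

with open("class_gallery.wrl", "w") as f:
    f.write(scene)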

The gallery metaphor is an enriching means of presenting a collection of information whether it is an exhibit of artwork or a classroom assignment. A virtual gallery presented on the web becomes not just a passive experience with the exhibit items, but an interactive interplay with digital content. Used in the classroom the web-based gallery becomes a valuable tool for helping students and faculty access information.

150

Giedenbacher, Erwin. University of Salzburg
Jadin, Tanja. University of Salzburg
Mader, Martin. University of Salzburg

Creating Online Resources – "Information Pool USA – Europe"

The research forum "History @ Internet" (www.sbg.ac.at/hai/heimat.htm), founded in 1998 by Reinhold Wagnleitner at the History Department of the University of Salzburg, is a project dedicated to creating and evaluating online resources for historians.

One of the key issues of intellectual discourse in the humanities in the twentieth century has been the field of "Americanisation – Anti-Americanism". Hardly any social, economic, political, military, or cultural context of modern history can be explained without reference to these fields. Obviously, the 21st century will also be deeply influenced by this conflict (cf. globalisation). Especially in central Europe, there is a demand for properly organised and conceptualised material on these subjects that can be used by scholars, students, and the public alike.

As there is no chair in either British or American history in Austria, there have so far been no efforts to collect and organise sources on this important subject of cultural history. The members of "History @ Internet" are now collecting papers, primary sources, textbooks etc. in order to digitise these texts, songs, film clips etc. and to establish a (by Austrian standards) unique database on the socio-cultural relations between the United States and Europe from the late 18th to the early 21st century.

In cooperation with the History and English Departments at the universities of Klagenfurt, Graz, Salzburg and Vienna, a data-pool will be created and programmed. The main focus of the data-pool will be an analysis of European-American relations. Beginning with its origins in the 18th and 19th centuries, the files in the database will show the American influence on society, economy and culture in central Europe, and especially in Austria. A second part will contain literature and deal with questions of the transfer of cultural images through film and other media. A special focus will be on US-American popular culture such as television series, comics, and films. These media coming from abroad have left a deep mark on today’s Austrian society.

The data files will be accessible to students via online and distance education modules on the World Wide Web. Students should approach the topics from different positions and perspectives. The creation of an exemplary online course will demonstrate how materials from the data-pool can be applied to learner-orientated courses. This specific course, called "Americanization/Anti-Americanism", will be accessible to all Austrian universities. Information will be provided on a broad scale, e.g. applied geography, history, politics, literature, film, music etc. There will be core modules on key issues such as the Cold War and on important figures, biographical essays etc. Multimedia and interactive elements will function as complements to traditional forms of reception.

The database’s sources will consist of elements from everyday life, popular culture, archives, fine arts, comics, literature, film, historical traditions and monuments, internet files, interviews (oral history), TV clips, commercials, digitised newspaper clippings, and scientific literature. A user-orientated menu will enable students to navigate through the online course and the data-pool by topic as well as by media type, as in the sketch below.
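
Purely as an illustrative sketch of such navigation by topic and media type (the items, topics and field names below are invented and are not drawn from the planned data-pool):

# Illustrative sketch: filter data-pool items by topic and/or media type.
# Items, topics and field names are invented placeholders.
items = [
    {"title": "Marshall Plan newsreel", "topic": "Cold War", "media": "film"},
    {"title": "Jazz in Vienna, 1950s", "topic": "popular culture", "media": "audio"},
    {"title": "Soft drink advertisement", "topic": "popular culture", "media": "image"},
]

def browse(items, topic=None, media=None):
    """Return the items matching the selected topic and/or media type."""
    return [
        it for it in items
        if (topic is None or it["topic"] == topic)
        and (media is None or it["media"] == media)
    ]

print([it["title"] for it in browse(items, topic="popular culture")])
# ['Jazz in Vienna, 1950s', 'Soft drink advertisement']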

One of the key aims of the project is to serve the interests of the students and to help them develop self-managed learning skills. The use of e-learning platforms such as "Blackboard" (at the University of Salzburg) will ensure that the content of the data-pool is prepared in an adequate delivery system that can easily be handled by students and teachers. Asynchronous discussion forums will be used to consolidate the acquired knowledge and to foster interaction between teachers and students. The didactic concept of the Internet courses will be based on adult learning principles.

The use of "new media" communication tools to direct learners' acquisition of content, as planned in the project, will demonstrate an important alternative to traditional learning concepts. The development of learner-orientated concepts based on the use of e-learning platforms, in addition to traditional lectures, will be one of the great challenges for university teachers in the next few years. Both teachers and students will play a more active role in courses, and as a result of this more intensive communication the quality of learning will improve.

We would like to illustrate the problems an interested historian faces in creating a database like the projected info-pool. Our first problem was to find a group of interested people from different universities and departments who were not only recognised scholars in their own fields of research but also technically trained, that is, who already possessed sufficient knowledge of creating online material for distance education purposes.

Our second and most far-reaching problem was (and still is) copyright. Citing sources of popular culture, e.g. video clips, commercials etc., even for purely scientific purposes, can be a very expensive matter, especially in cases where the rights of authors cannot be sufficiently cleared. This topic also includes the problems of linking to World Wide Web sites and pages (according to international law, the owner of a page can refuse to be "linked" to). Professional sources we are going to use include CORBIS, the Library of Congress, the National Archives and Records Administration (Washington), and the Public Record Office.

We would also like to show how the architecture of a database like this can be planned and established, both technically and conceptually; how a vast range of different media types can sustain the creation of concise scholarly online resources; and, especially, how the humanities – in our case, historians – can contribute to a new way of presenting and explaining crucial and complicated circumstances and facts to students and the public.

Our goal is also the development of a new approach towards multimedia. Users will be able to choose among archived video-taped lectures and combine them with related and relevant sources within the database. Dynamic websites in connection with streaming media tools should provide a new way of presenting information on the topics of "Americanisation – Anti-Americanism".

As the technical and conceptual development of our database started only a few months ago, we will be able to show parts of the database and the software tools as "works in progress".

156

Bia, Alejandro. Miguel de Cervantes Digital Library
Sanchez-Quero, Manuel. Miguel de Cervantes Digital Library

Building Resources to Spell-Check Ancient Spanish Texts

The huge development of information technology has motivated the appearance of a new type of library, the digital library (Arms, 2000). The Miguel de Cervantes Digital Library (http://cervantesvirtual.com) is one of the most ambitious projects of its kind ever to have been undertaken in the Spanish-speaking world, with more than 4000 digital books at present. This enormous collection of digitised works consists mostly of Hispanic classics from the 12th up to the 20th century. The development of these digital books requires a great deal of care in terms of correction and editing, but the books can afterwards be processed in a massive, uniform way to produce the different publication formats and services offered to readers.

Concerning the human resources involved in the project, by far the biggest group corresponds to the correction and markup staff (Bia and Pedreño, 2000), who are in charge of the hardest-to-automate part of the production process: reading and correcting digitisation errors, structurally marking up the texts, and taking important editing decisions that involve both the rendering and the functionality of the hypertext documents to be published. These humanists are highly skilled people with at least a bachelor's degree in philology or other humanistic disciplines. We want them to devote their time to higher intellectual tasks, such as taking editing or markup decisions or preparing the texts for valuable Internet services (like text analysis or concordance queries), rather than to spend their energies on the tedious mechanical task of correction, the main bottleneck in our production workflow and by far its most time-consuming task.

In the case of contemporary works, spell-checkers turned out to be a useful aid to the correction process, but for literary works written in ancient Spanish, commercially available modern spell-checkers may produce more mistakes than they prevent. The reason is that spell-checker dictionaries include only modern uses of the language, and when they are applied to old texts they mistake correct ancient word forms for errors and try to correct them. Unable to use spell-checking as an aid, correctors have to carry out a side-by-side comparison of the original and the digitised texts to detect the errors.

Being aware of the usefulness of spell-checkers in the correction of modern works, and lacking this facility for ancient texts, we decided to build dictionaries for ancient Spanish. This decision led to new problems and new questions. As there is no such thing as a single ancient Spanish, but rather a dynamically evolving language that changes through the centuries, how many old-Spanish dictionaries should we build? Should we set arbitrary chronological limits?

Taking advantage of the 4000 books already digitised and corrected at the Miguel de Cervantes Digital Library as a corpus covering several centuries of Spanish writing, we have built a time-aware system of dictionaries that takes into account the temporal dynamics of the language, to help solve the problem of spell-checking ancient Spanish.
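
The following minimal sketch suggests one possible shape for such a time-aware system; the period boundaries, sample texts and matching rule are invented for illustration and do not reproduce the dictionaries actually built at the Library.

# Illustrative sketch of time-aware spell-checking: word lists are built per
# period from dated, already-corrected texts, and a word in a new document is
# accepted if it is attested in that document's period or a neighbouring one.
# Period size, sample texts and the matching rule are invented.
from collections import defaultdict

def period_of(year, span=100):
    return (year // span) * span          # e.g. 1605 -> 1600

def build_dictionaries(corpus, span=100):
    """corpus: iterable of (year, text) pairs taken from corrected digital editions."""
    dictionaries = defaultdict(set)
    for year, text in corpus:
        words = {w.strip(".,;:!?¿¡()").lower() for w in text.split()}
        dictionaries[period_of(year, span)].update(w for w in words if w)
    return dictionaries

def is_attested(word, year, dictionaries, span=100):
    p = period_of(year, span)
    neighbours = (p - span, p, p + span)  # tolerate forms from adjacent periods
    return any(word.lower() in dictionaries.get(q, set()) for q in neighbours)

corpus = [
    (1605, "En un lugar de la Mancha, de cuyo nombre no quiero acordarme"),  # Don Quijote
    (1499, "La muy fermosa dueña dubda de su fee"),  # invented pseudo-15th-century sample
]
dicts = build_dictionaries(corpus)
print(is_attested("fermosa", 1510, dicts))   # True: attested in a neighbouring period
print(is_attested("fermosa", 1890, dicts))   # False: flagged for a human corrector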

In this paper we present the problems we have found, the decisions we have made and the conclusions and results we arrived at. We have also been able to extract statistical information on the evolution of the Spanish language through time.

The final section of the paper deals with the technical details of the project and the innovative application of digital methods, such as the use of TEI and XML markup.

References:

William Arms. Digital Libraries. MIT Press, Cambridge, Massachusetts, 2000. ISBN 0-262-01880-8.

C. M. Sperberg-McQueen and Lou Burnard, editors. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Text Encoding Initiative, Chicago and Oxford, May 1994; revised reprint, Oxford, May 1999.

Alejandro Bia and Andrés Pedreño. The Miguel de Cervantes Digital Library: the Hispanic Voice on the Web. Literary and Linguistic Computing, 16(2):161-177, 2001. Presented at ALLC/ACH 2000, the Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, University of Glasgow, 21-25 July 2000.

David Hunter, Curt Cagle, Dave Gibbons, Nikola Ozu, Jon Pinnock, and Paul Spencer. Beginning XML. Programmer to Programmer. Wrox Press, Birmingham, UK, 1st edition, 2000.

Alejandro Bia. Automating the Workflow of the Miguel de Cervantes Digital Library. Poster at the ACM DL'2000 Digital Libraries conference, San Antonio, Texas, USA, June 2000.

Olumide Owolabi. Efficient pattern searching over large dictionaries. Information Processing Letters, 47(1):17-21, August 1993.

Real Academia Española. Diccionario de la lengua española. Espasa Calpe, Madrid, 1992.

 

174

Short, Harold. King's College London
Bradley, John. King's College London
Burns, Arthur. King's College London
Burghart, Alex. King's College London
Pelteret, David A E. King's College London
Tinti, Francesca. King's College London

Historical Scholarship in a Digital World: case studies in prosopography

Session overview
Digital methodologies are transforming many aspects of scholarly work. This session is built around a group of prosopographical projects based at King's College London, and uses these as case studies in exploring how technical methods have changed and may change longer-established historical methods. Prosopography, being the study of persons, is a particularly interesting area of research for this discussion, partly because in principle computational methods should be able to assist in the matching of names and careers and the investigation of patterns, and partly because it is possible to contrast in quite explicit ways the methods of information gathering and analysis made possible by the new technologies with the well-established traditions of prosopographical work. Another important factor, however, is the close inter-relationship between prosopographical work and a number of other areas of scholarly activity, such as the study of inscriptions, seals, coins, and placenames.
The session will concentrate on three broad themes:
1. the interaction between technical methods and historical methods and the implications for historical research;
2. the ways in which the technical methods can aid the scholar by prompting new questions and new lines of enquiry;
3. the new opportunities for inter-disciplinary work and for integration or cross-referencing of materials from a range of scholarly endeavour.


---


Burns, Arthur. King's College London

Paper 1: Finding priests in unlikely places: the Clergy of the Church of England Database

This is a 5-year project funded by the AHRB; it is now in its third year. Its aim is to create a relational database covering all clerical careers in the Church of England between 1540 and 1835, to be made available in electronic form for public access over the world-wide web. This resource, once created, has tremendous potential as a tool for a wide range of both academic and non-academic research. Over the period we are covering, between the Henrician Reformation and the creation of the first reliable national sources of statistics and information concerning the Church of England, this institution was the single most important employer of educated males in England and Wales. An understanding of the dynamics of the clerical profession, both as experienced by individuals and in terms of the development of a profession, is thus of considerable importance not only to religious history, but to a wide variety of other social, political and cultural history.
Recent decades have seen a renewed emphasis placed by historians of the period on the political salience of religion and the state's relationship with the church, and this is hardly surprising, given that the church at times possessed an institutional presence which surpassed that of the state. But much of that presence was a local presence, and partly in consequence, producing an effective account of the functioning of the clerical profession at a national level has hitherto been exceptionally difficult. The relevant archives are not only geographically dispersed, but of a disparate nature. The records of the dioceses of the Church, the most relevant administrative level, are held in 27 different repositories.
The CCED aims to capitalise on the fact that, for all these difficulties, the diocesan authorities maintained accurate documentary records of all major career events involving the clergy at a local level, in order to create a single resource bringing together the most important data contained in all the diocesan record offices. The database will record these events rather than contain prose biographies, and will enable a wide variety of data retrieval and analysis. Users will be able to establish the succession of clergy in particular localities, or investigate more complex issues such as patterns of clerical migration and patronage (for example, the role of women in this respect). It should for the first time be possible to investigate systematically the changing size, educational background and career patterns of the English clergy. We envisage users ranging from academic researchers to the expatriate genealogist seeking information on a clerical great-great-grandfather; indeed, the project website has already been attracting a wide range of inquiries from potential users.
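To make the event-centred design concrete, the sketch below shows the general shape such a resource could take: career events link persons to places, and a simple query recovers the succession of clergy in one parish. The table layout and sample data are invented for illustration and are not the CCED's actual schema.

# Illustrative sketch of an event-centred clerical career database: career
# events link persons to places, and a query recovers the succession of
# clergy in one parish. Schema and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE place  (id INTEGER PRIMARY KEY, parish TEXT, diocese TEXT);
    CREATE TABLE career_event (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        place_id  INTEGER REFERENCES place(id),
        event     TEXT,      -- e.g. 'ordination', 'institution', 'resignation'
        year      INTEGER
    );
    INSERT INTO person VALUES (1, 'John Smith'), (2, 'Richard Jones');
    INSERT INTO place  VALUES (1, 'St Mary, Anytown', 'Exampleshire');
    INSERT INTO career_event VALUES
        (1, 1, 1, 'institution', 1688),
        (2, 1, 1, 'resignation', 1702),
        (3, 2, 1, 'institution', 1702);
""")

# The succession of clergy instituted to one parish, in date order.
for name, year in conn.execute("""
        SELECT p.name, e.year
        FROM career_event e
        JOIN person p  ON p.id  = e.person_id
        JOIN place  pl ON pl.id = e.place_id
        WHERE pl.parish = 'St Mary, Anytown' AND e.event = 'institution'
        ORDER BY e.year"""):
    print(year, name)   # 1688 John Smith / 1702 Richard Jones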
The paper will discuss the data collection methods across the 27 repositories made possible by the use of computers, and the scholarly and technical issues involved in integrating the collected materials in a single master database, as well as the effects of these methods on the relationships between university and local historians, and the attitudes and responses of the archivists involved. It will describe a number of ways in which historical issues and perspectives are having to be re-conceptualised. It will highlight new insights that have already come out of the project, and new reference materials that are already being produced. Finally, the paper will assess the expected uses of the database in the future for a range of users, and its potential for inspiring new research.


---

Burghart, Alex. King's College London
Pelteret, David A E. King's College London
Tinti, Francesca. King's College London

Comparing Lives: The Interpretation of Biographical Evidence Assembled in Database Form - the Prosopography of Anglo-Saxon England Project


The Prosopography of Anglo-Saxon England is a five-year AHRB-funded project now in its third year. It aims to collect biographical information from primary sources on all named persons associated with Anglo-Saxon England from AD 597, when the first papal missionaries arrived in the country, to AD 1042, the year in which the last Anglo-Saxon king acceded to the throne. This material is being assembled in a relational database that will permit a wide variety of comparative studies of Anglo-Saxon people to be made.
Information on those who lived a millennium and more ago is inevitably fragmentary and incomplete. It is thus essential to develop a database that will extract as much evidence as possible from the diverse extant sources. The very diversity of the sources poses a particular challenge. Three types of source from Anglo-Saxon England illustrate this. Narrative sources such as Saints' Lives and Bede's Ecclesiastical History of the English People contain evidence on people in a fairly fluid form, with varying degrees of density of information depending on the sources lying behind them and on the literary purposes of their authors. These sources, however, do not always provide precise chronological details about the persons they mention. Another kind of source, the Anglo-Saxon Chronicle, is organized on a strictly chronological basis (not always accurately), but is extant in a variety of versions containing information that is often found only in a single manuscript. Land charters, a third major source, frequently contain lengthy witness lists, behind which a hierarchical order of precedence can be discerned; the association of persons and properties often provides indirect evidence of power relationships.
Having outlined how the Prosopography of Anglo-Saxon England project team has tried to meet the challenges of designing a database that permits this diversity of source information to be recorded, the paper will proceed to explore how the resulting formal structure of the database may affect the way in which the historical material it contains is interpreted. It will describe patterns of evidence and new research questions that have been prompted in the process of analysis and design which began at the start of the project and in the first phases of data collection and data entry. The paper will discuss the variety of users expected to exploit the database and their varying purposes. Finally it will describe and assess the current and potential interaction between this project and other projects concerned with Anglo-Saxon or wider medieval materials.
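A deliberately simplified sketch of the kind of structure this implies is given below: each statement about a person is held as a separate record tied to its source, so that evidence of very different kinds (narrative, annalistic, charter witness list) can be gathered side by side. The record shape, names and sample data are invented and do not reproduce the project's actual database design.

# Illustrative sketch: every assertion about a person is stored as a separate
# record tied to its source, so narrative, annalistic and charter evidence can
# be gathered and compared. Structure, names and sample data are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Assertion:
    person: str                    # normalised person identifier
    source: str                    # citation of the source passage
    source_type: str               # 'narrative' | 'chronicle' | 'charter'
    statement: str                 # what the source says about the person
    year_from: Optional[int] = None
    year_to: Optional[int] = None

assertions = [
    Assertion("Wulfstan 1", "saint's Life, ch. 12", "narrative",
              "described as a priest in the king's household"),
    Assertion("Wulfstan 1", "Anglo-Saxon Chronicle, annal for 878", "chronicle",
              "present at a royal assembly", 878, 878),
    Assertion("Wulfstan 1", "land charter witness list", "charter",
              "witnesses as 'minister', fifth in order of precedence"),
]

def dossier(person, records):
    """Collect every source-linked statement about one person."""
    return [r for r in records if r.person == person]

for r in dossier("Wulfstan 1", assertions):
    print(f"[{r.source_type}] {r.source}: {r.statement}")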

---

Short, Harold. King's College London

Technical methods, digital scholarship and the global digital library

The technical methods required in order to prepare and manipulate humanities data involve formality, explicitness, absolute consistency and rigour of a kind and to a degree not widely familiar in humanities research. This paper will begin with a review of the technical methods used in another of the prosopographical projects at King's College London - the Prosopography of the Byzantine Empire. This project began in the 1980s as a British Academy project, and its current phase is funded by the AHRB, with a continuing contribution from the Academy. The technologies and formal methods used in the project have changed over its existence, and this in itself raises important questions about the use of technology in research. The ways in which the methods have been used in this and the previously discussed projects will be assessed.
Among the most significant characteristics of these projects - and many others like them - are the new modes of collaboration that are on the one hand made necessary and on the other made possible by the use of the technologies and the formal methods associated with them. The required collaborations include primarily those between humanities scholars and technical specialists. The potential collaborations arise from the possibilities for combining, linking or integrating different kinds of source materials and from the fact that the internet enables people in different geographical locations to work together. Both types of collaboration have important implications for the way the scholarship is pursued, and for the products of research. These issues will be elaborated with particular reference to the three projects.
Computers are particularly useful in processing large quantities of data, and one of the transformations under way in the 'digital revolution' is the introduction of a new scale of evidence in humanities research. This is a common characteristic of the three projects under discussion - each aims to bring together all the evidence from its respective sources - and even if the 'all' must in truth be qualified, the body of evidence aimed at is still substantial. This makes possible the production of indices and other reference materials of a particularly comprehensive kind, enables exhaustive searches, and enables new kinds of analyses.
Although the term 'database' appears early in any description of these three projects, it is important to note that other technologies are relevant to this kind of prosopographical work. This part of the paper will place the projects within the 'methodological commons' whose shape and character have started to become evident over the past decade. As an aid to the discussion an 'intellectual map' will be shown which attempts to place the methods of the 'commons' in relation to the discipline areas from which the methods are derived and the humanities disciplines in whose service they are being applied. Reference will also be made to other projects which employ multiple technologies - an increasing majority.
This leads naturally on to the range of issues associated with how the 'digital objects' created by projects such as these are to take their place in the 'global digital library', so they may be available and useful to future generations of scholars. The paper will consider some of these issues, including 'metadata', 'cross-domain searching' and 'inter-operability'. The point is not the jargon, but to look at how repositories of digital objects can be organised and managed, and how information and knowledge created by one group of researchers can be compared or integrated with that created by another - how a search based on a particular church at a particular time, for example, might elicit composite information about the building and its statuary and glass, including perhaps a floor plan, as well as information on who the rector was at the time, and who is known to have been living, trading or governing in the parish, the city and the county.
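The sketch below illustrates, in deliberately simplified form, the kind of composite result such cross-domain searching aims at: records from two independently created resources are merged on shared place and date keys. The record shapes and data are invented.

# Illustrative sketch of cross-domain integration: records from two independent
# resources are brought together on shared place and date keys to build a
# composite answer about one church at one time. All data are invented.
buildings = [
    {"place": "St Mary, Anytown", "built": 1310,
     "features": ["floor plan", "east window glass", "statuary"]},
]
clergy = [
    {"place": "St Mary, Anytown", "name": "John Smith", "from": 1688, "to": 1702},
    {"place": "St Mary, Anytown", "name": "Richard Jones", "from": 1702, "to": 1731},
]

def composite(place, year):
    """Merge what each resource knows about one place at one date."""
    building = next((b for b in buildings if b["place"] == place), None)
    rector = next((c for c in clergy
                   if c["place"] == place and c["from"] <= year <= c["to"]), None)
    return {"place": place, "year": year,
            "building": building, "rector": rector and rector["name"]}

print(composite("St Mary, Anytown", 1700)["rector"])   # John Smith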

 

175

Cave, Mike. Refugee Studies Centre, Oxford University
Deegan, Marilyn. Refugee Studies Centre, Oxford University
Griffiths, Dave. Refugee Studies Centre, Oxford University

Forced Migration Online: a World of Information on Human Displacement

Forced Migration Online
In November 2002, the Forced Migration Online team at the Refugee Studies Centre, University of Oxford, will launch a major portal of information sources on forced migration. The development of the portal is a major international endeavour, involving the RSC and a number of international partners who are providing advice on both forced migration and technical issues.

Forced Migration Online (FMO) will provide instant access to a wide variety of online resources concerning the situation of forced migrants worldwide. Designed for use by practitioners, researchers, policy makers, students or anyone interested in the field, FMO aims to give comprehensive information in an impartial environment and to promote increased awareness of human displacement issues to an international community of users.
For research or reference, FMO will offer a host of key published and unpublished literature (both current and historical); specially commissioned guides written by subject experts; selected web resources; an organizations directory and other useful and time-saving resources - all in one place.

FMO is funded by the Andrew W. Mellon Foundation and the European Commission.

The FMO Digital Library
The FMO Digital Library currently has two collections: the larger is a collection of grey literature selected from the holdings of the Refugee Studies Centre library, covering many topics and regions of the world. In addition, there is a smaller collection from the Feinstein International Famine Center at Tufts University, mostly concerned with nutritional issues. FMO can be accessed via www.forcedmigration.org. The system we have put in place is simple yet powerful, and it allows rapid access to an unparalleled resource from anywhere in the world where there is an Internet connection. Documents can be searched, viewed, saved or printed. As well as the digital library, FMO also provides access to a number of key documents on the psychosocial dimensions of forced migration at http://earlybird.qeh.ox.ac.uk/psychosocial/xmlfiles/index.html, as a pilot for a whole range of special collections in this topic area.

ReliefSim
The FMO team, along with the Technology Assisted Lifelong Learning (TALL) at Oxford University and the Centre for New Media Teaching and Learning at Columbia University, New York (CCNMTL), are also working on a project to investigate the use of computer simulations for training humanitarian workers in the procedures needed for the management of complex emergencies, in particular in refugee situations. The Andrew W. Mellon Foundation has granted £200,000 for a two-year joint pilot study beginning in January 2002 to evaluate, design and ultimately deliver simulation models which can be used in educating both students and practitioners in the complexities of emergency work. Effective emergency relief demands fast and informed decision making, in-depth knowledge of the needs and problems of affected populations, strict prioritization of key tasks and implementation of acknowledged minimum standards in healthcare, often in the direst of circumstances. With so much at stake, practical, professional and comprehensive training of humanitarian workers is essential. With this in mind and taking into account the current shortfall in training of this type, the RSC, TALL and CCNMTL plan to develop computer simulations of emergency settings. These will provide practitioners and students with the opportunity to solve problems, analyse situations, recommend future actions and deal with complex environments such as establishing new relief camps. The simulations will help users to visualize what might happen in real-life settings and give them practical, hands-on experience of the types of problems they might encounter in the field. ReliefSim is the first project of its kind to apply complex modelling technology to relief settings.

The Digital Shikshapatri
FMO is also working on a project funded by the UK’s New Opportunities Fund Digitisation Programme (NOF-Digitise): the Digital Shikshapatri. In partnership with the Indian Institute Library, University of Oxford, the manuscript of the Shikshapatri, a key religious text of the Swaminarayan Hindu community (mostly forced migrants from Uganda in the 1970s), will be digitized and made available in a complex network of interlinked contextual materials.

The Poster Session
The poster session proposed here will present a work-in-progress summary of these three projects being carried out by the FMO team.

 

176

Weitz, Nancy. Oxford University

The Text Enhancement Project: Pilot Phase

I began the Text Enhancement Project in order to explore interface design for digitized literary texts. Until recently, creators of digital text projects have concentrated most of their efforts on creating encoding technologies (e.g. XML) which allow for longevity, archiving and in many cases the performance of linguistic and textual analyses. However, with recent web developments, the time is ripe to explore the delivery mechanisms for digitized texts and consider the possibilities for aesthetics and interpretive or critical analysis. Digitized literary texts can be much more than "data sets" and can offer added value for literary scholars and students: they are ideal for building variora and content-rich critical editions. One can add a nearly limitless amount of interpretive and contextual information.

My project will eventually take as its core texts such works as Spenser's Faerie Queene and Milton's Paradise Lost. However, the pilot phase is based on the enhancement of a single sonnet from Sir Philip Sidney's sequence, Astrophel and Stella, to which I add the following:

· Word Definitions: (quoted from the Oxford English Dictionary), which are particularly useful for gaining insight into how words were defined through history. One can freely use the online OED if visiting the site from an institution that has a subscription to it.
· Images: which move into the image box (set by default as the portrait of Sidney) when the mouse hovers over an appropriate word or phrase. Some images may also be triggered by clicking a link, which will open a new window.
· Notes: which appear in text boxes and as links to new pages. These notes include the following kinds of information: biographical, historical, textual, critical. All quotations from sources other than the TEP are clearly marked and linked to the Bibliography (see below).
· Bibliography: a page with full bibliographical details and links to online sources where available (some of these require subscription for access, like the OED).
· Annotation Tool: This is not yet available, but will provide a client-side annotation tool for assisting the reader's private engagement with the material.
· Mappings / Visualizations: This is the most abstract intended feature of the project. As the core texts will be dense and multi-layered, I would like to experiment with ways to visualize such interpretive aspects as layers of allegory and complex lines of narrative.

What is new about the information already in place is the manner in which it is delivered: I use DHTML rollovers and other interactive methods which do not significantly interrupt the reading experience by forcing attention onto a new page. Tool tips, image boxes and popup windows appear when triggered by the movement of the mouse, and I have chosen not to indicate the anchor words in the text in order to keep those words from being privileged in a literary sense. The fact that these effects are transitory (unless one clicks on a link to bring up a new page) means that the page remains simple and clean until they are needed, whereas the use of frames and substitute pages can complicate the interface and disrupt concentration with distractions.
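
The project itself uses hand-built DHTML; purely to illustrate the general idea of attaching unobtrusive tool-tip notes to unmarked anchor words, the sketch below generates such markup from a line of the sequence (not necessarily the sonnet used in the pilot) and a small set of notes. The annotations and output conventions are invented and are not the TEP's actual code.

# Illustrative sketch only: wrap annotated words of a line of verse in spans
# carrying their notes, without visually marking the anchor words, so that a
# rollover (here a plain title tool tip) reveals the note only on demand.
# The notes and markup conventions are invented, not the TEP's own.
import html
import re

line = "Loving in truth, and fain in verse my love to show"
notes = {
    "fain": "gladly, willingly (illustrative gloss)",
    "show": "both 'display' and 'demonstrate by argument' (illustrative gloss)",
}

def annotate(line, notes):
    def wrap(match):
        word = match.group(0)
        note = notes.get(word.lower())
        if note is None:
            return html.escape(word)
        # No underline or colour change: the anchor word is not visually privileged.
        return (f'<span class="note" title="{html.escape(note, quote=True)}">'
                f'{html.escape(word)}</span>')
    return re.sub(r"[A-Za-z']+", wrap, line)

print(annotate(line, notes))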

I'm interested in talking to others who are working on interface design, those who have knowledge of transforming XML-encoded text into HTML, and those who are involved with other literary projects and editions.

173

Burnard, Lou. Oxford University

TEI and XML: A Marriage made in Heaven

New roles for the TEI?

 The Text Encoding Initiative's "Guidelines for electronic text encoding and interchange" (TEI, 1994) were the result of an extensive international research project, which aimed to provide exhaustive recommendations for the encoding of key features in literary and linguistic textual materials. These recommendations, taking the form of a modular, SGML-based, architecture in which DTD fragments and documentation are combined according to user-specified requirements, proved very effective and have been widely adopted in digital library, language engineering, and many other projects. A new membership Consortium was formed in 2001, based at four universities worldwide, which has since revised the whole of the Guidelines, and produced a new version re-expressed in XML. Included in its new technical programme are further work on harmonization with emerging related standards and a new schema-based version of the underlying DTDs.

In this paper we consider the extent to which the TEI, designed to *describe* existing texts, is also suitable for the authoring of new material, such as web sites, academic papers, or digital editions. It is clear that the TEI Guidelines require very little extension to enable them to describe the structure and contents of most web pages. We describe how its modular architecture readily permits the addition of short-cut attributes for <figure>, <xptr> and <xref> to allow URLs to be given directly (rather than via entities), the provision of `file', `scale', `width' and `height' attributes on <figure>, and the addition of special-purpose elements for other multimedia content such as sound and image maps. Although the TEI is very definitely "book-centric" in its concept of text, we find it sufficiently flexible to cater for information organized in a completely non-book-like manner.
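
As a rough illustration of the kind of extension described - the attribute names and the tiny transformation below are assumptions made for the sake of the example, not the Consortium's actual customisation - a TEI-like fragment carrying direct URL attributes maps straightforwardly onto HTML:

# Rough illustration only: a TEI-like fragment in which <xref> and <figure>
# carry direct 'url'/'file' attributes (as the extensions described above would
# allow) is mapped onto HTML anchors and images. The attribute names and the
# mapping are assumptions for the sake of the example.
import xml.etree.ElementTree as ET

tei = """<div>
  <p>See the <xref url="http://www.tei-c.org/">TEI Consortium</xref> site.</p>
  <figure file="plan.png" width="400" height="300">
    <head>Floor plan</head>
  </figure>
</div>"""

root = ET.fromstring(tei)

def to_html(el):
    if el.tag == "xref":
        return f'<a href="{el.get("url")}">{el.text or ""}</a>'
    if el.tag == "figure":
        caption = el.findtext("head", default="")
        return (f'<img src="{el.get("file")}" width="{el.get("width")}" '
                f'height="{el.get("height")}" alt="{caption}">')
    inner = (el.text or "") + "".join(to_html(c) + (c.tail or "") for c in el)
    tag = "p" if el.tag == "p" else "div"
    return f"<{tag}>{inner}</{tag}>"

print(to_html(root))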

We will further argue that, as well as being able to model web pages adequately, the non-prescriptive nature of the TEI makes it a good candidate for a neutral interchange language amongst other competing and often more prescriptive XML schemes. The class system underlying the TEI model of textual structure is robust, well-tested, and facilitates precisely this kind of application. As new subject-focussed XML-based schemas and DTDs proliferate, there will be an increasing demand for a non-specific interlingua for the interchange and integration of digital textual data. Our claim is that the TEI is probably the most comprehensive and certainly the most extensively documented public markup system yet devised. This paper restates its key role as a means of facilitating interchange and exchange of information, in conformance with emerging standards.